Re: [PATCH v6 0/2] Add Realtek Otto GPIO support

2021-03-31 Thread Sander Vanheule
On Wed, 2021-03-31 at 09:49 +0200, Bartosz Golaszewski wrote:
> On Tue, Mar 30, 2021 at 7:48 PM Sander Vanheule
>  wrote:
> > 
> > Add support for the GPIO controller employed by Realtek in multiple
> > series of MIPS SoCs. These include the supported RTL838x and
> > RTL839x. The register layout also matches the one found in the GPIO
> > controller of other (Lexra-based) SoCs such as RTL8196E, RTL8197D,
> > and RTL8197F.
> 
> Series applied, thanks!

Thanks for merging, and thanks for the discussion everyone!

Best,
Sander




[PATCH v6 0/2] Add Realtek Otto GPIO support

2021-03-30 Thread Sander Vanheule
Add support for the GPIO controller employed by Realtek in multiple series of
MIPS SoCs. These include the supported RTL838x and RTL839x. The register layout
also matches the one found in the GPIO controller of other (Lexra-based) SoCs
such as RTL8196E, RTL8197D, and RTL8197F.

For the platform name 'otto', I am not aware of any official resources as to
what hardware this specifically applies to. However, in all of the GPL archives
we've received, from vendors using compatible SoCs in their design, the
platform under the MIPS architecture is referred to by this name.

The GPIO ports have been tested on a Zyxel GS1900-8 (RTL8380), and Zyxel
GS1900-48 (RTL8393). Furthermore, the GPIO ports and interrupt controller have
been tested on a Netgear GS110TPPv1 (RTL8381).

Changes in v6:
- Use devm_gpiochip_add_data()
- Code style for reading ngpios, header order
- Add Andy's Reviewed-by tag

Changes in v5:
- Edited code comments
- Fold functions that were used only once or twice (ISR/IMR accessors)
- Drop trivial functions for line to port/pin calculations
- Use gpio_irq_chip->init_hw() to initialise IRQ registers
- Invert GPIO_INTERRUPTS flag to GPIO_INTERRUPTS_DISABLED
- Support building as module
- Add Rob's Reviewed-by tag

Changes in v4:
- Fix pointer notation style
- Drop unused read_u16_reg() function
- Drop 'inline' specifier from functions

Changes in v3:
- Remove OF dependencies in driver probe
- Don't accept IRQ_TYPE_NONE as a valid interrupt type
- Remove (now unused) dev property from control structure
- Use u8/u16 port registers, instead of raw u32 registers
- Use 'line' name for gpiochip, 'port' and 'pin' names for hardware
- Renamed DT bindings file
- Dropped fallback-only DT compatible
- Various code style clean-ups

Changes in v2:
- Clarify structure and usage of IMR registers
- Added Linus' Reviewed-by tags

Sander Vanheule (2):
  dt-bindings: gpio: Binding for Realtek Otto GPIO
  gpio: Add Realtek Otto GPIO support

 .../bindings/gpio/realtek,otto-gpio.yaml  |  78 +
 drivers/gpio/Kconfig  |  13 +
 drivers/gpio/Makefile |   1 +
 drivers/gpio/gpio-realtek-otto.c  | 325 ++
 4 files changed, 417 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml
 create mode 100644 drivers/gpio/gpio-realtek-otto.c

-- 
2.30.2



[PATCH v6 2/2] gpio: Add Realtek Otto GPIO support

2021-03-30 Thread Sander Vanheule
Realtek MIPS SoCs (platform name Otto) have GPIO controllers with up to
64 GPIOs, divided over two banks. Each bank has a set of registers for
32 GPIOs, with support for edge-triggered interrupts.

Each GPIO bank consists of four 8-bit GPIO ports (ABCD and EFGH). Most
registers pack one bit per GPIO, except for the IMR register, which
packs two bits per GPIO (AB-CD).

Although the byte order is currently assumed to have port A..D at offset
0x0..0x3, this has been observed to be reversed on other, Lexra-based,
SoCs (e.g. RTL8196E/97D/97F).

Interrupt support is disabled for the fallback devicetree-compatible
'realtek,otto-gpio'. This allows for quick support of GPIO banks in
which the byte order would be unknown. In this case, the port ordering
in the IMR registers may not match the reversed order in the other
registers (DCBA, and BA-DC or DC-BA).

Signed-off-by: Sander Vanheule 
Reviewed-by: Linus Walleij 
Reviewed-by: Andy Shevchenko 
---
 drivers/gpio/Kconfig |  13 ++
 drivers/gpio/Makefile|   1 +
 drivers/gpio/gpio-realtek-otto.c | 325 +++
 3 files changed, 339 insertions(+)
 create mode 100644 drivers/gpio/gpio-realtek-otto.c

diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index e3607ec4c2e8..6fb13d6507db 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -502,6 +502,19 @@ config GPIO_RDA
help
  Say Y here to support RDA Micro GPIO controller.
 
+config GPIO_REALTEK_OTTO
+   tristate "Realtek Otto GPIO support"
+   depends on MACH_REALTEK_RTL
+   default MACH_REALTEK_RTL
+   select GPIO_GENERIC
+   select GPIOLIB_IRQCHIP
+   help
+ The GPIO controller on the Otto MIPS platform supports up to two
+ banks of 32 GPIOs, with edge triggered interrupts. The 32 GPIOs
+ are grouped in four 8-bit wide ports.
+
+ When built as a module, the module will be called realtek_otto_gpio.
+
 config GPIO_REG
bool
help
diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile
index c58a90a3c3b1..8ace5934e3c3 100644
--- a/drivers/gpio/Makefile
+++ b/drivers/gpio/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_GPIO_RC5T583)  += gpio-rc5t583.o
 obj-$(CONFIG_GPIO_RCAR)+= gpio-rcar.o
 obj-$(CONFIG_GPIO_RDA) += gpio-rda.o
 obj-$(CONFIG_GPIO_RDC321X) += gpio-rdc321x.o
+obj-$(CONFIG_GPIO_REALTEK_OTTO)+= gpio-realtek-otto.o
 obj-$(CONFIG_GPIO_REG) += gpio-reg.o
 obj-$(CONFIG_ARCH_SA1100)  += gpio-sa1100.o
 obj-$(CONFIG_GPIO_SAMA5D2_PIOBU)   += gpio-sama5d2-piobu.o
diff --git a/drivers/gpio/gpio-realtek-otto.c b/drivers/gpio/gpio-realtek-otto.c
new file mode 100644
index ..cb64fb5a51aa
--- /dev/null
+++ b/drivers/gpio/gpio-realtek-otto.c
@@ -0,0 +1,325 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Total register block size is 0x1C for one bank of four ports (A, B, C, D).
+ * An optional second bank, with ports E, F, G, and H, may be present, starting
+ * at register offset 0x1C.
+ */
+
+/*
+ * Pin select: (0) "normal", (1) "dedicate peripheral"
+ * Not used on RTL8380/RTL8390, peripheral selection is managed by control bits
+ * in the peripheral registers.
+ */
+#define REALTEK_GPIO_REG_CNR   0x00
+/* Clear bit (0) for input, set bit (1) for output */
+#define REALTEK_GPIO_REG_DIR   0x08
+#define REALTEK_GPIO_REG_DATA  0x0C
+/* Read bit for IRQ status, write 1 to clear IRQ */
+#define REALTEK_GPIO_REG_ISR   0x10
+/* Two bits per GPIO in IMR registers */
+#define REALTEK_GPIO_REG_IMR   0x14
+#define REALTEK_GPIO_REG_IMR_AB0x14
+#define REALTEK_GPIO_REG_IMR_CD0x18
+#define REALTEK_GPIO_IMR_LINE_MASK GENMASK(1, 0)
+#define REALTEK_GPIO_IRQ_EDGE_FALLING  1
+#define REALTEK_GPIO_IRQ_EDGE_RISING   2
+#define REALTEK_GPIO_IRQ_EDGE_BOTH 3
+
+#define REALTEK_GPIO_MAX   32
+#define REALTEK_GPIO_PORTS_PER_BANK4
+
+/**
+ * realtek_gpio_ctrl - Realtek Otto GPIO driver data
+ *
+ * @gc: Associated gpio_chip instance
+ * @base: Base address of the register block for a GPIO bank
+ * @lock: Lock for accessing the IRQ registers and values
+ * @intr_mask: Mask for interrupts lines
+ * @intr_type: Interrupt type selection
+ *
+ * Because the interrupt mask register (IMR) combines the function of IRQ type
+ * selection and masking, two extra values are stored. @intr_mask is used to
+ * mask/unmask the interrupts for a GPIO port, and @intr_type is used to store
+ * the selected interrupt types. The logical AND of these values is written to
+ * IMR on changes.
+ */
+struct realtek_gpio_ctrl {
+   struct gpio_chip gc;
+   void __iomem *base;
+   raw_spinlock_t lock;
+   u16 intr_mask[REALTEK_GPIO_PORTS_PER_BANK];
+   u16 intr_type[REALTEK_GPI

[PATCH v6 1/2] dt-bindings: gpio: Binding for Realtek Otto GPIO

2021-03-30 Thread Sander Vanheule
Add a binding description for Realtek's GPIO controller found on several
of their MIPS-based SoCs (codenamed Otto), such as the RTL838x and
RTL839x series of switch SoCs.

A fallback binding 'realtek,otto-gpio' is provided for cases where the
actual port ordering is not known yet, and enabling the interrupt
controller may result in uncaught interrupts.

Signed-off-by: Sander Vanheule 
Reviewed-by: Linus Walleij 
Reviewed-by: Rob Herring 
---
 .../bindings/gpio/realtek,otto-gpio.yaml  | 78 +++
 1 file changed, 78 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml

diff --git a/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml b/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml
new file mode 100644
index ..100f20cebd76
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml
@@ -0,0 +1,78 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/gpio/realtek,otto-gpio.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Realtek Otto GPIO controller
+
+maintainers:
+  - Sander Vanheule 
+  - Bert Vermeulen 
+
+description: |
+  Realtek's GPIO controller on their MIPS switch SoCs (Otto platform) consists
+  of two banks of 32 GPIOs. These GPIOs can generate edge-triggered interrupts.
+  Each bank's interrupts are cascaded into one interrupt line on the parent
+  interrupt controller, if provided.
+  This binding allows defining a single bank in the devicetree. The interrupt
+  controller is not supported on the fallback compatible name, which only
+  allows for GPIO port use.
+
+properties:
+  $nodename:
+    pattern: "^gpio@[0-9a-f]+$"
+
+  compatible:
+    items:
+      - enum:
+          - realtek,rtl8380-gpio
+          - realtek,rtl8390-gpio
+      - const: realtek,otto-gpio
+
+  reg:
+    maxItems: 1
+
+  "#gpio-cells":
+    const: 2
+
+  gpio-controller: true
+
+  ngpios:
+    minimum: 1
+    maximum: 32
+
+  interrupt-controller: true
+
+  "#interrupt-cells":
+    const: 2
+
+  interrupts:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - "#gpio-cells"
+  - gpio-controller
+
+additionalProperties: false
+
+dependencies:
+  interrupt-controller: [ interrupts ]
+
+examples:
+  - |
+      gpio@3500 {
+        compatible = "realtek,rtl8380-gpio", "realtek,otto-gpio";
+        reg = <0x3500 0x1c>;
+        gpio-controller;
+        #gpio-cells = <2>;
+        ngpios = <24>;
+        interrupt-controller;
+        #interrupt-cells = <2>;
+        interrupt-parent = <&rtlintc>;
+        interrupts = <23>;
+      };
+
+...
-- 
2.30.2



[PATCH v5 0/2] Add Realtek Otto GPIO support

2021-03-30 Thread Sander Vanheule
Add support for the GPIO controller employed by Realtek in multiple series of
MIPS SoCs. These include the supported RTL838x and RTL839x. The register layout
also matches the one found in the GPIO controller of other (Lexra-based) SoCs
such as RTL8196E, RTL8197D, and RTL8197F.

For the platform name 'otto', I am not aware of any official resources as to
what hardware this specifically applies to. However, in all of the GPL archives
we've received, from vendors using compatible SoCs in their design, the
platform under the MIPS architecture is referred to by this name.

The GPIO ports have been tested on a Zyxel GS1900-8 (RTL8380), and Zyxel
GS1900-48 (RTL8393). Furthermore, the GPIO ports and interrupt controller have
been tested on a Netgear GS110TPPv1 (RTL8381).

Changes in v5:
- Edited code comments
- Fold functions that were used only once or twice (ISR/IMR accessors)
- Drop trivial functions for line to port/pin calculations
- Use gpio_irq_chip->init_hw() to initialise IRQ registers
- Invert GPIO_INTERRUPTS flag to GPIO_INTERRUPTS_DISABLED
- Support building as module
- Add Rob's Reviewed-by tag

Changes in v4:
- Fix pointer notation style
- Drop unused read_u16_reg() function
- Drop 'inline' specifier from functions

Changes in v3:
- Remove OF dependencies in driver probe
- Don't accept IRQ_TYPE_NONE as a valid interrupt type
- Remove (now unused) dev property from control structure
- Use u8/u16 port registers, instead of raw u32 registers
- Use 'line' name for gpiochip, 'port' and 'pin' names for hardware
- Renamed DT bindings file
- Dropped fallback-only DT compatible
- Various code style clean-ups

Changes in v2:
- Clarify structure and usage of IMR registers
- Added Linus' Reviewed-by tags

Sander Vanheule (2):
  dt-bindings: gpio: Binding for Realtek Otto GPIO
  gpio: Add Realtek Otto GPIO support

 .../bindings/gpio/realtek,otto-gpio.yaml  |  78 +
 drivers/gpio/Kconfig  |  13 +
 drivers/gpio/Makefile |   1 +
 drivers/gpio/gpio-realtek-otto.c  | 326 ++
 4 files changed, 418 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml
 create mode 100644 drivers/gpio/gpio-realtek-otto.c

-- 
2.30.2



[PATCH v5 2/2] gpio: Add Realtek Otto GPIO support

2021-03-30 Thread Sander Vanheule
Realtek MIPS SoCs (platform name Otto) have GPIO controllers with up to
64 GPIOs, divided over two banks. Each bank has a set of registers for
32 GPIOs, with support for edge-triggered interrupts.

Each GPIO bank consists of four 8-bit GPIO ports (ABCD and EFGH). Most
registers pack one bit per GPIO, except for the IMR register, which
packs two bits per GPIO (AB-CD).

Although the byte order is currently assumed to have port A..D at offset
0x0..0x3, this has been observed to be reversed on other, Lexra-based,
SoCs (e.g. RTL8196E/97D/97F).

Interrupt support is disabled for the fallback devicetree-compatible
'realtek,otto-gpio'. This allows for quick support of GPIO banks in
which the byte order would be unknown. In this case, the port ordering
in the IMR registers may not match the reversed order in the other
registers (DCBA, and BA-DC or DC-BA).

Signed-off-by: Sander Vanheule 
Reviewed-by: Linus Walleij 
---
 drivers/gpio/Kconfig |  13 ++
 drivers/gpio/Makefile|   1 +
 drivers/gpio/gpio-realtek-otto.c | 326 +++
 3 files changed, 340 insertions(+)
 create mode 100644 drivers/gpio/gpio-realtek-otto.c

diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index e3607ec4c2e8..6fb13d6507db 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -502,6 +502,19 @@ config GPIO_RDA
help
  Say Y here to support RDA Micro GPIO controller.
 
+config GPIO_REALTEK_OTTO
+   tristate "Realtek Otto GPIO support"
+   depends on MACH_REALTEK_RTL
+   default MACH_REALTEK_RTL
+   select GPIO_GENERIC
+   select GPIOLIB_IRQCHIP
+   help
+ The GPIO controller on the Otto MIPS platform supports up to two
+ banks of 32 GPIOs, with edge triggered interrupts. The 32 GPIOs
+ are grouped in four 8-bit wide ports.
+
+ When built as a module, the module will be called realtek_otto_gpio.
+
 config GPIO_REG
bool
help
diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile
index c58a90a3c3b1..8ace5934e3c3 100644
--- a/drivers/gpio/Makefile
+++ b/drivers/gpio/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_GPIO_RC5T583)  += gpio-rc5t583.o
 obj-$(CONFIG_GPIO_RCAR)+= gpio-rcar.o
 obj-$(CONFIG_GPIO_RDA) += gpio-rda.o
 obj-$(CONFIG_GPIO_RDC321X) += gpio-rdc321x.o
+obj-$(CONFIG_GPIO_REALTEK_OTTO)+= gpio-realtek-otto.o
 obj-$(CONFIG_GPIO_REG) += gpio-reg.o
 obj-$(CONFIG_ARCH_SA1100)  += gpio-sa1100.o
 obj-$(CONFIG_GPIO_SAMA5D2_PIOBU)   += gpio-sama5d2-piobu.o
diff --git a/drivers/gpio/gpio-realtek-otto.c b/drivers/gpio/gpio-realtek-otto.c
new file mode 100644
index ..05ce5d48e121
--- /dev/null
+++ b/drivers/gpio/gpio-realtek-otto.c
@@ -0,0 +1,326 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Total register block size is 0x1C for one bank of four ports (A, B, C, D).
+ * An optional second bank, with ports E, F, G, and H, may be present, starting
+ * at register offset 0x1C.
+ */
+
+/*
+ * Pin select: (0) "normal", (1) "dedicate peripheral"
+ * Not used on RTL8380/RTL8390, peripheral selection is managed by control bits
+ * in the peripheral registers.
+ */
+#define REALTEK_GPIO_REG_CNR   0x00
+/* Clear bit (0) for input, set bit (1) for output */
+#define REALTEK_GPIO_REG_DIR   0x08
+#define REALTEK_GPIO_REG_DATA  0x0C
+/* Read bit for IRQ status, write 1 to clear IRQ */
+#define REALTEK_GPIO_REG_ISR   0x10
+/* Two bits per GPIO in IMR registers */
+#define REALTEK_GPIO_REG_IMR   0x14
+#define REALTEK_GPIO_REG_IMR_AB0x14
+#define REALTEK_GPIO_REG_IMR_CD0x18
+#define REALTEK_GPIO_IMR_LINE_MASK GENMASK(1, 0)
+#define REALTEK_GPIO_IRQ_EDGE_FALLING  1
+#define REALTEK_GPIO_IRQ_EDGE_RISING   2
+#define REALTEK_GPIO_IRQ_EDGE_BOTH 3
+
+#define REALTEK_GPIO_MAX   32
+#define REALTEK_GPIO_PORTS_PER_BANK4
+
+/**
+ * realtek_gpio_ctrl - Realtek Otto GPIO driver data
+ *
+ * @gc: Associated gpio_chip instance
+ * @base: Base address of the register block for a GPIO bank
+ * @lock: Lock for accessing the IRQ registers and values
+ * @intr_mask: Mask for interrupts lines
+ * @intr_type: Interrupt type selection
+ *
+ * Because the interrupt mask register (IMR) combines the function of IRQ type
+ * selection and masking, two extra values are stored. @intr_mask is used to
+ * mask/unmask the interrupts for a GPIO port, and @intr_type is used to store
+ * the selected interrupt types. The logical AND of these values is written to
+ * IMR on changes.
+ */
+struct realtek_gpio_ctrl {
+   struct gpio_chip gc;
+   void __iomem *base;
+   raw_spinlock_t lock;
+   u16 intr_mask[REALTEK_GPIO_PORTS_PER_BANK];
+   u16 intr_type[REALTEK_GPIO_PORTS_PER_BANK];
+};
+
+/

[PATCH v5 1/2] dt-bindings: gpio: Binding for Realtek Otto GPIO

2021-03-30 Thread Sander Vanheule
Add a binding description for Realtek's GPIO controller found on several
of their MIPS-based SoCs (codenamed Otto), such as the RTL838x and
RTL839x series of switch SoCs.

A fallback binding 'realtek,otto-gpio' is provided for cases where the
actual port ordering is not known yet, and enabling the interrupt
controller may result in uncaught interrupts.

Signed-off-by: Sander Vanheule 
Reviewed-by: Linus Walleij 
Reviewed-by: Rob Herring 
---
 .../bindings/gpio/realtek,otto-gpio.yaml  | 78 +++
 1 file changed, 78 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml

diff --git a/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml b/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml
new file mode 100644
index ..100f20cebd76
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml
@@ -0,0 +1,78 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/gpio/realtek,otto-gpio.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Realtek Otto GPIO controller
+
+maintainers:
+  - Sander Vanheule 
+  - Bert Vermeulen 
+
+description: |
+  Realtek's GPIO controller on their MIPS switch SoCs (Otto platform) consists
+  of two banks of 32 GPIOs. These GPIOs can generate edge-triggered interrupts.
+  Each bank's interrupts are cascaded into one interrupt line on the parent
+  interrupt controller, if provided.
+  This binding allows defining a single bank in the devicetree. The interrupt
+  controller is not supported on the fallback compatible name, which only
+  allows for GPIO port use.
+
+properties:
+  $nodename:
+    pattern: "^gpio@[0-9a-f]+$"
+
+  compatible:
+    items:
+      - enum:
+          - realtek,rtl8380-gpio
+          - realtek,rtl8390-gpio
+      - const: realtek,otto-gpio
+
+  reg:
+    maxItems: 1
+
+  "#gpio-cells":
+    const: 2
+
+  gpio-controller: true
+
+  ngpios:
+    minimum: 1
+    maximum: 32
+
+  interrupt-controller: true
+
+  "#interrupt-cells":
+    const: 2
+
+  interrupts:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - "#gpio-cells"
+  - gpio-controller
+
+additionalProperties: false
+
+dependencies:
+  interrupt-controller: [ interrupts ]
+
+examples:
+  - |
+      gpio@3500 {
+        compatible = "realtek,rtl8380-gpio", "realtek,otto-gpio";
+        reg = <0x3500 0x1c>;
+        gpio-controller;
+        #gpio-cells = <2>;
+        ngpios = <24>;
+        interrupt-controller;
+        #interrupt-cells = <2>;
+        interrupt-parent = <&rtlintc>;
+        interrupts = <23>;
+      };
+
+...
-- 
2.30.2



Re: [PATCH v4 2/2] gpio: Add Realtek Otto GPIO support

2021-03-29 Thread Sander Vanheule
Hi Andy,

Thank you for clarifying your remarks. I'll add support for building as a
module, and have implemented the gpio_irq_chip->init_hw() callback.

On Mon, 2021-03-29 at 13:26 +0300, Andy Shevchenko wrote:
> On Fri, Mar 26, 2021 at 11:11 PM Sander Vanheule <
> san...@svanheule.net> wrote:
> > On Fri, 2021-03-26 at 20:19 +0200, Andy Shevchenko wrote:
> > > On Fri, Mar 26, 2021 at 2:05 PM Sander Vanheule <
> > > san...@svanheule.net>
> > > wrote:
> > > > +static const struct of_device_id realtek_gpio_of_match[] = {
> > > > +   { .compatible = "realtek,otto-gpio" },
> > > > +   {
> > > > +   .compatible = "realtek,rtl8380-gpio",
> > > > +   .data = (void *)GPIO_INTERRUPTS
> > > 
> > > Not sure why this flag is needed right now. Drop it completely for
> > > good.
> > > > +   },
> > > > +   {
> > > > +   .compatible = "realtek,rtl8390-gpio",
> > > > +   .data = (void *)GPIO_INTERRUPTS
> > > 
> > > Ditto
> > 
> > Linus Walleij asked this question too after v1:
> > https://lore.kernel.org/linux-gpio/e9f0651e5fb52b7d56361ceb30b41759b6f2ec13.ca...@svanheule.net/
> > 
> > Note that the fall-back compatible doesn't have this flag set.
> 
> AFAICS all except one have this flag. I suggest you do it the other
> way around, i.e. check the compatible string in the code. Or do
> something more clever. What happens if you have this flag enabled for
> the fallback node?
> 
> If two people ask the same, it might be a smoking gun.
> 

Testing for the fallback wouldn't work, since of_device_is_compatible()
would always match. Setting the (inverse) flag only on the fallback
would indeed reduce the clutter.
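
The inverted-flag idea can be sketched in plain userspace C. This is only a model under stated assumptions: the flag name GPIO_INTERRUPTS_DISABLED is taken from the v5 changelog, while struct match_entry, match_table, and lookup_flags() are hypothetical names made up for illustration. In the kernel the matching itself is done by the OF core, not by a hand-written loop like this one.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative flag: only the fallback compatible carries it. */
#define GPIO_INTERRUPTS_DISABLED (1u << 0)

/* Hypothetical stand-in for an of_device_id entry with match data. */
struct match_entry {
	const char *compatible;
	unsigned int flags;
};

static const struct match_entry match_table[] = {
	{ "realtek,otto-gpio",    GPIO_INTERRUPTS_DISABLED },
	{ "realtek,rtl8380-gpio", 0 },
	{ "realtek,rtl8390-gpio", 0 },
};

/*
 * Walk the node's compatible strings in order (most specific first) and
 * return the flags of the first string found in the table. Nodes that
 * match nothing get interrupts disabled, as the conservative choice.
 */
static unsigned int lookup_flags(const char *const *compat, size_t n)
{
	const size_t entries = sizeof(match_table) / sizeof(match_table[0]);

	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < entries; j++)
			if (strcmp(compat[i], match_table[j].compatible) == 0)
				return match_table[j].flags;

	return GPIO_INTERRUPTS_DISABLED;
}
```

With this layout a node listing "realtek,rtl8380-gpio" followed by the fallback resolves to the SoC-specific entry and keeps interrupts, while a node that lists only the fallback ends up with the flag set.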

If the port order is reversed w.r.t. the current implementation,
enabling a GPIO+IRQ would enable the same pin on a different port. I
don't think the result would be catastrophic, but it would result in
unexpected behaviour. When A0 and C0 are then enabled, A0 interrupts
would actually come from C0, and vice versa.

   Intended port | A | B | C | D
-----------------+---+---+---
Actual GPIO port | D | C | B | A
 Actual IRQ port | B | A | D | C

If only the actual GPIO ports change, at least you can still use a
modified GPIO line number and polling. The user could just leave out
the optional irq-controller from the devicetree, but I would rather
have it enforced in some way.
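
The port table above can be reproduced with a small userspace model. It only covers the one scenario shown in the table (one-bit registers fully byte-reversed, IMR ports swapped within each pair); the function names actual_gpio_port(), actual_irq_port(), and print_port_table() are hypothetical and do not appear in the driver.

```c
#include <stdio.h>

#define PORTS_PER_BANK 4

/* One-bit-per-GPIO registers: the four port bytes are fully reversed. */
static unsigned int actual_gpio_port(unsigned int intended)
{
	return (PORTS_PER_BANK - 1) - intended;
}

/* IMR registers in this scenario: ports swapped within each pair (BA-DC). */
static unsigned int actual_irq_port(unsigned int intended)
{
	return intended ^ 1;
}

/* Print one row per intended port, using letters A..D for ports 0..3. */
static void print_port_table(void)
{
	for (unsigned int p = 0; p < PORTS_PER_BANK; p++)
		printf("intended %c: GPIO %c, IRQ %c\n",
		       (char)('A' + p),
		       (char)('A' + actual_gpio_port(p)),
		       (char)('A' + actual_irq_port(p)));
}
```

Because the two mappings differ for every port, a line's data bit and its interrupt configuration would land on different physical ports, which is why polling with a modified line number can still work while interrupts cannot.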


Best,
Sander



Re: [PATCH v4 2/2] gpio: Add Realtek Otto GPIO support

2021-03-26 Thread Sander Vanheule
Hi Andy,

Replies inline below.

On Fri, 2021-03-26 at 20:19 +0200, Andy Shevchenko wrote:
> On Fri, Mar 26, 2021 at 2:05 PM Sander Vanheule 
> wrote:
> 
> > +config GPIO_REALTEK_OTTO
> > +   bool "Realtek Otto GPIO support"
> 
> Why not module?

This driver is only useful on a few specific MIPS SoCs, where this GPIO
peripheral is a part of that SoC. What would be the point of providing
this driver as a module?

> 
> > +   depends on MACH_REALTEK_RTL
> > +   default MACH_REALTEK_RTL
> > +   select GPIO_GENERIC
> > +   select GPIOLIB_IRQCHIP
> 
> > +   help
> > + The GPIO controller on the Otto MIPS platform supports up
> > to two
> > + banks of 32 GPIOs, with edge triggered interrupts. The 32
> > GPIOs
> > + are grouped in four 8-bit wide ports.
> 
> When allowing module build, here you may add what will be the name of
> it.
> 
> ...
> 
> > +/*
> > + * Total register block size is 0x1C for four ports.
> > + * On the RTL8380/RTL8390 platforms port A, B, and C are
> > implemented.
> 
> D?

No port D on 8380/8390. Only 24 GPIO lines are present on these
platforms. I'll rephrase this comment.

> 
> > + * RTL8389 and RTL8328 implement a second bank with ports E, F, G,
> > and H.
> > + *
> > + * Port information is stored with the first port at offset 0,
> > followed by the
> > + * second, etc. Most registers store one bit per GPIO and should be
> > read out in
> > + * reversed endian order. The two interrupt mask registers store two
> > bits per
> > + * GPIO, and should be manipulated with swahw32, if required.
> > + */

This reference to swahw32 and the include of linux/swab.h will be
dropped.

> 
> > +/*
> 
> Seems like kernel doc format with missed ** header and properly formed
> summary and description.

I'll reformat.

> 
> > + * Realtek GPIO driver data
> > + * Because the interrupt mask register (IMR) combines the function
> > of
> > + * IRQ type selection and masking, two extra values are stored.
> > + * intr_mask is used to mask/unmask the interrupts for certain
> > GPIO,
> > + * and intr_type is used to store the selected interrupt types.
> > The
> > + * logical AND of these values is written to IMR on changes.
> > + *
> > + * @gc Associated gpio_chip instance
> > + * @base Base address of the register block
> > + * @lock Lock for accessing the IRQ registers and values
> > + * @intr_mask Mask for GPIO interrupts
> > + * @intr_type GPIO interrupt type selection
> > + */
> > +struct realtek_gpio_ctrl {
> > +   struct gpio_chip gc;
> > +   void __iomem *base;
> > +   raw_spinlock_t lock;
> > +   u16 intr_mask[REALTEK_GPIO_PORTS_PER_BANK];
> > +   u16 intr_type[REALTEK_GPIO_PORTS_PER_BANK];
> > +};
> > +
> > +enum realtek_gpio_flags {
> > +   GPIO_INTERRUPTS = BIT(0),
> > +};
> 
> ...

See below. I'll add a comment.

> 
> > +static struct realtek_gpio_ctrl *irq_data_to_ctrl(struct irq_data
> > *data)
> > +{
> > +   struct gpio_chip *gc = irq_data_get_irq_chip_data(data);
> > +
> > +   return container_of(gc, struct realtek_gpio_ctrl, gc);
> > +}
> 
> > +static unsigned int line_to_port(unsigned int line)
> > +{
> > +   return line / 8;
> > +}
> > +
> > +static unsigned int line_to_port_pin(unsigned int line)
> > +{
> > +   return line % 8;
> > +}
> 
> These are useless. Just use them inline.

I added these as the alternative of the /16 and %16 I had for the IMR
offsets in v2. The function names tell the reader _why_ I'm doing the
division and modulo operations, but I guess a properly named variable
would do the same.
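
As an illustration of the named-variable approach, here is a userspace sketch of the line to port/pin arithmetic and of how a 16-bit IMR value could be derived from the stored mask and type. The constants mirror the two-bits-per-GPIO layout from the patch; struct imr_state, set_type(), set_masked(), and imr_value() are hypothetical names, and the convention that an all-ones mask field means "unmasked" is an assumption of this sketch.

```c
#include <stdint.h>

#define PORTS_PER_BANK   4
#define IMR_LINE_MASK    0x3u /* two bits per GPIO, like GENMASK(1, 0) */
#define IRQ_EDGE_FALLING 1u
#define IRQ_EDGE_RISING  2u
#define IRQ_EDGE_BOTH    3u

struct imr_state {
	uint16_t intr_mask[PORTS_PER_BANK]; /* all-ones field = unmasked */
	uint16_t intr_type[PORTS_PER_BANK]; /* selected edge type per pin */
};

/* Store the edge type for a line; returns the port whose state changed. */
static unsigned int set_type(struct imr_state *s, unsigned int line,
			     unsigned int type)
{
	unsigned int port = line / 8;  /* which 8-bit port */
	unsigned int pin = line % 8;   /* pin within that port */
	unsigned int shift = 2 * pin;  /* two IMR bits per pin */

	s->intr_type[port] &= (uint16_t)~(IMR_LINE_MASK << shift);
	s->intr_type[port] |= (uint16_t)((type & IMR_LINE_MASK) << shift);
	return port;
}

/* Mask (masked != 0) or unmask (masked == 0) the interrupt of a line. */
static void set_masked(struct imr_state *s, unsigned int line, int masked)
{
	unsigned int port = line / 8;
	unsigned int shift = 2 * (line % 8);

	if (masked)
		s->intr_mask[port] &= (uint16_t)~(IMR_LINE_MASK << shift);
	else
		s->intr_mask[port] |= (uint16_t)(IMR_LINE_MASK << shift);
}

/* Value that would be written to the 16-bit IMR field of a port. */
static uint16_t imr_value(const struct imr_state *s, unsigned int port)
{
	return s->intr_mask[port] & s->intr_type[port];
}
```

Keeping the type and the mask separate means a masked line keeps its configured edge type, so unmasking only needs to recompute the AND for the affected port.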

> 
> > +static u8 read_u8_reg(void __iomem *reg, unsigned int port)
> > +{
> > +   return ioread8(reg + port);
> > +}
> > +
> > +static void write_u8_reg(void __iomem *reg, unsigned int port, u8
> > value)
> > +{
> > +   iowrite8(value, reg + port);
> > +}
> > +
> > +static void write_u16_reg(void __iomem *reg, unsigned int port, u16
> > value)
> > +{
> > +   iowrite16(value, reg + 2 * port);
> > +}
> 
> What's the point? You better provide a controller structure as a
> parameter. Look into other drivers. There are plenty of examples how
> to provide IO accessors in smarter way.

Since these are currently only really used for IMR and ISR, I'll fold
them into their accessor functions for v5.
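
One way to read the suggestion is that each accessor takes the controller structure and hides the offset arithmetic. The following is a userspace model only: a byte array stands in for the memory-mapped bank, and struct ctrl_model, ctrl_read_isr(), ctrl_clear_isr(), and ctrl_write_imr() are hypothetical names. The register offsets (ISR at 0x10, IMR at 0x14) and the write-1-to-clear behaviour come from the patch; the real driver would use ioread8()/iowrite8()/iowrite16() on an __iomem pointer instead.

```c
#include <stdint.h>
#include <string.h>

#define REG_ISR 0x10 /* one status byte per port */
#define REG_IMR 0x14 /* one 16-bit field per port */

struct ctrl_model {
	uint8_t regs[0x1c]; /* stands in for the memory-mapped bank */
};

/* Read the interrupt status byte of one port. */
static uint8_t ctrl_read_isr(const struct ctrl_model *ctrl, unsigned int port)
{
	return ctrl->regs[REG_ISR + port];
}

/* Model of write-1-to-clear: bits set in 'mask' are cleared in the status. */
static void ctrl_clear_isr(struct ctrl_model *ctrl, unsigned int port,
			   uint8_t mask)
{
	ctrl->regs[REG_ISR + port] &= (uint8_t)~mask;
}

/* Write the 16-bit IMR field of one port (two bits per GPIO). */
static void ctrl_write_imr(struct ctrl_model *ctrl, unsigned int port,
			   uint16_t value)
{
	memcpy(&ctrl->regs[REG_IMR + 2 * port], &value, sizeof(value));
}
```

Callers then pass only the controller and a port index, so the "+ port" and "+ 2 * port" offsets live in exactly one place each.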

> 
> > +static void realtek_gpio_write

[PATCH v4 2/2] gpio: Add Realtek Otto GPIO support

2021-03-26 Thread Sander Vanheule
Realtek MIPS SoCs (platform name Otto) have GPIO controllers with up to
64 GPIOs, divided over two banks. Each bank has a set of registers for
32 GPIOs, with support for edge-triggered interrupts.

Each GPIO bank consists of four 8-bit GPIO ports (ABCD and EFGH). Most
registers pack one bit per GPIO, except for the IMR register, which
packs two bits per GPIO (AB-CD).

Although the byte order is currently assumed to have port A..D at offset
0x0..0x3, this has been observed to be reversed on other, Lexra-based,
SoCs (e.g. RTL8196E/97D/97F).

Interrupt support is disabled for the fallback devicetree-compatible
'realtek,otto-gpio'. This allows for quick support of GPIO banks in
which the byte order would be unknown. In this case, the port ordering
in the IMR registers may not match the reversed order in the other
registers (DCBA, and BA-DC or DC-BA).

Signed-off-by: Sander Vanheule 
Reviewed-by: Linus Walleij 
---
 drivers/gpio/Kconfig |  11 ++
 drivers/gpio/Makefile|   1 +
 drivers/gpio/gpio-realtek-otto.c | 330 +++
 3 files changed, 342 insertions(+)
 create mode 100644 drivers/gpio/gpio-realtek-otto.c

diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index e3607ec4c2e8..d3be17812f94 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -502,6 +502,17 @@ config GPIO_RDA
help
  Say Y here to support RDA Micro GPIO controller.
 
+config GPIO_REALTEK_OTTO
+   bool "Realtek Otto GPIO support"
+   depends on MACH_REALTEK_RTL
+   default MACH_REALTEK_RTL
+   select GPIO_GENERIC
+   select GPIOLIB_IRQCHIP
+   help
+ The GPIO controller on the Otto MIPS platform supports up to two
+ banks of 32 GPIOs, with edge triggered interrupts. The 32 GPIOs
+ are grouped in four 8-bit wide ports.
+
 config GPIO_REG
bool
help
diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile
index c58a90a3c3b1..8ace5934e3c3 100644
--- a/drivers/gpio/Makefile
+++ b/drivers/gpio/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_GPIO_RC5T583)  += gpio-rc5t583.o
 obj-$(CONFIG_GPIO_RCAR)+= gpio-rcar.o
 obj-$(CONFIG_GPIO_RDA) += gpio-rda.o
 obj-$(CONFIG_GPIO_RDC321X) += gpio-rdc321x.o
+obj-$(CONFIG_GPIO_REALTEK_OTTO)+= gpio-realtek-otto.o
 obj-$(CONFIG_GPIO_REG) += gpio-reg.o
 obj-$(CONFIG_ARCH_SA1100)  += gpio-sa1100.o
 obj-$(CONFIG_GPIO_SAMA5D2_PIOBU)   += gpio-sama5d2-piobu.o
diff --git a/drivers/gpio/gpio-realtek-otto.c b/drivers/gpio/gpio-realtek-otto.c
new file mode 100644
index ..07641a1686eb
--- /dev/null
+++ b/drivers/gpio/gpio-realtek-otto.c
@@ -0,0 +1,330 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Total register block size is 0x1C for four ports.
+ * On the RTL8380/RTL8390 platforms port A, B, and C are implemented.
+ * RTL8389 and RTL8328 implement a second bank with ports E, F, G, and H.
+ *
+ * Port information is stored with the first port at offset 0, followed by the
+ * second, etc. Most registers store one bit per GPIO and should be read out in
+ * reversed endian order. The two interrupt mask registers store two bits per
+ * GPIO, and should be manipulated with swahw32, if required.
+ */
+
+/*
+ * Pin select: (0) "normal", (1) "dedicate peripheral"
+ * Not used on RTL8380/RTL8390, peripheral selection is managed by control bits
+ * in the peripheral registers.
+ */
+#define REALTEK_GPIO_REG_CNR   0x00
+/* Clear bit (0) for input, set bit (1) for output */
+#define REALTEK_GPIO_REG_DIR   0x08
+#define REALTEK_GPIO_REG_DATA  0x0C
+/* Read bit for IRQ status, write 1 to clear IRQ */
+#define REALTEK_GPIO_REG_ISR   0x10
+/* Two bits per GPIO in IMR registers */
+#define REALTEK_GPIO_REG_IMR   0x14
+#define REALTEK_GPIO_REG_IMR_AB0x14
+#define REALTEK_GPIO_REG_IMR_CD0x18
+#define REALTEK_GPIO_IMR_LINE_MASK GENMASK(1, 0)
+#define REALTEK_GPIO_IRQ_EDGE_FALLING  1
+#define REALTEK_GPIO_IRQ_EDGE_RISING   2
+#define REALTEK_GPIO_IRQ_EDGE_BOTH 3
+
+#define REALTEK_GPIO_MAX   32
+#define REALTEK_GPIO_PORTS_PER_BANK4
+
+/*
+ * Realtek GPIO driver data
+ * Because the interrupt mask register (IMR) combines the function of
+ * IRQ type selection and masking, two extra values are stored.
+ * intr_mask is used to mask/unmask the interrupts for certain GPIO,
+ * and intr_type is used to store the selected interrupt types. The
+ * logical AND of these values is written to IMR on changes.
+ *
+ * @gc Associated gpio_chip instance
+ * @base Base address of the register block
+ * @lock Lock for accessing the IRQ registers and values
+ * @intr_mask Mask for GPIO interrupts
+ * @intr_type GPIO interrupt type selection
+ */
+struct realtek_gpio_ctrl {
+

[PATCH v4 1/2] dt-bindings: gpio: Binding for Realtek Otto GPIO

2021-03-26 Thread Sander Vanheule
Add a binding description for Realtek's GPIO controller found on several
of their MIPS-based SoCs (codenamed Otto), such as the RTL838x and
RTL839x series of switch SoCs.

A fallback binding 'realtek,otto-gpio' is provided for cases where the
actual port ordering is not known yet, and enabling the interrupt
controller may result in uncaught interrupts.

Signed-off-by: Sander Vanheule 
Reviewed-by: Linus Walleij 
---
 .../bindings/gpio/realtek,otto-gpio.yaml  | 78 +++
 1 file changed, 78 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml

diff --git a/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml b/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml
new file mode 100644
index ..100f20cebd76
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml
@@ -0,0 +1,78 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/gpio/realtek,otto-gpio.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Realtek Otto GPIO controller
+
+maintainers:
+  - Sander Vanheule 
+  - Bert Vermeulen 
+
+description: |
+  Realtek's GPIO controller on their MIPS switch SoCs (Otto platform) consists
+  of two banks of 32 GPIOs. These GPIOs can generate edge-triggered interrupts.
+  Each bank's interrupts are cascaded into one interrupt line on the parent
+  interrupt controller, if provided.
+  This binding allows defining a single bank in the devicetree. The interrupt
+  controller is not supported on the fallback compatible name, which only
+  allows for GPIO port use.
+
+properties:
+  $nodename:
+    pattern: "^gpio@[0-9a-f]+$"
+
+  compatible:
+    items:
+      - enum:
+          - realtek,rtl8380-gpio
+          - realtek,rtl8390-gpio
+      - const: realtek,otto-gpio
+
+  reg:
+    maxItems: 1
+
+  "#gpio-cells":
+    const: 2
+
+  gpio-controller: true
+
+  ngpios:
+    minimum: 1
+    maximum: 32
+
+  interrupt-controller: true
+
+  "#interrupt-cells":
+    const: 2
+
+  interrupts:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - "#gpio-cells"
+  - gpio-controller
+
+additionalProperties: false
+
+dependencies:
+  interrupt-controller: [ interrupts ]
+
+examples:
+  - |
+      gpio@3500 {
+        compatible = "realtek,rtl8380-gpio", "realtek,otto-gpio";
+        reg = <0x3500 0x1c>;
+        gpio-controller;
+        #gpio-cells = <2>;
+        ngpios = <24>;
+        interrupt-controller;
+        #interrupt-cells = <2>;
+        interrupt-parent = <&rtlintc>;
+        interrupts = <23>;
+      };
+
+...
-- 
2.30.2



[PATCH v4 0/2] Add Realtek Otto GPIO support

2021-03-26 Thread Sander Vanheule
Add support for the GPIO controller employed by Realtek in multiple series of
MIPS SoCs. These include the supported RTL838x and RTL839x. The register layout
also matches the one found in the GPIO controller of other (Lexra-based) SoCs
such as RTL8196E, RTL8197D, and RTL8197F.

For the platform name 'otto', I am not aware of any official resources as to
what hardware this specifically applies to. However, in all of the GPL archives
we've received, from vendors using compatible SoCs in their design, the
platform under the MIPS architecture is referred to by this name.

The GPIO ports have been tested on a Zyxel GS1900-8 (RTL8380), and Zyxel
GS1900-48 (RTL8393). Furthermore, the GPIO ports and interrupt controller have
been tested on a Netgear GS110TPPv1 (RTL8381).

Changes in v4:
- Fix pointer notation style
- Drop unused read_u16_reg() function
- Drop 'inline' specifier from functions

Changes in v3:
- Remove OF dependencies in driver probe
- Don't accept IRQ_TYPE_NONE as a valid interrupt type
- Remove (now unused) dev property from control structure
- Use u8/u16 port registers, instead of raw u32 registers
- Use 'line' name for gpiochip, 'port' and 'pin' names for hardware
- Renamed DT bindings file
- Dropped fallback-only DT compatible
- Various code style clean-ups

Changes in v2:
- Clarify structure and usage of IMR registers
- Added Linus' Reviewed-by tags

Sander Vanheule (2):
  dt-bindings: gpio: Binding for Realtek Otto GPIO
  gpio: Add Realtek Otto GPIO support

 .../bindings/gpio/realtek,otto-gpio.yaml  |  78 +
 drivers/gpio/Kconfig  |  11 +
 drivers/gpio/Makefile |   1 +
 drivers/gpio/gpio-realtek-otto.c  | 330 ++
 4 files changed, 420 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml
 create mode 100644 drivers/gpio/gpio-realtek-otto.c

-- 
2.30.2



Re: [PATCH v3 2/2] gpio: Add Realtek Otto GPIO support

2021-03-24 Thread Sander Vanheule
On Wed, 2021-03-24 at 22:22 +0100, Sander Vanheule wrote:
> +static inline u8 read_u8_reg(void __iomem* reg, unsigned int port)
> +{
> +   return ioread8(reg + port);
> +}
> +
> +static inline void write_u8_reg(void __iomem* reg, unsigned int port,
> u8 value)
> +{
> +   iowrite8(value, reg + port);
> +}
> +
> +static inline u16 read_u16_reg(void __iomem* reg, unsigned int port)
> +{
> +   return ioread16(reg + 2 * port);
> +}
> +
> +static inline void write_u16_reg(void __iomem* reg, unsigned int
> port, u16 value)
> +{
> +   iowrite16(value, reg + 2 * port);
> +}

Of course I only noticed this after sending v3, but these functions
should use "void __iomem *reg" instead. I will fix this in the next
version.

Best,
Sander



[PATCH v3 0/2] Add Realtek Otto GPIO support

2021-03-24 Thread Sander Vanheule
Add support for the GPIO controller employed by Realtek in multiple series of
MIPS SoCs. These include the supported RTL838x and RTL839x. The register layout
also matches the one found in the GPIO controller of other (Lexra-based) SoCs
such as RTL8196E, RTL8197D, and RTL8197F.

For the platform name 'otto', I am not aware of any official resources as to
what hardware this specifically applies to. However, in all of the GPL archives
we've received, from vendors using compatible SoCs in their design, the
platform under the MIPS architecture is referred to by this name.

The GPIO ports have been tested on a Zyxel GS1900-8 (RTL8380), and Zyxel
GS1900-48 (RTL8393). Furthermore, the GPIO ports and interrupt controller have
been tested on a Netgear GS110TPPv1 (RTL8381).

Changes in v3:
- Remove OF dependencies in driver probe
- Don't accept IRQ_TYPE_NONE as a valid interrupt type
- Remove (now unused) dev property from control structure
- Use u8/u16 port registers, instead of raw u32 registers
- Use 'line' name for gpiochip, 'port' and 'pin' names for hardware
- Renamed DT bindings file
- Dropped fallback-only DT compatible
- Various code style clean-ups

Changes in v2:
- Clarify structure and usage of IMR registers
- Added Linus' Reviewed-by tags

Sander Vanheule (2):
  dt-bindings: gpio: Binding for Realtek Otto GPIO
  gpio: Add Realtek Otto GPIO support

 .../bindings/gpio/realtek,otto-gpio.yaml  |  78 
 drivers/gpio/Kconfig  |  11 +
 drivers/gpio/Makefile |   1 +
 drivers/gpio/gpio-realtek-otto.c  | 335 ++
 4 files changed, 425 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml
 create mode 100644 drivers/gpio/gpio-realtek-otto.c

-- 
2.30.2



[PATCH v3 2/2] gpio: Add Realtek Otto GPIO support

2021-03-24 Thread Sander Vanheule
Realtek MIPS SoCs (platform name Otto) have GPIO controllers with up to
64 GPIOs, divided over two banks. Each bank has a set of registers for
32 GPIOs, with support for edge-triggered interrupts.

Each GPIO bank consists of four 8-bit GPIO ports (ABCD and EFGH). Most
registers pack one bit per GPIO, except for the IMR register, which
packs two bits per GPIO (AB-CD).

Although the byte order is currently assumed to have port A..D at offset
0x0..0x3, this has been observed to be reversed on other, Lexra-based,
SoCs (e.g. RTL8196E/97D/97F).

Interrupt support is disabled for the fallback devicetree-compatible
'realtek,otto-gpio'. This allows for quick support of GPIO banks in
which the byte order would be unknown. In this case, the port ordering
in the IMR registers may not match the reversed order in the other
registers (DCBA, and BA-DC or DC-BA).

Signed-off-by: Sander Vanheule 
Reviewed-by: Linus Walleij 
---
 drivers/gpio/Kconfig |  11 +
 drivers/gpio/Makefile|   1 +
 drivers/gpio/gpio-realtek-otto.c | 335 +++
 3 files changed, 347 insertions(+)
 create mode 100644 drivers/gpio/gpio-realtek-otto.c

diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index e3607ec4c2e8..d3be17812f94 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -502,6 +502,17 @@ config GPIO_RDA
 	help
 	  Say Y here to support RDA Micro GPIO controller.
 
+config GPIO_REALTEK_OTTO
+	bool "Realtek Otto GPIO support"
+	depends on MACH_REALTEK_RTL
+	default MACH_REALTEK_RTL
+	select GPIO_GENERIC
+	select GPIOLIB_IRQCHIP
+	help
+	  The GPIO controller on the Otto MIPS platform supports up to two
+	  banks of 32 GPIOs, with edge triggered interrupts. The 32 GPIOs
+	  are grouped in four 8-bit wide ports.
+
 config GPIO_REG
 	bool
 	help
diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile
index c58a90a3c3b1..8ace5934e3c3 100644
--- a/drivers/gpio/Makefile
+++ b/drivers/gpio/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_GPIO_RC5T583)  += gpio-rc5t583.o
 obj-$(CONFIG_GPIO_RCAR)         += gpio-rcar.o
 obj-$(CONFIG_GPIO_RDA)          += gpio-rda.o
 obj-$(CONFIG_GPIO_RDC321X)      += gpio-rdc321x.o
+obj-$(CONFIG_GPIO_REALTEK_OTTO) += gpio-realtek-otto.o
 obj-$(CONFIG_GPIO_REG)          += gpio-reg.o
 obj-$(CONFIG_ARCH_SA1100)  += gpio-sa1100.o
 obj-$(CONFIG_GPIO_SAMA5D2_PIOBU)   += gpio-sama5d2-piobu.o
diff --git a/drivers/gpio/gpio-realtek-otto.c b/drivers/gpio/gpio-realtek-otto.c
new file mode 100644
index ..0714d54e08d1
--- /dev/null
+++ b/drivers/gpio/gpio-realtek-otto.c
@@ -0,0 +1,335 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Total register block size is 0x1C for four ports.
+ * On the RTL8380/RTL8390 platforms port A, B, and C are implemented.
+ * RTL8389 and RTL8328 implement a second bank with ports E, F, G, and H.
+ *
+ * Port information is stored with the first port at offset 0, followed by the
+ * second, etc. Most registers store one bit per GPIO and should be read out in
+ * reversed endian order. The two interrupt mask registers store two bits per
+ * GPIO, and should be manipulated with swahw32, if required.
+ */
+
+/*
+ * Pin select: (0) "normal", (1) "dedicate peripheral"
+ * Not used on RTL8380/RTL8390, peripheral selection is managed by control bits
+ * in the peripheral registers.
+ */
+#define REALTEK_GPIO_REG_CNR   0x00
+/* Clear bit (0) for input, set bit (1) for output */
+#define REALTEK_GPIO_REG_DIR   0x08
+#define REALTEK_GPIO_REG_DATA  0x0C
+/* Read bit for IRQ status, write 1 to clear IRQ */
+#define REALTEK_GPIO_REG_ISR   0x10
+/* Two bits per GPIO in IMR registers */
+#define REALTEK_GPIO_REG_IMR   0x14
+#define REALTEK_GPIO_REG_IMR_AB 0x14
+#define REALTEK_GPIO_REG_IMR_CD 0x18
+#define REALTEK_GPIO_IMR_LINE_MASK GENMASK(1, 0)
+#define REALTEK_GPIO_IRQ_EDGE_FALLING  1
+#define REALTEK_GPIO_IRQ_EDGE_RISING   2
+#define REALTEK_GPIO_IRQ_EDGE_BOTH 3
+
+#define REALTEK_GPIO_MAX   32
+#define REALTEK_GPIO_PORTS_PER_BANK 4
+
+/*
+ * Realtek GPIO driver data
+ * Because the interrupt mask register (IMR) combines the function of
+ * IRQ type selection and masking, two extra values are stored.
+ * intr_mask is used to mask/unmask the interrupts for certain GPIO,
+ * and intr_type is used to store the selected interrupt types. The
+ * logical AND of these values is written to IMR on changes.
+ *
+ * @gc Associated gpio_chip instance
+ * @base Base address of the register block
+ * @lock Lock for accessing the IRQ registers and values
+ * @intr_mask Mask for GPIO interrupts
+ * @intr_type GPIO interrupt type selection
+ */
+struct realtek_gpio_ctrl {
+

[PATCH v3 1/2] dt-bindings: gpio: Binding for Realtek Otto GPIO

2021-03-24 Thread Sander Vanheule
Add a binding description for Realtek's GPIO controller found on several
of their MIPS-based SoCs (codenamed Otto), such as the RTL838x and
RTL839x series of switch SoCs.

A fallback binding 'realtek,otto-gpio' is provided for cases where the
actual port ordering is not known yet, and enabling the interrupt
controller may result in uncaught interrupts.

Signed-off-by: Sander Vanheule 
Reviewed-by: Linus Walleij 
---
 .../bindings/gpio/realtek,otto-gpio.yaml  | 78 +++
 1 file changed, 78 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml

diff --git a/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml b/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml
new file mode 100644
index ..100f20cebd76
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml
@@ -0,0 +1,78 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/gpio/realtek,otto-gpio.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Realtek Otto GPIO controller
+
+maintainers:
+  - Sander Vanheule 
+  - Bert Vermeulen 
+
+description: |
+  Realtek's GPIO controller on their MIPS switch SoCs (Otto platform) consists
+  of two banks of 32 GPIOs. These GPIOs can generate edge-triggered interrupts.
+  Each bank's interrupts are cascaded into one interrupt line on the parent
+  interrupt controller, if provided.
+  This binding allows defining a single bank in the devicetree. The interrupt
+  controller is not supported on the fallback compatible name, which only
+  allows for GPIO port use.
+
+properties:
+  $nodename:
+    pattern: "^gpio@[0-9a-f]+$"
+
+  compatible:
+    items:
+      - enum:
+          - realtek,rtl8380-gpio
+          - realtek,rtl8390-gpio
+      - const: realtek,otto-gpio
+
+  reg:
+    maxItems: 1
+
+  "#gpio-cells":
+    const: 2
+
+  gpio-controller: true
+
+  ngpios:
+    minimum: 1
+    maximum: 32
+
+  interrupt-controller: true
+
+  "#interrupt-cells":
+    const: 2
+
+  interrupts:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - "#gpio-cells"
+  - gpio-controller
+
+additionalProperties: false
+
+dependencies:
+  interrupt-controller: [ interrupts ]
+
+examples:
+  - |
+      gpio@3500 {
+        compatible = "realtek,rtl8380-gpio", "realtek,otto-gpio";
+        reg = <0x3500 0x1c>;
+        gpio-controller;
+        #gpio-cells = <2>;
+        ngpios = <24>;
+        interrupt-controller;
+        #interrupt-cells = <2>;
+        interrupt-parent = <&rtlintc>;
+        interrupts = <23>;
+      };
+
+...
-- 
2.30.2



Re: [PATCH v2 2/2] gpio: Add Realtek Otto GPIO support

2021-03-19 Thread Sander Vanheule
On Fri, 2021-03-19 at 23:24 +0200, Andy Shevchenko wrote:
> On Fri, Mar 19, 2021 at 11:20 PM Sander Vanheule <
> san...@svanheule.net> wrote:
> > On Fri, 2021-03-19 at 19:57 +0200, Andy Shevchenko wrote:
> > > On Fri, Mar 19, 2021 at 5:51 PM Sander Vanheule
> > >  wrote:
> > > > On Wed, 2021-03-17 at 15:08 +0200, Andy Shevchenko wrote:
> > > > > On Mon, Mar 15, 2021 at 11:11 PM Sander Vanheule <
> > > > > san...@svanheule.net> wrote:
> 
> ...
> 
> > > > > > +   return swab32(readl(ctrl->base +
> > > > > > REALTEK_GPIO_REG_ISR));
> > > > > 
> > > > > Why swab?! How is this supposed to work on BE CPUs?
> > > > > Ditto for all swabXX() usage.
> > > > 
> > > > My use of swab32/swahw32 has little to do with the CPU being BE
> > > > or
> > > > LE,
> > > > but more with the register packing in the GPIO peripheral.
> > > > 
> > > > The supported SoCs have port layout A-B-C-D in the registers,
> > > > where
> > > > firmware built with Realtek's SDK always denotes A0 as the first
> > > > GPIO
> > > > line. So bit 24 in a register has the value for A0 (with the
> > > > exception
> > > > of the IMR register).
> > > > 
> > > > I wrote these wrapper functions to be able to use the BIT() macro
> > > > with
> > > > the GPIO line number, similar to how gpio-mmio uses ioread32be()
> > > > when
> > > > the BGPIOF_BIG_ENDIAN_BYTE_ORDER flag is used.
> > > > 
> > > > For the IMR register, port A again comes first, but is now 16
> > > > bits
> > > > wide
> > > > instead of 8, with A0 at bits 16:17. That's why swahw32 is used
> > > > for
> > > > this register.
> > > > 
> > > > On the currently unsupported RTL9300-series, the port layout is
> > > > reversed: D-C-B-A. GPIO line A0 is then at bit 0, so the swapping
> > > > functions won't be required. When support for this alternate port
> > > > layout is added, some code will need to be added to differentiate
> > > > between the two cases.
> > > 
> > > Yes, you have different endianess on the hardware level, why not to
> > > use the proper accessors (with or without utilization of the above
> > > mentioned BGPIOF_BIG_ENDIAN_BYTE_ORDER)?
> > 
> > The point I was trying to make, is that it isn't an endianess issue.
> > I
> > shouldn't have used a register with single byte values to try to
> > illustrate that.
> > 
> > Consider instead the interrupt masking registers. To write the IMR
> > bits
> > for port A (GPIO 0-7), a 16-bit value must be written. This value
> > (e.g.
> > u16 port_a_imr) is always BE, independent of the packing order of the
> > ports in the registers:
> > 
> >    // On RTL8380: port A is in the upper word
> >    writew(port_a_imr, base + OFFSET_IMR_AB);
> > 
> >    // On RTL9300: port A is in the lower word
> >    writew(port_a_imr, base + OFFSET_IMR_AB + 2);
> > 
> > I want the low GPIO lines to be in the lower half-word, so I can
> > manipulate GPIO lines 0-15 with simple mask and shift operations.
> > 
> > It just so happens, that all registers needed by bgpio_init contain
> > single-byte values. With BGPIO_BIG_ENDIAN_BYTE_ORDER  the port order
> > is
> > reversed as required, but it's a bit of a misnomer here.
> 
> How many registers (per GPIO / port) do you have?
> Can you list them and show endianess of the data for each of them and
> for old and new hardware (something like a 3 column table)?

Each GPIO bank, with 32 GPIO lines, consists of four 8-line ports.
There are seven registers per port, but only five are used:

       |        | Data    | RTL8380    | RTL9300
Reg    | Offset | type    | byte order | byte order
-------+--------+---------+------------+-----------
DIR    | 0x08   | 4 * u8  | A-B-C-D    | D-C-B-A
DATA   | 0x0C   | 4 * u8  | A-B-C-D    | D-C-B-A
ISR    | 0x10   | 4 * u8  | A-B-C-D    | D-C-B-A
IMR_AB | 0x14   | 2 * u16 | A-A-B-B    | B-B-A-A
IMR_CD | 0x18   | 2 * u16 | C-C-D-D    | D-D-C-C

The unused other registers are all 4*u8.

A-B-C-D means:  (A << 24) | (B << 16) | (C << 8) | D
A-A-B-B means:  (A << 16) | B
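
To make the notation above concrete, here is a minimal userspace sketch of
the three packings. The function names are made up for illustration and are
not part of the driver; the shifts are taken directly from the two "means"
lines above, with D-C-B-A assumed to be the mirror image of A-B-C-D.

```c
#include <stdint.h>

/* A-B-C-D: port A in the most significant byte (RTL8380 column). */
static uint32_t sketch_pack_abcd(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | d;
}

/* D-C-B-A: port A in the least significant byte (RTL9300 column). */
static uint32_t sketch_pack_dcba(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)d << 24) | ((uint32_t)c << 16) |
	       ((uint32_t)b << 8) | a;
}

/* A-A-B-B: two 16-bit fields, port A in the upper half-word (IMR_AB). */
static uint32_t sketch_pack_aabb(uint16_t a, uint16_t b)
{
	return ((uint32_t)a << 16) | b;
}
```

With only pin A0 set, sketch_pack_abcd() puts the bit at position 24 and
sketch_pack_dcba() puts it at position 0, which is the difference the
driver has to account for.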

--
Best,
Sander



Re: [PATCH v2 2/2] gpio: Add Realtek Otto GPIO support

2021-03-19 Thread Sander Vanheule
On Fri, 2021-03-19 at 19:57 +0200, Andy Shevchenko wrote:
> On Fri, Mar 19, 2021 at 5:51 PM Sander Vanheule
>  wrote:
> > On Wed, 2021-03-17 at 15:08 +0200, Andy Shevchenko wrote:
> > > On Mon, Mar 15, 2021 at 11:11 PM Sander Vanheule <
> > > san...@svanheule.net> wrote:
> 
> ...
> 
> > > > +#include 
> > > 
> > > Not sure why you need this? See below.
> 
> > > > +   return swab32(readl(ctrl->base +
> > > > REALTEK_GPIO_REG_ISR));
> > > 
> > > Why swab?! How is this supposed to work on BE CPUs?
> > > Ditto for all swabXX() usage.
> > 
> > My use of swab32/swahw32 has little to do with the CPU being BE or
> > LE,
> > but more with the register packing in the GPIO peripheral.
> > 
> > The supported SoCs have port layout A-B-C-D in the registers, where
> > firmware built with Realtek's SDK always denotes A0 as the first
> > GPIO
> > line. So bit 24 in a register has the value for A0 (with the
> > exception
> > of the IMR register).
> > 
> > I wrote these wrapper functions to be able to use the BIT() macro
> > with
> > the GPIO line number, similar to how gpio-mmio uses ioread32be()
> > when
> > the BGPIOF_BIG_ENDIAN_BYTE_ORDER flag is used.
> > 
> > For the IMR register, port A again comes first, but is now 16 bits
> > wide
> > instead of 8, with A0 at bits 16:17. That's why swahw32 is used for
> > this register.
> > 
> > On the currently unsupported RTL9300-series, the port layout is
> > reversed: D-C-B-A. GPIO line A0 is then at bit 0, so the swapping
> > functions won't be required. When support for this alternate port
> > layout is added, some code will need to be added to differentiate
> > between the two cases.
> 
> Yes, you have different endianess on the hardware level, why not to
> use the proper accessors (with or without utilization of the above
> mentioned BGPIOF_BIG_ENDIAN_BYTE_ORDER)?

The point I was trying to make is that it isn't an endianness issue. I
shouldn't have used a register with single-byte values to try to
illustrate that.

Consider instead the interrupt masking registers. To write the IMR bits
for port A (GPIO 0-7), a 16-bit value must be written. This value (e.g.
u16 port_a_imr) is always BE, independent of the packing order of the
ports in the registers:

   // On RTL8380: port A is in the upper word
   writew(port_a_imr, base + OFFSET_IMR_AB);
   
   // On RTL9300: port A is in the lower word
   writew(port_a_imr, base + OFFSET_IMR_AB + 2);

I want the low GPIO lines to be in the lower half-word, so I can
manipulate GPIO lines 0-15 with simple mask and shift operations.
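
As a rough illustration of the two writew() cases, the following userspace
sketch models a 32-bit register as four bytes in big-endian order. This is
only a model under that assumption: the helper names are invented here, and
real MMIO accessors are not involved.

```c
#include <stdint.h>

/* Store a 16-bit value at byte offset off, most significant byte first. */
static void sketch_write16_be(uint8_t *regs, unsigned int off, uint16_t v)
{
	regs[off] = (uint8_t)(v >> 8);
	regs[off + 1] = (uint8_t)(v & 0xff);
}

/* Read the full 32-bit register back, most significant byte first. */
static uint32_t sketch_read32_be(const uint8_t *regs, unsigned int off)
{
	return ((uint32_t)regs[off] << 24) | ((uint32_t)regs[off + 1] << 16) |
	       ((uint32_t)regs[off + 2] << 8) | regs[off + 3];
}
```

Writing the same port_a_imr value at offset 0 lands it in the upper
half-word of the 32-bit value, and at offset 2 in the lower half-word. The
16-bit value itself is unchanged; only its position within the register
differs between the two layouts.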

It just so happens that all registers needed by bgpio_init contain
single-byte values. With BGPIOF_BIG_ENDIAN_BYTE_ORDER the port order is
reversed as required, but it's a bit of a misnomer here.


Best,
Sander



Re: [PATCH v2 2/2] gpio: Add Realtek Otto GPIO support

2021-03-19 Thread Sander Vanheule
Hi Andy,

Thanks for the review. I'll address the style comments in a v3. Some
further comments and discussion below.


On Wed, 2021-03-17 at 15:08 +0200, Andy Shevchenko wrote:
> On Mon, Mar 15, 2021 at 11:11 PM Sander Vanheule < 
> san...@svanheule.net> wrote:
> > +   depends on OF_GPIO
> 
> Don't see how it's used.

It isn't, so I'll remove it.


> > +#include 
> 
> Why?
> Perhaps what you need is property.h and mod_devicetable.h. See below.

With your suggestions, I was able to drop most explicit OF references.
Only of_device_id remains, for which I'll include mod_devicetable.h.


> > +#include 
> 
> Not sure why you need this? See below.

[snip]

> 
> > +
> > +static inline u32 realtek_gpio_isr_read(struct realtek_gpio_ctrl
> > *ctrl)
> > +{
> > +   return swab32(readl(ctrl->base + REALTEK_GPIO_REG_ISR));
> 
> Why swab?! How is this supposed to work on BE CPUs?
> Ditto for all swabXX() usage.

My use of swab32/swahw32 has little to do with the CPU being BE or LE,
but more with the register packing in the GPIO peripheral.

The supported SoCs have port layout A-B-C-D in the registers, where
firmware built with Realtek's SDK always denotes A0 as the first GPIO
line. So bit 24 in a register has the value for A0 (with the exception
of the IMR register).

I wrote these wrapper functions to be able to use the BIT() macro with
the GPIO line number, similar to how gpio-mmio uses ioread32be() when
the BGPIOF_BIG_ENDIAN_BYTE_ORDER flag is used.

For the IMR register, port A again comes first, but is now 16 bits wide
instead of 8, with A0 at bits 16:17. That's why swahw32 is used for
this register.

On the currently unsupported RTL9300-series, the port layout is
reversed: D-C-B-A. GPIO line A0 is then at bit 0, so the swapping
functions won't be required. When support for this alternate port
layout is added, some code will need to be added to differentiate
between the two cases.
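
For illustration, a small userspace sketch of why the byte swap makes BIT()
with the line number work on the one-bit-per-line registers. sketch_swab32()
is a stand-in for the kernel's swab32(), and sketch_line_is_set() is a
hypothetical helper, not a function from the patch.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's swab32(): reverse the four bytes. */
static uint32_t sketch_swab32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8) | ((v & 0xff000000u) >> 24);
}

/*
 * Test one GPIO line (0 = A0 ... 31 = D7) in a raw register value that
 * uses the A-B-C-D port layout, i.e. A0 at bit 24.
 */
static int sketch_line_is_set(uint32_t raw, unsigned int line)
{
	return (int)((sketch_swab32(raw) >> line) & 1u);
}
```

After the swap, port A occupies bits 0-7 and port D bits 24-31, so the line
number can be used directly as the bit index.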


> > +}
> > +
> > +static inline void realtek_gpio_isr_clear(struct realtek_gpio_ctrl
> > *ctrl,
> > +   unsigned int pin_mask)
> > +{
> > +   writel(swab32(pin_mask), ctrl->base +
> > REALTEK_GPIO_REG_ISR);
> > +}
> > +
> > +static inline void realtek_gpio_update_imr(struct
> > realtek_gpio_ctrl *ctrl,
> > +   unsigned int imr_offset, u32 type, u32 mask)
> > +{
> > +   unsigned int reg;
> > +
> > +   if (imr_offset == 0)
> > +   reg = REALTEK_GPIO_REG_IMR_AB;
> > +   else
> > +   reg = REALTEK_GPIO_REG_IMR_CD;
> > +   writel(swahw32(type & mask), ctrl->base + reg);
> > +}

[snip]

> > +   switch (flow_type & IRQ_TYPE_SENSE_MASK) {
> 
> > +   case IRQ_TYPE_NONE:
> > +   type = 0;
> > +   handler = handle_bad_irq;
> > +   break;
> 
> Why is it here? Make it default like many other GPIO drivers do.
> 
> > +   case IRQ_TYPE_EDGE_FALLING:
> > +   type = REALTEK_GPIO_IRQ_EDGE_FALLING;
> > +   handler = handle_edge_irq;
> > +   break;
> > +   case IRQ_TYPE_EDGE_RISING:
> > +   type = REALTEK_GPIO_IRQ_EDGE_RISING;
> > +   handler = handle_edge_irq;
> > +   break;
> > +   case IRQ_TYPE_EDGE_BOTH:
> > +   type = REALTEK_GPIO_IRQ_EDGE_BOTH;
> > +   handler = handle_edge_irq;
> > +   break;
> > +   default:
> > +   return -EINVAL;
> > +   }
> > +
> > +   irq_set_handler_locked(data, handler);
> 
> handler is always the same. Use it directly here.

I'll drop the IRQ_TYPE_NONE case. Do I understand it correctly, that
IRQ_TYPE_NONE should never be used as the new value, but only as the
default initial value?
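
A sketch of what the type selection could look like once the IRQ_TYPE_NONE
case is dropped and everything unsupported falls through to the default.
This is a userspace approximation, not the v3 code: the SK_ constants
mirror the values of the kernel's IRQ_TYPE_* definitions and the
REALTEK_GPIO_IRQ_EDGE_* values quoted above, and the function name is
invented.

```c
#include <errno.h>

/* Same values as the kernel's IRQ_TYPE_* definitions. */
#define SK_IRQ_TYPE_EDGE_RISING		0x1
#define SK_IRQ_TYPE_EDGE_FALLING	0x2
#define SK_IRQ_TYPE_EDGE_BOTH		0x3
#define SK_IRQ_TYPE_SENSE_MASK		0xf

/* Hardware encodings, as in the REALTEK_GPIO_IRQ_EDGE_* defines. */
#define SK_HW_EDGE_FALLING	1
#define SK_HW_EDGE_RISING	2
#define SK_HW_EDGE_BOTH		3

/* Translate a flow type to the two-bit IMR encoding, or fail. */
static int sketch_irq_type_to_hw(unsigned int flow_type, unsigned int *hw)
{
	switch (flow_type & SK_IRQ_TYPE_SENSE_MASK) {
	case SK_IRQ_TYPE_EDGE_FALLING:
		*hw = SK_HW_EDGE_FALLING;
		return 0;
	case SK_IRQ_TYPE_EDGE_RISING:
		*hw = SK_HW_EDGE_RISING;
		return 0;
	case SK_IRQ_TYPE_EDGE_BOTH:
		*hw = SK_HW_EDGE_BOTH;
		return 0;
	default:
		/* Level triggers and "none" are not supported. */
		return -EINVAL;
	}
}
```

Since every accepted type is edge-triggered, the caller can pass
handle_edge_irq to irq_set_handler_locked() directly after a successful
translation, as suggested in the review.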


Best,
Sander






Re: [PATCH 2/2] gpio: Add Realtek Otto GPIO support

2021-03-15 Thread Sander Vanheule
On Mon, 2021-03-15 at 16:10 +0100, Linus Walleij wrote:
> On Mon, Mar 15, 2021 at 9:26 AM Sander Vanheule
>  wrote:
> 
> > Realtek MIPS SoCs (platform name Otto) have GPIO controllers with
> > up to
> > 64 GPIOs, divided over two banks. Each bank has a set of registers
> > for
> > 32 GPIOs, with support for edge-triggered interrupts.
> > 
> > Each GPIO bank consists of four 8-bit GPIO ports (ABCD and EFGH).
> > Most
> > registers pack one bit per GPIO, except for the IMR register, which
> > packs two bits per GPIO (AB-CD).
> > 
> > Although the byte order is currently assumed to have port A..D at
> > offset
> > 0x0..0x3, this has been observed to be reversed on other, Lexra-
> > based,
> > SoCs (e.g. RTL8196E/97D/97F).
> > 
> > Interrupt support is disabled for the fallback devicetree-
> > compatible
> > 'realtek,otto-gpio'. This allows for quick support of GPIO banks in
> > which the byte order would be unknown. In this case, the port
> > ordering
> > in the IMR registers may not match the reversed order in the other
> > registers (DCBA, and BA-DC or DC-BA).
> > 
> > Signed-off-by: Sander Vanheule 
> 
> Overall this is a beautiful driver and it makes use of all the
> generic
> frameworks I can think of. I don't see any reason not to merge
> it so:
> Reviewed-by: Linus Walleij 

Thanks for the review and the kind comments!


> 
> The following is some notes and nitpicks, nothing blocking any
> merge, more like discussion.
> 
> > +enum realtek_gpio_flags {
> > +   GPIO_INTERRUPTS = BIT(0),
> > +};
> 
> I suppose this looks like this because more flags will be introduced
> when you add more functionality to the driver. Otherwise it seems
> like overkill so a bool would suffice.
> 
> I would add a comment /* TODO: this will be expanded */
> 

That's correct, I would like this to be extensible. As the commit
message noted, some other SoCs appear to have port order D-C-B-A. The
current driver only supports the A-B-C-D port order, so a flag could be
added to differentiate between A-first and D-first.

Another flag that will be added in the future, is one to indicate that
the GPIO block has extra interrupt control registers, located after the
second GPIO bank.

For example, the rtl9300-series appears to have both the reversed port
order, and an extra "interrupt enable" register. This is not yet
implemented, since I don't currently have a device with this type of
SoC.


> > +static inline u32 realtek_gpio_imr_bits(unsigned int pin, u32
> > value)
> > +{
> > +   return ((value & 0x3) << 2*(pin % 16));
> > +}
> 
> I would explain a bit about this, obviouslt it is two bit per
> line, but it took me some time to parse, so a comment
> about the bit layout would be nice.
> 
> > +   unsigned int offset = pin/16;
> 
> Here that number appears again.
> 

I've updated the patch (and added your Reviewed-by tags) for a v2.
Hopefully this is now more obvious from the code and comments.
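
For reference, the layout behind the quoted helper can be sketched as
follows: each IMR register holds 16 lines at two bits per line, so the
register index is pin / 16 and the field starts at bit 2 * (pin % 16). The
names below are illustrative only and do not match the v2 code.

```c
#include <stdint.h>

#define SKETCH_IMR_LINE_MASK	0x3u	/* two bits per GPIO line */
#define SKETCH_LINES_PER_IMR	16u	/* 32-bit register / 2 bits */

/* Which IMR register a line lives in: 0 for IMR_AB, 1 for IMR_CD. */
static unsigned int sketch_imr_index(unsigned int pin)
{
	return pin / SKETCH_LINES_PER_IMR;
}

/* Place a two-bit value in the field belonging to the given line. */
static uint32_t sketch_imr_bits(unsigned int pin, uint32_t value)
{
	return (value & SKETCH_IMR_LINE_MASK) <<
	       (2 * (pin % SKETCH_LINES_PER_IMR));
}
```

Line 17 (port C, pin 1) therefore ends up in the second register, with its
field at bits 2:3.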

Best,
Sander

> The use of GPIO_GENERIC and GPIO irqchip is flawless
> and first class.
> 
> Thanks!
> Linus Walleij




[PATCH v2 0/2] Add Realtek Otto GPIO support

2021-03-15 Thread Sander Vanheule
Add support for the GPIO controller employed by Realtek in multiple series of
MIPS SoCs. These include the supported RTL838x and RTL839x. The register layout
also matches the one found in the GPIO controller of other (Lexra-based) SoCs
such as RTL8196E, RTL8197D, and RTL8197F.

For the platform name 'otto', I am not aware of any official resources as to
what hardware this specifically applies to. However, in all of the GPL archives
we've received, from vendors using compatible SoCs in their design, the
platform under the MIPS architecture is referred to by this name.

The GPIO ports have been tested on a Zyxel GS1900-8 (RTL8380), and Zyxel
GS1900-48 (RTL8393). Furthermore, the GPIO ports and interrupt controller have
been tested on a Netgear GS110TPPv1 (RTL8381).

Changes in v2:
- Clarify structure and usage of IMR registers

Sander Vanheule (2):
  dt-bindings: gpio: Binding for Realtek Otto GPIO
  gpio: Add Realtek Otto GPIO support

 .../bindings/gpio/gpio-realtek-otto.yaml  |  80 +
 drivers/gpio/Kconfig  |  12 +
 drivers/gpio/Makefile |   1 +
 drivers/gpio/gpio-realtek-otto.c  | 331 ++
 4 files changed, 424 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/gpio-realtek-otto.yaml
 create mode 100644 drivers/gpio/gpio-realtek-otto.c

-- 
2.30.2



[PATCH v2 2/2] gpio: Add Realtek Otto GPIO support

2021-03-15 Thread Sander Vanheule
Realtek MIPS SoCs (platform name Otto) have GPIO controllers with up to
64 GPIOs, divided over two banks. Each bank has a set of registers for
32 GPIOs, with support for edge-triggered interrupts.

Each GPIO bank consists of four 8-bit GPIO ports (ABCD and EFGH). Most
registers pack one bit per GPIO, except for the IMR register, which
packs two bits per GPIO (AB-CD).

Although the byte order is currently assumed to have port A..D at offset
0x0..0x3, this has been observed to be reversed on other, Lexra-based,
SoCs (e.g. RTL8196E/97D/97F).

Interrupt support is disabled for the fallback devicetree-compatible
'realtek,otto-gpio'. This allows for quick support of GPIO banks in
which the byte order would be unknown. In this case, the port ordering
in the IMR registers may not match the reversed order in the other
registers (DCBA, and BA-DC or DC-BA).

Signed-off-by: Sander Vanheule 
Reviewed-by: Linus Walleij 
---
 drivers/gpio/Kconfig |  12 ++
 drivers/gpio/Makefile|   1 +
 drivers/gpio/gpio-realtek-otto.c | 331 +++
 3 files changed, 344 insertions(+)
 create mode 100644 drivers/gpio/gpio-realtek-otto.c

diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index e3607ec4c2e8..fedf1e49469e 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -502,6 +502,18 @@ config GPIO_RDA
 	help
 	  Say Y here to support RDA Micro GPIO controller.
 
+config GPIO_REALTEK_OTTO
+	bool "Realtek Otto GPIO support"
+	depends on MACH_REALTEK_RTL
+	depends on OF_GPIO
+	default MACH_REALTEK_RTL
+	select GPIO_GENERIC
+	select GPIOLIB_IRQCHIP
+	help
+	  The GPIO controller on the Otto MIPS platform supports up to two
+	  banks of 32 GPIOs, with edge triggered interrupts. The 32 GPIOs
+	  are grouped in four 8-bit wide ports.
+
 config GPIO_REG
 	bool
 	help
diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile
index c58a90a3c3b1..8ace5934e3c3 100644
--- a/drivers/gpio/Makefile
+++ b/drivers/gpio/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_GPIO_RC5T583)  += gpio-rc5t583.o
 obj-$(CONFIG_GPIO_RCAR)         += gpio-rcar.o
 obj-$(CONFIG_GPIO_RDA)          += gpio-rda.o
 obj-$(CONFIG_GPIO_RDC321X)      += gpio-rdc321x.o
+obj-$(CONFIG_GPIO_REALTEK_OTTO) += gpio-realtek-otto.o
 obj-$(CONFIG_GPIO_REG)          += gpio-reg.o
 obj-$(CONFIG_ARCH_SA1100)  += gpio-sa1100.o
 obj-$(CONFIG_GPIO_SAMA5D2_PIOBU)   += gpio-sama5d2-piobu.o
diff --git a/drivers/gpio/gpio-realtek-otto.c b/drivers/gpio/gpio-realtek-otto.c
new file mode 100644
index ..818412687346
--- /dev/null
+++ b/drivers/gpio/gpio-realtek-otto.c
@@ -0,0 +1,331 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Total register block size is 0x1C for four ports.
+ * On the RTL8380/RTL8390 platforms port A, B, and C are implemented.
+ * RTL8389 and RTL8328 implement a second bank with ports E, F, G, and H.
+ *
+ * Port information is stored with the first port at offset 0, followed by the
+ * second, etc. Most registers store one bit per GPIO and should be read out in
+ * reversed endian order. The two interrupt mask registers store two bits per
+ * GPIO, and should be manipulated with swahw32, if required.
+ */
+
+/*
+ * Pin select: (0) "normal", (1) "dedicate peripheral"
+ * Not used on RTL8380/RTL8390, peripheral selection is managed by control bits
+ * in the peripheral registers.
+ */
+#define REALTEK_GPIO_REG_CNR   0x00
+/* Clear bit (0) for input, set bit (1) for output */
+#define REALTEK_GPIO_REG_DIR   0x08
+#define REALTEK_GPIO_REG_DATA  0x0C
+/* Read bit for IRQ status, write 1 to clear IRQ */
+#define REALTEK_GPIO_REG_ISR   0x10
+/* Two bits per GPIO */
+#define REALTEK_GPIO_REG_IMR_AB0x14
+#define REALTEK_GPIO_REG_IMR_CD0x18
+#define REALTEK_GPIO_IMR_LINE_MASK 3
+#define REALTEK_GPIO_IRQ_EDGE_FALLING  1
+#define REALTEK_GPIO_IRQ_EDGE_RISING   2
+#define REALTEK_GPIO_IRQ_EDGE_BOTH 3
+
+#define REALTEK_GPIO_MAX   32
+
+/*
+ * Realtek GPIO driver data
+ * Because the interrupt mask register (IMR) combines the function of
+ * IRQ type selection and masking, two extra values are stored.
+ * intr_mask is used to mask/unmask the interrupts for certain GPIO,
+ * and intr_type is used to store the selected interrupt types. The
+ * logical AND of these values is written to IMR on changes.
+ *
+ * @dev Parent device
+ * @gc Associated gpio_chip instance
+ * @base Base address of the register block
+ * @lock Lock for accessing the IRQ registers and values
+ * @intr_mask Mask for GPIO interrupts
+ * @intr_type GPIO interrupt type selection
+ */
+struct realtek_gpio_ctrl {
+   struct device *dev;
+   struct gpio_chip gc;
+   void __iomem *base;

[PATCH v2 1/2] dt-bindings: gpio: Binding for Realtek Otto GPIO

2021-03-15 Thread Sander Vanheule
Add a binding description for Realtek's GPIO controller found on several
of their MIPS-based SoCs (codenamed Otto), such as the RTL838x and
RTL839x series of switch SoCs.

A fallback binding 'realtek,otto-gpio' is provided for cases where the
actual port ordering is not known yet, and enabling the interrupt
controller may result in uncaught interrupts.

Signed-off-by: Sander Vanheule 
Reviewed-by: Linus Walleij 
---
 .../bindings/gpio/gpio-realtek-otto.yaml  | 80 +++
 1 file changed, 80 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/gpio-realtek-otto.yaml
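
As a rough illustration of the fallback behaviour described in the commit
message above, the following standalone C sketch models how a driver might
attach a "no interrupts" flag to the generic compatible. The table layout, the
flag name SKETCH_IRQ_DISABLED, and the helper sketch_match_flags() are
hypothetical; they only mimic the idea and are neither the driver's code nor
the kernel's OF matching API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag: interrupt support must stay off for this match. */
#define SKETCH_IRQ_DISABLED	0x1u

struct sketch_match {
	const char *compatible;
	unsigned int flags;
};

/* SoC-specific names have a known port order; the fallback does not. */
static const struct sketch_match sketch_table[] = {
	{ "realtek,rtl8380-gpio", 0 },
	{ "realtek,rtl8390-gpio", 0 },
	{ "realtek,otto-gpio", SKETCH_IRQ_DISABLED },
};

/*
 * Walk the node's compatible list from most to least specific and return
 * the flags of the first table entry that matches, or -1 if none does.
 */
static int sketch_match_flags(const char *const *compat, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++)
		for (j = 0; j < sizeof(sketch_table) / sizeof(sketch_table[0]); j++)
			if (strcmp(compat[i], sketch_table[j].compatible) == 0)
				return (int)sketch_table[j].flags;
	return -1;
}
```

With the compatible list from the binding example ("realtek,rtl8380-gpio"
followed by "realtek,otto-gpio"), the sketch returns no flags, so interrupts
would stay available; a node carrying only the fallback name gets the
hypothetical "disabled" flag.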

diff --git a/Documentation/devicetree/bindings/gpio/gpio-realtek-otto.yaml b/Documentation/devicetree/bindings/gpio/gpio-realtek-otto.yaml
new file mode 100644
index ..3e8151e3a169
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/gpio-realtek-otto.yaml
@@ -0,0 +1,80 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/gpio/gpio-realtek-otto.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Realtek Otto GPIO controller
+
+maintainers:
+  - Sander Vanheule 
+  - Bert Vermeulen 
+
+description: |
+  Realtek's GPIO controller on their MIPS switch SoCs (Otto platform) consists
+  of two banks of 32 GPIOs. These GPIOs can generate edge-triggered interrupts.
+  Each bank's interrupts are cascaded into one interrupt line on the parent
+  interrupt controller, if provided.
+  This binding allows defining a single bank in the devicetree. The interrupt
+  controller is not supported on the fallback compatible name, which only
+  allows for GPIO port use.
+
+properties:
+  $nodename:
+    pattern: "^gpio@[0-9a-f]+$"
+
+  compatible:
+    oneOf:
+      - items:
+          - enum:
+              - realtek,rtl8380-gpio
+              - realtek,rtl8390-gpio
+          - const: realtek,otto-gpio
+      - const: realtek,otto-gpio
+
+  reg:
+    maxItems: 1
+
+  "#gpio-cells":
+    const: 2
+
+  gpio-controller: true
+
+  ngpios:
+    minimum: 1
+    maximum: 32
+
+  interrupt-controller: true
+
+  "#interrupt-cells":
+    const: 2
+
+  interrupts:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - "#gpio-cells"
+  - gpio-controller
+
+additionalProperties: false
+
+dependencies:
+  interrupt-controller: [ interrupts ]
+
+examples:
+  - |
+      gpio@3500 {
+        compatible = "realtek,rtl8380-gpio", "realtek,otto-gpio";
+        reg = <0x3500 0x1c>;
+        gpio-controller;
+        #gpio-cells = <2>;
+        ngpios = <24>;
+        interrupt-controller;
+        #interrupt-cells = <2>;
+        interrupt-parent = <&rtlintc>;
+        interrupts = <23>;
+      };
+
+...
-- 
2.30.2



[PATCH 1/2] dt-bindings: gpio: Binding for Realtek Otto GPIO

2021-03-15 Thread Sander Vanheule
Add a binding description for Realtek's GPIO controller found on several
of their MIPS-based SoCs (codenamed Otto), such as the RTL838x and
RTL839x series of switch SoCs.

A fallback binding 'realtek,otto-gpio' is provided for cases where the
actual port ordering is not known yet, and enabling the interrupt
controller may result in uncaught interrupts.

Signed-off-by: Sander Vanheule 
---
 .../bindings/gpio/gpio-realtek-otto.yaml  | 80 +++
 1 file changed, 80 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/gpio-realtek-otto.yaml

diff --git a/Documentation/devicetree/bindings/gpio/gpio-realtek-otto.yaml b/Documentation/devicetree/bindings/gpio/gpio-realtek-otto.yaml
new file mode 100644
index ..3e8151e3a169
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/gpio-realtek-otto.yaml
@@ -0,0 +1,80 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/gpio/gpio-realtek-otto.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Realtek Otto GPIO controller
+
+maintainers:
+  - Sander Vanheule 
+  - Bert Vermeulen 
+
+description: |
+  Realtek's GPIO controller on their MIPS switch SoCs (Otto platform) consists
+  of two banks of 32 GPIOs. These GPIOs can generate edge-triggered interrupts.
+  Each bank's interrupts are cascaded into one interrupt line on the parent
+  interrupt controller, if provided.
+  This binding allows defining a single bank in the devicetree. The interrupt
+  controller is not supported on the fallback compatible name, which only
+  allows for GPIO port use.
+
+properties:
+  $nodename:
+    pattern: "^gpio@[0-9a-f]+$"
+
+  compatible:
+    oneOf:
+      - items:
+          - enum:
+              - realtek,rtl8380-gpio
+              - realtek,rtl8390-gpio
+          - const: realtek,otto-gpio
+      - const: realtek,otto-gpio
+
+  reg:
+    maxItems: 1
+
+  "#gpio-cells":
+    const: 2
+
+  gpio-controller: true
+
+  ngpios:
+    minimum: 1
+    maximum: 32
+
+  interrupt-controller: true
+
+  "#interrupt-cells":
+    const: 2
+
+  interrupts:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - "#gpio-cells"
+  - gpio-controller
+
+additionalProperties: false
+
+dependencies:
+  interrupt-controller: [ interrupts ]
+
+examples:
+  - |
+      gpio@3500 {
+        compatible = "realtek,rtl8380-gpio", "realtek,otto-gpio";
+        reg = <0x3500 0x1c>;
+        gpio-controller;
+        #gpio-cells = <2>;
+        ngpios = <24>;
+        interrupt-controller;
+        #interrupt-cells = <2>;
+        interrupt-parent = <&rtlintc>;
+        interrupts = <23>;
+      };
+
+...
-- 
2.30.2



[PATCH 0/2] Add Realtek Otto GPIO support

2021-03-15 Thread Sander Vanheule
Add support for the GPIO controller employed by Realtek in multiple series of
MIPS SoCs. These include the supported RTL838x and RTL839x series.
The register layout also matches the one found in GPIO controllers of
other (Lexra-based) SoCs such as RTL8196E, RTL8197D, and RTL8197F.

For the platform name 'otto', I am not aware of any official resources as to
what hardware this specifically applies to. However, in all of the GPL archives
we've received, from vendors using compatible SoCs in their design, the
platform under the MIPS architecture is referred to by this name.

The GPIO ports have been tested on a Zyxel GS1900-8 (RTL8380M), and
Zyxel GS1900-48 (RTL8393M). Furthermore, the GPIO ports and interrupt
controller have been tested on a Netgear GS110TPPv1 (RTL8381M).

Sander Vanheule (2):
  dt-bindings: gpio: Binding for Realtek Otto GPIO
  gpio: Add Realtek Otto GPIO support

 .../bindings/gpio/gpio-realtek-otto.yaml  |  80 +
 drivers/gpio/Kconfig  |  12 +
 drivers/gpio/Makefile |   1 +
 drivers/gpio/gpio-realtek-otto.c  | 320 ++
 4 files changed, 413 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/gpio-realtek-otto.yaml
 create mode 100644 drivers/gpio/gpio-realtek-otto.c

-- 
2.30.2



[PATCH 2/2] gpio: Add Realtek Otto GPIO support

2021-03-15 Thread Sander Vanheule
Realtek MIPS SoCs (platform name Otto) have GPIO controllers with up to
64 GPIOs, divided over two banks. Each bank has a set of registers for
32 GPIOs, with support for edge-triggered interrupts.

Each GPIO bank consists of four 8-bit GPIO ports (ABCD and EFGH). Most
registers pack one bit per GPIO, except for the IMR register, which
packs two bits per GPIO (AB-CD).

Although the byte order is currently assumed to have port A..D at offset
0x0..0x3, this has been observed to be reversed on other, Lexra-based,
SoCs (e.g. RTL8196E/97D/97F).

Interrupt support is disabled for the fallback devicetree-compatible
'realtek,otto-gpio'. This allows for quick support of GPIO banks in
which the byte order would be unknown. In this case, the port ordering
in the IMR registers may not match the reversed order in the other
registers (DCBA, and BA-DC or DC-BA).

Signed-off-by: Sander Vanheule 
---
 drivers/gpio/Kconfig |  12 ++
 drivers/gpio/Makefile|   1 +
 drivers/gpio/gpio-realtek-otto.c | 320 +++
 3 files changed, 333 insertions(+)
 create mode 100644 drivers/gpio/gpio-realtek-otto.c
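
The two-bits-per-GPIO IMR packing described in the commit message above can be
sketched in standalone C as follows. This is only an illustration under stated
assumptions: ports are taken in A..D order, no byte or halfword swapping is
applied, and the sketch_* helpers are hypothetical names that do not appear in
the patch. The register offsets and edge-type values are the ones defined in
the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets and edge types as defined in the patch. */
#define REALTEK_GPIO_REG_IMR_AB		0x14
#define REALTEK_GPIO_REG_IMR_CD		0x18
#define REALTEK_GPIO_IRQ_EDGE_FALLING	1
#define REALTEK_GPIO_IRQ_EDGE_RISING	2
#define REALTEK_GPIO_IRQ_EDGE_BOTH	3

/* Hypothetical helper: IMR register holding the field for a line (0..31). */
static unsigned int sketch_imr_reg(unsigned int line)
{
	return line < 16 ? REALTEK_GPIO_REG_IMR_AB : REALTEK_GPIO_REG_IMR_CD;
}

/* Hypothetical helper: bit position of the two-bit field, no byte swapping. */
static unsigned int sketch_imr_shift(unsigned int line)
{
	return 2 * (line % 16);
}

/*
 * Hypothetical helper: update one line's field in a cached IMR value.
 * 'unmasked' is non-zero when the interrupt is enabled; a masked line
 * gets 0 written, so the stored edge type has no effect.
 */
static uint32_t sketch_imr_update(uint32_t imr, unsigned int line,
				  unsigned int type, int unmasked)
{
	unsigned int shift = sketch_imr_shift(line);
	uint32_t field = unmasked ? (type & 3u) : 0u;

	imr &= ~((uint32_t)3 << shift);
	return imr | (field << shift);
}
```

For example, under these assumptions line 17 (second pin of port C) lands in
the IMR_CD register at bit position 2, and unmasking it for both edges sets
bits 2 and 3 of that register. On hardware with reversed port order the
positions would differ, which is the reason interrupts are disabled for the
fallback compatible.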

diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index e3607ec4c2e8..fedf1e49469e 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -502,6 +502,18 @@ config GPIO_RDA
help
  Say Y here to support RDA Micro GPIO controller.
 
+config GPIO_REALTEK_OTTO
+   bool "Realtek Otto GPIO support"
+   depends on MACH_REALTEK_RTL
+   depends on OF_GPIO
+   default MACH_REALTEK_RTL
+   select GPIO_GENERIC
+   select GPIOLIB_IRQCHIP
+   help
+ The GPIO controller on the Otto MIPS platform supports up to two
+ banks of 32 GPIOs, with edge triggered interrupts. The 32 GPIOs
+ are grouped in four 8-bit wide ports.
+
 config GPIO_REG
bool
help
diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile
index c58a90a3c3b1..8ace5934e3c3 100644
--- a/drivers/gpio/Makefile
+++ b/drivers/gpio/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_GPIO_RC5T583)  += gpio-rc5t583.o
 obj-$(CONFIG_GPIO_RCAR)+= gpio-rcar.o
 obj-$(CONFIG_GPIO_RDA) += gpio-rda.o
 obj-$(CONFIG_GPIO_RDC321X) += gpio-rdc321x.o
+obj-$(CONFIG_GPIO_REALTEK_OTTO)+= gpio-realtek-otto.o
 obj-$(CONFIG_GPIO_REG) += gpio-reg.o
 obj-$(CONFIG_ARCH_SA1100)  += gpio-sa1100.o
 obj-$(CONFIG_GPIO_SAMA5D2_PIOBU)   += gpio-sama5d2-piobu.o
diff --git a/drivers/gpio/gpio-realtek-otto.c b/drivers/gpio/gpio-realtek-otto.c
new file mode 100644
index ..04c11b2085f8
--- /dev/null
+++ b/drivers/gpio/gpio-realtek-otto.c
@@ -0,0 +1,320 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Total register block size is 0x1C for four ports.
+ * On the RTL8380/RTL8390 platforms ports A, B, and C are implemented.
+ * RTL8389 and RTL8328 implement a second bank with ports E, F, G, and H.
+ *
+ * Port information is stored with the first port at offset 0, followed by the
+ * second, etc. Most registers store one bit per GPIO and should be read out in
+ * reversed endian order. The two interrupt mask registers store two bits per
+ * GPIO, and should be manipulated with swahw32, if required.
+ */
+
+/*
+ * Pin select: (0) "normal", (1) "dedicate peripheral"
+ * Not used on RTL8380/RTL8390, peripheral selection is managed by control bits
+ * in the peripheral registers.
+ */
+#define REALTEK_GPIO_REG_CNR   0x00
+/* Clear bit (0) for input, set bit (1) for output */
+#define REALTEK_GPIO_REG_DIR   0x08
+#define REALTEK_GPIO_REG_DATA  0x0C
+/* Read bit for IRQ status, write 1 to clear IRQ */
+#define REALTEK_GPIO_REG_ISR   0x10
+/* Two bits per GPIO */
+#define REALTEK_GPIO_REG_IMR_AB0x14
+#define REALTEK_GPIO_REG_IMR_CD0x18
+#define REALTEK_GPIO_IRQ_EDGE_FALLING  1
+#define REALTEK_GPIO_IRQ_EDGE_RISING   2
+#define REALTEK_GPIO_IRQ_EDGE_BOTH 3
+
+#define REALTEK_GPIO_MAX   32
+
+/*
+ * Realtek GPIO driver data
+ * Because the interrupt mask register (IMR) combines the function of
+ * IRQ type selection and masking, two extra values are stored.
+ * intr_mask is used to mask/unmask the interrupts for certain GPIO,
+ * and intr_type is used to store the selected interrupt types. The
+ * logical AND of these values is written to IMR on changes.
+ *
+ * @dev Parent device
+ * @gc Associated gpio_chip instance
+ * @base Base address of the register block
+ * @lock Lock for accessing the IRQ registers and values
+ * @intr_mask Mask for GPIO interrupts
+ * @intr_type GPIO interrupt type selection
+ */
+struct realtek_gpio_ctrl {
+   struct device *dev;
+   struct gpio_chip gc;
+   void __iomem *base;
+   raw_spinlock_t lock;
+   u32 intr_mask[2];
+   u32 intr

[PATCH] MIPS: ralink: manage low reset lines

2021-02-03 Thread Sander Vanheule
Reset lines with indices smaller than 8 are currently considered invalid
by the rt2880-reset reset controller.

The MT7621 SoC uses a number of these low reset lines. The DTS defines
reset lines "hsdma", "fe", and "mcm" with respective values 5, 6, and 2.
As a result of the above restriction, these resets cannot be asserted or
de-asserted by the reset controller. In cases where the bootloader does
not de-assert these lines, this results in e.g. the MT7621's internal
switch staying in reset.

Change the reset controller to only ignore the system reset, so all
reset lines with index greater than 0 are considered valid.

Signed-off-by: Sander Vanheule 
---
This patch was tested on a TP-Link EAP235-Wall, with an MT7621DA SoC.
The bootloader on this device would leave reset line 2 ("mcm") asserted,
which caused the internal switch to be unresponsive on an uninterrupted
boot from flash.

When tftpboot was used in the bootloader to load an initramfs, it did
initialise the internal switch, and cleared the mcm reset line. In this
case the switch could be used from the OS. With this patch applied, the
switch works both in an initramfs, and when (cold) booting from flash.
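
The effect of the changed check can be illustrated with a small standalone C
sketch. The sketch_* helpers are hypothetical and only model the validity test
from the diff below; the assumption that asserting a line sets bit 'id' in the
reset control register (and de-asserting clears it) is not visible in the
quoted hunks and is labelled as such in the code.

```c
#include <assert.h>
#include <stdint.h>

/* Check before the patch: lines 0..7 are rejected. */
static int sketch_old_valid(unsigned long id)
{
	return id >= 8;
}

/* Check after the patch: only line 0 (system reset) is rejected. */
static int sketch_new_valid(unsigned long id)
{
	return id != 0;
}

/*
 * Assumed model of asserting a line: set bit 'id' in a copy of the reset
 * control register. Invalid or out-of-range lines leave it untouched.
 */
static uint32_t sketch_assert_line(uint32_t ctrl, unsigned long id)
{
	if (!sketch_new_valid(id) || id >= 32)
		return ctrl;
	return ctrl | ((uint32_t)1 << id);
}

/* Assumed model of de-asserting a line: clear bit 'id'. */
static uint32_t sketch_deassert_line(uint32_t ctrl, unsigned long id)
{
	if (!sketch_new_valid(id) || id >= 32)
		return ctrl;
	return ctrl & ~((uint32_t)1 << id);
}
```

With the old check, the MT7621 lines 2 ("mcm"), 5 ("hsdma"), and 6 ("fe") are
all rejected; with the new check they are accepted, so a line left asserted by
the bootloader can be released.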

 arch/mips/ralink/reset.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/mips/ralink/reset.c b/arch/mips/ralink/reset.c
index 8126f1260407..274d33078c5e 100644
--- a/arch/mips/ralink/reset.c
+++ b/arch/mips/ralink/reset.c
@@ -27,7 +27,7 @@ static int ralink_assert_device(struct reset_controller_dev *rcdev,
 {
u32 val;
 
-   if (id < 8)
+   if (id == 0)
return -1;
 
val = rt_sysc_r32(SYSC_REG_RESET_CTRL);
@@ -42,7 +42,7 @@ static int ralink_deassert_device(struct reset_controller_dev *rcdev,
 {
u32 val;
 
-   if (id < 8)
+   if (id == 0)
return -1;
 
val = rt_sysc_r32(SYSC_REG_RESET_CTRL);
-- 
2.29.2



Re: mtd raw nand denali.c broken for Intel/Altera Cyclone V

2019-09-26 Thread Tim Sander
Hi 

On Wednesday, 11 September 2019, 04:37:46 CEST, Masahiro Yamada wrote:
> Hi Dinh,
> 
> On Wed, Sep 11, 2019 at 12:22 AM Dinh Nguyen  wrote:
> > On 9/10/19 8:48 AM, Tim Sander wrote:
> > > Hi
> > > 
> > > I have noticed that my SPF records were not in place after moving the
> > > server, so it seems the mail didn't go to the mailing list. Hopefully
> > > that's fixed now.
> > >
> > > On Tuesday, 10 September 2019, 09:16:37 CEST, Masahiro Yamada wrote:
> > >> On Fri, Sep 6, 2019 at 9:39 PM Tim Sander  wrote:
> > >>> Hi
> > >>> 
> > >>> I have noticed that there are multiple breakages piling up for the
> > >>> denali nand driver on the Intel/Altera Cyclone V. Unfortunately I had
> > >>> no time to track the mainline kernel closely, so the breakage seems to
> > >>> pile up. I am a little disappointed that Intel is not on the lookout
> > >>> that the kernel works on the chips they are selling. I was really
> > >>> happy about the state of the platform before concerning mainline
> > >>> support.
> > >>>
> > >>> The failure starts with kernel 4.19 or stable kernel release 4.18.19.
> > >>> The commit is ba4a1b62a2d742df9e9c607ac53b3bf33496508f.
> > >> 
> > >> Just for clarification, this corresponds to
> > >> 0d55c668b218a1db68b5044bce4de74e1bd0f0c8 upstream.
> > >> 
> > >>> The problem here is that
> > >>> our platform works with a zero in the SPARE_AREA_SKIP_BYTES register.
> > >> 
> > >> Please clarify the scope of "our platform".
> > >> (Only you, or your company, or every individual using this chip?)
> > > 
> > > The company i work for uses this chip as a base for multiple products.
> > > 
> > >> First, SPARE_AREA_SKIP_BYTES is not the property of the hardware.
> > >> Rather, it is about the OOB layout, in other words, this parameter
> > >> is defined by software.
> > >> 
> > >> For example, U-Boot supports the Denali NAND driver.
> > >> The SPARE_AREA_SKIP_BYTES is a user-configurable parameter:
> > >> https://github.com/u-boot/u-boot/blob/v2019.10-rc3/drivers/mtd/nand/raw/Kconfig#L112
I am using barebox for booting. I looked at the code and found a comment in 
denali_hw_init: 
 * tell driver how many bit controller will skip before
 * writing ECC code in OOB, this register may be already
 * set by firmware. So we read this value out.
 * if this value is 0, just let it be.

I have checked the barebox code, and the Denali register SPARE_AREA_SKIP_BYTES
(offset 0x230) is read only once on booting. I have not found any place where
barebox sets the register. Since the value is zero in my case, I would
conclude that the boot ROM does not set it either. The code in barebox was
mostly imported from Linux in 2015, which is before the reorganization that
happened on the Linux side later on.

> > >> 
> > >> 
> > >> Your platform works with a zero in the SPARE_AREA_SKIP_BYTES register
> > >> because the NAND chip on the board was initialized with a zero
> > >> set to the SPARE_AREA_SKIP_BYTES register.
> > >> 
> > >> If the NAND chip had been initialized with 8
> > >> set to the SPARE_AREA_SKIP_BYTES register, it would have
> > >> been working with 8 to the SPARE_AREA_SKIP_BYTES.
> > >> 
> > >> The Boot ROM is the only (semi-)software that is unconfigurable by
> > >> users,
> > >> so the value of SPARE_AREA_SKIP_BYTES should be aligned with
> > >> the boot ROM.
> > >> I recommend you to check the spec of the boot ROM.
> > > 
> > > We boot from NOR flash. That's why i didn't see a problem booting
> > > probably.
> > > 
> > >> (The maintainer of the platform, Dinh, is CC'ed,
> > >> so I hope he will jump in)
> > > 
> > > Yes i hope so too.
> > 
> > I don't have access to a NAND device at the moment. I'll try to find one
> > and debug.
I have hardware available to me, so i would be happy to test any ideas/
guesses.

> Dinh,
> Do you have answers for the following questions?
> 
> 
> - Does the SOCFPGA boot ROM support the NAND boot mode?
> 
> - If so, which value does it use for SPARE_AREA_SKIP_BYTES?

Best regards
Tim






Re: mtd raw nand denali.c broken for Intel/Altera Cyclone V

2019-09-11 Thread Tim Sander
Hi

On Wednesday, 11 September 2019, 04:37:46 CEST, Masahiro Yamada wrote:
> - Does the SOCFPGA boot ROM support the NAND boot mode?
The Cyclone V HPS TRM, section "A3 Booting and Configuration", lists QSPI,
SD/MMC and NAND as boot sources.

> - If so, which value does it use for SPARE_AREA_SKIP_BYTES?
I have no idea about this one.

Tim




Re: mtd raw nand denali.c broken for Intel/Altera Cyclone V

2019-09-10 Thread Tim Sander
Hi

I have noticed that my SPF records were not in place after moving the server,
so it seems the mail didn't go to the mailing list. Hopefully that's fixed now.

On Tuesday, 10 September 2019, 09:16:37 CEST, Masahiro Yamada wrote:
> On Fri, Sep 6, 2019 at 9:39 PM Tim Sander  wrote:
> > Hi
> > 
> > I have noticed that there are multiple breakages piling up for the denali
> > nand driver on the Intel/Altera Cyclone V. Unfortunately I had no time to
> > track the mainline kernel closely, so the breakage seems to pile up. I am a
> > little disappointed that Intel is not on the lookout that the kernel works
> > on the chips they are selling. I was really happy about the state of the
> > platform before concerning mainline support.
> > 
> > The failure starts with kernel 4.19 or stable kernel release 4.18.19. The
> > commit is ba4a1b62a2d742df9e9c607ac53b3bf33496508f.
> 
> Just for clarification, this corresponds to
> 0d55c668b218a1db68b5044bce4de74e1bd0f0c8 upstream.
> 
> > The problem here is that
> > our platform works with a zero in the SPARE_AREA_SKIP_BYTES register.
> 
> Please clarify the scope of "our platform".
> (Only you, or your company, or every individual using this chip?)
The company i work for uses this chip as a base for multiple products.

> First, SPARE_AREA_SKIP_BYTES is not the property of the hardware.
> Rather, it is about the OOB layout, in other words, this parameter
> is defined by software.
> 
> For example, U-Boot supports the Denali NAND driver.
> The SPARE_AREA_SKIP_BYTES is a user-configurable parameter:
> https://github.com/u-boot/u-boot/blob/v2019.10-rc3/drivers/mtd/nand/raw/Kconfig#L112
> 
> 
> Your platform works with a zero in the SPARE_AREA_SKIP_BYTES register
> because the NAND chip on the board was initialized with a zero
> set to the SPARE_AREA_SKIP_BYTES register.
> 
> If the NAND chip had been initialized with 8
> set to the SPARE_AREA_SKIP_BYTES register, it would have
> been working with 8 to the SPARE_AREA_SKIP_BYTES.
> 
> The Boot ROM is the only (semi-)software that is unconfigurable by users,
> so the value of SPARE_AREA_SKIP_BYTES should be aligned with
> the boot ROM.
> I recommend you to check the spec of the boot ROM.
We boot from NOR flash. That's why i didn't see a problem booting probably.

> (The maintainer of the platform, Dinh, is CC'ed,
> so I hope he will jump in)
Yes i hope so too.
 

> Second, I doubt 0 is a good value for SPARE_AREA_SKIP_BYTES.
> 
> As explained in commit log, SPARE_AREA_SKIP_BYTES==0 means
> the OOB is used for ECC without any offset.
> So, the BBM marked in the factory will be destroyed.
Oh my! That's bad news.
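
The point about the bad block marker can be illustrated with a small
standalone C sketch. It assumes, purely for illustration, that the factory
marker occupies the first two bytes of the OOB area and that the ECC bytes are
stored contiguously right after the skipped bytes; the helper name and the
ECC length used in the example are hypothetical and not taken from the Denali
driver.

```c
#include <assert.h>

/* Assumed position of the factory bad block marker: first OOB bytes. */
#define SKETCH_BBM_OFFSET	0u
#define SKETCH_BBM_LEN		2u

/*
 * Return 1 if ECC data placed after 'skip_bytes' bytes of OOB would
 * overwrite the bad block marker, 0 otherwise.
 */
static int sketch_ecc_clobbers_bbm(unsigned int skip_bytes,
				   unsigned int ecc_bytes)
{
	unsigned int ecc_start = skip_bytes;
	unsigned int ecc_end = skip_bytes + ecc_bytes;	/* exclusive */
	unsigned int bbm_end = SKETCH_BBM_OFFSET + SKETCH_BBM_LEN;

	return ecc_bytes > 0 && ecc_start < bbm_end &&
	       ecc_end > SKETCH_BBM_OFFSET;
}
```

Under these assumptions a skip value of 0 lets the ECC bytes overlap the
marker, while a skip value of 8 leaves it alone. It also shows why flash
written with one skip value cannot simply be read with another: the ECC bytes
would be looked up at a different OOB offset.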

> > But in
> > this case the patch assumes the default value 8, which is outright
> > wrong on this variant. Unless this patch is reverted, all blocks of the
> > nand flash are being marked bad :-(.
> > 
> > When reverting the patch ba4a1b62a2d742df9e9c607ac53b3bf33496508f  i can
> > boot 4.19.10 again.
> > 
> > With 5.0 it goes further down the drain and I didn't manage to boot it
> > even with the above patch reverted.
> > 
> > I also tried 5.3-rc7 with the above patch reverted and the variable t_x
> > dirty hacked to the value 0x1388 as i got the impression that the timing
> > calculation is off too. I still get an
> > interrupt error and boot failure:
> git-bisect is a general solution to pin point the problem.
> 
> BTW, if you end up with hacking the clock frequency, something is already
> wrong.
This was just a dirty hack to verify that this is the problem. 
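
As a side note on the numbers in this thread: the hard-coded value 0x1388
(5000 decimal) is what one gets for a 200 MHz clock if t_x is taken to be the
clock period in picoseconds. That interpretation is an assumption here, and
the helper below is a hypothetical standalone C sketch of the arithmetic, not
the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Clock period in picoseconds for a rate in Hz, rounded down. */
static uint64_t sketch_period_ps(uint64_t rate_hz)
{
	if (rate_hz == 0)
		return 0;	/* no valid period for a stopped clock */
	return UINT64_C(1000000000000) / rate_hz;
}
```

If the clock framework reported a different clk_x rate than 200 MHz, the
computed period would differ from 0x1388, which would be one possible reason
for timing parameters appearing to be off.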

> denali->clk_rate, denali->clk_x_rate should be 50MHz, 200MHz, respectively.
> 
> If not, please check the clock driver and your DT.
We include the device tree file for this chip directly from kernel sources.
Which means that we are using the settings which are within the kernel tree in

linux-5.3-rc8/arch/arm/boot/dts/socfpga.dtsi

The dts entries taken verbatim from the above file are:

nand0: nand@ff90 {
#address-cells = <0x1>;
#size-cells = <0x1>;
compatible = "altr,socfpga-denali-nand";
reg = <0xff90 0x10>,
  <0xffb8 0x1>;
reg-names = "nand_data", "denali_reg";
interrupts = <0x0 0x90 0x4>;
clocks = <&nand_clk>, <&nand_x_clk>, <&nand_ecc_clk>;
clock-names = "nand", "nand_x", "ecc";
resets = <&rst NAND_RESET>;
  

mtd raw nand denali.c broken for Intel/Altera Cyclone V

2019-09-06 Thread Tim Sander
Hi

I have noticed that there are multiple breakages piling up for the denali nand
driver on the Intel/Altera Cyclone V. Unfortunately I had no time to track the
mainline kernel closely, so the breakage seems to pile up. I am a little
disappointed that Intel is not on the lookout that the kernel works on the
chips they are selling. I was really happy about the state of the platform
before concerning mainline support.

The failure starts with kernel 4.19 or stable kernel release 4.18.19. The
commit is ba4a1b62a2d742df9e9c607ac53b3bf33496508f. The problem here is that
our platform works with a zero in the SPARE_AREA_SKIP_BYTES register. But in
this case the patch assumes the default value 8, which is outright wrong
on this variant. Unless this patch is reverted, all blocks of the nand flash
are being marked bad :-(.

When reverting the patch ba4a1b62a2d742df9e9c607ac53b3bf33496508f I can boot
4.19.10 again.

With 5.0 it goes further down the drain and I didn't manage to boot it
even with the above patch reverted.

I also tried 5.3-rc7 with the above patch reverted and the variable t_x
hard-coded to the value 0x1388, as I got the impression that the timing
calculation is off too. I still get an interrupt error and boot failure:
[0.817588] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
[0.823946] nand: Micron MT29F2G08ABAEAWP
[0.827965] nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
[1.887052] denali-nand-dt ff90.nand: timeout while waiting for irq 0x1000
[2.911056] denali-nand-dt ff90.nand: timeout while waiting for irq 0x1000

I have seen this https://lore.kernel.org/patchwork/patch/983055/ thread and
this might fix at least the 4.19 boot problem.

I would be really happy for hints how to get the Intel Cyclone V working again.

Best regards
Tim





Re: 5.3-rc3-ish VM crash: RIP: 0010:tcp_trim_head+0x20/0xe0

2019-08-24 Thread Sander Eikelenboom
On 17/08/2019 18:35, Eric Dumazet wrote:
> 
> 
> On 8/17/19 10:24 AM, Sander Eikelenboom wrote:
>> On 12/08/2019 19:56, Eric Dumazet wrote:
>>>
>>>
>>> On 8/12/19 2:50 PM, Sander Eikelenboom wrote:
>>>> L.S.,
>>>>
>>>> While testing a somewhere-after-5.3-rc3 kernel (which included the latest 
>>>> net merge (33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9),
>>>> one of my Xen VM's (which gets quite some network load) crashed.
>>>> See below for the stacktrace.
>>>>
>>>> Unfortunately I haven't got a clear trigger, so bisection doesn't seem to 
>>>> be an option at the moment. 
>>>> I haven't encountered this on 5.2, so it seems to be a regression against
>>>> 5.2.
>>>>
>>>> Any ideas ?
>>>>
>>>> --
>>>> Sander
>>>>
>>>>
>>>> [16930.653595] general protection fault:  [#1] SMP NOPTI
>>>> [16930.653624] CPU: 0 PID: 3275 Comm: rsync Not tainted 
>>>> 5.3.0-rc3-20190809-doflr+ #1
>>>> [16930.653657] RIP: 0010:tcp_trim_head+0x20/0xe0
>>>> [16930.653677] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 
>>>> fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 
>>>> <8b> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8
>>>> [16930.653741] RSP: :c9003ad8 EFLAGS: 00010286
>>>> [16930.653762] RAX: fffe888005bf62c0 RBX: 8880115fb800 RCX: 
>>>> 801b
>>>
>>> crash in "mov 0x20(%rax),%eax" and RAX=fffe888005bf62c0 (not a valid
>>> kernel address)
>>>
>>> Looks like a one-bit corruption, maybe.
>>>
>>> Nothing really comes to mind between 5.2 and 5.3 that could explain this.
>>>
>>>> [16930.653791] RDX: 05a0 RSI: 8880115fb800 RDI: 
>>>> 888016b00880
>>>> [16930.653819] RBP: 888016b00880 R08: 0001 R09: 
>>>> 
>>>> [16930.653848] R10: 88800ae00800 R11: bfe632e6 R12: 
>>>> 05a0
>>>> [16930.653875] R13: 0001 R14: bfe62d46 R15: 
>>>> 0004
>>>> [16930.653913] FS:  7fe71fe2cb80() GS:88801f20() 
>>>> knlGS:
>>>> [16930.653943] CS:  0010 DS:  ES:  CR0: 80050033
>>>> [16930.653965] CR2: 55de0f3e7000 CR3: 11f32000 CR4: 
>>>> 06f0
>>>> [16930.653993] Call Trace:
>>>> [16930.654005]  
>>>> [16930.654018]  tcp_ack+0xbb0/0x1230
>>>> [16930.654033]  tcp_rcv_established+0x2e8/0x630
>>>> [16930.654053]  tcp_v4_do_rcv+0x129/0x1d0
>>>> [16930.654070]  tcp_v4_rcv+0xac9/0xcb0
>>>> [16930.654088]  ip_protocol_deliver_rcu+0x27/0x1b0
>>>> [16930.654109]  ip_local_deliver_finish+0x3f/0x50
>>>> [16930.654128]  ip_local_deliver+0x4d/0xe0
>>>> [16930.654145]  ? ip_protocol_deliver_rcu+0x1b0/0x1b0
>>>> [16930.654163]  ip_rcv+0x4c/0xd0
>>>> [16930.654179]  __netif_receive_skb_one_core+0x79/0x90
>>>> [16930.654200]  netif_receive_skb_internal+0x2a/0xa0
>>>> [16930.654219]  napi_gro_receive+0xe7/0x140
>>>> [16930.654237]  xennet_poll+0x9be/0xae0
>>>> [16930.654254]  net_rx_action+0x136/0x340
>>>> [16930.654271]  __do_softirq+0xdd/0x2cf
>>>> [16930.654287]  irq_exit+0x7a/0xa0
>>>> [16930.654304]  xen_evtchn_do_upcall+0x27/0x40
>>>> [16930.654320]  xen_hvm_callback_vector+0xf/0x20
>>>> [16930.654339]  
>>>> [16930.654349] RIP: 0033:0x55de0d87db99
>>>> [16930.654364] Code: 00 00 48 89 7c 24 f8 45 39 fe 45 0f 42 fe 44 89 7c 24 
>>>> f4 eb 09 0f 1f 40 00 83 e9 01 74 3e 89 f2 48 63 f8 4c 01 d2 44 38 1c 3a 
>>>> <75> 25 44 38 6c 3a ff 75 1e 41 0f b6 3c 24 40 38 3a 75 14 41 0f b6
>>>> [16930.654432] RSP: 002b:7ffd5531eec8 EFLAGS: 0a87 ORIG_RAX: 
>>>> ff0c
>>>> [16930.655004] RAX: 0002 RBX: 55de0f3e8e50 RCX: 
>>>> 007f
>>>> [16930.655034] RDX: 55de0f3dc2d2 RSI: 3492 RDI: 
>>>> 0002
>>>> [16930.655062] RBP: 7fff R08: 80ea R09: 
>>>> 01f0
>>>> [16930.655089] R10: 55de0f3d8e40 R11: 0094 R12: 
>>>> 55de0f3e0f2a
>>>>
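
The "one bit corruption" remark about RAX in the trace above can be checked
with a few lines of standalone C. The comparison value ffff888005bf62c0 is an
assumption: it is simply the reported RAX with the upper 16 bits all set, as
one would expect for a kernel pointer, and it does not appear in the trace
itself. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits that differ between two 64-bit values. */
static unsigned int sketch_bit_diff(uint64_t a, uint64_t b)
{
	uint64_t x = a ^ b;
	unsigned int n = 0;

	while (x) {
		x &= x - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Index of the lowest differing bit, or -1 if the values are equal. */
static int sketch_first_diff_bit(uint64_t a, uint64_t b)
{
	uint64_t x = a ^ b;
	int i;

	for (i = 0; i < 64; i++)
		if (x & (UINT64_C(1) << i))
			return i;
	return -1;
}
```

Comparing the reported RAX against the assumed pointer gives exactly one
differing bit, at bit position 48, which is consistent with a single flipped
bit rather than a wild pointer.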

Re: 5.3-rc3-ish VM crash: RIP: 0010:tcp_trim_head+0x20/0xe0

2019-08-17 Thread Sander Eikelenboom
On 12/08/2019 19:56, Eric Dumazet wrote:
> 
> 
> On 8/12/19 2:50 PM, Sander Eikelenboom wrote:
>> L.S.,
>>
>> While testing a somewhere-after-5.3-rc3 kernel (which included the latest 
>> net merge (33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9),
>> one of my Xen VM's (which gets quite some network load) crashed.
>> See below for the stacktrace.
>>
>> Unfortunately I haven't got a clear trigger, so bisection doesn't seem to be 
>> an option at the moment. 
>> I haven't encountered this on 5.2, so it seems to be a regression against
>> 5.2.
>>
>> Any ideas ?
>>
>> --
>> Sander
>>
>>
>> [16930.653595] general protection fault:  [#1] SMP NOPTI
>> [16930.653624] CPU: 0 PID: 3275 Comm: rsync Not tainted 
>> 5.3.0-rc3-20190809-doflr+ #1
>> [16930.653657] RIP: 0010:tcp_trim_head+0x20/0xe0
>> [16930.653677] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 
>> fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> 
>> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8
>> [16930.653741] RSP: :c9003ad8 EFLAGS: 00010286
>> [16930.653762] RAX: fffe888005bf62c0 RBX: 8880115fb800 RCX: 
>> 801b
> 
> crash in "mov 0x20(%rax),%eax" and RAX=fffe888005bf62c0 (not a valid
> kernel address)
> 
> Looks like a one-bit corruption, maybe.
> 
> Nothing really comes to mind between 5.2 and 5.3 that could explain this.
> 
>> [16930.653791] RDX: 05a0 RSI: 8880115fb800 RDI: 
>> 888016b00880
>> [16930.653819] RBP: 888016b00880 R08: 0001 R09: 
>> 
>> [16930.653848] R10: 88800ae00800 R11: bfe632e6 R12: 
>> 05a0
>> [16930.653875] R13: 0001 R14: bfe62d46 R15: 
>> 0004
>> [16930.653913] FS:  7fe71fe2cb80() GS:88801f20() 
>> knlGS:
>> [16930.653943] CS:  0010 DS:  ES:  CR0: 80050033
>> [16930.653965] CR2: 55de0f3e7000 CR3: 11f32000 CR4: 
>> 06f0
>> [16930.653993] Call Trace:
>> [16930.654005]  
>> [16930.654018]  tcp_ack+0xbb0/0x1230
>> [16930.654033]  tcp_rcv_established+0x2e8/0x630
>> [16930.654053]  tcp_v4_do_rcv+0x129/0x1d0
>> [16930.654070]  tcp_v4_rcv+0xac9/0xcb0
>> [16930.654088]  ip_protocol_deliver_rcu+0x27/0x1b0
>> [16930.654109]  ip_local_deliver_finish+0x3f/0x50
>> [16930.654128]  ip_local_deliver+0x4d/0xe0
>> [16930.654145]  ? ip_protocol_deliver_rcu+0x1b0/0x1b0
>> [16930.654163]  ip_rcv+0x4c/0xd0
>> [16930.654179]  __netif_receive_skb_one_core+0x79/0x90
>> [16930.654200]  netif_receive_skb_internal+0x2a/0xa0
>> [16930.654219]  napi_gro_receive+0xe7/0x140
>> [16930.654237]  xennet_poll+0x9be/0xae0
>> [16930.654254]  net_rx_action+0x136/0x340
>> [16930.654271]  __do_softirq+0xdd/0x2cf
>> [16930.654287]  irq_exit+0x7a/0xa0
>> [16930.654304]  xen_evtchn_do_upcall+0x27/0x40
>> [16930.654320]  xen_hvm_callback_vector+0xf/0x20
>> [16930.654339]  
>> [16930.654349] RIP: 0033:0x55de0d87db99
>> [16930.654364] Code: 00 00 48 89 7c 24 f8 45 39 fe 45 0f 42 fe 44 89 7c 24 
>> f4 eb 09 0f 1f 40 00 83 e9 01 74 3e 89 f2 48 63 f8 4c 01 d2 44 38 1c 3a <75> 
>> 25 44 38 6c 3a ff 75 1e 41 0f b6 3c 24 40 38 3a 75 14 41 0f b6
>> [16930.654432] RSP: 002b:7ffd5531eec8 EFLAGS: 0a87 ORIG_RAX: 
>> ff0c
>> [16930.655004] RAX: 0002 RBX: 55de0f3e8e50 RCX: 
>> 007f
>> [16930.655034] RDX: 55de0f3dc2d2 RSI: 3492 RDI: 
>> 0002
>> [16930.655062] RBP: 7fff R08: 80ea R09: 
>> 01f0
>> [16930.655089] R10: 55de0f3d8e40 R11: 0094 R12: 
>> 55de0f3e0f2a
>> [16930.655116] R13: 0010 R14: 7f16 R15: 
>> 0080
>> [16930.655144] Modules linked in:
>> [16930.655200] ---[ end trace 533367c95501b645 ]---
>> [16930.655223] RIP: 0010:tcp_trim_head+0x20/0xe0
>> [16930.655243] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 
>> fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> 
>> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8
>> [16930.655312] RSP: :c9003ad8 EFLAGS: 00010286
>> [16930.655331] RAX: fffe888005bf62c0 RBX: 8880115fb800 RCX: 
>> 801b
>> [16930.655360] RDX: 05a0 RSI: 8880115fb800 RDI: 
>> 888016b00880
>> [16930.655387] RBP: 888016b00880 R08

Re: 5.3-rc3-ish VM crash: RIP: 0010:tcp_trim_head+0x20/0xe0

2019-08-12 Thread Sander Eikelenboom
On 12/08/2019 19:56, Eric Dumazet wrote:
> 
> 
> On 8/12/19 2:50 PM, Sander Eikelenboom wrote:
>> L.S.,
>>
>> While testing a somewhere-after-5.3-rc3 kernel (which included the latest 
>> net merge, 33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9),
>> one of my Xen VMs (which gets quite some network load) crashed.
>> See below for the stacktrace.
>>
>> Unfortunately I haven't got a clear trigger, so bisection doesn't seem to be 
>> an option at the moment. 
>> I haven't encountered this on 5.2, so it seems to be a regression against 
>> 5.2.
>>
>> Any ideas ?
>>
>> --
>> Sander
>>
>>
>> [16930.653595] general protection fault:  [#1] SMP NOPTI
>> [16930.653624] CPU: 0 PID: 3275 Comm: rsync Not tainted 
>> 5.3.0-rc3-20190809-doflr+ #1
>> [16930.653657] RIP: 0010:tcp_trim_head+0x20/0xe0
>> [16930.653677] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 
>> fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> 
>> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8
>> [16930.653741] RSP: :c9003ad8 EFLAGS: 00010286
>> [16930.653762] RAX: fffe888005bf62c0 RBX: 8880115fb800 RCX: 
>> 801b
> 
> crash in "mov 0x20(%rax),%eax" and RAX=fffe888005bf62c0 (not a valid 
> kernel address)
> 
> Looks like a one-bit corruption, maybe.
> 
> Nothing really comes to mind between 5.2 and 5.3 that could explain this.

Hi Eric,

Hmm, it could be a rare coincidence, so that it just never occurred on pre-5.3 
kernels by chance.
Let's wait and see if it reoccurs; I will report back if it does.

Thanks for your explanation.

--
Sander
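
A rough way to check the one-bit hypothesis, offered as a sketch rather than anything posted in this thread: flip each bit of the reported RAX in turn and keep only the results that are canonical x86-64 addresses. The helper names are made up for illustration, and 48-bit virtual addressing (4-level paging) is an assumption.

```python
# RAX at the faulting instruction, as reported in the oops.
RAX = 0xFFFE888005BF62C0


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits - 1) are all equal (x86-64 canonical form)."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


def single_bit_neighbours(addr: int):
    """Yield (bit, candidate) for each canonical address one bit flip away."""
    for bit in range(64):
        candidate = addr ^ (1 << bit)
        if is_canonical(candidate):
            yield bit, candidate


candidates = list(single_bit_neighbours(RAX))
for bit, candidate in candidates:
    print(f"bit {bit}: {candidate:#018x}")
```

Under those assumptions the reported RAX is non-canonical, and only one flip (bit 48) produces a canonical address: ffff888005bf62c0. That value carries the ffff888... prefix that direct-map pointers normally have on x86-64 when the memory layout is not randomized, which is consistent with a single flipped bit but does not prove it.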


>> [16930.653791] RDX: 05a0 RSI: 8880115fb800 RDI: 
>> 888016b00880
>> [16930.653819] RBP: 888016b00880 R08: 0001 R09: 
>> 
>> [16930.653848] R10: 88800ae00800 R11: bfe632e6 R12: 
>> 05a0
>> [16930.653875] R13: 0001 R14: bfe62d46 R15: 
>> 0004
>> [16930.653913] FS:  7fe71fe2cb80() GS:88801f20() 
>> knlGS:
>> [16930.653943] CS:  0010 DS:  ES:  CR0: 80050033
>> [16930.653965] CR2: 55de0f3e7000 CR3: 11f32000 CR4: 
>> 06f0
>> [16930.653993] Call Trace:
>> [16930.654005]  
>> [16930.654018]  tcp_ack+0xbb0/0x1230
>> [16930.654033]  tcp_rcv_established+0x2e8/0x630
>> [16930.654053]  tcp_v4_do_rcv+0x129/0x1d0
>> [16930.654070]  tcp_v4_rcv+0xac9/0xcb0
>> [16930.654088]  ip_protocol_deliver_rcu+0x27/0x1b0
>> [16930.654109]  ip_local_deliver_finish+0x3f/0x50
>> [16930.654128]  ip_local_deliver+0x4d/0xe0
>> [16930.654145]  ? ip_protocol_deliver_rcu+0x1b0/0x1b0
>> [16930.654163]  ip_rcv+0x4c/0xd0
>> [16930.654179]  __netif_receive_skb_one_core+0x79/0x90
>> [16930.654200]  netif_receive_skb_internal+0x2a/0xa0
>> [16930.654219]  napi_gro_receive+0xe7/0x140
>> [16930.654237]  xennet_poll+0x9be/0xae0
>> [16930.654254]  net_rx_action+0x136/0x340
>> [16930.654271]  __do_softirq+0xdd/0x2cf
>> [16930.654287]  irq_exit+0x7a/0xa0
>> [16930.654304]  xen_evtchn_do_upcall+0x27/0x40
>> [16930.654320]  xen_hvm_callback_vector+0xf/0x20
>> [16930.654339]  
>> [16930.654349] RIP: 0033:0x55de0d87db99
>> [16930.654364] Code: 00 00 48 89 7c 24 f8 45 39 fe 45 0f 42 fe 44 89 7c 24 
>> f4 eb 09 0f 1f 40 00 83 e9 01 74 3e 89 f2 48 63 f8 4c 01 d2 44 38 1c 3a <75> 
>> 25 44 38 6c 3a ff 75 1e 41 0f b6 3c 24 40 38 3a 75 14 41 0f b6
>> [16930.654432] RSP: 002b:7ffd5531eec8 EFLAGS: 0a87 ORIG_RAX: 
>> ff0c
>> [16930.655004] RAX: 0002 RBX: 55de0f3e8e50 RCX: 
>> 007f
>> [16930.655034] RDX: 55de0f3dc2d2 RSI: 3492 RDI: 
>> 0002
>> [16930.655062] RBP: 7fff R08: 80ea R09: 
>> 01f0
>> [16930.655089] R10: 55de0f3d8e40 R11: 0094 R12: 
>> 55de0f3e0f2a
>> [16930.655116] R13: 0010 R14: 7f16 R15: 
>> 0080
>> [16930.655144] Modules linked in:
>> [16930.655200] ---[ end trace 533367c95501b645 ]---
>> [16930.655223] RIP: 0010:tcp_trim_head+0x20/0xe0
>> [16930.655243] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 
>> fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> 
>> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8
>> [16930.655312] RSP: :c9003ad8 EFLAGS: 00010286
>> [16930.655331] RAX: f

5.3-rc3-ish VM crash: RIP: 0010:tcp_trim_head+0x20/0xe0

2019-08-12 Thread Sander Eikelenboom
L.S.,

While testing a somewhere-after-5.3-rc3 kernel (which included the latest net 
merge, 33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9),
one of my Xen VMs (which gets quite some network load) crashed.
See below for the stacktrace.

Unfortunately I haven't got a clear trigger, so bisection doesn't seem to be an 
option at the moment. 
I haven't encountered this on 5.2, so it seems to be a regression against 5.2.

Any ideas ?

--
Sander


[16930.653595] general protection fault:  [#1] SMP NOPTI
[16930.653624] CPU: 0 PID: 3275 Comm: rsync Not tainted 
5.3.0-rc3-20190809-doflr+ #1
[16930.653657] RIP: 0010:tcp_trim_head+0x20/0xe0
[16930.653677] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 fd 
53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> 40 20 
66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8
[16930.653741] RSP: :c9003ad8 EFLAGS: 00010286
[16930.653762] RAX: fffe888005bf62c0 RBX: 8880115fb800 RCX: 801b
[16930.653791] RDX: 05a0 RSI: 8880115fb800 RDI: 888016b00880
[16930.653819] RBP: 888016b00880 R08: 0001 R09: 
[16930.653848] R10: 88800ae00800 R11: bfe632e6 R12: 05a0
[16930.653875] R13: 0001 R14: bfe62d46 R15: 0004
[16930.653913] FS:  7fe71fe2cb80() GS:88801f20() 
knlGS:
[16930.653943] CS:  0010 DS:  ES:  CR0: 80050033
[16930.653965] CR2: 55de0f3e7000 CR3: 11f32000 CR4: 06f0
[16930.653993] Call Trace:
[16930.654005]  
[16930.654018]  tcp_ack+0xbb0/0x1230
[16930.654033]  tcp_rcv_established+0x2e8/0x630
[16930.654053]  tcp_v4_do_rcv+0x129/0x1d0
[16930.654070]  tcp_v4_rcv+0xac9/0xcb0
[16930.654088]  ip_protocol_deliver_rcu+0x27/0x1b0
[16930.654109]  ip_local_deliver_finish+0x3f/0x50
[16930.654128]  ip_local_deliver+0x4d/0xe0
[16930.654145]  ? ip_protocol_deliver_rcu+0x1b0/0x1b0
[16930.654163]  ip_rcv+0x4c/0xd0
[16930.654179]  __netif_receive_skb_one_core+0x79/0x90
[16930.654200]  netif_receive_skb_internal+0x2a/0xa0
[16930.654219]  napi_gro_receive+0xe7/0x140
[16930.654237]  xennet_poll+0x9be/0xae0
[16930.654254]  net_rx_action+0x136/0x340
[16930.654271]  __do_softirq+0xdd/0x2cf
[16930.654287]  irq_exit+0x7a/0xa0
[16930.654304]  xen_evtchn_do_upcall+0x27/0x40
[16930.654320]  xen_hvm_callback_vector+0xf/0x20
[16930.654339]  
[16930.654349] RIP: 0033:0x55de0d87db99
[16930.654364] Code: 00 00 48 89 7c 24 f8 45 39 fe 45 0f 42 fe 44 89 7c 24 f4 
eb 09 0f 1f 40 00 83 e9 01 74 3e 89 f2 48 63 f8 4c 01 d2 44 38 1c 3a <75> 25 44 
38 6c 3a ff 75 1e 41 0f b6 3c 24 40 38 3a 75 14 41 0f b6
[16930.654432] RSP: 002b:7ffd5531eec8 EFLAGS: 0a87 ORIG_RAX: 
ff0c
[16930.655004] RAX: 0002 RBX: 55de0f3e8e50 RCX: 007f
[16930.655034] RDX: 55de0f3dc2d2 RSI: 3492 RDI: 0002
[16930.655062] RBP: 7fff R08: 80ea R09: 01f0
[16930.655089] R10: 55de0f3d8e40 R11: 0094 R12: 55de0f3e0f2a
[16930.655116] R13: 0010 R14: 7f16 R15: 0080
[16930.655144] Modules linked in:
[16930.655200] ---[ end trace 533367c95501b645 ]---
[16930.655223] RIP: 0010:tcp_trim_head+0x20/0xe0
[16930.655243] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 fd 
53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> 40 20 
66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8
[16930.655312] RSP: :c9003ad8 EFLAGS: 00010286
[16930.655331] RAX: fffe888005bf62c0 RBX: 8880115fb800 RCX: 801b
[16930.655360] RDX: 05a0 RSI: 8880115fb800 RDI: 888016b00880
[16930.655387] RBP: 888016b00880 R08: 0001 R09: 
[16930.655414] R10: 88800ae00800 R11: bfe632e6 R12: 05a0
[16930.655441] R13: 0001 R14: bfe62d46 R15: 0004
[16930.655475] FS:  7fe71fe2cb80() GS:88801f20() 
knlGS:
[16930.655502] CS:  0010 DS:  ES:  CR0: 80050033
[16930.655525] CR2: 55de0f3e7000 CR3: 11f32000 CR4: 06f0
[16930.63] Kernel panic - not syncing: Fatal exception in interrupt
[16930.655789] Kernel Offset: disabled


Re: RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0

2019-08-09 Thread Sander Eikelenboom
On 08/08/2019 12:21, Paolo Valente wrote:
> 
> 
>> On 8 Aug 2019, at 12:21, Sander Eikelenboom wrote:
>>
>> On 08/08/2019 11:10, Paolo Valente wrote:
>>>
>>>
>>>> On 8 Aug 2019, at 11:05, Sander Eikelenboom wrote:
>>>>
>>>> L.S.,
>>>>
>>>> While testing a Linux 5.3-rc3 kernel on my Xen server I came across the 
>>>> splat below when trying to shut down all the VMs.
>>>> This is after the server had run for a few days without any problem. It 
>>>> seems to happen consistently.
>>>>
>>>> It seems it's in the same area as 
>>>> dbc3117d4ca9e17819ac73501e914b8422686750, but rc3 already incorporates 
>>>> that patch.
>>>>
>>>> Any ideas ?
>>>>
>>>
>>> Could you try these fixes I proposed yesterday:
>>> https://lkml.org/lkml/2019/8/7/536
>>> or, on patchwork:
>>> https://patchwork.kernel.org/patch/11082247/
>>> https://patchwork.kernel.org/patch/11082249/
>>
>> Hi Paolo,
>>
>> These two above seem to fix the issue!
>> So thanks for the swift reply (and the patchwork links for easily
>> downloading the patches).
>>
>> I will test the third, unrelated patch as well, but if you don't hear
>> back, it's all good.
>>
> 
> Great! Thank you for offering to test also the other patch. Tested-by are 
> welcome too :)

Hi,

Haven't seen any problems with the patch so far, but I haven't tested it
under constrained memory, so I don't think a Tested-by is justified in this
case.

--
Sander

> Thanks,
> Paolo
> 
>> Thanks again !
>>
>> --
>> Sander
>>
>>> I posted a further fix too, which should be unrelated. But, just in case:
>>> https://lkml.org/lkml/2019/8/7/715
>>> or, on patchwork:
>>> https://patchwork.kernel.org/patch/11082521/
>>>
>>> Crossing my fingers (and thank you for reporting this),
>>> Paolo
>>>
>>>> --
>>>> Sander
>>>>
>>>>
>>>> [80915.716048] BUG: unable to handle page fault for address: 
>>>> 1008
>>>> [80915.724188] #PF: supervisor write access in kernel mode
>>>> [80915.733182] #PF: error_code(0x0002) - not-present page
>>>> [80915.741455] PGD 0 P4D 0 
>>>> [80915.750538] Oops: 0002 [#1] SMP NOPTI
>>>> [80915.758425] CPU: 4 PID: 11407 Comm: 17.hda-2 Tainted: GW
>>>>  5.3.0-rc3-20190807-doflr+ #1
>>>> [80915.766137] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS 
>>>> V1.8B1 09/13/2010
>>>> [80915.773737] RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0
>>>> [80915.781294] Code: 00 00 00 00 00 00 48 0f ba b0 20 01 00 00 0c 48 8b 88 
>>>> f0 01 00 00 48 85 c9 74 29 48 8b b0 e8 01 00 00 48 89 31 48 85 f6 74 04 
>>>> <48> 89 4e 08 48 c7 80 e8 01 00 00 00 00 00 00 48 c7 80 f0 01 00 00
>>>> [80915.796792] RSP: e02b:c9000473be28 EFLAGS: 00010006
>>>> [80915.804419] RAX: 888070393200 RBX: 888076c4a800 RCX: 
>>>> 888076c4a9f8
>>>> [80915.810254] device vif17.0 left promiscuous mode
>>>> [80915.811906] RDX: 1000 RSI: 1000 RDI: 
>>>> 
>>>> [80915.811908] RBP: 888077efc398 R08: 0004 R09: 
>>>> 81106800
>>>> [80915.811909] R10: 88807804ca40 R11: c9000473be31 R12: 
>>>> 888005256bf0
>>>> [80915.811909] R13:  R14: 888005256800 R15: 
>>>> 82a6a3c0
>>>> [80915.811919] FS:  7f1c30a8dbc0() GS:88807d50() 
>>>> knlGS:
>>>> [80915.819456] xen_bridge: port 18(vif17.0) entered disabled state
>>>> [80915.826569] CS:  1e030 DS:  ES:  CR0: 80050033
>>>> [80915.826571] CR2: 1008 CR3: 5d9d CR4: 
>>>> 0660
>>>> [80915.826575] Call Trace:
>>>> [80915.826592]  bfq_exit_icq+0xe/0x20
>>>> [80915.826595]  put_io_context_active+0x52/0x80
>>>> [80915.826599]  do_exit+0x774/0xac0
>>>> [80915.906037]  ? xen_blkif_be_int+0x30/0x30
>>>> [80915.913311]  kthread+0xda/0x130
>>>> [80915.920398]  ? kthread_park+0x80/0x80
>>>> [80915.927524]  ret_from_fork+0x22/0x40
>>>> [80915.934512] Modules linked in:
>>>> [8

Re: RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0

2019-08-08 Thread Sander Eikelenboom
On 08/08/2019 11:10, Paolo Valente wrote:
> 
> 
>> On 8 Aug 2019, at 11:05, Sander Eikelenboom wrote:
>>
>> L.S.,
>>
>> While testing a Linux 5.3-rc3 kernel on my Xen server I came across the 
>> splat below when trying to shut down all the VMs.
>> This is after the server had run for a few days without any problem. It 
>> seems to happen consistently.
>>
>> It seems it's in the same area as dbc3117d4ca9e17819ac73501e914b8422686750, 
>> but rc3 already incorporates that patch.
>>
>> Any ideas ?
>>
> 
> Could you try these fixes I proposed yesterday:
> https://lkml.org/lkml/2019/8/7/536
> or, on patchwork:
> https://patchwork.kernel.org/patch/11082247/
> https://patchwork.kernel.org/patch/11082249/

Hi Paolo,

These two above seem to fix the issue!
So thanks for the swift reply (and the patchwork links for easily
downloading the patches).

I will test the third, unrelated patch as well, but if you don't hear
back, it's all good.

Thanks again !

--
Sander

> I posted a further fix too, which should be unrelated. But, just in case:
> https://lkml.org/lkml/2019/8/7/715
> or, on patchwork:
> https://patchwork.kernel.org/patch/11082521/
> 
> Crossing my fingers (and thank you for reporting this),
> Paolo
> 
>> --
>> Sander
>>
>>
>> [80915.716048] BUG: unable to handle page fault for address: 1008
>> [80915.724188] #PF: supervisor write access in kernel mode
>> [80915.733182] #PF: error_code(0x0002) - not-present page
>> [80915.741455] PGD 0 P4D 0 
>> [80915.750538] Oops: 0002 [#1] SMP NOPTI
>> [80915.758425] CPU: 4 PID: 11407 Comm: 17.hda-2 Tainted: GW 
>> 5.3.0-rc3-20190807-doflr+ #1
>> [80915.766137] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS 
>> V1.8B1 09/13/2010
>> [80915.773737] RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0
>> [80915.781294] Code: 00 00 00 00 00 00 48 0f ba b0 20 01 00 00 0c 48 8b 88 
>> f0 01 00 00 48 85 c9 74 29 48 8b b0 e8 01 00 00 48 89 31 48 85 f6 74 04 <48> 
>> 89 4e 08 48 c7 80 e8 01 00 00 00 00 00 00 48 c7 80 f0 01 00 00
>> [80915.796792] RSP: e02b:c9000473be28 EFLAGS: 00010006
>> [80915.804419] RAX: 888070393200 RBX: 888076c4a800 RCX: 
>> 888076c4a9f8
>> [80915.810254] device vif17.0 left promiscuous mode
>> [80915.811906] RDX: 1000 RSI: 1000 RDI: 
>> 
>> [80915.811908] RBP: 888077efc398 R08: 0004 R09: 
>> 81106800
>> [80915.811909] R10: 88807804ca40 R11: c9000473be31 R12: 
>> 888005256bf0
>> [80915.811909] R13:  R14: 888005256800 R15: 
>> 82a6a3c0
>> [80915.811919] FS:  7f1c30a8dbc0() GS:88807d50() 
>> knlGS:
>> [80915.819456] xen_bridge: port 18(vif17.0) entered disabled state
>> [80915.826569] CS:  1e030 DS:  ES:  CR0: 80050033
>> [80915.826571] CR2: 1008 CR3: 5d9d CR4: 
>> 0660
>> [80915.826575] Call Trace:
>> [80915.826592]  bfq_exit_icq+0xe/0x20
>> [80915.826595]  put_io_context_active+0x52/0x80
>> [80915.826599]  do_exit+0x774/0xac0
>> [80915.906037]  ? xen_blkif_be_int+0x30/0x30
>> [80915.913311]  kthread+0xda/0x130
>> [80915.920398]  ? kthread_park+0x80/0x80
>> [80915.927524]  ret_from_fork+0x22/0x40
>> [80915.934512] Modules linked in:
>> [80915.941412] CR2: 1008
>> [80915.948221] ---[ end trace 61315493e0f8ef40 ]---
>> [80915.954984] RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0
>> [80915.961850] Code: 00 00 00 00 00 00 48 0f ba b0 20 01 00 00 0c 48 8b 88 
>> f0 01 00 00 48 85 c9 74 29 48 8b b0 e8 01 00 00 48 89 31 48 85 f6 74 04 <48> 
>> 89 4e 08 48 c7 80 e8 01 00 00 00 00 00 00 48 c7 80 f0 01 00 00
>> [80915.976124] RSP: e02b:c9000473be28 EFLAGS: 00010006
>> [80915.983205] RAX: 888070393200 RBX: 888076c4a800 RCX: 
>> 888076c4a9f8
>> [80915.990321] RDX: 1000 RSI: 1000 RDI: 
>> 
>> [80915.997319] RBP: 888077efc398 R08: 0004 R09: 
>> 81106800
>> [80916.004427] R10: 88807804ca40 R11: c9000473be31 R12: 
>> 888005256bf0
>> [80916.011525] R13:  R14: 888005256800 R15: 
>> 82a6a3c0
>> [80916.018679] FS:  7f1c30a8dbc0() GS:88807d50() 
>> knlGS:
>> [80916.025897] CS:  1e030 DS:  ES:  CR0: 80050033
>> [80916.033116] CR2: 1008 CR3: 5d9d CR4: 
>> 0660
>> [80916.040348] Fixing recursive fault but reboot is needed!
> 



RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0

2019-08-08 Thread Sander Eikelenboom
L.S.,

While testing a Linux 5.3-rc3 kernel on my Xen server I came across the splat 
below when trying to shut down all the VMs.
This is after the server had run for a few days without any problem. It seems 
to happen consistently.

It seems it's in the same area as dbc3117d4ca9e17819ac73501e914b8422686750, but 
rc3 already incorporates that patch.

Any ideas ?

--
Sander


[80915.716048] BUG: unable to handle page fault for address: 1008
[80915.724188] #PF: supervisor write access in kernel mode
[80915.733182] #PF: error_code(0x0002) - not-present page
[80915.741455] PGD 0 P4D 0 
[80915.750538] Oops: 0002 [#1] SMP NOPTI
[80915.758425] CPU: 4 PID: 11407 Comm: 17.hda-2 Tainted: GW 
5.3.0-rc3-20190807-doflr+ #1
[80915.766137] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 
09/13/2010
[80915.773737] RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0
[80915.781294] Code: 00 00 00 00 00 00 48 0f ba b0 20 01 00 00 0c 48 8b 88 f0 
01 00 00 48 85 c9 74 29 48 8b b0 e8 01 00 00 48 89 31 48 85 f6 74 04 <48> 89 4e 
08 48 c7 80 e8 01 00 00 00 00 00 00 48 c7 80 f0 01 00 00
[80915.796792] RSP: e02b:c9000473be28 EFLAGS: 00010006
[80915.804419] RAX: 888070393200 RBX: 888076c4a800 RCX: 888076c4a9f8
[80915.810254] device vif17.0 left promiscuous mode
[80915.811906] RDX: 1000 RSI: 1000 RDI: 
[80915.811908] RBP: 888077efc398 R08: 0004 R09: 81106800
[80915.811909] R10: 88807804ca40 R11: c9000473be31 R12: 888005256bf0
[80915.811909] R13:  R14: 888005256800 R15: 82a6a3c0
[80915.811919] FS:  7f1c30a8dbc0() GS:88807d50() 
knlGS:
[80915.819456] xen_bridge: port 18(vif17.0) entered disabled state
[80915.826569] CS:  1e030 DS:  ES:  CR0: 80050033
[80915.826571] CR2: 1008 CR3: 5d9d CR4: 0660
[80915.826575] Call Trace:
[80915.826592]  bfq_exit_icq+0xe/0x20
[80915.826595]  put_io_context_active+0x52/0x80
[80915.826599]  do_exit+0x774/0xac0
[80915.906037]  ? xen_blkif_be_int+0x30/0x30
[80915.913311]  kthread+0xda/0x130
[80915.920398]  ? kthread_park+0x80/0x80
[80915.927524]  ret_from_fork+0x22/0x40
[80915.934512] Modules linked in:
[80915.941412] CR2: 1008
[80915.948221] ---[ end trace 61315493e0f8ef40 ]---
[80915.954984] RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0
[80915.961850] Code: 00 00 00 00 00 00 48 0f ba b0 20 01 00 00 0c 48 8b 88 f0 
01 00 00 48 85 c9 74 29 48 8b b0 e8 01 00 00 48 89 31 48 85 f6 74 04 <48> 89 4e 
08 48 c7 80 e8 01 00 00 00 00 00 00 48 c7 80 f0 01 00 00
[80915.976124] RSP: e02b:c9000473be28 EFLAGS: 00010006
[80915.983205] RAX: 888070393200 RBX: 888076c4a800 RCX: 888076c4a9f8
[80915.990321] RDX: 1000 RSI: 1000 RDI: 
[80915.997319] RBP: 888077efc398 R08: 0004 R09: 81106800
[80916.004427] R10: 88807804ca40 R11: c9000473be31 R12: 888005256bf0
[80916.011525] R13:  R14: 888005256800 R15: 82a6a3c0
[80916.018679] FS:  7f1c30a8dbc0() GS:88807d50() 
knlGS:
[80916.025897] CS:  1e030 DS:  ES:  CR0: 80050033
[80916.033116] CR2: 1008 CR3: 5d9d CR4: 0660
[80916.040348] Fixing recursive fault but reboot is needed!
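
As a cross-check on the report above (a sketch, not something from the thread): the byte marked <48> in the Code: line starts the faulting instruction, 48 89 4e 08. The small decoder below handles only this one encoding form (REX.W prefix, opcode 0x89, 8-bit displacement, no SIB byte), and its function name is hypothetical. RSI is taken as 0x1000, as printed in the register dump.

```python
# 64-bit register names indexed by their 3-bit ModRM encoding (no REX.R/REX.B).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_store_disp8(code: bytes):
    """Decode 'REX.W 89 /r' (mov r64 -> r/m64) with mod=01 and no SIB byte.

    Returns (source register, base register, signed 8-bit displacement).
    """
    rex, opcode, modrm, disp = code[0], code[1], code[2], code[3]
    if rex != 0x48 or opcode != 0x89:
        raise ValueError("only REX.W + opcode 0x89 is handled in this sketch")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the disp8 form without a SIB byte is handled")
    if disp >= 0x80:  # sign-extend the 8-bit displacement
        disp -= 0x100
    return REGS[reg], REGS[rm], disp


# Faulting instruction bytes, starting at the <48> marker in the Code: line.
src, base, disp = decode_store_disp8(bytes.fromhex("48894e08"))

rsi = 0x1000  # RSI as printed in the oops
fault_addr = rsi + disp
print(f"mov %{src},{disp:#x}(%{base}) -> write to {fault_addr:#x}")
```

The instruction decodes to a store of RCX to 8 bytes past RSI, so the write target works out to 0x1008. That agrees with the CR2 value and with the "supervisor write access" and "not-present page" lines in the oops.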


Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!

2019-02-10 Thread Sander Eikelenboom
On 10/02/2019 12:44, Heiner Kallweit wrote:
> On 10.02.2019 10:16, Sander Eikelenboom wrote:
>> On 09/02/2019 12:50, Heiner Kallweit wrote:
>>> On 09.02.2019 11:07, Sander Eikelenboom wrote:
>>>> On 09/02/2019 10:59, Heiner Kallweit wrote:
>>>>> On 09.02.2019 10:34, Sander Eikelenboom wrote:
>>>>>> On 09/02/2019 10:02, Heiner Kallweit wrote:
>>>>>>> On 09.02.2019 00:09, Eric Dumazet wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 02/08/2019 01:50 PM, Heiner Kallweit wrote:
>>>>>>>>> On 08.02.2019 22:45, Sander Eikelenboom wrote:
>>>>>>>>>> On 08/02/2019 22:22, Heiner Kallweit wrote:
>>>>>>>>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>>>>>>>>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>>>>>>>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>>>>>>>>>>>> L.S.,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> While testing a Linux 5.0-rc5 kernel (with some patches on top 
>>>>>>>>>>>>>> but they don't seem related) under Xen I hit the nasty splat below, 
>>>>>>>>>>>>>> which I haven't encountered with Linux 4.20.x.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Unfortunately I haven't got a clear reproducer for this and 
>>>>>>>>>>>>>> bisecting could be nasty due to another (networking related) 
>>>>>>>>>>>>>> kernel bug.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you need more info, want me to run a debug patch etc., please 
>>>>>>>>>>>>>> feel free to ask.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the report. However I see no change in the r8169 
>>>>>>>>>>>>> driver between
>>>>>>>>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root 
>>>>>>>>>>>>> cause could
>>>>>>>>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>>>>>>>>>>>
>>>>>>>>>>>> Hmm, I did some digging and I think:
>>>>>>>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded 
>>>>>>>>>>>> mmiowb barriers
>>>>>>>>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of 
>>>>>>>>>>>> xmit_more and __netdev_sent_queue
>>>>>>>>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add 
>>>>>>>>>>>> __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>>>>>>>>>>>
>>>>>>>>>>> You're right. Thought this was added in 4.20 already.
>>>>>>>>>>> The BQL code pattern I copied from the mlx4 driver and so far I 
>>>>>>>>>>> haven't heard about
>>>>>>>>>>> this issue from any user of physical hw. And due to the fact that a 
>>>>>>>>>>> lot of mainboards
>>>>>>>>>>> have onboard Realtek network I have quite a few testers out there.
>>>>>>>>>>> Does the issue occur under specific circumstances like very high 
>>>>>>>>>>> load?
>>>>>>>>>>
>>>>>>>>>> Yep, the box is already quite contended with the Xen VMs and if I 
>>>>>>>>>> remember correctly it occurred while compiling a kernel
>>>>>>>>>> on the host.
>>>>>>>>>>
>>>>>>>>>>> If indeed the xmit_more patch causes the issue, I think we have to 
>>>>>>>>>>> involve Eric Dumazet
>>>>>>>>>>> as author of the underlying changes.
>>>>>>>>>>
>>>>>>>>>> It could also be that the barriers weren't as unneeded as assumed.
>>>>>>>

Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!

2019-02-10 Thread Sander Eikelenboom
On 09/02/2019 12:50, Heiner Kallweit wrote:
> On 09.02.2019 11:07, Sander Eikelenboom wrote:
>> On 09/02/2019 10:59, Heiner Kallweit wrote:
>>> On 09.02.2019 10:34, Sander Eikelenboom wrote:
>>>> On 09/02/2019 10:02, Heiner Kallweit wrote:
>>>>> On 09.02.2019 00:09, Eric Dumazet wrote:
>>>>>>
>>>>>>
>>>>>> On 02/08/2019 01:50 PM, Heiner Kallweit wrote:
>>>>>>> On 08.02.2019 22:45, Sander Eikelenboom wrote:
>>>>>>>> On 08/02/2019 22:22, Heiner Kallweit wrote:
>>>>>>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>>>>>>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>>>>>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>>>>>>>>>> L.S.,
>>>>>>>>>>>>
>>>>>>>>>>>> While testing a Linux 5.0-rc5 kernel (with some patches on top but 
>>>>>>>>>>>> they don't seem related) under Xen I hit the nasty splat below, 
>>>>>>>>>>>> which I haven't encountered with Linux 4.20.x.
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately I haven't got a clear reproducer for this and 
>>>>>>>>>>>> bisecting could be nasty due to another (networking related) 
>>>>>>>>>>>> kernel bug.
>>>>>>>>>>>>
>>>>>>>>>>>> If you need more info, want me to run a debug patch etc., please 
>>>>>>>>>>>> feel free to ask.
>>>>>>>>>>>>
>>>>>>>>>>> Thanks for the report. However I see no change in the r8169 driver 
>>>>>>>>>>> between
>>>>>>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root 
>>>>>>>>>>> cause could
>>>>>>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>>>>>>>>>
>>>>>>>>>> Hmm, I did some digging and I think:
>>>>>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded 
>>>>>>>>>> mmiowb barriers
>>>>>>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of 
>>>>>>>>>> xmit_more and __netdev_sent_queue
>>>>>>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add 
>>>>>>>>>> __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>>>>>>>>>
>>>>>>>>> You're right. Thought this was added in 4.20 already.
>>>>>>>>> The BQL code pattern I copied from the mlx4 driver and so far I 
>>>>>>>>> haven't heard about
>>>>>>>>> this issue from any user of physical hw. And due to the fact that a 
>>>>>>>>> lot of mainboards
>>>>>>>>> have onboard Realtek network I have quite a few testers out there.
>>>>>>>>> Does the issue occur under specific circumstances like very high load?
>>>>>>>>
>>>>>>>> Yep, the box is already quite contended with the Xen VMs and if I 
>>>>>>>> remember correctly it occurred while compiling a kernel
>>>>>>>> on the host.
>>>>>>>>
>>>>>>>>> If indeed the xmit_more patch causes the issue, I think we have to 
>>>>>>>>> involve Eric Dumazet
>>>>>>>>> as author of the underlying changes.
>>>>>>>>
>>>>>>>> It could also be that the barriers weren't as unneeded as assumed.
>>>>>>>
>>>>>>> The barriers were removed after adding xmit_more handling. Therefore it 
>>>>>>> would be good to
>>>>>>> test also with only 
>>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb 
>>>>>>> barriers
>>>>>>> removed.
>>>>>>>
>>>>>>>> Since we are almost at RC6, I took the liberty of CC'ing Eric now.
>>>>>>>>
>>>>>>> Sure, thanks.
>>>>>>>

Re: Linux 5.0 regression: BUG: unable to handle kernel paging request at ffff888023e26778 RIP: e030:move_page_tables+0x7c1/0xae0

2019-02-09 Thread Sander Eikelenboom
On 09/02/2019 19:48, Juergen Gross wrote:
> On 09/02/2019 19:45, Sander Eikelenboom wrote:
>> On 09/02/2019 09:26, Sander Eikelenboom wrote:
>>> L.S.,
>>>
>>>
>>> While testing a Linux 5.0-rc5-ish kernel (pull of yesterday) with some 
>>> additional patches for
>>> already reported other issues, I came across the issue below, which I haven't 
>>> seen with 4.20.x.
>>>
>>> I haven't got a reproducer, so it might be hard to hit it again. 
>>> The system is AMD and this is from the host kernel running under
>>> the Xen hypervisor, in case it matters.
>>
>>> --
>>>
>>> Sander
>>
>> Hi Boris / Juergen,
>>
>> The commit causing this is:
>> 2c91bd4a4e2e530582d6fd643ea7b86b27907151 mm: speed up mremap by 20x on large 
>> regions
>>
>> Since it seems there haven't been any other reports about this .. 
>> could it be this doesn't specifically work well with a Xen PVH dom0 ?
> 
> PVH? Not PV?

Ah sorry, indeed PV !

> 
> Juergen
> 



Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!

2019-02-09 Thread Sander Eikelenboom
On 09/02/2019 10:59, Heiner Kallweit wrote:
> On 09.02.2019 10:34, Sander Eikelenboom wrote:
>> On 09/02/2019 10:02, Heiner Kallweit wrote:
>>> On 09.02.2019 00:09, Eric Dumazet wrote:
>>>>
>>>>
>>>> On 02/08/2019 01:50 PM, Heiner Kallweit wrote:
>>>>> On 08.02.2019 22:45, Sander Eikelenboom wrote:
>>>>>> On 08/02/2019 22:22, Heiner Kallweit wrote:
>>>>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>>>>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>>>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>>>>>>>> L.S.,
>>>>>>>>>>
>>>>>>>>>> While testing a Linux 5.0-rc5 kernel (with some patches on top but 
>>>>>>>>>> they don't seem related) under Xen I hit the nasty splat below, 
>>>>>>>>>> which I haven't encountered with Linux 4.20.x.
>>>>>>>>>>
>>>>>>>>>> Unfortunately I haven't got a clear reproducer for this and 
>>>>>>>>>> bisecting could be nasty due to another (networking related) kernel 
>>>>>>>>>> bug.
>>>>>>>>>>
>>>>>>>>>> If you need more info, want me to run a debug patch etc., please 
>>>>>>>>>> feel free to ask.
>>>>>>>>>>
>>>>>>>>> Thanks for the report. However I see no change in the r8169 driver 
>>>>>>>>> between
>>>>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause 
>>>>>>>>> could
>>>>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>>>>>>>
>>>>>>>> Hmm, I did some digging and I think:
>>>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb 
>>>>>>>> barriers
>>>>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more 
>>>>>>>> and __netdev_sent_queue
>>>>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add 
>>>>>>>> __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>>>>>>>
>>>>>>> You're right. Thought this was added in 4.20 already.
>>>>>>> The BQL code pattern I copied from the mlx4 driver and so far I haven't 
>>>>>>> heard about
>>>>>>> this issue from any user of physical hw. And due to the fact that a lot 
>>>>>>> of mainboards
>>>>>>> have onboard Realtek network I have quite a few testers out there.
>>>>>>> Does the issue occur under specific circumstances like very high load?
>>>>>>
>>>>>> Yep, the box is already quite contended by the Xen VMs, and if I 
>>>>>> remember correctly it occurred while compiling a kernel
>>>>>> on the host.
>>>>>>
>>>>>>> If indeed the xmit_more patch causes the issue, I think we have to 
>>>>>>> involve Eric Dumazet
>>>>>>> as author of the underlying changes.
>>>>>>
>>>>>> It could also be the barriers weren't that unneeded as assumed.
>>>>>
>>>>> The barriers were removed after adding xmit_more handling. Therefore it 
>>>>> would be good to
>>>>> test also with only 
>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb 
>>>>> barriers
>>>>> removed.
>>>>>
>>>>>> Since we are almost at RC6 i took the liberty to CC Eric now.
>>>>>>
>>>>> Sure, thanks.
>>>>>
>>>>>> BTW am i correct these patches are merely optimizations ?
>>>>>
>>>>> Yes
>>>>>
>>>>>> If so and concluding they revert cleanly, perhaps it should be 
>>>>>> considered at this point in the RC's
>>>>>> to revert them for 5.0 and try again for 5.1 ?
>>>>>>
>>>>> Before removing both it would be good to test with only the 
>>>>> barrier-removal removed.
>>>>>
>>>>
>>>> Commit 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169

Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!

2019-02-09 Thread Sander Eikelenboom
On 09/02/2019 10:02, Heiner Kallweit wrote:
> On 09.02.2019 00:09, Eric Dumazet wrote:
>>
>>
>> On 02/08/2019 01:50 PM, Heiner Kallweit wrote:
>>> On 08.02.2019 22:45, Sander Eikelenboom wrote:
>>>> On 08/02/2019 22:22, Heiner Kallweit wrote:
>>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>>>>>> L.S.,
>>>>>>>>
>>>>>>>> While testing a Linux 5.0-rc5 kernel (with some patches on top, but 
>>>>>>>> they don't seem related) under Xen, I got the nasty splat below, 
>>>>>>>> which I haven't encountered with Linux 4.20.x.
>>>>>>>>
>>>>>>>> Unfortunately I haven't got a clear reproducer for this and bisecting 
>>>>>>>> could be nasty due to another (networking related) kernel bug.
>>>>>>>>
>>>>>>>> If you need more info, want me to run a debug patch etc., please feel 
>>>>>>>> free to ask.
>>>>>>>>
>>>>>>> Thanks for the report. However I see no change in the r8169 driver 
>>>>>>> between
>>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause 
>>>>>>> could
>>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>>>>>
>>>>>> Hmm, I did some digging and I think:
>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb 
>>>>>> barriers
>>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more 
>>>>>> and __netdev_sent_queue
>>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add 
>>>>>> __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>>>>>
>>>>> You're right. Thought this was added in 4.20 already.
>>>>> The BQL code pattern I copied from the mlx4 driver and so far I haven't 
>>>>> heard about
>>>>> this issue from any user of physical hw. And due to the fact that a lot 
>>>>> of mainboards
>>>>> have onboard Realtek network I have quite a few testers out there.
>>>>> Does the issue occur under specific circumstances like very high load?
>>>>
>>>> Yep, the box is already quite contended by the Xen VMs, and if I 
>>>> remember correctly it occurred while compiling a kernel
>>>> on the host.
>>>>
>>>>> If indeed the xmit_more patch causes the issue, I think we have to 
>>>>> involve Eric Dumazet
>>>>> as author of the underlying changes.
>>>>
>>>> It could also be the barriers weren't that unneeded as assumed.
>>>
>>> The barriers were removed after adding xmit_more handling. Therefore it 
>>> would be good to
>>> test also with only 
>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb 
>>> barriers
>>> removed.
>>>
>>>> Since we are almost at RC6 i took the liberty to CC Eric now.
>>>>
>>> Sure, thanks.
>>>
>>>> BTW am i correct these patches are merely optimizations ?
>>>
>>> Yes
>>>
>>>> If so and concluding they revert cleanly, perhaps it should be considered 
>>>> at this point in the RC's
>>>> to revert them for 5.0 and try again for 5.1 ?
>>>>
>>> Before removing both it would be good to test with only the barrier-removal 
>>> removed.
>>>
>>
>> Commit 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more 
>> and __netdev_sent_queue
>> looks buggy to me, since the skb might have been freed already on another 
>> cpu when you call
>>
>> You could try :
>>
>> diff --git a/drivers/net/ethernet/realtek/r8169.c 
>> b/drivers/net/ethernet/realtek/r8169.c
>> index 
>> 3624e67aef72c92ed6e908e2c99ac2d381210126..f907d484165d9fd775e81bf2bfb9aa4ddedb1c93
>>  100644
>> --- a/drivers/net/ethernet/realtek/r8169.c
>> +++ b/drivers/net/ethernet/realtek/r8169.c
>> @@ -6070,6 +6070,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff 
>> *skb,
>> dma_addr_t mapping;
>> u32 opts[2], len;
>> bool stop_queue;
>> +
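Note on the race Eric describes above: once a buffer has been handed to the
hardware, a completion on another CPU may free it, so anything the transmit
path still needs from it (for example the byte count used for queue
accounting) has to be read beforehand. The following is only a minimal
user-space sketch of that ordering rule; struct toy_skb, toy_publish_and_complete()
and toy_xmit() are hypothetical names, and this is not the patch that is cut
off above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a packet buffer; only the length matters here. */
struct toy_skb {
	unsigned int len;
};

/*
 * Stand-in for handing the descriptor to the NIC. To model the worst case,
 * the "completion" runs immediately and frees the buffer, as another CPU
 * could do right after the hardware takes ownership.
 */
static void toy_publish_and_complete(struct toy_skb *skb)
{
	free(skb);
}

/*
 * Transmit path sketch: returns the number of bytes to account for.
 * The length is copied to a local variable before the buffer is published,
 * so the buffer is never dereferenced after it may have been freed.
 */
static unsigned int toy_xmit(unsigned int payload_len)
{
	struct toy_skb *skb = malloc(sizeof(*skb));
	unsigned int len;

	if (!skb)
		return 0;

	skb->len = payload_len;
	len = skb->len;			/* read before publishing */
	toy_publish_and_complete(skb);
	/* skb must not be touched from here on */
	return len;
}
```

Calling toy_xmit(66) returns 66 without ever reading freed memory; reading
skb->len after toy_publish_and_complete() instead would be a use-after-free.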

Linux 5.0 regression: BUG: unable to handle kernel paging request at ffff888023e26778

2019-02-09 Thread Sander Eikelenboom
L.S.,


While testing a Linux 5.0-rc5-ish kernel (pulled yesterday) with some 
additional patches for
other, already reported issues, I came across the issue below, which I haven't 
seen with 4.20.x.

I haven't got a reproducer, so it might be hard to hit it again. 
The system is AMD and this is from the host kernel running under
the Xen hypervisor, in case that matters.

--

Sander


[17035.016433] BUG: unable to handle kernel paging request at 888023e26778
[17035.025887] #PF error: [PROT] [WRITE]
[17035.035146] PGD 2a2a067 P4D 2a2a067 PUD 2a2b067 PMD 7fe01067 PTE 
801023e26065
[17035.044371] Oops: 0003 [#1] SMP NOPTI
[17035.053720] CPU: 3 PID: 28310 Comm: apt-get Not tainted 
5.0.0-rc5-20190208-thp-net-florian-rtl8169-eric-doflr+ #1
[17035.063440] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 
09/13/2010
[17035.072635] RIP: e030:move_page_tables+0x7c1/0xae0
[17035.081585] Code: ce 00 48 8b 03 31 ff 48 89 44 24 20 e8 9e 72 e4 ff 66 90 
48 89 c6 48 89 df e8 8b 89 e4 ff 66 90 48 8b 44 24 20 b9 0c 00 00 00 <48> 89 45 
00 41 f6 46 52 40 0f 85 3f 02 00 00 49 8b 7e 40 45 31 c0
[17035.100225] RSP: e02b:c9f2bd40 EFLAGS: 00010282
[17035.109208] RAX: 000475e42067 RBX: 888023e267e0 RCX: 000c
[17035.118332] RDX:  RSI:  RDI: 0201
[17035.127378] RBP: 888023e26778 R08:  R09: 00051c1d9000
[17035.136310] R10: deadbeefdeadf00d R11: 88807fc17000 R12: 7fc59fa0
[17035.145433] R13: ea8f89a8 R14: 88801c2286c0 R15: 7fc59f80
[17035.154171] FS:  7fc5a5591100() GS:88807d4c() 
knlGS:
[17035.162730] CS:  e030 DS:  ES:  CR0: 80050033
[17035.171180] CR2: 888023e26778 CR3: 1c3f6000 CR4: 0660
[17035.179545] Call Trace:
[17035.187736]  move_vma.isra.3+0xd1/0x2d0
[17035.195837]  __se_sys_mremap+0x3c6/0x5b0
[17035.203986]  do_syscall_64+0x49/0x100
[17035.212109]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[17035.219971] RIP: 0033:0x7fc5a453527a
[17035.227558] Code: 73 01 c3 48 8b 0d 1e fc 2a 00 f7 d8 64 89 01 48 83 c8 ff 
c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 19 00 00 00 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d ee fb 2a 00 f7 d8 64 89 01 48
[17035.243255] RSP: 002b:7ffda22d96f8 EFLAGS: 0246 ORIG_RAX: 
0019
[17035.251121] RAX: ffda RBX: 557d40923a30 RCX: 7fc5a453527a
[17035.258986] RDX: 01a0 RSI: 0190 RDI: 7fc59f7ff000
[17035.267127] RBP: 01a0 R08: 0020 R09: 0040
[17035.275259] R10: 0001 R11: 0246 R12: 7fc59f7ff060
[17035.282681] R13: 7fc59f7ff000 R14: 557d40923a30 R15: 557d40829aa0
[17035.290322] Modules linked in:
[17035.297875] CR2: 888023e26778
[17035.305405] ---[ end trace 6ff49f09286816b6 ]---
[17035.313131] RIP: e030:move_page_tables+0x7c1/0xae0
[17035.320326] Code: ce 00 48 8b 03 31 ff 48 89 44 24 20 e8 9e 72 e4 ff 66 90 
48 89 c6 48 89 df e8 8b 89 e4 ff 66 90 48 8b 44 24 20 b9 0c 00 00 00 <48> 89 45 
00 41 f6 46 52 40 0f 85 3f 02 00 00 49 8b 7e 40 45 31 c0
[17035.334851] RSP: e02b:c9f2bd40 EFLAGS: 00010282
[17035.341727] RAX: 000475e42067 RBX: 888023e267e0 RCX: 000c
[17035.348838] RDX:  RSI:  RDI: 0201
[17035.356000] RBP: 888023e26778 R08:  R09: 00051c1d9000
[17035.363623] R10: deadbeefdeadf00d R11: 88807fc17000 R12: 7fc59fa0
[17035.371454] R13: ea8f89a8 R14: 88801c2286c0 R15: 7fc59f80
[17035.378958] FS:  7fc5a5591100() GS:88807d4c() 
knlGS:
[17035.386585] CS:  e030 DS:  ES:  CR0: 80050033
[17035.393797] CR2: 888023e26778 CR3: 1c3f6000 CR4: 0660





Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!

2019-02-08 Thread Sander Eikelenboom
On 08/02/2019 22:50, Heiner Kallweit wrote:
> On 08.02.2019 22:45, Sander Eikelenboom wrote:
>> On 08/02/2019 22:22, Heiner Kallweit wrote:
>>> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>>>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>>>> L.S.,
>>>>>>
>>>>>> While testing a Linux 5.0-rc5 kernel (with some patches on top, but they 
>>>>>> don't seem related) under Xen, I got the nasty splat below, 
>>>>>> which I haven't encountered with Linux 4.20.x.
>>>>>>
>>>>>> Unfortunately I haven't got a clear reproducer for this and bisecting 
>>>>>> could be nasty due to another (networking related) kernel bug.
>>>>>>
>>>>>> If you need more info, want me to run a debug patch etc., please feel 
>>>>>> free to ask.
>>>>>>
>>>>> Thanks for the report. However I see no change in the r8169 driver between
>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause 
>>>>> could
>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>>>
>>>> Hmm, I did some digging and I think:
>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb 
>>>> barriers
>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and 
>>>> __netdev_sent_queue
>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add 
>>>> __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>>>
>>> You're right. Thought this was added in 4.20 already.
>>> The BQL code pattern I copied from the mlx4 driver and so far I haven't 
>>> heard about
>>> this issue from any user of physical hw. And due to the fact that a lot of 
>>> mainboards
>>> have onboard Realtek network I have quite a few testers out there.
>>> Does the issue occur under specific circumstances like very high load?
>>
>> Yep, the box is already quite contended by the Xen VMs, and if I remember 
>> correctly it occurred while compiling a kernel
>> on the host.
>>
>>> If indeed the xmit_more patch causes the issue, I think we have to involve 
>>> Eric Dumazet
>>> as author of the underlying changes.
>>
>> It could also be the barriers weren't that unneeded as assumed.
> 
> The barriers were removed after adding xmit_more handling. Therefore it would 
> be good to
> test also with only 
> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb 
> barriers
> removed.

*arghh* *grmbl*

With both
bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3
and
2e6eedb4813e34d8d84ac0eb3afb668966f3f356
reverted, I get yet another splat:

[ 3769.246083] ld: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), 
nodemask=(null),cpuset=/,mems_allowed=0
[ 3769.246095] CPU: 2 PID: 3201 Comm: ld Not tainted 
5.0.0-rc5-20190208-thp-net-florian-rtl8169-doflr+ #1
[ 3769.246096] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 
09/13/2010
[ 3769.246098] Call Trace:
[ 3769.246104]  
[ 3769.246114]  dump_stack+0x5c/0x7b
[ 3769.246120]  warn_alloc+0x103/0x190
[ 3769.246122]  __alloc_pages_nodemask+0xe3d/0xe80
[ 3769.246128]  ? inet_gro_receive+0x232/0x2c0
[ 3769.246130]  page_frag_alloc+0x117/0x150
[ 3769.246132]  __napi_alloc_skb+0x83/0xd0
[ 3769.246137]  rtl8169_poll+0x210/0x640
[ 3769.246140]  net_rx_action+0x23d/0x370
[ 3769.246145]  __do_softirq+0xed/0x229
[ 3769.246149]  irq_exit+0xb7/0xc0
[ 3769.246152]  xen_evtchn_do_upcall+0x27/0x40
[ 3769.246154]  xen_do_hypervisor_callback+0x29/0x40
[ 3769.246155]  
[ 3769.246161] RIP: e030:__pv_queued_spin_lock_slowpath+0xda/0x280
[ 3769.246163] Code: 14 41 bc 01 00 00 00 41 bd 00 01 00 00 3c 02 0f 94 c0 0f 
b6 c0 48 89 04 24 c6 45 14 00 ba 00 80 00 00 c6 43 01 01 eb 0b f3 90 <83> ea 01 
0f 84 49 01 00 00 0f b6 03 84 c0 75 ee 44 89 e8 f0 66 44
[ 3769.246164] RSP: e02b:c90005b0f780 EFLAGS: 0202
[ 3769.246166] RAX: 0001 RBX: 8880047c9200 RCX: 0001
[ 3769.246167] RDX: 7d75 RSI:  RDI: 8880047c9200
[ 3769.246167] RBP: 88807d4a1a80 R08: c90005b0f978 R09: c90005b0f978
[ 3769.246168] R10: c90005b0f9d0 R11: 88807fc17000 R12: 0001
[ 3769.246169] R13: 0100 R14:  R15: 000c
[ 3769.246173]  _raw_spin_lock+0x16/0x20
[ 3769.246176]  list_lru_add+0x59/0x170
[ 3769.246179]  inode_lru_list_add+0x1b/0x40
[ 3769.246182]  iput+0x18b/0x1a0
[ 3769.246184]  __dentry_kill+0xc5/0x170
[

Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!

2019-02-08 Thread Sander Eikelenboom
On 08/02/2019 22:22, Heiner Kallweit wrote:
> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>> L.S.,
>>>>
>>>> While testing a Linux 5.0-rc5 kernel (with some patches on top, but they 
>>>> don't seem related) under Xen, I got the nasty splat below, 
>>>> which I haven't encountered with Linux 4.20.x.
>>>>
>>>> Unfortunately I haven't got a clear reproducer for this and bisecting 
>>>> could be nasty due to another (networking related) kernel bug.
>>>>
>>>> If you need more info, want me to run a debug patch etc., please feel free 
>>>> to ask.
>>>>
>>> Thanks for the report. However I see no change in the r8169 driver between
>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause could
>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>
>> Hmm, I did some digging and I think:
>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb 
>> barriers
>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and 
>> __netdev_sent_queue
>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue 
>> as variant of __netdev_tx_sent_queue
>>
> You're right. Thought this was added in 4.20 already.
> The BQL code pattern I copied from the mlx4 driver and so far I haven't heard 
> about
> this issue from any user of physical hw. And due to the fact that a lot of 
> mainboards
> have onboard Realtek network I have quite a few testers out there.
> Does the issue occur under specific circumstances like very high load?

Yep, the box is already quite contended by the Xen VMs, and if I remember 
correctly it occurred while compiling a kernel
on the host.

> If indeed the xmit_more patch causes the issue, I think we have to involve 
> Eric Dumazet
> as author of the underlying changes.

It could also be that the barriers weren't as unneeded as assumed.
Since we are almost at RC6, I took the liberty to CC Eric now.

BTW, am I correct that these patches are merely optimizations?
If so, and assuming they revert cleanly, perhaps it should be considered at 
this point in the RCs
to revert them for 5.0 and try again for 5.1?

--
Sander


> 
>> would be candidates, which were merged in 5.0.
>>
>> I have reverted the first two, see how that works out.
>>
>> --
>> Sander
>>
> Heiner
> 
>>  
>>>> --
>>>> Sander
>>>>
>>> Heiner
>>>
>>>>
>>>> [ 6466.554866] kernel BUG at lib/dynamic_queue_limits.c:27!
>>>> [ 6466.571425] invalid opcode:  [#1] SMP NOPTI
>>>> [ 6466.585890] CPU: 3 PID: 7057 Comm: as Not tainted 
>>>> 5.0.0-rc5-20190208-thp-net-florian-doflr+ #1
>>>> [ 6466.598693] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS 
>>>> V1.8B1 09/13/2010
>>>> [ 6466.611579] RIP: e030:dql_completed+0x126/0x140
>>>> [ 6466.624339] Code: 2b 47 54 ba 00 00 00 00 c7 47 54 ff ff ff ff 0f 48 c2 
>>>> 48 8b 15 7b 39 4a 01 48 89 57 58 e9 48 ff ff ff 44 89 c0 e9 40 ff ff ff 
>>>> <0f> 0b 8b 47 50 29 e8 41 0f 48 c3 eb 9f 90 90 90 90 90 90 90 90 90
>>>> [ 6466.648130] RSP: e02b:88807d4c3e78 EFLAGS: 00010297
>>>> [ 6466.659616] RAX: 0042 RBX: 8880049cf800 RCX: 
>>>> 
>>>> [ 6466.672835] RDX: 0001 RSI: 0042 RDI: 
>>>> 8880049cf8c0
>>>> [ 6466.684521] RBP: 888077df7260 R08: 0001 R09: 
>>>> 
>>>> [ 6466.696824] R10: 387c2336 R11: 387c2336 R12: 
>>>> 1000
>>>> [ 6466.709953] R13: 888077df6898 R14: 888077df75c0 R15: 
>>>> 00454677
>>>> [ 6466.722165] FS:  7fd869147200() GS:88807d4c() 
>>>> knlGS:
>>>> [ 6466.733228] CS:  e030 DS:  ES:  CR0: 80050033
>>>> [ 6466.746581] CR2: 7fd867dfd000 CR3: 74884000 CR4: 
>>>> 0660
>>>> [ 6466.758366] Call Trace:
>>>> [ 6466.768118]  
>>>> [ 6466.778214]  rtl8169_poll+0x4f4/0x640
>>>> [ 6466.789198]  net_rx_action+0x23d/0x370
>>>> [ 6466.798467]  __do_softirq+0xed/0x229
>>>> [ 6466.807039]  irq_exit+0xb7/0xc0
>>>> [ 6466.815471]  xen_evtchn_do_upcall+0x27/0x40
>>>> [ 6466.826647]  xen_

Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!

2019-02-08 Thread Sander Eikelenboom
On 08/02/2019 19:52, Heiner Kallweit wrote:
> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>> L.S.,
>>
>> While testing a Linux 5.0-rc5 kernel (with some patches on top, but they 
>> don't seem related) under Xen, I got the nasty splat below, 
>> which I haven't encountered with Linux 4.20.x.
>>
>> Unfortunately I haven't got a clear reproducer for this and bisecting could 
>> be nasty due to another (networking related) kernel bug.
>>
>> If you need more info, want me to run a debug patch etc., please feel free 
>> to ask.
>>
> Thanks for the report. However I see no change in the r8169 driver between
> 4.20 and 5.0 with regard to BQL code. Having said that the root cause could
> be somewhere else. Therefore I'm afraid a bisect will be needed.

Hmm, I did some digging and I think:
bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and 
__netdev_sent_queue
620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue as 
variant of __netdev_tx_sent_queue

would be candidates, since they were merged for 5.0.

I have reverted the first two; let's see how that works out.

--
Sander

 
>> --
>> Sander
>>
> Heiner
> 
>>
>> [ 6466.554866] kernel BUG at lib/dynamic_queue_limits.c:27!
>> [ 6466.571425] invalid opcode:  [#1] SMP NOPTI
>> [ 6466.585890] CPU: 3 PID: 7057 Comm: as Not tainted 
>> 5.0.0-rc5-20190208-thp-net-florian-doflr+ #1
>> [ 6466.598693] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS 
>> V1.8B1 09/13/2010
>> [ 6466.611579] RIP: e030:dql_completed+0x126/0x140
>> [ 6466.624339] Code: 2b 47 54 ba 00 00 00 00 c7 47 54 ff ff ff ff 0f 48 c2 
>> 48 8b 15 7b 39 4a 01 48 89 57 58 e9 48 ff ff ff 44 89 c0 e9 40 ff ff ff <0f> 
>> 0b 8b 47 50 29 e8 41 0f 48 c3 eb 9f 90 90 90 90 90 90 90 90 90
>> [ 6466.648130] RSP: e02b:88807d4c3e78 EFLAGS: 00010297
>> [ 6466.659616] RAX: 0042 RBX: 8880049cf800 RCX: 
>> 
>> [ 6466.672835] RDX: 0001 RSI: 0042 RDI: 
>> 8880049cf8c0
>> [ 6466.684521] RBP: 888077df7260 R08: 0001 R09: 
>> 
>> [ 6466.696824] R10: 387c2336 R11: 387c2336 R12: 
>> 1000
>> [ 6466.709953] R13: 888077df6898 R14: 888077df75c0 R15: 
>> 00454677
>> [ 6466.722165] FS:  7fd869147200() GS:88807d4c() 
>> knlGS:
>> [ 6466.733228] CS:  e030 DS:  ES:  CR0: 80050033
>> [ 6466.746581] CR2: 7fd867dfd000 CR3: 74884000 CR4: 
>> 0660
>> [ 6466.758366] Call Trace:
>> [ 6466.768118]  
>> [ 6466.778214]  rtl8169_poll+0x4f4/0x640
>> [ 6466.789198]  net_rx_action+0x23d/0x370
>> [ 6466.798467]  __do_softirq+0xed/0x229
>> [ 6466.807039]  irq_exit+0xb7/0xc0
>> [ 6466.815471]  xen_evtchn_do_upcall+0x27/0x40
>> [ 6466.826647]  xen_do_hypervisor_callback+0x29/0x40
>> [ 6466.835902]  
>> [ 6466.845361] RIP: e030:xen_hypercall_mmu_update+0xa/0x20
>> [ 6466.853390] Code: 51 41 53 b8 00 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc 
>> cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 01 00 00 00 0f 05 <41> 
>> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
>> [ 6466.874031] RSP: e02b:c90003c0bdd0 EFLAGS: 0246
>> [ 6466.883452] RAX:  RBX: 00041f83bfe8 RCX: 
>> 8100102a
>> [ 6466.891986] RDX: deadbeefdeadf00d RSI: deadbeefdeadf00d RDI: 
>> deadbeefdeadf00d
>> [ 6466.903402] RBP: 0fe8 R08: 000b R09: 
>> 
>> [ 6466.911201] R10: deadbeefdeadf00d R11: 0246 R12: 
>> 80050c346067
>> [ 6466.918491] R13: 8880607c4fe8 R14: 888005082800 R15: 
>> 
>> [ 6466.926647]  ? xen_hypercall_mmu_update+0xa/0x20
>> [ 6466.938195]  ? xen_set_pte_at+0x78/0xe0
>> [ 6466.947046]  ? __handle_mm_fault+0xc43/0x1060
>> [ 6466.955772]  ? do_mmap+0x44b/0x5b0
>> [ 6466.964410]  ? handle_mm_fault+0xf8/0x200
>> [ 6466.973290]  ? __do_page_fault+0x231/0x4a0
>> [ 6466.981973]  ? page_fault+0x8/0x30
>> [ 6466.990904]  ? page_fault+0x1e/0x30
>> [ 6466.999585] Modules linked in:
>> [ 6467.007533] ---[ end trace 94bec01608fe4061 ]---
>> [ 6467.016751] RIP: e030:dql_completed+0x126/0x140
>> [ 6467.024271] Code: 2b 47 54 ba 00 00 00 00 c7 47 54 ff ff ff ff 0f 48 c2 
>> 48 8b 15 7b 39 4a 01 48 89 57 58 e9 48 ff ff ff 44 89 c0 e9 40 ff ff ff <0f> 
>> 0b 8b 47 50 29 e8 41 0f 48 c3 eb 9f 90 90 90 90 90 90 

Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!

2019-02-08 Thread Sander Eikelenboom
L.S.,

While testing a Linux 5.0-rc5 kernel (with some patches on top, but they don't 
seem related) under Xen, I got the nasty splat below, 
which I haven't encountered with Linux 4.20.x.

Unfortunately I haven't got a clear reproducer for this and bisecting could be 
nasty due to another (networking related) kernel bug.

If you need more info, want me to run a debug patch etc., please feel free to 
ask.

--
Sander


[ 6466.554866] kernel BUG at lib/dynamic_queue_limits.c:27!
[ 6466.571425] invalid opcode:  [#1] SMP NOPTI
[ 6466.585890] CPU: 3 PID: 7057 Comm: as Not tainted 
5.0.0-rc5-20190208-thp-net-florian-doflr+ #1
[ 6466.598693] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 
09/13/2010
[ 6466.611579] RIP: e030:dql_completed+0x126/0x140
[ 6466.624339] Code: 2b 47 54 ba 00 00 00 00 c7 47 54 ff ff ff ff 0f 48 c2 48 
8b 15 7b 39 4a 01 48 89 57 58 e9 48 ff ff ff 44 89 c0 e9 40 ff ff ff <0f> 0b 8b 
47 50 29 e8 41 0f 48 c3 eb 9f 90 90 90 90 90 90 90 90 90
[ 6466.648130] RSP: e02b:88807d4c3e78 EFLAGS: 00010297
[ 6466.659616] RAX: 0042 RBX: 8880049cf800 RCX: 
[ 6466.672835] RDX: 0001 RSI: 0042 RDI: 8880049cf8c0
[ 6466.684521] RBP: 888077df7260 R08: 0001 R09: 
[ 6466.696824] R10: 387c2336 R11: 387c2336 R12: 1000
[ 6466.709953] R13: 888077df6898 R14: 888077df75c0 R15: 00454677
[ 6466.722165] FS:  7fd869147200() GS:88807d4c() 
knlGS:
[ 6466.733228] CS:  e030 DS:  ES:  CR0: 80050033
[ 6466.746581] CR2: 7fd867dfd000 CR3: 74884000 CR4: 0660
[ 6466.758366] Call Trace:
[ 6466.768118]  
[ 6466.778214]  rtl8169_poll+0x4f4/0x640
[ 6466.789198]  net_rx_action+0x23d/0x370
[ 6466.798467]  __do_softirq+0xed/0x229
[ 6466.807039]  irq_exit+0xb7/0xc0
[ 6466.815471]  xen_evtchn_do_upcall+0x27/0x40
[ 6466.826647]  xen_do_hypervisor_callback+0x29/0x40
[ 6466.835902]  
[ 6466.845361] RIP: e030:xen_hypercall_mmu_update+0xa/0x20
[ 6466.853390] Code: 51 41 53 b8 00 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc 
cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 01 00 00 00 0f 05 <41> 5b 59 
c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[ 6466.874031] RSP: e02b:c90003c0bdd0 EFLAGS: 0246
[ 6466.883452] RAX:  RBX: 00041f83bfe8 RCX: 8100102a
[ 6466.891986] RDX: deadbeefdeadf00d RSI: deadbeefdeadf00d RDI: deadbeefdeadf00d
[ 6466.903402] RBP: 0fe8 R08: 000b R09: 
[ 6466.911201] R10: deadbeefdeadf00d R11: 0246 R12: 80050c346067
[ 6466.918491] R13: 8880607c4fe8 R14: 888005082800 R15: 
[ 6466.926647]  ? xen_hypercall_mmu_update+0xa/0x20
[ 6466.938195]  ? xen_set_pte_at+0x78/0xe0
[ 6466.947046]  ? __handle_mm_fault+0xc43/0x1060
[ 6466.955772]  ? do_mmap+0x44b/0x5b0
[ 6466.964410]  ? handle_mm_fault+0xf8/0x200
[ 6466.973290]  ? __do_page_fault+0x231/0x4a0
[ 6466.981973]  ? page_fault+0x8/0x30
[ 6466.990904]  ? page_fault+0x1e/0x30
[ 6466.999585] Modules linked in:
[ 6467.007533] ---[ end trace 94bec01608fe4061 ]---
[ 6467.016751] RIP: e030:dql_completed+0x126/0x140
[ 6467.024271] Code: 2b 47 54 ba 00 00 00 00 c7 47 54 ff ff ff ff 0f 48 c2 48 
8b 15 7b 39 4a 01 48 89 57 58 e9 48 ff ff ff 44 89 c0 e9 40 ff ff ff <0f> 0b 8b 
47 50 29 e8 41 0f 48 c3 eb 9f 90 90 90 90 90 90 90 90 90
[ 6467.039726] RSP: e02b:88807d4c3e78 EFLAGS: 00010297
[ 6467.047243] RAX: 0042 RBX: 8880049cf800 RCX: 
[ 6467.054202] RDX: 0001 RSI: 0042 RDI: 8880049cf8c0
[ 6467.062000] RBP: 888077df7260 R08: 0001 R09: 
[ 6467.069664] R10: 387c2336 R11: 387c2336 R12: 1000
[ 6467.077715] R13: 888077df6898 R14: 888077df75c0 R15: 00454677
[ 6467.084916] FS:  7fd869147200() GS:88807d4c() 
knlGS:
[ 6467.093352] CS:  e030 DS:  ES:  CR0: 80050033
[ 6467.101492] CR2: 7fd867dfd000 CR3: 74884000 CR4: 0660
[ 6467.110542] Kernel panic - not syncing: Fatal exception in interrupt
[ 6467.118166] Kernel Offset: disabled
(XEN) [2019-02-08 18:04:48.854] Hardware Dom0 crashed: rebooting machine in 5 
seconds.


Re: Kernel 5.0-rc5 regression with NAT, bisected to: netfilter: nat: remove l4proto->manip_pkt

2019-02-08 Thread Sander Eikelenboom
On 08/02/2019 12:54, Florian Westphal wrote:
> Florian Westphal  wrote:
>> Sander Eikelenboom  wrote:
>>> L.S.,
>>>
>>> While trying out a 5.0-RC5 kernel I seem to have stumbled over a regression 
>>> with NAT.
>>> (using an nftables firewall with NAT and connection tracking).
>>>
>>> Unfortunately it isn't too obvious since no errors are logged, but on 
>>> clients it
>>> causes symptoms like firefox intermittently not being able to load pages 
>>> with:
>>> Network Protocol Error
>>> An error occurred during a connection to www.example.com
>>> The page you are trying to view cannot be shown because an error in the 
>>> network protocol was detected.
>>> Please contact the website owners to inform them of this problem.
>>>
>>> But it's only intermittently, so i can still visit some webpages with 
>>> clients, 
>>> could be that packet size and or fragments are at play ?
>>>
>>> So I tried testing with 
>>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git with 
>>> e8c32c32b48c2e889704d8ca0872f92eb027838e as last commit, to be sure to have 
>>> the latest netdev has to offer,
>>> but to no avail. 
>>>
>>> After that I tried to git bisect and ended up with:
>>>
>>> faec18dbb0405c7d4dda025054511dc3a6696918 is the first bad commit
>>> commit faec18dbb0405c7d4dda025054511dc3a6696918
>>> Author: Florian Westphal 
>>> Date:   Thu Dec 13 16:01:33 2018 +0100
>>>
>>> netfilter: nat: remove l4proto->manip_pkt
>>
>> Thanks, this is immensely helpful.
>>
>> I think I see the bug, we can't use target->dst.protonum in
>> nf_nat_l4proto_manip_pkt(), it will be TCP in case we're dealing
>> with a related icmp packet.
>>
>> I will send a patch in a few hours when I get back.
> 
> Sander, does this patch fix things for you?

Hi Florian,

You may add a Reported-by/Tested-by tag if you like.
Thanks for the swift fix!

--
Sander

> 
> Thanks!
> 
> diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c 
> b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> --- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> +++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> @@ -215,6 +215,7 @@ int nf_nat_icmp_reply_translation(struct sk_buff *skb,
>  
>   /* Change outer to look like the reply to an incoming packet */
>   nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
> + target.dst.protonum = IPPROTO_ICMP;
>   if (!nf_nat_ipv4_manip_pkt(skb, 0, &target, manip))
>   return 0;
>  
> diff --git a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c 
> b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
> --- a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
> +++ b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
> @@ -226,6 +226,7 @@ int nf_nat_icmpv6_reply_translation(struct sk_buff *skb,
>   }
>  
>   nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
> + target.dst.protonum = IPPROTO_ICMPV6;
>   if (!nf_nat_ipv6_manip_pkt(skb, 0, &target, manip))
>   return 0;
>  
> 
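
Note on the fix quoted above: as Florian explains, the inverted conntrack
tuple of a related ICMP error still carries the protocol of the original
connection (for example TCP), so a helper that dispatches on
target->dst.protonum picks the wrong mangling routine. The following is a
minimal user-space sketch of that dispatch problem; all toy_* names are
hypothetical and the code is not the netfilter implementation.

```c
#include <assert.h>

/* IANA protocol numbers (same values as IPPROTO_ICMP and IPPROTO_TCP). */
enum { TOY_PROTO_ICMP = 1, TOY_PROTO_TCP = 6 };

/* Which mangling routine the dispatcher would select. */
enum toy_handler { TOY_MANGLE_NONE, TOY_MANGLE_ICMP, TOY_MANGLE_TCP };

/* Hypothetical stand-in for the part of the tuple that matters here. */
struct toy_tuple {
	unsigned char protonum;
};

/* Dispatch on the tuple's protocol number, as the generic helper does. */
static enum toy_handler toy_l4_manip(const struct toy_tuple *target)
{
	switch (target->protonum) {
	case TOY_PROTO_ICMP:
		return TOY_MANGLE_ICMP;
	case TOY_PROTO_TCP:
		return TOY_MANGLE_TCP;
	default:
		return TOY_MANGLE_NONE;
	}
}

/*
 * Translate the outer header of an ICMP error related to a tracked
 * connection. ct_protonum is the protocol of that connection. With
 * apply_fix set, the protocol number is overridden before dispatching,
 * mirroring the one-line change in the patch.
 */
static enum toy_handler toy_icmp_reply_translation(unsigned char ct_protonum,
						    int apply_fix)
{
	struct toy_tuple target = { .protonum = ct_protonum };

	if (apply_fix)
		target.protonum = TOY_PROTO_ICMP;

	return toy_l4_manip(&target);
}
```

Without the override, an ICMP error belonging to a TCP connection is routed
to the TCP handler; with it, the ICMP handler is selected.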



Kernel 5.0-rc5 regression with NAT, bisected to: netfilter: nat: remove l4proto->manip_pkt

2019-02-07 Thread Sander Eikelenboom
L.S.,

While trying out a 5.0-RC5 kernel I seem to have stumbled over a regression 
with NAT
(using an nftables firewall with NAT and connection tracking).

Unfortunately it isn't too obvious since no errors are logged, but on clients it
causes symptoms like Firefox intermittently not being able to load pages with:
Network Protocol Error
An error occurred during a connection to www.example.com
The page you are trying to view cannot be shown because an error in the 
network protocol was detected.
Please contact the website owners to inform them of this problem.

But it's only intermittent, so I can still visit some webpages with clients. 
Could it be that packet size and/or fragments are at play?

So I tried testing with 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git with 
e8c32c32b48c2e889704d8ca0872f92eb027838e as last commit, to be sure to have the 
latest that netdev has to offer,
but to no avail.

After that I tried to git bisect and ended up with:

faec18dbb0405c7d4dda025054511dc3a6696918 is the first bad commit
commit faec18dbb0405c7d4dda025054511dc3a6696918
Author: Florian Westphal 
Date:   Thu Dec 13 16:01:33 2018 +0100

netfilter: nat: remove l4proto->manip_pkt

This removes the last l4proto indirection, the two callers, the l3proto
packet mangling helpers for ipv4 and ipv6, now call the
nf_nat_l4proto_manip_pkt() helper.

nf_nat_proto_{dccp,tcp,sctp,gre,icmp,icmpv6} are left behind, even though
they contain no functionality anymore to not clutter this patch.

Next patch will remove the empty files and the nf_nat_l4proto
struct.

nf_nat_proto_udp.c is renamed to nf_nat_proto.c, as it now contains the
other nat manip functionality as well, not just udp and udplite.

Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 

:040000 040000 22d8706921e03cbd6d78a6ebcc5f253ccfd2bf0c 
b6f8ab2779215b4495dfe641f50e798da73859ac M  include
:040000 040000 af212a756f1acf00cbe45c3be5b71f38f01f1d34 
165c440f9e6f2e05738628a19b51f7603f95752a M  net

Any ideas or debugging hints?

--
Sander


Re: [ANNOUNCE] v4.18.12-rt7 stall

2018-10-10 Thread Tim Sander
Hi,

I just tested this kernel and saw the stall output below. I think there is
something fishy with the Ethernet driver. On one occasion it just locked up
under network traffic when I issued "ip a" via the serial port on the device.
All the problems I see seem to be related to network traffic via the
socfpga-dwmac stmicro/stmmac driver.
The platform is a pretty dated Intel/Altera Cortex-A9 socfpga.

I think this problem has been there for a while, but since I had problems due
to the watchdog I was not able to detect it.

Best regards
Tim

[  251.440019] INFO: rcu_preempt self-detected stall on CPU
[  251.440036]  1-...!: (21000 ticks this GP) idle=5ae/1/1073741826 softirq=0/0 
fqs=0 
[  251.440039]   (t=21000 jiffies g=7702 c=7701 q=346)
[  251.440053] rcu_preempt kthread starved for 21000 jiffies! g7702 c7701 f0x0 
RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
[  251.440055] RCU grace-period kthread stack dump:
[  251.440059] rcu_preempt I011  2 0x
[  251.440066] Backtrace: 
[  251.440086] [<8062d4b0>] (__schedule) from [<8062da30>] (schedule+0x68/0x128)
[  251.440096]  r10:80a1569e r9:87d9a680 r8:80a04100 r7:80055eec r6:87d9a680 
r5:80034600
[  251.440100]  r4:80054000
[  251.440111] [<8062d9c8>] (schedule) from [<806306dc>] 
(schedule_timeout+0x1cc/0x368)
[  251.440116]  r5:80a06488 r4:fffef04c
[  251.440128] [<80630510>] (schedule_timeout) from [<80184fdc>] 
(rcu_gp_kthread+0x750/0xac0)
[  251.440137]  r10:80a1569e r9:80a04100 r8:0001 r7:0003 r6:80a15690 
r5:80a1569c
[  251.440140]  r4:80a154c0
[  251.440150] [<8018488c>] (rcu_gp_kthread) from [<801461a8>] 
(kthread+0x138/0x168)
[  251.440153]  r7:80a154c0
[  251.440163] [<80146070>] (kthread) from [<801010bc>] 
(ret_from_fork+0x14/0x38)
[  251.440168] Exception stack(0x80055fb0 to 0x80055ff8)
[  251.440174] 5fa0:   
 
[  251.440183] 5fc0:       
 
[  251.440189] 5fe0:     0013 
[  251.440198]  r10: r9: r8: r7: r6: 
r5:80146070
[  251.440202]  r4:8150fac0 r3:80054000
[  251.440215] NMI backtrace for cpu 1
[  251.440226] CPU: 1 PID: 157 Comm: RawMeasThread Tainted: GW  O  
4.18.12-rt7 #1
[  251.440229] Hardware name: Altera SOCFPGA
[  251.440231] Backtrace: 
[  251.440243] [<8010dda4>] (dump_backtrace) from [<8010e09c>] 
(show_stack+0x20/0x24)
[  251.440250]  r7:80a573f8 r6: r5:600d0193 r4:80a573f8
[  251.440264] [<8010e07c>] (show_stack) from [<80616120>] 
(dump_stack+0xb0/0xdc)
[  251.440278] [<80616070>] (dump_stack) from [<8061cb74>] 
(nmi_cpu_backtrace+0xc0/0xc4)
[  251.440286]  r9:800d0193 r8:0180 r7:807017c4 r6:0001 r5: 
r4:0001
[  251.440296] [<8061cab4>] (nmi_cpu_backtrace) from [<8061ccdc>] 
(nmi_trigger_cpumask_backtrace+0x164/0x1b0)
[  251.440301]  r5:80a0906c r4:8010fa94
[  251.440312] [<8061cb78>] (nmi_trigger_cpumask_backtrace) from [<8011077c>] 
(arch_trigger_cpumask_backtrace+0x20/0x24)
[  251.440318]  r7:80a154c0 r6:807017bc r5:80a06534 r4:80a154c0
[  251.440328] [<8011075c>] (arch_trigger_cpumask_backtrace) from [<80187944>] 
(rcu_dump_cpu_stacks+0xac/0xdc)
[  251.440337] [<80187898>] (rcu_dump_cpu_stacks) from [<801864b0>] 
(rcu_check_callbacks+0x9e8/0xb08)
[  251.440346]  r10:80a06574 r9:80a154c0 r8:80a06528 r7:80a154c0 r6:07439000 
r5:87d9edc0
[  251.440350]  r4:80965dc0 r3:6c2a9c31
[  251.440360] [<80185ac8>] (rcu_check_callbacks) from [<8018e834>] 
(update_process_times+0x40/0x6c)
[  251.440368]  r10:801a3024 r9:87d9b1a0 r8:87d9b000 r7:003a r6:8afdf535 
r5:0001
[  251.440372]  r4:871baa00
[  251.440383] [<8018e7f4>] (update_process_times) from [<801a30ac>] 
(tick_sched_timer+0x88/0xf4)
[  251.440387]  r5:867cffb0 r4:87d9b310
[  251.440396] [<801a3024>] (tick_sched_timer) from [<8018fc54>] 
(__hrtimer_run_queues+0x194/0x3e8)
[  251.440403]  r7:80a064b0 r6:867ce000 r5:87d9b060 r4:87d9b310
[  251.440411] [<8018fac0>] (__hrtimer_run_queues) from [<80190648>] 
(hrtimer_interrupt+0x138/0x2b0)
[  251.440419]  r10:87d9b00c r9:87d9b1a0 r8: r7:7fff r6:0003 
r5:200d0193
[  251.440422]  r4:87d9b000
[  251.440432] [<80190510>] (hrtimer_interrupt) from [<8011140c>] 
(twd_handler+0x40/0x50)
[  251.440441]  r10:765b03e0 r9:0010 r8:80a06d3c r7: r6:8001a500 
r5:0010
[  251.440444]  r4:0001
[  251.440454] [<801113cc>] (twd_handler) from [<80178510>] 
(handle_percpu_devid_irq+0x98/0x2dc)
[  251.440459]  r5:0010 r4:81503cc0
[  251.440472] [<80178478>] (handle_percpu_devid_irq) from [<8017230c>] 
(generic_handle_irq+0x34/0x44)
[  251.440480]  r10:765b03e0 r9:90803100 r8:80009000 r7: r6: 
r5:0010
[  251.440484]  r4:80965208 r3:80178478
[  251.440495] [<801722d8>] (generic_handle_irq) from [<801729e0>] 
(__handle_domain_irq+0x6c/0xc4)
[  251.440505] [<80172974>] (__handle_domain_irq) from [<80102310>] 
(gic_handle_irq+0x5c/0xa0)
[

Re: [Xen-devel] [PATCH] xen/blkfront: When purging persistent grants, keep them in the buffer

2018-09-27 Thread Sander Eikelenboom
On 27/09/18 23:48, Boris Ostrovsky wrote:
> On 9/27/18 5:37 PM, Jens Axboe wrote:
>> On 9/27/18 2:33 PM, Sander Eikelenboom wrote:
>>> On 27/09/18 21:06, Boris Ostrovsky wrote:
>>>> On 9/27/18 2:56 PM, Jens Axboe wrote:
>>>>> On 9/27/18 12:52 PM, Sander Eikelenboom wrote:
>>>>>> On 27/09/18 16:26, Jens Axboe wrote:
>>>>>>> On 9/27/18 1:12 AM, Juergen Gross wrote:
>>>>>>>> On 22/09/18 21:55, Boris Ostrovsky wrote:
>>>>>>>>> Commit a46b53672b2c ("xen/blkfront: cleanup stale persistent grants")
>>>>>>>>> added support for purging persistent grants when they are not in use. 
>>>>>>>>> As
>>>>>>>>> part of the purge, the grants were removed from the grant buffer, This
>>>>>>>>> eventually causes the buffer to become empty, with BUG_ON triggered in
>>>>>>>>> get_free_grant(). This can be observed even on an idle system, within
>>>>>>>>> 20-30 minutes.
>>>>>>>>>
>>>>>>>>> We should keep the grants in the buffer when purging, and only free 
>>>>>>>>> the
>>>>>>>>> grant ref.
>>>>>>>>>
>>>>>>>>> Fixes: a46b53672b2c ("xen/blkfront: cleanup stale persistent grants")
>>>>>>>>> Signed-off-by: Boris Ostrovsky 
>>>>>>>> Reviewed-by: Juergen Gross 
>>>>>>> Since Konrad is out, I'm going to queue this up for 4.19.
>>>>>>>
>>>>>> Hi Boris/Juergen.
>>>>>>
>>>>>> Last week i tested a linux-4.19-rc4 kernel with xen-next and this patch 
>>>>>> from Boris pulled on top. 
>>>>>> Unfortunately it made a VM hang (probably because it's rootFS is 
>>>>>> shuffled from under it's feet 
>>>> What do you mean by "rootFS is shuffled from under it's feet " ?
>>> Assumption that block-front getting borked and either a kernel crash or 
>>> rootfs becoming mounted readonly. Didn't (try) to check though.
>>>
>>>>>> and it gave these in dom0 dmesg:
>>>>>>
>>>>>> [ 9251.696090] xen-blkback: requesting a grant already in use
>>>>>> [ 9251.705861] xen-blkback: trying to add a gref that's already in the 
>>>>>> tree
>>>>>> [ 9251.715781] xen-blkback: requesting a grant already in use
>>>>>> [ 9251.725756] xen-blkback: trying to add a gref that's already in the 
>>>>>> tree
>>>>>> [ 9251.735698] xen-blkback: requesting a grant already in use
>>>>>> [ 9251.745573] xen-blkback: trying to add a gref that's already in the 
>>>>>> tree
>>>>>>
>>>>>> The VM was a HVM with 4 vcpu's and 2 phy disks:
>>>>>> xen-blkback: backend/vbd/14/768: using 4 queues, protocol 1 (x86_64-abi) 
>>>>>> persistent grants
>>>>>> xen-blkback: backend/vbd/14/832: using 4 queues, protocol 1 (x86_64-abi) 
>>>>>> persistent grants
>>>>>>
>>>>>>
>>>>>> Currently i have been running 4.19-rc5 with xen-next on top and commit
>>>>>> a46b53672b2c reverted, for a couple of days. That seems to run stable
>>>>>> for me (since it's a small box so i'm not hit by what a46b53672b2c
>>>>>> tried to fix.
>>>>>>
>>>>>> If you can come up with a debug patch i can give that a spin tomorrow
>>>>>> evening or in the weekend, so we are hopefully still in time for the
>>>>>> 4.19 release.
>>>>> At this late in the game, might make more sense to simply revert the
>>>>> buggy commit.  Especially since what is currently out there doesn't fix
>>>>> the issue for you.
>>> Don't know if Boris or Juergen have a hunch about the issue, if not
>>> perhaps a revert is the best.
>> Anyone? Unless I hear otherwise, I'll revert the series tomorrow.
> 
> Juergen may have something to say by tomorrow, but from my perspective,
> given that we are coming up on rc6 --- yes.
> 
> I looked at the patches again and didn't see anything obvious.
> 
> -boris

Could also be that what i hit is a latent bug, 
that is not caused by these patches but merely got uncovered by them.

xl dmesg also shows quite some:
(XEN) [2018-09-24 03:15:46.847] grant_table.c:1755:d14v0 Expanding d14 
grant table from 19 to 20 frames
(XEN) [2018-09-24 03:15:46.849] grant_table.c:1755:d14v0 Expanding d14 
grant table from 20 to 21 frames
(and has done that for ages on my box not leading to any direct problems to my 
knowledge)

I don't know if these could be related, and whether something around the
(persistent) grants for block devices could be leaking under some conditions?

--
Sander



Re: [Xen-devel] [PATCH] xen/blkfront: When purging persistent grants, keep them in the buffer

2018-09-27 Thread Sander Eikelenboom
On 27/09/18 21:06, Boris Ostrovsky wrote:
> On 9/27/18 2:56 PM, Jens Axboe wrote:
>> On 9/27/18 12:52 PM, Sander Eikelenboom wrote:
>>> On 27/09/18 16:26, Jens Axboe wrote:
>>>> On 9/27/18 1:12 AM, Juergen Gross wrote:
>>>>> On 22/09/18 21:55, Boris Ostrovsky wrote:
>>>>>> Commit a46b53672b2c ("xen/blkfront: cleanup stale persistent grants")
>>>>>> added support for purging persistent grants when they are not in use. As
>>>>>> part of the purge, the grants were removed from the grant buffer, This
>>>>>> eventually causes the buffer to become empty, with BUG_ON triggered in
>>>>>> get_free_grant(). This can be observed even on an idle system, within
>>>>>> 20-30 minutes.
>>>>>>
>>>>>> We should keep the grants in the buffer when purging, and only free the
>>>>>> grant ref.
>>>>>>
>>>>>> Fixes: a46b53672b2c ("xen/blkfront: cleanup stale persistent grants")
>>>>>> Signed-off-by: Boris Ostrovsky 
>>>>> Reviewed-by: Juergen Gross 
>>>> Since Konrad is out, I'm going to queue this up for 4.19.
>>>>
>>> Hi Boris/Juergen.
>>>
>>> Last week i tested a linux-4.19-rc4 kernel with xen-next and this patch 
>>> from Boris pulled on top. 
>>> Unfortunately it made a VM hang (probably because it's rootFS is shuffled 
>>> from under it's feet 
> 
> What do you mean by "rootFS is shuffled from under it's feet " ?

My assumption is that block-front gets borked, leading to either a kernel crash
or the rootfs being remounted read-only. I didn't try to check, though.

>>> and it gave these in dom0 dmesg:
>>>
>>> [ 9251.696090] xen-blkback: requesting a grant already in use
>>> [ 9251.705861] xen-blkback: trying to add a gref that's already in the tree
>>> [ 9251.715781] xen-blkback: requesting a grant already in use
>>> [ 9251.725756] xen-blkback: trying to add a gref that's already in the tree
>>> [ 9251.735698] xen-blkback: requesting a grant already in use
>>> [ 9251.745573] xen-blkback: trying to add a gref that's already in the tree
>>>
>>> The VM was a HVM with 4 vcpu's and 2 phy disks:
>>> xen-blkback: backend/vbd/14/768: using 4 queues, protocol 1 (x86_64-abi) 
>>> persistent grants
>>> xen-blkback: backend/vbd/14/832: using 4 queues, protocol 1 (x86_64-abi) 
>>> persistent grants
>>>
>>>
>>> Currently i have been running 4.19-rc5 with xen-next on top and commit
>>> a46b53672b2c reverted, for a couple of days. That seems to run stable
>>> for me (since it's a small box so i'm not hit by what a46b53672b2c
>>> tried to fix.
>>>
>>> If you can come up with a debug patch i can give that a spin tomorrow
>>> evening or in the weekend, so we are hopefully still in time for the
>>> 4.19 release.
>> At this late in the game, might make more sense to simply revert the
>> buggy commit.  Especially since what is currently out there doesn't fix
>> the issue for you.
Don't know if Boris or Juergen have a hunch about the issue, if not perhaps a 
revert is the best. 

> If decision is to revert then I think the whole series needs to be
> reverted.
> 
> -boris
> 

For Boris and Juergen:
Would it make sense to have an "xen-next" branch in the xen-tip tree that is:
- based on the previous stable kernel
- and has the for-linus branches for the upcoming kernel release on top;
- and has the patches for net(-next) and block changes on top (since these don't 
go via the tree but only via mailing-list patches);
  (which are scattered, difficult to track and use for automated testing)
- and dependency patches for the above if necessary to be able to build.

So there is one branch that can be used to test ALL pending kernel related Xen 
patches and which could be used in OSStest without as
many potential false alarms as linux-next will have ?

--
Sander


Re: [Xen-devel] [PATCH] xen/blkfront: When purging persistent grants, keep them in the buffer

2018-09-27 Thread Sander Eikelenboom
On 27/09/18 16:26, Jens Axboe wrote:
> On 9/27/18 1:12 AM, Juergen Gross wrote:
>> On 22/09/18 21:55, Boris Ostrovsky wrote:
>>> Commit a46b53672b2c ("xen/blkfront: cleanup stale persistent grants")
>>> added support for purging persistent grants when they are not in use. As
>>> part of the purge, the grants were removed from the grant buffer, This
>>> eventually causes the buffer to become empty, with BUG_ON triggered in
>>> get_free_grant(). This can be observed even on an idle system, within
>>> 20-30 minutes.
>>>
>>> We should keep the grants in the buffer when purging, and only free the
>>> grant ref.
>>>
>>> Fixes: a46b53672b2c ("xen/blkfront: cleanup stale persistent grants")
>>> Signed-off-by: Boris Ostrovsky 
>>
>> Reviewed-by: Juergen Gross 
> 
> Since Konrad is out, I'm going to queue this up for 4.19.
> 

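For illustration only, the failure mode described in the commit message
quoted above can be modelled with a toy grant pool. This is a Python sketch,
not the blkfront code: the Grant class, INVALID_REF and the purge_* helpers
are hypothetical names; only get_free_grant() and the BUG_ON come from the
commit message.

```python
INVALID_REF = None  # hypothetical marker for "no grant reference assigned"


class Grant:
    """Toy stand-in for a persistent grant entry (not the kernel struct)."""

    def __init__(self, gref):
        self.gref = gref


def purge_removing(pool):
    """Behaviour described as buggy: purged grants leave the buffer."""
    pool.clear()


def purge_keeping(pool):
    """Behaviour the fix aims for: keep the entries, only drop the grant ref."""
    for grant in pool:
        grant.gref = INVALID_REF


def get_free_grant(pool):
    """Take a grant from the buffer; an empty buffer models the BUG_ON."""
    if not pool:
        raise RuntimeError("BUG_ON: grant buffer is empty")
    return pool.pop()


if __name__ == "__main__":
    kept = [Grant(1), Grant(2)]
    purge_keeping(kept)
    print(len(kept))  # entries survive the purge

    removed = [Grant(1), Grant(2)]
    purge_removing(removed)
    try:
        get_free_grant(removed)
    except RuntimeError as err:
        print(err)
```

In this model, purging by removal eventually leaves nothing for
get_free_grant() to hand out, while purging in place keeps the buffer
populated with entries that merely lack a grant ref.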
Hi Boris/Juergen.

Last week I tested a linux-4.19-rc4 kernel with xen-next and this patch from
Boris pulled on top.
Unfortunately it made a VM hang (probably because its rootFS is shuffled from
under its feet)
and it gave these in dom0 dmesg:

[ 9251.696090] xen-blkback: requesting a grant already in use
[ 9251.705861] xen-blkback: trying to add a gref that's already in the tree
[ 9251.715781] xen-blkback: requesting a grant already in use
[ 9251.725756] xen-blkback: trying to add a gref that's already in the tree
[ 9251.735698] xen-blkback: requesting a grant already in use
[ 9251.745573] xen-blkback: trying to add a gref that's already in the tree

The VM was a HVM with 4 vcpu's and 2 phy disks:
xen-blkback: backend/vbd/14/768: using 4 queues, protocol 1 (x86_64-abi) 
persistent grants
xen-blkback: backend/vbd/14/832: using 4 queues, protocol 1 (x86_64-abi) 
persistent grants


Currently I have been running 4.19-rc5 with xen-next on top and commit
a46b53672b2c reverted for a couple of days. That seems to run stable for me
(since it's a small box, I'm not hit by what a46b53672b2c tried to fix).

If you can come up with a debug patch I can give that a spin tomorrow evening
or over the weekend, so we are hopefully still in time for the 4.19 release.

--
Sander


Re: Linux 4.16-rc1: regression bisected, Debian kernel package tool make-kpkg stalls indefinitely during kernel build due to commit "kconfig: remove check_stdin()"

2018-03-18 Thread Sander Eikelenboom
On 13/02/18 14:07, Ulf Magnusson wrote:
> On Tue, Feb 13, 2018 at 1:35 PM, Ulf Magnusson  wrote:
>> On Tue, Feb 13, 2018 at 12:33:24PM +0100, Ulf Magnusson wrote:
>>> On Tue, Feb 13, 2018 at 11:00:49AM +0100, Sander Eikelenboom wrote:
>>>> On 13/02/18 05:09, Masahiro Yamada wrote:
>>>>> 2018-02-13 12:00 GMT+09:00 Woody Suwalski :
>>>>>> Sander Eikelenboom wrote:
>>>>>>>
>>>>>>> L.S.,
>>>>>>>
>>>>>>> The Debian kernel-package tool make-kpkg for easy building of upstream
>>>>>>> kernels on Debian fails with linux 4.16-rc1.
>>>>>>>
>>>>>>> The tool (perl script) while invoked with:
>>>>>>>  make-kpkg --initrd --append_to_version -20180212 kernel_image
>>>>>>>
>>>>>>> On a git tree with a .config from the previous kernel release, so new
>>>>>>> KConfig questions have to be asked on new or changed options.
>>>>>>>
>>>>>>> The script stalls indefinitely while it seems to be executing:
>>>>>>>  exec make kpkg_version=13.018+nmu1 -f
>>>>>>> /usr/share/kernel-package/ruleset/minimal.mk debian
>>>>>>> APPEND_TO_VERSION=-t440s-20180212  INITRD=YES
>>>>>>>
>>>>>>> After using ctrl-c to break out of it, i get:
>>>>>>> ^CFailed to create a ./debian directory: No such file or directory 
>>>>>>> at
>>>>>>> /usr/bin/make-kpkg line 970.
>>>>>>>
>>>>>>> Bisection turned up as culprit:
>>>>>>>  commit d2a04648a5dbc3d1d043b35257364f0197d4d868
>>>>>>>  kconfig: remove check_stdin()
>>>>>>>   Except silentoldconfig, valid_stdin is 1, so check_stdin() is
>>>>>>> no-op.
>>>>>>>   oldconfig and silentoldconfig work almost in the same way 
>>>>>>> except
>>>>>>> that
>>>>>>>  the latter generates additional files under include/.  Both ask 
>>>>>>> users
>>>>>>>  for input for new symbols.
>>>>>>>   I do not know why only silentoldconfig requires stdio be tty.
>>>>>>> $ rm -f .config; touch .config
>>>>>>>$ yes "" | make oldconfig > stdout
>>>>>>>$ rm -f .config; touch .config
>>>>>>>$ yes "" | make silentoldconfig > stdout
>>>>>>>make[1]: *** [silentoldconfig] Error 1
>>>>>>>make: *** [silentoldconfig] Error 2
>>>>>>>$ tail -n 4 stdout
>>>>>>>Console input/output is redirected. Run 'make oldconfig' to 
>>>>>>> update
>>>>>>> configuration.
>>>>>>> scripts/kconfig/Makefile:40: recipe for target
>>>>>>> 'silentoldconfig' failed
>>>>>>>Makefile:507: recipe for target 'silentoldconfig' failed
>>>>>>>   Redirection is useful, for example, for testing where we want 
>>>>>>> to
>>>>>>> give
>>>>>>>  particular key inputs from a test file, then check the result.
>>>>>>>   Signed-off-by: Masahiro Yamada 
>>>>>>>  Reviewed-by: Ulf Magnusson 
>>>>>>>
>>>>>>> Reverting this specific commit makes make-kpkg work again as usual.
>>>>>>>
>>>>>>> Version of the kernel-package used:
>>>>>>> ii  kernel-package
>>>>>>> 13.018+nmu1
>>>>>>>
>>>>>>>
>>>>>>> I also cc'ed the Debian developer who maintains the kernel-package
>>>>>>> package: Manoj Srivastava
>>>>>>>
>>>>>>> --
>>>>>>> Sander
>>>>>>>
>>>>>> I have noticed today the same - the kernel-build blockage was in (as I
>>>>>> recall)
>>>>>> scripts/kconfig/conf -s --silentoldconfig Kbuild
>>>>>>
>>>>>> I have bypassed it by regenerating the .config "by hand"...
>>>>>
>>>>>
>>>>> silentold

Re: Linux 4.16-rc1: regression bisected, Debian kernel package tool make-kpkg stalls indefinitely during kernel build due to commit "kconfig: remove check_stdin()"

2018-02-13 Thread Sander Eikelenboom
On 13/02/18 05:09, Masahiro Yamada wrote:
> 2018-02-13 12:00 GMT+09:00 Woody Suwalski :
>> Sander Eikelenboom wrote:
>>>
>>> L.S.,
>>>
>>> The Debian kernel-package tool make-kpkg for easy building of upstream
>>> kernels on Debian fails with linux 4.16-rc1.
>>>
>>> The tool (perl script) while invoked with:
>>>  make-kpkg --initrd --append_to_version -20180212 kernel_image
>>>
>>> On a git tree with a .config from the previous kernel release, so new
>>> KConfig questions have to be asked on new or changed options.
>>>
>>> The script stalls indefinitely while it seems to be executing:
>>>  exec make kpkg_version=13.018+nmu1 -f
>>> /usr/share/kernel-package/ruleset/minimal.mk debian
>>> APPEND_TO_VERSION=-t440s-20180212  INITRD=YES
>>>
>>> After using ctrl-c to break out of it, i get:
>>> ^CFailed to create a ./debian directory: No such file or directory at
>>> /usr/bin/make-kpkg line 970.
>>>
>>> Bisection turned up as culprit:
>>>  commit d2a04648a5dbc3d1d043b35257364f0197d4d868
>>>  kconfig: remove check_stdin()
>>>   Except silentoldconfig, valid_stdin is 1, so check_stdin() is
>>> no-op.
>>>   oldconfig and silentoldconfig work almost in the same way except
>>> that
>>>  the latter generates additional files under include/.  Both ask users
>>>  for input for new symbols.
>>>   I do not know why only silentoldconfig requires stdio be tty.
>>> $ rm -f .config; touch .config
>>>$ yes "" | make oldconfig > stdout
>>>$ rm -f .config; touch .config
>>>$ yes "" | make silentoldconfig > stdout
>>>make[1]: *** [silentoldconfig] Error 1
>>>make: *** [silentoldconfig] Error 2
>>>$ tail -n 4 stdout
>>>Console input/output is redirected. Run 'make oldconfig' to update
>>> configuration.
>>> scripts/kconfig/Makefile:40: recipe for target
>>> 'silentoldconfig' failed
>>>Makefile:507: recipe for target 'silentoldconfig' failed
>>>   Redirection is useful, for example, for testing where we want to
>>> give
>>>  particular key inputs from a test file, then check the result.
>>>   Signed-off-by: Masahiro Yamada 
>>>  Reviewed-by: Ulf Magnusson 
>>>
>>> Reverting this specific commit makes make-kpkg work again as usual.
>>>
>>> Version of the kernel-package used:
>>> ii  kernel-package
>>> 13.018+nmu1
>>>
>>>
>>> I also cc'ed the Debian developer who maintains the kernel-package
>>> package: Manoj Srivastava
>>>
>>> --
>>> Sander
>>>
>> I have noticed today the same - the kernel-build blockage was in (as I
>> recall)
>> scripts/kconfig/conf -s --silentoldconfig Kbuild
>>
>> I have bypassed it by regenerating the .config "by hand"...
> 
> 
> silentoldconfig asks you values for new symbols.
> So, you must answer questions to proceed.

I know, but it stalls before asking the questions.
 
> 
> How does 'make-kpkg' handle silentoldconfig?
> 
> Re-direct stdio, then make it forcibly fail?

I don't know. It is a bunch of Perl and shell scripts that get invoked, not
the easiest to comprehend if you are not familiar with them. I'm just a user
of the tool.

So I would have to defer that question to the Debian package maintainer;
hopefully he will chime in.

--
Sander

> 
> 
> 



Linux 4.16-rc1: regression bisected, Debian kernel package tool make-kpkg stalls indefinitely during kernel build due to commit "kconfig: remove check_stdin()"

2018-02-12 Thread Sander Eikelenboom
L.S.,

The Debian kernel-package tool make-kpkg for easy building of upstream kernels 
on Debian fails with linux 4.16-rc1.

The tool (a Perl script) is invoked with:
make-kpkg --initrd --append_to_version -20180212 kernel_image

This is on a git tree with a .config from the previous kernel release, so new
Kconfig questions have to be asked for new or changed options.

The script stalls indefinitely while it seems to be executing:
exec make kpkg_version=13.018+nmu1 -f 
/usr/share/kernel-package/ruleset/minimal.mk debian 
APPEND_TO_VERSION=-t440s-20180212  INITRD=YES

After using ctrl-c to break out of it, I get:
   ^CFailed to create a ./debian directory: No such file or directory at 
/usr/bin/make-kpkg line 970.
 

Bisection turned up as culprit:
commit d2a04648a5dbc3d1d043b35257364f0197d4d868
kconfig: remove check_stdin()

Except silentoldconfig, valid_stdin is 1, so check_stdin() is no-op.

oldconfig and silentoldconfig work almost in the same way except that
the latter generates additional files under include/.  Both ask users
for input for new symbols.

I do not know why only silentoldconfig requires stdio be tty.

  $ rm -f .config; touch .config
  $ yes "" | make oldconfig > stdout
  $ rm -f .config; touch .config
  $ yes "" | make silentoldconfig > stdout
  make[1]: *** [silentoldconfig] Error 1
  make: *** [silentoldconfig] Error 2
  $ tail -n 4 stdout
  Console input/output is redirected. Run 'make oldconfig' to update 
configuration.

  scripts/kconfig/Makefile:40: recipe for target 'silentoldconfig' failed
  Makefile:507: recipe for target 'silentoldconfig' failed

Redirection is useful, for example, for testing where we want to give
particular key inputs from a test file, then check the result.

Signed-off-by: Masahiro Yamada 
Reviewed-by: Ulf Magnusson 

Reverting this specific commit makes make-kpkg work again as usual.
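
As a rough illustration of the kind of check the commit removed, here is a
minimal Python sketch, assuming the check amounted to "refuse to prompt unless
stdio is a terminal". The names check_stdin and valid_stdin mirror the commit
text, but the bodies and the stdio_is_tty helper are assumptions, not the
kconfig source.

```python
import os


def stdio_is_tty(fds=(0, 1)):
    """Return True only if every descriptor in fds refers to a terminal."""
    return all(os.isatty(fd) for fd in fds)


def check_stdin(valid_stdin):
    """Assumed shape of the removed check: bail out instead of prompting."""
    if not valid_stdin:
        raise SystemExit(
            "Console input/output is redirected. "
            "Run 'make oldconfig' to update configuration."
        )


if __name__ == "__main__":
    # A pipe is never a terminal, so this models "yes '' | make ... > stdout".
    read_end, write_end = os.pipe()
    print(stdio_is_tty((read_end, write_end)))
    os.close(read_end)
    os.close(write_end)
```

With such a check in place a redirected run fails fast with the message shown
in the commit text; without it the tool goes on to read answers from whatever
stdin happens to be.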

Version of the kernel-package used:
ii  kernel-package  13.018+nmu1 


I also cc'ed the Debian developer who maintains the kernel-package package: 
Manoj Srivastava

--
Sander



Linux 4.14-rc6 bisected regression tun devices not working anymore in openvpn

2017-10-28 Thread Sander Eikelenboom
L.S.,

While testing a linux 4.14-rc6 kernel I noticed that OpenVPN didn't function
anymore.
My openvpn config uses tun devices and is pretty standard.
The openvpn version is current Debian stable: openvpn 2.4.0-6+deb9u2

From the openvpn logging:
Sat Oct 28 16:03:34 2017 us=175829 TUN/TAP device  opened
Sat Oct 28 16:03:34 2017 us=183027 Note: Cannot set tx queue length on : No 
such device (errno=19)
Sat Oct 28 16:03:34 2017 us=183055 do_ifconfig, 
tt->did_ifconfig_ipv6_setup=0
Sat Oct 28 16:03:34 2017 us=183071 /sbin/ip link set dev  up mtu 1500
Cannot find device ""
Sat Oct 28 16:03:34 2017 us=200445 Linux ip link set failed: external 
program exited with error status: 1
Sat Oct 28 16:03:34 2017 us=200482 Exiting due to fatal error
Sat Oct 28 16:38:17 2017 us=923381 TCP/UDP: Closing socket
Sat Oct 28 16:38:17 2017 us=925986 Closing TUN/TAP interface


The offending commit is: 
0ad646c81b2182f7fa67ec0c8c825e0ee165696d
"tun: call dev_get_valid_name() before register_netdevice()" 

Reverting this commit fixes the issue for me. Unfortunately, the commit
itself seems to fix another issue.

--
Sander


Re: ce56a86e2a ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses"): kernel BUG at arch/x86/mm/physaddr.c:79!

2017-10-26 Thread Sander Eikelenboom
On 26/10/17 19:49, Craig Bergstrom wrote:
> Sander, thanks for the details, they've been very useful.
> 
> I suspect that your host system's mem=2048M parameter is causing the
> problem.  Any chance you can confirm by removing the parameter and
> running the guest code path?

I removed it, but kept the hypervisor limit on dom0 memory of 2048M intact (in
grub, using the xen bootcmd:
"multiboot   /xen-4.10.gz  dom0_mem=2048M,max:2048M .")

Unfortunately that doesn't change anything; the guest still fails to start
with the same errors.

> More specifically, since you're telling the kernel that it's high
> memory address is at 2048M and your device is at 0xfe1fe000 (~4G), the
> new mmap() limits are preventing you from mapping addresses that are
> explicitly disallowed by the parameter.
> 

Which would probably mean the current patch prohibits hard limiting the dom0 
memory to a certain value (below 4G)
at least in combination with PCI-passthrough. So the only thing left would be 
to have no hard memory restriction on dom0
and rely on auto-ballooning, but I'm not a great fan of that.
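
To make the arithmetic concrete, here is a small Python sketch of a range
check of the form "address plus length must not exceed the top of memory".
It is an assumption about the shape of the check, not the kernel code; the
helper name mmap_range_allowed and the 0x2000 length are illustrative only.

```python
MIB = 1024 * 1024


def mmap_range_allowed(phys_addr, length, mem_limit):
    """Return True if [phys_addr, phys_addr + length) ends at or below mem_limit."""
    return phys_addr + length <= mem_limit


if __name__ == "__main__":
    limit = 2048 * MIB    # the 2048M limit discussed in this thread
    device = 0xFE1FE000   # device address mentioned above (just under 4G)
    # The device range lies far above a 2048M limit.
    print(mmap_range_allowed(device, 0x2000, limit))
    # An ordinary RAM range below the limit.
    print(mmap_range_allowed(0x100000, 0x1000, limit))
```

Under a check of this shape, any MMIO region above the configured memory
limit is refused, regardless of whether a real device lives there.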

I don't know how KVM handles setting memory limits for the host system, but 
perhaps it suffers from the same issue.

I also tried the patch from one of your last mails to make the check "less 
strict", 
but still get the same errors (when using the hard memory limits).

--
Sander

 
> 
> On Thu, Oct 26, 2017 at 10:39 AM, Ingo Molnar  wrote:
>>
>> * Craig Bergstrom  wrote:
>>
>>> Yes, not much time left for 4.14, it might be reasonable to pull the
>>> change out since it's causing problems. [...]
>>
>> Ok, I'll queue up a revert tomorrow morning and send it to Linus ASAP if 
>> there's
>> no good fix by then. In hindsight I should have queued it for v4.15 ...
>>
>> Thanks,
>>
>> Ingo



Re: ce56a86e2a ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses"): kernel BUG at arch/x86/mm/physaddr.c:79!

2017-10-26 Thread Sander Eikelenboom
On 26/10/17 10:12, Sander Eikelenboom wrote:
> On 26/10/17 10:05, Sander Eikelenboom wrote:
>> On 26/10/17 00:02, Craig Bergstrom wrote:
>>> Thanks for the notification, my apologies for the breakage.  I'll take a
>>> close look and see if I can figure out what went wrong.
>>>
>>> Sander, any chance you can send /proc/iomem and the inputs to the mmap call
>>> that fail on your affected system?
>>
>> Hi Craig,
>>
>> The output from /proc/iomem is simple to get and attached.
>> The mmap call is probably issued by qemu and will require more digging.
> 
> Ahh grepping qemu gave a pointer, it's probably the code in:
> 
> http://xenbits.xen.org/gitweb/?p=qemu-xen.git;a=blob;f=hw/xen/xen_pt_msi.c;h=ff9a79f5d27ad7d74a1b22297be560feb455063c;hb=5cd7ce5dde3f228b3b669ed9ca432f588947bd40
> 
> around line 571, that would also explain why it's only this device that
> has the problem, since it's the only one trying to use MSI(-X)
> interrupts. Will see if I can add some logging to that function.

Attached is the qemu debug output with an extra line outputting all stuff
used to calculate the arguments used by the mmap-call.
--
Sander

 
> --
> Sander
> 
> 
>>
>> I don't know if there is that much time left for 4.14, since we are at
>> RC6 already.
>>
>> --
>> Sander
>>
>>
>>>
>>>
>>> On Wed, Oct 25, 2017 at 2:50 PM, Boris Ostrovsky >>> wrote:
>>>
>>>> On 10/23/2017 10:44 PM, Fengguang Wu wrote:
>>>>> Greetings,
>>>>>
>>>>> 0day kernel testing robot got the below dmesg and the first bad commit is
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>>> master
>>>>>
>>>>> commit ce56a86e2ade45d052b3228cdfebe913a1ae7381
>>>>> Author: Craig Bergstrom 
>>>>> AuthorDate: Thu Oct 19 13:28:56 2017 -0600
>>>>> Commit: Ingo Molnar 
>>>>> CommitDate: Fri Oct 20 09:48:00 2017 +0200
>>>>>
>>>>>  x86/mm: Limit mmap() of /dev/mem to valid physical addresses
>>>>
>>>> Also note
>>>> https://lists.xenproject.org/archives/html/xen-devel/2017-10/msg02935.html
>>>>
>>>> -boris
>>>>
>>>
>>
> 

qemu-system-i386: -serial pty: char device redirected to /dev/pts/16 (label serial0)
[00:05.0] xen_pt_realize: Assigning real physical device 08:00.0 to devfn 0x28
[00:05.0] xen_pt_register_regions: IO region 0 registered (size=0x2000 base_addr=0xfe1fe000 type: 0x4)
[00:05.0] xen_pt_config_reg_init: Offset 0x000e mismatch! Emulated=0x0080, host=0x, syncing to 0x0080.
[00:05.0] xen_pt_config_reg_init: Offset 0x0010 mismatch! Emulated=0x, host=0xfe1fe004, syncing to 0xfe1fe004.
[00:05.0] xen_pt_config_reg_init: Offset 0x0052 mismatch! Emulated=0x, host=0x4803, syncing to 0x0003.
[00:05.0] xen_pt_config_reg_init: Offset 0x0072 mismatch! Emulated=0x, host=0x0086, syncing to 0x0080.
[00:05.0] xen_pt_config_reg_init: Offset 0x00a4 mismatch! Emulated=0x, host=0x8fc0, syncing to 0x8fc0.
[00:05.0] xen_pt_config_reg_init: Offset 0x00b2 mismatch! Emulated=0x, host=0x1012, syncing to 0x1012.
[00:05.0] xen_pt_msix_init: get MSI-X table BAR base 0xfe1fe000
[00:05.0] xen_pt_msix_init: table_off = 0x1000, total_entries = 8
[00:05.0] xen_pt_msix_init: table_off = 0x1000, total_entries = 8, PCI_MSIX_ENTRY_SIZE = 0x10,  msix->table_offset_adjust = 0,  msix->table_base = 0xfe1fe000
[00:05.0] xen_pt_msix_init: Error: Can't map physical MSI-X table: Invalid argument
[00:05.0] xen_pt_msix_size_init: Error: Internal error: Invalid xen_pt_msix_init.
Failed to initialize 12/15, type = 0x1, rc: -22
[00:05.0] xen_pt_msi_set_enable: disabling MSI.
*** Error in `/usr/local/lib/xen/bin/qemu-system-i386': corrupted size vs. prev_size: 0x55ce13565570 ***
=== Backtrace: =
/lib/x86_64-linux-gnu/libc.so.6(+0x70bcb)[0x7f700ab7ebcb]
/lib/x86_64-linux-gnu/libc.so.6(+0x76f96)[0x7f700ab84f96]
/lib/x86_64-linux-gnu/libc.so.6(+0x77388)[0x7f700ab85388]
/lib/x86_64-linux-gnu/libc.so.6(+0x78dca)[0x7f700ab86dca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_calloc+0x27b)[0x7f700ab89b4b]
/lib/x86_64-linux-gnu/libglib-2.0.so.0(g_malloc0+0x21)[0x7f700bbbee61]
/usr/local/lib/xen/bin/qemu-system-i386(+0x6d78ee)[0x55ce114298ee]
/usr/local/lib/xen/bin/qemu-system-i386(+0x6d309e)[0x55ce1142509e]
/usr/local/lib/xen/bin/qemu-system-i386(+0x6d316f)[0x55ce1142516f]
/usr/local/lib/xen/bin/qemu-system-i386(+0x24d79b)[0x55ce10f9f79b]
/usr/local/lib/xen/bin/qemu-system-i386(+0x6da8bf)[0x55ce1142c8bf]
/usr/local/lib/xen/bin/qemu-system-i386(+0x70717c)[0x55ce1145917c]
/usr/local/lib/xen/

Re: ce56a86e2a ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses"): kernel BUG at arch/x86/mm/physaddr.c:79!

2017-10-26 Thread Sander Eikelenboom
On 26/10/17 00:02, Craig Bergstrom wrote:
> Thanks for the notification, my apologies for the breakage.  I'll take a
> close look and see if I can figure out what went wrong.
> 
> Sander, any chance you can send /proc/iomem and the inputs to the mmap call
> that fail on your affected system?

Hi Craig,

The output from /proc/iomem is simple to get and attached.
The mmap call is probably issued by qemu and will require more digging.

I don't know if there is that much time left for 4.14, since we are at
RC6 already.

--
Sander


> 
> 
> On Wed, Oct 25, 2017 at 2:50 PM, Boris Ostrovsky > wrote:
> 
>> On 10/23/2017 10:44 PM, Fengguang Wu wrote:
>>> Greetings,
>>>
>>> 0day kernel testing robot got the below dmesg and the first bad commit is
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>> master
>>>
>>> commit ce56a86e2ade45d052b3228cdfebe913a1ae7381
>>> Author: Craig Bergstrom 
>>> AuthorDate: Thu Oct 19 13:28:56 2017 -0600
>>> Commit: Ingo Molnar 
>>> CommitDate: Fri Oct 20 09:48:00 2017 +0200
>>>
>>>  x86/mm: Limit mmap() of /dev/mem to valid physical addresses
>>
>> Also note
>> https://lists.xenproject.org/archives/html/xen-devel/2017-10/msg02935.html
>>
>> -boris
>>
> 

-0fff : Reserved
1000-00095fff : System RAM
00096000-000963ff : RAM buffer
00096400-000f : Reserved
  000a-000b : PCI Bus :00
  000c-000cfdff : Video ROM
  000d-000d : PCI Bus :00
000d4800-000d4bff : Adapter ROM
  000f-000f : System ROM
0010-7fff : System RAM
  0100-01d2a703 : Kernel code
  01d2a704-025450ff : Kernel data
  02b3f000-02cc1fff : Kernel bss
c7f9-c7f9dfff : ACPI Tables
c7f9e000-c7fd : ACPI Non-volatile Storage
c7fe-c7ff : Reserved
c800-dfff : PCI Bus :00
  cfe0-cfef : PCI Bus :0c
cfef8000-cfefbfff : :0c:00.0
  cfef8000-cfefbfff : r8169
cfeff000-cfef : :0c:00.0
  cfeff000-cfef : r8169
  cff0-cfff : PCI Bus :0d
cfff8000-cfffbfff : :0d:00.0
  cfff8000-cfffbfff : r8169
c000-cfff : :0d:00.0
  c000-cfff : r8169
  d000-dfff : PCI Bus :0f
d000-dfff : :0f:00.0
  d000-d0ff : vesafb
e000-efff : PCI MMCONFIG  [bus 00-ff]
  e000-efff : pnp 00:07
f000-febf : PCI Bus :00
  f600-f6003fff : Reserved
f600-f6003fff : pnp 00:01
  fdcf7000-fdcf7fff : :00:12.0
fdcf7000-fdcf7fff : ohci_hcd
  fdcf8000-fdcfbfff : :00:14.2
  fdcfc000-fdcfcfff : :00:13.0
fdcfc000-fdcfcfff : ohci_hcd
  fdcfd000-fdcfdfff : :00:14.5
fdcfd000-fdcfdfff : ohci_hcd
  fdcfe000-fdcfefff : :00:16.0
fdcfe000-fdcfefff : ohci_hcd
  fdcff000-fdcff3ff : :00:11.0
fdcff000-fdcff3ff : ahci
  fdcff400-fdcff4ff : :00:12.2
fdcff400-fdcff4ff : ehci_hcd
  fdcff800-fdcff8ff : :00:13.2
fdcff800-fdcff8ff : ehci_hcd
  fdcffc00-fdcffcff : :00:16.2
fdcffc00-fdcffcff : ehci_hcd
  fde0-fdef : PCI Bus :04
fdef8000-fdef8fff : :04:00.0
fdef9000-fdef9fff : :04:00.1
fdefa000-fdefafff : :04:00.2
fdefb000-fdefbfff : :04:00.3
fdefc000-fdefcfff : :04:00.4
fdefd000-fdefdfff : :04:00.5
fdefe000-fdefefff : :04:00.6
fdeff000-fdef : :04:00.7
  fdf0-fe1f : PCI Bus :05
fdfe-fdff : :05:00.0
fe00-fe1f : PCI Bus :06
  fe00-fe0f : PCI Bus :07
fe0e-fe0e : :07:00.0
fe0ff800-fe0f : :07:00.0
  fe0ff800-fe0f : ahci
  fe10-fe1f : PCI Bus :08
fe1fe000-fe1f : :08:00.0
  fe20-fe3f : PCI Bus :09
fe20-fe3f : :09:00.0
  fe40-fe4f : PCI Bus :0a
fe4f8000-fe4f8fff : :0a:00.0
fe4f9000-fe4f9fff : :0a:00.1
fe4fa000-fe4fafff : :0a:00.2
fe4fb000-fe4fbfff : :0a:00.3
fe4fc000-fe4fcfff : :0a:00.4
fe4fd000-fe4fdfff : :0a:00.5
fe4fe000-fe4fefff : :0a:00.6
fe4ff000-fe4f : :0a:00.7
  fe50-fe5f : PCI Bus :0b
fe5fe000-fe5f : :0b:00.0
  fe60-fe6f : PCI Bus :0c
fe6e-fe6f : :0c:00.0
  fe70-fe7f : PCI Bus :0d
fe7e-fe7f : :0d:00.0
  fe80-fe8f : PCI Bus :0e
fe8fe000-fe8f : :0e:00.0
  fe90-fe9f : PCI Bus :0f
fe9e-fe9e : :0f:00.0
fe9fc000-fe9f : :0f:00.1
  fe9fc000-fe9f : ICH HD audio
fec0-fec00fff : Reserved
  fec0-fec003ff : IOAPIC 0
fec1-fec1001f : pnp 00:06
fec2-fec20fff : Reserved
  fec2-fec203ff : IOAPIC 1
fed0-fed003ff : HPET 2
  fed0-fed003ff : PNP0103:0

Re: ce56a86e2a ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses"): kernel BUG at arch/x86/mm/physaddr.c:79!

2017-10-26 Thread Sander Eikelenboom
On 26/10/17 10:05, Sander Eikelenboom wrote:
> On 26/10/17 00:02, Craig Bergstrom wrote:
>> Thanks for the notification, my apologies for the breakage.  I'll take a
>> close look and see if I can figure out what went wrong.
>>
>> Sander, any chance you can send /proc/iomem and the inputs to the mmap call
>> that fail on your affected system?
> 
> Hi Craig,
> 
> The output from /proc/iomem is simple to get and attached.
> The mmap call is probably issued by qemu and will require more digging.

Ahh grepping qemu gave a pointer, it's probably the code in:

http://xenbits.xen.org/gitweb/?p=qemu-xen.git;a=blob;f=hw/xen/xen_pt_msi.c;h=ff9a79f5d27ad7d74a1b22297be560feb455063c;hb=5cd7ce5dde3f228b3b669ed9ca432f588947bd40

around line 571. That would also explain why only this device has the
problem, since it is the only one trying to use MSI(-X) interrupts.
I will see if I can add some logging to that function.
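
For reference while digging for the mmap inputs: a minimal sketch of how
a caller typically turns an arbitrary physical address into page-aligned
mmap() arguments for /dev/mem, plus an overflow-safe range check. This
is an illustration only and not the qemu or kernel code. The names
dev_mem_map_args() and range_below() are made up, the example address is
simply one of the intact ranges from the iomem listing, and the limit
the kernel actually applies is not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arguments for mmap() on /dev/mem, derived from an arbitrary address.
 * Hypothetical helper type, for illustration only. */
struct dev_mem_map {
	uint64_t offset; /* page-aligned file offset */
	size_t length;   /* bytes to map, including the leading adjustment */
	size_t adjust;   /* offset of the requested address inside the mapping */
};

static struct dev_mem_map dev_mem_map_args(uint64_t phys, size_t len,
					   size_t page_size)
{
	struct dev_mem_map m;

	/* page_size must be a non-zero power of two */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	m.adjust = (size_t)(phys & (uint64_t)(page_size - 1));
	m.offset = phys - m.adjust;
	m.length = len + m.adjust;
	return m;
}

/* True if [offset, offset + length) ends at or below limit.
 * Written so that offset + length is never computed and cannot wrap. */
static int range_below(uint64_t offset, uint64_t length, uint64_t limit)
{
	return offset <= limit && length <= limit - offset;
}
```

Logging offset and length as computed this way next to the failing mmap
call should give the inputs Craig asked for.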

--
Sander


> 
> I don't know if there is that much time left for 4.14, since we are at
> RC6 already.
> 
> --
> Sander
> 
> 
>>
>>
>> On Wed, Oct 25, 2017 at 2:50 PM, Boris Ostrovsky >> wrote:
>>
>>> On 10/23/2017 10:44 PM, Fengguang Wu wrote:
>>>> Greetings,
>>>>
>>>> 0day kernel testing robot got the below dmesg and the first bad commit is
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>> master
>>>>
>>>> commit ce56a86e2ade45d052b3228cdfebe913a1ae7381
>>>> Author: Craig Bergstrom 
>>>> AuthorDate: Thu Oct 19 13:28:56 2017 -0600
>>>> Commit: Ingo Molnar 
>>>> CommitDate: Fri Oct 20 09:48:00 2017 +0200
>>>>
>>>>  x86/mm: Limit mmap() of /dev/mem to valid physical addresses
>>>
>>> Also note
>>> https://lists.xenproject.org/archives/html/xen-devel/2017-10/msg02935.html
>>>
>>> -boris
>>>
>>
> 



ptp device strangeness

2017-09-01 Thread Tim Sander
Hi

I am currently using PTP on an Altera/Intel SoC with a dp8640 PHY.
The PTP functionality itself seems to work. However, I am timestamping
with gpio0 and sometimes I lose the sync between the timestamps and
the events, so I would like to read out all pending messages. Reading
with O_NONBLOCK does not work, so I tried polling from user mode with
the code below:

np = poll(&ev, 1, 0);
ev.fd=ptpDev;
ev.events = POLLIN;
if (np>0) {
if (ev.revents>0) {
std::cout<<"discarded ptp event"<
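
The draining idea above can be sketched as follows, in plain C. This is
a sketch under assumptions, not a tested PTP client: drain_events() is a
made-up helper that works on any file descriptor delivering fixed-size
records, and for a PTP clock device the record would presumably be
struct ptp_extts_event from <linux/ptp_clock.h>. Note that the pollfd is
filled in before poll() is called, and that a timeout of 0 keeps the
loop from blocking.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Discard every record currently queued on fd.
 * Returns the number of records dropped, or -1 on error.
 * record_size is the size of one event as delivered by read(). */
static int drain_events(int fd, size_t record_size)
{
	char buf[64];
	int dropped = 0;

	assert(record_size > 0 && record_size <= sizeof(buf));

	for (;;) {
		/* Set up the descriptor first, then poll without blocking. */
		struct pollfd ev = { .fd = fd, .events = POLLIN };
		int np = poll(&ev, 1, 0);
		ssize_t n;

		if (np < 0)
			return -1;
		if (np == 0 || !(ev.revents & POLLIN))
			return dropped; /* nothing left to read */

		n = read(fd, buf, record_size);
		if (n != (ssize_t)record_size)
			return -1;
		dropped++;
	}
}
```

Calling drain_events() right before arming the next measurement would
throw away stale events, at the cost of one poll() per queued record.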

4.12-RC2 BUG: scheduling while atomic: irq/47-iwlwifi

2017-05-22 Thread Sander Eikelenboom
Hi,

I encountered this splat with 4.12-RC2.
--

Sander

[  119.021594] BUG: scheduling while atomic: irq/47-iwlwifi/517/0x0200
[  119.021604] Modules linked in: xt_tcpudp ip6t_rpfilter ipt_REJECT 
nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 
xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc 
ip6table_raw ip6table_security ip6table_mangle iptable_raw iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
iptable_security iptable_mangle ebtable_filter ebtables ip6table_filter 
ip6_tables iptable_filter ip_tables x_tables rfcomm bnep binfmt_misc arc4 
iTCO_wdt iTCO_vendor_support uvcvideo videobuf2_vmalloc videobuf2_memops 
videobuf2_v4l2 videobuf2_core videodev intel_rapl cdc_mbim iwlmvm 
x86_pkg_temp_thermal intel_powerclamp mac80211 media cdc_wdm btusb coretemp 
cdc_ncm kvm_intel usbnet mii cdc_acm iwlwifi kvm btintel joydev pcspkr 
serio_raw cfg80211 snd_hda_codec_hdmi
[  119.021701]  bluetooth lpc_ich snd_hda_codec_realtek snd_hda_codec_generic 
shpchp sg ecdh_generic snd_hda_intel thinkpad_acpi snd_hda_codec snd_hwdep 
snd_hda_core snd_pcm snd_timer nvram snd soundcore evdev tpm_tis tpm_tis_core 
tpm algif_skcipher af_alg crct10dif_pclmul crc32_pclmul crc32c_intel 
ghash_clmulni_intel rtsx_pci_sdmmc mmc_core aesni_intel aes_x86_64 crypto_simd 
cryptd glue_helper psmouse i2c_i801 sd_mod ehci_pci ehci_hcd e1000e rtsx_pci 
mfd_core ptp xhci_pci pps_core xhci_hcd
[  119.021759] CPU: 1 PID: 517 Comm: irq/47-iwlwifi Not tainted 
4.12.0-rc2-t440s-20170522+ #1
[  119.021763] Hardware name: LENOVO 20AQS03H00/20AQS03H00, BIOS GJET91WW (2.41 
) 09/21/2016
[  119.021766] Call Trace:
[  119.021778]  ? dump_stack+0x5c/0x84
[  119.021784]  ? __schedule_bug+0x4c/0x70
[  119.021792]  ? __schedule+0x496/0x5c0
[  119.021798]  ? schedule+0x2d/0x80
[  119.021804]  ? schedule_preempt_disabled+0x5/0x10
[  119.021810]  ? __mutex_lock.isra.0+0x18e/0x4c0
[  119.021817]  ? __wake_up+0x2f/0x50
[  119.021833]  ? cfg80211_sched_scan_results+0x19/0x60 [cfg80211]
[  119.021844]  ? cfg80211_sched_scan_results+0x19/0x60 [cfg80211]
[  119.021859]  ? iwl_mvm_rx_lmac_scan_iter_complete_notif+0x17/0x30 [iwlmvm]
[  119.021869]  ? iwl_pcie_rx_handle+0x2a9/0x7e0 [iwlwifi]
[  119.021878]  ? iwl_pcie_irq_handler+0x17c/0x730 [iwlwifi]
[  119.021884]  ? irq_forced_thread_fn+0x60/0x60
[  119.021887]  ? irq_thread_fn+0x16/0x40
[  119.021892]  ? irq_thread+0x109/0x180
[  119.021896]  ? wake_threads_waitq+0x30/0x30
[  119.021901]  ? kthread+0xf2/0x130
[  119.021905]  ? irq_thread_dtor+0x90/0x90
[  119.021910]  ? kthread_create_on_node+0x40/0x40
[  119.021915]  ? ret_from_fork+0x26/0x40


[PATCH] i2c-designware: add i2c gpio recovery option

2017-05-10 Thread Tim Sander
This patch contains much input from Phil Reid and has been tested
on Intel/Altera Cyclone V SoC hardware with Altera GPIOs for the
SCL and SDA lines. I am still a little unsure about the recovery
in the timeout case (i2c-designware-core.c:770), as I could not
test this code path.

Signed-off-by: Tim Sander 
---
 drivers/i2c/busses/i2c-designware-core.c| 14 -
 drivers/i2c/busses/i2c-designware-core.h|  4 ++
 drivers/i2c/busses/i2c-designware-platdrv.c | 90 -
 3 files changed, 104 insertions(+), 4 deletions(-)

diff --git a/drivers/i2c/busses/i2c-designware-core.c 
b/drivers/i2c/busses/i2c-designware-core.c
index 7a3faa551cf8..f955f29ff8e7 100644
--- a/drivers/i2c/busses/i2c-designware-core.c
+++ b/drivers/i2c/busses/i2c-designware-core.c
@@ -317,6 +317,7 @@ static void i2c_dw_release_lock(struct dw_i2c_dev *dev)
dev->release_lock(dev);
 }
 
+
 /**
  * i2c_dw_init() - initialize the designware i2c master hardware
  * @dev: device private data
@@ -463,7 +464,12 @@ static int i2c_dw_wait_bus_not_busy(struct dw_i2c_dev *dev)
while (dw_readl(dev, DW_IC_STATUS) & DW_IC_STATUS_ACTIVITY) {
if (timeout <= 0) {
dev_warn(dev->dev, "timeout waiting for bus ready\n");
-   return -ETIMEDOUT;
+   i2c_recover_bus(&dev->adapter);
+
+   if (dw_readl(dev, DW_IC_STATUS) & DW_IC_STATUS_ACTIVITY)
+   return -ETIMEDOUT;
+   else
+   return 0;
}
timeout--;
usleep_range(1000, 1100);
@@ -719,9 +725,10 @@ static int i2c_dw_handle_tx_abort(struct dw_i2c_dev *dev)
for_each_set_bit(i, &abort_source, ARRAY_SIZE(abort_sources))
dev_err(dev->dev, "%s: %s\n", __func__, abort_sources[i]);
 
-   if (abort_source & DW_IC_TX_ARB_LOST)
+   if (abort_source & DW_IC_TX_ARB_LOST) {
+   i2c_recover_bus(&dev->adapter);
return -EAGAIN;
-   else if (abort_source & DW_IC_TX_ABRT_GCALL_READ)
+   } else if (abort_source & DW_IC_TX_ABRT_GCALL_READ)
return -EINVAL; /* wrong msgs[] data */
else
return -EIO;
@@ -766,6 +773,7 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg 
msgs[], int num)
if (!wait_for_completion_timeout(&dev->cmd_complete, adap->timeout)) {
dev_err(dev->dev, "controller timed out\n");
/* i2c_dw_init implicitly disables the adapter */
+   i2c_recover_bus(&dev->adapter);
i2c_dw_init(dev);
ret = -ETIMEDOUT;
goto done;
diff --git a/drivers/i2c/busses/i2c-designware-core.h 
b/drivers/i2c/busses/i2c-designware-core.h
index d9aaf1790e0e..cedc895a795d 100644
--- a/drivers/i2c/busses/i2c-designware-core.h
+++ b/drivers/i2c/busses/i2c-designware-core.h
@@ -23,6 +23,7 @@
  */
 
 #include 
+#include 
 
 #define DW_IC_DEFAULT_FUNCTIONALITY (I2C_FUNC_I2C |\
I2C_FUNC_SMBUS_BYTE |   \
@@ -126,6 +127,9 @@ struct dw_i2c_dev {
int (*acquire_lock)(struct dw_i2c_dev *dev);
void(*release_lock)(struct dw_i2c_dev *dev);
boolpm_runtime_disabled;
+   struct i2c_bus_recovery_info rinfo;
+   struct  gpio_desc   *gpio_sda;
+   struct  gpio_desc   *gpio_scl;
 };
 
 #define ACCESS_SWAP0x0001
diff --git a/drivers/i2c/busses/i2c-designware-platdrv.c 
b/drivers/i2c/busses/i2c-designware-platdrv.c
index 79c4b4ea0539..b2d5adc8df2b 100644
--- a/drivers/i2c/busses/i2c-designware-platdrv.c
+++ b/drivers/i2c/busses/i2c-designware-platdrv.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "i2c-designware-core.h"
 
@@ -174,6 +175,88 @@ static void dw_i2c_set_fifo_size(struct dw_i2c_dev *dev, 
int id)
}
 }
 
+/*
+ * This routine does i2c bus recovery by using i2c_generic_gpio_recovery
+ * which is provided by I2C Bus recovery infrastructure.
+ */
+static void i2c_dw_prepare_recovery(struct i2c_adapter *adap)
+{
+   struct platform_device *pdev = to_platform_device(&adap->dev);
+   struct dw_i2c_dev *i_dev = platform_get_drvdata(pdev);
+
+   i2c_dw_disable(i_dev);
+   reset_control_assert(i_dev->rst);
+   i2c_dw_plat_prepare_clk(i_dev, false);
+}
+
+void i2c_dw_unprepare_recovery(struct i2c_adapter *adap)
+{
+   struct platform_device *pdev = to_platform_device(&adap->dev);
+   struct dw_i2c_dev *i_dev = platform_get_drvdata(pdev);
+
+   i2c_dw_plat_prepare_clk(i_dev, true);
+   reset_control_deassert(i_dev->rst);
+   i2c_dw_init(i_dev);
+}
+
+
+static int i2c_dw_get_scl(struct i2

DWC2 USB Host Mode Lockup 4.11

2017-05-10 Thread Tim Sander
Hi

I am currently seeing an error with the designware driver on
Intel/Altera ARM Cortex A9 Cyclone V SoC hardware. The USB PHY is a
TUSB1210 without a hw reset line connected. The error only occurs when
a device is plugged in in host mode. Once the USB device is
enumerated I have not seen any errors. Occasionally I get an error
that the USB device is no longer enumerated. Even a reboot does not
help to recover normal operation. This points IMHO to the PHY as the
source of the problem, as all other components get a hw reset on
reboot. I have not worked with USB on a driver level, so my knowledge
is a little thin. Nevertheless I tried to pin down the problem. I have
added the patch below to the 4.11 kernel. The observation is that when
the error has not been hit I see lots of "dwc2: STATUS EINPROGRESS"
messages, which means the BUG_ON statement I added is not hit in
normal operation.

The usb hw-schematic looks like this:
https://rocketboards.org/foswiki/pub/Documentation/EBVSoCratesEvaluationBoard/SoCrates-Schematic.pdf

So my take is that for some reason the communication between PHY
and controller is broken in a way that either no request gets sent to
the PHY or the PHY sends no reply.

Any idea how I can get this USB port back to normal operation?

Attached below is the patch which I added to produce the two output
dumps further below. The first output dump is the rare error case,
the second is the success case.

Best regards
Tim

diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index a73722e27d07..1c18104e432f 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -38,6 +38,8 @@
  * This file contains the core HCD code, and implements the Linux hc_driver
  * API
  */
+#define DEBUG
+
 #include 
 #include 
 #include 
@@ -4663,6 +4665,7 @@ static int _dwc2_hcd_urb_enqueue(struct usb_hcd *hcd, 
struct urb *urb,
dwc2_urb->flags = tflags;
dwc2_urb->interval = urb->interval;
dwc2_urb->status = -EINPROGRESS;
+   printk("dwc2: STATUS EINPROGRESS\n");
 
for (i = 0; i < urb->number_of_packets; ++i)
dwc2_hcd_urb_set_iso_desc_params(dwc2_urb, i,
@@ -4773,6 +4776,7 @@ static int _dwc2_hcd_urb_dequeue(struct usb_hcd *hcd, 
struct urb *urb,
 
dev_dbg(hsotg->dev, "Called usb_hcd_giveback_urb()\n");
dev_dbg(hsotg->dev, "  urb->status = %d\n", urb->status);
+   BUG_ON(urb->status <0);
 out:
spin_unlock_irqrestore(&hsotg->lock, flags);

Here is the output in the error case:
[   11.245681] usbcore: registered new interface driver usbfs
[   11.254272] usbcore: registered new interface driver hub
[   11.262479] usbcore: registered new device driver usb
[   11.346143] dwc2 ffb0.usb: mapped PA ffb0 to VA 9155
[   11.346236] dwc2 ffb0.usb: Looking up vusb_d-supply from device tree
[   11.346254] dwc2 ffb0.usb: Looking up vusb_d-supply property in node 
/soc/usb@ffb0 failed
[   11.346273] dwc2 ffb0.usb: ffb0.usb supply vusb_d not found, using 
dummy regulator
[   11.354882] dwc2 ffb0.usb: Looking up vusb_a-supply from device tree
[   11.354897] dwc2 ffb0.usb: Looking up vusb_a-supply property in node 
/soc/usb@ffb0 failed
[   11.354909] dwc2 ffb0.usb: ffb0.usb supply vusb_a not found, using 
dummy regulator
[   11.363660] dwc2 ffb0.usb: registering common handler for irq43
[   11.363848] dwc2 ffb0.usb: Forcing mode to host
[   11.363868] dwc2 ffb0.usb: Core Release: 2.93a (snpsid=4f54293a)
[   11.363882] dwc2 ffb0.usb: Forcing mode to host
[   11.363909] dwc2 ffb0.usb: DWC OTG HCD INIT
[   11.363921] dwc2 ffb0.usb: hcfg=0200
[   11.363950] dwc2 ffb0.usb: dwc2_core_init(8481e010)
[   11.363962] dwc2 ffb0.usb: HS ULPI PHY selected
[   11.363974] dwc2 ffb0.usb: Internal DMA Mode
[   11.363987] dwc2 ffb0.usb: host_dma:1 dma_desc_enable:1
[   11.363998] dwc2 ffb0.usb: Using Descriptor DMA mode
[   11.364010] dwc2 ffb0.usb: Host Mode
[   11.375756] dwc2 ffb0.usb: DWC OTG Controller
[   11.380596] dwc2 ffb0.usb: new USB bus registered, assigned bus number 1
[   11.387883] dwc2 ffb0.usb: irq 43, io mem 0xffb0
[   11.393368] dwc2 ffb0.usb: DWC OTG HCD START
[   11.393389] dwc2 ffb0.usb: dwc2_core_host_init(8481e010)
[   11.393403] dwc2 ffb0.usb: Initializing HCFG.FSLSPClkSel to 
[   11.393417] dwc2 ffb0.usb: initial grxfsiz=2000
[   11.393429] dwc2 ffb0.usb: new grxfsiz=0200
[   11.393441] dwc2 ffb0.usb: initial gnptxfsiz=20002000
[   11.393454] dwc2 ffb0.usb: new gnptxfsiz=02000200
[   11.393465] dwc2 ffb0.usb: initial hptxfsiz=20004000
[   11.393477] dwc2 ffb0.usb: new hptxfsiz=02000400
[   11.393495] dwc2 ffb0.usb: Init: Port Power? op_state=9
[   11.393502] dwc2 ffb0.usb: Init: Power Port (0)
[   11.393508] dwc2 ffb0.usb: dwc2_enable_host_interrupts()
[   11.393519] dwc2 ffb0.usb: DWC OTG HCD Has Root Hub
[   11.393979] usb usb1: Ne

Re: RFC: i2c designware gpio recovery

2017-05-03 Thread Tim Sander
Good Day Phil

Am Mittwoch, 3. Mai 2017, 09:30:50 CEST schrieb Phil Reid:
> G'day Tim,
> 
> On 1/05/2017 21:31, Tim Sander wrote:
> > Good Day Phil
> > 
> > Am Montag, 1. Mai 2017, 09:57:35 CEST schrieb Phil Reid:
> >>> So i took a look into the device tree file socfpga.dtsi and found that
> >>> the
> >>> reset lines where not defined (although available in the corresponding
> >>> reset manager). Is there a reason for this? Other components are
> >>> connected.
> >> 
> >> There's a few thing like that where the bootloader has been expected to
> >> setup the resets etc.
> > 
> > Yes, but if the resets are not connected in the device tree, the linux
> > drivers are not going to use them?
> 
> Yes, so they should be added. I don't think we should assume the bootloader
> sets things up. But that doesn't seem to have been the assumption with the
> Alter SOC's.
I will prepare a patch for this.

> >>> However with the patch below my previously sent patch works!
> >>> 
> >>> If there is interest in would cleanup the patch and send it in for
> >>> mainlining. I think the most unacceptable part would be this line:
> >>> +   ret = gpio_request_one(bri->scl_gpio, //GPIOF_OPEN_DRAIN |
> >>> My gpio drivers refuse to work as output as they have no open drain
> >>> mode.
> >>> So i wonder how to get this solved in a clean manner.
> >> 
> >> I thought the gpio system would emulate open drain by switching the pin
> >> between an input and output driven low in this case. How are you
> >> configuring the GPIO's in the FPGA?
> > 
> > I don't switch to GPIO mode. As the I2C logic is only pulling active low,
> > i only do a wired and with the gpio (implemented in the fpga) port output
> > on the output enable line for the SCL output.  SDA is only an additional
> > input for the second in fpga gpio port.
> > 
> > A picture should make it a clearer:
> > 
> > gpio scl--\
> > i2c   scl --&---i2c mode output pin (configured as fpga loan)
> > 
> > In my case the original i2c pins where occupied by some other logic
> > conflicting so the i2c pins had to be shifted to some other pins using
> > fpga logic. So it was just a matter of adding a two port gpio port.
> 
> I think I understand. What soft core gpio controller are you using?
I am using the standard altera fpga gpios.

> >> Given a couple of days I can test this on some flack i2c hardware I have
> >> with a Cyclone-V SOC. I'm interested in the functionality as well.
> > 
> > Sounds good. If you need some further input how i have configured the fpga
> > drop me a line.
> > 
> >> For i2c that are connected to the dedicated HPS pins it should be
> >> possible
> >> to reconfigure the pin mux controller (see system manager) in the HPS to
> >> avoid the need to go thru the fpga to get direct control. The docs say
> >> this
> >> is "unsupport" but I've done some test and it does seem to work.
> > 
> > As far as i know the internal jtag chain is only used in the bootloader
> > and there is no linux driver? But it shouldn't be a too big problem to
> > port it to linux.
> > 
> > What i am unsure about is the fact that the internal jtag chain which
> > controls the pinmuxing might wreak havoc on other pin states if you
> > reconfigure it?
> Have a look at the Cyclone V handbook "pin mux control Group REgister
> Descriptions" From what I can see the chain is used to configure IO
> standards and drive strength. But not the actual muxes
Mh, there is not much to see in Volume 3. Just one paragraph and then a 
very encouraging closing line:
"Do not modify the pin multiplexing selection registers after I/O 
configuration."

I find the following lines in my favorite bootloader a little more enlightening.
The following function:
https://git.pengutronix.de/cgit/barebox/tree/arch/arm/mach-socfpga/system-manager.c
gets fed with data from e.g.:
https://git.pengutronix.de/cgit/barebox/tree/arch/arm/boards/terasic-de0-nano-soc/pinmux_config.c
which doesn't look like being very memory mapped?

> >> I'm guess
> >> the no support is in a similar vain to the emac ptp FPGA interface
> >> couldn't
> >> be used when the HPS pin where used. But that got changed when the user's
> >> proved otherwise. There's just no pin ctrl driver yet to manage it.
> > 
> > I am interested in this ptp solution too. Is there anything on the way to
> > mainline?
> This was working the last time I tried it. I submitted a couple of minor
> patches for it a while ago. My hardware has a DSA switch attached to the
> ethernet port and so far I haven't figured out how to enable ptp when using
> the virtual lan ports on the DSA. But it worked fine when directly
> connected to a phy.
Thanks, will take a look.

Best regards
Tim




Re: RFC: i2c designware gpio recovery

2017-05-01 Thread Tim Sander
Good Day Phil

Am Montag, 1. Mai 2017, 09:57:35 CEST schrieb Phil Reid:
> > So i took a look into the device tree file socfpga.dtsi and found that the
> > reset lines where not defined (although available in the corresponding
> > reset manager). Is there a reason for this? Other components are
> > connected.
> 
> There's a few thing like that where the bootloader has been expected to
> setup the resets etc.
Yes, but if the resets are not connected in the device tree, the linux drivers
are not going to use them?

> > However with the patch below my previously sent patch works!
> > 
> > If there is interest in would cleanup the patch and send it in for
> > mainlining. I think the most unacceptable part would be this line:
> > +   ret = gpio_request_one(bri->scl_gpio, //GPIOF_OPEN_DRAIN |
> > My gpio drivers refuse to work as output as they have no open drain mode.
> > So i wonder how to get this solved in a clean manner.
> 
> I thought the gpio system would emulate open drain by switching the pin
> between an input and output driven low in this case. How are you
> configuring the GPIO's in the FPGA?
I don't switch to GPIO mode. As the I2C logic only pulls actively low, I
only do a wired AND with the gpio port output (implemented in the FPGA) on
the output enable line for the SCL output. SDA is only an additional input
for the second in-FPGA gpio port.

A picture should make it a clearer:

gpio scl--\
i2c   scl --&---i2c mode output pin (configured as fpga loan)

In my case the original i2c pins were occupied by some other, conflicting
logic, so the i2c pins had to be shifted to some other pins using FPGA logic.
So it was just a matter of adding a two-port gpio port.

> Given a couple of days I can test this on some flack i2c hardware I have
> with a Cyclone-V SOC. I'm interested in the functionality as well.
Sounds good. If you need some further input on how I have configured the FPGA,
drop me a line.

> For i2c that are connected to the dedicated HPS pins it should be possible
> to reconfigure the pin mux controller (see system manager) in the HPS to
> avoid the need to go thru the fpga to get direct control. The docs say this
> is "unsupport" but I've done some test and it does seem to work. 
As far as I know the internal jtag chain is only used in the bootloader and
there is no linux driver? But it shouldn't be too big a problem to port it
to linux.

What I am unsure about is whether the internal jtag chain which controls
the pinmuxing might wreak havoc on other pin states if you reconfigure it?

> I'm guess
> the no support is in a similar vain to the emac ptp FPGA interface couldn't
> be used when the HPS pin where used. But that got changed when the user's
> proved otherwise. There's just no pin ctrl driver yet to manage it.
I am interested in this ptp solution too. Is there anything on the way to
mainline?

Best regards
Tim


Re: RFC: i2c designware gpio recovery

2017-04-28 Thread Tim Sander
Hi 

After sending this mail I just found out how I could reset the i2c-1 controller
manually with
devmem 0xffd05014 32 0x2000
devmem 0xffd05014 32 0
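
The two devmem writes above overwrite the whole register, so every other
reset bit in it is written as 0 as well. A read-modify-write that only
touches one bit is the more careful variant. A minimal sketch, assuming
the register is reachable through a pointer (for example a mapping of
/dev/mem) and that bit 13 is the i2c-1 reset, which is only read off the
0x2000 value above. The helper names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Put one block into reset: set its bit, leave all other bits alone. */
static void reset_assert_bit(volatile uint32_t *reg, unsigned int bit)
{
	assert(bit < 32);
	*reg |= UINT32_C(1) << bit;
}

/* Release one block from reset: clear its bit, leave the rest alone. */
static void reset_deassert_bit(volatile uint32_t *reg, unsigned int bit)
{
	assert(bit < 32);
	*reg &= ~(UINT32_C(1) << bit);
}
```

With reg pointing at the mapped 0xffd05014 register, calling
reset_assert_bit(reg, 13) followed by reset_deassert_bit(reg, 13) would
pulse the same bit as the devmem commands without disturbing the others.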

So I took a look into the device tree file socfpga.dtsi and found that the
reset lines were not defined (although available in the corresponding reset
manager). Is there a reason for this? Other components are connected.

However with the patch below my previously sent patch works!

If there is interest I would clean up the patch and send it in for mainlining.
I think the most unacceptable part would be this line:
+   ret = gpio_request_one(bri->scl_gpio, //GPIOF_OPEN_DRAIN |
My gpio drivers refuse to work as outputs, as they have no open drain mode.
So I wonder how to get this solved in a clean manner.
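
One common way around a missing open drain mode is to emulate it in
software: never drive the line high, only switch the pin between input
(released, pulled up externally) and output driven low. A toy model of
that idea, which touches no real GPIO API; struct toy_pin, od_set() and
line_level() are made-up names and the pull-up is assumed to exist on
the board.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum pin_dir { PIN_INPUT, PIN_OUTPUT };

/* Toy model of one bus line as seen from our pin. */
struct toy_pin {
	enum pin_dir dir;
	bool out_level;       /* level driven while dir == PIN_OUTPUT */
	bool other_pulls_low; /* another device holds the line low */
};

/* Emulated open drain: "high" releases the line, "low" drives it low. */
static void od_set(struct toy_pin *pin, bool high)
{
	assert(pin != NULL);
	if (high) {
		pin->dir = PIN_INPUT; /* high-Z, the pull-up raises the line */
	} else {
		pin->out_level = false;
		pin->dir = PIN_OUTPUT;
	}
}

/* Resulting level on the line, assuming an external pull-up. */
static bool line_level(const struct toy_pin *pin)
{
	assert(pin != NULL);
	if (pin->dir == PIN_OUTPUT)
		return pin->out_level;
	return !pin->other_pulls_low;
}
```

Because the pin is never an output at high level, a slave stretching the
clock or holding the line low cannot cause a driver conflict.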

Best regards
Tim
---
 arch/arm/boot/dts/socfpga.dtsi | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm/boot/dts/socfpga.dtsi b/arch/arm/boot/dts/socfpga.dtsi
index 2c43c4d85dee..5f28632bc88c 100644
--- a/arch/arm/boot/dts/socfpga.dtsi
+++ b/arch/arm/boot/dts/socfpga.dtsi
@@ -643,6 +643,7 @@
#size-cells = <0>;
compatible = "snps,designware-i2c";
reg = <0xffc04000 0x1000>;
+   resets = <&rst I2C0_RESET>;
clocks = <&l4_sp_clk>;
interrupts = <0 158 0x4>;
status = "disabled";
@@ -653,6 +654,7 @@
#size-cells = <0>;
compatible = "snps,designware-i2c";
reg = <0xffc05000 0x1000>;
+   resets = <&rst I2C1_RESET>;
clocks = <&l4_sp_clk>;
interrupts = <0 159 0x4>;
status = "disabled";
@@ -663,6 +665,7 @@
#size-cells = <0>;
compatible = "snps,designware-i2c";
reg = <0xffc06000 0x1000>;
+   resets = <&rst I2C2_RESET>;
clocks = <&l4_sp_clk>;
interrupts = <0 160 0x4>;
status = "disabled";
@@ -673,6 +676,7 @@
#size-cells = <0>;
compatible = "snps,designware-i2c";
reg = <0xffc07000 0x1000>;
+   resets = <&rst I2C3_RESET>;
clocks = <&l4_sp_clk>;
interrupts = <0 161 0x4>;
status = "disabled";
-- 
2.7.4



RFC: i2c designware gpio recovery

2017-04-28 Thread Tim Sander
Hi

I have tried to add GPIO-based bus recovery to the designware i2c driver.
The attempt is attached below. I have an Intel (Altera) Cyclone V SoC platform
attached to a buggy power supply, which causes a lockup on the i2c controller:
an external device puts too much noise on the signal and destroys a clock
signal on its way to an i2c device.
I don't care too much about this buggy power supply, but as the cable to one
i2c slave is rather long, I fear that power surge conformance tests might
also cause some problems.
So I would rather be safe than sorry and recover from this problem.

I have created two gpio ports in the FPGA and have routed the designware pins
through the FPGA. I can now read the SDA input status and control SCL via
these gpios. The recovery gets triggered, and after that I get lots of:
i2c_designware ffc05000.i2c: controller timed out
so I guess that my i2c_dw_unprepare_recovery does not do enough to get the
controller back.
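
For readers unfamiliar with the recovery step itself, the usual
procedure is to clock SCL up to nine times until the slave releases SDA.
A minimal sketch of that loop follows. It is not the kernel's generic
recovery code: the callbacks, the toy slave model and all names are made
up for illustration, and a real implementation needs half-period delays
between the SCL edges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RECOVERY_CLOCKS 9

/* Hypothetical access to the two bus lines. */
struct bus_lines {
	void (*set_scl)(void *ctx, bool high);
	bool (*get_sda)(void *ctx);
	void *ctx;
};

/* Pulse SCL until SDA reads high.
 * Returns the number of pulses used, or -1 if SDA is still held low. */
static int recover_bus(const struct bus_lines *bus)
{
	int pulses;

	assert(bus != NULL && bus->set_scl != NULL && bus->get_sda != NULL);

	bus->set_scl(bus->ctx, true);
	for (pulses = 0; pulses < RECOVERY_CLOCKS; pulses++) {
		if (bus->get_sda(bus->ctx))
			return pulses; /* slave has released SDA */
		bus->set_scl(bus->ctx, false);
		bus->set_scl(bus->ctx, true);
	}
	return bus->get_sda(bus->ctx) ? pulses : -1;
}

/* Toy slave: holds SDA low until it has seen enough rising SCL edges. */
struct toy_slave {
	int clocks_needed;
	bool scl;
};

static void toy_set_scl(void *ctx, bool high)
{
	struct toy_slave *s = ctx;

	if (!s->scl && high && s->clocks_needed > 0)
		s->clocks_needed--;
	s->scl = high;
}

static bool toy_get_sda(void *ctx)
{
	const struct toy_slave *s = ctx;

	return s->clocks_needed == 0;
}
```

The sketch only covers the bus side. Getting the controller itself back
after the lines are free is a separate step.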

I have also noticed that there does not seem to be a reset controller in the
standard configuration, so reset_control_(de)assert(i_dev->rst) seems to do
nothing.

I have verified that the recovery of the bus works: if I do a warm reboot,
the i2c bus is working again, which it is not without recovery. So I am
pretty sure that the recovery works as far as the i2c slave no longer pulling
down SDA is concerned, and that my gpio pins are in the correct state so that
they do not interfere with the i2c operation of the controller.

Any ideas what I can do to get the controller back up and running with some
special treatment in i2c_dw_(un)prepare_recovery, without having to resort to
a warm reboot?

Best regards
Tim
---
 drivers/i2c/busses/i2c-designware-core.c| 15 ++--
 drivers/i2c/busses/i2c-designware-core.h|  1 +
 drivers/i2c/busses/i2c-designware-platdrv.c | 60 -
 drivers/i2c/i2c-core.c  | 10 -
 4 files changed, 78 insertions(+), 8 deletions(-)

diff --git a/drivers/i2c/busses/i2c-designware-core.c 
b/drivers/i2c/busses/i2c-designware-core.c
index 7a3faa551cf8..b98fab40ce9a 100644
--- a/drivers/i2c/busses/i2c-designware-core.c
+++ b/drivers/i2c/busses/i2c-designware-core.c
@@ -317,6 +317,7 @@ static void i2c_dw_release_lock(struct dw_i2c_dev *dev)
dev->release_lock(dev);
 }
 
+
 /**
  * i2c_dw_init() - initialize the designware i2c master hardware
  * @dev: device private data
@@ -463,7 +464,11 @@ static int i2c_dw_wait_bus_not_busy(struct dw_i2c_dev *dev)
while (dw_readl(dev, DW_IC_STATUS) & DW_IC_STATUS_ACTIVITY) {
if (timeout <= 0) {
dev_warn(dev->dev, "timeout waiting for bus ready\n");
-   return -ETIMEDOUT;
+   i2c_recover_bus(&dev->adapter);
+
+   if (dw_readl(dev, DW_IC_STATUS) & DW_IC_STATUS_ACTIVITY)
+   return -EIO;
+   else return 0;
}
timeout--;
usleep_range(1000, 1100);
@@ -719,9 +724,10 @@ static int i2c_dw_handle_tx_abort(struct dw_i2c_dev *dev)
for_each_set_bit(i, &abort_source, ARRAY_SIZE(abort_sources))
dev_err(dev->dev, "%s: %s\n", __func__, abort_sources[i]);
 
-   if (abort_source & DW_IC_TX_ARB_LOST)
+   if (abort_source & DW_IC_TX_ARB_LOST) {
+   i2c_recover_bus(&dev->adapter);
return -EAGAIN;
-   else if (abort_source & DW_IC_TX_ABRT_GCALL_READ)
+   } else if (abort_source & DW_IC_TX_ABRT_GCALL_READ)
return -EINVAL; /* wrong msgs[] data */
else
return -EIO;
@@ -766,6 +772,7 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg 
msgs[], int num)
if (!wait_for_completion_timeout(&dev->cmd_complete, adap->timeout)) {
dev_err(dev->dev, "controller timed out\n");
/* i2c_dw_init implicitly disables the adapter */
+   //i2c_recover_bus(&dev->adapter); 
i2c_dw_init(dev);
ret = -ETIMEDOUT;
goto done;
@@ -825,7 +832,7 @@ static const struct i2c_algorithm i2c_dw_algo = {
.functionality  = i2c_dw_func,
 };
 
-static u32 i2c_dw_read_clear_intrbits(struct dw_i2c_dev *dev)
+u32 i2c_dw_read_clear_intrbits(struct dw_i2c_dev *dev)
 {
u32 stat;
 
diff --git a/drivers/i2c/busses/i2c-designware-core.h 
b/drivers/i2c/busses/i2c-designware-core.h
index d9aaf1790e0e..8bdf51e19f21 100644
--- a/drivers/i2c/busses/i2c-designware-core.h
+++ b/drivers/i2c/busses/i2c-designware-core.h
@@ -126,6 +126,7 @@ struct dw_i2c_dev {
int (*acquire_lock)(struct dw_i2c_dev *dev);
void(*release_lock)(struct dw_i2c_dev *dev);
boolpm_runtime_disabled;
+   struct  i2c_bus_recovery_info rinfo;
 };
 
 #define ACCESS_SWAP0x0001
diff --git a/drivers/i2c/busses/i2c-designware-platdrv.c 
b/dri

4.11-rc6 and OF_DYNAMIC

2017-04-12 Thread Tim Sander
Hi

I have been testing the 4.11-rc6 kernel on an Intel (ex Altera) ARM SoC Cyclone
with dynamic firmware loading. As I didn't know how to trigger dynamic
loading from userspace, I also applied the following patch:

OF: DT-Overlay configfs interface
https://github.com/raspberrypi/linux/commit/8f1079750ce2fce4c4c2b4f8759ea57c8fb167d3

Now I got the loading of the devicetree overlay working :-)... but only if I
enable OF_UNITTEST. I find this a little strange, especially as it slows down
boot. I also tried adding:
select OF_DYNAMIC
to my OF_CONFIGFS patch, but this didn't help. Also patching away
"if OF_UNITTEST":
 config OF_DYNAMIC
-   bool "Support for dynamic device trees" if OF_UNITTEST
+   bool "Support for dynamic device trees"
didn't help?

Besides these nitpicks I am really happy to see this in mainline, as it really
makes working with dynamic FPGA-based hardware much cleaner!

Best regards
Tim


Re: [PATCH] xen/x86: Initialize per_cpu(xen_vcpu, 0) a little earlier

2016-10-03 Thread Sander Eikelenboom

On 2016-10-03 00:45, Boris Ostrovsky wrote:

xen_cpuhp_setup() calls mutex_lock() which, when CONFIG_DEBUG_MUTEXES
is defined, ends up calling xen_save_fl(). That routine expects
per_cpu(xen_vcpu, 0) to be already initialized.

Signed-off-by: Boris Ostrovsky 
Reported-by: Sander Eikelenboom 
---
Sander, please see if this fixes the problem. Thanks.


Hi Boris,

I have tested it and it fixes the dom0 crash in early boot for me.
Thanks again for investigating and for the swift fix!

--
Sander



 arch/x86/xen/enlighten.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 366b6ae..96c2dea 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1644,7 +1644,6 @@ asmlinkage __visible void __init xen_start_kernel(void)
    xen_initial_gdt = &per_cpu(gdt_page, 0);

    xen_smp_init();
-   WARN_ON(xen_cpuhp_setup());

 #ifdef CONFIG_ACPI_NUMA
    /*
@@ -1658,6 +1657,8 @@ asmlinkage __visible void __init xen_start_kernel(void)
       possible map and a non-dummy shared_info. */
    per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];

+   WARN_ON(xen_cpuhp_setup());
+
    local_irq_disable();
    early_boot_irqs_disabled = true;


Re: [Intel-gfx] Linux 4.8-rc?: WARNING: at drivers/gpu/drm/i915/intel_pm.c:7866 sandybridge_pcode_write Missing switch case (16) in gen6_check_mailbox_status

2016-09-07 Thread Sander Eikelenboom

On 2016-09-07 16:49, Jani Nikula wrote:

On Tue, 06 Sep 2016, li...@eikelenboom.it wrote:

On 2016-09-06 11:25, Jani Nikula wrote:

On Tue, 06 Sep 2016, li...@eikelenboom.it wrote:

L.S.,

Since one of the last 4.8 RC's i'm getting the warning below when
booting on my sandybridge based thinkpad.
 From what it seems the machine still works fine though.


What does 'lspci -nns 2' say for you?


00:02.0 VGA compatible controller [0300]: Intel Corporation 2nd
Generation Core Processor Family Integrated Graphics Controller
[8086:0126] (rev 09)


Fixed in drm-intel-fixes by

commit fc2780b66b15092ac68272644a522c1624c48547
Author: Chris Wilson 
Date:   Fri Aug 26 11:59:26 2016 +0100

drm/i915: Add GEN7_PCODE_MIN_FREQ_TABLE_GT_RATIO_OUT_OF_RANGE to 
SNB


BR,
Jani.


Works-for-me, thx!

--
Sander


Re: [Linux 4.8-rc1 Bisected] Clock on boot Xen HVM guest starts at 31/12/1999

2016-08-12 Thread Sander Eikelenboom

Friday, August 12, 2016, 7:29:37 PM, you wrote:

> Hi,

> On 12/08/2016 at 19:23:36 +0200, Sander Eikelenboom wrote :
>> L.S.,
>> 
>> I'm seeing an issue when using a Linux 4.8-rc1 kernel in a Xen HVM guest (PV 
>> guests and dom0 are uneffected). The clock is always set to 31/12/1999 on 
>> boot 
>> of the guest, instead of the system clock time.
>> 
>> Bisecting seems to point out commit:
>> 463a86304cae92e10277b47180ac59cf93982e5b char/genrtc: x86: remove remnants 
>> of asm/rtc.h
>> 

> Isn't that solved by http://patchwork.ozlabs.org/patch/657465/ ?


Ah yes that solves it (i only looked in your git-tree to see if there was a 
patch already), sorry for the noise !

--

Sander



[Linux 4.8-rc1 Bisected] Clock on boot Xen HVM guest starts at 31/12/1999

2016-08-12 Thread Sander Eikelenboom
L.S.,

I'm seeing an issue when using a Linux 4.8-rc1 kernel in a Xen HVM guest (PV
guests and dom0 are unaffected). The clock is always set to 31/12/1999 on boot
of the guest, instead of the system clock time.

Bisecting seems to point to commit:
463a86304cae92e10277b47180ac59cf93982e5b char/genrtc: x86: remove remnants of asm/rtc.h

--
Sander



Re: [PATCH v2] dts: add specific compatible type for Terasic DE0-NANO-SoC Board

2016-02-25 Thread Tim Sander
Hi Dinh

On Thursday 25 February 2016 10:56:28 Dinh Nguyen wrote:
> On 02/25/2016 04:38 AM, Steffen Trumtrar wrote:
> > Hi Tim!
> > 
> > On Thu, Feb 25, 2016 at 11:05:05AM +0100, Tim Sander wrote:
> >> From: Tim Sander 
> >> 
> >> Add a more specific compatible string:"terasic,de0-nano-soc" for
> >> respective board. Background: when checking for bootspec entries, some
> >> board specific fixups are not apropriate for board of the same platform
> >> ("altr,socfpga-cyclone5"). The same aproach is taken with the
> >> EBV-Socrates board.
> >> 
> >> Signed-off-by: Tim Sander 
> >> ---
> >> 
> >>  Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
> >>  arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts | 2 +-
> >>  2 files changed, 2 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt
> >> b/Documentation/devicetree/bindings/vendor-prefixes.txt index
> >> 72e2c5a..d1f7803 100644
> >> --- a/Documentation/devicetree/bindings/vendor-prefixes.txt
> >> +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
> >> @@ -230,6 +230,7 @@ synology   Synology, Inc.
> >> 
> >>  tbs   TBS Technologies
> >>  tcl   Toby Churchill Ltd.
> >>  technologic   Technologic Systems
> >> 
> >> +terasic   Terasic Inc.
> >> 
> >>  thine THine Electronics, Inc.
> >>  tiTexas Instruments
> >>  tlm   Trusted Logic Mobility
> > 
> > You should IMHO split this up in two patches.
> > First patch: add terasic
> 
> That's right. That patch will go through the DTS maintainer's tree.
Ah well, for such a simple patch it turns out more complicated than I thought :-)
Will do as soon as there is agreement on a name, which does not seem that easy...
> 
> >> diff --git a/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts
> >> b/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts index
> >> afea364..704aa9d 100644
> >> --- a/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts
> >> +++ b/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts
> >> @@ -18,7 +18,7 @@
> >> 
> >>  / {
> >>  
> >>model = "Terasic DE-0(Atlas)";
> >> 
> >> -  compatible = "altr,socfpga-cyclone5", "altr,socfpga";
> >> +  compatible = "terasic,de0-nano-soc","altr,socfpga-cyclone5",
> >> "altr,socfpga";
> So perhaps, "terasic,de0-sockit"?
> 
> > Second patch: this.
> 
> And I can take this one.
> 
> >>chosen {
> >>
> >>bootargs = "earlyprintk";
> > 
> > The naming of this board still confuses me though.
> > 
> > It has 3 different names now:
> > - de0_sockit.dts
> > - Terasic DE-0(Atlas)
> > - de0-nano-soc
> > 
> > And according to Terasic DE0-Nano-SoC is the same as Atlas-SoC with a
> > different software?! So all three names are actually correct ?! Weird.
> 
> I had a hard time understanding this myself. But from what I gather
> from[1], I just name the file de0_sockit.
As far as I remember there are different de0 and different sockit boards, so
the name does not seem to be that precise. I don't care much, but I would say that
de0-nano-soc is the most precise and easier to search for than atlas, which
might turn up more false positives.

But as long as there is a more selective name than cyclone5, everything is fine
with me.

Best regards
Tim



[PATCH v2] dts: add specific compatible type for Terasic DE0-NANO-SoC Board

2016-02-25 Thread Tim Sander
From: Tim Sander 

Add a more specific compatible string, "terasic,de0-nano-soc", for the
respective board.
Background: when checking for bootspec entries, some board-specific fixups are
not appropriate for all boards of the same platform ("altr,socfpga-cyclone5").
The same approach is taken with the EBV-Socrates board.

Signed-off-by: Tim Sander 
---
 Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
 arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index 72e2c5a..d1f7803 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -230,6 +230,7 @@ synology	Synology, Inc.
 tbs	TBS Technologies
 tcl	Toby Churchill Ltd.
 technologic	Technologic Systems
+terasic	Terasic Inc.
 thine	THine Electronics, Inc.
 ti	Texas Instruments
 tlm	Trusted Logic Mobility
diff --git a/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts b/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts
index afea364..704aa9d 100644
--- a/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts
+++ b/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts
@@ -18,7 +18,7 @@
 
 / {
 	model = "Terasic DE-0(Atlas)";
-	compatible = "altr,socfpga-cyclone5", "altr,socfpga";
+	compatible = "terasic,de0-nano-soc","altr,socfpga-cyclone5", "altr,socfpga";
 
 	chosen {
 		bootargs = "earlyprintk";
-- 
1.9.1
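
As a rough illustration of why the extra entry helps: a "compatible" property is ordered from most specific to most general, so board-specific code can match the first entry while generic SoC code still matches the later ones. The sketch below models such a lookup in Python. The compatible strings are the ones from the patch; the FIXUPS table and the select_fixup() helper are hypothetical and not taken from any bootloader or kernel source.

```python
# Hypothetical model of matching fixups against a "compatible" list.
# The strings come from the patch above; the table is illustrative only.

BOARD_COMPATIBLE = [
    "terasic,de0-nano-soc",    # most specific: the board itself
    "altr,socfpga-cyclone5",   # SoC family
    "altr,socfpga",            # most general
]

# Hypothetical fixup table keyed by compatible string.
FIXUPS = {
    "terasic,de0-nano-soc": "de0-nano-soc fixup",
    "altr,socfpga-cyclone5": "generic cyclone5 fixup",
}


def select_fixup(compatible, fixups):
    """Return the fixup of the first (most specific) matching entry."""
    for entry in compatible:
        if entry in fixups:
            return fixups[entry]
    return None


if __name__ == "__main__":
    # With the board-specific string, the board fixup wins.
    print(select_fixup(BOARD_COMPATIBLE, FIXUPS))
    # Without it, only the SoC-wide entry can match.
    print(select_fixup(BOARD_COMPATIBLE[1:], FIXUPS))
```

Without the board-specific string, every Cyclone V board would end up with the same SoC-wide fixup, which is the situation the commit message describes.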


Bisect results for 4.4.1-rt[4,5]

2016-02-17 Thread Tim Sander
Hi Sebastian

Am Freitag, 12. Februar 2016, 10:07:59 schrieben Sie:
...
> What about rt4? It is only the stable update so you should see here the
> numbers from rt3. If that is true and your numbers are stable it should
> be easy to run git bisect between rt4 and rt5. And looking at
>   https://git.kernel.org/rt/linux-rt-devel/h/v4.4.1-rt5
> the only non-cosmetic change in -rt5 that should affect you is the
> migrate-disable fixup from Mike.
I have done a bisect run; it's a rather innocent-looking one-liner which seems
to cause the problems. The numbers were reasonably stable, so I am pretty
confident that this is the patch giving ~26µs additional latency on the Altera
SoC platform:

eec2bf477ac674583a7d73b9d00f47c528b7266d is the first bad commit
commit eec2bf477ac674583a7d73b9d00f47c528b7266d
Author: Sebastian Andrzej Siewior 
Date:   Thu Feb 4 16:38:10 2016 +0100

kernel/perf: mark perf_cpu_context's timer as irqsafe

Otherwise we get a WARN_ON() backtrace and some events are reported as
"not counted".

Cc: stable...@vger.kernel.org
Reported-by: Yang Shi 
Signed-off-by: Sebastian Andrzej Siewior 

Here are the numbers of the bisect run for reference:
==> g0dd3bdd <==
# Total: 1 09829
# Min Latencies: 00010 00010
# Avg Latencies: 00020 00021
# Max Latencies: 00084 00101
# Histogram Overflows: 0 0

==> gbbc7819 <==
# Total: 1 09798
# Min Latencies: 00010 00010
# Avg Latencies: 00021 00021
# Max Latencies: 00086 00091
# Histogram Overflows: 0 0

==> geec2bf4 <==
# Total: 08713 1
# Min Latencies: 00010 00010
# Avg Latencies: 00020 00021
# Max Latencies: 00113 00070
# Histogram Overflows: 0 0
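
For anyone who wants to compare such runs mechanically, here is a minimal Python sketch that pulls the per-core maximum latency out of summaries shaped like the ones above. Only the "==> name <==" headers and "# Max Latencies:" lines are copied from this mail; the helper name max_latencies() is made up for illustration.

```python
import re

# Header and max-latency lines copied from the bisect summaries above.
SUMMARY = """\
==> g0dd3bdd <==
# Max Latencies: 00084 00101
==> gbbc7819 <==
# Max Latencies: 00086 00091
==> geec2bf4 <==
# Max Latencies: 00113 00070
"""


def max_latencies(text):
    """Map each '==> name <==' section to its per-core max latencies in us."""
    result = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"==> (\S+) <==", line)
        if header:
            current = header.group(1)
        elif line.startswith("# Max Latencies:") and current is not None:
            values = line.split(":", 1)[1].split()
            result[current] = [int(v) for v in values]
    return result


if __name__ == "__main__":
    for name, (core0, core1) in max_latencies(SUMMARY).items():
        print(f"{name}: core0={core0}us core1={core1}us")
```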

Best Regards
Tim


Re: [ANNOUNCE] 4.1.5-rt5 meant to reply to 4.4.1-rt5

2016-02-12 Thread Tim Sander
Hi Sebastian

As you correctly guessed, I was talking about 4.4.1-rt5 and not the 4.1
announcement I replied to by accident.

Am Freitag, 12. Februar 2016, 10:07:59 schrieb Sebastian Andrzej Siewior:
> On 02/12/2016 09:28 AM, Tim Sander wrote:
> > Hi Sebastian
> 
> Hi Tim,
> 
> > Am Sonntag, 16. August 2015, 15:56:30 schrieb Sebastian Andrzej Siewior:
> >> I'm pleased to announce the v4.1.5-rt5 patch set.
> > 
> > I have just tested it with a Altera SoC ARM v7. The latencies seem to have
> > gotten a little bit worse with each release. The first core has always
> > been
> > worse (presumably due to interrupt load) but now it dropped to 111µs (rt5)
> > from 76µs(rt3) and 54µs(rt2).
> 
> in -rt2 we had bug in migrate disable code which means each task was
> running on CPU0. This got partly fixed in -rt3. In -rt3 the scheduler
> could assign a task to CPU1 but the task should stay there for ever.
> This little detail was fixed in -rt5.
> This is one thing that comes to mind.
> Lazy-preempt should have been fixed in -rt3, too. This should not give
> you higher latencies but higher throughput.
> 
> What about rt4? It is only the stable update so you should see here the
> numbers from rt3. If that is true and your numbers are stable it should
> be easy to run git bisect between rt4 and rt5. And looking at
>   https://git.kernel.org/rt/linux-rt-devel/h/v4.4.1-rt5
> the only non-cosmetic change in -rt5 that should affect you is the
> migrate-disable fixup from Mike.
Ok, each run takes a couple of hours, so bisecting should take quite some time,
but I will give it a try. I started a test with 4.4.1-rt4; if the numbers are
within the 70µs ballpark, bisecting seems the way to go. If the numbers are
higher, I suspect that the stable update might play a role here. But we will see.

Best regards
Tim




[PATCH] dts: add specific compatible type for Terasic DE0-NANO-SoC Board

2016-02-12 Thread Tim Sander
From: Tim Sander 

Add a more specific compatible string, "terasic,de0-nano-soc", for the
respective board.
Background: when checking for bootspec entries, some board-specific fixups
are not appropriate for all boards of the same platform
("altr,socfpga-cyclone5").
The same approach is taken with the EBV-Socrates board.

Signed-off-by: Tim Sander 
---
 arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts b/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts
index 555e9caf21e1..3a427423168e 100644
--- a/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts
+++ b/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts
@@ -18,7 +18,7 @@
 
 / {
 	model = "Terasic DE-0(Atlas)";
-	compatible = "altr,socfpga-cyclone5", "altr,socfpga";
+	compatible = "terasic,de0-nano-soc"," altr,socfpga-cyclone5", "altr,socfpga";
 
 	chosen {
 		bootargs = "earlyprintk";
-- 
1.9.1


Re: [ANNOUNCE] 4.1.5-rt5

2016-02-12 Thread Tim Sander
  00
000310 00   00
000311 00   00
000312 00   00
000313 00   00
000314 00   00
000315 00   00
000316 00   00
000317 00   00
000318 00   00
000319 00   00
000320 00   00
000321 00   00
000322 00   00
000323 00   00
000324 00   00
000325 00   00
000326 00   00
000327 00   00
000328 00   00
000329 00   00
000330 00   00
000331 00   00
000332 00   00
000333 00   00
000334 00   00
000335 00   00
000336 00   00
000337 00   00
000338 00   00
000339 00   00
000340 00   00
000341 00   00
000342 00   00
000343 00   00
000344 00   00
000345 00   00
000346 00   00
000347 00   00
000348 00   00
000349 00   00
000350 00   00
000351 00   00
000352 00   00
000353 00   00
000354 00   00
000355 00   00
000356 00   00
000357 00   00
000358 00   00
000359 00   00
000360 00   00
000361 00   00
000362 00   00
000363 00   00
000364 00   00
000365 00   00
000366 00   00
000367 00   00
000368 00   00
000369 00   00
000370 00   00
000371 00   00
000372 00   00
000373 00   00
000374 00   00
000375 00   00
000376 00   00
000377 00   00
000378 00   00
000379 00   00
000380 00   00
000381 00   00
000382 00   00
000383 00   00
000384 00   00
000385 00   00
000386 00   00
000387 00   00
000388 00   00
000389 00   00
000390 00   00
000391 00   00
000392 00   00
000393 00   00
000394 00   00
000395 00   00
000396 00   00
000397 00   00
000398 00   00
000399 00   00
# Total: 10 99647
# Min Latencies: 9 9
# Avg Latencies: 00017 00010
# Max Latencies: 00054 00037
# Histogram Overflows: 0 0
# Histogram Overflow at cycle number:
# Thread 0:
# Thread 1:
sander@dabox:~/work/cp52-firmware$ cat 4.4-rt2_latency_1000_hackbench.txt 
#cyclictest -l10 -m -Sp99 -i200 -h400 -q
# /dev/cpu_dma_latency set to 0us
# Histogram
00 00   00
01 00   00
02 00   00
03 00   00
04 00   00
05 00   00
06 00   00
07 00   00
08 00   00
09 05   007986
10 231454   534988149
11 272449   365823273
12 187411   19724532
13 887425   7283747
14 5033801  5758908
15 29213484 10969358
16 13701430016483272
17 27971830311341821
18 2668805387889248
19 1555086375564401
20 63667078 3308210
21 25975207 2639064
22 14948747 2684771
23 9837945  2829949
24 5699296  1889347
25 2868412  636802
26 1257029  137374
27 477219   028261
28 177551   006650
29 075267   002358
30 036509   001261
31 018444   000578
32 008159   000228
33 003103   64
34 001208   22
35 000530   11
36 000259   01
37 000129   01
38 48   00
39 28   00
40 18   00
41 03   00
42 00   00
43 00   00
44 00   00
45 01   00
46 00   00
47 01   00
48 00   00
49 00   00
50 01   00
51 00   00
52 00   00
53 00   00
54 01   00
55 00   00
56 00   00
57 00   00
58 00   00
59 00   00
60 00   00
61 00   00
62 00   00
63 00   00
64 00   00
65 00   00
66 00   00
67 00   00
68 00   00
69 00   00
70 00   00
71 00   00
72 00   00
73 00   00
74 00   00
75 00   00
76 00   00
77 00   00
78 00   00
79 00   00
80 00   00
81 00   00
82 00   00
83 00   00
84 00   00
85 00   00
86 00   00
87 00   00
88 00   00
89 00   00
90 00   00
91 00   00
92 00   00
93 00   00
94 00   00
95 00   00
96 00   00
97 00   00
98 00   00
99 00   00
000100 00   00
000101 00   00
000102 00   00
000103 00   00
000104 00   00
000105 00   00
000106 00   00
000107 00   0

Re: [ANNOUNCE] 4.4-rc6-rt1

2016-01-07 Thread Tim Sander
Hi Sebastian

Thanks for your christmas present :-).

Am Mittwoch, 23. Dezember 2015, 23:57:55 schrieb Sebastian Andrzej Siewior:
> Please don't continue reading before christmas eve (or morning,
> depending on your schedule). If you don't celebrate christmas,
> well go ahead.
Ok, i have to admit i am a little late to the party.
> Dear RT folks!
> 
> I'm pleased to announce the v4.4-rc6-rt1 patch set. I tested it on my
> AMD A10, 64bit. Nothing exploded so far, filesystem is still there.
> I haven't tested it on anything else. Before someone asks: this does not
> mean it does *not* work on ARM I simply did not try it.
With the trivial compile patch below it is working on ARM:
Specifically two Cortex A9 on a CycloneV from Altera.

The performance without load looks good:
# Total: 1 1
# Min Latencies: 9 9
# Avg Latencies: 00010 00010
# Max Latencies: 00022 00033

A short run with hackbench load reveals a latency "island" from 54-69µs on the
first core.
There are no timer ticks with 34 to 53µs delay.
# Total: 00100 000999714
# Min Latencies: 00010 9
# Avg Latencies: 00017 00010
# Max Latencies: 00069 00029
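
To make the "island" notion concrete, here is a small Python sketch that finds contiguous runs of non-empty buckets in a latency histogram. The sample counts are invented purely to mimic the shape described above (samples up to 33µs, nothing from 34 to 53µs, then 54 to 69µs); they are not the measured data, and nonzero_runs() is a hypothetical helper.

```python
def nonzero_runs(hist):
    """Return (first, last) bucket pairs for each contiguous run of
    non-empty buckets; hist maps latency in us to a sample count."""
    runs = []
    start = prev = None
    for bucket in sorted(hist):
        if hist[bucket] == 0:
            continue
        if start is None:
            start = prev = bucket
        elif bucket == prev + 1:
            prev = bucket
        else:
            # A gap of empty buckets ends the current run.
            runs.append((start, prev))
            start = prev = bucket
    if start is not None:
        runs.append((start, prev))
    return runs


if __name__ == "__main__":
    # Invented counts, shaped like the description in the mail.
    hist = {us: 1 for us in range(10, 34)}
    hist.update({us: 0 for us in range(34, 54)})
    hist.update({us: 1 for us in range(54, 70)})
    print(nonzero_runs(hist))
```

More than one run in the output means there is at least one island separated from the main body of the distribution.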

I will test further and report if I find strange occurrences.

> If you are brave then download it, install it and have fun. If something
> breaks, please report it. If your machine starts blinking like a
> christmas tree while using the patch then *please* send a photo.
Sorry no photos, no special blinking.
Best regards
Tim

Signed-off-by: Tim Sander 

--- linux-4.4-rc6/kernel/time/hrtimer.c.orig	2016-01-06 16:56:32.573527206 +0100
+++ linux-4.4-rc6/kernel/time/hrtimer.c	2016-01-06 16:56:48.213215320 +0100
@@ -1435,6 +1435,7 @@
 
 #endif
 
+static enum hrtimer_restart hrtimer_wakeup(struct hrtimer *timer);
 
 static void __

[PATCH] PCI: Add quirk for Lite-On IT Corp. / Plextor M6e PCI Express

2016-01-04 Thread Tim Sander
Hi

Please consider this patch for the next release. Without it, the kernel won't
recognize my Plextor M6e PCIe disk. Please CC me as I am not on the list.

Signed-off-by: Tim Sander 

PCI: Add quirk for Lite-On IT Corp. / Plextor M6e PCI Express SSD [Marvell 
88SS9183] (rev 14)
---
 drivers/pci/quirks.c    | 4 ++++
 include/linux/pci_ids.h | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 7e32730..93ec5a02 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3620,6 +3620,10 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_TTI, 0x0642,
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_JMICRON,
 			 PCI_DEVICE_ID_JMICRON_JMB388_ESD,
 			 quirk_dma_func1_alias);
+/* https://bugzilla.kernel.org/show_bug.cgi?id=42679 */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LITE_ON,
+			 PCI_DEVICE_ID_PLEXTOR_M6E,
+			 quirk_dma_func1_alias);
 
 /*
  * Some devices DMA with the wrong devfn, not just the wrong function.
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index d9ba49c..01d8041 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2501,6 +2501,9 @@
 
 #define PCI_VENDOR_ID_ASMEDIA		0x1b21
 
+#define PCI_VENDOR_ID_LITE_ON		0x1c28
+#define PCI_DEVICE_ID_PLEXTOR_M6E	0x0122
+
 #define PCI_VENDOR_ID_CIRCUITCO		0x1cc8
 #define PCI_SUBSYSTEM_ID_CIRCUITCO_MINNOWBOARD 0x0001
 
-- 
1.9.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nf_unregister_net_hook: hook not found!

2015-12-30 Thread Sander Eikelenboom

On 2015-12-30 03:39, ebied...@xmission.com wrote:

Pablo Neira Ayuso  writes:


On Mon, Dec 28, 2015 at 09:05:03PM +0100, Sander Eikelenboom wrote:

Hi,

Running a 4.4.0-rc6 kernel i encountered the warning below.


Cc'ing Eric Biederman.

@Sander, could you provide a way to reproduce this?


I am on vacation until the new year, but if this is reproducible we
should be able to print out reg, reg->pf, reg->hooknum, reg->hook
to figure out which hook is having something very weird happen to it.

This is happening in some network namespace exit.

Eric



Unfortunately I have found no way to reproduce it.
The 13-second timestamp implies it was at boot, but I have only seen this once.

--
Sander


Thanks.


[   13.740472] ip_tables: (C) 2000-2006 Netfilter Core Team
[   13.936237] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
[   13.945391] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
[   13.947434] iwlwifi :03:00.0: Radio type=0x2-0x1-0x0
[   14.223990] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
[   14.232065] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
[   14.233570] iwlwifi :03:00.0: Radio type=0x2-0x1-0x0
[   14.328141] systemd-logind[2485]: Failed to start user service: 
Unknown

unit: user@117.service
[   14.356634] systemd-logind[2485]: New session c1 of user lightdm.
[   14.357320] [ cut here ]
[   14.357327] WARNING: CPU: 2 PID: 102 at net/netfilter/core.c:143
netfilter_net_exit+0x25/0x50()
[   14.357328] nf_unregister_net_hook: hook not found!
[   14.357371] Modules linked in: iptable_security(+) iptable_raw
iptable_filter ip_tables x_tables input_polldev bnep binfmt_misc nfsd
auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc 
uvcvideo

videobuf2_vmalloc iTCO_wdt arc4 videobuf2_memops iTCO_vendor_support
intel_rapl iosf_mbi videobuf2_v4l2 x86_pkg_temp_thermal 
intel_powerclamp
btusb coretemp snd_hda_codec_hdmi iwldvm videobuf2_core btrtl 
kvm_intel
v4l2_common mac80211 videodev btbcm snd_hda_codec_conexant btintel 
media kvm
snd_hda_codec_generic bluetooth psmouse thinkpad_acpi iwlwifi 
snd_hda_intel
pcspkr serio_raw snd_hda_codec nvram cfg80211 snd_hwdep snd_hda_core 
rfkill
i2c_i801 lpc_ich snd_pcm mfd_core snd_timer evdev snd soundcore 
shpchp
tpm_tis tpm algif_skcipher af_alg crct10dif_pclmul crc32_pclmul 
crc32c_intel

aesni_intel
[   14.357380]  ehci_pci sdhci_pci aes_x86_64 glue_helper ehci_hcd 
e1000e
lrw ablk_helper sg sdhci cryptd sd_mod ptp mmc_core usbcore 
usb_common

pps_core
[   14.357383] CPU: 2 PID: 102 Comm: kworker/u16:3 Tainted: G U
4.4.0-rc6-x220-20151224+ #1
[   14.357384] Hardware name: LENOVO 42912ZU/42912ZU, BIOS 8DET69WW 
(1.39 )

07/18/2013
[   14.357390] Workqueue: netns cleanup_net
[   14.357393]  81a27dfd 81359c69 88030e7cbd40
81060297
[   14.357395]  88030e820d80 88030e7cbd90 81c962d8
81c962e0
[   14.357397]  88030e7cbdf8 81060317 81a2c010
88030018
[   14.357398] Call Trace:
[   14.357405]  [] ? dump_stack+0x40/0x57
[   14.357408]  [] ? warn_slowpath_common+0x77/0xb0
[   14.357410]  [] ? warn_slowpath_fmt+0x47/0x50
[   14.357416]  [] ? mutex_lock+0x9/0x30
[   14.357418]  [] ? netfilter_net_exit+0x25/0x50
[   14.357421]  [] ? ops_exit_list.isra.6+0x2e/0x60
[   14.357424]  [] ? cleanup_net+0x1ab/0x280
[   14.357427]  [] ? process_one_work+0x133/0x330
[   14.357429]  [] ? worker_thread+0x60/0x470
[   14.357430]  [] ? process_one_work+0x330/0x330
[   14.357434]  [] ? kthread+0xca/0xe0
[   14.357436]  [] ? 
kthread_create_on_node+0x170/0x170

[   14.357439]  [] ? ret_from_fork+0x3f/0x70
[   14.357441]  [] ? 
kthread_create_on_node+0x170/0x170

[   14.357443] ---[ end trace 9984cc4b0e89f818 ]---
[   14.357443] [ cut here ]
[   14.357446] WARNING: CPU: 2 PID: 102 at net/netfilter/core.c:143
netfilter_net_exit+0x25/0x50()
[   14.357446] nf_unregister_net_hook: hook not found!
[   14.357472] Modules linked in: iptable_security(+) iptable_raw
iptable_filter ip_tables x_tables input_polldev bnep binfmt_misc nfsd
auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc 
uvcvideo

videobuf2_vmalloc iTCO_wdt arc4 videobuf2_memops iTCO_vendor_support
intel_rapl iosf_mbi videobuf2_v4l2 x86_pkg_temp_thermal 
intel_powerclamp
btusb coretemp snd_hda_codec_hdmi iwldvm videobuf2_core btrtl 
kvm_intel
v4l2_common mac80211 videodev btbcm snd_hda_codec_conexant btintel 
media kvm
snd_hda_codec_generic bluetooth psmouse thinkpad_acpi iwlwifi 
snd_hda_intel
pcspkr serio_raw snd_hda_codec nvram cfg80211 snd_hwdep snd_hda_core 
rfkill
i2c_i801 lpc_ich snd_pcm mfd_core snd_timer evdev snd soundcore 
shpchp
tpm_tis tpm algif_skcipher af_alg crct10dif_pclmul crc32_pclmul 
crc32c_intel

aesni_intel
[   14.357478]  ehci_pci sdhci_pci aes_x86_64 glue_helper ehci_hcd 
e1000e
lrw ablk_helper sg sdhci cryptd sd_mod ptp mmc_core usbcore 
usb_common

pps_core
[   14.357480] CPU: 2 PID: 102 Comm: kworker/u16:3 Tainted: G U  
W


nf_unregister_net_hook: hook not found!

2015-12-28 Thread Sander Eikelenboom

Hi,

Running a 4.4.0-rc6 kernel i encountered the warning below.

--
Sander



[   13.740472] ip_tables: (C) 2000-2006 Netfilter Core Team
[   13.936237] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
[   13.945391] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
[   13.947434] iwlwifi :03:00.0: Radio type=0x2-0x1-0x0
[   14.223990] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
[   14.232065] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
[   14.233570] iwlwifi :03:00.0: Radio type=0x2-0x1-0x0
[   14.328141] systemd-logind[2485]: Failed to start user service: 
Unknown unit: user@117.service

[   14.356634] systemd-logind[2485]: New session c1 of user lightdm.
[   14.357320] [ cut here ]
[   14.357327] WARNING: CPU: 2 PID: 102 at net/netfilter/core.c:143 
netfilter_net_exit+0x25/0x50()

[   14.357328] nf_unregister_net_hook: hook not found!
[   14.357371] Modules linked in: iptable_security(+) iptable_raw 
iptable_filter ip_tables x_tables input_polldev bnep binfmt_misc nfsd 
auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc uvcvideo 
videobuf2_vmalloc iTCO_wdt arc4 videobuf2_memops iTCO_vendor_support 
intel_rapl iosf_mbi videobuf2_v4l2 x86_pkg_temp_thermal intel_powerclamp 
btusb coretemp snd_hda_codec_hdmi iwldvm videobuf2_core btrtl kvm_intel 
v4l2_common mac80211 videodev btbcm snd_hda_codec_conexant btintel media 
kvm snd_hda_codec_generic bluetooth psmouse thinkpad_acpi iwlwifi 
snd_hda_intel pcspkr serio_raw snd_hda_codec nvram cfg80211 snd_hwdep 
snd_hda_core rfkill i2c_i801 lpc_ich snd_pcm mfd_core snd_timer evdev 
snd soundcore shpchp tpm_tis tpm algif_skcipher af_alg crct10dif_pclmul 
crc32_pclmul crc32c_intel aesni_intel
[   14.357380]  ehci_pci sdhci_pci aes_x86_64 glue_helper ehci_hcd 
e1000e lrw ablk_helper sg sdhci cryptd sd_mod ptp mmc_core usbcore 
usb_common pps_core
[   14.357383] CPU: 2 PID: 102 Comm: kworker/u16:3 Tainted: G U  
4.4.0-rc6-x220-20151224+ #1
[   14.357384] Hardware name: LENOVO 42912ZU/42912ZU, BIOS 8DET69WW 
(1.39 ) 07/18/2013

[   14.357390] Workqueue: netns cleanup_net
[   14.357393]  81a27dfd 81359c69 88030e7cbd40 
81060297
[   14.357395]  88030e820d80 88030e7cbd90 81c962d8 
81c962e0
[   14.357397]  88030e7cbdf8 81060317 81a2c010 
88030018

[   14.357398] Call Trace:
[   14.357405]  [] ? dump_stack+0x40/0x57
[   14.357408]  [] ? warn_slowpath_common+0x77/0xb0
[   14.357410]  [] ? warn_slowpath_fmt+0x47/0x50
[   14.357416]  [] ? mutex_lock+0x9/0x30
[   14.357418]  [] ? netfilter_net_exit+0x25/0x50
[   14.357421]  [] ? ops_exit_list.isra.6+0x2e/0x60
[   14.357424]  [] ? cleanup_net+0x1ab/0x280
[   14.357427]  [] ? process_one_work+0x133/0x330
[   14.357429]  [] ? worker_thread+0x60/0x470
[   14.357430]  [] ? process_one_work+0x330/0x330
[   14.357434]  [] ? kthread+0xca/0xe0
[   14.357436]  [] ? 
kthread_create_on_node+0x170/0x170

[   14.357439]  [] ? ret_from_fork+0x3f/0x70
[   14.357441]  [] ? 
kthread_create_on_node+0x170/0x170

[   14.357443] ---[ end trace 9984cc4b0e89f818 ]---
[   14.357443] [ cut here ]
[   14.357446] WARNING: CPU: 2 PID: 102 at net/netfilter/core.c:143 
netfilter_net_exit+0x25/0x50()

[   14.357446] nf_unregister_net_hook: hook not found!
[   14.357472] Modules linked in: iptable_security(+) iptable_raw 
iptable_filter ip_tables x_tables input_polldev bnep binfmt_misc nfsd 
auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc uvcvideo 
videobuf2_vmalloc iTCO_wdt arc4 videobuf2_memops iTCO_vendor_support 
intel_rapl iosf_mbi videobuf2_v4l2 x86_pkg_temp_thermal intel_powerclamp 
btusb coretemp snd_hda_codec_hdmi iwldvm videobuf2_core btrtl kvm_intel 
v4l2_common mac80211 videodev btbcm snd_hda_codec_conexant btintel media 
kvm snd_hda_codec_generic bluetooth psmouse thinkpad_acpi iwlwifi 
snd_hda_intel pcspkr serio_raw snd_hda_codec nvram cfg80211 snd_hwdep 
snd_hda_core rfkill i2c_i801 lpc_ich snd_pcm mfd_core snd_timer evdev 
snd soundcore shpchp tpm_tis tpm algif_skcipher af_alg crct10dif_pclmul 
crc32_pclmul crc32c_intel aesni_intel
[   14.357478]  ehci_pci sdhci_pci aes_x86_64 glue_helper ehci_hcd 
e1000e lrw ablk_helper sg sdhci cryptd sd_mod ptp mmc_core usbcore 
usb_common pps_core
[   14.357480] CPU: 2 PID: 102 Comm: kworker/u16:3 Tainted: G U  W   
4.4.0-rc6-x220-20151224+ #1
[   14.357481] Hardware name: LENOVO 42912ZU/42912ZU, BIOS 8DET69WW 
(1.39 ) 07/18/2013

[   14.357484] Workqueue: netns cleanup_net
[   14.357486]  81a27dfd 81359c69 88030e7cbd40 
81060297
[   14.357488]  88030e820db8 88030e7cbd90 81c962d8 
81c962e0
[   14.357489]  88030e7cbdf8 81060317 81a2c010 
88030018

[   14.357490] Call Trace:
[   14.357493]  [] ? dump_stack+0x40/0x57
[   14.357495]  [] ? warn_slowpath_common+0x77/0xb0
[   14.357497]  [] ? warn_slowpath_fmt+0x47/0x50
[   14.357499

Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu

2015-12-14 Thread Sander Eikelenboom

On 2015-12-14 20:48, Eric Shelton wrote:

Please note that the same issue appears to have been introduced in the
recent 4.2.7 kernel.  It perhaps has to do
with b4ff8389ed14b849354b59ce9b360bdefcdbf99c having a matching
commit e8d097151d309eb71f750bbf34e6a7ef6256da7e in linux-stable.git.  The
below patch to arch/x86/kernel/rtc.c was also effective for 4.2.7.

Eric


Hi Eric,

Yeah, it's unfortunate that the patch fixing up the other patches destined for
stable didn't make it in time for stable :(.
Anyhow, the chosen solution wasn't ideal, so there is now a V2 patch by
Boris. It hasn't been picked up yet, but hopefully will be soon (for the
patch see http://lkml.iu.edu/hypermail/linux/kernel/1512.1/03504.html)


--
Sander


On 2015-12-02 18:30, Sander Eikelenboom wrote:

On 2015-12-02 15:55, David Vrabel wrote:
> On 28/11/15 15:47, Sander Eikelenboom wrote:
>> genirq: Flags mismatch irq 8.  (hvc_console) vs. 
>> (rtc0)
>
> We shouldn't register an rtc_cmos device because its legacy irq
> conflicts with the irq needed for hvc0.  For a multi VCPU guest irq 8
> is
> in use for the pv spinlocks and this gets requested first, preventing
> the rtc device from probing.
>
> Does this patch fix it for you?
>
> David

It does, thanks.

Reported-and-tested-by: Sander Eikelenboom 

--
Sander

> 8<
> x86: rtc_cmos platform device requires legacy irqs
>
> Adding the rtc platform device when there are no legacy irqs (no
> legacy PIC) causes a conflict with other devices that end up using the
> same irq number.
>
> In a single VCPU PV guest we should have:
>
> /proc/interrupts:
>CPU0
>   0:   4934  xen-percpu-virq  timer0
>   1:  0  xen-percpu-ipi   spinlock0
>   2:  0  xen-percpu-ipi   resched0
>   3:  0  xen-percpu-ipi   callfunc0
>   4:  0  xen-percpu-virq  debug0
>   5:  0  xen-percpu-ipi   callfuncsingle0
>   6:  0  xen-percpu-ipi   irqwork0
>   7:321   xen-dyn-event xenbus
>   8: 90   xen-dyn-event hvc_console
>   ...
>
> But hvc_console cannot get its interrupt because it is already in use
> by rtc0 and the console does not work.
>
>   genirq: Flags mismatch irq 8.  (hvc_console) vs. 
> (rtc0)
>
> The rtc_cmos device requires a particular legacy irq so don't add it
> if there are no legacy irqs.
>
> Signed-off-by: David Vrabel 
> ---
>  arch/x86/kernel/rtc.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c
> index cd96852..07c70f1 100644
> --- a/arch/x86/kernel/rtc.c
> +++ b/arch/x86/kernel/rtc.c
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #ifdef CONFIG_X86_32
>  /*
> @@ -200,6 +201,10 @@ static __init int add_rtc_cmos(void)
>   }
>  #endif
>
> + /* RTC uses legacy IRQs. */
> + if (!nr_legacy_irqs())
> + return -ENODEV;
> +
>   platform_device_register(&rtc_device);
>   dev_info(&rtc_device.dev,
>"registered platform RTC device (no PNP device found)\n");
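
The guard in the patch above can be tried out in user space. What follows is only a minimal sketch under stated assumptions: `legacy_pic_model`, `model_nr_legacy_irqs()` and `model_add_rtc_cmos()` are hypothetical stand-ins invented for illustration, not kernel symbols, and only the early return on "no legacy irqs" is taken from the patch.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's legacy PIC descriptor (assumption). */
struct legacy_pic_model {
    int nr_legacy_irqs; /* 16 with a real i8259, 0 with the NULL legacy PIC */
};

/* Same shape as nr_legacy_irqs() quoted elsewhere in this thread. */
static int model_nr_legacy_irqs(const struct legacy_pic_model *pic)
{
    return pic->nr_legacy_irqs;
}

/*
 * Models the check added to add_rtc_cmos(): without legacy irqs the
 * platform RTC device is not registered, so it cannot claim irq 8.
 */
static int model_add_rtc_cmos(const struct legacy_pic_model *pic)
{
    if (!model_nr_legacy_irqs(pic))
        return -ENODEV;

    printf("registered platform RTC device (no PNP device found)\n");
    return 0;
}
```

With a descriptor reporting 0 legacy irqs (the PV guest case described in the thread) the model returns -ENODEV; with 16 it registers the device.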

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH] x86: Xen PV guests don't have the rtc_cmos platform device

2015-12-09 Thread Sander Eikelenboom

On 2015-12-09 15:42, Jan Beulich wrote:

On 09.12.15 at 15:32,  wrote:

--- a/arch/x86/kernel/rtc.c
+++ b/arch/x86/kernel/rtc.c
@@ -200,6 +200,9 @@ static __init int add_rtc_cmos(void)
}
 #endif

+   if (paravirt_enabled())
+   return -ENODEV;


What about Xen Dom0?

Jan


Checked that in my testing and that still worked:
[   16.733837] rtc_cmos 00:02: RTC can wake from S4
[   16.734030] rtc_cmos 00:02: rtc core: registered rtc_cmos as rtc0
[   16.734087] rtc_cmos 00:02: alarms up to one month, y3k, 114 bytes nvram
[   17.760329] rtc_cmos 00:02: setting system clock to 2015-12-09 08:43:48 UTC (1449650628)


and /dev/rtc and /dev/rtc0 both exist.

But i don't know the nitty gritty details about why ...
--
Sander


Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.

2015-12-02 Thread Sander Eikelenboom

On 2015-12-02 15:55, David Vrabel wrote:

On 28/11/15 15:47, Sander Eikelenboom wrote:
genirq: Flags mismatch irq 8.  (hvc_console) vs.  (rtc0)


We shouldn't register an rtc_cmos device because its legacy irq
conflicts with the irq needed for hvc0.  For a multi VCPU guest irq 8
is in use for the pv spinlocks and this gets requested first,
preventing the rtc device from probing.

Does this patch fix it for you?

David


It does, thanks.

Reported-and-tested-by: Sander Eikelenboom 

--
Sander


8<
x86: rtc_cmos platform device requires legacy irqs

Adding the rtc platform device when there are no legacy irqs (no
legacy PIC) causes a conflict with other devices that end up using the
same irq number.

In a single VCPU PV guest we should have:

/proc/interrupts:
   CPU0
  0:   4934  xen-percpu-virq  timer0
  1:  0  xen-percpu-ipi   spinlock0
  2:  0  xen-percpu-ipi   resched0
  3:  0  xen-percpu-ipi   callfunc0
  4:  0  xen-percpu-virq  debug0
  5:  0  xen-percpu-ipi   callfuncsingle0
  6:  0  xen-percpu-ipi   irqwork0
  7:321   xen-dyn-event xenbus
  8: 90   xen-dyn-event hvc_console
  ...

But hvc_console cannot get its interrupt because it is already in use
by rtc0 and the console does not work.

  genirq: Flags mismatch irq 8.  (hvc_console) vs.  (rtc0)


The rtc_cmos device requires a particular legacy irq so don't add it
if there are no legacy irqs.

Signed-off-by: David Vrabel 
---
 arch/x86/kernel/rtc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c
index cd96852..07c70f1 100644
--- a/arch/x86/kernel/rtc.c
+++ b/arch/x86/kernel/rtc.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 

 #ifdef CONFIG_X86_32
 /*
@@ -200,6 +201,10 @@ static __init int add_rtc_cmos(void)
}
 #endif

+   /* RTC uses legacy IRQs. */
+   if (!nr_legacy_irqs())
+   return -ENODEV;
+
platform_device_register(&rtc_device);
dev_info(&rtc_device.dev,
 "registered platform RTC device (no PNP device found)\n");



Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.

2015-12-02 Thread Sander Eikelenboom

On 2015-12-02 00:41, Boris Ostrovsky wrote:

On 12/01/2015 06:30 PM, Sander Eikelenboom wrote:

On 2015-12-02 00:19, Boris Ostrovsky wrote:

On 12/01/2015 06:00 PM, Sander Eikelenboom wrote:

On 2015-12-01 23:47, Boris Ostrovsky wrote:

On 11/30/2015 05:55 PM, Sander Eikelenboom wrote:

On 2015-11-30 23:54, Boris Ostrovsky wrote:

On 11/30/2015 04:46 PM, Sander Eikelenboom wrote:

On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote:
On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom 
wrote:

Hi all,

I have just tested a 4.4-rc2 kernel (current linus tree) + the 
tip tree

pulled on top.

Running this kernel under Xen on PV-guests with multiple vcpus 
goes well (on

idle < 10% cpu usage),
but a guest with only a single vcpu doesn't idle at all, it 
seems a kworker

thread is stuck:
root   569 98.0  0.0  0 0 ?R 16:02 12:47
[kworker/0:1]

Running a 4.3 kernel works fine with a single vcpu; bisecting would
probably be quite painful since there were some breakages this merge
window with respect to Xen pv-guests.

There are some differences in the diff's from booting a 4.3, 
4.4-single,

4.4-multi cpu boot:


Boris has been tracking a bunch of them. I am attaching the 
latest set of

patches I've to carry on top of v4.4-rc3.


Hi Konrad,

i will test those, see if it fixes all my issues and report back


They shouldn't help you ;-( (and I just saw a message from you 
confirming this)


The first one fixes a 32-bit bug (on bare metal too). The second 
fixes

a fatal bug for 32-bit PV guests. The other two are code
improvements/cleanup.


One of these patches also fixes a bug i was having with a 
pci-passthrough device in
a HVM that wasn't working (depending on which dom0-kernel i was 
using (4.3 or 4.4)),

but didn't report yet.

Fingers crossed but i think this pv-guest single vcpu issue is the 
last i'm troubled by for now ;)


I could not reproduce this, including with your kernel config file.


Hmm that's unpleasant :-\

Hmm other strange thing is it doesn't seem to affect dom0 (which is
also a PV guest), but only unprivileged ones.
All unprivileged pv-guests seem to have the irq issue, but only with
a single vcpu i seem to get the stuck kworker thread that got my
attention (with 2 vcpus that doesn't seem to happen, but you still
get the dmesg output and warnings about hvc).


Could it be that:

arch/x86/include/asm/i8259.h
static inline int nr_legacy_irqs(void)
{
return legacy_pic->nr_legacy_irqs;
}

returns something different in some circumstances ?


It should return 16 pre-8c058b0b9c34d8c8d7912880956543769323e2d8 and 
0

after that commit.

This is the last number that you see in
NR_IRQS:4352 nr_irqs:48 0
line.

I think you should be able to safely revert both
b4ff8389ed14b849354b59ce9b360bdefcdbf99c and
8c058b0b9c34d8c8d7912880956543769323e2d8 and see if it makes any
difference.


-boris



That was already underway compiling :)

And it does reveal that reverting both fixes the issue, no stuck 
kworker thread .. and no:
   genirq: Flags mismatch irq 8.  (hvc_console) vs.  
(rtc0)

   hvc_open: request_irq failed with rc -16.



Let me try it again tomorrow. Can you post your guest config file, Xen
version and host HW (Intel or AMD)? 'xl info' maybe?

-boris


Hi Boris,

A fresh new day .. a fresh new thought.
If i compare /proc/interrupts from a broken kernel and from a kernel with
both commits reverted, the thing that catches the eye is irq 8, just as
the dmesg message was telling.

In my PV guest rtc0 now seems to try and take irq 8, which was already
assigned to HVC ?
Sounds like some assumptions around the legacy range are broken somewhere.


What is the benefit of not just reserving the legacy range ?

Attached the /proc/interrupts from both boots.

--
Sander






What i did get was a conflict reverting
b4ff8389ed14b849354b59ce9b360bdefcdbf99c:
arch/arm64/include/asm/irq.h, although that shouldn't matter because
we are on x86 and not on arm.


-- Sander




-- Sander



-boris


___
Xen-devel mailing list
xen-de...@lists.xen.org
http://lists.xen.org/xen-devel

           CPU0
 16: 315536  xen-percpu-virq  timer0
 17:  0  xen-percpu-ipi   spinlock0
 18:  0  xen-percpu-ipi   resched0
 19:  0  xen-percpu-ipi   callfunc0
 20:  0  xen-percpu-virq  debug0
 21:  0  xen-percpu-ipi   callfuncsingle0
 22:  0  xen-percpu-ipi   irqwork0
 23:346   xen-dyn-event xenbus
 24:134   xen-dyn-event hvc_console
 25:  11464   xen-dyn-event blkif
 26:  28710   xen-dyn-event eth0-q0-tx
 27:  40136   xen-dyn-event eth0-q0-rx
NMI:  0   Non-maskable interrupts
LOC:  0   Local timer interrupts
SPU:  0   Spurious interrupts
PMI:  0   Performance monitoring interrupts
IWI:  0   IRQ work interrupts

Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.

2015-12-01 Thread Sander Eikelenboom

On 2015-12-02 00:41, Boris Ostrovsky wrote:

On 12/01/2015 06:30 PM, Sander Eikelenboom wrote:

On 2015-12-02 00:19, Boris Ostrovsky wrote:

On 12/01/2015 06:00 PM, Sander Eikelenboom wrote:

On 2015-12-01 23:47, Boris Ostrovsky wrote:

On 11/30/2015 05:55 PM, Sander Eikelenboom wrote:

On 2015-11-30 23:54, Boris Ostrovsky wrote:

On 11/30/2015 04:46 PM, Sander Eikelenboom wrote:

On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote:
On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom 
wrote:

Hi all,

I have just tested a 4.4-rc2 kernel (current linus tree) + the 
tip tree

pulled on top.

Running this kernel under Xen on PV-guests with multiple vcpus 
goes well (on

idle < 10% cpu usage),
but a guest with only a single vcpu doesn't idle at all, it 
seems a kworker

thread is stuck:
root   569 98.0  0.0  0 0 ?R 16:02 12:47
[kworker/0:1]

Running a 4.3 kernel works fine with a single vcpu; bisecting would
probably be quite painful since there were some breakages this merge
window with respect to Xen pv-guests.

There are some differences in the diff's from booting a 4.3, 
4.4-single,

4.4-multi cpu boot:


Boris has been tracking a bunch of them. I am attaching the 
latest set of

patches I've to carry on top of v4.4-rc3.


Hi Konrad,

i will test those, see if it fixes all my issues and report back


They shouldn't help you ;-( (and I just saw a message from you 
confirming this)


The first one fixes a 32-bit bug (on bare metal too). The second 
fixes

a fatal bug for 32-bit PV guests. The other two are code
improvements/cleanup.


One of these patches also fixes a bug i was having with a 
pci-passthrough device in
a HVM that wasn't working (depending on which dom0-kernel i was 
using (4.3 or 4.4)),

but didn't report yet.

Fingers crossed but i think this pv-guest single vcpu issue is the 
last i'm troubled by for now ;)


I could not reproduce this, including with your kernel config file.


Hmm that's unpleasant :-\

Hmm other strange thing is it doesn't seem to affect dom0 (which is
also a PV guest), but only unprivileged ones.
All unprivileged pv-guests seem to have the irq issue, but only with
a single vcpu i seem to get the stuck kworker thread that got my
attention (with 2 vcpus that doesn't seem to happen, but you still
get the dmesg output and warnings about hvc).


Could it be that:

arch/x86/include/asm/i8259.h
static inline int nr_legacy_irqs(void)
{
return legacy_pic->nr_legacy_irqs;
}

returns something different in some circumstances ?


It should return 16 pre-8c058b0b9c34d8c8d7912880956543769323e2d8 and 
0

after that commit.

This is the last number that you see in
NR_IRQS:4352 nr_irqs:48 0
line.

I think you should be able to safely revert both
b4ff8389ed14b849354b59ce9b360bdefcdbf99c and
8c058b0b9c34d8c8d7912880956543769323e2d8 and see if it makes any
difference.


-boris



That was already underway compiling :)

And it does reveal that reverting both fixes the issue, no stuck 
kworker thread .. and no:
   genirq: Flags mismatch irq 8.  (hvc_console) vs.  
(rtc0)

   hvc_open: request_irq failed with rc -16.



Let me try it again tomorrow. Can you post your guest config file, Xen
version and host HW (Intel or AMD)? 'xl info' maybe?

-boris


Guest config file == dom0 config file == the one i send you earlier.
Host is an AMD Phenom X6.

# xl info
host   : serveerstertje
release: 4.4.0-rc3-20151201-linus-doflr-boris+
version: #1 SMP Tue Dec 1 19:02:58 CET 2015
machine: x86_64
nr_cpus: 6
max_cpu_id : 5
nr_nodes   : 1
cores_per_socket   : 6
threads_per_core   : 1
cpu_mhz: 3200
hw_caps: 178bf3ff:efd3fbff::00011300:00802001::37ff:
virt_caps  : hvm hvm_directio
total_memory   : 20479
free_memory: 7745
sharing_freed_memory   : 0
sharing_used_memory: 0
outstanding_claims : 0
free_cpus  : 0
xen_major  : 4
xen_minor  : 7
xen_extra  : -unstable
xen_version: 4.7-unstable
xen_caps   : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler  : credit
xen_pagesize   : 4096
platform_params: virt_start=0x8000
xen_changeset  : Thu Nov 26 20:58:13 2015 +0100 git:5252636-dirty
xen_commandline: dom0_mem=1536M,max:1536M loglvl=all loglvl_guest=all console_timestamps=datems vga=gfx-1280x1024x32 cpuidle cpufreq=xen com1=38400,8n1 console=vga,com1 ivrs_ioapic[6]=00:14.0 iommu=on,verbose,debug,amd-iommu-debug conring_size=128k ucode=-1
cc_compiler: gcc-4.9.real (Debian 4.9.2-10) 4.9.2
cc_compile_by  : root
cc_compile_domain  : dyndns.org
cc_compile_date: Thu Nov 26 21:18:41 CET 201

Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.

2015-12-01 Thread Sander Eikelenboom

On 2015-12-02 00:19, Boris Ostrovsky wrote:

On 12/01/2015 06:00 PM, Sander Eikelenboom wrote:

On 2015-12-01 23:47, Boris Ostrovsky wrote:

On 11/30/2015 05:55 PM, Sander Eikelenboom wrote:

On 2015-11-30 23:54, Boris Ostrovsky wrote:

On 11/30/2015 04:46 PM, Sander Eikelenboom wrote:

On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote:
On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom 
wrote:

Hi all,

I have just tested a 4.4-rc2 kernel (current linus tree) + the 
tip tree

pulled on top.

Running this kernel under Xen on PV-guests with multiple vcpus 
goes well (on

idle < 10% cpu usage),
but a guest with only a single vcpu doesn't idle at all, it 
seems a kworker

thread is stuck:
root   569 98.0  0.0  0 0 ?R 16:02 12:47
[kworker/0:1]

Running a 4.3 kernel works fine with a single vcpu; bisecting would
probably be quite painful since there were some breakages this merge
window with respect to Xen pv-guests.

There are some differences in the diff's from booting a 4.3, 
4.4-single,

4.4-multi cpu boot:


Boris has been tracking a bunch of them. I am attaching the 
latest set of

patches I've to carry on top of v4.4-rc3.


Hi Konrad,

i will test those, see if it fixes all my issues and report back


They shouldn't help you ;-( (and I just saw a message from you 
confirming this)


The first one fixes a 32-bit bug (on bare metal too). The second 
fixes

a fatal bug for 32-bit PV guests. The other two are code
improvements/cleanup.


One of these patches also fixes a bug i was having with a 
pci-passthrough device in
a HVM that wasn't working (depending on which dom0-kernel i was 
using (4.3 or 4.4)),

but didn't report yet.

Fingers crossed but i think this pv-guest single vcpu issue is the 
last i'm troubled by for now ;)


I could not reproduce this, including with your kernel config file.


Hmm that's unpleasant :-\

Hmm other strange thing is it doesn't seem to affect dom0 (which is
also a PV guest), but only unprivileged ones.
All unprivileged pv-guests seem to have the irq issue, but only with
a single vcpu i seem to get the stuck kworker thread that got my
attention (with 2 vcpus that doesn't seem to happen, but you still
get the dmesg output and warnings about hvc).


Could it be that:

arch/x86/include/asm/i8259.h
static inline int nr_legacy_irqs(void)
{
return legacy_pic->nr_legacy_irqs;
}

returns something different in some circumstances ?


It should return 16 pre-8c058b0b9c34d8c8d7912880956543769323e2d8 and 0
after that commit.

This is the last number that you see in
NR_IRQS:4352 nr_irqs:48 0
line.
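
To compare boots quickly, that boot-log line can be picked apart mechanically. The following is a small sketch, not anything from the kernel tree: `parse_irq_line()` is a hypothetical helper, and it assumes the line has already been stripped of any dmesg timestamp prefix.

```c
#include <stdio.h>

/*
 * Parse a line of the form "NR_IRQS:4352 nr_irqs:48 0".
 * The third number is the one described above as the nr_legacy_irqs value.
 * Returns 0 on success, -1 if the line does not have this shape.
 */
static int parse_irq_line(const char *line, int *nr_irqs_max,
                          int *nr_irqs, int *nr_legacy)
{
    if (sscanf(line, "NR_IRQS:%d nr_irqs:%d %d",
               nr_irqs_max, nr_irqs, nr_legacy) != 3)
        return -1;
    return 0;
}
```

Fed the two lines quoted in this thread, the last field comes out as 16 for the 4.3 boot and 0 for the 4.4 boots.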

I think you should be able to safely revert both
b4ff8389ed14b849354b59ce9b360bdefcdbf99c and
8c058b0b9c34d8c8d7912880956543769323e2d8 and see if it makes any
difference.


-boris



That was already underway compiling :)

And it does reveal that reverting both fixes the issue, no stuck kworker 
thread .. and no:
   genirq: Flags mismatch irq 8.  (hvc_console) vs.  
(rtc0)

   hvc_open: request_irq failed with rc -16.

What i did get was a conflict reverting
b4ff8389ed14b849354b59ce9b360bdefcdbf99c:
arch/arm64/include/asm/irq.h, although that shouldn't matter because we
are on x86 and not on arm.
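
For reference, rc -16 in the hvc_open message is -EBUSY on Linux: the irq line was already taken by a handler that did not agree to share it. A deliberately simplified user-space sketch of that rule follows. `irq_slot`, `MODEL_IRQF_SHARED` and `model_request_irq()` are hypothetical names made up for illustration, and the real genirq code checks more than this (trigger type, for example).

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative flag bit; the name and value are assumptions of this sketch. */
#define MODEL_IRQF_SHARED 0x80u

/* One irq line with at most one recorded owner (simplified model). */
struct irq_slot {
    const char *owner;   /* NULL while the line is free */
    unsigned int flags;  /* flags the first owner requested with */
};

/*
 * The first requester gets the line. A later requester is only accepted
 * when both it and the existing owner asked for a shared line; otherwise
 * the request fails with -EBUSY, as in "request_irq failed with rc -16".
 */
static int model_request_irq(struct irq_slot *slot, const char *name,
                             unsigned int flags)
{
    if (slot->owner != NULL) {
        if (!(slot->flags & flags & MODEL_IRQF_SHARED))
            return -EBUSY;
        return 0; /* sharing accepted; the second owner is not tracked here */
    }

    slot->owner = name;
    slot->flags = flags;
    return 0;
}
```

In this model, rtc0 taking irq 8 first without the shared flag makes the later hvc_console request fail, which matches the order of events in the dmesg output above.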


--
Sander




-- Sander



-boris




Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.

2015-12-01 Thread Sander Eikelenboom

On 2015-12-01 23:47, Boris Ostrovsky wrote:

On 11/30/2015 05:55 PM, Sander Eikelenboom wrote:

On 2015-11-30 23:54, Boris Ostrovsky wrote:

On 11/30/2015 04:46 PM, Sander Eikelenboom wrote:

On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote:

On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom wrote:

Hi all,

I have just tested a 4.4-rc2 kernel (current linus tree) + the tip 
tree

pulled on top.

Running this kernel under Xen on PV-guests with multiple vcpus 
goes well (on

idle < 10% cpu usage),
but a guest with only a single vcpu doesn't idle at all, it seems 
a kworker

thread is stuck:
root   569 98.0  0.0  0 0 ?R16:02 12:47
[kworker/0:1]

Running a 4.3 kernel works fine with a single vcpu; bisecting would
probably be quite painful since there were some breakages this merge
window with respect to Xen pv-guests.

There are some differences in the diff's from booting a 4.3, 
4.4-single,

4.4-multi cpu boot:


Boris has been tracking a bunch of them. I am attaching the latest 
set of

patches I've to carry on top of v4.4-rc3.


Hi Konrad,

i will test those, see if it fixes all my issues and report back


They shouldn't help you ;-( (and I just saw a message from you 
confirming this)


The first one fixes a 32-bit bug (on bare metal too). The second 
fixes

a fatal bug for 32-bit PV guests. The other two are code
improvements/cleanup.


One of these patches also fixes a bug i was having with a 
pci-passthrough device in
a HVM that wasn't working (depending on which dom0-kernel i was using 
(4.3 or 4.4)),

but didn't report yet.

Fingers crossed but i think this pv-guest single vcpu issue is the 
last i'm troubled by for now ;)


I could not reproduce this, including with your kernel config file.


Hmm that's unpleasant :-\

Hmm other strange thing is it doesn't seem to affect dom0 (which is also
a PV guest), but only unprivileged ones.
All unprivileged pv-guests seem to have the irq issue, but only with a
single vcpu i seem to get the stuck kworker thread that got my attention
(with 2 vcpus that doesn't seem to happen, but you still get the dmesg
output and warnings about hvc).


Could it be that:

arch/x86/include/asm/i8259.h
static inline int nr_legacy_irqs(void)
{
return legacy_pic->nr_legacy_irqs;
}

returns something different in some circumstances ?

--
Sander



-boris



Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.

2015-12-01 Thread Sander Eikelenboom

On 2015-11-30 23:54, Boris Ostrovsky wrote:

On 11/30/2015 04:46 PM, Sander Eikelenboom wrote:

On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote:

On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom wrote:

Hi all,

I have just tested a 4.4-rc2 kernel (current linus tree) + the tip 
tree

pulled on top.

Running this kernel under Xen on PV-guests with multiple vcpus goes 
well (on

idle < 10% cpu usage),
but a guest with only a single vcpu doesn't idle at all, it seems a 
kworker

thread is stuck:
root   569 98.0  0.0  0 0 ?R16:02 12:47
[kworker/0:1]

Running a 4.3 kernel works fine with a single vcpu; bisecting would
probably be quite painful since there were some breakages this merge
window with respect to Xen pv-guests.

There are some differences in the diff's from booting a 4.3, 
4.4-single,

4.4-multi cpu boot:


Boris has been tracking a bunch of them. I am attaching the latest 
set of

patches I've to carry on top of v4.4-rc3.


Hi Konrad,

i will test those, see if it fixes all my issues and report back


They shouldn't help you ;-( (and I just saw a message from you 
confirming this)


The first one fixes a 32-bit bug (on bare metal too). The second fixes
a fatal bug for 32-bit PV guests. The other two are code
improvements/cleanup.




Thanks :)

-- Sander


Between 4.3 and 4.4-single:

-NR_IRQS:4352 nr_irqs:32 16
+Using NULL legacy PIC
+NR_IRQS:4352 nr_irqs:32 0


This is fine, as long as you have 
b4ff8389ed14b849354b59ce9b360bdefcdbf99c.




-cpu 0 spinlock event irq 17
+cpu 0 spinlock event irq 1


This is strange. I wouldn't expect spinlocks to use legacy irqs.



Could it be .. that with your fixup:
xen/events: Always allocate legacy interrupts on PV guests
(b4ff8389ed14b849354b59ce9b360bdefcdbf99c)
for commit:
x86/irq: Probe for PIC presence before allocating descs for legacy IRQs
(8c058b0b9c34d8c8d7912880956543769323e2d8)

that we now have the situation described in the commit message of
8c058b0b9c, but now for Xen PV instead of Hyper-V ?
(seems both Xen and Hyper-V want to achieve the same but have different
competing implementations ?)

(BTW 8c058b0b9c has a CC for stable ... so could be destined to cause
more trouble).


--
Sander




and later on:

-hctosys: unable to open rtc device (rtc0)
+rtc_cmos rtc_cmos: hctosys: unable to read the hardware clock

+genirq: Flags mismatch irq 8.  (hvc_console) vs.  (rtc0)

+hvc_open: request_irq failed with rc -16.
+Warning: unable to open an initial console.


between 4.4-single and 4.4-multi:

 Using NULL legacy PIC
-NR_IRQS:4352 nr_irqs:32 0
+NR_IRQS:4352 nr_irqs:48 0


This is probably OK too since nr_irqs depend on number of CPUs.

I think something is messed up with IRQ. I saw last week something
from setup_irq() generating a stack dump (warning) for rtc_cmos but
it appeared harmless at that time and now I don't see it anymore.

-boris




and later on:

-rtc_cmos rtc_cmos: hctosys: unable to read the hardware clock
+hctosys: unable to open rtc device (rtc0)

-genirq: Flags mismatch irq 8.  (hvc_console) vs.  (rtc0)

-hvc_open: request_irq failed with rc -16.
-Warning: unable to open an initial console.

attached:
- dmesg with 4.3 kernel with 1 vcpu
- dmesg with 4.4 kernel with 1 vcpu
- dmesg with 4.4 kernel with 2 vcpus
- .config of the 4.4 kernel is attached.

-- Sander





Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.

2015-11-30 Thread Sander Eikelenboom

On 2015-11-30 23:54, Boris Ostrovsky wrote:

On 11/30/2015 04:46 PM, Sander Eikelenboom wrote:

On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote:

On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom wrote:

Hi all,

I have just tested a 4.4-rc2 kernel (current linus tree) + the tip 
tree

pulled on top.

Running this kernel under Xen on PV-guests with multiple vcpus goes 
well (on

idle < 10% cpu usage),
but a guest with only a single vcpu doesn't idle at all, it seems a 
kworker

thread is stuck:
root   569 98.0  0.0  0 0 ?R16:02 12:47
[kworker/0:1]

Running a 4.3 kernel works fine with a single vcpu; bisecting would
probably be quite painful since there were some breakages this merge
window with respect to Xen pv-guests.

There are some differences in the diff's from booting a 4.3, 
4.4-single,

4.4-multi cpu boot:


Boris has been tracking a bunch of them. I am attaching the latest 
set of

patches I've to carry on top of v4.4-rc3.


Hi Konrad,

i will test those, see if it fixes all my issues and report back


They shouldn't help you ;-( (and I just saw a message from you 
confirming this)


The first one fixes a 32-bit bug (on bare metal too). The second fixes
a fatal bug for 32-bit PV guests. The other two are code
improvements/cleanup.


One of these patches also fixes a bug i was having with a 
pci-passthrough device in
a HVM that wasn't working (depending on which dom0-kernel i was using 
(4.3 or 4.4)),

but didn't report yet.

Fingers crossed but i think this pv-guest single vcpu issue is the last 
i'm troubled by for now ;)


--
Sander





Thanks :)

-- Sander


Between 4.3 and 4.4-single:

-NR_IRQS:4352 nr_irqs:32 16
+Using NULL legacy PIC
+NR_IRQS:4352 nr_irqs:32 0


This is fine, as long as you have 
b4ff8389ed14b849354b59ce9b360bdefcdbf99c.




-cpu 0 spinlock event irq 17
+cpu 0 spinlock event irq 1


This is strange. I wouldn't expect spinlocks to use legacy irqs.



and later on:

-hctosys: unable to open rtc device (rtc0)
+rtc_cmos rtc_cmos: hctosys: unable to read the hardware clock

+genirq: Flags mismatch irq 8.  (hvc_console) vs.  (rtc0)

+hvc_open: request_irq failed with rc -16.
+Warning: unable to open an initial console.


between 4.4-single and 4.4-multi:

 Using NULL legacy PIC
-NR_IRQS:4352 nr_irqs:32 0
+NR_IRQS:4352 nr_irqs:48 0


This is probably OK too since nr_irqs depend on number of CPUs.

I think something is messed up with IRQ. I saw last week something
from setup_irq() generating a stack dump (warning) for rtc_cmos but
it appeared harmless at that time and now I don't see it anymore.

-boris




and later on:

-rtc_cmos rtc_cmos: hctosys: unable to read the hardware clock
+hctosys: unable to open rtc device (rtc0)

-genirq: Flags mismatch irq 8.  (hvc_console) vs.  (rtc0)

-hvc_open: request_irq failed with rc -16.
-Warning: unable to open an initial console.

attached:
- dmesg with 4.3 kernel with 1 vcpu
- dmesg with 4.4 kernel with 1 vcpu
- dmesg with 4.4 kernel with 2 vcpus
- .config of the 4.4 kernel is attached.

-- Sander





Re: [linux-4.4-mw] Regression: cx25821: Oops: no 32bit PCI DMA

2015-11-15 Thread Sander Eikelenboom

On 2015-11-15 13:56, Christoph Hellwig wrote:

Hi Sander,

this is my fault.  Please see the patch which I already sent out
to Andrew and lkml.


Hi Christoph,

Thanks for the pointer, just tested and it works fine again.

--
Sander


Re: [Xen-devel] Linux 4.4 MW: Boot under Xen fails with CONFIG_DEBUG_WX enabled: RIP: ptdump_walk_pgd_level_core

2015-11-05 Thread Sander Eikelenboom

Thursday, November 5, 2015, 2:53:40 PM, you wrote:

> On 11/05/2015 04:13 AM, Sander Eikelenboom wrote:
>>
>> It makes "cat /sys/kernel/debug/kernel_page_tables" work and
>> prevents a kernel with CONFIG_DEBUG_WX=y from crashing at boot.

> Great. Our nightly runs also failed spectacularly due to this bug.

>>
>> It now does give a warning about an insecure W+X mapping, so
>> CONFIG_DEBUG_WX=y seems to be working. No idea how to interpret it
>> though (and if it's a legit warning).
>>
>> -- 
>> Sander
>>
>> [   19.034706] Freeing unused kernel memory: 1104K (822fc000 - 8241)
>> [   19.041339] Write protecting the kernel read-only data: 18432k
>> [   19.052596] Freeing unused kernel memory: 1144K (880001ae2000 - 880001c0)
>> [   19.060285] Freeing unused kernel memory: 1560K (88000207a000 - 88000220)
>> [   19.067079] [ cut here ]
>> [   19.073931] WARNING: CPU: 5 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x619/0x7e0()

> Yes, this apparently is a known issue: https://lkml.org/lkml/2015/11/4/476

> -boris

Ah thx for the pointer :)

--
Sander






Re: [Xen-devel] Linux 4.4 MW: Boot under Xen fails with CONFIG_DEBUG_WX enabled: RIP: ptdump_walk_pgd_level_core

2015-11-05 Thread Sander Eikelenboom

On 2015-11-05 00:13, Boris Ostrovsky wrote:

On 11/04/2015 03:02 PM, Sander Eikelenboom wrote:

On 2015-11-04 19:47, Stephen Smalley wrote:

On 11/04/2015 01:28 PM, Sander Eikelenboom wrote:

On 2015-11-04 16:52, Stephen Smalley wrote:

On 11/04/2015 06:55 AM, Sander Eikelenboom wrote:

Hi All,

I just tried to boot with the current linus mergewindow tree under Xen.
It fails with a kernel panic at boot with the new "CONFIG_DEBUG_WX"
option enabled.
Disabling it makes the kernel boot fine.

The splat:
[   18.424241] Freeing unused kernel memory: 1104K (822fc000 - 8241)
[   18.430314] Write protecting the kernel read-only data: 18432k
[   18.441054] Freeing unused kernel memory: 1144K (880001ae2000 - 880001c0)
[   18.447966] Freeing unused kernel memory: 1560K (88000207a000 - 88000220)
[   18.453947] BUG: unable to handle kernel paging request at 88055c883000
[   18.459943] IP: [] ptdump_walk_pgd_level_core+0x20e/0x440
[   18.465847] PGD 2212067 PUD 0
[   18.471564] Oops:  [#1] SMP
[   18.477248] Modules linked in:
[   18.482918] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-mw-20151104-linus-doflr+ #1
[   18.488804] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
[   18.494778] task: 880059b9 ti: 880059b98000 task.ti: 880059b98000
[   18.500852] RIP: e030:[] [] ptdump_walk_pgd_level_core+0x20e/0x440
[   18.507102] RSP: e02b:880059b9be48  EFLAGS: 00010296
[   18.513351] RAX: 88055c883000 RBX: 81ae2000 RCX: 8800
[   18.519733] RDX: 0067 RSI: 880059b9be98 RDI: 88001000
[   18.526129] RBP: 880059b9bf00 R08:  R09:
[   18.532522] R10: 88005fd0e790 R11: 0001 R12: 88008000
[   18.538891] R13: cfff R14: 880059b9be98 R15:
[   18.545247] FS:  () GS:88005f68() knlGS:
[   18.551708] CS:  e033 DS:  ES:  CR0: 8005003b
[   18.558153] CR2: 88055c883000 CR3: 02211000 CR4: 0660
[   18.564686] Stack:
[   18.571106]  000159b9be50 82211000 88055c884000 0800
[   18.577704]  8000 88055c883000 0007 88005fd0e790
[   18.584291]  880059b9bed8 81156ace 0001
[   18.590916] Call Trace:
[   18.597458]  [] ? free_reserved_area+0x11e/0x120
[   18.604180]  [] ptdump_walk_pgd_level_checkwx+0x12/0x20
[   18.611014]  [] mark_rodata_ro+0xe9/0xf0
[   18.617819]  [] ? rest_init+0x80/0x80
[   18.624512]  [] kernel_init+0x18/0xe0
[   18.631095]  [] ret_from_fork+0x3f/0x70
[   18.637650]  [] ? rest_init+0x80/0x80
[   18.644178] Code: 70 ff ff ff 48 3b 85 58 ff ff ff 0f 84 c0 fe ff ff 48 8b 85 68 ff ff ff 48 c1 e0 10 48 c1 f8 10 48 89 45 b0 48 8b 85 70 ff ff ff <48> 8b 38 48 85 ff 0f 85 4e ff ff ff b9 02 00 00 00 31 d2 4c 89
[   18.658246] RIP  [] ptdump_walk_pgd_level_core+0x20e/0x440
[   18.665211]  RSP 
[   18.672073] CR2: 88055c883000
[   18.678852] ---[ end trace d84e34461c40637a ]---
[   18.685641] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009
[   18.685641]
[   18.699520] Kernel Offset: disable



What's your .config?  Does cat /sys/kernel/debug/kernel_page_tables
produce a similar fault even with CONFIG_DEBUG_WX=n?


.config is attached

Hmm that sysfs file doesn't seem to exist then:
# cat /sys/kernel/debug/kernel_page_tables
cat: /sys/kernel/debug/kernel_page_tables: No such file or directory


Needs CONFIG_X86_PTDUMP=y.
Also assumes you have debugfs mounted there.


Recompiled, and the result is that it also blows up:



Can you try this:


diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 1bf417e..b534216 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -362,8 +362,13 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
 bool checkwx)
 {
 #ifdef CONFIG_X86_64
+/* 8000 - 87ff is reserved for hypervisor */
+#define is_hypervisor_range(idx)  (paravirt_enabled() && \
+  ((idx >= pgd_index(__PAGE_OFFSET) - 16) && \
+   (idx < pgd_index(__PAGE_OFFSET))))
 pgd_t *start = (pgd_t *) &init_level4_pgt;
 #else
+#define is_hypervisor_range(idx)   0
 pgd_t *start = swapper_pg_dir;
 #endif
 pgprotval_t prot;
@@ -381,7 +386,7 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,

 for (i = 0; i < PTRS_PER_PGD; i++) {
 st.current_address = normalize_addr(i * PGD_LEVEL_MULT);
-if (!pgd_none(*start)) {
+if (!pgd_none(*start) && !is_hypervisor_range(i)) {
 if (pgd_large(*start) || !pgd_present(*start)) {
 prot = pgd_flags(*start);
 note_page(m, &st, __
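
For readers following along, here is a rough standalone sketch of what the is_hypervisor_range() check in the patch above works out to. The constants are assumptions on my side (4-level paging with PGDIR_SHIFT 39, 512 PGD entries, and a direct-map base of 0xffff880000000000); none of them are quoted from this thread, and the bool parameter is a hypothetical stand-in for paravirt_enabled().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86_64 4-level paging constants (not taken from the thread). */
#define PGDIR_SHIFT   39
#define PTRS_PER_PGD  512
#define PAGE_OFFSET   UINT64_C(0xffff880000000000)

/* Index of the top-level (PGD) slot that maps a virtual address. */
static unsigned int pgd_index(uint64_t addr)
{
    return (unsigned int)((addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/*
 * Stand-in for the patch's macro. 'paravirt' is a hypothetical
 * replacement for paravirt_enabled() so the check can run in userspace.
 */
static bool is_hypervisor_range(unsigned int idx, bool paravirt)
{
    return paravirt &&
           idx >= pgd_index(PAGE_OFFSET) - 16 &&
           idx < pgd_index(PAGE_OFFSET);
}

/* Count how many PGD slots a walker using the check would skip. */
static unsigned int count_skipped(bool paravirt)
{
    unsigned int i, skipped = 0;

    for (i = 0; i < PTRS_PER_PGD; i++)
        if (is_hypervisor_range(i, paravirt))
            skipped++;
    return skipped;
}
```

Under those assumptions pgd_index(PAGE_OFFSET) comes out as 272, so the walker would skip the 16 slots 256 through 271 directly below the direct map when running paravirtualized, and skip nothing on bare metal.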

Re: Linux 4.4 MW: Boot under Xen fails with CONFIG_DEBUG_WX enabled: RIP: ptdump_walk_pgd_level_core

2015-11-04 Thread Sander Eikelenboom

On 2015-11-04 19:47, Stephen Smalley wrote:

On 11/04/2015 01:28 PM, Sander Eikelenboom wrote:

On 2015-11-04 16:52, Stephen Smalley wrote:

On 11/04/2015 06:55 AM, Sander Eikelenboom wrote:

Hi All,

I just tried to boot with the current linus mergewindow tree under 
Xen.

It fails with a kernel panic at boot with the new "CONFIG_DEBUG_WX"
option enabled.
Disabling it makes the kernel boot fine.

The splat:
[   18.424241] Freeing unused kernel memory: 1104K (822fc000 - 8241)
[   18.430314] Write protecting the kernel read-only data: 18432k
[   18.441054] Freeing unused kernel memory: 1144K (880001ae2000 - 880001c0)
[   18.447966] Freeing unused kernel memory: 1560K (88000207a000 - 88000220)
[   18.453947] BUG: unable to handle kernel paging request at
88055c883000
[   18.459943] IP: []
ptdump_walk_pgd_level_core+0x20e/0x440
[   18.465847] PGD 2212067 PUD 0
[   18.471564] Oops:  [#1] SMP
[   18.477248] Modules linked in:
[   18.482918] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
4.3.0-mw-20151104-linus-doflr+ #1
[   18.488804] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
[   18.494778] task: 880059b9 ti: 880059b98000 task.ti:
880059b98000
[   18.500852] RIP: e030:[]  []
ptdump_walk_pgd_level_core+0x20e/0x440
[   18.507102] RSP: e02b:880059b9be48  EFLAGS: 00010296
[   18.513351] RAX: 88055c883000 RBX: 81ae2000 RCX:
8800
[   18.519733] RDX: 0067 RSI: 880059b9be98 RDI:
88001000
[   18.526129] RBP: 880059b9bf00 R08:  R09:

[   18.532522] R10: 88005fd0e790 R11: 0001 R12:
88008000
[   18.538891] R13: cfff R14: 880059b9be98 R15:

[   18.545247] FS:  () GS:88005f68()
knlGS:
[   18.551708] CS:  e033 DS:  ES:  CR0: 8005003b
[   18.558153] CR2: 88055c883000 CR3: 02211000 CR4:
0660
[   18.564686] Stack:
[   18.571106]  000159b9be50 82211000 88055c884000
0800
[   18.577704]  8000 88055c883000 0007
88005fd0e790
[   18.584291]  880059b9bed8 81156ace 0001

[   18.590916] Call Trace:
[   18.597458]  [] ? free_reserved_area+0x11e/0x120
[   18.604180]  [] ptdump_walk_pgd_level_checkwx+0x12/0x20
[   18.611014]  [] mark_rodata_ro+0xe9/0xf0
[   18.617819]  [] ? rest_init+0x80/0x80
[   18.624512]  [] kernel_init+0x18/0xe0
[   18.631095]  [] ret_from_fork+0x3f/0x70
[   18.637650]  [] ? rest_init+0x80/0x80
[   18.644178] Code: 70 ff ff ff 48 3b 85 58 ff ff ff 0f 84 c0 fe ff ff 48 8b 85 68 ff ff ff 48 c1 e0 10 48 c1 f8 10 48 89 45 b0 48 8b 85 70 ff ff ff <48> 8b 38 48 85 ff 0f 85 4e ff ff ff b9 02 00 00 00 31 d2 4c 89
[   18.658246] RIP  []
ptdump_walk_pgd_level_core+0x20e/0x440
[   18.665211]  RSP 
[   18.672073] CR2: 88055c883000
[   18.678852] ---[ end trace d84e34461c40637a ]---
[   18.685641] Kernel panic - not syncing: Attempted to kill init!
exitcode=0x0009
[   18.685641]
[   18.699520] Kernel Offset: disable



What's your .config?  Does cat /sys/kernel/debug/kernel_page_tables
produce a similar fault even with CONFIG_DEBUG_WX=n?


.config is attached

Hmm that sysfs file doesn't seem to exist then:
# cat /sys/kernel/debug/kernel_page_tables
cat: /sys/kernel/debug/kernel_page_tables: No such file or directory


Needs CONFIG_X86_PTDUMP=y.
Also assumes you have debugfs mounted there.


Recompiled, and the result is that it also blows up:
[  902.389247] BUG: unable to handle kernel paging request at 
88055c883000
[  902.402749] IP: [] ptdump_walk_pgd_level_core+0x20e/0x440
[  902.416261] PGD 2212067 PUD 0
[  902.427768] Oops:  [#1] SMP
[  902.438137] Modules linked in:
[  902.448299] CPU: 2 PID: 21951 Comm: cat Not tainted 
4.3.0-mw-20151104-linus-doflr-nodebugwx-withptdump+ #1
[  902.458581] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS 
V1.8B1 09/13/2010
[  902.468850] task: 88004b49e300 ti: 88005928c000 task.ti: 
88005928c000
[  902.479133] RIP: e030:[]  [] ptdump_walk_pgd_level_core+0x20e/0x440
[  902.489536] RSP: e02b:88005928fd20  EFLAGS: 00010296
[  902.499692] RAX: 88055c883000 RBX:  RCX: 
8800
[  902.509755] RDX: 0067 RSI: 88005928fd70 RDI: 
88001000
[  902.519680] RBP: 88005928fdd8 R08: 1000 R09: 

[  902.529555] R10:  R11: 0246 R12: 
88005928ff20
[  902.539349] R13: cfff R14: 88005928fd70 R15: 
880033c773c0
[  902.549081] FS:  7f56b07d4700() GS:88005f68() knlGS:
[  902.558690] CS:  e033 DS:  ES:  CR0: 8005003b
[  902.568111] CR2: 88055c883000 CR3: 4563f000 CR4: 0660
[  902.57

Re: Linux 4.4 MW: Boot under Xen fails with CONFIG_DEBUG_WX enabled: RIP: ptdump_walk_pgd_level_core

2015-11-04 Thread Sander Eikelenboom

On 2015-11-04 16:52, Stephen Smalley wrote:

On 11/04/2015 06:55 AM, Sander Eikelenboom wrote:

Hi All,

I just tried to boot with the current linus mergewindow tree under 
Xen.

It fails with a kernel panic at boot with the new "CONFIG_DEBUG_WX"
option enabled.
Disabling it makes the kernel boot fine.

The splat:
[   18.424241] Freeing unused kernel memory: 1104K (822fc000 -
8241)
[   18.430314] Write protecting the kernel read-only data: 18432k
[   18.441054] Freeing unused kernel memory: 1144K (880001ae2000 -
880001c0)
[   18.447966] Freeing unused kernel memory: 1560K (88000207a000 -
88000220)
[   18.453947] BUG: unable to handle kernel paging request at
88055c883000
[   18.459943] IP: []
ptdump_walk_pgd_level_core+0x20e/0x440
[   18.465847] PGD 2212067 PUD 0
[   18.471564] Oops:  [#1] SMP
[   18.477248] Modules linked in:
[   18.482918] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
4.3.0-mw-20151104-linus-doflr+ #1
[   18.488804] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
[   18.494778] task: 880059b9 ti: 880059b98000 task.ti:
880059b98000
[   18.500852] RIP: e030:[]  []
ptdump_walk_pgd_level_core+0x20e/0x440
[   18.507102] RSP: e02b:880059b9be48  EFLAGS: 00010296
[   18.513351] RAX: 88055c883000 RBX: 81ae2000 RCX:
8800
[   18.519733] RDX: 0067 RSI: 880059b9be98 RDI:
88001000
[   18.526129] RBP: 880059b9bf00 R08:  R09:

[   18.532522] R10: 88005fd0e790 R11: 0001 R12:
88008000
[   18.538891] R13: cfff R14: 880059b9be98 R15:

[   18.545247] FS:  () GS:88005f68()
knlGS:
[   18.551708] CS:  e033 DS:  ES:  CR0: 8005003b
[   18.558153] CR2: 88055c883000 CR3: 02211000 CR4:
0660
[   18.564686] Stack:
[   18.571106]  000159b9be50 82211000 88055c884000
0800
[   18.577704]  8000 88055c883000 0007
88005fd0e790
[   18.584291]  880059b9bed8 81156ace 0001

[   18.590916] Call Trace:
[   18.597458]  [] ? free_reserved_area+0x11e/0x120
[   18.604180]  []
ptdump_walk_pgd_level_checkwx+0x12/0x20
[   18.611014]  [] mark_rodata_ro+0xe9/0xf0
[   18.617819]  [] ? rest_init+0x80/0x80
[   18.624512]  [] kernel_init+0x18/0xe0
[   18.631095]  [] ret_from_fork+0x3f/0x70
[   18.637650]  [] ? rest_init+0x80/0x80
[   18.644178] Code: 70 ff ff ff 48 3b 85 58 ff ff ff 0f 84 c0 fe ff ff 48 8b 85 68 ff ff ff 48 c1 e0 10 48 c1 f8 10 48 89 45 b0 48 8b 85 70 ff ff ff <48> 8b 38 48 85 ff 0f 85 4e ff ff ff b9 02 00 00 00 31 d2 4c 89
[   18.658246] RIP  []
ptdump_walk_pgd_level_core+0x20e/0x440
[   18.665211]  RSP 
[   18.672073] CR2: 88055c883000
[   18.678852] ---[ end trace d84e34461c40637a ]---
[   18.685641] Kernel panic - not syncing: Attempted to kill init!
exitcode=0x0009
[   18.685641]
[   18.699520] Kernel Offset: disable



What's your .config?  Does cat /sys/kernel/debug/kernel_page_tables
produce a similar fault even with CONFIG_DEBUG_WX=n?


.config is attached

Hmm that sysfs file doesn't seem to exist then:
# cat /sys/kernel/debug/kernel_page_tables
cat: /sys/kernel/debug/kernel_page_tables: No such file or directory

--
Sander
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 4.3.0-mw-20151104-linus-doflr Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_PERF_EVENTS_INTEL_UNCORE=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx 
-fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 
-fcall-saved-r11"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG
