[PATCH 4.9 87/93] x86/tsc: Fix ART for TSC_KNOWN_FREQ
4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra

commit 44fee88cea43d3c2cac962e0439cb10a3cabff6d upstream.

Subhransu reported that convert_art_to_tsc() isn't working for him.

The ART to TSC relation is only set up for systems which use the refined
TSC calibration. Systems with known TSC frequency (available via CPUID 15)
are not using the refined calibration and therefore the ART to TSC relation
is never established.

Add the setup to the known frequency init path which skips ART
calibration. The init code needs to be duplicated as for systems which use
refined calibration the ART setup must be delayed until calibration has
been done.

The problem has been there since the ART support was introduced, but only
detected now because Subhransu tested the first time on hardware which has
TSC frequency enumerated via CPUID 15.

Note for stable: The conditional has changed from TSC_RELIABLE to
                 TSC_KNOWN_FREQUENCY.

[ tglx: Rewrote changelog and identified the proper 'Fixes' commit ]

Fixes: f9677e0f8308 ("x86/tsc: Always Running Timer (ART) correlated clocksource")
Reported-by: "Prusty, Subhransu S"
Signed-off-by: Peter Zijlstra (Intel)
Cc: sta...@vger.kernel.org
Cc: christopher.s.h...@intel.com
Cc: kevin.b.stan...@intel.com
Cc: john.stu...@linaro.org
Cc: akata...@vmware.com
Link: http://lkml.kernel.org/r/20170313145712.gi3...@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

---
 arch/x86/kernel/tsc.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1287,6 +1287,8 @@ static int __init init_tsc_clocksource(v
 	 * exporting a reliable TSC.
 	 */
 	if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE)) {
+		if (boot_cpu_has(X86_FEATURE_ART))
+			art_related_clocksource = &clocksource_tsc;
 		clocksource_register_khz(&clocksource_tsc, tsc_khz);
 		return 0;
 	}

[PATCH 4.9 86/93] irqchip/gicv3-its: Add workaround for QDF2400 ITS erratum 0065
4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Shanker Donthineni

commit 90922a2d03d84de36bf8a9979d62580102f31a92 upstream.

On Qualcomm Datacenter Technologies QDF2400 SoCs, the ITS hardware
implementation uses 16Bytes for Interrupt Translation Entry (ITE),
but reports an incorrect value of 8Bytes in GITS_TYPER.ITTE_size.

It might cause kernel memory corruption depending on the number
of MSI(x) that are configured and the amount of memory that has
been allocated for ITEs in its_create_device().

This patch fixes the potential memory corruption by setting the
correct ITE size to 16Bytes.

Cc: sta...@vger.kernel.org
Signed-off-by: Shanker Donthineni
Signed-off-by: Marc Zyngier
Signed-off-by: Greg Kroah-Hartman

---
 Documentation/arm64/silicon-errata.txt |   44 +
 arch/arm64/Kconfig                     |   10 +++
 drivers/irqchip/irq-gic-v3-its.c       |   16
 3 files changed, 49 insertions(+), 21 deletions(-)

--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -42,24 +42,26 @@ file acts as a registry of software work
 will be updated when new workarounds are committed and backported to
 stable kernels.
 
-| Implementor    | Component       | Erratum ID      | Kconfig                 |
-+----------------+-----------------+-----------------+-------------------------+
-| ARM            | Cortex-A53      | #826319         | ARM64_ERRATUM_826319    |
-| ARM            | Cortex-A53      | #827319         | ARM64_ERRATUM_827319    |
-| ARM            | Cortex-A53      | #824069         | ARM64_ERRATUM_824069    |
-| ARM            | Cortex-A53      | #819472         | ARM64_ERRATUM_819472    |
-| ARM            | Cortex-A53      | #845719         | ARM64_ERRATUM_845719    |
-| ARM            | Cortex-A53      | #843419         | ARM64_ERRATUM_843419    |
-| ARM            | Cortex-A57      | #832075         | ARM64_ERRATUM_832075    |
-| ARM            | Cortex-A57      | #852523         | N/A                     |
-| ARM            | Cortex-A57      | #834220         | ARM64_ERRATUM_834220    |
-| ARM            | Cortex-A72      | #853709         | N/A                     |
-| ARM            | MMU-500         | #841119,#826419 | N/A                     |
-|                |                 |                 |                         |
-| Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
-| Cavium         | ThunderX ITS    | #23144          | CAVIUM_ERRATUM_23144    |
-| Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
-| Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
-| Cavium         | ThunderX SMMUv2 | #27704          | N/A                     |
-|                |                 |                 |                         |
-| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
+| Implementor    | Component       | Erratum ID      | Kconfig                     |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A53      | #826319         | ARM64_ERRATUM_826319        |
+| ARM            | Cortex-A53      | #827319         | ARM64_ERRATUM_827319        |
+| ARM            | Cortex-A53      | #824069         | ARM64_ERRATUM_824069        |
+| ARM            | Cortex-A53      | #819472         | ARM64_ERRATUM_819472        |
+| ARM            | Cortex-A53      | #845719         | ARM64_ERRATUM_845719        |
+| ARM            | Cortex-A53      | #843419         | ARM64_ERRATUM_843419        |
+| ARM            | Cortex-A57      | #832075         | ARM64_ERRATUM_832075        |
+| ARM            | Cortex-A57      | #852523         | N/A                         |
+| ARM            | Cortex-A57      | #834220         | ARM64_ERRATUM_834220        |
+| ARM            | Cortex-A72      | #853709         | N/A                         |
+| ARM            | MMU-500         | #841119,#826419 | N/A                         |
+|                |                 |                 |                             |
+| Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375        |
+| Cavium         | ThunderX ITS    | #23144          | CAVIUM_ERRATUM_23144        |
+| Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154        |
+| Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456        |
+| Cavium         | ThunderX SMMUv2 | #27704          | N/A                         |
+|                |                 |                 |                             |
+| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585         |
+|                |                 |                 |                             |
+| Qualcomm Tech. | QDF2400 ITS     | E0065           |
[PATCH 4.9 61/93] ibmveth: calculate gso_segs for large packets
4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Falcon

[ Upstream commit 94acf164dc8f1184e8d0737be7125134c2701dbe ]

Include calculations to compute the number of segments
that comprise an aggregated large packet.

Signed-off-by: Thomas Falcon
Reviewed-by: Marcelo Ricardo Leitner
Reviewed-by: Jonathan Maxwell
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

---
 drivers/net/ethernet/ibm/ibmveth.c |   12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1181,7 +1181,9 @@ map_failed:
 
 static void ibmveth_rx_mss_helper(struct sk_buff *skb, u16 mss, int lrg_pkt)
 {
+	struct tcphdr *tcph;
 	int offset = 0;
+	int hdr_len;
 
 	/* only TCP packets will be aggregated */
 	if (skb->protocol == htons(ETH_P_IP)) {
@@ -1208,14 +1210,20 @@ static void ibmveth_rx_mss_helper(struct
 	/* if mss is not set through Large Packet bit/mss in rx buffer,
 	 * expect that the mss will be written to the tcp header checksum.
 	 */
+	tcph = (struct tcphdr *)(skb->data + offset);
 	if (lrg_pkt) {
 		skb_shinfo(skb)->gso_size = mss;
 	} else if (offset) {
-		struct tcphdr *tcph = (struct tcphdr *)(skb->data + offset);
-
 		skb_shinfo(skb)->gso_size = ntohs(tcph->check);
 		tcph->check = 0;
 	}
+
+	if (skb_shinfo(skb)->gso_size) {
+		hdr_len = offset + tcph->doff * 4;
+		skb_shinfo(skb)->gso_segs =
+				DIV_ROUND_UP(skb->len - hdr_len,
+					     skb_shinfo(skb)->gso_size);
+	}
 }
 
 static int ibmveth_poll(struct napi_struct *napi, int budget)

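For anyone checking the segment arithmetic in this patch, here is a minimal
userspace sketch of the same calculation. It is not the driver code: the
helper name gso_segs_for() and the macro DIV_ROUND_UP_SKETCH() are made up for
illustration, and the zero-length guard is an addition that the patch itself
does not have.

```c
#include <stdint.h>

/* Round-up integer division, the same idea as the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP_SKETCH(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper mirroring the patch's arithmetic:
 *   pkt_len  total packet length (skb->len in the patch)
 *   tcp_off  offset of the TCP header from the start of the data ("offset")
 *   doff     TCP data offset field, counted in 32-bit words
 *   mss      segment size (gso_size in the patch)
 * Returns the number of mss-sized segments needed for the payload.
 */
static uint32_t gso_segs_for(uint32_t pkt_len, uint32_t tcp_off,
			     uint32_t doff, uint32_t mss)
{
	uint32_t hdr_len = tcp_off + doff * 4;

	/* Extra guard for the sketch only: avoid division by zero/underflow. */
	if (mss == 0 || pkt_len <= hdr_len)
		return 0;

	return DIV_ROUND_UP_SKETCH(pkt_len - hdr_len, mss);
}
```

With a 20-byte IP header, a 20-byte TCP header (doff = 5) and mss = 1460, a
2960-byte packet carries 2920 payload bytes and yields 2 segments; one more
payload byte rounds up to 3.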
[PATCH 4.9 84/93] drm/vc4: Fix ->clock_select setting for the VEC encoder
4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Boris Brezillon

commit ab8df60e3a3b68420d0d4477c5f07c00fbfb078b upstream.

PV_CONTROL_CLK_SELECT_VEC is actually 2 and not 0. Fix the definition and
rework the vc4_set_crtc_possible_masks() to cover the full range of the
PV_CONTROL_CLK_SELECT field.

Signed-off-by: Boris Brezillon
Signed-off-by: Eric Anholt
Cc: Amit Pundir
Signed-off-by: Greg Kroah-Hartman

---
 drivers/gpu/drm/vc4/vc4_crtc.c |   36 ++--
 drivers/gpu/drm/vc4/vc4_drv.h  |    1 +
 drivers/gpu/drm/vc4/vc4_regs.h |    3 ++-
 3 files changed, 25 insertions(+), 15 deletions(-)

--- a/drivers/gpu/drm/vc4/vc4_crtc.c
+++ b/drivers/gpu/drm/vc4/vc4_crtc.c
@@ -83,8 +83,7 @@ struct vc4_crtc_data {
 	/* Which channel of the HVS this pixelvalve sources from. */
 	int hvs_channel;
 
-	enum vc4_encoder_type encoder0_type;
-	enum vc4_encoder_type encoder1_type;
+	enum vc4_encoder_type encoder_types[4];
 };
 
 #define CRTC_WRITE(offset, val) writel(val, vc4_crtc->regs + (offset))
@@ -867,20 +866,26 @@ static const struct drm_crtc_helper_func
 
 static const struct vc4_crtc_data pv0_data = {
 	.hvs_channel = 0,
-	.encoder0_type = VC4_ENCODER_TYPE_DSI0,
-	.encoder1_type = VC4_ENCODER_TYPE_DPI,
+	.encoder_types = {
+		[PV_CONTROL_CLK_SELECT_DSI] = VC4_ENCODER_TYPE_DSI0,
+		[PV_CONTROL_CLK_SELECT_DPI_SMI_HDMI] = VC4_ENCODER_TYPE_DPI,
+	},
 };
 
 static const struct vc4_crtc_data pv1_data = {
 	.hvs_channel = 2,
-	.encoder0_type = VC4_ENCODER_TYPE_DSI1,
-	.encoder1_type = VC4_ENCODER_TYPE_SMI,
+	.encoder_types = {
+		[PV_CONTROL_CLK_SELECT_DSI] = VC4_ENCODER_TYPE_DSI1,
+		[PV_CONTROL_CLK_SELECT_DPI_SMI_HDMI] = VC4_ENCODER_TYPE_SMI,
+	},
 };
 
 static const struct vc4_crtc_data pv2_data = {
 	.hvs_channel = 1,
-	.encoder0_type = VC4_ENCODER_TYPE_VEC,
-	.encoder1_type = VC4_ENCODER_TYPE_HDMI,
+	.encoder_types = {
+		[PV_CONTROL_CLK_SELECT_DPI_SMI_HDMI] = VC4_ENCODER_TYPE_HDMI,
+		[PV_CONTROL_CLK_SELECT_VEC] = VC4_ENCODER_TYPE_VEC,
+	},
 };
 
 static const struct of_device_id vc4_crtc_dt_match[] = {
@@ -894,17 +899,20 @@ static void vc4_set_crtc_possible_masks(
 					struct drm_crtc *crtc)
 {
 	struct vc4_crtc *vc4_crtc = to_vc4_crtc(crtc);
+	const struct vc4_crtc_data *crtc_data = vc4_crtc->data;
+	const enum vc4_encoder_type *encoder_types = crtc_data->encoder_types;
 	struct drm_encoder *encoder;
 
 	drm_for_each_encoder(encoder, drm) {
 		struct vc4_encoder *vc4_encoder = to_vc4_encoder(encoder);
+		int i;
 
-		if (vc4_encoder->type == vc4_crtc->data->encoder0_type) {
-			vc4_encoder->clock_select = 0;
-			encoder->possible_crtcs |= drm_crtc_mask(crtc);
-		} else if (vc4_encoder->type == vc4_crtc->data->encoder1_type) {
-			vc4_encoder->clock_select = 1;
-			encoder->possible_crtcs |= drm_crtc_mask(crtc);
+		for (i = 0; i < ARRAY_SIZE(crtc_data->encoder_types); i++) {
+			if (vc4_encoder->type == encoder_types[i]) {
+				vc4_encoder->clock_select = i;
+				encoder->possible_crtcs |= drm_crtc_mask(crtc);
+				break;
+			}
 		}
 	}
 }
--- a/drivers/gpu/drm/vc4/vc4_drv.h
+++ b/drivers/gpu/drm/vc4/vc4_drv.h
@@ -194,6 +194,7 @@ to_vc4_plane(struct drm_plane *plane)
 }
 
 enum vc4_encoder_type {
+	VC4_ENCODER_TYPE_NONE,
 	VC4_ENCODER_TYPE_HDMI,
 	VC4_ENCODER_TYPE_VEC,
 	VC4_ENCODER_TYPE_DSI0,
--- a/drivers/gpu/drm/vc4/vc4_regs.h
+++ b/drivers/gpu/drm/vc4/vc4_regs.h
@@ -177,8 +177,9 @@
 # define PV_CONTROL_WAIT_HSTART			BIT(12)
 # define PV_CONTROL_PIXEL_REP_MASK		VC4_MASK(5, 4)
 # define PV_CONTROL_PIXEL_REP_SHIFT		4
-# define PV_CONTROL_CLK_SELECT_DSI_VEC		0
+# define PV_CONTROL_CLK_SELECT_DSI		0
 # define PV_CONTROL_CLK_SELECT_DPI_SMI_HDMI	1
+# define PV_CONTROL_CLK_SELECT_VEC		2
 # define PV_CONTROL_CLK_SELECT_MASK		VC4_MASK(3, 2)
 # define PV_CONTROL_CLK_SELECT_SHIFT		2
 # define PV_CONTROL_FIFO_CLR			BIT(1)

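The table-driven lookup this patch introduces can be tried outside the kernel
with a small sketch like the one below. All names here (enc_type, CLK_SELECT_*,
pv2_types, clock_select_for) are shortened stand-ins invented for the example,
not the driver's identifiers. One likely reason for adding a NONE value first
in the enum, which the changelog does not spell out, is that array slots left
out of a designated initializer are zero-filled, so zero must not alias a real
encoder type.

```c
#include <stddef.h>

/* Stand-in for enum vc4_encoder_type; NONE is 0 so empty slots match nothing. */
enum enc_type { ENC_NONE, ENC_HDMI, ENC_VEC, ENC_DSI0, ENC_DPI };

/* Stand-ins for the PV_CONTROL_CLK_SELECT_* values from the patch. */
#define CLK_SELECT_DSI			0
#define CLK_SELECT_DPI_SMI_HDMI		1
#define CLK_SELECT_VEC			2
#define NUM_CLK_SELECT			4	/* 2-bit field, 4 possible values */

/* Same shape as pv2_data.encoder_types: index is the clock select value. */
static const enum enc_type pv2_types[NUM_CLK_SELECT] = {
	[CLK_SELECT_DPI_SMI_HDMI] = ENC_HDMI,
	[CLK_SELECT_VEC] = ENC_VEC,
};

/* Return the clock select value for an encoder type, or -1 if unsupported. */
static int clock_select_for(const enum enc_type *types, enum enc_type type)
{
	size_t i;

	if (type == ENC_NONE)
		return -1;

	for (i = 0; i < NUM_CLK_SELECT; i++) {
		if (types[i] == type)
			return (int)i;
	}
	return -1;
}
```

Looking up ENC_VEC in pv2_types gives 2, which is the value the old two-field
layout could not express because it only ever assigned 0 or 1.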
[PATCH 4.9 59/93] PCI: Ignore BAR updates on virtual functions
4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Bjorn Helgaas

[ Upstream commit 63880b230a4af502c56dde3d4588634c70c66006 ]

VF BARs are read-only zero, so updating VF BARs will not have any effect.
See the SR-IOV spec r1.1, sec 3.4.1.11.

We already ignore these updates because of 70675e0b6a1a ("PCI: Don't try to
restore VF BARs"); this merely restructures it slightly to make it easier
to split updates for standard and SR-IOV BARs.

Signed-off-by: Bjorn Helgaas
Reviewed-by: Gavin Shan
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

---
 drivers/pci/pci.c       |    4
 drivers/pci/setup-res.c |    5 ++---
 2 files changed, 2 insertions(+), 7 deletions(-)

--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -564,10 +564,6 @@ static void pci_restore_bars(struct pci_
 {
 	int i;
 
-	/* Per SR-IOV spec 3.4.1.11, VF BARs are RO zero */
-	if (dev->is_virtfn)
-		return;
-
 	for (i = 0; i < PCI_BRIDGE_RESOURCES; i++)
 		pci_update_resource(dev, i);
 }
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -34,10 +34,9 @@ static void pci_std_update_resource(stru
 	int reg;
 	struct resource *res = dev->resource + resno;
 
-	if (dev->is_virtfn) {
-		dev_warn(&dev->dev, "can't update VF BAR%d\n", resno);
+	/* Per SR-IOV spec 3.4.1.11, VF BARs are RO zero */
+	if (dev->is_virtfn)
 		return;
-	}
 
 	/*
 	 * Ignore resources for unimplemented BARs and unused resource slots

[PATCH 4.9 60/93] PCI: Do any VF BAR updates before enabling the BARs
4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Gavin Shan

[ Upstream commit f40ec3c748c6912f6266c56a7f7992de61b255ed ]

Previously we enabled VFs and enabled their memory space before calling
pcibios_sriov_enable(). But pcibios_sriov_enable() may update the VF BARs:
for example, on PPC PowerNV we may change them to manage the association of
VFs to PEs.

Because 64-bit BARs cannot be updated atomically, it's unsafe to update
them while they're enabled. The half-updated state may conflict with other
devices in the system.

Call pcibios_sriov_enable() before enabling the VFs so any BAR updates
happen while the VF BARs are disabled.

[bhelgaas: changelog]
Tested-by: Carol Soto
Signed-off-by: Gavin Shan
Signed-off-by: Bjorn Helgaas
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

---
 drivers/pci/iov.c |   14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -306,13 +306,6 @@ static int sriov_enable(struct pci_dev *
 		return rc;
 	}
 
-	pci_iov_set_numvfs(dev, nr_virtfn);
-	iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
-	pci_cfg_access_lock(dev);
-	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
-	msleep(100);
-	pci_cfg_access_unlock(dev);
-
 	iov->initial_VFs = initial;
 	if (nr_virtfn < initial)
 		initial = nr_virtfn;
@@ -323,6 +316,13 @@ static int sriov_enable(struct pci_dev *
 		goto err_pcibios;
 	}
 
+	pci_iov_set_numvfs(dev, nr_virtfn);
+	iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
+	pci_cfg_access_lock(dev);
+	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
+	msleep(100);
+	pci_cfg_access_unlock(dev);
+
 	for (i = 0; i < initial; i++) {
 		rc = pci_iov_add_virtfn(dev, i, 0);
 		if (rc)

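To make the "half-updated state" concern concrete, here is a small userspace
model of a 64-bit BAR written as two 32-bit halves. This is only a sketch
under simplifying assumptions: struct bar64 and both helpers are hypothetical
names, the low flag bits of a real BAR are ignored, and no PCI config access
is involved.

```c
#include <stdint.h>

/* Toy model: a 64-bit BAR is stored as two separately written 32-bit halves. */
struct bar64 {
	uint32_t lo;
	uint32_t hi;
};

/* Address the device would decode from the current register contents. */
static uint64_t bar64_addr(const struct bar64 *bar)
{
	return ((uint64_t)bar->hi << 32) | bar->lo;
}

/*
 * Address decoded after only the first of the two writes has landed:
 * the low half already holds the new value, the high half is still old.
 */
static uint64_t bar64_half_updated(uint64_t old_addr, uint64_t new_addr)
{
	struct bar64 bar = {
		.lo = (uint32_t)old_addr,
		.hi = (uint32_t)(old_addr >> 32),
	};

	bar.lo = (uint32_t)new_addr;	/* first 32-bit write */
	/* The second write (bar.hi) has not happened yet. */

	return bar64_addr(&bar);
}
```

Moving a BAR from 0x180000000 to 0x200000000 passes through 0x100000000 in
this model, an address that is neither the old nor the new location and that
another device could already own. With decoding disabled during the update,
the intermediate value is never acted upon.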
[PATCH 4.9 62/93] Drivers: hv: ring_buffer: count on wrap around mappings in get_next_pkt_raw() (v2)
4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Vitaly Kuznetsov

[ Upstream commit fa32ff6576623616c1751562edaed8c164ca5199 ]

With wrap around mappings in place we can always provide drivers with
direct links to packets on the ring buffer, even when they wrap around.
Do the required updates to get_next_pkt_raw()/put_pkt_raw().

The first version of this commit was reverted (65a532f3d50a) to deal with
cross-tree merge issues which are (hopefully) resolved now.

Signed-off-by: Vitaly Kuznetsov
Signed-off-by: K. Y. Srinivasan
Tested-by: Dexuan Cui
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

---
 include/linux/hyperv.h |   32 +++-
 1 file changed, 11 insertions(+), 21 deletions(-)

--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1548,31 +1548,23 @@ static inline struct vmpacket_descriptor
 get_next_pkt_raw(struct vmbus_channel *channel)
 {
 	struct hv_ring_buffer_info *ring_info = &channel->inbound;
-	u32 read_loc = ring_info->priv_read_index;
+	u32 priv_read_loc = ring_info->priv_read_index;
 	void *ring_buffer = hv_get_ring_buffer(ring_info);
-	struct vmpacket_descriptor *cur_desc;
-	u32 packetlen;
 	u32 dsize = ring_info->ring_datasize;
-	u32 delta = read_loc - ring_info->ring_buffer->read_index;
+	/*
+	 * delta is the difference between what is available to read and
+	 * what was already consumed in place. We commit read index after
+	 * the whole batch is processed.
+	 */
+	u32 delta = priv_read_loc >= ring_info->ring_buffer->read_index ?
+		priv_read_loc - ring_info->ring_buffer->read_index :
+		(dsize - ring_info->ring_buffer->read_index) + priv_read_loc;
 	u32 bytes_avail_toread = (hv_get_bytes_to_read(ring_info) - delta);
 
 	if (bytes_avail_toread < sizeof(struct vmpacket_descriptor))
 		return NULL;
 
-	if ((read_loc + sizeof(*cur_desc)) > dsize)
-		return NULL;
-
-	cur_desc = ring_buffer + read_loc;
-	packetlen = cur_desc->len8 << 3;
-
-	/*
-	 * If the packet under consideration is wrapping around,
-	 * return failure.
-	 */
-	if ((read_loc + packetlen + VMBUS_PKT_TRAILER) > (dsize - 1))
-		return NULL;
-
-	return cur_desc;
+	return ring_buffer + priv_read_loc;
 }
 
 /*
@@ -1584,16 +1576,14 @@ static inline void put_pkt_raw(struct vm
 				struct vmpacket_descriptor *desc)
 {
 	struct hv_ring_buffer_info *ring_info = &channel->inbound;
-	u32 read_loc = ring_info->priv_read_index;
 	u32 packetlen = desc->len8 << 3;
 	u32 dsize = ring_info->ring_datasize;
 
-	if ((read_loc + packetlen + VMBUS_PKT_TRAILER) > dsize)
-		BUG();
 	/*
 	 * Include the packet trailer.
 	 */
 	ring_info->priv_read_index += packetlen + VMBUS_PKT_TRAILER;
+	ring_info->priv_read_index %= dsize;
 }
 
 /*

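The index arithmetic in this patch can be checked in isolation with a
userspace sketch such as the following. The function names ring_delta() and
ring_advance() are invented for the example; they only mirror the two
expressions in the diff and assume, as the ring buffer does, that both indices
are always smaller than dsize.

```c
#include <stdint.h>

/*
 * Bytes already consumed in place but not yet committed: the distance from
 * the committed read_index forward to the private read location, taking a
 * wrap past the end of a dsize-byte ring into account.
 */
static uint32_t ring_delta(uint32_t priv_read_loc, uint32_t read_index,
			   uint32_t dsize)
{
	return priv_read_loc >= read_index ?
		priv_read_loc - read_index :
		(dsize - read_index) + priv_read_loc;
}

/* Advance the private read location by len bytes, wrapping at dsize. */
static uint32_t ring_advance(uint32_t priv_read_loc, uint32_t len,
			     uint32_t dsize)
{
	return (priv_read_loc + len) % dsize;
}
```

For a 4096-byte ring with read_index = 4000 and a private location of 16, the
delta is 112. The plain u32 subtraction used before the patch would give
16 - 4000 = 4294963312 in that situation, since it wraps modulo 2^32 rather
than modulo the ring size.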
[PATCH 4.9 78/93] ACPI / blacklist: Make Dell Latitude 3350 ethernet work
4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Michael Pobega

[ Upstream commit 708f5dcc21ae9b35f395865fc154b0105baf4de4 ]

The Dell Latitude 3350's ethernet card attempts to use a reserved
IRQ (18), resulting in ACPI being unable to enable the ethernet.

Adding it to acpi_rev_dmi_table[] helps to work around this problem.

Signed-off-by: Michael Pobega
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

---
 drivers/acpi/blacklist.c |   12
 1 file changed, 12 insertions(+)

--- a/drivers/acpi/blacklist.c
+++ b/drivers/acpi/blacklist.c
@@ -176,6 +176,18 @@ static struct dmi_system_id acpi_rev_dmi
 		      DMI_MATCH(DMI_PRODUCT_NAME, "Precision 3520"),
 		},
 	},
+	/*
+	 * Resolves a quirk with the Dell Latitude 3350 that
+	 * causes the ethernet adapter to not function.
+	 */
+	{
+	 .callback = dmi_enable_rev_override,
+	 .ident = "DELL Latitude 3350",
+	 .matches = {
+		      DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+		      DMI_MATCH(DMI_PRODUCT_NAME, "Latitude 3350"),
+		},
+	},
 #endif
 	{}
 };

[PATCH 4.9 65/93] powerpc/iommu: Stop using @current in mm_iommu_xxx
4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alexey Kardashevskiy

[ Upstream commit d7baee6901b34c4895eb78efdbf13a49079d7404 ]

This changes mm_iommu_xxx helpers to take mm_struct as a parameter
instead of getting it from @current which in some situations may
not have a valid reference to mm.

This changes helpers to receive @mm and moves all references to @current
to the caller, including checks for !current and !current->mm;
checks in mm_iommu_preregistered() are removed as there is no caller
yet.

This moves the mm_iommu_adjust_locked_vm() call to the caller as
it receives mm_iommu_table_group_mem_t but it needs mm.

This should cause no behavioral change.

Signed-off-by: Alexey Kardashevskiy
Reviewed-by: David Gibson
Acked-by: Alex Williamson
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

---
 arch/powerpc/include/asm/mmu_context.h |   16 ++-
 arch/powerpc/mm/mmu_context_iommu.c    |   46 -
 drivers/vfio/vfio_iommu_spapr_tce.c    |   14 +++---
 3 files changed, 36 insertions(+), 40 deletions(-)

--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -19,16 +19,18 @@ extern void destroy_context(struct mm_st
 struct mm_iommu_table_group_mem_t;
 
 extern int isolate_lru_page(struct page *page);	/* from internal.h */
-extern bool mm_iommu_preregistered(void);
-extern long mm_iommu_get(unsigned long ua, unsigned long entries,
+extern bool mm_iommu_preregistered(struct mm_struct *mm);
+extern long mm_iommu_get(struct mm_struct *mm,
+		unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem);
-extern long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem);
+extern long mm_iommu_put(struct mm_struct *mm,
+		struct mm_iommu_table_group_mem_t *mem);
 extern void mm_iommu_init(struct mm_struct *mm);
 extern void mm_iommu_cleanup(struct mm_struct *mm);
-extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
-		unsigned long size);
-extern struct mm_iommu_table_group_mem_t *mm_iommu_find(unsigned long ua,
-		unsigned long entries);
+extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(struct mm_struct *mm,
+		unsigned long ua, unsigned long size);
+extern struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
+		unsigned long ua, unsigned long entries);
 extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
 		unsigned long ua, unsigned long *hpa);
 extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -56,7 +56,7 @@ static long mm_iommu_adjust_locked_vm(st
 	}
 
 	pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n",
-			current->pid,
+			current ? current->pid : 0,
 			incr ? '+' : '-',
 			npages << PAGE_SHIFT,
 			mm->locked_vm << PAGE_SHIFT,
@@ -66,12 +66,9 @@ static long mm_iommu_adjust_locked_vm(st
 	return ret;
 }
 
-bool mm_iommu_preregistered(void)
+bool mm_iommu_preregistered(struct mm_struct *mm)
 {
-	if (!current || !current->mm)
-		return false;
-
-	return !list_empty(&current->mm->context.iommu_group_mem_list);
+	return !list_empty(&mm->context.iommu_group_mem_list);
 }
 EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
 
@@ -124,19 +121,16 @@ static int mm_iommu_move_page_from_cma(s
 	return 0;
 }
 
-long mm_iommu_get(unsigned long ua, unsigned long entries,
+long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem)
 {
 	struct mm_iommu_table_group_mem_t *mem;
 	long i, j, ret = 0, locked_entries = 0;
 	struct page *page = NULL;
 
-	if (!current || !current->mm)
-		return -ESRCH; /* process exited */
-
 	mutex_lock(&mem_list_mutex);
 
-	list_for_each_entry_rcu(mem, &current->mm->context.iommu_group_mem_list,
+	list_for_each_entry_rcu(mem, &mm->context.iommu_group_mem_list,
 			next) {
 		if ((mem->ua == ua) && (mem->entries == entries)) {
 			++mem->used;
@@ -154,7 +148,7 @@ long mm_iommu_get(unsigned long ua, unsi
 	}
 
-	ret = mm_iommu_adjust_locked_vm(current->mm, entries, true);
+	ret
= mm_iommu_adjust_locked_vm(mm, entries, true); if (ret) goto unlock_exit; @@ -215,11 +209,11 @@ populate: mem->entries = entries; *pmem = mem; - list_add_rcu(>next,
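For readers skimming the diff, the shape of the change can be sketched in plain userspace C. This is only a model under stated assumptions: struct mm_model, struct task_model, current_task and the two region_get_*() functions are hypothetical stand-ins, not kernel APIs. The point is that the old helper has to validate @current itself, while the new one simply trusts the mm its caller passes in.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's mm_struct and task_struct. */
struct mm_model {
	int nr_regions;		/* models mm->context.iommu_group_mem_list */
};

struct task_model {
	struct mm_model *mm;	/* NULL once the task has dropped its mm */
};

/* Models @current; may be NULL or mm-less in the cases the patch mentions. */
static struct task_model *current_task;

/* Before: the helper reaches for @current and must validate it itself. */
static long region_get_old(void)
{
	if (!current_task || !current_task->mm)
		return -ESRCH;	/* process exited */
	return ++current_task->mm->nr_regions;
}

/* After: the caller supplies the mm it already holds a reference to. */
static long region_get_new(struct mm_model *mm)
{
	return ++mm->nr_regions;
}
```

With current_task unset, region_get_old() fails with -ESRCH while region_get_new() still works on a caller-held mm, which is the situation the changelog describes.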
[PATCH 4.9 64/93] powerpc/iommu: Pass mm_struct to init/cleanup helpers
4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alexey Kardashevskiy

[ Upstream commit 88f54a3581eb9deaa3bd1aade40aef266d782385 ]

We are going to get rid of @current references in mmu_context_book3s64.c
and cache mm_struct in the VFIO container. Since mm_context_t does not
have reference counting, we will be using mm_struct which does have
the reference counter.

This changes mm_iommu_init/mm_iommu_cleanup to receive mm_struct rather
than mm_context_t (which is embedded into mm).

This should not cause any behavioral change.

Signed-off-by: Alexey Kardashevskiy
Reviewed-by: David Gibson
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
---
 arch/powerpc/include/asm/mmu_context.h |    4 ++--
 arch/powerpc/kernel/setup-common.c     |    2 +-
 arch/powerpc/mm/mmu_context_book3s64.c |    4 ++--
 arch/powerpc/mm/mmu_context_iommu.c    |    9 +
 4 files changed, 10 insertions(+), 9 deletions(-)

--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -23,8 +23,8 @@ extern bool mm_iommu_preregistered(void)
 extern long mm_iommu_get(unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem);
 extern long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem);
-extern void mm_iommu_init(mm_context_t *ctx);
-extern void mm_iommu_cleanup(mm_context_t *ctx);
+extern void mm_iommu_init(struct mm_struct *mm);
+extern void mm_iommu_cleanup(struct mm_struct *mm);
 extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
 		unsigned long size);
 extern struct mm_iommu_table_group_mem_t *mm_iommu_find(unsigned long ua,
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -915,7 +915,7 @@ void __init setup_arch(char **cmdline_p)
 	init_mm.context.pte_frag = NULL;
 #endif
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-	mm_iommu_init(&init_mm.context);
+	mm_iommu_init(&init_mm);
 #endif
 	irqstack_early_init();
 	exc_lvl_early_init();
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -115,7 +115,7 @@ int init_new_context(struct task_struct
 	mm->context.pte_frag = NULL;
 #endif
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-	mm_iommu_init(&mm->context);
+	mm_iommu_init(mm);
 #endif
 	return 0;
 }
@@ -160,7 +160,7 @@ static inline void destroy_pagetable_pag
 void destroy_context(struct mm_struct *mm)
 {
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-	mm_iommu_cleanup(&mm->context);
+	mm_iommu_cleanup(mm);
 #endif
 
 #ifdef CONFIG_PPC_ICSWX
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -373,16 +373,17 @@ void mm_iommu_mapped_dec(struct mm_iommu
 }
 EXPORT_SYMBOL_GPL(mm_iommu_mapped_dec);
 
-void mm_iommu_init(mm_context_t *ctx)
+void mm_iommu_init(struct mm_struct *mm)
 {
-	INIT_LIST_HEAD_RCU(&ctx->iommu_group_mem_list);
+	INIT_LIST_HEAD_RCU(&mm->context.iommu_group_mem_list);
 }
 
-void mm_iommu_cleanup(mm_context_t *ctx)
+void mm_iommu_cleanup(struct mm_struct *mm)
 {
 	struct mm_iommu_table_group_mem_t *mem, *tmp;
 
-	list_for_each_entry_safe(mem, tmp, &ctx->iommu_group_mem_list, next) {
+	list_for_each_entry_safe(mem, tmp, &mm->context.iommu_group_mem_list,
+			next) {
 		list_del_rcu(&mem->next);
 		mm_iommu_do_free(mem);
 	}
[PATCH 4.9 63/93] vfio/spapr: Postpone allocation of userspace version of TCE table
4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alexey Kardashevskiy

[ Upstream commit 39701e56f5f16ea0cf8fc9e8472e645f8de91d23 ]

The iommu_table struct manages a hardware TCE table and a vmalloc'd
table with corresponding userspace addresses. Both are allocated when
the default DMA window is created and this happens when the very first
group is attached to a container.

As we are going to allow the userspace to configure a container in one
memory context and pass the container fd to another, we have to postpone
such allocations till the container fd is passed to the destination
user process, so that we account the locked memory limit against the
actual container user constraints.

This postpones the it_userspace array allocation till it is used
for the first time for mapping. The unmapping path already checks if
the array is allocated.

Signed-off-by: Alexey Kardashevskiy
Reviewed-by: David Gibson
Acked-by: Alex Williamson
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
---
 drivers/vfio/vfio_iommu_spapr_tce.c |   20 +++-
 1 file changed, 7 insertions(+), 13 deletions(-)

--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -509,6 +509,12 @@ static long tce_iommu_build_v2(struct tc
 	unsigned long hpa;
 	enum dma_data_direction dirtmp;
 
+	if (!tbl->it_userspace) {
+		ret = tce_iommu_userspace_view_alloc(tbl);
+		if (ret)
+			return ret;
+	}
+
 	for (i = 0; i < pages; ++i) {
 		struct mm_iommu_table_group_mem_t *mem = NULL;
 		unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl,
@@ -582,15 +588,6 @@ static long tce_iommu_create_table(struc
 	WARN_ON(!ret && !(*ptbl)->it_ops->free);
 	WARN_ON(!ret && ((*ptbl)->it_allocated_size != table_size));
 
-	if (!ret && container->v2) {
-		ret = tce_iommu_userspace_view_alloc(*ptbl);
-		if (ret)
-			(*ptbl)->it_ops->free(*ptbl);
-	}
-
-	if (ret)
-		decrement_locked_vm(table_size >> PAGE_SHIFT);
-
 	return ret;
 }
 
@@ -1062,10 +1059,7 @@ static int tce_iommu_take_ownership(stru
 		if (!tbl || !tbl->it_map)
 			continue;
 
-		rc = tce_iommu_userspace_view_alloc(tbl);
-		if (!rc)
-			rc = iommu_take_ownership(tbl);
-
+		rc = iommu_take_ownership(tbl);
 		if (rc) {
 			for (j = 0; j < i; ++j)
 				iommu_release_ownership(
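The allocate-on-first-use pattern this patch moves to can be sketched in userspace C. Everything here is an assumption made for illustration: struct table_model, view_alloc(), map_entry() and unmap_entry() are hypothetical names, and calloc() stands in for the kernel allocation. The sketch shows the two halves the changelog relies on: the mapping path allocates the view lazily, and the unmapping path tolerates a view that was never allocated.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical model of a table with a lazily allocated userspace view. */
struct table_model {
	unsigned long nr_entries;
	unsigned long *userspace;	/* models tbl->it_userspace */
};

static int view_alloc(struct table_model *tbl)
{
	tbl->userspace = calloc(tbl->nr_entries, sizeof(*tbl->userspace));
	return tbl->userspace ? 0 : -ENOMEM;
}

/* Mapping path: allocate the view the first time it is needed. */
static int map_entry(struct table_model *tbl, unsigned long idx,
		     unsigned long ua)
{
	if (idx >= tbl->nr_entries)
		return -EINVAL;
	if (!tbl->userspace) {
		int ret = view_alloc(tbl);

		if (ret)
			return ret;
	}
	tbl->userspace[idx] = ua;
	return 0;
}

/* Unmapping path: do nothing if the view was never allocated. */
static void unmap_entry(struct table_model *tbl, unsigned long idx)
{
	if (!tbl->userspace || idx >= tbl->nr_entries)
		return;
	tbl->userspace[idx] = 0;
}
```

In this model the allocation happens in whichever context first calls map_entry(), which mirrors the stated goal of charging the memory to the process that actually uses the container.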
[PATCH v5 0/7] Xen transport for 9pfs frontend driver
Hi all,

This patch series implements a new transport for 9pfs, aimed at Xen
systems.

The transport is based on a traditional Xen frontend and backend drivers
pair. This patch series implements the frontend, which typically runs in
a regular unprivileged guest.

I also sent a series that implements the backend in userspace in QEMU,
which typically runs in Dom0 (but could also run in another guest).

The frontend complies with the Xen transport for 9pfs specification
version 1, available here:

https://xenbits.xen.org/docs/unstable/misc/9pfs.html


Changes in v5:
- test priv->tag instead of ret
- run checkpatch.pl against the whole series, fix all issues
- set intf->ring_order appropriately
- use shorter link to 9pfs spec

Changes in v4:
- code style improvements
- use xenbus_read_unsigned when possible
- do not leak "versions"
- introduce BUILD_BUG_ON
- introduce rwlock to protect the xen_9pfs_devs list
- add review-by

Changes in v3:
- add full copyright header to trans_xen.c
- rename ring->ring to ring->data
- handle gnttab_grant_foreign_access errors
- remove ring->bytes
- wrap long lines
- add reviewed-by

Changes in v2:
- use XEN_PAGE_SHIFT instead of PAGE_SHIFT
- remove unnecessary initializations
- fix error paths
- fix memory allocations for 64K kernels
- simplify p9_xen_create and p9_xen_close
- use virt_XXX barriers
- set status = REQ_STATUS_ERROR inside the p9_xen_response loop
- add in-code comments


Stefano Stabellini (7):
      xen: import new ring macros in ring.h
      xen: introduce the header file for the Xen 9pfs transport protocol
      xen/9pfs: introduce Xen 9pfs transport driver
      xen/9pfs: connect to the backend
      xen/9pfs: send requests to the backend
      xen/9pfs: receive responses
      xen/9pfs: build 9pfs Xen transport driver

 include/xen/interface/io/9pfs.h |  42
 include/xen/interface/io/ring.h | 131
 net/9p/Kconfig                  |   8
 net/9p/Makefile                 |   4
 net/9p/trans_xen.c              | 541
 5 files changed, 726 insertions(+)
 create mode 100644 include/xen/interface/io/9pfs.h
 create mode 100644 net/9p/trans_xen.c

Cheers,

Stefano
[PATCH 4.9 56/93] PCI: Decouple IORESOURCE_ROM_ENABLE and PCI_ROM_ADDRESS_ENABLE
4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Bjorn Helgaas

[ Upstream commit 7a6d312b50e63f598f5b5914c4fd21878ac2b595 ]

Remove the assumption that IORESOURCE_ROM_ENABLE == PCI_ROM_ADDRESS_ENABLE.
PCI_ROM_ADDRESS_ENABLE is the ROM enable bit defined by the PCI spec, so if
we're reading or writing a BAR register value, that's what we should use.
IORESOURCE_ROM_ENABLE is a corresponding bit in struct resource flags.

Signed-off-by: Bjorn Helgaas
Reviewed-by: Gavin Shan
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
---
 drivers/pci/probe.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -227,7 +227,8 @@ int __pci_read_base(struct pci_dev *dev,
 			mask64 = (u32)PCI_BASE_ADDRESS_MEM_MASK;
 		}
 	} else {
-		res->flags |= (l & IORESOURCE_ROM_ENABLE);
+		if (l & PCI_ROM_ADDRESS_ENABLE)
+			res->flags |= IORESOURCE_ROM_ENABLE;
 		l64 = l & PCI_ROM_ADDRESS_MASK;
 		sz64 = sz & PCI_ROM_ADDRESS_MASK;
 		mask64 = (u32)PCI_ROM_ADDRESS_MASK;
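Why the old one-liner was fragile can be shown with a small userspace sketch. PCI_ROM_ADDRESS_ENABLE below uses the register bit value 0x01; DEMO_RES_ROM_ENABLE is a hypothetical flag value, chosen to differ from the register bit on purpose, and both function names are made up for this illustration. Masking the register with the flag constant only works while the two constants happen to be equal.

```c
#include <stdint.h>

#define PCI_ROM_ADDRESS_ENABLE	0x01u	/* enable bit in the ROM BAR register */
/* Hypothetical flag value, deliberately different from the register bit. */
#define DEMO_RES_ROM_ENABLE	(1u << 4)

/* Old form: only correct while both constants share the same value. */
static uint32_t rom_flags_coupled(uint32_t bar)
{
	return bar & DEMO_RES_ROM_ENABLE;
}

/* New form: test the register bit, then set the resource flag bit. */
static uint32_t rom_flags_decoupled(uint32_t bar)
{
	uint32_t flags = 0;

	if (bar & PCI_ROM_ADDRESS_ENABLE)
		flags |= DEMO_RES_ROM_ENABLE;
	return flags;
}
```

For a register value of 0xfe000001 (ROM enabled), the coupled form reports no flag at all once the constants diverge, while the decoupled form still sets it.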
[PATCH 4.9 93/93] crypto: powerpc - Fix initialisation of crc32c context
4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Axtens

commit aa2be9b3d6d2d699e9ca7cbfc00867c80e5da213 upstream.

Turning on crypto self-tests on a POWER8 shows:

    alg: hash: Test 1 failed for crc32c-vpmsum : ff ff ff ff

Comparing the code with the Intel CRC32c implementation on which
ours is based shows that we are doing an init with 0, not ~0
as CRC32c requires.

This probably wasn't caught because btrfs does its own weird
open-coded initialisation.

Initialise our internal context to ~0 on init.

This makes the self-tests pass, and btrfs continues to work.

Fixes: 6dd7a82cc54e ("crypto: powerpc - Add POWER8 optimised crc32c")
Cc: Anton Blanchard
Signed-off-by: Daniel Axtens
Acked-by: Anton Blanchard
Signed-off-by: Herbert Xu
Signed-off-by: Greg Kroah-Hartman
---
 arch/powerpc/crypto/crc32c-vpmsum_glue.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/powerpc/crypto/crc32c-vpmsum_glue.c
+++ b/arch/powerpc/crypto/crc32c-vpmsum_glue.c
@@ -52,7 +52,7 @@ static int crc32c_vpmsum_cra_init(struct
 {
 	u32 *key = crypto_tfm_ctx(tfm);
 
-	*key = 0;
+	*key = ~0;
 
 	return 0;
 }
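The effect of the seed is easy to reproduce outside the kernel. The following is a minimal bitwise CRC32c in userspace C, not the vpmsum code from the patch; crc32c_update() and crc32c_digest() are names chosen for this sketch. It uses the reflected Castagnoli polynomial 0x82F63B78 and inverts the register at the end, so the only variable is the initial value.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c_update(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* seed is the initial register value; the result is inverted at the end. */
static uint32_t crc32c_digest(uint32_t seed, const void *buf, size_t len)
{
	return ~crc32c_update(seed, buf, len);
}
```

Seeded with ~0, the string "123456789" yields the standard CRC32c check value 0xE3069283; seeded with 0 it yields a different value, which is the kind of mismatch the self-test reported.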
[PATCH 4.9 88/93] x86/kasan: Fix boot with KASAN=y and PROFILE_ANNOTATED_BRANCHES=y
4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andrey Ryabinin

commit be3606ff739d1c1be36389f8737c577ad87e1f57 upstream.

The kernel doesn't boot with both PROFILE_ANNOTATED_BRANCHES=y and KASAN=y
options selected. With branch profiling enabled we end up calling
ftrace_likely_update() before kasan_early_init(). ftrace_likely_update() is
built with KASAN instrumentation, so calling it before kasan has been
initialized leads to crash.

Use DISABLE_BRANCH_PROFILING define to make sure that we don't call
ftrace_likely_update() from early code before kasan_early_init().

Fixes: ef7f0d6a6ca8 ("x86_64: add KASan support")
Reported-by: Fengguang Wu
Signed-off-by: Andrey Ryabinin
Cc: kasan-...@googlegroups.com
Cc: Alexander Potapenko
Cc: Andrew Morton
Cc: l...@01.org
Cc: Dmitry Vyukov
Link: http://lkml.kernel.org/r/20170313163337.1704-1-aryabi...@virtuozzo.com
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
---
 arch/x86/kernel/head64.c    |    1 +
 arch/x86/mm/kasan_init_64.c |    1 +
 2 files changed, 2 insertions(+)

--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -4,6 +4,7 @@
  *  Copyright (C) 2000 Andrea Arcangeli SuSE
  */
 
+#define DISABLE_BRANCH_PROFILING
 #include
 #include
 #include
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -1,3 +1,4 @@
+#define DISABLE_BRANCH_PROFILING
 #define pr_fmt(fmt) "kasan: " fmt
 #include
 #include
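The reason the #define has to sit above the includes can be modelled in a single userspace file. This is a sketch under assumptions, not the kernel's compiler.h: profile_hits and profile_update() are hypothetical stand-ins for the branch-profiling hook, and the "header" is inlined between the marker comments. Because the macro is defined first, likely() expands to a plain test and the hook is never called from early_code().

```c
/* Must come before the "header" part below, like the #define in the patch. */
#define DISABLE_BRANCH_PROFILING

/* ---- models part of a compiler header; all names are hypothetical ---- */
static unsigned long profile_hits;

static inline int profile_update(int cond)
{
	profile_hits++;		/* stands in for ftrace_likely_update() */
	return cond;
}

#ifdef DISABLE_BRANCH_PROFILING
#define likely(x)	(!!(x))
#else
#define likely(x)	profile_update(!!(x))
#endif
/* ---- end of modelled header ---- */

/* Models early boot code that must not call into the profiling hook. */
static int early_code(int v)
{
	if (likely(v > 0))
		return 1;
	return 0;
}
```

Moving the #define below the modelled header would leave likely() bound to profile_update(), which corresponds to the early call the patch is avoiding.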
[PATCH 4.10 03/63] net/mlx5e: Fix broken CQE compression initialization
4.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tariq Toukan

[ Upstream commit b0d4660b4cc52e6477ca3a43435351d565dfcedc ]

Some of RQ type parameters are derived from CQE compression state flag,
CQE compression flag was initialized only after RQ type parameters setup.

This leads to load RQ with stride size smaller than what we want for when
CQE compression is on. This bug introduces no functional damage, it only
makes CQE compression occur less often, since in ConnectX4-LX CQE
compression is performed only on packets smaller than stride size.

Fix this by marking default status of CQE compression in PFLAG prior to
calling mlx5e_set_rq_priv_params(), as it inits some fields based on it.

Tested:
 load driver on systems where rx CQE compress will be on (MH)
 pktgen with 64 < pkt size < 256 and netperf TCP_STREAM (IPv4/IPv6)
 verify `ethtool -S ethxx | grep compress` are advancing more often
 (rapidly)

Fixes: 2fc4bfb7250d ("net/mlx5e: Dynamic RQ type infrastructure")
Signed-off-by: Tariq Toukan
Cc: kernel-t...@fb.com
Signed-off-by: Saeed Mahameed
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3500,6 +3500,9 @@ static void mlx5e_build_nic_netdev_priv(
 			cqe_compress_heuristic(link_speed, pci_bw);
 	}
 
+	MLX5E_SET_PFLAG(priv, MLX5E_PFLAG_RX_CQE_COMPRESS,
+			priv->params.rx_cqe_compress_def);
+
 	mlx5e_set_rq_priv_params(priv);
 	if (priv->params.rq_wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ)
 		priv->params.lro_en = true;
@@ -3525,7 +3528,6 @@ static void mlx5e_build_nic_netdev_priv(
 	/* Initialize pflags */
 	MLX5E_SET_PFLAG(priv, MLX5E_PFLAG_RX_CQE_BASED_MODER,
 			priv->params.rx_cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE);
-	MLX5E_SET_PFLAG(priv, MLX5E_PFLAG_RX_CQE_COMPRESS, priv->params.rx_cqe_compress_def);
 
 	mutex_init(&priv->state_lock);
 
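The ordering bug is a plain initialisation-order problem and can be modelled in a few lines of userspace C. struct params_model, its fields, the three functions and the stride values 6 and 8 are all assumptions made for this sketch; they are not taken from the driver. A value derived from a flag is wrong if it is computed before the flag is set.

```c
#include <stdbool.h>

/* Hypothetical model; field names and stride values are illustrative only. */
struct params_model {
	bool cqe_compress_def;	/* the chosen default */
	bool cqe_compress_flag;	/* models the PFLAG bit */
	int log_stride_sz;	/* derived from the flag */
};

/* Models the step that derives RQ parameters from the flag. */
static void set_rq_params(struct params_model *p)
{
	p->log_stride_sz = p->cqe_compress_flag ? 8 : 6;
}

/* Buggy order: derive first, set the flag afterwards. */
static void init_flag_late(struct params_model *p)
{
	set_rq_params(p);
	p->cqe_compress_flag = p->cqe_compress_def;
}

/* Fixed order: set the flag, then derive from it. */
static void init_flag_early(struct params_model *p)
{
	p->cqe_compress_flag = p->cqe_compress_def;
	set_rq_params(p);
}
```

Both orders end with the flag set, which is why the bug was not a functional failure; only the derived stride differs, matching the "less often" symptom in the changelog.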
[PATCH 4.9 92/93] locking/rwsem: Fix down_write_killable() for CONFIG_RWSEM_GENERIC_SPINLOCK=y
4.9-stable review patch.  If anyone has any objections, please let me know.

--

From: Niklas Cassel

commit 17fcbd590d0c3e35bd9646e2215f86586378bc42 upstream.

We hang if SIGKILL has been sent, but the task is stuck in down_read()
(after do_exit()), even though no task is doing down_write() on the
rwsem in question:

  INFO: task libupnp:21868 blocked for more than 120 seconds.
  libupnp D0 21868 1 0x0818
  ...
  Call Trace:
  __schedule()
  schedule()
  __down_read()
  do_exit()
  do_group_exit()
  __wake_up_parent()

This bug has already been fixed for CONFIG_RWSEM_XCHGADD_ALGORITHM=y in
the following commit:

 04cafed7fc19 ("locking/rwsem: Fix down_write_killable()")

... however, this bug also exists for CONFIG_RWSEM_GENERIC_SPINLOCK=y.

Signed-off-by: Niklas Cassel
Signed-off-by: Peter Zijlstra (Intel)
Cc:
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Niklas Cassel
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Fixes: d47996082f52 ("locking/rwsem: Introduce basis for down_write_killable()")
Link: http://lkml.kernel.org/r/1487981873-12649-1-git-send-email-nikl...@axis.com
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman
---
 kernel/locking/rwsem-spinlock.c |   15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

--- a/kernel/locking/rwsem-spinlock.c
+++ b/kernel/locking/rwsem-spinlock.c
@@ -216,10 +216,8 @@ int __sched __down_write_common(struct r
 		 */
 		if (sem->count == 0)
 			break;
-		if (signal_pending_state(state, current)) {
-			ret = -EINTR;
-			goto out;
-		}
+		if (signal_pending_state(state, current))
+			goto out_nolock;
 		set_task_state(tsk, state);
 		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
 		schedule();
@@ -227,12 +225,19 @@ int __sched __down_write_common(struct r
 	}
 	/* got the lock */
 	sem->count = -1;
-out:
 	list_del(&waiter.list);
 
 	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
 
 	return ret;
+
+out_nolock:
+	list_del(&waiter.list);
+	if (!list_empty(&sem->wait_list))
+		__rwsem_do_wake(sem, 1);
+	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
+
+	return -EINTR;
 }
 
 void __sched __down_write(struct rw_semaphore *sem)
Re: [RFC v3 1/5] sched/core: add capacity constraints to CPU controller
Hello,

On Tue, Feb 28, 2017 at 02:38:38PM +, Patrick Bellasi wrote:
> This patch extends the CPU controller by adding a couple of new
> attributes, capacity_min and capacity_max, which can be used to enforce
> bandwidth boosting and capping. More specifically:
>
> - capacity_min: defines the minimum capacity which should be granted
>                 (by schedutil) when a task in this group is running,
>                 i.e. the task will run at least at that capacity
>
> - capacity_max: defines the maximum capacity which can be granted
>                 (by schedutil) when a task in this group is running,
>                 i.e. the task can run up to that capacity

cpu.capacity.min and cpu.capacity.max are the more conventional names.
I'm not sure about the name capacity as it doesn't encode what it does
and is difficult to tell apart from cpu bandwidth limits.  I think
it'd be better to represent what it controls more explicitly.

> These attributes:
> a) are tunable at all hierarchy levels, i.e. root group too

This usually is problematic because there should be a non-cgroup way
of configuring the feature in case cgroup isn't configured or used,
and it becomes awkward to have two separate mechanisms configuring the
same thing.  Maybe the feature is cgroup specific enough that it makes
sense here but this needs more explanation / justification.

> b) allow to create subgroups of tasks which are not violating the
>    capacity constraints defined by the parent group.
>    Thus, tasks on a subgroup can only be more boosted and/or more

For both limits and protections, the parent caps the maximum the
children can get.  At least that's what memcg does for memory.low.
Doing that makes sense for memcg because for memory the parent can
still do protections regardless of what its children are doing and it
makes delegation safe by default.

I understand why you would want a property like capacity to be the
other direction as that way you get more specific as you walk down the
tree for both limits and protections; however, I think we need to
think a bit more about it and ensure that the resulting interface
isn't confusing.  Would it work for capacity to behave the other
direction - ie. a parent's min restricting the highest min that its
descendants can get?  It's completely fine if that's weird.

Thanks.

--
tejun
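To make the memcg-style semantics discussed above concrete (a parent's min restricting the highest min its descendants can get), here is a minimal user-space sketch. It is not taken from the patch series: `struct group`, its fields, and `effective_min()` are hypothetical names invented for illustration, and the real controller would keep this state in the cgroup hierarchy rather than in a plain linked structure.

```c
#include <stddef.h>

/* Hypothetical stand-in for a cgroup node; not a kernel structure. */
struct group {
	const struct group *parent;   /* NULL for the root group */
	unsigned int capacity_min;    /* value configured on this group */
};

/*
 * Effective minimum under "parent caps the children" semantics:
 * a group never gets more protection than any of its ancestors,
 * so the result is the smallest configured value on the path to
 * the root.
 */
static unsigned int effective_min(const struct group *g)
{
	unsigned int eff = g->capacity_min;
	const struct group *p;

	for (p = g->parent; p != NULL; p = p->parent)
		if (p->capacity_min < eff)
			eff = p->capacity_min;
	return eff;
}
```

With a root configured at 512, a child asking for 800 would end up with an effective minimum of 512, while a child asking for 300 keeps 300. Under the direction proposed in the RFC the child could instead only be boosted further, which is the difference the reply is asking about.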
Re: [PATCH v1 0/3] ioremap() tidy-up
On Fri, Mar 17, 2017 at 11:00:03PM +0100, Arnd Bergmann wrote:
> On Fri, Mar 17, 2017 at 6:46 PM, Bjorn Helgaas wrote:
> > 1) Fix some comments that say "IOMMU" when they mean "MMU".
> >
> > 2) Remove the generic __ioremap() definition, which I think is unused and
> > confusing.
> >
> > 3) Simplify the comments about ioremap() implementation.  I split this out
> > in case I went too far and made this controversial.
>
> All look good
>
> Reviewed-by: Arnd Bergmann
>
> Do you want to take them through your PCI tree?

Sure, I can.  I should have copied linux-pci; this was all motivated
by reviewing Lorenzo's patches, will probably go via my tree and
should be coordinated with these.

Bjorn
[PATCH 4.10 02/63] net/mlx5e: Do not reduce LRO WQE size when not using build_skb
4.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Tariq Toukan

[ Upstream commit 4078e637c12f1e0a74293f1ec9563f42bff14a03 ]

When rq_type is Striding RQ, no room of SKB_RESERVE is needed
as SKB allocation is not done via build_skb.

Fixes: e4b85508072b ("net/mlx5e: Slightly reduce hardware LRO size")
Signed-off-by: Tariq Toukan
Signed-off-by: Saeed Mahameed
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -81,6 +81,7 @@ static bool mlx5e_check_fragmented_strid
 static void mlx5e_set_rq_type_params(struct mlx5e_priv *priv, u8 rq_type)
 {
 	priv->params.rq_wq_type = rq_type;
+	priv->params.lro_wqe_sz = MLX5E_PARAMS_DEFAULT_LRO_WQE_SZ;
 	switch (priv->params.rq_wq_type) {
 	case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
 		priv->params.log_rq_size = MLX5E_PARAMS_DEFAULT_LOG_RQ_SIZE_MPW;
@@ -93,6 +94,10 @@ static void mlx5e_set_rq_type_params(str
 		break;
 	default: /* MLX5_WQ_TYPE_LINKED_LIST */
 		priv->params.log_rq_size = MLX5E_PARAMS_DEFAULT_LOG_RQ_SIZE;
+
+		/* Extra room needed for build_skb */
+		priv->params.lro_wqe_sz -= MLX5_RX_HEADROOM +
+			SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	}
 	priv->params.min_rx_wqes = mlx5_min_rx_wqes(priv->params.rq_wq_type,
 					       BIT(priv->params.log_rq_size));
@@ -3517,12 +3522,6 @@ static void mlx5e_build_nic_netdev_priv(
 	mlx5e_build_default_indir_rqt(mdev, priv->params.indirection_rqt,
 				      MLX5E_INDIR_RQT_SIZE, profile->max_nch(mdev));
 
-	priv->params.lro_wqe_sz =
-		MLX5E_PARAMS_DEFAULT_LRO_WQE_SZ -
-		/* Extra room needed for build_skb */
-		MLX5_RX_HEADROOM -
-		SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-
 	/* Initialize pflags */
 	MLX5E_SET_PFLAG(priv, MLX5E_PFLAG_RX_CQE_BASED_MODER,
 			priv->params.rx_cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE);
[PATCH 4.10 15/63] vxlan: lock RCU on TX path
4.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Jakub Kicinski

[ Upstream commit 56de859e9967c070464a9a9f4f18d73f9447298e ]

There is no guarantees that callers of the TX path will hold
the RCU lock.  Grab it explicitly.

Fixes: c6fcc4fc5f8b ("vxlan: avoid using stale vxlan socket.")
Signed-off-by: Jakub Kicinski
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
---
 drivers/net/vxlan.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2062,6 +2062,7 @@ static void vxlan_xmit_one(struct sk_buf
 	src_port = udp_flow_src_port(dev_net(dev), skb, vxlan->cfg.port_min,
 				     vxlan->cfg.port_max, true);
 
+	rcu_read_lock();
 	if (dst->sa.sa_family == AF_INET) {
 		struct vxlan_sock *sock4 = rcu_dereference(vxlan->vn4_sock);
 		struct rtable *rt;
@@ -2084,7 +2085,7 @@ static void vxlan_xmit_one(struct sk_buf
 						    dst_port, vni, &rt->dst,
 						    rt->rt_flags);
 			if (err)
-				return;
+				goto out_unlock;
 		} else if (info->key.tun_flags & TUNNEL_DONT_FRAGMENT) {
 			df = htons(IP_DF);
 		}
@@ -2123,7 +2124,7 @@ static void vxlan_xmit_one(struct sk_buf
 						    dst_port, vni, ndst,
 						    rt6i_flags);
 			if (err)
-				return;
+				goto out_unlock;
 		}
 
 		tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
@@ -2140,6 +2141,8 @@ static void vxlan_xmit_one(struct sk_buf
 				     label, src_port, dst_port, !udp_sum);
 #endif
 	}
+out_unlock:
+	rcu_read_unlock();
 	return;
 
 drop:
@@ -2148,6 +2151,7 @@ drop:
 	return;
 
 tx_error:
+	rcu_read_unlock();
 	if (err == -ELOOP)
 		dev->stats.collisions++;
 	else if (err == -ENETUNREACH)
[PATCH 4.10 17/63] mlxsw: spectrum_router: Avoid potential packets loss
4.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Ido Schimmel

[ Upstream commit f7df4923fa986247e93ec2cdff5ca168fff14dcf ]

When the structure of the LPM tree changes (f.e., due to the addition of
a new prefix), we unbind the old tree and then bind the new one. This
may result in temporary packet loss.

Instead, overwrite the old binding with the new one.

Fixes: 6b75c4807db3 ("mlxsw: spectrum_router: Add virtual router management")
Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c |   30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -496,30 +496,40 @@ static int
 mlxsw_sp_vr_lpm_tree_check(struct mlxsw_sp *mlxsw_sp, struct mlxsw_sp_vr *vr,
 			   struct mlxsw_sp_prefix_usage *req_prefix_usage)
 {
-	struct mlxsw_sp_lpm_tree *lpm_tree;
+	struct mlxsw_sp_lpm_tree *lpm_tree = vr->lpm_tree;
+	struct mlxsw_sp_lpm_tree *new_tree;
+	int err;
 
-	if (mlxsw_sp_prefix_usage_eq(req_prefix_usage,
-				     &vr->lpm_tree->prefix_usage))
+	if (mlxsw_sp_prefix_usage_eq(req_prefix_usage, &lpm_tree->prefix_usage))
 		return 0;
 
-	lpm_tree = mlxsw_sp_lpm_tree_get(mlxsw_sp, req_prefix_usage,
+	new_tree = mlxsw_sp_lpm_tree_get(mlxsw_sp, req_prefix_usage,
 					 vr->proto, false);
-	if (IS_ERR(lpm_tree)) {
+	if (IS_ERR(new_tree)) {
 		/* We failed to get a tree according to the required
 		 * prefix usage. However, the current tree might be still good
 		 * for us if our requirement is subset of the prefixes used
 		 * in the tree.
 		 */
 		if (mlxsw_sp_prefix_usage_subset(req_prefix_usage,
-						 &vr->lpm_tree->prefix_usage))
+						 &lpm_tree->prefix_usage))
 			return 0;
-		return PTR_ERR(lpm_tree);
+		return PTR_ERR(new_tree);
 	}
 
-	mlxsw_sp_vr_lpm_tree_unbind(mlxsw_sp, vr);
-	mlxsw_sp_lpm_tree_put(mlxsw_sp, vr->lpm_tree);
+	/* Prevent packet loss by overwriting existing binding */
+	vr->lpm_tree = new_tree;
+	err = mlxsw_sp_vr_lpm_tree_bind(mlxsw_sp, vr);
+	if (err)
+		goto err_tree_bind;
+	mlxsw_sp_lpm_tree_put(mlxsw_sp, lpm_tree);
+
+	return 0;
+
+err_tree_bind:
 	vr->lpm_tree = lpm_tree;
-	return mlxsw_sp_vr_lpm_tree_bind(mlxsw_sp, vr);
+	mlxsw_sp_lpm_tree_put(mlxsw_sp, new_tree);
+	return err;
 }
 
 static struct mlxsw_sp_vr *mlxsw_sp_vr_get(struct mlxsw_sp *mlxsw_sp,
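The ordering in the patch above (overwrite the binding first, drop the old tree's reference only on success, roll back on failure) can be sketched in plain user-space C. This is an illustration only: `struct tree`, `struct router`, `hw_bind()` and `replace_tree()` are hypothetical names made up for this sketch, and `hw_bind()` merely simulates the device write that `mlxsw_sp_vr_lpm_tree_bind()` performs in the driver.

```c
#include <stddef.h>

/* Hypothetical simplified objects; not the mlxsw structures. */
struct tree {
	int refcount;
};

struct router {
	struct tree *tree;      /* software view of the current binding */
	struct tree *hw_bound;  /* what the simulated hardware forwards with */
};

/* Simulated device write: overwrites the binding, or fails on request. */
static int hw_bind(struct router *r, int simulate_failure)
{
	if (simulate_failure)
		return -1;
	r->hw_bound = r->tree;
	return 0;
}

/*
 * Replace the bound tree without ever leaving the router unbound.
 * The caller hands over one reference on new_tree.
 */
static int replace_tree(struct router *r, struct tree *new_tree,
			int simulate_failure)
{
	struct tree *old_tree = r->tree;
	int err;

	r->tree = new_tree;
	err = hw_bind(r, simulate_failure);
	if (err) {
		r->tree = old_tree;     /* roll back the software view */
		new_tree->refcount--;   /* give up the reference we were handed */
		return err;
	}
	old_tree->refcount--;           /* old tree released only after success */
	return 0;
}
```

At no point in `replace_tree()` is `hw_bound` cleared, which is the property the changelog is after: on failure the old binding simply stays in place.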
[PATCH 4.10 05/63] net/mlx5e: Fix wrong CQE decompression
4.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Tariq Toukan

[ Upstream commit 36154be40a28e4afaa0416da2681d80b7e2ca319 ]

In cqe compression with striding RQ, the decompression of the CQE field
wqe_counter was done with a wrong wraparound value.
This caused handling cqes with a wrong pointer to wqe (rx descriptor)
and creating SKBs with wrong data, pointing to wrong (and already consumed)
strides/pages.

The meaning of the CQE field wqe_counter in striding RQ holds the
stride index instead of the WQE index. Hence, when decompressing
a CQE, wqe_counter should have wrapped-around the number of strides
in a single multi-packet WQE.

We dropped this wrap-around mask at all in CQE decompression of striding
RQ. It is not needed as in such cases the CQE compression session would
break because of different value of wqe_id field, starting a new
compression session.

Tested:
 ethtool -K ethxx lro off/on
 ethtool --set-priv-flags ethxx rx_cqe_compress on
 super_netperf 16 {ipv4,ipv6} -t TCP_STREAM -m 50 -D
 verified no csum errors and no page refcount issues.

Fixes: 7219ab34f184 ("net/mlx5e: CQE compression")
Signed-off-by: Tariq Toukan
Reported-by: Tom Herbert
Cc: kernel-t...@fb.com
Signed-off-by: Saeed Mahameed
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c |   13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -92,19 +92,18 @@ static inline void mlx5e_cqes_update_own
 static inline void mlx5e_decompress_cqe(struct mlx5e_rq *rq,
 					struct mlx5e_cq *cq, u32 cqcc)
 {
-	u16 wqe_cnt_step;
-
 	cq->title.byte_cnt     = cq->mini_arr[cq->mini_arr_idx].byte_cnt;
 	cq->title.check_sum    = cq->mini_arr[cq->mini_arr_idx].checksum;
 	cq->title.op_own      &= 0xf0;
 	cq->title.op_own      |= 0x01 & (cqcc >> cq->wq.log_sz);
 	cq->title.wqe_counter  = cpu_to_be16(cq->decmprs_wqe_counter);
 
-	wqe_cnt_step =
-		rq->wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ ?
-		mpwrq_get_cqe_consumed_strides(&cq->title) : 1;
-	cq->decmprs_wqe_counter =
-		(cq->decmprs_wqe_counter + wqe_cnt_step) & rq->wq.sz_m1;
+	if (rq->wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ)
+		cq->decmprs_wqe_counter +=
+			mpwrq_get_cqe_consumed_strides(&cq->title);
+	else
+		cq->decmprs_wqe_counter =
+			(cq->decmprs_wqe_counter + 1) & rq->wq.sz_m1;
 }
 
 static inline void mlx5e_decompress_cqe_no_hash(struct mlx5e_rq *rq,
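The arithmetic change in the patch above can be tried out in isolation with a small user-space sketch. The names `enum rq_type` and `next_wqe_counter()` are hypothetical and exist only for this illustration; the logic mirrors the new branch structure, where the striding case adds the consumed strides without a mask and the linked-list case still wraps at the work queue size.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two RQ types handled in the patch. */
enum rq_type { RQ_LINKED_LIST, RQ_STRIDING };

/*
 * Next value of the decompression counter.
 * - Striding RQ: the counter is a stride index, so it advances by the
 *   number of strides the CQE consumed and is NOT masked with the
 *   work queue size.
 * - Linked-list RQ: the counter is a WQE index and wraps at the work
 *   queue size (wq_sz_m1 is size minus one, size being a power of two).
 */
static uint16_t next_wqe_counter(enum rq_type type, uint16_t counter,
				 uint16_t consumed_strides, uint16_t wq_sz_m1)
{
	if (type == RQ_STRIDING)
		return (uint16_t)(counter + consumed_strides);
	return (uint16_t)((counter + 1) & wq_sz_m1);
}
```

For example, with a work queue of 8 entries (`wq_sz_m1 == 7`), a striding-RQ counter at stride 6 that consumes 3 strides moves to 9. The pre-patch expression `(6 + 3) & 7` would have produced 1, which is the kind of wrong stride index the changelog describes.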
[PATCH 4.10 16/63] geneve: lock RCU on TX path
4.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Jakub Kicinski

[ Upstream commit a717e3f740803cc88bd5c9a70c93504f6a368663 ]

There is no guarantees that callers of the TX path will hold
the RCU lock.  Grab it explicitly.

Fixes: fceb9c3e3825 ("geneve: avoid using stale geneve socket.")
Signed-off-by: Jakub Kicinski
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
---
 drivers/net/geneve.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -881,12 +881,14 @@ static netdev_tx_t geneve_xmit(struct sk
 		info = &geneve->info;
 	}
 
+	rcu_read_lock();
 #if IS_ENABLED(CONFIG_IPV6)
 	if (info->mode & IP_TUNNEL_INFO_IPV6)
 		err = geneve6_xmit_skb(skb, dev, geneve, info);
 	else
 #endif
 		err = geneve_xmit_skb(skb, dev, geneve, info);
+	rcu_read_unlock();
 
 	if (likely(!err))
 		return NETDEV_TX_OK;
[PATCH 4.10 20/63] net: dont call strlen() on the user buffer in packet_bind_spkt()
4.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Alexander Potapenko

[ Upstream commit 540e2894f7905538740aaf122bd8e0548e1c34a4 ]

KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
uninitialized memory in packet_bind_spkt():
Acked-by: Eric Dumazet

==
BUG: KMSAN: use of unitialized memory
CPU: 0 PID: 1074 Comm: packet Not tainted 4.8.0-rc6+ #1891
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 88006b6dfc08 82559ae8 88006b6dfb48 818a7c91 85b9c870 0092 85b9c550 0092 ec400911 0002
Call Trace:
 [< inline >] __dump_stack lib/dump_stack.c:15
 [] dump_stack+0x238/0x290 lib/dump_stack.c:51
 [] kmsan_report+0x276/0x2e0 mm/kmsan/kmsan.c:1003
 [] __msan_warning+0x5b/0xb0 mm/kmsan/kmsan_instr.c:424
 [< inline >] strlen lib/string.c:484
 [] strlcpy+0x9d/0x200 lib/string.c:144
 [] packet_bind_spkt+0x144/0x230 net/packet/af_packet.c:3132
 [] SYSC_bind+0x40d/0x5f0 net/socket.c:1370
 [] SyS_bind+0x82/0xa0 net/socket.c:1356
 [] entry_SYSCALL_64_fastpath+0x13/0x8f arch/x86/entry/entry_64.o:?
chained origin: eba00911
 [] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67
 [< inline >] kmsan_save_stack_with_flags mm/kmsan/kmsan.c:322
 [< inline >] kmsan_save_stack mm/kmsan/kmsan.c:334
 [] kmsan_internal_chain_origin+0x118/0x1e0 mm/kmsan/kmsan.c:527
 [] __msan_set_alloca_origin4+0xc3/0x130 mm/kmsan/kmsan_instr.c:380
 [] SYSC_bind+0x129/0x5f0 net/socket.c:1356
 [] SyS_bind+0x82/0xa0 net/socket.c:1356
 [] entry_SYSCALL_64_fastpath+0x13/0x8f arch/x86/entry/entry_64.o:?
origin description: address@SYSC_bind (origin=eb400911)
==
(the line numbers are relative to 4.8-rc6, but the bug persists upstream)

, when I run the following program as root:

=
 #include
 #include
 #include
 #include

 int main() {
   struct sockaddr addr;
   memset(&addr, 0xff, sizeof(addr));
   addr.sa_family = AF_PACKET;
   int fd = socket(PF_PACKET, SOCK_PACKET, htons(ETH_P_ALL));
   bind(fd, &addr, sizeof(addr));
   return 0;
 }
=

This happens because addr.sa_data copied from the userspace is not
zero-terminated, and copying it with strlcpy() in packet_bind_spkt()
results in calling strlen() on the kernel copy of that non-terminated
buffer.

Signed-off-by: Alexander Potapenko
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
---
 net/packet/af_packet.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3082,7 +3082,7 @@ static int packet_bind_spkt(struct socke
 			    int addr_len)
 {
 	struct sock *sk = sock->sk;
-	char name[15];
+	char name[sizeof(uaddr->sa_data) + 1];
 
 	/*
 	 *	Check legality
@@ -3090,7 +3090,11 @@ static int packet_bind_spkt(struct socke
 
 	if (addr_len != sizeof(struct sockaddr))
 		return -EINVAL;
-	strlcpy(name, uaddr->sa_data, sizeof(name));
+	/* uaddr->sa_data comes from the userspace, it's not guaranteed to be
+	 * zero-terminated.
+	 */
+	memcpy(name, uaddr->sa_data, sizeof(uaddr->sa_data));
+	name[sizeof(uaddr->sa_data)] = 0;
 
 	return packet_do_bind(sk, name, 0, pkt_sk(sk)->num);
 }
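The copy pattern the fix switches to can be exercised outside the kernel. The sketch below is a user-space illustration only: `copy_ifname()` and `SA_DATA_LEN` are hypothetical names, and the length 14 is assumed to match `sizeof(((struct sockaddr *)0)->sa_data)` on Linux. The point is that a fixed-size field that may lack a terminator is copied by its size, never by searching for a NUL, and the terminator is then written explicitly.

```c
#include <string.h>

/* Assumed size of sockaddr.sa_data; hypothetical constant for this sketch. */
#define SA_DATA_LEN 14

/*
 * Copy a fixed-size, possibly unterminated field into a buffer that is
 * one byte larger, then terminate it. Unlike strlcpy(), this never runs
 * strlen() over the source, so it cannot read past the field.
 */
static void copy_ifname(char dst[SA_DATA_LEN + 1], const char src[SA_DATA_LEN])
{
	memcpy(dst, src, SA_DATA_LEN);
	dst[SA_DATA_LEN] = '\0';
}
```

With a source filled entirely with 0xff bytes, as in the reproducer, the destination ends up holding all 14 bytes followed by the terminator. A short name such as "eth0" padded with zeros comes through unchanged.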
[PATCH 4.10 22/63] ipv6: orphan skbs in reassembly unit
4.10-stable review patch. If anyone has any objections, please let me know. -- From: Eric Dumazet [ Upstream commit 48cac18ecf1de82f76259a54402c3adb7839ad01 ] Andrey reported a use-after-free in IPv6 stack. Issue here is that we free the socket while it still has skb in TX path and in some queues. It happens here because IPv6 reassembly unit messes skb->truesize, breaking skb_set_owner_w() badly. We fixed a similar issue for IPV4 in commit 8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()") Acked-by: Joe Stringer == BUG: KASAN: use-after-free in sock_wfree+0x118/0x120 Read of size 8 at addr 880062da0060 by task a.out/4140 page:ea00018b6800 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0 flags: 0x1008100(slab|head) raw: 01008100 000180130013 raw: dead0100 dead0200 88006741f140 page dumped because: kasan: bad access detected CPU: 0 PID: 4140 Comm: a.out Not tainted 4.10.0-rc3+ #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:15 dump_stack+0x292/0x398 lib/dump_stack.c:51 describe_address mm/kasan/report.c:262 kasan_report_error+0x121/0x560 mm/kasan/report.c:370 kasan_report mm/kasan/report.c:392 __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:413 sock_flag ./arch/x86/include/asm/bitops.h:324 sock_wfree+0x118/0x120 net/core/sock.c:1631 skb_release_head_state+0xfc/0x250 net/core/skbuff.c:655 skb_release_all+0x15/0x60 net/core/skbuff.c:668 __kfree_skb+0x15/0x20 net/core/skbuff.c:684 kfree_skb+0x16e/0x4e0 net/core/skbuff.c:705 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304 inet_frag_put ./include/net/inet_frag.h:133 nf_ct_frag6_gather+0x1125/0x38b0 net/ipv6/netfilter/nf_conntrack_reasm.c:617 ipv6_defrag+0x21b/0x350 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68 nf_hook_entry_hookfn ./include/linux/netfilter.h:102 nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310 nf_hook ./include/linux/netfilter.h:212 __ip6_local_out+0x52c/0xaf0 
net/ipv6/output_core.c:160 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170 ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722 ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742 rawv6_push_pending_frames net/ipv6/raw.c:613 rawv6_sendmsg+0x2cff/0x4130 net/ipv6/raw.c:927 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744 sock_sendmsg_nosec net/socket.c:635 sock_sendmsg+0xca/0x110 net/socket.c:645 sock_write_iter+0x326/0x620 net/socket.c:848 new_sync_write fs/read_write.c:499 __vfs_write+0x483/0x760 fs/read_write.c:512 vfs_write+0x187/0x530 fs/read_write.c:560 SYSC_write fs/read_write.c:607 SyS_write+0xfb/0x230 fs/read_write.c:599 entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203 RIP: 0033:0x7ff26e6f5b79 RSP: 002b:7ff268e0ed98 EFLAGS: 0206 ORIG_RAX: 0001 RAX: ffda RBX: 7ff268e0f9c0 RCX: 7ff26e6f5b79 RDX: 0010 RSI: 20f50fe1 RDI: 0003 RBP: 7ff26ebc1220 R08: R09: R10: R11: 0206 R12: R13: 7ff268e0f9c0 R14: 7ff26efec040 R15: 0003 The buggy address belongs to the object at 880062da which belongs to the cache RAWv6 of size 1504 The buggy address 880062da0060 is located 96 bytes inside of 1504-byte region [880062da, 880062da05e0) Freed by task 4113: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57 save_stack+0x43/0xd0 mm/kasan/kasan.c:502 set_track mm/kasan/kasan.c:514 kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578 slab_free_hook mm/slub.c:1352 slab_free_freelist_hook mm/slub.c:1374 slab_free mm/slub.c:2951 kmem_cache_free+0xb2/0x2c0 mm/slub.c:2973 sk_prot_free net/core/sock.c:1377 __sk_destruct+0x49c/0x6e0 net/core/sock.c:1452 sk_destruct+0x47/0x80 net/core/sock.c:1460 __sk_free+0x57/0x230 net/core/sock.c:1468 sk_free+0x23/0x30 net/core/sock.c:1479 sock_put ./include/net/sock.h:1638 sk_common_release+0x31e/0x4e0 net/core/sock.c:2782 rawv6_close+0x54/0x80 net/ipv6/raw.c:1214 inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:431 sock_release+0x8d/0x1e0 net/socket.c:599 sock_close+0x16/0x20 
net/socket.c:1063 __fput+0x332/0x7f0 fs/file_table.c:208 fput+0x15/0x20 fs/file_table.c:244 task_work_run+0x19b/0x270 kernel/task_work.c:116 exit_task_work ./include/linux/task_work.h:21 do_exit+0x186b/0x2800 kernel/exit.c:839 do_group_exit+0x149/0x420 kernel/exit.c:943 SYSC_exit_group kernel/exit.c:954 SyS_exit_group+0x1d/0x20 kernel/exit.c:952 entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203 Allocated by task 4115: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57 save_stack+0x43/0xd0 mm/kasan/kasan.c:502 set_track
[PATCH 4.10 23/63] dccp: Unlock sock before calling sk_free()
4.10-stable review patch. If anyone has any objections, please let me know.

--

From: Arnaldo Carvalho de Melo

[ Upstream commit d5afb6f9b6bb2c57bd0c05e76e12489dc0d037d9 ]

The code where sk_clone() came from created a new socket and locked it,
but then, on the error path didn't unlock it.

This problem stayed there for a long while, till b0691c8ee7c2 ("net:
Unlock sock before calling sk_free()") fixed it, but unfortunately the
callers of sk_clone() (now sk_clone_lock()) were not audited and the
one in dccp_create_openreq_child() remained.

Now in the age of the syzkaller fuzzer, this was finally uncovered, as
reported by Dmitry:

 8<

I've got the following report while running syzkaller fuzzer on
86292b33d4b7 ("Merge branch 'akpm' (patches from Andrew)")

[ BUG: held lock freed! ]
4.10.0+ #234 Not tainted
-
syz-executor6/6898 is freeing memory 88006286cac0-88006286d3b7, with a lock still held there!
 (slock-AF_INET6){+.-...}, at: [] spin_lock include/linux/spinlock.h:299 [inline]
 (slock-AF_INET6){+.-...}, at: [] sk_clone_lock+0x3d9/0x12c0 net/core/sock.c:1504
5 locks held by syz-executor6/6898:
 #0: (sk_lock-AF_INET6){+.+.+.}, at: [] lock_sock include/net/sock.h:1460 [inline]
 #0: (sk_lock-AF_INET6){+.+.+.}, at: [] inet_stream_connect+0x44/0xa0 net/ipv4/af_inet.c:681
 #1: (rcu_read_lock){..}, at: [] inet6_csk_xmit+0x12a/0x5d0 net/ipv6/inet6_connection_sock.c:126
 #2: (rcu_read_lock){..}, at: [] __skb_unlink include/linux/skbuff.h:1767 [inline]
 #2: (rcu_read_lock){..}, at: [] __skb_dequeue include/linux/skbuff.h:1783 [inline]
 #2: (rcu_read_lock){..}, at: [] process_backlog+0x264/0x730 net/core/dev.c:4835
 #3: (rcu_read_lock){..}, at: [] ip6_input_finish+0x0/0x1700 net/ipv6/ip6_input.c:59
 #4: (slock-AF_INET6){+.-...}, at: [] spin_lock include/linux/spinlock.h:299 [inline]
 #4: (slock-AF_INET6){+.-...}, at: [] sk_clone_lock+0x3d9/0x12c0 net/core/sock.c:1504

Fix it just like was done by b0691c8ee7c2 ("net: Unlock sock before calling
sk_free()").

Reported-by: Dmitry Vyukov
Cc: Cong Wang
Cc: Eric Dumazet
Cc: Gerrit Renker
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20170301153510.ge15...@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
---
 net/dccp/minisocks.c |1 +
 1 file changed, 1 insertion(+)

--- a/net/dccp/minisocks.c
+++ b/net/dccp/minisocks.c
@@ -122,6 +122,7 @@ struct sock *dccp_create_openreq_child(c
 			/* It is still raw copy of parent, so invalidate
 			 * destructor and make plain sk_free() */
 			newsk->sk_destruct = NULL;
+			bh_unlock_sock(newsk);
 			sk_free(newsk);
 			return NULL;
 		}
[PATCH 4.10 19/63] net: bridge: allow IPv6 when multicast flood is disabled
4.10-stable review patch. If anyone has any objections, please let me know.

--

From: Mike Manning

[ Upstream commit 8953de2f02ad7b15e4964c82f9afd60f128e4e98 ]

Even with multicast flooding turned off, IPv6 ND should still work so
that IPv6 connectivity is provided. Allow this by continuing to flood
multicast traffic originated by us.

Fixes: b6cb5ac8331b ("net: bridge: add per-port multicast flood flag")
Cc: Nikolay Aleksandrov
Signed-off-by: Mike Manning
Acked-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
---
 net/bridge/br_forward.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -186,8 +186,9 @@ void br_flood(struct net_bridge *br, str
 		/* Do not flood unicast traffic to ports that turn it off */
 		if (pkt_type == BR_PKT_UNICAST && !(p->flags & BR_FLOOD))
 			continue;
+		/* Do not flood if mc off, except for traffic we originate */
 		if (pkt_type == BR_PKT_MULTICAST &&
-		    !(p->flags & BR_MCAST_FLOOD))
+		    !(p->flags & BR_MCAST_FLOOD) && skb->dev != br->dev)
 			continue;
 
 		/* Do not flood to ports that enable proxy ARP */
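The per-port decision after this change can be written out as a standalone predicate, which may make the new exception easier to see. This is a rough sketch, not the bridge code: the flag values, the enum and skip_flood() are hypothetical, and originated_by_bridge stands in for the patch's skb->dev == br->dev comparison.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the real BR_* values are not reproduced here. */
#define PORT_FLOOD        0x1u
#define PORT_MCAST_FLOOD  0x2u

enum pkt_type { PKT_UNICAST, PKT_MULTICAST, PKT_BROADCAST };

/*
 * Return true if the packet must NOT be flooded to this port.
 * originated_by_bridge models "skb->dev == br->dev" from the patch.
 */
static bool skip_flood(enum pkt_type type, unsigned int port_flags,
		       bool originated_by_bridge)
{
	/* Unicast flooding disabled on this port. */
	if (type == PKT_UNICAST && !(port_flags & PORT_FLOOD))
		return true;
	/* Multicast flooding disabled, unless we originated the traffic. */
	if (type == PKT_MULTICAST && !(port_flags & PORT_MCAST_FLOOD) &&
	    !originated_by_bridge)
		return true;
	return false;
}
```

With multicast flooding off, locally originated multicast (such as IPv6 ND) still goes out, while forwarded multicast is still suppressed.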
[PATCH v4] usb: hub: Fix error loop seen after hub communication errors
While stress testing a usb controller using a bind/unbind loop, the
following error loop was observed.

usb 7-1.2: new low-speed USB device number 3 using xhci-hcd
usb 7-1.2: hub failed to enable device, error -108
usb 7-1-port2: cannot disable (err = -22)
usb 7-1-port2: couldn't allocate usb_device
usb 7-1-port2: cannot disable (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
** 57 printk messages dropped ** hub 7-1:1.0: activate --> -22
** 82 printk messages dropped ** hub 7-1:1.0: hub_ext_port_status failed (err = -22)

This continues forever. After adding tracebacks into the code,
the call sequence leading to this is found to be as follows.

[] hub_activate+0x368/0x7b8
[] hub_resume+0x2c/0x3c
[] usb_resume_interface.isra.6+0x128/0x158
[] usb_suspend_both+0x1e8/0x288
[] usb_runtime_suspend+0x3c/0x98
[] __rpm_callback+0x48/0x7c
[] rpm_callback+0xa8/0xd4
[] rpm_suspend+0x84/0x758
[] rpm_idle+0x2c8/0x498
[] __pm_runtime_idle+0x60/0xac
[] usb_autopm_put_interface+0x6c/0x7c
[] hub_event+0x10ac/0x12ac
[] process_one_work+0x390/0x6b8
[] worker_thread+0x480/0x610
[] kthread+0x164/0x178
[] ret_from_fork+0x10/0x40

kick_hub_wq() is called from hub_activate() even after failures to
communicate with the hub. This results in an endless sequence of
hub event -> hub activate -> wq trigger -> hub event -> ...

Provide two solutions for the problem.

- Only trigger the hub event queue if communication with the hub
  is successful.
- After a suspend failure, only resume already suspended interfaces
  if the communication with the device is still possible.

Each of the changes fixes the observed problem. Use both to improve
robustness.

Acked-by: Alan Stern
Signed-off-by: Guenter Roeck
---
v4: Other code uses a space before labels in hub_activate().
    Do the same for consistency.

v3: In hub.c, abort immediately if hub_port_status() returns an error.
    Since hub_port_status() already logs the error, don't log it again.
    In driver.c, log the error return value from usb_suspend_device()
    if usb_get_status() failed as well.
    Don't check if the hub is still accessible if the error returned
    from hub_port_status() is -EBUSY.

v2: Instead of not triggering the hub wq after an error to submit an urb,
    implement a more complex error detection and handling. Do it in two
    places. Marked as RFC to determine if one (or both) of those solutions
    are viable.

 drivers/usb/core/driver.c | 18 ++
 drivers/usb/core/hub.c| 5 -
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c
index cdee5130638b..7ebdf2a4e8fe 100644
--- a/drivers/usb/core/driver.c
+++ b/drivers/usb/core/driver.c
@@ -1331,6 +1331,24 @@ static int usb_suspend_both(struct usb_device *udev, pm_message_t msg)
 		 */
 		if (udev->parent && !PMSG_IS_AUTO(msg))
 			status = 0;
+
+		/*
+		 * If the device is inaccessible, don't try to resume
+		 * suspended interfaces and just return the error.
+		 */
+		if (status && status != -EBUSY) {
+			int err;
+			u16 devstat;
+
+			err = usb_get_status(udev, USB_RECIP_DEVICE, 0,
+					     &devstat);
+			if (err) {
+				dev_err(&udev->dev,
+					"Failed to suspend device, error %d\n",
+					status);
+				goto done;
+			}
+		}
 	}
 
 	/* If the suspend failed, resume interfaces that did get suspended */
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 5286bf67869a..2e047f982af3 100644
--- a/drivers/usb/core/hub.c
+++
Re: [PATCH 2/4] perf annotate: Avoid division by zero when calculating percent
Em Mon, Mar 20, 2017 at 11:56:55AM +0900, Taeung Song escreveu: > Currently perf-annotate with --print-line can print > -nan(0x8) because of division by zero > when calculating percent. > > So if a sum of samples is zero, skip calculating percent. Tried to reproduce it here, couldn't, syswide record: [root@jouet ~]# perf evlist -v cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 [root@jouet ~]# perf annotate --stdio -l 2> /dev/null | grep -i nan [root@jouet ~]# Can you please send me a perf.data file with this problem? I have to go thru the code to see how this can take place... - Arnaldo > Before: > > $ perf annotate --stdio -l > > Sorted summary for file /home/taeung/workspace/a.out > -- > >32.89-nan7.04 a.c:38 >25.14-nan0.00 a.c:34 >16.26-nan 56.34 a.c:31 >15.88-nan1.41 a.c:37 > 5.67-nan0.00 a.c:39 > 1.13-nan 35.21 a.c:26 > 0.95-nan0.00 a.c:44 > 0.57-nan0.00 a.c:32 > Percent | Source code & Disassembly of a.out for cycles > (529 samples) > - > : > ... > > a.c:260.57-nan4.23 : 40081a: mov > %edi,-0x24(%rbp) > a.c:260.00-nan9.86 : 40081d: mov > %rsi,-0x30(%rbp) > > ... > > After: > > $ perf annotate --stdio -l > > Sorted summary for file /home/taeung/workspace/a.out > -- > >32.890.007.04 a.c:38 >25.140.000.00 a.c:34 >16.260.00 56.34 a.c:31 >15.880.001.41 a.c:37 > 5.670.000.00 a.c:39 > 1.130.00 35.21 a.c:26 > 0.950.000.00 a.c:44 > 0.570.000.00 a.c:32 > Percent | Source code & Disassembly of old for cycles > (529 samples) > - > : > ... > > a.c:260.570.004.23 : 40081a: mov%edi,-0x24(%rbp) > a.c:260.000.009.86 : 40081d: mov%rsi,-0x30(%rbp) > > ... 
> > Cc: Namhyung Kim > Cc: Jiri Olsa > Signed-off-by: Taeung Song > --- > tools/perf/util/annotate.c | 10 +++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c > index fc91c6b..9bb43cd 100644 > --- a/tools/perf/util/annotate.c > +++ b/tools/perf/util/annotate.c > @@ -1665,11 +1665,15 @@ static int symbol__get_source_line(struct symbol > *sym, struct map *map, > src_line->nr_pcnt = nr_pcnt; > > for (k = 0; k < nr_pcnt; k++) { > + double percent = 0.0; > + > h = annotation__histogram(notes, evidx + k); > - src_line->samples[k].percent = 100.0 * h->addr[i] / > h->sum; > + if (h->sum) > + percent = 100.0 * h->addr[i] / h->sum; > > - if (src_line->samples[k].percent > percent_max) > - percent_max = src_line->samples[k].percent; > + if (percent > percent_max) > + percent_max = percent; > + src_line->samples[k].percent = percent; > } > > if (percent_max <= 0.5) > -- > 2.7.4
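For readers following the thread, the guard proposed in the quoted patch can be looked at in isolation. The following is only a minimal user-space sketch: `struct hist_sketch` and `sample_percent()` are hypothetical names introduced here, not perf's own types or functions, and the histogram is reduced to the two fields the patch reads.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a per-event annotation histogram; only the
 * sample total and the per-offset counters are modelled.
 */
struct hist_sketch {
	unsigned long long sum;         /* total samples recorded for the event */
	const unsigned long long *addr; /* sample count per instruction offset */
};

/* Share of samples at offset i, or 0.0 when the event recorded no samples. */
static double sample_percent(const struct hist_sketch *h, size_t i)
{
	if (h->sum == 0)
		return 0.0; /* skips the 0/0 division that would yield NaN */

	return 100.0 * (double)h->addr[i] / (double)h->sum;
}
```

With the guard in place, an event whose sum is zero reports 0.00 for every line instead of a NaN, which matches the "After" output quoted above.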
[PATCH 4.10 26/63] amd-xgbe: Dont overwrite SFP PHY mod_absent settings
4.10-stable review patch. If anyone has any objections, please let me know. -- From: "Lendacky, Thomas" [ Upstream commit 2697ea5a859b83ca49511dcfd98daf42584eb3cf ] If an SFP module is not present, xgbe_phy_sfp_phy_settings() should return after applying the default settings. Currently there is no return statement and the default settings are overwritten. Signed-off-by: Tom Lendacky Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c |2 ++ 1 file changed, 2 insertions(+) --- a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c @@ -716,6 +716,8 @@ static void xgbe_phy_sfp_phy_settings(st pdata->phy.duplex = DUPLEX_UNKNOWN; pdata->phy.autoneg = AUTONEG_ENABLE; pdata->phy.advertising = pdata->phy.supported; + + return; } pdata->phy.advertising &= ~ADVERTISED_Autoneg;
[PATCH 4.10 10/63] ipv4: add missing initialization for flowi4_uid
4.10-stable review patch. If anyone has any objections, please let me know. -- From: Julian Anastasov [ Upstream commit 8bcfd0925ef15f072ba1e7bee2c25e9e1b5fd6ca ] Avoid matching of random stack value for uid when rules are looked up on input route or when RP filter is used. Problem should affect only setups that use ip rules with uid range. Fixes: 622ec2c9d524 ("net: core: add UID to flows, rules, and routes") Signed-off-by: Julian Anastasov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/fib_frontend.c |6 +++--- net/ipv4/route.c|1 + 2 files changed, 4 insertions(+), 3 deletions(-) --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -319,7 +319,7 @@ static int __fib_validate_source(struct int ret, no_addr; struct fib_result res; struct flowi4 fl4; - struct net *net; + struct net *net = dev_net(dev); bool dev_match; fl4.flowi4_oif = 0; @@ -332,6 +332,7 @@ static int __fib_validate_source(struct fl4.flowi4_scope = RT_SCOPE_UNIVERSE; fl4.flowi4_tun_key.tun_id = 0; fl4.flowi4_flags = 0; + fl4.flowi4_uid = sock_net_uid(net, NULL); no_addr = idev->ifa_list == NULL; @@ -339,13 +340,12 @@ static int __fib_validate_source(struct trace_fib_validate_source(dev, ); - net = dev_net(dev); if (fib_lookup(net, , , 0)) goto last_resort; if (res.type != RTN_UNICAST && (res.type != RTN_LOCAL || !IN_DEV_ACCEPT_LOCAL(idev))) goto e_inval; - if (!rpf && !fib_num_tclassid_users(dev_net(dev)) && + if (!rpf && !fib_num_tclassid_users(net) && (dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev))) goto last_resort; fib_combine_itag(itag, ); --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1858,6 +1858,7 @@ static int ip_route_input_slow(struct sk fl4.flowi4_flags = 0; fl4.daddr = daddr; fl4.saddr = saddr; + fl4.flowi4_uid = sock_net_uid(net, NULL); err = fib_lookup(net, , , 0); if (err != 0) { if (!IN_DEV_FORWARD(in_dev))
[PATCH 4.10 12/63] sctp: set sin_port for addr param when checking duplicate address
4.10-stable review patch. If anyone has any objections, please let me know. -- From: Xin Long [ Upstream commit 2e3ce5bc2aa938653c3866aa7f4901a1f199b1c8 ] Commit b8607805dd15 ("sctp: not copying duplicate addrs to the assoc's bind address list") tried to check for duplicate address before copying to asoc's bind_addr list from global addr list. But all the addrs' sin_ports in global addr list are 0 while the addrs' sin_ports are bp->port in asoc's bind_addr list. It means even if it's a duplicate address, af->cmp_addr will still return 0 as their sin_ports are different. This patch is to fix it by setting the sin_port for addr param with bp->port before comparing the addrs. Fixes: b8607805dd15 ("sctp: not copying duplicate addrs to the assoc's bind address list") Reported-by: Wei Chen Signed-off-by: Xin Long Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/sctp/protocol.c |6 +- 1 file changed, 5 insertions(+), 1 deletion(-) --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -199,6 +199,7 @@ int sctp_copy_local_addr_list(struct net sctp_scope_t scope, gfp_t gfp, int copy_flags) { struct sctp_sockaddr_entry *addr; + union sctp_addr laddr; int error = 0; rcu_read_lock(); @@ -220,7 +221,10 @@ int sctp_copy_local_addr_list(struct net !(copy_flags & SCTP_ADDR6_PEERSUPP))) continue; - if (sctp_bind_addr_state(bp, >a) != -1) + laddr = addr->a; + /* also works for setting ipv6 address port */ + laddr.v4.sin_port = htons(bp->port); + if (sctp_bind_addr_state(bp, ) != -1) continue; error = sctp_add_bind_addr(bp, >a, sizeof(addr->a),
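The port mismatch described in the changelog can be reproduced outside the kernel. The sketch below is an illustration under stated assumptions, not SCTP code: `addr_equal()` and `is_duplicate()` are hypothetical helpers, plain `struct sockaddr_in` stands in for `union sctp_addr`, and the comparison is assumed to include the port, as the changelog says `af->cmp_addr` does.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical comparison that checks both the IPv4 address and the port. */
static bool addr_equal(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
	return a->sin_addr.s_addr == b->sin_addr.s_addr &&
	       a->sin_port == b->sin_port;
}

/*
 * Decide whether an entry from a global list (stored with port 0) is
 * already bound on bound_port. The entry is copied first, so the shared
 * list entry itself is never modified.
 */
static bool is_duplicate(const struct sockaddr_in *bound,
			 const struct sockaddr_in *global, uint16_t bound_port)
{
	struct sockaddr_in tmp = *global;

	tmp.sin_port = htons(bound_port);
	return addr_equal(bound, &tmp);
}
```

Comparing the global entry directly never matches a bound address because its port is 0. Stamping the bound port into a temporary copy makes the comparison meaningful while leaving the global entry untouched.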
[PATCH 4.10 41/63] mpls: Do not decrement alive counter for unregister events
4.10-stable review patch. If anyone has any objections, please let me know. -- From: David Ahern [ Upstream commit 79099aab38c8f5c746748b066ae74ba984fe2cc8 ] Multipath routes can be rendered useless when a device in one of the paths is deleted. For example: $ ip -f mpls ro ls 100 nexthop as to 200 via inet 172.16.2.2 dev virt12 nexthop as to 300 via inet 172.16.3.2 dev br0 101 nexthop as to 201 via inet6 2000:2::2 dev virt12 nexthop as to 301 via inet6 2000:3::2 dev br0 $ ip li del br0 When br0 is deleted the other hop is not considered in mpls_select_multipath because of the alive check -- rt_nhn_alive is 0. rt_nhn_alive is decremented once in mpls_ifdown when the device is taken down (NETDEV_DOWN) and again when it is deleted (NETDEV_UNREGISTER). For a 2 hop route, deleting one device drops the alive count to 0. Since devices are taken down before unregistering, the decrement on NETDEV_UNREGISTER is redundant. Fixes: c89359a42e2a4 ("mpls: support for dead routes") Signed-off-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/mpls/af_mpls.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --- a/net/mpls/af_mpls.c +++ b/net/mpls/af_mpls.c @@ -956,7 +956,8 @@ static void mpls_ifdown(struct net_devic /* fall through */ case NETDEV_CHANGE: nh->nh_flags |= RTNH_F_LINKDOWN; - ACCESS_ONCE(rt->rt_nhn_alive) = rt->rt_nhn_alive - 1; + if (event != NETDEV_UNREGISTER) + ACCESS_ONCE(rt->rt_nhn_alive) = rt->rt_nhn_alive - 1; break; } if (event == NETDEV_UNREGISTER)
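The double decrement described in the changelog is easy to model. The sketch below is a simplified user-space illustration: `struct route_sketch`, `enum dev_event` and both functions are hypothetical, and the real `mpls_ifdown()` does considerably more than adjust a counter.

```c
/* Stand-ins for the NETDEV_DOWN and NETDEV_UNREGISTER notifier events. */
enum dev_event { DEV_DOWN, DEV_UNREGISTER };

/* Hypothetical route with only the counter of usable next hops. */
struct route_sketch {
	unsigned int nh_alive;
};

/* Old behaviour: every event for the device lowers the counter. */
static void ifdown_buggy(struct route_sketch *rt, enum dev_event ev)
{
	(void)ev;
	rt->nh_alive--;
}

/*
 * Patched behaviour: a device is taken down before it is unregistered,
 * so the unregister event must not lower the counter a second time.
 */
static void ifdown_fixed(struct route_sketch *rt, enum dev_event ev)
{
	if (ev != DEV_UNREGISTER)
		rt->nh_alive--;
}
```

For a route with two next hops, the down/unregister sequence for one device leaves the old variant with a count of 0 and the patched variant with a count of 1, so the surviving path stays eligible.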
[PATCH 4.10 09/63] vxlan: dont allow overwrite of config src addr
4.10-stable review patch. If anyone has any objections, please let me know. -- From: Brian Russell [ Upstream commit 1158632b5a2dcce0786c1b1b99654e81cc867981 ] When using IPv6 transport and a default dst, a pointer to the configured source address is passed into the route lookup. If no source address is configured, then the value is overwritten. IPv6 route lookup ignores egress ifindex match if the source address is set, so if egress ifindex match is desired, the source address must be passed as any. The overwrite breaks this for subsequent lookups. Avoid this by copying the configured address to an existing stack variable and pass a pointer to that instead. Fixes: 272d96a5ab10 ("net: vxlan: lwt: Use source ip address during route lookup.") Signed-off-by: Brian Russell Acked-by: Jiri Benc Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/vxlan.c | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -1992,7 +1992,6 @@ static void vxlan_xmit_one(struct sk_buf const struct iphdr *old_iph = ip_hdr(skb); union vxlan_addr *dst; union vxlan_addr remote_ip, local_ip; - union vxlan_addr *src; struct vxlan_metadata _md; struct vxlan_metadata *md = &_md; __be16 src_port = 0, dst_port; @@ -2019,7 +2018,7 @@ static void vxlan_xmit_one(struct sk_buf dst_port = rdst->remote_port ? rdst->remote_port : vxlan->cfg.dst_port; vni = rdst->remote_vni; - src = >cfg.saddr; + local_ip = vxlan->cfg.saddr; dst_cache = >dst_cache; md->gbp = skb->mark; ttl = vxlan->cfg.ttl; @@ -2052,7 +2051,6 @@ static void vxlan_xmit_one(struct sk_buf dst = _ip; dst_port = info->key.tp_dst ? : vxlan->cfg.dst_port; vni = tunnel_id_to_key32(info->key.tun_id); - src = _ip; dst_cache = >dst_cache; if (info->options_len) md = ip_tunnel_info_opts(info); @@ -2072,7 +2070,7 @@ static void vxlan_xmit_one(struct sk_buf rt = vxlan_get_route(vxlan, dev, sock4, skb, rdst ? 
rdst->remote_ifindex : 0, tos, dst->sin.sin_addr.s_addr, ->sin.sin_addr.s_addr, +_ip.sin.sin_addr.s_addr, dst_port, src_port, dst_cache, info); if (IS_ERR(rt)) { @@ -2099,7 +2097,7 @@ static void vxlan_xmit_one(struct sk_buf if (err < 0) goto tx_error; - udp_tunnel_xmit_skb(rt, sock4->sock->sk, skb, src->sin.sin_addr.s_addr, + udp_tunnel_xmit_skb(rt, sock4->sock->sk, skb, local_ip.sin.sin_addr.s_addr, dst->sin.sin_addr.s_addr, tos, ttl, df, src_port, dst_port, xnet, !udp_sum); #if IS_ENABLED(CONFIG_IPV6) @@ -2109,7 +2107,7 @@ static void vxlan_xmit_one(struct sk_buf ndst = vxlan6_get_route(vxlan, dev, sock6, skb, rdst ? rdst->remote_ifindex : 0, tos, label, >sin6.sin6_addr, - >sin6.sin6_addr, + _ip.sin6.sin6_addr, dst_port, src_port, dst_cache, info); if (IS_ERR(ndst)) { @@ -2137,7 +2135,7 @@ static void vxlan_xmit_one(struct sk_buf goto tx_error; udp_tunnel6_xmit_skb(ndst, sock6->sock->sk, skb, dev, ->sin6.sin6_addr, +_ip.sin6.sin6_addr, >sin6.sin6_addr, tos, ttl, label, src_port, dst_port, !udp_sum); #endif
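The aliasing problem behind this fix can be sketched without any networking code. Everything below is hypothetical: `struct cfg_sketch`, `route_lookup()` and the two transmit helpers only imitate the pattern the changelog describes, with a lookup that overwrites an unset ("any") source address it is handed by pointer.

```c
#include <stdint.h>

/* Hypothetical persistent configuration; 0 means "any source address". */
struct cfg_sketch {
	uint32_t saddr;
};

/* Hypothetical lookup: fills in the chosen source if the caller passed "any". */
static void route_lookup(uint32_t *saddr, uint32_t chosen)
{
	if (*saddr == 0)
		*saddr = chosen;
}

/* Old pattern: the pointer aliases the configuration, so it gets overwritten. */
static uint32_t xmit_buggy(struct cfg_sketch *cfg, uint32_t chosen)
{
	uint32_t *src = &cfg->saddr;

	route_lookup(src, chosen);
	return *src;
}

/* Patched pattern: the lookup only ever sees a per-call copy on the stack. */
static uint32_t xmit_fixed(const struct cfg_sketch *cfg, uint32_t chosen)
{
	uint32_t local_ip = cfg->saddr;

	route_lookup(&local_ip, chosen);
	return local_ip;
}
```

In the old pattern the first lookup pins the configured source, so later lookups no longer pass "any". The copy keeps the configuration unchanged across calls.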
[PATCH 4.10 25/63] amd-xgbe: Be sure to set MDIO modes on device (re)start
4.10-stable review patch. If anyone has any objections, please let me know. -- From: "Lendacky, Thomas" [ Upstream commit b42c6761fd1651f564491b53016046c9ebf0b2a9 ] The MDIO register mode is set when the device is probed. But when the device is brought down and then back up, the MDIO register mode has been reset. Be sure to reset the mode during device startup and only change the mode of the address specified. Signed-off-by: Tom Lendacky Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/amd/xgbe/xgbe-dev.c|2 +- drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c | 22 ++ 2 files changed, 23 insertions(+), 1 deletion(-) --- a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c @@ -1323,7 +1323,7 @@ static int xgbe_read_ext_mii_regs(struct static int xgbe_set_ext_mii_mode(struct xgbe_prv_data *pdata, unsigned int port, enum xgbe_mdio_mode mode) { - unsigned int reg_val = 0; + unsigned int reg_val = XGMAC_IOREAD(pdata, MAC_MDIOCL22R); switch (mode) { case XGBE_MDIO_MODE_CL22: --- a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c @@ -875,6 +875,16 @@ static int xgbe_phy_find_phy_device(stru !phy_data->sfp_phy_avail) return 0; + /* Set the proper MDIO mode for the PHY */ + ret = pdata->hw_if.set_ext_mii_mode(pdata, phy_data->mdio_addr, + phy_data->phydev_mode); + if (ret) { + netdev_err(pdata->netdev, + "mdio port/clause not compatible (%u/%u)\n", + phy_data->mdio_addr, phy_data->phydev_mode); + return ret; + } + /* Create and connect to the PHY device */ phydev = get_phy_device(phy_data->mii, phy_data->mdio_addr, (phy_data->phydev_mode == XGBE_MDIO_MODE_CL45)); @@ -2722,6 +2732,18 @@ static int xgbe_phy_start(struct xgbe_pr if (ret) return ret; + /* Set the proper MDIO mode for the re-driver */ + if (phy_data->redrv && !phy_data->redrv_if) { + ret = pdata->hw_if.set_ext_mii_mode(pdata, phy_data->redrv_addr, + XGBE_MDIO_MODE_CL22); + if (ret) { + 
netdev_err(pdata->netdev, + "redriver mdio port not compatible (%u)\n", + phy_data->redrv_addr); + return ret; + } + } + /* Start in highest supported mode */ xgbe_phy_set_mode(pdata, phy_data->start_mode);
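The one-line change in xgbe-dev.c turns a blind register write into a read-modify-write. The sketch below illustrates that idea only: the register is a plain variable, `set_mode()` and `enum mdio_mode` are hypothetical, and the assumption that a set bit selects clause 22 for a port is made for illustration, since the body of the switch is not shown in the patch.

```c
#include <stdint.h>

/* Stand-in for a hardware register holding one mode bit per MDIO port. */
static uint32_t mdio_cl22_reg;

enum mdio_mode { MODE_CL22, MODE_CL45 };

/*
 * Change the mode bit of a single port. Starting from the current register
 * value, rather than from 0, preserves the bits of all other ports.
 */
static int set_mode(unsigned int port, enum mdio_mode mode)
{
	uint32_t val;

	if (port >= 32)
		return -1; /* no bit available for this port */

	val = mdio_cl22_reg; /* read the current settings first */
	if (mode == MODE_CL22)
		val |= UINT32_C(1) << port;
	else
		val &= ~(UINT32_C(1) << port);
	mdio_cl22_reg = val;

	return 0;
}
```

Had `val` started at 0, configuring the re-driver's port would silently clear the mode previously programmed for the PHY's port, which is the "only change the mode of the address specified" part of the changelog.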
[PATCH 4.10 56/63] x86/kasan: Fix boot with KASAN=y and PROFILE_ANNOTATED_BRANCHES=y
4.10-stable review patch. If anyone has any objections, please let me know. -- From: Andrey Ryabinin commit be3606ff739d1c1be36389f8737c577ad87e1f57 upstream. The kernel doesn't boot with both PROFILE_ANNOTATED_BRANCHES=y and KASAN=y options selected. With branch profiling enabled we end up calling ftrace_likely_update() before kasan_early_init(). ftrace_likely_update() is built with KASAN instrumentation, so calling it before kasan has been initialized leads to crash. Use DISABLE_BRANCH_PROFILING define to make sure that we don't call ftrace_likely_update() from early code before kasan_early_init(). Fixes: ef7f0d6a6ca8 ("x86_64: add KASan support") Reported-by: Fengguang Wu Signed-off-by: Andrey Ryabinin Cc: kasan-...@googlegroups.com Cc: Alexander Potapenko Cc: Andrew Morton Cc: l...@01.org Cc: Dmitry Vyukov Link: http://lkml.kernel.org/r/20170313163337.1704-1-aryabi...@virtuozzo.com Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/head64.c|1 + arch/x86/mm/kasan_init_64.c |1 + 2 files changed, 2 insertions(+) --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -4,6 +4,7 @@ * Copyright (C) 2000 Andrea Arcangeli SuSE */ +#define DISABLE_BRANCH_PROFILING #include #include #include --- a/arch/x86/mm/kasan_init_64.c +++ b/arch/x86/mm/kasan_init_64.c @@ -1,3 +1,4 @@ +#define DISABLE_BRANCH_PROFILING #define pr_fmt(fmt) "kasan: " fmt #include #include
Re: [PATCH] fs/pstore: Perform erase from a worker
On Fri, Mar 17, 2017 at 2:52 AM, Chris Wilsonwrote: > In order to prevent a cyclic recursion between psi->read_mutex and the > inode_lock, we need to move the pse->erase to a worker. > > [ 605.374955] == > [ 605.381281] [ INFO: possible circular locking dependency detected ] > [ 605.387679] 4.11.0-rc2-CI-CI_DRM_2352+ #1 Not tainted > [ 605.392826] --- > [ 605.399196] rm/7298 is trying to acquire lock: > [ 605.403720] (>read_mutex){+.+.+.}, at: [] > pstore_unlink+0x3f/0xa0 > [ 605.412300] > [ 605.412300] but task is already holding lock: > [ 605.418237] (>s_type->i_mutex_key#14){++}, at: > [] vfs_unlink+0x4c/0x19 > 0 > [ 605.427397] > [ 605.427397] which lock already depends on the new lock. > [ 605.427397] > [ 605.435770] > [ 605.435770] the existing dependency chain (in reverse order) is: > [ 605.443396] > [ 605.443396] -> #1 (>s_type->i_mutex_key#14){++}: > [ 605.450347]lock_acquire+0xc9/0x220 > [ 605.454551]down_write+0x3f/0x70 > [ 605.458484]pstore_mkfile+0x1f4/0x460 > [ 605.462835]pstore_get_records+0x17a/0x320 > [ 605.467664]pstore_fill_super+0xa4/0xc0 > [ 605.472205]mount_single+0x89/0xb0 > [ 605.476314]pstore_mount+0x13/0x20 > [ 605.480411]mount_fs+0xf/0x90 > [ 605.484122]vfs_kern_mount+0x66/0x170 > [ 605.488464]do_mount+0x190/0xd50 > [ 605.492397]SyS_mount+0x90/0xd0 > [ 605.496212]entry_SYSCALL_64_fastpath+0x1c/0xb1 > [ 605.501496] > [ 605.501496] -> #0 (>read_mutex){+.+.+.}: > [ 605.507747]__lock_acquire+0x1ac0/0x1bb0 > [ 605.512401]lock_acquire+0xc9/0x220 > [ 605.516594]__mutex_lock+0x6e/0x990 > [ 605.520755]mutex_lock_nested+0x16/0x20 > [ 605.525279]pstore_unlink+0x3f/0xa0 > [ 605.529465]vfs_unlink+0xb5/0x190 > [ 605.533477]do_unlinkat+0x24c/0x2a0 > [ 605.537672]SyS_unlinkat+0x16/0x30 > [ 605.541781]entry_SYSCALL_64_fastpath+0x1c/0xb1 If I'm reading this right it's a race between mount and unlink... that's quite a corner case. 
:) > [ 605.547067] > [ 605.547067] other info that might help us debug this: > [ 605.547067] > [ 605.555221] Possible unsafe locking scenario: > [ 605.555221] > [ 605.561280]CPU0CPU1 > [ 605.565883] > [ 605.570502] lock(>s_type->i_mutex_key#14); > [ 605.575217]lock(>read_mutex); > [ 605.581803] > lock(>s_type->i_mutex_key#14); > [ 605.589159] lock(>read_mutex); I haven't had time to dig much yet, but I wonder if the locking order on unlink could just be reversed, and the deadlock would go away? > [ 605.593156] > [ 605.593156] *** DEADLOCK *** > [ 605.593156] > [ 605.599214] 3 locks held by rm/7298: > [ 605.602896] #0: (sb_writers#11){.+.+..}, at: [] > mnt_want_write+0x1f/0x50 > [ 605.611490] #1: (>s_type->i_mutex_key#14/1){+.+...}, at: > [] do_unlinkat+0 > x11c/0x2a0 > [ 605.621417] #2: (>s_type->i_mutex_key#14){++}, at: > [] vfs_unlink+0x4c > /0x190 > [ 605.630995] > [ 605.630995] stack backtrace: > [ 605.635450] CPU: 7 PID: 7298 Comm: rm Not tainted > 4.11.0-rc2-CI-CI_DRM_2352+ #1 > [ 605.642999] Hardware name: Gigabyte Technology Co., Ltd. > Z170X-UD5/Z170X-UD5-CF, BIOS F21 01/06/2 > 017 > [ 605.652305] Call Trace: > [ 605.654814] dump_stack+0x67/0x92 > [ 605.658184] print_circular_bug+0x1e0/0x2e0 > [ 605.662465] __lock_acquire+0x1ac0/0x1bb0 > [ 605.34] ? retint_kernel+0x2d/0x2d > [ 605.670456] lock_acquire+0xc9/0x220 > [ 605.674112] ? pstore_unlink+0x3f/0xa0 > [ 605.677970] ? pstore_unlink+0x3f/0xa0 > [ 605.681818] __mutex_lock+0x6e/0x990 > [ 605.685456] ? pstore_unlink+0x3f/0xa0 > [ 605.689791] ? pstore_unlink+0x3f/0xa0 > [ 605.694124] ? 
vfs_unlink+0x4c/0x190 > [ 605.698310] mutex_lock_nested+0x16/0x20 > [ 605.702859] pstore_unlink+0x3f/0xa0 > [ 605.707021] vfs_unlink+0xb5/0x190 > [ 605.711024] do_unlinkat+0x24c/0x2a0 > [ 605.715194] SyS_unlinkat+0x16/0x30 > [ 605.719275] entry_SYSCALL_64_fastpath+0x1c/0xb1 > [ 605.724543] RIP: 0033:0x7f8b08073ed7 > [ 605.728676] RSP: 002b:7ffe70eff628 EFLAGS: 0206 ORIG_RAX: > 0107 > [ 605.736929] RAX: ffda RBX: 8147ea93 RCX: > 7f8b08073ed7 > [ 605.744711] RDX: RSI: 0145 RDI: > ff9c > [ 605.752512] RBP: c9000338ff88 R08: 0003 R09: > > [ 605.760276] R10: 015e R11: 0206 R12: > > [ 605.768040] R13: 7ffe70eff750 R14: 0144ff70 R15: > 01451230 > [ 605.775800] ? __this_cpu_preempt_check+0x13/0x20 > > Reported-by: Tomi Sarvela > Fixes: e9e360b08a44 ("pstore: Protect
[PATCH] drm/gma500: fix memory leak on edid
From: Colin Ian King

edid is allocated on the call to psb_intel_sdvo_get_edid but not kfree'd
at all, causing a memory leak. Fix this by kfree'ing the edid. (This may
be null, but kfree can handle null frees).

Detected by CoverityScan, CID#1090730 ("Resource Leak")

Fixes: 5736995b473b ("gma500: Replace SDVO code with slightly modified version from i915")
Signed-off-by: Colin Ian King
---
 drivers/gpu/drm/gma500/psb_intel_sdvo.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/gma500/psb_intel_sdvo.c b/drivers/gpu/drm/gma500/psb_intel_sdvo.c
index e787d376ba67..f38e6ad1ab9b 100644
--- a/drivers/gpu/drm/gma500/psb_intel_sdvo.c
+++ b/drivers/gpu/drm/gma500/psb_intel_sdvo.c
@@ -1650,6 +1650,7 @@ static bool psb_intel_sdvo_detect_hdmi_audio(struct drm_connector *connector)
 	edid = psb_intel_sdvo_get_edid(connector);
 	if (edid != NULL && edid->input & DRM_EDID_INPUT_DIGITAL)
 		has_audio = drm_detect_monitor_audio(edid);
+	kfree(edid);
 
 	return has_audio;
 }
-- 
2.11.0
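The pattern in this fix (allocate, use conditionally, free unconditionally) can be sketched in userspace C. This is only an illustration under stated assumptions: `toy_edid`, `toy_get_edid()`, `toy_detect_digital()` and `TOY_INPUT_DIGITAL` are hypothetical names, not the DRM API, and `free()` stands in for `kfree()`. Both accept a NULL pointer as a no-op, which is why no extra NULL check is needed before the free.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an EDID blob; not the DRM struct edid. */
struct toy_edid {
	unsigned char input;
};

#define TOY_INPUT_DIGITAL 0x80	/* illustrative flag value (assumption) */

/* Returns a heap-allocated blob, or NULL when nothing could be read. */
static struct toy_edid *toy_get_edid(bool present, unsigned char input)
{
	struct toy_edid *edid;

	if (!present)
		return NULL;
	edid = malloc(sizeof(*edid));
	if (edid)
		edid->input = input;
	return edid;
}

/* Mirrors the shape of the patched function: one free on every path. */
static bool toy_detect_digital(bool present, unsigned char input)
{
	struct toy_edid *edid = toy_get_edid(present, input);
	bool digital = false;

	if (edid != NULL && (edid->input & TOY_INPUT_DIGITAL))
		digital = true;
	free(edid);	/* free(NULL) is a no-op, like kfree(NULL) */

	return digital;
}
```

Placing the single free after the last use covers both the "EDID present" and "EDID missing" cases without branching.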
Re: [PATCH] fs/pstore: Perform erase from a worker
On Fri, Mar 17, 2017 at 2:52 AM, Chris Wilson wrote:
> In order to prevent a cyclic recursion between psi->read_mutex and the
> inode_lock, we need to move the pse->erase to a worker.
>
> [ 605.374955] ==
> [ 605.381281] [ INFO: possible circular locking dependency detected ]
> [ 605.387679] 4.11.0-rc2-CI-CI_DRM_2352+ #1 Not tainted
> [ 605.392826] ---
> [ 605.399196] rm/7298 is trying to acquire lock:
> [ 605.403720] (>read_mutex){+.+.+.}, at: [] pstore_unlink+0x3f/0xa0
> [ 605.412300]
> [ 605.412300] but task is already holding lock:
> [ 605.418237] (>s_type->i_mutex_key#14){++}, at: [] vfs_unlink+0x4c/0x190
> [ 605.427397]
> [ 605.427397] which lock already depends on the new lock.
> [ 605.427397]
> [ 605.435770]
> [ 605.435770] the existing dependency chain (in reverse order) is:
> [ 605.443396]
> [ 605.443396] -> #1 (>s_type->i_mutex_key#14){++}:
> [ 605.450347]        lock_acquire+0xc9/0x220
> [ 605.454551]        down_write+0x3f/0x70
> [ 605.458484]        pstore_mkfile+0x1f4/0x460
> [ 605.462835]        pstore_get_records+0x17a/0x320
> [ 605.467664]        pstore_fill_super+0xa4/0xc0
> [ 605.472205]        mount_single+0x89/0xb0
> [ 605.476314]        pstore_mount+0x13/0x20
> [ 605.480411]        mount_fs+0xf/0x90
> [ 605.484122]        vfs_kern_mount+0x66/0x170
> [ 605.488464]        do_mount+0x190/0xd50
> [ 605.492397]        SyS_mount+0x90/0xd0
> [ 605.496212]        entry_SYSCALL_64_fastpath+0x1c/0xb1
> [ 605.501496]
> [ 605.501496] -> #0 (>read_mutex){+.+.+.}:
> [ 605.507747]        __lock_acquire+0x1ac0/0x1bb0
> [ 605.512401]        lock_acquire+0xc9/0x220
> [ 605.516594]        __mutex_lock+0x6e/0x990
> [ 605.520755]        mutex_lock_nested+0x16/0x20
> [ 605.525279]        pstore_unlink+0x3f/0xa0
> [ 605.529465]        vfs_unlink+0xb5/0x190
> [ 605.533477]        do_unlinkat+0x24c/0x2a0
> [ 605.537672]        SyS_unlinkat+0x16/0x30
> [ 605.541781]        entry_SYSCALL_64_fastpath+0x1c/0xb1

If I'm reading this right it's a race between mount and unlink...
that's quite a corner case. :)

> [ 605.547067]
> [ 605.547067] other info that might help us debug this:
> [ 605.547067]
> [ 605.555221] Possible unsafe locking scenario:
> [ 605.555221]
> [ 605.561280]        CPU0                    CPU1
> [ 605.565883]
> [ 605.570502]   lock(>s_type->i_mutex_key#14);
> [ 605.575217]                                lock(>read_mutex);
> [ 605.581803]                                lock(>s_type->i_mutex_key#14);
> [ 605.589159]   lock(>read_mutex);

I haven't had time to dig much yet, but I wonder if the locking order
on unlink could just be reversed, and the deadlock would go away?

> [ 605.593156]
> [ 605.593156] *** DEADLOCK ***
> [ 605.593156]
> [ 605.599214] 3 locks held by rm/7298:
> [ 605.602896] #0: (sb_writers#11){.+.+..}, at: [] mnt_want_write+0x1f/0x50
> [ 605.611490] #1: (>s_type->i_mutex_key#14/1){+.+...}, at: [] do_unlinkat+0x11c/0x2a0
> [ 605.621417] #2: (>s_type->i_mutex_key#14){++}, at: [] vfs_unlink+0x4c/0x190
> [ 605.630995]
> [ 605.630995] stack backtrace:
> [ 605.635450] CPU: 7 PID: 7298 Comm: rm Not tainted 4.11.0-rc2-CI-CI_DRM_2352+ #1
> [ 605.642999] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F21 01/06/2017
> [ 605.652305] Call Trace:
> [ 605.654814] dump_stack+0x67/0x92
> [ 605.658184] print_circular_bug+0x1e0/0x2e0
> [ 605.662465] __lock_acquire+0x1ac0/0x1bb0
> [ 605.34] ? retint_kernel+0x2d/0x2d
> [ 605.670456] lock_acquire+0xc9/0x220
> [ 605.674112] ? pstore_unlink+0x3f/0xa0
> [ 605.677970] ? pstore_unlink+0x3f/0xa0
> [ 605.681818] __mutex_lock+0x6e/0x990
> [ 605.685456] ? pstore_unlink+0x3f/0xa0
> [ 605.689791] ? pstore_unlink+0x3f/0xa0
> [ 605.694124] ? vfs_unlink+0x4c/0x190
> [ 605.698310] mutex_lock_nested+0x16/0x20
> [ 605.702859] pstore_unlink+0x3f/0xa0
> [ 605.707021] vfs_unlink+0xb5/0x190
> [ 605.711024] do_unlinkat+0x24c/0x2a0
> [ 605.715194] SyS_unlinkat+0x16/0x30
> [ 605.719275] entry_SYSCALL_64_fastpath+0x1c/0xb1
> [ 605.724543] RIP: 0033:0x7f8b08073ed7
> [ 605.728676] RSP: 002b:7ffe70eff628 EFLAGS: 0206 ORIG_RAX: 0107
> [ 605.736929] RAX: ffda RBX: 8147ea93 RCX: 7f8b08073ed7
> [ 605.744711] RDX: RSI: 0145 RDI: ff9c
> [ 605.752512] RBP: c9000338ff88 R08: 0003 R09:
> [ 605.760276] R10: 015e R11: 0206 R12:
> [ 605.768040] R13: 7ffe70eff750 R14: 0144ff70 R15: 01451230
> [ 605.775800] ? __this_cpu_preempt_check+0x13/0x20
>
> Reported-by: Tomi Sarvela
> Fixes: e9e360b08a44 ("pstore: Protect unlink with read_mutex")
> Bugzilla:
[PATCH 4.10 61/63] locking/rwsem: Fix down_write_killable() for CONFIG_RWSEM_GENERIC_SPINLOCK=y
4.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Niklas Cassel

commit 17fcbd590d0c3e35bd9646e2215f86586378bc42 upstream.

We hang if SIGKILL has been sent, but the task is stuck in down_read()
(after do_exit()), even though no task is doing down_write() on the
rwsem in question:

  INFO: task libupnp:21868 blocked for more than 120 seconds.
  libupnp D0 21868 1 0x0818
  ...
  Call Trace:
  __schedule()
  schedule()
  __down_read()
  do_exit()
  do_group_exit()
  __wake_up_parent()

This bug has already been fixed for CONFIG_RWSEM_XCHGADD_ALGORITHM=y in
the following commit:

  04cafed7fc19 ("locking/rwsem: Fix down_write_killable()")

... however, this bug also exists for CONFIG_RWSEM_GENERIC_SPINLOCK=y.

Signed-off-by: Niklas Cassel
Signed-off-by: Peter Zijlstra (Intel)
Cc:
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Niklas Cassel
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Fixes: d47996082f52 ("locking/rwsem: Introduce basis for down_write_killable()")
Link: http://lkml.kernel.org/r/1487981873-12649-1-git-send-email-nikl...@axis.com
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman
---
 kernel/locking/rwsem-spinlock.c |   15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

--- a/kernel/locking/rwsem-spinlock.c
+++ b/kernel/locking/rwsem-spinlock.c
@@ -216,10 +216,8 @@ int __sched __down_write_common(struct r
 		 */
 		if (sem->count == 0)
 			break;
-		if (signal_pending_state(state, current)) {
-			ret = -EINTR;
-			goto out;
-		}
+		if (signal_pending_state(state, current))
+			goto out_nolock;
 		set_task_state(tsk, state);
 		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
 		schedule();
@@ -227,12 +225,19 @@ int __sched __down_write_common(struct r
 	}
 	/* got the lock */
 	sem->count = -1;
-out:
 	list_del(&waiter.list);
 
 	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
 
 	return ret;
+
+out_nolock:
+	list_del(&waiter.list);
+	if (!list_empty(&sem->wait_list))
+		__rwsem_do_wake(sem, 1);
+	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
+
+	return -EINTR;
 }
 
 void __sched __down_write(struct rw_semaphore *sem)
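The hang described in this changelog can be reproduced in a small userspace model. The sketch below is a toy under stated assumptions, not the kernel implementation: every `toy_*` name is hypothetical, there is no real blocking or locking, and `toy_wake_front()` only grants readers, whereas the kernel's `__rwsem_do_wake()` also handles a writer at the head of the queue. It shows why a writer that leaves the wait list on a fatal signal has to wake whoever queued behind it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_waiter_type { TOY_READER, TOY_WRITER };

struct toy_waiter {
	enum toy_waiter_type type;
	bool granted;
	struct toy_waiter *next;
};

struct toy_rwsem {
	int count;			/* >0: readers hold it, -1: writer, 0: free */
	struct toy_waiter *head;	/* FIFO wait list */
};

static void toy_enqueue(struct toy_rwsem *sem, struct toy_waiter *w)
{
	struct toy_waiter **pp = &sem->head;

	while (*pp)
		pp = &(*pp)->next;
	w->next = NULL;
	w->granted = false;
	*pp = w;
}

static void toy_dequeue(struct toy_rwsem *sem, struct toy_waiter *w)
{
	struct toy_waiter **pp = &sem->head;

	while (*pp && *pp != w)
		pp = &(*pp)->next;
	if (*pp)
		*pp = w->next;
	w->next = NULL;
}

/* Grant the lock to readers queued at the front, unless a writer holds it. */
static void toy_wake_front(struct toy_rwsem *sem)
{
	while (sem->count >= 0 && sem->head && sem->head->type == TOY_READER) {
		struct toy_waiter *w = sem->head;

		toy_dequeue(sem, w);
		sem->count++;
		w->granted = true;
	}
}

/* A queued writer gives up because a fatal signal is pending. */
static int toy_writer_abort(struct toy_rwsem *sem, struct toy_waiter *w,
			    bool wake_others)
{
	toy_dequeue(sem, w);
	if (wake_others && sem->head)	/* the step the out_nolock path adds */
		toy_wake_front(sem);
	return -EINTR;
}
```

With one reader holding the lock, a writer queued first and a second reader queued behind it, aborting the writer with `wake_others == false` leaves the second reader on the list forever even though only readers remain. With `wake_others == true` the reader is granted immediately.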
[PATCH 4.10 60/63] futex: Add missing error handling to FUTEX_REQUEUE_PI
4.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra

commit 9bbb25afeb182502ca4f2c4f3f88af0681b34cae upstream.

Thomas spotted that fixup_pi_state_owner() can return errors and we
fail to unlock the rt_mutex in that case.

Reported-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Darren Hart
Cc: juri.le...@arm.com
Cc: bige...@linutronix.de
Cc: xlp...@redhat.com
Cc: rost...@goodmis.org
Cc: mathieu.desnoy...@efficios.com
Cc: jdesfos...@efficios.com
Cc: dvh...@infradead.org
Cc: bris...@redhat.com
Link: http://lkml.kernel.org/r/20170304093558.867401...@infradead.org
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
---
 kernel/futex.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2896,6 +2896,8 @@ static int futex_wait_requeue_pi(u32 __u
 		if (q.pi_state && (q.pi_state->owner != current)) {
 			spin_lock(q.lock_ptr);
 			ret = fixup_pi_state_owner(uaddr2, &q, current);
+			if (ret && rt_mutex_owner(&q.pi_state->pi_mutex) == current)
+				rt_mutex_unlock(&q.pi_state->pi_mutex);
 			/*
 			 * Drop the reference to the pi state which
 			 * the requeue_pi() code acquired for us.
[PATCH 4.10 59/63] futex: Fix potential use-after-free in FUTEX_REQUEUE_PI
4.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra

commit c236c8e95a3d395b0494e7108f0d41cf36ec107c upstream.

While working on the futex code, I stumbled over this potential
use-after-free scenario. Dmitry triggered it later with syzkaller.

pi_mutex is a pointer into pi_state, which we drop the reference on in
unqueue_me_pi(). So any access to that pointer after that is bad.

Since other sites already do rt_mutex_unlock() with hb->lock held, see
for example futex_lock_pi(), simply move the unlock before
unqueue_me_pi().

Reported-by: Dmitry Vyukov
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Darren Hart
Cc: juri.le...@arm.com
Cc: bige...@linutronix.de
Cc: xlp...@redhat.com
Cc: rost...@goodmis.org
Cc: mathieu.desnoy...@efficios.com
Cc: jdesfos...@efficios.com
Cc: dvh...@infradead.org
Cc: bris...@redhat.com
Link: http://lkml.kernel.org/r/20170304093558.801744...@infradead.org
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
---
 kernel/futex.c |   20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2813,7 +2813,6 @@ static int futex_wait_requeue_pi(u32 __u
 {
 	struct hrtimer_sleeper timeout, *to = NULL;
 	struct rt_mutex_waiter rt_waiter;
-	struct rt_mutex *pi_mutex = NULL;
 	struct futex_hash_bucket *hb;
 	union futex_key key2 = FUTEX_KEY_INIT;
 	struct futex_q q = futex_q_init;
@@ -2905,6 +2904,8 @@ static int futex_wait_requeue_pi(u32 __u
 			spin_unlock(q.lock_ptr);
 		}
 	} else {
+		struct rt_mutex *pi_mutex;
+
 		/*
 		 * We have been woken up by futex_unlock_pi(), a timeout, or a
 		 * signal. futex_unlock_pi() will not destroy the lock_ptr nor
@@ -2928,18 +2929,19 @@ static int futex_wait_requeue_pi(u32 __u
 		if (res)
 			ret = (res < 0) ? res : 0;
 
+		/*
+		 * If fixup_pi_state_owner() faulted and was unable to handle
+		 * the fault, unlock the rt_mutex and return the fault to
+		 * userspace.
+		 */
+		if (ret && rt_mutex_owner(pi_mutex) == current)
+			rt_mutex_unlock(pi_mutex);
+
 		/* Unqueue and drop the lock. */
 		unqueue_me_pi(&q);
 	}
 
-	/*
-	 * If fixup_pi_state_owner() faulted and was unable to handle the
-	 * fault, unlock the rt_mutex and return the fault to userspace.
-	 */
-	if (ret == -EFAULT) {
-		if (pi_mutex && rt_mutex_owner(pi_mutex) == current)
-			rt_mutex_unlock(pi_mutex);
-	} else if (ret == -EINTR) {
+	if (ret == -EINTR) {
 		/*
 		 * We've already been requeued, but cannot restart by calling
 		 * futex_lock_pi() directly. We could restart this syscall, but
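The ordering rule behind this fix (release a lock embedded in a refcounted object before dropping the reference that keeps the object alive) can be modelled in a few lines of userspace C. This is a sketch under assumptions, not the futex code: `toy_pi_state`, `toy_put_state()` and `toy_finish_wait()` are hypothetical names, a plain flag stands in for the rt_mutex, and only the error path is modelled.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted object that embeds a lock. */
struct toy_pi_state {
	int refcount;
	bool mutex_locked;	/* models the embedded mutex being held by us */
};

/* Set when an object is freed while its embedded lock is still held. */
static bool toy_freed_while_locked;

static struct toy_pi_state *toy_get_locked_state(void)
{
	struct toy_pi_state *ps = malloc(sizeof(*ps));

	if (ps) {
		ps->refcount = 1;
		ps->mutex_locked = true;
	}
	return ps;
}

/* Drop one reference; the object is gone once the count reaches zero. */
static void toy_put_state(struct toy_pi_state *ps)
{
	if (--ps->refcount == 0) {
		toy_freed_while_locked = ps->mutex_locked;
		free(ps);
	}
}

/* Error path: unlock first, then drop the reference; never touch ps after. */
static int toy_finish_wait(struct toy_pi_state *ps, int ret)
{
	if (ret && ps->mutex_locked)
		ps->mutex_locked = false;	/* unlock while ps is still valid */
	toy_put_state(ps);			/* may free ps */
	return ret;
}
```

Swapping the two steps in `toy_finish_wait()` would read `ps->mutex_locked` after the final put, which is the use-after-free shape the patch removes.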
[PATCH 4.10 49/63] arm64: KVM: VHE: Clear HCR_TGE when invalidating guest TLBs
4.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Marc Zyngier

commit 68925176296a8b995e503349200e256674bfe5ac upstream.

When invalidating guest TLBs, special care must be taken to
actually shoot the guest TLBs and not the host ones if we're
running on a VHE system. This is controlled by the HCR_EL2.TGE
bit, which we forget to clear before invalidating TLBs.

Address the issue by introducing two wrappers (__tlb_switch_to_guest
and __tlb_switch_to_host) that take care of both the VTTBR_EL2
and HCR_EL2.TGE switching.

Reported-by: Tomasz Nowicki
Tested-by: Tomasz Nowicki
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Greg Kroah-Hartman
---
 arch/arm64/kvm/hyp/tlb.c | 64 ---
 1 file changed, 55 insertions(+), 9 deletions(-)

--- a/arch/arm64/kvm/hyp/tlb.c
+++ b/arch/arm64/kvm/hyp/tlb.c
@@ -17,14 +17,62 @@
 
 #include
 
+static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm)
+{
+	u64 val;
+
+	/*
+	 * With VHE enabled, we have HCR_EL2.{E2H,TGE} = {1,1}, and
+	 * most TLB operations target EL2/EL0. In order to affect the
+	 * guest TLBs (EL1/EL0), we need to change one of these two
+	 * bits. Changing E2H is impossible (goodbye TTBR1_EL2), so
+	 * let's flip TGE before executing the TLB operation.
+	 */
+	write_sysreg(kvm->arch.vttbr, vttbr_el2);
+	val = read_sysreg(hcr_el2);
+	val &= ~HCR_TGE;
+	write_sysreg(val, hcr_el2);
+	isb();
+}
+
+static void __hyp_text __tlb_switch_to_guest_nvhe(struct kvm *kvm)
+{
+	write_sysreg(kvm->arch.vttbr, vttbr_el2);
+	isb();
+}
+
+static hyp_alternate_select(__tlb_switch_to_guest,
+			    __tlb_switch_to_guest_nvhe,
+			    __tlb_switch_to_guest_vhe,
+			    ARM64_HAS_VIRT_HOST_EXTN);
+
+static void __hyp_text __tlb_switch_to_host_vhe(struct kvm *kvm)
+{
+	/*
+	 * We're done with the TLB operation, let's restore the host's
+	 * view of HCR_EL2.
+	 */
+	write_sysreg(0, vttbr_el2);
+	write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
+}
+
+static void __hyp_text __tlb_switch_to_host_nvhe(struct kvm *kvm)
+{
+	write_sysreg(0, vttbr_el2);
+}
+
+static hyp_alternate_select(__tlb_switch_to_host,
+			    __tlb_switch_to_host_nvhe,
+			    __tlb_switch_to_host_vhe,
+			    ARM64_HAS_VIRT_HOST_EXTN);
+
 void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
 {
 	dsb(ishst);
 
 	/* Switch to requested VMID */
 	kvm = kern_hyp_va(kvm);
-	write_sysreg(kvm->arch.vttbr, vttbr_el2);
-	isb();
+	__tlb_switch_to_guest()(kvm);
 
 	/*
 	 * We could do so much better if we had the VA as well.
@@ -45,7 +93,7 @@ void __hyp_text __kvm_tlb_flush_vmid_ipa
 	dsb(ish);
 	isb();
 
-	write_sysreg(0, vttbr_el2);
+	__tlb_switch_to_host()(kvm);
 }
 
 void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm)
@@ -54,14 +102,13 @@ void __hyp_text __kvm_tlb_flush_vmid(str
 
 	/* Switch to requested VMID */
 	kvm = kern_hyp_va(kvm);
-	write_sysreg(kvm->arch.vttbr, vttbr_el2);
-	isb();
+	__tlb_switch_to_guest()(kvm);
 
 	asm volatile("tlbi vmalls12e1is" : : );
 	dsb(ish);
 	isb();
 
-	write_sysreg(0, vttbr_el2);
+	__tlb_switch_to_host()(kvm);
 }
 
 void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu)
@@ -69,14 +116,13 @@ void __hyp_text __kvm_tlb_flush_local_vm
 	struct kvm *kvm = kern_hyp_va(kern_hyp_va(vcpu)->kvm);
 
 	/* Switch to requested VMID */
-	write_sysreg(kvm->arch.vttbr, vttbr_el2);
-	isb();
+	__tlb_switch_to_guest()(kvm);
 
 	asm volatile("tlbi vmalle1" : : );
 	dsb(nsh);
 	isb();
 
-	write_sysreg(0, vttbr_el2);
+	__tlb_switch_to_host()(kvm);
 }
 
 void __hyp_text __kvm_flush_vm_context(void)
[PATCH 4.10 62/63] crypto: powerpc - Fix initialisation of crc32c context
4.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Axtens

commit aa2be9b3d6d2d699e9ca7cbfc00867c80e5da213 upstream.

Turning on crypto self-tests on a POWER8 shows:

    alg: hash: Test 1 failed for crc32c-vpmsum : ff ff ff ff

Comparing the code with the Intel CRC32c implementation on which
ours is based shows that we are doing an init with 0, not ~0
as CRC32c requires.

This probably wasn't caught because btrfs does its own weird
open-coded initialisation.

Initialise our internal context to ~0 on init.

This makes the self-tests pass, and btrfs continues to work.

Fixes: 6dd7a82cc54e ("crypto: powerpc - Add POWER8 optimised crc32c")
Cc: Anton Blanchard
Signed-off-by: Daniel Axtens
Acked-by: Anton Blanchard
Signed-off-by: Herbert Xu
Signed-off-by: Greg Kroah-Hartman
---
 arch/powerpc/crypto/crc32c-vpmsum_glue.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/powerpc/crypto/crc32c-vpmsum_glue.c
+++ b/arch/powerpc/crypto/crc32c-vpmsum_glue.c
@@ -52,7 +52,7 @@ static int crc32c_vpmsum_cra_init(struct
 {
 	u32 *key = crypto_tfm_ctx(tfm);
 
-	*key = 0;
+	*key = ~0;
 
 	return 0;
 }
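The effect of the initial value can be checked with a bit-at-a-time CRC32c in userspace C. This is a minimal sketch, not the vpmsum code: `toy_crc32c_update()` and `toy_crc32c()` are hypothetical helpers, and the final inversion is an assumption about how the digest is produced from the running state. The polynomial constant is the reflected Castagnoli polynomial, and `0xE3069283` is the commonly published check value for the ASCII string "123456789".

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY_REFLECTED 0x82F63B78u	/* Castagnoli, bit-reflected */

/* Bit-at-a-time update of the running state; no init or final inversion. */
static uint32_t toy_crc32c_update(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
	}
	return crc;
}

/* Full digest: start from the given seed, invert the final state. */
static uint32_t toy_crc32c(uint32_t seed, const void *data, size_t len)
{
	return ~toy_crc32c_update(seed, data, len);
}
```

Seeding with `~0u` reproduces the check value, while seeding with `0` yields a different digest for the same input, which is the kind of mismatch the self-test reported.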
[PATCH 4.10 04/63] net/mlx5e: Update MPWQE stride size when modifying CQE compress state
4.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Saeed Mahameed

[ Upstream commit 6dc4b54e77282caf17f0ff72aa32dd296037fbc0 ]

When the admin enables/disables cqe compression, updating mpwqe stride
size is required:

    CQE compress ON  ==> stride size = 256B
    CQE compress OFF ==> stride size = 64B

This is already done on driver load via mlx5e_set_rq_type_params, all we
need is just to call it on arbitrary admin changes of cqe compression
state via priv flags or when changing timestamping state
(as it is mutually exclusive with cqe compression).

This bug introduces no functional damage, it only makes cqe compression
occur less often, since in ConnectX4-LX CQE compression is performed
only on packets smaller than stride size.

Tested:
 ethtool --set-priv-flags ethxx rx_cqe_compress on
 pktgen with 64 < pkt size < 256 and netperf TCP_STREAM (IPv4/IPv6)
 verify `ethtool -S ethxx | grep compress` are advancing more often
 (rapidly)

Fixes: 7219ab34f184 ("net/mlx5e: CQE compression")
Signed-off-by: Saeed Mahameed
Reviewed-by: Tariq Toukan
Cc: kernel-t...@fb.com
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h         |    1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c |    1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c    |    2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c      |    1 +
 4 files changed, 4 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -803,6 +803,7 @@ int mlx5e_get_max_linkspeed(struct mlx5_
 
 void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params,
 				 u8 cq_period_mode);
+void mlx5e_set_rq_type_params(struct mlx5e_priv *priv, u8 rq_type);
 
 static inline void mlx5e_tx_notify_hw(struct mlx5e_sq *sq,
 				      struct mlx5_wqe_ctrl_seg *ctrl, int bf_sz)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1477,6 +1477,7 @@ static int set_pflag_rx_cqe_compress(str
 
 	MLX5E_SET_PFLAG(priv, MLX5E_PFLAG_RX_CQE_COMPRESS, enable);
 	priv->params.rx_cqe_compress_def = enable;
+	mlx5e_set_rq_type_params(priv, priv->params.rq_wq_type);
 
 	if (reset)
 		err = mlx5e_open_locked(netdev);
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -78,7 +78,7 @@ static bool mlx5e_check_fragmented_strid
 		MLX5_CAP_ETH(mdev, reg_umr_sq);
 }
 
-static void mlx5e_set_rq_type_params(struct mlx5e_priv *priv, u8 rq_type)
+void mlx5e_set_rq_type_params(struct mlx5e_priv *priv, u8 rq_type)
 {
 	priv->params.rq_wq_type = rq_type;
 	priv->params.lro_wqe_sz = MLX5E_PARAMS_DEFAULT_LRO_WQE_SZ;
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -172,6 +172,7 @@ void mlx5e_modify_rx_cqe_compression(str
 		mlx5e_close_locked(priv->netdev);
 
 	MLX5E_SET_PFLAG(priv, MLX5E_PFLAG_RX_CQE_COMPRESS, val);
+	mlx5e_set_rq_type_params(priv, priv->params.rq_wq_type);
 
 	if (was_opened)
 		mlx5e_open_locked(priv->netdev);
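The underlying issue is a derived parameter that was only computed once. A toy model in C makes the dependency explicit. Everything here is an assumption made for illustration: `toy_params` and the `toy_*` helpers are hypothetical and only encode the two facts stated in the changelog (256B stride with compression on, 64B with it off, and compression applying only to packets smaller than the stride).

```c
#include <stdbool.h>

struct toy_params {
	bool cqe_compress;
	unsigned int stride_sz;	/* bytes; derived from cqe_compress */
};

/* Recompute the derived stride size from the current compression state. */
static void toy_update_stride(struct toy_params *p)
{
	p->stride_sz = p->cqe_compress ? 256 : 64;
}

/* Runtime toggle: change the flag and refresh what depends on it. */
static void toy_set_cqe_compress(struct toy_params *p, bool enable)
{
	p->cqe_compress = enable;
	toy_update_stride(p);	/* the step that was missing on runtime changes */
}

/* A packet is a compression candidate only if it is smaller than the stride. */
static bool toy_can_compress(const struct toy_params *p, unsigned int pkt_len)
{
	return p->cqe_compress && pkt_len < p->stride_sz;
}
```

Without the refresh in `toy_set_cqe_compress()`, enabling compression at runtime would leave the stride at 64 and a 128-byte packet would never qualify, which matches the "occurs less often" symptom described above.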
[PATCH 4.10 50/63] irqchip/gicv3-its: Add workaround for QDF2400 ITS erratum 0065
4.10-stable review patch. If anyone has any objections, please let me know. -- From: Shanker Donthineni commit 90922a2d03d84de36bf8a9979d62580102f31a92 upstream. On Qualcomm Datacenter Technologies QDF2400 SoCs, the ITS hardware implementation uses 16Bytes for Interrupt Translation Entry (ITE), but reports an incorrect value of 8Bytes in GITS_TYPER.ITTE_size. It might cause kernel memory corruption depending on the number of MSI(x) that are configured and the amount of memory that has been allocated for ITEs in its_create_device(). This patch fixes the potential memory corruption by setting the correct ITE size to 16Bytes. Cc: sta...@vger.kernel.org Signed-off-by: Shanker Donthineni Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- Documentation/arm64/silicon-errata.txt | 44 + arch/arm64/Kconfig | 10 +++ drivers/irqchip/irq-gic-v3-its.c | 16 3 files changed, 49 insertions(+), 21 deletions(-) --- a/Documentation/arm64/silicon-errata.txt +++ b/Documentation/arm64/silicon-errata.txt @@ -42,24 +42,26 @@ file acts as a registry of software work will be updated when new workarounds are committed and backported to stable kernels. 
-| Implementor    | Component       | Erratum ID      | Kconfig                 |
-+----------------+-----------------+-----------------+-------------------------+
-| ARM            | Cortex-A53      | #826319         | ARM64_ERRATUM_826319    |
-| ARM            | Cortex-A53      | #827319         | ARM64_ERRATUM_827319    |
-| ARM            | Cortex-A53      | #824069         | ARM64_ERRATUM_824069    |
-| ARM            | Cortex-A53      | #819472         | ARM64_ERRATUM_819472    |
-| ARM            | Cortex-A53      | #845719         | ARM64_ERRATUM_845719    |
-| ARM            | Cortex-A53      | #843419         | ARM64_ERRATUM_843419    |
-| ARM            | Cortex-A57      | #832075         | ARM64_ERRATUM_832075    |
-| ARM            | Cortex-A57      | #852523         | N/A                     |
-| ARM            | Cortex-A57      | #834220         | ARM64_ERRATUM_834220    |
-| ARM            | Cortex-A72      | #853709         | N/A                     |
-| ARM            | MMU-500         | #841119,#826419 | N/A                     |
-|                |                 |                 |                         |
-| Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
-| Cavium         | ThunderX ITS    | #23144          | CAVIUM_ERRATUM_23144    |
-| Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
-| Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
-| Cavium         | ThunderX SMMUv2 | #27704          | N/A                     |
-|                |                 |                 |                         |
-| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
+| Implementor    | Component       | Erratum ID      | Kconfig                     |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A53      | #826319         | ARM64_ERRATUM_826319        |
+| ARM            | Cortex-A53      | #827319         | ARM64_ERRATUM_827319        |
+| ARM            | Cortex-A53      | #824069         | ARM64_ERRATUM_824069        |
+| ARM            | Cortex-A53      | #819472         | ARM64_ERRATUM_819472        |
+| ARM            | Cortex-A53      | #845719         | ARM64_ERRATUM_845719        |
+| ARM            | Cortex-A53      | #843419         | ARM64_ERRATUM_843419        |
+| ARM            | Cortex-A57      | #832075         | ARM64_ERRATUM_832075        |
+| ARM            | Cortex-A57      | #852523         | N/A                         |
+| ARM            | Cortex-A57      | #834220         | ARM64_ERRATUM_834220        |
+| ARM            | Cortex-A72      | #853709         | N/A                         |
+| ARM            | MMU-500         | #841119,#826419 | N/A                         |
+|                |                 |                 |                             |
+| Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375        |
+| Cavium         | ThunderX ITS    | #23144          | CAVIUM_ERRATUM_23144        |
+| Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154        |
+| Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456        |
+| Cavium         | ThunderX SMMUv2 | #27704          | N/A                         |
+|                |                 |                 |                             |
+| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585         |
+|                |                 |                 |                             |
+| Qualcomm Tech. | QDF2400 ITS     | E0065           |
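To make the corruption risk described in the changelog concrete, here is a rough sketch of the arithmetic, not the driver code. The helper names `itt_bytes` and `itt_overrun` are hypothetical, and the sketch ignores the rounding and alignment the real its_create_device() applies, which is presumably why the changelog only says corruption "might" happen depending on the MSI count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes needed for a table of nr_ites entries. */
static size_t itt_bytes(size_t nr_ites, size_t ite_size)
{
    return nr_ites * ite_size;
}

/*
 * Hypothetical helper: how far the hardware can write past the end of a
 * table that was sized from the reported entry size, when each entry
 * really occupies actual_size bytes.
 */
static size_t itt_overrun(size_t nr_ites, size_t reported_size,
                          size_t actual_size)
{
    size_t allocated = itt_bytes(nr_ites, reported_size);
    size_t used = itt_bytes(nr_ites, actual_size);

    return used > allocated ? used - allocated : 0;
}
```

With 32 entries, a reported size of 8 bytes and a real size of 16 bytes, the table is 256 bytes short. Once the size is overridden to 16 bytes the shortfall is zero.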
[PATCH 4.10 62/63] crypto: powerpc - Fix initialisation of crc32c context
4.10-stable review patch. If anyone has any objections, please let me know. -- From: Daniel Axtens commit aa2be9b3d6d2d699e9ca7cbfc00867c80e5da213 upstream. Turning on crypto self-tests on a POWER8 shows: alg: hash: Test 1 failed for crc32c-vpmsum : ff ff ff ff Comparing the code with the Intel CRC32c implementation on which ours is based shows that we are doing an init with 0, not ~0 as CRC32c requires. This probably wasn't caught because btrfs does its own weird open-coded initialisation. Initialise our internal context to ~0 on init. This makes the self-tests pass, and btrfs continues to work. Fixes: 6dd7a82cc54e ("crypto: powerpc - Add POWER8 optimised crc32c") Cc: Anton Blanchard Signed-off-by: Daniel Axtens Acked-by: Anton Blanchard Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/crypto/crc32c-vpmsum_glue.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/powerpc/crypto/crc32c-vpmsum_glue.c +++ b/arch/powerpc/crypto/crc32c-vpmsum_glue.c @@ -52,7 +52,7 @@ static int crc32c_vpmsum_cra_init(struct { u32 *key = crypto_tfm_ctx(tfm); - *key = 0; + *key = ~0; return 0; }
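For readers who want to see why the seed matters, here is a small userspace sketch. It is a plain bitwise CRC32c, not the vpmsum code, and the names `crc32c_update` and `crc32c_digest` are made up for illustration. It assumes the usual CRC32c convention of a reflected polynomial with the final value inverted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected form of the Castagnoli polynomial used by CRC32c. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Fold len bytes into the running CRC state, one bit at a time. */
static uint32_t crc32c_update(uint32_t crc, const unsigned char *buf,
                              size_t len)
{
    while (len--) {
        crc ^= *buf++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^
                  (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
    }
    return crc;
}

/* One-shot digest: start from seed, invert the final state. */
static uint32_t crc32c_digest(uint32_t seed, const void *data, size_t len)
{
    return ~crc32c_update(seed, data, len);
}
```

Seeding with ~0 gives the well-known CRC32c check value 0xE3069283 for the ASCII string "123456789". Seeding with 0 gives a different result for the same input, which is the kind of mismatch a self-test catches.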
[PATCH 4.9 18/93] ipv6: orphan skbs in reassembly unit
4.9-stable review patch. If anyone has any objections, please let me know. -- From: Eric Dumazet [ Upstream commit 48cac18ecf1de82f76259a54402c3adb7839ad01 ] Andrey reported a use-after-free in IPv6 stack. Issue here is that we free the socket while it still has skb in TX path and in some queues. It happens here because IPv6 reassembly unit messes skb->truesize, breaking skb_set_owner_w() badly. We fixed a similar issue for IPV4 in commit 8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()") Acked-by: Joe Stringer == BUG: KASAN: use-after-free in sock_wfree+0x118/0x120 Read of size 8 at addr 880062da0060 by task a.out/4140 page:ea00018b6800 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0 flags: 0x1008100(slab|head) raw: 01008100 000180130013 raw: dead0100 dead0200 88006741f140 page dumped because: kasan: bad access detected CPU: 0 PID: 4140 Comm: a.out Not tainted 4.10.0-rc3+ #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:15 dump_stack+0x292/0x398 lib/dump_stack.c:51 describe_address mm/kasan/report.c:262 kasan_report_error+0x121/0x560 mm/kasan/report.c:370 kasan_report mm/kasan/report.c:392 __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:413 sock_flag ./arch/x86/include/asm/bitops.h:324 sock_wfree+0x118/0x120 net/core/sock.c:1631 skb_release_head_state+0xfc/0x250 net/core/skbuff.c:655 skb_release_all+0x15/0x60 net/core/skbuff.c:668 __kfree_skb+0x15/0x20 net/core/skbuff.c:684 kfree_skb+0x16e/0x4e0 net/core/skbuff.c:705 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304 inet_frag_put ./include/net/inet_frag.h:133 nf_ct_frag6_gather+0x1125/0x38b0 net/ipv6/netfilter/nf_conntrack_reasm.c:617 ipv6_defrag+0x21b/0x350 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68 nf_hook_entry_hookfn ./include/linux/netfilter.h:102 nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310 nf_hook ./include/linux/netfilter.h:212 __ip6_local_out+0x52c/0xaf0 
net/ipv6/output_core.c:160 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170 ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722 ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742 rawv6_push_pending_frames net/ipv6/raw.c:613 rawv6_sendmsg+0x2cff/0x4130 net/ipv6/raw.c:927 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744 sock_sendmsg_nosec net/socket.c:635 sock_sendmsg+0xca/0x110 net/socket.c:645 sock_write_iter+0x326/0x620 net/socket.c:848 new_sync_write fs/read_write.c:499 __vfs_write+0x483/0x760 fs/read_write.c:512 vfs_write+0x187/0x530 fs/read_write.c:560 SYSC_write fs/read_write.c:607 SyS_write+0xfb/0x230 fs/read_write.c:599 entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203 RIP: 0033:0x7ff26e6f5b79 RSP: 002b:7ff268e0ed98 EFLAGS: 0206 ORIG_RAX: 0001 RAX: ffda RBX: 7ff268e0f9c0 RCX: 7ff26e6f5b79 RDX: 0010 RSI: 20f50fe1 RDI: 0003 RBP: 7ff26ebc1220 R08: R09: R10: R11: 0206 R12: R13: 7ff268e0f9c0 R14: 7ff26efec040 R15: 0003 The buggy address belongs to the object at 880062da which belongs to the cache RAWv6 of size 1504 The buggy address 880062da0060 is located 96 bytes inside of 1504-byte region [880062da, 880062da05e0) Freed by task 4113: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57 save_stack+0x43/0xd0 mm/kasan/kasan.c:502 set_track mm/kasan/kasan.c:514 kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578 slab_free_hook mm/slub.c:1352 slab_free_freelist_hook mm/slub.c:1374 slab_free mm/slub.c:2951 kmem_cache_free+0xb2/0x2c0 mm/slub.c:2973 sk_prot_free net/core/sock.c:1377 __sk_destruct+0x49c/0x6e0 net/core/sock.c:1452 sk_destruct+0x47/0x80 net/core/sock.c:1460 __sk_free+0x57/0x230 net/core/sock.c:1468 sk_free+0x23/0x30 net/core/sock.c:1479 sock_put ./include/net/sock.h:1638 sk_common_release+0x31e/0x4e0 net/core/sock.c:2782 rawv6_close+0x54/0x80 net/ipv6/raw.c:1214 inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:431 sock_release+0x8d/0x1e0 net/socket.c:599 sock_close+0x16/0x20 
net/socket.c:1063 __fput+0x332/0x7f0 fs/file_table.c:208 fput+0x15/0x20 fs/file_table.c:244 task_work_run+0x19b/0x270 kernel/task_work.c:116 exit_task_work ./include/linux/task_work.h:21 do_exit+0x186b/0x2800 kernel/exit.c:839 do_group_exit+0x149/0x420 kernel/exit.c:943 SYSC_exit_group kernel/exit.c:954 SyS_exit_group+0x1d/0x20 kernel/exit.c:952 entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203 Allocated by task 4115: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57 save_stack+0x43/0xd0 mm/kasan/kasan.c:502 set_track
Re: [PATCH v3] usb: hub: Fix error loop seen after hub communication errors
Hi, On Thu, Mar 16, 2017 at 12:24 PM, Guenter Roeck wrote: > @@ -1198,7 +1201,7 @@ static void hub_activate(struct usb_hub *hub, enum > hub_activation_type type) > > /* Scan all ports that need attention */ > kick_hub_wq(hub); > - > +abort: One tiny nit that could be done when applying this patch is to add a space before "abort". Other goto labels in this function are preceded by a space and it's sane to try to match the existing coding convention in the function rather than trying to mix and match. Other than that this patch seems sane to me, but I am by no means an expert on this code. ;) -Doug
[PATCH 4.10 37/63] uapi: fix linux/packet_diag.h userspace compilation error
4.10-stable review patch. If anyone has any objections, please let me know. -- From: "Dmitry V. Levin" [ Upstream commit 745cb7f8a5de0805cade3de3991b7a95317c7c73 ] Replace MAX_ADDR_LEN with its numeric value to fix the following linux/packet_diag.h userspace compilation error: /usr/include/linux/packet_diag.h:67:17: error: 'MAX_ADDR_LEN' undeclared here (not in a function) __u8 pdmc_addr[MAX_ADDR_LEN]; This is not the first case in the UAPI where the numeric value of MAX_ADDR_LEN is used instead of symbolic one, uapi/linux/if_link.h already does the same: $ grep MAX_ADDR_LEN include/uapi/linux/if_link.h __u8 mac[32]; /* MAX_ADDR_LEN */ There are no UAPI headers besides these two that use MAX_ADDR_LEN. Signed-off-by: Dmitry V. Levin Acked-by: Pavel Emelyanov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/uapi/linux/packet_diag.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/include/uapi/linux/packet_diag.h +++ b/include/uapi/linux/packet_diag.h @@ -64,7 +64,7 @@ struct packet_diag_mclist { __u32 pdmc_count; __u16 pdmc_type; __u16 pdmc_alen; - __u8 pdmc_addr[MAX_ADDR_LEN]; + __u8 pdmc_addr[32]; /* MAX_ADDR_LEN */ }; struct packet_diag_ring {
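As a quick illustration of the idea, the snippet below mirrors only the fields visible in the hunk, using a made-up struct name and standard C types instead of the kernel's __u8/__u16/__u32. It is not the real header. The point is that a literal array bound compiles in userspace without any kernel-internal macro being in scope.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of the fields shown in the hunk (hypothetical name). */
struct sketch_packet_diag_mclist {
    uint32_t pdmc_count;
    uint16_t pdmc_type;
    uint16_t pdmc_alen;
    uint8_t  pdmc_addr[32]; /* MAX_ADDR_LEN */
};

/* Capacity of the address field, taken from the type itself. */
static size_t sketch_addr_capacity(void)
{
    return sizeof(((struct sketch_packet_diag_mclist *)0)->pdmc_addr);
}
```

Userspace code that needs the capacity can take it from the field with sizeof, as above, rather than relying on a MAX_ADDR_LEN definition being available.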
Re: [PATCH v2 06/14] mmc: dw_mmc: simplify optional reset handling
On 20 March 2017 at 12:00, Philipp Zabel wrote: > On Mon, 2017-03-20 at 11:49 +0100, Andrzej Hajda wrote: >> On 20.03.2017 11:27, Philipp Zabel wrote: > [...] >> > diff --git a/include/linux/reset.h b/include/linux/reset.h >> > index 86b4ed75359e8..c905ff1c21ec6 100644 >> > --- a/include/linux/reset.h >> > +++ b/include/linux/reset.h >> > @@ -74,14 +74,14 @@ static inline struct reset_control >> > *__of_reset_control_get( >> > const char *id, int index, bool shared, >> > bool optional) >> > { >> > - return ERR_PTR(-ENOTSUPP); >> > + return optional ? NULL : ERR_PTR(-ENOTSUPP); >> > } >> > >> > static inline struct reset_control *__devm_reset_control_get( >> > struct device *dev, const char *id, >> > int index, bool shared, bool optional) >> > { >> > - return ERR_PTR(-ENOTSUPP); >> > + return optional ? NULL : ERR_PTR(-ENOTSUPP); >> > } >> > >> > #endif /* CONFIG_RESET_CONTROLLER */ >> > -->8-- >> >> In dw_mmc.c file there are also unconditional calls to >> reset_control_assert, with disabled RESET_CONTROLLER it will cause >> unexpected WARNs. >> Anyway if you change reset API as above I think you should remove all >> warns from reset stubs, because NULL reset is valid, but these warns are >> there for reason - contradiction. > > You are right, I have to let go of those, too. Until fixed, I have dropped the three changes from my next branch related to this. Please re-post when fixed. Kind regards Uffe > > regards > Philipp >
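The behaviour being discussed can be modelled in userspace roughly as follows. This is only a sketch of the direction agreed in the thread, not the code that was eventually posted: the `stub_*` names are hypothetical, ERR_PTR/IS_ERR are minimal stand-ins for the kernel helpers, and ENOTSUPP is defined locally with the kernel's value because userspace errno.h does not provide it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifndef ENOTSUPP
#define ENOTSUPP 524 /* kernel-internal errno, absent from userspace */
#endif
#define MAX_ERRNO 4095

struct reset_control; /* opaque, never defined in this sketch */

/* Minimal stand-ins for the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
    return (void *)error;
}

static inline bool IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

/* Stub lookup for builds without a reset controller (hypothetical name). */
static struct reset_control *stub_reset_control_get(bool optional)
{
    return optional ? NULL : ERR_PTR(-ENOTSUPP);
}

/*
 * Stub operation (hypothetical name). Once NULL is a valid "no reset
 * line" handle, the stub has to accept it quietly instead of warning.
 */
static int stub_reset_control_assert(struct reset_control *rstc)
{
    (void)rstc;
    return 0;
}
```

An optional lookup then yields NULL, which callers such as dw_mmc can pass straight to the assert/deassert stubs, while a mandatory lookup still fails with an error pointer.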
Re: [PATCH v5 38/39] media: imx: csi: fix crop rectangle reset in sink set_fmt
On Mon, Mar 20, 2017 at 06:40:21PM +0100, Philipp Zabel wrote: > On Mon, 2017-03-20 at 14:17 +, Russell King - ARM Linux wrote: > > I have tripped over a bug in media-ctl when specifying both a crop and > > compose rectangle - the --help output suggests that "," should be used > > to separate them. media-ctl rejects that, telling me the character at > > the "," should be "]". Replacing the "," with " " allows media-ctl to > > accept it and set both rectangles, so it sounds like a parser bug - I've > > not looked into this any further yet. > > I can confirm this. I don't see any place in > v4l2_subdev_parse_pad_format that handles the "," separator. There's > just whitespace skipping between the v4l2-properties. Maybe this is the easiest solution: utils/media-ctl/options.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/utils/media-ctl/options.c b/utils/media-ctl/options.c index 83ca1ca..8b97874 100644 --- a/utils/media-ctl/options.c +++ b/utils/media-ctl/options.c @@ -65,7 +65,7 @@ static void usage(const char *argv0) printf("\tentity = entity-number | ( '\"' entity-name '\"' ) ;\n"); printf("\n"); printf("\tv4l2= pad '[' v4l2-properties ']' ;\n"); - printf("\tv4l2-properties = v4l2-property { ',' v4l2-property } ;\n"); + printf("\tv4l2-properties = v4l2-property { ' '* v4l2-property } ;\n"); printf("\tv4l2-property = v4l2-mbusfmt | v4l2-crop | v4l2-interval\n"); printf("\t| v4l2-compose | v4l2-interval ;\n"); printf("\tv4l2-mbusfmt= 'fmt:' fcc '/' size ; { 'field:' v4l2-field ; } { 'colorspace:' v4l2-colorspace ; }\n"); ;) -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.
Re: [PATCH v21 13/13] acpi/arm64: Add SBSA Generic Watchdog support in GTDT driver
On Tue, Mar 21, 2017 at 01:57:58AM +0800, Fu Wei wrote: > On 18 March 2017 at 04:01, Mark Rutland wrote: > > On Tue, Feb 07, 2017 at 02:50:15AM +0800, fu@linaro.org wrote: > > I've not been able to find where the ACPI spec says that zero is not a > > valid GSIV. This may simply be an oversight/ambiguity in the spec. > > > > Is there any statement to that effect? > > you are right, zero is a valid GSIV, I will delete this check. Thanks That being the case, how does one describe a watchdog that does not have an interrupt? As I mentioned, I think this is an oversight/ambiguity in the spec that we should address. > > My reading of SBSA is that there is one watchdog in the system. > > > > Is that not the case? > > do you mean: > --- > 4.2.4 Watchdogs > The base server system implements a Generic Watchdog as specified in > APPENDIX A: Generic Watchdog. > --- > > I am not sure about that if this is saying "we only have one SBSA > watchdog in a system" > > would you let me know where mention it? Do I miss something? My reading was that the 'a' above meant a single element. i.e. The base server system implements _a_ Generic Watchdog as specified in APPENDIX A: Generic Watchdog. Subsequently in 4.2.5, it is stated: In this scenario, the system wakeup timer or generic watchdog is still required to send its interrupt. ... which only makes sense if there is a single watchdog in the system. Perhaps this is an oversight in the specification. Thanks, Mark.