[PATCH 3.4 099/125] xen/pciback: Do not install an IRQ handler for MSI interrupts.
From: Konrad Rzeszutek Wilk3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit a396f3a210c3a61e94d6b87ec05a75d0be2a60d0 upstream. Otherwise an guest can subvert the generic MSI code to trigger an BUG_ON condition during MSI interrupt freeing: for (i = 0; i < entry->nvec_used; i++) BUG_ON(irq_has_action(entry->irq + i)); Xen PCI backed installs an IRQ handler (request_irq) for the dev->irq whenever the guest writes PCI_COMMAND_MEMORY (or PCI_COMMAND_IO) to the PCI_COMMAND register. This is done in case the device has legacy interrupts the GSI line is shared by the backend devices. To subvert the backend the guest needs to make the backend to change the dev->irq from the GSI to the MSI interrupt line, make the backend allocate an interrupt handler, and then command the backend to free the MSI interrupt and hit the BUG_ON. Since the backend only calls 'request_irq' when the guest writes to the PCI_COMMAND register the guest needs to call XEN_PCI_OP_enable_msi before any other operation. This will cause the generic MSI code to setup an MSI entry and populate dev->irq with the new PIRQ value. Then the guest can write to PCI_COMMAND PCI_COMMAND_MEMORY and cause the backend to setup an IRQ handler for dev->irq (which instead of the GSI value has the MSI pirq). See 'xen_pcibk_control_isr'. Then the guest disables the MSI: XEN_PCI_OP_disable_msi which ends up triggering the BUG_ON condition in 'free_msi_irqs' as there is an IRQ handler for the entry->irq (dev->irq). Note that this cannot be done using MSI-X as the generic code does not over-write dev->irq with the MSI-X PIRQ values. The patch inhibits setting up the IRQ handler if MSI or MSI-X (for symmetry reasons) code had been called successfully. P.S. Xen PCIBack when it sets up the device for the guest consumption ends up writting 0 to the PCI_COMMAND (see xen_pcibk_reset_device). XSA-120 addendum patch removed that - however when upstreaming said addendum we found that it caused issues with qemu upstream. That has now been fixed in qemu upstream. This is part of XSA-157 Reviewed-by: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Zefan Li --- drivers/xen/xen-pciback/pciback_ops.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/xen/xen-pciback/pciback_ops.c b/drivers/xen/xen-pciback/pciback_ops.c index f7ce4de..90bc022 100644 --- a/drivers/xen/xen-pciback/pciback_ops.c +++ b/drivers/xen/xen-pciback/pciback_ops.c @@ -69,6 +69,13 @@ static void xen_pcibk_control_isr(struct pci_dev *dev, int reset) enable ? "enable" : "disable"); if (enable) { + /* +* The MSI or MSI-X should not have an IRQ handler. Otherwise +* if the guest terminates we BUG_ON in free_msi_irqs. +*/ + if (dev->msi_enabled || dev->msix_enabled) + goto out; + rc = request_irq(dev_data->irq, xen_pcibk_guest_interrupt, IRQF_SHARED, dev_data->irq_name, dev); -- 1.9.1
[PATCH 3.4 105/125] parisc: Fix syscall restarts
From: Helge Deller <del...@gmx.de> 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 71a71fb5374a23be36a91981b5614590b9e722c3 upstream. On parisc syscalls which are interrupted by signals sometimes failed to restart and instead returned -ENOSYS which in the worst case lead to userspace crashes. A similiar problem existed on MIPS and was fixed by commit e967ef02 ("MIPS: Fix restart of indirect syscalls"). On parisc the current syscall restart code assumes that all syscall callers load the syscall number in the delay slot of the ble instruction. That's how it is e.g. done in the unistd.h header file: ble 0x100(%sr2, %r0) ldi #syscall_nr, %r20 Because of that assumption the current code never restored %r20 before returning to userspace. This assumption is at least not true for code which uses the glibc syscall() function, which instead uses this syntax: ble 0x100(%sr2, %r0) copy regX, %r20 where regX depend on how the compiler optimizes the code and register usage. This patch fixes this problem by adding code to analyze how the syscall number is loaded in the delay branch and - if needed - copy the syscall number to regX prior returning to userspace for the syscall restart. Signed-off-by: Helge Deller <del...@gmx.de> Cc: Mathieu Desnoyers <mathieu.desnoy...@efficios.com> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lize...@huawei.com> --- arch/parisc/kernel/signal.c | 67 +++-- 1 file changed, 52 insertions(+), 15 deletions(-) diff --git a/arch/parisc/kernel/signal.c b/arch/parisc/kernel/signal.c index 12c1ed3..c626855 100644 --- a/arch/parisc/kernel/signal.c +++ b/arch/parisc/kernel/signal.c @@ -468,6 +468,55 @@ handle_signal(unsigned long sig, siginfo_t *info, struct k_sigaction *ka, return 1; } +/* + * Check how the syscall number gets loaded into %r20 within + * the delay branch in userspace and adjust as needed. + */ + +static void check_syscallno_in_delay_branch(struct pt_regs *regs) +{ + u32 opcode, source_reg; + u32 __user *uaddr; + int err; + + /* Usually we don't have to restore %r20 (the system call number) +* because it gets loaded in the delay slot of the branch external +* instruction via the ldi instruction. +* In some cases a register-to-register copy instruction might have +* been used instead, in which case we need to copy the syscall +* number into the source register before returning to userspace. +*/ + + /* A syscall is just a branch, so all we have to do is fiddle the +* return pointer so that the ble instruction gets executed again. +*/ + regs->gr[31] -= 8; /* delayed branching */ + + /* Get assembler opcode of code in delay branch */ + uaddr = (unsigned int *) ((regs->gr[31] & ~3) + 4); + err = get_user(opcode, uaddr); + if (err) + return; + + /* Check if delay branch uses "ldi int,%r20" */ + if ((opcode & 0x) == 0x3414) + return; /* everything ok, just return */ + + /* Check if delay branch uses "nop" */ + if (opcode == INSN_NOP) + return; + + /* Check if delay branch uses "copy %rX,%r20" */ + if ((opcode & 0xffe0) == 0x08000254) { + source_reg = (opcode >> 16) & 31; + regs->gr[source_reg] = regs->gr[20]; + return; + } + + pr_warn("syscall restart: %s (pid %d): unexpected opcode 0x%08x\n", + current->comm, task_pid_nr(current), opcode); +} + static inline void syscall_restart(struct pt_regs *regs, struct k_sigaction *ka) { @@ -489,10 +538,7 @@ syscall_restart(struct pt_regs *regs, struct k_sigaction *ka) } /* fallthrough */ case -ERESTARTNOINTR: - /* A syscall is just a branch, so all -* we have to do is fiddle the return pointer. -*/ - regs->gr[31] -= 8; /* delayed branching */ + check_syscallno_in_delay_branch(regs); /* Preserve original r28. */ regs->gr[28] = regs->orig_r28; break; @@ -543,18 +589,9 @@ insert_restart_trampoline(struct pt_regs *regs) } case -ERESTARTNOHAND: case -ERESTARTSYS: - case -ERESTARTNOINTR: { - /* Hooray for delayed branching. We don't -* have to restore %r20 (the system call -* number) because it gets loaded in the delay -* slot of the branch external instruction. -*/ - regs->gr[31] -= 8; - /* Preserve original r28. */ - regs->gr[28] = regs->orig_r28; - +
[PATCH 3.4 081/125] mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress
From: Michal Hocko <mho...@suse.com> 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 373ccbe5927034b55bdc80b0f8b54d6e13fe8d12 upstream. Tetsuo Handa has reported that the system might basically livelock in OOM condition without triggering the OOM killer. The issue is caused by internal dependency of the direct reclaim on vmstat counter updates (via zone_reclaimable) which are performed from the workqueue context. If all the current workers get assigned to an allocation request, though, they will be looping inside the allocator trying to reclaim memory but zone_reclaimable can see stalled numbers so it will consider a zone reclaimable even though it has been scanned way too much. WQ concurrency logic will not consider this situation as a congested workqueue because it relies that worker would have to sleep in such a situation. This also means that it doesn't try to spawn new workers or invoke the rescuer thread if the one is assigned to the queue. In order to fix this issue we need to do two things. First we have to let wq concurrency code know that we are in trouble so we have to do a short sleep. In order to prevent from issues handled by 0e093d99763e ("writeback: do not sleep on the congestion queue if there are no congested BDIs or if significant congestion is not being encountered in the current zone") we limit the sleep only to worker threads which are the ones of the interest anyway. The second thing to do is to create a dedicated workqueue for vmstat and mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to have a spare worker thread for it. Signed-off-by: Michal Hocko <mho...@suse.com> Reported-by: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> Cc: Tejun Heo <t...@kernel.org> Cc: Cristopher Lameter <clame...@sgi.com> Cc: Joonsoo Kim <js1...@gmail.com> Cc: Arkadiusz Miskiewicz <ar...@maven.pl> Signed-off-by: Andrew Morton <a...@linux-foundation.org> Signed-off-by: Linus Torvalds <torva...@linux-foundation.org> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lize...@huawei.com> --- mm/backing-dev.c | 19 --- mm/vmstat.c | 6 -- 2 files changed, 20 insertions(+), 5 deletions(-) diff --git a/mm/backing-dev.c b/mm/backing-dev.c index dd8e2aa..3f54b7d 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -843,8 +843,9 @@ EXPORT_SYMBOL(congestion_wait); * jiffies for either a BDI to exit congestion of the given @sync queue * or a write to complete. * - * In the absence of zone congestion, cond_resched() is called to yield - * the processor if necessary but otherwise does not sleep. + * In the absence of zone congestion, a short sleep or a cond_resched is + * performed to yield the processor and to allow other subsystems to make + * a forward progress. * * The return value is 0 if the sleep is for the full timeout. Otherwise, * it is the number of jiffies that were still remaining when the function @@ -864,7 +865,19 @@ long wait_iff_congested(struct zone *zone, int sync, long timeout) */ if (atomic_read(_bdi_congested[sync]) == 0 || !zone_is_reclaim_congested(zone)) { - cond_resched(); + + /* +* Memory allocation/reclaim might be called from a WQ +* context and the current implementation of the WQ +* concurrency control doesn't recognize that a particular +* WQ is congested if the worker thread is looping without +* ever sleeping. Therefore we have to do a short sleep +* here rather than calling cond_resched(). +*/ + if (current->flags & PF_WQ_WORKER) + schedule_timeout(1); + else + cond_resched(); /* In case we scheduled, work out time remaining */ ret = timeout - (jiffies - start); diff --git a/mm/vmstat.c b/mm/vmstat.c index 7db1b9b..e89c0f6 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1139,13 +1139,14 @@ static const struct file_operations proc_vmstat_file_operations = { #endif /* CONFIG_PROC_FS */ #ifdef CONFIG_SMP +static struct workqueue_struct *vmstat_wq; static DEFINE_PER_CPU(struct delayed_work, vmstat_work); int sysctl_stat_interval __read_mostly = HZ; static void vmstat_update(struct work_struct *w) { refresh_cpu_vm_stats(smp_processor_id()); - schedule_delayed_work(&__get_cpu_var(vmstat_work), + queue_delayed_work(vmstat_wq, &__get_cpu_var(vmstat_work), round_jiffies_relative(sysctl_stat_interval)); } @@ -1154,7 +1155,7 @@ static void __cpuinit start_cpu_timer(int cpu) struct delayed_work *work = _cpu(vmstat_work, cpu); INIT_DELAYED_WORK_DEFERRABLE(work, vmstat_update); - schedule_delayed_work_on(cpu, w
[PATCH 3.4 070/125] usb: xhci: fix config fail of FS hub behind a HS hub with MTT
From: Chunfeng Yun3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 096b110a3dd3c868e4610937c80d2e3f3357c1a9 upstream. if a full speed hub connects to a high speed hub which supports MTT, the MTT field of its slot context will be set to 1 when xHCI driver setups an xHCI virtual device in xhci_setup_addressable_virt_dev(); once usb core fetch its hub descriptor, and need to update the xHC's internal data structures for the device, the HUB field of its slot context will be set to 1 too, meanwhile MTT is also set before, this will cause configure endpoint command fail, so in the case, we should clear MTT to 0 for full speed hub according to section 6.2.2 Signed-off-by: Chunfeng Yun Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman Signed-off-by: Zefan Li --- drivers/usb/host/xhci.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 88be7a5..95ac4cf 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -4123,8 +4123,16 @@ int xhci_update_hub_device(struct usb_hcd *hcd, struct usb_device *hdev, ctrl_ctx->add_flags |= cpu_to_le32(SLOT_FLAG); slot_ctx = xhci_get_slot_ctx(xhci, config_cmd->in_ctx); slot_ctx->dev_info |= cpu_to_le32(DEV_HUB); + /* +* refer to section 6.2.2: MTT should be 0 for full speed hub, +* but it may be already set to 1 when setup an xHCI virtual +* device, so clear it anyway. +*/ if (tt->multi) slot_ctx->dev_info |= cpu_to_le32(DEV_MTT); + else if (hdev->speed == USB_SPEED_FULL) + slot_ctx->dev_info &= cpu_to_le32(~DEV_MTT); + if (xhci->hci_version > 0x95) { xhci_dbg(xhci, "xHCI version %x needs hub " "TT think time and number of ports\n", -- 1.9.1
[PATCH 3.4 070/125] usb: xhci: fix config fail of FS hub behind a HS hub with MTT
From: Chunfeng Yun 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 096b110a3dd3c868e4610937c80d2e3f3357c1a9 upstream. if a full speed hub connects to a high speed hub which supports MTT, the MTT field of its slot context will be set to 1 when xHCI driver setups an xHCI virtual device in xhci_setup_addressable_virt_dev(); once usb core fetch its hub descriptor, and need to update the xHC's internal data structures for the device, the HUB field of its slot context will be set to 1 too, meanwhile MTT is also set before, this will cause configure endpoint command fail, so in the case, we should clear MTT to 0 for full speed hub according to section 6.2.2 Signed-off-by: Chunfeng Yun Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman Signed-off-by: Zefan Li --- drivers/usb/host/xhci.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 88be7a5..95ac4cf 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -4123,8 +4123,16 @@ int xhci_update_hub_device(struct usb_hcd *hcd, struct usb_device *hdev, ctrl_ctx->add_flags |= cpu_to_le32(SLOT_FLAG); slot_ctx = xhci_get_slot_ctx(xhci, config_cmd->in_ctx); slot_ctx->dev_info |= cpu_to_le32(DEV_HUB); + /* +* refer to section 6.2.2: MTT should be 0 for full speed hub, +* but it may be already set to 1 when setup an xHCI virtual +* device, so clear it anyway. +*/ if (tt->multi) slot_ctx->dev_info |= cpu_to_le32(DEV_MTT); + else if (hdev->speed == USB_SPEED_FULL) + slot_ctx->dev_info &= cpu_to_le32(~DEV_MTT); + if (xhci->hci_version > 0x95) { xhci_dbg(xhci, "xHCI version %x needs hub " "TT think time and number of ports\n", -- 1.9.1
[PATCH 3.4 099/125] xen/pciback: Do not install an IRQ handler for MSI interrupts.
From: Konrad Rzeszutek Wilk 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit a396f3a210c3a61e94d6b87ec05a75d0be2a60d0 upstream. Otherwise an guest can subvert the generic MSI code to trigger an BUG_ON condition during MSI interrupt freeing: for (i = 0; i < entry->nvec_used; i++) BUG_ON(irq_has_action(entry->irq + i)); Xen PCI backed installs an IRQ handler (request_irq) for the dev->irq whenever the guest writes PCI_COMMAND_MEMORY (or PCI_COMMAND_IO) to the PCI_COMMAND register. This is done in case the device has legacy interrupts the GSI line is shared by the backend devices. To subvert the backend the guest needs to make the backend to change the dev->irq from the GSI to the MSI interrupt line, make the backend allocate an interrupt handler, and then command the backend to free the MSI interrupt and hit the BUG_ON. Since the backend only calls 'request_irq' when the guest writes to the PCI_COMMAND register the guest needs to call XEN_PCI_OP_enable_msi before any other operation. This will cause the generic MSI code to setup an MSI entry and populate dev->irq with the new PIRQ value. Then the guest can write to PCI_COMMAND PCI_COMMAND_MEMORY and cause the backend to setup an IRQ handler for dev->irq (which instead of the GSI value has the MSI pirq). See 'xen_pcibk_control_isr'. Then the guest disables the MSI: XEN_PCI_OP_disable_msi which ends up triggering the BUG_ON condition in 'free_msi_irqs' as there is an IRQ handler for the entry->irq (dev->irq). Note that this cannot be done using MSI-X as the generic code does not over-write dev->irq with the MSI-X PIRQ values. The patch inhibits setting up the IRQ handler if MSI or MSI-X (for symmetry reasons) code had been called successfully. P.S. Xen PCIBack when it sets up the device for the guest consumption ends up writting 0 to the PCI_COMMAND (see xen_pcibk_reset_device). XSA-120 addendum patch removed that - however when upstreaming said addendum we found that it caused issues with qemu upstream. That has now been fixed in qemu upstream. This is part of XSA-157 Reviewed-by: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Zefan Li --- drivers/xen/xen-pciback/pciback_ops.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/xen/xen-pciback/pciback_ops.c b/drivers/xen/xen-pciback/pciback_ops.c index f7ce4de..90bc022 100644 --- a/drivers/xen/xen-pciback/pciback_ops.c +++ b/drivers/xen/xen-pciback/pciback_ops.c @@ -69,6 +69,13 @@ static void xen_pcibk_control_isr(struct pci_dev *dev, int reset) enable ? "enable" : "disable"); if (enable) { + /* +* The MSI or MSI-X should not have an IRQ handler. Otherwise +* if the guest terminates we BUG_ON in free_msi_irqs. +*/ + if (dev->msi_enabled || dev->msix_enabled) + goto out; + rc = request_irq(dev_data->irq, xen_pcibk_guest_interrupt, IRQF_SHARED, dev_data->irq_name, dev); -- 1.9.1
[PATCH 3.4 105/125] parisc: Fix syscall restarts
From: Helge Deller 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 71a71fb5374a23be36a91981b5614590b9e722c3 upstream. On parisc syscalls which are interrupted by signals sometimes failed to restart and instead returned -ENOSYS which in the worst case lead to userspace crashes. A similiar problem existed on MIPS and was fixed by commit e967ef02 ("MIPS: Fix restart of indirect syscalls"). On parisc the current syscall restart code assumes that all syscall callers load the syscall number in the delay slot of the ble instruction. That's how it is e.g. done in the unistd.h header file: ble 0x100(%sr2, %r0) ldi #syscall_nr, %r20 Because of that assumption the current code never restored %r20 before returning to userspace. This assumption is at least not true for code which uses the glibc syscall() function, which instead uses this syntax: ble 0x100(%sr2, %r0) copy regX, %r20 where regX depend on how the compiler optimizes the code and register usage. This patch fixes this problem by adding code to analyze how the syscall number is loaded in the delay branch and - if needed - copy the syscall number to regX prior returning to userspace for the syscall restart. Signed-off-by: Helge Deller Cc: Mathieu Desnoyers [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li --- arch/parisc/kernel/signal.c | 67 +++-- 1 file changed, 52 insertions(+), 15 deletions(-) diff --git a/arch/parisc/kernel/signal.c b/arch/parisc/kernel/signal.c index 12c1ed3..c626855 100644 --- a/arch/parisc/kernel/signal.c +++ b/arch/parisc/kernel/signal.c @@ -468,6 +468,55 @@ handle_signal(unsigned long sig, siginfo_t *info, struct k_sigaction *ka, return 1; } +/* + * Check how the syscall number gets loaded into %r20 within + * the delay branch in userspace and adjust as needed. + */ + +static void check_syscallno_in_delay_branch(struct pt_regs *regs) +{ + u32 opcode, source_reg; + u32 __user *uaddr; + int err; + + /* Usually we don't have to restore %r20 (the system call number) +* because it gets loaded in the delay slot of the branch external +* instruction via the ldi instruction. +* In some cases a register-to-register copy instruction might have +* been used instead, in which case we need to copy the syscall +* number into the source register before returning to userspace. +*/ + + /* A syscall is just a branch, so all we have to do is fiddle the +* return pointer so that the ble instruction gets executed again. +*/ + regs->gr[31] -= 8; /* delayed branching */ + + /* Get assembler opcode of code in delay branch */ + uaddr = (unsigned int *) ((regs->gr[31] & ~3) + 4); + err = get_user(opcode, uaddr); + if (err) + return; + + /* Check if delay branch uses "ldi int,%r20" */ + if ((opcode & 0x) == 0x3414) + return; /* everything ok, just return */ + + /* Check if delay branch uses "nop" */ + if (opcode == INSN_NOP) + return; + + /* Check if delay branch uses "copy %rX,%r20" */ + if ((opcode & 0xffe0) == 0x08000254) { + source_reg = (opcode >> 16) & 31; + regs->gr[source_reg] = regs->gr[20]; + return; + } + + pr_warn("syscall restart: %s (pid %d): unexpected opcode 0x%08x\n", + current->comm, task_pid_nr(current), opcode); +} + static inline void syscall_restart(struct pt_regs *regs, struct k_sigaction *ka) { @@ -489,10 +538,7 @@ syscall_restart(struct pt_regs *regs, struct k_sigaction *ka) } /* fallthrough */ case -ERESTARTNOINTR: - /* A syscall is just a branch, so all -* we have to do is fiddle the return pointer. -*/ - regs->gr[31] -= 8; /* delayed branching */ + check_syscallno_in_delay_branch(regs); /* Preserve original r28. */ regs->gr[28] = regs->orig_r28; break; @@ -543,18 +589,9 @@ insert_restart_trampoline(struct pt_regs *regs) } case -ERESTARTNOHAND: case -ERESTARTSYS: - case -ERESTARTNOINTR: { - /* Hooray for delayed branching. We don't -* have to restore %r20 (the system call -* number) because it gets loaded in the delay -* slot of the branch external instruction. -*/ - regs->gr[31] -= 8; - /* Preserve original r28. */ - regs->gr[28] = regs->orig_r28; - + case -ERESTARTNOINTR: + check_syscallno_in_delay_branch(regs); return; - } default: break; } -- 1.9.1
[PATCH 3.4 081/125] mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress
From: Michal Hocko 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 373ccbe5927034b55bdc80b0f8b54d6e13fe8d12 upstream. Tetsuo Handa has reported that the system might basically livelock in OOM condition without triggering the OOM killer. The issue is caused by internal dependency of the direct reclaim on vmstat counter updates (via zone_reclaimable) which are performed from the workqueue context. If all the current workers get assigned to an allocation request, though, they will be looping inside the allocator trying to reclaim memory but zone_reclaimable can see stalled numbers so it will consider a zone reclaimable even though it has been scanned way too much. WQ concurrency logic will not consider this situation as a congested workqueue because it relies that worker would have to sleep in such a situation. This also means that it doesn't try to spawn new workers or invoke the rescuer thread if the one is assigned to the queue. In order to fix this issue we need to do two things. First we have to let wq concurrency code know that we are in trouble so we have to do a short sleep. In order to prevent from issues handled by 0e093d99763e ("writeback: do not sleep on the congestion queue if there are no congested BDIs or if significant congestion is not being encountered in the current zone") we limit the sleep only to worker threads which are the ones of the interest anyway. The second thing to do is to create a dedicated workqueue for vmstat and mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to have a spare worker thread for it. Signed-off-by: Michal Hocko Reported-by: Tetsuo Handa Cc: Tejun Heo Cc: Cristopher Lameter Cc: Joonsoo Kim Cc: Arkadiusz Miskiewicz Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li --- mm/backing-dev.c | 19 --- mm/vmstat.c | 6 -- 2 files changed, 20 insertions(+), 5 deletions(-) diff --git a/mm/backing-dev.c b/mm/backing-dev.c index dd8e2aa..3f54b7d 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -843,8 +843,9 @@ EXPORT_SYMBOL(congestion_wait); * jiffies for either a BDI to exit congestion of the given @sync queue * or a write to complete. * - * In the absence of zone congestion, cond_resched() is called to yield - * the processor if necessary but otherwise does not sleep. + * In the absence of zone congestion, a short sleep or a cond_resched is + * performed to yield the processor and to allow other subsystems to make + * a forward progress. * * The return value is 0 if the sleep is for the full timeout. Otherwise, * it is the number of jiffies that were still remaining when the function @@ -864,7 +865,19 @@ long wait_iff_congested(struct zone *zone, int sync, long timeout) */ if (atomic_read(_bdi_congested[sync]) == 0 || !zone_is_reclaim_congested(zone)) { - cond_resched(); + + /* +* Memory allocation/reclaim might be called from a WQ +* context and the current implementation of the WQ +* concurrency control doesn't recognize that a particular +* WQ is congested if the worker thread is looping without +* ever sleeping. Therefore we have to do a short sleep +* here rather than calling cond_resched(). +*/ + if (current->flags & PF_WQ_WORKER) + schedule_timeout(1); + else + cond_resched(); /* In case we scheduled, work out time remaining */ ret = timeout - (jiffies - start); diff --git a/mm/vmstat.c b/mm/vmstat.c index 7db1b9b..e89c0f6 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1139,13 +1139,14 @@ static const struct file_operations proc_vmstat_file_operations = { #endif /* CONFIG_PROC_FS */ #ifdef CONFIG_SMP +static struct workqueue_struct *vmstat_wq; static DEFINE_PER_CPU(struct delayed_work, vmstat_work); int sysctl_stat_interval __read_mostly = HZ; static void vmstat_update(struct work_struct *w) { refresh_cpu_vm_stats(smp_processor_id()); - schedule_delayed_work(&__get_cpu_var(vmstat_work), + queue_delayed_work(vmstat_wq, &__get_cpu_var(vmstat_work), round_jiffies_relative(sysctl_stat_interval)); } @@ -1154,7 +1155,7 @@ static void __cpuinit start_cpu_timer(int cpu) struct delayed_work *work = _cpu(vmstat_work, cpu); INIT_DELAYED_WORK_DEFERRABLE(work, vmstat_update); - schedule_delayed_work_on(cpu, work, __round_jiffies_relative(HZ, cpu)); + queue_delayed_work_on(cpu, vmstat_wq, work, __round_jiffies_relative(HZ, cpu)); } /* @@ -1204,6 +1205,7 @@ static int __init setup_vmstat(void) register_cpu_notifier(_notifier); + vmstat_wq = alloc_workque
[PATCH 3.4 077/125] ses: Fix problems with simple enclosures
From: James Bottomley3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 3417c1b5cb1fdc10261dbed42b05cc93166a78fd upstream. Simple enclosure implementations (mostly USB) are allowed to return only page 8 to every diagnostic query. That really confuses our implementation because we assume the return is the page we asked for and end up doing incorrect offsets based on bogus information leading to accesses outside of allocated ranges. Fix that by checking the page code of the return and giving an error if it isn't the one we asked for. This should fix reported bugs with USB storage by simply refusing to attach to enclosures that behave like this. It's also good defensive practise now that we're starting to see more USB enclosures. Reported-by: Andrea Gelmini Reviewed-by: Ewan D. Milne Reviewed-by: Tomas Henzl Signed-off-by: James Bottomley Signed-off-by: Zefan Li --- drivers/scsi/ses.c | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c index eba183c..b3051fe 100644 --- a/drivers/scsi/ses.c +++ b/drivers/scsi/ses.c @@ -70,6 +70,7 @@ static int ses_probe(struct device *dev) static int ses_recv_diag(struct scsi_device *sdev, int page_code, void *buf, int bufflen) { + int ret; unsigned char cmd[] = { RECEIVE_DIAGNOSTIC, 1, /* Set PCV bit */ @@ -78,9 +79,26 @@ static int ses_recv_diag(struct scsi_device *sdev, int page_code, bufflen & 0xff, 0 }; + unsigned char recv_page_code; - return scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf, bufflen, + ret = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf, bufflen, NULL, SES_TIMEOUT, SES_RETRIES, NULL); + if (unlikely(!ret)) + return ret; + + recv_page_code = ((unsigned char *)buf)[0]; + + if (likely(recv_page_code == page_code)) + return ret; + + /* successful diagnostic but wrong page code. This happens to some +* USB devices, just print a message and pretend there was an error */ + + sdev_printk(KERN_ERR, sdev, + "Wrong diagnostic page; asked for %d got %u\n", + page_code, recv_page_code); + + return -EINVAL; } static int ses_send_diag(struct scsi_device *sdev, int page_code, -- 1.9.1
[PATCH 3.4 109/125] ftrace/scripts: Fix incorrect use of sprintf in recordmcount
From: Colin Ian King3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 713a3e4de707fab49d5aa4bceb77db1058572a7b upstream. Fix build warning: scripts/recordmcount.c:589:4: warning: format not a string literal and no format arguments [-Wformat-security] sprintf("%s: failed\n", file); Fixes: a50bd43935586 ("ftrace/scripts: Have recordmcount copy the object file") Link: http://lkml.kernel.org/r/1451516801-16951-1-git-send-email-colin.k...@canonical.com Cc: Li Bin Cc: Russell King Cc: Will Deacon Signed-off-by: Colin Ian King Signed-off-by: Steven Rostedt Signed-off-by: Zefan Li --- scripts/recordmcount.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c index 0970379..0d5ae4a 100644 --- a/scripts/recordmcount.c +++ b/scripts/recordmcount.c @@ -546,7 +546,7 @@ main(int argc, char *argv[]) do_file(file); break; case SJ_FAIL:/* error in do_file or below */ - sprintf("%s: failed\n", file); + fprintf(stderr, "%s: failed\n", file); ++n_error; break; case SJ_SUCCEED:/* premature success */ -- 1.9.1
[PATCH 3.4 106/125] ipv6/addrlabel: fix ip6addrlbl_get()
From: Andrey Ryabinin3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit e459dfeeb64008b2d23bdf600f03b3605dbb8152 upstream. ip6addrlbl_get() has never worked. If ip6addrlbl_hold() succeeded, ip6addrlbl_get() will exit with '-ESRCH'. If ip6addrlbl_hold() failed, ip6addrlbl_get() will use about to be free ip6addrlbl_entry pointer. Fix this by inverting ip6addrlbl_hold() check. Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.") Signed-off-by: Andrey Ryabinin Reviewed-by: Cong Wang Acked-by: YOSHIFUJI Hideaki Signed-off-by: David S. Miller Signed-off-by: Zefan Li --- net/ipv6/addrlabel.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/addrlabel.c b/net/ipv6/addrlabel.c index 2d8ddba..c8c6a12 100644 --- a/net/ipv6/addrlabel.c +++ b/net/ipv6/addrlabel.c @@ -558,7 +558,7 @@ static int ip6addrlbl_get(struct sk_buff *in_skb, struct nlmsghdr* nlh, rcu_read_lock(); p = __ipv6_addr_label(net, addr, ipv6_addr_type(addr), ifal->ifal_index); - if (p && ip6addrlbl_hold(p)) + if (p && !ip6addrlbl_hold(p)) p = NULL; lseq = ip6addrlbl_table.seq; rcu_read_unlock(); -- 1.9.1
[PATCH 3.4 110/125] net: possible use after free in dst_release
From: Francesco Ruggeri <frugg...@aristanetworks.com> 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 07a5d38453599052aff0877b16bb9c1585f08609 upstream. dst_release should not access dst->flags after decrementing __refcnt to 0. The dst_entry may be in dst_busy_list and dst_gc_task may dst_destroy it before dst_release gets a chance to access dst->flags. Fixes: d69bbf88c8d0 ("net: fix a race in dst_release()") Fixes: 27b75c95f10d ("net: avoid RCU for NOCACHE dst") Signed-off-by: Francesco Ruggeri <frugg...@arista.com> Acked-by: Eric Dumazet <eduma...@google.com> Signed-off-by: David S. Miller <da...@davemloft.net> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lize...@huawei.com> --- net/core/dst.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/core/dst.c b/net/core/dst.c index 54ba1eb..48cff89 100644 --- a/net/core/dst.c +++ b/net/core/dst.c @@ -269,10 +269,11 @@ void dst_release(struct dst_entry *dst) { if (dst) { int newrefcnt; + unsigned short nocache = dst->flags & DST_NOCACHE; newrefcnt = atomic_dec_return(>__refcnt); WARN_ON(newrefcnt < 0); - if (!newrefcnt && unlikely(dst->flags & DST_NOCACHE)) { + if (!newrefcnt && unlikely(nocache)) { dst = dst_destroy(dst); if (dst) __dst_free(dst); -- 1.9.1
[PATCH 3.4 073/125] 9p: ->evict_inode() should kick out ->i_data, not ->i_mapping
From: Al Viro <v...@zeniv.linux.org.uk> 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 4ad78628445d26e5e9487b2e8f23274ad7b0f5d3 upstream. For block devices the pagecache is associated with the inode on bdevfs, not with the aliasing ones on the mountable filesystems. The latter have its own ->i_data empty and ->i_mapping pointing to the (unique per major/minor) bdevfs inode. That guarantees cache coherence between all block device inodes with the same device number. Eviction of an alias inode has no business trying to evict the pages belonging to bdevfs one; moreover, ->i_mapping is only safe to access when the thing is opened. At the time of ->evict_inode() the victim is definitely *not* opened. We are about to kill the address space embedded into struct inode (inode->i_data) and that's what we need to empty of any pages. 9p instance tries to empty inode->i_mapping instead, which is both unsafe and bogus - if we have several device nodes with the same device number in different places, closing one of them should not try to empty the (shared) page cache. Fortunately, other instances in the tree are OK; they are evicting from >i_data instead, as 9p one should. Reported-by: "Suzuki K. Poulose" <suzuki.poul...@arm.com> Tested-by: "Suzuki K. Poulose" <suzuki.poul...@arm.com> Signed-off-by: Al Viro <v...@zeniv.linux.org.uk> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lize...@huawei.com> --- fs/9p/vfs_inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c index c9b32dc..116e43f 100644 --- a/fs/9p/vfs_inode.c +++ b/fs/9p/vfs_inode.c @@ -447,9 +447,9 @@ void v9fs_evict_inode(struct inode *inode) { struct v9fs_inode *v9inode = V9FS_I(inode); - truncate_inode_pages(inode->i_mapping, 0); + truncate_inode_pages(>i_data, 0); end_writeback(inode); - filemap_fdatawrite(inode->i_mapping); + filemap_fdatawrite(>i_data); #ifdef CONFIG_9P_FSCACHE v9fs_cache_inode_put_cookie(inode); -- 1.9.1
[PATCH 3.4 079/125] ses: fix additional element traversal bug
From: James Bottomley3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 5e1033561da1152c57b97ee84371dba2b3d64c25 upstream. KASAN found that our additional element processing scripts drop off the end of the VPD page into unallocated space. The reason is that not every element has additional information but our traversal routines think they do, leading to them expecting far more additional information than is present. Fix this by adding a gate to the traversal routine so that it only processes elements that are expected to have additional information (list is in SES-2 section 6.1.13.1: Additional Element Status diagnostic page overview) Reported-by: Pavel Tikhomirov Tested-by: Pavel Tikhomirov Signed-off-by: James Bottomley Signed-off-by: Zefan Li --- drivers/scsi/ses.c| 10 +- include/linux/enclosure.h | 4 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c index b3051fe..3643bbf 100644 --- a/drivers/scsi/ses.c +++ b/drivers/scsi/ses.c @@ -454,7 +454,15 @@ static void ses_enclosure_data_process(struct enclosure_device *edev, if (desc_ptr) desc_ptr += len; - if (addl_desc_ptr) + if (addl_desc_ptr && + /* only find additional descriptions for specific devices */ + (type_ptr[0] == ENCLOSURE_COMPONENT_DEVICE || +type_ptr[0] == ENCLOSURE_COMPONENT_ARRAY_DEVICE || +type_ptr[0] == ENCLOSURE_COMPONENT_SAS_EXPANDER || +/* these elements are optional */ +type_ptr[0] == ENCLOSURE_COMPONENT_SCSI_TARGET_PORT || +type_ptr[0] == ENCLOSURE_COMPONENT_SCSI_INITIATOR_PORT || +type_ptr[0] == ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS)) addl_desc_ptr += addl_desc_ptr[1] + 2; } diff --git a/include/linux/enclosure.h b/include/linux/enclosure.h index 9a33c5f..f6c229e 100644 --- a/include/linux/enclosure.h +++ b/include/linux/enclosure.h @@ -29,7 +29,11 @@ /* A few generic types ... taken from ses-2 */ enum enclosure_component_type { ENCLOSURE_COMPONENT_DEVICE = 0x01, + ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS = 0x07, + ENCLOSURE_COMPONENT_SCSI_TARGET_PORT = 0x14, + ENCLOSURE_COMPONENT_SCSI_INITIATOR_PORT = 0x15, ENCLOSURE_COMPONENT_ARRAY_DEVICE = 0x17, + ENCLOSURE_COMPONENT_SAS_EXPANDER = 0x18, }; /* ses-2 common element status */ -- 1.9.1
[PATCH 3.4 107/125] ocfs2: fix BUG when calculate new backup super
From: Joseph Qi <joseph...@huawei.com> 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 5c9ee4cbf2a945271f25b89b137f2c03bbc3be33 upstream. When resizing, it firstly extends the last gd. Once it should backup super in the gd, it calculates new backup super and update the corresponding value. But it currently doesn't consider the situation that the backup super is already done. And in this case, it still sets the bit in gd bitmap and then decrease from bg_free_bits_count, which leads to a corrupted gd and trigger the BUG in ocfs2_block_group_set_bits: BUG_ON(le16_to_cpu(bg->bg_free_bits_count) < num_bits); So check whether the backup super is done and then do the updates. Signed-off-by: Joseph Qi <joseph...@huawei.com> Reviewed-by: Jiufei Xue <xuejiu...@huawei.com> Reviewed-by: Yiwen Jiang <jiangyi...@huawei.com> Cc: Mark Fasheh <mfas...@suse.de> Cc: Joel Becker <jl...@evilplan.org> Signed-off-by: Andrew Morton <a...@linux-foundation.org> Signed-off-by: Linus Torvalds <torva...@linux-foundation.org> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lize...@huawei.com> --- fs/ocfs2/resize.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/fs/ocfs2/resize.c b/fs/ocfs2/resize.c index ec55add..2ca64fa 100644 --- a/fs/ocfs2/resize.c +++ b/fs/ocfs2/resize.c @@ -56,11 +56,12 @@ static u16 ocfs2_calc_new_backup_super(struct inode *inode, int new_clusters, u32 first_new_cluster, u16 cl_cpg, + u16 old_bg_clusters, int set) { int i; u16 backups = 0; - u32 cluster; + u32 cluster, lgd_cluster; u64 blkno, gd_blkno, lgd_blkno = le64_to_cpu(gd->bg_blkno); for (i = 0; i < OCFS2_MAX_BACKUP_SUPERBLOCKS; i++) { @@ -73,6 +74,12 @@ static u16 ocfs2_calc_new_backup_super(struct inode *inode, else if (gd_blkno > lgd_blkno) break; + /* check if already done backup super */ + lgd_cluster = ocfs2_blocks_to_clusters(inode->i_sb, lgd_blkno); + lgd_cluster += old_bg_clusters; + if (lgd_cluster >= cluster) + continue; + if (set) ocfs2_set_bit(cluster % cl_cpg, (unsigned long *)gd->bg_bitmap); @@ -101,6 +108,7 @@ static int ocfs2_update_last_group_and_inode(handle_t *handle, u16 chain, num_bits, backups = 0; u16 cl_bpc = le16_to_cpu(cl->cl_bpc); u16 cl_cpg = le16_to_cpu(cl->cl_cpg); + u16 old_bg_clusters; trace_ocfs2_update_last_group_and_inode(new_clusters, first_new_cluster); @@ -114,6 +122,7 @@ static int ocfs2_update_last_group_and_inode(handle_t *handle, group = (struct ocfs2_group_desc *)group_bh->b_data; + old_bg_clusters = le16_to_cpu(group->bg_bits) / cl_bpc; /* update the group first. */ num_bits = new_clusters * cl_bpc; le16_add_cpu(>bg_bits, num_bits); @@ -129,7 +138,7 @@ static int ocfs2_update_last_group_and_inode(handle_t *handle, group, new_clusters, first_new_cluster, -cl_cpg, 1); +cl_cpg, old_bg_clusters, 1); le16_add_cpu(>bg_free_bits_count, -1 * backups); } @@ -169,7 +178,7 @@ out_rollback: group, new_clusters, first_new_cluster, - cl_cpg, 0); + cl_cpg, old_bg_clusters, 0); le16_add_cpu(>bg_free_bits_count, backups); le16_add_cpu(>bg_bits, -1 * num_bits); le16_add_cpu(>bg_free_bits_count, -1 * num_bits); -- 1.9.1
[PATCH 3.4 074/125] crypto: skcipher - Copy iv from desc even for 0-len walks
From: "Jason A. Donenfeld" <ja...@zx2c4.com> 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 70d906bc17500edfa9bdd8c8b7e59618c7911613 upstream. Some ciphers actually support encrypting zero length plaintexts. For example, many AEAD modes support this. The resulting ciphertext for those winds up being only the authentication tag, which is a result of the key, the iv, the additional data, and the fact that the plaintext had zero length. The blkcipher constructors won't copy the IV to the right place, however, when using a zero length input, resulting in some significant problems when ciphers call their initialization routines, only to find that the ->iv parameter is uninitialized. One such example of this would be using chacha20poly1305 with a zero length input, which then calls chacha20, which calls the key setup routine, which eventually OOPSes due to the uninitialized ->iv member. Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com> Signed-off-by: Herbert Xu <herb...@gondor.apana.org.au> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lize...@huawei.com> --- crypto/ablkcipher.c | 2 +- crypto/blkcipher.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/crypto/ablkcipher.c b/crypto/ablkcipher.c index 4a9c499..1ef1428 100644 --- a/crypto/ablkcipher.c +++ b/crypto/ablkcipher.c @@ -280,12 +280,12 @@ static int ablkcipher_walk_first(struct ablkcipher_request *req, if (WARN_ON_ONCE(in_irq())) return -EDEADLK; + walk->iv = req->info; walk->nbytes = walk->total; if (unlikely(!walk->total)) return 0; walk->iv_buffer = NULL; - walk->iv = req->info; if (unlikely(((unsigned long)walk->iv & alignmask))) { int err = ablkcipher_copy_iv(walk, tfm, alignmask); if (err) diff --git a/crypto/blkcipher.c b/crypto/blkcipher.c index 0a1ebea..34e5d65 100644 --- a/crypto/blkcipher.c +++ b/crypto/blkcipher.c @@ -329,12 +329,12 @@ static int blkcipher_walk_first(struct blkcipher_desc *desc, if (WARN_ON_ONCE(in_irq())) return -EDEADLK; + walk->iv = desc->info; walk->nbytes = walk->total; if (unlikely(!walk->total)) return 0; walk->buffer = NULL; - walk->iv = desc->info; if (unlikely(((unsigned long)walk->iv & alignmask))) { int err = blkcipher_copy_iv(walk, tfm, alignmask); if (err) -- 1.9.1
[PATCH 3.4 110/125] net: possible use after free in dst_release
From: Francesco Ruggeri 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 07a5d38453599052aff0877b16bb9c1585f08609 upstream. dst_release should not access dst->flags after decrementing __refcnt to 0. The dst_entry may be in dst_busy_list and dst_gc_task may dst_destroy it before dst_release gets a chance to access dst->flags. Fixes: d69bbf88c8d0 ("net: fix a race in dst_release()") Fixes: 27b75c95f10d ("net: avoid RCU for NOCACHE dst") Signed-off-by: Francesco Ruggeri Acked-by: Eric Dumazet Signed-off-by: David S. Miller [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li --- net/core/dst.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/core/dst.c b/net/core/dst.c index 54ba1eb..48cff89 100644 --- a/net/core/dst.c +++ b/net/core/dst.c @@ -269,10 +269,11 @@ void dst_release(struct dst_entry *dst) { if (dst) { int newrefcnt; + unsigned short nocache = dst->flags & DST_NOCACHE; newrefcnt = atomic_dec_return(>__refcnt); WARN_ON(newrefcnt < 0); - if (!newrefcnt && unlikely(dst->flags & DST_NOCACHE)) { + if (!newrefcnt && unlikely(nocache)) { dst = dst_destroy(dst); if (dst) __dst_free(dst); -- 1.9.1
[PATCH 3.4 073/125] 9p: ->evict_inode() should kick out ->i_data, not ->i_mapping
From: Al Viro 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 4ad78628445d26e5e9487b2e8f23274ad7b0f5d3 upstream. For block devices the pagecache is associated with the inode on bdevfs, not with the aliasing ones on the mountable filesystems. The latter have its own ->i_data empty and ->i_mapping pointing to the (unique per major/minor) bdevfs inode. That guarantees cache coherence between all block device inodes with the same device number. Eviction of an alias inode has no business trying to evict the pages belonging to bdevfs one; moreover, ->i_mapping is only safe to access when the thing is opened. At the time of ->evict_inode() the victim is definitely *not* opened. We are about to kill the address space embedded into struct inode (inode->i_data) and that's what we need to empty of any pages. 9p instance tries to empty inode->i_mapping instead, which is both unsafe and bogus - if we have several device nodes with the same device number in different places, closing one of them should not try to empty the (shared) page cache. Fortunately, other instances in the tree are OK; they are evicting from >i_data instead, as 9p one should. Reported-by: "Suzuki K. Poulose" Tested-by: "Suzuki K. Poulose" Signed-off-by: Al Viro [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li --- fs/9p/vfs_inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c index c9b32dc..116e43f 100644 --- a/fs/9p/vfs_inode.c +++ b/fs/9p/vfs_inode.c @@ -447,9 +447,9 @@ void v9fs_evict_inode(struct inode *inode) { struct v9fs_inode *v9inode = V9FS_I(inode); - truncate_inode_pages(inode->i_mapping, 0); + truncate_inode_pages(>i_data, 0); end_writeback(inode); - filemap_fdatawrite(inode->i_mapping); + filemap_fdatawrite(>i_data); #ifdef CONFIG_9P_FSCACHE v9fs_cache_inode_put_cookie(inode); -- 1.9.1
[PATCH 3.4 079/125] ses: fix additional element traversal bug
From: James Bottomley 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 5e1033561da1152c57b97ee84371dba2b3d64c25 upstream. KASAN found that our additional element processing scripts drop off the end of the VPD page into unallocated space. The reason is that not every element has additional information but our traversal routines think they do, leading to them expecting far more additional information than is present. Fix this by adding a gate to the traversal routine so that it only processes elements that are expected to have additional information (list is in SES-2 section 6.1.13.1: Additional Element Status diagnostic page overview) Reported-by: Pavel Tikhomirov Tested-by: Pavel Tikhomirov Signed-off-by: James Bottomley Signed-off-by: Zefan Li --- drivers/scsi/ses.c| 10 +- include/linux/enclosure.h | 4 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c index b3051fe..3643bbf 100644 --- a/drivers/scsi/ses.c +++ b/drivers/scsi/ses.c @@ -454,7 +454,15 @@ static void ses_enclosure_data_process(struct enclosure_device *edev, if (desc_ptr) desc_ptr += len; - if (addl_desc_ptr) + if (addl_desc_ptr && + /* only find additional descriptions for specific devices */ + (type_ptr[0] == ENCLOSURE_COMPONENT_DEVICE || +type_ptr[0] == ENCLOSURE_COMPONENT_ARRAY_DEVICE || +type_ptr[0] == ENCLOSURE_COMPONENT_SAS_EXPANDER || +/* these elements are optional */ +type_ptr[0] == ENCLOSURE_COMPONENT_SCSI_TARGET_PORT || +type_ptr[0] == ENCLOSURE_COMPONENT_SCSI_INITIATOR_PORT || +type_ptr[0] == ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS)) addl_desc_ptr += addl_desc_ptr[1] + 2; } diff --git a/include/linux/enclosure.h b/include/linux/enclosure.h index 9a33c5f..f6c229e 100644 --- a/include/linux/enclosure.h +++ b/include/linux/enclosure.h @@ -29,7 +29,11 @@ /* A few generic types ... taken from ses-2 */ enum enclosure_component_type { ENCLOSURE_COMPONENT_DEVICE = 0x01, + ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS = 0x07, + ENCLOSURE_COMPONENT_SCSI_TARGET_PORT = 0x14, + ENCLOSURE_COMPONENT_SCSI_INITIATOR_PORT = 0x15, ENCLOSURE_COMPONENT_ARRAY_DEVICE = 0x17, + ENCLOSURE_COMPONENT_SAS_EXPANDER = 0x18, }; /* ses-2 common element status */ -- 1.9.1
[PATCH 3.4 107/125] ocfs2: fix BUG when calculate new backup super
From: Joseph Qi 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 5c9ee4cbf2a945271f25b89b137f2c03bbc3be33 upstream. When resizing, it firstly extends the last gd. Once it should backup super in the gd, it calculates new backup super and update the corresponding value. But it currently doesn't consider the situation that the backup super is already done. And in this case, it still sets the bit in gd bitmap and then decrease from bg_free_bits_count, which leads to a corrupted gd and trigger the BUG in ocfs2_block_group_set_bits: BUG_ON(le16_to_cpu(bg->bg_free_bits_count) < num_bits); So check whether the backup super is done and then do the updates. Signed-off-by: Joseph Qi Reviewed-by: Jiufei Xue Reviewed-by: Yiwen Jiang Cc: Mark Fasheh Cc: Joel Becker Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li --- fs/ocfs2/resize.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/fs/ocfs2/resize.c b/fs/ocfs2/resize.c index ec55add..2ca64fa 100644 --- a/fs/ocfs2/resize.c +++ b/fs/ocfs2/resize.c @@ -56,11 +56,12 @@ static u16 ocfs2_calc_new_backup_super(struct inode *inode, int new_clusters, u32 first_new_cluster, u16 cl_cpg, + u16 old_bg_clusters, int set) { int i; u16 backups = 0; - u32 cluster; + u32 cluster, lgd_cluster; u64 blkno, gd_blkno, lgd_blkno = le64_to_cpu(gd->bg_blkno); for (i = 0; i < OCFS2_MAX_BACKUP_SUPERBLOCKS; i++) { @@ -73,6 +74,12 @@ static u16 ocfs2_calc_new_backup_super(struct inode *inode, else if (gd_blkno > lgd_blkno) break; + /* check if already done backup super */ + lgd_cluster = ocfs2_blocks_to_clusters(inode->i_sb, lgd_blkno); + lgd_cluster += old_bg_clusters; + if (lgd_cluster >= cluster) + continue; + if (set) ocfs2_set_bit(cluster % cl_cpg, (unsigned long *)gd->bg_bitmap); @@ -101,6 +108,7 @@ static int ocfs2_update_last_group_and_inode(handle_t *handle, u16 chain, num_bits, backups = 0; u16 cl_bpc = le16_to_cpu(cl->cl_bpc); u16 cl_cpg = le16_to_cpu(cl->cl_cpg); + u16 old_bg_clusters; trace_ocfs2_update_last_group_and_inode(new_clusters, first_new_cluster); @@ -114,6 +122,7 @@ static int ocfs2_update_last_group_and_inode(handle_t *handle, group = (struct ocfs2_group_desc *)group_bh->b_data; + old_bg_clusters = le16_to_cpu(group->bg_bits) / cl_bpc; /* update the group first. */ num_bits = new_clusters * cl_bpc; le16_add_cpu(>bg_bits, num_bits); @@ -129,7 +138,7 @@ static int ocfs2_update_last_group_and_inode(handle_t *handle, group, new_clusters, first_new_cluster, -cl_cpg, 1); +cl_cpg, old_bg_clusters, 1); le16_add_cpu(>bg_free_bits_count, -1 * backups); } @@ -169,7 +178,7 @@ out_rollback: group, new_clusters, first_new_cluster, - cl_cpg, 0); + cl_cpg, old_bg_clusters, 0); le16_add_cpu(>bg_free_bits_count, backups); le16_add_cpu(>bg_bits, -1 * num_bits); le16_add_cpu(>bg_free_bits_count, -1 * num_bits); -- 1.9.1
[PATCH 3.4 077/125] ses: Fix problems with simple enclosures
From: James Bottomley 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 3417c1b5cb1fdc10261dbed42b05cc93166a78fd upstream. Simple enclosure implementations (mostly USB) are allowed to return only page 8 to every diagnostic query. That really confuses our implementation because we assume the return is the page we asked for and end up doing incorrect offsets based on bogus information leading to accesses outside of allocated ranges. Fix that by checking the page code of the return and giving an error if it isn't the one we asked for. This should fix reported bugs with USB storage by simply refusing to attach to enclosures that behave like this. It's also good defensive practise now that we're starting to see more USB enclosures. Reported-by: Andrea Gelmini Reviewed-by: Ewan D. Milne Reviewed-by: Tomas Henzl Signed-off-by: James Bottomley Signed-off-by: Zefan Li --- drivers/scsi/ses.c | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c index eba183c..b3051fe 100644 --- a/drivers/scsi/ses.c +++ b/drivers/scsi/ses.c @@ -70,6 +70,7 @@ static int ses_probe(struct device *dev) static int ses_recv_diag(struct scsi_device *sdev, int page_code, void *buf, int bufflen) { + int ret; unsigned char cmd[] = { RECEIVE_DIAGNOSTIC, 1, /* Set PCV bit */ @@ -78,9 +79,26 @@ static int ses_recv_diag(struct scsi_device *sdev, int page_code, bufflen & 0xff, 0 }; + unsigned char recv_page_code; - return scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf, bufflen, + ret = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf, bufflen, NULL, SES_TIMEOUT, SES_RETRIES, NULL); + if (unlikely(!ret)) + return ret; + + recv_page_code = ((unsigned char *)buf)[0]; + + if (likely(recv_page_code == page_code)) + return ret; + + /* successful diagnostic but wrong page code. This happens to some +* USB devices, just print a message and pretend there was an error */ + + sdev_printk(KERN_ERR, sdev, + "Wrong diagnostic page; asked for %d got %u\n", + page_code, recv_page_code); + + return -EINVAL; } static int ses_send_diag(struct scsi_device *sdev, int page_code, -- 1.9.1
[PATCH 3.4 109/125] ftrace/scripts: Fix incorrect use of sprintf in recordmcount
From: Colin Ian King 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 713a3e4de707fab49d5aa4bceb77db1058572a7b upstream. Fix build warning: scripts/recordmcount.c:589:4: warning: format not a string literal and no format arguments [-Wformat-security] sprintf("%s: failed\n", file); Fixes: a50bd43935586 ("ftrace/scripts: Have recordmcount copy the object file") Link: http://lkml.kernel.org/r/1451516801-16951-1-git-send-email-colin.k...@canonical.com Cc: Li Bin Cc: Russell King Cc: Will Deacon Signed-off-by: Colin Ian King Signed-off-by: Steven Rostedt Signed-off-by: Zefan Li --- scripts/recordmcount.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c index 0970379..0d5ae4a 100644 --- a/scripts/recordmcount.c +++ b/scripts/recordmcount.c @@ -546,7 +546,7 @@ main(int argc, char *argv[]) do_file(file); break; case SJ_FAIL:/* error in do_file or below */ - sprintf("%s: failed\n", file); + fprintf(stderr, "%s: failed\n", file); ++n_error; break; case SJ_SUCCEED:/* premature success */ -- 1.9.1
[PATCH 3.4 106/125] ipv6/addrlabel: fix ip6addrlbl_get()
From: Andrey Ryabinin 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit e459dfeeb64008b2d23bdf600f03b3605dbb8152 upstream. ip6addrlbl_get() has never worked. If ip6addrlbl_hold() succeeded, ip6addrlbl_get() will exit with '-ESRCH'. If ip6addrlbl_hold() failed, ip6addrlbl_get() will use about to be free ip6addrlbl_entry pointer. Fix this by inverting ip6addrlbl_hold() check. Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.") Signed-off-by: Andrey Ryabinin Reviewed-by: Cong Wang Acked-by: YOSHIFUJI Hideaki Signed-off-by: David S. Miller Signed-off-by: Zefan Li --- net/ipv6/addrlabel.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/addrlabel.c b/net/ipv6/addrlabel.c index 2d8ddba..c8c6a12 100644 --- a/net/ipv6/addrlabel.c +++ b/net/ipv6/addrlabel.c @@ -558,7 +558,7 @@ static int ip6addrlbl_get(struct sk_buff *in_skb, struct nlmsghdr* nlh, rcu_read_lock(); p = __ipv6_addr_label(net, addr, ipv6_addr_type(addr), ifal->ifal_index); - if (p && ip6addrlbl_hold(p)) + if (p && !ip6addrlbl_hold(p)) p = NULL; lseq = ip6addrlbl_table.seq; rcu_read_unlock(); -- 1.9.1
[PATCH 3.4 074/125] crypto: skcipher - Copy iv from desc even for 0-len walks
From: "Jason A. Donenfeld" 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 70d906bc17500edfa9bdd8c8b7e59618c7911613 upstream. Some ciphers actually support encrypting zero length plaintexts. For example, many AEAD modes support this. The resulting ciphertext for those winds up being only the authentication tag, which is a result of the key, the iv, the additional data, and the fact that the plaintext had zero length. The blkcipher constructors won't copy the IV to the right place, however, when using a zero length input, resulting in some significant problems when ciphers call their initialization routines, only to find that the ->iv parameter is uninitialized. One such example of this would be using chacha20poly1305 with a zero length input, which then calls chacha20, which calls the key setup routine, which eventually OOPSes due to the uninitialized ->iv member. Signed-off-by: Jason A. Donenfeld Signed-off-by: Herbert Xu [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li --- crypto/ablkcipher.c | 2 +- crypto/blkcipher.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/crypto/ablkcipher.c b/crypto/ablkcipher.c index 4a9c499..1ef1428 100644 --- a/crypto/ablkcipher.c +++ b/crypto/ablkcipher.c @@ -280,12 +280,12 @@ static int ablkcipher_walk_first(struct ablkcipher_request *req, if (WARN_ON_ONCE(in_irq())) return -EDEADLK; + walk->iv = req->info; walk->nbytes = walk->total; if (unlikely(!walk->total)) return 0; walk->iv_buffer = NULL; - walk->iv = req->info; if (unlikely(((unsigned long)walk->iv & alignmask))) { int err = ablkcipher_copy_iv(walk, tfm, alignmask); if (err) diff --git a/crypto/blkcipher.c b/crypto/blkcipher.c index 0a1ebea..34e5d65 100644 --- a/crypto/blkcipher.c +++ b/crypto/blkcipher.c @@ -329,12 +329,12 @@ static int blkcipher_walk_first(struct blkcipher_desc *desc, if (WARN_ON_ONCE(in_irq())) return -EDEADLK; + walk->iv = desc->info; walk->nbytes = walk->total; if (unlikely(!walk->total)) return 0; walk->buffer = NULL; - walk->iv = desc->info; if (unlikely(((unsigned long)walk->iv & alignmask))) { int err = blkcipher_copy_iv(walk, tfm, alignmask); if (err) -- 1.9.1
[PATCH 3.4 104/125] KEYS: Fix race between read and revoke
From: David Howells3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit b4a1b4f5047e4f54e194681125c74c0aa64d637d upstream. This fixes CVE-2015-7550. There's a race between keyctl_read() and keyctl_revoke(). If the revoke happens between keyctl_read() checking the validity of a key and the key's semaphore being taken, then the key type read method will see a revoked key. This causes a problem for the user-defined key type because it assumes in its read method that there will always be a payload in a non-revoked key and doesn't check for a NULL pointer. Fix this by making keyctl_read() check the validity of a key after taking semaphore instead of before. I think the bug was introduced with the original keyrings code. This was discovered by a multithreaded test program generated by syzkaller (http://github.com/google/syzkaller). Here's a cleaned up version: #include #include #include void *thr0(void *arg) { key_serial_t key = (unsigned long)arg; keyctl_revoke(key); return 0; } void *thr1(void *arg) { key_serial_t key = (unsigned long)arg; char buffer[16]; keyctl_read(key, buffer, 16); return 0; } int main() { key_serial_t key = add_key("user", "%", "foo", 3, KEY_SPEC_USER_KEYRING); pthread_t th[5]; pthread_create([0], 0, thr0, (void *)(unsigned long)key); pthread_create([1], 0, thr1, (void *)(unsigned long)key); pthread_create([2], 0, thr0, (void *)(unsigned long)key); pthread_create([3], 0, thr1, (void *)(unsigned long)key); pthread_join(th[0], 0); pthread_join(th[1], 0); pthread_join(th[2], 0); pthread_join(th[3], 0); return 0; } Build as: cc -o keyctl-race keyctl-race.c -lkeyutils -lpthread Run as: while keyctl-race; do :; done as it may need several iterations to crash the kernel. The crash can be summarised as: BUG: unable to handle kernel NULL pointer dereference at 0010 IP: [] user_read+0x56/0xa3 ... Call Trace: [] keyctl_read_key+0xb6/0xd7 [] SyS_keyctl+0x83/0xe0 [] entry_SYSCALL_64_fastpath+0x12/0x6f Reported-by: Dmitry Vyukov Signed-off-by: David Howells Tested-by: Dmitry Vyukov Signed-off-by: James Morris Signed-off-by: Zefan Li --- security/keys/keyctl.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c index dfc8c22..0ba68b1 100644 --- a/security/keys/keyctl.c +++ b/security/keys/keyctl.c @@ -702,16 +702,16 @@ long keyctl_read_key(key_serial_t keyid, char __user *buffer, size_t buflen) /* the key is probably readable - now try to read it */ can_read_key: - ret = key_validate(key); - if (ret == 0) { - ret = -EOPNOTSUPP; - if (key->type->read) { - /* read the data with the semaphore held (since we -* might sleep) */ - down_read(>sem); + ret = -EOPNOTSUPP; + if (key->type->read) { + /* Read the data with the semaphore held (since we might sleep) +* to protect against the key being updated or revoked. +*/ + down_read(>sem); + ret = key_validate(key); + if (ret == 0) ret = key->type->read(key, buffer, buflen); - up_read(>sem); - } + up_read(>sem); } error2: -- 1.9.1
[PATCH 3.4 104/125] KEYS: Fix race between read and revoke
From: David Howells 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit b4a1b4f5047e4f54e194681125c74c0aa64d637d upstream. This fixes CVE-2015-7550. There's a race between keyctl_read() and keyctl_revoke(). If the revoke happens between keyctl_read() checking the validity of a key and the key's semaphore being taken, then the key type read method will see a revoked key. This causes a problem for the user-defined key type because it assumes in its read method that there will always be a payload in a non-revoked key and doesn't check for a NULL pointer. Fix this by making keyctl_read() check the validity of a key after taking semaphore instead of before. I think the bug was introduced with the original keyrings code. This was discovered by a multithreaded test program generated by syzkaller (http://github.com/google/syzkaller). Here's a cleaned up version: #include #include #include void *thr0(void *arg) { key_serial_t key = (unsigned long)arg; keyctl_revoke(key); return 0; } void *thr1(void *arg) { key_serial_t key = (unsigned long)arg; char buffer[16]; keyctl_read(key, buffer, 16); return 0; } int main() { key_serial_t key = add_key("user", "%", "foo", 3, KEY_SPEC_USER_KEYRING); pthread_t th[5]; pthread_create([0], 0, thr0, (void *)(unsigned long)key); pthread_create([1], 0, thr1, (void *)(unsigned long)key); pthread_create([2], 0, thr0, (void *)(unsigned long)key); pthread_create([3], 0, thr1, (void *)(unsigned long)key); pthread_join(th[0], 0); pthread_join(th[1], 0); pthread_join(th[2], 0); pthread_join(th[3], 0); return 0; } Build as: cc -o keyctl-race keyctl-race.c -lkeyutils -lpthread Run as: while keyctl-race; do :; done as it may need several iterations to crash the kernel. The crash can be summarised as: BUG: unable to handle kernel NULL pointer dereference at 0010 IP: [] user_read+0x56/0xa3 ... Call Trace: [] keyctl_read_key+0xb6/0xd7 [] SyS_keyctl+0x83/0xe0 [] entry_SYSCALL_64_fastpath+0x12/0x6f Reported-by: Dmitry Vyukov Signed-off-by: David Howells Tested-by: Dmitry Vyukov Signed-off-by: James Morris Signed-off-by: Zefan Li --- security/keys/keyctl.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c index dfc8c22..0ba68b1 100644 --- a/security/keys/keyctl.c +++ b/security/keys/keyctl.c @@ -702,16 +702,16 @@ long keyctl_read_key(key_serial_t keyid, char __user *buffer, size_t buflen) /* the key is probably readable - now try to read it */ can_read_key: - ret = key_validate(key); - if (ret == 0) { - ret = -EOPNOTSUPP; - if (key->type->read) { - /* read the data with the semaphore held (since we -* might sleep) */ - down_read(>sem); + ret = -EOPNOTSUPP; + if (key->type->read) { + /* Read the data with the semaphore held (since we might sleep) +* to protect against the key being updated or revoked. +*/ + down_read(>sem); + ret = key_validate(key); + if (ret == 0) ret = key->type->read(key, buffer, buflen); - up_read(>sem); - } + up_read(>sem); } error2: -- 1.9.1
[PATCH 3.4 080/125] parisc iommu: fix panic due to trying to allocate too large region
From: Mikulas Patocka3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit e46e31a3696ae2d66f32c207df3969613726e636 upstream. When using the Promise TX2+ SATA controller on PA-RISC, the system often crashes with kernel panic, for example just writing data with the dd utility will make it crash. Kernel panic - not syncing: drivers/parisc/sba_iommu.c: I/O MMU @ a000 is out of mapping resources CPU: 0 PID: 18442 Comm: mkspadfs Not tainted 4.4.0-rc2 #2 Backtrace: [<4021497c>] show_stack+0x14/0x20 [<40410bf0>] dump_stack+0x88/0x100 [<4023978c>] panic+0x124/0x360 [<40452c18>] sba_alloc_range+0x698/0x6a0 [<40453150>] sba_map_sg+0x260/0x5b8 [<0c18dbb4>] ata_qc_issue+0x264/0x4a8 [libata] [<0c19535c>] ata_scsi_translate+0xe4/0x220 [libata] [<0c19a93c>] ata_scsi_queuecmd+0xbc/0x320 [libata] [<40499bbc>] scsi_dispatch_cmd+0xfc/0x130 [<4049da34>] scsi_request_fn+0x6e4/0x970 [<403e95a8>] __blk_run_queue+0x40/0x60 [<403e9d8c>] blk_run_queue+0x3c/0x68 [<4049a534>] scsi_run_queue+0x2a4/0x360 [<4049be68>] scsi_end_request+0x1a8/0x238 [<4049de84>] scsi_io_completion+0xfc/0x688 [<40493c74>] scsi_finish_command+0x17c/0x1d0 The cause of the crash is not exhaustion of the IOMMU space, there is plenty of free pages. The function sba_alloc_range is called with size 0x11000, thus the pages_needed variable is 0x11. The function sba_search_bitmap is called with bits_wanted 0x11 and boundary size is 0x10 (because dma_get_seg_boundary(dev) returns 0x). The function sba_search_bitmap attempts to allocate 17 pages that must not cross 16-page boundary - it can't satisfy this requirement (iommu_is_span_boundary always returns true) and fails even if there are many free entries in the IOMMU space. How did it happen that we try to allocate 17 pages that don't cross 16-page boundary? The cause is in the function iommu_coalesce_chunks. This function tries to coalesce adjacent entries in the scatterlist. The function does several checks if it may coalesce one entry with the next, one of those checks is this: if (startsg->length + dma_len > max_seg_size) break; When it finishes coalescing adjacent entries, it allocates the mapping: sg_dma_len(contig_sg) = dma_len; dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE); sg_dma_address(contig_sg) = PIDE_FLAG | (iommu_alloc_range(ioc, dev, dma_len) << IOVP_SHIFT) | dma_offset; It is possible that (startsg->length + dma_len > max_seg_size) is false (we are just near the 0x1 max_seg_size boundary), so the funcion decides to coalesce this entry with the next entry. When the coalescing succeeds, the function performs dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE); And now, because of non-zero dma_offset, dma_len is greater than 0x1. iommu_alloc_range (a pointer to sba_alloc_range) is called and it attempts to allocate 17 pages for a device that must not cross 16-page boundary. To fix the bug, we must make sure that dma_len after addition of dma_offset and alignment doesn't cross the segment boundary. I.e. change if (startsg->length + dma_len > max_seg_size) break; to if (ALIGN(dma_len + dma_offset + startsg->length, IOVP_SIZE) > max_seg_size) break; This patch makes this change (it precalculates max_seg_boundary at the beginning of the function iommu_coalesce_chunks). I also added a check that the mapping length doesn't exceed dma_get_seg_boundary(dev) (it is not needed for Promise TX2+ SATA, but it may be needed for other devices that have dma_get_seg_boundary lower than dma_get_max_seg_size). Signed-off-by: Mikulas Patocka Signed-off-by: Helge Deller Signed-off-by: Zefan Li --- drivers/parisc/iommu-helpers.h | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/parisc/iommu-helpers.h b/drivers/parisc/iommu-helpers.h index 8c33491..c6aa388 100644 --- a/drivers/parisc/iommu-helpers.h +++ b/drivers/parisc/iommu-helpers.h @@ -104,7 +104,11 @@ iommu_coalesce_chunks(struct ioc *ioc, struct device *dev, struct scatterlist *contig_sg; /* contig chunk head */ unsigned long dma_offset, dma_len; /* start/len of DMA stream */ unsigned int n_mappings = 0; - unsigned int max_seg_size = dma_get_max_seg_size(dev); + unsigned int max_seg_size = min(dma_get_max_seg_size(dev), + (unsigned)DMA_CHUNK_SIZE); + unsigned int max_seg_boundary = dma_get_seg_boundary(dev) + 1; + if (max_seg_boundary) /* check if the addition above didn't overflow */ + max_seg_size = min(max_seg_size, max_seg_boundary); while (nents > 0) { @@ -139,14 +143,11 @@
[PATCH 3.4 080/125] parisc iommu: fix panic due to trying to allocate too large region
From: Mikulas Patocka 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit e46e31a3696ae2d66f32c207df3969613726e636 upstream. When using the Promise TX2+ SATA controller on PA-RISC, the system often crashes with kernel panic, for example just writing data with the dd utility will make it crash. Kernel panic - not syncing: drivers/parisc/sba_iommu.c: I/O MMU @ a000 is out of mapping resources CPU: 0 PID: 18442 Comm: mkspadfs Not tainted 4.4.0-rc2 #2 Backtrace: [<4021497c>] show_stack+0x14/0x20 [<40410bf0>] dump_stack+0x88/0x100 [<4023978c>] panic+0x124/0x360 [<40452c18>] sba_alloc_range+0x698/0x6a0 [<40453150>] sba_map_sg+0x260/0x5b8 [<0c18dbb4>] ata_qc_issue+0x264/0x4a8 [libata] [<0c19535c>] ata_scsi_translate+0xe4/0x220 [libata] [<0c19a93c>] ata_scsi_queuecmd+0xbc/0x320 [libata] [<40499bbc>] scsi_dispatch_cmd+0xfc/0x130 [<4049da34>] scsi_request_fn+0x6e4/0x970 [<403e95a8>] __blk_run_queue+0x40/0x60 [<403e9d8c>] blk_run_queue+0x3c/0x68 [<4049a534>] scsi_run_queue+0x2a4/0x360 [<4049be68>] scsi_end_request+0x1a8/0x238 [<4049de84>] scsi_io_completion+0xfc/0x688 [<40493c74>] scsi_finish_command+0x17c/0x1d0 The cause of the crash is not exhaustion of the IOMMU space, there is plenty of free pages. The function sba_alloc_range is called with size 0x11000, thus the pages_needed variable is 0x11. The function sba_search_bitmap is called with bits_wanted 0x11 and boundary size is 0x10 (because dma_get_seg_boundary(dev) returns 0x). The function sba_search_bitmap attempts to allocate 17 pages that must not cross 16-page boundary - it can't satisfy this requirement (iommu_is_span_boundary always returns true) and fails even if there are many free entries in the IOMMU space. How did it happen that we try to allocate 17 pages that don't cross 16-page boundary? The cause is in the function iommu_coalesce_chunks. This function tries to coalesce adjacent entries in the scatterlist. The function does several checks if it may coalesce one entry with the next, one of those checks is this: if (startsg->length + dma_len > max_seg_size) break; When it finishes coalescing adjacent entries, it allocates the mapping: sg_dma_len(contig_sg) = dma_len; dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE); sg_dma_address(contig_sg) = PIDE_FLAG | (iommu_alloc_range(ioc, dev, dma_len) << IOVP_SHIFT) | dma_offset; It is possible that (startsg->length + dma_len > max_seg_size) is false (we are just near the 0x1 max_seg_size boundary), so the funcion decides to coalesce this entry with the next entry. When the coalescing succeeds, the function performs dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE); And now, because of non-zero dma_offset, dma_len is greater than 0x1. iommu_alloc_range (a pointer to sba_alloc_range) is called and it attempts to allocate 17 pages for a device that must not cross 16-page boundary. To fix the bug, we must make sure that dma_len after addition of dma_offset and alignment doesn't cross the segment boundary. I.e. change if (startsg->length + dma_len > max_seg_size) break; to if (ALIGN(dma_len + dma_offset + startsg->length, IOVP_SIZE) > max_seg_size) break; This patch makes this change (it precalculates max_seg_boundary at the beginning of the function iommu_coalesce_chunks). I also added a check that the mapping length doesn't exceed dma_get_seg_boundary(dev) (it is not needed for Promise TX2+ SATA, but it may be needed for other devices that have dma_get_seg_boundary lower than dma_get_max_seg_size). Signed-off-by: Mikulas Patocka Signed-off-by: Helge Deller Signed-off-by: Zefan Li --- drivers/parisc/iommu-helpers.h | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/parisc/iommu-helpers.h b/drivers/parisc/iommu-helpers.h index 8c33491..c6aa388 100644 --- a/drivers/parisc/iommu-helpers.h +++ b/drivers/parisc/iommu-helpers.h @@ -104,7 +104,11 @@ iommu_coalesce_chunks(struct ioc *ioc, struct device *dev, struct scatterlist *contig_sg; /* contig chunk head */ unsigned long dma_offset, dma_len; /* start/len of DMA stream */ unsigned int n_mappings = 0; - unsigned int max_seg_size = dma_get_max_seg_size(dev); + unsigned int max_seg_size = min(dma_get_max_seg_size(dev), + (unsigned)DMA_CHUNK_SIZE); + unsigned int max_seg_boundary = dma_get_seg_boundary(dev) + 1; + if (max_seg_boundary) /* check if the addition above didn't overflow */ + max_seg_size = min(max_seg_size, max_seg_boundary); while (nents > 0) { @@ -139,14 +143,11 @@ iommu_coalesce_chunks(struct ioc *ioc, struct device *dev, /*
[PATCH 3.4 078/125] vgaarb: fix signal handling in vga_get()
From: "Kirill A. Shutemov"3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 9f5bd30818c42c6c36a51f93b4df75a2ea2bd85e upstream. There are few defects in vga_get() related to signal hadning: - we shouldn't check for pending signals for TASK_UNINTERRUPTIBLE case; - if we found pending signal we must remove ourself from wait queue and change task state back to running; - -ERESTARTSYS is more appropriate, I guess. Signed-off-by: Kirill A. Shutemov Reviewed-by: David Herrmann Signed-off-by: Dave Airlie Signed-off-by: Zefan Li --- drivers/gpu/vga/vgaarb.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/gpu/vga/vgaarb.c index 111d956..6a46d6e 100644 --- a/drivers/gpu/vga/vgaarb.c +++ b/drivers/gpu/vga/vgaarb.c @@ -381,8 +381,10 @@ int vga_get(struct pci_dev *pdev, unsigned int rsrc, int interruptible) set_current_state(interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE); - if (signal_pending(current)) { - rc = -EINTR; + if (interruptible && signal_pending(current)) { + __set_current_state(TASK_RUNNING); + remove_wait_queue(_wait_queue, ); + rc = -ERESTARTSYS; break; } schedule(); -- 1.9.1
[PATCH 3.4 083/125] tty: Fix GPF in flush_to_ldisc()
From: Peter Hurley <pe...@hurleysoftware.com> 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 9ce119f318ba1a07c29149301f1544b6c4bea52a upstream. A line discipline which does not define a receive_buf() method can can cause a GPF if data is ever received [1]. Oddly, this was known to the author of n_tracesink in 2011, but never fixed. [1] GPF report BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) PGD 3752d067 PUD 37a7b067 PMD 0 Oops: 0010 [#1] SMP KASAN Modules linked in: CPU: 2 PID: 148 Comm: kworker/u10:2 Not tainted 4.4.0-rc2+ #51 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: events_unbound flush_to_ldisc task: 88006da94440 ti: 88006db6 task.ti: 88006db6 RIP: 0010:[<>] [< (null)>] (null) RSP: 0018:88006db67b50 EFLAGS: 00010246 RAX: 0102 RBX: 88003ab32f88 RCX: 0102 RDX: RSI: 88003ab330a6 RDI: 88003aabd388 RBP: 88006db67c48 R08: 88003ab32f9c R09: 88003ab31fb0 R10: 88003ab32fa8 R11: R12: dc00 R13: 88006db67c20 R14: 863df820 R15: 88003ab31fb8 FS: () GS:88006dc0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: CR3: 37938000 CR4: 06e0 Stack: 829f46f1 88006da94bf8 88006da94bf8 88003ab31fb0 88003aabd438 88003ab31ff8 88006430fd90 88003ab32f9c ed0007557a87 11000db6cf78 88003ab32078 Call Trace: [] process_one_work+0x8f1/0x17a0 kernel/workqueue.c:2030 [] worker_thread+0xd4/0x1180 kernel/workqueue.c:2162 [] kthread+0x1cf/0x270 drivers/block/aoe/aoecmd.c:1302 [] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:468 Code: Bad RIP value. RIP [< (null)>] (null) RSP CR2: ---[ end trace a587f8947e54d6ea ]--- Reported-by: Dmitry Vyukov <dvyu...@google.com> Signed-off-by: Peter Hurley <pe...@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org> [lizf: Backportd to 3.4: adjust context] Signed-off-by: Zefan Li <lize...@huawei.com> --- drivers/tty/tty_buffer.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c index 4f02f9c..3f59d6c 100644 --- a/drivers/tty/tty_buffer.c +++ b/drivers/tty/tty_buffer.c @@ -443,7 +443,8 @@ static void flush_to_ldisc(struct work_struct *work) flag_buf = head->flag_buf_ptr + head->read; head->read += count; spin_unlock_irqrestore(>buf.lock, flags); - disc->ops->receive_buf(tty, char_buf, + if (disc->ops->receive_buf) + disc->ops->receive_buf(tty, char_buf, flag_buf, count); spin_lock_irqsave(>buf.lock, flags); } -- 1.9.1
[PATCH 3.4 103/125] USB: fix invalid memory access in hub_activate()
From: Alan Stern <st...@rowland.harvard.edu> 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit e50293ef9775c5f1cf3fcc093037dd6a8c5684ea upstream. Commit 8520f38099cc ("USB: change hub initialization sleeps to delayed_work") changed the hub_activate() routine to make part of it run in a workqueue. However, the commit failed to take a reference to the usb_hub structure or to lock the hub interface while doing so. As a result, if a hub is plugged in and quickly unplugged before the work routine can run, the routine will try to access memory that has been deallocated. Or, if the hub is unplugged while the routine is running, the memory may be deallocated while it is in active use. This patch fixes the problem by taking a reference to the usb_hub at the start of hub_activate() and releasing it at the end (when the work is finished), and by locking the hub interface while the work routine is running. It also adds a check at the start of the routine to see if the hub has already been disconnected, in which nothing should be done. Signed-off-by: Alan Stern <st...@rowland.harvard.edu> Reported-by: Alexandru Cornea <alexandru.cor...@intel.com> Tested-by: Alexandru Cornea <alexandru.cor...@intel.com> Fixes: 8520f38099cc ("USB: change hub initialization sleeps to delayed_work") Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org> [lizf: Backported to 3.4: add forward declaration of hub_release()] Signed-off-by: Zefan Li <lize...@huawei.com> --- drivers/usb/core/hub.c | 23 --- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 62ea924..e0ad5dc 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -156,6 +156,7 @@ EXPORT_SYMBOL_GPL(ehci_cf_port_reset_rwsem); #define HUB_DEBOUNCE_STABLE 100 +static void hub_release(struct kref *kref); static int usb_reset_and_verify_device(struct usb_device *udev); static inline char *portspeed(struct usb_hub *hub, int portstatus) @@ -797,10 +798,20 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) unsigned delay; /* Continue a partial initialization */ - if (type == HUB_INIT2) - goto init2; - if (type == HUB_INIT3) + if (type == HUB_INIT2 || type == HUB_INIT3) { + device_lock(hub->intfdev); + + /* Was the hub disconnected while we were waiting? */ + if (hub->disconnected) { + device_unlock(hub->intfdev); + kref_put(>kref, hub_release); + return; + } + if (type == HUB_INIT2) + goto init2; goto init3; + } + kref_get(>kref); /* The superspeed hub except for root hub has to use Hub Depth * value as an offset into the route string to locate the bits @@ -990,6 +1001,7 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) PREPARE_DELAYED_WORK(>init_work, hub_init_func3); schedule_delayed_work(>init_work, msecs_to_jiffies(delay)); + device_unlock(hub->intfdev); return; /* Continues at init3: below */ } else { msleep(delay); @@ -1010,6 +1022,11 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) /* Allow autosuspend if it was suppressed */ if (type <= HUB_INIT3) usb_autopm_put_interface_async(to_usb_interface(hub->intfdev)); + + if (type == HUB_INIT2 || type == HUB_INIT3) + device_unlock(hub->intfdev); + + kref_put(>kref, hub_release); } /* Implement the continuations for the delays above */ -- 1.9.1
[PATCH 3.4 075/125] rfkill: copy the name into the rfkill struct
From: Johannes Berg3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit b7bb110008607a915298bf0f47d25886ecb94477 upstream. Some users of rfkill, like NFC and cfg80211, use a dynamic name when allocating rfkill, in those cases dev_name(). Therefore, the pointer passed to rfkill_alloc() might not be valid forever, I specifically found the case that the rfkill name was quite obviously an invalid pointer (or at least garbage) when the wiphy had been renamed. Fix this by making a copy of the rfkill name in rfkill_alloc(). Signed-off-by: Johannes Berg Signed-off-by: Zefan Li --- net/rfkill/core.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/rfkill/core.c b/net/rfkill/core.c index f974961..feef1a45 100644 --- a/net/rfkill/core.c +++ b/net/rfkill/core.c @@ -51,7 +51,6 @@ struct rfkill { spinlock_t lock; - const char *name; enum rfkill_typetype; unsigned long state; @@ -75,6 +74,7 @@ struct rfkill { struct delayed_work poll_work; struct work_struct uevent_work; struct work_struct sync_work; + charname[]; }; #define to_rfkill(d) container_of(d, struct rfkill, dev) @@ -849,14 +849,14 @@ struct rfkill * __must_check rfkill_alloc(const char *name, if (WARN_ON(type == RFKILL_TYPE_ALL || type >= NUM_RFKILL_TYPES)) return NULL; - rfkill = kzalloc(sizeof(*rfkill), GFP_KERNEL); + rfkill = kzalloc(sizeof(*rfkill) + strlen(name) + 1, GFP_KERNEL); if (!rfkill) return NULL; spin_lock_init(>lock); INIT_LIST_HEAD(>node); rfkill->type = type; - rfkill->name = name; + strcpy(rfkill->name, name); rfkill->ops = ops; rfkill->data = ops_data; -- 1.9.1
[PATCH 3.4 108/125] mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone()
From: Andrew Banman3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 5f0f2887f4de9508dcf438deab28f1de8070c271 upstream. test_pages_in_a_zone() does not account for the possibility of missing sections in the given pfn range. pfn_valid_within always returns 1 when CONFIG_HOLES_IN_ZONE is not set, allowing invalid pfns from missing sections to pass the test, leading to a kernel oops. Wrap an additional pfn loop with PAGES_PER_SECTION granularity to check for missing sections before proceeding into the zone-check code. This also prevents a crash from offlining memory devices with missing sections. Despite this, it may be a good idea to keep the related patch '[PATCH 3/3] drivers: memory: prohibit offlining of memory blocks with missing sections' because missing sections in a memory block may lead to other problems not covered by the scope of this fix. Signed-off-by: Andrew Banman Acked-by: Alex Thorlton Cc: Russ Anderson Cc: Alex Thorlton Cc: Yinghai Lu Cc: Greg KH Cc: Seth Jennings Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Zefan Li --- mm/memory_hotplug.c | 31 +++ 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 09d87b7..223232a 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -716,23 +716,30 @@ int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages) */ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn) { - unsigned long pfn; + unsigned long pfn, sec_end_pfn; struct zone *zone = NULL; struct page *page; int i; - for (pfn = start_pfn; + for (pfn = start_pfn, sec_end_pfn = SECTION_ALIGN_UP(start_pfn); pfn < end_pfn; -pfn += MAX_ORDER_NR_PAGES) { - i = 0; - /* This is just a CONFIG_HOLES_IN_ZONE check.*/ - while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i)) - i++; - if (i == MAX_ORDER_NR_PAGES) +pfn = sec_end_pfn + 1, sec_end_pfn += PAGES_PER_SECTION) { + /* Make sure the memory section is present first */ + if (!present_section_nr(pfn_to_section_nr(pfn))) continue; - page = pfn_to_page(pfn + i); - if (zone && page_zone(page) != zone) - return 0; - zone = page_zone(page); + for (; pfn < sec_end_pfn && pfn < end_pfn; +pfn += MAX_ORDER_NR_PAGES) { + i = 0; + /* This is just a CONFIG_HOLES_IN_ZONE check.*/ + while ((i < MAX_ORDER_NR_PAGES) && + !pfn_valid_within(pfn + i)) + i++; + if (i == MAX_ORDER_NR_PAGES) + continue; + page = pfn_to_page(pfn + i); + if (zone && page_zone(page) != zone) + return 0; + zone = page_zone(page); + } } return 1; } -- 1.9.1
[PATCH 3.4 078/125] vgaarb: fix signal handling in vga_get()
From: "Kirill A. Shutemov" 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 9f5bd30818c42c6c36a51f93b4df75a2ea2bd85e upstream. There are few defects in vga_get() related to signal hadning: - we shouldn't check for pending signals for TASK_UNINTERRUPTIBLE case; - if we found pending signal we must remove ourself from wait queue and change task state back to running; - -ERESTARTSYS is more appropriate, I guess. Signed-off-by: Kirill A. Shutemov Reviewed-by: David Herrmann Signed-off-by: Dave Airlie Signed-off-by: Zefan Li --- drivers/gpu/vga/vgaarb.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/gpu/vga/vgaarb.c index 111d956..6a46d6e 100644 --- a/drivers/gpu/vga/vgaarb.c +++ b/drivers/gpu/vga/vgaarb.c @@ -381,8 +381,10 @@ int vga_get(struct pci_dev *pdev, unsigned int rsrc, int interruptible) set_current_state(interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE); - if (signal_pending(current)) { - rc = -EINTR; + if (interruptible && signal_pending(current)) { + __set_current_state(TASK_RUNNING); + remove_wait_queue(_wait_queue, ); + rc = -ERESTARTSYS; break; } schedule(); -- 1.9.1
[PATCH 3.4 083/125] tty: Fix GPF in flush_to_ldisc()
From: Peter Hurley 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 9ce119f318ba1a07c29149301f1544b6c4bea52a upstream. A line discipline which does not define a receive_buf() method can can cause a GPF if data is ever received [1]. Oddly, this was known to the author of n_tracesink in 2011, but never fixed. [1] GPF report BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) PGD 3752d067 PUD 37a7b067 PMD 0 Oops: 0010 [#1] SMP KASAN Modules linked in: CPU: 2 PID: 148 Comm: kworker/u10:2 Not tainted 4.4.0-rc2+ #51 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: events_unbound flush_to_ldisc task: 88006da94440 ti: 88006db6 task.ti: 88006db6 RIP: 0010:[<>] [< (null)>] (null) RSP: 0018:88006db67b50 EFLAGS: 00010246 RAX: 0102 RBX: 88003ab32f88 RCX: 0102 RDX: RSI: 88003ab330a6 RDI: 88003aabd388 RBP: 88006db67c48 R08: 88003ab32f9c R09: 88003ab31fb0 R10: 88003ab32fa8 R11: R12: dc00 R13: 88006db67c20 R14: 863df820 R15: 88003ab31fb8 FS: () GS:88006dc0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: CR3: 37938000 CR4: 06e0 Stack: 829f46f1 88006da94bf8 88006da94bf8 88003ab31fb0 88003aabd438 88003ab31ff8 88006430fd90 88003ab32f9c ed0007557a87 11000db6cf78 88003ab32078 Call Trace: [] process_one_work+0x8f1/0x17a0 kernel/workqueue.c:2030 [] worker_thread+0xd4/0x1180 kernel/workqueue.c:2162 [] kthread+0x1cf/0x270 drivers/block/aoe/aoecmd.c:1302 [] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:468 Code: Bad RIP value. RIP [< (null)>] (null) RSP CR2: ---[ end trace a587f8947e54d6ea ]--- Reported-by: Dmitry Vyukov Signed-off-by: Peter Hurley Signed-off-by: Greg Kroah-Hartman [lizf: Backportd to 3.4: adjust context] Signed-off-by: Zefan Li --- drivers/tty/tty_buffer.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c index 4f02f9c..3f59d6c 100644 --- a/drivers/tty/tty_buffer.c +++ b/drivers/tty/tty_buffer.c @@ -443,7 +443,8 @@ static void flush_to_ldisc(struct work_struct *work) flag_buf = head->flag_buf_ptr + head->read; head->read += count; spin_unlock_irqrestore(>buf.lock, flags); - disc->ops->receive_buf(tty, char_buf, + if (disc->ops->receive_buf) + disc->ops->receive_buf(tty, char_buf, flag_buf, count); spin_lock_irqsave(>buf.lock, flags); } -- 1.9.1
[PATCH 3.4 103/125] USB: fix invalid memory access in hub_activate()
From: Alan Stern 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit e50293ef9775c5f1cf3fcc093037dd6a8c5684ea upstream. Commit 8520f38099cc ("USB: change hub initialization sleeps to delayed_work") changed the hub_activate() routine to make part of it run in a workqueue. However, the commit failed to take a reference to the usb_hub structure or to lock the hub interface while doing so. As a result, if a hub is plugged in and quickly unplugged before the work routine can run, the routine will try to access memory that has been deallocated. Or, if the hub is unplugged while the routine is running, the memory may be deallocated while it is in active use. This patch fixes the problem by taking a reference to the usb_hub at the start of hub_activate() and releasing it at the end (when the work is finished), and by locking the hub interface while the work routine is running. It also adds a check at the start of the routine to see if the hub has already been disconnected, in which nothing should be done. Signed-off-by: Alan Stern Reported-by: Alexandru Cornea Tested-by: Alexandru Cornea Fixes: 8520f38099cc ("USB: change hub initialization sleeps to delayed_work") Signed-off-by: Greg Kroah-Hartman [lizf: Backported to 3.4: add forward declaration of hub_release()] Signed-off-by: Zefan Li --- drivers/usb/core/hub.c | 23 --- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 62ea924..e0ad5dc 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -156,6 +156,7 @@ EXPORT_SYMBOL_GPL(ehci_cf_port_reset_rwsem); #define HUB_DEBOUNCE_STABLE 100 +static void hub_release(struct kref *kref); static int usb_reset_and_verify_device(struct usb_device *udev); static inline char *portspeed(struct usb_hub *hub, int portstatus) @@ -797,10 +798,20 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) unsigned delay; /* Continue a partial initialization */ - if (type == HUB_INIT2) - goto init2; - if (type == HUB_INIT3) + if (type == HUB_INIT2 || type == HUB_INIT3) { + device_lock(hub->intfdev); + + /* Was the hub disconnected while we were waiting? */ + if (hub->disconnected) { + device_unlock(hub->intfdev); + kref_put(>kref, hub_release); + return; + } + if (type == HUB_INIT2) + goto init2; goto init3; + } + kref_get(>kref); /* The superspeed hub except for root hub has to use Hub Depth * value as an offset into the route string to locate the bits @@ -990,6 +1001,7 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) PREPARE_DELAYED_WORK(>init_work, hub_init_func3); schedule_delayed_work(>init_work, msecs_to_jiffies(delay)); + device_unlock(hub->intfdev); return; /* Continues at init3: below */ } else { msleep(delay); @@ -1010,6 +1022,11 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) /* Allow autosuspend if it was suppressed */ if (type <= HUB_INIT3) usb_autopm_put_interface_async(to_usb_interface(hub->intfdev)); + + if (type == HUB_INIT2 || type == HUB_INIT3) + device_unlock(hub->intfdev); + + kref_put(>kref, hub_release); } /* Implement the continuations for the delays above */ -- 1.9.1
[PATCH 3.4 075/125] rfkill: copy the name into the rfkill struct
From: Johannes Berg 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit b7bb110008607a915298bf0f47d25886ecb94477 upstream. Some users of rfkill, like NFC and cfg80211, use a dynamic name when allocating rfkill, in those cases dev_name(). Therefore, the pointer passed to rfkill_alloc() might not be valid forever, I specifically found the case that the rfkill name was quite obviously an invalid pointer (or at least garbage) when the wiphy had been renamed. Fix this by making a copy of the rfkill name in rfkill_alloc(). Signed-off-by: Johannes Berg Signed-off-by: Zefan Li --- net/rfkill/core.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/rfkill/core.c b/net/rfkill/core.c index f974961..feef1a45 100644 --- a/net/rfkill/core.c +++ b/net/rfkill/core.c @@ -51,7 +51,6 @@ struct rfkill { spinlock_t lock; - const char *name; enum rfkill_typetype; unsigned long state; @@ -75,6 +74,7 @@ struct rfkill { struct delayed_work poll_work; struct work_struct uevent_work; struct work_struct sync_work; + charname[]; }; #define to_rfkill(d) container_of(d, struct rfkill, dev) @@ -849,14 +849,14 @@ struct rfkill * __must_check rfkill_alloc(const char *name, if (WARN_ON(type == RFKILL_TYPE_ALL || type >= NUM_RFKILL_TYPES)) return NULL; - rfkill = kzalloc(sizeof(*rfkill), GFP_KERNEL); + rfkill = kzalloc(sizeof(*rfkill) + strlen(name) + 1, GFP_KERNEL); if (!rfkill) return NULL; spin_lock_init(>lock); INIT_LIST_HEAD(>node); rfkill->type = type; - rfkill->name = name; + strcpy(rfkill->name, name); rfkill->ops = ops; rfkill->data = ops_data; -- 1.9.1
[PATCH 3.4 108/125] mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone()
From: Andrew Banman 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 5f0f2887f4de9508dcf438deab28f1de8070c271 upstream. test_pages_in_a_zone() does not account for the possibility of missing sections in the given pfn range. pfn_valid_within always returns 1 when CONFIG_HOLES_IN_ZONE is not set, allowing invalid pfns from missing sections to pass the test, leading to a kernel oops. Wrap an additional pfn loop with PAGES_PER_SECTION granularity to check for missing sections before proceeding into the zone-check code. This also prevents a crash from offlining memory devices with missing sections. Despite this, it may be a good idea to keep the related patch '[PATCH 3/3] drivers: memory: prohibit offlining of memory blocks with missing sections' because missing sections in a memory block may lead to other problems not covered by the scope of this fix. Signed-off-by: Andrew Banman Acked-by: Alex Thorlton Cc: Russ Anderson Cc: Alex Thorlton Cc: Yinghai Lu Cc: Greg KH Cc: Seth Jennings Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Zefan Li --- mm/memory_hotplug.c | 31 +++ 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 09d87b7..223232a 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -716,23 +716,30 @@ int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages) */ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn) { - unsigned long pfn; + unsigned long pfn, sec_end_pfn; struct zone *zone = NULL; struct page *page; int i; - for (pfn = start_pfn; + for (pfn = start_pfn, sec_end_pfn = SECTION_ALIGN_UP(start_pfn); pfn < end_pfn; -pfn += MAX_ORDER_NR_PAGES) { - i = 0; - /* This is just a CONFIG_HOLES_IN_ZONE check.*/ - while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i)) - i++; - if (i == MAX_ORDER_NR_PAGES) +pfn = sec_end_pfn + 1, sec_end_pfn += PAGES_PER_SECTION) { + /* Make sure the memory section is present first */ + if (!present_section_nr(pfn_to_section_nr(pfn))) continue; - page = pfn_to_page(pfn + i); - if (zone && page_zone(page) != zone) - return 0; - zone = page_zone(page); + for (; pfn < sec_end_pfn && pfn < end_pfn; +pfn += MAX_ORDER_NR_PAGES) { + i = 0; + /* This is just a CONFIG_HOLES_IN_ZONE check.*/ + while ((i < MAX_ORDER_NR_PAGES) && + !pfn_valid_within(pfn + i)) + i++; + if (i == MAX_ORDER_NR_PAGES) + continue; + page = pfn_to_page(pfn + i); + if (zone && page_zone(page) != zone) + return 0; + zone = page_zone(page); + } } return 1; } -- 1.9.1
[PATCH 3.4 000/125] 3.4.113-rc1 review
From: Zefan LiThis is the start of the stable review cycle for the 3.4.113 release. There are 125 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Fri Oct 14 12:32:05 UTC 2016. Anything received after that time might be too late. A combined patch relative to 3.4.112 will be posted as an additional response to this. A shortlog and diffstat can be found below. thanks, Zefan Li Aaro Koskinen (1): broadcom: fix PHY_ID_BCM5481 entry in the id table Al Viro (2): fix sysvfs symlinks 9p: ->evict_inode() should kick out ->i_data, not ->i_mapping Alan Stern (1): USB: fix invalid memory access in hub_activate() Aleksander Morgado (1): USB: serial: option: add support for Novatel MiFi USB620L Alexey Khoroshilov (1): USB: whci-hcd: add check for dma mapping error Andrew Banman (1): mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone() Andrey Ryabinin (1): ipv6/addrlabel: fix ip6addrlbl_get() Anson Huang (1): ARM: 8471/1: need to save/restore arm register(r11) when it is corrupted Arnd Bergmann (1): ARM: pxa: remove incorrect __init annotation on pxa27x_set_pwrmode Ben Hutchings (1): USB: ti_usb_3410_502: Fix ID table size Bjørn Mork (1): USB: option: add XS Stick W100-2 from 4G Systems Boris BREZILLON (1): mtd: mtdpart: fix add_mtd_partitions error path Borislav Petkov (1): x86/cpu: Call verify_cpu() after having entered long mode too Chen Yu (1): ACPI: Use correct IRQ when uninstalling ACPI interrupt handler Christoph Hellwig (1): scsi: restart list search after unlock in scsi_remove_target Chunfeng Yun (1): usb: xhci: fix config fail of FS hub behind a HS hub with MTT Clemens Ladisch (3): ALSA: usb-audio: add packet size quirk for the Medeli DD305 ALSA: usb-audio: prevent CH345 multiport output SysEx corruption ALSA: usb-audio: work around CH345 input SysEx corruption Colin Ian King (1): ftrace/scripts: Fix incorrect use of sprintf in recordmcount Daeho Jeong (1): ext4, jbd2: ensure entering into panic after recording an error in superblock Dan Carpenter (4): mwifiex: fix mwifiex_rdeeprom_read() devres: fix a for loop bounds check mISDN: fix a loop count USB: ipaq.c: fix a timeout loop Dave Airlie (1): drm/radeon: fix hotplug race at startup David Howells (2): FS-Cache: Handle a write to the page immediately beyond the EOF marker KEYS: Fix race between read and revoke David Turner (1): ext4: Fix handling of extended tv_sec David Vrabel (3): xen: Add RING_COPY_REQUEST() xen-netback: don't use last request to determine minimum Tx credit xen-netback: use RING_COPY_REQUEST() throughout David Woodhouse (1): iommu/vt-d: Fix ATSR handling for Root-Complex integrated endpoints Dmitry Tunin (1): Bluetooth: ath3k: Add support of AR3012 0cf3:817b device Dmitry V. Levin (1): x86/signal: Fix restart_syscall number for x32 tasks Eric Dumazet (5): net: fix a race in dst_release() tcp: md5: fix lockdep annotation af_unix: fix a fatal race with bit fields udp: properly support MSG_PEEK with truncated buffers tcp: make challenge acks less predictable Filipe Manana (1): Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow Francesco Ruggeri (1): net: possible use after free in dst_release Helge Deller (1): parisc: Fix syscall restarts Herbert Xu (2): crypto: algif_hash - Only export and import on sockets with data net: Fix skb csum races when peeking James Bottomley (2): ses: Fix problems with simple enclosures ses: fix additional element traversal bug Jan Kara (3): vfs: Make sendfile(2) killable even better vfs: Avoid softlockups with sendfile(2) jbd2: Fix unreclaimed pages after truncate in data=journal mode Jason A. Donenfeld (1): crypto: skcipher - Copy iv from desc even for 0-len walks Jeff Layton (1): nfs: if we have no valid attrs, then don't declare the attribute cache valid Jiri Slaby (1): usblp: do not set TASK_INTERRUPTIBLE before lock Joe Thornber (1): dm btree: fix bufio buffer leaks in dm_btree_del() error path Johan Hovold (1): spi: fix parent-device reference leak Johannes Berg (3): mac80211: fix driver RSSI event calculations mac80211: mesh: fix call_rcu() usage rfkill: copy the name into the rfkill struct John Stultz (1): time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge Joseph Qi (1): ocfs2: fix BUG when calculate new backup super Karl Heiss (1): sctp: Prevent soft lockup when sctp_accept() is called during a timeout event Kees Cook (1): mac: validate mac_partition is within sector Kinglong Mee (2): FS-Cache: Increase reference of parent after registering, netfs success FS-Cache: Don't override netfs's primary_index if registering failed Kirill A. Shutemov (1): vgaarb: fix signal
[PATCH 3.4 082/125] mm: hugetlb: call huge_pte_alloc() only if ptep is null
From: Naoya Horiguchi <n-horigu...@ah.jp.nec.com> 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 0d777df5d8953293be090d9ab5a355db893e8357 upstream. Currently at the beginning of hugetlb_fault(), we call huge_pte_offset() and check whether the obtained *ptep is a migration/hwpoison entry or not. And if not, then we get to call huge_pte_alloc(). This is racy because the *ptep could turn into migration/hwpoison entry after the huge_pte_offset() check. This race results in BUG_ON in huge_pte_alloc(). We don't have to call huge_pte_alloc() when the huge_pte_offset() returns non-NULL, so let's fix this bug with moving the code into else block. Note that the *ptep could turn into a migration/hwpoison entry after this block, but that's not a problem because we have another !pte_present check later (we never go into hugetlb_no_page() in that case.) Fixes: 290408d4a250 ("hugetlb: hugepage migration core") Signed-off-by: Naoya Horiguchi <n-horigu...@ah.jp.nec.com> Acked-by: Hillf Danton <hillf...@alibaba-inc.com> Acked-by: David Rientjes <rient...@google.com> Cc: Hugh Dickins <hu...@google.com> Cc: Dave Hansen <dave.han...@intel.com> Cc: Mel Gorman <mgor...@suse.de> Cc: Joonsoo Kim <iamjoonsoo@lge.com> Cc: Mike Kravetz <mike.krav...@oracle.com> Signed-off-by: Andrew Morton <a...@linux-foundation.org> Signed-off-by: Linus Torvalds <torva...@linux-foundation.org> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lize...@huawei.com> --- mm/hugetlb.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e622aab..416cbfd 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2835,12 +2835,12 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) return VM_FAULT_HWPOISON_LARGE | VM_FAULT_SET_HINDEX(h - hstates); + } else { + ptep = huge_pte_alloc(mm, address, huge_page_size(h)); + if (!ptep) + return VM_FAULT_OOM; } - ptep = huge_pte_alloc(mm, address, huge_page_size(h)); - if (!ptep) - return VM_FAULT_OOM; - /* * Serialize hugepage allocation and instantiation, so that we don't * get spurious allocation failures if two CPUs race to instantiate -- 1.9.1
[PATCH 3.4 076/125] dm btree: fix bufio buffer leaks in dm_btree_del() error path
From: Joe Thornber3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit ed8b45a3679eb49069b094c0711b30833f27c734 upstream. If dm_btree_del()'s call to push_frame() fails, e.g. due to btree_node_validator finding invalid metadata, the dm_btree_del() error path must unlock all frames (which have active dm-bufio buffers) that were pushed onto the del_stack. Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio buffers have leaked, e.g.: device-mapper: bufio: leaked buffer 3, hold count 1, list 0 Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Zefan Li --- drivers/md/persistent-data/dm-btree.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/md/persistent-data/dm-btree.c b/drivers/md/persistent-data/dm-btree.c index 77c615e..c948acf 100644 --- a/drivers/md/persistent-data/dm-btree.c +++ b/drivers/md/persistent-data/dm-btree.c @@ -230,6 +230,16 @@ static void pop_frame(struct del_stack *s) dm_tm_unlock(s->tm, f->b); } +static void unlock_all_frames(struct del_stack *s) +{ + struct frame *f; + + while (unprocessed_frames(s)) { + f = s->spine + s->top--; + dm_tm_unlock(s->tm, f->b); + } +} + int dm_btree_del(struct dm_btree_info *info, dm_block_t root) { int r; @@ -285,9 +295,13 @@ int dm_btree_del(struct dm_btree_info *info, dm_block_t root) f->current_child = f->nr_children; } } - out: + if (r) { + /* cleanup all frames of del_stack */ + unlock_all_frames(s); + } kfree(s); + return r; } EXPORT_SYMBOL_GPL(dm_btree_del); -- 1.9.1
[PATCH 3.4 082/125] mm: hugetlb: call huge_pte_alloc() only if ptep is null
From: Naoya Horiguchi 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 0d777df5d8953293be090d9ab5a355db893e8357 upstream. Currently at the beginning of hugetlb_fault(), we call huge_pte_offset() and check whether the obtained *ptep is a migration/hwpoison entry or not. And if not, then we get to call huge_pte_alloc(). This is racy because the *ptep could turn into migration/hwpoison entry after the huge_pte_offset() check. This race results in BUG_ON in huge_pte_alloc(). We don't have to call huge_pte_alloc() when the huge_pte_offset() returns non-NULL, so let's fix this bug with moving the code into else block. Note that the *ptep could turn into a migration/hwpoison entry after this block, but that's not a problem because we have another !pte_present check later (we never go into hugetlb_no_page() in that case.) Fixes: 290408d4a250 ("hugetlb: hugepage migration core") Signed-off-by: Naoya Horiguchi Acked-by: Hillf Danton Acked-by: David Rientjes Cc: Hugh Dickins Cc: Dave Hansen Cc: Mel Gorman Cc: Joonsoo Kim Cc: Mike Kravetz Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li --- mm/hugetlb.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e622aab..416cbfd 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2835,12 +2835,12 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) return VM_FAULT_HWPOISON_LARGE | VM_FAULT_SET_HINDEX(h - hstates); + } else { + ptep = huge_pte_alloc(mm, address, huge_page_size(h)); + if (!ptep) + return VM_FAULT_OOM; } - ptep = huge_pte_alloc(mm, address, huge_page_size(h)); - if (!ptep) - return VM_FAULT_OOM; - /* * Serialize hugepage allocation and instantiation, so that we don't * get spurious allocation failures if two CPUs race to instantiate -- 1.9.1
[PATCH 3.4 076/125] dm btree: fix bufio buffer leaks in dm_btree_del() error path
From: Joe Thornber 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit ed8b45a3679eb49069b094c0711b30833f27c734 upstream. If dm_btree_del()'s call to push_frame() fails, e.g. due to btree_node_validator finding invalid metadata, the dm_btree_del() error path must unlock all frames (which have active dm-bufio buffers) that were pushed onto the del_stack. Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio buffers have leaked, e.g.: device-mapper: bufio: leaked buffer 3, hold count 1, list 0 Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Zefan Li --- drivers/md/persistent-data/dm-btree.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/md/persistent-data/dm-btree.c b/drivers/md/persistent-data/dm-btree.c index 77c615e..c948acf 100644 --- a/drivers/md/persistent-data/dm-btree.c +++ b/drivers/md/persistent-data/dm-btree.c @@ -230,6 +230,16 @@ static void pop_frame(struct del_stack *s) dm_tm_unlock(s->tm, f->b); } +static void unlock_all_frames(struct del_stack *s) +{ + struct frame *f; + + while (unprocessed_frames(s)) { + f = s->spine + s->top--; + dm_tm_unlock(s->tm, f->b); + } +} + int dm_btree_del(struct dm_btree_info *info, dm_block_t root) { int r; @@ -285,9 +295,13 @@ int dm_btree_del(struct dm_btree_info *info, dm_block_t root) f->current_child = f->nr_children; } } - out: + if (r) { + /* cleanup all frames of del_stack */ + unlock_all_frames(s); + } kfree(s); + return r; } EXPORT_SYMBOL_GPL(dm_btree_del); -- 1.9.1
[PATCH 3.4 000/125] 3.4.113-rc1 review
From: Zefan Li This is the start of the stable review cycle for the 3.4.113 release. There are 125 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Fri Oct 14 12:32:05 UTC 2016. Anything received after that time might be too late. A combined patch relative to 3.4.112 will be posted as an additional response to this. A shortlog and diffstat can be found below. thanks, Zefan Li Aaro Koskinen (1): broadcom: fix PHY_ID_BCM5481 entry in the id table Al Viro (2): fix sysvfs symlinks 9p: ->evict_inode() should kick out ->i_data, not ->i_mapping Alan Stern (1): USB: fix invalid memory access in hub_activate() Aleksander Morgado (1): USB: serial: option: add support for Novatel MiFi USB620L Alexey Khoroshilov (1): USB: whci-hcd: add check for dma mapping error Andrew Banman (1): mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone() Andrey Ryabinin (1): ipv6/addrlabel: fix ip6addrlbl_get() Anson Huang (1): ARM: 8471/1: need to save/restore arm register(r11) when it is corrupted Arnd Bergmann (1): ARM: pxa: remove incorrect __init annotation on pxa27x_set_pwrmode Ben Hutchings (1): USB: ti_usb_3410_502: Fix ID table size Bjørn Mork (1): USB: option: add XS Stick W100-2 from 4G Systems Boris BREZILLON (1): mtd: mtdpart: fix add_mtd_partitions error path Borislav Petkov (1): x86/cpu: Call verify_cpu() after having entered long mode too Chen Yu (1): ACPI: Use correct IRQ when uninstalling ACPI interrupt handler Christoph Hellwig (1): scsi: restart list search after unlock in scsi_remove_target Chunfeng Yun (1): usb: xhci: fix config fail of FS hub behind a HS hub with MTT Clemens Ladisch (3): ALSA: usb-audio: add packet size quirk for the Medeli DD305 ALSA: usb-audio: prevent CH345 multiport output SysEx corruption ALSA: usb-audio: work around CH345 input SysEx corruption Colin Ian King (1): ftrace/scripts: Fix incorrect use of sprintf in recordmcount Daeho Jeong (1): ext4, jbd2: ensure entering into panic after recording an error in superblock Dan Carpenter (4): mwifiex: fix mwifiex_rdeeprom_read() devres: fix a for loop bounds check mISDN: fix a loop count USB: ipaq.c: fix a timeout loop Dave Airlie (1): drm/radeon: fix hotplug race at startup David Howells (2): FS-Cache: Handle a write to the page immediately beyond the EOF marker KEYS: Fix race between read and revoke David Turner (1): ext4: Fix handling of extended tv_sec David Vrabel (3): xen: Add RING_COPY_REQUEST() xen-netback: don't use last request to determine minimum Tx credit xen-netback: use RING_COPY_REQUEST() throughout David Woodhouse (1): iommu/vt-d: Fix ATSR handling for Root-Complex integrated endpoints Dmitry Tunin (1): Bluetooth: ath3k: Add support of AR3012 0cf3:817b device Dmitry V. Levin (1): x86/signal: Fix restart_syscall number for x32 tasks Eric Dumazet (5): net: fix a race in dst_release() tcp: md5: fix lockdep annotation af_unix: fix a fatal race with bit fields udp: properly support MSG_PEEK with truncated buffers tcp: make challenge acks less predictable Filipe Manana (1): Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow Francesco Ruggeri (1): net: possible use after free in dst_release Helge Deller (1): parisc: Fix syscall restarts Herbert Xu (2): crypto: algif_hash - Only export and import on sockets with data net: Fix skb csum races when peeking James Bottomley (2): ses: Fix problems with simple enclosures ses: fix additional element traversal bug Jan Kara (3): vfs: Make sendfile(2) killable even better vfs: Avoid softlockups with sendfile(2) jbd2: Fix unreclaimed pages after truncate in data=journal mode Jason A. Donenfeld (1): crypto: skcipher - Copy iv from desc even for 0-len walks Jeff Layton (1): nfs: if we have no valid attrs, then don't declare the attribute cache valid Jiri Slaby (1): usblp: do not set TASK_INTERRUPTIBLE before lock Joe Thornber (1): dm btree: fix bufio buffer leaks in dm_btree_del() error path Johan Hovold (1): spi: fix parent-device reference leak Johannes Berg (3): mac80211: fix driver RSSI event calculations mac80211: mesh: fix call_rcu() usage rfkill: copy the name into the rfkill struct John Stultz (1): time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge Joseph Qi (1): ocfs2: fix BUG when calculate new backup super Karl Heiss (1): sctp: Prevent soft lockup when sctp_accept() is called during a timeout event Kees Cook (1): mac: validate mac_partition is within sector Kinglong Mee (2): FS-Cache: Increase reference of parent after registering, netfs success FS-Cache: Don't override netfs's primary_index if registering failed Kirill A. Shutemov (1): vgaarb: fix signal handling in vga_get()
[PATCH 3.4 034/125] usb: musb: core: fix order of arguments to ulpi write callback
From: Uwe Kleine-König3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 705e63d2b29c8bbf091119084544d353bda70393 upstream. There is a bit of a mess in the order of arguments to the ulpi write callback. There is int ulpi_write(struct ulpi *ulpi, u8 addr, u8 val) in drivers/usb/common/ulpi.c; struct usb_phy_io_ops { ... int (*write)(struct usb_phy *x, u32 val, u32 reg); } in include/linux/usb/phy.h. The callback registered by the musb driver has to comply to the latter, but up to now had "offset" first which effectively made the function broken for correct users. So flip the order and while at it also switch to the parameter names of struct usb_phy_io_ops's write. Fixes: ffb865b1e460 ("usb: musb: add ulpi access operations") Signed-off-by: Uwe Kleine-König Signed-off-by: Felipe Balbi Signed-off-by: Zefan Li --- drivers/usb/musb/musb_core.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/usb/musb/musb_core.c b/drivers/usb/musb/musb_core.c index d3481c4..db25165 100644 --- a/drivers/usb/musb/musb_core.c +++ b/drivers/usb/musb/musb_core.c @@ -131,7 +131,7 @@ static inline struct musb *dev_to_musb(struct device *dev) /*-*/ #ifndef CONFIG_BLACKFIN -static int musb_ulpi_read(struct usb_phy *phy, u32 offset) +static int musb_ulpi_read(struct usb_phy *phy, u32 reg) { void __iomem *addr = phy->io_priv; int i = 0; @@ -150,7 +150,7 @@ static int musb_ulpi_read(struct usb_phy *phy, u32 offset) * ULPICarKitControlDisableUTMI after clearing POWER_SUSPENDM. */ - musb_writeb(addr, MUSB_ULPI_REG_ADDR, (u8)offset); + musb_writeb(addr, MUSB_ULPI_REG_ADDR, (u8)reg); musb_writeb(addr, MUSB_ULPI_REG_CONTROL, MUSB_ULPI_REG_REQ | MUSB_ULPI_RDN_WR); @@ -175,7 +175,7 @@ out: return ret; } -static int musb_ulpi_write(struct usb_phy *phy, u32 offset, u32 data) +static int musb_ulpi_write(struct usb_phy *phy, u32 val, u32 reg) { void __iomem *addr = phy->io_priv; int i = 0; @@ -190,8 +190,8 @@ static int musb_ulpi_write(struct usb_phy *phy, u32 offset, u32 data) power &= ~MUSB_POWER_SUSPENDM; musb_writeb(addr, MUSB_POWER, power); - musb_writeb(addr, MUSB_ULPI_REG_ADDR, (u8)offset); - musb_writeb(addr, MUSB_ULPI_REG_DATA, (u8)data); + musb_writeb(addr, MUSB_ULPI_REG_ADDR, (u8)reg); + musb_writeb(addr, MUSB_ULPI_REG_DATA, (u8)val); musb_writeb(addr, MUSB_ULPI_REG_CONTROL, MUSB_ULPI_REG_REQ); while (!(musb_readb(addr, MUSB_ULPI_REG_CONTROL) -- 1.9.1
[PATCH 3.4 091/125] ftrace/scripts: Have recordmcount copy the object file
From: "Steven Rostedt (Red Hat)"3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit a50bd43935586420fb75f4558369eb08566fac5e upstream. Russell King found that he had weird side effects when compiling the kernel with hard linked ccache. The reason was that recordmcount modified the kernel in place via mmap, and when a file gets modified twice by recordmcount, it will complain about it. To fix this issue, Russell wrote a patch that checked if the file was hard linked more than once and would unlink it if it was. Linus Torvalds was not happy with the fact that recordmcount does this in place modification. Instead of doing the unlink only if the file has two or more hard links, it does the unlink all the time. In otherwords, it always does a copy if it changed something. That is, it does the write out if a change was made. Signed-off-by: Steven Rostedt Signed-off-by: Zefan Li --- scripts/recordmcount.c | 145 + 1 file changed, 110 insertions(+), 35 deletions(-) diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c index 4eb047a..0970379 100644 --- a/scripts/recordmcount.c +++ b/scripts/recordmcount.c @@ -35,12 +35,17 @@ static int fd_map; /* File descriptor for file being modified. */ static int mmap_failed; /* Boolean flag. */ -static void *ehdr_curr; /* current ElfXX_Ehdr * for resource cleanup */ static char gpfx; /* prefix for global symbol name (sometimes '_') */ static struct stat sb; /* Remember .st_size, etc. */ static jmp_buf jmpenv; /* setjmp/longjmp per-file error escape */ static const char *altmcount; /* alternate mcount symbol name */ static int warn_on_notrace_sect; /* warn when section has mcount not being recorded */ +static void *file_map; /* pointer of the mapped file */ +static void *file_end; /* pointer to the end of the mapped file */ +static int file_updated; /* flag to state file was changed */ +static void *file_ptr; /* current file pointer location */ +static void *file_append; /* added to the end of the file */ +static size_t file_append_size; /* how much is added to end of file */ /* setjmp() return values */ enum { @@ -54,10 +59,14 @@ static void cleanup(void) { if (!mmap_failed) - munmap(ehdr_curr, sb.st_size); + munmap(file_map, sb.st_size); else - free(ehdr_curr); - close(fd_map); + free(file_map); + file_map = NULL; + free(file_append); + file_append = NULL; + file_append_size = 0; + file_updated = 0; } static void __attribute__((noreturn)) @@ -79,12 +88,22 @@ succeed_file(void) static off_t ulseek(int const fd, off_t const offset, int const whence) { - off_t const w = lseek(fd, offset, whence); - if (w == (off_t)-1) { - perror("lseek"); + switch (whence) { + case SEEK_SET: + file_ptr = file_map + offset; + break; + case SEEK_CUR: + file_ptr += offset; + break; + case SEEK_END: + file_ptr = file_map + (sb.st_size - offset); + break; + } + if (file_ptr < file_map) { + fprintf(stderr, "lseek: seek before file\n"); fail_file(); } - return w; + return file_ptr - file_map; } static size_t @@ -101,12 +120,38 @@ uread(int const fd, void *const buf, size_t const count) static size_t uwrite(int const fd, void const *const buf, size_t const count) { - size_t const n = write(fd, buf, count); - if (n != count) { - perror("write"); - fail_file(); + size_t cnt = count; + off_t idx = 0; + + file_updated = 1; + + if (file_ptr + count >= file_end) { + off_t aoffset = (file_ptr + count) - file_end; + + if (aoffset > file_append_size) { + file_append = realloc(file_append, aoffset); + file_append_size = aoffset; + } + if (!file_append) { + perror("write"); + fail_file(); + } + if (file_ptr < file_end) { + cnt = file_end - file_ptr; + } else { + cnt = 0; + idx = aoffset - count; + } } - return n; + + if (cnt) + memcpy(file_ptr, buf, cnt); + + if (cnt < count) + memcpy(file_append + idx, buf + cnt, count - cnt); + + file_ptr += count; + return count; } static void * @@ -163,9 +208,7 @@ static int make_nop_x86(void *map, size_t const offset) */ static void *mmap_file(char const *fname) { - void *addr; - - fd_map = open(fname, O_RDWR); + fd_map = open(fname, O_RDONLY);
[PATCH 3.4 117/125] ipv6: update ip6_rt_last_gc every time GC is run
From: Michal Kubeček3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 49a18d86f66d33a20144ecb5a34bba0d1856b260 upstream. As pointed out by Eric Dumazet, net->ipv6.ip6_rt_last_gc should hold the last time garbage collector was run so that we should update it whenever fib6_run_gc() calls fib6_clean_all(), not only if we got there from ip6_dst_gc(). Signed-off-by: Michal Kubecek Signed-off-by: David S. Miller Signed-off-by: Zefan Li --- net/ipv6/ip6_fib.c | 6 +- net/ipv6/route.c | 4 +--- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index fc5ce6e..e6b7a00 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -1595,6 +1595,8 @@ static DEFINE_SPINLOCK(fib6_gc_lock); void fib6_run_gc(unsigned long expires, struct net *net, bool force) { + unsigned long now; + if (force) { spin_lock_bh(_gc_lock); } else if (!spin_trylock_bh(_gc_lock)) { @@ -1607,10 +1609,12 @@ void fib6_run_gc(unsigned long expires, struct net *net, bool force) gc_args.more = icmp6_dst_gc(); fib6_clean_all(net, fib6_age, 0, NULL); + now = jiffies; + net->ipv6.ip6_rt_last_gc = now; if (gc_args.more) mod_timer(>ipv6.ip6_fib_timer, - round_jiffies(jiffies + round_jiffies(now + net->ipv6.sysctl.ip6_rt_gc_interval)); else del_timer(>ipv6.ip6_fib_timer); diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 7ab7f8a..28957ba 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1230,7 +1230,6 @@ static void icmp6_clean_all(int (*func)(struct rt6_info *rt, void *arg), static int ip6_dst_gc(struct dst_ops *ops) { - unsigned long now = jiffies; struct net *net = container_of(ops, struct net, ipv6.ip6_dst_ops); int rt_min_interval = net->ipv6.sysctl.ip6_rt_gc_min_interval; int rt_max_size = net->ipv6.sysctl.ip6_rt_max_size; @@ -1240,13 +1239,12 @@ static int ip6_dst_gc(struct dst_ops *ops) int entries; entries = dst_entries_get_fast(ops); - if (time_after(rt_last_gc + rt_min_interval, now) && + if (time_after(rt_last_gc + rt_min_interval, jiffies) && entries <= rt_max_size) goto out; net->ipv6.ip6_rt_gc_expire++; fib6_run_gc(net->ipv6.ip6_rt_gc_expire, net, entries > rt_max_size); - net->ipv6.ip6_rt_last_gc = now; entries = dst_entries_get_slow(ops); if (entries < ops->gc_thresh) net->ipv6.ip6_rt_gc_expire = rt_gc_timeout>>1; -- 1.9.1
[PATCH 3.4 112/125] USB: ti_usb_3410_502: Fix ID table size
From: Ben Hutchings3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- Commit 35a2fbc941ac ("USB: serial: ti_usb_3410_5052: new device id for Abbot strip port cable") failed to update the size of the ti_id_table_3410 array. This doesn't need to be fixed upstream following commit d7ece6515e12 ("USB: ti_usb_3410_5052: remove vendor/product module parameters") but should be fixed in stable branches older than 3.12. Backports of commit c9d09dc7ad10 ("USB: serial: ti_usb_3410_5052: add Abbott strip port ID to combined table as well.") similarly failed to update the size of the ti_id_table_combined array. Signed-off-by: Ben Hutchings Signed-off-by: Zefan Li --- drivers/usb/serial/ti_usb_3410_5052.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/usb/serial/ti_usb_3410_5052.c b/drivers/usb/serial/ti_usb_3410_5052.c index 2575779..974c4fa 100644 --- a/drivers/usb/serial/ti_usb_3410_5052.c +++ b/drivers/usb/serial/ti_usb_3410_5052.c @@ -164,7 +164,7 @@ static unsigned int product_5052_count; /* the array dimension is the number of default entries plus */ /* TI_EXTRA_VID_PID_COUNT user defined entries plus 1 terminating */ /* null entry */ -static struct usb_device_id ti_id_table_3410[15+TI_EXTRA_VID_PID_COUNT+1] = { +static struct usb_device_id ti_id_table_3410[16+TI_EXTRA_VID_PID_COUNT+1] = { { USB_DEVICE(TI_VENDOR_ID, TI_3410_PRODUCT_ID) }, { USB_DEVICE(TI_VENDOR_ID, TI_3410_EZ430_ID) }, { USB_DEVICE(MTS_VENDOR_ID, MTS_GSM_NO_FW_PRODUCT_ID) }, @@ -190,7 +190,7 @@ static struct usb_device_id ti_id_table_5052[5+TI_EXTRA_VID_PID_COUNT+1] = { { USB_DEVICE(TI_VENDOR_ID, TI_5052_FIRMWARE_PRODUCT_ID) }, }; -static struct usb_device_id ti_id_table_combined[19+2*TI_EXTRA_VID_PID_COUNT+1] = { +static struct usb_device_id ti_id_table_combined[20+2*TI_EXTRA_VID_PID_COUNT+1] = { { USB_DEVICE(TI_VENDOR_ID, TI_3410_PRODUCT_ID) }, { USB_DEVICE(TI_VENDOR_ID, TI_3410_EZ430_ID) }, { USB_DEVICE(MTS_VENDOR_ID, MTS_GSM_NO_FW_PRODUCT_ID) }, -- 1.9.1
[PATCH 3.4 097/125] xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled
From: Konrad Rzeszutek Wilk3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 56441f3c8e5bd45aab10dd9f8c505dd4bec03b0d upstream. The guest sequence of: a) XEN_PCI_OP_enable_msi b) XEN_PCI_OP_enable_msi c) XEN_PCI_OP_disable_msi results in hitting an BUG_ON condition in the msi.c code. The MSI code uses an dev->msi_list to which it adds MSI entries. Under the above conditions an BUG_ON() can be hit. The device passed in the guest MUST have MSI capability. The a) adds the entry to the dev->msi_list and sets msi_enabled. The b) adds a second entry but adding in to SysFS fails (duplicate entry) and deletes all of the entries from msi_list and returns (with msi_enabled is still set). c) pci_disable_msi passes the msi_enabled checks and hits: BUG_ON(list_empty(dev_to_msi_list(>dev))); and blows up. The patch adds a simple check in the XEN_PCI_OP_enable_msi to guard against that. The check for msix_enabled is not stricly neccessary. This is part of XSA-157. Reviewed-by: David Vrabel Reviewed-by: Jan Beulich Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Zefan Li --- drivers/xen/xen-pciback/pciback_ops.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/xen/xen-pciback/pciback_ops.c b/drivers/xen/xen-pciback/pciback_ops.c index a751a66..1ab998c 100644 --- a/drivers/xen/xen-pciback/pciback_ops.c +++ b/drivers/xen/xen-pciback/pciback_ops.c @@ -143,7 +143,12 @@ int xen_pcibk_enable_msi(struct xen_pcibk_device *pdev, if (unlikely(verbose_request)) printk(KERN_DEBUG DRV_NAME ": %s: enable MSI\n", pci_name(dev)); - status = pci_enable_msi(dev); + if (dev->msi_enabled) + status = -EALREADY; + else if (dev->msix_enabled) + status = -ENXIO; + else + status = pci_enable_msi(dev); if (status) { pr_warn_ratelimited(DRV_NAME ": %s: error enabling MSI for guest %u: err %d\n", -- 1.9.1
[PATCH 3.4 116/125] sctp: Prevent soft lockup when sctp_accept() is called during a timeout event
From: Karl Heiss3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 635682a14427d241bab7bbdeebb48a7d7b91638e upstream. A case can occur when sctp_accept() is called by the user during a heartbeat timeout event after the 4-way handshake. Since sctp_assoc_migrate() changes both assoc->base.sk and assoc->ep, the bh_sock_lock in sctp_generate_heartbeat_event() will be taken with the listening socket but released with the new association socket. The result is a deadlock on any future attempts to take the listening socket lock. Note that this race can occur with other SCTP timeouts that take the bh_lock_sock() in the event sctp_accept() is called. BUG: soft lockup - CPU#9 stuck for 67s! [swapper:0] ... RIP: 0010:[] [] _spin_lock+0x1e/0x30 RSP: 0018:880028323b20 EFLAGS: 0206 RAX: 0002 RBX: 880028323b20 RCX: RDX: RSI: 880028323be0 RDI: 8804632c4b48 RBP: 8100bb93 R08: R09: R10: 880610662280 R11: 0100 R12: 880028323aa0 R13: 8804383c3880 R14: 880028323a90 R15: 81534225 FS: () GS:88002832() knlGS: CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b CR2: 006df528 CR3: 01a85000 CR4: 06e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process swapper (pid: 0, threadinfo 880616b7, task 880616b6cab0) Stack: 880028323c40 a01c2582 880614cfb020 0100 0014383a6c44 8804383c3880 880614e93c00 880614e93c00 8804632c4b00 8804383c38b8 Call Trace: [] ? sctp_rcv+0x492/0xa10 [sctp] [] ? nf_iterate+0x69/0xb0 [] ? ip_local_deliver_finish+0x0/0x2d0 [] ? nf_hook_slow+0x76/0x120 [] ? ip_local_deliver_finish+0x0/0x2d0 [] ? ip_local_deliver_finish+0xdd/0x2d0 [] ? ip_local_deliver+0x98/0xa0 [] ? ip_rcv_finish+0x12d/0x440 [] ? ip_rcv+0x275/0x350 [] ? __netif_receive_skb+0x4ab/0x750 ... With lockdep debugging: = [ BUG: bad unlock balance detected! ] - CslRx/12087 is trying to release lock (slock-AF_INET) at: [] sctp_generate_timeout_event+0x40/0xe0 [sctp] but there are no more locks to release! other info that might help us debug this: 2 locks held by CslRx/12087: #0: (>timers[i]){+.-...}, at: [] run_timer_softirq+0x16f/0x3e0 #1: (slock-AF_INET){+.-...}, at: [] sctp_generate_timeout_event+0x23/0xe0 [sctp] Ensure the socket taken is also the same one that is released by saving a copy of the socket before entering the timeout event critical section. Signed-off-by: Karl Heiss Signed-off-by: David S. Miller [bwh: Backported to 3.2: - Net namespaces are not used - Keep using sctp_bh_{,un}lock_sock() - Adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Zefan Li --- net/sctp/sm_sideeffect.c | 34 +++--- 1 file changed, 19 insertions(+), 15 deletions(-) diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c index 5fa033a..06c75b1 100644 --- a/net/sctp/sm_sideeffect.c +++ b/net/sctp/sm_sideeffect.c @@ -249,11 +249,12 @@ void sctp_generate_t3_rtx_event(unsigned long peer) int error; struct sctp_transport *transport = (struct sctp_transport *) peer; struct sctp_association *asoc = transport->asoc; + struct sock *sk = asoc->base.sk; /* Check whether a task is in the sock. */ - sctp_bh_lock_sock(asoc->base.sk); - if (sock_owned_by_user(asoc->base.sk)) { + sctp_bh_lock_sock(sk); + if (sock_owned_by_user(sk)) { SCTP_DEBUG_PRINTK("%s:Sock is busy.\n", __func__); /* Try again later. */ @@ -276,10 +277,10 @@ void sctp_generate_t3_rtx_event(unsigned long peer) transport, GFP_ATOMIC); if (error) - asoc->base.sk->sk_err = -error; + sk->sk_err = -error; out_unlock: - sctp_bh_unlock_sock(asoc->base.sk); + sctp_bh_unlock_sock(sk); sctp_transport_put(transport); } @@ -289,10 +290,11 @@ out_unlock: static void sctp_generate_timeout_event(struct sctp_association *asoc, sctp_event_timeout_t timeout_type) { + struct sock *sk = asoc->base.sk; int error = 0; - sctp_bh_lock_sock(asoc->base.sk); - if (sock_owned_by_user(asoc->base.sk)) { + sctp_bh_lock_sock(sk); + if (sock_owned_by_user(sk)) { SCTP_DEBUG_PRINTK("%s:Sock is busy: timer %d\n", __func__, timeout_type); @@ -316,10 +318,10 @@ static
[PATCH 3.4 118/125] ipv6: don't call fib6_run_gc() until routing is ready
From: Michal Kubeček3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 2c861cc65ef4604011a0082e4dcdba2819aa191a upstream. When loading the ipv6 module, ndisc_init() is called before ip6_route_init(). As the former registers a handler calling fib6_run_gc(), this opens a window to run the garbage collector before necessary data structures are initialized. If a network device is initialized in this window, adding MAC address to it triggers a NETDEV_CHANGEADDR event, leading to a crash in fib6_clean_all(). Take the event handler registration out of ndisc_init() into a separate function ndisc_late_init() and move it after ip6_route_init(). Signed-off-by: Michal Kubecek Signed-off-by: David S. Miller Signed-off-by: Zefan Li --- include/net/ndisc.h | 2 ++ net/ipv6/af_inet6.c | 6 ++ net/ipv6/ndisc.c| 18 +++--- 3 files changed, 19 insertions(+), 7 deletions(-) diff --git a/include/net/ndisc.h b/include/net/ndisc.h index 6f9c25a..cd205e9 100644 --- a/include/net/ndisc.h +++ b/include/net/ndisc.h @@ -117,7 +117,9 @@ static inline struct neighbour *__ipv6_neigh_lookup(struct neigh_table *tbl, str } extern int ndisc_init(void); +extern int ndisc_late_init(void); +extern voidndisc_late_cleanup(void); extern voidndisc_cleanup(void); extern int ndisc_rcv(struct sk_buff *skb); diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 5300ef3..8ddb56f 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -1161,6 +1161,9 @@ static int __init inet6_init(void) err = ip6_route_init(); if (err) goto ip6_route_fail; + err = ndisc_late_init(); + if (err) + goto ndisc_late_fail; err = ip6_flowlabel_init(); if (err) goto ip6_flowlabel_fail; @@ -1221,6 +1224,8 @@ ipv6_exthdrs_fail: addrconf_fail: ip6_flowlabel_cleanup(); ip6_flowlabel_fail: + ndisc_late_cleanup(); +ndisc_late_fail: ip6_route_cleanup(); ip6_route_fail: #ifdef CONFIG_PROC_FS @@ -1288,6 +1293,7 @@ static void __exit inet6_exit(void) ipv6_exthdrs_exit(); addrconf_cleanup(); ip6_flowlabel_cleanup(); + ndisc_late_cleanup(); ip6_route_cleanup(); #ifdef CONFIG_PROC_FS diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index e235b4c..02e6568 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -1867,24 +1867,28 @@ int __init ndisc_init(void) if (err) goto out_unregister_pernet; #endif - err = register_netdevice_notifier(_netdev_notifier); - if (err) - goto out_unregister_sysctl; out: return err; -out_unregister_sysctl: #ifdef CONFIG_SYSCTL - neigh_sysctl_unregister(_tbl.parms); out_unregister_pernet: -#endif unregister_pernet_subsys(_net_ops); goto out; +#endif } -void ndisc_cleanup(void) +int __init ndisc_late_init(void) +{ + return register_netdevice_notifier(_netdev_notifier); +} + +void ndisc_late_cleanup(void) { unregister_netdevice_notifier(_netdev_notifier); +} + +void ndisc_cleanup(void) +{ #ifdef CONFIG_SYSCTL neigh_sysctl_unregister(_tbl.parms); #endif -- 1.9.1
[PATCH 3.4 111/125] af_unix: fix a fatal race with bit fields
From: Eric Dumazet3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 60bc851ae59bfe99be6ee89d6bc50008c85ec75d upstream. Using bit fields is dangerous on ppc64/sparc64, as the compiler [1] uses 64bit instructions to manipulate them. If the 64bit word includes any atomic_t or spinlock_t, we can lose critical concurrent changes. This is happening in af_unix, where unix_sk(sk)->gc_candidate/ gc_maybe_cycle/lock share the same 64bit word. This leads to fatal deadlock, as one/several cpus spin forever on a spinlock that will never be available again. A safer way would be to use a long to store flags. This way we are sure compiler/arch wont do bad things. As we own unix_gc_lock spinlock when clearing or setting bits, we can use the non atomic __set_bit()/__clear_bit(). recursion_level can share the same 64bit location with the spinlock, as it is set only with this spinlock held. [1] bug fixed in gcc-4.8.0 : http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080 Reported-by: Ambrose Feinstein Signed-off-by: Eric Dumazet Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Signed-off-by: David S. Miller Cc: hejianet Signed-off-by: Zefan Li --- include/net/af_unix.h | 5 +++-- net/unix/garbage.c| 12 ++-- 2 files changed, 9 insertions(+), 8 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index d29a576..f3cbf1c 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -56,9 +56,10 @@ struct unix_sock { struct list_headlink; atomic_long_t inflight; spinlock_t lock; - unsigned intgc_candidate : 1; - unsigned intgc_maybe_cycle : 1; unsigned char recursion_level; + unsigned long gc_flags; +#define UNIX_GC_CANDIDATE 0 +#define UNIX_GC_MAYBE_CYCLE1 struct socket_wqpeer_wq; wait_queue_tpeer_wake; }; diff --git a/net/unix/garbage.c b/net/unix/garbage.c index b6f4b99..00d3e56 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -185,7 +185,7 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *), * have been added to the queues after * starting the garbage collection */ - if (u->gc_candidate) { + if (test_bit(UNIX_GC_CANDIDATE, >gc_flags)) { hit = true; func(u); } @@ -254,7 +254,7 @@ static void inc_inflight_move_tail(struct unix_sock *u) * of the list, so that it's checked even if it was already * passed over */ - if (u->gc_maybe_cycle) + if (test_bit(UNIX_GC_MAYBE_CYCLE, >gc_flags)) list_move_tail(>link, _candidates); } @@ -315,8 +315,8 @@ void unix_gc(void) BUG_ON(total_refs < inflight_refs); if (total_refs == inflight_refs) { list_move_tail(>link, _candidates); - u->gc_candidate = 1; - u->gc_maybe_cycle = 1; + __set_bit(UNIX_GC_CANDIDATE, >gc_flags); + __set_bit(UNIX_GC_MAYBE_CYCLE, >gc_flags); } } @@ -344,7 +344,7 @@ void unix_gc(void) if (atomic_long_read(>inflight) > 0) { list_move_tail(>link, _cycle_list); - u->gc_maybe_cycle = 0; + __clear_bit(UNIX_GC_MAYBE_CYCLE, >gc_flags); scan_children(>sk, inc_inflight_move_tail, NULL); } } @@ -356,7 +356,7 @@ void unix_gc(void) */ while (!list_empty(_cycle_list)) { u = list_entry(not_cycle_list.next, struct unix_sock, link); - u->gc_candidate = 0; + __clear_bit(UNIX_GC_CANDIDATE, >gc_flags); list_move_tail(>link, _inflight_list); } -- 1.9.1
[PATCH 3.4 091/125] ftrace/scripts: Have recordmcount copy the object file
From: "Steven Rostedt (Red Hat)" 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit a50bd43935586420fb75f4558369eb08566fac5e upstream. Russell King found that he had weird side effects when compiling the kernel with hard linked ccache. The reason was that recordmcount modified the kernel in place via mmap, and when a file gets modified twice by recordmcount, it will complain about it. To fix this issue, Russell wrote a patch that checked if the file was hard linked more than once and would unlink it if it was. Linus Torvalds was not happy with the fact that recordmcount does this in place modification. Instead of doing the unlink only if the file has two or more hard links, it does the unlink all the time. In otherwords, it always does a copy if it changed something. That is, it does the write out if a change was made. Signed-off-by: Steven Rostedt Signed-off-by: Zefan Li --- scripts/recordmcount.c | 145 + 1 file changed, 110 insertions(+), 35 deletions(-) diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c index 4eb047a..0970379 100644 --- a/scripts/recordmcount.c +++ b/scripts/recordmcount.c @@ -35,12 +35,17 @@ static int fd_map; /* File descriptor for file being modified. */ static int mmap_failed; /* Boolean flag. */ -static void *ehdr_curr; /* current ElfXX_Ehdr * for resource cleanup */ static char gpfx; /* prefix for global symbol name (sometimes '_') */ static struct stat sb; /* Remember .st_size, etc. */ static jmp_buf jmpenv; /* setjmp/longjmp per-file error escape */ static const char *altmcount; /* alternate mcount symbol name */ static int warn_on_notrace_sect; /* warn when section has mcount not being recorded */ +static void *file_map; /* pointer of the mapped file */ +static void *file_end; /* pointer to the end of the mapped file */ +static int file_updated; /* flag to state file was changed */ +static void *file_ptr; /* current file pointer location */ +static void *file_append; /* added to the end of the file */ +static size_t file_append_size; /* how much is added to end of file */ /* setjmp() return values */ enum { @@ -54,10 +59,14 @@ static void cleanup(void) { if (!mmap_failed) - munmap(ehdr_curr, sb.st_size); + munmap(file_map, sb.st_size); else - free(ehdr_curr); - close(fd_map); + free(file_map); + file_map = NULL; + free(file_append); + file_append = NULL; + file_append_size = 0; + file_updated = 0; } static void __attribute__((noreturn)) @@ -79,12 +88,22 @@ succeed_file(void) static off_t ulseek(int const fd, off_t const offset, int const whence) { - off_t const w = lseek(fd, offset, whence); - if (w == (off_t)-1) { - perror("lseek"); + switch (whence) { + case SEEK_SET: + file_ptr = file_map + offset; + break; + case SEEK_CUR: + file_ptr += offset; + break; + case SEEK_END: + file_ptr = file_map + (sb.st_size - offset); + break; + } + if (file_ptr < file_map) { + fprintf(stderr, "lseek: seek before file\n"); fail_file(); } - return w; + return file_ptr - file_map; } static size_t @@ -101,12 +120,38 @@ uread(int const fd, void *const buf, size_t const count) static size_t uwrite(int const fd, void const *const buf, size_t const count) { - size_t const n = write(fd, buf, count); - if (n != count) { - perror("write"); - fail_file(); + size_t cnt = count; + off_t idx = 0; + + file_updated = 1; + + if (file_ptr + count >= file_end) { + off_t aoffset = (file_ptr + count) - file_end; + + if (aoffset > file_append_size) { + file_append = realloc(file_append, aoffset); + file_append_size = aoffset; + } + if (!file_append) { + perror("write"); + fail_file(); + } + if (file_ptr < file_end) { + cnt = file_end - file_ptr; + } else { + cnt = 0; + idx = aoffset - count; + } } - return n; + + if (cnt) + memcpy(file_ptr, buf, cnt); + + if (cnt < count) + memcpy(file_append + idx, buf + cnt, count - cnt); + + file_ptr += count; + return count; } static void * @@ -163,9 +208,7 @@ static int make_nop_x86(void *map, size_t const offset) */ static void *mmap_file(char const *fname) { - void *addr; - - fd_map = open(fname, O_RDWR); + fd_map = open(fname, O_RDONLY); if (fd_map < 0 || fstat(fd_map, ) < 0) {
[PATCH 3.4 117/125] ipv6: update ip6_rt_last_gc every time GC is run
From: Michal Kubeček 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 49a18d86f66d33a20144ecb5a34bba0d1856b260 upstream. As pointed out by Eric Dumazet, net->ipv6.ip6_rt_last_gc should hold the last time garbage collector was run so that we should update it whenever fib6_run_gc() calls fib6_clean_all(), not only if we got there from ip6_dst_gc(). Signed-off-by: Michal Kubecek Signed-off-by: David S. Miller Signed-off-by: Zefan Li --- net/ipv6/ip6_fib.c | 6 +- net/ipv6/route.c | 4 +--- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index fc5ce6e..e6b7a00 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -1595,6 +1595,8 @@ static DEFINE_SPINLOCK(fib6_gc_lock); void fib6_run_gc(unsigned long expires, struct net *net, bool force) { + unsigned long now; + if (force) { spin_lock_bh(_gc_lock); } else if (!spin_trylock_bh(_gc_lock)) { @@ -1607,10 +1609,12 @@ void fib6_run_gc(unsigned long expires, struct net *net, bool force) gc_args.more = icmp6_dst_gc(); fib6_clean_all(net, fib6_age, 0, NULL); + now = jiffies; + net->ipv6.ip6_rt_last_gc = now; if (gc_args.more) mod_timer(>ipv6.ip6_fib_timer, - round_jiffies(jiffies + round_jiffies(now + net->ipv6.sysctl.ip6_rt_gc_interval)); else del_timer(>ipv6.ip6_fib_timer); diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 7ab7f8a..28957ba 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1230,7 +1230,6 @@ static void icmp6_clean_all(int (*func)(struct rt6_info *rt, void *arg), static int ip6_dst_gc(struct dst_ops *ops) { - unsigned long now = jiffies; struct net *net = container_of(ops, struct net, ipv6.ip6_dst_ops); int rt_min_interval = net->ipv6.sysctl.ip6_rt_gc_min_interval; int rt_max_size = net->ipv6.sysctl.ip6_rt_max_size; @@ -1240,13 +1239,12 @@ static int ip6_dst_gc(struct dst_ops *ops) int entries; entries = dst_entries_get_fast(ops); - if (time_after(rt_last_gc + rt_min_interval, now) && + if (time_after(rt_last_gc + rt_min_interval, jiffies) && entries <= rt_max_size) goto out; net->ipv6.ip6_rt_gc_expire++; fib6_run_gc(net->ipv6.ip6_rt_gc_expire, net, entries > rt_max_size); - net->ipv6.ip6_rt_last_gc = now; entries = dst_entries_get_slow(ops); if (entries < ops->gc_thresh) net->ipv6.ip6_rt_gc_expire = rt_gc_timeout>>1; -- 1.9.1
[PATCH 3.4 112/125] USB: ti_usb_3410_502: Fix ID table size
From: Ben Hutchings 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- Commit 35a2fbc941ac ("USB: serial: ti_usb_3410_5052: new device id for Abbot strip port cable") failed to update the size of the ti_id_table_3410 array. This doesn't need to be fixed upstream following commit d7ece6515e12 ("USB: ti_usb_3410_5052: remove vendor/product module parameters") but should be fixed in stable branches older than 3.12. Backports of commit c9d09dc7ad10 ("USB: serial: ti_usb_3410_5052: add Abbott strip port ID to combined table as well.") similarly failed to update the size of the ti_id_table_combined array. Signed-off-by: Ben Hutchings Signed-off-by: Zefan Li --- drivers/usb/serial/ti_usb_3410_5052.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/usb/serial/ti_usb_3410_5052.c b/drivers/usb/serial/ti_usb_3410_5052.c index 2575779..974c4fa 100644 --- a/drivers/usb/serial/ti_usb_3410_5052.c +++ b/drivers/usb/serial/ti_usb_3410_5052.c @@ -164,7 +164,7 @@ static unsigned int product_5052_count; /* the array dimension is the number of default entries plus */ /* TI_EXTRA_VID_PID_COUNT user defined entries plus 1 terminating */ /* null entry */ -static struct usb_device_id ti_id_table_3410[15+TI_EXTRA_VID_PID_COUNT+1] = { +static struct usb_device_id ti_id_table_3410[16+TI_EXTRA_VID_PID_COUNT+1] = { { USB_DEVICE(TI_VENDOR_ID, TI_3410_PRODUCT_ID) }, { USB_DEVICE(TI_VENDOR_ID, TI_3410_EZ430_ID) }, { USB_DEVICE(MTS_VENDOR_ID, MTS_GSM_NO_FW_PRODUCT_ID) }, @@ -190,7 +190,7 @@ static struct usb_device_id ti_id_table_5052[5+TI_EXTRA_VID_PID_COUNT+1] = { { USB_DEVICE(TI_VENDOR_ID, TI_5052_FIRMWARE_PRODUCT_ID) }, }; -static struct usb_device_id ti_id_table_combined[19+2*TI_EXTRA_VID_PID_COUNT+1] = { +static struct usb_device_id ti_id_table_combined[20+2*TI_EXTRA_VID_PID_COUNT+1] = { { USB_DEVICE(TI_VENDOR_ID, TI_3410_PRODUCT_ID) }, { USB_DEVICE(TI_VENDOR_ID, TI_3410_EZ430_ID) }, { USB_DEVICE(MTS_VENDOR_ID, MTS_GSM_NO_FW_PRODUCT_ID) }, -- 1.9.1
[PATCH 3.4 097/125] xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled
From: Konrad Rzeszutek Wilk 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 56441f3c8e5bd45aab10dd9f8c505dd4bec03b0d upstream. The guest sequence of: a) XEN_PCI_OP_enable_msi b) XEN_PCI_OP_enable_msi c) XEN_PCI_OP_disable_msi results in hitting an BUG_ON condition in the msi.c code. The MSI code uses an dev->msi_list to which it adds MSI entries. Under the above conditions an BUG_ON() can be hit. The device passed in the guest MUST have MSI capability. The a) adds the entry to the dev->msi_list and sets msi_enabled. The b) adds a second entry but adding in to SysFS fails (duplicate entry) and deletes all of the entries from msi_list and returns (with msi_enabled is still set). c) pci_disable_msi passes the msi_enabled checks and hits: BUG_ON(list_empty(dev_to_msi_list(>dev))); and blows up. The patch adds a simple check in the XEN_PCI_OP_enable_msi to guard against that. The check for msix_enabled is not stricly neccessary. This is part of XSA-157. Reviewed-by: David Vrabel Reviewed-by: Jan Beulich Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Zefan Li --- drivers/xen/xen-pciback/pciback_ops.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/xen/xen-pciback/pciback_ops.c b/drivers/xen/xen-pciback/pciback_ops.c index a751a66..1ab998c 100644 --- a/drivers/xen/xen-pciback/pciback_ops.c +++ b/drivers/xen/xen-pciback/pciback_ops.c @@ -143,7 +143,12 @@ int xen_pcibk_enable_msi(struct xen_pcibk_device *pdev, if (unlikely(verbose_request)) printk(KERN_DEBUG DRV_NAME ": %s: enable MSI\n", pci_name(dev)); - status = pci_enable_msi(dev); + if (dev->msi_enabled) + status = -EALREADY; + else if (dev->msix_enabled) + status = -ENXIO; + else + status = pci_enable_msi(dev); if (status) { pr_warn_ratelimited(DRV_NAME ": %s: error enabling MSI for guest %u: err %d\n", -- 1.9.1
[PATCH 3.4 116/125] sctp: Prevent soft lockup when sctp_accept() is called during a timeout event
From: Karl Heiss 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 635682a14427d241bab7bbdeebb48a7d7b91638e upstream. A case can occur when sctp_accept() is called by the user during a heartbeat timeout event after the 4-way handshake. Since sctp_assoc_migrate() changes both assoc->base.sk and assoc->ep, the bh_sock_lock in sctp_generate_heartbeat_event() will be taken with the listening socket but released with the new association socket. The result is a deadlock on any future attempts to take the listening socket lock. Note that this race can occur with other SCTP timeouts that take the bh_lock_sock() in the event sctp_accept() is called. BUG: soft lockup - CPU#9 stuck for 67s! [swapper:0] ... RIP: 0010:[] [] _spin_lock+0x1e/0x30 RSP: 0018:880028323b20 EFLAGS: 0206 RAX: 0002 RBX: 880028323b20 RCX: RDX: RSI: 880028323be0 RDI: 8804632c4b48 RBP: 8100bb93 R08: R09: R10: 880610662280 R11: 0100 R12: 880028323aa0 R13: 8804383c3880 R14: 880028323a90 R15: 81534225 FS: () GS:88002832() knlGS: CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b CR2: 006df528 CR3: 01a85000 CR4: 06e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process swapper (pid: 0, threadinfo 880616b7, task 880616b6cab0) Stack: 880028323c40 a01c2582 880614cfb020 0100 0014383a6c44 8804383c3880 880614e93c00 880614e93c00 8804632c4b00 8804383c38b8 Call Trace: [] ? sctp_rcv+0x492/0xa10 [sctp] [] ? nf_iterate+0x69/0xb0 [] ? ip_local_deliver_finish+0x0/0x2d0 [] ? nf_hook_slow+0x76/0x120 [] ? ip_local_deliver_finish+0x0/0x2d0 [] ? ip_local_deliver_finish+0xdd/0x2d0 [] ? ip_local_deliver+0x98/0xa0 [] ? ip_rcv_finish+0x12d/0x440 [] ? ip_rcv+0x275/0x350 [] ? __netif_receive_skb+0x4ab/0x750 ... With lockdep debugging: = [ BUG: bad unlock balance detected! ] - CslRx/12087 is trying to release lock (slock-AF_INET) at: [] sctp_generate_timeout_event+0x40/0xe0 [sctp] but there are no more locks to release! other info that might help us debug this: 2 locks held by CslRx/12087: #0: (>timers[i]){+.-...}, at: [] run_timer_softirq+0x16f/0x3e0 #1: (slock-AF_INET){+.-...}, at: [] sctp_generate_timeout_event+0x23/0xe0 [sctp] Ensure the socket taken is also the same one that is released by saving a copy of the socket before entering the timeout event critical section. Signed-off-by: Karl Heiss Signed-off-by: David S. Miller [bwh: Backported to 3.2: - Net namespaces are not used - Keep using sctp_bh_{,un}lock_sock() - Adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Zefan Li --- net/sctp/sm_sideeffect.c | 34 +++--- 1 file changed, 19 insertions(+), 15 deletions(-) diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c index 5fa033a..06c75b1 100644 --- a/net/sctp/sm_sideeffect.c +++ b/net/sctp/sm_sideeffect.c @@ -249,11 +249,12 @@ void sctp_generate_t3_rtx_event(unsigned long peer) int error; struct sctp_transport *transport = (struct sctp_transport *) peer; struct sctp_association *asoc = transport->asoc; + struct sock *sk = asoc->base.sk; /* Check whether a task is in the sock. */ - sctp_bh_lock_sock(asoc->base.sk); - if (sock_owned_by_user(asoc->base.sk)) { + sctp_bh_lock_sock(sk); + if (sock_owned_by_user(sk)) { SCTP_DEBUG_PRINTK("%s:Sock is busy.\n", __func__); /* Try again later. */ @@ -276,10 +277,10 @@ void sctp_generate_t3_rtx_event(unsigned long peer) transport, GFP_ATOMIC); if (error) - asoc->base.sk->sk_err = -error; + sk->sk_err = -error; out_unlock: - sctp_bh_unlock_sock(asoc->base.sk); + sctp_bh_unlock_sock(sk); sctp_transport_put(transport); } @@ -289,10 +290,11 @@ out_unlock: static void sctp_generate_timeout_event(struct sctp_association *asoc, sctp_event_timeout_t timeout_type) { + struct sock *sk = asoc->base.sk; int error = 0; - sctp_bh_lock_sock(asoc->base.sk); - if (sock_owned_by_user(asoc->base.sk)) { + sctp_bh_lock_sock(sk); + if (sock_owned_by_user(sk)) { SCTP_DEBUG_PRINTK("%s:Sock is busy: timer %d\n", __func__, timeout_type); @@ -316,10 +318,10 @@ static void sctp_generate_timeout_event(struct sctp_association *asoc, (void
[PATCH 3.4 118/125] ipv6: don't call fib6_run_gc() until routing is ready
From: Michal Kubeček 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 2c861cc65ef4604011a0082e4dcdba2819aa191a upstream. When loading the ipv6 module, ndisc_init() is called before ip6_route_init(). As the former registers a handler calling fib6_run_gc(), this opens a window to run the garbage collector before necessary data structures are initialized. If a network device is initialized in this window, adding MAC address to it triggers a NETDEV_CHANGEADDR event, leading to a crash in fib6_clean_all(). Take the event handler registration out of ndisc_init() into a separate function ndisc_late_init() and move it after ip6_route_init(). Signed-off-by: Michal Kubecek Signed-off-by: David S. Miller Signed-off-by: Zefan Li --- include/net/ndisc.h | 2 ++ net/ipv6/af_inet6.c | 6 ++ net/ipv6/ndisc.c| 18 +++--- 3 files changed, 19 insertions(+), 7 deletions(-) diff --git a/include/net/ndisc.h b/include/net/ndisc.h index 6f9c25a..cd205e9 100644 --- a/include/net/ndisc.h +++ b/include/net/ndisc.h @@ -117,7 +117,9 @@ static inline struct neighbour *__ipv6_neigh_lookup(struct neigh_table *tbl, str } extern int ndisc_init(void); +extern int ndisc_late_init(void); +extern voidndisc_late_cleanup(void); extern voidndisc_cleanup(void); extern int ndisc_rcv(struct sk_buff *skb); diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 5300ef3..8ddb56f 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -1161,6 +1161,9 @@ static int __init inet6_init(void) err = ip6_route_init(); if (err) goto ip6_route_fail; + err = ndisc_late_init(); + if (err) + goto ndisc_late_fail; err = ip6_flowlabel_init(); if (err) goto ip6_flowlabel_fail; @@ -1221,6 +1224,8 @@ ipv6_exthdrs_fail: addrconf_fail: ip6_flowlabel_cleanup(); ip6_flowlabel_fail: + ndisc_late_cleanup(); +ndisc_late_fail: ip6_route_cleanup(); ip6_route_fail: #ifdef CONFIG_PROC_FS @@ -1288,6 +1293,7 @@ static void __exit inet6_exit(void) ipv6_exthdrs_exit(); addrconf_cleanup(); ip6_flowlabel_cleanup(); + ndisc_late_cleanup(); ip6_route_cleanup(); #ifdef CONFIG_PROC_FS diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index e235b4c..02e6568 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -1867,24 +1867,28 @@ int __init ndisc_init(void) if (err) goto out_unregister_pernet; #endif - err = register_netdevice_notifier(_netdev_notifier); - if (err) - goto out_unregister_sysctl; out: return err; -out_unregister_sysctl: #ifdef CONFIG_SYSCTL - neigh_sysctl_unregister(_tbl.parms); out_unregister_pernet: -#endif unregister_pernet_subsys(_net_ops); goto out; +#endif } -void ndisc_cleanup(void) +int __init ndisc_late_init(void) +{ + return register_netdevice_notifier(_netdev_notifier); +} + +void ndisc_late_cleanup(void) { unregister_netdevice_notifier(_netdev_notifier); +} + +void ndisc_cleanup(void) +{ #ifdef CONFIG_SYSCTL neigh_sysctl_unregister(_tbl.parms); #endif -- 1.9.1
[PATCH 3.4 111/125] af_unix: fix a fatal race with bit fields
From: Eric Dumazet 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 60bc851ae59bfe99be6ee89d6bc50008c85ec75d upstream. Using bit fields is dangerous on ppc64/sparc64, as the compiler [1] uses 64bit instructions to manipulate them. If the 64bit word includes any atomic_t or spinlock_t, we can lose critical concurrent changes. This is happening in af_unix, where unix_sk(sk)->gc_candidate/ gc_maybe_cycle/lock share the same 64bit word. This leads to fatal deadlock, as one/several cpus spin forever on a spinlock that will never be available again. A safer way would be to use a long to store flags. This way we are sure compiler/arch wont do bad things. As we own unix_gc_lock spinlock when clearing or setting bits, we can use the non atomic __set_bit()/__clear_bit(). recursion_level can share the same 64bit location with the spinlock, as it is set only with this spinlock held. [1] bug fixed in gcc-4.8.0 : http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080 Reported-by: Ambrose Feinstein Signed-off-by: Eric Dumazet Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Signed-off-by: David S. Miller Cc: hejianet Signed-off-by: Zefan Li --- include/net/af_unix.h | 5 +++-- net/unix/garbage.c| 12 ++-- 2 files changed, 9 insertions(+), 8 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index d29a576..f3cbf1c 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -56,9 +56,10 @@ struct unix_sock { struct list_headlink; atomic_long_t inflight; spinlock_t lock; - unsigned intgc_candidate : 1; - unsigned intgc_maybe_cycle : 1; unsigned char recursion_level; + unsigned long gc_flags; +#define UNIX_GC_CANDIDATE 0 +#define UNIX_GC_MAYBE_CYCLE1 struct socket_wqpeer_wq; wait_queue_tpeer_wake; }; diff --git a/net/unix/garbage.c b/net/unix/garbage.c index b6f4b99..00d3e56 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -185,7 +185,7 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *), * have been added to the queues after * starting the garbage collection */ - if (u->gc_candidate) { + if (test_bit(UNIX_GC_CANDIDATE, >gc_flags)) { hit = true; func(u); } @@ -254,7 +254,7 @@ static void inc_inflight_move_tail(struct unix_sock *u) * of the list, so that it's checked even if it was already * passed over */ - if (u->gc_maybe_cycle) + if (test_bit(UNIX_GC_MAYBE_CYCLE, >gc_flags)) list_move_tail(>link, _candidates); } @@ -315,8 +315,8 @@ void unix_gc(void) BUG_ON(total_refs < inflight_refs); if (total_refs == inflight_refs) { list_move_tail(>link, _candidates); - u->gc_candidate = 1; - u->gc_maybe_cycle = 1; + __set_bit(UNIX_GC_CANDIDATE, >gc_flags); + __set_bit(UNIX_GC_MAYBE_CYCLE, >gc_flags); } } @@ -344,7 +344,7 @@ void unix_gc(void) if (atomic_long_read(>inflight) > 0) { list_move_tail(>link, _cycle_list); - u->gc_maybe_cycle = 0; + __clear_bit(UNIX_GC_MAYBE_CYCLE, >gc_flags); scan_children(>sk, inc_inflight_move_tail, NULL); } } @@ -356,7 +356,7 @@ void unix_gc(void) */ while (!list_empty(_cycle_list)) { u = list_entry(not_cycle_list.next, struct unix_sock, link); - u->gc_candidate = 0; + __clear_bit(UNIX_GC_CANDIDATE, >gc_flags); list_move_tail(>link, _inflight_list); } -- 1.9.1
[PATCH 3.4 034/125] usb: musb: core: fix order of arguments to ulpi write callback
From: Uwe Kleine-König 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 705e63d2b29c8bbf091119084544d353bda70393 upstream. There is a bit of a mess in the order of arguments to the ulpi write callback. There is int ulpi_write(struct ulpi *ulpi, u8 addr, u8 val) in drivers/usb/common/ulpi.c; struct usb_phy_io_ops { ... int (*write)(struct usb_phy *x, u32 val, u32 reg); } in include/linux/usb/phy.h. The callback registered by the musb driver has to comply to the latter, but up to now had "offset" first which effectively made the function broken for correct users. So flip the order and while at it also switch to the parameter names of struct usb_phy_io_ops's write. Fixes: ffb865b1e460 ("usb: musb: add ulpi access operations") Signed-off-by: Uwe Kleine-König Signed-off-by: Felipe Balbi Signed-off-by: Zefan Li --- drivers/usb/musb/musb_core.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/usb/musb/musb_core.c b/drivers/usb/musb/musb_core.c index d3481c4..db25165 100644 --- a/drivers/usb/musb/musb_core.c +++ b/drivers/usb/musb/musb_core.c @@ -131,7 +131,7 @@ static inline struct musb *dev_to_musb(struct device *dev) /*-*/ #ifndef CONFIG_BLACKFIN -static int musb_ulpi_read(struct usb_phy *phy, u32 offset) +static int musb_ulpi_read(struct usb_phy *phy, u32 reg) { void __iomem *addr = phy->io_priv; int i = 0; @@ -150,7 +150,7 @@ static int musb_ulpi_read(struct usb_phy *phy, u32 offset) * ULPICarKitControlDisableUTMI after clearing POWER_SUSPENDM. */ - musb_writeb(addr, MUSB_ULPI_REG_ADDR, (u8)offset); + musb_writeb(addr, MUSB_ULPI_REG_ADDR, (u8)reg); musb_writeb(addr, MUSB_ULPI_REG_CONTROL, MUSB_ULPI_REG_REQ | MUSB_ULPI_RDN_WR); @@ -175,7 +175,7 @@ out: return ret; } -static int musb_ulpi_write(struct usb_phy *phy, u32 offset, u32 data) +static int musb_ulpi_write(struct usb_phy *phy, u32 val, u32 reg) { void __iomem *addr = phy->io_priv; int i = 0; @@ -190,8 +190,8 @@ static int musb_ulpi_write(struct usb_phy *phy, u32 offset, u32 data) power &= ~MUSB_POWER_SUSPENDM; musb_writeb(addr, MUSB_POWER, power); - musb_writeb(addr, MUSB_ULPI_REG_ADDR, (u8)offset); - musb_writeb(addr, MUSB_ULPI_REG_DATA, (u8)data); + musb_writeb(addr, MUSB_ULPI_REG_ADDR, (u8)reg); + musb_writeb(addr, MUSB_ULPI_REG_DATA, (u8)val); musb_writeb(addr, MUSB_ULPI_REG_CONTROL, MUSB_ULPI_REG_REQ); while (!(musb_readb(addr, MUSB_ULPI_REG_CONTROL) -- 1.9.1
[PATCH 3.4 051/125] USB: option: add XS Stick W100-2 from 4G Systems
From: Bjørn Mork3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 638148e20c7f8f6e95017fdc13bce8549a6925e0 upstream. Thomas reports " 4gsystems sells two total different LTE-surfsticks under the same name. .. The newer version of XS Stick W100 is from "omega" .. Under windows the driver switches to the same ID, and uses MI03\6 for network and MI01\6 for modem. .. echo "1c9e 9b01" > /sys/bus/usb/drivers/qmi_wwan/new_id echo "1c9e 9b01" > /sys/bus/usb-serial/drivers/option1/new_id T: Bus=01 Lev=01 Prnt=01 Port=03 Cnt=01 Dev#= 4 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1c9e ProdID=9b01 Rev=02.32 S: Manufacturer=USB Modem S: Product=USB Modem S: SerialNumber= C: #Ifs= 5 Cfg#= 1 Atr=80 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan I: If#= 4 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage Now all important things are there: wwp0s29f7u2i3 (net), ttyUSB2 (at), cdc-wdm0 (qmi), ttyUSB1 (at) There is also ttyUSB0, but it is not usable, at least not for at. The device works well with qmi and ModemManager-NetworkManager. " Reported-by: Thomas Schäfer Signed-off-by: Bjørn Mork Signed-off-by: Greg Kroah-Hartman Signed-off-by: Zefan Li --- drivers/usb/serial/option.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index d5febd4..1852ca6 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -352,6 +352,7 @@ static void option_instat_callback(struct urb *urb); /* This is the 4G XS Stick W14 a.k.a. Mobilcom Debitel Surf-Stick * * It seems to contain a Qualcomm QSC6240/6290 chipset*/ #define FOUR_G_SYSTEMS_PRODUCT_W14 0x9603 +#define FOUR_G_SYSTEMS_PRODUCT_W1000x9b01 /* iBall 3.5G connect wireless modem */ #define IBALL_3_5G_CONNECT 0x9605 @@ -525,6 +526,11 @@ static const struct option_blacklist_info four_g_w14_blacklist = { .sendsetup = BIT(0) | BIT(1), }; +static const struct option_blacklist_info four_g_w100_blacklist = { + .sendsetup = BIT(1) | BIT(2), + .reserved = BIT(3), +}; + static const struct option_blacklist_info alcatel_x200_blacklist = { .sendsetup = BIT(0) | BIT(1), .reserved = BIT(4), @@ -1621,6 +1627,9 @@ static const struct usb_device_id option_ids[] = { { USB_DEVICE(LONGCHEER_VENDOR_ID, FOUR_G_SYSTEMS_PRODUCT_W14), .driver_info = (kernel_ulong_t)_g_w14_blacklist }, + { USB_DEVICE(LONGCHEER_VENDOR_ID, FOUR_G_SYSTEMS_PRODUCT_W100), + .driver_info = (kernel_ulong_t)_g_w100_blacklist + }, { USB_DEVICE_INTERFACE_CLASS(LONGCHEER_VENDOR_ID, SPEEDUP_PRODUCT_SU9800, 0xff) }, { USB_DEVICE(LONGCHEER_VENDOR_ID, ZOOM_PRODUCT_4597) }, { USB_DEVICE(LONGCHEER_VENDOR_ID, IBALL_3_5G_CONNECT) }, -- 1.9.1
[PATCH 3.4 120/125] Fix incomplete backport of commit 423f04d63cf4
From: Zefan Li3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- Signed-off-by: Zefan Li --- drivers/md/raid1.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index a548eed..a4d994f 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -1272,11 +1272,8 @@ static void error(struct mddev *mddev, struct md_rdev *rdev) set_bit(Blocked, >flags); spin_lock_irqsave(>device_lock, flags); if (test_and_clear_bit(In_sync, >flags)) { - unsigned long flags; - spin_lock_irqsave(>device_lock, flags); mddev->degraded++; set_bit(Faulty, >flags); - spin_unlock_irqrestore(>device_lock, flags); /* * if recovery is running, make sure it aborts. */ -- 1.9.1
[PATCH 3.4 123/125] Revert "USB: Add OTG PET device to TPL"
From: Zefan Li3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- This reverts commit 97fa724b23c3dd22e9c0979ad0e9d260cc6d545d. Conflicts: drivers/usb/core/quirks.c Signed-off-by: Zefan Li --- drivers/usb/core/otg_whitelist.h | 5 - drivers/usb/core/quirks.c| 4 2 files changed, 9 deletions(-) diff --git a/drivers/usb/core/otg_whitelist.h b/drivers/usb/core/otg_whitelist.h index 2753cec..e8cdce5 100644 --- a/drivers/usb/core/otg_whitelist.h +++ b/drivers/usb/core/otg_whitelist.h @@ -59,11 +59,6 @@ static int is_targeted(struct usb_device *dev) le16_to_cpu(dev->descriptor.idProduct) == 0xbadd)) return 0; - /* OTG PET device is always targeted (see OTG 2.0 ECN 6.4.2) */ - if ((le16_to_cpu(dev->descriptor.idVendor) == 0x1a0a && -le16_to_cpu(dev->descriptor.idProduct) == 0x0200)) - return 1; - /* NOTE: can't use usb_match_id() since interface caches * aren't set up yet. this is cut/paste from that code. */ diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c index 32e08dc..90f04a8 100644 --- a/drivers/usb/core/quirks.c +++ b/drivers/usb/core/quirks.c @@ -184,10 +184,6 @@ static const struct usb_device_id usb_interface_quirk_list[] = { { USB_VENDOR_AND_INTERFACE_INFO(0x046d, USB_CLASS_VIDEO, 1, 0), .driver_info = USB_QUIRK_RESET_RESUME }, - /* Protocol and OTG Electrical Test Device */ - { USB_DEVICE(0x1a0a, 0x0200), .driver_info = - USB_QUIRK_LINEAR_UFRAME_INTR_BINTERVAL }, - { } /* terminating entry must be last */ }; -- 1.9.1
[PATCH 3.4 040/125] ip6mr: call del_timer_sync() in ip6mr_free_table()
From: WANG Cong3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 7ba0c47c34a1ea5bc7a24ca67309996cce0569b5 upstream. We need to wait for the flying timers, since we are going to free the mrtable right after it. Cc: Hannes Frederic Sowa Signed-off-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Zefan Li --- net/ipv6/ip6mr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c index abe1f76..84cf871 100644 --- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -333,7 +333,7 @@ static struct mr6_table *ip6mr_new_table(struct net *net, u32 id) static void ip6mr_free_table(struct mr6_table *mrt) { - del_timer(>ipmr_expire_timer); + del_timer_sync(>ipmr_expire_timer); mroute_clean_tables(mrt); kfree(mrt); } -- 1.9.1
[PATCH 3.4 096/125] xen/pciback: Save xen_pci_op commands before processing it
From: Konrad Rzeszutek Wilk3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 8135cf8b092723dbfcc611fe6fdcb3a36c9951c5 upstream. Double fetch vulnerabilities that happen when a variable is fetched twice from shared memory but a security check is only performed the first time. The xen_pcibk_do_op function performs a switch statements on the op->cmd value which is stored in shared memory. Interestingly this can result in a double fetch vulnerability depending on the performed compiler optimization. This patch fixes it by saving the xen_pci_op command before processing it. We also use 'barrier' to make sure that the compiler does not perform any optimization. This is part of XSA155. Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: Jan Beulich Signed-off-by: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Zefan Li --- drivers/xen/xen-pciback/pciback.h | 1 + drivers/xen/xen-pciback/pciback_ops.c | 15 ++- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/xen/xen-pciback/pciback.h b/drivers/xen/xen-pciback/pciback.h index a7def01..7a642e3 100644 --- a/drivers/xen/xen-pciback/pciback.h +++ b/drivers/xen/xen-pciback/pciback.h @@ -37,6 +37,7 @@ struct xen_pcibk_device { struct xen_pci_sharedinfo *sh_info; unsigned long flags; struct work_struct op_work; + struct xen_pci_op op; }; struct xen_pcibk_dev_data { diff --git a/drivers/xen/xen-pciback/pciback_ops.c b/drivers/xen/xen-pciback/pciback_ops.c index d52703c..a751a66 100644 --- a/drivers/xen/xen-pciback/pciback_ops.c +++ b/drivers/xen/xen-pciback/pciback_ops.c @@ -297,9 +297,11 @@ void xen_pcibk_do_op(struct work_struct *data) container_of(data, struct xen_pcibk_device, op_work); struct pci_dev *dev; struct xen_pcibk_dev_data *dev_data = NULL; - struct xen_pci_op *op = >sh_info->op; + struct xen_pci_op *op = >op; int test_intx = 0; + *op = pdev->sh_info->op; + barrier(); dev = xen_pcibk_get_pci_dev(pdev, op->domain, op->bus, op->devfn); if (dev == NULL) @@ -341,6 +343,17 @@ void xen_pcibk_do_op(struct work_struct *data) if ((dev_data->enable_intx != test_intx)) xen_pcibk_control_isr(dev, 0 /* no reset */); } + pdev->sh_info->op.err = op->err; + pdev->sh_info->op.value = op->value; +#ifdef CONFIG_PCI_MSI + if (op->cmd == XEN_PCI_OP_enable_msix && op->err == 0) { + unsigned int i; + + for (i = 0; i < op->value; i++) + pdev->sh_info->op.msix_entries[i].vector = + op->msix_entries[i].vector; + } +#endif /* Tell the driver domain that we're done. */ wmb(); clear_bit(_XEN_PCIF_active, (unsigned long *)>sh_info->flags); -- 1.9.1
[PATCH 3.4 058/125] sched/core: Clear the root_domain cpumasks in init_rootdomain()
From: Xunlei Pang <xlp...@redhat.com> 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 8295c69925ad53ec32ca54ac9fc194ff21bc40e2 upstream. root_domain::rto_mask allocated through alloc_cpumask_var() contains garbage data, this may cause problems. For instance, When doing pull_rt_task(), it may do useless iterations if rto_mask retains some extra garbage bits. Worse still, this violates the isolated domain rule for clustered scheduling using cpuset, because the tasks(with all the cpus allowed) belongs to one root domain can be pulled away into another root domain. The patch cleans the garbage by using zalloc_cpumask_var() instead of alloc_cpumask_var() for root_domain::rto_mask allocation, thereby addressing the issues. Do the same thing for root_domain's other cpumask memembers: dlo_mask, span, and online. Signed-off-by: Xunlei Pang <xlp...@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Mike Galbraith <efa...@gmx.de> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Steven Rostedt <rost...@goodmis.org> Cc: Thomas Gleixner <t...@linutronix.de> Link: http://lkml.kernel.org/r/1449057179-29321-1-git-send-email-xlp...@redhat.com Signed-off-by: Ingo Molnar <mi...@kernel.org> [lizf: there's no rd->dlo_mask, so remove the change to it] Signed-off-by: Zefan Li <lize...@huawei.com> --- kernel/sched/core.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 609a226..e29d800 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5931,11 +5931,11 @@ static int init_rootdomain(struct root_domain *rd) { memset(rd, 0, sizeof(*rd)); - if (!alloc_cpumask_var(>span, GFP_KERNEL)) + if (!zalloc_cpumask_var(>span, GFP_KERNEL)) goto out; - if (!alloc_cpumask_var(>online, GFP_KERNEL)) + if (!zalloc_cpumask_var(>online, GFP_KERNEL)) goto free_span; - if (!alloc_cpumask_var(>rto_mask, GFP_KERNEL)) + if (!zalloc_cpumask_var(>rto_mask, GFP_KERNEL)) goto free_online; if (cpupri_init(>cpupri) != 0) -- 1.9.1
[PATCH 3.4 096/125] xen/pciback: Save xen_pci_op commands before processing it
From: Konrad Rzeszutek Wilk 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 8135cf8b092723dbfcc611fe6fdcb3a36c9951c5 upstream. Double fetch vulnerabilities that happen when a variable is fetched twice from shared memory but a security check is only performed the first time. The xen_pcibk_do_op function performs a switch statements on the op->cmd value which is stored in shared memory. Interestingly this can result in a double fetch vulnerability depending on the performed compiler optimization. This patch fixes it by saving the xen_pci_op command before processing it. We also use 'barrier' to make sure that the compiler does not perform any optimization. This is part of XSA155. Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: Jan Beulich Signed-off-by: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Zefan Li --- drivers/xen/xen-pciback/pciback.h | 1 + drivers/xen/xen-pciback/pciback_ops.c | 15 ++- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/xen/xen-pciback/pciback.h b/drivers/xen/xen-pciback/pciback.h index a7def01..7a642e3 100644 --- a/drivers/xen/xen-pciback/pciback.h +++ b/drivers/xen/xen-pciback/pciback.h @@ -37,6 +37,7 @@ struct xen_pcibk_device { struct xen_pci_sharedinfo *sh_info; unsigned long flags; struct work_struct op_work; + struct xen_pci_op op; }; struct xen_pcibk_dev_data { diff --git a/drivers/xen/xen-pciback/pciback_ops.c b/drivers/xen/xen-pciback/pciback_ops.c index d52703c..a751a66 100644 --- a/drivers/xen/xen-pciback/pciback_ops.c +++ b/drivers/xen/xen-pciback/pciback_ops.c @@ -297,9 +297,11 @@ void xen_pcibk_do_op(struct work_struct *data) container_of(data, struct xen_pcibk_device, op_work); struct pci_dev *dev; struct xen_pcibk_dev_data *dev_data = NULL; - struct xen_pci_op *op = >sh_info->op; + struct xen_pci_op *op = >op; int test_intx = 0; + *op = pdev->sh_info->op; + barrier(); dev = xen_pcibk_get_pci_dev(pdev, op->domain, op->bus, op->devfn); if (dev == NULL) @@ -341,6 +343,17 @@ void xen_pcibk_do_op(struct work_struct *data) if ((dev_data->enable_intx != test_intx)) xen_pcibk_control_isr(dev, 0 /* no reset */); } + pdev->sh_info->op.err = op->err; + pdev->sh_info->op.value = op->value; +#ifdef CONFIG_PCI_MSI + if (op->cmd == XEN_PCI_OP_enable_msix && op->err == 0) { + unsigned int i; + + for (i = 0; i < op->value; i++) + pdev->sh_info->op.msix_entries[i].vector = + op->msix_entries[i].vector; + } +#endif /* Tell the driver domain that we're done. */ wmb(); clear_bit(_XEN_PCIF_active, (unsigned long *)>sh_info->flags); -- 1.9.1
[PATCH 3.4 058/125] sched/core: Clear the root_domain cpumasks in init_rootdomain()
From: Xunlei Pang 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 8295c69925ad53ec32ca54ac9fc194ff21bc40e2 upstream. root_domain::rto_mask allocated through alloc_cpumask_var() contains garbage data, this may cause problems. For instance, When doing pull_rt_task(), it may do useless iterations if rto_mask retains some extra garbage bits. Worse still, this violates the isolated domain rule for clustered scheduling using cpuset, because the tasks(with all the cpus allowed) belongs to one root domain can be pulled away into another root domain. The patch cleans the garbage by using zalloc_cpumask_var() instead of alloc_cpumask_var() for root_domain::rto_mask allocation, thereby addressing the issues. Do the same thing for root_domain's other cpumask memembers: dlo_mask, span, and online. Signed-off-by: Xunlei Pang Signed-off-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Steven Rostedt Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1449057179-29321-1-git-send-email-xlp...@redhat.com Signed-off-by: Ingo Molnar [lizf: there's no rd->dlo_mask, so remove the change to it] Signed-off-by: Zefan Li --- kernel/sched/core.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 609a226..e29d800 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5931,11 +5931,11 @@ static int init_rootdomain(struct root_domain *rd) { memset(rd, 0, sizeof(*rd)); - if (!alloc_cpumask_var(>span, GFP_KERNEL)) + if (!zalloc_cpumask_var(>span, GFP_KERNEL)) goto out; - if (!alloc_cpumask_var(>online, GFP_KERNEL)) + if (!zalloc_cpumask_var(>online, GFP_KERNEL)) goto free_span; - if (!alloc_cpumask_var(>rto_mask, GFP_KERNEL)) + if (!zalloc_cpumask_var(>rto_mask, GFP_KERNEL)) goto free_online; if (cpupri_init(>cpupri) != 0) -- 1.9.1
[PATCH 3.4 123/125] Revert "USB: Add OTG PET device to TPL"
From: Zefan Li 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- This reverts commit 97fa724b23c3dd22e9c0979ad0e9d260cc6d545d. Conflicts: drivers/usb/core/quirks.c Signed-off-by: Zefan Li --- drivers/usb/core/otg_whitelist.h | 5 - drivers/usb/core/quirks.c| 4 2 files changed, 9 deletions(-) diff --git a/drivers/usb/core/otg_whitelist.h b/drivers/usb/core/otg_whitelist.h index 2753cec..e8cdce5 100644 --- a/drivers/usb/core/otg_whitelist.h +++ b/drivers/usb/core/otg_whitelist.h @@ -59,11 +59,6 @@ static int is_targeted(struct usb_device *dev) le16_to_cpu(dev->descriptor.idProduct) == 0xbadd)) return 0; - /* OTG PET device is always targeted (see OTG 2.0 ECN 6.4.2) */ - if ((le16_to_cpu(dev->descriptor.idVendor) == 0x1a0a && -le16_to_cpu(dev->descriptor.idProduct) == 0x0200)) - return 1; - /* NOTE: can't use usb_match_id() since interface caches * aren't set up yet. this is cut/paste from that code. */ diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c index 32e08dc..90f04a8 100644 --- a/drivers/usb/core/quirks.c +++ b/drivers/usb/core/quirks.c @@ -184,10 +184,6 @@ static const struct usb_device_id usb_interface_quirk_list[] = { { USB_VENDOR_AND_INTERFACE_INFO(0x046d, USB_CLASS_VIDEO, 1, 0), .driver_info = USB_QUIRK_RESET_RESUME }, - /* Protocol and OTG Electrical Test Device */ - { USB_DEVICE(0x1a0a, 0x0200), .driver_info = - USB_QUIRK_LINEAR_UFRAME_INTR_BINTERVAL }, - { } /* terminating entry must be last */ }; -- 1.9.1
[PATCH 3.4 040/125] ip6mr: call del_timer_sync() in ip6mr_free_table()
From: WANG Cong 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 7ba0c47c34a1ea5bc7a24ca67309996cce0569b5 upstream. We need to wait for the flying timers, since we are going to free the mrtable right after it. Cc: Hannes Frederic Sowa Signed-off-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Zefan Li --- net/ipv6/ip6mr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c index abe1f76..84cf871 100644 --- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -333,7 +333,7 @@ static struct mr6_table *ip6mr_new_table(struct net *net, u32 id) static void ip6mr_free_table(struct mr6_table *mrt) { - del_timer(>ipmr_expire_timer); + del_timer_sync(>ipmr_expire_timer); mroute_clean_tables(mrt); kfree(mrt); } -- 1.9.1
[PATCH 3.4 051/125] USB: option: add XS Stick W100-2 from 4G Systems
From: Bjørn Mork 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 638148e20c7f8f6e95017fdc13bce8549a6925e0 upstream. Thomas reports " 4gsystems sells two total different LTE-surfsticks under the same name. .. The newer version of XS Stick W100 is from "omega" .. Under windows the driver switches to the same ID, and uses MI03\6 for network and MI01\6 for modem. .. echo "1c9e 9b01" > /sys/bus/usb/drivers/qmi_wwan/new_id echo "1c9e 9b01" > /sys/bus/usb-serial/drivers/option1/new_id T: Bus=01 Lev=01 Prnt=01 Port=03 Cnt=01 Dev#= 4 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1c9e ProdID=9b01 Rev=02.32 S: Manufacturer=USB Modem S: Product=USB Modem S: SerialNumber= C: #Ifs= 5 Cfg#= 1 Atr=80 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan I: If#= 4 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage Now all important things are there: wwp0s29f7u2i3 (net), ttyUSB2 (at), cdc-wdm0 (qmi), ttyUSB1 (at) There is also ttyUSB0, but it is not usable, at least not for at. The device works well with qmi and ModemManager-NetworkManager. " Reported-by: Thomas Schäfer Signed-off-by: Bjørn Mork Signed-off-by: Greg Kroah-Hartman Signed-off-by: Zefan Li --- drivers/usb/serial/option.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index d5febd4..1852ca6 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -352,6 +352,7 @@ static void option_instat_callback(struct urb *urb); /* This is the 4G XS Stick W14 a.k.a. Mobilcom Debitel Surf-Stick * * It seems to contain a Qualcomm QSC6240/6290 chipset*/ #define FOUR_G_SYSTEMS_PRODUCT_W14 0x9603 +#define FOUR_G_SYSTEMS_PRODUCT_W1000x9b01 /* iBall 3.5G connect wireless modem */ #define IBALL_3_5G_CONNECT 0x9605 @@ -525,6 +526,11 @@ static const struct option_blacklist_info four_g_w14_blacklist = { .sendsetup = BIT(0) | BIT(1), }; +static const struct option_blacklist_info four_g_w100_blacklist = { + .sendsetup = BIT(1) | BIT(2), + .reserved = BIT(3), +}; + static const struct option_blacklist_info alcatel_x200_blacklist = { .sendsetup = BIT(0) | BIT(1), .reserved = BIT(4), @@ -1621,6 +1627,9 @@ static const struct usb_device_id option_ids[] = { { USB_DEVICE(LONGCHEER_VENDOR_ID, FOUR_G_SYSTEMS_PRODUCT_W14), .driver_info = (kernel_ulong_t)_g_w14_blacklist }, + { USB_DEVICE(LONGCHEER_VENDOR_ID, FOUR_G_SYSTEMS_PRODUCT_W100), + .driver_info = (kernel_ulong_t)_g_w100_blacklist + }, { USB_DEVICE_INTERFACE_CLASS(LONGCHEER_VENDOR_ID, SPEEDUP_PRODUCT_SU9800, 0xff) }, { USB_DEVICE(LONGCHEER_VENDOR_ID, ZOOM_PRODUCT_4597) }, { USB_DEVICE(LONGCHEER_VENDOR_ID, IBALL_3_5G_CONNECT) }, -- 1.9.1
[PATCH 3.4 120/125] Fix incomplete backport of commit 423f04d63cf4
From: Zefan Li 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- Signed-off-by: Zefan Li --- drivers/md/raid1.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index a548eed..a4d994f 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -1272,11 +1272,8 @@ static void error(struct mddev *mddev, struct md_rdev *rdev) set_bit(Blocked, >flags); spin_lock_irqsave(>device_lock, flags); if (test_and_clear_bit(In_sync, >flags)) { - unsigned long flags; - spin_lock_irqsave(>device_lock, flags); mddev->degraded++; set_bit(Faulty, >flags); - spin_unlock_irqrestore(>device_lock, flags); /* * if recovery is running, make sure it aborts. */ -- 1.9.1
[PATCH 3.4 090/125] scripts: recordmcount: break hardlinks
From: Russell King3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit dd39a26538e37f6c6131e829a4a510787e43c783 upstream. recordmcount edits the file in-place, which can cause problems when using ccache in hardlink mode. Arrange for recordmcount to break a hardlinked object. Link: http://lkml.kernel.org/r/e1a7mvt-et...@rmk-pc.arm.linux.org.uk Signed-off-by: Russell King Signed-off-by: Steven Rostedt Signed-off-by: Zefan Li --- scripts/recordmcount.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c index ee52cb8..4eb047a 100644 --- a/scripts/recordmcount.c +++ b/scripts/recordmcount.c @@ -182,6 +182,20 @@ static void *mmap_file(char const *fname) addr = umalloc(sb.st_size); uread(fd_map, addr, sb.st_size); } + if (sb.st_nlink != 1) { + /* file is hard-linked, break the hard link */ + close(fd_map); + if (unlink(fname) < 0) { + perror(fname); + fail_file(); + } + fd_map = open(fname, O_RDWR | O_CREAT, sb.st_mode); + if (fd_map < 0) { + perror(fname); + fail_file(); + } + uwrite(fd_map, addr, sb.st_size); + } return addr; } -- 1.9.1
[PATCH 3.4 113/125] net: Fix skb csum races when peeking
From: Herbert Xu3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- [ Upstream commit 89c22d8c3b278212eef6a8cc66b570bc840a6f5a ] When we calculate the checksum on the recv path, we store the result in the skb as an optimisation in case we need the checksum again down the line. This is in fact bogus for the MSG_PEEK case as this is done without any locking. So multiple threads can peek and then store the result to the same skb, potentially resulting in bogus skb states. This patch fixes this by only storing the result if the skb is not shared. This preserves the optimisations for the few cases where it can be done safely due to locking or other reasons, e.g., SIOCINQ. Signed-off-by: Herbert Xu Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings Signed-off-by: Zefan Li --- net/core/datagram.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/core/datagram.c b/net/core/datagram.c index ba96ad9..bc412ca 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -695,7 +695,8 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len) if (likely(!sum)) { if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE)) netdev_rx_csum_fault(skb->dev); - skb->ip_summed = CHECKSUM_UNNECESSARY; + if (!skb_shared(skb)) + skb->ip_summed = CHECKSUM_UNNECESSARY; } return sum; } -- 1.9.1
[PATCH 3.4 090/125] scripts: recordmcount: break hardlinks
From: Russell King 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit dd39a26538e37f6c6131e829a4a510787e43c783 upstream. recordmcount edits the file in-place, which can cause problems when using ccache in hardlink mode. Arrange for recordmcount to break a hardlinked object. Link: http://lkml.kernel.org/r/e1a7mvt-et...@rmk-pc.arm.linux.org.uk Signed-off-by: Russell King Signed-off-by: Steven Rostedt Signed-off-by: Zefan Li --- scripts/recordmcount.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c index ee52cb8..4eb047a 100644 --- a/scripts/recordmcount.c +++ b/scripts/recordmcount.c @@ -182,6 +182,20 @@ static void *mmap_file(char const *fname) addr = umalloc(sb.st_size); uread(fd_map, addr, sb.st_size); } + if (sb.st_nlink != 1) { + /* file is hard-linked, break the hard link */ + close(fd_map); + if (unlink(fname) < 0) { + perror(fname); + fail_file(); + } + fd_map = open(fname, O_RDWR | O_CREAT, sb.st_mode); + if (fd_map < 0) { + perror(fname); + fail_file(); + } + uwrite(fd_map, addr, sb.st_size); + } return addr; } -- 1.9.1
[PATCH 3.4 113/125] net: Fix skb csum races when peeking
From: Herbert Xu 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- [ Upstream commit 89c22d8c3b278212eef6a8cc66b570bc840a6f5a ] When we calculate the checksum on the recv path, we store the result in the skb as an optimisation in case we need the checksum again down the line. This is in fact bogus for the MSG_PEEK case as this is done without any locking. So multiple threads can peek and then store the result to the same skb, potentially resulting in bogus skb states. This patch fixes this by only storing the result if the skb is not shared. This preserves the optimisations for the few cases where it can be done safely due to locking or other reasons, e.g., SIOCINQ. Signed-off-by: Herbert Xu Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings Signed-off-by: Zefan Li --- net/core/datagram.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/core/datagram.c b/net/core/datagram.c index ba96ad9..bc412ca 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -695,7 +695,8 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len) if (likely(!sum)) { if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE)) netdev_rx_csum_fault(skb->dev); - skb->ip_summed = CHECKSUM_UNNECESSARY; + if (!skb_shared(skb)) + skb->ip_summed = CHECKSUM_UNNECESSARY; } return sum; } -- 1.9.1
[PATCH 3.4 055/125] vfs: Avoid softlockups with sendfile(2)
From: Jan Kara3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit c2489e07c0a71a56fb2c84bc0ee66cddfca7d068 upstream. The following test program from Dmitry can cause softlockups or RCU stalls as it copies 1GB from tmpfs into eventfd and we don't have any scheduling point at that path in sendfile(2) implementation: int r1 = eventfd(0, 0); int r2 = memfd_create("", 0); unsigned long n = 1<<30; fallocate(r2, 0, 0, n); sendfile(r1, r2, 0, n); Add cond_resched() into __splice_from_pipe() to fix the problem. CC: Dmitry Vyukov Signed-off-by: Jan Kara Signed-off-by: Al Viro Signed-off-by: Zefan Li --- fs/splice.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/splice.c b/fs/splice.c index 4e2309e..8b97331 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -935,6 +935,7 @@ ssize_t __splice_from_pipe(struct pipe_inode_info *pipe, struct splice_desc *sd, splice_from_pipe_begin(sd); do { + cond_resched(); ret = splice_from_pipe_next(pipe, sd); if (ret > 0) ret = splice_from_pipe_feed(pipe, sd, actor); -- 1.9.1
[PATCH 3.4 041/125] net: ip6mr: fix static mfc/dev leaks on table destruction
From: Nikolay Aleksandrov3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 4c6980462f32b4f282c5d8e5f7ea8070e2937725 upstream. Similar to ipv4, when destroying an mrt table the static mfc entries and the static devices are kept, which leads to devices that can never be destroyed (because of refcnt taken) and leaked memory. Make sure that everything is cleaned up on netns destruction. Fixes: 8229efdaef1e ("netns: ip6mr: enable namespace support in ipv6 multicast forwarding code") CC: Benjamin Thery Signed-off-by: Nikolay Aleksandrov Reviewed-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Zefan Li --- net/ipv6/ip6mr.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c index 84cf871..c5fa9df 100644 --- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -117,7 +117,7 @@ static int __ip6mr_fill_mroute(struct mr6_table *mrt, struct sk_buff *skb, struct mfc6_cache *c, struct rtmsg *rtm); static int ip6mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb); -static void mroute_clean_tables(struct mr6_table *mrt); +static void mroute_clean_tables(struct mr6_table *mrt, bool all); static void ipmr_expire_process(unsigned long arg); #ifdef CONFIG_IPV6_MROUTE_MULTIPLE_TABLES @@ -334,7 +334,7 @@ static struct mr6_table *ip6mr_new_table(struct net *net, u32 id) static void ip6mr_free_table(struct mr6_table *mrt) { del_timer_sync(>ipmr_expire_timer); - mroute_clean_tables(mrt); + mroute_clean_tables(mrt, true); kfree(mrt); } @@ -1472,7 +1472,7 @@ static int ip6mr_mfc_add(struct net *net, struct mr6_table *mrt, * Close the multicast socket, and clear the vif tables etc */ -static void mroute_clean_tables(struct mr6_table *mrt) +static void mroute_clean_tables(struct mr6_table *mrt, bool all) { int i; LIST_HEAD(list); @@ -1482,8 +1482,9 @@ static void mroute_clean_tables(struct mr6_table *mrt) * Shut down all active vif entries */ for (i = 0; i < mrt->maxvif; i++) { - if (!(mrt->vif6_table[i].flags & VIFF_STATIC)) - mif6_delete(mrt, i, ); + if (!all && (mrt->vif6_table[i].flags & VIFF_STATIC)) + continue; + mif6_delete(mrt, i, ); } unregister_netdevice_many(); @@ -1492,7 +1493,7 @@ static void mroute_clean_tables(struct mr6_table *mrt) */ for (i = 0; i < MFC6_LINES; i++) { list_for_each_entry_safe(c, next, >mfc6_cache_array[i], list) { - if (c->mfc_flags & MFC_STATIC) + if (!all && (c->mfc_flags & MFC_STATIC)) continue; write_lock_bh(_lock); list_del(>list); @@ -1546,7 +1547,7 @@ int ip6mr_sk_done(struct sock *sk) net->ipv6.devconf_all->mc_forwarding--; write_unlock_bh(_lock); - mroute_clean_tables(mrt); + mroute_clean_tables(mrt, false); err = 0; break; } -- 1.9.1
[PATCH 3.4 036/125] mac80211: mesh: fix call_rcu() usage
From: Johannes Berg3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit c2e703a55245bfff3db53b1f7cbe59f1ee8a4339 upstream. When using call_rcu(), the called function may be delayed quite significantly, and without a matching rcu_barrier() there's no way to be sure it has finished. Therefore, global state that could be gone/freed/reused should never be touched in the callback. Fix this in mesh by moving the atomic_dec() into the caller; that's not really a problem since we already unlinked the path and it will be destroyed anyway. This fixes a crash Jouni observed when running certain tests in a certain order, in which the mesh interface was torn down, the memory reused for a function pointer (work struct) and running that then crashed since the pointer had been decremented by 1, resulting in an invalid instruction byte stream. Fixes: eb2b9311fd00 ("mac80211: mesh path table implementation") Reported-by: Jouni Malinen Signed-off-by: Johannes Berg Signed-off-by: Zefan Li --- net/mac80211/mesh_pathtbl.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/mac80211/mesh_pathtbl.c b/net/mac80211/mesh_pathtbl.c index 49aaefd..7ed81ee 100644 --- a/net/mac80211/mesh_pathtbl.c +++ b/net/mac80211/mesh_pathtbl.c @@ -757,10 +757,8 @@ void mesh_plink_broken(struct sta_info *sta) static void mesh_path_node_reclaim(struct rcu_head *rp) { struct mpath_node *node = container_of(rp, struct mpath_node, rcu); - struct ieee80211_sub_if_data *sdata = node->mpath->sdata; del_timer_sync(>mpath->timer); - atomic_dec(>u.mesh.mpaths); kfree(node->mpath); kfree(node); } @@ -768,8 +766,9 @@ static void mesh_path_node_reclaim(struct rcu_head *rp) /* needs to be called with the corresponding hashwlock taken */ static void __mesh_path_del(struct mesh_table *tbl, struct mpath_node *node) { - struct mesh_path *mpath; - mpath = node->mpath; + struct mesh_path *mpath = node->mpath; + struct ieee80211_sub_if_data *sdata = node->mpath->sdata; + spin_lock(>state_lock); mpath->flags |= MESH_PATH_RESOLVING; if (mpath->is_gate) @@ -777,6 +776,7 @@ static void __mesh_path_del(struct mesh_table *tbl, struct mpath_node *node) hlist_del_rcu(>list); call_rcu(>rcu, mesh_path_node_reclaim); spin_unlock(>state_lock); + atomic_dec(>u.mesh.mpaths); atomic_dec(>entries); } -- 1.9.1
[PATCH 3.4 055/125] vfs: Avoid softlockups with sendfile(2)
From: Jan Kara 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit c2489e07c0a71a56fb2c84bc0ee66cddfca7d068 upstream. The following test program from Dmitry can cause softlockups or RCU stalls as it copies 1GB from tmpfs into eventfd and we don't have any scheduling point at that path in sendfile(2) implementation: int r1 = eventfd(0, 0); int r2 = memfd_create("", 0); unsigned long n = 1<<30; fallocate(r2, 0, 0, n); sendfile(r1, r2, 0, n); Add cond_resched() into __splice_from_pipe() to fix the problem. CC: Dmitry Vyukov Signed-off-by: Jan Kara Signed-off-by: Al Viro Signed-off-by: Zefan Li --- fs/splice.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/splice.c b/fs/splice.c index 4e2309e..8b97331 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -935,6 +935,7 @@ ssize_t __splice_from_pipe(struct pipe_inode_info *pipe, struct splice_desc *sd, splice_from_pipe_begin(sd); do { + cond_resched(); ret = splice_from_pipe_next(pipe, sd); if (ret > 0) ret = splice_from_pipe_feed(pipe, sd, actor); -- 1.9.1
[PATCH 3.4 041/125] net: ip6mr: fix static mfc/dev leaks on table destruction
From: Nikolay Aleksandrov 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 4c6980462f32b4f282c5d8e5f7ea8070e2937725 upstream. Similar to ipv4, when destroying an mrt table the static mfc entries and the static devices are kept, which leads to devices that can never be destroyed (because of refcnt taken) and leaked memory. Make sure that everything is cleaned up on netns destruction. Fixes: 8229efdaef1e ("netns: ip6mr: enable namespace support in ipv6 multicast forwarding code") CC: Benjamin Thery Signed-off-by: Nikolay Aleksandrov Reviewed-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Zefan Li --- net/ipv6/ip6mr.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c index 84cf871..c5fa9df 100644 --- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -117,7 +117,7 @@ static int __ip6mr_fill_mroute(struct mr6_table *mrt, struct sk_buff *skb, struct mfc6_cache *c, struct rtmsg *rtm); static int ip6mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb); -static void mroute_clean_tables(struct mr6_table *mrt); +static void mroute_clean_tables(struct mr6_table *mrt, bool all); static void ipmr_expire_process(unsigned long arg); #ifdef CONFIG_IPV6_MROUTE_MULTIPLE_TABLES @@ -334,7 +334,7 @@ static struct mr6_table *ip6mr_new_table(struct net *net, u32 id) static void ip6mr_free_table(struct mr6_table *mrt) { del_timer_sync(>ipmr_expire_timer); - mroute_clean_tables(mrt); + mroute_clean_tables(mrt, true); kfree(mrt); } @@ -1472,7 +1472,7 @@ static int ip6mr_mfc_add(struct net *net, struct mr6_table *mrt, * Close the multicast socket, and clear the vif tables etc */ -static void mroute_clean_tables(struct mr6_table *mrt) +static void mroute_clean_tables(struct mr6_table *mrt, bool all) { int i; LIST_HEAD(list); @@ -1482,8 +1482,9 @@ static void mroute_clean_tables(struct mr6_table *mrt) * Shut down all active vif entries */ for (i = 0; i < mrt->maxvif; i++) { - if (!(mrt->vif6_table[i].flags & VIFF_STATIC)) - mif6_delete(mrt, i, ); + if (!all && (mrt->vif6_table[i].flags & VIFF_STATIC)) + continue; + mif6_delete(mrt, i, ); } unregister_netdevice_many(); @@ -1492,7 +1493,7 @@ static void mroute_clean_tables(struct mr6_table *mrt) */ for (i = 0; i < MFC6_LINES; i++) { list_for_each_entry_safe(c, next, >mfc6_cache_array[i], list) { - if (c->mfc_flags & MFC_STATIC) + if (!all && (c->mfc_flags & MFC_STATIC)) continue; write_lock_bh(_lock); list_del(>list); @@ -1546,7 +1547,7 @@ int ip6mr_sk_done(struct sock *sk) net->ipv6.devconf_all->mc_forwarding--; write_unlock_bh(_lock); - mroute_clean_tables(mrt); + mroute_clean_tables(mrt, false); err = 0; break; } -- 1.9.1
[PATCH 3.4 036/125] mac80211: mesh: fix call_rcu() usage
From: Johannes Berg 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit c2e703a55245bfff3db53b1f7cbe59f1ee8a4339 upstream. When using call_rcu(), the called function may be delayed quite significantly, and without a matching rcu_barrier() there's no way to be sure it has finished. Therefore, global state that could be gone/freed/reused should never be touched in the callback. Fix this in mesh by moving the atomic_dec() into the caller; that's not really a problem since we already unlinked the path and it will be destroyed anyway. This fixes a crash Jouni observed when running certain tests in a certain order, in which the mesh interface was torn down, the memory reused for a function pointer (work struct) and running that then crashed since the pointer had been decremented by 1, resulting in an invalid instruction byte stream. Fixes: eb2b9311fd00 ("mac80211: mesh path table implementation") Reported-by: Jouni Malinen Signed-off-by: Johannes Berg Signed-off-by: Zefan Li --- net/mac80211/mesh_pathtbl.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/mac80211/mesh_pathtbl.c b/net/mac80211/mesh_pathtbl.c index 49aaefd..7ed81ee 100644 --- a/net/mac80211/mesh_pathtbl.c +++ b/net/mac80211/mesh_pathtbl.c @@ -757,10 +757,8 @@ void mesh_plink_broken(struct sta_info *sta) static void mesh_path_node_reclaim(struct rcu_head *rp) { struct mpath_node *node = container_of(rp, struct mpath_node, rcu); - struct ieee80211_sub_if_data *sdata = node->mpath->sdata; del_timer_sync(>mpath->timer); - atomic_dec(>u.mesh.mpaths); kfree(node->mpath); kfree(node); } @@ -768,8 +766,9 @@ static void mesh_path_node_reclaim(struct rcu_head *rp) /* needs to be called with the corresponding hashwlock taken */ static void __mesh_path_del(struct mesh_table *tbl, struct mpath_node *node) { - struct mesh_path *mpath; - mpath = node->mpath; + struct mesh_path *mpath = node->mpath; + struct ieee80211_sub_if_data *sdata = node->mpath->sdata; + spin_lock(>state_lock); mpath->flags |= MESH_PATH_RESOLVING; if (mpath->is_gate) @@ -777,6 +776,7 @@ static void __mesh_path_del(struct mesh_table *tbl, struct mpath_node *node) hlist_del_rcu(>list); call_rcu(>rcu, mesh_path_node_reclaim); spin_unlock(>state_lock); + atomic_dec(>u.mesh.mpaths); atomic_dec(>entries); } -- 1.9.1
[PATCH 3.4 092/125] xen: Add RING_COPY_REQUEST()
From: David Vrabel3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 454d5d882c7e412b840e3c99010fe81a9862f6fb upstream. Using RING_GET_REQUEST() on a shared ring is easy to use incorrectly (i.e., by not considering that the other end may alter the data in the shared ring while it is being inspected). Safe usage of a request generally requires taking a local copy. Provide a RING_COPY_REQUEST() macro to use instead of RING_GET_REQUEST() and an open-coded memcpy(). This takes care of ensuring that the copy is done correctly regardless of any possible compiler optimizations. Use a volatile source to prevent the compiler from reordering or omitting the copy. This is part of XSA155. Signed-off-by: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Zefan Li --- include/xen/interface/io/ring.h | 14 ++ 1 file changed, 14 insertions(+) diff --git a/include/xen/interface/io/ring.h b/include/xen/interface/io/ring.h index 7d28aff..7dc685b 100644 --- a/include/xen/interface/io/ring.h +++ b/include/xen/interface/io/ring.h @@ -181,6 +181,20 @@ struct __name##_back_ring { \ #define RING_GET_REQUEST(_r, _idx) \ (&((_r)->sring->ring[((_idx) & (RING_SIZE(_r) - 1))].req)) +/* + * Get a local copy of a request. + * + * Use this in preference to RING_GET_REQUEST() so all processing is + * done on a local copy that cannot be modified by the other end. + * + * Note that https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 may cause this + * to be ineffective where _req is a struct which consists of only bitfields. + */ +#define RING_COPY_REQUEST(_r, _idx, _req) do { \ + /* Use volatile to force the copy into _req. */ \ + *(_req) = *(volatile typeof(_req))RING_GET_REQUEST(_r, _idx); \ +} while (0) + #define RING_GET_RESPONSE(_r, _idx)\ (&((_r)->sring->ring[((_idx) & (RING_SIZE(_r) - 1))].rsp)) -- 1.9.1
[PATCH 3.4 057/125] wan/x25: Fix use-after-free in x25_asy_open_tty()
From: Peter Hurley3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit ee9159ddce14bc1dec9435ae4e3bd3153e783706 upstream. The N_X25 line discipline may access the previous line discipline's closed and already-freed private data on open [1]. The tty->disc_data field _never_ refers to valid data on entry to the line discipline's open() method. Rather, the ldisc is expected to initialize that field for its own use for the lifetime of the instance (ie. from open() to close() only). [1] [ 634.336761] == [ 634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr 8800a743efd0 [ 634.339558] Read of size 4 by task syzkaller_execu/8981 [ 634.340359] = [ 634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected ... [ 634.405018] Call Trace: [ 634.405277] dump_stack (lib/dump_stack.c:52) [ 634.405775] print_trailer (mm/slub.c:655) [ 634.406361] object_err (mm/slub.c:662) [ 634.406824] kasan_report_error (mm/kasan/report.c:138 mm/kasan/report.c:236) [ 634.409581] __asan_report_load4_noabort (mm/kasan/report.c:279) [ 634.411355] x25_asy_open_tty (drivers/net/wan/x25_asy.c:559 (discriminator 1)) [ 634.413997] tty_ldisc_open.isra.2 (drivers/tty/tty_ldisc.c:447) [ 634.414549] tty_set_ldisc (drivers/tty/tty_ldisc.c:567) [ 634.415057] tty_ioctl (drivers/tty/tty_io.c:2646 drivers/tty/tty_io.c:2879) [ 634.423524] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607) [ 634.427491] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613) [ 634.427945] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:188) Reported-and-tested-by: Sasha Levin Signed-off-by: Peter Hurley Signed-off-by: David S. Miller Signed-off-by: Zefan Li --- drivers/net/wan/x25_asy.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/net/wan/x25_asy.c b/drivers/net/wan/x25_asy.c index d7a65e1..dadf085 100644 --- a/drivers/net/wan/x25_asy.c +++ b/drivers/net/wan/x25_asy.c @@ -546,16 +546,12 @@ static void x25_asy_receive_buf(struct tty_struct *tty, static int x25_asy_open_tty(struct tty_struct *tty) { - struct x25_asy *sl = tty->disc_data; + struct x25_asy *sl; int err; if (tty->ops->write == NULL) return -EOPNOTSUPP; - /* First make sure we're not already connected. */ - if (sl && sl->magic == X25_ASY_MAGIC) - return -EEXIST; - /* OK. Find a free X.25 channel to use. */ sl = x25_asy_alloc(); if (sl == NULL) -- 1.9.1
[PATCH 3.4 092/125] xen: Add RING_COPY_REQUEST()
From: David Vrabel 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 454d5d882c7e412b840e3c99010fe81a9862f6fb upstream. Using RING_GET_REQUEST() on a shared ring is easy to use incorrectly (i.e., by not considering that the other end may alter the data in the shared ring while it is being inspected). Safe usage of a request generally requires taking a local copy. Provide a RING_COPY_REQUEST() macro to use instead of RING_GET_REQUEST() and an open-coded memcpy(). This takes care of ensuring that the copy is done correctly regardless of any possible compiler optimizations. Use a volatile source to prevent the compiler from reordering or omitting the copy. This is part of XSA155. Signed-off-by: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Zefan Li --- include/xen/interface/io/ring.h | 14 ++ 1 file changed, 14 insertions(+) diff --git a/include/xen/interface/io/ring.h b/include/xen/interface/io/ring.h index 7d28aff..7dc685b 100644 --- a/include/xen/interface/io/ring.h +++ b/include/xen/interface/io/ring.h @@ -181,6 +181,20 @@ struct __name##_back_ring { \ #define RING_GET_REQUEST(_r, _idx) \ (&((_r)->sring->ring[((_idx) & (RING_SIZE(_r) - 1))].req)) +/* + * Get a local copy of a request. + * + * Use this in preference to RING_GET_REQUEST() so all processing is + * done on a local copy that cannot be modified by the other end. + * + * Note that https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 may cause this + * to be ineffective where _req is a struct which consists of only bitfields. + */ +#define RING_COPY_REQUEST(_r, _idx, _req) do { \ + /* Use volatile to force the copy into _req. */ \ + *(_req) = *(volatile typeof(_req))RING_GET_REQUEST(_r, _idx); \ +} while (0) + #define RING_GET_RESPONSE(_r, _idx)\ (&((_r)->sring->ring[((_idx) & (RING_SIZE(_r) - 1))].rsp)) -- 1.9.1
[PATCH 3.4 057/125] wan/x25: Fix use-after-free in x25_asy_open_tty()
From: Peter Hurley 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit ee9159ddce14bc1dec9435ae4e3bd3153e783706 upstream. The N_X25 line discipline may access the previous line discipline's closed and already-freed private data on open [1]. The tty->disc_data field _never_ refers to valid data on entry to the line discipline's open() method. Rather, the ldisc is expected to initialize that field for its own use for the lifetime of the instance (ie. from open() to close() only). [1] [ 634.336761] == [ 634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr 8800a743efd0 [ 634.339558] Read of size 4 by task syzkaller_execu/8981 [ 634.340359] = [ 634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected ... [ 634.405018] Call Trace: [ 634.405277] dump_stack (lib/dump_stack.c:52) [ 634.405775] print_trailer (mm/slub.c:655) [ 634.406361] object_err (mm/slub.c:662) [ 634.406824] kasan_report_error (mm/kasan/report.c:138 mm/kasan/report.c:236) [ 634.409581] __asan_report_load4_noabort (mm/kasan/report.c:279) [ 634.411355] x25_asy_open_tty (drivers/net/wan/x25_asy.c:559 (discriminator 1)) [ 634.413997] tty_ldisc_open.isra.2 (drivers/tty/tty_ldisc.c:447) [ 634.414549] tty_set_ldisc (drivers/tty/tty_ldisc.c:567) [ 634.415057] tty_ioctl (drivers/tty/tty_io.c:2646 drivers/tty/tty_io.c:2879) [ 634.423524] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607) [ 634.427491] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613) [ 634.427945] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:188) Reported-and-tested-by: Sasha Levin Signed-off-by: Peter Hurley Signed-off-by: David S. Miller Signed-off-by: Zefan Li --- drivers/net/wan/x25_asy.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/net/wan/x25_asy.c b/drivers/net/wan/x25_asy.c index d7a65e1..dadf085 100644 --- a/drivers/net/wan/x25_asy.c +++ b/drivers/net/wan/x25_asy.c @@ -546,16 +546,12 @@ static void x25_asy_receive_buf(struct tty_struct *tty, static int x25_asy_open_tty(struct tty_struct *tty) { - struct x25_asy *sl = tty->disc_data; + struct x25_asy *sl; int err; if (tty->ops->write == NULL) return -EOPNOTSUPP; - /* First make sure we're not already connected. */ - if (sl && sl->magic == X25_ASY_MAGIC) - return -EEXIST; - /* OK. Find a free X.25 channel to use. */ sl = x25_asy_alloc(); if (sl == NULL) -- 1.9.1
[PATCH 3.4 062/125] USB: cp210x: Remove CP2110 ID from compatibility list
From: Konstantin Shkolnyy3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 7c90e610b60cd1ed6abafd806acfaedccbbe52d1 upstream. CP2110 ID (0x10c4, 0xea80) doesn't belong here because it's a HID and completely different from CP210x devices. Signed-off-by: Konstantin Shkolnyy Signed-off-by: Johan Hovold Signed-off-by: Zefan Li --- drivers/usb/serial/cp210x.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c index 7a04e2c..b48444b 100644 --- a/drivers/usb/serial/cp210x.c +++ b/drivers/usb/serial/cp210x.c @@ -138,7 +138,6 @@ static const struct usb_device_id id_table[] = { { USB_DEVICE(0x10C4, 0xEA60) }, /* Silicon Labs factory default */ { USB_DEVICE(0x10C4, 0xEA61) }, /* Silicon Labs factory default */ { USB_DEVICE(0x10C4, 0xEA70) }, /* Silicon Labs factory default */ - { USB_DEVICE(0x10C4, 0xEA80) }, /* Silicon Labs factory default */ { USB_DEVICE(0x10C4, 0xEA71) }, /* Infinity GPS-MIC-1 Radio Monophone */ { USB_DEVICE(0x10C4, 0xF001) }, /* Elan Digital Systems USBscope50 */ { USB_DEVICE(0x10C4, 0xF002) }, /* Elan Digital Systems USBwave12 */ -- 1.9.1
[PATCH 3.4 094/125] xen-netback: use RING_COPY_REQUEST() throughout
From: David Vrabel <david.vra...@citrix.com> 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 68a33bfd8403e4e22847165d149823a2e0e67c9c upstream. Instead of open-coding memcpy()s and directly accessing Tx and Rx requests, use the new RING_COPY_REQUEST() that ensures the local copy is correct. This is more than is strictly necessary for guest Rx requests since only the id and gref fields are used and it is harmless if the frontend modifies these. This is part of XSA155. Reviewed-by: Wei Liu <wei.l...@citrix.com> Signed-off-by: David Vrabel <david.vra...@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com> [lizf: Backported to 3.4: - adjust context - s/queue/vif/g] Signed-off-by: Zefan Li <lize...@huawei.com> --- drivers/net/xen-netback/netback.c | 30 ++ 1 file changed, 14 insertions(+), 16 deletions(-) diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 25d4c31..37bcc56 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -398,17 +398,17 @@ static struct netbk_rx_meta *get_next_rx_buffer(struct xenvif *vif, struct netrx_pending_operations *npo) { struct netbk_rx_meta *meta; - struct xen_netif_rx_request *req; + struct xen_netif_rx_request req; - req = RING_GET_REQUEST(>rx, vif->rx.req_cons++); + RING_COPY_REQUEST(>rx, vif->rx.req_cons++, ); meta = npo->meta + npo->meta_prod++; meta->gso_size = 0; meta->size = 0; - meta->id = req->id; + meta->id = req.id; npo->copy_off = 0; - npo->copy_gref = req->gref; + npo->copy_gref = req.gref; return meta; } @@ -510,7 +510,7 @@ static int netbk_gop_skb(struct sk_buff *skb, struct xenvif *vif = netdev_priv(skb->dev); int nr_frags = skb_shinfo(skb)->nr_frags; int i; - struct xen_netif_rx_request *req; + struct xen_netif_rx_request req; struct netbk_rx_meta *meta; unsigned char *data; int head = 1; @@ -520,14 +520,14 @@ static int netbk_gop_skb(struct sk_buff *skb, /* Set up a GSO prefix descriptor, if necessary */ if (skb_shinfo(skb)->gso_size && vif->gso_prefix) { - req = RING_GET_REQUEST(>rx, vif->rx.req_cons++); + RING_COPY_REQUEST(>rx, vif->rx.req_cons++, ); meta = npo->meta + npo->meta_prod++; meta->gso_size = skb_shinfo(skb)->gso_size; meta->size = 0; - meta->id = req->id; + meta->id = req.id; } - req = RING_GET_REQUEST(>rx, vif->rx.req_cons++); + RING_COPY_REQUEST(>rx, vif->rx.req_cons++, ); meta = npo->meta + npo->meta_prod++; if (!vif->gso_prefix) @@ -536,9 +536,9 @@ static int netbk_gop_skb(struct sk_buff *skb, meta->gso_size = 0; meta->size = 0; - meta->id = req->id; + meta->id = req.id; npo->copy_off = 0; - npo->copy_gref = req->gref; + npo->copy_gref = req.gref; data = skb->data; while (data < skb_tail_pointer(skb)) { @@ -882,7 +882,7 @@ static void netbk_tx_err(struct xenvif *vif, make_tx_response(vif, txp, XEN_NETIF_RSP_ERROR); if (cons == end) break; - txp = RING_GET_REQUEST(>tx, cons++); + RING_COPY_REQUEST(>tx, cons++, txp); } while (1); vif->tx.req_cons = cons; xen_netbk_check_rx_xenvif(vif); @@ -943,8 +943,7 @@ static int netbk_count_requests(struct xenvif *vif, drop_err = -E2BIG; } - memcpy(txp, RING_GET_REQUEST(>tx, cons + slots), - sizeof(*txp)); + RING_COPY_REQUEST(>tx, cons + slots, txp); /* If the guest submitted a frame >= 64 KiB then * first->size overflowed and following slots will @@ -1226,8 +1225,7 @@ static int xen_netbk_get_extras(struct xenvif *vif, return -EBADR; } - memcpy(, RING_GET_REQUEST(>tx, cons), - sizeof(extra)); + RING_COPY_REQUEST(>tx, cons, ); if (unlikely(!extra.type || extra.type >= XEN_NETIF_EXTRA_TYPE_MAX)) { vif->tx.req_cons = ++cons; @@ -1422,7 +1420,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) idx = vif->tx.req_cons; rmb(); /* Ensure that we see the request before we copy it. */ - memcpy(, RING_GET_REQUEST(>tx, idx), sizeof(txreq)); + RING_COPY_REQUEST(>tx, idx, ); /* Credit-based scheduling. */ if (txreq.size > vif->remaining_credit && -- 1.9.1
[PATCH 3.4 062/125] USB: cp210x: Remove CP2110 ID from compatibility list
From: Konstantin Shkolnyy 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 7c90e610b60cd1ed6abafd806acfaedccbbe52d1 upstream. CP2110 ID (0x10c4, 0xea80) doesn't belong here because it's a HID and completely different from CP210x devices. Signed-off-by: Konstantin Shkolnyy Signed-off-by: Johan Hovold Signed-off-by: Zefan Li --- drivers/usb/serial/cp210x.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c index 7a04e2c..b48444b 100644 --- a/drivers/usb/serial/cp210x.c +++ b/drivers/usb/serial/cp210x.c @@ -138,7 +138,6 @@ static const struct usb_device_id id_table[] = { { USB_DEVICE(0x10C4, 0xEA60) }, /* Silicon Labs factory default */ { USB_DEVICE(0x10C4, 0xEA61) }, /* Silicon Labs factory default */ { USB_DEVICE(0x10C4, 0xEA70) }, /* Silicon Labs factory default */ - { USB_DEVICE(0x10C4, 0xEA80) }, /* Silicon Labs factory default */ { USB_DEVICE(0x10C4, 0xEA71) }, /* Infinity GPS-MIC-1 Radio Monophone */ { USB_DEVICE(0x10C4, 0xF001) }, /* Elan Digital Systems USBscope50 */ { USB_DEVICE(0x10C4, 0xF002) }, /* Elan Digital Systems USBwave12 */ -- 1.9.1
[PATCH 3.4 094/125] xen-netback: use RING_COPY_REQUEST() throughout
From: David Vrabel 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 68a33bfd8403e4e22847165d149823a2e0e67c9c upstream. Instead of open-coding memcpy()s and directly accessing Tx and Rx requests, use the new RING_COPY_REQUEST() that ensures the local copy is correct. This is more than is strictly necessary for guest Rx requests since only the id and gref fields are used and it is harmless if the frontend modifies these. This is part of XSA155. Reviewed-by: Wei Liu Signed-off-by: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk [lizf: Backported to 3.4: - adjust context - s/queue/vif/g] Signed-off-by: Zefan Li --- drivers/net/xen-netback/netback.c | 30 ++ 1 file changed, 14 insertions(+), 16 deletions(-) diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 25d4c31..37bcc56 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -398,17 +398,17 @@ static struct netbk_rx_meta *get_next_rx_buffer(struct xenvif *vif, struct netrx_pending_operations *npo) { struct netbk_rx_meta *meta; - struct xen_netif_rx_request *req; + struct xen_netif_rx_request req; - req = RING_GET_REQUEST(>rx, vif->rx.req_cons++); + RING_COPY_REQUEST(>rx, vif->rx.req_cons++, ); meta = npo->meta + npo->meta_prod++; meta->gso_size = 0; meta->size = 0; - meta->id = req->id; + meta->id = req.id; npo->copy_off = 0; - npo->copy_gref = req->gref; + npo->copy_gref = req.gref; return meta; } @@ -510,7 +510,7 @@ static int netbk_gop_skb(struct sk_buff *skb, struct xenvif *vif = netdev_priv(skb->dev); int nr_frags = skb_shinfo(skb)->nr_frags; int i; - struct xen_netif_rx_request *req; + struct xen_netif_rx_request req; struct netbk_rx_meta *meta; unsigned char *data; int head = 1; @@ -520,14 +520,14 @@ static int netbk_gop_skb(struct sk_buff *skb, /* Set up a GSO prefix descriptor, if necessary */ if (skb_shinfo(skb)->gso_size && vif->gso_prefix) { - req = RING_GET_REQUEST(>rx, vif->rx.req_cons++); + RING_COPY_REQUEST(>rx, vif->rx.req_cons++, ); meta = npo->meta + npo->meta_prod++; meta->gso_size = skb_shinfo(skb)->gso_size; meta->size = 0; - meta->id = req->id; + meta->id = req.id; } - req = RING_GET_REQUEST(>rx, vif->rx.req_cons++); + RING_COPY_REQUEST(>rx, vif->rx.req_cons++, ); meta = npo->meta + npo->meta_prod++; if (!vif->gso_prefix) @@ -536,9 +536,9 @@ static int netbk_gop_skb(struct sk_buff *skb, meta->gso_size = 0; meta->size = 0; - meta->id = req->id; + meta->id = req.id; npo->copy_off = 0; - npo->copy_gref = req->gref; + npo->copy_gref = req.gref; data = skb->data; while (data < skb_tail_pointer(skb)) { @@ -882,7 +882,7 @@ static void netbk_tx_err(struct xenvif *vif, make_tx_response(vif, txp, XEN_NETIF_RSP_ERROR); if (cons == end) break; - txp = RING_GET_REQUEST(>tx, cons++); + RING_COPY_REQUEST(>tx, cons++, txp); } while (1); vif->tx.req_cons = cons; xen_netbk_check_rx_xenvif(vif); @@ -943,8 +943,7 @@ static int netbk_count_requests(struct xenvif *vif, drop_err = -E2BIG; } - memcpy(txp, RING_GET_REQUEST(>tx, cons + slots), - sizeof(*txp)); + RING_COPY_REQUEST(>tx, cons + slots, txp); /* If the guest submitted a frame >= 64 KiB then * first->size overflowed and following slots will @@ -1226,8 +1225,7 @@ static int xen_netbk_get_extras(struct xenvif *vif, return -EBADR; } - memcpy(, RING_GET_REQUEST(>tx, cons), - sizeof(extra)); + RING_COPY_REQUEST(>tx, cons, ); if (unlikely(!extra.type || extra.type >= XEN_NETIF_EXTRA_TYPE_MAX)) { vif->tx.req_cons = ++cons; @@ -1422,7 +1420,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) idx = vif->tx.req_cons; rmb(); /* Ensure that we see the request before we copy it. */ - memcpy(, RING_GET_REQUEST(>tx, idx), sizeof(txreq)); + RING_COPY_REQUEST(>tx, idx, ); /* Credit-based scheduling. */ if (txreq.size > vif->remaining_credit && -- 1.9.1
[PATCH 3.4 060/125] fix sysvfs symlinks
From: Al Viro <v...@zeniv.linux.org.uk> 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 0ebf7f10d67a70e120f365018f1c5fce9ddc567d upstream. The thing got broken back in 2002 - sysvfs does *not* have inline symlinks; even short ones have bodies stored in the first block of file. sysv_symlink() handles that correctly; unfortunately, attempting to look an existing symlink up will end up confusing them for inline symlinks, and interpret the block number containing the body as the body itself. Nobody has noticed until now, which says something about the level of testing sysvfs gets ;-/ Signed-off-by: Al Viro <v...@zeniv.linux.org.uk> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lize...@huawei.com> --- fs/sysv/inode.c | 10 ++ 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/fs/sysv/inode.c b/fs/sysv/inode.c index 3da5ce2..fcdd63c 100644 --- a/fs/sysv/inode.c +++ b/fs/sysv/inode.c @@ -176,14 +176,8 @@ void sysv_set_inode(struct inode *inode, dev_t rdev) inode->i_fop = _dir_operations; inode->i_mapping->a_ops = _aops; } else if (S_ISLNK(inode->i_mode)) { - if (inode->i_blocks) { - inode->i_op = _symlink_inode_operations; - inode->i_mapping->a_ops = _aops; - } else { - inode->i_op = _fast_symlink_inode_operations; - nd_terminate_link(SYSV_I(inode)->i_data, inode->i_size, - sizeof(SYSV_I(inode)->i_data) - 1); - } + inode->i_op = _symlink_inode_operations; + inode->i_mapping->a_ops = _aops; } else init_special_inode(inode, inode->i_mode, rdev); } -- 1.9.1
[PATCH 3.4 063/125] ext4: Fix handling of extended tv_sec
From: David Turner3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit a4dad1ae24f850410c4e60f22823cba1289b8d52 upstream. In ext4, the bottom two bits of {a,c,m}time_extra are used to extend the {a,c,m}time fields, deferring the year 2038 problem to the year 2446. When decoding these extended fields, for times whose bottom 32 bits would represent a negative number, sign extension causes the 64-bit extended timestamp to be negative as well, which is not what's intended. This patch corrects that issue, so that the only negative {a,c,m}times are those between 1901 and 1970 (as per 32-bit signed timestamps). Some older kernels might have written pre-1970 dates with 1,1 in the extra bits. This patch treats those incorrectly-encoded dates as pre-1970, instead of post-2311, until kernel 4.20 is released. Hopefully by then e2fsck will have fixed up the bad data. Also add a comment explaining the encoding of ext4's extra {a,c,m}time bits. Signed-off-by: David Turner Signed-off-by: Theodore Ts'o Reported-by: Mark Harris Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23732 Signed-off-by: Zefan Li --- fs/ext4/ext4.h | 51 --- 1 file changed, 44 insertions(+), 7 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index b9cdb6d..aedf75f 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -26,6 +26,7 @@ #include #include #include +#include #include #include #include @@ -704,19 +705,55 @@ struct move_extent { <= (EXT4_GOOD_OLD_INODE_SIZE + \ (einode)->i_extra_isize)) \ +/* + * We use an encoding that preserves the times for extra epoch "00": + * + * extra msb of adjust for signed + * epoch 32-bit 32-bit tv_sec to + * bits timedecoded 64-bit tv_sec 64-bit tv_sec valid time range + * 0 01-0x8000..-0x0001 0x0 1901-12-13..1969-12-31 + * 0 000x0..0x07fff 0x0 1970-01-01..2038-01-19 + * 0 110x08000..0x0 0x1 2038-01-19..2106-02-07 + * 0 100x1..0x17fff 0x1 2106-02-07..2174-02-25 + * 1 010x18000..0x1 0x2 2174-02-25..2242-03-16 + * 1 000x2..0x27fff 0x2 2242-03-16..2310-04-04 + * 1 110x28000..0x2 0x3 2310-04-04..2378-04-22 + * 1 100x3..0x37fff 0x3 2378-04-22..2446-05-10 + * + * Note that previous versions of the kernel on 64-bit systems would + * incorrectly use extra epoch bits 1,1 for dates between 1901 and + * 1970. e2fsck will correct this, assuming that it is run on the + * affected filesystem before 2242. + */ + static inline __le32 ext4_encode_extra_time(struct timespec *time) { - return cpu_to_le32((sizeof(time->tv_sec) > 4 ? - (time->tv_sec >> 32) & EXT4_EPOCH_MASK : 0) | - ((time->tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK)); + u32 extra = sizeof(time->tv_sec) > 4 ? + ((time->tv_sec - (s32)time->tv_sec) >> 32) & EXT4_EPOCH_MASK : 0; + return cpu_to_le32(extra | (time->tv_nsec << EXT4_EPOCH_BITS)); } static inline void ext4_decode_extra_time(struct timespec *time, __le32 extra) { - if (sizeof(time->tv_sec) > 4) - time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK) - << 32; - time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS; + if (unlikely(sizeof(time->tv_sec) > 4 && + (extra & cpu_to_le32(EXT4_EPOCH_MASK { +#if LINUX_VERSION_CODE < KERNEL_VERSION(4,20,0) + /* Handle legacy encoding of pre-1970 dates with epoch +* bits 1,1. We assume that by kernel version 4.20, +* everyone will have run fsck over the affected +* filesystems to correct the problem. (This +* backwards compatibility may be removed before this +* time, at the discretion of the ext4 developers.) +*/ + u64 extra_bits = le32_to_cpu(extra) & EXT4_EPOCH_MASK; + if (extra_bits == 3 && ((time->tv_sec) & 0x8000) != 0) + extra_bits = 0; + time->tv_sec += extra_bits << 32; +#else + time->tv_sec += (u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK) << 32; +#endif + } + time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS; } #define EXT4_INODE_SET_XTIME(xtime, inode, raw_inode) \ -- 1.9.1
[PATCH 3.4 028/125] x86/cpu: Call verify_cpu() after having entered long mode too
From: Borislav Petkov <b...@suse.de> 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 04633df0c43d710e5f696b06539c100898678235 upstream. When we get loaded by a 64-bit bootloader, kernel entry point is startup_64 in head_64.S. We don't trust any and all bootloaders because some will fiddle with CPU configuration so we go ahead and massage each CPU into sanity again. For example, some dell BIOSes have this XD disable feature which set IA32_MISC_ENABLE[34] and disable NX. This might be some dumb workaround for other OSes but Linux sure doesn't need it. A similar thing is present in the Surface 3 firmware - see https://bugzilla.kernel.org/show_bug.cgi?id=106051 - which sets this bit only on the BSP: # rdmsr -a 0x1a0 400850089 850089 850089 850089 I know, right?! There's not even an off switch in there. So fix all those cases by sanitizing the 64-bit entry point too. For that, make verify_cpu() callable in 64-bit mode also. Requested-and-debugged-by: "H. Peter Anvin" <h...@zytor.com> Reported-and-tested-by: Bastien Nocera <bugzi...@hadess.net> Signed-off-by: Borislav Petkov <b...@suse.de> Cc: Matt Fleming <m...@codeblueprint.co.uk> Cc: Peter Zijlstra <pet...@infradead.org> Link: http://lkml.kernel.org/r/1446739076-21303-1-git-send-email...@alien8.de Signed-off-by: Thomas Gleixner <t...@linutronix.de> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lize...@huawei.com> --- arch/x86/kernel/head_64.S| 8 arch/x86/kernel/verify_cpu.S | 12 +++- 2 files changed, 15 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 40f4eb3..59d0eac 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -45,6 +45,9 @@ L3_START_KERNEL = pud_index(__START_KERNEL_map) .globl startup_64 startup_64: + /* Sanitize CPU configuration */ + call verify_cpu + /* * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 1, * and someone has loaded an identity mapped page table @@ -160,6 +163,9 @@ ENTRY(secondary_startup_64) * after the boot processor executes this code. */ + /* Sanitize CPU configuration */ + call verify_cpu + /* Enable PAE mode and PGE */ movl$(X86_CR4_PAE | X86_CR4_PGE), %eax movq%rax, %cr4 @@ -253,6 +259,8 @@ ENTRY(secondary_startup_64) pushq %rax# target address in negative space lretq +#include "verify_cpu.S" + /* SMP bootup changes these two */ __REFDATA .align 8 diff --git a/arch/x86/kernel/verify_cpu.S b/arch/x86/kernel/verify_cpu.S index b9242ba..4cf401f 100644 --- a/arch/x86/kernel/verify_cpu.S +++ b/arch/x86/kernel/verify_cpu.S @@ -34,10 +34,11 @@ #include verify_cpu: - pushfl # Save caller passed flags - pushl $0 # Kill any dangerous flags - popfl + pushf # Save caller passed flags + push$0 # Kill any dangerous flags + popf +#ifndef __x86_64__ pushfl # standard way to check for cpuid popl%eax movl%eax,%ebx @@ -48,6 +49,7 @@ verify_cpu: popl%eax cmpl%eax,%ebx jz verify_cpu_no_longmode # cpu has no cpuid +#endif movl$0x0,%eax # See if cpuid 1 is implemented cpuid @@ -130,10 +132,10 @@ verify_cpu_sse_test: jmp verify_cpu_sse_test # try again verify_cpu_no_longmode: - popfl # Restore caller passed flags + popf# Restore caller passed flags movl $1,%eax ret verify_cpu_sse_ok: - popfl # Restore caller passed flags + popf# Restore caller passed flags xorl %eax, %eax ret -- 1.9.1
[PATCH 3.4 095/125] xen-blkback: only read request operation from shared ring once
From: Roger Pau Monné <roger@citrix.com> 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 1f13d75ccb806260079e0679d55d9253e370ec8a upstream. A compiler may load a switch statement value multiple times, which could be bad when the value is in memory shared with the frontend. When converting a non-native request to a native one, ensure that src->operation is only loaded once by using READ_ONCE(). This is part of XSA155. Signed-off-by: Roger Pau Monné <roger@citrix.com> Signed-off-by: David Vrabel <david.vra...@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com> [lizf: Backported to 3.4: - adjust context - call ACCESS_ONCE instead of READ_ONCE] Signed-off-by: Zefan Li <lize...@huawei.com> --- drivers/block/xen-blkback/common.h | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h index 933adc5..47e5b65 100644 --- a/drivers/block/xen-blkback/common.h +++ b/drivers/block/xen-blkback/common.h @@ -256,8 +256,8 @@ static inline void blkif_get_x86_32_req(struct blkif_request *dst, struct blkif_x86_32_request *src) { int i, n = BLKIF_MAX_SEGMENTS_PER_REQUEST; - dst->operation = src->operation; - switch (src->operation) { + dst->operation = ACCESS_ONCE(src->operation); + switch (dst->operation) { case BLKIF_OP_READ: case BLKIF_OP_WRITE: case BLKIF_OP_WRITE_BARRIER: @@ -292,8 +292,8 @@ static inline void blkif_get_x86_64_req(struct blkif_request *dst, struct blkif_x86_64_request *src) { int i, n = BLKIF_MAX_SEGMENTS_PER_REQUEST; - dst->operation = src->operation; - switch (src->operation) { + dst->operation = ACCESS_ONCE(src->operation); + switch (dst->operation) { case BLKIF_OP_READ: case BLKIF_OP_WRITE: case BLKIF_OP_WRITE_BARRIER: -- 1.9.1
[PATCH 3.4 061/125] fuse: break infinite loop in fuse_fill_write_pages()
From: Roman Gushchin3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 3ca8138f014a913f98e6ef40e939868e1e9ea876 upstream. I got a report about unkillable task eating CPU. Further investigation shows, that the problem is in the fuse_fill_write_pages() function. If iov's first segment has zero length, we get an infinite loop, because we never reach iov_iter_advance() call. Fix this by calling iov_iter_advance() before repeating an attempt to copy data from userspace. A similar problem is described in 124d3b7041f ("fix writev regression: pan hanging unkillable and un-straceable"). If zero-length segmend is followed by segment with invalid address, iov_iter_fault_in_readable() checks only first segment (zero-length), iov_iter_copy_from_user_atomic() skips it, fails at second and returns zero -> goto again without skipping zero-length segment. Patch calls iov_iter_advance() before goto again: we'll skip zero-length segment at second iteraction and iov_iter_fault_in_readable() will detect invalid address. Special thanks to Konstantin Khlebnikov, who helped a lot with the commit description. Cc: Andrew Morton Cc: Maxim Patlasov Cc: Konstantin Khlebnikov Signed-off-by: Roman Gushchin Signed-off-by: Miklos Szeredi Fixes: ea9b9907b82a ("fuse: implement perform_write") Signed-off-by: Zefan Li --- fs/fuse/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index e4f1f1a..951457a 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -846,6 +846,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_req *req, mark_page_accessed(page); + iov_iter_advance(ii, tmp); if (!tmp) { unlock_page(page); page_cache_release(page); @@ -857,7 +858,6 @@ static ssize_t fuse_fill_write_pages(struct fuse_req *req, req->pages[req->num_pages] = page; req->num_pages++; - iov_iter_advance(ii, tmp); count += tmp; pos += tmp; offset += tmp; -- 1.9.1
[PATCH 3.4 045/125] iio: lpc32xx_adc: fix warnings caused by enabling unprepared clock
From: Vladimir Zapolskiy3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 01bb70ae0b98d266fa3e860482c7ce22fa482a6e upstream. If common clock framework is configured, the driver generates a warning, which is fixed by this change: root@devkit3250:~# cat /sys/bus/iio/devices/iio\:device0/in_voltage0_raw [ cut here ] WARNING: CPU: 0 PID: 724 at drivers/clk/clk.c:727 clk_core_enable+0x2c/0xa4() Modules linked in: sc16is7xx snd_soc_uda1380 CPU: 0 PID: 724 Comm: cat Not tainted 4.3.0-rc2+ #198 Hardware name: LPC32XX SoC (Flattened Device Tree) Backtrace: [<>] (dump_backtrace) from [<>] (show_stack+0x18/0x1c) [<>] (show_stack) from [<>] (dump_stack+0x20/0x28) [<>] (dump_stack) from [<>] (warn_slowpath_common+0x90/0xb8) [<>] (warn_slowpath_common) from [<>] (warn_slowpath_null+0x24/0x2c) [<>] (warn_slowpath_null) from [<>] (clk_core_enable+0x2c/0xa4) [<>] (clk_core_enable) from [<>] (clk_enable+0x24/0x38) [<>] (clk_enable) from [<>] (lpc32xx_read_raw+0x38/0x80) [<>] (lpc32xx_read_raw) from [<>] (iio_read_channel_info+0x70/0x94) [<>] (iio_read_channel_info) from [<>] (dev_attr_show+0x28/0x4c) [<>] (dev_attr_show) from [<>] (sysfs_kf_seq_show+0x8c/0xf0) [<>] (sysfs_kf_seq_show) from [<>] (kernfs_seq_show+0x2c/0x30) [<>] (kernfs_seq_show) from [<>] (seq_read+0x1c8/0x440) [<>] (seq_read) from [<>] (kernfs_fop_read+0x38/0x170) [<>] (kernfs_fop_read) from [<>] (do_readv_writev+0x16c/0x238) [<>] (do_readv_writev) from [<>] (vfs_readv+0x50/0x58) [<>] (vfs_readv) from [<>] (default_file_splice_read+0x1a4/0x308) [<>] (default_file_splice_read) from [<>] (do_splice_to+0x78/0x84) [<>] (do_splice_to) from [<>] (splice_direct_to_actor+0xc8/0x1cc) [<>] (splice_direct_to_actor) from [<>] (do_splice_direct+0xa0/0xb8) [<>] (do_splice_direct) from [<>] (do_sendfile+0x1a8/0x30c) [<>] (do_sendfile) from [<>] (SyS_sendfile64+0x104/0x10c) [<>] (SyS_sendfile64) from [<>] (ret_fast_syscall+0x0/0x38) Signed-off-by: Vladimir Zapolskiy Signed-off-by: Jonathan Cameron Signed-off-by: Zefan Li --- drivers/staging/iio/adc/lpc32xx_adc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/staging/iio/adc/lpc32xx_adc.c b/drivers/staging/iio/adc/lpc32xx_adc.c index dfc9033..37ca387 100644 --- a/drivers/staging/iio/adc/lpc32xx_adc.c +++ b/drivers/staging/iio/adc/lpc32xx_adc.c @@ -75,7 +75,7 @@ static int lpc32xx_read_raw(struct iio_dev *indio_dev, if (mask == 0) { mutex_lock(_dev->mlock); - clk_enable(info->clk); + clk_prepare_enable(info->clk); /* Measurement setup */ __raw_writel(AD_INTERNAL | (chan->address) | AD_REFp | AD_REFm, LPC32XX_ADC_SELECT(info->adc_base)); @@ -83,7 +83,7 @@ static int lpc32xx_read_raw(struct iio_dev *indio_dev, __raw_writel(AD_PDN_CTRL | AD_STROBE, LPC32XX_ADC_CTRL(info->adc_base)); wait_for_completion(>completion); /* set by ISR */ - clk_disable(info->clk); + clk_disable_unprepare(info->clk); *val = info->value; mutex_unlock(_dev->mlock); -- 1.9.1
[PATCH 3.4 060/125] fix sysvfs symlinks
From: Al Viro 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 0ebf7f10d67a70e120f365018f1c5fce9ddc567d upstream. The thing got broken back in 2002 - sysvfs does *not* have inline symlinks; even short ones have bodies stored in the first block of file. sysv_symlink() handles that correctly; unfortunately, attempting to look an existing symlink up will end up confusing them for inline symlinks, and interpret the block number containing the body as the body itself. Nobody has noticed until now, which says something about the level of testing sysvfs gets ;-/ Signed-off-by: Al Viro [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li --- fs/sysv/inode.c | 10 ++ 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/fs/sysv/inode.c b/fs/sysv/inode.c index 3da5ce2..fcdd63c 100644 --- a/fs/sysv/inode.c +++ b/fs/sysv/inode.c @@ -176,14 +176,8 @@ void sysv_set_inode(struct inode *inode, dev_t rdev) inode->i_fop = _dir_operations; inode->i_mapping->a_ops = _aops; } else if (S_ISLNK(inode->i_mode)) { - if (inode->i_blocks) { - inode->i_op = _symlink_inode_operations; - inode->i_mapping->a_ops = _aops; - } else { - inode->i_op = _fast_symlink_inode_operations; - nd_terminate_link(SYSV_I(inode)->i_data, inode->i_size, - sizeof(SYSV_I(inode)->i_data) - 1); - } + inode->i_op = _symlink_inode_operations; + inode->i_mapping->a_ops = _aops; } else init_special_inode(inode, inode->i_mode, rdev); } -- 1.9.1
[PATCH 3.4 063/125] ext4: Fix handling of extended tv_sec
From: David Turner 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit a4dad1ae24f850410c4e60f22823cba1289b8d52 upstream. In ext4, the bottom two bits of {a,c,m}time_extra are used to extend the {a,c,m}time fields, deferring the year 2038 problem to the year 2446. When decoding these extended fields, for times whose bottom 32 bits would represent a negative number, sign extension causes the 64-bit extended timestamp to be negative as well, which is not what's intended. This patch corrects that issue, so that the only negative {a,c,m}times are those between 1901 and 1970 (as per 32-bit signed timestamps). Some older kernels might have written pre-1970 dates with 1,1 in the extra bits. This patch treats those incorrectly-encoded dates as pre-1970, instead of post-2311, until kernel 4.20 is released. Hopefully by then e2fsck will have fixed up the bad data. Also add a comment explaining the encoding of ext4's extra {a,c,m}time bits. Signed-off-by: David Turner Signed-off-by: Theodore Ts'o Reported-by: Mark Harris Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23732 Signed-off-by: Zefan Li --- fs/ext4/ext4.h | 51 --- 1 file changed, 44 insertions(+), 7 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index b9cdb6d..aedf75f 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -26,6 +26,7 @@ #include #include #include +#include #include #include #include @@ -704,19 +705,55 @@ struct move_extent { <= (EXT4_GOOD_OLD_INODE_SIZE + \ (einode)->i_extra_isize)) \ +/* + * We use an encoding that preserves the times for extra epoch "00": + * + * extra msb of adjust for signed + * epoch 32-bit 32-bit tv_sec to + * bits timedecoded 64-bit tv_sec 64-bit tv_sec valid time range + * 0 01-0x8000..-0x0001 0x0 1901-12-13..1969-12-31 + * 0 000x0..0x07fff 0x0 1970-01-01..2038-01-19 + * 0 110x08000..0x0 0x1 2038-01-19..2106-02-07 + * 0 100x1..0x17fff 0x1 2106-02-07..2174-02-25 + * 1 010x18000..0x1 0x2 2174-02-25..2242-03-16 + * 1 000x2..0x27fff 0x2 2242-03-16..2310-04-04 + * 1 110x28000..0x2 0x3 2310-04-04..2378-04-22 + * 1 100x3..0x37fff 0x3 2378-04-22..2446-05-10 + * + * Note that previous versions of the kernel on 64-bit systems would + * incorrectly use extra epoch bits 1,1 for dates between 1901 and + * 1970. e2fsck will correct this, assuming that it is run on the + * affected filesystem before 2242. + */ + static inline __le32 ext4_encode_extra_time(struct timespec *time) { - return cpu_to_le32((sizeof(time->tv_sec) > 4 ? - (time->tv_sec >> 32) & EXT4_EPOCH_MASK : 0) | - ((time->tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK)); + u32 extra = sizeof(time->tv_sec) > 4 ? + ((time->tv_sec - (s32)time->tv_sec) >> 32) & EXT4_EPOCH_MASK : 0; + return cpu_to_le32(extra | (time->tv_nsec << EXT4_EPOCH_BITS)); } static inline void ext4_decode_extra_time(struct timespec *time, __le32 extra) { - if (sizeof(time->tv_sec) > 4) - time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK) - << 32; - time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS; + if (unlikely(sizeof(time->tv_sec) > 4 && + (extra & cpu_to_le32(EXT4_EPOCH_MASK { +#if LINUX_VERSION_CODE < KERNEL_VERSION(4,20,0) + /* Handle legacy encoding of pre-1970 dates with epoch +* bits 1,1. We assume that by kernel version 4.20, +* everyone will have run fsck over the affected +* filesystems to correct the problem. (This +* backwards compatibility may be removed before this +* time, at the discretion of the ext4 developers.) +*/ + u64 extra_bits = le32_to_cpu(extra) & EXT4_EPOCH_MASK; + if (extra_bits == 3 && ((time->tv_sec) & 0x8000) != 0) + extra_bits = 0; + time->tv_sec += extra_bits << 32; +#else + time->tv_sec += (u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK) << 32; +#endif + } + time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS; } #define EXT4_INODE_SET_XTIME(xtime, inode, raw_inode) \ -- 1.9.1
[PATCH 3.4 028/125] x86/cpu: Call verify_cpu() after having entered long mode too
From: Borislav Petkov 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 04633df0c43d710e5f696b06539c100898678235 upstream. When we get loaded by a 64-bit bootloader, kernel entry point is startup_64 in head_64.S. We don't trust any and all bootloaders because some will fiddle with CPU configuration so we go ahead and massage each CPU into sanity again. For example, some dell BIOSes have this XD disable feature which set IA32_MISC_ENABLE[34] and disable NX. This might be some dumb workaround for other OSes but Linux sure doesn't need it. A similar thing is present in the Surface 3 firmware - see https://bugzilla.kernel.org/show_bug.cgi?id=106051 - which sets this bit only on the BSP: # rdmsr -a 0x1a0 400850089 850089 850089 850089 I know, right?! There's not even an off switch in there. So fix all those cases by sanitizing the 64-bit entry point too. For that, make verify_cpu() callable in 64-bit mode also. Requested-and-debugged-by: "H. Peter Anvin" Reported-and-tested-by: Bastien Nocera Signed-off-by: Borislav Petkov Cc: Matt Fleming Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1446739076-21303-1-git-send-email...@alien8.de Signed-off-by: Thomas Gleixner [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li --- arch/x86/kernel/head_64.S| 8 arch/x86/kernel/verify_cpu.S | 12 +++- 2 files changed, 15 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 40f4eb3..59d0eac 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -45,6 +45,9 @@ L3_START_KERNEL = pud_index(__START_KERNEL_map) .globl startup_64 startup_64: + /* Sanitize CPU configuration */ + call verify_cpu + /* * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 1, * and someone has loaded an identity mapped page table @@ -160,6 +163,9 @@ ENTRY(secondary_startup_64) * after the boot processor executes this code. */ + /* Sanitize CPU configuration */ + call verify_cpu + /* Enable PAE mode and PGE */ movl$(X86_CR4_PAE | X86_CR4_PGE), %eax movq%rax, %cr4 @@ -253,6 +259,8 @@ ENTRY(secondary_startup_64) pushq %rax# target address in negative space lretq +#include "verify_cpu.S" + /* SMP bootup changes these two */ __REFDATA .align 8 diff --git a/arch/x86/kernel/verify_cpu.S b/arch/x86/kernel/verify_cpu.S index b9242ba..4cf401f 100644 --- a/arch/x86/kernel/verify_cpu.S +++ b/arch/x86/kernel/verify_cpu.S @@ -34,10 +34,11 @@ #include verify_cpu: - pushfl # Save caller passed flags - pushl $0 # Kill any dangerous flags - popfl + pushf # Save caller passed flags + push$0 # Kill any dangerous flags + popf +#ifndef __x86_64__ pushfl # standard way to check for cpuid popl%eax movl%eax,%ebx @@ -48,6 +49,7 @@ verify_cpu: popl%eax cmpl%eax,%ebx jz verify_cpu_no_longmode # cpu has no cpuid +#endif movl$0x0,%eax # See if cpuid 1 is implemented cpuid @@ -130,10 +132,10 @@ verify_cpu_sse_test: jmp verify_cpu_sse_test # try again verify_cpu_no_longmode: - popfl # Restore caller passed flags + popf# Restore caller passed flags movl $1,%eax ret verify_cpu_sse_ok: - popfl # Restore caller passed flags + popf# Restore caller passed flags xorl %eax, %eax ret -- 1.9.1
[PATCH 3.4 095/125] xen-blkback: only read request operation from shared ring once
From: Roger Pau Monné 3.4.113-rc1 review patch. If anyone has any objections, please let me know. -- commit 1f13d75ccb806260079e0679d55d9253e370ec8a upstream. A compiler may load a switch statement value multiple times, which could be bad when the value is in memory shared with the frontend. When converting a non-native request to a native one, ensure that src->operation is only loaded once by using READ_ONCE(). This is part of XSA155. Signed-off-by: Roger Pau Monné Signed-off-by: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk [lizf: Backported to 3.4: - adjust context - call ACCESS_ONCE instead of READ_ONCE] Signed-off-by: Zefan Li --- drivers/block/xen-blkback/common.h | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h index 933adc5..47e5b65 100644 --- a/drivers/block/xen-blkback/common.h +++ b/drivers/block/xen-blkback/common.h @@ -256,8 +256,8 @@ static inline void blkif_get_x86_32_req(struct blkif_request *dst, struct blkif_x86_32_request *src) { int i, n = BLKIF_MAX_SEGMENTS_PER_REQUEST; - dst->operation = src->operation; - switch (src->operation) { + dst->operation = ACCESS_ONCE(src->operation); + switch (dst->operation) { case BLKIF_OP_READ: case BLKIF_OP_WRITE: case BLKIF_OP_WRITE_BARRIER: @@ -292,8 +292,8 @@ static inline void blkif_get_x86_64_req(struct blkif_request *dst, struct blkif_x86_64_request *src) { int i, n = BLKIF_MAX_SEGMENTS_PER_REQUEST; - dst->operation = src->operation; - switch (src->operation) { + dst->operation = ACCESS_ONCE(src->operation); + switch (dst->operation) { case BLKIF_OP_READ: case BLKIF_OP_WRITE: case BLKIF_OP_WRITE_BARRIER: -- 1.9.1