Re: [PATCH V2] leds: trigger: Introduce an USB port trigger
On 18 July 2016 at 07:40, Peter Chen wrote:
> On Mon, Jul 18, 2016 at 06:44:49AM +0200, Rafał Miłecki wrote:
>> On 18 July 2016 at 04:31, Peter Chen wrote:
>> > On Fri, Jul 15, 2016 at 11:10:45PM +0200, Rafał Miłecki wrote:
>> >> +
>> >> +usbport trigger:
>> >> +- usb-ports : List of USB ports that usbport should observed for turning on a
>> >> +	      given LED.
>> >> +
>> >
>> > %s/should/should be
>>
>> Thanks.
>>
>>
>> >> diff --git a/drivers/leds/trigger/ledtrig-usbport.c b/drivers/leds/trigger/ledtrig-usbport.c
>> >> new file mode 100644
>> >> index 000..97b064c
>> >> --- /dev/null
>> >> +++ b/drivers/leds/trigger/ledtrig-usbport.c
>> >> @@ -0,0 +1,206 @@
>> >> +/*
>> >> + * USB port LED trigger
>> >> + *
>> >> + * Copyright (C) 2016 Rafał Miłecki
>> >> + *
>> >> + * This program is free software; you can redistribute it and/or modify
>> >> + * it under the terms of the GNU General Public License as published by
>> >> + * the Free Software Foundation; either version 2 of the License, or (at
>> >> + * your option) any later version.
>> >> + */
>> >
>> > GPL v2 only.
>> >
>> >> +MODULE_AUTHOR("Rafał Miłecki ");
>> >> +MODULE_DESCRIPTION("USB port trigger");
>> >> +MODULE_LICENSE("GPL");
>> >
>> > GPL v2
>>
>> What's the reason for this? I don't have any real preference, but I
>> never heard about kernel/Linux preference neither.
>>
>
> https://en.wikipedia.org/wiki/Linux_kernel

Well, Linux is released under GPL v2, I'm well aware of that. It means
all its code needs to be GPL v2 compatible. There are multiple
compatible licenses: MIT, BSD 3-clause, BSD 2-clause. The one I used
allows treating code as GPL V2 as well. I could release this code using
MIT and it should be acceptable as well. I still don't see what's wrong
with the picked license.

-- 
Rafał
linux-next: manual merge of the kvm tree with the powerpc tree
Hi all,

Today's linux-next merge of the kvm tree got a conflict in:

  arch/powerpc/kernel/idle_book3s.S

between commit:

  69c592ed40d3 ("powerpc/opal: Add real mode call wrappers")

from the powerpc tree and commit:

  fd7bacbca47a ("KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt")

from the kvm tree.

I fixed it up (on Michael's advice, I used the version from the kvm
tree) and can carry the fix as necessary. This is now fixed as far as
linux-next is concerned, but any non-trivial conflicts should be
mentioned to your upstream maintainer when your tree is submitted for
merging. You may also want to consider cooperating with the maintainer
of the conflicting tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell
linux-next: manual merge of the kvm tree with the powerpc tree
Hi all,

Today's linux-next merge of the kvm tree got a conflict in:

  arch/powerpc/kernel/exceptions-64s.S

between commit:

  9baaef0a22c8 ("powerpc/irq: Add support for HV virtualization interrupts")

from the powerpc tree and commit:

  fd7bacbca47a ("KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt")

from the kvm tree.

I fixed it up (see below) and can carry the fix as necessary. This is
now fixed as far as linux-next is concerned, but any non-trivial
conflicts should be mentioned to your upstream maintainer when your
tree is submitted for merging. You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/powerpc/kernel/exceptions-64s.S
index 6200e4925d26,0eba47e074b9..
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@@ -669,8 -680,8 +669,10 @@@ _GLOBAL(__replay_interrupt
  BEGIN_FTR_SECTION
  	cmpwi	r3,0xe80
  	beq	h_doorbell_common
 +	cmpwi	r3,0xea0
 +	beq	h_virt_irq_common
+ 	cmpwi	r3,0xe60
+ 	beq	hmi_exception_common
  FTR_SECTION_ELSE
  	cmpwi	r3,0xa00
  	beq	doorbell_super_common
@@@ -1161,18 -1172,9 +1163,18 @@@ fwnmi_data_area
  	. = 0x8000
  #endif /* defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV) */
  
 +	STD_EXCEPTION_COMMON(0xf60, facility_unavailable, facility_unavailable_exception)
 +	STD_EXCEPTION_COMMON(0xf80, hv_facility_unavailable, facility_unavailable_exception)
 +
 +#ifdef CONFIG_CBE_RAS
 +	STD_EXCEPTION_COMMON(0x1200, cbe_system_error, cbe_system_error_exception)
 +	STD_EXCEPTION_COMMON(0x1600, cbe_maintenance, cbe_maintenance_exception)
 +	STD_EXCEPTION_COMMON(0x1800, cbe_thermal, cbe_thermal_exception)
 +#endif /* CONFIG_CBE_RAS */
 +
  	.globl hmi_exception_early
  hmi_exception_early:
- 	EXCEPTION_PROLOG_1(PACA_EXGEN, NOTEST, 0xe60)
+ 	EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST, 0xe62)
  	mr	r10,r1			/* Save r1 */
  	ld	r1,PACAEMERGSP(r13)	/* Use emergency stack */
  	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack frame */
Re: [patch] phy: phy-brcm-sata: fix a loop timeout
On Tue, Jun 21, 2016 at 2:07 PM, Dan Carpenter wrote:
> Since this loop is a post op then it means we end with "try == -1" but
> afterward we test for if it's zero. Fix this by changing to a pre-op so
> we end on zero.

Thanks Dan. That should be pre-op.

Thanks
Dhananjay

>
> Fixes: 024812889ad1 ('phy: Add SATA3 PHY support for Broadcom NSP SoC')
> Signed-off-by: Dan Carpenter
>
> diff --git a/drivers/phy/phy-brcm-sata.c b/drivers/phy/phy-brcm-sata.c
> index 18d6626..c86456f 100644
> --- a/drivers/phy/phy-brcm-sata.c
> +++ b/drivers/phy/phy-brcm-sata.c
> @@ -329,7 +329,7 @@ static int brcm_nsp_sata_init(struct brcm_sata_port *port)
>
>  	/* Wait for pll_seq_done bit */
>  	try = 50;
> -	while (try--) {
> +	while (--try) {
>  		val = brcm_sata_phy_rd(base, BLOCK0_REG_BANK,
>  					BLOCK0_XGXSSTATUS);
>  		if (val & BLOCK0_XGXSSTATUS_PLL_LOCK)
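
As a side illustration of the off-by-one Dan describes (this is not part of the patch), the user-space sketch below models both loop forms. The function names and the lock_at parameter are made up for this example; `remaining` plays the role of `try` in the driver, and the register read is replaced by a simple poll counter.

```c
/*
 * Illustration only: lock_at is a hypothetical knob meaning "the lock bit
 * shows up on the Nth poll"; 0 means it never shows up (timeout path).
 */

/* Original form: post-decrement. On timeout the counter ends at -1. */
int poll_post_decrement(int tries, int lock_at)
{
	int remaining = tries;	/* corresponds to `try` in the driver */
	int polls = 0;

	while (remaining--) {
		polls++;
		if (lock_at && polls >= lock_at)
			break;	/* lock seen, stop polling */
	}
	return remaining;
}

/* Patched form: pre-decrement. On timeout the counter ends at 0. */
int poll_pre_decrement(int tries, int lock_at)
{
	int remaining = tries;
	int polls = 0;

	while (--remaining) {
		polls++;
		if (lock_at && polls >= lock_at)
			break;	/* lock seen, stop polling */
	}
	return remaining;
}
```

With 50 tries and no lock, the post-decrement form returns -1 and the pre-decrement form returns 0, so only the latter lets a later zero test detect the timeout. Note that the pre-decrement form polls 49 times rather than 50.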
Re: [PATCH 04/10] phy: da8xx-usb: new driver for DA8xx SoC USB PHY
Hi Arnd,

On Saturday 16 July 2016 02:44 AM, Arnd Bergmann wrote:
> On Tuesday, July 5, 2016 10:53:51 AM CEST Kishon Vijay Abraham I wrote:
>> From: David Lechner
>>
>> This is a new phy driver for the SoC USB controllers on the TI DA8xx
>> family of microcontrollers. The USB 1.1 PHY is just a simple on/off.
>> The USB 2.0 PHY also allows overriding the VBUS and ID pins.
>>
>> Signed-off-by: David Lechner
>> Signed-off-by: Kishon Vijay Abraham I
>
> This is now in linux-next, but fails to build:
>
>> +#include
>> +#include
>> +#include
>> +#include
>
> drivers/phy/phy-da8xx-usb.c:19:37: fatal error: linux/mfd/da8xx-cfgchip.h: No such file or directory

I'll look at this.

Thanks
Kishon
Re: [PATCH 1/3] virtio: Basic implementation of virtio pstore driver
Hello,

On Sun, Jul 17, 2016 at 10:12:26PM -0700, Kees Cook wrote:
> On Sun, Jul 17, 2016 at 9:37 PM, Namhyung Kim wrote:
> > The virtio pstore driver provides interface to the pstore subsystem so
> > that the guest kernel's log/dump message can be saved on the host
> > machine. Users can access the log file directly on the host, or on the
> > guest at the next boot using pstore filesystem. It currently deals with
> > kernel log (printk) buffer only, but we can extend it to have other
> > information (like ftrace dump) later.
> >
> > It supports legacy PCI device using single order-2 page buffer. As all
> > operation of pstore is synchronous, it would be fine IMHO. However I
> > don't know how to make write operation synchronous since it's called
> > with a spinlock held (from any context including NMI).
> >
> > Cc: Paolo Bonzini
> > Cc: Radim Krčmář
> > Cc: "Michael S. Tsirkin"
> > Cc: Anthony Liguori
> > Cc: Anton Vorontsov
> > Cc: Colin Cross
> > Cc: Kees Cook
> > Cc: Tony Luck
> > Cc: Steven Rostedt
> > Cc: Ingo Molnar
> > Cc: Minchan Kim
> > Cc: k...@vger.kernel.org
> > Cc: qemu-de...@nongnu.org
> > Cc: virtualizat...@lists.linux-foundation.org
> > Signed-off-by: Namhyung Kim
>
> This looks great to me! I'd love to use this in qemu. (Right now I go
> through hoops to use the ramoops backend for testing.)
>
> Reviewed-by: Kees Cook

Thank you!

>
> Notes below...
>

[SNIP]
> > +static u16 to_virtio_type(struct virtio_pstore *vps, enum pstore_type_id type)
> > +{
> > +	u16 ret;
> > +
> > +	switch (type) {
> > +	case PSTORE_TYPE_DMESG:
> > +		ret = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_TYPE_DMESG);
> > +		break;
> > +	default:
> > +		ret = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_TYPE_UNKNOWN);
> > +		break;
> > +	}
>
> I would love to see this support PSTORE_TYPE_CONSOLE too. It should be
> relatively easy to add: I think it'd just be another virtio command?

Do you want to append the data to the host file as guest does printk()?
I think it needs some kind of buffer management, but it's not hard to
add IMHO.

>
> > +
> > +	return ret;
> > +}
> > +

[SNIP]
> > +static int notrace virt_pstore_write(enum pstore_type_id type,
> > +				     enum kmsg_dump_reason reason,
> > +				     u64 *id, unsigned int part, int count,
> > +				     bool compressed, size_t size,
> > +				     struct pstore_info *psi)
> > +{
> > +	struct virtio_pstore *vps = psi->data;
> > +	struct virtio_pstore_hdr *hdr = &vps->hdr;
> > +	struct scatterlist sg[2];
> > +	unsigned int flags = compressed ? VIRTIO_PSTORE_FL_COMPRESSED : 0;
> > +
> > +	*id = vps->id++;
> > +
> > +	hdr->cmd   = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_CMD_WRITE);
> > +	hdr->id    = cpu_to_virtio64(vps->vdev, *id);
> > +	hdr->flags = cpu_to_virtio32(vps->vdev, flags);
> > +	hdr->type  = to_virtio_type(vps, type);
> > +
> > +	sg_init_table(sg, 2);
> > +	sg_set_buf(&sg[0], hdr, sizeof(*hdr));
> > +	sg_set_buf(&sg[1], psi->buf, size);
> > +	virtqueue_add_outbuf(vps->vq, sg, 2, vps, GFP_ATOMIC);
> > +	virtqueue_kick(vps->vq);
> > +
> > +	/* TODO: make it synchronous */
> > +	return 0;
>
> The down side to this being asynchronous is the lack of error
> reporting. Perhaps this could check hdr->type before queuing and error
> for any VIRTIO_PSTORE_TYPE_UNKNOWN message instead of trying to send
> it?

I cannot follow, sorry. Could you please elaborate it more?

>
> > +}
> > +
> > +static int virt_pstore_erase(enum pstore_type_id type, u64 id, int count,
> > +			     struct timespec time, struct pstore_info *psi)
> > +{
> > +	struct virtio_pstore *vps = psi->data;
> > +	struct virtio_pstore_hdr *hdr = &vps->hdr;
> > +	struct scatterlist sg[1];
> > +	unsigned int len;
> > +
> > +	hdr->cmd  = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_CMD_ERASE);
> > +	hdr->id   = cpu_to_virtio64(vps->vdev, id);
> > +	hdr->type = to_virtio_type(vps, type);
> > +
> > +	sg_init_one(sg, hdr, sizeof(*hdr));
> > +	virtqueue_add_outbuf(vps->vq, sg, 1, vps, GFP_KERNEL);
> > +	virtqueue_kick(vps->vq);
> > +
> > +	wait_event(vps->acked, virtqueue_get_buf(vps->vq, &len));
> > +	return 0;
> > +}
> > +
> > +static int virt_pstore_init(struct virtio_pstore *vps)
> > +{
> > +	struct pstore_info *psinfo = &vps->pstore;
> > +	int err;
> > +
> > +	vps->id = 0;
> > +	vps->buflen = 0;
> > +	psinfo->bufsize = VIRT_PSTORE_BUFSIZE;
> > +	psinfo->buf = (void *)__get_free_pages(GFP_KERNEL, VIRT_PSTORE_ORDER);
> > +	if (!psinfo->buf) {
> > +		pr_err("cannot allocate pstore buffer\n");
> > +		return -ENOMEM;
> > +	}
> > +
> > +
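
One possible reading of Kees's suggestion about checking the type before queuing, sketched in user-space C: map the pstore type first and fail the write if it maps to the unknown type, so that nothing is queued for it. Everything here is a stand-in. The enum values, the `queued` counter, the name virt_pstore_write_checked(), and the choice of -EINVAL are assumptions made for illustration, not taken from the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Mock enums; the numeric values are made up for this sketch. */
enum pstore_type_id {
	PSTORE_TYPE_DMESG,
	PSTORE_TYPE_CONSOLE,
	PSTORE_TYPE_FTRACE
};

enum {
	VIRTIO_PSTORE_TYPE_UNKNOWN = 0,
	VIRTIO_PSTORE_TYPE_DMESG = 1
};

/* Same mapping as the quoted patch, minus the endian conversion. */
static uint16_t to_virtio_type(enum pstore_type_id type)
{
	switch (type) {
	case PSTORE_TYPE_DMESG:
		return VIRTIO_PSTORE_TYPE_DMESG;
	default:
		return VIRTIO_PSTORE_TYPE_UNKNOWN;
	}
}

/* Hypothetical counter standing in for "a request was put on the virtqueue". */
static int queued;

/*
 * Hypothetical checked write: reject unsupported types up front so the
 * caller sees an error even though the actual transfer is asynchronous.
 * A real driver would compare against the cpu_to_virtio16()-converted
 * value rather than the raw constant.
 */
int virt_pstore_write_checked(enum pstore_type_id type)
{
	uint16_t vtype = to_virtio_type(type);

	if (vtype == VIRTIO_PSTORE_TYPE_UNKNOWN)
		return -EINVAL;	/* assumed error code */

	queued++;		/* stand-in for virtqueue_add_outbuf() + kick */
	return 0;
}
```

In this sketch a dmesg write is accepted and counted, while any other type is rejected before anything reaches the queue.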
RE: [PATCH v6 1/3] x86/platform/p2sb: New Primary to Sideband bridge support driver for Intel SOC's
> -----Original Message-----
> From: Tan, Jui Nee
> Sent: Monday, July 18, 2016 11:35 AM
> To: 'Paul Gortmaker'; andriy.shevche...@linux.intel.com
> Cc: mika.westerb...@linux.intel.com; heikki.kroge...@linux.intel.com;
> t...@linutronix.de; mi...@redhat.com; H. Peter Anvin; X86 ML;
> pty...@xes-inc.com; Lee Jones; Linus Walleij;
> linux-g...@vger.kernel.org; LKML; Yong, Jonathan; Yu, Ong Hock;
> Voon, Weifeng; Wan Mohamad, Wan Ahmad Zainie
> Subject: RE: [PATCH v6 1/3] x86/platform/p2sb: New Primary to Sideband
> bridge support driver for Intel SOC's
>
>
> > -----Original Message-----
> > From: paul.gortma...@gmail.com [mailto:paul.gortma...@gmail.com] On
> > Behalf Of Paul Gortmaker
> > Sent: Friday, July 15, 2016 8:01 AM
> > To: Tan, Jui Nee
> > Cc: mika.westerb...@linux.intel.com; heikki.kroge...@linux.intel.com;
> > andriy.shevche...@linux.intel.com; t...@linutronix.de;
> > mi...@redhat.com; H. Peter Anvin; X86 ML; pty...@xes-inc.com;
> > Lee Jones; Linus Walleij; linux-g...@vger.kernel.org; LKML;
> > Yong, Jonathan; Yu, Ong Hock; Voon, Weifeng;
> > Wan Mohamad, Wan Ahmad Zainie
> > Subject: Re: [PATCH v6 1/3] x86/platform/p2sb: New Primary to Sideband
> > bridge support driver for Intel SOC's
> >
> > On Thu, Jul 14, 2016 at 4:11 AM, Tan Jui Nee wrote:
> > > From: Andy Shevchenko
> > >
> > > There is already one and at least one more user coming which require
> > > an access to Primary to Sideband bridge (P2SB) in order to get IO or
> > > MMIO bar hidden by BIOS.
> > > Create a driver to access P2SB for x86 devices.
> > >
> > > Signed-off-by: Yong, Jonathan
> > > Signed-off-by: Andy Shevchenko
> > > ---
> > > Changes in V6:
> > >  - No change
> > >
> > > Changes in V5:
> > >  - No change
> > >
> > > Changes in V4:
> > >  - Move Kconfig option CONFIG_X86_INTEL_NON_ACPI from
> > >    [PATCH 2/3] x86/platform/p2sb: New Primary to Sideband bridge support driver for Intel SOC's
> > >    to
> > >    [PATCH 3/3] mfd: lpc_ich: Add support for Intel Apollo Lake GPIO pinctrl in non-ACPI system
> > >    since the config is used in latter patch.
> > >
> > > Changes in V3:
> > >  - No change
> > >
> > > Changes in V2:
> > >  - Add new config option CONFIG_X86_INTEL_NON_ACPI and "select PINCTRL"
> > >    to fix kbuildbot error
> > >
> > >  arch/x86/Kconfig                 |  4 ++
> > >  arch/x86/include/asm/p2sb.h      | 27 +++
> > >  arch/x86/platform/intel/Makefile |  1 +
> > >  arch/x86/platform/intel/p2sb.c   | 99
> > >  4 files changed, 131 insertions(+)
> > >  create mode 100644 arch/x86/include/asm/p2sb.h
> > >  create mode 100644 arch/x86/platform/intel/p2sb.c
> > >
> > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > > index d9a94da..d305d81 100644
> > > --- a/arch/x86/Kconfig
> > > +++ b/arch/x86/Kconfig
> > > @@ -604,6 +604,10 @@ config IOSF_MBI_DEBUG
> > >
> > >  	  If you don't require the option or are in doubt, say N.
> > >
> > > +config P2SB
> > > +	tristate
> >
> > OK, this is tristate, but then
> >
> P2SB is tristate as currently it is only used by LPC_ICH that is tristate too.
> ...
> config LPC_ICH
> 	tristate "Intel ICH LPC"
> 	depends on X86 && PCI
> 	select MFD_CORE
> 	select P2SB
> ...
> > > +	depends on PCI
> > > +
> > >  config X86_RDC321X
> > >  	bool "RDC R-321x SoC"
> > >  	depends on X86_32
> > > diff --git a/arch/x86/include/asm/p2sb.h b/arch/x86/include/asm/p2sb.h
> > > new file mode 100644
> > > index 000..686e07b
> > > --- /dev/null
> > > +++ b/arch/x86/include/asm/p2sb.h
> > > @@ -0,0 +1,27 @@
> > > +/*
> > > + * Primary to Sideband bridge (P2SB) access support
> > > + */
> > > +
> > > +#ifndef P2SB_SYMS_H
> > > +#define P2SB_SYMS_H
> > > +
> > > +#include
> > > +#include
> > > +
> > > +#if IS_ENABLED(CONFIG_P2SB)
> > > +
> > > +int p2sb_bar(struct pci_dev *pdev, unsigned int devfn,
> > > +	     struct resource *res);
> > > +
> > > +#else /* CONFIG_P2SB is not set */
> > > +
> > > +static inline
> > > +int p2sb_bar(struct pci_dev *pdev, unsigned int devfn,
> > > +	     struct resource *res)
> > > +{
> > > +	return -ENODEV;
> > > +}
> > > +
> > > +#endif /* CONFIG_P2SB */
> > > +
> > > +#endif /* P2SB_SYMS_H */
> > > diff --git a/arch/x86/platform/intel/Makefile b/arch/x86/platform/intel/Makefile
> > > index b878032..dbf9f10 100644
> > > --- a/arch/x86/platform/intel/Makefile
> > > +++ b/arch/x86/platform/intel/Makefile
> > > @@ -1 +1,2 @@
> > >  obj-$(CONFIG_IOSF_MBI) += iosf_mbi.o
> > > +obj-$(CONFIG_P2SB) += p2sb.o
> > > diff --git a/arch/x86/platform/intel/p2sb.c b/arch/x86/platform/intel/p2sb.c
> > > new file mode 100644
> > > index 000..8be47a4
> > > --- /dev/null
> > > +++ b/arch/x86/platform/intel/p2sb.c
> > > @@ -0,0 +1,99 @@
> > > +/*
> > > + * Primary to Sideband bridge
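
The p2sb.h header quoted above follows a common kernel pattern: when the option is disabled, callers get a static inline stub that returns -ENODEV, so they compile either way and simply take their normal error path. A minimal user-space sketch of that pattern follows. DEMO_P2SB_ENABLED, the mock struct resource, and demo_probe() are hypothetical stand-ins and do not come from the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Mock types standing in for the kernel's; illustration only. */
struct pci_dev;
struct resource {
	unsigned long start;
	unsigned long end;
};

/* Hypothetical switch playing the role of IS_ENABLED(CONFIG_P2SB). */
#ifdef DEMO_P2SB_ENABLED

/* The real implementation would live in a separate object file. */
int p2sb_bar(struct pci_dev *pdev, unsigned int devfn, struct resource *res);

#else

/* Stub: same signature, always reports that the device is absent. */
static inline int p2sb_bar(struct pci_dev *pdev, unsigned int devfn,
			   struct resource *res)
{
	(void)pdev;
	(void)devfn;
	(void)res;
	return -ENODEV;
}

#endif

/* Hypothetical caller: it needs no #ifdef of its own. */
int demo_probe(struct pci_dev *pdev)
{
	struct resource res = { 0, 0 };
	int ret = p2sb_bar(pdev, 0, &res);

	if (ret)
		return ret;	/* -ENODEV when support is compiled out */
	return 0;
}
```

Built without DEMO_P2SB_ENABLED, demo_probe() returns -ENODEV, which is what a caller of the stubbed helper would see.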
Re: [PATCH V2] leds: trigger: Introduce an USB port trigger
On Mon, Jul 18, 2016 at 06:44:49AM +0200, Rafał Miłecki wrote:
> On 18 July 2016 at 04:31, Peter Chen wrote:
> > On Fri, Jul 15, 2016 at 11:10:45PM +0200, Rafał Miłecki wrote:
> >> +
> >> +usbport trigger:
> >> +- usb-ports : List of USB ports that usbport should observed for turning on a
> >> +	      given LED.
> >> +
> >
> > %s/should/should be
>
> Thanks.
>
>
> >> diff --git a/drivers/leds/trigger/ledtrig-usbport.c b/drivers/leds/trigger/ledtrig-usbport.c
> >> new file mode 100644
> >> index 000..97b064c
> >> --- /dev/null
> >> +++ b/drivers/leds/trigger/ledtrig-usbport.c
> >> @@ -0,0 +1,206 @@
> >> +/*
> >> + * USB port LED trigger
> >> + *
> >> + * Copyright (C) 2016 Rafał Miłecki
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License as published by
> >> + * the Free Software Foundation; either version 2 of the License, or (at
> >> + * your option) any later version.
> >> + */
> >
> > GPL v2 only.
> >
> >> +MODULE_AUTHOR("Rafał Miłecki ");
> >> +MODULE_DESCRIPTION("USB port trigger");
> >> +MODULE_LICENSE("GPL");
> >
> > GPL v2
>
> What's the reason for this? I don't have any real preference, but I
> never heard about kernel/Linux preference neither.
>

https://en.wikipedia.org/wiki/Linux_kernel

-- 

Best Regards,
Peter Chen
Re: [PATCH 1/2] mem-hotplug: use GFP_HIGHUSER_MOVABLE in, alloc_migrate_target()
On Fri, Jul 15, 2016 at 10:47:06AM +0800, Xishi Qiu wrote:
> alloc_migrate_target() is called from migrate_pages(), and the page
> is always from user space, so we can add __GFP_HIGHMEM directly.

No, not all migratable pages are from user space. For example, the
blockdev file cache has __GFP_MOVABLE and is migratable, but it has
neither __GFP_HIGHMEM nor __GFP_USER. Likewise, zram's memory isn't
GFP_HIGHUSER_MOVABLE but does have __GFP_MOVABLE.

Thanks.
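The objection in this reply comes down to a mask comparison: having the movable bit set does not imply that the whole GFP_HIGHUSER_MOVABLE composite is set. A minimal user-space sketch of that point follows. The SK_ bit values and both helper names are made up for illustration and are not the kernel's definitions (those live in include/linux/gfp.h).

```c
/* Illustrative bit values only; NOT the kernel's real GFP encoding. */
#define SK_GFP_HIGHMEM   0x02u
#define SK_GFP_MOVABLE   0x08u
#define SK_GFP_USERBITS  0x100u  /* hypothetical stand-in for the user-allocation bits */
#define SK_GFP_HIGHUSER_MOVABLE \
	(SK_GFP_USERBITS | SK_GFP_HIGHMEM | SK_GFP_MOVABLE)

/* In this sketch a page counts as migratable once the movable bit is set. */
static int sk_is_movable(unsigned int gfp)
{
	return (gfp & SK_GFP_MOVABLE) != 0;
}

/* True only when every bit of the composite mask is present. */
static int sk_is_highuser_movable(unsigned int gfp)
{
	return (gfp & SK_GFP_HIGHUSER_MOVABLE) == SK_GFP_HIGHUSER_MOVABLE;
}
```

An allocation carrying only the movable bit (the blockdev file cache and zram cases named in the reply) passes the first check and fails the second, which is why adding __GFP_HIGHMEM unconditionally in the migration target would be unsafe.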
Re: [patch] mm, compaction: make sure freeing scanner isn't persistently expensive
On Mon, Jul 11, 2016 at 04:01:52PM -0700, David Rientjes wrote:
> On Thu, 30 Jun 2016, Joonsoo Kim wrote:
>
> > We need to find a root cause of this problem, first.
> >
> > I guess that this problem would happen when isolate_freepages_block()
> > early stop due to watermark check (if your patch is applied to your
> > kernel). If scanner meets, cached pfn will be reset and your patch
> > doesn't have any effect. So, I guess that scanner doesn't meet.
> >
>
> If the scanners meet, we should rely on deferred compaction to suppress
> further attempts in the near future. This is outside the scope of this
> fix.
>
> > We enter the compaction with enough free memory so stop in
> > isolate_freepages_block() should be unlikely event but your number
> > shows that it happens frequently?
> >
>
> It's not the only reason why freepages will be returned to the buddy
> allocator: if locks become contended because we are spending too much time
> compacting memory, we can persistently get free pages returned to the end
> of the zone and then repeatedly iterate >100GB of memory on every call to
> isolate_freepages(), which makes its own contended checks fire more often.
> This patch is only an attempt to prevent lenghty iterations when we have
> recently scanned the memory and found freepages to not be isolatable.

Hmm... I can't understand how the freepage scanner is persistently
expensive. After the freepage scanner gets freepages, migration doesn't
stop until either the migratable pages or the freepages run out. If
there are no freepages left, the above problem doesn't happen, so I
assume that there are no migratable pages left after calling
migrate_pages(). If there are no migratable pages left, it means that
the freepages were used by migration. Some time later, the freepages in
that pageblock are exhausted by migration and the freepage scanner moves
to the next pageblock. So, I cannot understand how it is persistently
expensive. Am I missing something?

If it is caused by the fact that too many freepages are isolated at
once (up to the number of migratable pages), we can modify the logic to
stop isolating freepages when the pageblock changes and the freepage
scanner already has one or more freepages.

>
> > In addition, I worry that your previous patch that makes
> > isolate_freepages_block() stop when watermark doesn't meet would cause
> > compaction non-progress. Amount of free memory can be flutuated so
> > watermark fail would be temporaral. We need to break compaction in
> > this case? It would decrease compaction success rate if there is a
> > memory hogger in parallel. Any idea?
> >
>
> In my opinion, which I think is quite well known by now, the compaction
> freeing scanner shouldn't be checking _any_ watermark. The end result is
> that we're migrating memory, not allocating additional memory; determining
> if compaction should be done is best left lower on the stack.

Hmm... if there are many parallel compactors and we have no watermark
check, they would consume all the emergency memory. It can be mitigated
by isolating just one freepage in this case, but the potential risk
would not disappear.

Thanks.
Re: [PATCH] dwc_eth_qos: Remove deprecated create_singlethread_workqueue
From: Bhaktipriya Shridhar
Date: Sat, 16 Jul 2016 13:53:28 +0530

> alloc_workqueue replaces deprecated create_singlethread_workqueue().
>
> A dedicated workqueue has been used since the workitem viz
> lp->txtimeout_reinit is involved in reinitialization if a TX timeout
> occurs, which is necessary to guarantee forward progress in packet
> processing. As a network device can be used during memory reclaim, the
> workqueue needs forward progress guarantee under memory pressure.
> WQ_MEM_RECLAIM has been set to ensure this.
>
> Since there is only a single work item, explicit concurrency limit is
> unnecessary here.
>
> Signed-off-by: Bhaktipriya Shridhar

Applied.
linux-next: manual merge of the rcu tree with the tip tree
Hi Paul,

Today's linux-next merge of the rcu tree got a conflict in:

  kernel/rcu/tree.c

between commit:

  4df8374254ea ("rcu: Convert rcutree to hotplug state machine")

from the tip tree and commit:

  2a84cde733b0 ("rcu: Exact CPU-online tracking for RCU")

from the rcu tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging. You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

--
Cheers,
Stephen Rothwell

diff --cc kernel/rcu/tree.c
index e5164deb51e1,5663d1e899d3..
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@@ -3812,54 -3809,34 +3809,80 @@@ int rcutree_prepare_cpu(unsigned int cp
  	for_each_rcu_flavor(rsp)
  		rcu_init_percpu_data(cpu, rsp);
 +
 +	rcu_prepare_kthreads(cpu);
 +	rcu_spawn_all_nocb_kthreads(cpu);
 +
 +	return 0;
 +}
 +
 +static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
 +{
 +	struct rcu_data *rdp = per_cpu_ptr(rcu_state_p->rda, cpu);
 +
 +	rcu_boost_kthread_setaffinity(rdp->mynode, outgoing);
 +}
 +
 +int rcutree_online_cpu(unsigned int cpu)
 +{
 +	sync_sched_exp_online_cleanup(cpu);
 +	rcutree_affinity_setting(cpu, -1);
 +	return 0;
 +}
 +
 +int rcutree_offline_cpu(unsigned int cpu)
 +{
 +	rcutree_affinity_setting(cpu, cpu);
 +	return 0;
 +}
 +
 +
 +int rcutree_dying_cpu(unsigned int cpu)
 +{
 +	struct rcu_state *rsp;
 +
 +	for_each_rcu_flavor(rsp)
 +		rcu_cleanup_dying_cpu(rsp);
 +	return 0;
 +}
 +
 +int rcutree_dead_cpu(unsigned int cpu)
 +{
 +	struct rcu_state *rsp;
 +
 +	for_each_rcu_flavor(rsp) {
 +		rcu_cleanup_dead_cpu(cpu, rsp);
 +		do_nocb_deferred_wakeup(per_cpu_ptr(rsp->rda, cpu));
 +	}
 +	return 0;
  }
  
+ /*
+  * Mark the specified CPU as being online so that subsequent grace periods
+  * (both expedited and normal) will wait on it.  Note that this means that
+  * incoming CPUs are not allowed to use RCU read-side critical sections
+  * until this function is called.  Failing to observe this restriction
+  * will result in lockdep splats.
+  */
+ void rcu_cpu_starting(unsigned int cpu)
+ {
+ 	unsigned long flags;
+ 	unsigned long mask;
+ 	struct rcu_data *rdp;
+ 	struct rcu_node *rnp;
+ 	struct rcu_state *rsp;
+ 
+ 	for_each_rcu_flavor(rsp) {
+ 		rdp = this_cpu_ptr(rsp->rda);
+ 		rnp = rdp->mynode;
+ 		mask = rdp->grpmask;
+ 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
+ 		rnp->qsmaskinitnext |= mask;
+ 		rnp->expmaskinitnext |= mask;
+ 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+ 	}
+ }
+ 
  #ifdef CONFIG_HOTPLUG_CPU
  /*
   * The CPU is exiting the idle loop into the arch_cpu_idle_dead()
@@@ -4208,9 -4231,12 +4231,11 @@@ void __init rcu_init(void
  	 * this is called early in boot, before either interrupts
  	 * or the scheduler are operational.
  	 */
 -	cpu_notifier(rcu_cpu_notify, 0);
  	pm_notifier(rcu_pm_notify, 0);
- 	for_each_online_cpu(cpu)
+ 	for_each_online_cpu(cpu) {
 -		rcu_cpu_notify(NULL, CPU_UP_PREPARE, (void *)(long)cpu);
 +		rcutree_prepare_cpu(cpu);
+ 		rcu_cpu_starting(cpu);
+ 	}
  }
  
  #include "tree_exp.h"
Re: [PATCH -v4 2/2] printk: Add kernel parameter to control writes to /dev/kmsg
On 07/18/16 at 06:44am, Borislav Petkov wrote:
> On Mon, Jul 18, 2016 at 10:18:09AM +0800, Dave Young wrote:
> > I would say avoiding ratelimit during boot make no much sense. Userspace can not
> > write to /dev/kmsg when system_state == SYSTEM_BOOTING because init process
> > has not run yet.
>
> You're right - kernel_init() sets SYSTEM_RUNNING before running the init
> process. I probably should kill all that logic in the second patch.
>
> > I means to set printk.devkmsg=off by default, userspace can set it to
> > on by sysctl.
>
> That can't happen: DEVKMSG_LOG_MASK_LOCK.

Sorry, it seems I do not get your point. Supposing we use the bits
defined in your patch, shouldn't the below work?

#define DEVKMSG_LOG_MASK_DEFAULT	2

Thanks
Dave
Re: [PATCH/RFC] Re: linux-next: build failure after merge of the luto-misc tree
Hi Arnaldo,

On Fri, 15 Jul 2016 12:43:26 -0300 Arnaldo Carvalho de Melo wrote:
>
> Ok, same results, it works, queuing this one, ack? Stephen, does it work
> for you?

Sorry, no. See my other email. I am cross building (if that makes a
difference).

--
Cheers,
Stephen Rothwell
linux-next: build failure after merge of the tip tree
Hi all,

After merging the tip tree, today's linux-next build (x86_64 allmodconfig)
failed like this:

In file included from tools/arch/x86/include/uapi/asm/bitsperlong.h:10:0,
                 from /usr/include/asm-generic/int-ll64.h:11,
                 from /usr/include/powerpc64le-linux-gnu/asm/types.h:27,
                 from tools/include/linux/types.h:9,
                 from tools/include/linux/list.h:4,
                 from elf.h:23,
                 from builtin-check.c:33:
tools/include/asm-generic/bitsperlong.h:13:2: error: #error Inconsistent word size. Check asm/bitsperlong.h
 #error Inconsistent word size. Check asm/bitsperlong.h
  ^
In file included from tools/arch/x86/include/uapi/asm/bitsperlong.h:10:0,
                 from /usr/include/asm-generic/int-ll64.h:11,
                 from /usr/include/powerpc64le-linux-gnu/asm/types.h:27,
                 from tools/include/linux/types.h:9,
                 from tools/include/linux/string.h:5,
                 from ../lib/str_error_r.c:4:
tools/include/asm-generic/bitsperlong.h:13:2: error: #error Inconsistent word size. Check asm/bitsperlong.h
 #error Inconsistent word size. Check asm/bitsperlong.h
  ^
cat: /home/sfr/next/x86_64_allmodconfig/tools/objtool/.str_error_r.o.d: No such file or directory
Build:17: recipe for target '/home/sfr/next/x86_64_allmodconfig/tools/objtool/str_error_r.o' failed
In file included from tools/arch/x86/include/uapi/asm/bitsperlong.h:10:0,
                 from /usr/include/asm-generic/int-ll64.h:11,
                 from /usr/include/powerpc64le-linux-gnu/asm/types.h:27,
                 from tools/include/linux/types.h:9,
                 from tools/include/linux/list.h:4,
                 from elf.h:23,
                 from special.h:22,
                 from special.c:26:
tools/include/asm-generic/bitsperlong.h:13:2: error: #error Inconsistent word size. Check asm/bitsperlong.h
 #error Inconsistent word size. Check asm/bitsperlong.h
  ^
In file included from tools/arch/x86/include/uapi/asm/bitsperlong.h:10:0,
                 from /usr/include/asm-generic/int-ll64.h:11,
                 from /usr/include/powerpc64le-linux-gnu/asm/types.h:27,
                 from tools/include/linux/types.h:9,
                 from tools/include/linux/string.h:5,
                 from ../lib/string.c:18:
tools/include/asm-generic/bitsperlong.h:13:2: error: #error Inconsistent word size. Check asm/bitsperlong.h
 #error Inconsistent word size. Check asm/bitsperlong.h
  ^
In file included from tools/arch/x86/include/uapi/asm/bitsperlong.h:10:0,
                 from /usr/include/asm-generic/int-ll64.h:11,
                 from /usr/include/powerpc64le-linux-gnu/asm/types.h:27,
                 from tools/include/linux/types.h:9,
                 from tools/include/linux/list.h:4,
                 from elf.h:23,
                 from elf.c:30:
tools/include/asm-generic/bitsperlong.h:13:2: error: #error Inconsistent word size. Check asm/bitsperlong.h
 #error Inconsistent word size. Check asm/bitsperlong.h
  ^
In file included from tools/arch/x86/include/uapi/asm/bitsperlong.h:10:0,
                 from /usr/include/asm-generic/int-ll64.h:11,
                 from /usr/include/powerpc64le-linux-gnu/asm/types.h:27,
                 from tools/include/linux/types.h:9,
                 from tools/include/linux/list.h:4,
                 from arch/x86/../../elf.h:23,
                 from arch/x86/decode.c:26:
tools/include/asm-generic/bitsperlong.h:13:2: error: #error Inconsistent word size. Check asm/bitsperlong.h
 #error Inconsistent word size. Check asm/bitsperlong.h
  ^
Makefile:42: recipe for target '/home/sfr/next/x86_64_allmodconfig/tools/objtool/objtool-in.o' failed
Makefile:60: recipe for target 'objtool' failed

I have added this patch for today:

From: Stephen Rothwell
Date: Mon, 18 Jul 2016 14:58:39 +1000
Subject: [PATCH] tools: Simplify __BITS_PER_LONG define

Signed-off-by: Stephen Rothwell
---
 tools/include/asm-generic/bitsperlong.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/include/asm-generic/bitsperlong.h b/tools/include/asm-generic/bitsperlong.h
index 45eca517efb3..f46853474fd3 100644
--- a/tools/include/asm-generic/bitsperlong.h
+++ b/tools/include/asm-generic/bitsperlong.h
@@ -10,7 +10,8 @@
 #endif
 
 #if BITS_PER_LONG != __BITS_PER_LONG
-#error Inconsistent word size. Check asm/bitsperlong.h
+#undef __BITS_PER_LONG
+#define __BITS_PER_LONG	BITS_PER_LONG
 #endif
 
 #ifndef BITS_PER_LONG_LONG
--
2.8.1

--
Cheers,
Stephen Rothwell
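The failing check and the workaround in this patch can be restated as two small functions. This is only a user-space sketch: the function names are invented here, and the two integer parameters stand in for the BITS_PER_LONG and __BITS_PER_LONG macros, which in the real headers are compared by the preprocessor rather than at run time. Why the two values disagreed is not stated in the message; the include chain only shows host headers under /usr/include/powerpc64le-linux-gnu being pulled in next to tools/arch/x86 headers.

```c
#include <limits.h>

/* Width the compiler really uses for long on the build host. */
static int host_bits_per_long(void)
{
	return (int)(sizeof(long) * CHAR_BIT);
}

/* Mirrors the #if in the header: the build stops when the values differ. */
static int word_size_consistent(int bits_per_long, int uapi_bits_per_long)
{
	return bits_per_long == uapi_bits_per_long;
}

/*
 * Mirrors the workaround patch: on a mismatch, adopt BITS_PER_LONG
 * instead of raising an error.
 */
static int resolve_uapi_bits(int bits_per_long, int uapi_bits_per_long)
{
	return word_size_consistent(bits_per_long, uapi_bits_per_long)
		? uapi_bits_per_long
		: bits_per_long;
}
```

With inputs of 64 and 32 the original header would stop the build, while the patched logic settles on 64. The patch trades a hard error for silently trusting BITS_PER_LONG, which fits its description as a fix carried "for today".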
Re: linux-next: manual merge of the kspp tree with the arm64 tree
On Sun, Jul 17, 2016 at 10:06 PM, Stephen Rothwell wrote:
> Hi Kees,
>
> On Sun, 17 Jul 2016 21:49:40 -0700 Kees Cook wrote:
>>
>> If I'm reading correctly, this second fixup is wrong. It should read;
>>
>>         kasan_check_read(from, n);
>>         check_object_size(from, n, true);
>>         return __arch_copy_to_user(to, from, n);
>>
>> (i.e. fix double space between "return" and "__arch_copy..." in both
>> chunks and add check_object_size() calls after the kasan calls in both
>> chunks.
>
> Yep, sorry. I will fix it up tomorrow.

Cool, thanks! :)

-Kees

--
Kees Cook
Brillo & Chrome OS Security
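The ordering Kees asks for (KASAN check, then object-size check, then the architecture copy) can be modelled in user space with stubs that record the order in which they run. Everything prefixed sk_ is a hypothetical stand-in written for this sketch; none of it is the kernel's implementation, and memcpy() merely takes the place of the real copy routine.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical trace buffer: each stub appends one letter when it runs. */
static char sk_trace[4];
static size_t sk_trace_len;

static void sk_record(char tag)
{
	if (sk_trace_len < sizeof(sk_trace) - 1)
		sk_trace[sk_trace_len++] = tag;
	sk_trace[sk_trace_len] = '\0';
}

/* Stand-in for kasan_check_read(): only records that it was called. */
static void sk_kasan_check_read(const void *p, size_t n)
{
	(void)p;
	(void)n;
	sk_record('K');
}

/* Stand-in for check_object_size(): only records that it was called. */
static void sk_check_object_size(const void *p, size_t n, int to_user)
{
	(void)p;
	(void)n;
	(void)to_user;
	sk_record('O');
}

/* Stand-in for __arch_copy_to_user(): returns the bytes left uncopied. */
static size_t sk_arch_copy_to_user(void *to, const void *from, size_t n)
{
	memcpy(to, from, n);
	sk_record('C');
	return 0;
}

/* The sequence from the quoted fixup, in the order Kees describes. */
static size_t sk_copy_to_user(void *to, const void *from, size_t n)
{
	sk_kasan_check_read(from, n);
	sk_check_object_size(from, n, 1);
	return sk_arch_copy_to_user(to, from, n);
}
```

After one call to sk_copy_to_user() the trace reads K, O, C in that order, so both checks have seen the source buffer before any bytes move.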
Re: [PATCH 1/3] virtio: Basic implementation of virtio pstore driver
On Sun, Jul 17, 2016 at 9:37 PM, Namhyung Kim wrote:
> The virtio pstore driver provides interface to the pstore subsystem so
> that the guest kernel's log/dump message can be saved on the host
> machine. Users can access the log file directly on the host, or on the
> guest at the next boot using pstore filesystem. It currently deals with
> kernel log (printk) buffer only, but we can extend it to have other
> information (like ftrace dump) later.
>
> It supports legacy PCI device using single order-2 page buffer. As all
> operation of pstore is synchronous, it would be fine IMHO. However I
> don't know how to make write operation synchronous since it's called
> with a spinlock held (from any context including NMI).
>
> Cc: Paolo Bonzini
> Cc: Radim Krčmář
> Cc: "Michael S. Tsirkin"
> Cc: Anthony Liguori
> Cc: Anton Vorontsov
> Cc: Colin Cross
> Cc: Kees Cook
> Cc: Tony Luck
> Cc: Steven Rostedt
> Cc: Ingo Molnar
> Cc: Minchan Kim
> Cc: k...@vger.kernel.org
> Cc: qemu-de...@nongnu.org
> Cc: virtualizat...@lists.linux-foundation.org
> Signed-off-by: Namhyung Kim

This looks great to me! I'd love to use this in qemu. (Right now I go
through hoops to use the ramoops backend for testing.)

Reviewed-by: Kees Cook

Notes below...

> ---
>  drivers/virtio/Kconfig             |  10 ++
>  drivers/virtio/Makefile            |   1 +
>  drivers/virtio/virtio_pstore.c     | 317 +
>  include/uapi/linux/Kbuild          |   1 +
>  include/uapi/linux/virtio_ids.h    |   1 +
>  include/uapi/linux/virtio_pstore.h |  53 +++
>  6 files changed, 383 insertions(+)
>  create mode 100644 drivers/virtio/virtio_pstore.c
>  create mode 100644 include/uapi/linux/virtio_pstore.h
>
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 77590320d44c..8f0e6c796c12 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -58,6 +58,16 @@ config VIRTIO_INPUT
>
>          If unsure, say M.
>
> +config VIRTIO_PSTORE
> +       tristate "Virtio pstore driver"
> +       depends on VIRTIO
> +       depends on PSTORE
> +       ---help---
> +        This driver supports virtio pstore devices to save/restore
> +        panic and oops messages on the host.
> +
> +        If unsure, say M.
> +
>  config VIRTIO_MMIO
>         tristate "Platform bus driver for memory mapped virtio devices"
>         depends on HAS_IOMEM && HAS_DMA
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 41e30e3dc842..bee68cb26d48 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -5,3 +5,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> +obj-$(CONFIG_VIRTIO_PSTORE) += virtio_pstore.o
> diff --git a/drivers/virtio/virtio_pstore.c b/drivers/virtio/virtio_pstore.c
> new file mode 100644
> index ..6fe62c0f1508
> --- /dev/null
> +++ b/drivers/virtio/virtio_pstore.c
> @@ -0,0 +1,317 @@
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#define VIRT_PSTORE_ORDER   2
> +#define VIRT_PSTORE_BUFSIZE (4096 << VIRT_PSTORE_ORDER)
> +
> +struct virtio_pstore {
> +       struct virtio_device     *vdev;
> +       struct virtqueue         *vq;
> +       struct pstore_info       pstore;
> +       struct virtio_pstore_hdr hdr;
> +       size_t                   buflen;
> +       u64                      id;
> +
> +       /* Waiting for host to ack */
> +       wait_queue_head_t        acked;
> +};
> +
> +static u16 to_virtio_type(struct virtio_pstore *vps, enum pstore_type_id type)
> +{
> +       u16 ret;
> +
> +       switch (type) {
> +       case PSTORE_TYPE_DMESG:
> +               ret = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_TYPE_DMESG);
> +               break;
> +       default:
> +               ret = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_TYPE_UNKNOWN);
> +               break;
> +       }

I would love to see this support PSTORE_TYPE_CONSOLE too. It should be
relatively easy to add: I think it'd just be another virtio command?

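For illustration only, the console suggestion could look roughly like the userspace sketch below. Everything in it is an assumption made for the example: the `MODEL_*` names and their numeric values are local stand-ins rather than the kernel's or the patch's definitions, the console wire value does not exist in the posted patch, and the `cpu_to_virtio16()`/`virtio16_to_cpu()` byte-order conversions are left out. It only shows that the two switch statements would each gain one case.

```c
#include <stdint.h>

/* Local stand-ins for the kernel's enum pstore_type_id values. */
enum model_pstore_type {
	MODEL_PSTORE_TYPE_DMESG,
	MODEL_PSTORE_TYPE_CONSOLE,
	MODEL_PSTORE_TYPE_UNKNOWN,
};

/* Assumed wire values; the CONSOLE one is hypothetical, not in the patch. */
#define MODEL_VIRTIO_TYPE_UNKNOWN 0
#define MODEL_VIRTIO_TYPE_DMESG   1
#define MODEL_VIRTIO_TYPE_CONSOLE 2

/* Guest-to-host direction: map a pstore record type to a wire value. */
static uint16_t model_to_virtio_type(enum model_pstore_type type)
{
	switch (type) {
	case MODEL_PSTORE_TYPE_DMESG:
		return MODEL_VIRTIO_TYPE_DMESG;
	case MODEL_PSTORE_TYPE_CONSOLE:	/* the added case */
		return MODEL_VIRTIO_TYPE_CONSOLE;
	default:
		return MODEL_VIRTIO_TYPE_UNKNOWN;
	}
}

/* Host-to-guest direction: map a wire value back to a pstore record type. */
static enum model_pstore_type model_from_virtio_type(uint16_t type)
{
	switch (type) {
	case MODEL_VIRTIO_TYPE_DMESG:
		return MODEL_PSTORE_TYPE_DMESG;
	case MODEL_VIRTIO_TYPE_CONSOLE:	/* the added case */
		return MODEL_PSTORE_TYPE_CONSOLE;
	default:
		return MODEL_PSTORE_TYPE_UNKNOWN;
	}
}
```

Whether the host side would also need a separate command for console records is not settled by anything in this thread.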
> +
> +       return ret;
> +}
> +
> +static enum pstore_type_id from_virtio_type(struct virtio_pstore *vps, u16 type)
> +{
> +       enum pstore_type_id ret;
> +
> +       switch (virtio16_to_cpu(vps->vdev, type)) {
> +       case VIRTIO_PSTORE_TYPE_DMESG:
> +               ret = PSTORE_TYPE_DMESG;
> +               break;
> +       default:
> +               ret = PSTORE_TYPE_UNKNOWN;
> +               break;
> +       }
> +
> +       return ret;
> +}
> +
> +static void virtpstore_ack(struct virtqueue *vq)
> +{
> +       struct virtio_pstore *vps = vq->vdev->priv;
> +
> +       wake_up(&vps->acked);
> +}
> +
> +static int virt_pstore_open(struct pstore_info *psi)
> +{
> +       struct virtio_pstore *vps = psi->data;
> +       struct
Re: [PATCH 2/3] xen-scsiback: One function call less in scsiback_device_action() after error detection
On 16/07/16 22:23, SF Markus Elfring wrote:
> From: Markus Elfring
> Date: Sat, 16 Jul 2016 21:42:42 +0200
>
> The kfree() function was called in one case by the
> scsiback_device_action() function during error handling
> even if the passed variable "tmr" contained a null pointer.
>
> Adjust jump targets according to the Linux coding style convention.
>
> Signed-off-by: Markus Elfring
> ---
>  drivers/xen/xen-scsiback.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/xen/xen-scsiback.c b/drivers/xen/xen-scsiback.c
> index 4a48c06..7612bc9 100644
> --- a/drivers/xen/xen-scsiback.c
> +++ b/drivers/xen/xen-scsiback.c
> @@ -606,7 +606,7 @@ static void scsiback_device_action(struct vscsibk_pend *pending_req,
>         tmr = kzalloc(sizeof(struct scsiback_tmr), GFP_KERNEL);
>         if (!tmr) {
>                 target_put_sess_cmd(se_cmd);
> -               goto err;
> +               goto do_resp;
>         }

Hmm, I'm not convinced this is an improvement. I'd rather rename the new
error label to "put_cmd" and get rid of the braces in above if statement:

-       if (!tmr) {
-               target_put_sess_cmd(se_cmd);
-               goto err;
-       }
+       if (!tmr)
+               goto put_cmd;

and then in the error path:

-err:
+put_cmd:
+       target_put_sess_cmd(se_cmd);
+free_tmr:
        kfree(tmr);

Juergen

>
>         init_waitqueue_head(&tmr->tmr_wait);
> @@ -616,7 +616,7 @@ static void scsiback_device_action(struct vscsibk_pend *pending_req,
>                         unpacked_lun, tmr, act, GFP_KERNEL,
>                         tag, TARGET_SCF_ACK_KREF);
>         if (rc)
> -               goto err;
> +               goto free_tmr;
>
>         wait_event(tmr->tmr_wait, atomic_read(&tmr->tmr_complete));
>
> @@ -626,8 +626,9 @@ static void scsiback_device_action(struct vscsibk_pend *pending_req,
>         scsiback_do_resp_with_sense(NULL, err, 0, pending_req);
>         transport_generic_free_cmd(&pending_req->se_cmd, 1);
>         return;
> -err:
> +free_tmr:
>         kfree(tmr);
> +do_resp:
>         scsiback_do_resp_with_sense(NULL, err, 0, pending_req);
>  }
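The label layout Juergen proposes can be tried out in plain userspace C. The following is only a sketch under stated assumptions: `put_sess_cmd()`, `send_error_response()` and `device_action()` are hypothetical stand-ins for the xen-scsiback functions, `calloc()`/`free()` take the place of `kzalloc()`/`kfree()`, and the function returns an int (the real one returns void) purely so that the paths can be observed. The `fail_alloc` and `fail_submit` parameters exist only to force each error path.

```c
#include <stdlib.h>

struct tmr_model {
	int complete;
};

static int put_calls;	/* how often the session command was put */
static int resp_calls;	/* how often an error response was sent */

static void put_sess_cmd(void)
{
	put_calls++;
}

static void send_error_response(void)
{
	resp_calls++;
}

/* Returns 0 on success and -1 after running the error path. */
static int device_action(int fail_alloc, int fail_submit)
{
	struct tmr_model *tmr;

	tmr = fail_alloc ? NULL : calloc(1, sizeof(*tmr));
	if (!tmr)
		goto put_cmd;		/* no braces needed any more */

	if (fail_submit)
		goto free_tmr;		/* submit failed: only the free is needed */

	free(tmr);
	return 0;

put_cmd:
	put_sess_cmd();
free_tmr:
	free(tmr);			/* tmr may be NULL here; free(NULL) is a no-op */
	send_error_response();
	return -1;
}
```

The point of the layout is that the label reached first undoes the most work and then falls through to the next one, so the allocation-failure path still reaches the free call with a NULL pointer, which is harmless.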
Re: linux-next: manual merge of the kspp tree with the arm64 tree
Hi Kees,

On Sun, 17 Jul 2016 21:49:40 -0700 Kees Cook wrote:
>
> If I'm reading correctly, this second fixup is wrong. It should read;
>
>         kasan_check_read(from, n);
>         check_object_size(from, n, true);
>         return __arch_copy_to_user(to, from, n);
>
> (i.e. fix double space between "return" and "__arch_copy..." in both
> chunks and add check_object_size() calls after the kasan calls in both
> chunks.

Yep, sorry. I will fix it up tomorrow.

--
Cheers,
Stephen Rothwell
Re: [PATCH 3/3] xen-scsiback: Pass a failure indication as a constant
On 16/07/16 22:24, SF Markus Elfring wrote:
> From: Markus Elfring
> Date: Sat, 16 Jul 2016 21:55:01 +0200
>
> Pass the constant "FAILED" in a function call directly instead of
> using an initialisation for a local variable.
>
> Signed-off-by: Markus Elfring

Reviewed-by: Juergen Gross

Juergen
Re: [PATCH 08/31] mm, vmscan: simplify the logic deciding whether kswapd sleeps
On Thu, Jul 14, 2016 at 10:32:09AM +0200, Vlastimil Babka wrote:
> On 07/14/2016 07:23 AM, Joonsoo Kim wrote:
> >On Fri, Jul 08, 2016 at 11:11:47AM +0100, Mel Gorman wrote:
> >>On Fri, Jul 08, 2016 at 11:44:47AM +0900, Joonsoo Kim wrote:
> >>
> >>It doesn't stop reclaiming for the lower zones. It's reclaiming the LRU
> >>for the whole node that may or may not have lower zone pages at the end
> >>of the LRU. If it does, then the allocation request will be satisfied.
> >>If it does not, then kswapd will think the node is balanced and get
> >>rewoken to do a zone-constrained reclaim pass.
> >
> >If zone-constrained request could go direct reclaim pass, there would
> >be no problem. But, please assume that request is zone-constrained
> >without __GFP_DIRECT_RECLAIM which is common for some device driver
> >implementation. And, please assume one more thing that this request
> >always comes with zone-unconstrained allocation request. In this case,
> >your max() logic will set kswapd_classzone_idx to highest zone index
> >and re-woken kswapd would not balance for low zone again. In the end,
> >zone-constrained allocation request without __GFP_DIRECT_RECLAIM could
> >fail.
>
> I don't think there's a problem in the scenario? Kswapd will keep
> being woken up and reclaim from the node lru. It will hit and free
> any low zone pages that are on the lru, even though it doesn't
> "balance for low zone". Eventually it will either satisfy the
> constrained allocation by reclaiming those low-zone pages during the
> repeated wakeups, or the low-zone wakeups will stop coming together
> with higher-zone wakeups and then it will reclaim the low-zone pages
> in a single low-zone wakeup. If the zone-constrained request is not

Yes, the probability of this would be low.

> allowed to fail, then it will just keep waking up kswapd and waiting
> for the progress. If it's allowed to fail (i.e. not __GFP_NOFAIL),
> but not allowed to direct reclaim, it goes "goto nopage" rather
> quickly in __alloc_pages_slowpath(), without any waiting for
> kswapd's progress, so there's not really much difference whether the
> kswapd wakeup picked up a low classzone or not. Note the

Hmm... Even if the allocation could fail, we should do our best to
prevent failure. Relying on luck isn't a good idea to me.

Thanks.

> __GFP_NOFAIL but ~__GFP_DIRECT_RECLAIM is a WARN_ON_ONCE() scenario,
> so definitely not common...
>
> >Thanks.
Re: [PATCH 1/3] xen-scsiback: Delete an unnecessary check before the function call "kfree"
On 16/07/16 22:22, SF Markus Elfring wrote:
> From: Markus Elfring
> Date: Sat, 16 Jul 2016 21:21:05 +0200
>
> The kfree() function tests whether its argument is NULL and then
> returns immediately. Thus the test around the call is not needed.
>
> This issue was detected by using the Coccinelle software.
>
> Signed-off-by: Markus Elfring

Reviewed-by: Juergen Gross

Juergen
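A userspace analogue may make the argument in the commit message concrete. It is a sketch only: standard C `free()` is used in place of `kfree()` (the C standard likewise defines `free(NULL)` as a no-op), and `struct req_model` with its two release helpers is invented for the example rather than taken from xen-scsiback.

```c
#include <stdlib.h>

struct req_model {
	char *buf;	/* may legitimately be NULL */
};

/* Before: the NULL test duplicates what free() already does. */
static void req_release_guarded(struct req_model *r)
{
	if (r->buf)
		free(r->buf);
	r->buf = NULL;
}

/* After: same behaviour with one branch less in the caller. */
static void req_release(struct req_model *r)
{
	free(r->buf);	/* free(NULL) is defined to do nothing */
	r->buf = NULL;
}
```

Both helpers behave identically for NULL and non-NULL buffers, which is why a tool such as Coccinelle can flag the guarded form mechanically.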
[BUG] kernel BUG at arch/x86/mm/pageattr.c:216!
Hi all,

I'm hitting a BUG_ON during a panic, at arch/x86/mm/pageattr.c:216, on
3.10.0-327.el7 (RHEL 7.2).

I am running a test in which the system is expected to reboot immediately
after a panic. However, drm_fb_helper_panic may trigger the BUG_ON at
arch/x86/mm/pageattr.c:216.

Does anyone have a good idea how to fix it?

The code is as follows:

210 static void cpa_flush_array(unsigned long *start, int numpages, int cache,
211                             int in_flags, struct page **pages)
212 {
213         unsigned int i, level;
214         unsigned long do_wbinvd = cache && numpages >= 1024; /* 4M threshold */
215
216         BUG_ON(irqs_disabled());
217
218         on_each_cpu(__cpa_flush_all, (void *) do_wbinvd, 1);
219
220         if (!cache || do_wbinvd)
221                 return;
222
223         /*
224          * We only need to flush on one CPU,
225          * clflush is a MESI-coherent instruction that
226          * will cause all other CPUs to flush the same
227          * cachelines:
228          */
229         for (i = 0; i < numpages; i++) {
230                 unsigned long addr;
231                 pte_t *pte;
232
233                 if (in_flags & CPA_PAGES_ARRAY)
234                         addr = (unsigned long)page_address(pages[i]);
235                 else
236                         addr = start[i];
237
238                 pte = lookup_address(addr, &level);
239
240                 /*
241                  * Only flush present addresses:
242                  */
243                 if (pte && (pte_val(*pte) & _PAGE_PRESENT))
244                         clflush_cache_range((void *)addr, PAGE_SIZE);
245         }
246 }

--- crash messages ---
[ 1336.567485] test_module: call panic() function in process context 3 times.
[ 1336.567542] Kernel panic - not syncing: call panic() function in process context.
[ 1336.567607] CPU: 0 PID: 9566 Comm: bash Tainted: G OE V--- 3.10.0-327.el7.x86_64
[ 1336.567699] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 1336.567789] 8116f900 035a0a10 88007adc7e00 81638844
[ 1336.567848] 88007adc7e80 81632097 0008 88007adc7e90
[ 1336.567943] 88007adc7e30 035a0a10 8008 88007ec0d6c8
[ 1336.567992] Call Trace:
[ 1336.567992] [] ? clear_zonelist_oom+0xa0/0xa0
[ 1336.567992] [] dump_stack+0x19/0x1b
[ 1336.567992] [] panic+0xd8/0x20f
[ 1336.567992] [] ? clear_zonelist_oom+0xa0/0xa0
[ 1336.567992] [] dev_wr_actions+0x6d9/0xf60 [test_module]
[ 1336.567992] [] dev_wr_handler+0xa6/0x120 [test_module]
[ 1336.567992] [] vfs_write+0xbd/0x1e0
[ 1336.567992] [] ? trace_do_page_fault+0x43/0x110
[ 1336.567992] [] SyS_write+0x7f/0xe0
[ 1336.567992] [] system_call_fastpath+0x16/0x1b
[ 1336.567992] drm_kms_helper: panic occurred, switching back to text console
[ 1336.567992] [ cut here ]
[ 1336.567992] kernel BUG at arch/x86/mm/pageattr.c:216!
[ 1336.567992] invalid opcode: [#1] SMP
[ 1336.567992] Modules linked in: test_module(O) ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables signo_catch(OV) cirrus ppdev parport_pc parport syscopyarea sysfillrect sysimgblt ttm drm_kms_helper drm serio_raw virtio_balloon i2c_piix4 i2c_core pcspkr xfs libcrc32c sd_mod sr_mod crc_t10dif cdrom crct10dif_common ata_generic pata_acpi virtio_console virtio_scsi ata_piix virtio_pci e1000 libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
[ 1336.567992] CPU: 0 PID: 9566 Comm: bash Tainted: G O 3.10.0-327.el7.x86_64
[ 1336.567992] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 1336.567992] task: 88007afef300 ti: 88007adc4000 task.ti: 88007adc4000
[ 1336.567992] RIP: 0010:[] [] change_page_attr_set_clr+0x4c8/0x4d0
[ 1336.567992] RSP: 0018:88007adc7538 EFLAGS: 00010046
[ 1336.567992] RAX: 0046 RBX: RCX: 0004
[ 1336.567992] RDX: 2200 RSI: RDI: 8000
[ 1336.567992] RBP: 88007adc75d0 R08: 0010 R09: 8800
[ 1336.567992] R10: 3688 R11: 811a738f R12: 0010
[ 1336.567992] R13: R14: 0200 R15: 0005
[ 1336.567992] FS: 7fee378b1740() GS:88007ec0() knlGS:
[ 1336.567992] CS: 0010 DS: ES: CR0: 80050033
[ 1336.567992] CR2:
Re: [PATCH 08/31] mm, vmscan: simplify the logic deciding whether kswapd sleeps
On Thu, Jul 14, 2016 at 10:05:00AM +0100, Mel Gorman wrote:
> On Thu, Jul 14, 2016 at 02:23:32PM +0900, Joonsoo Kim wrote:
> > > > > > And, I'd like to know why max() is used for classzone_idx rather than
> > > > > > min()? I think that kswapd should balance the lowest zone requested.
> > > > > >
> > > > >
> > > > > If there are two allocation requests -- one zone-constrained and the other
> > > > > zone-unconstrained, it does not make sense to have kswapd skip the pages
> > > > > usable for the zone-unconstrained and waste a load of CPU. You could
> > > >
> > > > I agree that, in this case, it's not good to skip the pages usable
> > > > for the zone-unconstrained request. But, what I am concerned is that
> > > > kswapd stop reclaim prematurely in the view of zone-constrained
> > > > requestor.
> > >
> > > It doesn't stop reclaiming for the lower zones. It's reclaiming the LRU
> > > for the whole node that may or may not have lower zone pages at the end
> > > of the LRU. If it does, then the allocation request will be satisfied.
> > > If it does not, then kswapd will think the node is balanced and get
> > > rewoken to do a zone-constrained reclaim pass.
> >
> > If zone-constrained request could go direct reclaim pass, there would
> > be no problem. But, please assume that request is zone-constrained
> > without __GFP_DIRECT_RECLAIM which is common for some device driver
> > implementation.
>
> Then it's likely GFP_ATOMIC and it'll wake kswapd on each failure. If
> kswapd is constantly awake for highmem requests then we're reclaiming
> everything anyway. Remember that if kswapd is reclaiming for higher zones,
> it'll still cover the lower zones eventually. There is no guarantee that
> skipping the highmem pages will satisfy the atomic allocations any faster
> but consuming the CPU to skip the pages is a definite cost.

Okay.

>
> Even worse, skipping highmem pages when highmem pages are required may
> make lowmem pressure worse because those pages are freed faster and can
> be consumed by zone-unconstrained requests.

Okay.

>
> If this really is a problem in practice then we can consider having
> allocation requests that are zone-constrained and !__GFP_DIRECT_RECLAIM
> set a flag and use the min classzone for the wakeup. That flag remains
> set until kswapd takes at least one pass using the lower classzone and
> clears it. The classzone will not be adjusted higher until that flag is

It would work.

> cleared. I don't think we should do it without evidence that it's a real
> problem because kswapd potentially uses useless CPU and the potential for
> higher lowmem pressure.

Hmmm... I think differently. Your patch changes the current behaviour
without any evidence. Code simplification cannot compensate for a
potential stability issue. Before your patch, kswapd tried to balance
for the minimum classzone, so until a disadvantage of that approach is
proven, it's better to keep the original logic.

Thanks.
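To make Mel's flag proposal easier to reason about, here is a small userspace model of it. This is a sketch under assumptions, not kernel code: `struct node_model`, `wakeup_kswapd_model()`, `kswapd_pass_model()` and the `low_zone_pinned` flag are all hypothetical names, and the model tracks only the classzone index, nothing about actual reclaim.

```c
#include <stdbool.h>

struct node_model {
	int kswapd_classzone_idx;	/* classzone the next kswapd pass will use */
	bool pending;			/* a wakeup was recorded since the last pass */
	bool low_zone_pinned;		/* hypothetical flag from the proposal */
};

/* Record a wakeup request for the given classzone index. */
static void wakeup_kswapd_model(struct node_model *n, int classzone_idx,
				bool zone_constrained, bool can_direct_reclaim)
{
	if (zone_constrained && !can_direct_reclaim) {
		/* Such requests pin the classzone and use min() semantics. */
		if (!n->pending || classzone_idx < n->kswapd_classzone_idx)
			n->kswapd_classzone_idx = classzone_idx;
		n->low_zone_pinned = true;
	} else if (!n->low_zone_pinned) {
		/* Default behaviour of the patch: max() of the requests. */
		if (!n->pending || classzone_idx > n->kswapd_classzone_idx)
			n->kswapd_classzone_idx = classzone_idx;
	}
	/* While pinned, other requests do not raise the classzone. */
	n->pending = true;
}

/* One kswapd pass: returns the classzone it ran with and clears the pin. */
static int kswapd_pass_model(struct node_model *n)
{
	int idx = n->kswapd_classzone_idx;

	n->low_zone_pinned = false;
	n->pending = false;
	return idx;
}
```

In this model a zone-constrained request that cannot direct reclaim is guaranteed one pass at its own classzone even if zone-unconstrained wakeups keep arriving, after which the max() behaviour resumes.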
Re: linux-next: manual merge of the kspp tree with the arm64 tree
On Sun, Jul 17, 2016 at 7:59 PM, Stephen Rothwell wrote:
> Hi Kees,
>
> Today's linux-next merge of the kspp tree got a conflict in:
>
>   arch/arm64/include/asm/uaccess.h
>
> between commit:
>
>   bffe1baff5d5 ("arm64: kasan: instrument user memory access API")
>
> from the arm64 tree and commit:
>
>   b19e7f50f056 ("arm64/uaccess: Enable hardened usercopy")
>
> from the kspp tree.
>
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging. You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
>
> --
> Cheers,
> Stephen Rothwell
>
> diff --cc arch/arm64/include/asm/uaccess.h
> index 5e834d10b291,1779cbdb7838..
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@@ -264,14 -276,14 +264,15 @@@ extern unsigned long __must_check __cle
>
>   static inline unsigned long __must_check __copy_from_user(void *to, const void __user *from, unsigned long n)
>   {
> +       kasan_check_write(to, n);
> -       return __arch_copy_from_user(to, from, n);
> +       check_object_size(to, n, false);
> +       return __arch_copy_from_user(to, from, n);
>   }
>
>   static inline unsigned long __must_check __copy_to_user(void __user *to, const void *from, unsigned long n)
>   {
> -       check_object_size(from, n, true);
> +       kasan_check_read(from, n);
> -       return __arch_copy_to_user(to, from, n);
> +       return __arch_copy_to_user(to, from, n);

If I'm reading correctly, this second fixup is wrong. It should read;

        kasan_check_read(from, n);
        check_object_size(from, n, true);
        return __arch_copy_to_user(to, from, n);

(i.e. fix double space between "return" and "__arch_copy..." in both
chunks and add check_object_size() calls after the kasan calls in both
chunks.

-Kees

--
Kees Cook
Brillo & Chrome OS Security
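The ordering Kees asks for (KASAN check, then the hardened-usercopy object-size check, then the copy) can be mimicked in userspace to see that all three steps survive the merge. The sketch below rests on assumptions: the `*_stub` functions and `copy_to_user_model()` are invented stand-ins that merely record the call order, and `memcpy()` takes the place of the arch copy routine. None of this is the arm64 implementation.

```c
#include <string.h>

static char call_log[8];	/* zero-initialised, so always NUL-terminated */
static size_t call_pos;

/* Append one letter per call, leaving room for the terminating NUL. */
static void record(char c)
{
	if (call_pos < sizeof(call_log) - 1)
		call_log[call_pos++] = c;
}

static void kasan_check_read_stub(const void *p, unsigned long n)
{
	(void)p;
	(void)n;
	record('k');
}

static void check_object_size_stub(const void *p, unsigned long n, int to_user)
{
	(void)p;
	(void)n;
	(void)to_user;
	record('o');
}

/* Returns the number of bytes not copied, like the real helper: always 0 here. */
static unsigned long arch_copy_to_user_stub(void *to, const void *from,
					    unsigned long n)
{
	record('c');
	memcpy(to, from, n);
	return 0;
}

/* Resolution of the conflict: keep both checks, KASAN first, then copy. */
static unsigned long copy_to_user_model(void *to, const void *from,
					unsigned long n)
{
	kasan_check_read_stub(from, n);
	check_object_size_stub(from, n, 1);
	return arch_copy_to_user_stub(to, from, n);
}
```

The copy-from-user side would follow the same shape with the write-side KASAN check and the destination buffer passed to the object-size check, as in the first chunk of the quoted diff.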
Re: [PATCH 04/31] mm, vmscan: begin reclaiming pages on a per-node basis
On Thu, Jul 14, 2016 at 09:48:41AM +0200, Vlastimil Babka wrote:
> On 07/14/2016 08:28 AM, Joonsoo Kim wrote:
> >On Fri, Jul 08, 2016 at 11:05:32AM +0100, Mel Gorman wrote:
> >>On Fri, Jul 08, 2016 at 11:28:52AM +0900, Joonsoo Kim wrote:
> >>>On Thu, Jul 07, 2016 at 10:48:08AM +0100, Mel Gorman wrote:
> On Thu, Jul 07, 2016 at 10:12:12AM +0900, Joonsoo Kim wrote:
> >>@@ -1402,6 +1406,11 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> >>
> >>VM_BUG_ON_PAGE(!PageLRU(page), page);
> >>
> >>+ if (page_zonenum(page) > sc->reclaim_idx) {
> >>+ list_move(&page->lru, &pages_skipped);
> >>+ continue;
> >>+ }
> >>+
> >
> >I think that we don't need to skip LRU pages in active list. What we'd
> >like to do is just skipping actual reclaim since it doesn't make
> >freepage that we need. It's unrelated to skip the page in active list.
> >
>
> Why?
>
> The active aging is sometimes about simply aging the LRU list. Aging the
> active list based on the timing of when a zone-constrained allocation arrives
> potentially introduces the same zone-balancing problems we currently have
> and applying them to node-lru.
> >>>
> >>>Could you explain more? I don't understand why aging the active list
> >>>based on the timing of when a zone-constrained allocation arrives
> >>>introduces the zone-balancing problem again.
> >>>
> >>
> >>I mispoke. Avoid rotation of the active list based on the timing of a
> >>zone-constrained allocation is what I think potentially introduces problems.
> >>If there are zone-constrained allocations aging the active list then I worry
> >>that pages would be artificially preserved on the active list. No matter
> >>what we do, there is distortion of the aging for zone-constrained allocation
> >>because right now, it may deactivate high zone pages sooner than expected.
> >>
> >>>I think that if above logic is applied to both the active/inactive
> >>>list, it could cause zone-balancing problem. LRU pages on lower zone
> >>>can be resident on memory with more chance.
> >>
> >>If anything, with node-based LRU, it's high zone pages that can be resident
> >>on memory for longer but only if there are zone-constrained allocations.
> >>If we always reclaim based on age regardless of allocation requirements
> >>then there is a risk that high zones are reclaimed far earlier than
> >>expected.
> >>
> >>Basically, whether we skip pages in the active list or not there are
> >>distortions with page aging and the impact is workload dependent. Right now,
> >>I see no clear advantage to special casing active aging.
> >>
> >>If we suspect this is a problem in the future, it would be a simple matter
> >>of adding an additional bool parameter to isolate_lru_pages.
> >
> >Okay. I agree that it would be a simple matter.
> >
> >>
> >And, I have a concern that if inactive LRU is full with higher zone's
> >LRU pages, reclaim with low reclaim_idx could be stuck.
>
> That is an outside possibility but unlikely given that it would require
> that all outstanding allocation requests are zone-contrained. If it happens
> >>>
> >>>I'm not sure that it is outside possibility. It can also happens if there
> >>>is zone-contrained allocation requestor and parallel memory hogger. In
> >>>this case, memory would be reclaimed by memory hogger but memory hogger would
> >>>consume them again so inactive LRU is continually full with higher
> >>>zone's LRU pages and zone-contrained allocation requestor cannot
> >>>progress.
> >>>
> >>
> >>The same memory hogger will also be reclaiming the highmem pages and
> >>reallocating highmem pages.
> >>
> It would be preferred to have an actual test case for this so the
> altered ratio can be tested instead of introducing code that may be
> useless or dead.
> >>>
> >>>Yes, actual test case would be preferred. I will try to implement
> >>>an artificial test case by myself but I'm not sure when I can do it.
> >>>
> >>
> >>That would be appreciated.
> >
> >I make an artificial test case and test this series by using next tree
> >(next-20160713) and found a regression.
> >
>
> [...]
>
> >Mem-Info:
> >active_anon:18779 inactive_anon:18 isolated_anon:0
> > active_file:91577 inactive_file:320615 isolated_file:0
> > unevictable:0 dirty:0 writeback:0 unstable:0
> > slab_reclaimable:6741 slab_unreclaimable:18124
> > mapped:389774 shmem:95 pagetables:18332 bounce:0
> > free:8194 free_pcp:140 free_cma:0
> >Node 0 active_anon:75116kB inactive_anon:72kB active_file:366308kB
> >inactive_file:1282460kB unevictable:0kB isolated(anon):0kB
> >isolated(file):0kB mapped:1559096kB dirty:0kB writeback:0kB shmem:0kB
> >shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 380kB writeback_tmp:0kB
> >unstable:0kB all_unreclaimable? yes
> >Node 0 DMA free:2172kB min:204kB low:252kB high:300kB
Re: [PATCH V2] leds: trigger: Introduce an USB port trigger
On 18 July 2016 at 04:31, Peter Chen wrote:
> On Fri, Jul 15, 2016 at 11:10:45PM +0200, Rafał Miłecki wrote:
>> +
>> +usbport trigger:
>> +- usb-ports : List of USB ports that usbport should observed for turning on a
>> +  given LED.
>> +
>
> %s/should/should be

Thanks.

>> diff --git a/drivers/leds/trigger/ledtrig-usbport.c b/drivers/leds/trigger/ledtrig-usbport.c
>> new file mode 100644
>> index 000..97b064c
>> --- /dev/null
>> +++ b/drivers/leds/trigger/ledtrig-usbport.c
>> @@ -0,0 +1,206 @@
>> +/*
>> + * USB port LED trigger
>> + *
>> + * Copyright (C) 2016 Rafał Miłecki
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or (at
>> + * your option) any later version.
>> + */
>
> GPL v2 only.
>
>> +MODULE_AUTHOR("Rafał Miłecki ");
>> +MODULE_DESCRIPTION("USB port trigger");
>> +MODULE_LICENSE("GPL");
>
> GPL v2

What's the reason for this? I don't have any real preference, but I
have never heard about a kernel/Linux preference either.

--
Rafał
Re: [PATCH -v4 2/2] printk: Add kernel parameter to control writes to /dev/kmsg
On Mon, Jul 18, 2016 at 10:18:09AM +0800, Dave Young wrote:
> I would say avoiding ratelimit during boot make no much sense. Userspace can not
> write to /dev/kmsg when system_state == SYSTEM_BOOTING because init process
> has not run yet.

You're right - kernel_init() sets SYSTEM_RUNNING before running the init
process. I probably should kill all that logic in the second patch.

> I means to set printk.devkmsg=off by default, userspace can set it to
> on by sysctl.

That can't happen: DEVKMSG_LOG_MASK_LOCK.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
Re: [PATCH 6/9] x86, pkeys: add pkey set/get syscalls
On Thu, Jul 14, 2016 at 1:07 AM, Ingo Molnar wrote:
>
> * Andy Lutomirski wrote:
>
>> On Wed, Jul 13, 2016 at 12:56 AM, Ingo Molnar wrote:
>> >
>> > * Andy Lutomirski wrote:
>> >
>> >> > If we push a PKRU value into a thread between the rdpkru() and wrpkru(), we'll
>> >> > lose the content of that "push". I'm not sure there's any way to guarantee
>> >> > this with a user-controlled register.
>> >>
>> >> We could try to insist that user code uses some vsyscall helper that tracks
>> >> which bits are as-yet-unassigned. That's quite messy, though.
>> >
>> > Actually, if we turned the vDSO into something more like a minimal user-space
>> > library with the ability to run at process startup as well to prepare stuff
>> > then it's painful to get right only *once*, and there will be tons of other
>> > areas where a proper per thread data storage on the user-space side would be
>> > immensely useful!
>>
>> Doing this could be tricky: how exactly is the vDSO supposed to find per-thread
>> data without breaking existing glibc?
>
> So I think the way this could be done is by allocating it itself. The vDSO vma
> itself is 'external' to glibc as well to begin with - this would be a small
> extension to that concept.

But how does the vdso code find it? FS and GS are both spoken for by
existing userspace.

--Andy
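The lost "push" quoted at the top of this thread is an ordinary read-modify-write race. A minimal sketch in userspace C, assuming a plain variable stands in for the per-thread PKRU register; none of the sim_* names exist in the kernel, and the interleaving is forced by hand rather than produced by real concurrency:

```c
#include <stdint.h>

/* Stand-in for the per-thread PKRU register (hypothetical, userspace only). */
static uint32_t sim_pkru;

static uint32_t sim_rdpkru(void)
{
	return sim_pkru;
}

static void sim_wrpkru(uint32_t value)
{
	sim_pkru = value;
}

/* Stand-in for the kernel pushing new bits into the thread's PKRU. */
static void sim_kernel_push(uint32_t bits)
{
	sim_pkru |= bits;
}

/*
 * Userspace read-modify-write with a push landing between the read and
 * the write. Returns the final register value.
 */
static uint32_t sim_racy_update(uint32_t user_bits, uint32_t pushed_bits)
{
	uint32_t old = sim_rdpkru();      /* userspace reads the register */
	sim_kernel_push(pushed_bits);     /* the push arrives in between  */
	sim_wrpkru(old | user_bits);      /* stale value overwrites it    */
	return sim_rdpkru();
}
```

Starting from sim_pkru == 0, sim_racy_update(0x4, 0x30) returns 0x4: the pushed 0x30 bits are gone because the write used the value read before the push.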
[PATCH 2/3] qemu: Implement virtio-pstore device
From: Namhyung Kim

Add virtio pstore device to allow kernel log files saved on the host.
It will save the log files on the directory given by pstore device
option.

  $ qemu-system-x86_64 -device virtio-pstore,directory=dir-xx ...

  (guest) # echo c > /proc/sysrq-trigger

  $ ls dir-xx
  dmesg-0.enc.z  dmesg-1.enc.z

The log files are usually compressed using zlib. Users can see the log
messages directly on the host or on the guest (using pstore filesystem).

Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: "Michael S. Tsirkin"
Cc: Anthony Liguori
Cc: Anton Vorontsov
Cc: Colin Cross
Cc: Kees Cook
Cc: Tony Luck
Cc: Steven Rostedt
Cc: Ingo Molnar
Cc: Minchan Kim
Cc: k...@vger.kernel.org
Cc: qemu-de...@nongnu.org
Cc: virtualizat...@lists.linux-foundation.org
Signed-off-by: Namhyung Kim
---
 hw/virtio/Makefile.objs                          |   2 +-
 hw/virtio/virtio-pci.c                           |  50
 hw/virtio/virtio-pci.h                           |  14 +
 hw/virtio/virtio-pstore.c                        | 328 +
 include/hw/pci/pci.h                             |   1 +
 include/hw/virtio/virtio-pstore.h                |  30 ++
 include/standard-headers/linux/virtio_ids.h      |   1 +
 .../linux/{virtio_ids.h => virtio_pstore.h}      |  48 +--
 qdev-monitor.c                                   |   1 +
 9 files changed, 455 insertions(+), 20 deletions(-)
 create mode 100644 hw/virtio/virtio-pstore.c
 create mode 100644 include/hw/virtio/virtio-pstore.h
 copy include/standard-headers/linux/{virtio_ids.h => virtio_pstore.h} (63%)

diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index 3e2b175..aae7082 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -4,4 +4,4 @@ common-obj-y += virtio-bus.o
 common-obj-y += virtio-mmio.o

 obj-y += virtio.o virtio-balloon.o
-obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
+obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o virtio-pstore.o
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 2b34b43..8281b80 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -2416,6 +2416,55 @@ static const TypeInfo virtio_host_pci_info = {
 };
 #endif

+/* virtio-pstore-pci */
+
+static void virtio_pstore_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
+{
+    VirtIOPstorePCI *vps = VIRTIO_PSTORE_PCI(vpci_dev);
+    DeviceState *vdev = DEVICE(&vps->vdev);
+    Error *err = NULL;
+
+    qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
+    object_property_set_bool(OBJECT(vdev), true, "realized", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+}
+
+static void virtio_pstore_pci_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
+    PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
+
+    k->realize = virtio_pstore_pci_realize;
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+
+    pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
+    pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_PSTORE;
+    pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
+    pcidev_k->class_id = PCI_CLASS_OTHERS;
+}
+
+static void virtio_pstore_pci_instance_init(Object *obj)
+{
+    VirtIOPstorePCI *dev = VIRTIO_PSTORE_PCI(obj);
+
+    virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+                                TYPE_VIRTIO_PSTORE);
+    object_property_add_alias(obj, "directory", OBJECT(&dev->vdev),
+                              "directory", &error_abort);
+}
+
+static const TypeInfo virtio_pstore_pci_info = {
+    .name          = TYPE_VIRTIO_PSTORE_PCI,
+    .parent        = TYPE_VIRTIO_PCI,
+    .instance_size = sizeof(VirtIOPstorePCI),
+    .instance_init = virtio_pstore_pci_instance_init,
+    .class_init    = virtio_pstore_pci_class_init,
+};
+
 /* virtio-pci-bus */

 static void virtio_pci_bus_new(VirtioBusState *bus, size_t bus_size,
@@ -2485,6 +2534,7 @@ static void virtio_pci_register_types(void)
 #ifdef CONFIG_VHOST_SCSI
     type_register_static(&vhost_scsi_pci_info);
 #endif
+    type_register_static(&virtio_pstore_pci_info);
 }

 type_init(virtio_pci_register_types)
diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index e4548c2..b4c039f 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -31,6 +31,7 @@
 #ifdef CONFIG_VHOST_SCSI
 #include "hw/virtio/vhost-scsi.h"
 #endif
+#include "hw/virtio/virtio-pstore.h"

 typedef struct VirtIOPCIProxy VirtIOPCIProxy;
 typedef struct VirtIOBlkPCI VirtIOBlkPCI;
@@ -44,6 +45,7 @@ typedef struct VirtIOInputPCI VirtIOInputPCI;
 typedef struct VirtIOInputHIDPCI VirtIOInputHIDPCI;
 typedef struct VirtIOInputHostPCI VirtIOInputHostPCI;
 typedef struct VirtIOGPUPCI VirtIOGPUPCI;
+typedef struct VirtIOPstorePCI VirtIOPstorePCI;

 /* virtio-pci-bus */

@@ -311,6 +313,18 @@ struct VirtIOGPUPCI {
     VirtIOGPU vdev;
 };

+/*
+ * virtio-pstore-pci: This extends VirtioPCIProxy.
+ */
+#define
[RFC/PATCHSET 0/3] virtio-pstore: Implement virtio pstore device
Hello,

This patchset is a proof of concept of virtio-pstore idea [1]. It has
some rough edges and I'm not familiar with this area, so please give me
feedback and advice if I'm going in the wrong direction.

It started from the fact that dumping ftrace buffer at kernel oops/panic
takes too much time. Although there's a way to reduce the size of the
original data, sometimes I want to have as much information as
possible. Maybe kexec/kdump can solve this problem but it consumes some
portion of guest memory so I'd like to avoid it. And I know the qemu +
crashtool can dump and analyze the whole guest memory including the
ftrace buffer without wasting guest memory, but it adds one more layer
and has some limitations as an out-of-tree tool, like not being in sync
with the kernel changes.

So I think it'd be great to use the pstore interface to dump guest kernel
data on the host. One can read the data on the host directly or on the
guest (at the next boot) using pstore filesystem as usual. While this
patchset only implements dumping kernel log buffer, it can be extended
to have ftrace buffer and probably some more..

The patch 0001 implements virtio pstore driver. It has a single virt
queue, pstore buffer and header structure. The virtio_pstore_hdr struct
is to give information about the current pstore operation.

The patch 0002 and 0003 implement virtio-pstore legacy PCI device on
qemu-kvm and kvmtool respectively. I referenced virtio-balloon and
virtio-rng implementations and I don't know whether kvmtool supports
modern virtio 1.0+ spec.

For example, using virtio-pstore on qemu looks like below:

  $ qemu-system-x86_64 -enable-kvm -device virtio-pstore,directory=xxx

When guest kernel gets panic the log messages will be saved under the
xxx directory.

  $ ls xxx
  dmesg-0.enc.z  dmesg-1.enc.z

As you can see the pstore subsystem compresses the log data using zlib.
The data can be extracted with the following command:

  $ cat xxx/dmesg-0.enc.z | \
  > python -c 'import sys, zlib; print(zlib.decompress(sys.stdin.read()))'
  Oops#1 Part1
  <5>[0.00] Linux version 4.6.0kvm+ (namhyung@danjae) (gcc version 5.3.0 (GCC) ) #145 SMP Mon Jul 18 10:22:45 KST 2016
  <6>[0.00] Command line: root=/dev/vda console=ttyS0
  <6>[0.00] x86/fpu: Legacy x87 FPU detected.
  <6>[0.00] x86/fpu: Using 'eager' FPU context switches.
  <6>[0.00] e820: BIOS-provided physical RAM map:
  <6>[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
  <6>[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
  <6>[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
  <6>[0.00] BIOS-e820: [mem 0x0010-0x07fddfff] usable
  <6>[0.00] BIOS-e820: [mem 0x07fde000-0x07ff] reserved
  <6>[0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved
  <6>[0.00] BIOS-e820: [mem 0xfffc-0x] reserved
  <6>[0.00] NX (Execute Disable) protection: active
  <6>[0.00] SMBIOS 2.8 present.
  <7>[0.00] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
  ...

Maybe we can add a config option to control the compression later.

Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: "Michael S. Tsirkin"
Cc: Anthony Liguori
Cc: Anton Vorontsov
Cc: Colin Cross
Cc: Kees Cook
Cc: Tony Luck
Cc: Steven Rostedt
Cc: Ingo Molnar
Cc: Minchan Kim
Cc: k...@vger.kernel.org
Cc: qemu-de...@nongnu.org
Cc: virtualizat...@lists.linux-foundation.org

[1] https://lkml.org/lkml/2016/7/1/6

Thanks,
Namhyung
[PATCH 1/3] virtio: Basic implementation of virtio pstore driver
The virtio pstore driver provides interface to the pstore subsystem so
that the guest kernel's log/dump message can be saved on the host
machine. Users can access the log file directly on the host, or on the
guest at the next boot using pstore filesystem. It currently deals with
kernel log (printk) buffer only, but we can extend it to have other
information (like ftrace dump) later.

It supports legacy PCI device using single order-2 page buffer. As all
operation of pstore is synchronous, it would be fine IMHO. However I
don't know how to make write operation synchronous since it's called
with a spinlock held (from any context including NMI).

Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: "Michael S. Tsirkin"
Cc: Anthony Liguori
Cc: Anton Vorontsov
Cc: Colin Cross
Cc: Kees Cook
Cc: Tony Luck
Cc: Steven Rostedt
Cc: Ingo Molnar
Cc: Minchan Kim
Cc: k...@vger.kernel.org
Cc: qemu-de...@nongnu.org
Cc: virtualizat...@lists.linux-foundation.org
Signed-off-by: Namhyung Kim
---
 drivers/virtio/Kconfig             |  10 ++
 drivers/virtio/Makefile            |   1 +
 drivers/virtio/virtio_pstore.c     | 317 +
 include/uapi/linux/Kbuild          |   1 +
 include/uapi/linux/virtio_ids.h    |   1 +
 include/uapi/linux/virtio_pstore.h |  53 +++
 6 files changed, 383 insertions(+)
 create mode 100644 drivers/virtio/virtio_pstore.c
 create mode 100644 include/uapi/linux/virtio_pstore.h

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 77590320d44c..8f0e6c796c12 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -58,6 +58,16 @@ config VIRTIO_INPUT
 	 If unsure, say M.
+config VIRTIO_PSTORE
+	tristate "Virtio pstore driver"
+	depends on VIRTIO
+	depends on PSTORE
+	---help---
+	This driver supports virtio pstore devices to save/restore
+	panic and oops messages on the host.
+
+	If unsure, say M.
+
 config VIRTIO_MMIO
 	tristate "Platform bus driver for memory mapped virtio devices"
 	depends on HAS_IOMEM && HAS_DMA
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 41e30e3dc842..bee68cb26d48 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -5,3 +5,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
 virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
 obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
+obj-$(CONFIG_VIRTIO_PSTORE) += virtio_pstore.o
diff --git a/drivers/virtio/virtio_pstore.c b/drivers/virtio/virtio_pstore.c
new file mode 100644
index ..6fe62c0f1508
--- /dev/null
+++ b/drivers/virtio/virtio_pstore.c
@@ -0,0 +1,317 @@
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define VIRT_PSTORE_ORDER 2
+#define VIRT_PSTORE_BUFSIZE (4096 << VIRT_PSTORE_ORDER)
+
+struct virtio_pstore {
+	struct virtio_device	*vdev;
+	struct virtqueue	*vq;
+	struct pstore_info	pstore;
+	struct virtio_pstore_hdr hdr;
+	size_t			buflen;
+	u64			id;
+
+	/* Waiting for host to ack */
+	wait_queue_head_t	acked;
+};
+
+static u16 to_virtio_type(struct virtio_pstore *vps, enum pstore_type_id type)
+{
+	u16 ret;
+
+	switch (type) {
+	case PSTORE_TYPE_DMESG:
+		ret = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_TYPE_DMESG);
+		break;
+	default:
+		ret = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_TYPE_UNKNOWN);
+		break;
+	}
+
+	return ret;
+}
+
+static enum pstore_type_id from_virtio_type(struct virtio_pstore *vps, u16 type)
+{
+	enum pstore_type_id ret;
+
+	switch (virtio16_to_cpu(vps->vdev, type)) {
+	case VIRTIO_PSTORE_TYPE_DMESG:
+		ret = PSTORE_TYPE_DMESG;
+		break;
+	default:
+		ret = PSTORE_TYPE_UNKNOWN;
+		break;
+	}
+
+	return ret;
+}
+
+static void virtpstore_ack(struct virtqueue *vq)
+{
+	struct virtio_pstore *vps = vq->vdev->priv;
+
+	wake_up(&vps->acked);
+}
+
+static int virt_pstore_open(struct pstore_info *psi)
+{
+	struct virtio_pstore *vps = psi->data;
+	struct virtio_pstore_hdr *hdr = &vps->hdr;
+	struct scatterlist sg[1];
+	unsigned int len;
+
+	hdr->cmd = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_CMD_OPEN);
+
+	sg_init_one(sg, hdr, sizeof(*hdr));
+	virtqueue_add_outbuf(vps->vq, sg, 1, vps, GFP_KERNEL);
+	virtqueue_kick(vps->vq);
+
+	wait_event(vps->acked, virtqueue_get_buf(vps->vq, &len));
+	return 0;
+}
+
+static int virt_pstore_close(struct pstore_info *psi)
+{
+	struct virtio_pstore *vps = psi->data;
+	struct virtio_pstore_hdr *hdr = &vps->hdr;
+	struct scatterlist sg[1];
+	unsigned int len;
+
+	hdr->cmd =
[PATCH 3/3] kvmtool: Implement virtio-pstore device
Add virtio pstore device to allow kernel log messages saved on the
host. With this patch, it will save the log files under directory given
by --pstore option.

  $ lkvm run --pstore=dir-xx

  (guest) # echo c > /proc/sysrq-trigger

  $ ls dir-xx
  dmesg-0.enc.z  dmesg-1.enc.z

The log files are usually compressed using zlib. User can easily see
the messages on the host or on the guest (using pstore filesystem).

Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: "Michael S. Tsirkin"
Cc: Anthony Liguori
Cc: Anton Vorontsov
Cc: Colin Cross
Cc: Kees Cook
Cc: Tony Luck
Cc: Steven Rostedt
Cc: Ingo Molnar
Cc: Minchan Kim
Cc: k...@vger.kernel.org
Cc: virtualizat...@lists.linux-foundation.org
Signed-off-by: Namhyung Kim
---
 Makefile                     |   1 +
 builtin-run.c                |   2 +
 include/kvm/kvm-config.h     |   1 +
 include/kvm/virtio-pci-dev.h |   2 +
 include/kvm/virtio-pstore.h  |  31
 include/linux/virtio_ids.h   |   1 +
 virtio/pstore.c              | 359 +++
 7 files changed, 397 insertions(+)
 create mode 100644 include/kvm/virtio-pstore.h
 create mode 100644 virtio/pstore.c

diff --git a/Makefile b/Makefile
index 1f0196f..d7462b9 100644
--- a/Makefile
+++ b/Makefile
@@ -67,6 +67,7 @@ OBJS += virtio/net.o
 OBJS += virtio/rng.o
 OBJS += virtio/balloon.o
 OBJS += virtio/pci.o
+OBJS += virtio/pstore.o
 OBJS += disk/blk.o
 OBJS += disk/qcow.o
 OBJS += disk/raw.o
diff --git a/builtin-run.c b/builtin-run.c
index 72b878d..08c12dd 100644
--- a/builtin-run.c
+++ b/builtin-run.c
@@ -128,6 +128,8 @@ void kvm_run_set_wrapper_sandbox(void)
 			" rootfs"), \
 	OPT_STRING('\0', "hugetlbfs", &(cfg)->hugetlbfs_path, "path", \
 			"Hugetlbfs path"), \
+	OPT_STRING('\0', "pstore", &(cfg)->pstore_path, "path", \
+			"pstore data path"), \
 	\
 	OPT_GROUP("Kernel options:"), \
 	OPT_STRING('k', "kernel", &(cfg)->kernel_filename, "kernel", \
diff --git a/include/kvm/kvm-config.h b/include/kvm/kvm-config.h
index 386fa8c..42b7651 100644
--- a/include/kvm/kvm-config.h
+++ b/include/kvm/kvm-config.h
@@ -45,6 +45,7 @@ struct kvm_config {
 	const char *hugetlbfs_path;
 	const char *custom_rootfs_name;
 	const char *real_cmdline;
+	const char *pstore_path;
 	struct virtio_net_params *net_params;
 	bool single_step;
 	bool vnc;
diff --git a/include/kvm/virtio-pci-dev.h b/include/kvm/virtio-pci-dev.h
index 48ae018..4339d94 100644
--- a/include/kvm/virtio-pci-dev.h
+++ b/include/kvm/virtio-pci-dev.h
@@ -15,6 +15,7 @@
 #define PCI_DEVICE_ID_VIRTIO_BLN	0x1005
 #define PCI_DEVICE_ID_VIRTIO_SCSI	0x1008
 #define PCI_DEVICE_ID_VIRTIO_9P		0x1009
+#define PCI_DEVICE_ID_VIRTIO_PSTORE	0x100a
 #define PCI_DEVICE_ID_VESA		0x2000
 #define PCI_DEVICE_ID_PCI_SHMEM		0x0001
@@ -34,5 +35,6 @@
 #define PCI_CLASS_RNG			0xff
 #define PCI_CLASS_BLN			0xff
 #define PCI_CLASS_9P			0xff
+#define PCI_CLASS_PSTORE		0xff
 #endif /* VIRTIO_PCI_DEV_H_ */
diff --git a/include/kvm/virtio-pstore.h b/include/kvm/virtio-pstore.h
new file mode 100644
index 000..293ab57
--- /dev/null
+++ b/include/kvm/virtio-pstore.h
@@ -0,0 +1,31 @@
+#ifndef KVM__PSTORE_VIRTIO_H
+#define KVM__PSTORE_VIRTIO_H
+
+struct kvm;
+
+#define VIRTIO_PSTORE_TYPE_UNKNOWN	0
+#define VIRTIO_PSTORE_TYPE_DMESG	1
+
+#define VIRTIO_PSTORE_CMD_NULL		0
+#define VIRTIO_PSTORE_CMD_OPEN		1
+#define VIRTIO_PSTORE_CMD_READ		2
+#define VIRTIO_PSTORE_CMD_WRITE		3
+#define VIRTIO_PSTORE_CMD_ERASE		4
+#define VIRTIO_PSTORE_CMD_CLOSE		5
+
+#define VIRTIO_PSTORE_FL_COMPRESSED	1
+
+struct pstore_hdr {
+	u64 id;
+	u32 flags;
+	u16 cmd;
+	u16 type;
+	u64 time_sec;
+	u32 time_nsec;
+	u32 unused;
+};
+
+int virtio_pstore__init(struct kvm *kvm);
+int virtio_pstore__exit(struct kvm *kvm);
+
+#endif /* KVM__PSTORE_VIRTIO_H */
diff --git a/include/linux/virtio_ids.h b/include/linux/virtio_ids.h
index 5f60aa4..f34cabc 100644
--- a/include/linux/virtio_ids.h
+++ b/include/linux/virtio_ids.h
@@ -40,5 +40,6 @@
 #define VIRTIO_ID_RPROC_SERIAL	11 /* virtio remoteproc serial link */
 #define VIRTIO_ID_CAIF		12 /* Virtio caif */
 #define VIRTIO_ID_INPUT		18 /* virtio input */
+#define VIRTIO_ID_PSTORE	19 /* virtio pstore */
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/virtio/pstore.c b/virtio/pstore.c
new
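
For anyone reading the header added in include/kvm/virtio-pstore.h: the seven fields of struct pstore_hdr add up to 32 bytes with no implicit padding, so the header can presumably be passed between guest and host as a flat buffer. Below is a minimal userspace sketch of that layout check, not taken from the patch. It assumes <stdint.h> types in place of kvmtool's u64/u32/u16, and the names pstore_hdr_sketch, pstore_hdr_size(), pstore_hdr_is_compressed() and run_demo() are made up for illustration. Byte order is ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag value copied from the patch. */
#define VIRTIO_PSTORE_FL_COMPRESSED 1

/* Same field order as struct pstore_hdr in the patch (illustrative copy). */
struct pstore_hdr_sketch {
	uint64_t id;
	uint32_t flags;
	uint16_t cmd;
	uint16_t type;
	uint64_t time_sec;
	uint32_t time_nsec;
	uint32_t unused;
};

/* Compile-time checks: the fields are naturally aligned, so no padding. */
_Static_assert(sizeof(struct pstore_hdr_sketch) == 32,
	       "header is expected to be 32 bytes");
_Static_assert(offsetof(struct pstore_hdr_sketch, cmd) == 12,
	       "cmd follows id (8 bytes) and flags (4 bytes)");
_Static_assert(offsetof(struct pstore_hdr_sketch, time_sec) == 16,
	       "time_sec starts on an 8-byte boundary");

/* Size of the header as the compiler lays it out. */
size_t pstore_hdr_size(void)
{
	return sizeof(struct pstore_hdr_sketch);
}

/* Returns 1 if the compressed flag is set in a host-endian header. */
int pstore_hdr_is_compressed(const struct pstore_hdr_sketch *hdr)
{
	return (hdr->flags & VIRTIO_PSTORE_FL_COMPRESSED) != 0;
}

/* Small self-check a reader can call from main(). */
void run_demo(void)
{
	struct pstore_hdr_sketch hdr = { 0 };

	hdr.flags = VIRTIO_PSTORE_FL_COMPRESSED;
	assert(pstore_hdr_size() == 32);
	assert(pstore_hdr_is_compressed(&hdr));
}
```

The trailing "unused" field is what rounds the structure up to a multiple of 8 bytes; without it a compiler would typically add 4 bytes of tail padding on 64-bit targets anyway.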
Re: [PATCH v3 12/17] mm, compaction: more reliably increase direct compaction priority
On Fri, Jul 15, 2016 at 03:37:52PM +0200, Vlastimil Babka wrote:
> On 07/06/2016 07:39 AM, Joonsoo Kim wrote:
> > On Fri, Jun 24, 2016 at 11:54:32AM +0200, Vlastimil Babka wrote:
> >> During reclaim/compaction loop, compaction priority can be increased by the
> >> should_compact_retry() function, but the current code is not optimal. Priority
> >> is only increased when compaction_failed() is true, which means that compaction
> >> has scanned the whole zone. This may not happen even after multiple attempts
> >> with the lower priority due to parallel activity, so we might needlessly
> >> struggle on the lower priority and possibly run out of compaction retry
> >> attempts in the process.
> >>
> >> We can remove these corner cases by increasing compaction priority regardless
> >> of compaction_failed(). Examining further the compaction result can be
> >> postponed only after reaching the highest priority. This is a simple solution
> >> and we don't need to worry about reaching the highest priority "too soon" here,
> >> because when should_compact_retry() is called it means that the system is
> >> already struggling and the allocation is supposed to either try as hard as
> >> possible, or it cannot fail at all. There's not much point staying at lower
> >> priorities with heuristics that may result in only partial compaction.
> >> Also we now count compaction retries only after reaching the highest priority.
> >
> > I'm not sure that this patch is safe. Deferring and skip-bit in
> > compaction is highly related to reclaim/compaction. Just ignoring them and (almost)
> > unconditionally increasing compaction priority will result in less
> > reclaim and less success rate on compaction.
>
> I don't see why less reclaim? Reclaim is always attempted before
> compaction and compaction priority doesn't affect it. And as long as
> reclaim wants to retry, should_compact_retry() isn't even called, so the
> priority stays. I wanted to change that in v1, but Michal suggested I
> shouldn't.

I assume the situation that there is no !costly highorder freepage
because of fragmentation. In this case, should_reclaim_retry() would
return false since watermark cannot be met due to absence of high
order freepage. Now, please see should_compact_retry() with assumption
that there are enough order-0 free pages. Reclaim/compaction is only
retried two times (SYNC_LIGHT and SYNC_FULL) with your patchset since
compaction_withdrawn() return false with enough freepages and
!COMPACT_SKIPPED.

But, before your patchset, COMPACT_PARTIAL_SKIPPED and
COMPACT_DEFERRED is considered as withdrawn so will retry
reclaim/compaction more times.

As I said before, more reclaim (more freepage) increase migration
scanner's scan range and then increase compaction success probability.
Therefore, your patchset which makes reclaim/compaction retry less times
deterministically would not be safe.

>
> > And, as a necessarily, it
> > would trigger OOM more frequently.
>
> OOM is only allowed for costly orders. If reclaim itself doesn't want to
> retry for non-costly orders anymore, and we finally start calling
> should_compact_retry(), then I guess the system is really struggling
> already and eventual OOM wouldn't be premature?

Premature is really subjective so I don't know. Anyway, I tested
your patchset with simple test case and it causes a regression.

My test setup is:

Mem: 512 MB
vm.compact_unevictable_allowed = 0
Mlocked Mem: 225 MB by using mlock(). With some tricks, mlocked
pages are spread so memory is highly fragmented.

fork 500

This test causes OOM with your patchset but not without your patchset.

Thanks.

> > It would not be your fault. This patch is reasonable in current
> > situation. It just makes current things more deterministic
> > although I dislike that current things and this patch would amplify
> > those problem.
> >
> > Thanks.
[PATCH] Staging: ks7010: michael_mic: fixed macros coding style issue
Fixed coding style issue:
Enclose multiple statements macros definition in a do while loop.
Use one space around binary operators.

Signed-off-by: Sunbing
---
 drivers/staging/ks7010/michael_mic.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/ks7010/michael_mic.c b/drivers/staging/ks7010/michael_mic.c
index e14c109..d332678 100644
--- a/drivers/staging/ks7010/michael_mic.c
+++ b/drivers/staging/ks7010/michael_mic.c
@@ -20,15 +20,21 @@
 #define getUInt32( A, B ) (uint32_t)(A[B+0] << 0) + (A[B+1] << 8) + (A[B+2] << 16) + (A[B+3] << 24)
 // Convert from UInt32 to Byte[] in a portable way
-#define putUInt32( A, B, C )	A[B+0] = (uint8_t) (C & 0xff); \
-				A[B+1] = (uint8_t) ((C>>8) & 0xff); \
-				A[B+2] = (uint8_t) ((C>>16) & 0xff);\
-				A[B+3] = (uint8_t) ((C>>24) & 0xff)
+#define putUInt32(A, B, C) \
+do { \
+	A[B + 0] = (uint8_t)(C & 0xff); \
+	A[B + 1] = (uint8_t)((C >> 8) & 0xff); \
+	A[B + 2] = (uint8_t)((C >> 16) & 0xff); \
+	A[B + 3] = (uint8_t)((C >> 24) & 0xff); \
+} while (0)
 // Reset the state to the empty message.
-#define MichaelClear( A )	A->L = A->K0; \
-				A->R = A->K1; \
-				A->nBytesInM = 0;
+#define MichaelClear(A)\
+do { \
+	A->L = A->K0; \
+	A->R = A->K1; \
+	A->nBytesInM = 0; \
+} while (0)
 static void MichaelInitializeFunction(struct michel_mic_t *Mic, uint8_t * key)
--
2.1.0
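
As background on why the do { } while (0) wrapper matters beyond style: a bare multi-statement macro used as the body of an unbraced "if" only has its first statement guarded. The standalone sketch below tries to show that. It is not code from the ks7010 driver, and PUT_U32_BARE, PUT_U32_WRAPPED, touched_bare(), touched_wrapped() and run_demo() are hypothetical names. Recent compilers may warn about the bare variant, which is the problem being illustrated.

```c
#include <assert.h>
#include <stdint.h>

/* Old style (hypothetical name): four statements with no wrapper. */
#define PUT_U32_BARE(A, B, C) \
	(A)[(B) + 0] = (uint8_t)((C) & 0xff); \
	(A)[(B) + 1] = (uint8_t)(((C) >> 8) & 0xff); \
	(A)[(B) + 2] = (uint8_t)(((C) >> 16) & 0xff); \
	(A)[(B) + 3] = (uint8_t)(((C) >> 24) & 0xff)

/* New style (hypothetical name): the statements form a single unit. */
#define PUT_U32_WRAPPED(A, B, C) \
do { \
	(A)[(B) + 0] = (uint8_t)((C) & 0xff); \
	(A)[(B) + 1] = (uint8_t)(((C) >> 8) & 0xff); \
	(A)[(B) + 2] = (uint8_t)(((C) >> 16) & 0xff); \
	(A)[(B) + 3] = (uint8_t)(((C) >> 24) & 0xff); \
} while (0)

/* Counts how many of the four bytes are non-zero. */
static int count_nonzero(const uint8_t *buf)
{
	int i, n = 0;

	for (i = 0; i < 4; i++)
		if (buf[i] != 0)
			n++;
	return n;
}

/* Bare macro under an unbraced if: only the first store is conditional. */
int touched_bare(int cond)
{
	uint8_t buf[4] = { 0, 0, 0, 0 };

	if (cond)
		PUT_U32_BARE(buf, 0, 0xa1b2c3d4u);
	return count_nonzero(buf);
}

/* Wrapped macro under the same if: all four stores are conditional. */
int touched_wrapped(int cond)
{
	uint8_t buf[4] = { 0, 0, 0, 0 };

	if (cond)
		PUT_U32_WRAPPED(buf, 0, 0xa1b2c3d4u);
	return count_nonzero(buf);
}

/* Small self-check a reader can call from main(). */
void run_demo(void)
{
	assert(touched_bare(0) == 3);    /* three stores escaped the if */
	assert(touched_wrapped(0) == 0); /* nothing written */
}
```

With cond set to 0 the bare variant still writes bytes 1 to 3, while the wrapped variant writes nothing. The sketch also parenthesizes the macro arguments, which the patch itself does not do.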
Re: [PATCH 1/1] tracing, bpf: Implement function bpf_probe_write
On Sun, Jul 17, 2016 at 03:19:13AM -0700, Sargun Dhillon wrote:
>
> +static u64 bpf_copy_to_user(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
> +{
> +	void *to = (void *) (long) r1;
> +	void *from = (void *) (long) r2;
> +	int size = (int) r3;
> +
> +	/* check if we're in a user context */
> +	if (unlikely(in_interrupt()))
> +		return -EINVAL;
> +	if (unlikely(!current->pid))
> +		return -EINVAL;
> +
> +	return copy_to_user(to, from, size);
> +}

thanks for the patch, unfortunately it's not that straightforward.
copy_to_user might fault. Try enabling CONFIG_DEBUG_ATOMIC_SLEEP
and you'll see the splat since bpf programs are protected by rcu.
Also 'current' can be null and I'm not sure what current->pid does.
So the writing to user memory either has to be verified to avoid
sleeping and faults or we need to use something like task_work_add
mechanism. Ideas are certainly welcome.
linux-next: manual merge of the device-mapper tree with the block tree
Hi all,

Today's linux-next merge of the device-mapper tree got a conflict in:

  include/linux/blkdev.h

between commit:

  288dab8a35a0 ("block: add a separate operation type for secure erase")

from the block tree and commit:

  ff6bbdd8ef75 ("block: add QUEUE_FLAG_DAX for devices to advertise their DAX support")

from the device-mapper tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging. You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

--
Cheers,
Stephen Rothwell

diff --cc include/linux/blkdev.h
index 9ae49ccaac95,1493ab3a537f..
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@@ -592,8 -593,9 +593,9 @@@ static inline void queue_flag_clear(uns
  #define blk_queue_stackable(q)\
  	test_bit(QUEUE_FLAG_STACKABLE, &(q)->queue_flags)
  #define blk_queue_discard(q) test_bit(QUEUE_FLAG_DISCARD, &(q)->queue_flags)
 -#define blk_queue_secdiscard(q) (blk_queue_discard(q) && \
 -	test_bit(QUEUE_FLAG_SECDISCARD, &(q)->queue_flags))
 +#define blk_queue_secure_erase(q) \
 +	(test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags))
+ #define blk_queue_dax(q) test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
  #define blk_noretry_request(rq) \
  	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
[PATCH v2 04/10] binfmt_flat: clean up create_flat_tables() and stack accesses
In addition to better code clarity, this brings proper usage of user
memory accessors everywhere the stack is touched. This is essential for
making this work on MMU systems.

Signed-off-by: Nicolas Pitre
---
 fs/binfmt_flat.c | 117 ++-
 1 file changed, 63 insertions(+), 54 deletions(-)

diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index 64feb873f0..9538901fe8 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -115,50 +115,58 @@ static int flat_core_dump(struct coredump_params *cprm)
 /*
  * create_flat_tables() parses the env- and arg-strings in new user
  * memory and creates the pointer tables from them, and puts their
- * addresses on the "stack", returning the new stack pointer value.
+ * addresses on the "stack", recording the new stack pointer value.
  */
 
-static unsigned long create_flat_tables(
-	unsigned long pp,
-	struct linux_binprm * bprm)
+static int create_flat_tables(struct linux_binprm * bprm, unsigned long arg_start)
 {
-	unsigned long *argv,*envp;
-	unsigned long * sp;
-	char * p = (char*)pp;
-	int argc = bprm->argc;
-	int envc = bprm->envc;
-	char uninitialized_var(dummy);
-
-	sp = (unsigned long *)p;
-	sp -= (envc + argc + 2) + 1 + (flat_argvp_envp_on_stack() ? 2 : 0);
-	sp = (unsigned long *) ((unsigned long)sp & -FLAT_STACK_ALIGN);
-	argv = sp + 1 + (flat_argvp_envp_on_stack() ? 2 : 0);
-	envp = argv + (argc + 1);
+	char __user *p;
+	unsigned long __user *sp;
+	long i, len;
 
+	p = (char __user *)arg_start;
+	sp = (unsigned long __user *)current->mm->start_stack;
+
+	sp -= bprm->envc + 1;
+	sp -= bprm->argc + 1;
+	sp -= flat_argvp_envp_on_stack() ? 2 : 0;
+	sp -= 1;  /* */
+
+	current->mm->start_stack = (unsigned long)sp & -FLAT_STACK_ALIGN;
+	sp = (unsigned long __user *)current->mm->start_stack;
+
+	__put_user(bprm->argc, sp++);
 	if (flat_argvp_envp_on_stack()) {
-		put_user((unsigned long) envp, sp + 2);
-		put_user((unsigned long) argv, sp + 1);
-	}
-
-	put_user(argc, sp);
-	current->mm->arg_start = (unsigned long) p;
-	while (argc-->0) {
-		put_user((unsigned long) p, argv++);
-		do {
-			get_user(dummy, p); p++;
-		} while (dummy);
-	}
-	put_user((unsigned long) NULL, argv);
-	current->mm->arg_end = current->mm->env_start = (unsigned long) p;
-	while (envc-->0) {
-		put_user((unsigned long)p, envp); envp++;
-		do {
-			get_user(dummy, p); p++;
-		} while (dummy);
-	}
-	put_user((unsigned long) NULL, envp);
-	current->mm->env_end = (unsigned long) p;
-	return (unsigned long)sp;
+		unsigned long argv, envp;
+		argv = (unsigned long)(sp + 2);
+		envp = (unsigned long)(sp + 2 + bprm->argc + 1);
+		__put_user(argv, sp++);
+		__put_user(envp, sp++);
+	}
+
+	current->mm->arg_start = (unsigned long)p;
+	for (i = bprm->argc; i > 0; i--) {
+		__put_user((unsigned long)p, sp++);
+		len = strnlen_user(p, MAX_ARG_STRLEN);
+		if (!len || len > MAX_ARG_STRLEN)
+			return -EINVAL;
+		p += len;
+	}
+	__put_user(0, sp++);
+	current->mm->arg_end = (unsigned long)p;
+
+	current->mm->env_start = (unsigned long) p;
+	for (i = bprm->envc; i > 0; i--) {
+		__put_user((unsigned long)p, sp++);
+		len = strnlen_user(p, MAX_ARG_STRLEN);
+		if (!len || len > MAX_ARG_STRLEN)
+			return -EINVAL;
+		p += len;
+	}
+	__put_user(0, sp++);
+	current->mm->env_end = (unsigned long)p;
+
+	return 0;
 }
 
 //
@@ -854,7 +862,7 @@ static int load_flat_binary(struct linux_binprm * bprm)
 {
 	struct lib_info libinfo;
 	struct pt_regs *regs = current_pt_regs();
-	unsigned long sp, stack_len;
+	unsigned long stack_len;
 	unsigned long start_addr;
 	int res;
 	int i, j;
@@ -868,11 +876,10 @@ static int load_flat_binary(struct linux_binprm * bprm)
 	 * pedantic and include space for the argv/envp array as it may have
 	 * a lot of entries.
 	 */
-#define TOP_OF_ARGS (PAGE_SIZE * MAX_ARG_PAGES - sizeof(void *))
-	stack_len = TOP_OF_ARGS - bprm->p;             /* the strings */
-	stack_len += (bprm->argc + 1) * sizeof(char *); /* the argv array */
-	stack_len += (bprm->envc + 1) * sizeof(char *); /* the envp array */
-	stack_len += FLAT_STACK_ALIGN - 1;  /* reserve for upcoming alignment */
+	stack_len = PAGE_SIZE * MAX_ARG_PAGES - bprm->p; /* the strings */
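For readers following the series, the table layout that the new create_flat_tables() writes (argc, optional argv/envp pointers, the argv[] entries, a NULL, the envp[] entries, a NULL) can be modelled in plain userspace C. The sketch below is not kernel code: build_tables(), fill_pointers() and STACK_ALIGN are hypothetical names, ordinary stores stand in for __put_user(), and memchr() stands in for strnlen_user(). It only assumes the packed-strings layout shown in the diff above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for FLAT_STACK_ALIGN; must be a power of two. */
#define STACK_ALIGN ((uintptr_t)8)

/*
 * Store `count` string pointers followed by a NULL terminator at *spp.
 * The strings are packed back to back in [p, end). Returns the position
 * after the last string, or NULL if a string has no terminator.
 */
static const char *fill_pointers(uintptr_t **spp, const char *p,
				 const char *end, long count)
{
	while (count-- > 0) {
		const char *nul = memchr(p, '\0', (size_t)(end - p));

		if (!nul)
			return NULL;
		*(*spp)++ = (uintptr_t)p;
		p = nul + 1;
	}
	*(*spp)++ = 0;
	return p;
}

/*
 * Build the argc/argv/envp table just below stack_top and return the
 * new, aligned stack pointer (NULL on a malformed string). The caller
 * must provide enough room below stack_top for the whole table.
 */
static uintptr_t *build_tables(uintptr_t *stack_top, const char *strings,
			       size_t strings_len, long argc, long envc,
			       int argvp_envp_on_stack)
{
	const char *p = strings;
	const char *end = strings + strings_len;
	uintptr_t *sp = stack_top;
	uintptr_t *base;

	assert(argc >= 0 && envc >= 0);

	/* Reserve space top-down, in the same order as the patch. */
	sp -= envc + 1;
	sp -= argc + 1;
	sp -= argvp_envp_on_stack ? 2 : 0;
	sp -= 1;	/* slot for argc */

	sp = (uintptr_t *)((uintptr_t)sp & ~(STACK_ALIGN - 1));
	base = sp;

	*sp++ = (uintptr_t)argc;
	if (argvp_envp_on_stack) {
		uintptr_t argv = (uintptr_t)(sp + 2);
		uintptr_t envp = (uintptr_t)(sp + 2 + argc + 1);

		*sp++ = argv;
		*sp++ = envp;
	}

	p = fill_pointers(&sp, p, end, argc);
	if (!p)
		return NULL;
	p = fill_pointers(&sp, p, end, envc);
	if (!p)
		return NULL;

	return base;
}
```

With the strings "prog", "-v" and "HOME=/" packed into one buffer, argc = 2, envc = 1 and the argv/envp pointers enabled, the word at the returned pointer holds argc and the next two words point at the argv[] and envp[] arrays that follow.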
[PATCH v2 08/10] binfmt_flat: update libraries' data segment pointer with userspace accessors
This is needed on systems with an MMU.

This also gets rid of the strangest C code I've seen lately, i.e. an
integer indexed with a pointer value within square brackets. That really
looked backwards.

Signed-off-by: Nicolas Pitre
---
 fs/binfmt_flat.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index e981e66bb5..3221ed9d7c 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -902,12 +902,19 @@ static int load_flat_binary(struct linux_binprm * bprm)
 		return res;
 
 	/* Update data segment pointers for all libraries */
-	for (i=0; i
[PATCH v2 09/10] binfmt_flat: add MMU-specific support
Not much else to do at this point except for the different stack setups.

SuperH and Xtensa could be added to the allowed list if they implement
__put_user_unaligned() and __get_user_unaligned().

Signed-off-by: Nicolas Pitre
---
 fs/Kconfig.binfmt |  3 ++-
 fs/binfmt_flat.c  | 16 +---
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/fs/Kconfig.binfmt b/fs/Kconfig.binfmt
index 72c03354c1..4c09d93d95 100644
--- a/fs/Kconfig.binfmt
+++ b/fs/Kconfig.binfmt
@@ -89,7 +89,8 @@ config BINFMT_SCRIPT
 
 config BINFMT_FLAT
 	bool "Kernel support for flat binaries"
-	depends on !MMU && (!FRV || BROKEN)
+	depends on !MMU || ARM || M68K
+	depends on !FRV || BROKEN
 	help
 	  Support uClinux FLAT format binaries.
 
diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index 3221ed9d7c..4cb0c4b6ae 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -546,7 +546,7 @@ static int load_flat_file(struct linux_binprm * bprm,
 	 * case, and then the fully copied to RAM case which lumps
 	 * it all together.
 	 */
-	if ((flags & (FLAT_FLAG_RAM|FLAT_FLAG_GZIP)) == 0) {
+	if (!IS_ENABLED(CONFIG_MMU) && !(flags & (FLAT_FLAG_RAM|FLAT_FLAG_GZIP))) {
 		/*
 		 * this should give us a ROM ptr, but if it doesn't we don't
 		 * really care
@@ -687,7 +687,9 @@ static int load_flat_file(struct linux_binprm * bprm,
 		 */
 		current->mm->start_brk = datapos + data_len + bss_len;
 		current->mm->brk = (current->mm->start_brk + 3) & ~3;
+#ifndef CONFIG_MMU
 		current->mm->context.end_brk = memp + memp_size - stack_len;
+#endif
 	}
 
 	if (flags & FLAT_FLAG_KTRACE) {
@@ -878,7 +880,7 @@ static int load_flat_binary(struct linux_binprm * bprm)
 {
 	struct lib_info libinfo;
 	struct pt_regs *regs = current_pt_regs();
-	unsigned long stack_len;
+	unsigned long stack_len = 0;
 	unsigned long start_addr;
 	int res;
 	int i, j;
@@ -892,7 +894,9 @@ static int load_flat_binary(struct linux_binprm * bprm)
 	 * pedantic and include space for the argv/envp array as it may have
 	 * a lot of entries.
 	 */
-	stack_len = PAGE_SIZE * MAX_ARG_PAGES - bprm->p; /* the strings */
+#ifndef CONFIG_MMU
+	stack_len += PAGE_SIZE * MAX_ARG_PAGES - bprm->p; /* the strings */
+#endif
 	stack_len += (bprm->argc + 1) * sizeof(char *); /* the argv array */
 	stack_len += (bprm->envc + 1) * sizeof(char *); /* the envp array */
 	stack_len = ALIGN(stack_len, FLAT_STACK_ALIGN);
@@ -920,6 +924,11 @@ static int load_flat_binary(struct linux_binprm * bprm)
 
 	set_binfmt(&flat_format);
 
+#ifdef CONFIG_MMU
+	res = setup_arg_pages(bprm, STACK_TOP, EXSTACK_DEFAULT);
+	if (!res)
+		res = create_flat_tables(bprm, bprm->p);
+#else
 	/* Stash our initial stack pointer into the mm structure */
 	current->mm->start_stack =
 		((current->mm->context.end_brk + stack_len + 3) & ~3) - 4;
@@ -929,6 +938,7 @@ static int load_flat_binary(struct linux_binprm * bprm)
 	res = transfer_args_to_stack(bprm, &current->mm->start_stack);
 	if (!res)
 		res = create_flat_tables(bprm, current->mm->start_stack);
+#endif
 	if (res)
 		return res;
 
--
2.7.4
[PATCH 1/1] netfilter: Add helper array register/unregister functions
From: Gao Feng

Add nf_ct_helper_init, nf_conntrack_helpers_register/unregister
functions to enhance the conntrack helper code.

Signed-off-by: Gao Feng
---
 include/net/netfilter/nf_conntrack_helper.h | 16 ++
 net/netfilter/nf_conntrack_ftp.c            | 58 +++---
 net/netfilter/nf_conntrack_helper.c         | 58 ++
 net/netfilter/nf_conntrack_irc.c            | 36 +-
 net/netfilter/nf_conntrack_sane.c           | 55 +++--
 net/netfilter/nf_conntrack_sip.c            | 75 +++--
 net/netfilter/nf_conntrack_tftp.c           | 48 ++
 7 files changed, 165 insertions(+), 181 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_helper.h b/include/net/netfilter/nf_conntrack_helper.h
index 6cf614bc..8c0c08f 100644
--- a/include/net/netfilter/nf_conntrack_helper.h
+++ b/include/net/netfilter/nf_conntrack_helper.h
@@ -58,10 +58,26 @@ struct nf_conntrack_helper *__nf_conntrack_helper_find(const char *name,
 struct nf_conntrack_helper *nf_conntrack_helper_try_module_get(const char *name,
 							       u16 l3num,
 							       u8 protonum);
+void nf_ct_helper_init(struct nf_conntrack_helper *helper,
+		       u16 l3num, u16 protonum, const char *name,
+		       u16 default_port, u16 spec_port,
+		       const struct nf_conntrack_expect_policy *exp_pol,
+		       u32 expect_class_max, u32 data_len,
+		       int (*help)(struct sk_buff *skb, unsigned int protoff,
+				   struct nf_conn *ct,
+				   enum ip_conntrack_info ctinfo),
+		       int (*from_nlattr)(struct nlattr *attr,
+					  struct nf_conn *ct),
+		       struct module *module);
 
 int nf_conntrack_helper_register(struct nf_conntrack_helper *);
 void nf_conntrack_helper_unregister(struct nf_conntrack_helper *);
 
+int nf_conntrack_helpers_register(struct nf_conntrack_helper *,
+				  unsigned int);
+void nf_conntrack_helpers_unregister(struct nf_conntrack_helper *,
+				     unsigned int);
+
 struct nf_conn_help *nf_ct_helper_ext_add(struct nf_conn *ct,
 					  struct nf_conntrack_helper *helper,
 					  gfp_t gfp);
diff --git a/net/netfilter/nf_conntrack_ftp.c b/net/netfilter/nf_conntrack_ftp.c
index 19efeba..e15640d 100644
--- a/net/netfilter/nf_conntrack_ftp.c
+++ b/net/netfilter/nf_conntrack_ftp.c
@@ -572,7 +572,7 @@ static int nf_ct_ftp_from_nlattr(struct nlattr *attr, struct nf_conn *ct)
 	return 0;
 }
 
-static struct nf_conntrack_helper ftp[MAX_PORTS][2] __read_mostly;
+static struct nf_conntrack_helper ftp[MAX_PORTS * 2] __read_mostly;
 
 static const struct nf_conntrack_expect_policy ftp_exp_policy = {
 	.max_expected	= 1,
@@ -582,24 +582,13 @@ static const struct nf_conntrack_expect_policy ftp_exp_policy = {
 /* don't make this __exit, since it's called from __init ! */
 static void nf_conntrack_ftp_fini(void)
 {
-	int i, j;
-	for (i = 0; i < ports_c; i++) {
-		for (j = 0; j < 2; j++) {
-			if (ftp[i][j].me == NULL)
-				continue;
-
-			pr_debug("unregistering helper for pf: %d port: %d\n",
-				 ftp[i][j].tuple.src.l3num, ports[i]);
-			nf_conntrack_helper_unregister(&ftp[i][j]);
-		}
-	}
-
+	nf_conntrack_helpers_unregister(ftp, ports_c * 2);
 	kfree(ftp_buffer);
 }
 
 static int __init nf_conntrack_ftp_init(void)
 {
-	int i, j = -1, ret = 0;
+	int i, ret = 0;
 
 	ftp_buffer = kmalloc(65536, GFP_KERNEL);
 	if (!ftp_buffer)
@@ -611,32 +600,21 @@ static int __init nf_conntrack_ftp_init(void)
 	/* FIXME should be configurable whether IPv4 and IPv6 FTP connections
 		 are tracked or not - YK */
 	for (i = 0; i < ports_c; i++) {
-		ftp[i][0].tuple.src.l3num = PF_INET;
-		ftp[i][1].tuple.src.l3num = PF_INET6;
-		for (j = 0; j < 2; j++) {
-			ftp[i][j].data_len = sizeof(struct nf_ct_ftp_master);
-			ftp[i][j].tuple.src.u.tcp.port = htons(ports[i]);
-			ftp[i][j].tuple.dst.protonum = IPPROTO_TCP;
-			ftp[i][j].expect_policy = &ftp_exp_policy;
-			ftp[i][j].me = THIS_MODULE;
-			ftp[i][j].help = help;
-			ftp[i][j].from_nlattr = nf_ct_ftp_from_nlattr;
-			if (ports[i] == FTP_PORT)
-				sprintf(ftp[i][j].name, "ftp");
-			else
-				sprintf(ftp[i][j].name,
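The bodies of nf_conntrack_helpers_register() and nf_conntrack_helpers_unregister() are not visible in this excerpt, so the following is only a userspace sketch of the idea behind registering a flat array in one call. It assumes all-or-nothing semantics (roll back what was registered if one entry fails); struct helper and every function name here are hypothetical stand-ins, not the netfilter API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nf_conntrack_helper. */
struct helper {
	const char *name;
	bool registered;
	bool fail;	/* test knob: make registration of this entry fail */
};

/* Register a single helper; fails with -EEXIST when the knob is set. */
static int helper_register(struct helper *h)
{
	if (h->fail)
		return -EEXIST;
	h->registered = true;
	return 0;
}

static void helper_unregister(struct helper *h)
{
	h->registered = false;
}

/* Unregister the first n entries of the array, last one first. */
static void helpers_unregister(struct helper *h, unsigned int n)
{
	while (n-- > 0)
		helper_unregister(&h[n]);
}

/*
 * Register n helpers from a flat array. If entry i fails, entries
 * 0..i-1 are unregistered again so the caller never has to clean up a
 * partially registered array (an assumed policy for this sketch).
 */
static int helpers_register(struct helper *h, unsigned int n)
{
	unsigned int i;
	int err;

	assert(h != NULL || n == 0);

	for (i = 0; i < n; i++) {
		err = helper_register(&h[i]);
		if (err < 0) {
			helpers_unregister(h, i);
			return err;
		}
	}
	return 0;
}
```

This is also why the patch flattens ftp[MAX_PORTS][2] into ftp[MAX_PORTS * 2]: a one-dimensional array plus a count is all that such a pair of functions needs.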
RE: [PATCH v6 1/3] x86/platform/p2sb: New Primary to Sideband bridge support driver for Intel SOC's
> -----Original Message-----
> From: paul.gortma...@gmail.com [mailto:paul.gortma...@gmail.com] On
> Behalf Of Paul Gortmaker
> Sent: Friday, July 15, 2016 8:01 AM
> To: Tan, Jui Nee
> Cc: mika.westerb...@linux.intel.com; heikki.kroge...@linux.intel.com;
> andriy.shevche...@linux.intel.com; t...@linutronix.de;
> mi...@redhat.com; H. Peter Anvin ; X86 ML
> ; pty...@xes-inc.com; Lee Jones ;
> Linus Walleij ; linux-g...@vger.kernel.org; LKML
> ; Yong, Jonathan
> ; Yu, Ong Hock ; Voon,
> Weifeng ; Wan Mohamad, Wan Ahmad Zainie
>
> Subject: Re: [PATCH v6 1/3] x86/platform/p2sb: New Primary to Sideband
> bridge support driver for Intel SOC's
>
> On Thu, Jul 14, 2016 at 4:11 AM, Tan Jui Nee wrote:
> > From: Andy Shevchenko
> >
> > There is already one and at least one more user coming which require
> > an access to Primary to Sideband bridge (P2SB) in order to get IO or
> > MMIO bar hidden by BIOS.
> > Create a driver to access P2SB for x86 devices.
> >
> > Signed-off-by: Yong, Jonathan
> > Signed-off-by: Andy Shevchenko
> > ---
> > Changes in V6:
> > - No change
> >
> > Changes in V5:
> > - No change
> >
> > Changes in V4:
> > - Move Kconfig option CONFIG_X86_INTEL_NON_ACPI from
> > [PATCH 2/3] x86/platform/p2sb: New Primary to Sideband bridge
> support driver for Intel SOC's
> > to
> > [PATCH 3/3] mfd: lpc_ich: Add support for Intel Apollo Lake GPIO
> pinctrl in non-ACPI system
> > since the config is used in latter patch.
> >
> > Changes in V3:
> > - No change
> >
> > Changes in V2:
> > - Add new config option CONFIG_X86_INTEL_NON_ACPI and "select
> PINCTRL"
> > to fix kbuildbot error
> >
> > arch/x86/Kconfig | 4 ++
> > arch/x86/include/asm/p2sb.h | 27 +++
> > arch/x86/platform/intel/Makefile | 1 +
> > arch/x86/platform/intel/p2sb.c | 99
> >
> 4 files changed, 131 insertions(+)
> > create mode 100644 arch/x86/include/asm/p2sb.h create mode 100644
> > arch/x86/platform/intel/p2sb.c
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index
> > d9a94da..d305d81 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -604,6 +604,10 @@ config IOSF_MBI_DEBUG
> >
> > If you don't require the option or are in doubt, say N.
> >
> > +config P2SB
> > + tristate
>
> OK, this is tristate, but then
>

P2SB is tristate as currently it is only used by LPC_ICH that is tristate too.

...
config LPC_ICH
	tristate "Intel ICH LPC"
	depends on X86 && PCI
	select MFD_CORE
	select P2SB
...

> > + depends on PCI
> > +
> > config X86_RDC321X
> > bool "RDC R-321x SoC"
> > depends on X86_32
> > diff --git a/arch/x86/include/asm/p2sb.h b/arch/x86/include/asm/p2sb.h
> > new file mode 100644 index 000..686e07b
> > --- /dev/null
> > +++ b/arch/x86/include/asm/p2sb.h
> > @@ -0,0 +1,27 @@
> > +/*
> > + * Primary to Sideband bridge (P2SB) access support */
> > +
> > +#ifndef P2SB_SYMS_H
> > +#define P2SB_SYMS_H
> > +
> > +#include
> > +#include
> > +
> > +#if IS_ENABLED(CONFIG_P2SB)
> > +
> > +int p2sb_bar(struct pci_dev *pdev, unsigned int devfn,
> > + struct resource *res);
> > +
> > +#else /* CONFIG_P2SB is not set */
> > +
> > +static inline
> > +int p2sb_bar(struct pci_dev *pdev, unsigned int devfn,
> > + struct resource *res)
> > +{
> > + return -ENODEV;
> > +}
> > +
> > +#endif /* CONFIG_P2SB */
> > +
> > +#endif /* P2SB_SYMS_H */
> > diff --git a/arch/x86/platform/intel/Makefile
> > b/arch/x86/platform/intel/Makefile
> > index b878032..dbf9f10 100644
> > --- a/arch/x86/platform/intel/Makefile
> > +++ b/arch/x86/platform/intel/Makefile
> > @@ -1 +1,2 @@
> > obj-$(CONFIG_IOSF_MBI) += iosf_mbi.o
> > +obj-$(CONFIG_P2SB) += p2sb.o
> > diff --git a/arch/x86/platform/intel/p2sb.c
> > b/arch/x86/platform/intel/p2sb.c new file mode 100644 index
> > 000..8be47a4
> > --- /dev/null
> > +++ b/arch/x86/platform/intel/p2sb.c
> > @@ -0,0 +1,99 @@
> > +/*
> > + * Primary to Sideband bridge (P2SB) driver
> > + *
> > + * Copyright (c) 2016, Intel Corporation.
> > + *
> > + * Authors: Andy Shevchenko
> > + * Jonathan Yong
> > + *
> > + * This program is free software; you can redistribute it and/or
> > +modify it
> > + * under the terms and conditions of the GNU General Public License,
> > + * version 2, as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope it will be useful, but
> > +WITHOUT
> > + * ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY
> > +or
> > + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
> > +License for
> > + * more details.
> > + *
> > + */
> > +
> > +#include
> > +#include
>
> ...and module.h is included, but yet...
>
> > +#include
> > +#include
> > +
> > +#include
> > +
> > +#define SBREG_BAR
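As a side note on the quoted p2sb.h: the header uses the usual "real prototype or inline stub" pattern, so callers build whether or not the option is set. A userspace sketch of that pattern follows. In preprocessor context IS_ENABLED(CONFIG_P2SB) is true when the option is built in or built as a module, which is spelled out here with defined(); struct resource is reduced to a hypothetical two-field stand-in, the pci_dev argument is replaced by a void pointer, and probe_hidden_bar() is an invented caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical minimal stand-in for the kernel's struct resource. */
struct resource {
	unsigned long start;
	unsigned long end;
};

#if defined(CONFIG_P2SB) || defined(CONFIG_P2SB_MODULE)

/* Option enabled (=y or =m): the real implementation lives elsewhere. */
int p2sb_bar(void *pdev, unsigned int devfn, struct resource *res);

#else /* CONFIG_P2SB is not set */

/* Option disabled: callers still compile and get a clean error. */
static inline int p2sb_bar(void *pdev, unsigned int devfn,
			   struct resource *res)
{
	(void)pdev;
	(void)devfn;
	(void)res;
	return -ENODEV;
}

#endif /* CONFIG_P2SB */

/* Invented caller: no #ifdef needed at the call site. */
static int probe_hidden_bar(struct resource *res)
{
	int ret;

	assert(res != NULL);

	ret = p2sb_bar(NULL, 0, res);
	if (ret)
		return ret;	/* e.g. -ENODEV when P2SB is not built */

	return 0;
}
```

Built without either macro defined, probe_hidden_bar() simply propagates -ENODEV from the stub.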
[PATCH v2 07/10] binfmt_flat: use clear_user() rather than memset() to clear .bss
This is needed on systems with an MMU.

Signed-off-by: Nicolas Pitre
---
 fs/binfmt_flat.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index c85f8f1239..e981e66bb5 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -803,10 +803,11 @@ static int load_flat_file(struct linux_binprm * bprm,
 	flush_icache_range(start_code, end_code);
 
 	/* zero the BSS, BRK and stack areas */
-	memset((void*)(datapos + data_len), 0, bss_len +
+	if (clear_user((void __user *)(datapos + data_len), bss_len +
 			(memp + memp_size - stack_len -		/* end brk */
 			libinfo->lib_list[id].start_brk) +	/* start brk */
-			stack_len);
+			stack_len))
+		return -EFAULT;
 
 	return 0;
 err:
--
2.7.4
[PATCH v2 06/10] binfmt_flat: use proper user space accessors with old relocs code
Signed-off-by: Nicolas Pitre
---
 fs/binfmt_flat.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index fc0ee3ed5d..c85f8f1239 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -394,38 +394,41 @@ static void old_reloc(unsigned long rl)
 	static const char *segment[] = { "TEXT", "DATA", "BSS", "*UNKNOWN*" };
 #endif
 	flat_v2_reloc_t r;
-	unsigned long *ptr;
+	unsigned long __user *ptr;
+	unsigned long val;

 	r.value = rl;
 #if defined(CONFIG_COLDFIRE)
-	ptr = (unsigned long *) (current->mm->start_code + r.reloc.offset);
+	ptr = (unsigned long __user *)(current->mm->start_code + r.reloc.offset);
 #else
-	ptr = (unsigned long *) (current->mm->start_data + r.reloc.offset);
+	ptr = (unsigned long __user *)(current->mm->start_data + r.reloc.offset);
 #endif
+	__get_user(val, ptr);

 #ifdef DEBUG
 	printk("Relocation of variable at DATASEG+%x "
 		"(address %p, currently %lx) into segment %s\n",
-		r.reloc.offset, ptr, *ptr, segment[r.reloc.type]);
+		r.reloc.offset, ptr, val, segment[r.reloc.type]);
 #endif

 	switch (r.reloc.type) {
 	case OLD_FLAT_RELOC_TYPE_TEXT:
-		*ptr += current->mm->start_code;
+		val += current->mm->start_code;
 		break;
 	case OLD_FLAT_RELOC_TYPE_DATA:
-		*ptr += current->mm->start_data;
+		val += current->mm->start_data;
 		break;
 	case OLD_FLAT_RELOC_TYPE_BSS:
-		*ptr += current->mm->end_data;
+		val += current->mm->end_data;
 		break;
 	default:
 		printk("BINFMT_FLAT: Unknown relocation type=%x\n", r.reloc.type);
 		break;
 	}
+	__put_user(val, ptr);

 #ifdef DEBUG
-	printk("Relocation became %lx\n", *ptr);
+	printk("Relocation became %lx\n", val);
 #endif
 }

@@ -788,8 +791,13 @@ static int load_flat_file(struct linux_binprm * bprm,
 			}
 		}
 	} else {
-		for (i=0; i < relocs; i++)
-			old_reloc(ntohl(reloc[i]));
+		for (i=0; i < relocs; i++) {
+			unsigned long relval;
+			if (get_user(relval, reloc + i))
+				return -EFAULT;
+			relval = ntohl(relval);
+			old_reloc(relval);
+		}
 	}

 	flush_icache_range(start_code, end_code);
-- 
2.7.4
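The change above replaces an in-place "*ptr += base" with a fetch into a local, a fix-up of the local, and a store back. A rough user-space sketch of that read-modify-write pattern follows. The names (fake_user_words, mock_get_user(), mock_put_user(), relocate_word(), struct segments) are all invented for illustration. Note one deliberate difference: the patch does not check the return values of __get_user()/__put_user() inside old_reloc(), whereas this sketch propagates them.

```c
#include <errno.h>
#include <stddef.h>

enum reloc_type { RELOC_TEXT, RELOC_DATA, RELOC_BSS, RELOC_UNKNOWN };

/* Hypothetical segment bases, standing in for current->mm fields. */
struct segments {
	unsigned long start_code;
	unsigned long start_data;
	unsigned long end_data;
};

#define FAKE_WORDS 8
static unsigned long fake_user_words[FAKE_WORDS];	/* mock user memory */

/* Mock accessors: fail with -EFAULT outside the mock mapping. */
static int mock_get_user(unsigned long *val, size_t idx)
{
	if (idx >= FAKE_WORDS)
		return -EFAULT;
	*val = fake_user_words[idx];
	return 0;
}

static int mock_put_user(unsigned long val, size_t idx)
{
	if (idx >= FAKE_WORDS)
		return -EFAULT;
	fake_user_words[idx] = val;
	return 0;
}

/* Fetch the word, add the segment base to a local copy, store it back. */
static int relocate_word(const struct segments *seg, size_t idx,
			 enum reloc_type type)
{
	unsigned long val;

	if (mock_get_user(&val, idx))
		return -EFAULT;

	switch (type) {
	case RELOC_TEXT:
		val += seg->start_code;
		break;
	case RELOC_DATA:
		val += seg->start_data;
		break;
	case RELOC_BSS:
		val += seg->end_data;
		break;
	default:
		break;		/* unknown type: value is written back unchanged */
	}

	return mock_put_user(val, idx);
}
```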
[PATCH v2 10/10] binfmt_flat: allow compressed flat binary format to work on MMU systems
Let's take the simple and obvious approach by decompressing the binary into a kernel buffer and then copying it to user space. Those who are looking for more performance on a MMU system are unlikely to choose this executable format anyway.

Signed-off-by: Nicolas Pitre
---
 fs/binfmt_flat.c | 44 ++--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index 4cb0c4b6ae..24deae4dcb 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -35,6 +35,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -637,6 +638,7 @@ static int load_flat_file(struct linux_binprm * bprm,
 		 * load it all in and treat it like a RAM load from now on
 		 */
 		if (flags & FLAT_FLAG_GZIP) {
+#ifndef CONFIG_MMU
 			result = decompress_exec(bprm, sizeof (struct flat_hdr),
 					 (((char *) textpos) + sizeof (struct flat_hdr)),
 					 (text_len + full_data
@@ -644,14 +646,52 @@ static int load_flat_file(struct linux_binprm * bprm,
 					 0);
 			memmove((void *) datapos, (void *) realdatastart,
 					full_data);
+#else
+			/*
+			 * This is used on MMU systems mainly for testing.
+			 * Let's use a kernel buffer to simplify things.
+			 */
+			long unz_text_len = text_len - sizeof(struct flat_hdr);
+			long unz_len = unz_text_len + full_data;
+			char *unz_data = vmalloc(unz_len);
+			if (!unz_data) {
+				result = -ENOMEM;
+			} else {
+				result = decompress_exec(bprm, sizeof(struct flat_hdr),
+							 unz_data, unz_len, 0);
+				if (result == 0 &&
+				    (copy_to_user((void __user *)textpos + sizeof(struct flat_hdr),
+						  unz_data, unz_text_len) ||
+				     copy_to_user((void __user *)datapos,
+						  unz_data + unz_text_len, full_data)))
+					result = -EFAULT;
+				vfree(unz_data);
+			}
+#endif
 		} else if (flags & FLAT_FLAG_GZDATA) {
 			result = read_code(bprm->file, textpos, 0, text_len);
-			if (!IS_ERR_VALUE(result))
+			if (!IS_ERR_VALUE(result)) {
+#ifndef CONFIG_MMU
 				result = decompress_exec(bprm, text_len, (char *) datapos,
 						 full_data, 0);
+#else
+				char *unz_data = vmalloc(full_data);
+				if (!unz_data) {
+					result = -ENOMEM;
+				} else {
+					result = decompress_exec(bprm, text_len,
+						unz_data, full_data, 0);
+					if (result == 0 &&
+					    copy_to_user((void __user *)datapos,
+							 unz_data, full_data))
+						result = -EFAULT;
+					vfree(unz_data);
+				}
+#endif
+			}
 		} else
-#endif
+#endif /* CONFIG_BINFMT_ZFLAT */
 		{
 			result = read_code(bprm->file, textpos, 0, text_len);
 			if (!IS_ERR_VALUE(result))
-- 
2.7.4
[PULL REQUEST] [PATCH v2 00/10] allow BFLT executables on systems with a MMU
This series provides the necessary changes to allow "flat" executable binaries meant for no-MMU systems to actually run on systems with a MMU. This can also be found in the following git repo:

  git://git.linaro.org/people/nicolas.pitre/linux binfmt_flat_with_mmu

*Why?*

Because developing and testing natively on a large system with lots of RAM makes it so much more convenient to use all the existing profiling tools and debugging facilities that a kernel with lots of RAM can give. And incidentally, those systems with lots of RAM all have a MMU.

*Why not use elf_fdpic?*

The flat executable format is simple with very small footprint overhead, either in the executables themselves or kernel support. This makes the flat format more suitable than elf_fdpic for very small single-user-app embedded systems. And while elf_fdpic binaries can run on MMU systems, flat binaries still couldn't, which just felt wrong.

So here it is. The no-MMU support should remain unaffected. Please consider for pulling.

Tested on ARM only with a busybox build.

Changes since v1:

- Removed SuperH and Xtensa from the Kconfig rule as they fail to build due to lack of get/put_unaligned_user().
- Clarified some commit logs a bit.

diffstat:

 arch/arm/include/asm/flat.h  |   5 +-
 arch/m68k/include/asm/flat.h |   5 +-
 fs/Kconfig.binfmt            |   3 +-
 fs/binfmt_elf_fdpic.c        |  38 +---
 fs/binfmt_flat.c             | 372 +--
 fs/exec.c                    |  33
 include/linux/binfmts.h      |   2 +
 7 files changed, 268 insertions(+), 190 deletions(-)
[PATCH v2 05/10] binfmt_flat: use proper user space accessors with relocs processing code
Relocs are fixed up in place in user space memory. The appropriate accessors are required for this code to work with an active MMU.

The architecture specific handlers for ARM and M68K are also covered. SuperH and Xtensa are left out as they don't implement __get_user_unaligned() and __put_user_unaligned() yet. The other architectures that use BFLT don't have any MMU.

Signed-off-by: Nicolas Pitre
---
 arch/arm/include/asm/flat.h  |  5 +++--
 arch/m68k/include/asm/flat.h |  5 +++--
 fs/binfmt_flat.c             | 31 +++
 3 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/arch/arm/include/asm/flat.h b/arch/arm/include/asm/flat.h
index e847d23351..acf1d14b89 100644
--- a/arch/arm/include/asm/flat.h
+++ b/arch/arm/include/asm/flat.h
@@ -8,8 +8,9 @@
 #define	flat_argvp_envp_on_stack()		1
 #define	flat_old_ram_flag(flags)		(flags)
 #define	flat_reloc_valid(reloc, size)		((reloc) <= (size))
-#define	flat_get_addr_from_rp(rp, relval, flags, persistent) ((void)persistent,get_unaligned(rp))
-#define	flat_put_addr_at_rp(rp, val, relval)	put_unaligned(val,rp)
+#define	flat_get_addr_from_rp(rp, relval, flags, persistent) \
+	({ unsigned long __val; __get_user_unaligned(__val, rp); __val; })
+#define	flat_put_addr_at_rp(rp, val, relval)	__put_user_unaligned(val, rp)
 #define	flat_get_relocate_addr(rel)		(rel)
 #define	flat_set_persistent(relval, p)		0

diff --git a/arch/m68k/include/asm/flat.h b/arch/m68k/include/asm/flat.h
index f9454b89a5..f3f592d03e 100644
--- a/arch/m68k/include/asm/flat.h
+++ b/arch/m68k/include/asm/flat.h
@@ -8,8 +8,9 @@
 #define	flat_argvp_envp_on_stack()		1
 #define	flat_old_ram_flag(flags)		(flags)
 #define	flat_reloc_valid(reloc, size)		((reloc) <= (size))
-#define	flat_get_addr_from_rp(rp, relval, flags, p)	get_unaligned(rp)
-#define	flat_put_addr_at_rp(rp, val, relval)	put_unaligned(val,rp)
+#define	flat_get_addr_from_rp(rp, relval, flags, p) \
+	({ unsigned long __val; __get_user_unaligned(__val, rp); __val; })
+#define	flat_put_addr_at_rp(rp, val, relval)	__put_user_unaligned(val, rp)
 #define	flat_get_relocate_addr(rel)		(rel)

 static inline int flat_set_persistent(unsigned long relval,
diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index 9538901fe8..fc0ee3ed5d 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -438,7 +438,7 @@ static int load_flat_file(struct linux_binprm * bprm,
 	unsigned long textpos, datapos, realdatastart;
 	unsigned long text_len, data_len, bss_len, stack_len, full_data, flags;
 	unsigned long len, memp, memp_size, extra, rlim;
-	unsigned long *reloc, *rp;
+	unsigned long __user *reloc, *rp;
 	struct inode *inode;
 	int i, rev, relocs;
 	loff_t fpos;
@@ -600,7 +600,7 @@ static int load_flat_file(struct linux_binprm * bprm,
 			goto err;
 		}

-		reloc = (unsigned long *)
+		reloc = (unsigned long __user *)
 			(datapos + (ntohl(hdr->reloc_start) - text_len));
 		memp = realdatastart;
 		memp_size = len;
@@ -625,7 +625,7 @@ static int load_flat_file(struct linux_binprm * bprm,
 				MAX_SHARED_LIBS * sizeof(unsigned long),
 				FLAT_DATA_ALIGN);

-		reloc = (unsigned long *)
+		reloc = (unsigned long __user *)
 			(datapos + (ntohl(hdr->reloc_start) - text_len));
 		memp = textpos;
 		memp_size = len;
@@ -718,15 +718,20 @@ static int load_flat_file(struct linux_binprm * bprm,
 	 * image.
 	 */
 	if (flags & FLAT_FLAG_GOTPIC) {
-		for (rp = (unsigned long *)datapos; *rp != 0x; rp++) {
-			unsigned long addr;
-			if (*rp) {
-				addr = calc_reloc(*rp, libinfo, id, 0);
+		for (rp = (unsigned long __user *)datapos; ; rp++) {
+			unsigned long addr, rp_val;
+			if (get_user(rp_val, rp))
+				return -EFAULT;
+			if (rp_val == 0x)
+				break;
+			if (rp_val) {
+				addr = calc_reloc(rp_val, libinfo, id, 0);
 				if (addr == RELOC_FAILED) {
 					ret = -ENOEXEC;
 					goto err;
 				}
-				*rp = addr;
+				if (put_user(addr, rp))
+					return -EFAULT;
 			}
 		}
 	}
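The GOT fix-up loop in the last hunk can no longer test "*rp" in the for() condition, because every read of the table may fault. A small user-space sketch of the resulting loop shape is below. All names are hypothetical (GOT_END, mock_get(), mock_put(), fixup_got()), the "+ base" step merely stands in for calc_reloc(), and GOT_END is a placeholder of my own because the terminator constant is truncated in the archived patch.

```c
#include <errno.h>
#include <stddef.h>

/* Placeholder end marker; the real constant is truncated in the archive. */
#define GOT_END (~0UL)

/* Mock accessors over a bounded table; out of range means "fault". */
static int mock_get(const unsigned long *tbl, size_t n, size_t i,
		    unsigned long *val)
{
	if (i >= n)
		return -EFAULT;
	*val = tbl[i];
	return 0;
}

static int mock_put(unsigned long *tbl, size_t n, size_t i,
		    unsigned long val)
{
	if (i >= n)
		return -EFAULT;
	tbl[i] = val;
	return 0;
}

/*
 * Walk the table until the end marker. Zero entries are skipped, every
 * other entry is relocated (here: base added) and written back.
 */
static int fixup_got(unsigned long *tbl, size_t n, unsigned long base)
{
	size_t i;

	for (i = 0; ; i++) {
		unsigned long v;

		if (mock_get(tbl, n, i, &v))
			return -EFAULT;
		if (v == GOT_END)
			break;
		if (v && mock_put(tbl, n, i, v + base))
			return -EFAULT;
	}
	return 0;
}
```

A table with no end marker now ends in -EFAULT once the walk leaves the mapped range, instead of reading arbitrary memory.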
[PATCH v2 03/10] binfmt_flat: use generic transfer_args_to_stack()
This gets rid of the rather ugly, open coded and suboptimal copy code.

Signed-off-by: Nicolas Pitre
---
 fs/binfmt_flat.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index 085059d879..64feb873f0 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -854,10 +854,8 @@ static int load_flat_binary(struct linux_binprm * bprm)
 {
 	struct lib_info libinfo;
 	struct pt_regs *regs = current_pt_regs();
-	unsigned long p = bprm->p;
-	unsigned long stack_len;
+	unsigned long sp, stack_len;
 	unsigned long start_addr;
-	unsigned long *sp;
 	int res;
 	int i, j;

@@ -892,15 +890,15 @@ static int load_flat_binary(struct linux_binprm * bprm)

 	set_binfmt(_format);

-	p = ((current->mm->context.end_brk + stack_len + 3) & ~3) - 4;
-	DBG_FLT("p=%lx\n", p);
+	sp = ((current->mm->context.end_brk + stack_len + 3) & ~3) - 4;
+	DBG_FLT("sp=%lx\n", sp);

-	/* copy the arg pages onto the stack, this could be more efficient :-) */
-	for (i = TOP_OF_ARGS - 1; i >= bprm->p; i--)
-		* (char *) --p =
-			((char *) page_address(bprm->page[i/PAGE_SIZE]))[i % PAGE_SIZE];
+	/* copy the arg pages onto the stack */
+	res = transfer_args_to_stack(bprm, );
+	if (res)
+		return res;

-	sp = (unsigned long *) create_flat_tables(p, bprm);
+	sp = create_flat_tables(sp, bprm);

 	/* Fake some return addresses to ensure the call chain will
 	 * initialise library in order for us. We are required to call
@@ -912,14 +910,14 @@ static int load_flat_binary(struct linux_binprm * bprm)
 	for (i = MAX_SHARED_LIBS-1; i>0; i--) {
 		if (libinfo.lib_list[i].loaded) {
 			/* Push previos first to call address */
-			--sp; put_user(start_addr, sp);
+			--sp; put_user(start_addr, (unsigned long *)sp);
 			start_addr = libinfo.lib_list[i].entry;
 		}
 	}
 #endif

 	/* Stash our initial stack pointer into the mm structure */
-	current->mm->start_stack = (unsigned long )sp;
+	current->mm->start_stack = sp;

 #ifdef FLAT_PLAT_INIT
 	FLAT_PLAT_INIT(regs);
-- 
2.7.4
[PATCH v2 01/10] binfmt_flat: assorted cleanups
Remove excessive casts, do some code grouping, etc. No functional changes.

Signed-off-by: Nicolas Pitre
---
 fs/binfmt_flat.c | 118 ++-
 1 file changed, 56 insertions(+), 62 deletions(-)

diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index caf9e39bb8..085059d879 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -80,7 +80,7 @@ struct lib_info {
 		unsigned long text_len;		/* Length of text segment */
 		unsigned long entry;		/* Start address for this module */
 		unsigned long build_date;	/* When this one was compiled */
-		short loaded;			/* Has this library been loaded? */
+		bool loaded;			/* Has this library been loaded? */
 	} lib_list[MAX_SHARED_LIBS];
 };

@@ -107,7 +107,7 @@ static struct linux_binfmt flat_format = {
 static int flat_core_dump(struct coredump_params *cprm)
 {
 	printk("Process %s:%d received signr %d and should have core dumped\n",
-			current->comm, current->pid, (int) cprm->siginfo->si_signo);
+			current->comm, current->pid, cprm->siginfo->si_signo);
 	return(1);
 }

@@ -190,7 +190,7 @@ static int decompress_exec(
 	loff_t fpos;
 	int ret, retval;

-	DBG_FLT("decompress_exec(offset=%x,buf=%x,len=%x)\n",(int)offset, (int)dst, (int)len);
+	DBG_FLT("decompress_exec(offset=%lx,buf=%p,len=%lx)\n",offset, dst, len);

 	memset(, 0, sizeof(strm));
 	strm.workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL);
@@ -358,8 +358,8 @@ calc_reloc(unsigned long r, struct lib_info *p, int curid, int internalp)
 	text_len = p->lib_list[id].text_len;

 	if (!flat_reloc_valid(r, start_brk - start_data + text_len)) {
-		printk("BINFMT_FLAT: reloc outside program 0x%x (0 - 0x%x/0x%x)",
-		       (int) r,(int)(start_brk-start_data+text_len),(int)text_len);
+		printk("BINFMT_FLAT: reloc outside program 0x%lx (0 - 0x%lx/0x%lx)",
+		       r, start_brk-start_data+text_len, text_len);
 		goto failed;
 	}

@@ -383,7 +383,7 @@ failed:
 static void old_reloc(unsigned long rl)
 {
 #ifdef DEBUG
-	char *segment[] = { "TEXT", "DATA", "BSS", "*UNKNOWN*" };
+	static const char *segment[] = { "TEXT", "DATA", "BSS", "*UNKNOWN*" };
 #endif
 	flat_v2_reloc_t r;
 	unsigned long *ptr;
@@ -397,8 +397,8 @@ static void old_reloc(unsigned long rl)

 #ifdef DEBUG
 	printk("Relocation of variable at DATASEG+%x "
-		"(address %p, currently %x) into segment %s\n",
-		r.reloc.offset, ptr, (int)*ptr, segment[r.reloc.type]);
+		"(address %p, currently %lx) into segment %s\n",
+		r.reloc.offset, ptr, *ptr, segment[r.reloc.type]);
 #endif

 	switch (r.reloc.type) {
@@ -417,7 +417,7 @@ static void old_reloc(unsigned long rl)
 	}

 #ifdef DEBUG
-	printk("Relocation became %x\n", (int)*ptr);
+	printk("Relocation became %lx\n", *ptr);
 #endif
 }

@@ -427,17 +427,15 @@ static int load_flat_file(struct linux_binprm * bprm,
 		struct lib_info *libinfo, int id, unsigned long *extra_stack)
 {
 	struct flat_hdr * hdr;
-	unsigned long textpos = 0, datapos = 0, result;
-	unsigned long realdatastart = 0;
-	unsigned long text_len, data_len, bss_len, stack_len, flags;
-	unsigned long full_data;
-	unsigned long len, memp = 0;
-	unsigned long memp_size, extra, rlim;
-	unsigned long *reloc = 0, *rp;
+	unsigned long textpos, datapos, realdatastart;
+	unsigned long text_len, data_len, bss_len, stack_len, full_data, flags;
+	unsigned long len, memp, memp_size, extra, rlim;
+	unsigned long *reloc, *rp;
 	struct inode *inode;
-	int i, rev, relocs = 0;
+	int i, rev, relocs;
 	loff_t fpos;
 	unsigned long start_code, end_code;
+	ssize_t result;
 	int ret;

 	hdr = ((struct flat_hdr *) bprm->buf);		/* exec-header */
@@ -481,8 +479,8 @@ static int load_flat_file(struct linux_binprm * bprm,

 	/* Don't allow old format executables to use shared libraries */
 	if (rev == OLD_FLAT_VERSION && id != 0) {
-		printk("BINFMT_FLAT: shared libraries are not available before rev 0x%x\n",
-				(int) FLAT_VERSION);
+		printk("BINFMT_FLAT: shared libraries are not available before rev 0x%lx\n",
+				FLAT_VERSION);
 		ret = -ENOEXEC;
 		goto err;
 	}
@@ -517,11 +515,9 @@ static int load_flat_file(struct linux_binprm * bprm,

 	/* Flush all traces of the currently running executable */
 	if (id == 0) {
-		result =
[PATCH v2 02/10] elf_fdpic_transfer_args_to_stack(): make it generic
This copying of arguments and environment is common to both NOMMU binary formats we support. Let's make the elf_fdpic version available to the flat format as well.

While at it, improve the code a bit not to copy below the actual data area.

Signed-off-by: Nicolas Pitre
---
 fs/binfmt_elf_fdpic.c   | 38 ++
 fs/exec.c               | 33 +
 include/linux/binfmts.h |  2 ++
 3 files changed, 37 insertions(+), 36 deletions(-)

diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 203589311b..464a972e88 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -67,8 +67,6 @@ static int create_elf_fdpic_tables(struct linux_binprm *, struct mm_struct *,
 				   struct elf_fdpic_params *);

 #ifndef CONFIG_MMU
-static int elf_fdpic_transfer_args_to_stack(struct linux_binprm *,
-					    unsigned long *);
 static int elf_fdpic_map_file_constdisp_on_uclinux(struct elf_fdpic_params *,
 						   struct file *,
 						   struct mm_struct *);
@@ -515,8 +513,9 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm,
 	sp = mm->start_stack;

 	/* stack the program arguments and environment */
-	if (elf_fdpic_transfer_args_to_stack(bprm, ) < 0)
+	if (transfer_args_to_stack(bprm, ) < 0)
 		return -EFAULT;
+	sp &= ~15;
 #endif

 	/*
@@ -711,39 +710,6 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm,

 /*/
 /*
- * transfer the program arguments and environment from the holding pages onto
- * the stack
- */
-#ifndef CONFIG_MMU
-static int elf_fdpic_transfer_args_to_stack(struct linux_binprm *bprm,
-					    unsigned long *_sp)
-{
-	unsigned long index, stop, sp;
-	char *src;
-	int ret = 0;
-
-	stop = bprm->p >> PAGE_SHIFT;
-	sp = *_sp;
-
-	for (index = MAX_ARG_PAGES - 1; index >= stop; index--) {
-		src = kmap(bprm->page[index]);
-		sp -= PAGE_SIZE;
-		if (copy_to_user((void *) sp, src, PAGE_SIZE) != 0)
-			ret = -EFAULT;
-		kunmap(bprm->page[index]);
-		if (ret < 0)
-			goto out;
-	}
-
-	*_sp = (*_sp - (MAX_ARG_PAGES * PAGE_SIZE - bprm->p)) & ~15;
-
-out:
-	return ret;
-}
-#endif
-
-/*/
-/*
  * load the appropriate binary image (executable or interpreter) into memory
  * - we assume no MMU is available
  * - if no other PIC bits are set in params->hdr->e_flags
diff --git a/fs/exec.c b/fs/exec.c
index 887c1c955d..ef0df2f092 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -762,6 +762,39 @@ out_unlock:
 }
 EXPORT_SYMBOL(setup_arg_pages);

+#else
+
+/*
+ * Transfer the program arguments and environment from the holding pages
+ * onto the stack. The provided stack pointer is adjusted accordingly.
+ */
+int transfer_args_to_stack(struct linux_binprm *bprm,
+			   unsigned long *sp_location)
+{
+	unsigned long index, stop, sp;
+	int ret = 0;
+
+	stop = bprm->p >> PAGE_SHIFT;
+	sp = *sp_location;
+
+	for (index = MAX_ARG_PAGES - 1; index >= stop; index--) {
+		unsigned int offset = index == stop ? bprm->p & ~PAGE_MASK : 0;
+		char *src = kmap(bprm->page[index]) + offset;
+		sp -= PAGE_SIZE - offset;
+		if (copy_to_user((void *) sp, src, PAGE_SIZE - offset) != 0)
+			ret = -EFAULT;
+		kunmap(bprm->page[index]);
+		if (ret)
+			goto out;
+	}
+
+	*sp_location = sp;
+
+out:
+	return ret;
+}
+EXPORT_SYMBOL(transfer_args_to_stack);
+
 #endif /* CONFIG_MMU */

 static struct file *do_open_execat(int fd, struct filename *name, int flags)
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 314b3caa70..1303b570b1 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -113,6 +113,8 @@ extern int suid_dumpable;
 extern int setup_arg_pages(struct linux_binprm * bprm,
 			   unsigned long stack_top,
 			   int executable_stack);
+extern int transfer_args_to_stack(struct linux_binprm *bprm,
+				  unsigned long *sp_location);
 extern int bprm_change_interp(char *interp, struct linux_binprm *bprm);
 extern int copy_strings_kernel(int argc, const char *const *argv,
 			       struct linux_binprm *bprm);
-- 
2.7.4
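To illustrate the "not to copy below the actual data area" part, here is a user-space sketch of the same top-down copy with a partial lowest page. It uses a tiny page size and flat buffers in place of the arg pages and the user stack. PAGE_SZ, NPAGES and copy_args_down() are hypothetical names, and memcpy() stands in for copy_to_user(). One deliberate deviation: the loop is written as "index-- > stop" so that an unsigned index cannot wrap when stop is 0.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 16	/* toy page size for illustration */
#define NPAGES  4	/* stands in for MAX_ARG_PAGES */

/*
 * args holds NPAGES * PAGE_SZ bytes; valid data starts at offset p and
 * runs to the end. Pages are copied from the highest one down so the
 * data ends up contiguous just below the initial *sp_location, which
 * is then lowered to the start of the copied data.
 */
static int copy_args_down(const char *args, size_t p,
			  char *stack, size_t *sp_location)
{
	size_t stop = p / PAGE_SZ;
	size_t sp = *sp_location;
	size_t index;

	for (index = NPAGES; index-- > stop; ) {
		/* only the lowest page is partial: skip bytes below p */
		size_t offset = (index == stop) ? p % PAGE_SZ : 0;
		size_t len = PAGE_SZ - offset;

		if (sp < len)
			return -EFAULT;	/* would run off the mock stack */
		sp -= len;
		memcpy(stack + sp, args + index * PAGE_SZ + offset, len);
	}

	*sp_location = sp;
	return 0;
}
```

With p = 2 * PAGE_SZ + 5, page 3 is copied whole and page 2 contributes only its upper 11 bytes, so the stack pointer drops by exactly the 27 bytes of real data. The version removed from binfmt_elf_fdpic.c copied every page in full, including the part of the lowest page below p.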
[PATCH v2 02/10] elf_fdpic_transfer_args_to_stack(): make it generic
This copying of arguments and environment is common to both NOMMU
binary formats we support. Let's make the elf_fdpic version available
to the flat format as well.

While at it, improve the code a bit not to copy below the actual
data area.

Signed-off-by: Nicolas Pitre
---
 fs/binfmt_elf_fdpic.c   | 38 ++
 fs/exec.c               | 33 +
 include/linux/binfmts.h |  2 ++
 3 files changed, 37 insertions(+), 36 deletions(-)

diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 203589311b..464a972e88 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -67,8 +67,6 @@ static int create_elf_fdpic_tables(struct linux_binprm *, struct mm_struct *,
 				   struct elf_fdpic_params *);

 #ifndef CONFIG_MMU
-static int elf_fdpic_transfer_args_to_stack(struct linux_binprm *,
-					    unsigned long *);
 static int elf_fdpic_map_file_constdisp_on_uclinux(struct elf_fdpic_params *,
 						   struct file *,
 						   struct mm_struct *);
@@ -515,8 +513,9 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm,
 	sp = mm->start_stack;

 	/* stack the program arguments and environment */
-	if (elf_fdpic_transfer_args_to_stack(bprm, &sp) < 0)
+	if (transfer_args_to_stack(bprm, &sp) < 0)
 		return -EFAULT;
+	sp &= ~15;
 #endif

 	/*
@@ -711,39 +710,6 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm,

 /*****************************************************************************/
 /*
- * transfer the program arguments and environment from the holding pages onto
- * the stack
- */
-#ifndef CONFIG_MMU
-static int elf_fdpic_transfer_args_to_stack(struct linux_binprm *bprm,
-					    unsigned long *_sp)
-{
-	unsigned long index, stop, sp;
-	char *src;
-	int ret = 0;
-
-	stop = bprm->p >> PAGE_SHIFT;
-	sp = *_sp;
-
-	for (index = MAX_ARG_PAGES - 1; index >= stop; index--) {
-		src = kmap(bprm->page[index]);
-		sp -= PAGE_SIZE;
-		if (copy_to_user((void *) sp, src, PAGE_SIZE) != 0)
-			ret = -EFAULT;
-		kunmap(bprm->page[index]);
-		if (ret < 0)
-			goto out;
-	}
-
-	*_sp = (*_sp - (MAX_ARG_PAGES * PAGE_SIZE - bprm->p)) & ~15;
-
-out:
-	return ret;
-}
-#endif
-
-/*****************************************************************************/
-/*
  * load the appropriate binary image (executable or interpreter) into memory
  * - we assume no MMU is available
  * - if no other PIC bits are set in params->hdr->e_flags
diff --git a/fs/exec.c b/fs/exec.c
index 887c1c955d..ef0df2f092 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -762,6 +762,39 @@ out_unlock:
 }
 EXPORT_SYMBOL(setup_arg_pages);

+#else
+
+/*
+ * Transfer the program arguments and environment from the holding pages
+ * onto the stack. The provided stack pointer is adjusted accordingly.
+ */
+int transfer_args_to_stack(struct linux_binprm *bprm,
+			   unsigned long *sp_location)
+{
+	unsigned long index, stop, sp;
+	int ret = 0;
+
+	stop = bprm->p >> PAGE_SHIFT;
+	sp = *sp_location;
+
+	for (index = MAX_ARG_PAGES - 1; index >= stop; index--) {
+		unsigned int offset = index == stop ? bprm->p & ~PAGE_MASK : 0;
+		char *src = kmap(bprm->page[index]) + offset;
+		sp -= PAGE_SIZE - offset;
+		if (copy_to_user((void *) sp, src, PAGE_SIZE - offset) != 0)
+			ret = -EFAULT;
+		kunmap(bprm->page[index]);
+		if (ret)
+			goto out;
+	}
+
+	*sp_location = sp;
+
+out:
+	return ret;
+}
+EXPORT_SYMBOL(transfer_args_to_stack);
+
 #endif /* CONFIG_MMU */

 static struct file *do_open_execat(int fd, struct filename *name, int flags)
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 314b3caa70..1303b570b1 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -113,6 +113,8 @@ extern int suid_dumpable;
 extern int setup_arg_pages(struct linux_binprm * bprm,
 			   unsigned long stack_top,
 			   int executable_stack);
+extern int transfer_args_to_stack(struct linux_binprm *bprm,
+				  unsigned long *sp_location);
 extern int bprm_change_interp(char *interp, struct linux_binprm *bprm);
 extern int copy_strings_kernel(int argc, const char *const *argv,
 			       struct linux_binprm *bprm);
-- 
2.7.4
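
For readers who want to see the "not to copy below the actual data area" part in isolation, here is a rough user-space sketch of the same page walk. It is not kernel code: the toy_* names, the 16-byte page size and the memcpy() standing in for copy_to_user() are all assumptions made for illustration. It also counts the index down with a post-decrement test rather than the patch's "index >= stop" loop, purely so the toy stays well defined when stop is 0.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-ins for the kernel constants (hypothetical values). */
#define TOY_PAGE_SHIFT     4
#define TOY_PAGE_SIZE      (1UL << TOY_PAGE_SHIFT)	/* 16-byte pages */
#define TOY_PAGE_MASK      (~(TOY_PAGE_SIZE - 1))
#define TOY_MAX_ARG_PAGES  4

/* Stand-in for struct linux_binprm: live data is [p, end of last page). */
struct toy_binprm {
	char page[TOY_MAX_ARG_PAGES][TOY_PAGE_SIZE];
	unsigned long p;
};

/*
 * Copy the argument data to just below *sp_location, highest page first.
 * Only the page containing p is copied partially, starting at p's offset
 * within it, so nothing below the real data reaches the stack.
 */
static void toy_transfer_args(const struct toy_binprm *bprm,
			      char *stack, unsigned long *sp_location)
{
	unsigned long total = TOY_MAX_ARG_PAGES * TOY_PAGE_SIZE;
	unsigned long stop = bprm->p >> TOY_PAGE_SHIFT;
	unsigned long sp = *sp_location;
	unsigned long index = TOY_MAX_ARG_PAGES;

	assert(bprm->p <= total);
	assert(sp >= total - bprm->p);	/* enough room below sp */

	while (index-- > stop) {
		unsigned long offset =
			index == stop ? bprm->p & ~TOY_PAGE_MASK : 0;

		sp -= TOY_PAGE_SIZE - offset;
		memcpy(stack + sp, bprm->page[index] + offset,
		       TOY_PAGE_SIZE - offset);
	}
	*sp_location = sp;
}

/* Fill the pages with a byte pattern, start the data at page 2 + 5. */
static unsigned long toy_demo(char *stack, unsigned long stack_size)
{
	struct toy_binprm bprm;
	unsigned long i, sp = stack_size;

	for (i = 0; i < sizeof(bprm.page); i++)
		((char *)bprm.page)[i] = (char)i;
	bprm.p = 2 * TOY_PAGE_SIZE + 5;

	toy_transfer_args(&bprm, stack, &sp);
	return sp;
}
```

Calling toy_demo() with a 64-byte buffer moves only the 27 live bytes, and the returned stack pointer sits right at the start of the copied data. Any alignment (the "sp &= ~15" in the elf_fdpic caller) is left to the caller, as in the patch.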