Re: [PATCH V2] leds: trigger: Introduce an USB port trigger

2016-07-17 Thread Rafał Miłecki
On 18 July 2016 at 07:40, Peter Chen  wrote:
> On Mon, Jul 18, 2016 at 06:44:49AM +0200, Rafał Miłecki wrote:
>> On 18 July 2016 at 04:31, Peter Chen  wrote:
>> > On Fri, Jul 15, 2016 at 11:10:45PM +0200, Rafał Miłecki wrote:
>> >> +
>> >> +usbport trigger:
>> >> +- usb-ports : List of USB ports that usbport should observed for turning 
>> >> on a
>> >> +   given LED.
>> >> +
>> >
>> > %s/should/should be
>>
>> Thanks.
>>
>>
>> >> diff --git a/drivers/leds/trigger/ledtrig-usbport.c 
>> >> b/drivers/leds/trigger/ledtrig-usbport.c
>> >> new file mode 100644
>> >> index 000..97b064c
>> >> --- /dev/null
>> >> +++ b/drivers/leds/trigger/ledtrig-usbport.c
>> >> @@ -0,0 +1,206 @@
>> >> +/*
>> >> + * USB port LED trigger
>> >> + *
>> >> + * Copyright (C) 2016 Rafał Miłecki 
>> >> + *
>> >> + * This program is free software; you can redistribute it and/or modify
>> >> + * it under the terms of the GNU General Public License as published by
>> >> + * the Free Software Foundation; either version 2 of the License, or (at
>> >> + * your option) any later version.
>> >> + */
>> >
>> > GPL v2 only.
>> >
>> >> +MODULE_AUTHOR("Rafał Miłecki ");
>> >> +MODULE_DESCRIPTION("USB port trigger");
>> >> +MODULE_LICENSE("GPL");
>> >
>> > GPL v2
>>
>> What's the reason for this? I don't have any real preference, but I
>> never heard about a kernel/Linux preference either.
>>
>
> https://en.wikipedia.org/wiki/Linux_kernel

Well, Linux is released under GPL v2, I'm well aware of that. It means
all its code needs to be GPL v2 compatible. There are multiple
compatible licenses: MIT, BSD 3-clause, BSD 2-clause. The one I used
allows treating code as GPL V2 as well. I could release this code
using MIT and it should be acceptable as well.

I still don't see what's wrong with the picked license.

-- 
Rafał
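
As a side note on the MODULE_LICENSE() strings discussed above, here is a small userspace sketch of how the kernel decides whether a module's license string counts as GPL-compatible. It is modelled on the kernel's license_is_gpl_compatible() helper, but the function name and the exact list of strings below are assumptions written from memory of include/linux/module.h, so check them against the tree you are building.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Userspace sketch modelled on the kernel's license_is_gpl_compatible().
 * The function name and the string list are assumptions, not kernel source.
 */
static bool license_is_gpl_compatible_sketch(const char *license)
{
	static const char *const accepted[] = {
		"GPL",				/* GPL v2 or later */
		"GPL v2",			/* GPL v2 only */
		"GPL and additional rights",
		"Dual BSD/GPL",
		"Dual MIT/GPL",
		"Dual MPL/GPL",
	};
	size_t i;

	/* Exact string match against the accepted list. */
	for (i = 0; i < sizeof(accepted) / sizeof(accepted[0]); i++)
		if (strcmp(license, accepted[i]) == 0)
			return true;
	return false;
}
```

Under this sketch both "GPL" (v2 or later) and "GPL v2" (v2 only) are accepted, so the choice between them is about which versions the author grants, not about whether the module loads without tainting the kernel.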


linux-next: manual merge of the kvm tree with the powerpc tree

2016-07-17 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the kvm tree got a conflict in:

  arch/powerpc/kernel/idle_book3s.S

between commit:

  69c592ed40d3 ("powerpc/opal: Add real mode call wrappers")

from the powerpc tree and commit:

  fd7bacbca47a ("KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt")

from the kvm tree.

I fixed it up (on Michael's advice, I used the version from the kvm tree)
and can carry the fix as necessary. This is now fixed as far as linux-next
is concerned, but any non-trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell


linux-next: manual merge of the kvm tree with the powerpc tree

2016-07-17 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the kvm tree got a conflict in:

  arch/powerpc/kernel/exceptions-64s.S

between commit:

  9baaef0a22c8 ("powerpc/irq: Add support for HV virtualization interrupts")

from the powerpc tree and commit:

  fd7bacbca47a ("KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt")

from the kvm tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/powerpc/kernel/exceptions-64s.S
index 6200e4925d26,0eba47e074b9..
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@@ -669,8 -680,8 +669,10 @@@ _GLOBAL(__replay_interrupt
  BEGIN_FTR_SECTION
cmpwi   r3,0xe80
beq h_doorbell_common
 +  cmpwi   r3,0xea0
 +  beq h_virt_irq_common
+   cmpwi   r3,0xe60
+   beq hmi_exception_common
  FTR_SECTION_ELSE
cmpwi   r3,0xa00
beq doorbell_super_common
@@@ -1161,18 -1172,9 +1163,18 @@@ fwnmi_data_area
. = 0x8000
  #endif /* defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV) */
  
 +  STD_EXCEPTION_COMMON(0xf60, facility_unavailable, 
facility_unavailable_exception)
 +  STD_EXCEPTION_COMMON(0xf80, hv_facility_unavailable, 
facility_unavailable_exception)
 +
 +#ifdef CONFIG_CBE_RAS
 +  STD_EXCEPTION_COMMON(0x1200, cbe_system_error, 
cbe_system_error_exception)
 +  STD_EXCEPTION_COMMON(0x1600, cbe_maintenance, cbe_maintenance_exception)
 +  STD_EXCEPTION_COMMON(0x1800, cbe_thermal, cbe_thermal_exception)
 +#endif /* CONFIG_CBE_RAS */
 +
.globl hmi_exception_early
  hmi_exception_early:
-   EXCEPTION_PROLOG_1(PACA_EXGEN, NOTEST, 0xe60)
+   EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST, 0xe62)
mr  r10,r1  /* Save r1  */
ld  r1,PACAEMERGSP(r13) /* Use emergency stack  */
subi    r1,r1,INT_FRAME_SIZE    /* alloc stack frame */


Re: [patch] phy: phy-brcm-sata: fix a loop timeout

2016-07-17 Thread Yendapally Reddy Dhananjaya Reddy
On Tue, Jun 21, 2016 at 2:07 PM, Dan Carpenter  wrote:
> Since this loop is a post op then it means we end with "try == -1" but
> afterward we test for if it's zero.  Fix this by changing to a pre-op so
> we end on zero.

Thanks Dan. That should be pre-op.

Thanks
Dhananjay
>
> Fixes: 024812889ad1 ('phy: Add SATA3 PHY support for Broadcom NSP SoC')
> Signed-off-by: Dan Carpenter 
>
> diff --git a/drivers/phy/phy-brcm-sata.c b/drivers/phy/phy-brcm-sata.c
> index 18d6626..c86456f 100644
> --- a/drivers/phy/phy-brcm-sata.c
> +++ b/drivers/phy/phy-brcm-sata.c
> @@ -329,7 +329,7 @@ static int brcm_nsp_sata_init(struct brcm_sata_port *port)
>
> /* Wait for pll_seq_done bit */
> try = 50;
> -   while (try--) {
> +   while (--try) {
> val = brcm_sata_phy_rd(base, BLOCK0_REG_BANK,
> BLOCK0_XGXSSTATUS);
> if (val & BLOCK0_XGXSSTATUS_PLL_LOCK)


Re: [PATCH 04/10] phy: da8xx-usb: new driver for DA8xx SoC USB PHY

2016-07-17 Thread Kishon Vijay Abraham I
Hi Arnd,

On Saturday 16 July 2016 02:44 AM, Arnd Bergmann wrote:
> On Tuesday, July 5, 2016 10:53:51 AM CEST Kishon Vijay Abraham I wrote:
>> From: David Lechner 
>>
>> This is a new phy driver for the SoC USB controllers on the TI DA8xx
>> family of microcontrollers. The USB 1.1 PHY is just a simple on/off.
>> The USB 2.0 PHY also allows overriding the VBUS and ID pins.
>>
>> Signed-off-by: David Lechner 
>> Signed-off-by: Kishon Vijay Abraham I 
> 
> This is now in linux-next, but fails to build:
> 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
> 
> drivers/phy/phy-da8xx-usb.c:19:37: fatal error: linux/mfd/da8xx-cfgchip.h: No 
> such file or directory

I'll look at this.

Thanks
Kishon


Re: [PATCH 1/3] virtio: Basic implementation of virtio pstore driver

2016-07-17 Thread Namhyung Kim
Hello,

On Sun, Jul 17, 2016 at 10:12:26PM -0700, Kees Cook wrote:
> On Sun, Jul 17, 2016 at 9:37 PM, Namhyung Kim  wrote:
> > The virtio pstore driver provides interface to the pstore subsystem so
> > that the guest kernel's log/dump message can be saved on the host
> > machine.  Users can access the log file directly on the host, or on the
> > guest at the next boot using pstore filesystem.  It currently deals with
> > kernel log (printk) buffer only, but we can extend it to have other
> > information (like ftrace dump) later.
> >
> > It supports legacy PCI device using single order-2 page buffer.  As all
> > operation of pstore is synchronous, it would be fine IMHO.  However I
> > don't know how to make write operation synchronous since it's called
> > with a spinlock held (from any context including NMI).
> >
> > Cc: Paolo Bonzini 
> > Cc: Radim Krčmář 
> > Cc: "Michael S. Tsirkin" 
> > Cc: Anthony Liguori 
> > Cc: Anton Vorontsov 
> > Cc: Colin Cross 
> > Cc: Kees Cook 
> > Cc: Tony Luck 
> > Cc: Steven Rostedt 
> > Cc: Ingo Molnar 
> > Cc: Minchan Kim 
> > Cc: k...@vger.kernel.org
> > Cc: qemu-de...@nongnu.org
> > Cc: virtualizat...@lists.linux-foundation.org
> > Signed-off-by: Namhyung Kim 
> 
> This looks great to me! I'd love to use this in qemu. (Right now I go
> through hoops to use the ramoops backend for testing.)
> 
> Reviewed-by: Kees Cook 

Thank you!

> 
> Notes below...
>

[SNIP]
> > +static u16 to_virtio_type(struct virtio_pstore *vps, enum pstore_type_id 
> > type)
> > +{
> > +   u16 ret;
> > +
> > +   switch (type) {
> > +   case PSTORE_TYPE_DMESG:
> > +   ret = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_TYPE_DMESG);
> > +   break;
> > +   default:
> > +   ret = cpu_to_virtio16(vps->vdev, 
> > VIRTIO_PSTORE_TYPE_UNKNOWN);
> > +   break;
> > +   }
> 
> I would love to see this support PSTORE_TYPE_CONSOLE too. It should be
> relatively easy to add: I think it'd just be another virtio command?

Do you want to append the data to the host file as guest does
printk()?  I think it needs some kind of buffer management, but it's
not hard to add IMHO.
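
To make the PSTORE_TYPE_CONSOLE suggestion concrete, the mapping could grow one more case, roughly as sketched below. This is a standalone illustration rather than the patch: the sk_ and SK_ names, the numeric values, and in particular a console type on the virtio side are all assumptions, and the cpu_to_virtio16() conversion is left out.

```c
#include <stdint.h>

/* Stand-ins for the kernel's enum pstore_type_id values (illustrative). */
enum sk_pstore_type {
	SK_PSTORE_DMESG,
	SK_PSTORE_MCE,
	SK_PSTORE_CONSOLE,
};

/* Hypothetical wire values; a console type is not part of the posted patch. */
enum {
	SK_VIRTIO_TYPE_UNKNOWN = 0,
	SK_VIRTIO_TYPE_DMESG   = 1,
	SK_VIRTIO_TYPE_CONSOLE = 2,
};

/* Map a pstore record type to the value sent to the host. */
static uint16_t sk_to_virtio_type(enum sk_pstore_type type)
{
	switch (type) {
	case SK_PSTORE_DMESG:
		return SK_VIRTIO_TYPE_DMESG;
	case SK_PSTORE_CONSOLE:
		return SK_VIRTIO_TYPE_CONSOLE;
	default:
		/* Everything else is still reported as unknown. */
		return SK_VIRTIO_TYPE_UNKNOWN;
	}
}
```

The buffer management question raised above is separate: this only shows the type mapping, not how console output would be appended on the host.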


> 
> > +
> > +   return ret;
> > +}
> > +

[SNIP]
> > +static int notrace virt_pstore_write(enum pstore_type_id type,
> > +enum kmsg_dump_reason reason,
> > +u64 *id, unsigned int part, int count,
> > +bool compressed, size_t size,
> > +struct pstore_info *psi)
> > +{
> > +   struct virtio_pstore *vps = psi->data;
> > +   struct virtio_pstore_hdr *hdr = &vps->hdr;
> > +   struct scatterlist sg[2];
> > +   unsigned int flags = compressed ? VIRTIO_PSTORE_FL_COMPRESSED : 0;
> > +
> > +   *id = vps->id++;
> > +
> > +   hdr->cmd   = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_CMD_WRITE);
> > +   hdr->id= cpu_to_virtio64(vps->vdev, *id);
> > +   hdr->flags = cpu_to_virtio32(vps->vdev, flags);
> > +   hdr->type  = to_virtio_type(vps, type);
> > +
> > +   sg_init_table(sg, 2);
> > +   sg_set_buf(&sg[0], hdr, sizeof(*hdr));
> > +   sg_set_buf(&sg[1], psi->buf, size);
> > +   virtqueue_add_outbuf(vps->vq, sg, 2, vps, GFP_ATOMIC);
> > +   virtqueue_kick(vps->vq);
> > +
> > +   /* TODO: make it synchronous */
> > +   return 0;
> 
> The down side to this being asynchronous is the lack of error
> reporting. Perhaps this could check hdr->type before queuing and error
> for any VIRTIO_PSTORE_TYPE_UNKNOWN message instead of trying to send
> it?

I cannot follow, sorry.  Could you please elaborate it more?
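
One reading of Kees's suggestion is to reject unsupported record types before anything is queued, so the caller at least sees an error for those. A minimal sketch under that assumption follows; sk_pstore_write(), sk_queued and the SK_TYPE_ values are hypothetical stand-ins, and the counter replaces the real virtqueue_add_outbuf()/virtqueue_kick() calls.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical type values; the real ones come from the virtio pstore header. */
enum {
	SK_TYPE_UNKNOWN = 0,
	SK_TYPE_DMESG   = 1,
};

/* Counts requests that would have been handed to the virtqueue. */
static unsigned int sk_queued;

static int sk_pstore_write(uint16_t virtio_type)
{
	/* Fail early, while an error can still be returned to the caller. */
	if (virtio_type == SK_TYPE_UNKNOWN)
		return -EINVAL;

	sk_queued++;	/* stands in for adding the buffers and kicking the queue */
	return 0;
}
```

This does not make the write itself synchronous; it only moves the one check that can be done up front ahead of the asynchronous part.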


> 
> > +}
> > +
> > +static int virt_pstore_erase(enum pstore_type_id type, u64 id, int count,
> > +struct timespec time, struct pstore_info *psi)
> > +{
> > +   struct virtio_pstore *vps = psi->data;
> > +   struct virtio_pstore_hdr *hdr = &vps->hdr;
> > +   struct scatterlist sg[1];
> > +   unsigned int len;
> > +
> > +   hdr->cmd   = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_CMD_ERASE);
> > +   hdr->id= cpu_to_virtio64(vps->vdev, id);
> > +   hdr->type  = to_virtio_type(vps, type);
> > +
> > +   sg_init_one(sg, hdr, sizeof(*hdr));
> > +   virtqueue_add_outbuf(vps->vq, sg, 1, vps, GFP_KERNEL);
> > +   virtqueue_kick(vps->vq);
> > +
> > +   wait_event(vps->acked, virtqueue_get_buf(vps->vq, &len));
> > +   return 0;
> > +}
> > +
> > +static int virt_pstore_init(struct virtio_pstore *vps)
> > +{
> > +   struct pstore_info *psinfo = &vps->pstore;
> > +   int err;
> > +
> > +   vps->id = 0;
> > +   vps->buflen = 0;
> > +   psinfo->bufsize = VIRT_PSTORE_BUFSIZE;
> > +   psinfo->buf = (void *)__get_free_pages(GFP_KERNEL, 
> > VIRT_PSTORE_ORDER);
> > +   if (!psinfo->buf) {
> > +   pr_err("cannot allocate pstore buffer\n");
> > +   return -ENOMEM;
> > +   }
> > +
> > +   

RE: [PATCH v6 1/3] x86/platform/p2sb: New Primary to Sideband bridge support driver for Intel SOC's

2016-07-17 Thread Tan, Jui Nee


> -Original Message-
> From: Tan, Jui Nee
> Sent: Monday, July 18, 2016 11:35 AM
> To: 'Paul Gortmaker' ;
> andriy.shevche...@linux.intel.com
> Cc: mika.westerb...@linux.intel.com; heikki.kroge...@linux.intel.com;
> t...@linutronix.de; mi...@redhat.com; H. Peter Anvin ;
> X86 ML ; pty...@xes-inc.com; Lee Jones
> ; Linus Walleij ; linux-
> g...@vger.kernel.org; LKML ; Yong,
> Jonathan ; Yu, Ong Hock
> ; Voon, Weifeng ; Wan
> Mohamad, Wan Ahmad Zainie
> 
> Subject: RE: [PATCH v6 1/3] x86/platform/p2sb: New Primary to Sideband
> bridge support driver for Intel SOC's
> 
> 
> 
> > -Original Message-
> > From: paul.gortma...@gmail.com [mailto:paul.gortma...@gmail.com] On
> > Behalf Of Paul Gortmaker
> > Sent: Friday, July 15, 2016 8:01 AM
> > To: Tan, Jui Nee 
> > Cc: mika.westerb...@linux.intel.com; heikki.kroge...@linux.intel.com;
> > andriy.shevche...@linux.intel.com; t...@linutronix.de;
> > mi...@redhat.com; H. Peter Anvin ; X86 ML
> > ; pty...@xes-inc.com; Lee Jones
> ;
> > Linus Walleij ; linux-g...@vger.kernel.org; LKML
> > ; Yong, Jonathan
> > ; Yu, Ong Hock ;
> Voon,
> > Weifeng ; Wan Mohamad, Wan Ahmad Zainie
> > 
> > Subject: Re: [PATCH v6 1/3] x86/platform/p2sb: New Primary to Sideband
> > bridge support driver for Intel SOC's
> >
> > On Thu, Jul 14, 2016 at 4:11 AM, Tan Jui Nee 
> wrote:
> > > From: Andy Shevchenko 
> > >
> > > There is already one and at least one more user coming which require
> > > an access to Primary to Sideband bridge (P2SB) in order to get IO or
> > > MMIO bar hidden by BIOS.
> > > Create a driver to access P2SB for x86 devices.
> > >
> > > Signed-off-by: Yong, Jonathan 
> > > Signed-off-by: Andy Shevchenko 
> > > ---
> > > Changes in V6:
> > > - No change
> > >
> > > Changes in V5:
> > > - No change
> > >
> > > Changes in V4:
> > > - Move Kconfig option CONFIG_X86_INTEL_NON_ACPI from
> > >   [PATCH 2/3] x86/platform/p2sb: New Primary to Sideband bridge
> > support driver for Intel SOC's
> > >   to
> > >   [PATCH 3/3] mfd: lpc_ich: Add support for Intel Apollo Lake GPIO
> > pinctrl in non-ACPI system
> > >   since the config is used in latter patch.
> > >
> > > Changes in V3:
> > > - No change
> > >
> > > Changes in V2:
> > > - Add new config option CONFIG_X86_INTEL_NON_ACPI and "select
> > PINCTRL"
> > >   to fix kbuildbot error
> > >
> > >  arch/x86/Kconfig |  4 ++
> > >  arch/x86/include/asm/p2sb.h  | 27 +++
> > >  arch/x86/platform/intel/Makefile |  1 +
> > >  arch/x86/platform/intel/p2sb.c   | 99
> > 
> > >  4 files changed, 131 insertions(+)
> > >  create mode 100644 arch/x86/include/asm/p2sb.h  create mode 100644
> > > arch/x86/platform/intel/p2sb.c
> > >
> > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index
> > > d9a94da..d305d81 100644
> > > --- a/arch/x86/Kconfig
> > > +++ b/arch/x86/Kconfig
> > > @@ -604,6 +604,10 @@ config IOSF_MBI_DEBUG
> > >
> > >   If you don't require the option or are in doubt, say N.
> > >
> > > +config P2SB
> > > +   tristate
> >
> > OK, this is tristate, but then
> >
> P2SB is tristate as currently it is only used by LPC_ICH that is tristate too.
> ...
> config LPC_ICH
>   tristate "Intel ICH LPC"
>   depends on X86 && PCI
>   select MFD_CORE
>   select P2SB
> ...
> > > +   depends on PCI
> > > +
> > >  config X86_RDC321X
> > > bool "RDC R-321x SoC"
> > > depends on X86_32
> > > diff --git a/arch/x86/include/asm/p2sb.h b/arch/x86/include/asm/p2sb.h
> > > new file mode 100644 index 000..686e07b
> > > --- /dev/null
> > > +++ b/arch/x86/include/asm/p2sb.h
> > > @@ -0,0 +1,27 @@
> > > +/*
> > > + * Primary to Sideband bridge (P2SB) access support  */
> > > +
> > > +#ifndef P2SB_SYMS_H
> > > +#define P2SB_SYMS_H
> > > +
> > > +#include 
> > > +#include 
> > > +
> > > +#if IS_ENABLED(CONFIG_P2SB)
> > > +
> > > +int p2sb_bar(struct pci_dev *pdev, unsigned int devfn,
> > > +   struct resource *res);
> > > +
> > > +#else /* CONFIG_P2SB is not set */
> > > +
> > > +static inline
> > > +int p2sb_bar(struct pci_dev *pdev, unsigned int devfn,
> > > +   struct resource *res)
> > > +{
> > > +   return -ENODEV;
> > > +}
> > > +
> > > +#endif /* CONFIG_P2SB */
> > > +
> > > +#endif /* P2SB_SYMS_H */
> > > diff --git a/arch/x86/platform/intel/Makefile
> > > b/arch/x86/platform/intel/Makefile
> > > index b878032..dbf9f10 100644
> > > --- a/arch/x86/platform/intel/Makefile
> > > +++ b/arch/x86/platform/intel/Makefile
> > > @@ -1 +1,2 @@
> > >  obj-$(CONFIG_IOSF_MBI) += iosf_mbi.o
> > > +obj-$(CONFIG_P2SB) += p2sb.o
> > > diff --git a/arch/x86/platform/intel/p2sb.c
> > > b/arch/x86/platform/intel/p2sb.c new file mode 100644 index
> > > 000..8be47a4
> > > --- /dev/null
> > > +++ b/arch/x86/platform/intel/p2sb.c
> > > @@ -0,0 +1,99 @@
> > > +/*
> > > + * Primary to Sideband bridge 

Re: [PATCH V2] leds: trigger: Introduce an USB port trigger

2016-07-17 Thread Peter Chen
On Mon, Jul 18, 2016 at 06:44:49AM +0200, Rafał Miłecki wrote:
> On 18 July 2016 at 04:31, Peter Chen  wrote:
> > On Fri, Jul 15, 2016 at 11:10:45PM +0200, Rafał Miłecki wrote:
> >> +
> >> +usbport trigger:
> >> +- usb-ports : List of USB ports that usbport should observed for turning on a
> >> +   given LED.
> >> +
> >
> > %s/should/should be
> 
> Thanks.
> 
> 
> >> diff --git a/drivers/leds/trigger/ledtrig-usbport.c b/drivers/leds/trigger/ledtrig-usbport.c
> >> new file mode 100644
> >> index 000..97b064c
> >> --- /dev/null
> >> +++ b/drivers/leds/trigger/ledtrig-usbport.c
> >> @@ -0,0 +1,206 @@
> >> +/*
> >> + * USB port LED trigger
> >> + *
> >> + * Copyright (C) 2016 Rafał Miłecki 
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License as published by
> >> + * the Free Software Foundation; either version 2 of the License, or (at
> >> + * your option) any later version.
> >> + */
> >
> > GPL v2 only.
> >
> >> +MODULE_AUTHOR("Rafał Miłecki ");
> >> +MODULE_DESCRIPTION("USB port trigger");
> >> +MODULE_LICENSE("GPL");
> >
> > GPL v2
> 
> What's the reason for this? I don't have any real preference, but I
> never heard about a kernel/Linux preference either.
> 

https://en.wikipedia.org/wiki/Linux_kernel
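
For reference, the difference between the two strings can be sketched in
plain C. The comment in include/linux/module.h documents "GPL" as GPL v2
or later and "GPL v2" as GPL v2 only, so MODULE_LICENSE("GPL") matches a
header that says "version 2 or any later version", while the review asks
for v2 only in both places. The table and the helpers describe_license()
and sk_license_is_gpl_compatible() below are hypothetical userspace
illustrations, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative table; the strings follow the comment in include/linux/module.h. */
struct license_ident {
	const char *ident;   /* argument given to MODULE_LICENSE() */
	const char *meaning; /* what that string declares */
};

static const struct license_ident license_idents[] = {
	{ "GPL",                       "GNU Public License v2 or later" },
	{ "GPL v2",                    "GNU Public License v2" },
	{ "GPL and additional rights", "GNU Public License v2 rights and more" },
	{ "Dual BSD/GPL",              "GNU Public License v2 or BSD license choice" },
	{ "Dual MIT/GPL",              "GNU Public License v2 or MIT license choice" },
	{ "Dual MPL/GPL",              "GNU Public License v2 or Mozilla license choice" },
};

/* Hypothetical helper: documented meaning, or NULL for any other string. */
const char *describe_license(const char *ident)
{
	size_t i;

	assert(ident != NULL);

	for (i = 0; i < sizeof(license_idents) / sizeof(license_idents[0]); i++)
		if (strcmp(license_idents[i].ident, ident) == 0)
			return license_idents[i].meaning;
	return NULL;
}

/* Hypothetical helper: true for the GPL-compatible strings listed above. */
bool sk_license_is_gpl_compatible(const char *ident)
{
	return describe_license(ident) != NULL;
}
```

Under this reading, a file whose header keeps the "or any later version"
clause pairs with "GPL", and a v2-only header pairs with "GPL v2".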

-- 

Best Regards,
Peter Chen


Re: [PATCH 1/2] mem-hotplug: use GFP_HIGHUSER_MOVABLE in, alloc_migrate_target()

2016-07-17 Thread Joonsoo Kim
On Fri, Jul 15, 2016 at 10:47:06AM +0800, Xishi Qiu wrote:
> alloc_migrate_target() is called from migrate_pages(), and the page
> is always from user space, so we can add __GFP_HIGHMEM directly.

No, not all migratable pages are from user space. For example, the
blockdev file cache has __GFP_MOVABLE and is migratable, but it has
neither __GFP_HIGHMEM nor __GFP_USER.

And zram's memory isn't GFP_HIGHUSER_MOVABLE, but it does have __GFP_MOVABLE.
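
To make the flag relationship concrete, here is a small userspace sketch.
In the kernel, GFP_HIGHUSER_MOVABLE is GFP_HIGHUSER | __GFP_MOVABLE and
GFP_HIGHUSER is GFP_USER | __GFP_HIGHMEM. The SK_* names and bit values
below are made up for illustration (the real definitions are in
include/linux/gfp.h), and SK_GFP_USERBITS collapses everything that
GFP_USER contains into a single bit.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int sk_gfp_t; /* stand-in for the kernel's gfp_t */

/* Illustrative bit values only, not the kernel's. */
#define SK_GFP_HIGHMEM  0x01u /* stands for __GFP_HIGHMEM */
#define SK_GFP_MOVABLE  0x02u /* stands for __GFP_MOVABLE */
#define SK_GFP_USERBITS 0x04u /* collapses the bits that make up GFP_USER */

#define SK_GFP_HIGHUSER         (SK_GFP_USERBITS | SK_GFP_HIGHMEM)
#define SK_GFP_HIGHUSER_MOVABLE (SK_GFP_HIGHUSER | SK_GFP_MOVABLE)

/* A page can be migratable with nothing but the movable bit set. */
bool sk_is_movable(sk_gfp_t mask)
{
	return (mask & SK_GFP_MOVABLE) != 0;
}

/* True only if every bit of the composite mask is present. */
bool sk_is_highuser_movable(sk_gfp_t mask)
{
	return (mask & SK_GFP_HIGHUSER_MOVABLE) == SK_GFP_HIGHUSER_MOVABLE;
}

/* A mask shaped like the blockdev example: movable, but not highmem/user. */
void sk_gfp_demo(void)
{
	sk_gfp_t blockdev_like = SK_GFP_MOVABLE;

	assert(sk_is_movable(blockdev_like));
	assert(!sk_is_highuser_movable(blockdev_like));
}
```

So "movable" does not imply "highuser movable", which is why adding
__GFP_HIGHMEM unconditionally to the migration target is not safe.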

Thanks.



Re: [patch] mm, compaction: make sure freeing scanner isn't persistently expensive

2016-07-17 Thread Joonsoo Kim
On Mon, Jul 11, 2016 at 04:01:52PM -0700, David Rientjes wrote:
> On Thu, 30 Jun 2016, Joonsoo Kim wrote:
> 
> > We need to find a root cause of this problem, first.
> > 
> > I guess that this problem would happen when isolate_freepages_block()
> > early stop due to watermark check (if your patch is applied to your
> > kernel). If scanner meets, cached pfn will be reset and your patch
> > doesn't have any effect. So, I guess that scanner doesn't meet.
> > 
> 
> If the scanners meet, we should rely on deferred compaction to suppress 
> further attempts in the near future.  This is outside the scope of this 
> fix.
> 
> > We enter the compaction with enough free memory so stop in
> > isolate_freepages_block() should be unlikely event but your number
> > shows that it happens frequently?
> > 
> 
> It's not the only reason why freepages will be returned to the buddy 
> allocator: if locks become contended because we are spending too much time 
> compacting memory, we can persistently get free pages returned to the end 
> of the zone and then repeatedly iterate >100GB of memory on every call to 
> isolate_freepages(), which makes its own contended checks fire more often.  
> This patch is only an attempt to prevent lengthy iterations when we have
> recently scanned the memory and found freepages to not be isolatable.

Hmm... I can't understand how the freepage scanner is persistently
expensive. After the freepage scanner gets freepages, migration doesn't
stop until either the migratable pages or the freepages run out.

If there were no freepages, the above problem wouldn't happen, so I
assume that there are no migratable pages left after calling
migrate_pages().

If there are no migratable pages left, it means that the freepages were
used by migration. Some time later, the freepages in that pageblock are
exhausted by migration and the freepage scanner moves on to the next
pageblock. So, I cannot understand how it is persistently expensive.

Am I missing something?

If it is caused by the fact that too many freepages are isolated at
once (up to the number of migratable pages), we can modify the logic to
stop isolating freepages when the pageblock changes and the freepage
scanner already has one or more freepages.
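
A toy model of that last idea, just to show the intended effect on scan
length. This is not mm/compaction.c code: the function name, the
per-pageblock array and the result struct are all hypothetical, and the
scanner is reduced to walking pageblock indices from the end of the zone
downwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sk_scan_result {
	int isolated;       /* free pages taken in this call */
	int blocks_scanned; /* pageblocks visited in this call */
};

/*
 * free_in_block[b] is the number of free pages in pageblock b.
 * Scan from start_block towards block 0 until 'needed' pages are
 * isolated. With stop_at_block_boundary set, stop as soon as a
 * pageblock has been finished and at least one page is in hand.
 */
struct sk_scan_result sk_isolate_freepages(const int *free_in_block,
					   int start_block, int needed,
					   bool stop_at_block_boundary)
{
	struct sk_scan_result r = { 0, 0 };
	int b;

	assert(free_in_block != NULL);

	for (b = start_block; b >= 0 && r.isolated < needed; b--) {
		int take = free_in_block[b];

		if (take > needed - r.isolated)
			take = needed - r.isolated;
		r.isolated += take;
		r.blocks_scanned++;

		if (stop_at_block_boundary && r.isolated > 0)
			break;
	}
	return r;
}
```

With per-block free counts { 4, 0, 0, 3, 0 }, a start at block 4 and 6
pages wanted, the unbounded variant visits all 5 pageblocks to collect 6
pages, while the bounded variant stops after 2 pageblocks with 3 pages.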

> 
> > In addition, I worry that your previous patch that makes
> > isolate_freepages_block() stop when watermark doesn't meet would cause
> > compaction non-progress. Amount of free memory can fluctuate so
> > watermark fail would be temporary. We need to break compaction in
> > this case? It would decrease compaction success rate if there is a
> > memory hogger in parallel. Any idea?
> > 
> 
> In my opinion, which I think is quite well known by now, the compaction 
> freeing scanner shouldn't be checking _any_ watermark.  The end result is 
> that we're migrating memory, not allocating additional memory; determining 
> if compaction should be done is best left lower on the stack.

Hmm... if there are many parallel compactors and we have no watermark
check, they can consume all the emergency memory. It can be mitigated by
isolating just one freepage in this case, but the potential risk would
not disappear.

Thanks.



Re: [PATCH] dwc_eth_qos: Remove deprecated create_singlethread_workqueue

2016-07-17 Thread David Miller
From: Bhaktipriya Shridhar 
Date: Sat, 16 Jul 2016 13:53:28 +0530

> alloc_workqueue replaces deprecated create_singlethread_workqueue().
> 
> A dedicated workqueue has been used since the workitem viz
> lp->txtimeout_reinit is involved in reinitialization if a TX timeout
> occurs, which is necessary to guarantee forward progress in packet
> processing. As a network device can be used during memory reclaim, the
> workqueue needs forward progress guarantee under memory pressure.
> WQ_MEM_RECLAIM has been set to ensure this.
> 
> Since there is only a single work item, explicit concurrency limit is
> unnecessary here.
> 
> Signed-off-by: Bhaktipriya Shridhar 

Applied.


linux-next: manual merge of the rcu tree with the tip tree

2016-07-17 Thread Stephen Rothwell
Hi Paul,

Today's linux-next merge of the rcu tree got a conflict in:

  kernel/rcu/tree.c

between commit:

  4df8374254ea ("rcu: Convert rcutree to hotplug state machine")

from the tip tree and commit:

  2a84cde733b0 ("rcu: Exact CPU-online tracking for RCU")

from the rcu tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc kernel/rcu/tree.c
index e5164deb51e1,5663d1e899d3..
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@@ -3812,54 -3809,34 +3809,80 @@@ int rcutree_prepare_cpu(unsigned int cp
  
for_each_rcu_flavor(rsp)
rcu_init_percpu_data(cpu, rsp);
 +
 +  rcu_prepare_kthreads(cpu);
 +  rcu_spawn_all_nocb_kthreads(cpu);
 +
 +  return 0;
 +}
 +
 +static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
 +{
 +  struct rcu_data *rdp = per_cpu_ptr(rcu_state_p->rda, cpu);
 +
 +  rcu_boost_kthread_setaffinity(rdp->mynode, outgoing);
 +}
 +
 +int rcutree_online_cpu(unsigned int cpu)
 +{
 +  sync_sched_exp_online_cleanup(cpu);
 +  rcutree_affinity_setting(cpu, -1);
 +  return 0;
 +}
 +
 +int rcutree_offline_cpu(unsigned int cpu)
 +{
 +  rcutree_affinity_setting(cpu, cpu);
 +  return 0;
 +}
 +
 +
 +int rcutree_dying_cpu(unsigned int cpu)
 +{
 +  struct rcu_state *rsp;
 +
 +  for_each_rcu_flavor(rsp)
 +  rcu_cleanup_dying_cpu(rsp);
 +  return 0;
 +}
 +
 +int rcutree_dead_cpu(unsigned int cpu)
 +{
 +  struct rcu_state *rsp;
 +
 +  for_each_rcu_flavor(rsp) {
 +  rcu_cleanup_dead_cpu(cpu, rsp);
 +  do_nocb_deferred_wakeup(per_cpu_ptr(rsp->rda, cpu));
 +  }
 +  return 0;
  }
  
+ /*
+  * Mark the specified CPU as being online so that subsequent grace periods
+  * (both expedited and normal) will wait on it.  Note that this means that
+  * incoming CPUs are not allowed to use RCU read-side critical sections
+  * until this function is called.  Failing to observe this restriction
+  * will result in lockdep splats.
+  */
+ void rcu_cpu_starting(unsigned int cpu)
+ {
+   unsigned long flags;
+   unsigned long mask;
+   struct rcu_data *rdp;
+   struct rcu_node *rnp;
+   struct rcu_state *rsp;
+ 
+   for_each_rcu_flavor(rsp) {
+   rdp = this_cpu_ptr(rsp->rda);
+   rnp = rdp->mynode;
+   mask = rdp->grpmask;
+   raw_spin_lock_irqsave_rcu_node(rnp, flags);
+   rnp->qsmaskinitnext |= mask;
+   rnp->expmaskinitnext |= mask;
+   raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+   }
+ }
+ 
  #ifdef CONFIG_HOTPLUG_CPU
  /*
   * The CPU is exiting the idle loop into the arch_cpu_idle_dead()
@@@ -4208,9 -4231,12 +4231,11 @@@ void __init rcu_init(void
 * this is called early in boot, before either interrupts
 * or the scheduler are operational.
 */
 -  cpu_notifier(rcu_cpu_notify, 0);
pm_notifier(rcu_pm_notify, 0);
-   for_each_online_cpu(cpu)
+   for_each_online_cpu(cpu) {
 -  rcu_cpu_notify(NULL, CPU_UP_PREPARE, (void *)(long)cpu);
 +  rcutree_prepare_cpu(cpu);
+   rcu_cpu_starting(cpu);
+   }
  }
  
  #include "tree_exp.h"


Re: [PATCH -v4 2/2] printk: Add kernel parameter to control writes to /dev/kmsg

2016-07-17 Thread Dave Young
On 07/18/16 at 06:44am, Borislav Petkov wrote:
> On Mon, Jul 18, 2016 at 10:18:09AM +0800, Dave Young wrote:
> > I would say avoiding ratelimit during boot doesn't make much sense.
> > Userspace cannot write to /dev/kmsg when system_state == SYSTEM_BOOTING
> > because the init process has not run yet.
> 
> You're right - kernel_init() sets SYSTEM_RUNNING before running the init
> process. I probably should kill all that logic in the second patch.
> 
> > I mean to set printk.devkmsg=off by default; userspace can set it to
> > on by sysctl.
> 
> That can't happen: DEVKMSG_LOG_MASK_LOCK.

Sorry, it seems I do not get your point. Supposing we use the bits defined
in your patch, shouldn't the below work?

#define DEVKMSG_LOG_MASK_DEFAULT 2
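
To spell out what I have in mind, here is a userspace sketch. It assumes
the bit layout on = bit 0, off = bit 1, lock = bit 2 (so the value 2 is
the "off" mask), and it assumes the lock bit is only added when the
setting comes from the kernel command line. The SK_* names and both
helpers are hypothetical and only model the behaviour under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed bit layout; see the note above. */
enum {
	SK_LOG_MASK_ON   = 1u << 0,
	SK_LOG_MASK_OFF  = 1u << 1,
	SK_LOG_MASK_LOCK = 1u << 2,
};

/* Compiled-in default: off, but not locked. */
#define SK_LOG_MASK_DEFAULT SK_LOG_MASK_OFF

/* Assumed: a command-line setting is applied and then locked. */
unsigned int sk_from_cmdline(unsigned int requested)
{
	return requested | SK_LOG_MASK_LOCK;
}

/* Assumed: a sysctl write is refused (-1) once the lock bit is set. */
int sk_sysctl_write(unsigned int *mask, unsigned int requested)
{
	assert(mask != NULL);

	if (*mask & SK_LOG_MASK_LOCK)
		return -1;
	*mask = requested;
	return 0;
}
```

In this model a default of 2 can still be switched on via sysctl, and
only a value that came from the command line stays locked.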

Thanks
Dave


Re: [PATCH/RFC] Re: linux-next: build failure after merge of the luto-misc tree

2016-07-17 Thread Stephen Rothwell
Hi Arnaldo,

On Fri, 15 Jul 2016 12:43:26 -0300 Arnaldo Carvalho de Melo wrote:
>
> Ok, same results, it works, queuing this one, ack? Stephen, does it work
> for you?

Sorry, no.  See my other email.

I am cross building (if that makes a difference).

-- 
Cheers,
Stephen Rothwell


linux-next: build failure after merge of the tip tree

2016-07-17 Thread Stephen Rothwell
Hi all,

After merging the tip tree, today's linux-next build (x86_64 allmodconfig)
failed like this:

In file included from tools/arch/x86/include/uapi/asm/bitsperlong.h:10:0,
 from /usr/include/asm-generic/int-ll64.h:11,
 from /usr/include/powerpc64le-linux-gnu/asm/types.h:27,
 from tools/include/linux/types.h:9,
 from tools/include/linux/list.h:4,
 from elf.h:23,
 from builtin-check.c:33:
tools/include/asm-generic/bitsperlong.h:13:2: error: #error Inconsistent word size. Check asm/bitsperlong.h
 #error Inconsistent word size. Check asm/bitsperlong.h
  ^
In file included from tools/arch/x86/include/uapi/asm/bitsperlong.h:10:0,
 from /usr/include/asm-generic/int-ll64.h:11,
 from /usr/include/powerpc64le-linux-gnu/asm/types.h:27,
 from tools/include/linux/types.h:9,
 from tools/include/linux/string.h:5,
 from ../lib/str_error_r.c:4:
tools/include/asm-generic/bitsperlong.h:13:2: error: #error Inconsistent word size. Check asm/bitsperlong.h
 #error Inconsistent word size. Check asm/bitsperlong.h
  ^
cat: /home/sfr/next/x86_64_allmodconfig/tools/objtool/.str_error_r.o.d: No such file or directory
Build:17: recipe for target '/home/sfr/next/x86_64_allmodconfig/tools/objtool/str_error_r.o' failed
In file included from tools/arch/x86/include/uapi/asm/bitsperlong.h:10:0,
 from /usr/include/asm-generic/int-ll64.h:11,
 from /usr/include/powerpc64le-linux-gnu/asm/types.h:27,
 from tools/include/linux/types.h:9,
 from tools/include/linux/list.h:4,
 from elf.h:23,
 from special.h:22,
 from special.c:26:
tools/include/asm-generic/bitsperlong.h:13:2: error: #error Inconsistent word size. Check asm/bitsperlong.h
 #error Inconsistent word size. Check asm/bitsperlong.h
  ^
In file included from tools/arch/x86/include/uapi/asm/bitsperlong.h:10:0,
 from /usr/include/asm-generic/int-ll64.h:11,
 from /usr/include/powerpc64le-linux-gnu/asm/types.h:27,
 from tools/include/linux/types.h:9,
 from tools/include/linux/string.h:5,
 from ../lib/string.c:18:
tools/include/asm-generic/bitsperlong.h:13:2: error: #error Inconsistent word size. Check asm/bitsperlong.h
 #error Inconsistent word size. Check asm/bitsperlong.h
  ^
In file included from tools/arch/x86/include/uapi/asm/bitsperlong.h:10:0,
 from /usr/include/asm-generic/int-ll64.h:11,
 from /usr/include/powerpc64le-linux-gnu/asm/types.h:27,
 from tools/include/linux/types.h:9,
 from tools/include/linux/list.h:4,
 from elf.h:23,
 from elf.c:30:
tools/include/asm-generic/bitsperlong.h:13:2: error: #error Inconsistent word size. Check asm/bitsperlong.h
 #error Inconsistent word size. Check asm/bitsperlong.h
  ^
In file included from tools/arch/x86/include/uapi/asm/bitsperlong.h:10:0,
 from /usr/include/asm-generic/int-ll64.h:11,
 from /usr/include/powerpc64le-linux-gnu/asm/types.h:27,
 from tools/include/linux/types.h:9,
 from tools/include/linux/list.h:4,
 from arch/x86/../../elf.h:23,
 from arch/x86/decode.c:26:
tools/include/asm-generic/bitsperlong.h:13:2: error: #error Inconsistent word size. Check asm/bitsperlong.h
 #error Inconsistent word size. Check asm/bitsperlong.h
  ^
Makefile:42: recipe for target '/home/sfr/next/x86_64_allmodconfig/tools/objtool/objtool-in.o' failed
Makefile:60: recipe for target 'objtool' failed

I have added this patch for today:

From: Stephen Rothwell 
Date: Mon, 18 Jul 2016 14:58:39 +1000
Subject: [PATCH] tools: Simplify __BITS_PER_LONG define

Signed-off-by: Stephen Rothwell 
---
 tools/include/asm-generic/bitsperlong.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/include/asm-generic/bitsperlong.h b/tools/include/asm-generic/bitsperlong.h
index 45eca517efb3..f46853474fd3 100644
--- a/tools/include/asm-generic/bitsperlong.h
+++ b/tools/include/asm-generic/bitsperlong.h
@@ -10,7 +10,8 @@
 #endif
 
 #if BITS_PER_LONG != __BITS_PER_LONG
-#error Inconsistent word size. Check asm/bitsperlong.h
+#undef __BITS_PER_LONG
+#define __BITS_PER_LONG BITS_PER_LONG
 #endif
 
 #ifndef BITS_PER_LONG_LONG
-- 
2.8.1
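
The shape of the workaround can be tried outside the kernel tree with a
small sketch. From the include chain above, it looks like a
powerpc64le host picks up the x86 bitsperlong.h, so the two word-size
macros disagree; that reading is my inference from the log. The SK_*
names below are hypothetical, the value 16 is a deliberately wrong
stand-in for the mismatched header, and the sketch assumes the GCC/Clang
predefined macros __SIZEOF_LONG__ and __CHAR_BIT__.

```c
#include <assert.h>
#include <limits.h>

/* Word size as the compiler sees it (GCC/Clang predefined macros). */
#if defined(__SIZEOF_LONG__) && defined(__CHAR_BIT__)
#define SK_BITS_PER_LONG (__CHAR_BIT__ * __SIZEOF_LONG__)
#else
#error "this sketch assumes the GCC/Clang predefined size macros"
#endif

/* Deliberately wrong value, standing in for a header meant for another arch. */
#define SK_UAPI_BITS_PER_LONG 16

#if SK_BITS_PER_LONG != SK_UAPI_BITS_PER_LONG
/* Same shape as the workaround: override the value instead of #error. */
#undef SK_UAPI_BITS_PER_LONG
#define SK_UAPI_BITS_PER_LONG SK_BITS_PER_LONG
#endif

int sk_uapi_bits_per_long(void)
{
	/* After the override, both views agree with the compiler. */
	assert(SK_UAPI_BITS_PER_LONG == (int)(sizeof(long) * CHAR_BIT));
	return SK_UAPI_BITS_PER_LONG;
}
```

This only hides the mismatch for the build; the header that supplied the
wrong value is still the one being included.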

-- 
Cheers,
Stephen Rothwell


Re: linux-next: manual merge of the kspp tree with the arm64 tree

2016-07-17 Thread Kees Cook
On Sun, Jul 17, 2016 at 10:06 PM, Stephen Rothwell wrote:
> Hi Kees,
>
> On Sun, 17 Jul 2016 21:49:40 -0700 Kees Cook  wrote:
>>
>> If I'm reading correctly, this second fixup is wrong. It should read;
>>
>> kasan_check_read(from, n);
>> check_object_size(from, n, true);
>> return __arch_copy_to_user(to, from, n);
>>
>> (i.e. fix double space between "return" and "__arch_copy..." in both
>> chunks and add check_object_size() calls after the kasan calls in both
>> chunks.
>
> Yep, sorry.  I will fix it up tomorrow.

Cool, thanks! :)
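
For anyone following along, the intended call order can be modelled in
userspace. The sk_* functions below are hypothetical stand-ins that only
record the order in which they run (plus a plain memcpy for the copy
itself); they are not the kernel's kasan_check_read(),
check_object_size() or __arch_copy_to_user().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Call trace: 'K' = kasan check, 'C' = object size check, 'U' = copy. */
static char sk_trace[4];
static size_t sk_trace_len;

static void sk_record(char step)
{
	if (sk_trace_len < sizeof(sk_trace) - 1)
		sk_trace[sk_trace_len++] = step;
}

void sk_kasan_check_read(const void *p, size_t n)
{
	(void)p; (void)n;
	sk_record('K');
}

void sk_check_object_size(const void *p, size_t n, int to_user)
{
	(void)p; (void)n; (void)to_user;
	sk_record('C');
}

/* Returns the number of bytes NOT copied, as copy_to_user() does. */
size_t sk_arch_copy_to_user(void *to, const void *from, size_t n)
{
	memcpy(to, from, n);
	sk_record('U');
	return 0;
}

/* The order described above: kasan check, then size check, then copy. */
size_t sk_copy_to_user(void *to, const void *from, size_t n)
{
	assert(to != NULL && from != NULL);

	sk_kasan_check_read(from, n);
	sk_check_object_size(from, n, 1);
	return sk_arch_copy_to_user(to, from, n);
}

const char *sk_call_order(void)
{
	sk_trace[sk_trace_len] = '\0';
	return sk_trace;
}
```

After one sk_copy_to_user() call, sk_call_order() reports the three steps
in the order kasan check, object size check, copy.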

-Kees

-- 
Kees Cook
Brillo & Chrome OS Security


Re: [PATCH 1/3] virtio: Basic implementation of virtio pstore driver

2016-07-17 Thread Kees Cook
On Sun, Jul 17, 2016 at 9:37 PM, Namhyung Kim  wrote:
> The virtio pstore driver provides an interface to the pstore subsystem so
> that the guest kernel's log/dump messages can be saved on the host
> machine.  Users can access the log file directly on the host, or on the
> guest at the next boot using the pstore filesystem.  It currently deals
> with the kernel log (printk) buffer only, but we can extend it to carry
> other information (like an ftrace dump) later.
>
> It supports legacy PCI devices using a single order-2 page buffer.  As all
> pstore operations are synchronous, it should be fine IMHO.  However I
> don't know how to make the write operation synchronous since it's called
> with a spinlock held (from any context including NMI).
>
> Cc: Paolo Bonzini 
> Cc: Radim Krčmář 
> Cc: "Michael S. Tsirkin" 
> Cc: Anthony Liguori 
> Cc: Anton Vorontsov 
> Cc: Colin Cross 
> Cc: Kees Cook 
> Cc: Tony Luck 
> Cc: Steven Rostedt 
> Cc: Ingo Molnar 
> Cc: Minchan Kim 
> Cc: k...@vger.kernel.org
> Cc: qemu-de...@nongnu.org
> Cc: virtualizat...@lists.linux-foundation.org
> Signed-off-by: Namhyung Kim 

This looks great to me! I'd love to use this in qemu. (Right now I go
through hoops to use the ramoops backend for testing.)

Reviewed-by: Kees Cook 

Notes below...

> ---
>  drivers/virtio/Kconfig |  10 ++
>  drivers/virtio/Makefile|   1 +
>  drivers/virtio/virtio_pstore.c | 317 +
>  include/uapi/linux/Kbuild  |   1 +
>  include/uapi/linux/virtio_ids.h|   1 +
>  include/uapi/linux/virtio_pstore.h |  53 +++
>  6 files changed, 383 insertions(+)
>  create mode 100644 drivers/virtio/virtio_pstore.c
>  create mode 100644 include/uapi/linux/virtio_pstore.h
>
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 77590320d44c..8f0e6c796c12 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -58,6 +58,16 @@ config VIRTIO_INPUT
>
>  If unsure, say M.
>
> +config VIRTIO_PSTORE
> +   tristate "Virtio pstore driver"
> +   depends on VIRTIO
> +   depends on PSTORE
> +   ---help---
> +This driver supports virtio pstore devices to save/restore
> +panic and oops messages on the host.
> +
> +If unsure, say M.
> +
>   config VIRTIO_MMIO
> tristate "Platform bus driver for memory mapped virtio devices"
> depends on HAS_IOMEM && HAS_DMA
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 41e30e3dc842..bee68cb26d48 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -5,3 +5,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> +obj-$(CONFIG_VIRTIO_PSTORE) += virtio_pstore.o
> diff --git a/drivers/virtio/virtio_pstore.c b/drivers/virtio/virtio_pstore.c
> new file mode 100644
> index ..6fe62c0f1508
> --- /dev/null
> +++ b/drivers/virtio/virtio_pstore.c
> @@ -0,0 +1,317 @@
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define VIRT_PSTORE_ORDER    2
> +#define VIRT_PSTORE_BUFSIZE  (4096 << VIRT_PSTORE_ORDER)
> +
> +struct virtio_pstore {
> +   struct virtio_device     *vdev;
> +   struct virtqueue         *vq;
> +   struct pstore_info       pstore;
> +   struct virtio_pstore_hdr hdr;
> +   size_t                   buflen;
> +   u64                      id;
> +
> +   /* Waiting for host to ack */
> +   wait_queue_head_t        acked;
> +};
> +
> +static u16 to_virtio_type(struct virtio_pstore *vps, enum pstore_type_id type)
> +{
> +   u16 ret;
> +
> +   switch (type) {
> +   case PSTORE_TYPE_DMESG:
> +   ret = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_TYPE_DMESG);
> +   break;
> +   default:
> +   ret = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_TYPE_UNKNOWN);
> +   break;
> +   }

I would love to see this support PSTORE_TYPE_CONSOLE too. It should be
relatively easy to add: I think it'd just be another virtio command?
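
A minimal sketch of what that extension could look like, written as a standalone userspace model rather than driver code. The CONSOLE wire type and all numeric values here are assumptions for illustration (the patch as posted only handles DMESG and UNKNOWN), and the cpu_to_virtio16()/virtio16_to_cpu() byte-order conversion is deliberately left out:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the pstore type ids; the values are illustrative only. */
enum model_pstore_type {
	MODEL_PSTORE_TYPE_DMESG,
	MODEL_PSTORE_TYPE_CONSOLE,
	MODEL_PSTORE_TYPE_UNKNOWN,
};

/* Stand-ins for the wire values; the CONSOLE entry is hypothetical. */
#define MODEL_VIRTIO_PSTORE_TYPE_UNKNOWN 0
#define MODEL_VIRTIO_PSTORE_TYPE_DMESG   1
#define MODEL_VIRTIO_PSTORE_TYPE_CONSOLE 2

/* Map a pstore type to its wire value; anything unhandled is UNKNOWN. */
static uint16_t model_to_virtio_type(enum model_pstore_type type)
{
	switch (type) {
	case MODEL_PSTORE_TYPE_DMESG:
		return MODEL_VIRTIO_PSTORE_TYPE_DMESG;
	case MODEL_PSTORE_TYPE_CONSOLE:
		return MODEL_VIRTIO_PSTORE_TYPE_CONSOLE;
	default:
		return MODEL_VIRTIO_PSTORE_TYPE_UNKNOWN;
	}
}

/* Inverse mapping; unrecognised wire values also collapse to UNKNOWN. */
static enum model_pstore_type model_from_virtio_type(uint16_t type)
{
	switch (type) {
	case MODEL_VIRTIO_PSTORE_TYPE_DMESG:
		return MODEL_PSTORE_TYPE_DMESG;
	case MODEL_VIRTIO_PSTORE_TYPE_CONSOLE:
		return MODEL_PSTORE_TYPE_CONSOLE;
	default:
		return MODEL_PSTORE_TYPE_UNKNOWN;
	}
}
```

Keeping both directions in one pair of switch statements means a new type only needs one extra case on each side, which is presumably why it looks easy to add.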

> +
> +   return ret;
> +}
> +
> +static enum pstore_type_id from_virtio_type(struct virtio_pstore *vps, u16 type)
> +{
> +   enum pstore_type_id ret;
> +
> +   switch (virtio16_to_cpu(vps->vdev, type)) {
> +   case VIRTIO_PSTORE_TYPE_DMESG:
> +   ret = PSTORE_TYPE_DMESG;
> +   break;
> +   default:
> +   ret = PSTORE_TYPE_UNKNOWN;
> +   break;
> +   }
> +
> +   return ret;
> +}
> +
> +static void virtpstore_ack(struct virtqueue *vq)
> +{
> +   struct virtio_pstore *vps = vq->vdev->priv;
> +
> +   wake_up(&vps->acked);
> +}
> +
> +static int virt_pstore_open(struct pstore_info *psi)
> +{
> +   struct virtio_pstore *vps = psi->data;
> +   struct 

Re: [PATCH 2/3] xen-scsiback: One function call less in scsiback_device_action() after error detection

2016-07-17 Thread Juergen Gross
On 16/07/16 22:23, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Sat, 16 Jul 2016 21:42:42 +0200
> 
> The kfree() function was called in one case by the
> scsiback_device_action() function during error handling
> even if the passed variable "tmr" contained a null pointer.
> 
> Adjust jump targets according to the Linux coding style convention.
> 
> Signed-off-by: Markus Elfring 
> ---
>  drivers/xen/xen-scsiback.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/xen/xen-scsiback.c b/drivers/xen/xen-scsiback.c
> index 4a48c06..7612bc9 100644
> --- a/drivers/xen/xen-scsiback.c
> +++ b/drivers/xen/xen-scsiback.c
> @@ -606,7 +606,7 @@ static void scsiback_device_action(struct vscsibk_pend *pending_req,
>   tmr = kzalloc(sizeof(struct scsiback_tmr), GFP_KERNEL);
>   if (!tmr) {
>   target_put_sess_cmd(se_cmd);
> - goto err;
> + goto do_resp;
>   }

Hmm, I'm not convinced this is an improvement.

I'd rather rename the new error label to "put_cmd" and get rid of the
braces in the above if statement:

-   if (!tmr) {
-   target_put_sess_cmd(se_cmd);
-   goto err;
-   }
+   if (!tmr)
+   goto put_cmd;

and then in the error path:

-err:
+put_cmd:
+   target_put_sess_cmd(se_cmd);
+free_tmr:
kfree(tmr);


Juergen

>  
>   init_waitqueue_head(&tmr->tmr_wait);
> @@ -616,7 +616,7 @@ static void scsiback_device_action(struct vscsibk_pend *pending_req,
>  unpacked_lun, tmr, act, GFP_KERNEL,
>  tag, TARGET_SCF_ACK_KREF);
>   if (rc)
> - goto err;
> + goto free_tmr;
>  
>   wait_event(tmr->tmr_wait, atomic_read(&tmr->tmr_complete));
>  
> @@ -626,8 +626,9 @@ static void scsiback_device_action(struct vscsibk_pend *pending_req,
>   scsiback_do_resp_with_sense(NULL, err, 0, pending_req);
>   transport_generic_free_cmd(&pending_req->se_cmd, 1);
>   return;
> -err:
> +free_tmr:
>   kfree(tmr);
> +do_resp:
>   scsiback_do_resp_with_sense(NULL, err, 0, pending_req);
>  }
>  
> 
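
The label layout Juergen suggests in the reply above can be sketched as a small userspace model. Everything here is a stand-in: the struct and helper names are hypothetical, calloc()/free() replace kzalloc()/kfree(), and freeing tmr on the success path is a simplification of the model, not something the patch shows:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins so the control flow can run in userspace. */
struct cmd_model { int refs; int responded; };
struct tmr_model { int unused; };

/* Models target_put_sess_cmd(): drop one reference. */
static void put_cmd_ref(struct cmd_model *c) { c->refs--; }

/* Models scsiback_do_resp_with_sense(): report the failure. */
static void send_error_resp(struct cmd_model *c) { c->responded = 1; }

/*
 * fail_alloc:  pretend the allocation failed.
 * fail_submit: pretend the submit call returned non-zero.
 * Returns 0 on success, -1 after running the error path.
 */
static int device_action_model(struct cmd_model *c, int fail_alloc,
			       int fail_submit)
{
	struct tmr_model *tmr;

	tmr = fail_alloc ? NULL : calloc(1, sizeof(*tmr));
	if (!tmr)
		goto put_cmd;

	if (fail_submit)
		goto free_tmr;

	free(tmr);	/* model only: keeps the sketch leak-free */
	return 0;

put_cmd:
	put_cmd_ref(c);
free_tmr:
	free(tmr);	/* free(NULL) is a no-op, like kfree(NULL) */
	send_error_resp(c);
	return -1;
}
```

With this ordering the allocation-failure path falls through both labels, while the submit-failure path enters at free_tmr and skips the reference drop, so the if statement no longer needs braces.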



Re: linux-next: manual merge of the kspp tree with the arm64 tree

2016-07-17 Thread Stephen Rothwell
Hi Kees,

On Sun, 17 Jul 2016 21:49:40 -0700 Kees Cook  wrote:
>
> If I'm reading correctly, this second fixup is wrong. It should read:
> 
> kasan_check_read(from, n);
> check_object_size(from, n, true);
> return __arch_copy_to_user(to, from, n);
> 
> (i.e. fix double space between "return" and "__arch_copy..." in both
> chunks and add check_object_size() calls after the kasan calls in both
> chunks.)

Yep, sorry.  I will fix it up tomorrow.

-- 
Cheers,
Stephen Rothwell


Re: [PATCH 3/3] xen-scsiback: Pass a failure indication as a constant

2016-07-17 Thread Juergen Gross
On 16/07/16 22:24, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Sat, 16 Jul 2016 21:55:01 +0200
> 
> Pass the constant "FAILED" in a function call directly instead of
> using an initialisation for a local variable.
> 
> Signed-off-by: Markus Elfring 

Reviewed-by: Juergen Gross 


Juergen


Re: [PATCH 08/31] mm, vmscan: simplify the logic deciding whether kswapd sleeps

2016-07-17 Thread Joonsoo Kim
On Thu, Jul 14, 2016 at 10:32:09AM +0200, Vlastimil Babka wrote:
> On 07/14/2016 07:23 AM, Joonsoo Kim wrote:
> >On Fri, Jul 08, 2016 at 11:11:47AM +0100, Mel Gorman wrote:
> >>On Fri, Jul 08, 2016 at 11:44:47AM +0900, Joonsoo Kim wrote:
> >>
> >>It doesn't stop reclaiming for the lower zones. It's reclaiming the LRU
> >>for the whole node that may or may not have lower zone pages at the end
> >>of the LRU. If it does, then the allocation request will be satisfied.
> >>If it does not, then kswapd will think the node is balanced and get
> >>rewoken to do a zone-constrained reclaim pass.
> >
> >If a zone-constrained request could enter direct reclaim, there would
> >be no problem. But please assume that the request is zone-constrained
> >without __GFP_DIRECT_RECLAIM, which is common in some device driver
> >implementations. And please assume one more thing: this request
> >always comes together with a zone-unconstrained allocation request. In
> >this case, your max() logic will set kswapd_classzone_idx to the highest
> >zone index and the re-woken kswapd would not balance for the low zone
> >again. In the end, the zone-constrained allocation request without
> >__GFP_DIRECT_RECLAIM could fail.
> 
> I don't think there's a problem in the scenario? Kswapd will keep
> being woken up and reclaim from the node lru. It will hit and free
> any low zone pages that are on the lru, even though it doesn't
> "balance for low zone". Eventually it will either satisfy the
> constrained allocation by reclaiming those low-zone pages during the
> repeated wakeups, or the low-zone wakeups will stop coming together
> with higher-zone wakeups and then it will reclaim the low-zone pages
> in a single low-zone wakeup. If the zone-constrained request is not

Yes, probability of this would be low.

> allowed to fail, then it will just keep waking up kswapd and waiting
> for the progress. If it's allowed to fail (i.e. not __GFP_NOFAIL),
> but not allowed to direct reclaim, it goes "goto nopage" rather
> quickly in __alloc_pages_slowpath(), without any waiting for
> kswapd's progress, so there's not really much difference whether the
> kswapd wakeup picked up a low classzone or not. Note the

Hmm... Even if the allocation is allowed to fail, we should do our best to
prevent the failure. Relying on luck doesn't seem like a good idea to me.

Thanks.

> __GFP_NOFAIL but ~__GFP_DIRECT_RECLAIM is a WARN_ON_ONCE() scenario,
> so definitely not common...
> 
> >Thanks.
> >
> 


Re: [PATCH 1/3] xen-scsiback: Delete an unnecessary check before the function call "kfree"

2016-07-17 Thread Juergen Gross
On 16/07/16 22:22, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Sat, 16 Jul 2016 21:21:05 +0200
> 
> The kfree() function tests whether its argument is NULL and then
> returns immediately. Thus the test around the call is not needed.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 

Reviewed-by: Juergen Gross 


Juergen
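
The property the patch relies on can be shown with the userspace analogue: like kfree(), the standard free() accepts a null pointer and does nothing. A minimal sketch, where the holder struct and holder_release() are hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical owner of one optional heap buffer. */
struct holder { char *buf; };

/* No "if (h->buf)" guard is needed: free(NULL) is defined as a no-op. */
static void holder_release(struct holder *h)
{
	free(h->buf);
	h->buf = NULL;
}
```

Calling holder_release() on a holder whose buf was never allocated is therefore safe, which is the same reasoning that makes the removed check redundant.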


Re: [PATCH 08/31] mm, vmscan: simplify the logic deciding whether kswapd sleeps

2016-07-17 Thread Joonsoo Kim
On Thu, Jul 14, 2016 at 10:05:00AM +0100, Mel Gorman wrote:
> On Thu, Jul 14, 2016 at 02:23:32PM +0900, Joonsoo Kim wrote:
> > > 
> > > > > > > And, I'd like to know why max() is used for classzone_idx rather
> > > > > > > than min()? I think that kswapd should balance the lowest zone
> > > > > > > requested.
> > > > > > > 
> > > > > > 
> > > > > > If there are two allocation requests -- one zone-constrained and the
> > > > > > other zone-unconstrained, it does not make sense to have kswapd skip
> > > > > > the pages usable for the zone-unconstrained and waste a load of CPU.
> > > > > > You could
> > > > 
> > > > > I agree that, in this case, it's not good to skip the pages usable
> > > > > for the zone-unconstrained request. But what I am concerned about is
> > > > > that kswapd stops reclaiming prematurely from the point of view of the
> > > > > zone-constrained requestor.
> > > 
> > > It doesn't stop reclaiming for the lower zones. It's reclaiming the LRU
> > > for the whole node that may or may not have lower zone pages at the end
> > > of the LRU. If it does, then the allocation request will be satisfied.
> > > If it does not, then kswapd will think the node is balanced and get
> > > rewoken to do a zone-constrained reclaim pass.
> > 
> > > If a zone-constrained request could enter direct reclaim, there would
> > > be no problem. But please assume that the request is zone-constrained
> > > without __GFP_DIRECT_RECLAIM, which is common in some device driver
> > > implementations.
> 
> Then it's likely GFP_ATOMIC and it'll wake kswapd on each failure. If
> kswapd is constantly awake for highmem requests then we're reclaiming
> everything anyway.  Remember that if kswapd is reclaiming for higher zones,
> it'll still cover the lower zones eventually. There is no guarantee that
> skipping the highmem pages will satisfy the atomic allocations any faster
> but consuming the CPU to skip the pages is a definite cost.

Okay.

> 
> Even worse, skipping highmem pages when highmem pages are required may
> make lowmem pressure worse because those pages are freed faster and can
> be consumed by zone-unconstrained requests.

Okay.

> 
> If this really is a problem in practice then we can consider having
> allocation requests that are zone-constrained and !__GFP_DIRECT_RECLAIM
> set a flag and use the min classzone for the wakeup. That flag remains
> set until kswapd takes at least one pass using the lower classzone and
> clears it. The classzone will not be adjusted higher until that flag is

It would work.

> cleared. I don't think we should do it without evidence that it's a real
> problem because kswapd potentially uses useless CPU and the potential for
> higher lowmem pressure.

Hmmm... I see it differently. Your patch changes the current behaviour
without any evidence. Code simplification cannot compensate for a
potential stability issue. Before your patch, kswapd tried to
balance for the minimum classzone, so until the disadvantage of that
approach is proven, it's better to keep the original logic.

Thanks.
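
The flag-based compromise Mel describes in the quoted text above can be sketched as a tiny state model. This is not kernel code: the struct, field and function names are hypothetical, and a "pass" is reduced to returning the classzone that would be used and resetting the state:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-node state; classzone_idx < 0 means no pending request. */
struct node_model {
	int  classzone_idx;
	bool low_zone_pending;	/* the proposed flag */
};

/*
 * Record a wakeup request for classzone idx. constrained_atomic models a
 * zone-constrained request that cannot enter direct reclaim.
 */
static void wakeup_model(struct node_model *n, int idx, bool constrained_atomic)
{
	if (n->classzone_idx < 0) {
		n->classzone_idx = idx;			/* first request */
	} else if (constrained_atomic) {
		if (idx < n->classzone_idx)
			n->classzone_idx = idx;		/* min() for these */
	} else if (!n->low_zone_pending) {
		if (idx > n->classzone_idx)
			n->classzone_idx = idx;		/* max() otherwise */
	}
	/* While the flag is set, later requests cannot raise the index. */
	if (constrained_atomic)
		n->low_zone_pending = true;
}

/* One kswapd pass: use the recorded classzone, then clear the flag. */
static int kswapd_pass_model(struct node_model *n)
{
	int used = n->classzone_idx;

	n->classzone_idx = -1;
	n->low_zone_pending = false;
	return used;
}
```

In this model a constrained request for zone 0 that arrives between two unconstrained requests for zone 2 still gets one pass at classzone 0, after which the max() behaviour resumes.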


[BUG] kernel BUG at arch/x86/mm/pageattr.c:216!

2016-07-17 Thread Xie XiuQi
Hi all,

I'm hitting a BUG_ON during a panic at arch/x86/mm/pageattr.c:216 on
3.10.0-327.el7 (RHEL 7.2).
I'm running a test in which the system is expected to reboot immediately
after a panic. However, drm_fb_helper_panic may trigger the BUG_ON at
arch/x86/mm/pageattr.c:216.

Does anyone have a good idea how to fix it?

The code is as follows:
 210 static void cpa_flush_array(unsigned long *start, int numpages, int cache,
 211                             int in_flags, struct page **pages)
 212 {
 213         unsigned int i, level;
 214         unsigned long do_wbinvd = cache && numpages >= 1024; /* 4M threshold */
 215
 216         BUG_ON(irqs_disabled());
 217
 218         on_each_cpu(__cpa_flush_all, (void *) do_wbinvd, 1);
 219
 220         if (!cache || do_wbinvd)
 221                 return;
 222
 223         /*
 224          * We only need to flush on one CPU,
 225          * clflush is a MESI-coherent instruction that
 226          * will cause all other CPUs to flush the same
 227          * cachelines:
 228          */
 229         for (i = 0; i < numpages; i++) {
 230                 unsigned long addr;
 231                 pte_t *pte;
 232
 233                 if (in_flags & CPA_PAGES_ARRAY)
 234                         addr = (unsigned long)page_address(pages[i]);
 235                 else
 236                         addr = start[i];
 237
 238                 pte = lookup_address(addr, &level);
 239
 240                 /*
 241                  * Only flush present addresses:
 242                  */
 243                 if (pte && (pte_val(*pte) & _PAGE_PRESENT))
 244                         clflush_cache_range((void *)addr, PAGE_SIZE);
 245         }
 246 }


--- crash messages ---
[ 1336.567485] test_module: call panic() function in process context 3 times.
[ 1336.567542] Kernel panic - not syncing: call panic() function in process 
context.

[ 1336.567607] CPU: 0 PID: 9566 Comm: bash Tainted: G   OE  
V---   3.10.0-327.el7.x86_64
[ 1336.567699] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 1336.567789]  8116f900 035a0a10 88007adc7e00 
81638844
[ 1336.567848]  88007adc7e80 81632097 0008 
88007adc7e90
[ 1336.567943]  88007adc7e30 035a0a10 8008 
88007ec0d6c8
[ 1336.567992] Call Trace:
[ 1336.567992]  [] ? clear_zonelist_oom+0xa0/0xa0
[ 1336.567992]  [] dump_stack+0x19/0x1b
[ 1336.567992]  [] panic+0xd8/0x20f
[ 1336.567992]  [] ? clear_zonelist_oom+0xa0/0xa0
[ 1336.567992]  [] dev_wr_actions+0x6d9/0xf60 [test_module]
[ 1336.567992]  [] dev_wr_handler+0xa6/0x120 [test_module]
[ 1336.567992]  [] vfs_write+0xbd/0x1e0
[ 1336.567992]  [] ? trace_do_page_fault+0x43/0x110
[ 1336.567992]  [] SyS_write+0x7f/0xe0
[ 1336.567992]  [] system_call_fastpath+0x16/0x1b
[ 1336.567992] drm_kms_helper: panic occurred, switching back to text console
[ 1336.567992] [ cut here ]
[ 1336.567992] kernel BUG at arch/x86/mm/pageattr.c:216!
[ 1336.567992] invalid opcode:  [#1] SMP
[ 1336.567992] Modules linked in: test_module(O) ip6t_rpfilter ip6t_REJECT 
ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc 
ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 
nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter 
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter 
ip_tables signo_catch(OV) cirrus ppdev parport_pc parport syscopyarea 
sysfillrect sysimgblt ttm drm_kms_helper drm serio_raw virtio_balloon i2c_piix4 
i2c_core pcspkr xfs libcrc32c sd_mod sr_mod crc_t10dif cdrom crct10dif_common 
ata_generic pata_acpi virtio_console virtio_scsi ata_piix virtio_pci e1000 
libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
[ 1336.567992] CPU: 0 PID: 9566 Comm: bash Tainted: G   O   
  3.10.0-327.el7.x86_64
[ 1336.567992] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 1336.567992] task: 88007afef300 ti: 88007adc4000 task.ti: 
88007adc4000
[ 1336.567992] RIP: 0010:[]  [] 
change_page_attr_set_clr+0x4c8/0x4d0
[ 1336.567992] RSP: 0018:88007adc7538  EFLAGS: 00010046
[ 1336.567992] RAX: 0046 RBX:  RCX: 0004
[ 1336.567992] RDX: 2200 RSI:  RDI: 8000
[ 1336.567992] RBP: 88007adc75d0 R08: 0010 R09: 8800
[ 1336.567992] R10: 3688 R11: 811a738f R12: 0010
[ 1336.567992] R13:  R14: 0200 R15: 0005
[ 1336.567992] FS:  7fee378b1740() GS:88007ec0() 
knlGS:
[ 1336.567992] CS:  0010 DS:  ES:  CR0: 80050033
[ 1336.567992] CR2: 

Re: linux-next: manual merge of the kspp tree with the arm64 tree

2016-07-17 Thread Kees Cook
On Sun, Jul 17, 2016 at 7:59 PM, Stephen Rothwell  wrote:
> Hi Kees,
>
> Today's linux-next merge of the kspp tree got a conflict in:
>
>   arch/arm64/include/asm/uaccess.h
>
> between commit:
>
>   bffe1baff5d5 ("arm64: kasan: instrument user memory access API")
>
> from the arm64 tree and commit:
>
>   b19e7f50f056 ("arm64/uaccess: Enable hardened usercopy")
>
> from the kspp tree.
>
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
>
> --
> Cheers,
> Stephen Rothwell
>
> diff --cc arch/arm64/include/asm/uaccess.h
> index 5e834d10b291,1779cbdb7838..
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@@ -264,14 -276,14 +264,15 @@@ extern unsigned long __must_check __cle
>
>   static inline unsigned long __must_check __copy_from_user(void *to, const 
> void __user *from, unsigned long n)
>   {
>  +  kasan_check_write(to, n);
> -   return  __arch_copy_from_user(to, from, n);
> +   check_object_size(to, n, false);
> +   return __arch_copy_from_user(to, from, n);
>   }
>
>   static inline unsigned long __must_check __copy_to_user(void __user *to, 
> const void *from, unsigned long n)
>   {
>  -  check_object_size(from, n, true);
>  +  kasan_check_read(from, n);
> -   return  __arch_copy_to_user(to, from, n);
> +   return __arch_copy_to_user(to, from, n);

If I'm reading correctly, this second fixup is wrong. It should read:

kasan_check_read(from, n);
check_object_size(from, n, true);
return __arch_copy_to_user(to, from, n);

(i.e. fix the double space between "return" and "__arch_copy..." in both
chunks and add check_object_size() calls after the kasan calls in both
chunks.)
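
To make the intended ordering concrete, here is a small userspace mock
of both helpers. It is only an illustration: the stubs below merely log
their calls, copy_to_user_model(), copy_from_user_model() and the
arch_copy_*_stub() functions are hypothetical names, and nothing here
is the real arm64 code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Records the order in which the stubbed helpers run. */
static char call_log[8];
static size_t call_count;

static void reset_call_log(void)
{
    call_count = 0;
    call_log[0] = '\0';
}

static void log_call(char tag)
{
    if (call_count < sizeof(call_log) - 1)
        call_log[call_count++] = tag;
    call_log[call_count] = '\0';
}

/* Stubs standing in for the kernel helpers; they only log a tag. */
static void kasan_check_read(const void *p, unsigned long n)
{
    (void)p; (void)n;
    log_call('k');
}

static void kasan_check_write(const void *p, unsigned long n)
{
    (void)p; (void)n;
    log_call('K');
}

static void check_object_size(const void *p, unsigned long n, bool to_user)
{
    (void)p; (void)n;
    log_call(to_user ? 'o' : 'O');
}

static unsigned long arch_copy_to_user_stub(void *to, const void *from,
                                            unsigned long n)
{
    log_call('c');
    memcpy(to, from, n);
    return 0; /* number of bytes NOT copied */
}

static unsigned long arch_copy_from_user_stub(void *to, const void *from,
                                              unsigned long n)
{
    log_call('c');
    memcpy(to, from, n);
    return 0;
}

/* KASAN check first, then the hardened usercopy check, then the copy. */
static unsigned long copy_to_user_model(void *to, const void *from,
                                        unsigned long n)
{
    kasan_check_read(from, n);
    check_object_size(from, n, true);
    return arch_copy_to_user_stub(to, from, n);
}

static unsigned long copy_from_user_model(void *to, const void *from,
                                          unsigned long n)
{
    kasan_check_write(to, n);
    check_object_size(to, n, false);
    return arch_copy_from_user_stub(to, from, n);
}
```

After a call to copy_to_user_model() the log holds "koc", and after
copy_from_user_model() it holds "KOc", i.e. both checks run before the
copy in each direction.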

-Kees

-- 
Kees Cook
Brillo & Chrome OS Security


Re: [PATCH 04/31] mm, vmscan: begin reclaiming pages on a per-node basis

2016-07-17 Thread Joonsoo Kim
On Thu, Jul 14, 2016 at 09:48:41AM +0200, Vlastimil Babka wrote:
> On 07/14/2016 08:28 AM, Joonsoo Kim wrote:
> >On Fri, Jul 08, 2016 at 11:05:32AM +0100, Mel Gorman wrote:
> >>On Fri, Jul 08, 2016 at 11:28:52AM +0900, Joonsoo Kim wrote:
> >>>On Thu, Jul 07, 2016 at 10:48:08AM +0100, Mel Gorman wrote:
> On Thu, Jul 07, 2016 at 10:12:12AM +0900, Joonsoo Kim wrote:
> >>@@ -1402,6 +1406,11 @@ static unsigned long isolate_lru_pages(unsigned 
> >>long nr_to_scan,
> >>
> >>VM_BUG_ON_PAGE(!PageLRU(page), page);
> >>
> >>+   if (page_zonenum(page) > sc->reclaim_idx) {
> >>+   list_move(&page->lru, &pages_skipped);
> >>+   continue;
> >>+   }
> >>+
> >
> >I think that we don't need to skip LRU pages in active list. What we'd
> >like to do is just skipping actual reclaim since it doesn't make
> >freepage that we need. It's unrelated to skip the page in active list.
> >
> 
> Why?
> 
> The active aging is sometimes about simply aging the LRU list. Aging the
> active list based on the timing of when a zone-constrained allocation 
> arrives
> potentially introduces the same zone-balancing problems we currently have
> and applying them to node-lru.
> >>>
> >>>Could you explain more? I don't understand why aging the active list
> >>>based on the timing of when a zone-constrained allocation arrives
> >>>introduces the zone-balancing problem again.
> >>>
> >>
> >>I mispoke. Avoid rotation of the active list based on the timing of a
> >>zone-constrained allocation is what I think potentially introduces problems.
> >>If there are zone-constrained allocations aging the active list then I worry
> >>that pages would be artificially preserved on the active list.  No matter
> >>what we do, there is distortion of the aging for zone-constrained allocation
> >>because right now, it may deactivate high zone pages sooner than expected.
> >>
> >>>I think that if above logic is applied to both the active/inactive
> >>>list, it could cause zone-balancing problem. LRU pages on lower zone
> >>>can be resident on memory with more chance.
> >>
> >>If anything, with node-based LRU, it's high zone pages that can be resident
> >>on memory for longer but only if there are zone-constrained allocations.
> >>If we always reclaim based on age regardless of allocation requirements
> >>then there is a risk that high zones are reclaimed far earlier than 
> >>expected.
> >>
> >>Basically, whether we skip pages in the active list or not there are
> >>distortions with page aging and the impact is workload dependent. Right now,
> >>I see no clear advantage to special casing active aging.
> >>
> >>If we suspect this is a problem in the future, it would be a simple matter
> >>of adding an additional bool parameter to isolate_lru_pages.
> >
> >Okay. I agree that it would be a simple matter.
> >
> >>
> >And, I have a concern that if inactive LRU is full with higher zone's
> >LRU pages, reclaim with low reclaim_idx could be stuck.
> 
> That is an outside possibility but unlikely given that it would require
> that all outstanding allocation requests are zone-contrained. If it 
> happens
> >>>
> >>>I'm not sure that it is outside possibility. It can also happens if there
> >>>is zone-contrained allocation requestor and parallel memory hogger. In
> >>>this case, memory would be reclaimed by memory hogger but memory hogger 
> >>>would
> >>>consume them again so inactive LRU is continually full with higher
> >>>zone's LRU pages and zone-contrained allocation requestor cannot
> >>>progress.
> >>>
> >>
> >>The same memory hogger will also be reclaiming the highmem pages and
> >>reallocating highmem pages.
> >>
> It would be preferred to have an actual test case for this so the
> altered ratio can be tested instead of introducing code that may be
> useless or dead.
> >>>
> >>>Yes, actual test case would be preferred. I will try to implement
> >>>an artificial test case by myself but I'm not sure when I can do it.
> >>>
> >>
> >>That would be appreciated.
> >
> >I make an artificial test case and test this series by using next tree
> >(next-20160713) and found a regression.
> >
> 
> [...]
> 
> >Mem-Info:
> >active_anon:18779 inactive_anon:18 isolated_anon:0
> > active_file:91577 inactive_file:320615 isolated_file:0
> > unevictable:0 dirty:0 writeback:0 unstable:0
> > slab_reclaimable:6741 slab_unreclaimable:18124
> > mapped:389774 shmem:95 pagetables:18332 bounce:0
> > free:8194 free_pcp:140 free_cma:0
> >Node 0 active_anon:75116kB inactive_anon:72kB active_file:366308kB 
> >inactive_file:1282460kB unevictable:0kB isolated(anon):0kB 
> >isolated(file):0kB mapped:1559096kB dirty:0kB writeback:0kB shmem:0kB 
> >shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 380kB writeback_tmp:0kB 
> >unstable:0kB all_unreclaimable? yes
> >Node 0 DMA free:2172kB min:204kB low:252kB high:300kB 

Re: [PATCH V2] leds: trigger: Introduce an USB port trigger

2016-07-17 Thread Rafał Miłecki
On 18 July 2016 at 04:31, Peter Chen  wrote:
> On Fri, Jul 15, 2016 at 11:10:45PM +0200, Rafał Miłecki wrote:
>> +
>> +usbport trigger:
>> +- usb-ports : List of USB ports that usbport should observed for turning on a
>> +   given LED.
>> +
>
> %s/should/should be

Thanks.


>> diff --git a/drivers/leds/trigger/ledtrig-usbport.c b/drivers/leds/trigger/ledtrig-usbport.c
>> new file mode 100644
>> index 000..97b064c
>> --- /dev/null
>> +++ b/drivers/leds/trigger/ledtrig-usbport.c
>> @@ -0,0 +1,206 @@
>> +/*
>> + * USB port LED trigger
>> + *
>> + * Copyright (C) 2016 Rafał Miłecki 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or (at
>> + * your option) any later version.
>> + */
>
> GPL v2 only.
>
>> +MODULE_AUTHOR("Rafał Miłecki ");
>> +MODULE_DESCRIPTION("USB port trigger");
>> +MODULE_LICENSE("GPL");
>
> GPL v2

What's the reason for this? I don't have any real preference, but I
have never heard of a kernel/Linux preference either.

-- 
Rafał


Re: [PATCH -v4 2/2] printk: Add kernel parameter to control writes to /dev/kmsg

2016-07-17 Thread Borislav Petkov
On Mon, Jul 18, 2016 at 10:18:09AM +0800, Dave Young wrote:
> I would say avoiding ratelimit during boot does not make much sense.
> Userspace cannot write to /dev/kmsg when system_state == SYSTEM_BOOTING
> because the init process has not run yet.

You're right - kernel_init() sets SYSTEM_RUNNING before running the init
process. I probably should kill all that logic in the second patch.

> I mean to set printk.devkmsg=off by default; userspace can set it to
> on by sysctl.

That can't happen: DEVKMSG_LOG_MASK_LOCK.
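
As a rough illustration of why a lock bit rules that out, here is a
userspace sketch. Only the name DEVKMSG_LOG_MASK_LOCK comes from this
thread; the other mask names, the helper functions, the bit positions
and the -EINVAL return value are assumptions made for the example, and
it further assumes the lock bit is set whenever the parameter is given
on the kernel command line.

```c
#include <errno.h>
#include <stdbool.h>

/* Bit layout is assumed for illustration only. */
enum devkmsg_log_mask_model {
    DEVKMSG_LOG_MASK_ON   = 1u << 0,
    DEVKMSG_LOG_MASK_OFF  = 1u << 1,
    DEVKMSG_LOG_MASK_LOCK = 1u << 2,
};

static unsigned int devkmsg_log_model;

/* Hypothetical: a command-line setting also locks the value. */
static void model_cmdline_set(bool on)
{
    devkmsg_log_model = (on ? DEVKMSG_LOG_MASK_ON : DEVKMSG_LOG_MASK_OFF)
                        | DEVKMSG_LOG_MASK_LOCK;
}

/* Hypothetical sysctl write: refused once the lock bit is set. */
static int model_sysctl_set(bool on)
{
    if (devkmsg_log_model & DEVKMSG_LOG_MASK_LOCK)
        return -EINVAL;

    devkmsg_log_model = on ? DEVKMSG_LOG_MASK_ON : DEVKMSG_LOG_MASK_OFF;
    return 0;
}
```

Under these assumptions, once model_cmdline_set(false) has run, a later
model_sysctl_set(true) fails and the setting stays off.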

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--


Re: [PATCH 6/9] x86, pkeys: add pkey set/get syscalls

2016-07-17 Thread Andy Lutomirski
On Thu, Jul 14, 2016 at 1:07 AM, Ingo Molnar  wrote:
>
> * Andy Lutomirski  wrote:
>
>> On Wed, Jul 13, 2016 at 12:56 AM, Ingo Molnar  wrote:
>> >
>> > * Andy Lutomirski  wrote:
>> >
>> >> > If we push a PKRU value into a thread between the rdpkru() and
>> >> > wrpkru(), we'll lose the content of that "push".  I'm not sure
>> >> > there's any way to guarantee this with a user-controlled register.
>> >>
>> >> We could try to insist that user code uses some vsyscall helper that
>> >> tracks which bits are as-yet-unassigned.  That's quite messy, though.
>> >
>> > Actually, if we turned the vDSO into something more like a minimal
>> > user-space library with the ability to run at process startup as well
>> > to prepare stuff then it's painful to get right only *once*, and there
>> > will be tons of other areas where a proper per thread data storage on
>> > the user-space side would be immensely useful!
>>
>> Doing this could be tricky: how exactly is the vDSO supposed to find
>> per-thread data without breaking existing glibc?
>
> So I think the way this could be done is by allocating it itself. The vDSO vma
> itself is 'external' to glibc as well to begin with - this would be a small
> extension to that concept.

But how does the vdso code find it?  FS and GS are both spoken for by
existing userspace.
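
For reference, the lost "push" from the quoted text at the top of this
mail can be reproduced deterministically in a userspace model. This is
a sketch, not real PKRU handling: pkru_model is an ordinary variable
standing in for the register, and model_rdpkru(), model_wrpkru(),
user_set_write_disable() and kernel_push_key2_ad() are hypothetical
names. The bit layout (access-disable at bit 2k, write-disable at bit
2k+1 for key k) follows the architectural PKRU format.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain variable standing in for the per-thread PKRU register. */
static uint32_t pkru_model;

#define PKEY_AD(k) (1u << (2 * (k)))      /* access-disable bit of key k */
#define PKEY_WD(k) (1u << (2 * (k) + 1))  /* write-disable bit of key k */

static uint32_t model_rdpkru(void)
{
    return pkru_model;
}

static void model_wrpkru(uint32_t value)
{
    pkru_model = value;
}

typedef void (*push_fn)(void);

/*
 * User-side read-modify-write. If "interleaved" is non-NULL it runs
 * between the read and the write, modeling a value pushed into the
 * thread at exactly the wrong moment.
 */
static void user_set_write_disable(int pkey, push_fn interleaved)
{
    uint32_t value = model_rdpkru();

    if (interleaved)
        interleaved();

    value |= PKEY_WD(pkey);
    model_wrpkru(value);  /* overwrites whatever was pushed meanwhile */
}

/* Models another party setting access-disable for key 2. */
static void kernel_push_key2_ad(void)
{
    pkru_model |= PKEY_AD(2);
}
```

Calling user_set_write_disable(1, kernel_push_key2_ad) ends with only
the write-disable bit of key 1 set; the access-disable bit pushed for
key 2 is gone, because the final write uses the stale value read
earlier.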

--Andy


[PATCH 2/3] qemu: Implement virtio-pstore device

2016-07-17 Thread Namhyung Kim
From: Namhyung Kim 

Add virtio pstore device to allow kernel log files saved on the host.
It will save the log files on the directory given by pstore device
option.

  $ qemu-system-x86_64 -device virtio-pstore,directory=dir-xx ...

  (guest) # echo c > /proc/sysrq-trigger

  $ ls dir-xx
  dmesg-0.enc.z  dmesg-1.enc.z

The log files are usually compressed using zlib.  Users can see the log
messages directly on the host or on the guest (using pstore filesystem).

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: "Michael S. Tsirkin" 
Cc: Anthony Liguori 
Cc: Anton Vorontsov 
Cc: Colin Cross 
Cc: Kees Cook 
Cc: Tony Luck 
Cc: Steven Rostedt 
Cc: Ingo Molnar 
Cc: Minchan Kim 
Cc: k...@vger.kernel.org
Cc: qemu-de...@nongnu.org
Cc: virtualizat...@lists.linux-foundation.org
Signed-off-by: Namhyung Kim 
---
 hw/virtio/Makefile.objs|   2 +-
 hw/virtio/virtio-pci.c |  50 
 hw/virtio/virtio-pci.h |  14 +
 hw/virtio/virtio-pstore.c  | 328 +
 include/hw/pci/pci.h   |   1 +
 include/hw/virtio/virtio-pstore.h  |  30 ++
 include/standard-headers/linux/virtio_ids.h|   1 +
 .../linux/{virtio_ids.h => virtio_pstore.h}|  48 +--
 qdev-monitor.c |   1 +
 9 files changed, 455 insertions(+), 20 deletions(-)
 create mode 100644 hw/virtio/virtio-pstore.c
 create mode 100644 include/hw/virtio/virtio-pstore.h
 copy include/standard-headers/linux/{virtio_ids.h => virtio_pstore.h} (63%)

diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index 3e2b175..aae7082 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -4,4 +4,4 @@ common-obj-y += virtio-bus.o
 common-obj-y += virtio-mmio.o
 
 obj-y += virtio.o virtio-balloon.o 
-obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
+obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o virtio-pstore.o
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 2b34b43..8281b80 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -2416,6 +2416,55 @@ static const TypeInfo virtio_host_pci_info = {
 };
 #endif
 
+/* virtio-pstore-pci */
+
+static void virtio_pstore_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
+{
+VirtIOPstorePCI *vps = VIRTIO_PSTORE_PCI(vpci_dev);
+DeviceState *vdev = DEVICE(&vps->vdev);
+Error *err = NULL;
+
+qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
+object_property_set_bool(OBJECT(vdev), true, "realized", &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+}
+
+static void virtio_pstore_pci_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
+PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
+
+k->realize = virtio_pstore_pci_realize;
+set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+
+pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
+pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_PSTORE;
+pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
+pcidev_k->class_id = PCI_CLASS_OTHERS;
+}
+
+static void virtio_pstore_pci_instance_init(Object *obj)
+{
+VirtIOPstorePCI *dev = VIRTIO_PSTORE_PCI(obj);
+
+virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+TYPE_VIRTIO_PSTORE);
+object_property_add_alias(obj, "directory", OBJECT(&dev->vdev),
+  "directory", &error_abort);
+}
+
+static const TypeInfo virtio_pstore_pci_info = {
+.name  = TYPE_VIRTIO_PSTORE_PCI,
+.parent= TYPE_VIRTIO_PCI,
+.instance_size = sizeof(VirtIOPstorePCI),
+.instance_init = virtio_pstore_pci_instance_init,
+.class_init= virtio_pstore_pci_class_init,
+};
+
 /* virtio-pci-bus */
 
 static void virtio_pci_bus_new(VirtioBusState *bus, size_t bus_size,
@@ -2485,6 +2534,7 @@ static void virtio_pci_register_types(void)
 #ifdef CONFIG_VHOST_SCSI
 type_register_static(&vhost_scsi_pci_info);
 #endif
+type_register_static(&virtio_pstore_pci_info);
 }
 
 type_init(virtio_pci_register_types)
diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index e4548c2..b4c039f 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -31,6 +31,7 @@
 #ifdef CONFIG_VHOST_SCSI
 #include "hw/virtio/vhost-scsi.h"
 #endif
+#include "hw/virtio/virtio-pstore.h"
 
 typedef struct VirtIOPCIProxy VirtIOPCIProxy;
 typedef struct VirtIOBlkPCI VirtIOBlkPCI;
@@ -44,6 +45,7 @@ typedef struct VirtIOInputPCI VirtIOInputPCI;
 typedef struct VirtIOInputHIDPCI VirtIOInputHIDPCI;
 typedef struct VirtIOInputHostPCI VirtIOInputHostPCI;
 typedef struct VirtIOGPUPCI VirtIOGPUPCI;
+typedef struct VirtIOPstorePCI VirtIOPstorePCI;
 
 /* virtio-pci-bus */
 
@@ -311,6 +313,18 @@ struct VirtIOGPUPCI {
 VirtIOGPU vdev;
 };
 
+/*
+ * virtio-pstore-pci: This extends VirtioPCIProxy.
+ */
+#define 

[RFC/PATCHSET 0/3] virtio-pstore: Implement virtio pstore device

2016-07-17 Thread Namhyung Kim
Hello,

This patchset is a proof of concept of the virtio-pstore idea [1].  It has
some rough edges and I'm not familiar with this area, so please give
me feedback and advice if I'm going in the wrong direction.

It started from the fact that dumping the ftrace buffer at kernel
oops/panic takes too much time.  Although there's a way to reduce the
size of the original data, sometimes I want to have as much information
as possible.  Maybe kexec/kdump can solve this problem, but it
consumes some portion of guest memory so I'd like to avoid it.  And I
know that qemu + the crash tool can dump and analyze the whole guest
memory including the ftrace buffer without wasting guest memory, but it
adds one more layer and has some limitations as an out-of-tree tool, like
not being in sync with kernel changes.

So I think it'd be great to use the pstore interface to dump guest
kernel data on the host.  One can read the data on the host directly
or on the guest (at the next boot) using the pstore filesystem as usual.
While this patchset only implements dumping the kernel log buffer, it can
be extended to cover the ftrace buffer and probably more.

Patch 0001 implements the virtio pstore driver.  It has a single
virtqueue, a pstore buffer and a header structure.  The virtio_pstore_hdr
struct gives information about the current pstore operation.

Patches 0002 and 0003 implement a virtio-pstore legacy PCI device on
qemu-kvm and kvmtool respectively.  I referenced the virtio-balloon and
virtio-rng implementations, and I don't know whether kvmtool supports
the modern virtio 1.0+ spec.

For example, using virtio-pstore on qemu looks like this:

  $ qemu-system-x86_64 -enable-kvm -device virtio-pstore,directory=xxx

When the guest kernel panics, the log messages will be saved under the
xxx directory.

  $ ls xxx
  dmesg-0.enc.z  dmesg-1.enc.z

As you can see, the pstore subsystem compresses the log data using
zlib.  The data can be extracted with the following command:

  $ cat xxx/dmesg-0.enc.z | \
  > python -c 'import sys, zlib; print(zlib.decompress(sys.stdin.read()))'
  Oops#1 Part1
  <5>[0.00] Linux version 4.6.0kvm+ (namhyung@danjae) (gcc version 5.3.0 (GCC) ) #145 SMP Mon Jul 18 10:22:45 KST 2016
  <6>[0.00] Command line: root=/dev/vda console=ttyS0
  <6>[0.00] x86/fpu: Legacy x87 FPU detected.
  <6>[0.00] x86/fpu: Using 'eager' FPU context switches.
  <6>[0.00] e820: BIOS-provided physical RAM map:
  <6>[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
  <6>[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
  <6>[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
  <6>[0.00] BIOS-e820: [mem 0x0010-0x07fddfff] usable
  <6>[0.00] BIOS-e820: [mem 0x07fde000-0x07ff] reserved
  <6>[0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved
  <6>[0.00] BIOS-e820: [mem 0xfffc-0x] reserved
  <6>[0.00] NX (Execute Disable) protection: active
  <6>[0.00] SMBIOS 2.8 present.
  <7>[0.00] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
  ...

Maybe we can add a config option to control the compression later.
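
For reference, here is a Python 3 variant of the one-liner above, as a
minimal sketch only.  It assumes the dump file is a plain zlib stream, as
the one-liner does; the helper name read_pstore_dump is hypothetical and
not part of this patchset.

```python
import zlib


def read_pstore_dump(path):
    """Return the decompressed contents of one pstore dump file as text."""
    # Binary mode: the file holds raw zlib data, not text.
    with open(path, "rb") as f:
        raw = f.read()
    # errors="replace" keeps going if the log contains non-UTF-8 bytes.
    return zlib.decompress(raw).decode("utf-8", errors="replace")
```

It could be used as print(read_pstore_dump("xxx/dmesg-0.enc.z")).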


Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: "Michael S. Tsirkin" 
Cc: Anthony Liguori 
Cc: Anton Vorontsov 
Cc: Colin Cross 
Cc: Kees Cook 
Cc: Tony Luck 
Cc: Steven Rostedt 
Cc: Ingo Molnar 
Cc: Minchan Kim 
Cc: k...@vger.kernel.org
Cc: qemu-de...@nongnu.org
Cc: virtualizat...@lists.linux-foundation.org

[1] https://lkml.org/lkml/2016/7/1/6


Thanks,
Namhyung


[PATCH 1/3] virtio: Basic implementation of virtio pstore driver

2016-07-17 Thread Namhyung Kim
The virtio pstore driver provides an interface to the pstore subsystem so
that the guest kernel's log/dump messages can be saved on the host
machine.  Users can access the log file directly on the host, or on the
guest at the next boot using the pstore filesystem.  It currently deals with
the kernel log (printk) buffer only, but we can extend it to carry other
information (like an ftrace dump) later.

It supports a legacy PCI device using a single order-2 page buffer.  As all
pstore operations are synchronous, that should be fine IMHO.  However, I
don't know how to make the write operation synchronous since it's called
with a spinlock held (from any context, including NMI).
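
To spell out the buffer size: an order-2 allocation is 2^2 contiguous
pages, so with the 4096-byte page size hard-coded in VIRT_PSTORE_BUFSIZE
in the patch below, the buffer is 16 KiB.  A tiny host-side sketch of that
arithmetic follows; it is an illustration only and the function name
pstore_bufsize is made up.

```python
PAGE_SIZE = 4096        # matches the literal 4096 in VIRT_PSTORE_BUFSIZE
VIRT_PSTORE_ORDER = 2   # "order-2": 2**2 contiguous pages


def pstore_bufsize(order=VIRT_PSTORE_ORDER, page_size=PAGE_SIZE):
    """Size in bytes of an order-`order` buffer: page_size * 2**order."""
    return page_size << order


print(pstore_bufsize())  # 16384
```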

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: "Michael S. Tsirkin" 
Cc: Anthony Liguori 
Cc: Anton Vorontsov 
Cc: Colin Cross 
Cc: Kees Cook 
Cc: Tony Luck 
Cc: Steven Rostedt 
Cc: Ingo Molnar 
Cc: Minchan Kim 
Cc: k...@vger.kernel.org
Cc: qemu-de...@nongnu.org
Cc: virtualizat...@lists.linux-foundation.org
Signed-off-by: Namhyung Kim 
---
 drivers/virtio/Kconfig |  10 ++
 drivers/virtio/Makefile|   1 +
 drivers/virtio/virtio_pstore.c | 317 +
 include/uapi/linux/Kbuild  |   1 +
 include/uapi/linux/virtio_ids.h|   1 +
 include/uapi/linux/virtio_pstore.h |  53 +++
 6 files changed, 383 insertions(+)
 create mode 100644 drivers/virtio/virtio_pstore.c
 create mode 100644 include/uapi/linux/virtio_pstore.h

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 77590320d44c..8f0e6c796c12 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -58,6 +58,16 @@ config VIRTIO_INPUT
 
 If unsure, say M.
 
+config VIRTIO_PSTORE
+   tristate "Virtio pstore driver"
+   depends on VIRTIO
+   depends on PSTORE
+   ---help---
+This driver supports virtio pstore devices to save/restore
+panic and oops messages on the host.
+
+If unsure, say M.
+
  config VIRTIO_MMIO
tristate "Platform bus driver for memory mapped virtio devices"
depends on HAS_IOMEM && HAS_DMA
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 41e30e3dc842..bee68cb26d48 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -5,3 +5,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
 virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
 obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
+obj-$(CONFIG_VIRTIO_PSTORE) += virtio_pstore.o
diff --git a/drivers/virtio/virtio_pstore.c b/drivers/virtio/virtio_pstore.c
new file mode 100644
index ..6fe62c0f1508
--- /dev/null
+++ b/drivers/virtio/virtio_pstore.c
@@ -0,0 +1,317 @@
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define VIRT_PSTORE_ORDER    2
+#define VIRT_PSTORE_BUFSIZE  (4096 << VIRT_PSTORE_ORDER)
+
+struct virtio_pstore {
+   struct virtio_device *vdev;
+   struct virtqueue *vq;
+   struct pstore_info   pstore;
+   struct virtio_pstore_hdr hdr;
+   size_t   buflen;
+   u64  id;
+
+   /* Waiting for host to ack */
+   wait_queue_head_t   acked;
+};
+
+static u16 to_virtio_type(struct virtio_pstore *vps, enum pstore_type_id type)
+{
+   u16 ret;
+
+   switch (type) {
+   case PSTORE_TYPE_DMESG:
+   ret = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_TYPE_DMESG);
+   break;
+   default:
+   ret = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_TYPE_UNKNOWN);
+   break;
+   }
+
+   return ret;
+}
+
+static enum pstore_type_id from_virtio_type(struct virtio_pstore *vps, u16 type)
+{
+   enum pstore_type_id ret;
+
+   switch (virtio16_to_cpu(vps->vdev, type)) {
+   case VIRTIO_PSTORE_TYPE_DMESG:
+   ret = PSTORE_TYPE_DMESG;
+   break;
+   default:
+   ret = PSTORE_TYPE_UNKNOWN;
+   break;
+   }
+
+   return ret;
+}
+
+static void virtpstore_ack(struct virtqueue *vq)
+{
+   struct virtio_pstore *vps = vq->vdev->priv;
+
+   wake_up(&vps->acked);
+}
+
+static int virt_pstore_open(struct pstore_info *psi)
+{
+   struct virtio_pstore *vps = psi->data;
+   struct virtio_pstore_hdr *hdr = &vps->hdr;
+   struct scatterlist sg[1];
+   unsigned int len;
+
+   hdr->cmd = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_CMD_OPEN);
+
+   sg_init_one(sg, hdr, sizeof(*hdr));
+   virtqueue_add_outbuf(vps->vq, sg, 1, vps, GFP_KERNEL);
+   virtqueue_kick(vps->vq);
+
+   wait_event(vps->acked, virtqueue_get_buf(vps->vq, &len));
+   return 0;
+}
+
+static int virt_pstore_close(struct pstore_info *psi)
+{
+   struct virtio_pstore *vps = psi->data;
+   struct virtio_pstore_hdr *hdr = &vps->hdr;
+   struct scatterlist sg[1];
+   unsigned int len;
+
+   hdr->cmd = 

[PATCH 3/3] kvmtool: Implement virtio-pstore device

2016-07-17 Thread Namhyung Kim
Add a virtio pstore device to allow kernel log messages to be saved on the
host.  With this patch, the log files are saved under the directory given
by the --pstore option.

  $ lkvm run --pstore=dir-xx

  (guest) # echo c > /proc/sysrq-trigger

  $ ls dir-xx
  dmesg-0.enc.z  dmesg-1.enc.z

The log files are usually compressed using zlib.  Users can easily see
the messages on the host or on the guest (using the pstore filesystem).
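
As a rough illustration of the request header the device receives (struct
pstore_hdr in include/kvm/virtio-pstore.h in the patch below), here is a
host-side Python sketch.  It assumes a little-endian guest, since legacy
virtio uses guest-native byte order, and that the struct has no padding.
The names PSTORE_HDR, PstoreHdr and parse_hdr are hypothetical; the
constants are copied from the header.

```python
import struct
from collections import namedtuple

# Field order follows struct pstore_hdr: u64 id, u32 flags, u16 cmd,
# u16 type, u64 time_sec, u32 time_nsec, u32 unused.
# "<" assumes a little-endian guest and disables implicit padding.
PSTORE_HDR = struct.Struct("<QIHHQII")
PstoreHdr = namedtuple(
    "PstoreHdr", "id flags cmd type time_sec time_nsec unused")

# Values copied from include/kvm/virtio-pstore.h in the patch.
VIRTIO_PSTORE_TYPE_DMESG = 1
VIRTIO_PSTORE_CMD_WRITE = 3
VIRTIO_PSTORE_FL_COMPRESSED = 1


def parse_hdr(buf):
    """Decode the fixed-size header found at the start of buf."""
    return PstoreHdr._make(PSTORE_HDR.unpack_from(buf))


def is_compressed_dmesg_write(hdr):
    """True for a WRITE request carrying a compressed dmesg record."""
    return (hdr.cmd == VIRTIO_PSTORE_CMD_WRITE
            and hdr.type == VIRTIO_PSTORE_TYPE_DMESG
            and bool(hdr.flags & VIRTIO_PSTORE_FL_COMPRESSED))
```

With this layout the header is 32 bytes, which is the value of
PSTORE_HDR.size.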

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: "Michael S. Tsirkin" 
Cc: Anthony Liguori 
Cc: Anton Vorontsov 
Cc: Colin Cross 
Cc: Kees Cook 
Cc: Tony Luck 
Cc: Steven Rostedt 
Cc: Ingo Molnar 
Cc: Minchan Kim 
Cc: k...@vger.kernel.org
Cc: virtualizat...@lists.linux-foundation.org
Signed-off-by: Namhyung Kim 
---
 Makefile |   1 +
 builtin-run.c|   2 +
 include/kvm/kvm-config.h |   1 +
 include/kvm/virtio-pci-dev.h |   2 +
 include/kvm/virtio-pstore.h  |  31 
 include/linux/virtio_ids.h   |   1 +
 virtio/pstore.c  | 359 +++
 7 files changed, 397 insertions(+)
 create mode 100644 include/kvm/virtio-pstore.h
 create mode 100644 virtio/pstore.c

diff --git a/Makefile b/Makefile
index 1f0196f..d7462b9 100644
--- a/Makefile
+++ b/Makefile
@@ -67,6 +67,7 @@ OBJS  += virtio/net.o
 OBJS   += virtio/rng.o
 OBJS+= virtio/balloon.o
 OBJS   += virtio/pci.o
+OBJS   += virtio/pstore.o
 OBJS   += disk/blk.o
 OBJS   += disk/qcow.o
 OBJS   += disk/raw.o
diff --git a/builtin-run.c b/builtin-run.c
index 72b878d..08c12dd 100644
--- a/builtin-run.c
+++ b/builtin-run.c
@@ -128,6 +128,8 @@ void kvm_run_set_wrapper_sandbox(void)
" rootfs"), \
OPT_STRING('\0', "hugetlbfs", &(cfg)->hugetlbfs_path, "path",   \
"Hugetlbfs path"),  \
+   OPT_STRING('\0', "pstore", &(cfg)->pstore_path, "path", \
+   "pstore data path"),\
\
OPT_GROUP("Kernel options:"),   \
OPT_STRING('k', "kernel", &(cfg)->kernel_filename, "kernel",\
diff --git a/include/kvm/kvm-config.h b/include/kvm/kvm-config.h
index 386fa8c..42b7651 100644
--- a/include/kvm/kvm-config.h
+++ b/include/kvm/kvm-config.h
@@ -45,6 +45,7 @@ struct kvm_config {
const char *hugetlbfs_path;
const char *custom_rootfs_name;
const char *real_cmdline;
+   const char *pstore_path;
struct virtio_net_params *net_params;
bool single_step;
bool vnc;
diff --git a/include/kvm/virtio-pci-dev.h b/include/kvm/virtio-pci-dev.h
index 48ae018..4339d94 100644
--- a/include/kvm/virtio-pci-dev.h
+++ b/include/kvm/virtio-pci-dev.h
@@ -15,6 +15,7 @@
 #define PCI_DEVICE_ID_VIRTIO_BLN   0x1005
 #define PCI_DEVICE_ID_VIRTIO_SCSI  0x1008
 #define PCI_DEVICE_ID_VIRTIO_9P    0x1009
+#define PCI_DEVICE_ID_VIRTIO_PSTORE 0x100a
 #define PCI_DEVICE_ID_VESA 0x2000
 #define PCI_DEVICE_ID_PCI_SHMEM    0x0001
 
@@ -34,5 +35,6 @@
 #define PCI_CLASS_RNG  0xff
 #define PCI_CLASS_BLN  0xff
 #define PCI_CLASS_9P   0xff
+#define PCI_CLASS_PSTORE   0xff
 
 #endif /* VIRTIO_PCI_DEV_H_ */
diff --git a/include/kvm/virtio-pstore.h b/include/kvm/virtio-pstore.h
new file mode 100644
index 000..293ab57
--- /dev/null
+++ b/include/kvm/virtio-pstore.h
@@ -0,0 +1,31 @@
+#ifndef KVM__PSTORE_VIRTIO_H
+#define KVM__PSTORE_VIRTIO_H
+
+struct kvm;
+
+#define VIRTIO_PSTORE_TYPE_UNKNOWN  0
+#define VIRTIO_PSTORE_TYPE_DMESG    1
+
+#define VIRTIO_PSTORE_CMD_NULL   0
+#define VIRTIO_PSTORE_CMD_OPEN   1
+#define VIRTIO_PSTORE_CMD_READ   2
+#define VIRTIO_PSTORE_CMD_WRITE  3
+#define VIRTIO_PSTORE_CMD_ERASE  4
+#define VIRTIO_PSTORE_CMD_CLOSE  5
+
+#define VIRTIO_PSTORE_FL_COMPRESSED  1
+
+struct pstore_hdr {
+   u64 id;
+   u32 flags;
+   u16 cmd;
+   u16 type;
+   u64 time_sec;
+   u32 time_nsec;
+   u32 unused;
+};
+
+int virtio_pstore__init(struct kvm *kvm);
+int virtio_pstore__exit(struct kvm *kvm);
+
+#endif /* KVM__PSTORE_VIRTIO_H */
diff --git a/include/linux/virtio_ids.h b/include/linux/virtio_ids.h
index 5f60aa4..f34cabc 100644
--- a/include/linux/virtio_ids.h
+++ b/include/linux/virtio_ids.h
@@ -40,5 +40,6 @@
 #define VIRTIO_ID_RPROC_SERIAL 11 /* virtio remoteproc serial link */
 #define VIRTIO_ID_CAIF 12 /* Virtio caif */
 #define VIRTIO_ID_INPUT 18 /* virtio input */
+#define VIRTIO_ID_PSTORE   19 /* virtio pstore */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/virtio/pstore.c b/virtio/pstore.c
new 

Re: [PATCH v3 12/17] mm, compaction: more reliably increase direct compaction priority

2016-07-17 Thread Joonsoo Kim
On Fri, Jul 15, 2016 at 03:37:52PM +0200, Vlastimil Babka wrote:
> On 07/06/2016 07:39 AM, Joonsoo Kim wrote:
> > On Fri, Jun 24, 2016 at 11:54:32AM +0200, Vlastimil Babka wrote:
> >> During reclaim/compaction loop, compaction priority can be increased by the
> >> should_compact_retry() function, but the current code is not optimal.
> >> Priority is only increased when compaction_failed() is true, which means
> >> that compaction has scanned the whole zone. This may not happen even after
> >> multiple attempts with the lower priority due to parallel activity, so we
> >> might needlessly struggle on the lower priority and possibly run out of
> >> compaction retry attempts in the process.
> >>
> >> We can remove these corner cases by increasing compaction priority
> >> regardless of compaction_failed(). Examining further the compaction result
> >> can be postponed only after reaching the highest priority. This is a simple
> >> solution and we don't need to worry about reaching the highest priority
> >> "too soon" here, because when should_compact_retry() is called it means
> >> that the system is already struggling and the allocation is supposed to
> >> either try as hard as possible, or it cannot fail at all. There's not much
> >> point staying at lower priorities with heuristics that may result in only
> >> partial compaction. Also we now count compaction retries only after
> >> reaching the highest priority.
> > 
> > I'm not sure that this patch is safe. Deferring and skip-bit in
> > compaction is highly related to reclaim/compaction. Just ignoring them and
> > (almost) unconditionally increasing compaction priority will result in less
> > reclaim and less success rate on compaction.
> 
> I don't see why less reclaim? Reclaim is always attempted before
> compaction and compaction priority doesn't affect it. And as long as
> reclaim wants to retry, should_compact_retry() isn't even called, so the
> priority stays. I wanted to change that in v1, but Michal suggested I
> shouldn't.

I am assuming the situation where there is no !costly high-order freepage
because of fragmentation. In this case, should_reclaim_retry() would
return false since the watermark cannot be met due to the absence of a
high-order freepage. Now, please look at should_compact_retry() with the
assumption that there are enough order-0 free pages. Reclaim/compaction is
only retried two times (SYNC_LIGHT and SYNC_FULL) with your patchset, since
compaction_withdrawn() returns false with enough freepages and
!COMPACT_SKIPPED.

But before your patchset, COMPACT_PARTIAL_SKIPPED and
COMPACT_DEFERRED were considered as withdrawn, so reclaim/compaction
would be retried more times.

As I said before, more reclaim (more freepages) increases the migration
scanner's scan range and thus increases the compaction success probability.
Therefore, your patchset, which deterministically makes reclaim/compaction
retry fewer times, would not be safe.

> 
> > And, as a necessarily, it
> > would trigger OOM more frequently.
> 
> OOM is only allowed for costly orders. If reclaim itself doesn't want to
> retry for non-costly orders anymore, and we finally start calling
> should_compact_retry(), then I guess the system is really struggling
> already and eventual OOM wouldn't be premature?

"Premature" is really subjective, so I don't know. Anyway, I tested
your patchset with a simple test case and it causes a regression.

My test setup is:

Mem: 512 MB
vm.compact_unevictable_allowed = 0
Mlocked Mem: 225 MB by using mlock(). With some tricks, the mlocked pages
are spread out so that memory is highly fragmented.

fork 500

This test causes OOM with your patchset but not without it.

Thanks.

> > It would not be your fault. This patch is reasonable in the current
> > situation. It just makes the current behavior more deterministic,
> > although I dislike the current behavior and this patch would amplify
> > those problems.
> > 
> > Thanks.
> > 
> 


Re: [PATCH v3 12/17] mm, compaction: more reliably increase direct compaction priority

2016-07-17 Thread Joonsoo Kim
On Fri, Jul 15, 2016 at 03:37:52PM +0200, Vlastimil Babka wrote:
> On 07/06/2016 07:39 AM, Joonsoo Kim wrote:
> > On Fri, Jun 24, 2016 at 11:54:32AM +0200, Vlastimil Babka wrote:
> >> During reclaim/compaction loop, compaction priority can be increased by the
> >> should_compact_retry() function, but the current code is not optimal. 
> >> Priority
> >> is only increased when compaction_failed() is true, which means that 
> >> compaction
> >> has scanned the whole zone. This may not happen even after multiple 
> >> attempts
> >> with the lower priority due to parallel activity, so we might needlessly
> >> struggle on the lower priority and possibly run out of compaction retry
> >> attempts in the process.
> >>
> >> We can remove these corner cases by increasing compaction priority 
> >> regardless
> >> of compaction_failed(). Examining further the compaction result can be
> >> postponed only after reaching the highest priority. This is a simple 
> >> solution
> >> and we don't need to worry about reaching the highest priority "too soon" 
> >> here,
> >> because hen should_compact_retry() is called it means that the system is
> >> already struggling and the allocation is supposed to either try as hard as
> >> possible, or it cannot fail at all. There's not much point staying at lower
> >> priorities with heuristics that may result in only partial compaction.
> >> Also we now count compaction retries only after reaching the highest 
> >> priority.
> > 
> > I'm not sure that this patch is safe. Deferring and skip-bit in
> > compaction is highly related to reclaim/compaction. Just ignoring them and 
> > (almost)
> > unconditionally increasing compaction priority will result in less
> > reclaim and less success rate on compaction.
> 
> I don't see why less reclaim? Reclaim is always attempted before
> compaction and compaction priority doesn't affect it. And as long as
> reclaim wants to retry, should_compact_retry() isn't even called, so the
> priority stays. I wanted to change that in v1, but Michal suggested I
> shouldn't.

I am assuming a situation where, because of fragmentation, there is no free
high-order page for a !costly order request. In this case,
should_reclaim_retry() would return false since the watermark cannot be met
without a free high-order page. Now, please look at should_compact_retry()
under the assumption that there are enough order-0 free pages. With your
patchset, reclaim/compaction is retried only two times (SYNC_LIGHT and
SYNC_FULL), since compaction_withdrawn() returns false when there are enough
free pages and the result is !COMPACT_SKIPPED.

Before your patchset, however, COMPACT_PARTIAL_SKIPPED and
COMPACT_DEFERRED are considered as withdrawn, so reclaim/compaction is
retried more times.

As I said before, more reclaim (more free pages) widens the migration
scanner's scan range and so increases the probability that compaction succeeds.
Therefore, your patchset, which deterministically makes reclaim/compaction
retry fewer times, would not be safe.

> 
> > And, as a consequence, it
> > would trigger OOM more frequently.
> 
> OOM is only allowed for non-costly orders. If reclaim itself doesn't want to
> retry for non-costly orders anymore, and we finally start calling
> should_compact_retry(), then I guess the system is really struggling
> already and eventual OOM wouldn't be premature?

Whether it is premature is really subjective, so I don't know. Anyway, I tested
your patchset with a simple test case and it causes a regression.

My test setup is:

Mem: 512 MB
vm.compact_unevictable_allowed = 0
Mlocked Mem: 225 MB by using mlock(). With some tricks, mlocked pages are
spread so memory is highly fragmented.

fork 500

This test causes OOM with your patchset but not without your patchset.

Thanks.

> > It would not be your fault. This patch is reasonable in the current
> > situation. It just makes the current behavior more deterministic,
> > although I dislike the current behavior and this patch would amplify
> > its problems.
> > 
> > Thanks.
> > 
> 


[PATCH] Staging: ks7010: michael_mic: fixed macros coding style issue

2016-07-17 Thread Sunbing
Fixed coding style issues:
Enclose multi-statement macro definitions in a do-while loop.
Use one space around binary operators.

Signed-off-by: Sunbing 
---
 drivers/staging/ks7010/michael_mic.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/ks7010/michael_mic.c b/drivers/staging/ks7010/michael_mic.c
index e14c109..d332678 100644
--- a/drivers/staging/ks7010/michael_mic.c
+++ b/drivers/staging/ks7010/michael_mic.c
@@ -20,15 +20,21 @@
 #define getUInt32( A, B )  (uint32_t)(A[B+0] << 0) + (A[B+1] << 8) + (A[B+2] << 16) + (A[B+3] << 24)
 
 // Convert from UInt32 to Byte[] in a portable way
-#define putUInt32( A, B, C )   A[B+0] = (uint8_t) (C & 0xff);  \
-   A[B+1] = (uint8_t) ((C>>8) & 0xff); \
-   A[B+2] = (uint8_t) ((C>>16) & 0xff);\
-   A[B+3] = (uint8_t) ((C>>24) & 0xff)
+#define putUInt32(A, B, C) \
+do {   \
+   A[B + 0] = (uint8_t)(C & 0xff); \
+   A[B + 1] = (uint8_t)((C >> 8) & 0xff);  \
+   A[B + 2] = (uint8_t)((C >> 16) & 0xff); \
+   A[B + 3] = (uint8_t)((C >> 24) & 0xff); \
+} while (0)
 
 // Reset the state to the empty message.
-#define MichaelClear( A )  A->L = A->K0; \
-   A->R = A->K1; \
-   A->nBytesInM = 0;
+#define MichaelClear(A)\
+do {   \
+   A->L = A->K0;   \
+   A->R = A->K1;   \
+   A->nBytesInM = 0;   \
+} while (0)
 
 static
 void MichaelInitializeFunction(struct michel_mic_t *Mic, uint8_t * key)
-- 
2.1.0
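
For readers unfamiliar with the idiom this patch applies, here is a minimal
userspace sketch (not part of the patch) of why a multi-statement macro is
wrapped in do { } while (0). The names PUT_U32_BARE, PUT_U32, count_nonzero,
touched_bare, touched_wrapped and demo are made up for illustration, and the
macro arguments are parenthesized here although the driver's macros are not.
With the bare form, an unbraced if guards only the first statement of the
expansion; with the wrapped form the whole body is a single statement.

```c
#include <assert.h>
#include <stdint.h>

/* Bare multi-statement form (hypothetical name): expands to four statements. */
#define PUT_U32_BARE(a, b, c)					\
	(a)[(b) + 0] = (uint8_t)((c) & 0xff);			\
	(a)[(b) + 1] = (uint8_t)(((c) >> 8) & 0xff);		\
	(a)[(b) + 2] = (uint8_t)(((c) >> 16) & 0xff);		\
	(a)[(b) + 3] = (uint8_t)(((c) >> 24) & 0xff)

/* Wrapped form (hypothetical name): expands to exactly one statement. */
#define PUT_U32(a, b, c)					\
do {								\
	(a)[(b) + 0] = (uint8_t)((c) & 0xff);			\
	(a)[(b) + 1] = (uint8_t)(((c) >> 8) & 0xff);		\
	(a)[(b) + 2] = (uint8_t)(((c) >> 16) & 0xff);		\
	(a)[(b) + 3] = (uint8_t)(((c) >> 24) & 0xff);		\
} while (0)

/* Count the non-zero bytes in a 4-byte buffer. */
static int count_nonzero(const uint8_t *buf)
{
	int i, n = 0;

	for (i = 0; i < 4; i++)
		n += buf[i] != 0;
	return n;
}

/* Bare macro under an unbraced if: only the store to buf[0] is guarded. */
static int touched_bare(int cond)
{
	uint8_t buf[4] = { 0 };

	if (cond)
		PUT_U32_BARE(buf, 0, 0xaabbccddu);
	return count_nonzero(buf);
}

/* Wrapped macro under an unbraced if: all four stores are guarded. */
static int touched_wrapped(int cond)
{
	uint8_t buf[4] = { 0 };

	if (cond)
		PUT_U32(buf, 0, 0xaabbccddu);
	return count_nonzero(buf);
}

/* Usage: the difference shows up when the condition is false. */
static void demo(void)
{
	assert(touched_bare(0) == 3);
	assert(touched_wrapped(0) == 0);
}
```

With a false condition the bare variant still modifies three of the four
bytes, while the wrapped variant leaves the buffer untouched. The bare form
would also fail to compile if an else followed the macro call.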



Re: [PATCH 1/1] tracing, bpf: Implement function bpf_probe_write

2016-07-17 Thread Alexei Starovoitov
On Sun, Jul 17, 2016 at 03:19:13AM -0700, Sargun Dhillon wrote:
> 
> +static u64 bpf_copy_to_user(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
> +{
> + void *to = (void *) (long) r1;
> + void *from = (void *) (long) r2;
> + int  size = (int) r3;
> +
> + /* check if we're in a user context */
> + if (unlikely(in_interrupt()))
> + return -EINVAL;
> + if (unlikely(!current->pid))
> + return -EINVAL;
> +
> + return copy_to_user(to, from, size);
> +}

thanks for the patch, unfortunately it's not that straightforward.
copy_to_user might fault. Try enabling CONFIG_DEBUG_ATOMIC_SLEEP and
you'll see the splat since bpf programs are protected by rcu.
Also 'current' can be null and I'm not sure what current->pid does.
So the writing to user memory either has to be verified to avoid
sleeping and faults or we need to use something like task_work_add
mechanism. Ideas are certainly welcome.



linux-next: manual merge of the device-mapper tree with the block tree

2016-07-17 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the device-mapper tree got a conflict in:

  include/linux/blkdev.h

between commit:

  288dab8a35a0 ("block: add a separate operation type for secure erase")

from the block tree and commit:

  ff6bbdd8ef75 ("block: add QUEUE_FLAG_DAX for devices to advertise their DAX support")

from the device-mapper tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc include/linux/blkdev.h
index 9ae49ccaac95,1493ab3a537f..
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@@ -592,8 -593,9 +593,9 @@@ static inline void queue_flag_clear(uns
  #define blk_queue_stackable(q)\
test_bit(QUEUE_FLAG_STACKABLE, &(q)->queue_flags)
  #define blk_queue_discard(q)  test_bit(QUEUE_FLAG_DISCARD, &(q)->queue_flags)
 -#define blk_queue_secdiscard(q)   (blk_queue_discard(q) && \
 -  test_bit(QUEUE_FLAG_SECDISCARD, &(q)->queue_flags))
 +#define blk_queue_secure_erase(q) \
 +  (test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags))
+ #define blk_queue_dax(q)  test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
  
  #define blk_noretry_request(rq) \
((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \


[PATCH v2 04/10] binfmt_flat: clean up create_flat_tables() and stack accesses

2016-07-17 Thread Nicolas Pitre
In addition to better code clarity, this brings proper usage of
user memory accessors everywhere the stack is touched. This is essential
for making this work on MMU systems.

Signed-off-by: Nicolas Pitre 
---
 fs/binfmt_flat.c | 117 ++-
 1 file changed, 63 insertions(+), 54 deletions(-)

diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index 64feb873f0..9538901fe8 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -115,50 +115,58 @@ static int flat_core_dump(struct coredump_params *cprm)
 /*
  * create_flat_tables() parses the env- and arg-strings in new user
  * memory and creates the pointer tables from them, and puts their
- * addresses on the "stack", returning the new stack pointer value.
+ * addresses on the "stack", recording the new stack pointer value.
  */
 
-static unsigned long create_flat_tables(
-   unsigned long pp,
-   struct linux_binprm * bprm)
+static int create_flat_tables(struct linux_binprm * bprm, unsigned long arg_start)
 {
-   unsigned long *argv,*envp;
-   unsigned long * sp;
-   char * p = (char*)pp;
-   int argc = bprm->argc;
-   int envc = bprm->envc;
-   char uninitialized_var(dummy);
-
-   sp = (unsigned long *)p;
-   sp -= (envc + argc + 2) + 1 + (flat_argvp_envp_on_stack() ? 2 : 0);
-   sp = (unsigned long *) ((unsigned long)sp & -FLAT_STACK_ALIGN);
-   argv = sp + 1 + (flat_argvp_envp_on_stack() ? 2 : 0);
-   envp = argv + (argc + 1);
+   char __user *p;
+   unsigned long __user *sp;
+   long i, len;
 
+   p = (char __user *)arg_start;
+   sp = (unsigned long __user *)current->mm->start_stack;
+
+   sp -= bprm->envc + 1;
+   sp -= bprm->argc + 1;
+   sp -= flat_argvp_envp_on_stack() ? 2 : 0;
+   sp -= 1;  /*  */
+
+   current->mm->start_stack = (unsigned long)sp & -FLAT_STACK_ALIGN;
+   sp = (unsigned long __user *)current->mm->start_stack;
+
+   __put_user(bprm->argc, sp++);
if (flat_argvp_envp_on_stack()) {
-   put_user((unsigned long) envp, sp + 2);
-   put_user((unsigned long) argv, sp + 1);
-   }
-
-   put_user(argc, sp);
-   current->mm->arg_start = (unsigned long) p;
-   while (argc-->0) {
-   put_user((unsigned long) p, argv++);
-   do {
-   get_user(dummy, p); p++;
-   } while (dummy);
-   }
-   put_user((unsigned long) NULL, argv);
-   current->mm->arg_end = current->mm->env_start = (unsigned long) p;
-   while (envc-->0) {
-   put_user((unsigned long)p, envp); envp++;
-   do {
-   get_user(dummy, p); p++;
-   } while (dummy);
-   }
-   put_user((unsigned long) NULL, envp);
-   current->mm->env_end = (unsigned long) p;
-   return (unsigned long)sp;
+   unsigned long argv, envp;
+   argv = (unsigned long)(sp + 2);
+   envp = (unsigned long)(sp + 2 + bprm->argc + 1);
+   __put_user(argv, sp++);
+   __put_user(envp, sp++);
+   }
+
+   current->mm->arg_start = (unsigned long)p;
+   for (i = bprm->argc; i > 0; i--) {
+   __put_user((unsigned long)p, sp++);
+   len = strnlen_user(p, MAX_ARG_STRLEN);
+   if (!len || len > MAX_ARG_STRLEN)
+   return -EINVAL;
+   p += len;
+   }
+   __put_user(0, sp++);
+   current->mm->arg_end = (unsigned long)p;
+
+   current->mm->env_start = (unsigned long) p;
+   for (i = bprm->envc; i > 0; i--) {
+   __put_user((unsigned long)p, sp++);
+   len = strnlen_user(p, MAX_ARG_STRLEN);
+   if (!len || len > MAX_ARG_STRLEN)
+   return -EINVAL;
+   p += len;
+   }
+   __put_user(0, sp++);
+   current->mm->env_end = (unsigned long)p;
+
+   return 0;
 }
 
 //
@@ -854,7 +862,7 @@ static int load_flat_binary(struct linux_binprm * bprm)
 {
struct lib_info libinfo;
struct pt_regs *regs = current_pt_regs();
-   unsigned long sp, stack_len;
+   unsigned long stack_len;
unsigned long start_addr;
int res;
int i, j;
@@ -868,11 +876,10 @@ static int load_flat_binary(struct linux_binprm * bprm)
 * pedantic and include space for the argv/envp array as it may have
 * a lot of entries.
 */
-#define TOP_OF_ARGS (PAGE_SIZE * MAX_ARG_PAGES - sizeof(void *))
-   stack_len = TOP_OF_ARGS - bprm->p; /* the strings */
-   stack_len += (bprm->argc + 1) * sizeof(char *); /* the argv array */
-   stack_len += (bprm->envc + 1) * sizeof(char *); /* the envp array */
-   stack_len += FLAT_STACK_ALIGN - 1;  /* reserve for upcoming alignment */
+   stack_len = PAGE_SIZE * MAX_ARG_PAGES - bprm->p;  /* the strings */
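
The table layout that the patched create_flat_tables() writes can be sketched
in userspace as follows. This is only an illustration under stated
assumptions: plain stores replace __put_user(), strlen() + 1 replaces
strnlen_user() (so the length checks are omitted), STACK_ALIGN is an assumed
stand-in for the per-architecture FLAT_STACK_ALIGN, and build_tables() and
demo() are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for FLAT_STACK_ALIGN (assumed value; must be a power of two). */
#define STACK_ALIGN ((uintptr_t)16)

/*
 * Lay out argc, the optional argv/envp pointer pair, argv[] and envp[]
 * below `top`, in the order the patched function writes them.
 * `strings` holds argc + envc NUL-terminated strings back to back.
 * Returns the new, aligned stack pointer.
 */
static uintptr_t *build_tables(uintptr_t *top, const char *strings,
			       int argc, int envc, int ptrs_on_stack)
{
	uintptr_t *sp = top;
	uintptr_t *base;
	const char *p = strings;
	int i;

	/* Reserve room: envp[] + NULL, argv[] + NULL, optional pair, argc. */
	sp -= envc + 1;
	sp -= argc + 1;
	sp -= ptrs_on_stack ? 2 : 0;
	sp -= 1;

	/* For a power of two, -align equals ~(align - 1): round down. */
	sp = (uintptr_t *)((uintptr_t)sp & -STACK_ALIGN);
	base = sp;

	*sp++ = (uintptr_t)argc;
	if (ptrs_on_stack) {
		uintptr_t argv = (uintptr_t)(sp + 2);
		uintptr_t envp = (uintptr_t)(sp + 2 + argc + 1);

		*sp++ = argv;
		*sp++ = envp;
	}
	for (i = 0; i < argc; i++) {
		*sp++ = (uintptr_t)p;
		p += strlen(p) + 1;	/* step over the string and its NUL */
	}
	*sp++ = 0;			/* argv[] terminator */
	for (i = 0; i < envc; i++) {
		*sp++ = (uintptr_t)p;
		p += strlen(p) + 1;
	}
	*sp++ = 0;			/* envp[] terminator */

	return base;
}

/* Usage: two arguments ("ls", "-l") and one environment string ("HOME=/"). */
static void demo(void)
{
	uintptr_t stack[32];
	const char strings[] = "ls\0-l\0HOME=/";
	uintptr_t *sp = build_tables(stack + 32, strings, 2, 1, 1);

	assert(sp[0] == 2);			/* argc */
	assert(sp[1] == (uintptr_t)&sp[3]);	/* address of argv[] */
	assert(sp[2] == (uintptr_t)&sp[6]);	/* address of envp[] */
	assert(sp[5] == 0 && sp[7] == 0);	/* both tables end in NULL */
}
```

Writing every slot through a single advancing pointer is what lets the patch
drop the separate argv and envp cursors that the old code maintained.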

[PATCH v2 08/10] binfmt_flat: update libraries' data segment pointer with userspace accessors

2016-07-17 Thread Nicolas Pitre
This is needed on systems with a MMU.  This also gets rid of the
strangest C code I've seen lately, i.e. an integer indexed with a
pointer value within square brackets. That really looked backwards.

Signed-off-by: Nicolas Pitre 
---
 fs/binfmt_flat.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index e981e66bb5..3221ed9d7c 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -902,12 +902,19 @@ static int load_flat_binary(struct linux_binprm * bprm)
return res;

/* Update data segment pointers for all libraries */
-   for (i=0; i

[PATCH v2 09/10] binfmt_flat: add MMU-specific support

2016-07-17 Thread Nicolas Pitre
Not much else to do at this point except for the different stack setups.

SuperH and Xtensa could be added to the allowed list if they implement
__put_user_unaligned() and __get_user_unaligned().

Signed-off-by: Nicolas Pitre 
---
 fs/Kconfig.binfmt |  3 ++-
 fs/binfmt_flat.c  | 16 +---
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/fs/Kconfig.binfmt b/fs/Kconfig.binfmt
index 72c03354c1..4c09d93d95 100644
--- a/fs/Kconfig.binfmt
+++ b/fs/Kconfig.binfmt
@@ -89,7 +89,8 @@ config BINFMT_SCRIPT
 
 config BINFMT_FLAT
bool "Kernel support for flat binaries"
-   depends on !MMU && (!FRV || BROKEN)
+   depends on !MMU || ARM || M68K
+   depends on !FRV || BROKEN
help
  Support uClinux FLAT format binaries.
 
diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index 3221ed9d7c..4cb0c4b6ae 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -546,7 +546,7 @@ static int load_flat_file(struct linux_binprm * bprm,
 * case,  and then the fully copied to RAM case which lumps
 * it all together.
 */
-   if ((flags & (FLAT_FLAG_RAM|FLAT_FLAG_GZIP)) == 0) {
+   if (!IS_ENABLED(CONFIG_MMU) && !(flags & (FLAT_FLAG_RAM|FLAT_FLAG_GZIP))) {
/*
 * this should give us a ROM ptr,  but if it doesn't we don't
 * really care
@@ -687,7 +687,9 @@ static int load_flat_file(struct linux_binprm * bprm,
 */
current->mm->start_brk = datapos + data_len + bss_len;
current->mm->brk = (current->mm->start_brk + 3) & ~3;
+#ifndef CONFIG_MMU
current->mm->context.end_brk = memp + memp_size - stack_len;
+#endif
}
 
if (flags & FLAT_FLAG_KTRACE) {
@@ -878,7 +880,7 @@ static int load_flat_binary(struct linux_binprm * bprm)
 {
struct lib_info libinfo;
struct pt_regs *regs = current_pt_regs();
-   unsigned long stack_len;
+   unsigned long stack_len = 0;
unsigned long start_addr;
int res;
int i, j;
@@ -892,7 +894,9 @@ static int load_flat_binary(struct linux_binprm * bprm)
 * pedantic and include space for the argv/envp array as it may have
 * a lot of entries.
 */
-   stack_len = PAGE_SIZE * MAX_ARG_PAGES - bprm->p;  /* the strings */
+#ifndef CONFIG_MMU
+   stack_len += PAGE_SIZE * MAX_ARG_PAGES - bprm->p; /* the strings */
+#endif
stack_len += (bprm->argc + 1) * sizeof(char *);   /* the argv array */
stack_len += (bprm->envc + 1) * sizeof(char *);   /* the envp array */
stack_len = ALIGN(stack_len, FLAT_STACK_ALIGN);
@@ -920,6 +924,11 @@ static int load_flat_binary(struct linux_binprm * bprm)
 
set_binfmt(&flat_format);
 
+#ifdef CONFIG_MMU
+   res = setup_arg_pages(bprm, STACK_TOP, EXSTACK_DEFAULT);
+   if (!res)
+   res = create_flat_tables(bprm, bprm->p);
+#else
/* Stash our initial stack pointer into the mm structure */
current->mm->start_stack =
((current->mm->context.end_brk + stack_len + 3) & ~3) - 4;
@@ -929,6 +938,7 @@ static int load_flat_binary(struct linux_binprm * bprm)
res = transfer_args_to_stack(bprm, &current->mm->start_stack);
if (!res)
res = create_flat_tables(bprm, current->mm->start_stack);
+#endif
if (res)
return res;
 
-- 
2.7.4
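
The stack_len arithmetic in load_flat_binary() after this patch can be
modelled in userspace as below. This is a sketch under assumptions rather
than the kernel code: flat_stack_len(), ALIGN_UP and demo() are hypothetical
names, has_mmu stands in for CONFIG_MMU, strings_len stands in for
PAGE_SIZE * MAX_ARG_PAGES - bprm->p, and the alignment is passed in because
FLAT_STACK_ALIGN is defined per architecture.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to a multiple of a (a power of two), like the kernel's ALIGN(). */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

static size_t flat_stack_len(size_t strings_len, int argc, int envc,
			     int has_mmu, size_t align)
{
	size_t len = 0;

	/* Only the no-MMU path counts the argument strings themselves. */
	if (!has_mmu)
		len += strings_len;			/* the strings */
	len += (size_t)(argc + 1) * sizeof(char *);	/* the argv array */
	len += (size_t)(envc + 1) * sizeof(char *);	/* the envp array */

	return ALIGN_UP(len, align);
}

/* Usage: same arguments, with and without an MMU. */
static void demo(void)
{
	size_t mmu = flat_stack_len(100, 2, 1, 1, 16);
	size_t nommu = flat_stack_len(100, 2, 1, 0, 16);

	assert(mmu == ALIGN_UP(5 * sizeof(char *), (size_t)16));
	assert(nommu % 16 == 0 && nommu > mmu);
}
```

Initialising stack_len to 0 and switching the first statement to += is what
allows the string term to be compiled out under CONFIG_MMU without touching
the two array terms.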



[PATCH 1/1] netfilter: Add helper array register/unregister functions

2016-07-17 Thread fgao
From: Gao Feng 

Add nf_ct_helper_init, nf_conntrack_helpers_register/unregister
functions to enhance the conntrack helper codes.

Signed-off-by: Gao Feng 
---
 include/net/netfilter/nf_conntrack_helper.h | 16 ++
 net/netfilter/nf_conntrack_ftp.c| 58 +++---
 net/netfilter/nf_conntrack_helper.c | 58 ++
 net/netfilter/nf_conntrack_irc.c| 36 +-
 net/netfilter/nf_conntrack_sane.c   | 55 +++--
 net/netfilter/nf_conntrack_sip.c| 75 +++--
 net/netfilter/nf_conntrack_tftp.c   | 48 ++
 7 files changed, 165 insertions(+), 181 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_helper.h b/include/net/netfilter/nf_conntrack_helper.h
index 6cf614bc..8c0c08f 100644
--- a/include/net/netfilter/nf_conntrack_helper.h
+++ b/include/net/netfilter/nf_conntrack_helper.h
@@ -58,10 +58,26 @@ struct nf_conntrack_helper *__nf_conntrack_helper_find(const char *name,
 struct nf_conntrack_helper *nf_conntrack_helper_try_module_get(const char *name,
   u16 l3num,
   u8 protonum);
+void nf_ct_helper_init(struct nf_conntrack_helper *helper,
+  u16 l3num, u16 protonum, const char *name,
+  u16 default_port, u16 spec_port,
+  const struct nf_conntrack_expect_policy *exp_pol,
+  u32 expect_class_max, u32 data_len,
+  int (*help)(struct sk_buff *skb, unsigned int protoff,
+  struct nf_conn *ct,
+  enum ip_conntrack_info ctinfo),
+  int (*from_nlattr)(struct nlattr *attr,
+ struct nf_conn *ct),
+  struct module *module);
 
 int nf_conntrack_helper_register(struct nf_conntrack_helper *);
 void nf_conntrack_helper_unregister(struct nf_conntrack_helper *);
 
+int nf_conntrack_helpers_register(struct nf_conntrack_helper *,
+   unsigned int);
+void nf_conntrack_helpers_unregister(struct nf_conntrack_helper *,
+   unsigned int);
+
 struct nf_conn_help *nf_ct_helper_ext_add(struct nf_conn *ct,
  struct nf_conntrack_helper *helper,
  gfp_t gfp);
diff --git a/net/netfilter/nf_conntrack_ftp.c b/net/netfilter/nf_conntrack_ftp.c
index 19efeba..e15640d 100644
--- a/net/netfilter/nf_conntrack_ftp.c
+++ b/net/netfilter/nf_conntrack_ftp.c
@@ -572,7 +572,7 @@ static int nf_ct_ftp_from_nlattr(struct nlattr *attr, struct nf_conn *ct)
return 0;
 }
 
-static struct nf_conntrack_helper ftp[MAX_PORTS][2] __read_mostly;
+static struct nf_conntrack_helper ftp[MAX_PORTS * 2] __read_mostly;
 
 static const struct nf_conntrack_expect_policy ftp_exp_policy = {
.max_expected   = 1,
@@ -582,24 +582,13 @@ static const struct nf_conntrack_expect_policy ftp_exp_policy = {
 /* don't make this __exit, since it's called from __init ! */
 static void nf_conntrack_ftp_fini(void)
 {
-   int i, j;
-   for (i = 0; i < ports_c; i++) {
-   for (j = 0; j < 2; j++) {
-   if (ftp[i][j].me == NULL)
-   continue;
-
-   pr_debug("unregistering helper for pf: %d port: %d\n",
-ftp[i][j].tuple.src.l3num, ports[i]);
-   nf_conntrack_helper_unregister(&ftp[i][j]);
-   }
-   }
-
+   nf_conntrack_helpers_unregister(ftp, ports_c * 2);
kfree(ftp_buffer);
 }
 
 static int __init nf_conntrack_ftp_init(void)
 {
-   int i, j = -1, ret = 0;
+   int i, ret = 0;
 
ftp_buffer = kmalloc(65536, GFP_KERNEL);
if (!ftp_buffer)
@@ -611,32 +600,21 @@ static int __init nf_conntrack_ftp_init(void)
/* FIXME should be configurable whether IPv4 and IPv6 FTP connections
 are tracked or not - YK */
for (i = 0; i < ports_c; i++) {
-   ftp[i][0].tuple.src.l3num = PF_INET;
-   ftp[i][1].tuple.src.l3num = PF_INET6;
-   for (j = 0; j < 2; j++) {
-   ftp[i][j].data_len = sizeof(struct nf_ct_ftp_master);
-   ftp[i][j].tuple.src.u.tcp.port = htons(ports[i]);
-   ftp[i][j].tuple.dst.protonum = IPPROTO_TCP;
-   ftp[i][j].expect_policy = &ftp_exp_policy;
-   ftp[i][j].me = THIS_MODULE;
-   ftp[i][j].help = help;
-   ftp[i][j].from_nlattr = nf_ct_ftp_from_nlattr;
-   if (ports[i] == FTP_PORT)
-   sprintf(ftp[i][j].name, "ftp");
-   else
-   sprintf(ftp[i][j].name, 

RE: [PATCH v6 1/3] x86/platform/p2sb: New Primary to Sideband bridge support driver for Intel SOC's

2016-07-17 Thread Tan, Jui Nee


> -Original Message-
> From: paul.gortma...@gmail.com [mailto:paul.gortma...@gmail.com] On
> Behalf Of Paul Gortmaker
> Sent: Friday, July 15, 2016 8:01 AM
> To: Tan, Jui Nee 
> Cc: mika.westerb...@linux.intel.com; heikki.kroge...@linux.intel.com;
> andriy.shevche...@linux.intel.com; t...@linutronix.de;
> mi...@redhat.com; H. Peter Anvin ; X86 ML
> ; pty...@xes-inc.com; Lee Jones ;
> Linus Walleij ; linux-g...@vger.kernel.org; LKML
> ; Yong, Jonathan
> ; Yu, Ong Hock ; Voon,
> Weifeng ; Wan Mohamad, Wan Ahmad Zainie
> 
> Subject: Re: [PATCH v6 1/3] x86/platform/p2sb: New Primary to Sideband
> bridge support driver for Intel SOC's
> 
> On Thu, Jul 14, 2016 at 4:11 AM, Tan Jui Nee  wrote:
> > From: Andy Shevchenko 
> >
> > There is already one and at least one more user coming which require
> > an access to Primary to Sideband bridge (P2SB) in order to get IO or
> > MMIO bar hidden by BIOS.
> > Create a driver to access P2SB for x86 devices.
> >
> > Signed-off-by: Yong, Jonathan 
> > Signed-off-by: Andy Shevchenko 
> > ---
> > Changes in V6:
> > - No change
> >
> > Changes in V5:
> > - No change
> >
> > Changes in V4:
> > - Move Kconfig option CONFIG_X86_INTEL_NON_ACPI from
> >   [PATCH 2/3] x86/platform/p2sb: New Primary to Sideband bridge
> support driver for Intel SOC's
> >   to
> >   [PATCH 3/3] mfd: lpc_ich: Add support for Intel Apollo Lake GPIO
> pinctrl in non-ACPI system
> >   since the config is used in latter patch.
> >
> > Changes in V3:
> > - No change
> >
> > Changes in V2:
> > - Add new config option CONFIG_X86_INTEL_NON_ACPI and "select
> PINCTRL"
> >   to fix kbuildbot error
> >
> >  arch/x86/Kconfig |  4 ++
> >  arch/x86/include/asm/p2sb.h  | 27 +++
> >  arch/x86/platform/intel/Makefile |  1 +
> >  arch/x86/platform/intel/p2sb.c   | 99
> 
> >  4 files changed, 131 insertions(+)
> >  create mode 100644 arch/x86/include/asm/p2sb.h
> >  create mode 100644 arch/x86/platform/intel/p2sb.c
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index d9a94da..d305d81 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -604,6 +604,10 @@ config IOSF_MBI_DEBUG
> >
> >   If you don't require the option or are in doubt, say N.
> >
> > +config P2SB
> > +   tristate
> 
> OK, this is tristate, but then
> 
P2SB is tristate because its only current user, LPC_ICH, is tristate as well
and selects it:
...
config LPC_ICH
tristate "Intel ICH LPC"
depends on X86 && PCI
select MFD_CORE
select P2SB
...
> > +   depends on PCI
> > +
> >  config X86_RDC321X
> > bool "RDC R-321x SoC"
> > depends on X86_32
> > diff --git a/arch/x86/include/asm/p2sb.h b/arch/x86/include/asm/p2sb.h
> > new file mode 100644 index 000..686e07b
> > --- /dev/null
> > +++ b/arch/x86/include/asm/p2sb.h
> > @@ -0,0 +1,27 @@
> > +/*
> > + * Primary to Sideband bridge (P2SB) access support
> > + */
> > +
> > +#ifndef P2SB_SYMS_H
> > +#define P2SB_SYMS_H
> > +
> > +#include 
> > +#include 
> > +
> > +#if IS_ENABLED(CONFIG_P2SB)
> > +
> > +int p2sb_bar(struct pci_dev *pdev, unsigned int devfn,
> > +   struct resource *res);
> > +
> > +#else /* CONFIG_P2SB is not set */
> > +
> > +static inline
> > +int p2sb_bar(struct pci_dev *pdev, unsigned int devfn,
> > +   struct resource *res)
> > +{
> > +   return -ENODEV;
> > +}
> > +
> > +#endif /* CONFIG_P2SB */
> > +
> > +#endif /* P2SB_SYMS_H */
> > diff --git a/arch/x86/platform/intel/Makefile
> > b/arch/x86/platform/intel/Makefile
> > index b878032..dbf9f10 100644
> > --- a/arch/x86/platform/intel/Makefile
> > +++ b/arch/x86/platform/intel/Makefile
> > @@ -1 +1,2 @@
> >  obj-$(CONFIG_IOSF_MBI) += iosf_mbi.o
> > +obj-$(CONFIG_P2SB) += p2sb.o
> > diff --git a/arch/x86/platform/intel/p2sb.c
> > b/arch/x86/platform/intel/p2sb.c new file mode 100644 index
> > 000..8be47a4
> > --- /dev/null
> > +++ b/arch/x86/platform/intel/p2sb.c
> > @@ -0,0 +1,99 @@
> > +/*
> > + * Primary to Sideband bridge (P2SB) driver
> > + *
> > + * Copyright (c) 2016, Intel Corporation.
> > + *
> > + * Authors: Andy Shevchenko 
> > + * Jonathan Yong 
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms and conditions of the GNU General Public License,
> > + * version 2, as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope it will be useful, but WITHOUT
> > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> > + * more details.
> > + *
> > + */
> > +
> > +#include 
> > +#include 
> 
> ...and module.h is included, but yet...
> 
> > +#include 
> > +#include 
> > +
> > +#include 
> > +
> > +#define SBREG_BAR 

[PATCH v2 07/10] binfmt_flat: use clear_user() rather than memset() to clear .bss

2016-07-17 Thread Nicolas Pitre
This is needed on systems with an MMU, where user memory must be written
through the uaccess helpers rather than a plain kernel pointer.

Signed-off-by: Nicolas Pitre 
---
 fs/binfmt_flat.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
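
For anyone unfamiliar with the convention the new code relies on:
clear_user() returns the number of bytes it could NOT zero, so any non-zero
result is mapped to -EFAULT. Below is a small userspace imitation of that
contract, as a sketch only. fake_clear_user(), user_mem and USER_SIZE are
made-up stand-ins (offsets instead of real user pointers), not kernel API.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define USER_SIZE 64

/* Pretend user address space: offsets [0, USER_SIZE) are accessible. */
unsigned char user_mem[USER_SIZE];

/*
 * Stand-in for clear_user(): zero n bytes starting at offset off and
 * return how many bytes could NOT be cleared (0 means full success).
 */
size_t fake_clear_user(size_t off, size_t n)
{
	size_t ok;

	if (off >= USER_SIZE)
		return n;
	ok = n < USER_SIZE - off ? n : USER_SIZE - off;
	memset(user_mem + off, 0, ok);
	return n - ok;
}

/* Caller pattern from the patch: any leftover byte is a fault. */
int zero_bss(size_t off, size_t len)
{
	if (fake_clear_user(off, len))
		return -EFAULT;
	return 0;
}
```

A plain memset() has no way to report a partially accessible range, which
is why the call site gains an error return along with the accessor.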

diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index c85f8f1239..e981e66bb5 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -803,10 +803,11 @@ static int load_flat_file(struct linux_binprm * bprm,
flush_icache_range(start_code, end_code);
 
/* zero the BSS,  BRK and stack areas */
-   memset((void*)(datapos + data_len), 0, bss_len + 
+   if (clear_user((void __user *)(datapos + data_len), bss_len + 
(memp + memp_size - stack_len - /* end brk */
libinfo->lib_list[id].start_brk) +  /* start brk */
-   stack_len);
+   stack_len))
+   return -EFAULT;
 
return 0;
 err:
-- 
2.7.4



[PATCH v2 06/10] binfmt_flat: use proper user space accessors with old relocs code

2016-07-17 Thread Nicolas Pitre
Signed-off-by: Nicolas Pitre 
---
 fs/binfmt_flat.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)
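
Since this patch carries no description, here is the pattern it applies, as
a hedged userspace sketch: instead of "*ptr += base" on a user address, the
word is fetched into a local, adjusted, and stored back through accessors.
fake_get_user(), fake_put_user() and the word-indexed user_words[] array
are invented stand-ins for get_user()/put_user() and real user pointers.

```c
#include <errno.h>
#include <stddef.h>

#define NWORDS 8

/* Pretend user memory, addressed by word index. */
unsigned long user_words[NWORDS];

/* Stand-ins for get_user()/put_user(): 0 on success, -EFAULT otherwise. */
int fake_get_user(unsigned long *val, size_t idx)
{
	if (idx >= NWORDS)
		return -EFAULT;
	*val = user_words[idx];
	return 0;
}

int fake_put_user(unsigned long val, size_t idx)
{
	if (idx >= NWORDS)
		return -EFAULT;
	user_words[idx] = val;
	return 0;
}

/* Relocate one word: fetch a copy, add the segment base, store it back. */
int relocate_word(size_t idx, unsigned long base)
{
	unsigned long val;

	if (fake_get_user(&val, idx))
		return -EFAULT;
	val += base;
	return fake_put_user(val, idx);
}
```

One difference worth noting: old_reloc() in the diff below is void and does
not check the results of __get_user()/__put_user(), whereas the sketch
propagates the error to its caller.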

diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index fc0ee3ed5d..c85f8f1239 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -394,38 +394,41 @@ static void old_reloc(unsigned long rl)
static const char *segment[] = { "TEXT", "DATA", "BSS", "*UNKNOWN*" };
 #endif
flat_v2_reloc_t r;
-   unsigned long *ptr;
+   unsigned long __user *ptr;
+   unsigned long val;

r.value = rl;
 #if defined(CONFIG_COLDFIRE)
-   ptr = (unsigned long *) (current->mm->start_code + r.reloc.offset);
+   ptr = (unsigned long __user *)(current->mm->start_code + r.reloc.offset);
 #else
-   ptr = (unsigned long *) (current->mm->start_data + r.reloc.offset);
+   ptr = (unsigned long __user *)(current->mm->start_data + r.reloc.offset);
 #endif
+   __get_user(val, ptr);
 
 #ifdef DEBUG
printk("Relocation of variable at DATASEG+%x "
"(address %p, currently %lx) into segment %s\n",
-   r.reloc.offset, ptr, *ptr, segment[r.reloc.type]);
+   r.reloc.offset, ptr, val, segment[r.reloc.type]);
 #endif

switch (r.reloc.type) {
case OLD_FLAT_RELOC_TYPE_TEXT:
-   *ptr += current->mm->start_code;
+   val += current->mm->start_code;
break;
case OLD_FLAT_RELOC_TYPE_DATA:
-   *ptr += current->mm->start_data;
+   val += current->mm->start_data;
break;
case OLD_FLAT_RELOC_TYPE_BSS:
-   *ptr += current->mm->end_data;
+   val += current->mm->end_data;
break;
default:
printk("BINFMT_FLAT: Unknown relocation type=%x\n", r.reloc.type);
break;
}
+   __put_user(val, ptr);
 
 #ifdef DEBUG
-   printk("Relocation became %lx\n", *ptr);
+   printk("Relocation became %lx\n", val);
 #endif
 }  
 
@@ -788,8 +791,13 @@ static int load_flat_file(struct linux_binprm * bprm,
}
}
} else {
-   for (i=0; i < relocs; i++)
-   old_reloc(ntohl(reloc[i]));
+   for (i=0; i < relocs; i++) {
+   unsigned long relval;
+   if (get_user(relval, reloc + i))
+   return -EFAULT;
+   relval = ntohl(relval);
+   old_reloc(relval);
+   }
}

flush_icache_range(start_code, end_code);
-- 
2.7.4



[PATCH v2 10/10] binfmt_flat: allow compressed flat binary format to work on MMU systems

2016-07-17 Thread Nicolas Pitre
Let's take the simple and obvious approach by decompressing the binary
into a kernel buffer and then copying it to user space.  Those who are
looking for more performance on a MMU system are unlikely to choose this
executable format anyway.

Signed-off-by: Nicolas Pitre 
---
 fs/binfmt_flat.c | 44 ++--
 1 file changed, 42 insertions(+), 2 deletions(-)
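
The approach described above (decompress into a kernel buffer, then copy
out) can be sketched in userspace roughly as follows. This is an
illustration under assumptions, not the patch itself: fake_decompress()
just writes a letter pattern, fake_copy_to_user() always succeeds, and
malloc()/free() stand in for vmalloc()/vfree().

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decompressor: here it just fills the buffer with a pattern. */
int fake_decompress(char *dst, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		dst[i] = (char)('A' + i % 26);
	return 0;
}

/* Stand-in for copy_to_user(): returns bytes NOT copied (always 0 here). */
size_t fake_copy_to_user(char *to, const char *from, size_t n)
{
	memcpy(to, from, n);
	return 0;
}

/*
 * Decompress text+data into one temporary buffer, then copy each part
 * to its own destination, mirroring the CONFIG_MMU branch of the patch.
 */
int load_compressed(char *text, size_t text_len, char *data, size_t data_len)
{
	char *buf = malloc(text_len + data_len);
	int result;

	if (!buf)
		return -ENOMEM;
	result = fake_decompress(buf, text_len + data_len);
	if (result == 0 &&
	    (fake_copy_to_user(text, buf, text_len) ||
	     fake_copy_to_user(data, buf + text_len, data_len)))
		result = -EFAULT;
	free(buf);
	return result;
}
```

The cost is one extra allocation and copy of the whole image, which matches
the commit message's view that performance is not the priority for this
format on MMU systems.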

diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index 4cb0c4b6ae..24deae4dcb 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -637,6 +638,7 @@ static int load_flat_file(struct linux_binprm * bprm,
 * load it all in and treat it like a RAM load from now on
 */
if (flags & FLAT_FLAG_GZIP) {
+#ifndef CONFIG_MMU
result = decompress_exec(bprm, sizeof (struct flat_hdr),
 (((char *) textpos) + sizeof (struct flat_hdr)),
 (text_len + full_data
@@ -644,14 +646,52 @@ static int load_flat_file(struct linux_binprm * bprm,
 0);
memmove((void *) datapos, (void *) realdatastart,
full_data);
+#else
+   /*
+* This is used on MMU systems mainly for testing.
+* Let's use a kernel buffer to simplify things.
+*/
+   long unz_text_len = text_len - sizeof(struct flat_hdr);
+   long unz_len = unz_text_len + full_data;
+   char *unz_data = vmalloc(unz_len);
+   if (!unz_data) {
+   result = -ENOMEM;
+   } else {
+   result = decompress_exec(bprm, sizeof(struct flat_hdr),
+unz_data, unz_len, 0);
+   if (result == 0 &&
+   (copy_to_user((void __user *)textpos + sizeof(struct flat_hdr),
+ unz_data, unz_text_len) ||
+copy_to_user((void __user *)datapos,
+ unz_data + unz_text_len, full_data)))
+   result = -EFAULT;
+   vfree(unz_data);
+   }
+#endif
} else if (flags & FLAT_FLAG_GZDATA) {
result = read_code(bprm->file, textpos, 0, text_len);
-   if (!IS_ERR_VALUE(result))
+   if (!IS_ERR_VALUE(result)) {
+#ifndef CONFIG_MMU
result = decompress_exec(bprm, text_len, (char *) datapos,
 full_data, 0);
+#else
+   char *unz_data = vmalloc(full_data);
+   if (!unz_data) {
+   result = -ENOMEM;
+   } else {
+   result = decompress_exec(bprm, text_len,
+  unz_data, full_data, 0);
+   if (result == 0 &&
+   copy_to_user((void __user *)datapos,
+unz_data, full_data))
+   result = -EFAULT;
+   vfree(unz_data);
+   }
+#endif
+   }
}
else
-#endif
+#endif /* CONFIG_BINFMT_ZFLAT */
{
result = read_code(bprm->file, textpos, 0, text_len);
if (!IS_ERR_VALUE(result))
-- 
2.7.4



[PULL REQUEST] [PATCH v2 00/10] allow BFLT executables on systems with a MMU

2016-07-17 Thread Nicolas Pitre
This series provides the necessary changes to allow "flat" executable
binaries meant for no-MMU systems to actually run on systems with a MMU.

This can also be found in the following git repo:

git://git.linaro.org/people/nicolas.pitre/linux binfmt_flat_with_mmu

*Why?*

Because developing and testing natively on a large system with lots of
RAM makes it so much more convenient to use all the existing profiling
tools and debugging facilities that a kernel with lots of RAM can give.
And incidentally, those systems with lots of RAM all have a MMU.

*Why not use elf_fdpic?*

The flat executable format is simple with very small footprint
overhead, either in the executables themselves or kernel support.
This makes the flat format more suitable than elf_fdpic for very small
single-user-app embedded systems.

And while elf_fdpic binaries can run on MMU systems, flat binaries still
couldn't, which just felt wrong.

So here it is.  The no-MMU support should remain unaffected. Please consider
for pulling.

Tested on ARM only with a busybox build.

Changes since v1:

- Removed SuperH and Xtensa from the Kconfig rule as they fail to build
  due to lack of __get_user_unaligned()/__put_user_unaligned().

- Clarified some commit logs a bit.

diffstat:

 arch/arm/include/asm/flat.h  |   5 +-
 arch/m68k/include/asm/flat.h |   5 +-
 fs/Kconfig.binfmt|   3 +-
 fs/binfmt_elf_fdpic.c|  38 +---
 fs/binfmt_flat.c | 372 +--
 fs/exec.c|  33 
 include/linux/binfmts.h  |   2 +
 7 files changed, 268 insertions(+), 190 deletions(-)



[PATCH v2 05/10] binfmt_flat: use proper user space accessors with relocs processing code

2016-07-17 Thread Nicolas Pitre
Relocs are fixed up in place in user space memory.  The appropriate
accessors are required for this code to work with an active MMU.

The architecture specific handlers for ARM and M68K are also
covered. SuperH and Xtensa are left out as they don't implement
__get_user_unaligned() and __put_user_unaligned() yet. The other
architectures that use BFLT don't have any MMU.

Signed-off-by: Nicolas Pitre 
---
 arch/arm/include/asm/flat.h  |  5 +++--
 arch/m68k/include/asm/flat.h |  5 +++--
 fs/binfmt_flat.c | 31 +++
 3 files changed, 25 insertions(+), 16 deletions(-)
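
To illustrate the GOT fixup loop after this change, here is a userspace
sketch, offered with caveats. got_read()/got_write() are invented stand-ins
for get_user()/put_user(), the table is indexed by word instead of using
real user pointers, the relocation is reduced to "add a base" in place of
calc_reloc(), and the GOT_END value is an assumption about the terminator.

```c
#include <errno.h>
#include <stddef.h>

#define GOT_END 0xffffffffUL	/* assumed end-of-table marker */
#define NWORDS  8

/* Pretend user-space GOT, addressed by word index. */
unsigned long user_got[NWORDS];

/* Stand-ins for get_user()/put_user(): 0 on success, -EFAULT otherwise. */
int got_read(unsigned long *val, size_t idx)
{
	if (idx >= NWORDS)
		return -EFAULT;
	*val = user_got[idx];
	return 0;
}

int got_write(unsigned long val, size_t idx)
{
	if (idx >= NWORDS)
		return -EFAULT;
	user_got[idx] = val;
	return 0;
}

/* Walk the table until the terminator, rebasing every non-zero entry. */
int relocate_got(unsigned long base)
{
	size_t i;

	for (i = 0; ; i++) {
		unsigned long val;

		if (got_read(&val, i))
			return -EFAULT;
		if (val == GOT_END)
			break;
		if (val && got_write(val + base, i))
			return -EFAULT;
	}
	return 0;
}
```

Because every read goes through an accessor that can fail, a table with no
terminator ends the walk with -EFAULT in this sketch instead of reading
past the end of the accessible range.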

diff --git a/arch/arm/include/asm/flat.h b/arch/arm/include/asm/flat.h
index e847d23351..acf1d14b89 100644
--- a/arch/arm/include/asm/flat.h
+++ b/arch/arm/include/asm/flat.h
@@ -8,8 +8,9 @@
 #define flat_argvp_envp_on_stack()  1
 #define flat_old_ram_flag(flags)    (flags)
 #define flat_reloc_valid(reloc, size)   ((reloc) <= (size))
-#define flat_get_addr_from_rp(rp, relval, flags, persistent) ((void)persistent,get_unaligned(rp))
-#define flat_put_addr_at_rp(rp, val, relval)    put_unaligned(val,rp)
+#define flat_get_addr_from_rp(rp, relval, flags, persistent) \
+   ({ unsigned long __val; __get_user_unaligned(__val, rp); __val; })
+#define flat_put_addr_at_rp(rp, val, relval)    __put_user_unaligned(val, rp)
 #define flat_get_relocate_addr(rel) (rel)
 #define flat_set_persistent(relval, p)  0
 
diff --git a/arch/m68k/include/asm/flat.h b/arch/m68k/include/asm/flat.h
index f9454b89a5..f3f592d03e 100644
--- a/arch/m68k/include/asm/flat.h
+++ b/arch/m68k/include/asm/flat.h
@@ -8,8 +8,9 @@
 #define flat_argvp_envp_on_stack()  1
 #define flat_old_ram_flag(flags)    (flags)
 #define flat_reloc_valid(reloc, size)   ((reloc) <= (size))
-#define flat_get_addr_from_rp(rp, relval, flags, p) get_unaligned(rp)
-#define flat_put_addr_at_rp(rp, val, relval)    put_unaligned(val,rp)
+#define flat_get_addr_from_rp(rp, relval, flags, p) \
+   ({ unsigned long __val; __get_user_unaligned(__val, rp); __val; })
+#define flat_put_addr_at_rp(rp, val, relval)    __put_user_unaligned(val, rp)
 #define flat_get_relocate_addr(rel) (rel)
 
 static inline int flat_set_persistent(unsigned long relval,
diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index 9538901fe8..fc0ee3ed5d 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -438,7 +438,7 @@ static int load_flat_file(struct linux_binprm * bprm,
unsigned long textpos, datapos, realdatastart;
unsigned long text_len, data_len, bss_len, stack_len, full_data, flags;
unsigned long len, memp, memp_size, extra, rlim;
-   unsigned long *reloc, *rp;
+   unsigned long __user *reloc, *rp;
struct inode *inode;
int i, rev, relocs;
loff_t fpos;
@@ -600,7 +600,7 @@ static int load_flat_file(struct linux_binprm * bprm,
goto err;
}
 
-   reloc = (unsigned long *)
+   reloc = (unsigned long __user *)
(datapos + (ntohl(hdr->reloc_start) - text_len));
memp = realdatastart;
memp_size = len;
@@ -625,7 +625,7 @@ static int load_flat_file(struct linux_binprm * bprm,
MAX_SHARED_LIBS * sizeof(unsigned long),
FLAT_DATA_ALIGN);
 
-   reloc = (unsigned long *)
+   reloc = (unsigned long __user *)
(datapos + (ntohl(hdr->reloc_start) - text_len));
memp = textpos;
memp_size = len;
@@ -718,15 +718,20 @@ static int load_flat_file(struct linux_binprm * bprm,
 * image.
 */
if (flags & FLAT_FLAG_GOTPIC) {
-   for (rp = (unsigned long *)datapos; *rp != 0x; rp++) {
-   unsigned long addr;
-   if (*rp) {
-   addr = calc_reloc(*rp, libinfo, id, 0);
+   for (rp = (unsigned long __user *)datapos; ; rp++) {
+   unsigned long addr, rp_val;
+   if (get_user(rp_val, rp))
+   return -EFAULT;
+   if (rp_val == 0x)
+   break;
+   if (rp_val) {
+   addr = calc_reloc(rp_val, libinfo, id, 0);
if (addr == RELOC_FAILED) {
ret = -ENOEXEC;
goto err;
}
-   *rp = addr;
+   if (put_user(addr, rp))
+   return -EFAULT;
}
}
}
@@ 

[PATCH v2 03/10] binfmt_flat: use generic transfer_args_to_stack()

2016-07-17 Thread Nicolas Pitre
This gets rid of the rather ugly, open coded and suboptimal copy code.

Signed-off-by: Nicolas Pitre 
---
 fs/binfmt_flat.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index 085059d879..64feb873f0 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -854,10 +854,8 @@ static int load_flat_binary(struct linux_binprm * bprm)
 {
struct lib_info libinfo;
struct pt_regs *regs = current_pt_regs();
-   unsigned long p = bprm->p;
-   unsigned long stack_len;
+   unsigned long sp, stack_len;
unsigned long start_addr;
-   unsigned long *sp;
int res;
int i, j;
 
@@ -892,15 +890,15 @@ static int load_flat_binary(struct linux_binprm * bprm)
 
set_binfmt(_format);
 
-   p = ((current->mm->context.end_brk + stack_len + 3) & ~3) - 4;
-   DBG_FLT("p=%lx\n", p);
+   sp = ((current->mm->context.end_brk + stack_len + 3) & ~3) - 4;
+   DBG_FLT("sp=%lx\n", sp);
 
-   /* copy the arg pages onto the stack, this could be more efficient :-) */
-   for (i = TOP_OF_ARGS - 1; i >= bprm->p; i--)
-   * (char *) --p =
-   ((char *) page_address(bprm->page[i/PAGE_SIZE]))[i % PAGE_SIZE];
+   /* copy the arg pages onto the stack */
+   res = transfer_args_to_stack(bprm, );
+   if (res)
+   return res;
 
-   sp = (unsigned long *) create_flat_tables(p, bprm);
+   sp = create_flat_tables(sp, bprm);

/* Fake some return addresses to ensure the call chain will
 * initialise library in order for us.  We are required to call
@@ -912,14 +910,14 @@ static int load_flat_binary(struct linux_binprm * bprm)
for (i = MAX_SHARED_LIBS-1; i>0; i--) {
if (libinfo.lib_list[i].loaded) {
/* Push previos first to call address */
-   --sp;   put_user(start_addr, sp);
+   --sp;   put_user(start_addr, (unsigned long *)sp);
start_addr = libinfo.lib_list[i].entry;
}
}
 #endif

/* Stash our initial stack pointer into the mm structure */
-   current->mm->start_stack = (unsigned long )sp;
+   current->mm->start_stack = sp;
 
 #ifdef FLAT_PLAT_INIT
FLAT_PLAT_INIT(regs);
-- 
2.7.4



[PATCH v2 01/10] binfmt_flat: assorted cleanups

2016-07-17 Thread Nicolas Pitre
Remove excessive casts, do some code grouping, etc.
No functional changes.
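
As an illustration of the cast removal in the format strings, here is a
small user-space sketch using snprintf() in place of printk(). The two
helper names are hypothetical. The first narrows the value to fit %x,
the second uses %lx so the conversion matches the unsigned long
argument.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Old style: narrow the value so that it fits a %x conversion. */
static void fmt_with_cast(char *buf, size_t len, unsigned long v)
{
	snprintf(buf, len, "0x%x", (unsigned int)v);
}

/* New style: the conversion matches the argument type, no cast needed. */
static void fmt_native(char *buf, size_t len, unsigned long v)
{
	snprintf(buf, len, "0x%lx", v);
}
```

Both helpers print the same text for values that fit in an unsigned
int. Where unsigned long is wider than that, only the %lx variant keeps
the upper bits.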

Signed-off-by: Nicolas Pitre 
---
 fs/binfmt_flat.c | 118 ++-
 1 file changed, 56 insertions(+), 62 deletions(-)

diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index caf9e39bb8..085059d879 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -80,7 +80,7 @@ struct lib_info {
unsigned long text_len; /* Length of text segment */
unsigned long entry;/* Start address for this module */
unsigned long build_date;   /* When this one was compiled */
-   short loaded;   /* Has this library been loaded? */
+   bool loaded;/* Has this library been loaded? */
} lib_list[MAX_SHARED_LIBS];
 };
 
@@ -107,7 +107,7 @@ static struct linux_binfmt flat_format = {
 static int flat_core_dump(struct coredump_params *cprm)
 {
printk("Process %s:%d received signr %d and should have core dumped\n",
-   current->comm, current->pid, (int) cprm->siginfo->si_signo);
+   current->comm, current->pid, cprm->siginfo->si_signo);
return(1);
 }
 
@@ -190,7 +190,7 @@ static int decompress_exec(
loff_t fpos;
int ret, retval;
 
-   DBG_FLT("decompress_exec(offset=%x,buf=%x,len=%x)\n",(int)offset, (int)dst, (int)len);
+   DBG_FLT("decompress_exec(offset=%lx,buf=%p,len=%lx)\n",offset, dst, len);
 
memset(, 0, sizeof(strm));
strm.workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL);
@@ -358,8 +358,8 @@ calc_reloc(unsigned long r, struct lib_info *p, int curid, int internalp)
text_len = p->lib_list[id].text_len;
 
if (!flat_reloc_valid(r, start_brk - start_data + text_len)) {
-   printk("BINFMT_FLAT: reloc outside program 0x%x (0 - 0x%x/0x%x)",
-  (int) r,(int)(start_brk-start_data+text_len),(int)text_len);
+   printk("BINFMT_FLAT: reloc outside program 0x%lx (0 - 0x%lx/0x%lx)",
+  r, start_brk-start_data+text_len, text_len);
goto failed;
}
 
@@ -383,7 +383,7 @@ failed:
 static void old_reloc(unsigned long rl)
 {
 #ifdef DEBUG
-   char *segment[] = { "TEXT", "DATA", "BSS", "*UNKNOWN*" };
+   static const char *segment[] = { "TEXT", "DATA", "BSS", "*UNKNOWN*" };
 #endif
flat_v2_reloc_t r;
unsigned long *ptr;
@@ -397,8 +397,8 @@ static void old_reloc(unsigned long rl)
 
 #ifdef DEBUG
printk("Relocation of variable at DATASEG+%x "
-   "(address %p, currently %x) into segment %s\n",
-   r.reloc.offset, ptr, (int)*ptr, segment[r.reloc.type]);
+   "(address %p, currently %lx) into segment %s\n",
+   r.reloc.offset, ptr, *ptr, segment[r.reloc.type]);
 #endif

switch (r.reloc.type) {
@@ -417,7 +417,7 @@ static void old_reloc(unsigned long rl)
}
 
 #ifdef DEBUG
-   printk("Relocation became %x\n", (int)*ptr);
+   printk("Relocation became %lx\n", *ptr);
 #endif
 }  
 
@@ -427,17 +427,15 @@ static int load_flat_file(struct linux_binprm * bprm,
struct lib_info *libinfo, int id, unsigned long *extra_stack)
 {
struct flat_hdr * hdr;
-   unsigned long textpos = 0, datapos = 0, result;
-   unsigned long realdatastart = 0;
-   unsigned long text_len, data_len, bss_len, stack_len, flags;
-   unsigned long full_data;
-   unsigned long len, memp = 0;
-   unsigned long memp_size, extra, rlim;
-   unsigned long *reloc = 0, *rp;
+   unsigned long textpos, datapos, realdatastart;
+   unsigned long text_len, data_len, bss_len, stack_len, full_data, flags;
+   unsigned long len, memp, memp_size, extra, rlim;
+   unsigned long *reloc, *rp;
struct inode *inode;
-   int i, rev, relocs = 0;
+   int i, rev, relocs;
loff_t fpos;
unsigned long start_code, end_code;
+   ssize_t result;
int ret;
 
hdr = ((struct flat_hdr *) bprm->buf);  /* exec-header */
@@ -481,8 +479,8 @@ static int load_flat_file(struct linux_binprm * bprm,

/* Don't allow old format executables to use shared libraries */
if (rev == OLD_FLAT_VERSION && id != 0) {
-   printk("BINFMT_FLAT: shared libraries are not available before rev 0x%x\n",
-   (int) FLAT_VERSION);
+   printk("BINFMT_FLAT: shared libraries are not available before rev 0x%lx\n",
+   FLAT_VERSION);
ret = -ENOEXEC;
goto err;
}
@@ -517,11 +515,9 @@ static int load_flat_file(struct linux_binprm * bprm,
 
/* Flush all traces of the currently running executable */
if (id == 0) {
-   result = 

[PATCH v2 02/10] elf_fdpic_transfer_args_to_stack(): make it generic

2016-07-17 Thread Nicolas Pitre
This copying of arguments and environment is common to both NOMMU
binary formats we support. Let's make the elf_fdpic version available
to the flat format as well.

While at it, improve the code a bit not to copy below the actual
data area.
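
To make the "not to copy below the actual data area" part concrete,
here is a user-space sketch of the same page walk, with memcpy()
standing in for copy_to_user() and a plain 2-D array standing in for
the holding pages. The SK_ constants, the tiny page size and the
function name are assumptions made for the illustration and do not
appear in the patch.

```c
#include <assert.h>
#include <string.h>

#define SK_PAGE_SIZE 16u	/* tiny page size, for illustration only */
#define SK_MAX_PAGES 4u		/* number of holding pages in this sketch */

/*
 * Copy bytes [p, SK_MAX_PAGES * SK_PAGE_SIZE) from the holding pages to
 * the area just below *sp, topmost page first. The lowest page is copied
 * from offset (p % SK_PAGE_SIZE) only, so nothing below p is transferred.
 * On return *sp points at the lowest byte written.
 */
static void sketch_transfer_args(unsigned char pages[SK_MAX_PAGES][SK_PAGE_SIZE],
				 unsigned long p, unsigned char *stack,
				 unsigned long *sp)
{
	unsigned long stop = p / SK_PAGE_SIZE;
	unsigned long index = SK_MAX_PAGES;
	unsigned long cur = *sp;

	/* Walk from the last page down to the page that contains p. */
	while (index-- > stop) {
		unsigned long offset = (index == stop) ? p % SK_PAGE_SIZE : 0;

		cur -= SK_PAGE_SIZE - offset;
		memcpy(stack + cur, pages[index] + offset,
		       SK_PAGE_SIZE - offset);
	}
	*sp = cur;
}
```

With p inside the third page, only the tail of that page plus the whole
fourth page end up on the stack, and the stack pointer comes back
lowered by exactly the number of bytes copied.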

Signed-off-by: Nicolas Pitre 
---
 fs/binfmt_elf_fdpic.c   | 38 ++
 fs/exec.c   | 33 +
 include/linux/binfmts.h |  2 ++
 3 files changed, 37 insertions(+), 36 deletions(-)

diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 203589311b..464a972e88 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -67,8 +67,6 @@ static int create_elf_fdpic_tables(struct linux_binprm *, struct mm_struct *,
   struct elf_fdpic_params *);
 
 #ifndef CONFIG_MMU
-static int elf_fdpic_transfer_args_to_stack(struct linux_binprm *,
-   unsigned long *);
 static int elf_fdpic_map_file_constdisp_on_uclinux(struct elf_fdpic_params *,
   struct file *,
   struct mm_struct *);
@@ -515,8 +513,9 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm,
sp = mm->start_stack;
 
/* stack the program arguments and environment */
-   if (elf_fdpic_transfer_args_to_stack(bprm, ) < 0)
+   if (transfer_args_to_stack(bprm, ) < 0)
return -EFAULT;
+   sp &= ~15;
 #endif
 
/*
@@ -711,39 +710,6 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm,
 
 /*/
 /*
- * transfer the program arguments and environment from the holding pages onto
- * the stack
- */
-#ifndef CONFIG_MMU
-static int elf_fdpic_transfer_args_to_stack(struct linux_binprm *bprm,
-   unsigned long *_sp)
-{
-   unsigned long index, stop, sp;
-   char *src;
-   int ret = 0;
-
-   stop = bprm->p >> PAGE_SHIFT;
-   sp = *_sp;
-
-   for (index = MAX_ARG_PAGES - 1; index >= stop; index--) {
-   src = kmap(bprm->page[index]);
-   sp -= PAGE_SIZE;
-   if (copy_to_user((void *) sp, src, PAGE_SIZE) != 0)
-   ret = -EFAULT;
-   kunmap(bprm->page[index]);
-   if (ret < 0)
-   goto out;
-   }
-
-   *_sp = (*_sp - (MAX_ARG_PAGES * PAGE_SIZE - bprm->p)) & ~15;
-
-out:
-   return ret;
-}
-#endif
-
-/*/
-/*
  * load the appropriate binary image (executable or interpreter) into memory
  * - we assume no MMU is available
  * - if no other PIC bits are set in params->hdr->e_flags
diff --git a/fs/exec.c b/fs/exec.c
index 887c1c955d..ef0df2f092 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -762,6 +762,39 @@ out_unlock:
 }
 EXPORT_SYMBOL(setup_arg_pages);
 
+#else
+
+/*
+ * Transfer the program arguments and environment from the holding pages
+ * onto the stack. The provided stack pointer is adjusted accordingly.
+ */
+int transfer_args_to_stack(struct linux_binprm *bprm,
+  unsigned long *sp_location)
+{
+   unsigned long index, stop, sp;
+   int ret = 0;
+
+   stop = bprm->p >> PAGE_SHIFT;
+   sp = *sp_location;
+
+   for (index = MAX_ARG_PAGES - 1; index >= stop; index--) {
+   unsigned int offset = index == stop ? bprm->p & ~PAGE_MASK : 0;
+   char *src = kmap(bprm->page[index]) + offset;
+   sp -= PAGE_SIZE - offset;
+   if (copy_to_user((void *) sp, src, PAGE_SIZE - offset) != 0)
+   ret = -EFAULT;
+   kunmap(bprm->page[index]);
+   if (ret)
+   goto out;
+   }
+
+   *sp_location = sp;
+
+out:
+   return ret;
+}
+EXPORT_SYMBOL(transfer_args_to_stack);
+
 #endif /* CONFIG_MMU */
 
 static struct file *do_open_execat(int fd, struct filename *name, int flags)
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 314b3caa70..1303b570b1 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -113,6 +113,8 @@ extern int suid_dumpable;
 extern int setup_arg_pages(struct linux_binprm * bprm,
   unsigned long stack_top,
   int executable_stack);
+extern int transfer_args_to_stack(struct linux_binprm *bprm,
+ unsigned long *sp_location);
 extern int bprm_change_interp(char *interp, struct linux_binprm *bprm);
 extern int copy_strings_kernel(int argc, const char *const *argv,
   struct linux_binprm *bprm);
-- 
2.7.4


