date:20180424

Re: [PATCH] dma-debug: Check scatterlist segments

This looks interesting.  I suspect it is going to blow up in
quite a few places, so maybe at least for now it might make sense
to have a separate config option?

On Tue, Apr 24, 2018 at 05:12:19PM +0100, Robin Murphy wrote:
> Drivers/subsystems creating scatterlists for DMA should be taking care
> to respect the scatter-gather limitations of the appropriate device, as
> described by dma_parms. A DMA API implementation cannot feasibly split
> a scatterlist into *more* entries than originally passed, so it is not
> well defined what they should do when given a segment larger than the
> limit they are also required to respect.
> 
> Conversely, devices which are less limited than the rather conservative
> defaults, or indeed have no limitations at all (e.g. GPUs with their own
> internal MMU), should be encouraged to set appropriate dma_parms, as
> they may get more efficient DMA mapping performance out of it.
> 
> Signed-off-by: Robin Murphy 
> ---
>  lib/dma-debug.c | 26 ++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/lib/dma-debug.c b/lib/dma-debug.c
> index 7f5cdc1e6b29..9f158941004d 100644
> --- a/lib/dma-debug.c
> +++ b/lib/dma-debug.c
> @@ -1293,6 +1293,30 @@ static void check_sync(struct device *dev,
>   put_hash_bucket(bucket, );
>  }
>  
> +static void check_sg_segment(struct device *dev, struct scatterlist *sg)
> +{
> + unsigned int max_seg = dma_get_max_seg_size(dev);
> + dma_addr_t start, end, boundary = dma_get_seg_boundary(dev);
> +
> + /*
> +  * Either the driver forgot to set dma_parms appropriately, or
> +  * whoever generated the list forgot to check them.
> +  */
> + if (sg->length > max_seg)
> + err_printk(dev, NULL, "DMA-API: mapping sg segment longer than device claims to support [len=%u] [max=%u]\n",
> +sg->length, max_seg);
> + /*
> +  * In some cases this could potentially be the DMA API
> +  * implementation's fault, but it would usually imply that
> +  * the scatterlist was built inappropriately to begin with.
> +  */
> + start = sg_dma_address(sg);
> + end = start + sg_dma_len(sg) - 1;
> + if ((start ^ end) & ~boundary)
> + err_printk(dev, NULL, "DMA-API: mapping sg segment across boundary [start=0x%016llx] [end=0x%016llx] [boundary=0x%016llx]\n",
> +start, end, boundary);
> +}
> +
>  void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
>   size_t size, int direction, dma_addr_t dma_addr,
>   bool map_single)
> @@ -1423,6 +1447,8 @@ void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   check_for_illegal_area(dev, sg_virt(s), sg_dma_len(s));
>   }
>  
> + check_sg_segment(dev, s);
> +
>   add_dma_entry(entry);
>   }
>  }
> -- 
> 2.17.0.dirty
---end quoted text---
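
The two checks in check_sg_segment() can be tried outside the kernel.
What follows is only a user-space sketch, not kernel code: struct seg,
seg_too_long() and seg_crosses_boundary() are hypothetical names, and the
boundary is assumed to be a mask of the form 2^n - 1, which is how the
quoted patch uses the value from dma_get_seg_boundary().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one mapped scatterlist entry. */
struct seg {
	uint64_t dma_addr;	/* bus address of the first byte */
	uint32_t len;		/* length in bytes, must be non-zero */
};

/* True if the segment is longer than the device's declared maximum. */
static bool seg_too_long(const struct seg *s, uint32_t max_seg)
{
	return s->len > max_seg;
}

/*
 * True if the first and last byte fall in different boundary windows.
 * With a mask such as 0xffff (64 KiB windows), any differing bit above
 * the mask between start and end means a window edge was crossed.
 */
static bool seg_crosses_boundary(const struct seg *s, uint64_t boundary)
{
	uint64_t start, end;

	assert(s->len > 0);
	assert((boundary & (boundary + 1)) == 0);	/* mask is 2^n - 1 */

	start = s->dma_addr;
	end = start + s->len - 1;
	return ((start ^ end) & ~boundary) != 0;
}
```

For example, with a 0xffff boundary a 16-byte segment at 0xfff0 ends at
0xffff and stays inside one window, while a 17-byte segment at the same
address ends at 0x10000 and is reported as crossing.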

Re: [RFT][PATCH] arm64: dts: exynos: Remove unneeded address space mapping for soc node

2018-04-24 Thread Marek Szyprowski

Hi Krzysztof,

On 2018-04-24 19:36, Krzysztof Kozlowski wrote:
> Remove the address space mapping between root and soc nodes to fix
> DTC warnings in Exynos5433 and Exynos7 like:
>
>  arch/arm64/boot/dts/exynos/exynos5433-tm2.dtb:
>  Warning (unit_address_vs_reg): /soc: node has a reg or ranges property, but no unit name
>
> Signed-off-by: Krzysztof Kozlowski 

Works fine on Samsung Exynos5433-based TM2e board.

Tested-by: Marek Szyprowski 

> ---
>
> Not tested.
> ---
>   arch/arm64/boot/dts/exynos/exynos5433.dtsi | 6 +++---
>   arch/arm64/boot/dts/exynos/exynos7.dtsi| 6 +++---
>   2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/exynos/exynos5433.dtsi b/arch/arm64/boot/dts/exynos/exynos5433.dtsi
> index 828996a06610..ba8157ceaa56 100644
> --- a/arch/arm64/boot/dts/exynos/exynos5433.dtsi
> +++ b/arch/arm64/boot/dts/exynos/exynos5433.dtsi
> @@ -18,8 +18,8 @@
>   
>   / {
>   compatible = "samsung,exynos5433";
> - #address-cells = <2>;
> - #size-cells = <2>;
> + #address-cells = <1>;
> + #size-cells = <1>;
>   
>   interrupt-parent = <>;
>   
> @@ -235,7 +235,7 @@
>   compatible = "simple-bus";
>   #address-cells = <1>;
>   #size-cells = <1>;
> - ranges = <0x0 0x0 0x0 0x1800>;
> + ranges;
>   
>   arm_a53_pmu {
>   compatible = "arm,cortex-a53-pmu", "arm,armv8-pmuv3";
> diff --git a/arch/arm64/boot/dts/exynos/exynos7.dtsi b/arch/arm64/boot/dts/exynos/exynos7.dtsi
> index 0b98d2334cad..93a84338938a 100644
> --- a/arch/arm64/boot/dts/exynos/exynos7.dtsi
> +++ b/arch/arm64/boot/dts/exynos/exynos7.dtsi
> @@ -12,8 +12,8 @@
>   / {
>   compatible = "samsung,exynos7";
>   interrupt-parent = <>;
> - #address-cells = <2>;
> - #size-cells = <2>;
> + #address-cells = <1>;
> + #size-cells = <1>;
>   
>   aliases {
>   pinctrl0 = _alive;
> @@ -70,7 +70,7 @@
>   compatible = "simple-bus";
>   #address-cells = <1>;
>   #size-cells = <1>;
> - ranges = <0 0 0 0x1800>;
> + ranges;
>   
>   chipid@1000 {
>   compatible = "samsung,exynos4210-chipid";

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland
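
The effect of replacing an explicit ranges entry with an empty "ranges;"
can be sketched in user-space C. This is an illustration under stated
assumptions, not the kernel's address translation code: struct dt_range
and dt_translate() are hypothetical names, and a NULL range stands in
for the empty property, which the Devicetree specification defines as an
identity mapping between child and parent address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of one <child parent size> ranges entry. */
struct dt_range {
	uint64_t child_base;
	uint64_t parent_base;
	uint64_t size;
};

/*
 * Translate a child-bus address to the parent bus.  A NULL range models
 * the empty "ranges;" property: addresses pass through unchanged.
 * Returns false if the address lies outside the mapped window.
 */
static bool dt_translate(const struct dt_range *r, uint64_t child_addr,
			 uint64_t *parent_addr)
{
	assert(parent_addr);

	if (!r) {
		*parent_addr = child_addr;
		return true;
	}
	if (child_addr < r->child_base ||
	    child_addr - r->child_base >= r->size)
		return false;

	*parent_addr = r->parent_base + (child_addr - r->child_base);
	return true;
}
```

With an explicit entry, an address past the window size fails to
translate; with the empty property every address maps to itself, which
is why the patch also makes the root node use the same cell counts as
the soc node.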

[PATCH] sched: fix typo in error message

2018-04-24 Thread Li Bin


Signed-off-by: Li Bin 
---
 kernel/sched/topology.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 64cc564..cf15c1c 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1618,7 +1618,7 @@ static struct sched_domain *build_sched_domain(struct sched_domain_topology_leve
 
if (!cpumask_subset(sched_domain_span(child),
sched_domain_span(sd))) {
-   pr_err("BUG: arch topology borken\n");
+   pr_err("BUG: arch topology broken\n");
 #ifdef CONFIG_SCHED_DEBUG
pr_err(" the %s domain not a subset of the %s domain\n",
child->name, sd->name);
-- 
1.7.12.4

Re: [PATCH 1/6] virtio_console: don't tie bufs to a vq

On Tue, Apr 24, 2018 at 09:56:33PM +0300, Michael S. Tsirkin wrote:
> On Sat, Apr 21, 2018 at 09:30:05AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Apr 20, 2018 at 09:18:01PM +0300, Michael S. Tsirkin wrote:
> > > an allocated buffer doesn't need to be tied to a vq -
> > > only vq->vdev is ever used. Pass the function
> > > just what it needs - the vdev.
> > > 
> > > Signed-off-by: Michael S. Tsirkin 
> > > ---
> > >  drivers/char/virtio_console.c | 14 +++---
> > >  1 file changed, 7 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> > > index 468f061..3e56f32 100644
> > > --- a/drivers/char/virtio_console.c
> > > +++ b/drivers/char/virtio_console.c
> > > @@ -422,7 +422,7 @@ static void reclaim_dma_bufs(void)
> > >   }
> > >  }
> > >  
> > > -static struct port_buffer *alloc_buf(struct virtqueue *vq, size_t buf_size,
> > > +static struct port_buffer *alloc_buf(struct virtio_device *vdev, size_t buf_size,
> > >int pages)
> > >  {
> > >   struct port_buffer *buf;
> > > @@ -445,16 +445,16 @@ static struct port_buffer *alloc_buf(struct virtqueue *vq, size_t buf_size,
> > >   return buf;
> > >   }
> > >  
> > > - if (is_rproc_serial(vq->vdev)) {
> > > + if (is_rproc_serial(vdev)) {
> > >   /*
> > >* Allocate DMA memory from ancestor. When a virtio
> > >* device is created by remoteproc, the DMA memory is
> > >* associated with the grandparent device:
> > >* vdev => rproc => platform-dev.
> > >*/
> > > - if (!vq->vdev->dev.parent || !vq->vdev->dev.parent->parent)
> > > + if (!vdev->dev.parent || !vdev->dev.parent->parent)
> > >   goto free_buf;
> > > - buf->dev = vq->vdev->dev.parent->parent;
> > > + buf->dev = vdev->dev.parent->parent;
> > >  
> > >   /* Increase device refcnt to avoid freeing it */
> > >   get_device(buf->dev);
> > > @@ -838,7 +838,7 @@ static ssize_t port_fops_write(struct file *filp, const char __user *ubuf,
> > >  
> > >   count = min((size_t)(32 * 1024), count);
> > >  
> > > - buf = alloc_buf(port->out_vq, count, 0);
> > > + buf = alloc_buf(port->portdev->vdev, count, 0);
> > >   if (!buf)
> > >   return -ENOMEM;
> > >  
> > > @@ -957,7 +957,7 @@ static ssize_t port_fops_splice_write(struct pipe_inode_info *pipe,
> > >   if (ret < 0)
> > >   goto error_out;
> > >  
> > > - buf = alloc_buf(port->out_vq, 0, pipe->nrbufs);
> > > + buf = alloc_buf(port->portdev->vdev, 0, pipe->nrbufs);
> > >   if (!buf) {
> > >   ret = -ENOMEM;
> > >   goto error_out;
> > > @@ -1374,7 +1374,7 @@ static unsigned int fill_queue(struct virtqueue *vq, spinlock_t *lock)
> > >  
> > >   nr_added_bufs = 0;
> > >   do {
> > > - buf = alloc_buf(vq, PAGE_SIZE, 0);
> > > + buf = alloc_buf(vq->vdev, PAGE_SIZE, 0);
> > >   if (!buf)
> > >   break;
> > >  
> > > -- 
> > > MST
> > 
> > 
> > 
> > This is not the correct way to submit patches for inclusion in the
> > stable kernel tree.  Please read:
> > https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> > for how to do this properly.
> > 
> > 
> 
> 
> Thanks!
> I have some questions about this one:
> 
> Cc:  # 3.3.x: a1f84a3: sched: Check for idle
> Cc:  # 3.3.x: 1b9508f: sched: Rate-limit newidle
> Cc:  # 3.3.x: fd21073: sched: Fix affinity logic
> Cc:  # 3.3.x
> Signed-off-by: Ingo Molnar 
> 
> 1. what does the kernel version mean? can I omit it?

Did you read the document?  It explains that the version can be used to
say "this kernel version and newer".

> 2. so when I rebase to add the tag, this changes commit IDs for
>following tags in the same tree, breaking their tags
>in the process. Pretty annoying. Any idea how to do it better?

You only put tags there if you want me to pick up pre-requisite patches
that are already in Linus's tree.  If you have a patch series that all
needs to go into stable, just add the "cc: stable@" to the tags on all
of them and I'll pick them up in the correct order then.

hope this helps,

greg k-h

[PATCH v4 2/2] tty/nozomi: fix inconsistent indentation

Correct misaligned indentation and remove extraneous spaces.

Signed-off-by: Joey Pabalinas 
---
 drivers/tty/nozomi.c | 74 ++--
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/drivers/tty/nozomi.c b/drivers/tty/nozomi.c
index f26bf1d1e9ee0e74eb..0fcb4db721d2a42f08 100644
--- a/drivers/tty/nozomi.c
+++ b/drivers/tty/nozomi.c
@@ -100,45 +100,45 @@ do { \
 /* Size of tmp send buffer to card */
 #define SEND_BUF_MAX   1024
 #define RECEIVE_BUF_MAX 4
 
 
-#define R_IIR  0x  /* Interrupt Identity Register */
-#define R_FCR  0x  /* Flow Control Register */
-#define R_IER  0x0004  /* Interrupt Enable Register */
+#define R_IIR  0x  /* Interrupt Identity Register */
+#define R_FCR  0x  /* Flow Control Register */
+#define R_IER  0x0004  /* Interrupt Enable Register */
 
 #define NOZOMI_CONFIG_MAGIC 0xEFEFFEFE
 #define TOGGLE_VALID   0x
 
 /* Definition of interrupt tokens */
-#define MDM_DL1    0x0001
-#define MDM_UL1    0x0002
-#define MDM_DL2    0x0004
-#define MDM_UL2    0x0008
-#define DIAG_DL1   0x0010
-#define DIAG_DL2   0x0020
-#define DIAG_UL    0x0040
-#define APP1_DL    0x0080
-#define APP1_UL    0x0100
-#define APP2_DL    0x0200
-#define APP2_UL    0x0400
-#define CTRL_DL    0x0800
-#define CTRL_UL    0x1000
-#define RESET      0x8000
+#define MDM_DL1    0x0001
+#define MDM_UL1    0x0002
+#define MDM_DL2    0x0004
+#define MDM_UL2    0x0008
+#define DIAG_DL1   0x0010
+#define DIAG_DL2   0x0020
+#define DIAG_UL    0x0040
+#define APP1_DL    0x0080
+#define APP1_UL    0x0100
+#define APP2_DL    0x0200
+#define APP2_UL    0x0400
+#define CTRL_DL    0x0800
+#define CTRL_UL    0x1000
+#define RESET      0x8000
 
-#define MDM_DL     (MDM_DL1  | MDM_DL2)
-#define MDM_UL     (MDM_UL1  | MDM_UL2)
-#define DIAG_DL    (DIAG_DL1 | DIAG_DL2)
+#define MDM_DL     (MDM_DL1  | MDM_DL2)
+#define MDM_UL     (MDM_UL1  | MDM_UL2)
+#define DIAG_DL    (DIAG_DL1 | DIAG_DL2)
 
 /* modem signal definition */
-#define CTRL_DSR   0x0001
-#define CTRL_DCD   0x0002
-#define CTRL_RI    0x0004
-#define CTRL_CTS   0x0008
+#define CTRL_DSR   0x0001
+#define CTRL_DCD   0x0002
+#define CTRL_RI    0x0004
+#define CTRL_CTS   0x0008
 
-#define CTRL_DTR   0x0001
-#define CTRL_RTS   0x0002
+#define CTRL_DTR   0x0001
+#define CTRL_RTS   0x0002
 
 #define MAX_PORT   4
 #define NOZOMI_MAX_PORTS   5
 #define NOZOMI_MAX_CARDS   (NTTY_TTY_MAXMINORS / MAX_PORT)
 
@@ -363,11 +363,11 @@ struct nozomi {
 struct buffer {
u32 size;   /* size is the length of the data buffer */
u8 *data;
 } __attribute__ ((packed));
 
-/*Global variables */
+/* Global variables */
 static const struct pci_device_id nozomi_pci_tbl[] = {
{PCI_DEVICE(0x1931, 0x000c)},   /* Nozomi HSDPA */
{},
 };
 
@@ -1684,16 +1684,16 @@ static int ntty_tiocmget(struct tty_struct *tty)
const struct ctrl_dl *ctrl_dl = >ctrl_dl;
const struct ctrl_ul *ctrl_ul = >ctrl_ul;
 
/* Note: these could change under us but it is not clear this
   matters if so */
-   return  (ctrl_ul->RTS ? TIOCM_RTS : 0) |
-   (ctrl_ul->DTR ? TIOCM_DTR : 0) |
-   (ctrl_dl->DCD ? TIOCM_CAR : 0) |
-   (ctrl_dl->RI  ? TIOCM_RNG : 0) |
-   (ctrl_dl->DSR ? TIOCM_DSR : 0) |
-   (ctrl_dl->CTS ? TIOCM_CTS : 0);
+   return (ctrl_ul->RTS ? TIOCM_RTS : 0)
+   | (ctrl_ul->DTR ? TIOCM_DTR : 0)
+   | (ctrl_dl->DCD ? TIOCM_CAR : 0)
+   | (ctrl_dl->RI  ? TIOCM_RNG : 0)
+   | (ctrl_dl->DSR ? TIOCM_DSR : 0)
+   | (ctrl_dl->CTS ? TIOCM_CTS : 0);
 }
 
 /* Sets io controls parameters */
 static int ntty_tiocmset(struct tty_struct *tty,
unsigned int set, unsigned int clear)
@@ -1720,14 +1720,14 @@ static int ntty_cflags_changed(struct port *port, unsigned long flags,
struct async_icount *cprev)
 {
const struct async_icount cnow = port->tty_icount;
int ret;
 
-   ret =   ((flags & TIOCM_RNG) && (cnow.rng != cprev->rng)) ||
-   ((flags & TIOCM_DSR) && (cnow.dsr != cprev->dsr)) ||
-   ((flags & TIOCM_CD)  && (cnow.dcd !=

Re: [Linaro-mm-sig] [PATCH 4/8] dma-buf: add peer2peer flag

On Tue, Apr 24, 2018 at 09:32:20PM +0200, Daniel Vetter wrote:
> Out of curiosity, how much virtual flushing stuff is there still out
> there? At least in drm we've pretty much ignored this, and seem to be
> getting away without a huge uproar (at least from driver developers
> and users, core folks are less amused about that).

As I've just been wading through the code, the following architectures
have non-coherent dma that flushes by virtual address for at least some
platforms:

 - arm [1], arm64, hexagon, nds32, nios2, parisc, sh, xtensa, mips,
   powerpc

These have non-coherent dma ops that flush by physical address:

 - arc, arm [1], c6x, m68k, microblaze, openrisc, sparc

And these do not have non-coherent dma ops at all:

 - alpha, h8300, riscv, unicore32, x86

[1] arm seems to do both virtually and physically based ops, further
audit needed.

Note that using virtual addresses in the cache flushing interface
doesn't mean that the cache actually is virtually indexed, but it at
least allows for the possibility.

> > I think the most important thing about such a buffer object is that
> > it can distinguish the underlying mapping types.  While
> > dma_alloc_coherent, dma_alloc_attrs with DMA_ATTR_NON_CONSISTENT,
> > dma_map_page/dma_map_single/dma_map_sg and dma_map_resource all give
> > back a dma_addr_t they are in no way interchangeable.  And trying to
> > stuff them all into a structure like struct scatterlist that has
> > no indication what kind of mapping you are dealing with is just
> > asking for trouble.
> 
> Well the idea was to have 1 interface to allow all drivers to share
> buffers with anything else, no matter how exactly they're allocated.

Isn't that interface supposed to be dmabuf?  Currently dma_map leaks
a scatterlist through the sg_table in dma_buf_map_attachment /
->map_dma_buf, but looking at a few of the callers it seems like they
really do not even want a scatterlist to start with, but check that
it contains a physically contiguous range first.  So kicking the
scatterlist out of there will probably improve the interface in general.

> dma-buf has all the functions for flushing, so you can have coherent
> mappings, non-coherent mappings and pretty much anything else. Or well
> could, because in practice people hack up layering violations until it
> works for the 2-3 drivers they care about. On top of that there's the
> small issue that x86 insists that dma is coherent (and that's true for
> most devices, including v4l drivers you might want to share stuff
> with), and gpus really, really really do want to make almost
> everything incoherent.

How do discrete GPUs manage to be incoherent when attached over PCIe?
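
The contiguity check mentioned above, where importers walk the returned
scatterlist only to confirm it describes one contiguous range, looks
roughly like the sketch below. It is user-space C under stated
assumptions, not driver code: struct dma_seg and segs_contiguous() are
hypothetical names standing in for a mapped sg_table and the loop a
driver would write with for_each_sg().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one mapped scatterlist entry. */
struct dma_seg {
	uint64_t addr;	/* DMA address of the segment */
	uint32_t len;	/* length in bytes */
};

/*
 * True if every segment starts exactly where the previous one ended,
 * i.e. the whole list describes a single contiguous DMA range.
 */
static bool segs_contiguous(const struct dma_seg *segs, size_t n)
{
	uint64_t expected;
	size_t i;

	assert(segs || n == 0);
	if (n == 0)
		return true;

	expected = segs[0].addr;
	for (i = 0; i < n; i++) {
		if (segs[i].addr != expected)
			return false;
		expected = segs[i].addr + segs[i].len;
	}
	return true;
}
```

An importer that rejects anything failing this test only ever needed a
base address and a length, which is the argument for not handing it a
scatterlist in the first place.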

Re: [PATCH] drm/mediatek: Use ERR_CAST instead of ERR_PTR(PTR_ERR())

2018-04-24 Thread CK Hu

Hi, Vasyl:

Sorry for the late reply.
I've applied this to my branch mediatek-drm-next-4.18

Regards,
CK

On Thu, 2017-11-23 at 17:31 +0800, Philipp Zabel wrote:
> On Tue, 2017-11-21 at 23:31 +0100, Vasyl Gomonovych wrote:
> > Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...)).
> > 
> > drivers/gpu/drm/mediatek/mtk_drm_gem.c:223:9-16: WARNING: ERR_CAST can be 
> > used with mtk_gem
> > Generated by: scripts/coccinelle/api/err_cast.cocci
> > 
> > Signed-off-by: Vasyl Gomonovych 
> > ---
> >  drivers/gpu/drm/mediatek/mtk_drm_gem.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/mediatek/mtk_drm_gem.c b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
> > index f595ac816b55..5766b42fc174 100644
> > --- a/drivers/gpu/drm/mediatek/mtk_drm_gem.c
> > +++ b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
> > @@ -220,7 +220,7 @@ struct drm_gem_object *mtk_gem_prime_import_sg_table(struct drm_device *dev,
> > mtk_gem = mtk_drm_gem_init(dev, attach->dmabuf->size);
> >  
> > if (IS_ERR(mtk_gem))
> > -   return ERR_PTR(PTR_ERR(mtk_gem));
> > +   return ERR_CAST(mtk_gem);
> >  
> > expected = sg_dma_address(sg->sgl);
> > for_each_sg(sg->sgl, s, sg->nents, i) {
> 
> Acked-by: Philipp Zabel 
> 
> regards
> Philipp
>
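
For readers unfamiliar with the idiom, here is a small userspace model of
why the two spellings are interchangeable. The helper names mirror the
ones in <linux/err.h>, but the bodies are simplified re-implementations
written for this sketch, and the two struct types and import_old() /
import_new() are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace models of the kernel error-pointer helpers. */
void *ERR_PTR(long error) { return (void *)error; }
long PTR_ERR(const void *ptr) { return (long)ptr; }
bool IS_ERR(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
/* Re-types an error pointer without decoding and re-encoding it. */
void *ERR_CAST(const void *ptr) { return (void *)ptr; }

/* Hypothetical object types standing in for mtk_gem and drm_gem_object. */
struct mtk_gem_model { int size; };
struct drm_obj_model { int id; };

/* Old spelling: decode the errno, then encode it again. */
struct drm_obj_model *import_old(struct mtk_gem_model *g)
{
	return ERR_PTR(PTR_ERR(g));
}

/* New spelling: pass the error pointer through unchanged. */
struct drm_obj_model *import_new(struct mtk_gem_model *g)
{
	return ERR_CAST(g);
}
```

Both functions return the same pointer value for an error input; ERR_CAST
just states the intent more directly.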

[PATCH v4 1/2] tty/nozomi: cleanup DUMP() macro

Replace snprintf() with strscpy() and use min_t() instead of
the conditional operator to clamp buffer length.

Signed-off-by: Joey Pabalinas 
---
 drivers/tty/nozomi.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/tty/nozomi.c b/drivers/tty/nozomi.c
index b57b35066ebea94639..f26bf1d1e9ee0e74eb 100644
--- a/drivers/tty/nozomi.c
+++ b/drivers/tty/nozomi.c
@@ -70,23 +70,23 @@ do { \
 
 /* TODO: rewrite to optimize macros... */
 
 #define TMP_BUF_MAX 256
 
-#define DUMP(buf__,len__) \
-  do {  \
-char tbuf[TMP_BUF_MAX] = {0};\
-if (len__ > 1) {\
-   snprintf(tbuf, len__ > TMP_BUF_MAX ? TMP_BUF_MAX : len__, "%s", buf__);\
-   if (tbuf[len__-2] == '\r') {\
-   tbuf[len__-2] = 'r';\
-   } \
-   DBG1("SENDING: '%s' (%d+n)", tbuf, len__);\
-} else {\
-   DBG1("SENDING: '%s' (%d)", tbuf, len__);\
-} \
-} while (0)
+#define DUMP(buf__, len__) \
+   do {\
+   char tbuf[TMP_BUF_MAX] = {0};   \
+   if (len__ > 1) {\
+   u32 data_len = min_t(u32, len__, TMP_BUF_MAX);  \
+   strscpy(tbuf, buf__, data_len); \
+   if (tbuf[data_len - 2] == '\r') \
+   tbuf[data_len - 2] = 'r';   \
+   DBG1("SENDING: '%s' (%d+n)", tbuf, len__);  \
+   } else {\
+   DBG1("SENDING: '%s' (%d)", tbuf, len__);\
+   }   \
+   } while (0)
 
 /*Defines */
 #define NOZOMI_NAME"nozomi"
 #define NOZOMI_NAME_TTY"nozomi_tty"
 #define DRIVER_DESC"Nozomi driver"
-- 
2.17.0.rc1.35.g90bbd502d54fe92035.dirty
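
The logic of the new DUMP() body can be exercised outside the kernel with
a sketch like the following. min_u32(), bounded_copy() and dump_prepare()
are hypothetical helpers written for this illustration: bounded_copy()
only approximates strscpy() (it returns -1 on truncation where the kernel
helper returns -E2BIG), and min_u32() stands in for min_t(u32, ...).

```c
#include <stddef.h>
#include <string.h>

#define TMP_BUF_MAX 256

/* Plain-function equivalent of min_t(u32, a, b) for this sketch. */
unsigned int min_u32(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Copy at most size - 1 bytes and always NUL-terminate. Returns the
 * number of bytes copied, or -1 if the source did not fit.
 */
long bounded_copy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -1;
	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';
	return src[i] == '\0' ? (long)i : -1;
}

/* Mirrors the macro body: clamp, copy, then rewrite a trailing '\r'. */
void dump_prepare(char tbuf[TMP_BUF_MAX], const char *buf, unsigned int len)
{
	memset(tbuf, 0, TMP_BUF_MAX);
	if (len > 1) {
		unsigned int data_len = min_u32(len, TMP_BUF_MAX);

		bounded_copy(tbuf, buf, data_len);
		if (tbuf[data_len - 2] == '\r')
			tbuf[data_len - 2] = 'r';
	}
}
```

Indexing with the clamped data_len rather than the raw len__ keeps the
'\r' check inside tbuf even when the caller passes a length larger than
TMP_BUF_MAX.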

[PATCH v4 0/2] tty/nozomi: general module cleanup

The nozomi module has a few sections which could use a bit of cleanup;
both style and clarity could be improved while maintaining equivalent
semantics.

Clean up messy portions of the module code while preserving existing
behavior by:

 - Replacing constructs like `len__ > TMP_BUF_MAX ? TMP_BUF_MAX : len__`
   with `min_t(u32, len__, TMP_BUF_MAX)` and function calls like
   snprintf(tbuf, ..., "%s", ...) with strscpy(tbuf, ..., ...).
 - Correcting inconsistently indented lines and extraneous whitespace.

CC: Greg Kroah-Hartman 
CC: Arnd Bergmann 
CC: Jiri Slaby 

Joey Pabalinas (2):
  tty/nozomi: cleanup DUMP() macro
  tty/nozomi: fix inconsistent indentation

 drivers/tty/nozomi.c | 100 +--
 1 file changed, 50 insertions(+), 50 deletions(-)

-- 
2.17.0.rc1.35.g90bbd502d54fe92035.dirty

[PATCH 2/5] f2fs: avoid bug_on on corrupted inode

syzbot has tested the proposed patch but the reproducer still triggered crash:
kernel BUG at fs/f2fs/inode.c:LINE!

F2FS-fs (loop1): invalid crc value
F2FS-fs (loop5): Magic Mismatch, valid(0xf2f52010) - read(0x0)
F2FS-fs (loop5): Can't find valid F2FS filesystem in 1th superblock
F2FS-fs (loop5): invalid crc value
[ cut here ]
kernel BUG at fs/f2fs/inode.c:238!
invalid opcode:  [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 4886 Comm: syz-executor1 Not tainted 4.17.0-rc1+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
RIP: 0010:do_read_inode fs/f2fs/inode.c:238 [inline]
RIP: 0010:f2fs_iget+0x3307/0x3ca0 fs/f2fs/inode.c:313
RSP: 0018:8801c44a70e8 EFLAGS: 00010293
RAX: 8801ce208040 RBX: 8801b3621080 RCX: 82eace18
F2FS-fs (loop2): Magic Mismatch, valid(0xf2f52010) - read(0x0)
RDX:  RSI: 82eaf047 RDI: 0007
RBP: 8801c44a7410 R08: 8801ce208040 R09: ed0039ee4176
R10: ed0039ee4176 R11: 8801cf720bb7 R12: 8801c0efa000
R13: 0003 R14:  R15: 
FS:  7f753aa9d700() GS:8801daf0() knlGS:
[ cut here ]
CS:  0010 DS:  ES:  CR0: 80050033
kernel BUG at fs/f2fs/inode.c:238!
CR2: 01b03018 CR3: 0001c8b74000 CR4: 001406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 f2fs_fill_super+0x4377/0x7bf0 fs/f2fs/super.c:2842
 mount_bdev+0x30c/0x3e0 fs/super.c:1165
 f2fs_mount+0x34/0x40 fs/f2fs/super.c:3020
 mount_fs+0xae/0x328 fs/super.c:1268
 vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:1027 [inline]
 do_new_mount fs/namespace.c:2517 [inline]
 do_mount+0x564/0x3070 fs/namespace.c:2847
 ksys_mount+0x12d/0x140 fs/namespace.c:3063
 __do_sys_mount fs/namespace.c:3077 [inline]
 __se_sys_mount fs/namespace.c:3074 [inline]
 __x64_sys_mount+0xbe/0x150 fs/namespace.c:3074
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457daa
RSP: 002b:7f753aa9cba8 EFLAGS: 0246 ORIG_RAX: 00a5
RAX: ffda RBX: 2000 RCX: 00457daa
RDX: 2000 RSI: 2100 RDI: 7f753aa9cbf0
RBP: 0064 R08: 20016a00 R09: 2000
R10:  R11: 0246 R12: 0003
R13: 0064 R14: 006fcb80 R15: 
RIP: do_read_inode fs/f2fs/inode.c:238 [inline] RSP: 8801c44a70e8
RIP: f2fs_iget+0x3307/0x3ca0 fs/f2fs/inode.c:313 RSP: 8801c44a70e8
invalid opcode:  [#2] SMP KASAN
---[ end trace 1cbcbec2156680bc ]---

Reported-and-tested-by: syzbot+41a1b341571f0952b...@syzkaller.appspotmail.com
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/inode.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 414b1ede642b..7f2fe4574c48 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -185,6 +185,21 @@ void f2fs_inode_chksum_set(struct f2fs_sb_info *sbi, struct page *page)
ri->i_inode_checksum = cpu_to_le32(f2fs_inode_chksum(sbi, page));
 }
 
+static bool sanity_check_inode(struct inode *inode)
+{
+   struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+
+   if (f2fs_sb_has_flexible_inline_xattr(sbi->sb)
+   && !f2fs_has_extra_attr(inode)) {
+   set_sbi_flag(sbi, SBI_NEED_FSCK);
+   f2fs_msg(sbi->sb, KERN_WARNING,
+   "%s: corrupted inode ino=%lx, run fsck to fix.",
+   __func__, inode->i_ino);
+   return false;
+   }
+   return true;
+}
+
 static int do_read_inode(struct inode *inode)
 {
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
@@ -235,7 +250,6 @@ static int do_read_inode(struct inode *inode)
le16_to_cpu(ri->i_extra_isize) : 0;
 
if (f2fs_sb_has_flexible_inline_xattr(sbi->sb)) {
-   f2fs_bug_on(sbi, !f2fs_has_extra_attr(inode));
fi->i_inline_xattr_size = le16_to_cpu(ri->i_inline_xattr_size);
} else if (f2fs_has_inline_xattr(inode) ||
f2fs_has_inline_dentry(inode)) {
@@ -313,6 +327,10 @@ struct inode *f2fs_iget(struct super_block *sb, unsigned long ino)
ret = do_read_inode(inode);
if (ret)
goto bad_inode;
+   if (!sanity_check_inode(inode)) {
+   ret = -EINVAL;
+   goto bad_inode;
+   }
 make_now:
if (ino == F2FS_NODE_INO(sbi)) {
inode->i_mapping->a_ops = &f2fs_node_aops;
-- 
2.17.0.484.g0c8726318c-goog
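
The shape of the change, turning an assertion on on-disk state into a
validation step that fails the lookup, can be sketched in userspace as
follows. Everything here is a simplified, hypothetical model: the structs
and the *_model functions are not f2fs code, and the need_fsck field only
imitates the effect of setting SBI_NEED_FSCK.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the superblock state. */
struct sb_model {
	bool has_flexible_inline_xattr;	/* superblock feature bit */
	bool need_fsck;			/* imitates SBI_NEED_FSCK */
};

/* Hypothetical, simplified model of an inode read from disk. */
struct inode_model {
	struct sb_model *sb;
	bool has_extra_attr;		/* per-inode flag from disk */
};

/* Returns false for an inconsistent inode instead of crashing. */
bool sanity_check_inode_model(struct inode_model *inode)
{
	if (inode->sb->has_flexible_inline_xattr && !inode->has_extra_attr) {
		inode->sb->need_fsck = true;
		return false;
	}
	return true;
}

/* Models the caller: a failed check becomes -EINVAL for the lookup. */
int iget_model(struct inode_model *inode)
{
	if (!sanity_check_inode_model(inode))
		return -EINVAL;
	return 0;
}
```

With this structure a corrupted image makes the mount fail with an error
and flags the filesystem for fsck, rather than hitting a BUG.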

[PATCH 4/5] f2fs: sanity check for total valid blocks

This patch enhances the sanity check for SIT entries.

syzbot hit the following crash on upstream commit
83beed7b2b26f232d782127792dd0cd4362fdc41 (Fri Apr 20 17:56:32 2018 +)
Merge branch 'fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
syzbot dashboard link: 
https://syzkaller.appspot.com/bug?extid=bf9253040425feb155ad

syzkaller reproducer: 
https://syzkaller.appspot.com/x/repro.syz?id=5692130282438656
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5095924598571008
Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118
compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+bf9253040425feb15...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.

F2FS-fs (loop0): invalid crc value
F2FS-fs (loop0): Try to recover 1th superblock, ret: 0
F2FS-fs (loop0): Mounted with checkpoint version = d
F2FS-fs (loop0): Bitmap was wrongly cleared, blk:9740
[ cut here ]
kernel BUG at fs/f2fs/segment.c:1884!
invalid opcode:  [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 4508 Comm: syz-executor0 Not tainted 4.17.0-rc1+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
RIP: 0010:update_sit_entry+0x1215/0x1590 fs/f2fs/segment.c:1882
RSP: 0018:8801af526708 EFLAGS: 00010282
RAX: ed0035ea4cc0 RBX: 8801ad454f90 RCX: 
RDX:  RSI: 82eeb87e RDI: ed0035ea4cb6
RBP: 8801af526760 R08: 8801ad4a2480 R09: ed003b5e4f90
R10: ed003b5e4f90 R11: 8801daf27c87 R12: 8801adb8d380
R13: 0001 R14: 0008 R15: 
FS:  014af940() GS:8801daf0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f06bc223000 CR3: 0001adb02000 CR4: 001406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 allocate_data_block+0x66f/0x2050 fs/f2fs/segment.c:2663
 do_write_page+0x105/0x1b0 fs/f2fs/segment.c:2727
 write_node_page+0x129/0x350 fs/f2fs/segment.c:2770
 __write_node_page+0x7da/0x1370 fs/f2fs/node.c:1398
 sync_node_pages+0x18cf/0x1eb0 fs/f2fs/node.c:1652
 block_operations+0x429/0xa60 fs/f2fs/checkpoint.c:1088
 write_checkpoint+0x3ba/0x5380 fs/f2fs/checkpoint.c:1405
 f2fs_sync_fs+0x2fb/0x6a0 fs/f2fs/super.c:1077
 __sync_filesystem fs/sync.c:39 [inline]
 sync_filesystem+0x265/0x310 fs/sync.c:67
 generic_shutdown_super+0xd7/0x520 fs/super.c:429
 kill_block_super+0xa4/0x100 fs/super.c:1191
 kill_f2fs_super+0x9f/0xd0 fs/f2fs/super.c:3030
 deactivate_locked_super+0x97/0x100 fs/super.c:316
 deactivate_super+0x188/0x1b0 fs/super.c:347
 cleanup_mnt+0xbf/0x160 fs/namespace.c:1174
 __cleanup_mnt+0x16/0x20 fs/namespace.c:1181
 task_work_run+0x1e4/0x290 kernel/task_work.c:113
 tracehook_notify_resume include/linux/tracehook.h:191 [inline]
 exit_to_usermode_loop+0x2bd/0x310 arch/x86/entry/common.c:166
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
 do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457d97
RSP: 002b:7ffd46f9c8e8 EFLAGS: 0246 ORIG_RAX: 00a6
RAX:  RBX:  RCX: 00457d97
RDX: 014b09a3 RSI: 0002 RDI: 7ffd46f9da50
RBP: 7ffd46f9da50 R08:  R09: 0009
R10: 0005 R11: 0246 R12: 014b0940
R13:  R14: 0002 R15: 658e
RIP: update_sit_entry+0x1215/0x1590 fs/f2fs/segment.c:1882 RSP: 8801af526708
---[ end trace f498328bb02610a2 ]---

Reported-and-tested-by: syzbot+bf9253040425feb15...@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+7d6d31d3bc702f566...@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+0a725420475916460...@syzkaller.appspotmail.com
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/segment.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 20250b88bf51..a55647f61232 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3612,6 +3612,7 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
unsigned int i, start, end;
unsigned int readed, start_blk = 0;
int err = 0;
+   block_t total_valid_blocks = 0;
 
do {
readed = ra_meta_pages(sbi, start_blk, BIO_MAX_PAGES,
@@ -3634,6 +3635,7 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
if (err)
return err;

[PATCH 5/5] f2fs: enforce fsync_mode=strict for renamed directory

This gives the user an option to recover B/foo in the case below.

mkdir A
sync()
rename(A, B)
creat (B/foo)
fsync (B/foo)
---crash---

Suggested-by: Velayudhan Pillai 
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/namei.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index b5f404674cad..fef6e3ab2135 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -973,8 +973,11 @@ static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry,
f2fs_put_page(old_dir_page, 0);
f2fs_i_links_write(old_dir, false);
}
-   if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT)
+   if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT) {
add_ino_entry(sbi, new_dir->i_ino, TRANS_DIR_INO);
+   if (S_ISDIR(old_inode->i_mode))
+   add_ino_entry(sbi, old_inode->i_ino, TRANS_DIR_INO);
+   }
 
f2fs_unlock_op(sbi);
 
-- 
2.17.0.484.g0c8726318c-goog
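
The syscall sequence from the commit message can be replayed with a small
program along these lines. This is only a sketch of the ordering:
rename_then_fsync() is a hypothetical helper, it runs under a
caller-supplied directory, and it cannot simulate the crash itself, so it
shows what is issued before the crash rather than proving recovery.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Runs the sequence under `base`; returns 0 on success, -1 on failure. */
int rename_then_fsync(const char *base)
{
	char a[4096], b[4096], foo[4096];
	int fd;

	snprintf(a, sizeof(a), "%s/A", base);
	snprintf(b, sizeof(b), "%s/B", base);
	snprintf(foo, sizeof(foo), "%s/B/foo", base);

	if (mkdir(a, 0755) != 0)		/* mkdir A */
		return -1;
	sync();					/* sync() */
	if (rename(a, b) != 0)			/* rename(A, B) */
		return -1;
	fd = open(foo, O_CREAT | O_WRONLY, 0644);	/* creat(B/foo) */
	if (fd < 0)
		return -1;
	if (fsync(fd) != 0) {			/* fsync(B/foo) */
		close(fd);
		return -1;
	}
	return close(fd);
}
```

After a crash at this point, B/foo is only reachable if the rename of the
directory itself was made durable, which is what the extra
add_ino_entry() call for the renamed directory is meant to force in
strict mode.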

[PATCH 3/5] f2fs: sanity check on sit entry

syzbot hit the following crash on upstream commit
87ef12027b9b1dd0e0b12cf311fbcb19f9d92539 (Wed Apr 18 19:48:17 2018 +)
Merge tag 'ceph-for-4.17-rc2' of git://github.com/ceph/ceph-client
syzbot dashboard link: 
https://syzkaller.appspot.com/bug?extid=83699adeb2d13579c31e

C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5805208181407744
syzkaller reproducer: 
https://syzkaller.appspot.com/x/repro.syz?id=6005073343676416
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=6555047731134464
Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118
compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+83699adeb2d13579c...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.

F2FS-fs (loop0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
F2FS-fs (loop0): invalid crc value
BUG: unable to handle kernel paging request at ed006b2a50c0
PGD 21ffee067 P4D 21ffee067 PUD 21fbeb067 PMD 0
Oops:  [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 4514 Comm: syzkaller989480 Not tainted 4.17.0-rc1+ #8
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
RIP: 0010:build_sit_entries fs/f2fs/segment.c:3653 [inline]
RIP: 0010:build_segment_manager+0x7ef7/0xbf70 fs/f2fs/segment.c:3852
RSP: 0018:8801b102e5b0 EFLAGS: 00010a06
RAX: 11006b2a50c0 RBX: 0004 RCX: 0001
RDX:  RSI: 0001 RDI: 8801ac74243e
RBP: 8801b102f410 R08: 8801acbd46c0 R09: fbfff14d9af8
R10: fbfff14d9af8 R11: 8801acbd46c0 R12: 8801ac742a80
R13: 8801d9519100 R14: dc00 R15: 880359528600
FS:  01e04880() GS:8801dae0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: ed006b2a50c0 CR3: 0001ac6ac000 CR4: 001406f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 f2fs_fill_super+0x4095/0x7bf0 fs/f2fs/super.c:2803
 mount_bdev+0x30c/0x3e0 fs/super.c:1165
 f2fs_mount+0x34/0x40 fs/f2fs/super.c:3020
 mount_fs+0xae/0x328 fs/super.c:1268
 vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:1027 [inline]
 do_new_mount fs/namespace.c:2517 [inline]
 do_mount+0x564/0x3070 fs/namespace.c:2847
 ksys_mount+0x12d/0x140 fs/namespace.c:3063
 __do_sys_mount fs/namespace.c:3077 [inline]
 __se_sys_mount fs/namespace.c:3074 [inline]
 __x64_sys_mount+0xbe/0x150 fs/namespace.c:3074
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x443d6a
RSP: 002b:7ffd312813c8 EFLAGS: 0297 ORIG_RAX: 00a5
RAX: ffda RBX: 2c00 RCX: 00443d6a
RDX: 2000 RSI: 2100 RDI: 7ffd312813d0
RBP: 0003 R08: 20016a00 R09: 000a
R10:  R11: 0297 R12: 0004
R13: 00402c60 R14:  R15: 
RIP: build_sit_entries fs/f2fs/segment.c:3653 [inline] RSP: 8801b102e5b0
RIP: build_segment_manager+0x7ef7/0xbf70 fs/f2fs/segment.c:3852 RSP: 
8801b102e5b0
CR2: ed006b2a50c0
---[ end trace a2034989e196ff17 ]---

Reported-and-tested-by: syzbot+83699adeb2d13579c...@syzkaller.appspotmail.com
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/segment.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index e4e8bdd645ee..20250b88bf51 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3662,6 +3662,15 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
unsigned int old_valid_blocks;
 
start = le32_to_cpu(segno_in_journal(journal, i));
+   if (start >= MAIN_SEGS(sbi)) {
+   f2fs_msg(sbi->sb, KERN_ERR,
+   "Wrong journal entry on segno %u",
+   start);
+   set_sbi_flag(sbi, SBI_NEED_FSCK);
+   err = -EINVAL;
+   break;
+   }
+
se = &sit_i->sentries[start];
sit = sit_in_journal(journal, i);
 
-- 
2.17.0.484.g0c8726318c-goog
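
The added check is a plain bounds test on an index that comes from
untrusted on-disk data. A userspace sketch of the same pattern is below;
struct seg_entry_model and apply_journal() are hypothetical and much
simpler than the real SIT journal handling.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced model of a segment entry. */
struct seg_entry_model {
	uint16_t valid_blocks;
};

/*
 * Apply journalled updates to the entry table. Any segment number at or
 * beyond main_segs is rejected before it is used as an array index.
 */
int apply_journal(struct seg_entry_model *entries, uint32_t main_segs,
		  const uint32_t *segnos, const uint16_t *valid, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t start = segnos[i];

		if (start >= main_segs)
			return -EINVAL;
		entries[start].valid_blocks = valid[i];
	}
	return 0;
}
```

Without the range test, a crafted journal entry indexes past the end of
the table, which matches the out-of-bounds access in the report above.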

[PATCH 3/5] f2fs: sanity check on sit entry

syzbot hit the following crash on upstream commit
87ef12027b9b1dd0e0b12cf311fbcb19f9d92539 (Wed Apr 18 19:48:17 2018 +)
Merge tag 'ceph-for-4.17-rc2' of git://github.com/ceph/ceph-client
syzbot dashboard link: 
https://syzkaller.appspot.com/bug?extid=83699adeb2d13579c31e

C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5805208181407744
syzkaller reproducer: 
https://syzkaller.appspot.com/x/repro.syz?id=6005073343676416
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=6555047731134464
Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118
compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+83699adeb2d13579c...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.

F2FS-fs (loop0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
F2FS-fs (loop0): invalid crc value
BUG: unable to handle kernel paging request at ed006b2a50c0
PGD 21ffee067 P4D 21ffee067 PUD 21fbeb067 PMD 0
Oops:  [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 4514 Comm: syzkaller989480 Not tainted 4.17.0-rc1+ #8
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
RIP: 0010:build_sit_entries fs/f2fs/segment.c:3653 [inline]
RIP: 0010:build_segment_manager+0x7ef7/0xbf70 fs/f2fs/segment.c:3852
RSP: 0018:8801b102e5b0 EFLAGS: 00010a06
RAX: 11006b2a50c0 RBX: 0004 RCX: 0001
RDX:  RSI: 0001 RDI: 8801ac74243e
RBP: 8801b102f410 R08: 8801acbd46c0 R09: fbfff14d9af8
R10: fbfff14d9af8 R11: 8801acbd46c0 R12: 8801ac742a80
R13: 8801d9519100 R14: dc00 R15: 880359528600
FS:  01e04880() GS:8801dae0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: ed006b2a50c0 CR3: 0001ac6ac000 CR4: 001406f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 f2fs_fill_super+0x4095/0x7bf0 fs/f2fs/super.c:2803
 mount_bdev+0x30c/0x3e0 fs/super.c:1165
 f2fs_mount+0x34/0x40 fs/f2fs/super.c:3020
 mount_fs+0xae/0x328 fs/super.c:1268
 vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:1027 [inline]
 do_new_mount fs/namespace.c:2517 [inline]
 do_mount+0x564/0x3070 fs/namespace.c:2847
 ksys_mount+0x12d/0x140 fs/namespace.c:3063
 __do_sys_mount fs/namespace.c:3077 [inline]
 __se_sys_mount fs/namespace.c:3074 [inline]
 __x64_sys_mount+0xbe/0x150 fs/namespace.c:3074
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x443d6a
RSP: 002b:7ffd312813c8 EFLAGS: 0297 ORIG_RAX: 00a5
RAX: ffda RBX: 2c00 RCX: 00443d6a
RDX: 2000 RSI: 2100 RDI: 7ffd312813d0
RBP: 0003 R08: 20016a00 R09: 000a
R10:  R11: 0297 R12: 0004
R13: 00402c60 R14:  R15: 
RIP: build_sit_entries fs/f2fs/segment.c:3653 [inline] RSP: 8801b102e5b0
RIP: build_segment_manager+0x7ef7/0xbf70 fs/f2fs/segment.c:3852 RSP: 8801b102e5b0
CR2: ed006b2a50c0
---[ end trace a2034989e196ff17 ]---

Reported-and-tested-by: syzbot+83699adeb2d13579c...@syzkaller.appspotmail.com
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/segment.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index e4e8bdd645ee..20250b88bf51 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3662,6 +3662,15 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
unsigned int old_valid_blocks;
 
start = le32_to_cpu(segno_in_journal(journal, i));
+   if (start >= MAIN_SEGS(sbi)) {
+   f2fs_msg(sbi->sb, KERN_ERR,
+   "Wrong journal entry on segno %u",
+   start);
+   set_sbi_flag(sbi, SBI_NEED_FSCK);
+   err = -EINVAL;
+   break;
+   }
+
se = &sit_i->sentries[start];
sit = sit_in_journal(journal, i);
 
-- 
2.17.0.484.g0c8726318c-goog
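
For readers following along, the check added above can be sketched outside the kernel as a plain bounds test before indexing. This is only an illustration: struct sit_table, struct sit_entry and lookup_sit_entry() are hypothetical stand-ins, not f2fs types, and need_fsck merely models SBI_NEED_FSCK.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins for the f2fs structures; not the kernel's types. */
struct sit_entry {
	unsigned int valid_blocks;
};

struct sit_table {
	unsigned int main_segs;		/* number of elements in entries[] */
	struct sit_entry *entries;
	int need_fsck;			/* models SBI_NEED_FSCK */
};

/*
 * Return the entry for segno, or NULL after flagging the table as
 * needing fsck when the on-disk segment number is out of range.
 */
static struct sit_entry *lookup_sit_entry(struct sit_table *t,
					  unsigned int segno, int *err)
{
	if (segno >= t->main_segs) {
		fprintf(stderr, "Wrong journal entry on segno %u\n", segno);
		t->need_fsck = 1;
		*err = -EINVAL;
		return NULL;
	}
	*err = 0;
	return &t->entries[segno];
}
```

The point is that a segment number read from a (possibly corrupted) image is validated before it is used as an array index, which is what the crash above was missing.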

[PATCH 1/5] f2fs: give message and set need_fsck given broken node id

syzbot hit the following crash on upstream commit
83beed7b2b26f232d782127792dd0cd4362fdc41 (Fri Apr 20 17:56:32 2018 +)
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=d154ec99402c6f628887

C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5414336294027264
syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=5471683234234368
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5436660795834368
Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118
compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+d154ec99402c6f628...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.

F2FS-fs (loop0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
F2FS-fs (loop0): invalid crc value
------------[ cut here ]------------
kernel BUG at fs/f2fs/node.c:1185!
invalid opcode:  [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 4549 Comm: syzkaller704305 Not tainted 4.17.0-rc1+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__get_node_page+0xb68/0x16e0 fs/f2fs/node.c:1185
RSP: 0018:8801d960e820 EFLAGS: 00010293
RAX: 8801d88205c0 RBX: 0003 RCX: 82f6cc06
RDX:  RSI: 82f6d5e8 RDI: 0004
RBP: 8801d960ec30 R08: 8801d88205c0 R09: ed003b5e46c2
R10: 0003 R11: 0003 R12: 8801a86e00c0
R13: 0001 R14: 8801a86e0530 R15: 8801d9745240
FS:  0072c880() GS:8801daf0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f3d403209b8 CR3: 0001d8f3f000 CR4: 001406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 get_node_page fs/f2fs/node.c:1237 [inline]
 truncate_xattr_node+0x152/0x2e0 fs/f2fs/node.c:1014
 remove_inode_page+0x200/0xaf0 fs/f2fs/node.c:1039
 f2fs_evict_inode+0xe86/0x1710 fs/f2fs/inode.c:547
 evict+0x4a6/0x960 fs/inode.c:557
 iput_final fs/inode.c:1519 [inline]
 iput+0x62d/0xa80 fs/inode.c:1545
 f2fs_fill_super+0x5f4e/0x7bf0 fs/f2fs/super.c:2849
 mount_bdev+0x30c/0x3e0 fs/super.c:1164
 f2fs_mount+0x34/0x40 fs/f2fs/super.c:3020
 mount_fs+0xae/0x328 fs/super.c:1267
 vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:1027 [inline]
 do_new_mount fs/namespace.c:2518 [inline]
 do_mount+0x564/0x3070 fs/namespace.c:2848
 ksys_mount+0x12d/0x140 fs/namespace.c:3064
 __do_sys_mount fs/namespace.c:3078 [inline]
 __se_sys_mount fs/namespace.c:3075 [inline]
 __x64_sys_mount+0xbe/0x150 fs/namespace.c:3075
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x443dea
RSP: 002b:7ffcc7882368 EFLAGS: 0297 ORIG_RAX: 00a5
RAX: ffda RBX: 2c00 RCX: 00443dea
RDX: 2000 RSI: 2100 RDI: 7ffcc7882370
RBP: 0003 R08: 20016a00 R09: 000a
R10:  R11: 0297 R12: 0004
R13: 00402ce0 R14:  R15: 
RIP: __get_node_page+0xb68/0x16e0 fs/f2fs/node.c:1185 RSP: 8801d960e820
---[ end trace 4edbeb71f002bb76 ]---

Reported-and-tested-by: syzbot+d154ec99402c6f628...@syzkaller.appspotmail.com
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/f2fs.h  | 13 +
 fs/f2fs/inode.c | 13 ++---
 fs/f2fs/node.c  | 23 +--
 3 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8f3ad9662d13..d26aae5bf00d 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1583,18 +1583,6 @@ static inline bool __exist_node_summaries(struct f2fs_sb_info *sbi)
is_set_ckpt_flags(sbi, CP_FASTBOOT_FLAG));
 }
 
-/*
- * Check whether the given nid is within node id range.
- */
-static inline int check_nid_range(struct f2fs_sb_info *sbi, nid_t nid)
-{
-   if (unlikely(nid < F2FS_ROOT_INO(sbi)))
-   return -EINVAL;
-   if (unlikely(nid >= NM_I(sbi)->max_nid))
-   return -EINVAL;
-   return 0;
-}
-
 /*
  * Check whether the inode has blocks or not
  */
@@ -2768,6 +2756,7 @@ f2fs_hash_t f2fs_dentry_hash(const struct qstr *name_info,
 struct dnode_of_data;
 struct node_info;
 
+int check_nid_range(struct f2fs_sb_info *sbi, nid_t nid);
 bool available_free_memory(struct f2fs_sb_info *sbi, int type);
 int need_dentry_mark(struct f2fs_sb_info *sbi, nid_t nid);
 bool
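
The out-of-line body of check_nid_range() is not visible in this excerpt. Going only by the subject line (give a message and set need_fsck on a broken node id) and the inline version removed above, a userspace sketch might look like the following. struct node_mgr, its fields and check_nid_range_sketch() are assumptions made for illustration, not the f2fs code.

```c
#include <errno.h>
#include <stdio.h>

typedef unsigned int nid_t;

/* Hypothetical stand-in for the parts of f2fs_sb_info used here. */
struct node_mgr {
	nid_t root_ino;		/* smallest valid node id */
	nid_t max_nid;		/* one past the largest valid node id */
	int need_fsck;		/* models SBI_NEED_FSCK */
};

/* Same range test as the removed inline helper, plus a message and flag. */
static int check_nid_range_sketch(struct node_mgr *nm, nid_t nid)
{
	if (nid < nm->root_ino || nid >= nm->max_nid) {
		nm->need_fsck = 1;
		fprintf(stderr, "%s: out-of-range nid=%x, run fsck to fix.\n",
			__func__, nid);
		return -EINVAL;
	}
	return 0;
}
```

Callers that previously hit a BUG on a bad node id can then propagate -EINVAL instead, which appears to be the intent of the series.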

Re: [PATCH v3 1/2] tty/nozomi: cleanup DUMP() macro

On Wed, Apr 25, 2018 at 07:38:48AM +0200, Greg Kroah-Hartman wrote:
> How are you sending these patches?  How are you creating them?  What is
> taking part of the diffstat off and just leaving that line?

Hm, my `git format-patch` alias included --shortstat for some odd reason,
very sorry about that.

Going to resend.

-- 
Cheers,
Joey Pabalinas



[PATCH v2 2/3] sample: vfio mdev display - guest driver

Guest fbdev driver for CONFIG_SAMPLE_VFIO_MDEV_MDPY.

Signed-off-by: Gerd Hoffmann 
---
 samples/vfio-mdev/mdpy-fb.c | 232 
 samples/Kconfig |   9 ++
 samples/vfio-mdev/Makefile  |   1 +
 3 files changed, 242 insertions(+)
 create mode 100644 samples/vfio-mdev/mdpy-fb.c

diff --git a/samples/vfio-mdev/mdpy-fb.c b/samples/vfio-mdev/mdpy-fb.c
new file mode 100644
index 00..2719bb2596
--- /dev/null
+++ b/samples/vfio-mdev/mdpy-fb.c
@@ -0,0 +1,232 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Framebuffer driver for mdpy (mediated virtual pci display device).
+ *
+ * See mdpy-defs.h for device specs
+ *
+ *   (c) Gerd Hoffmann 
+ *
+ * Using some code snippets from simplefb and cirrusfb.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "mdpy-defs.h"
+
+static const struct fb_fix_screeninfo mdpy_fb_fix = {
+   .id = "mdpy-fb",
+   .type   = FB_TYPE_PACKED_PIXELS,
+   .visual = FB_VISUAL_TRUECOLOR,
+   .accel  = FB_ACCEL_NONE,
+};
+
+static const struct fb_var_screeninfo mdpy_fb_var = {
+   .height = -1,
+   .width  = -1,
+   .activate   = FB_ACTIVATE_NOW,
+   .vmode  = FB_VMODE_NONINTERLACED,
+
+   .bits_per_pixel = 32,
+   .transp.offset  = 24,
+   .red.offset = 16,
+   .green.offset   = 8,
+   .blue.offset= 0,
+   .transp.length  = 8,
+   .red.length = 8,
+   .green.length   = 8,
+   .blue.length= 8,
+};
+
+#define PSEUDO_PALETTE_SIZE 16
+
+struct mdpy_fb_par {
+   u32 palette[PSEUDO_PALETTE_SIZE];
+};
+
+static int mdpy_fb_setcolreg(u_int regno, u_int red, u_int green, u_int blue,
+ u_int transp, struct fb_info *info)
+{
+   u32 *pal = info->pseudo_palette;
+   u32 cr = red >> (16 - info->var.red.length);
+   u32 cg = green >> (16 - info->var.green.length);
+   u32 cb = blue >> (16 - info->var.blue.length);
+   u32 value, mask;
+
+   if (regno >= PSEUDO_PALETTE_SIZE)
+   return -EINVAL;
+
+   value = (cr << info->var.red.offset) |
+   (cg << info->var.green.offset) |
+   (cb << info->var.blue.offset);
+   if (info->var.transp.length > 0) {
+   mask = (1 << info->var.transp.length) - 1;
+   mask <<= info->var.transp.offset;
+   value |= mask;
+   }
+   pal[regno] = value;
+
+   return 0;
+}
+
+static void mdpy_fb_destroy(struct fb_info *info)
+{
+   if (info->screen_base)
+   iounmap(info->screen_base);
+}
+
+static struct fb_ops mdpy_fb_ops = {
+   .owner  = THIS_MODULE,
+   .fb_destroy = mdpy_fb_destroy,
+   .fb_setcolreg   = mdpy_fb_setcolreg,
+   .fb_fillrect= cfb_fillrect,
+   .fb_copyarea= cfb_copyarea,
+   .fb_imageblit   = cfb_imageblit,
+};
+
+static int mdpy_fb_probe(struct pci_dev *pdev,
+const struct pci_device_id *ent)
+{
+   struct fb_info *info;
+   struct mdpy_fb_par *par;
+   u32 format, width, height;
+   int ret;
+
+   ret = pci_enable_device(pdev);
+   if (ret < 0)
+   return ret;
+
+   ret = pci_request_regions(pdev, "mdpy-fb");
+   if (ret < 0)
+   return ret;
+
+   pci_read_config_dword(pdev, MDPY_FORMAT_OFFSET, &format);
+   pci_read_config_dword(pdev, MDPY_WIDTH_OFFSET,  &width);
+   pci_read_config_dword(pdev, MDPY_HEIGHT_OFFSET, &height);
+   if (format != DRM_FORMAT_XRGB8888) {
+   pci_err(pdev, "format mismatch (0x%x != 0x%x)\n",
+   format, DRM_FORMAT_XRGB8888);
+   return -EINVAL;
+   }
+   if (width < 100  || width > 10000) {
+   pci_err(pdev, "width (%d) out of range\n", width);
+   return -EINVAL;
+   }
+   if (height < 100 || height > 10000) {
+   pci_err(pdev, "height (%d) out of range\n", height);
+   return -EINVAL;
+   }
+   pci_info(pdev, "mdpy found: %dx%d framebuffer\n",
+width, height);
+
+   info = framebuffer_alloc(sizeof(struct mdpy_fb_par), &pdev->dev);
+   if (!info)
+   goto err_release_regions;
+   pci_set_drvdata(pdev, info);
+   par = info->par;
+
+   info->fix = mdpy_fb_fix;
+   info->fix.smem_start = pci_resource_start(pdev, 0);
+   info->fix.smem_len = pci_resource_len(pdev, 0);
+   info->fix.line_length = width * 4;
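
As an aside on mdpy_fb_setcolreg() above: the packing of 16-bit fbdev colour components into a truecolor pseudo-palette entry can be tried out in userspace with a sketch like this. struct bitfield and pack_truecolor() are hypothetical names for illustration; the arithmetic mirrors what the patch does with info->var.

```c
#include <stdint.h>

/* Hypothetical mirror of the offset/length pair in struct fb_bitfield. */
struct bitfield {
	uint32_t offset;
	uint32_t length;
};

/*
 * Scale each 16-bit component down to its channel width, shift it into
 * place, and set every transparency bit when a transp channel exists.
 */
static uint32_t pack_truecolor(uint16_t red, uint16_t green, uint16_t blue,
			       struct bitfield r, struct bitfield g,
			       struct bitfield b, struct bitfield t)
{
	uint32_t cr = (uint32_t)red >> (16 - r.length);
	uint32_t cg = (uint32_t)green >> (16 - g.length);
	uint32_t cb = (uint32_t)blue >> (16 - b.length);
	uint32_t value = (cr << r.offset) | (cg << g.offset) | (cb << b.offset);

	if (t.length > 0) {
		uint32_t mask = ((1u << t.length) - 1) << t.offset;

		value |= mask;
	}
	return value;
}
```

With the layout from mdpy_fb_var (red at bit 16, green at 8, blue at 0, transp at 24, all 8 bits wide), full red plus half green packs to 0xffff8000.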

[PATCH v2 1/3] sample: vfio mdev display - host device

Simple framebuffer display, demo-ing the vfio region display interface
(VFIO_GFX_PLANE_TYPE_REGION).

Signed-off-by: Gerd Hoffmann 
---
 samples/vfio-mdev/mdpy-defs.h |  22 ++
 samples/vfio-mdev/mdpy.c  | 807 ++
 samples/Kconfig   |   8 +
 samples/vfio-mdev/Makefile|   1 +
 4 files changed, 838 insertions(+)
 create mode 100644 samples/vfio-mdev/mdpy-defs.h
 create mode 100644 samples/vfio-mdev/mdpy.c

diff --git a/samples/vfio-mdev/mdpy-defs.h b/samples/vfio-mdev/mdpy-defs.h
new file mode 100644
index 00..96b3b1b49d
--- /dev/null
+++ b/samples/vfio-mdev/mdpy-defs.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Simple pci display device.
+ *
+ * Framebuffer memory is pci bar 0.
+ * Configuration (read-only) is in pci config space.
+ * Format field uses drm fourcc codes.
+ * ATM only DRM_FORMAT_XRGB8888 is supported.
+ */
+
+/* pci ids */
+#define MDPY_PCI_VENDOR_ID 0x1b36 /* redhat */
+#define MDPY_PCI_DEVICE_ID 0x000f
+#define MDPY_PCI_SUBVENDOR_ID  PCI_SUBVENDOR_ID_REDHAT_QUMRANET
+#define MDPY_PCI_SUBDEVICE_ID  PCI_SUBDEVICE_ID_QEMU
+
+/* pci cfg space offsets for fb config (dword) */
+#define MDPY_VENDORCAP_OFFSET   0x40
+#define MDPY_VENDORCAP_SIZE 0x10
+#define MDPY_FORMAT_OFFSET (MDPY_VENDORCAP_OFFSET + 0x04)
+#define MDPY_WIDTH_OFFSET  (MDPY_VENDORCAP_OFFSET + 0x08)
+#define MDPY_HEIGHT_OFFSET (MDPY_VENDORCAP_OFFSET + 0x0c)
diff --git a/samples/vfio-mdev/mdpy.c b/samples/vfio-mdev/mdpy.c
new file mode 100644
index 00..96e7969c47
--- /dev/null
+++ b/samples/vfio-mdev/mdpy.c
@@ -0,0 +1,807 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Mediated virtual PCI display host device driver
+ *
+ * See mdpy-defs.h for device specs
+ *
+ *   (c) Gerd Hoffmann 
+ *
+ * based on mtty driver which is:
+ *   Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ *  Author: Neo Jia 
+ *  Kirti Wankhede 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "mdpy-defs.h"
+
+#define MDPY_NAME  "mdpy"
+#define MDPY_CLASS_NAME "mdpy"
+
+#define MDPY_CONFIG_SPACE_SIZE 0xff
+#define MDPY_MEMORY_BAR_OFFSET PAGE_SIZE
+#define MDPY_DISPLAY_REGION 16
+
+#define STORE_LE16(addr, val)  (*(u16 *)addr = val)
+#define STORE_LE32(addr, val)  (*(u32 *)addr = val)
+
+
+MODULE_LICENSE("GPL v2");
+
+static int max_devices = 4;
+module_param_named(count, max_devices, int, 0444);
+MODULE_PARM_DESC(count, "number of " MDPY_NAME " devices");
+
+
+#define MDPY_TYPE_1 "vga"
+#define MDPY_TYPE_2 "xga"
+#define MDPY_TYPE_3 "hd"
+
+static const struct mdpy_type {
+   const char *name;
+   u32 format;
+   u32 bytepp;
+   u32 width;
+   u32 height;
+} mdpy_types[] = {
+   {
+   .name   = MDPY_CLASS_NAME "-" MDPY_TYPE_1,
+   .format = DRM_FORMAT_XRGB8888,
+   .bytepp = 4,
+   .width  = 640,
+   .height = 480,
+   }, {
+   .name   = MDPY_CLASS_NAME "-" MDPY_TYPE_2,
+   .format = DRM_FORMAT_XRGB8888,
+   .bytepp = 4,
+   .width  = 1024,
+   .height = 768,
+   }, {
+   .name   = MDPY_CLASS_NAME "-" MDPY_TYPE_3,
+   .format = DRM_FORMAT_XRGB8888,
+   .bytepp = 4,
+   .width  = 1920,
+   .height = 1080,
+   },
+};
+
+static dev_t   mdpy_devt;
+static struct class *mdpy_class;
+static struct cdev mdpy_cdev;
+static struct device   mdpy_dev;
+static u32 mdpy_count;
+
+/* State of each mdev device */
+struct mdev_state {
+   u8 *vconfig;
+   u32 bar_mask;
+   struct mutex ops_lock;
+   struct mdev_device *mdev;
+   struct vfio_device_info dev_info;
+
+   const struct mdpy_type *type;
+   u32 memsize;
+   void *memblk;
+};
+
+static const struct mdpy_type *mdpy_find_type(struct kobject *kobj)
+{
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(mdpy_types); i++)
+   if (strcmp(mdpy_types[i].name, kobj->name) == 0)
+   return mdpy_types + i;
+   return NULL;
+}
+
+static void mdpy_create_config_space(struct mdev_state *mdev_state)
+{
+   STORE_LE16((u16 *) &mdev_state->vconfig[PCI_VENDOR_ID],
+  MDPY_PCI_VENDOR_ID);
+   STORE_LE16((u16 *) &mdev_state->vconfig[PCI_DEVICE_ID],
+  MDPY_PCI_DEVICE_ID);
+   STORE_LE16((u16 *) &mdev_state->vconfig[PCI_SUBSYSTEM_VENDOR_ID],
+  MDPY_PCI_SUBVENDOR_ID);
+   STORE_LE16((u16 *) &mdev_state->vconfig[PCI_SUBSYSTEM_ID],
+  MDPY_PCI_SUBDEVICE_ID);
+
+
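
To make the config-space layout from mdpy-defs.h concrete, here is a small userspace sketch that stores and loads the framebuffer parameters as little-endian dwords. The offsets are copied from the header above; store_le32() and load_le32() are hypothetical helpers that write bytes explicitly, which is a different technique from the patch's STORE_LE32 macro (a plain cast-and-assign that relies on host byte order).

```c
#include <stdint.h>

/* Offsets copied from mdpy-defs.h in the patch above. */
#define MDPY_VENDORCAP_OFFSET	0x40
#define MDPY_FORMAT_OFFSET	(MDPY_VENDORCAP_OFFSET + 0x04)
#define MDPY_WIDTH_OFFSET	(MDPY_VENDORCAP_OFFSET + 0x08)
#define MDPY_HEIGHT_OFFSET	(MDPY_VENDORCAP_OFFSET + 0x0c)

/* Store a 32-bit value at byte offset off, least significant byte first. */
static void store_le32(uint8_t *cfg, unsigned int off, uint32_t val)
{
	cfg[off + 0] = (uint8_t)(val & 0xff);
	cfg[off + 1] = (uint8_t)((val >> 8) & 0xff);
	cfg[off + 2] = (uint8_t)((val >> 16) & 0xff);
	cfg[off + 3] = (uint8_t)((val >> 24) & 0xff);
}

/* Load a 32-bit little-endian value from byte offset off. */
static uint32_t load_le32(const uint8_t *cfg, unsigned int off)
{
	return (uint32_t)cfg[off + 0] |
	       ((uint32_t)cfg[off + 1] << 8) |
	       ((uint32_t)cfg[off + 2] << 16) |
	       ((uint32_t)cfg[off + 3] << 24);
}
```

The guest side would read the same three dwords back with pci_read_config_dword(), so width lives at config offset 0x48 and height at 0x4c.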

[PATCH V4 6/6] ARM: imx_v6_v7_defconfig: Select CONFIG_GPIO_MAX732X by default

Enable max7320 IO expander for i.MX platforms.

Signed-off-by: Anson Huang 
---
 arch/arm/configs/imx_v6_v7_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/configs/imx_v6_v7_defconfig b/arch/arm/configs/imx_v6_v7_defconfig
index 3a30843..8455d39 100644
--- a/arch/arm/configs/imx_v6_v7_defconfig
+++ b/arch/arm/configs/imx_v6_v7_defconfig
@@ -199,6 +199,7 @@ CONFIG_SPI_GPIO=y
 CONFIG_SPI_IMX=y
 CONFIG_SPI_FSL_DSPI=y
 CONFIG_GPIO_SYSFS=y
+CONFIG_GPIO_MAX732X=y
 CONFIG_GPIO_MC9S08DZ60=y
 CONFIG_GPIO_PCA953X=y
 CONFIG_GPIO_STMPE=y
-- 
2.7.4

[PATCH V4 5/6] ARM: dts: imx6sx-sabreauto: add wdog external reset support

i.MX6SX Sabre Auto board has GPIO1_IO13 pin can be
MUXed as WDOG output to reset PMIC, add this function
support.

Signed-off-by: Anson Huang 
---
no changes.
 arch/arm/boot/dts/imx6sx-sabreauto.dts | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/arm/boot/dts/imx6sx-sabreauto.dts b/arch/arm/boot/dts/imx6sx-sabreauto.dts
index fdc642f..ccf1cbb 100644
--- a/arch/arm/boot/dts/imx6sx-sabreauto.dts
+++ b/arch/arm/boot/dts/imx6sx-sabreauto.dts
@@ -230,6 +230,12 @@
MX6SX_PAD_KEY_COL1__GPIO2_IO_11 0x17059
>;
};
+
+   pinctrl_wdog: wdoggrp {
+   fsl,pins = <
+   MX6SX_PAD_GPIO1_IO13__WDOG1_WDOG_ANY 0x30b0
+   >;
+   };
};
 };
 
@@ -369,3 +375,9 @@
#gpio-cells = <2>;
};
 };
+
+&wdog1 {
+   pinctrl-names = "default";
+   pinctrl-0 = <&pinctrl_wdog>;
+   fsl,ext-reset-output;
+};
-- 
2.7.4

[PATCH V4 2/6] ARM: dts: imx6sx-sabreauto: add max7322 IO expander support

Add MAX7322 IO expander support.

Signed-off-by: Anson Huang 
---
no changes.
 arch/arm/boot/dts/imx6sx-sabreauto.dts | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm/boot/dts/imx6sx-sabreauto.dts 
b/arch/arm/boot/dts/imx6sx-sabreauto.dts
index 2caca934..d59084f 100644
--- a/arch/arm/boot/dts/imx6sx-sabreauto.dts
+++ b/arch/arm/boot/dts/imx6sx-sabreauto.dts
@@ -163,6 +163,13 @@
pinctrl-0 = <&pinctrl_i2c2_1>;
status = "okay";
 
+   max7322: gpio@68 {
+   compatible = "maxim,max7322";
+   reg = <0x68>;
+   gpio-controller;
+   #gpio-cells = <2>;
+   };
+
pmic: pfuze100@08 {
compatible = "fsl,pfuze100";
reg = <0x08>;
-- 
2.7.4

[PATCH V4 4/6] ARM: dts: imx6sx-sabreauto: add fec support

Add FEC support on i.MX6SX Sabre Auto board.

Signed-off-by: Anson Huang 
---
no changes.
 arch/arm/boot/dts/imx6sx-sabreauto.dts | 69 ++
 1 file changed, 69 insertions(+)

diff --git a/arch/arm/boot/dts/imx6sx-sabreauto.dts 
b/arch/arm/boot/dts/imx6sx-sabreauto.dts
index 812f40b..fdc642f 100644
--- a/arch/arm/boot/dts/imx6sx-sabreauto.dts
+++ b/arch/arm/boot/dts/imx6sx-sabreauto.dts
@@ -41,6 +41,39 @@
clock-frequency = <24576000>;
 };
 
+ {
+   pinctrl-names = "default";
+   pinctrl-0 = <&pinctrl_enet1_1>;
+   phy-mode = "rgmii";
+   phy-handle = <>;
+   fsl,magic-packet;
+   status = "okay";
+
+   mdio {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   ethphy0: ethernet-phy@0 {
+   compatible = "ethernet-phy-ieee802.3-c22";
+   reg = <0>;
+   };
+
+   ethphy1: ethernet-phy@1 {
+   compatible = "ethernet-phy-ieee802.3-c22";
+   reg = <1>;
+   };
+   };
+};
+
+ {
+   pinctrl-names = "default";
+   pinctrl-0 = <&pinctrl_enet2_1>;
+   phy-mode = "rgmii";
+   phy-handle = <>;
+   fsl,magic-packet;
+   status = "okay";
+};
+
  {
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_uart1>;
@@ -75,6 +108,42 @@
  {
imx6x-sabreauto {
 
+   pinctrl_enet1_1: enet1grp-1 {
+   fsl,pins = <
+   MX6SX_PAD_ENET1_MDIO__ENET1_MDIO        0xa0b1
+   MX6SX_PAD_ENET1_MDC__ENET1_MDC          0xa0b1
+   MX6SX_PAD_RGMII1_TXC__ENET1_RGMII_TXC   0xa0b9
+   MX6SX_PAD_RGMII1_TD0__ENET1_TX_DATA_0   0xa0b1
+   MX6SX_PAD_RGMII1_TD1__ENET1_TX_DATA_1   0xa0b1
+   MX6SX_PAD_RGMII1_TD2__ENET1_TX_DATA_2   0xa0b1
+   MX6SX_PAD_RGMII1_TD3__ENET1_TX_DATA_3   0xa0b1
+   MX6SX_PAD_RGMII1_TX_CTL__ENET1_TX_EN    0xa0b1
+   MX6SX_PAD_RGMII1_RXC__ENET1_RX_CLK      0x3081
+   MX6SX_PAD_RGMII1_RD0__ENET1_RX_DATA_0   0x3081
+   MX6SX_PAD_RGMII1_RD1__ENET1_RX_DATA_1   0x3081
+   MX6SX_PAD_RGMII1_RD2__ENET1_RX_DATA_2   0x3081
+   MX6SX_PAD_RGMII1_RD3__ENET1_RX_DATA_3   0x3081
+   MX6SX_PAD_RGMII1_RX_CTL__ENET1_RX_EN    0x3081
+   >;
+   };
+
+   pinctrl_enet2_1: enet2grp-1 {
+   fsl,pins = <
+   MX6SX_PAD_RGMII2_TXC__ENET2_RGMII_TXC   0xa0b9
+   MX6SX_PAD_RGMII2_TD0__ENET2_TX_DATA_0   0xa0b1
+   MX6SX_PAD_RGMII2_TD1__ENET2_TX_DATA_1   0xa0b1
+   MX6SX_PAD_RGMII2_TD2__ENET2_TX_DATA_2   0xa0b1
+   MX6SX_PAD_RGMII2_TD3__ENET2_TX_DATA_3   0xa0b1
+   MX6SX_PAD_RGMII2_TX_CTL__ENET2_TX_EN    0xa0b1
+   MX6SX_PAD_RGMII2_RXC__ENET2_RX_CLK      0x3081
+   MX6SX_PAD_RGMII2_RD0__ENET2_RX_DATA_0   0x3081
+   MX6SX_PAD_RGMII2_RD1__ENET2_RX_DATA_1   0x3081
+   MX6SX_PAD_RGMII2_RD2__ENET2_RX_DATA_2   0x3081
+   MX6SX_PAD_RGMII2_RD3__ENET2_RX_DATA_3   0x3081
+   MX6SX_PAD_RGMII2_RX_CTL__ENET2_RX_EN    0x3081
+   >;
+   };
+
pinctrl_i2c2_1: i2c2grp-1 {
fsl,pins = <
MX6SX_PAD_GPIO1_IO03__I2C2_SDA  0x4001b8b1
-- 
2.7.4

[PATCH V4 3/6] ARM: dts: imx6sx-sabreauto: add IO expander max7310 support

The i.MX6SX Sabre Auto board has two max7310 IO expanders
on the I2C3 bus; add support for them.

Signed-off-by: Anson Huang 
---
no changes.
 arch/arm/boot/dts/imx6sx-sabreauto.dts | 28 
 1 file changed, 28 insertions(+)

diff --git a/arch/arm/boot/dts/imx6sx-sabreauto.dts 
b/arch/arm/boot/dts/imx6sx-sabreauto.dts
index d59084f..812f40b 100644
--- a/arch/arm/boot/dts/imx6sx-sabreauto.dts
+++ b/arch/arm/boot/dts/imx6sx-sabreauto.dts
@@ -82,6 +82,13 @@
>;
};
 
+   pinctrl_i2c3_2: i2c3grp-2 {
+   fsl,pins = <
+   MX6SX_PAD_KEY_ROW4__I2C3_SDA    0x4001b8b1
+   MX6SX_PAD_KEY_COL4__I2C3_SCL    0x4001b8b1
+   >;
+   };
+
pinctrl_uart1: uart1grp {
fsl,pins = <
MX6SX_PAD_GPIO1_IO04__UART1_TX  0x1b0b1
@@ -272,3 +279,24 @@
};
};
 };
+
+ {
+   clock-frequency = <10>;
+   pinctrl-names = "default";
+   pinctrl-0 = <&pinctrl_i2c3_2>;
+   status = "okay";
+
+   max7310_a: gpio@30 {
+   compatible = "maxim,max7310";
+   reg = <0x30>;
+   gpio-controller;
+   #gpio-cells = <2>;
+   };
+
+   max7310_b: gpio@32 {
+   compatible = "maxim,max7310";
+   reg = <0x32>;
+   gpio-controller;
+   #gpio-cells = <2>;
+   };
+};
-- 
2.7.4

[PATCH V4 1/6] ARM: dts: imx6sx-sabreauto: add PMIC support

Add pfuze100 support on i.MX6SX Sabre Auto board.

Signed-off-by: Anson Huang 
---
no changes.
 arch/arm/boot/dts/imx6sx-sabreauto.dts | 117 +
 1 file changed, 117 insertions(+)

diff --git a/arch/arm/boot/dts/imx6sx-sabreauto.dts 
b/arch/arm/boot/dts/imx6sx-sabreauto.dts
index 57d1ea0..2caca934 100644
--- a/arch/arm/boot/dts/imx6sx-sabreauto.dts
+++ b/arch/arm/boot/dts/imx6sx-sabreauto.dts
@@ -74,6 +74,14 @@
 
  {
imx6x-sabreauto {
+
+   pinctrl_i2c2_1: i2c2grp-1 {
+   fsl,pins = <
+   MX6SX_PAD_GPIO1_IO03__I2C2_SDA  0x4001b8b1
+   MX6SX_PAD_GPIO1_IO02__I2C2_SCL  0x4001b8b1
+   >;
+   };
+
pinctrl_uart1: uart1grp {
fsl,pins = <
MX6SX_PAD_GPIO1_IO04__UART1_TX  0x1b0b1
@@ -148,3 +156,112 @@
};
};
 };
+
+ {
+   clock-frequency = <10>;
+   pinctrl-names = "default";
+   pinctrl-0 = <&pinctrl_i2c2_1>;
+   status = "okay";
+
+   pmic: pfuze100@08 {
+   compatible = "fsl,pfuze100";
+   reg = <0x08>;
+
+   regulators {
+   sw1a_reg: sw1ab {
+   regulator-min-microvolt = <30>;
+   regulator-max-microvolt = <1875000>;
+   regulator-boot-on;
+   regulator-always-on;
+   regulator-ramp-delay = <6250>;
+   };
+
+   sw1c_reg: sw1c {
+   regulator-min-microvolt = <30>;
+   regulator-max-microvolt = <1875000>;
+   regulator-boot-on;
+   regulator-always-on;
+   regulator-ramp-delay = <6250>;
+   };
+
+   sw2_reg: sw2 {
+   regulator-min-microvolt = <80>;
+   regulator-max-microvolt = <330>;
+   regulator-boot-on;
+   regulator-always-on;
+   };
+
+   sw3a_reg: sw3a {
+   regulator-min-microvolt = <40>;
+   regulator-max-microvolt = <1975000>;
+   regulator-boot-on;
+   regulator-always-on;
+   };
+
+   sw3b_reg: sw3b {
+   regulator-min-microvolt = <40>;
+   regulator-max-microvolt = <1975000>;
+   regulator-boot-on;
+   regulator-always-on;
+   };
+
+   sw4_reg: sw4 {
+   regulator-min-microvolt = <80>;
+   regulator-max-microvolt = <330>;
+   regulator-always-on;
+   };
+
+   swbst_reg: swbst {
+   regulator-min-microvolt = <500>;
+   regulator-max-microvolt = <515>;
+   };
+
+   snvs_reg: vsnvs {
+   regulator-min-microvolt = <100>;
+   regulator-max-microvolt = <300>;
+   regulator-boot-on;
+   regulator-always-on;
+   };
+
+   vref_reg: vrefddr {
+   regulator-boot-on;
+   regulator-always-on;
+   };
+
+   vgen1_reg: vgen1 {
+   regulator-min-microvolt = <80>;
+   regulator-max-microvolt = <155>;
+   regulator-always-on;
+   };
+
+   vgen2_reg: vgen2 {
+   regulator-min-microvolt = <80>;
+   regulator-max-microvolt = <155>;
+   };
+
+   vgen3_reg: vgen3 {
+   regulator-min-microvolt = <180>;
+   regulator-max-microvolt = <330>;
+   regulator-always-on;
+   };
+
+   vgen4_reg: vgen4 {
+   regulator-min-microvolt = <180>;
+   regulator-max-microvolt = <330>;
+   regulator-always-on;
+   };
+
+   vgen5_reg: vgen5 {
+   regulator-min-microvolt =

[PATCH v2 3/3] sample: vfio bochs vbe display (host device for bochs-drm)

Display device, demo-ing the vfio dmabuf display interface
(VFIO_GFX_PLANE_TYPE_DMABUF).  Compatible enough to qemu stdvga
that bochs-drm.ko can be used as guest driver.

Signed-off-by: Gerd Hoffmann 
---
 samples/vfio-mdev/mbochs.c | 1406 
 samples/Kconfig|   13 +
 samples/vfio-mdev/Makefile |1 +
 3 files changed, 1420 insertions(+)
 create mode 100644 samples/vfio-mdev/mbochs.c

diff --git a/samples/vfio-mdev/mbochs.c b/samples/vfio-mdev/mbochs.c
new file mode 100644
index 00..f8d66def5b
--- /dev/null
+++ b/samples/vfio-mdev/mbochs.c
@@ -0,0 +1,1406 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Mediated virtual PCI display host device driver
+ *
+ * Emulate enough of qemu stdvga to make bochs-drm.ko happy.  That is
+ * basically the vram memory bar and the bochs dispi interface vbe
+ * registers in the mmio register bar. Specifically it does *not*
+ * include any legacy vga stuff.  Device looks alot like "qemu -device
+ * secondary-vga".
+ *
+ *   (c) Gerd Hoffmann 
+ *
+ * based on mtty driver which is:
+ *   Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ *  Author: Neo Jia 
+ *  Kirti Wankhede 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+#define VBE_DISPI_INDEX_ID               0x0
+#define VBE_DISPI_INDEX_XRES             0x1
+#define VBE_DISPI_INDEX_YRES             0x2
+#define VBE_DISPI_INDEX_BPP              0x3
+#define VBE_DISPI_INDEX_ENABLE           0x4
+#define VBE_DISPI_INDEX_BANK             0x5
+#define VBE_DISPI_INDEX_VIRT_WIDTH       0x6
+#define VBE_DISPI_INDEX_VIRT_HEIGHT      0x7
+#define VBE_DISPI_INDEX_X_OFFSET         0x8
+#define VBE_DISPI_INDEX_Y_OFFSET         0x9
+#define VBE_DISPI_INDEX_VIDEO_MEMORY_64K 0xa
+#define VBE_DISPI_INDEX_COUNT            0xb
+
+#define VBE_DISPI_ID0  0xB0C0
+#define VBE_DISPI_ID1  0xB0C1
+#define VBE_DISPI_ID2  0xB0C2
+#define VBE_DISPI_ID3  0xB0C3
+#define VBE_DISPI_ID4  0xB0C4
+#define VBE_DISPI_ID5  0xB0C5
+
+#define VBE_DISPI_DISABLED 0x00
+#define VBE_DISPI_ENABLED  0x01
+#define VBE_DISPI_GETCAPS  0x02
+#define VBE_DISPI_8BIT_DAC 0x20
+#define VBE_DISPI_LFB_ENABLED  0x40
+#define VBE_DISPI_NOCLEARMEM   0x80
+
+
+#define MBOCHS_NAME               "mbochs"
+#define MBOCHS_CLASS_NAME         "mbochs"
+
+#define MBOCHS_CONFIG_SPACE_SIZE  0xff
+#define MBOCHS_MMIO_BAR_OFFSET   PAGE_SIZE
+#define MBOCHS_MMIO_BAR_SIZE PAGE_SIZE
+#define MBOCHS_MEMORY_BAR_OFFSET  (MBOCHS_MMIO_BAR_OFFSET + \
+  MBOCHS_MMIO_BAR_SIZE)
+
+#define STORE_LE16(addr, val)  (*(u16 *)addr = val)
+#define STORE_LE32(addr, val)  (*(u32 *)addr = val)
+
+
+MODULE_LICENSE("GPL v2");
+
+static int max_mbytes = 256;
+module_param_named(count, max_mbytes, int, 0444);
+MODULE_PARM_DESC(mem, "megabytes available to " MBOCHS_NAME " devices");
+
+
+#define MBOCHS_TYPE_1 "small"
+#define MBOCHS_TYPE_2 "medium"
+#define MBOCHS_TYPE_3 "large"
+
+static const struct mbochs_type {
+   const char *name;
+   u32 mbytes;
+} mbochs_types[] = {
+   {
+   .name   = MBOCHS_CLASS_NAME "-" MBOCHS_TYPE_1,
+   .mbytes = 4,
+   }, {
+   .name   = MBOCHS_CLASS_NAME "-" MBOCHS_TYPE_2,
+   .mbytes = 16,
+   }, {
+   .name   = MBOCHS_CLASS_NAME "-" MBOCHS_TYPE_3,
+   .mbytes = 64,
+   },
+};
+
+
+static dev_t           mbochs_devt;
+static struct class    *mbochs_class;
+static struct cdev     mbochs_cdev;
+static struct device   mbochs_dev;
+static int             mbochs_used_mbytes;
+
+struct mbochs_mode {
+   u32 drm_format;
+   u32 bytepp;
+   u32 width;
+   u32 height;
+   u32 stride;
+   u32 __pad;
+   u64 offset;
+   u64 size;
+};
+
+struct mbochs_dmabuf {
+   struct mbochs_mode mode;
+   u32 id;
+   struct page **pages;
+   pgoff_t pagecount;
+   struct dma_buf *buf;
+   struct mdev_state *mdev_state;
+   struct list_head next;
+   bool unlinked;
+};
+
+/* State of each mdev device */
+struct mdev_state {
+   u8 *vconfig;
+   u64 bar_mask[3];
+   u32 memory_bar_mask;
+   struct mutex ops_lock;
+   struct mdev_device *mdev;
+   struct vfio_device_info dev_info;
+
+   const struct mbochs_type *type;
+   u16 vbe[VBE_DISPI_INDEX_COUNT];
+   u64 memsize;
+   struct page **pages;
+   pgoff_t pagecount;
+
+   struct list_head dmabufs;
+   u32 active_id;
+   u32

Re: [PATCH v3 1/2] tty/nozomi: cleanup DUMP() macro

On Tue, Apr 24, 2018 at 12:39:14PM -1000, Joey Pabalinas wrote:
> Replace snprint() with strscpy() and use min_t() instead of
> the conditional operator to clamp buffer length.
> 
> Signed-off-by: Joey Pabalinas 
> 
>  1 file changed, 13 insertions(+), 13 deletions(-)

Again, this is odd...

How are you sending these patches?  How are you creating them?  What is
taking part of the diffstat off and just leaving that line?

thanks,

greg k-h
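
For readers unfamiliar with the pattern named in the quoted commit message (a bounded string copy whose length is clamped with a min helper instead of the conditional operator), here is a minimal userspace sketch. It is not the nozomi DUMP() code: `min_size()` and `bounded_copy()` are hypothetical stand-ins for the kernel's min_t() and strscpy(), and the -1 return stands in for the -E2BIG that strscpy() reports on truncation.

```c
#include <stddef.h>
#include <string.h>

/* Smaller of two sizes; userspace stand-in for min_t(size_t, a, b). */
static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper, not the nozomi code: copy at most dst_size - 1
 * bytes of src into dst and always NUL-terminate the result.
 * Returns the number of bytes copied, or -1 if dst_size is 0 or src
 * had to be truncated.
 */
static long bounded_copy(char *dst, size_t dst_size, const char *src)
{
	size_t src_len, n;

	if (dst_size == 0)
		return -1;

	src_len = strlen(src);
	/* Clamp the copy length in one place. */
	n = min_size(src_len, dst_size - 1);
	memcpy(dst, src, n);
	dst[n] = '\0';

	return src_len > n ? -1 : (long)n;
}
```

The point of the design is that the destination is always terminated and truncation is reported to the caller rather than silently ignored.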

Re: Regression 4.17-rc1: SSD doesn't properly resume causing system hang (NULL pointer dereference)

2018-04-24 Thread Paul Menzel


Dear Bart,


On 24.04.2018 at 23:17, Bart Van Assche wrote:
> On Tue, 2018-04-24 at 23:04 +0200, Paul Menzel wrote:
>> I applied your change, and rebuilt the Linux kernel. Unfortunately, it
>> looks like, it didn’t make a difference.
>
> In that case I don't know what is causing the failure. Can you run a bisect
> to determine which commit introduced this regression?


With `scsi_mod.use_blk_mq=n` the system resumes fine, so, for reasons
unknown to me, that Kconfig option got selected in my Linux kernel
configuration. I remember having similar issues when this was enabled by
default in Linux 4.13-rc?, so it was just a configuration problem and
not a regression. Unfortunately, the Linux configuration files are not
under version control, so I cannot check, but it is probably my fault.


Sorry for the noise, and please tell me what I can do to get the option
working on this old device.



Kind regards,

Paul

Re: [PATCH v7 2/5] of: change overlay apply input data from unflattened to FDT

On 2018-04-24 20:26, Alan Tull wrote:
> On Tue, Apr 24, 2018 at 11:08 AM, Alan Tull  wrote:
>> On Tue, Apr 24, 2018 at 12:29 AM, Jan Kiszka  wrote:
>>>
>>> We have drivers/fpga/of-fpga-region.c in-tree, and that does not seem to
>>> store any pointers to objects, rather consumes them in-place. And I
>>> would consider it fair to impose such a limitation on the notifier
>>> interface.
>>
>> The FPGA code was written assuming that overlays could be removed.
> 
> To be more specific, drivers/fpga/of-fpga-region.c currently saves a
> pointer to the overlay and uses it only during the pre-apply
> notification.

Even more exactly: You are saving a reference during pre-apply and
freeing that one again on the corresponding post-remove. That is what I
would have expected as normal usage of the notification API as well.

Jan
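
The usage pattern described above (take a reference on pre-apply, drop it on the matching post-remove) can be sketched in plain C. This is a userspace model under stated assumptions, not the of-fpga-region code or the kernel notifier API: the event names, `struct overlay`, `struct region` and `region_notify()` are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical event names modelled on the notification steps named above. */
enum overlay_event {
	EV_PRE_APPLY,
	EV_POST_APPLY,
	EV_PRE_REMOVE,
	EV_POST_REMOVE,
};

/* Hypothetical refcounted object handed to the notifier. */
struct overlay {
	int refcount;
};

/* Hypothetical consumer that keeps one overlay while it is applied. */
struct region {
	struct overlay *held;
};

/* Returns 0 on success, -1 if the region already holds an overlay. */
static int region_notify(struct region *r, enum overlay_event ev,
			 struct overlay *ovl)
{
	switch (ev) {
	case EV_PRE_APPLY:
		if (r->held)
			return -1;
		ovl->refcount++;	/* save a reference */
		r->held = ovl;
		return 0;
	case EV_POST_REMOVE:
		if (r->held == ovl) {
			ovl->refcount--;	/* free it again */
			r->held = NULL;
		}
		return 0;
	default:
		/* Other events do not touch the saved reference. */
		return 0;
	}
}
```

In this model the saved pointer is only valid between the two events, which is the limitation on the notifier interface that the thread is discussing.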

Re: [PATCH RFC v5] pidns: introduce syscall translate_pid

2018-04-24 Thread Konstantin Khlebnikov

On 23.04.2018 20:37, Nagarathnam Muthusamy wrote:

On 04/05/2018 12:02 AM, Konstantin Khlebnikov wrote:

On 05.04.2018 01:29, Eric W. Biederman wrote:

Nagarathnam Muthusamy writes:

On 04/04/2018 12:11 PM, Konstantin Khlebnikov wrote:

Each process has different pids, one for each pid namespace it belongs to.
When interaction happens within a single pid-ns, translation isn't required.
More complicated scenarios need special handling.

For example:
- reading pid-files or logs written inside container with pid namespace
- attaching with ptrace to tasks from different pid namespace
- passing pids across pid namespaces in any kind of API

Currently there are several interfaces that could be used here:

Pid namespaces are identified by inode number of /proc/[pid]/ns/pid.

Using the inode number in interfaces is not an option. Especially not
without referencing the device number for the filesystem as well.

This is supposed to be a single-instance fs,
not part of proc but referenced by its magic "symlinks".

Device numbers are not mentioned in "man namespaces".

Pids for nested Pid namespaces are shown in file /proc/[pid]/status.
In some cases conversion pid -> vpid could be easily done using this
information, but backward translation requires scanning all tasks.
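
The pid -> vpid direction mentioned above can be illustrated in userspace. This is a minimal sketch, assuming the per-namespace pids are read from the `NSpid:` line of /proc/[pid]/status (the field name comes from proc(5); the thread does not spell it out). `parse_nspid()` is a hypothetical helper and not part of the proposed patch; it works on the file contents already read into a string.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Parse the "NSpid:" line out of the text of /proc/[pid]/status.
 * Stores up to max values in out[], in the order they appear on the
 * line, and returns how many were stored, or -1 if there is no such
 * line.
 */
static int parse_nspid(const char *status, long *out, int max)
{
	const char *line = status;
	char *end;
	int n = 0;

	/* Walk line by line until one starts with "NSpid:". */
	while (line && strncmp(line, "NSpid:", 6) != 0) {
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	if (!line)
		return -1;

	line += 6;
	while (n < max) {
		/* Skip separators, but never run past the end of the line. */
		while (*line == ' ' || *line == '\t')
			line++;
		if (*line == '\n' || *line == '\0')
			break;
		out[n] = strtol(line, &end, 10);
		if (end == line)
			break;
		n++;
		line = end;
	}
	return n;
}
```

The reverse lookup has no such shortcut: finding which task has a given pid inside a nested namespace means running this over every task, which is the scanning cost the paragraph above refers to.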

A Unix socket automatically translates the pid attached to SCM_CREDENTIALS.
This requires CAP_SYS_ADMIN for sending arbitrary pids and entering
into the pid namespace; this exposes the process and could be insecure.

This patch adds new syscall for converting pids between pid namespaces:

pid_t translate_pid(pid_t pid, int source_type, int source,
int target_type, int target);

@source_type and @target_type define the type of the following arguments:

TRANSLATE_PID_CURRENT_PIDNS - current pid namespace, argument is unused
TRANSLATE_PID_TASK_PIDNS - task pid-ns, argument is task pid

I believe using pid to represent the namespace has been already
discussed in V1 of this patch in https://lkml.org/lkml/2015/9/22/1087
after which we moved on to fd based version of this interface.

Or in short why is the case of pids important?

Konstantin, you almost said why they were important in your message
saying you were going to send this one. However you don't explain in
your description why you want to identify pid namespaces by pid.

Open of /proc/[pid]/ns/pid requires same permissions as ptrace,
pid based variant doesn't have such restrictions.

Can you provide more information on usecase requiring PID translation but not
used for tracing related purposes?

Any introspection for [nested] containers. It's easier to work when you have
all the information than when you don't have any.
For example, our CMS https://github.com/yandex/porto allows starting a nested sub-container (or an even deeper one) on request from any container and
has to report back which pid the task has. It can also translate any pid inside the container into one accessible by the client, and vice versa.

On a side note, can we have the types TRANSLATE_PID_CURRENT_PIDNS and
TRANSLATE_PID_FD_PIDNS integrated first and then possibly extend the
interface to include TRANSLATE_PID_TASK_PIDNS in the future?

I don't see a reason for this separation.
Pids and pid namespaces have been part of the API for a long time.

Thanks,
Nagarathnam.

Most pid-based syscalls are racy in some cases, but they have been
here for decades and everybody knows how to deal with it.
So I've decided to merge both worlds in one interface which clearly tells what
to expect.

Re: [PATCH RFC v5] pidns: introduce syscall translate_pid

2018-04-24 Thread Konstantin Khlebnikov

On 23.04.2018 20:37, Nagarathnam Muthusamy wrote:

On 04/05/2018 12:02 AM, Konstantin Khlebnikov wrote:

On 05.04.2018 01:29, Eric W. Biederman wrote:

Nagarathnam Muthusamy writes:

On 04/04/2018 12:11 PM, Konstantin Khlebnikov wrote:

Each process has different pids, one for each pid namespace it belongs to.
When interaction happens within a single pid-ns, translation isn't required.
More complicated scenarios need special handling.

Currently there are several interfaces that could be used here:

Pid namespaces are identified by the inode number of /proc/[pid]/ns/pid.

Using the inode number in interfaces is not an option. Especially not
without referencing the device number for the filesystem as well.

This is supposed to be a single-instance fs,
not part of proc but referenced by its magic "symlinks".

Device numbers are not mentioned in "man namespaces".

This patch adds a new syscall for converting pids between pid namespaces:

pid_t translate_pid(pid_t pid, int source_type, int source,
int target_type, int target);

@source_type and @target_type define the type of the following arguments:

TRANSLATE_PID_CURRENT_PIDNS - current pid namespace, argument is unused
TRANSLATE_PID_TASK_PIDNS - task pid-ns, argument is task pid

I believe using pid to represent the namespace has already been
discussed in V1 of this patch in https://lkml.org/lkml/2015/9/22/1087
after which we moved on to the fd-based version of this interface.

Or in short why is the case of pids important?

Opening /proc/[pid]/ns/pid requires the same permissions as ptrace;
the pid-based variant doesn't have such restrictions.

Can you provide more information on a use case that requires PID translation
but is not used for tracing-related purposes?

On a side note, can we have the types TRANSLATE_PID_CURRENT_PIDNS and
TRANSLATE_PID_FD_PIDNS integrated first and then possibly extend the
interface to include TRANSLATE_PID_TASK_PIDNS in the future?

I don't see a reason for this separation.
Pids and pid namespaces have been part of the API for a long time.

Thanks,
Nagarathnam.

RE: [PATCH 4/5] ARM: dts: imx6sx-sabreauto: add fec support

Hi, Fabio

Anson Huang
Best Regards!


> -Original Message-
> From: Anson Huang
> Sent: Wednesday, April 25, 2018 11:30 AM
> To: 'Fabio Estevam' 
> Cc: Shawn Guo ; Sascha Hauer
> ; Fabio Estevam ; Rob
> Herring ; Mark Rutland ;
> dl-linux-imx ; moderated list:ARM/FREESCALE IMX / MXC
> ARM ARCHITECTURE ; open list:OPEN
> FIRMWARE AND FLATTENED DEVICE TREE BINDINGS
> ; linux-kernel 
> Subject: RE: [PATCH 4/5] ARM: dts: imx6sx-sabreauto: add fec support
> 
> Hi, Fabio
> 
> Anson Huang
> Best Regards!
> 
> 
> > -Original Message-
> > From: Fabio Estevam [mailto:feste...@gmail.com]
> > Sent: Tuesday, April 24, 2018 8:23 PM
> > To: Anson Huang 
> > Cc: Shawn Guo ; Sascha Hauer
> > ; Fabio Estevam ; Rob
> > Herring ; Mark Rutland ;
> > dl-linux-imx ; moderated list:ARM/FREESCALE IMX /
> > MXC ARM ARCHITECTURE ; open
> > list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS
> > ; linux-kernel
> > 
> > Subject: Re: [PATCH 4/5] ARM: dts: imx6sx-sabreauto: add fec support
> >
> > Hi Anson,
> >
> > On Mon, Apr 23, 2018 at 11:09 PM, Anson Huang 
> > wrote:
> >
> > > Ah, yes, thanks for pointing out this issue, I just removed it and
> > > the function is still working, already sent out V2 patch set, thanks.
> >
> > So now maybe it is working only because the bootloader activated this
> > GPIO, which is not good.
> >
> > I don't have access to the mx6sx sabreauto schematics to verify  where
> >  0 connects to, but it would be better to make sure that you
> > activate this pin in dts if it is needed for Ethernet, without relying on
> > the bootloader.
> 
> It is working by default hardware settings, but I agree we should do it in
> dts, I found it has a lot of dependency if we want to enable the gpio reset
> for FEC1,
> many gpio reset patch missed in upstream kernel, need patch work for
> MAX7322 first, so I plan to remove FEC support in this patch set, and will
> upstream the MAX7322 reset patch first, then will add FEC support after
> MAX7322 patch done. Will send out V3 patch to drop fec support for now,
> thanks.
> 
> Anson.

Sorry, I made a mistake here: the MAX7320 IO0 is for adjusting FEC1's voltage,
NOT reset. When I tested the MAX7320 driver, I did NOT notice that
CONFIG_GPIO_MAX732X is NOT enabled in imx_v6_v7_defconfig, so I thought
it was NOT working. After enabling the MAX7320 driver, the max7320 works as
expected. I will send out another patch set including the fec driver and the
MAX7320 config. Thanks for your patience.

Anson.

Re: [PATCH] printk: Ratelimit messages printed by console drivers

2018-04-24 Thread Sergey Senozhatsky

On (04/23/18 14:45), Petr Mladek wrote:
[..]
> I am not sure how slow are the slowest consoles. If I take that
> everything should be faster than 1200 bauds. Then 10 minutes
> should be enough for 1000 lines and 80 characters per-line:

Well, the problem with the numbers is that they are too... simple...
let me put it this way.

What if I don't have a slow serial console? Or what if I have NMI
watchdog set to 40 seconds? Or what if I don't have NMIs at all?
Why am I all of a sudden limited by "1200 bauds"?

Another problem is that we limit the *wrong* thing.

Not only because we can [and probably need to] rate-limit the misbehaving
code that calls printk()-s, instead of printk(). But because we claim
that we limit the "number of lines" added recursively. This is wrong.
We limit the number of times vprintk_func() was called, which is != the
number of added lines. Because vprintk_func() is also called for pr_cont()
or printk(KERN_CONT) or printk("missing new line"). Backtraces contain
tons and tons of pr_cont()-s - registers print out, list of modules
print out, stack print out, code print out. Even this thing at the
bottom of a trace:

Code: 01 ca 49 89 d1 48 89 d1 48 c1 ea 23 48 8b 14 d5 80 23 63 82 49 c1 
e9 0c 48 c1 e9 1b 48 85 d2 74 0a 0f b6 c9 48 c1 e1 04 48 01 ca <48> 8b 12 49 c1 
e1 06 b9 00 00 00 80 89 7d 80 89 75 84 48 8b 3d

is nothing but a bunch of pr_cont()-s, each of which will individually
end up in vprintk_func(). Error reports in general can contain even more
pr_cont() calls. E.g. core kernel code can hex dump slab memory, while
being called from one of console drivers.
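
To make the "calls != lines" point concrete, here is a small userspace sketch. It is not kernel code: emit() is a hypothetical stand-in for one printk()/pr_cont() invocation, and emit_code_dump() assumes, for illustration, that a "Code:" dump is produced with one call per byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct emit_stats {
	unsigned int calls;	/* how often the print hook was entered */
	unsigned int lines;	/* how many '\n'-terminated lines resulted */
};

/* Stand-in for one printk()/pr_cont() call; it only counts, prints nothing. */
static void emit(struct emit_stats *st, const char *text)
{
	const char *p;

	assert(st && text);
	st->calls++;
	for (p = strchr(text, '\n'); p; p = strchr(p + 1, '\n'))
		st->lines++;
}

/* Mimic a "Code:" dump: one call for the prefix, one per byte, one newline. */
static void emit_code_dump(struct emit_stats *st, size_t nbytes)
{
	size_t i;

	emit(st, "Code:");
	for (i = 0; i < nbytes; i++)
		emit(st, " 00");	/* continuation, no newline */
	emit(st, "\n");
}
```

Under these assumptions, dumping 64 bytes enters the hook 66 times while adding exactly one line, so a limit expressed in calls could trip long before a limit expressed in lines would.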

Another problem is that nothing tells us that we *actually* have an
infinite loop. Nothing tells us that every call_console_drivers()
adds more messages to the logbuf. We see just one thing - the current
call_console_drivers() is about to add some lines to the logbuf later
on. OK, why is this a problem? This can be a one time thing. Or
console_unlock() may be in a schedulable context, getting rescheduled
after every line it prints [either implicitly after
printk_safe_exit_irqrestore(), or explicitly by calling into the
scheduler - cond_resched()].

Most likely, we don't even realize how many things we are about to
break.

> Alternatively, it seems that we are going to call console drivers
> outside printk_safe context => the messages will appear in the main
> log buffer immediately => only small risk of a ping-pong with printk
> safe buffers. We might reset the counter when all messages are handled
> in console_unlock(). It will be more complex patch than when using
> ratelimiting but it still should be sane.

We may have some sort of vprintk_func()-based solution, maybe.
But we first need a real reason. Right now it looks to me like
we have "a solution" to a problem which we have never witnessed.

That vprintk_func()-based solution, if there will be no other
options on the table, must be much smarter than anything that
we have seen so far.

Sorry.

-ss

Re: linux-next: Signed-off-by missing for commit in the wireless-drivers-next tree

2018-04-24 Thread Luciano Coelho

Hi,

On Wed, 2018-04-25 at 10:56 +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Commit
> 
>   84226ca1c5d3 ("iwlwifi: mvm: support offload of AMSDU rate
> control")
> 
> is missing a Signed-off-by from its author.

I checked this and it should be fine.  The author is actually wrong in
the commit.  Sara is the actual author and we have her S-o-B in the
commit.

I have no idea how Gregory became the author in that commit, though.  I
checked exactly how I committed it and checked the author in our
internal tree.  It's all fine, but for some really weird reason that I
can't figure out, Gregory became the author of the commit when I
applied that patch.  The AuthorDate is also wrong.  I checked my bash
history to see how I applied the patch and everything looks normal...
*confused*

--
Cheers,
Luca.

[PATCH net-next 0/2] tcp: mmap: rework zerocopy receive

syzbot reported a lockdep issue caused by tcp mmap() support.

I implemented Andy Lutomirski's nice suggestions to resolve the
issue and increase scalability as well.

First patch is adding a new setsockopt() operation and changes mmap()
behavior.

Second patch changes tcp_mmap reference program.

Eric Dumazet (2):
  tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive
  selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE

 include/uapi/linux/tcp.h   |   8 ++
 net/ipv4/tcp.c | 186 +
 tools/testing/selftests/net/tcp_mmap.c |  63 +
 3 files changed, 139 insertions(+), 118 deletions(-)

-- 
2.17.0.484.g0c8726318c-goog

[PATCH net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

When adding tcp mmap() implementation, I forgot that socket lock
had to be taken before current->mm->mmap_sem. syzbot eventually caught
the bug.

Since we can not lock the socket in tcp mmap() handler we have to
split the operation in two phases.

1) mmap() on a tcp socket simply reserves VMA space, and nothing else.
  This operation does not involve any TCP locking.

2) setsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements
  the transfer of pages from skbs to one VMA.
  This operation only uses down_read(&current->mm->mmap_sem) after
  holding TCP lock, thus solving the lockdep issue.

This new implementation was suggested by Andy Lutomirski in great detail.

Benefits are :

- Better scalability, in case multiple threads reuse VMAs
   (without mmap()/munmap() calls) since mmap_sem won't be write locked.

- Better error recovery.
   The previous mmap() model had to provide the expected size of the
   mapping. If for some reason one part could not be mapped (partial MSS),
   the whole operation had to be aborted.
   With the tcp_zerocopy_receive struct, kernel can report how
   many bytes were successfully mapped, and how many bytes should
   be read to skip the problematic sequence.

- No more memory allocation to hold an array of page pointers.
  16 MB mappings needed 32 KB for this array, potentially using vmalloc() :/

- skbs are freed while mmap_sem has been released
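
The "16 MB mappings needed 32 KB" figure in the list above can be checked with a short calculation. The sketch below assumes 4 KB pages and 8-byte pointers (typical for x86-64, not stated in the message); page_array_bytes() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes needed for an array holding one page pointer per mapped page.
 * Hypothetical helper; page_size and ptr_size are assumptions supplied
 * by the caller.
 */
static size_t page_array_bytes(size_t map_bytes, size_t page_size,
			       size_t ptr_size)
{
	assert(page_size > 0 && map_bytes % page_size == 0);
	return (map_bytes / page_size) * ptr_size;
}
```

With those assumptions, page_array_bytes(16 << 20, 4096, 8) gives 4096 pages times 8 bytes, which is 32768 bytes, matching the 32 KB quoted above.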

Following patch makes the change in tcp_mmap tool to demonstrate
one possible use of mmap() and setsockopt(... TCP_ZEROCOPY_RECEIVE ...)

Note that memcg might require additional changes.

Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive")
Signed-off-by: Eric Dumazet 
Reported-by: syzbot 
Suggested-by: Andy Lutomirski 
Cc: linux...@kvack.org
Cc: Soheil Hassas Yeganeh 
---
 include/uapi/linux/tcp.h |   8 ++
 net/ipv4/tcp.c   | 186 ---
 2 files changed, 103 insertions(+), 91 deletions(-)

diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 379b08700a542d49bbce9b4b49b17879d00b69bb..e9e8373b34b9ddc735329341b91f455bf5c0b17c 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -122,6 +122,7 @@ enum {
 #define TCP_MD5SIG_EXT 32  /* TCP MD5 Signature with extensions */
 #define TCP_FASTOPEN_KEY   33  /* Set the key for Fast Open (cookie) */
 #define TCP_FASTOPEN_NO_COOKIE 34  /* Enable TFO without a TFO cookie */
+#define TCP_ZEROCOPY_RECEIVE   35
 
 struct tcp_repair_opt {
__u32   opt_code;
@@ -276,4 +277,11 @@ struct tcp_diag_md5sig {
__u8tcpm_key[TCP_MD5SIG_MAXKEYLEN];
 };
 
+/* setsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) */
+
+struct tcp_zerocopy_receive {
+   __u64 address;  /* in: address of mapping */
+   __u32 length;   /* in/out: number of bytes to map/mapped */
+   __u32 recv_skip_hint;   /* out: amount of bytes to skip */
+};
 #endif /* _UAPI_LINUX_TCP_H */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index dfd090ea54ad47112fc23c61180b5bf8edd2c736..a28eca97df9465a0aa522a1833a171b53c237b1f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1726,118 +1726,108 @@ int tcp_set_rcvlowat(struct sock *sk, int val)
 }
 EXPORT_SYMBOL(tcp_set_rcvlowat);
 
-/* When user wants to mmap X pages, we first need to perform the mapping
- * before freeing any skbs in receive queue, otherwise user would be unable
- * to fallback to standard recvmsg(). This happens if some data in the
- * requested block is not exactly fitting in a page.
- *
- * We only support order-0 pages for the moment.
- * mmap() on TCP is very strict, there is no point
- * trying to accommodate with pathological layouts.
- */
+static const struct vm_operations_struct tcp_vm_ops = {
+};
+
 int tcp_mmap(struct file *file, struct socket *sock,
 struct vm_area_struct *vma)
 {
-   unsigned long size = vma->vm_end - vma->vm_start;
-   unsigned int nr_pages = size >> PAGE_SHIFT;
-   struct page **pages_array = NULL;
-   u32 seq, len, offset, nr = 0;
-   struct sock *sk = sock->sk;
-   const skb_frag_t *frags;
+   if (vma->vm_flags & (VM_WRITE | VM_EXEC))
+   return -EPERM;
+   vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
+
+   /* Instruct vm_insert_page() to not down_read(mmap_sem) */
+   vma->vm_flags |= VM_MIXEDMAP;
+
+   vma->vm_ops = &tcp_vm_ops;
+   return 0;
+}
+EXPORT_SYMBOL(tcp_mmap);
+
+static int tcp_zerocopy_receive(struct sock *sk,
+   struct tcp_zerocopy_receive *zc)
+{
+   unsigned long address = (unsigned long)zc->address;
+   const skb_frag_t *frags = NULL;
+   u32 length = 0, seq, offset;
+   struct vm_area_struct *vma;
+   struct sk_buff *skb = NULL;
struct tcp_sock *tp;
-   struct sk_buff *skb;
int ret;
 
-   if (vma->vm_pgoff || !nr_pages)
+   if (address & (PAGE_SIZE - 1) || address != zc->address)
return -EINVAL;
 
-

[PATCH net-next 2/2] selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE

After prior kernel change, mmap() on TCP socket only reserves VMA.

We have to use setsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...)
to perform the transfer of pages from skbs in TCP receive queue into such VMA.

struct tcp_zerocopy_receive {
__u64 address;  /* in: address of mapping */
__u32 length;   /* in/out: number of bytes to map/mapped */
__u32 recv_skip_hint;   /* out: amount of bytes to skip */
};

After a successful setsockopt(...TCP_ZEROCOPY_RECEIVE...), @length contains
number of bytes that were mapped, and @recv_skip_hint contains number of bytes
that should be read using conventional read()/recv()/recvmsg() system calls,
to skip a sequence of bytes that can not be mapped, because not properly page
aligned.
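
The in/out contract described above can be sketched in userspace without touching a socket. The struct below is a local mirror of the layout posted in this series, named zc_result to avoid clashing with a real header; both helper functions are hypothetical and only restate what the commit message and the kernel-side alignment check say.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of struct tcp_zerocopy_receive as posted in this series. */
struct zc_result {
	uint64_t address;		/* in: address of mapping */
	uint32_t length;		/* in/out: bytes to map / mapped */
	uint32_t recv_skip_hint;	/* out: bytes to fetch with read() */
};

/* The kernel side returns -EINVAL for addresses that are not page aligned. */
static int zc_address_aligned(uint64_t address, uint64_t page_size)
{
	assert(page_size && (page_size & (page_size - 1)) == 0);
	return (address & (page_size - 1)) == 0;
}

/*
 * Upper bound on the bytes one successful call lets the caller account
 * for: what was mapped plus what still has to be read conventionally.
 */
static uint64_t zc_expected_bytes(const struct zc_result *zc)
{
	return (uint64_t)zc->length + zc->recv_skip_hint;
}
```

For instance, a result with length 8192 and recv_skip_hint 1448 means two pages are readable through the mapping and up to 1448 further bytes should be pulled with read() before the next call. A short read() would make the actual total smaller, which is why the selftest adds the read() return value rather than the hint.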

Signed-off-by: Eric Dumazet 
Cc: Andy Lutomirski 
Cc: Soheil Hassas Yeganeh 
---
 tools/testing/selftests/net/tcp_mmap.c | 63 +++---
 1 file changed, 36 insertions(+), 27 deletions(-)

diff --git a/tools/testing/selftests/net/tcp_mmap.c b/tools/testing/selftests/net/tcp_mmap.c
index dea342fe6f4e88b5709d2ac37b2fc9a2a320bf44..5b381cdbdd6319556ba4e3dad530fae8f13f5a9b 100644
--- a/tools/testing/selftests/net/tcp_mmap.c
+++ b/tools/testing/selftests/net/tcp_mmap.c
@@ -76,9 +76,10 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
+#include 
+#include 
 
 #ifndef MSG_ZEROCOPY
 #define MSG_ZEROCOPY0x400
@@ -134,11 +135,12 @@ void hash_zone(void *zone, unsigned int length)
 void *child_thread(void *arg)
 {
 	unsigned long total_mmap = 0, total = 0;
+	struct tcp_zerocopy_receive zc;
 	unsigned long delta_usec;
 	int flags = MAP_SHARED;
 	struct timeval t0, t1;
 	char *buffer = NULL;
-	void *oaddr = NULL;
+	void *addr = NULL;
 	double throughput;
 	struct rusage ru;
 	int lu, fd;
@@ -153,41 +155,45 @@ void *child_thread(void *arg)
 		perror("malloc");
 		goto error;
 	}
+	if (zflg) {
+		addr = mmap(NULL, chunk_size, PROT_READ, flags, fd, 0);
+		if (addr == (void *)-1)
+			zflg = 0;
+	}
 	while (1) {
 		struct pollfd pfd = { .fd = fd, .events = POLLIN, };
 		int sub;
 
 		poll(&pfd, 1, 10000);
 		if (zflg) {
-			void *naddr;
+			int res;
 
-			naddr = mmap(oaddr, chunk_size, PROT_READ, flags, fd, 0);
-			if (naddr == (void *)-1) {
-				if (errno == EAGAIN) {
-					/* That is if SO_RCVLOWAT is buggy */
-					usleep(1000);
-					continue;
-				}
-				if (errno == EINVAL) {
-					flags = MAP_SHARED;
-					oaddr = NULL;
-					goto fallback;
-				}
-				if (errno != EIO)
-					perror("mmap()");
+			zc.address = (__u64)addr;
+			zc.length = chunk_size;
+			zc.recv_skip_hint = 0;
+			res = setsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE,
+					 &zc, sizeof(zc));
+			if (res == -1)
 				break;
+
+			if (zc.length) {
+				assert(zc.length <= chunk_size);
+				total_mmap += zc.length;
+				if (xflg)
+					hash_zone(addr, zc.length);
+				total += zc.length;
 			}
-			total_mmap += chunk_size;
-			if (xflg)
-				hash_zone(naddr, chunk_size);
-			total += chunk_size;
-			if (!keepflag) {
-				flags |= MAP_FIXED;
-				oaddr = naddr;
+			if (zc.recv_skip_hint) {
+				assert(zc.recv_skip_hint <= chunk_size);
+				lu = read(fd, buffer, zc.recv_skip_hint);
+				if (lu > 0) {
+					if (xflg)
+						hash_zone(buffer, lu);
+					total += lu;
+				}
 			}
 			continue;
 		}
-fallback:
 		sub = 0;
 		while (sub < chunk_size) {
 			lu = read(fd, buffer + sub, chunk_size - sub);
@@ -228,6 +234,8 @@ void *child_thread(void *arg)
 error:
 	free(buffer);

Re: [PATCH v7 2/5] of: change overlay apply input data from unflattened to FDT

On 2018-04-24 22:56, Frank Rowand wrote:
> Hi Alan,
> 
> On 04/23/18 15:38, Frank Rowand wrote:
>> Hi Jan,
>>
>> + Alan Tull for fpga perspective
>>
>> On 04/22/18 03:30, Jan Kiszka wrote:
>>> On 2018-04-11 07:42, Jan Kiszka wrote:
>>>> On 2018-04-05 23:12, Rob Herring wrote:
>>>>> On Thu, Apr 5, 2018 at 2:28 PM, Frank Rowand wrote:
>>>>>> On 04/05/18 12:13, Jan Kiszka wrote:
>>>>>>> On 2018-04-05 20:59, Frank Rowand wrote:
>>>>>>>> Hi Jan,
>>>>>>>>
>>>>>>>> On 04/04/18 15:35, Jan Kiszka wrote:
>>>>>>>>> Hi Frank,
>>>>>>>>>
>>>>>>>>> On 2018-03-04 01:17, frowand.l...@gmail.com wrote:
>>>>>>>>>> From: Frank Rowand
>>>>>>>>>>
>>>>>>>>>> Move duplicating and unflattening of an overlay flattened devicetree
>>>>>>>>>> (FDT) into the overlay application code.  To accomplish this,
>>>>>>>>>> of_overlay_apply() is replaced by of_overlay_fdt_apply().
>>>>>>>>>>
>>>>>>>>>> The copy of the FDT (aka "duplicate FDT") now belongs to devicetree
>>>>>>>>>> code, which is thus responsible for freeing the duplicate FDT.  The
>>>>>>>>>> caller of of_overlay_fdt_apply() remains responsible for freeing the
>>>>>>>>>> original FDT.
>>>>>>>>>>
>>>>>>>>>> The unflattened devicetree now belongs to devicetree code, which is
>>>>>>>>>> thus responsible for freeing the unflattened devicetree.
>>>>>>>>>>
>>>>>>>>>> These ownership changes prevent early freeing of the duplicated FDT
>>>>>>>>>> or the unflattened devicetree, which could result in use after free
>>>>>>>>>> errors.
>>>>>>>>>>
>>>>>>>>>> of_overlay_fdt_apply() is a private function for the anticipated
>>>>>>>>>> overlay loader.
>>>>>>>>>
>>>>>>>>> We are using of_fdt_unflatten_tree + of_overlay_apply in the
>>>>>>>>> (out-of-tree) Jailhouse loader driver in order to register a virtual
>>>>>>>>> device during hypervisor activation with Linux. The DT overlay is
>>>>>>>>> created from a template but modified prior to application to account
>>>>>>>>> for runtime-specific parameters. See [1] for the current implementation.
>>>>>>>>>
>>>>>>>>> I'm now wondering how to model that scenario best with the new API.
>>>>>>>>> Given that the loader lost ownership of the unflattened tree but the
>>>>>>>>> modification API exists only for that DT state, I'm not yet seeing a
>>>>>>>>> clear solution. Should we apply the template in disabled form (status =
>>>>>>>>> "disabled"), modify it, and then activate it while it is already applied?
>>>>>>>>
>>>>>>>> Thank you for the pointer to the driver - that makes it much easier to
>>>>>>>> understand the use case and consider solutions.
>>>>>>>>
>>>>>>>> If you can make the changes directly on the FDT instead of on the
>>>>>>>> expanded devicetree, then you could move to the new API.
>>>>>>>
>>>>>>> Are there some examples/references on how to edit FDTs in-place in the
>>>>>>> kernel? I'd like to avoid writing the n-th FDT parser/generator.
>>>>>>
>>>>>> I don't know of any existing in-kernel edits of the FDT (but they might
>>>>>> exist).  The functions to access an FDT are in libfdt, which is in
>>>>>> scripts/dtc/libfdt/.
>>>>>
>>>>> Let's please not go down that route of doing FDT modifications. There
>>>>> is little reason to other than for early boot changes. And it is much
>>>>> easier to work on unflattened trees.
>>>>
>>>> I just briefly looked into libfdt, and it would have meant building it
>>>> into the module as there are no library functions exported by the kernel
>>>> either. Another reason to drop that.
>>>>
>>>> What's apparently working now is the pattern I initially suggested:
>>>> Register template with status = "disabled" as overlay, then prepare and
>>>> apply changeset that contains all needed modifications and sets the
>>>> status to "ok". I might be leaking additional resources, but to find
>>>> that out, I will now finally have to resolve clean unbinding of the
>>>> generic PCI host controller [1] first.
>>>
>>> static void free_overlay_changeset(struct overlay_changeset *ovcs)
>>> {
>>> 	[...]
>>> 	/*
>>> 	 * TODO
>>> 	 *
>>> 	 * would like to: kfree(ovcs->overlay_tree);
>>> 	 * but can not since drivers may have pointers into this data
>>> 	 *
>>> 	 * would like to: kfree(ovcs->fdt);
>>> 	 * but can not since drivers may have pointers into this data
>>> 	 */
>>>
>>> 	kfree(ovcs);
>>> }
>>>
>>> What's this? I have kmemleak now jumping at me over this. Who is supposed
>>> to plug these leaks? The caller of of_overlay_fdt_apply has no pointers
>>> to those objects. I would say that's a regression of the new API.
>>
>> The problem already existed but it was hidden.  We have never been able to
>> kfree() these objects because we do not know if there are any pointers into
>> these objects.  The new API makes the problem visible to kmemleak.
>>
>> The reason that we do not know if there are any pointers into these objects
>> is that devicetree access APIs return pointers into the devicetree internal
>> data structures (that is, into

[GIT PULL] dma mapping fixes for 4.17-rc3

The following changes since commit 6d08b06e67cd117f6992c46611dfb4ce267cd71e:

  Linux 4.17-rc2 (2018-04-22 19:20:09 -0700)

are available in the Git repository at:

  git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-4.17-3

for you to fetch changes up to 60695be2bb6b0623f8e53bd9949d582a83c6d44a:

  dma-mapping: postpone cpu addr translation on mmap (2018-04-23 14:44:24 +0200)


A few small dma-mapping fixes for Linux 4.17-rc3:

 - don't loop to try GFP_DMA allocations if ZONE_DMA is not actually
   enabled (regression in 4.16)
 - don't try to do virt_to_page before we know we actually have a
   valid page in dma_common_mmap
 - a comment fixup related to the above fix


Jacopo Mondi (1):
  dma-mapping: postpone cpu addr translation on mmap

Robin Murphy (1):
  dma-coherent: clarify dma_mmap_from_dev_coherent documentation

Takashi Iwai (1):
  dma-direct: don't retry allocation for no-op GFP_DMA

 drivers/base/dma-coherent.c | 5 +++--
 drivers/base/dma-mapping.c  | 6 ++----
 lib/dma-direct.c            | 3 ++-
 3 files changed, 7 insertions(+), 7 deletions(-)

Re: [PATCH v7 2/5] of: change overlay apply input data from unflattened to FDT

On 2018-04-24 23:15, Frank Rowand wrote:
> On 04/23/18 22:29, Jan Kiszka wrote:
>> On 2018-04-24 00:38, Frank Rowand wrote:
>>> Hi Jan,
>>>
>>> + Alan Tull for fpga perspective
>>>
>>> On 04/22/18 03:30, Jan Kiszka wrote:
>>>> On 2018-04-11 07:42, Jan Kiszka wrote:
>>>>> On 2018-04-05 23:12, Rob Herring wrote:
>>>>>> On Thu, Apr 5, 2018 at 2:28 PM, Frank Rowand wrote:
>>>>>>> On 04/05/18 12:13, Jan Kiszka wrote:
>>>>>>>> On 2018-04-05 20:59, Frank Rowand wrote:
>>>>>>>>> Hi Jan,
>>>>>>>>>
>>>>>>>>> On 04/04/18 15:35, Jan Kiszka wrote:
>>>>>>>>>> Hi Frank,
>>>>>>>>>>
>>>>>>>>>> On 2018-03-04 01:17, frowand.l...@gmail.com wrote:
>>>>>>>>>>> From: Frank Rowand
>>>>>>>>>>>
>>>>>>>>>>> Move duplicating and unflattening of an overlay flattened devicetree
>>>>>>>>>>> (FDT) into the overlay application code.  To accomplish this,
>>>>>>>>>>> of_overlay_apply() is replaced by of_overlay_fdt_apply().
>>>>>>>>>>>
>>>>>>>>>>> The copy of the FDT (aka "duplicate FDT") now belongs to devicetree
>>>>>>>>>>> code, which is thus responsible for freeing the duplicate FDT.  The
>>>>>>>>>>> caller of of_overlay_fdt_apply() remains responsible for freeing the
>>>>>>>>>>> original FDT.
>>>>>>>>>>>
>>>>>>>>>>> The unflattened devicetree now belongs to devicetree code, which is
>>>>>>>>>>> thus responsible for freeing the unflattened devicetree.
>>>>>>>>>>>
>>>>>>>>>>> These ownership changes prevent early freeing of the duplicated FDT
>>>>>>>>>>> or the unflattened devicetree, which could result in use after free
>>>>>>>>>>> errors.
>>>>>>>>>>>
>>>>>>>>>>> of_overlay_fdt_apply() is a private function for the anticipated
>>>>>>>>>>> overlay loader.
>>>>>>>>>>
>>>>>>>>>> We are using of_fdt_unflatten_tree + of_overlay_apply in the
>>>>>>>>>> (out-of-tree) Jailhouse loader driver in order to register a virtual
>>>>>>>>>> device during hypervisor activation with Linux. The DT overlay is
>>>>>>>>>> created from a template but modified prior to application to account
>>>>>>>>>> for runtime-specific parameters. See [1] for the current implementation.
>>>>>>>>>>
>>>>>>>>>> I'm now wondering how to model that scenario best with the new API.
>>>>>>>>>> Given that the loader lost ownership of the unflattened tree but the
>>>>>>>>>> modification API exists only for that DT state, I'm not yet seeing a
>>>>>>>>>> clear solution. Should we apply the template in disabled form (status =
>>>>>>>>>> "disabled"), modify it, and then activate it while it is already applied?
>>>>>>>>>
>>>>>>>>> Thank you for the pointer to the driver - that makes it much easier to
>>>>>>>>> understand the use case and consider solutions.
>>>>>>>>>
>>>>>>>>> If you can make the changes directly on the FDT instead of on the
>>>>>>>>> expanded devicetree, then you could move to the new API.
>>>>>>>>
>>>>>>>> Are there some examples/references on how to edit FDTs in-place in the
>>>>>>>> kernel? I'd like to avoid writing the n-th FDT parser/generator.
>>>>>>>
>>>>>>> I don't know of any existing in-kernel edits of the FDT (but they might
>>>>>>> exist).  The functions to access an FDT are in libfdt, which is in
>>>>>>> scripts/dtc/libfdt/.
>>>>>>
>>>>>> Let's please not go down that route of doing FDT modifications. There
>>>>>> is little reason to other than for early boot changes. And it is much
>>>>>> easier to work on unflattened trees.
>>>>>
>>>>> I just briefly looked into libfdt, and it would have meant building it
>>>>> into the module as there are no library functions exported by the kernel
>>>>> either. Another reason to drop that.
>>>>>
>>>>> What's apparently working now is the pattern I initially suggested:
>>>>> Register template with status = "disabled" as overlay, then prepare and
>>>>> apply changeset that contains all needed modifications and sets the
>>>>> status to "ok". I might be leaking additional resources, but to find
>>>>> that out, I will now finally have to resolve clean unbinding of the
>>>>> generic PCI host controller [1] first.
>>>>
>>>> static void free_overlay_changeset(struct overlay_changeset *ovcs)
>>>> {
>>>> 	[...]
>>>> 	/*
>>>> 	 * TODO
>>>> 	 *
>>>> 	 * would like to: kfree(ovcs->overlay_tree);
>>>> 	 * but can not since drivers may have pointers into this data
>>>> 	 *
>>>> 	 * would like to: kfree(ovcs->fdt);
>>>> 	 * but can not since drivers may have pointers into this data
>>>> 	 */
>>>>
>>>> 	kfree(ovcs);
>>>> }
>>>>
>>>> What's this? I have kmemleak now jumping at me over this. Who is supposed
>>>> to plug these leaks? The caller of of_overlay_fdt_apply has no pointers
>>>> to those objects. I would say that's a regression of the new API.
>>>
>>> The problem already existed but it was hidden.  We have never been able to
>>> kfree() these objects because we do not know if there are any pointers into
>>> these objects.  The new API makes the problem visible to kmemleak.
>>
>> My old code didn't have the problem because there was no

KASAN: stack-out-of-bounds Write in compat_copy_entries

2018-04-24 Thread syzbot


Hello,

syzbot hit the following crash on upstream commit
24cac7009cb1b211f1c793ecb6a462c03dc35818 (Tue Apr 24 21:16:40 2018 +0000)
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=4e42a04e0bc33cb6c087

So far this crash happened 3 times on upstream.
syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=4827027970457600
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=6212733133389824
Kernel config: https://syzkaller.appspot.com/x/.config?id=7043958930931867332

compiler: gcc (GCC) 8.0.1 20180413 (experimental)
user-space arch: i386

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+4e42a04e0bc33cb6c...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.

If you forward the report, please keep this part and the footer.

random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
IPVS: ftp: loaded support on port[0] = 21
==
BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300 [inline]
BUG: KASAN: stack-out-of-bounds in compat_mtw_from_user net/bridge/netfilter/ebtables.c:1957 [inline]
BUG: KASAN: stack-out-of-bounds in ebt_size_mwt net/bridge/netfilter/ebtables.c:2059 [inline]
BUG: KASAN: stack-out-of-bounds in size_entry_mwt net/bridge/netfilter/ebtables.c:2155 [inline]
BUG: KASAN: stack-out-of-bounds in compat_copy_entries+0x96c/0x14a0 net/bridge/netfilter/ebtables.c:2194

Write of size 33 at addr 8801b0abf888 by task syz-executor0/4504

CPU: 0 PID: 4504 Comm: syz-executor0 Not tainted 4.17.0-rc2+ #40
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
 check_memory_region_inline mm/kasan/kasan.c:260 [inline]
 check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
 memcpy+0x37/0x50 mm/kasan/kasan.c:303
 strlcpy include/linux/string.h:300 [inline]
 compat_mtw_from_user net/bridge/netfilter/ebtables.c:1957 [inline]
 ebt_size_mwt net/bridge/netfilter/ebtables.c:2059 [inline]
 size_entry_mwt net/bridge/netfilter/ebtables.c:2155 [inline]
 compat_copy_entries+0x96c/0x14a0 net/bridge/netfilter/ebtables.c:2194
 compat_do_replace+0x483/0x900 net/bridge/netfilter/ebtables.c:2285
 compat_do_ebt_set_ctl+0x2ac/0x324 net/bridge/netfilter/ebtables.c:2367
 compat_nf_sockopt net/netfilter/nf_sockopt.c:144 [inline]
 compat_nf_setsockopt+0x9b/0x140 net/netfilter/nf_sockopt.c:156
 compat_ip_setsockopt+0xff/0x140 net/ipv4/ip_sockglue.c:1279
 inet_csk_compat_setsockopt+0x97/0x120 net/ipv4/inet_connection_sock.c:1041
 compat_tcp_setsockopt+0x49/0x80 net/ipv4/tcp.c:2901
 compat_sock_common_setsockopt+0xb4/0x150 net/core/sock.c:3050
 __compat_sys_setsockopt+0x1ab/0x7c0 net/compat.c:403
 __do_compat_sys_setsockopt net/compat.c:416 [inline]
 __se_compat_sys_setsockopt net/compat.c:413 [inline]
 __ia32_compat_sys_setsockopt+0xbd/0x150 net/compat.c:413
 do_syscall_32_irqs_on arch/x86/entry/common.c:323 [inline]
 do_fast_syscall_32+0x345/0xf9b arch/x86/entry/common.c:394
 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7fb3cb9
RSP: 002b:fff0c26c EFLAGS: 0282 ORIG_RAX: 016e
RAX: ffda RBX: 0003 RCX: 
RDX: 0080 RSI: 2300 RDI: 05f4
RBP:  R08:  R09: 
R10:  R11:  R12: 
R13:  R14:  R15: 

The buggy address belongs to the page:
page:ea0006c2afc0 count:0 mapcount:0 mapping: index:0x0
flags: 0x2fffc00()
raw: 02fffc00   
raw:  ea0006c20101  
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 8801b0abf780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 8801b0abf800: 00 00 00 00 00 f1 f1 f1 f1 00 00 f2 f2 f2 f2 f2
>8801b0abf880: f2 00 00 00 07 f3 f3 f3 f3 00 00 00 00 00 00 00
                           ^
 8801b0abf900: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00
 8801b0abf980: 00 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 00
==
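
For readers decoding the shadow dump above, here is a minimal sketch, assuming
the usual generic-KASAN encoding: each shadow byte describes 8 bytes of
memory, 00 means all 8 are addressable, 01 to 07 means only that many leading
bytes are, and f1/f2/f3 are stack redzones. The helper name accessible_from()
is hypothetical and this is not kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define SHADOW_GRANULE 8  /* bytes of memory described by one shadow byte */

/*
 * Count how many bytes are addressable starting at shadow[start],
 * stopping at the first partially valid or poisoned granule.
 */
static size_t accessible_from(const unsigned char *shadow, size_t n,
			      size_t start)
{
	size_t bytes = 0;

	for (size_t i = start; i < n; i++) {
		if (shadow[i] == 0x00) {
			bytes += SHADOW_GRANULE;  /* whole granule is valid */
		} else if (shadow[i] < SHADOW_GRANULE) {
			bytes += shadow[i];       /* only the first k bytes */
			break;
		} else {
			break;                    /* redzone (f1, f2, f3, ...) */
		}
	}
	return bytes;
}
```

Applied to the row starting at ...f880, beginning at the granule that holds
the reported address ...f888 (index 1), this gives 8 + 8 + 8 + 7 = 31
addressable bytes, so the reported 33-byte write runs 2 bytes past the object.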


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkal...@googlegroups.com.

syzbot will keep

[RFC v3 1/5] virtio: add packed ring definitions

Signed-off-by: Tiwei Bie 
---
 include/uapi/linux/virtio_config.h | 12 +++++++++++-
 include/uapi/linux/virtio_ring.h   | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index 308e2096291f..a6e392325e3a 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -49,7 +49,7 @@
  * transport being used (eg. virtio_ring), the rest are per-device feature
  * bits. */
 #define VIRTIO_TRANSPORT_F_START   28
-#define VIRTIO_TRANSPORT_F_END 34
+#define VIRTIO_TRANSPORT_F_END 36
 
 #ifndef VIRTIO_CONFIG_NO_LEGACY
 /* Do we get callbacks when the ring is completely used, even if we've
@@ -71,4 +71,14 @@
  * this is for compatibility with legacy systems.
  */
 #define VIRTIO_F_IOMMU_PLATFORM		33
+
+/* This feature indicates support for the packed virtqueue layout. */
+#define VIRTIO_F_RING_PACKED   34
+
+/*
+ * This feature indicates that all buffers are used by the device
+ * in the same order in which they have been made available.
+ */
+#define VIRTIO_F_IN_ORDER  35
+
 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index 6d5d5faa989b..3932cb80c347 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -44,6 +44,9 @@
 /* This means the buffer contains a list of buffer descriptors. */
 #define VRING_DESC_F_INDIRECT  4
 
+#define VRING_DESC_F_AVAIL(b)  ((b) << 7)
+#define VRING_DESC_F_USED(b)   ((b) << 15)
+
 /* The Host uses this in used->flags to advise the Guest: don't kick me when
  * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
  * will still kick if it's out of buffers. */
@@ -53,6 +56,10 @@
  * optimization.  */
 #define VRING_AVAIL_F_NO_INTERRUPT 1
 
+#define VRING_EVENT_F_ENABLE   0x0
+#define VRING_EVENT_F_DISABLE  0x1
+#define VRING_EVENT_F_DESC 0x2
+
 /* We support indirect buffer descriptors */
 #define VIRTIO_RING_F_INDIRECT_DESC    28
 
@@ -171,4 +178,33 @@ static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
 }
 
+struct vring_packed_desc_event {
+   /* __virtio16 off  : 15; // Descriptor Event Offset
+* __virtio16 wrap : 1;  // Descriptor Event Wrap Counter */
+   __virtio16 off_wrap;
+   /* __virtio16 flags : 2; // Descriptor Event Flags */
+   __virtio16 flags;
+};
+
+struct vring_packed_desc {
+   /* Buffer Address. */
+   __virtio64 addr;
+   /* Buffer Length. */
+   __virtio32 len;
+   /* Buffer ID. */
+   __virtio16 id;
+   /* The flags depending on descriptor type. */
+   __virtio16 flags;
+};
+
+struct vring_packed {
+   unsigned int num;
+
+   struct vring_packed_desc *desc;
+
+   struct vring_packed_desc_event *driver;
+
+   struct vring_packed_desc_event *device;
+};
+
 #endif /* _UAPI_LINUX_VIRTIO_RING_H */
-- 
2.11.0
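
A note on how the two new descriptor flag macros are meant to be combined.
The sketch below is not part of this series; it assumes the convention from
the packed virtqueue layout in the virtio specification drafts, where the
driver marks a descriptor available by setting the AVAIL bit to its wrap
counter and the USED bit to the inverse, and a descriptor counts as used once
both bits equal the wrap counter. The helper names desc_avail_flags() and
desc_is_used() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same definitions as in the patch above. */
#define VRING_DESC_F_AVAIL(b)  ((b) << 7)
#define VRING_DESC_F_USED(b)   ((b) << 15)

/*
 * Hypothetical helper: flags a driver would write to make a descriptor
 * available during a lap with the given wrap counter (0 or 1).
 */
static uint16_t desc_avail_flags(unsigned int wrap)
{
	assert(wrap <= 1);
	return (uint16_t)(VRING_DESC_F_AVAIL(wrap) | VRING_DESC_F_USED(!wrap));
}

/*
 * Hypothetical helper: a descriptor is treated as used when its AVAIL and
 * USED bits are equal to each other and to the wrap counter of the lap in
 * which the driver is looking for used descriptors.
 */
static bool desc_is_used(uint16_t flags, unsigned int wrap)
{
	bool avail = flags & VRING_DESC_F_AVAIL(1);
	bool used = flags & VRING_DESC_F_USED(1);

	return avail == used && used == (bool)wrap;
}
```

With this convention a freshly published descriptor never looks used to the
driver in the same lap, because its two bits differ until the device writes
it back.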

[RFC v3 3/5] virtio_ring: add packed ring support

This commit introduces the basic support (without EVENT_IDX)
for packed ring.

Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 444 ++-
 1 file changed, 434 insertions(+), 10 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index e164822ca66e..0181e93897be 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -58,7 +58,8 @@
 
 struct vring_desc_state {
void *data; /* Data for callback. */
-   struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
+   void *indir_desc;   /* Indirect descriptor, if any. */
+   int num;/* Descriptor list length. */
 };
 
 struct vring_virtqueue {
@@ -142,6 +143,16 @@ struct vring_virtqueue {
 
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
+static inline bool virtqueue_use_indirect(struct virtqueue *_vq,
+ unsigned int total_sg)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   /* If the host supports indirect descriptor tables, and we have multiple
+* buffers, then go indirect. FIXME: tune this threshold */
+   return (vq->indirect && total_sg > 1 && vq->vq.num_free);
+}
+
 /*
  * Modern virtio devices have feature bits to specify whether they need a
  * quirk and bypass the IOMMU. If not there, just use the DMA API.
@@ -327,9 +338,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 
head = vq->free_head;
 
-   /* If the host supports indirect descriptor tables, and we have multiple
-* buffers, then go indirect. FIXME: tune this threshold */
-   if (vq->indirect && total_sg > 1 && vq->vq.num_free)
+   if (virtqueue_use_indirect(_vq, total_sg))
desc = alloc_indirect_split(_vq, total_sg, gfp);
else {
desc = NULL;
@@ -741,6 +750,49 @@ static inline unsigned vring_size_packed(unsigned int num, unsigned long align)
& ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
 }
 
+static void vring_unmap_one_packed(const struct vring_virtqueue *vq,
+  struct vring_packed_desc *desc)
+{
+   u16 flags;
+
+   if (!vring_use_dma_api(vq->vq.vdev))
+   return;
+
+   flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
+
+   if (flags & VRING_DESC_F_INDIRECT) {
+   dma_unmap_single(vring_dma_dev(vq),
+virtio64_to_cpu(vq->vq.vdev, desc->addr),
+virtio32_to_cpu(vq->vq.vdev, desc->len),
+(flags & VRING_DESC_F_WRITE) ?
+DMA_FROM_DEVICE : DMA_TO_DEVICE);
+   } else {
+   dma_unmap_page(vring_dma_dev(vq),
+  virtio64_to_cpu(vq->vq.vdev, desc->addr),
+  virtio32_to_cpu(vq->vq.vdev, desc->len),
+  (flags & VRING_DESC_F_WRITE) ?
+  DMA_FROM_DEVICE : DMA_TO_DEVICE);
+   }
+}
+
+static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
+  unsigned int total_sg,
+  gfp_t gfp)
+{
+   struct vring_packed_desc *desc;
+
+   /*
+* We require lowmem mappings for the descriptors because
+* otherwise virt_to_phys will give us bogus addresses in the
+* virtqueue.
+*/
+   gfp &= ~__GFP_HIGHMEM;
+
+   desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
+
+   return desc;
+}
+
 static inline int virtqueue_add_packed(struct virtqueue *_vq,
   struct scatterlist *sgs[],
   unsigned int total_sg,
@@ -750,47 +802,419 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
   void *ctx,
   gfp_t gfp)
 {
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   struct vring_packed_desc *desc;
+   struct scatterlist *sg;
+   unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
+   __virtio16 uninitialized_var(head_flags), flags;
+   int head, wrap_counter;
+   bool indirect;
+
+   START_USE(vq);
+
+   BUG_ON(data == NULL);
+   BUG_ON(ctx && vq->indirect);
+
+   if (unlikely(vq->broken)) {
+   END_USE(vq);
+   return -EIO;
+   }
+
+#ifdef DEBUG
+   {
+   ktime_t now = ktime_get();
+
+   /* No kick or get, with .1 second between?  Warn. */
+   if (vq->last_add_time_valid)
+   WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
+   > 100);
+   vq->last_add_time = now;

[RFC v3 5/5] virtio_ring: enable packed ring

Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index b1039c2985b9..9a3d13e1e2ba 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1873,6 +1873,8 @@ void vring_transport_features(struct virtio_device *vdev)
break;
case VIRTIO_F_IOMMU_PLATFORM:
break;
+   case VIRTIO_F_RING_PACKED:
+   break;
default:
/* We don't understand this bit. */
__virtio_clear_bit(vdev, i);
-- 
2.11.0

[RFC v3 4/5] virtio_ring: add event idx support in packed ring

This commit introduces the event idx support in packed
ring. This feature is temporarily disabled, because the
implementation in this patch may not work as expected,
and some further discussions on the implementation are
needed, e.g. do we have to check the wrap counter when
checking whether a kick is needed?
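
To make the open question concrete, here is a minimal userspace sketch of the
check this patch performs. need_event() repeats the arithmetic of
vring_need_event() from include/uapi/linux/virtio_ring.h; need_kick() is a
hypothetical stand-in for the VRING_EVENT_F_DESC branch in
virtqueue_kick_prepare_packed(), which masks off the wrap bit (bit 15) of
off_wrap before comparing. It is an illustration under those assumptions, not
the final implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as vring_need_event() in the uapi header. */
static bool need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}

/*
 * Hypothetical model of the kick check in this patch: the wrap counter
 * stored in bit 15 of off_wrap is dropped, only the 15-bit offset is used.
 */
static bool need_kick(uint16_t off_wrap, uint16_t new_idx, uint16_t old)
{
	return need_event(off_wrap & 0x7fff, new_idx, old);
}
```

Because the wrap bit is discarded, need_kick(0x8003, 5, 2) and
need_kick(0x0003, 5, 2) give the same answer, which is exactly the behaviour
the question above is about.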

Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 53 
 1 file changed, 49 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 0181e93897be..b1039c2985b9 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -986,7 +986,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
-   u16 flags;
+   u16 new, old, off_wrap, flags;
bool needs_kick;
u32 snapshot;
 
@@ -995,7 +995,12 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
 * suppressions. */
virtio_mb(vq->weak_barriers);
 
+   old = vq->next_avail_idx - vq->num_added;
+   new = vq->next_avail_idx;
+   vq->num_added = 0;
+
snapshot = *(u32 *)vq->vring_packed.device;
+   off_wrap = virtio16_to_cpu(_vq->vdev, snapshot & 0xffff);
flags = cpu_to_virtio16(_vq->vdev, snapshot >> 16) & 0x3;
 
 #ifdef DEBUG
@@ -1006,7 +1011,10 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
vq->last_add_time_valid = false;
 #endif
 
-   needs_kick = (flags != VRING_EVENT_F_DISABLE);
+   if (flags == VRING_EVENT_F_DESC)
+   needs_kick = vring_need_event(off_wrap & ~(1<<15), new, old);
+   else
+   needs_kick = (flags != VRING_EVENT_F_DISABLE);
END_USE(vq);
return needs_kick;
 }
@@ -1116,6 +1124,15 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
if (vq->last_used_idx >= vq->vring_packed.num)
vq->last_used_idx -= vq->vring_packed.num;
 
+   /* If we expect an interrupt for the next entry, tell host
+* by writing event index and flush out the write before
+* the read in the next get_buf call. */
+   if (vq->event_flags_shadow == VRING_EVENT_F_DESC)
+   virtio_store_mb(vq->weak_barriers,
+   &vq->vring_packed.driver->off_wrap,
+   cpu_to_virtio16(_vq->vdev, vq->last_used_idx |
+   (vq->wrap_counter << 15)));
+
 #ifdef DEBUG
vq->last_add_time_valid = false;
 #endif
@@ -1143,10 +1160,17 @@ static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
 
/* We optimistically turn back on interrupts, then check if there was
 * more to do. */
+   /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
+* either clear the flags bit or point the event index at the next
+* entry. Always update the event index to keep code simple. */
+
+   vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
+   vq->last_used_idx | (vq->wrap_counter << 15));
 
if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
virtio_wmb(vq->weak_barriers);
-   vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
+   vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
+VRING_EVENT_F_ENABLE;
vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
vq->event_flags_shadow);
}
@@ -1172,15 +1196,34 @@ static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
 static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
+   u16 bufs, used_idx, wrap_counter;
 
START_USE(vq);
 
/* We optimistically turn back on interrupts, then check if there was
 * more to do. */
+   /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
+* either clear the flags bit or point the event index at the next
+* entry. Always update the event index to keep code simple. */
+
+   /* TODO: tune this threshold */
+   bufs = (u16)(vq->next_avail_idx - vq->last_used_idx) * 3 / 4;
+
+   used_idx = vq->last_used_idx + bufs;
+   wrap_counter = vq->wrap_counter;
+
+   if (used_idx >= vq->vring_packed.num) {
+   used_idx -= vq->vring_packed.num;
+   wrap_counter ^= 1;
+   }
+
+   vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
+   used_idx | (wrap_counter << 15));
 
if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
virtio_wmb(vq->weak_barriers);
-   vq->event_flags_shadow = VRING_EVENT_F_ENABLE;

[RFC v3 2/5] virtio_ring: support creating packed ring

This commit introduces support for creating a packed ring.
All split-ring-specific functions gain a _split suffix.
Some necessary stubs for the packed ring are also added.

Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 764 ---
 include/linux/virtio_ring.h  |   8 +-
 2 files changed, 513 insertions(+), 259 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 71458f493cf8..e164822ca66e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -64,8 +64,8 @@ struct vring_desc_state {
 struct vring_virtqueue {
struct virtqueue vq;
 
-   /* Actual memory layout for this queue */
-   struct vring vring;
+   /* Is this a packed ring? */
+   bool packed;
 
/* Can we use weak barriers? */
bool weak_barriers;
@@ -79,19 +79,45 @@ struct vring_virtqueue {
/* Host publishes avail event idx */
bool event;
 
-   /* Head of free buffer list. */
-   unsigned int free_head;
/* Number we've added since last sync. */
unsigned int num_added;
 
/* Last used index we've seen. */
u16 last_used_idx;
 
-   /* Last written value to avail->flags */
-   u16 avail_flags_shadow;
+   union {
+   /* Available for split ring */
+   struct {
+   /* Actual memory layout for this queue. */
+   struct vring vring;
 
-   /* Last written value to avail->idx in guest byte order */
-   u16 avail_idx_shadow;
+   /* Head of free buffer list. */
+   unsigned int free_head;
+
+   /* Last written value to avail->flags */
+   u16 avail_flags_shadow;
+
+   /* Last written value to avail->idx in
+* guest byte order. */
+   u16 avail_idx_shadow;
+   };
+
+   /* Available for packed ring */
+   struct {
+   /* Actual memory layout for this queue. */
+   struct vring_packed vring_packed;
+
+   /* Driver ring wrap counter. */
+   u8 wrap_counter;
+
+   /* Index of the next avail descriptor. */
+   unsigned int next_avail_idx;
+
+   /* Last written value to driver->flags in
+* guest byte order. */
+   u16 event_flags_shadow;
+   };
+   };
 
/* How to notify other side. FIXME: commonalize hcalls! */
bool (*notify)(struct virtqueue *vq);
@@ -201,8 +227,17 @@ static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
  cpu_addr, size, direction);
 }
 
-static void vring_unmap_one(const struct vring_virtqueue *vq,
-   struct vring_desc *desc)
+static int vring_mapping_error(const struct vring_virtqueue *vq,
+  dma_addr_t addr)
+{
+   if (!vring_use_dma_api(vq->vq.vdev))
+   return 0;
+
+   return dma_mapping_error(vring_dma_dev(vq), addr);
+}
+
+static void vring_unmap_one_split(const struct vring_virtqueue *vq,
+ struct vring_desc *desc)
 {
u16 flags;
 
@@ -226,17 +261,9 @@ static void vring_unmap_one(const struct vring_virtqueue *vq,
}
 }
 
-static int vring_mapping_error(const struct vring_virtqueue *vq,
-  dma_addr_t addr)
-{
-   if (!vring_use_dma_api(vq->vq.vdev))
-   return 0;
-
-   return dma_mapping_error(vring_dma_dev(vq), addr);
-}
-
-static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
-unsigned int total_sg, gfp_t gfp)
+static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
+  unsigned int total_sg,
+  gfp_t gfp)
 {
struct vring_desc *desc;
unsigned int i;
@@ -257,14 +284,14 @@ static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
return desc;
 }
 
-static inline int virtqueue_add(struct virtqueue *_vq,
-   struct scatterlist *sgs[],
-   unsigned int total_sg,
-   unsigned int out_sgs,
-   unsigned int in_sgs,
-   void *data,
-   void *ctx,
-   gfp_t gfp)
+static inline int virtqueue_add_split(struct virtqueue *_vq,
+ struct scatterlist *sgs[],
+ unsigned int total_sg,
+ unsigned int out_sgs,
+ unsigned int in_sgs,
+

Re: [PATCH] sched/fair: Rearrange select_task_rq_fair() to optimize it

2018-04-24 Thread Viresh Kumar

On 24-04-18, 14:35, Peter Zijlstra wrote:
> In any case, if there not going to be conflicts here, this all looks
> good.

Thanks Peter.

I also had another patch and wasn't sure if that would be the right
thing to do. The main purpose of this is to avoid calling
sync_entity_load_avg() unnecessarily.

+++ b/kernel/sched/fair.c
@@ -6196,9 +6196,6 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
 {
int new_cpu = cpu;
 
-   if (!cpumask_intersects(sched_domain_span(sd), &p->cpus_allowed))
-   return prev_cpu;
-
while (sd) {
struct sched_group *group;
struct sched_domain *tmp;
@@ -6652,15 +6649,19 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
if (unlikely(sd)) {
/* Slow path */
 
-   /*
-* We're going to need the task's util for capacity_spare_wake
-* in find_idlest_group. Sync it up to prev_cpu's
-* last_update_time.
-*/
-   if (!(sd_flag & SD_BALANCE_FORK))
-   sync_entity_load_avg(&p->se);
+   if (!cpumask_intersects(sched_domain_span(sd), &p->cpus_allowed)) {
+   new_cpu = prev_cpu;
+   } else {
+   /*
+* We're going to need the task's util for
+* capacity_spare_wake in find_idlest_group. Sync it up
+* to prev_cpu's last_update_time.
+*/
+   if (!(sd_flag & SD_BALANCE_FORK))
+   sync_entity_load_avg(&p->se);
 
-   new_cpu = find_idlest_cpu(sd, p, cpu, prev_cpu, sd_flag);
+   new_cpu = find_idlest_cpu(sd, p, cpu, prev_cpu, sd_flag);
+   }
} else if (sd_flag & SD_BALANCE_WAKE) { /* XXX always ? */
/* Fast path */

-- 
viresh

[RFC v3 0/5] virtio: support packed ring

Hello everyone,

This RFC implements packed ring support in the virtio driver.

Some simple functional tests have been done with Jason's
packed ring implementation in vhost:

https://lkml.org/lkml/2018/4/23/12

Both ping and netperf worked as expected (with EVENT_IDX
disabled), but there are the following known issues:

1. Reloading the guest driver will break the Tx/Rx;
2. Zeroing the flags when detaching a used desc will
   break the guest -> host path.

Some simple functional tests have also been done with
Wei's packed ring implementation in QEMU:

http://lists.nongnu.org/archive/html/qemu-devel/2018-04/msg00342.html

Both ping and netperf worked as expected (with EVENT_IDX
disabled). Reloading the guest driver also worked as expected.

TODO:
- Refinements (for code and commit log) and bug fixes;
- Discuss/fix/test EVENT_IDX support;
- Test devices other than net;

RFC v2 -> RFC v3:
- Split into small patches (Jason);
- Add helper virtqueue_use_indirect() (Jason);
- Just set id for the last descriptor of a list (Jason);
- Calculate the prev in virtqueue_add_packed() (Jason);
- Fix/improve desc suppression code (Jason/MST);
- Refine the code layout for XXX_split/packed and wrappers (MST);
- Fix the comments and API in uapi (MST);
- Remove the BUG_ON() for indirect (Jason);
- Some other refinements and bug fixes;

RFC v1 -> RFC v2:
- Add indirect descriptor support - compile test only;
- Add event suppression support - compile test only;
- Move vring_packed_init() out of uapi (Jason, MST);
- Merge two loops into one in virtqueue_add_packed() (Jason);
- Split vring_unmap_one() for packed ring and split ring (Jason);
- Avoid using '%' operator (Jason);
- Rename free_head -> next_avail_idx (Jason);
- Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
- Some other refinements and bug fixes;

Thanks!

Tiwei Bie (5):
  virtio: add packed ring definitions
  virtio_ring: support creating packed ring
  virtio_ring: add packed ring support
  virtio_ring: add event idx support in packed ring
  virtio_ring: enable packed ring

 drivers/virtio/virtio_ring.c   | 1271 
 include/linux/virtio_ring.h|8 +-
 include/uapi/linux/virtio_config.h |   12 +-
 include/uapi/linux/virtio_ring.h   |   36 +
 4 files changed, 1049 insertions(+), 278 deletions(-)

-- 
2.11.0
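
For readers following the "Avoid using '%' operator" item in the changelog:
the sketch below shows one way a ring position can be advanced with a compare
and subtract instead of a modulo, toggling the wrap counter on each lap and
packing the result in the off_wrap format (15-bit offset, wrap counter in bit
15). It mirrors the arithmetic visible in patch 4/5, but
advance_off_wrap() itself is a hypothetical helper written for illustration,
and it assumes the step n never exceeds the ring size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: advance position idx by n entries in a ring of num
 * entries without using '%', flip the wrap counter when the end of the
 * ring is crossed, and return the packed off_wrap value.
 */
static uint16_t advance_off_wrap(uint16_t idx, unsigned int wrap,
				 uint16_t n, uint16_t num)
{
	uint32_t next = (uint32_t)idx + n;

	/* The offset must fit in 15 bits; at most one lap per call. */
	assert(num > 0 && num <= 0x8000);
	assert(idx < num && n <= num && wrap <= 1);

	if (next >= num) {
		next -= num;
		wrap ^= 1;
	}

	return (uint16_t)(next | (wrap << 15));
}
```

For example, in an 8-entry ring, advancing index 6 by 3 with wrap counter 1
lands on index 1 with the wrap counter cleared, so the packed value is
0x0001.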

[RFC v3 2/5] virtio_ring: support creating packed ring