[PATCH v4 0/2] virtio_console: fix replug of virtio console port

2019-08-13 Thread Pankaj Gupta
This patch series fixes the issue with unplug/replug of a port in virtio
console driver which fails with an error "Error allocating inbufs\n".

Patch 1 updates the next avail index for packed ring code.
Patch 2 makes use of 'virtqueue_detach_unused_buf' function to detach
the unused buffers during port hotunplug time.

Tested the packed ring code with the qemu virtio 1.1 device code posted
here [1]. Also, sent a patch to document the behavior in virtio spec as
suggested by Michael.

Changes from v3
- Swap patch 1 with patch 2  - [Michael]
- Added acked-by tag by Jason in patch 1
- Add reference to spec change
Changes from v2
- Better change log in patch2 - [Greg]
Changes from v1[2]
- Make virtio packed ring code compatible with split ring - [Michael]

[1]  https://marc.info/?l=qemu-devel=156471883703948=2
[2]  https://lkml.org/lkml/2019/3/4/517
[3]  https://lists.oasis-open.org/archives/virtio-dev/201908/msg00055.html

Pankaj Gupta (2):
  virtio: decrement avail idx with buffer detach for packed ring
  virtio_console: free unused buffers with port delete

 char/virtio_console.c |   14 +++---
 virtio/virtio_ring.c  |6 ++
 2 files changed, 17 insertions(+), 3 deletions(-)
-- 
2.21.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v4 2/2] virtio_console: free unused buffers with port delete

2019-08-13 Thread Pankaj Gupta
The commit a7a69ec0d8e4 ("virtio_console: free buffers after reset")
deferred detaching of unused buffer to virtio device unplug time.
This causes unplug/replug of single port in virtio device with an
error "Error allocating inbufs\n". As we don't free the unused buffers
attached with the port. Re-plug the same port tries to allocate new
buffers in virtqueue and results in this error if queue is full.
This patch reverts this commit by removing the unused buffers in vq's
when we unplug the port.

Reported-by: Xiaohui Li 
Cc: sta...@vger.kernel.org
Signed-off-by: Pankaj Gupta 
---
 drivers/char/virtio_console.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 7270e7b69262..e8be82f1bae9 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1494,15 +1494,25 @@ static void remove_port(struct kref *kref)
kfree(port);
 }
 
+static void remove_unused_bufs(struct virtqueue *vq)
+{
+   struct port_buffer *buf;
+
+   while ((buf = virtqueue_detach_unused_buf(vq)))
+   free_buf(buf, true);
+}
+
 static void remove_port_data(struct port *port)
 {
spin_lock_irq(>inbuf_lock);
/* Remove unused data this port might have received. */
discard_port_data(port);
+   remove_unused_bufs(port->in_vq);
spin_unlock_irq(>inbuf_lock);
 
spin_lock_irq(>outvq_lock);
reclaim_consumed_buffers(port);
+   remove_unused_bufs(port->out_vq);
spin_unlock_irq(>outvq_lock);
 }
 
@@ -1938,11 +1948,9 @@ static void remove_vqs(struct ports_device *portdev)
struct virtqueue *vq;
 
virtio_device_for_each_vq(portdev->vdev, vq) {
-   struct port_buffer *buf;
 
flush_bufs(vq, true);
-   while ((buf = virtqueue_detach_unused_buf(vq)))
-   free_buf(buf, true);
+   remove_unused_bufs(vq);
}
portdev->vdev->config->del_vqs(portdev->vdev);
kfree(portdev->in_vqs);
-- 
2.21.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v4 1/2] virtio: decrement avail idx with buffer detach for packed ring

2019-08-13 Thread Pankaj Gupta
This patch decrements 'next_avail_idx' count when detaching a buffer
from vq for packed ring code. Split ring code already does this in
virtqueue_detach_unused_buf_split function. This updates the
'next_avail_idx' to the previous correct index after an unused buffer
is detatched from the vq.

Acked-by: Jason Wang 
Signed-off-by: Pankaj Gupta 
---
 drivers/virtio/virtio_ring.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c8be1c4f5b55..7c69181113e2 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1537,6 +1537,12 @@ static void *virtqueue_detach_unused_buf_packed(struct 
virtqueue *_vq)
/* detach_buf clears data, so grab it now. */
buf = vq->packed.desc_state[i].data;
detach_buf_packed(vq, i, NULL);
+   vq->packed.next_avail_idx--;
+   if (vq->packed.next_avail_idx < 0) {
+   vq->packed.next_avail_idx = vq->packed.vring.num - 1;
+   vq->packed.avail_wrap_counter ^= 1;
+   }
+
END_USE(vq);
return buf;
}
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 2/2] virtio: decrement avail idx with buffer detach for packed ring

2019-08-11 Thread Pankaj Gupta

> 
> On 2019/8/9 下午2:48, Pankaj Gupta wrote:
> > This patch decrements 'next_avail_idx' count when detaching a buffer
> > from vq for packed ring code. Split ring code already does this in
> > virtqueue_detach_unused_buf_split function. This updates the
> > 'next_avail_idx' to the previous correct index after an unused buffer
> > is detatched from the vq.
> >
> > Signed-off-by: Pankaj Gupta 
> > ---
> >   drivers/virtio/virtio_ring.c | 6 ++
> >   1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index c8be1c4f5b55..7c69181113e2 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -1537,6 +1537,12 @@ static void
> > *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
> > /* detach_buf clears data, so grab it now. */
> > buf = vq->packed.desc_state[i].data;
> > detach_buf_packed(vq, i, NULL);
> > +   vq->packed.next_avail_idx--;
> > +   if (vq->packed.next_avail_idx < 0) {
> > +   vq->packed.next_avail_idx = vq->packed.vring.num - 1;
> > +   vq->packed.avail_wrap_counter ^= 1;
> > +   }
> > +
> > END_USE(vq);
> > return buf;
> > }
> 
> 
> Acked-by: Jason Wang 

Thank you, Jason.

Best regards,
Pankaj

> 
> 
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v3 1/2] virtio_console: free unused buffers with port delete

2019-08-11 Thread Pankaj Gupta


> 
> On Fri, Aug 09, 2019 at 12:18:46PM +0530, Pankaj Gupta wrote:
> > The commit a7a69ec0d8e4 ("virtio_console: free buffers after reset")
> > deferred detaching of unused buffer to virtio device unplug time.
> > This causes unplug/replug of single port in virtio device with an
> > error "Error allocating inbufs\n". As we don't free the unused buffers
> > attached with the port. Re-plug the same port tries to allocate new
> > buffers in virtqueue and results in this error if queue is full.
> 
> So why not reuse the buffers that are already there in this case?
> Seems quite possible.

I took this approach because reusing the buffers will involve tweaking
the existing core functionality like managing the the virt queue indexes.
Compared to that deleting the buffers while hot-unplugging port is simple
and was working fine before. It seems logically correct as well.   

I agree we need a spec change for this.

> 
> > This patch removes the unused buffers in vq's when we unplug the port.
> > This is the best we can do as we cannot call device_reset because virtio
> > device is still active.
> > 
> > Reported-by: Xiaohui Li 
> > Fixes: a7a69ec0d8e4 ("virtio_console: free buffers after reset")
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Pankaj Gupta 
> 
> This is really a revert of a7a69ec0d8e4, just tagged confusingly.
> 
> And the original is also supposed to be a bugfix.
> So how will the original bug be fixed?

Yes, Even I was confused while adding this tag.
I will remove remove 'fixes' tag completely for this patch?
because its a revert to original behavior which also is a bugfix.

> 
> "this is the best we can do" is rarely the case.
> 
> I am not necessarily against the revert. But if we go that way then what
> we need to do is specify the behaviour in the spec, since strict spec
> compliance is exactly what the original patch was addressing.

Agree.

> 
> In particular, we'd document that console has a special property that
> when port is detached virtqueue is considered stopped, device must not
> use any buffers, and it is legal to take buffers out of the device.

Yes. This documents the exact scenario. Thanks.
You want me to send a patch for the spec change?

Best regards,
Pankaj
 
> 
> 
> 
> > ---
> >  drivers/char/virtio_console.c | 14 +++---
> >  1 file changed, 11 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> > index 7270e7b69262..e8be82f1bae9 100644
> > --- a/drivers/char/virtio_console.c
> > +++ b/drivers/char/virtio_console.c
> > @@ -1494,15 +1494,25 @@ static void remove_port(struct kref *kref)
> >  kfree(port);
> >  }
> >  
> > +static void remove_unused_bufs(struct virtqueue *vq)
> > +{
> > +struct port_buffer *buf;
> > +
> > +while ((buf = virtqueue_detach_unused_buf(vq)))
> > +free_buf(buf, true);
> > +}
> > +
> >  static void remove_port_data(struct port *port)
> >  {
> >  spin_lock_irq(>inbuf_lock);
> >  /* Remove unused data this port might have received. */
> >  discard_port_data(port);
> > +remove_unused_bufs(port->in_vq);
> >  spin_unlock_irq(>inbuf_lock);
> >  
> >  spin_lock_irq(>outvq_lock);
> >  reclaim_consumed_buffers(port);
> > +remove_unused_bufs(port->out_vq);
> >  spin_unlock_irq(>outvq_lock);
> >  }
> >  
> > @@ -1938,11 +1948,9 @@ static void remove_vqs(struct ports_device *portdev)
> >  struct virtqueue *vq;
> >  
> >  virtio_device_for_each_vq(portdev->vdev, vq) {
> > -struct port_buffer *buf;
> >  
> >  flush_bufs(vq, true);
> > -while ((buf = virtqueue_detach_unused_buf(vq)))
> > -free_buf(buf, true);
> > +remove_unused_bufs(vq);
> >  }
> >  portdev->vdev->config->del_vqs(portdev->vdev);
> >  kfree(portdev->in_vqs);
> > --
> > 2.21.0
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 2/2] virtio: decrement avail idx with buffer detach for packed ring

2019-08-11 Thread Pankaj Gupta


> On Fri, Aug 09, 2019 at 12:18:47PM +0530, Pankaj Gupta wrote:
> > This patch decrements 'next_avail_idx' count when detaching a buffer
> > from vq for packed ring code. Split ring code already does this in
> > virtqueue_detach_unused_buf_split function. This updates the
> > 'next_avail_idx' to the previous correct index after an unused buffer
> > is detatched from the vq.
> > 
> > Signed-off-by: Pankaj Gupta 
> 
> I would make this patch 1, not patch 2, otherwise
> patch 1 corrupts the ring.

Sure. Will post this patch as patch 1 in next version.

Thanks,
Pankaj 

> 
> 
> > ---
> >  drivers/virtio/virtio_ring.c | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index c8be1c4f5b55..7c69181113e2 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -1537,6 +1537,12 @@ static void
> > *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
> > /* detach_buf clears data, so grab it now. */
> > buf = vq->packed.desc_state[i].data;
> > detach_buf_packed(vq, i, NULL);
> > +   vq->packed.next_avail_idx--;
> > +   if (vq->packed.next_avail_idx < 0) {
> > +   vq->packed.next_avail_idx = vq->packed.vring.num - 1;
> > +   vq->packed.avail_wrap_counter ^= 1;
> > +   }
> > +
> > END_USE(vq);
> > return buf;
> > }
> > --
> > 2.20.1
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v3 1/2] virtio_console: free unused buffers with port delete

2019-08-09 Thread Pankaj Gupta
The commit a7a69ec0d8e4 ("virtio_console: free buffers after reset")
deferred detaching of unused buffer to virtio device unplug time.
This causes unplug/replug of single port in virtio device with an
error "Error allocating inbufs\n". As we don't free the unused buffers
attached with the port. Re-plug the same port tries to allocate new
buffers in virtqueue and results in this error if queue is full.
This patch removes the unused buffers in vq's when we unplug the port.
This is the best we can do as we cannot call device_reset because virtio
device is still active.

Reported-by: Xiaohui Li 
Fixes: a7a69ec0d8e4 ("virtio_console: free buffers after reset")
Cc: sta...@vger.kernel.org
Signed-off-by: Pankaj Gupta 
---
 drivers/char/virtio_console.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 7270e7b69262..e8be82f1bae9 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1494,15 +1494,25 @@ static void remove_port(struct kref *kref)
kfree(port);
 }
 
+static void remove_unused_bufs(struct virtqueue *vq)
+{
+   struct port_buffer *buf;
+
+   while ((buf = virtqueue_detach_unused_buf(vq)))
+   free_buf(buf, true);
+}
+
 static void remove_port_data(struct port *port)
 {
spin_lock_irq(>inbuf_lock);
/* Remove unused data this port might have received. */
discard_port_data(port);
+   remove_unused_bufs(port->in_vq);
spin_unlock_irq(>inbuf_lock);
 
spin_lock_irq(>outvq_lock);
reclaim_consumed_buffers(port);
+   remove_unused_bufs(port->out_vq);
spin_unlock_irq(>outvq_lock);
 }
 
@@ -1938,11 +1948,9 @@ static void remove_vqs(struct ports_device *portdev)
struct virtqueue *vq;
 
virtio_device_for_each_vq(portdev->vdev, vq) {
-   struct port_buffer *buf;
 
flush_bufs(vq, true);
-   while ((buf = virtqueue_detach_unused_buf(vq)))
-   free_buf(buf, true);
+   remove_unused_bufs(vq);
}
portdev->vdev->config->del_vqs(portdev->vdev);
kfree(portdev->in_vqs);
-- 
2.21.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v3 2/2] virtio: decrement avail idx with buffer detach for packed ring

2019-08-09 Thread Pankaj Gupta
This patch decrements 'next_avail_idx' count when detaching a buffer
from vq for packed ring code. Split ring code already does this in
virtqueue_detach_unused_buf_split function. This updates the
'next_avail_idx' to the previous correct index after an unused buffer
is detatched from the vq.

Signed-off-by: Pankaj Gupta 
---
 drivers/virtio/virtio_ring.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c8be1c4f5b55..7c69181113e2 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1537,6 +1537,12 @@ static void *virtqueue_detach_unused_buf_packed(struct 
virtqueue *_vq)
/* detach_buf clears data, so grab it now. */
buf = vq->packed.desc_state[i].data;
detach_buf_packed(vq, i, NULL);
+   vq->packed.next_avail_idx--;
+   if (vq->packed.next_avail_idx < 0) {
+   vq->packed.next_avail_idx = vq->packed.vring.num - 1;
+   vq->packed.avail_wrap_counter ^= 1;
+   }
+
END_USE(vq);
return buf;
}
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v3 0/2] virtio_console: fix replug of virtio console port

2019-08-09 Thread Pankaj Gupta
This patch series fixes the issue with unplug/replug of a port in virtio
console driver which fails with an error "Error allocating inbufs\n".
Patch 1 makes use of 'virtqueue_detach_unused_buf' function to detach
the unused buffers during port hotunplug time.
Patch 2 updates the next avail index for packed ring code.
Tested the packed ring code with the qemu virtio 1.1 device code posted
here [1].

Changes from v2
 Better change log in patch2 - [Greg]
Changes from v1[2]
 Make virtio packed ring code compatible with split ring - [Michael]

[1]  https://marc.info/?l=qemu-devel=156471883703948=2
[2]  https://lkml.org/lkml/2019/3/4/517

Pankaj Gupta (2):
  virtio_console: free unused buffers with port delete
  virtio: decrement avail idx with buffer detach for packed ring

 char/virtio_console.c |   14 +++---
 virtio/virtio_ring.c  |6 ++
 2 files changed, 17 insertions(+), 3 deletions(-)
-- 
2.21.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 2/2] virtio_ring: packed ring: fix virtqueue_detach_unused_buf

2019-08-08 Thread Pankaj Gupta


> On Thu, Aug 08, 2019 at 08:28:46AM -0400, Pankaj Gupta wrote:
> > 
> > 
> > > > This patch makes packed ring code compatible with split ring in
> > > > function
> > > > 'virtqueue_detach_unused_buf_*'.
> > > 
> > > What does that mean?  What does this "fix"?
> > 
> > Patch 1 frees the buffers When a port is unplugged from the virtio
> > console device. It does this with the help of
> > 'virtqueue_detach_unused_buf_split/packed'
> > function. For split ring case, corresponding function decrements avail ring
> > index.
> > For packed ring code, this functionality is not available, so this patch
> > adds the
> > required support and hence help to remove the unused buffer completely.
> 
> Explain all of this in great detail in the changelog comment.  What you
> have in there today does not make any sense.

Sure. Will try to put in more clear words.

Thanks,
Pankaj

> 
> thanks,
> 
> greg k-h
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 2/2] virtio_ring: packed ring: fix virtqueue_detach_unused_buf

2019-08-08 Thread Pankaj Gupta



> > This patch makes packed ring code compatible with split ring in function
> > 'virtqueue_detach_unused_buf_*'.
> 
> What does that mean?  What does this "fix"?

Patch 1 frees the buffers When a port is unplugged from the virtio
console device. It does this with the help of 
'virtqueue_detach_unused_buf_split/packed'
function. For split ring case, corresponding function decrements avail ring 
index.
For packed ring code, this functionality is not available, so this patch adds 
the
required support and hence help to remove the unused buffer completely.

Thanks,
Pankaj  

> 
> thanks,
> 
> greg k-h
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 1/2] virtio_console: free unused buffers with port delete

2019-08-08 Thread Pankaj Gupta


> 
> On Thu, Aug 08, 2019 at 05:06:05PM +0530, Pankaj Gupta wrote:
> >   The commit a7a69ec0d8e4 ("virtio_console: free buffers after reset")
> >   deferred detaching of unused buffer to virtio device unplug time.
> > 
> >   This causes unplug/replug of single port in virtio device with an
> >   error "Error allocating inbufs\n". As we don't free the unused buffers
> >   attached with the port. Re-plug the same port tries to allocate new
> >   buffers in virtqueue and results in this error if queue is full.
> > 
> >   This patch removes the unused buffers in vq's when we unplug the port.
> >   This is the best we can do as we cannot call device_reset because virtio
> >   device is still active.
> 
> Why is this indented?

o.k. will remove the empty lines.

> 
> > 
> > Reported-by: Xiaohui Li 
> > Fixes: b3258ff1d6 ("virtio_console: free buffers after reset")
> 
> Fixes: b3258ff1d608 ("virtio: Decrement avail idx on buffer detach")
> 
> is the correct format to use.

Sorry! for this. Commit it fixes is:
a7a69ec0d8e4 ("virtio_console: free buffers after reset")

> 
> And given that this is from 2.6.39 (and 2.6.38.5), shouldn't it also be
> backported for the stable kernels?

Yes.

Thanks,
Pankaj

> 
> thanks,
> 
> greg k-h
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v2 0/2] virtio_console: fix replug of virtio console port

2019-08-08 Thread Pankaj Gupta
This patch series fixes the issue with unplug/replug of a port in virtio
console device, which fails with an error "Error allocating inbufs\n".

Patch 2 makes virtio packed ring code compatible with virtio split ring.
Tested the packed ring code with the qemu virtio 1.1 device code posted
here [1].

Changes from v1[2]
-
Make virtio packed ring code compatible with split ring - [Michael]

[1]  https://marc.info/?l=qemu-devel=156471883703948=2
[2] https://lkml.org/lkml/2019/3/4/517

Pankaj Gupta (2):
  virtio_console: free unused buffers with port delete
  virtio_ring: packed ring: fix virtqueue_detach_unused_buf

 drivers/char/virtio_console.c | 14 +++---
 drivers/virtio/virtio_ring.c  |  5 +
 2 files changed, 16 insertions(+), 3 deletions(-)

-- 
2.21.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v2 1/2] virtio_console: free unused buffers with port delete

2019-08-08 Thread Pankaj Gupta
  The commit a7a69ec0d8e4 ("virtio_console: free buffers after reset")
  deferred detaching of unused buffer to virtio device unplug time.

  This causes unplug/replug of single port in virtio device with an
  error "Error allocating inbufs\n". As we don't free the unused buffers
  attached with the port. Re-plug the same port tries to allocate new
  buffers in virtqueue and results in this error if queue is full.

  This patch removes the unused buffers in vq's when we unplug the port.
  This is the best we can do as we cannot call device_reset because virtio
  device is still active.

Reported-by: Xiaohui Li 
Fixes: b3258ff1d6 ("virtio_console: free buffers after reset")
Signed-off-by: Pankaj Gupta 
---
 drivers/char/virtio_console.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 7270e7b69262..e8be82f1bae9 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1494,15 +1494,25 @@ static void remove_port(struct kref *kref)
kfree(port);
 }
 
+static void remove_unused_bufs(struct virtqueue *vq)
+{
+   struct port_buffer *buf;
+
+   while ((buf = virtqueue_detach_unused_buf(vq)))
+   free_buf(buf, true);
+}
+
 static void remove_port_data(struct port *port)
 {
spin_lock_irq(>inbuf_lock);
/* Remove unused data this port might have received. */
discard_port_data(port);
+   remove_unused_bufs(port->in_vq);
spin_unlock_irq(>inbuf_lock);
 
spin_lock_irq(>outvq_lock);
reclaim_consumed_buffers(port);
+   remove_unused_bufs(port->out_vq);
spin_unlock_irq(>outvq_lock);
 }
 
@@ -1938,11 +1948,9 @@ static void remove_vqs(struct ports_device *portdev)
struct virtqueue *vq;
 
virtio_device_for_each_vq(portdev->vdev, vq) {
-   struct port_buffer *buf;
 
flush_bufs(vq, true);
-   while ((buf = virtqueue_detach_unused_buf(vq)))
-   free_buf(buf, true);
+   remove_unused_bufs(vq);
}
portdev->vdev->config->del_vqs(portdev->vdev);
kfree(portdev->in_vqs);
-- 
2.21.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v2 2/2] virtio_ring: packed ring: fix virtqueue_detach_unused_buf

2019-08-08 Thread Pankaj Gupta
This patch makes packed ring code compatible with split ring in function
'virtqueue_detach_unused_buf_*'.

Signed-off-by: Pankaj Gupta 
---
 drivers/virtio/virtio_ring.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c8be1c4f5b55..1b98a6777b7e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1534,6 +1534,11 @@ static void *virtqueue_detach_unused_buf_packed(struct 
virtqueue *_vq)
for (i = 0; i < vq->packed.vring.num; i++) {
if (!vq->packed.desc_state[i].data)
continue;
+   vq->packed.next_avail_idx--;
+   if (vq->packed.next_avail_idx < 0) {
+   vq->packed.next_avail_idx = vq->packed.vring.num - 1;
+   vq->packed.avail_wrap_counter ^= 1;
+   }
/* detach_buf clears data, so grab it now. */
buf = vq->packed.desc_state[i].data;
detach_buf_packed(vq, i, NULL);
-- 
2.21.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v3] virtio_pmem: fix sparse warning

2019-07-11 Thread Pankaj Gupta
This patch fixes below sparse warning related to __virtio
type in virtio pmem driver. This is reported by Intel test
bot on linux-next tree.

nd_virtio.c:56:28: warning: incorrect type in assignment
(different base types)
nd_virtio.c:56:28:expected unsigned int [unsigned] [usertype] type
nd_virtio.c:56:28:got restricted __virtio32
nd_virtio.c:93:59: warning: incorrect type in argument 2
(different base types)
nd_virtio.c:93:59:expected restricted __virtio32 [usertype] val
nd_virtio.c:93:59:got unsigned int [unsigned] [usertype] ret

Reported-by: kbuild test robot 
Signed-off-by: Pankaj Gupta 
---
This fixes a warning, so submitting it as a separate
patch on top of virtio pmem series.

v2-> v3
 Use __le for req/resp fields - Michael

 drivers/nvdimm/nd_virtio.c   | 4 ++--
 include/uapi/linux/virtio_pmem.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
index 8645275c08c2..10351d5b49fa 100644
--- a/drivers/nvdimm/nd_virtio.c
+++ b/drivers/nvdimm/nd_virtio.c
@@ -53,7 +53,7 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
init_waitqueue_head(_data->host_acked);
init_waitqueue_head(_data->wq_buf);
INIT_LIST_HEAD(_data->list);
-   req_data->req.type = cpu_to_virtio32(vdev, VIRTIO_PMEM_REQ_TYPE_FLUSH);
+   req_data->req.type = cpu_to_le32(VIRTIO_PMEM_REQ_TYPE_FLUSH);
sg_init_one(, _data->req, sizeof(req_data->req));
sgs[0] = 
sg_init_one(, _data->resp.ret, sizeof(req_data->resp));
@@ -90,7 +90,7 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
} else {
/* A host repsonse results in "host_ack" getting called */
wait_event(req_data->host_acked, req_data->done);
-   err = virtio32_to_cpu(vdev, req_data->resp.ret);
+   err = le32_to_cpu(req_data->resp.ret);
}
 
kfree(req_data);
diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
index efcd72f2d20d..9a63ed6d062f 100644
--- a/include/uapi/linux/virtio_pmem.h
+++ b/include/uapi/linux/virtio_pmem.h
@@ -23,12 +23,12 @@ struct virtio_pmem_config {
 
 struct virtio_pmem_resp {
/* Host return status corresponding to flush request */
-   __u32 ret;
+   __le32 ret;
 };
 
 struct virtio_pmem_req {
/* command type */
-   __u32 type;
+   __le32 type;
 };
 
 #endif
-- 
2.14.5

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2] virtio_pmem: fix sparse warning

2019-07-11 Thread Pankaj Gupta


> 
> On Wed, Jul 10, 2019 at 11:28:32PM +0530, Pankaj Gupta wrote:
> > This patch fixes below sparse warning related to __virtio
> > type in virtio pmem driver. This is reported by Intel test
> > bot on linux-next tree.
> > 
> > nd_virtio.c:56:28: warning: incorrect type in assignment
> > (different base types)
> > nd_virtio.c:56:28:expected unsigned int [unsigned] [usertype] type
> > nd_virtio.c:56:28:got restricted __virtio32
> > nd_virtio.c:93:59: warning: incorrect type in argument 2
> > (different base types)
> > nd_virtio.c:93:59:expected restricted __virtio32 [usertype] val
> > nd_virtio.c:93:59:got unsigned int [unsigned] [usertype] ret
> > 
> > Reported-by: kbuild test robot 
> > Signed-off-by: Pankaj Gupta 
> > ---
> > 
> > This fixes a warning, so submitting it as a separate
> > patch on top of virtio pmem series.
> > 
> >  include/uapi/linux/virtio_pmem.h | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/uapi/linux/virtio_pmem.h
> > b/include/uapi/linux/virtio_pmem.h
> > index efcd72f2d20d..7a7435281362 100644
> > --- a/include/uapi/linux/virtio_pmem.h
> > +++ b/include/uapi/linux/virtio_pmem.h
> > @@ -10,7 +10,7 @@
> >  #ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> >  #define _UAPI_LINUX_VIRTIO_PMEM_H
> >  
> > -#include 
> > +#include 
> >  #include 
> >  #include 
> >  
> > @@ -23,12 +23,12 @@ struct virtio_pmem_config {
> >  
> >  struct virtio_pmem_resp {
> > /* Host return status corresponding to flush request */
> > -   __u32 ret;
> > +   __virtio32 ret;
> >  };
> >  
> >  struct virtio_pmem_req {
> > /* command type */
> > -   __u32 type;
> > +   __virtio32 type;
> >  };
> >  
> >  #endif
> 
> Same comment as previously: pls use __le and fix accessors.

Now, I think I got it. 

__virtio is for legacy devices to solve the endianess. 
By default virtio 1.0 and later are little endian.
Will send updated patch soon.

Thank you,
Pankaj

> 
> > --
> > 2.20.1
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio_pmem: fix sparse warning

2019-07-11 Thread Pankaj Gupta



> 
> On Wed, Jul 10, 2019 at 07:57:00PM +0530, Pankaj Gupta wrote:
> > This patch fixes below sparse warning related to __virtio
> > type in virtio pmem driver. This is reported by Intel test
> > bot on linux-next tree.
> > 
> > nd_virtio.c:56:28: warning: incorrect type in assignment (different base
> > types)
> > nd_virtio.c:56:28:expected unsigned int [unsigned] [usertype] type
> > nd_virtio.c:56:28:got restricted __virtio32
> > nd_virtio.c:93:59: warning: incorrect type in argument 2 (different base
> > types)
> > nd_virtio.c:93:59:expected restricted __virtio32 [usertype] val
> > nd_virtio.c:93:59:got unsigned int [unsigned] [usertype] ret
> > 
> > Signed-off-by: Pankaj Gupta 
> > Reported-by: kbuild test robot 
> > ---
> > 
> > This fixes a warning, so submitting it as a separate
> > patch on top of virtio pmem series.
> >  
> >  include/uapi/linux/virtio_pmem.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/uapi/linux/virtio_pmem.h
> > b/include/uapi/linux/virtio_pmem.h
> > index efcd72f2d20d..f89129bf1f84 100644
> > --- a/include/uapi/linux/virtio_pmem.h
> > +++ b/include/uapi/linux/virtio_pmem.h
> > @@ -23,12 +23,12 @@ struct virtio_pmem_config {
> >  
> >  struct virtio_pmem_resp {
> > /* Host return status corresponding to flush request */
> > -   __u32 ret;
> > +   __virtio32 ret;
> >  };
> >  
> >  struct virtio_pmem_req {
> > /* command type */
> > -   __u32 type;
> > +   __virtio32 type;
> >  };
> >  
> >  #endif
> 
> req/resp are in memory right?
> Then this looks like a wrong fix.
> The accessors should all use cpu_to/from_le
> and they types should be __le32.

o.k

> 
> > --
> > 2.20.1
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v2] virtio_pmem: fix sparse warning

2019-07-10 Thread Pankaj Gupta
This patch fixes below sparse warning related to __virtio
type in virtio pmem driver. This is reported by Intel test
bot on linux-next tree.

nd_virtio.c:56:28: warning: incorrect type in assignment 
(different base types)
nd_virtio.c:56:28:expected unsigned int [unsigned] [usertype] type
nd_virtio.c:56:28:got restricted __virtio32
nd_virtio.c:93:59: warning: incorrect type in argument 2 
(different base types)
nd_virtio.c:93:59:expected restricted __virtio32 [usertype] val
nd_virtio.c:93:59:got unsigned int [unsigned] [usertype] ret

Reported-by: kbuild test robot 
Signed-off-by: Pankaj Gupta 
---

This fixes a warning, so submitting it as a separate
patch on top of virtio pmem series.

 include/uapi/linux/virtio_pmem.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
index efcd72f2d20d..7a7435281362 100644
--- a/include/uapi/linux/virtio_pmem.h
+++ b/include/uapi/linux/virtio_pmem.h
@@ -10,7 +10,7 @@
 #ifndef _UAPI_LINUX_VIRTIO_PMEM_H
 #define _UAPI_LINUX_VIRTIO_PMEM_H
 
-#include 
+#include 
 #include 
 #include 
 
@@ -23,12 +23,12 @@ struct virtio_pmem_config {
 
 struct virtio_pmem_resp {
/* Host return status corresponding to flush request */
-   __u32 ret;
+   __virtio32 ret;
 };
 
 struct virtio_pmem_req {
/* command type */
-   __u32 type;
+   __virtio32 type;
 };
 
 #endif
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH] virtio_pmem: fix sparse warning

2019-07-10 Thread Pankaj Gupta
This patch fixes below sparse warning related to __virtio 
type in virtio pmem driver. This is reported by Intel test
bot on linux-next tree.

nd_virtio.c:56:28: warning: incorrect type in assignment (different base types)
nd_virtio.c:56:28:expected unsigned int [unsigned] [usertype] type
nd_virtio.c:56:28:got restricted __virtio32
nd_virtio.c:93:59: warning: incorrect type in argument 2 (different base types)
nd_virtio.c:93:59:expected restricted __virtio32 [usertype] val
nd_virtio.c:93:59:got unsigned int [unsigned] [usertype] ret

Signed-off-by: Pankaj Gupta 
Reported-by: kbuild test robot 
---

This fixes a warning, so submitting it as a separate
patch on top of virtio pmem series. 
 
 include/uapi/linux/virtio_pmem.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
index efcd72f2d20d..f89129bf1f84 100644
--- a/include/uapi/linux/virtio_pmem.h
+++ b/include/uapi/linux/virtio_pmem.h
@@ -23,12 +23,12 @@ struct virtio_pmem_config {
 
 struct virtio_pmem_resp {
/* Host return status corresponding to flush request */
-   __u32 ret;
+   __virtio32 ret;
 };
 
 struct virtio_pmem_req {
/* command type */
-   __u32 type;
+   __virtio32 type;
 };
 
 #endif
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v15 7/7] xfs: disable map_sync for async flush

2019-07-05 Thread Pankaj Gupta
Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and xfs.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Darrick J. Wong 
---
 fs/xfs/xfs_file.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a7ceae90110e..f17652cca5ff 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1203,11 +1203,14 @@ xfs_file_mmap(
struct file *filp,
struct vm_area_struct *vma)
 {
+   struct dax_device   *dax_dev;
+
+   dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
/*
-* We don't support synchronous mappings for non-DAX files. At least
-* until someone comes with a sensible use case.
+* We don't support synchronous mappings for non-DAX files and
+* for DAX files if underneath dax_device is not synchronous.
 */
-   if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
+   if (!daxdev_mapping_supported(vma, dax_dev))
return -EOPNOTSUPP;
 
file_accessed(filp);
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v15 6/7] ext4: disable map_sync for async flush

2019-07-05 Thread Pankaj Gupta
Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and ext4.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Jan Kara 
---
 fs/ext4/file.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 98ec11f69cd4..dee549339e13 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,15 +360,17 @@ static const struct vm_operations_struct ext4_file_vm_ops 
= {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
struct inode *inode = file->f_mapping->host;
+   struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+   struct dax_device *dax_dev = sbi->s_daxdev;
 
-   if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb
+   if (unlikely(ext4_forced_shutdown(sbi)))
return -EIO;
 
/*
-* We don't support synchronous mappings for non-DAX files. At least
-* until someone comes with a sensible use case.
+* We don't support synchronous mappings for non-DAX files and
+* for DAX files if underneath dax_device is not synchronous.
 */
-   if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
+   if (!daxdev_mapping_supported(vma, dax_dev))
return -EOPNOTSUPP;
 
file_accessed(file);
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v15 5/7] dax: check synchronous mapping is supported

2019-07-05 Thread Pankaj Gupta
This patch introduces 'daxdev_mapping_supported' helper
which checks if 'MAP_SYNC' is supported with filesystem
mapping. It also checks if corresponding dax_device is
synchronous. Virtio pmem device is asynchronous and
does not not support VM_SYNC.

Suggested-by: Jan Kara 
Signed-off-by: Pankaj Gupta 
Reviewed-by: Jan Kara 
---
 include/linux/dax.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index 86fc55c99b58..d1bea3979b5a 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -53,6 +53,18 @@ static inline void set_dax_synchronous(struct dax_device 
*dax_dev)
 {
__set_dax_synchronous(dax_dev);
 }
+/*
+ * Check if given mapping is supported by the file / underlying device.
+ */
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+struct dax_device *dax_dev)
+{
+   if (!(vma->vm_flags & VM_SYNC))
+   return true;
+   if (!IS_DAX(file_inode(vma->vm_file)))
+   return false;
+   return dax_synchronous(dax_dev);
+}
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
@@ -87,6 +99,11 @@ static inline bool dax_synchronous(struct dax_device 
*dax_dev)
 static inline void set_dax_synchronous(struct dax_device *dax_dev)
 {
 }
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+   struct dax_device *dax_dev)
+{
+   return !(vma->vm_flags & VM_SYNC);
+}
 #endif
 
 struct writeback_control;
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v15 4/7] dm: enable synchronous dax

2019-07-05 Thread Pankaj Gupta
This patch sets dax device 'DAXDEV_SYNC' flag if all the target
devices of device mapper support synchrononous DAX. If device
mapper consists of both synchronous and asynchronous dax devices,
we don't set 'DAXDEV_SYNC' flag.

'dm_table_supports_dax' is refactored to pass 'iterate_devices_fn'
as argument so that the callers can pass the appropriate functions.

Suggested-by: Mike Snitzer 
Signed-off-by: Pankaj Gupta 
Reviewed-by: Mike Snitzer 
---
 drivers/md/dm-table.c | 24 ++--
 drivers/md/dm.c   |  2 +-
 drivers/md/dm.h   |  5 -
 3 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 350cf0451456..81c55304c4fa 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -881,7 +881,7 @@ void dm_table_set_type(struct dm_table *t, enum 
dm_queue_mode type)
 EXPORT_SYMBOL_GPL(dm_table_set_type);
 
 /* validate the dax capability of the target device span */
-static int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
+int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
   sector_t start, sector_t len, void *data)
 {
int blocksize = *(int *) data;
@@ -890,7 +890,15 @@ static int device_supports_dax(struct dm_target *ti, 
struct dm_dev *dev,
start, len);
 }
 
-bool dm_table_supports_dax(struct dm_table *t, int blocksize)
+/* Check devices support synchronous DAX */
+static int device_synchronous(struct dm_target *ti, struct dm_dev *dev,
+  sector_t start, sector_t len, void *data)
+{
+   return dax_synchronous(dev->dax_dev);
+}
+
+bool dm_table_supports_dax(struct dm_table *t,
+ iterate_devices_callout_fn iterate_fn, int *blocksize)
 {
struct dm_target *ti;
unsigned i;
@@ -903,8 +911,7 @@ bool dm_table_supports_dax(struct dm_table *t, int 
blocksize)
return false;
 
if (!ti->type->iterate_devices ||
-   !ti->type->iterate_devices(ti, device_supports_dax,
-   ))
+   !ti->type->iterate_devices(ti, iterate_fn, blocksize))
return false;
}
 
@@ -940,6 +947,7 @@ static int dm_table_determine_type(struct dm_table *t)
struct dm_target *tgt;
struct list_head *devices = dm_table_get_devices(t);
enum dm_queue_mode live_md_type = dm_get_md_type(t->md);
+   int page_size = PAGE_SIZE;
 
if (t->type != DM_TYPE_NONE) {
/* target already set the table's type */
@@ -984,7 +992,7 @@ static int dm_table_determine_type(struct dm_table *t)
 verify_bio_based:
/* We must use this table as bio-based */
t->type = DM_TYPE_BIO_BASED;
-   if (dm_table_supports_dax(t, PAGE_SIZE) ||
+   if (dm_table_supports_dax(t, device_supports_dax, _size) ||
(list_empty(devices) && live_md_type == 
DM_TYPE_DAX_BIO_BASED)) {
t->type = DM_TYPE_DAX_BIO_BASED;
} else {
@@ -1883,6 +1891,7 @@ void dm_table_set_restrictions(struct dm_table *t, struct 
request_queue *q,
   struct queue_limits *limits)
 {
bool wc = false, fua = false;
+   int page_size = PAGE_SIZE;
 
/*
 * Copy table's limits to the DM device's request_queue
@@ -1910,8 +1919,11 @@ void dm_table_set_restrictions(struct dm_table *t, 
struct request_queue *q,
}
blk_queue_write_cache(q, wc, fua);
 
-   if (dm_table_supports_dax(t, PAGE_SIZE))
+   if (dm_table_supports_dax(t, device_supports_dax, _size)) {
blk_queue_flag_set(QUEUE_FLAG_DAX, q);
+   if (dm_table_supports_dax(t, device_synchronous, NULL))
+   set_dax_synchronous(t->md->dax_dev);
+   }
else
blk_queue_flag_clear(QUEUE_FLAG_DAX, q);
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b1caa7188209..b92c42a72ad4 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1119,7 +1119,7 @@ static bool dm_dax_supported(struct dax_device *dax_dev, 
struct block_device *bd
if (!map)
return false;
 
-   ret = dm_table_supports_dax(map, blocksize);
+   ret = dm_table_supports_dax(map, device_supports_dax, );
 
dm_put_live_table(md, srcu_idx);
 
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 17e3db54404c..0475673337f3 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -72,7 +72,10 @@ bool dm_table_bio_based(struct dm_table *t);
 bool dm_table_request_based(struct dm_table *t);
 void dm_table_free_md_mempools(struct dm_table *t);
 struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
-bool dm_table_supports_dax(struct dm_table *t, int blocksize);
+bool dm_table_supports_dax(struct dm_table *t, iterate_devices_callout_fn fn,
+

[PATCH v15 3/7] libnvdimm: add dax_dev sync flag

2019-07-05 Thread Pankaj Gupta
This patch adds 'DAXDEV_SYNC' flag which is set
for nd_region doing synchronous flush. This later
is used to disable MAP_SYNC functionality for
ext4 & xfs filesystem for devices don't support
synchronous flush.

Signed-off-by: Pankaj Gupta 
---
 drivers/dax/bus.c|  2 +-
 drivers/dax/super.c  | 19 ++-
 drivers/md/dm.c  |  3 ++-
 drivers/nvdimm/pmem.c|  5 -
 drivers/nvdimm/region_devs.c |  7 +++
 drivers/s390/block/dcssblk.c |  2 +-
 include/linux/dax.h  | 24 ++--
 include/linux/libnvdimm.h|  1 +
 8 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 2109cfe80219..5f184e751c82 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -388,7 +388,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region 
*dax_region, int id,
 * No 'host' or dax_operations since there is no access to this
 * device outside of mmap of the resulting character device.
 */
-   dax_dev = alloc_dax(dev_dax, NULL, NULL);
+   dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
if (!dax_dev)
goto err;
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 4e5ae7e8b557..8ab12068eea3 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -195,6 +195,8 @@ enum dax_device_flags {
DAXDEV_ALIVE,
/* gate whether dax_flush() calls the low level flush routine */
DAXDEV_WRITE_CACHE,
+   /* flag to check if device supports synchronous flush */
+   DAXDEV_SYNC,
 };
 
 /**
@@ -372,6 +374,18 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+bool __dax_synchronous(struct dax_device *dax_dev)
+{
+   return test_bit(DAXDEV_SYNC, _dev->flags);
+}
+EXPORT_SYMBOL_GPL(__dax_synchronous);
+
+void __set_dax_synchronous(struct dax_device *dax_dev)
+{
+   set_bit(DAXDEV_SYNC, _dev->flags);
+}
+EXPORT_SYMBOL_GPL(__set_dax_synchronous);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
lockdep_assert_held(_srcu);
@@ -526,7 +540,7 @@ static void dax_add_host(struct dax_device *dax_dev, const 
char *host)
 }
 
 struct dax_device *alloc_dax(void *private, const char *__host,
-   const struct dax_operations *ops)
+   const struct dax_operations *ops, unsigned long flags)
 {
struct dax_device *dax_dev;
const char *host;
@@ -549,6 +563,9 @@ struct dax_device *alloc_dax(void *private, const char 
*__host,
dax_add_host(dax_dev, host);
dax_dev->ops = ops;
dax_dev->private = private;
+   if (flags & DAXDEV_F_SYNC)
+   set_dax_synchronous(dax_dev);
+
return dax_dev;
 
  err_dev:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 5475081dcbd6..b1caa7188209 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1991,7 +1991,8 @@ static struct mapped_device *alloc_dev(int minor)
sprintf(md->disk->disk_name, "dm-%d", minor);
 
if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
-   md->dax_dev = alloc_dax(md, md->disk->disk_name, _dax_ops);
+   md->dax_dev = alloc_dax(md, md->disk->disk_name,
+   _dax_ops, 0);
if (!md->dax_dev)
goto bad;
}
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 223da63d1bd7..8be868e2a18b 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -376,6 +376,7 @@ static int pmem_attach_disk(struct device *dev,
struct gendisk *disk;
void *addr;
int rc;
+   unsigned long flags = 0UL;
 
pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
if (!pmem)
@@ -474,7 +475,9 @@ static int pmem_attach_disk(struct device *dev,
nvdimm_badblocks_populate(nd_region, >bb, _res);
disk->bb = >bb;
 
-   dax_dev = alloc_dax(pmem, disk->disk_name, _dax_ops);
+   if (is_nvdimm_sync(nd_region))
+   flags = DAXDEV_F_SYNC;
+   dax_dev = alloc_dax(pmem, disk->disk_name, _dax_ops, flags);
if (!dax_dev) {
put_disk(disk);
return -ENOMEM;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index eca2e62af134..56f2227f192a 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1211,6 +1211,13 @@ int nvdimm_has_cache(struct nd_region *nd_region)
 }
 EXPORT_SYMBOL_GPL(nvdimm_has_cache);
 
+bool is_nvdimm_sync(struct nd_region *nd_region)
+{
+   return is_nd_pmem(_region->dev) &&
+   !test_bit(ND_REGION_ASYNC, _region->flags);
+}
+EXPORT_SYMBOL_GPL(is_nvdimm_sync);
+
 struct conflict_context {
struct nd_region *nd_region;
resource_size_t start, size;
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index d04d4378ca50..635

[PATCH v15 2/7] virtio-pmem: Add virtio pmem driver

2019-07-05 Thread Pankaj Gupta
This patch adds virtio-pmem driver for KVM guest.

Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.

This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Yuval Shaia 
Acked-by: Michael S. Tsirkin 
Acked-by: Jakub Staron 
Tested-by: Jakub Staron 
Reviewed-by: Cornelia Huck 
---
 drivers/nvdimm/Makefile  |   1 +
 drivers/nvdimm/nd_virtio.c   | 125 +++
 drivers/nvdimm/virtio_pmem.c | 122 ++
 drivers/nvdimm/virtio_pmem.h |  55 ++
 drivers/virtio/Kconfig   |  11 +++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  34 +
 7 files changed, 349 insertions(+)
 create mode 100644 drivers/nvdimm/nd_virtio.c
 create mode 100644 drivers/nvdimm/virtio_pmem.c
 create mode 100644 drivers/nvdimm/virtio_pmem.h
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 6f2a088afad6..cefe233e0b52 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_ND_BTT) += nd_btt.o
 obj-$(CONFIG_ND_BLK) += nd_blk.o
 obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
 obj-$(CONFIG_OF_PMEM) += of_pmem.o
+obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
 
 nd_pmem-y := pmem.o
 
diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
new file mode 100644
index ..8645275c08c2
--- /dev/null
+++ b/drivers/nvdimm/nd_virtio.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include "virtio_pmem.h"
+#include "nd.h"
+
+ /* The interrupt handler */
+void virtio_pmem_host_ack(struct virtqueue *vq)
+{
+   struct virtio_pmem *vpmem = vq->vdev->priv;
+   struct virtio_pmem_request *req_data, *req_buf;
+   unsigned long flags;
+   unsigned int len;
+
+   spin_lock_irqsave(>pmem_lock, flags);
+   while ((req_data = virtqueue_get_buf(vq, )) != NULL) {
+   req_data->done = true;
+   wake_up(_data->host_acked);
+
+   if (!list_empty(>req_list)) {
+   req_buf = list_first_entry(>req_list,
+   struct virtio_pmem_request, list);
+   req_buf->wq_buf_avail = true;
+   wake_up(_buf->wq_buf);
+   list_del(_buf->list);
+   }
+   }
+   spin_unlock_irqrestore(>pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(virtio_pmem_host_ack);
+
+ /* The request submission function */
+static int virtio_pmem_flush(struct nd_region *nd_region)
+{
+   struct virtio_device *vdev = nd_region->provider_data;
+   struct virtio_pmem *vpmem  = vdev->priv;
+   struct virtio_pmem_request *req_data;
+   struct scatterlist *sgs[2], sg, ret;
+   unsigned long flags;
+   int err, err1;
+
+   might_sleep();
+   req_data = kmalloc(sizeof(*req_data), GFP_KERNEL);
+   if (!req_data)
+   return -ENOMEM;
+
+   req_data->done = false;
+   init_waitqueue_head(_data->host_acked);
+   init_waitqueue_head(_data->wq_buf);
+   INIT_LIST_HEAD(_data->list);
+   req_data->req.type = cpu_to_virtio32(vdev, VIRTIO_PMEM_REQ_TYPE_FLUSH);
+   sg_init_one(, _data->req, sizeof(req_data->req));
+   sgs[0] = 
+   sg_init_one(, _data->resp.ret, sizeof(req_data->resp));
+   sgs[1] = 
+
+   spin_lock_irqsave(>pmem_lock, flags);
+/*
+ * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual
+ * queue does not have free descriptor. We add the request
+ * to req_list and wait for host_ack to wake us up when free
+ * slots are available.
+ */
+   while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req_data,
+   GFP_ATOMIC)) == -ENOSPC) {
+
+   dev_info(>dev, "failed to send command to virtio pmem 
device, no free slots in the virtqueue\n");
+   req_data->wq_buf_avail = false;
+   list_add_tail(_data->list, >req_list);
+   spin_unlock_irqrestore(>pmem_lock, flags);
+
+   /* A host response results in "host_ack" getting called */
+   wait_event(req_data->wq_buf, re

[PATCH v15 1/7] libnvdimm: nd_region flush callback support

2019-07-05 Thread Pankaj Gupta
This patch adds functionality to perform flush from guest
to host over VIRTIO. We are registering a callback based
on 'nd_region' type. virtio_pmem driver requires this special
flush function. For rest of the region types we are registering
existing flush function. Report error returned by host fsync
failure to userspace.

Signed-off-by: Pankaj Gupta 
---
 drivers/acpi/nfit/core.c |  4 ++--
 drivers/nvdimm/claim.c   |  6 --
 drivers/nvdimm/nd.h  |  1 +
 drivers/nvdimm/pmem.c| 13 -
 drivers/nvdimm/region_devs.c | 26 --
 include/linux/libnvdimm.h|  9 -
 6 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index f1ed0befe303..9ddd8667153e 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2434,7 +2434,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, 
unsigned int bw,
offset = to_interleave_offset(offset, mmio);
 
writeq(cmd, mmio->addr.base + offset);
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL);
 
if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
readq(mmio->addr.base + offset);
@@ -2483,7 +2483,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk 
*nfit_blk,
}
 
if (rw)
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL);
 
rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf469c7..13510bae1e6f 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
struct nd_namespace_io *nsio = to_nd_namespace_io(>dev);
unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
sector_t sector = offset >> 9;
-   int rc = 0;
+   int rc = 0, ret = 0;
 
if (unlikely(!size))
return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
}
 
memcpy_flushcache(nsio->addr + offset, buf, size);
-   nvdimm_flush(to_nd_region(ndns->dev.parent));
+   ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL);
+   if (ret)
+   rc = ret;
 
return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index a5ac3b240293..0c74d2428bd7 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -159,6 +159,7 @@ struct nd_region {
struct badblocks bb;
struct nd_interleave_set *nd_set;
struct nd_percpu_lane __percpu *lane;
+   int (*flush)(struct nd_region *nd_region, struct bio *bio);
struct nd_mapping mapping[0];
 };
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 0279eb1da3ef..c757a47183b8 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, 
struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+   int ret = 0;
blk_status_t rc = 0;
bool do_acct;
unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
struct nd_region *nd_region = to_region(pmem);
 
if (bio->bi_opf & REQ_PREFLUSH)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio);
 
do_acct = nd_iostat_start(bio, );
bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
nd_iostat_end(bio, start);
 
if (bio->bi_opf & REQ_FUA)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio);
+
+   if (ret)
+   bio->bi_status = errno_to_blk_status(ret);
 
bio_endio(bio);
return BLK_QC_T_NONE;
@@ -469,7 +473,6 @@ static int pmem_attach_disk(struct device *dev,
}
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
pmem->dax_dev = dax_dev;
-
gendev = disk_to_dev(disk);
gendev->groups = pmem_attribute_groups;
 
@@ -527,14 +530,14 @@ static int nd_pmem_remove(struct device *dev)
sysfs_put(pmem->bb_state);
pmem->bb_state = NULL;
}
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL);
 
return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c

[PATCH v15 0/7] virtio pmem driver

2019-07-05 Thread Pankaj Gupta
et device mapper synchronous if all target devices support - Dan
 - Move virtio_pmem.h to nvdimm directory  - Dan
 - Style, indentation & better error messages in patch 2 - DavidH
 - Added MST's ack in patch 2.

Changes from PATCH v7:
 - Corrected pending request queue logic (patch 2) - Jakub Staroń
 - Used unsigned long flags for passing DAXDEV_F_SYNC (patch 3) - Dan
 - Fixed typo =>  vma 'flag' to 'vm_flag' (patch 4)
 - Added rob in patch 6 & patch 2

Changes from PATCH v6: 
 - Corrected comment format in patch 5 & patch 6. [Dave]
 - Changed variable declaration indentation in patch 6 [Darrick]
 - Add Reviewed-by tag by 'Jan Kara' in patch 4 & patch 5

Changes from PATCH v5: 
  Changes suggested in by - [Cornelia, Yuval]
- Remove assignment chaining in virtio driver
- Better error message and remove not required free
- Check nd_region before use

  Changes suggested by - [Jan Kara]
- dax_synchronous() for !CONFIG_DAX
- Correct 'daxdev_mapping_supported' comment and non-dax implementation

  Changes suggested by - [Dan Williams]
- Pass meaningful flag 'DAXDEV_F_SYNC' to alloc_dax
- Gate nvdimm_flush instead of additional async parameter
- Move block chaining logic to flush callback than common nvdimm_flush
- Use NULL flush callback for generic flush for better readability [Dan, Jan]

- Use virtio device id 27 from 25(already used) - [MST]

Changes from PATCH v4:
- Factor out MAP_SYNC supported functionality to a common helper
[Dave, Darrick, Jan]
- Comment, indentation and virtqueue_kick failure handle - Yuval Shaia

Changes from PATCH v3: 
- Use generic dax_synchronous() helper to check for DAXDEV_SYNC 
  flag - [Dan, Darrick, Jan]
- Add 'is_nvdimm_async' function
- Document page cache side channel attacks implications & 
  countermeasures - [Dave Chinner, Michael]

Changes from PATCH v2: 
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] 
- Use name 'virtio pmem' in place of 'fake dax' 

Changes from PATCH v1: 
- 0-day build test for build dependency on libnvdimm 

 Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem  
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Remove BSD license text with SPDX license text

- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow

Pankaj Gupta (7):
   libnvdimm: nd_region flush callback support
   virtio-pmem: Add virtio-pmem guest driver
   libnvdimm: add nd_region buffered dax_dev flag
   dax: check synchronous mapping is supported
   dm: dm: Enable synchronous dax
   ext4: disable map_sync for virtio pmem
   xfs: disable map_sync for virtio pmem

[1] https://lkml.org/lkml/2019/6/21/452
[2] https://lkml.org/lkml/2019/6/12/624
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html  
[5] https://marc.info/?l=qemu-devel=155860751202202=2
[6] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html
[7] https://lkml.org/lkml/2019/1/9/1191

 drivers/acpi/nfit/core.c |4 -
 drivers/dax/bus.c|2 
 drivers/dax/super.c  |   19 +
 drivers/md/dm-table.c|   24 +--
 drivers/md/dm.c  |5 -
 drivers/md/dm.h  |5 +
 drivers/nvdimm/Makefile  |1 
 drivers/nvdimm/claim.c   |6 +
 drivers/nvdimm/nd.h  |1 
 drivers/nvdimm/nd_virtio.c   |  125 +++
 drivers/nvdimm/pmem.c|   18 +++--
 drivers/nvdimm/region_devs.c |   33 +-
 drivers/nvdimm/virtio_pmem.c |  122 ++
 drivers/nvdimm/virtio_pmem.h |   55 +
 drivers/s390/block/dcssblk.c |2 
 drivers/virtio/Kconfig   |   11 +++
 fs/ext4/file.c   |   10 +--
 fs/xfs/xfs_file.c|9 +-
 include/linux/dax.h  |   41 
 include/linux/libnvdimm.h|   10 ++-
 include/uapi/linux/virtio_ids.h  |1 
 include/uapi/linux/virtio_pmem.h |   34 ++
 22 files changed, 504 insertions(+), 34 deletions(-)



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH v14 7/7] xfs: disable map_sync for async flush

2019-06-21 Thread Pankaj Gupta
Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and xfs.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Darrick J. Wong 
---
 fs/xfs/xfs_file.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a7ceae90110e..f17652cca5ff 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1203,11 +1203,14 @@ xfs_file_mmap(
struct file *filp,
struct vm_area_struct *vma)
 {
+   struct dax_device   *dax_dev;
+
+   dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
/*
-* We don't support synchronous mappings for non-DAX files. At least
-* until someone comes with a sensible use case.
+* We don't support synchronous mappings for non-DAX files and
+* for DAX files if underneath dax_device is not synchronous.
 */
-   if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
+   if (!daxdev_mapping_supported(vma, dax_dev))
return -EOPNOTSUPP;
 
file_accessed(filp);
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v14 6/7] ext4: disable map_sync for async flush

2019-06-21 Thread Pankaj Gupta
Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and ext4.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Jan Kara 
---
 fs/ext4/file.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 98ec11f69cd4..dee549339e13 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,15 +360,17 @@ static const struct vm_operations_struct ext4_file_vm_ops 
= {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
struct inode *inode = file->f_mapping->host;
+   struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+   struct dax_device *dax_dev = sbi->s_daxdev;
 
-   if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb
+   if (unlikely(ext4_forced_shutdown(sbi)))
return -EIO;
 
/*
-* We don't support synchronous mappings for non-DAX files. At least
-* until someone comes with a sensible use case.
+* We don't support synchronous mappings for non-DAX files and
+* for DAX files if underneath dax_device is not synchronous.
 */
-   if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
+   if (!daxdev_mapping_supported(vma, dax_dev))
return -EOPNOTSUPP;
 
file_accessed(file);
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v14 5/7] dax: check synchronous mapping is supported

2019-06-21 Thread Pankaj Gupta
This patch introduces 'daxdev_mapping_supported' helper
which checks if 'MAP_SYNC' is supported with filesystem
mapping. It also checks if corresponding dax_device is
synchronous. Virtio pmem device is asynchronous and
does not not support VM_SYNC.

Suggested-by: Jan Kara 
Signed-off-by: Pankaj Gupta 
Reviewed-by: Jan Kara 
---
 include/linux/dax.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index 86fc55c99b58..d1bea3979b5a 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -53,6 +53,18 @@ static inline void set_dax_synchronous(struct dax_device 
*dax_dev)
 {
__set_dax_synchronous(dax_dev);
 }
+/*
+ * Check if given mapping is supported by the file / underlying device.
+ */
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+struct dax_device *dax_dev)
+{
+   if (!(vma->vm_flags & VM_SYNC))
+   return true;
+   if (!IS_DAX(file_inode(vma->vm_file)))
+   return false;
+   return dax_synchronous(dax_dev);
+}
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
@@ -87,6 +99,11 @@ static inline bool dax_synchronous(struct dax_device 
*dax_dev)
 static inline void set_dax_synchronous(struct dax_device *dax_dev)
 {
 }
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+   struct dax_device *dax_dev)
+{
+   return !(vma->vm_flags & VM_SYNC);
+}
 #endif
 
 struct writeback_control;
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v13 4/7] dm: enable synchronous dax

2019-06-21 Thread Pankaj Gupta
This patch sets dax device 'DAXDEV_SYNC' flag if all the target
devices of device mapper support synchrononous DAX. If device
mapper consists of both synchronous and asynchronous dax devices,
we don't set 'DAXDEV_SYNC' flag.

'dm_table_supports_dax' is refactored to pass 'iterate_devices_fn'
as argument so that the callers can pass the appropriate functions.

Suggested-by: Mike Snitzer 
Signed-off-by: Pankaj Gupta 
Reviewed-by: Mike Snitzer 
---
 drivers/md/dm-table.c | 24 ++--
 drivers/md/dm.c   |  2 +-
 drivers/md/dm.h   |  5 -
 3 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 350cf0451456..81c55304c4fa 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -881,7 +881,7 @@ void dm_table_set_type(struct dm_table *t, enum 
dm_queue_mode type)
 EXPORT_SYMBOL_GPL(dm_table_set_type);
 
 /* validate the dax capability of the target device span */
-static int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
+int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
   sector_t start, sector_t len, void *data)
 {
int blocksize = *(int *) data;
@@ -890,7 +890,15 @@ static int device_supports_dax(struct dm_target *ti, 
struct dm_dev *dev,
start, len);
 }
 
-bool dm_table_supports_dax(struct dm_table *t, int blocksize)
+/* Check devices support synchronous DAX */
+static int device_synchronous(struct dm_target *ti, struct dm_dev *dev,
+  sector_t start, sector_t len, void *data)
+{
+   return dax_synchronous(dev->dax_dev);
+}
+
+bool dm_table_supports_dax(struct dm_table *t,
+ iterate_devices_callout_fn iterate_fn, int *blocksize)
 {
struct dm_target *ti;
unsigned i;
@@ -903,8 +911,7 @@ bool dm_table_supports_dax(struct dm_table *t, int 
blocksize)
return false;
 
if (!ti->type->iterate_devices ||
-   !ti->type->iterate_devices(ti, device_supports_dax,
-   ))
+   !ti->type->iterate_devices(ti, iterate_fn, blocksize))
return false;
}
 
@@ -940,6 +947,7 @@ static int dm_table_determine_type(struct dm_table *t)
struct dm_target *tgt;
struct list_head *devices = dm_table_get_devices(t);
enum dm_queue_mode live_md_type = dm_get_md_type(t->md);
+   int page_size = PAGE_SIZE;
 
if (t->type != DM_TYPE_NONE) {
/* target already set the table's type */
@@ -984,7 +992,7 @@ static int dm_table_determine_type(struct dm_table *t)
 verify_bio_based:
/* We must use this table as bio-based */
t->type = DM_TYPE_BIO_BASED;
-   if (dm_table_supports_dax(t, PAGE_SIZE) ||
+   if (dm_table_supports_dax(t, device_supports_dax, _size) ||
(list_empty(devices) && live_md_type == 
DM_TYPE_DAX_BIO_BASED)) {
t->type = DM_TYPE_DAX_BIO_BASED;
} else {
@@ -1883,6 +1891,7 @@ void dm_table_set_restrictions(struct dm_table *t, struct 
request_queue *q,
   struct queue_limits *limits)
 {
bool wc = false, fua = false;
+   int page_size = PAGE_SIZE;
 
/*
 * Copy table's limits to the DM device's request_queue
@@ -1910,8 +1919,11 @@ void dm_table_set_restrictions(struct dm_table *t, 
struct request_queue *q,
}
blk_queue_write_cache(q, wc, fua);
 
-   if (dm_table_supports_dax(t, PAGE_SIZE))
+   if (dm_table_supports_dax(t, device_supports_dax, _size)) {
blk_queue_flag_set(QUEUE_FLAG_DAX, q);
+   if (dm_table_supports_dax(t, device_synchronous, NULL))
+   set_dax_synchronous(t->md->dax_dev);
+   }
else
blk_queue_flag_clear(QUEUE_FLAG_DAX, q);
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b1caa7188209..b92c42a72ad4 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1119,7 +1119,7 @@ static bool dm_dax_supported(struct dax_device *dax_dev, 
struct block_device *bd
if (!map)
return false;
 
-   ret = dm_table_supports_dax(map, blocksize);
+   ret = dm_table_supports_dax(map, device_supports_dax, );
 
dm_put_live_table(md, srcu_idx);
 
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 17e3db54404c..0475673337f3 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -72,7 +72,10 @@ bool dm_table_bio_based(struct dm_table *t);
 bool dm_table_request_based(struct dm_table *t);
 void dm_table_free_md_mempools(struct dm_table *t);
 struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
-bool dm_table_supports_dax(struct dm_table *t, int blocksize);
+bool dm_table_supports_dax(struct dm_table *t, iterate_devices_callout_fn fn,
+

[PATCH v14 3/7] libnvdimm: add dax_dev sync flag

2019-06-21 Thread Pankaj Gupta
This patch adds 'DAXDEV_SYNC' flag which is set
for nd_region doing synchronous flush. This later
is used to disable MAP_SYNC functionality for
ext4 & xfs filesystem for devices don't support
synchronous flush.

Signed-off-by: Pankaj Gupta 
---
 drivers/dax/bus.c|  2 +-
 drivers/dax/super.c  | 19 ++-
 drivers/md/dm.c  |  3 ++-
 drivers/nvdimm/pmem.c|  5 -
 drivers/nvdimm/region_devs.c |  7 +++
 drivers/s390/block/dcssblk.c |  2 +-
 include/linux/dax.h  | 24 ++--
 include/linux/libnvdimm.h|  1 +
 8 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 2109cfe80219..5f184e751c82 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -388,7 +388,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region 
*dax_region, int id,
 * No 'host' or dax_operations since there is no access to this
 * device outside of mmap of the resulting character device.
 */
-   dax_dev = alloc_dax(dev_dax, NULL, NULL);
+   dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
if (!dax_dev)
goto err;
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 4e5ae7e8b557..8ab12068eea3 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -195,6 +195,8 @@ enum dax_device_flags {
DAXDEV_ALIVE,
/* gate whether dax_flush() calls the low level flush routine */
DAXDEV_WRITE_CACHE,
+   /* flag to check if device supports synchronous flush */
+   DAXDEV_SYNC,
 };
 
 /**
@@ -372,6 +374,18 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+bool __dax_synchronous(struct dax_device *dax_dev)
+{
+   return test_bit(DAXDEV_SYNC, _dev->flags);
+}
+EXPORT_SYMBOL_GPL(__dax_synchronous);
+
+void __set_dax_synchronous(struct dax_device *dax_dev)
+{
+   set_bit(DAXDEV_SYNC, _dev->flags);
+}
+EXPORT_SYMBOL_GPL(__set_dax_synchronous);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
lockdep_assert_held(_srcu);
@@ -526,7 +540,7 @@ static void dax_add_host(struct dax_device *dax_dev, const 
char *host)
 }
 
 struct dax_device *alloc_dax(void *private, const char *__host,
-   const struct dax_operations *ops)
+   const struct dax_operations *ops, unsigned long flags)
 {
struct dax_device *dax_dev;
const char *host;
@@ -549,6 +563,9 @@ struct dax_device *alloc_dax(void *private, const char 
*__host,
dax_add_host(dax_dev, host);
dax_dev->ops = ops;
dax_dev->private = private;
+   if (flags & DAXDEV_F_SYNC)
+   set_dax_synchronous(dax_dev);
+
return dax_dev;
 
  err_dev:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 5475081dcbd6..b1caa7188209 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1991,7 +1991,8 @@ static struct mapped_device *alloc_dev(int minor)
sprintf(md->disk->disk_name, "dm-%d", minor);
 
if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
-   md->dax_dev = alloc_dax(md, md->disk->disk_name, _dax_ops);
+   md->dax_dev = alloc_dax(md, md->disk->disk_name,
+   _dax_ops, 0);
if (!md->dax_dev)
goto bad;
}
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 223da63d1bd7..8be868e2a18b 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -376,6 +376,7 @@ static int pmem_attach_disk(struct device *dev,
struct gendisk *disk;
void *addr;
int rc;
+   unsigned long flags = 0UL;
 
pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
if (!pmem)
@@ -474,7 +475,9 @@ static int pmem_attach_disk(struct device *dev,
nvdimm_badblocks_populate(nd_region, >bb, _res);
disk->bb = >bb;
 
-   dax_dev = alloc_dax(pmem, disk->disk_name, _dax_ops);
+   if (is_nvdimm_sync(nd_region))
+   flags = DAXDEV_F_SYNC;
+   dax_dev = alloc_dax(pmem, disk->disk_name, _dax_ops, flags);
if (!dax_dev) {
put_disk(disk);
return -ENOMEM;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index eca2e62af134..56f2227f192a 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1211,6 +1211,13 @@ int nvdimm_has_cache(struct nd_region *nd_region)
 }
 EXPORT_SYMBOL_GPL(nvdimm_has_cache);
 
+bool is_nvdimm_sync(struct nd_region *nd_region)
+{
+   return is_nd_pmem(_region->dev) &&
+   !test_bit(ND_REGION_ASYNC, _region->flags);
+}
+EXPORT_SYMBOL_GPL(is_nvdimm_sync);
+
 struct conflict_context {
struct nd_region *nd_region;
resource_size_t start, size;
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index d04d4378ca50..635

[PATCH v14 2/7] virtio-pmem: Add virtio pmem driver

2019-06-21 Thread Pankaj Gupta
This patch adds virtio-pmem driver for KVM guest.

Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.

This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Yuval Shaia 
Acked-by: Michael S. Tsirkin 
Acked-by: Jakub Staron 
Tested-by: Jakub Staron 
Reviewed-by: Cornelia Huck 
---
 drivers/nvdimm/Makefile  |   1 +
 drivers/nvdimm/nd_virtio.c   | 125 +++
 drivers/nvdimm/virtio_pmem.c | 122 ++
 drivers/nvdimm/virtio_pmem.h |  55 ++
 drivers/virtio/Kconfig   |  11 +++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  35 +
 7 files changed, 350 insertions(+)
 create mode 100644 drivers/nvdimm/nd_virtio.c
 create mode 100644 drivers/nvdimm/virtio_pmem.c
 create mode 100644 drivers/nvdimm/virtio_pmem.h
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 6f2a088afad6..cefe233e0b52 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_ND_BTT) += nd_btt.o
 obj-$(CONFIG_ND_BLK) += nd_blk.o
 obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
 obj-$(CONFIG_OF_PMEM) += of_pmem.o
+obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
 
 nd_pmem-y := pmem.o
 
diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
new file mode 100644
index ..8645275c08c2
--- /dev/null
+++ b/drivers/nvdimm/nd_virtio.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include "virtio_pmem.h"
+#include "nd.h"
+
+ /* The interrupt handler */
+void virtio_pmem_host_ack(struct virtqueue *vq)
+{
+   struct virtio_pmem *vpmem = vq->vdev->priv;
+   struct virtio_pmem_request *req_data, *req_buf;
+   unsigned long flags;
+   unsigned int len;
+
+   spin_lock_irqsave(>pmem_lock, flags);
+   while ((req_data = virtqueue_get_buf(vq, )) != NULL) {
+   req_data->done = true;
+   wake_up(_data->host_acked);
+
+   if (!list_empty(>req_list)) {
+   req_buf = list_first_entry(>req_list,
+   struct virtio_pmem_request, list);
+   req_buf->wq_buf_avail = true;
+   wake_up(_buf->wq_buf);
+   list_del(_buf->list);
+   }
+   }
+   spin_unlock_irqrestore(>pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(virtio_pmem_host_ack);
+
+ /* The request submission function */
+static int virtio_pmem_flush(struct nd_region *nd_region)
+{
+   struct virtio_device *vdev = nd_region->provider_data;
+   struct virtio_pmem *vpmem  = vdev->priv;
+   struct virtio_pmem_request *req_data;
+   struct scatterlist *sgs[2], sg, ret;
+   unsigned long flags;
+   int err, err1;
+
+   might_sleep();
+   req_data = kmalloc(sizeof(*req_data), GFP_KERNEL);
+   if (!req_data)
+   return -ENOMEM;
+
+   req_data->done = false;
+   init_waitqueue_head(_data->host_acked);
+   init_waitqueue_head(_data->wq_buf);
+   INIT_LIST_HEAD(_data->list);
+   req_data->req.type = cpu_to_virtio32(vdev, VIRTIO_PMEM_REQ_TYPE_FLUSH);
+   sg_init_one(, _data->req, sizeof(req_data->req));
+   sgs[0] = 
+   sg_init_one(, _data->resp.ret, sizeof(req_data->resp));
+   sgs[1] = 
+
+   spin_lock_irqsave(>pmem_lock, flags);
+/*
+ * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual
+ * queue does not have free descriptor. We add the request
+ * to req_list and wait for host_ack to wake us up when free
+ * slots are available.
+ */
+   while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req_data,
+   GFP_ATOMIC)) == -ENOSPC) {
+
+   dev_info(>dev, "failed to send command to virtio pmem 
device, no free slots in the virtqueue\n");
+   req_data->wq_buf_avail = false;
+   list_add_tail(_data->list, >req_list);
+   spin_unlock_irqrestore(>pmem_lock, flags);
+
+   /* A host response results in "host_ack" getting called */
+   wait_event(req_data->wq_buf, re

[PATCH v14 1/7] libnvdimm: nd_region flush callback support

2019-06-21 Thread Pankaj Gupta
This patch adds functionality to perform flush from guest
to host over VIRTIO. We are registering a callback based
on 'nd_region' type. virtio_pmem driver requires this special
flush function. For rest of the region types we are registering
existing flush function. Report error returned by host fsync
failure to userspace.

Signed-off-by: Pankaj Gupta 
---
 drivers/acpi/nfit/core.c |  4 ++--
 drivers/nvdimm/claim.c   |  6 --
 drivers/nvdimm/nd.h  |  1 +
 drivers/nvdimm/pmem.c| 13 -
 drivers/nvdimm/region_devs.c | 26 --
 include/linux/libnvdimm.h|  9 -
 6 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index f1ed0befe303..9ddd8667153e 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2434,7 +2434,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, 
unsigned int bw,
offset = to_interleave_offset(offset, mmio);
 
writeq(cmd, mmio->addr.base + offset);
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL);
 
if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
readq(mmio->addr.base + offset);
@@ -2483,7 +2483,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk 
*nfit_blk,
}
 
if (rw)
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL);
 
rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf469c7..13510bae1e6f 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
struct nd_namespace_io *nsio = to_nd_namespace_io(>dev);
unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
sector_t sector = offset >> 9;
-   int rc = 0;
+   int rc = 0, ret = 0;
 
if (unlikely(!size))
return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
}
 
memcpy_flushcache(nsio->addr + offset, buf, size);
-   nvdimm_flush(to_nd_region(ndns->dev.parent));
+   ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL);
+   if (ret)
+   rc = ret;
 
return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index a5ac3b240293..0c74d2428bd7 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -159,6 +159,7 @@ struct nd_region {
struct badblocks bb;
struct nd_interleave_set *nd_set;
struct nd_percpu_lane __percpu *lane;
+   int (*flush)(struct nd_region *nd_region, struct bio *bio);
struct nd_mapping mapping[0];
 };
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 0279eb1da3ef..c757a47183b8 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, 
struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+   int ret = 0;
blk_status_t rc = 0;
bool do_acct;
unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
struct nd_region *nd_region = to_region(pmem);
 
if (bio->bi_opf & REQ_PREFLUSH)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio);
 
do_acct = nd_iostat_start(bio, );
bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
nd_iostat_end(bio, start);
 
if (bio->bi_opf & REQ_FUA)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio);
+
+   if (ret)
+   bio->bi_status = errno_to_blk_status(ret);
 
bio_endio(bio);
return BLK_QC_T_NONE;
@@ -469,7 +473,6 @@ static int pmem_attach_disk(struct device *dev,
}
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
pmem->dax_dev = dax_dev;
-
gendev = disk_to_dev(disk);
gendev->groups = pmem_attribute_groups;
 
@@ -527,14 +530,14 @@ static int nd_pmem_remove(struct device *dev)
sysfs_put(pmem->bb_state);
pmem->bb_state = NULL;
}
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL);
 
return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c

[PATCH v14 0/7] virtio pmem driver

2019-06-21 Thread Pankaj Gupta
device mapper synchronous if all target devices support - Dan
 - Move virtio_pmem.h to nvdimm directory  - Dan
 - Style, indentation & better error messages in patch 2 - DavidH
 - Added MST's ack in patch 2.

Changes from PATCH v7:
 - Corrected pending request queue logic (patch 2) - Jakub Staroń
 - Used unsigned long flags for passing DAXDEV_F_SYNC (patch 3) - Dan
 - Fixed typo =>  vma 'flag' to 'vm_flag' (patch 4)
 - Added rob in patch 6 & patch 2

Changes from PATCH v6: 
 - Corrected comment format in patch 5 & patch 6. [Dave]
 - Changed variable declaration indentation in patch 6 [Darrick]
 - Add Reviewed-by tag by 'Jan Kara' in patch 4 & patch 5

Changes from PATCH v5: 
  Changes suggested in by - [Cornelia, Yuval]
- Remove assignment chaining in virtio driver
- Better error message and remove not required free
- Check nd_region before use

  Changes suggested by - [Jan Kara]
- dax_synchronous() for !CONFIG_DAX
- Correct 'daxdev_mapping_supported' comment and non-dax implementation

  Changes suggested by - [Dan Williams]
- Pass meaningful flag 'DAXDEV_F_SYNC' to alloc_dax
- Gate nvdimm_flush instead of additional async parameter
- Move block chaining logic to flush callback than common nvdimm_flush
- Use NULL flush callback for generic flush for better readability [Dan, Jan]

- Use virtio device id 27 from 25(already used) - [MST]

Changes from PATCH v4:
- Factor out MAP_SYNC supported functionality to a common helper
[Dave, Darrick, Jan]
- Comment, indentation and virtqueue_kick failure handle - Yuval Shaia

Changes from PATCH v3: 
- Use generic dax_synchronous() helper to check for DAXDEV_SYNC 
  flag - [Dan, Darrick, Jan]
- Add 'is_nvdimm_async' function
- Document page cache side channel attacks implications & 
  countermeasures - [Dave Chinner, Michael]

Changes from PATCH v2: 
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] 
- Use name 'virtio pmem' in place of 'fake dax' 

Changes from PATCH v1: 
- 0-day build test for build dependency on libnvdimm 

 Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem  
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Remove BSD license text with SPDX license text

- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow

Pankaj Gupta (7):
   libnvdimm: nd_region flush callback support
   virtio-pmem: Add virtio-pmem guest driver
   libnvdimm: add nd_region buffered dax_dev flag
   dax: check synchronous mapping is supported
   dm: dm: Enable synchronous dax
   ext4: disable map_sync for virtio pmem
   xfs: disable map_sync for virtio pmem

[1] https://lkml.org/lkml/2019/6/12/624
[2] https://lkml.org/lkml/2019/6/11/831
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html  
[5] https://marc.info/?l=qemu-devel=155860751202202=2
[6] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html
[7] https://lkml.org/lkml/2019/1/9/1191

 drivers/acpi/nfit/core.c |4 -
 drivers/dax/bus.c|2 
 drivers/dax/super.c  |   19 +
 drivers/md/dm-table.c|   24 +--
 drivers/md/dm.c  |5 -
 drivers/md/dm.h  |5 +
 drivers/nvdimm/Makefile  |1 
 drivers/nvdimm/claim.c   |6 +
 drivers/nvdimm/nd.h  |1 
 drivers/nvdimm/nd_virtio.c   |  125 +++
 drivers/nvdimm/pmem.c|   18 +++--
 drivers/nvdimm/region_devs.c |   33 +-
 drivers/nvdimm/virtio_pmem.c |  122 ++
 drivers/nvdimm/virtio_pmem.h |   55 +
 drivers/s390/block/dcssblk.c |2 
 drivers/virtio/Kconfig   |   11 +++
 fs/ext4/file.c   |   10 +--
 fs/xfs/xfs_file.c|9 +-
 include/linux/dax.h  |   41 
 include/linux/libnvdimm.h|   10 ++-
 include/uapi/linux/virtio_ids.h  |1 
 include/uapi/linux/virtio_pmem.h |   35 ++
 22 files changed, 505 insertions(+), 34 deletions(-)


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [Qemu-devel] [PATCH v13 2/7] virtio-pmem: Add virtio pmem driver

2019-06-12 Thread Pankaj Gupta


> 
> > This patch adds virtio-pmem driver for KVM guest.
> > 
> > Guest reads the persistent memory range information from
> > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > creates a nd_region object with the persistent memory
> > range information so that existing 'nvdimm/pmem' driver
> > can reserve this into system memory map. This way
> > 'virtio-pmem' driver uses existing functionality of pmem
> > driver to register persistent memory compatible for DAX
> > capable filesystems.
> > 
> > This also provides function to perform guest flush over
> > VIRTIO from 'pmem' driver when userspace performs flush
> > on DAX memory range.
> > 
> > Signed-off-by: Pankaj Gupta 
> > Reviewed-by: Yuval Shaia 
> > Acked-by: Michael S. Tsirkin 
> > Acked-by: Jakub Staron 
> > Tested-by: Jakub Staron 
> > ---
> >  drivers/nvdimm/Makefile  |   1 +
> >  drivers/nvdimm/nd_virtio.c   | 125 +++
> >  drivers/nvdimm/virtio_pmem.c | 122 ++
> >  drivers/nvdimm/virtio_pmem.h |  55 ++
> >  drivers/virtio/Kconfig   |  11 +++
> >  include/uapi/linux/virtio_ids.h  |   1 +
> >  include/uapi/linux/virtio_pmem.h |  35 +
> >  7 files changed, 350 insertions(+)
> >  create mode 100644 drivers/nvdimm/nd_virtio.c
> >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> >  create mode 100644 drivers/nvdimm/virtio_pmem.h
> >  create mode 100644 include/uapi/linux/virtio_pmem.h
> 
> Reviewed-by: Cornelia Huck 

Thank you Cornelia for the review.

Best regards,
Pankaj
> 
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v13 7/7] xfs: disable map_sync for async flush

2019-06-12 Thread Pankaj Gupta
Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and xfs.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Darrick J. Wong 
---
 fs/xfs/xfs_file.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a7ceae90110e..f17652cca5ff 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1203,11 +1203,14 @@ xfs_file_mmap(
struct file *filp,
struct vm_area_struct *vma)
 {
+   struct dax_device   *dax_dev;
+
+   dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
/*
-* We don't support synchronous mappings for non-DAX files. At least
-* until someone comes with a sensible use case.
+* We don't support synchronous mappings for non-DAX files and
+* for DAX files if underneath dax_device is not synchronous.
 */
-   if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
+   if (!daxdev_mapping_supported(vma, dax_dev))
return -EOPNOTSUPP;
 
file_accessed(filp);
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v13 6/7] ext4: disable map_sync for async flush

2019-06-12 Thread Pankaj Gupta
Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and ext4.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Jan Kara 
---
 fs/ext4/file.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 98ec11f69cd4..dee549339e13 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,15 +360,17 @@ static const struct vm_operations_struct ext4_file_vm_ops 
= {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
struct inode *inode = file->f_mapping->host;
+   struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+   struct dax_device *dax_dev = sbi->s_daxdev;
 
-   if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb
+   if (unlikely(ext4_forced_shutdown(sbi)))
return -EIO;
 
/*
-* We don't support synchronous mappings for non-DAX files. At least
-* until someone comes with a sensible use case.
+* We don't support synchronous mappings for non-DAX files and
+* for DAX files if underneath dax_device is not synchronous.
 */
-   if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
+   if (!daxdev_mapping_supported(vma, dax_dev))
return -EOPNOTSUPP;
 
file_accessed(file);
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v13 5/7] dax: check synchronous mapping is supported

2019-06-12 Thread Pankaj Gupta
This patch introduces 'daxdev_mapping_supported' helper
which checks if 'MAP_SYNC' is supported with filesystem
mapping. It also checks if corresponding dax_device is
synchronous. Virtio pmem device is asynchronous and
does not not support VM_SYNC.

Suggested-by: Jan Kara 
Signed-off-by: Pankaj Gupta 
Reviewed-by: Jan Kara 
---
 include/linux/dax.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index 2b106752b1b8..267251a394fa 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -42,6 +42,18 @@ void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
 bool dax_synchronous(struct dax_device *dax_dev);
 void set_dax_synchronous(struct dax_device *dax_dev);
+/*
+ * Check if given mapping is supported by the file / underlying device.
+ */
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+   struct dax_device *dax_dev)
+{
+   if (!(vma->vm_flags & VM_SYNC))
+   return true;
+   if (!IS_DAX(file_inode(vma->vm_file)))
+   return false;
+   return dax_synchronous(dax_dev);
+}
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
@@ -69,6 +81,11 @@ static inline bool dax_write_cache_enabled(struct dax_device 
*dax_dev)
 {
return false;
 }
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+   struct dax_device *dax_dev)
+{
+   return !(vma->vm_flags & VM_SYNC);
+}
 #endif
 
 struct writeback_control;
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v13 4/7] dm: enable synchronous dax

2019-06-12 Thread Pankaj Gupta
This patch sets dax device 'DAXDEV_SYNC' flag if all the target
devices of device mapper support synchrononous DAX. If device
mapper consists of both synchronous and asynchronous dax devices,
we don't set 'DAXDEV_SYNC' flag.

'dm_table_supports_dax' is refactored to pass 'iterate_devices_fn'
as argument so that the callers can pass the appropriate functions.

Suggested-by: Mike Snitzer 
Signed-off-by: Pankaj Gupta 
Reviewed-by: Mike Snitzer 
---
 drivers/md/dm-table.c | 24 ++--
 drivers/md/dm.c   |  2 +-
 drivers/md/dm.h   |  5 -
 3 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 350cf0451456..81c55304c4fa 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -881,7 +881,7 @@ void dm_table_set_type(struct dm_table *t, enum 
dm_queue_mode type)
 EXPORT_SYMBOL_GPL(dm_table_set_type);
 
 /* validate the dax capability of the target device span */
-static int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
+int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
   sector_t start, sector_t len, void *data)
 {
int blocksize = *(int *) data;
@@ -890,7 +890,15 @@ static int device_supports_dax(struct dm_target *ti, 
struct dm_dev *dev,
start, len);
 }
 
-bool dm_table_supports_dax(struct dm_table *t, int blocksize)
+/* Check devices support synchronous DAX */
+static int device_synchronous(struct dm_target *ti, struct dm_dev *dev,
+  sector_t start, sector_t len, void *data)
+{
+   return dax_synchronous(dev->dax_dev);
+}
+
+bool dm_table_supports_dax(struct dm_table *t,
+ iterate_devices_callout_fn iterate_fn, int *blocksize)
 {
struct dm_target *ti;
unsigned i;
@@ -903,8 +911,7 @@ bool dm_table_supports_dax(struct dm_table *t, int 
blocksize)
return false;
 
if (!ti->type->iterate_devices ||
-   !ti->type->iterate_devices(ti, device_supports_dax,
-   ))
+   !ti->type->iterate_devices(ti, iterate_fn, blocksize))
return false;
}
 
@@ -940,6 +947,7 @@ static int dm_table_determine_type(struct dm_table *t)
struct dm_target *tgt;
struct list_head *devices = dm_table_get_devices(t);
enum dm_queue_mode live_md_type = dm_get_md_type(t->md);
+   int page_size = PAGE_SIZE;
 
if (t->type != DM_TYPE_NONE) {
/* target already set the table's type */
@@ -984,7 +992,7 @@ static int dm_table_determine_type(struct dm_table *t)
 verify_bio_based:
/* We must use this table as bio-based */
t->type = DM_TYPE_BIO_BASED;
-   if (dm_table_supports_dax(t, PAGE_SIZE) ||
+   if (dm_table_supports_dax(t, device_supports_dax, _size) ||
(list_empty(devices) && live_md_type == 
DM_TYPE_DAX_BIO_BASED)) {
t->type = DM_TYPE_DAX_BIO_BASED;
} else {
@@ -1883,6 +1891,7 @@ void dm_table_set_restrictions(struct dm_table *t, struct 
request_queue *q,
   struct queue_limits *limits)
 {
bool wc = false, fua = false;
+   int page_size = PAGE_SIZE;
 
/*
 * Copy table's limits to the DM device's request_queue
@@ -1910,8 +1919,11 @@ void dm_table_set_restrictions(struct dm_table *t, 
struct request_queue *q,
}
blk_queue_write_cache(q, wc, fua);
 
-   if (dm_table_supports_dax(t, PAGE_SIZE))
+   if (dm_table_supports_dax(t, device_supports_dax, _size)) {
blk_queue_flag_set(QUEUE_FLAG_DAX, q);
+   if (dm_table_supports_dax(t, device_synchronous, NULL))
+   set_dax_synchronous(t->md->dax_dev);
+   }
else
blk_queue_flag_clear(QUEUE_FLAG_DAX, q);
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b1caa7188209..b92c42a72ad4 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1119,7 +1119,7 @@ static bool dm_dax_supported(struct dax_device *dax_dev, 
struct block_device *bd
if (!map)
return false;
 
-   ret = dm_table_supports_dax(map, blocksize);
+   ret = dm_table_supports_dax(map, device_supports_dax, );
 
dm_put_live_table(md, srcu_idx);
 
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 17e3db54404c..0475673337f3 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -72,7 +72,10 @@ bool dm_table_bio_based(struct dm_table *t);
 bool dm_table_request_based(struct dm_table *t);
 void dm_table_free_md_mempools(struct dm_table *t);
 struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
-bool dm_table_supports_dax(struct dm_table *t, int blocksize);
+bool dm_table_supports_dax(struct dm_table *t, iterate_devices_callout_fn fn,
+

[PATCH v13 3/7] libnvdimm: add dax_dev sync flag

2019-06-12 Thread Pankaj Gupta
This patch adds 'DAXDEV_SYNC' flag which is set
for nd_region doing synchronous flush. This later
is used to disable MAP_SYNC functionality for
ext4 & xfs filesystem for devices don't support
synchronous flush.

Signed-off-by: Pankaj Gupta 
---
 drivers/dax/bus.c|  2 +-
 drivers/dax/super.c  | 19 ++-
 drivers/md/dm.c  |  3 ++-
 drivers/nvdimm/pmem.c|  5 -
 drivers/nvdimm/region_devs.c |  7 +++
 include/linux/dax.h  |  9 +++--
 include/linux/libnvdimm.h|  1 +
 7 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 2109cfe80219..5f184e751c82 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -388,7 +388,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region 
*dax_region, int id,
 * No 'host' or dax_operations since there is no access to this
 * device outside of mmap of the resulting character device.
 */
-   dax_dev = alloc_dax(dev_dax, NULL, NULL);
+   dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
if (!dax_dev)
goto err;
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index bbd57ca0634a..93b3718b5cfa 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -186,6 +186,8 @@ enum dax_device_flags {
DAXDEV_ALIVE,
/* gate whether dax_flush() calls the low level flush routine */
DAXDEV_WRITE_CACHE,
+   /* flag to check if device supports synchronous flush */
+   DAXDEV_SYNC,
 };
 
 /**
@@ -354,6 +356,18 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+bool dax_synchronous(struct dax_device *dax_dev)
+{
+   return test_bit(DAXDEV_SYNC, _dev->flags);
+}
+EXPORT_SYMBOL_GPL(dax_synchronous);
+
+void set_dax_synchronous(struct dax_device *dax_dev)
+{
+   set_bit(DAXDEV_SYNC, _dev->flags);
+}
+EXPORT_SYMBOL_GPL(set_dax_synchronous);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
lockdep_assert_held(_srcu);
@@ -508,7 +522,7 @@ static void dax_add_host(struct dax_device *dax_dev, const 
char *host)
 }
 
 struct dax_device *alloc_dax(void *private, const char *__host,
-   const struct dax_operations *ops)
+   const struct dax_operations *ops, unsigned long flags)
 {
struct dax_device *dax_dev;
const char *host;
@@ -531,6 +545,9 @@ struct dax_device *alloc_dax(void *private, const char 
*__host,
dax_add_host(dax_dev, host);
dax_dev->ops = ops;
dax_dev->private = private;
+   if (flags & DAXDEV_F_SYNC)
+   set_dax_synchronous(dax_dev);
+
return dax_dev;
 
  err_dev:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 1fb1333fefec..7eee7ddc73a8 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1970,7 +1970,8 @@ static struct mapped_device *alloc_dev(int minor)
sprintf(md->disk->disk_name, "dm-%d", minor);
 
if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
-   md->dax_dev = alloc_dax(md, md->disk->disk_name, _dax_ops);
+   md->dax_dev = alloc_dax(md, md->disk->disk_name,
+   _dax_ops, 0);
if (!md->dax_dev)
goto bad;
}
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 0279eb1da3ef..bdbd2b414d3d 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -365,6 +365,7 @@ static int pmem_attach_disk(struct device *dev,
struct gendisk *disk;
void *addr;
int rc;
+   unsigned long flags = 0UL;
 
pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
if (!pmem)
@@ -462,7 +463,9 @@ static int pmem_attach_disk(struct device *dev,
nvdimm_badblocks_populate(nd_region, >bb, _res);
disk->bb = >bb;
 
-   dax_dev = alloc_dax(pmem, disk->disk_name, _dax_ops);
+   if (is_nvdimm_sync(nd_region))
+   flags = DAXDEV_F_SYNC;
+   dax_dev = alloc_dax(pmem, disk->disk_name, _dax_ops, flags);
if (!dax_dev) {
put_disk(disk);
return -ENOMEM;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b4ef7d9ff22e..f3ea5369d20a 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1197,6 +1197,13 @@ int nvdimm_has_cache(struct nd_region *nd_region)
 }
 EXPORT_SYMBOL_GPL(nvdimm_has_cache);
 
+bool is_nvdimm_sync(struct nd_region *nd_region)
+{
+   return is_nd_pmem(_region->dev) &&
+   !test_bit(ND_REGION_ASYNC, _region->flags);
+}
+EXPORT_SYMBOL_GPL(is_nvdimm_sync);
+
 struct conflict_context {
struct nd_region *nd_region;
resource_size_t start, size;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0dd316a74a29..2b106752b1b8 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -7,6 +7,9 @@

[PATCH v13 2/7] virtio-pmem: Add virtio pmem driver

2019-06-12 Thread Pankaj Gupta
This patch adds virtio-pmem driver for KVM guest.

Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.

This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Yuval Shaia 
Acked-by: Michael S. Tsirkin 
Acked-by: Jakub Staron 
Tested-by: Jakub Staron 
---
 drivers/nvdimm/Makefile  |   1 +
 drivers/nvdimm/nd_virtio.c   | 125 +++
 drivers/nvdimm/virtio_pmem.c | 122 ++
 drivers/nvdimm/virtio_pmem.h |  55 ++
 drivers/virtio/Kconfig   |  11 +++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  35 +
 7 files changed, 350 insertions(+)
 create mode 100644 drivers/nvdimm/nd_virtio.c
 create mode 100644 drivers/nvdimm/virtio_pmem.c
 create mode 100644 drivers/nvdimm/virtio_pmem.h
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 6f2a088afad6..cefe233e0b52 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_ND_BTT) += nd_btt.o
 obj-$(CONFIG_ND_BLK) += nd_blk.o
 obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
 obj-$(CONFIG_OF_PMEM) += of_pmem.o
+obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
 
 nd_pmem-y := pmem.o
 
diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
new file mode 100644
index ..8645275c08c2
--- /dev/null
+++ b/drivers/nvdimm/nd_virtio.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include "virtio_pmem.h"
+#include "nd.h"
+
+ /* The interrupt handler */
+void virtio_pmem_host_ack(struct virtqueue *vq)
+{
+   struct virtio_pmem *vpmem = vq->vdev->priv;
+   struct virtio_pmem_request *req_data, *req_buf;
+   unsigned long flags;
+   unsigned int len;
+
+   spin_lock_irqsave(>pmem_lock, flags);
+   while ((req_data = virtqueue_get_buf(vq, )) != NULL) {
+   req_data->done = true;
+   wake_up(_data->host_acked);
+
+   if (!list_empty(>req_list)) {
+   req_buf = list_first_entry(>req_list,
+   struct virtio_pmem_request, list);
+   req_buf->wq_buf_avail = true;
+   wake_up(_buf->wq_buf);
+   list_del(_buf->list);
+   }
+   }
+   spin_unlock_irqrestore(>pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(virtio_pmem_host_ack);
+
+ /* The request submission function */
+static int virtio_pmem_flush(struct nd_region *nd_region)
+{
+   struct virtio_device *vdev = nd_region->provider_data;
+   struct virtio_pmem *vpmem  = vdev->priv;
+   struct virtio_pmem_request *req_data;
+   struct scatterlist *sgs[2], sg, ret;
+   unsigned long flags;
+   int err, err1;
+
+   might_sleep();
+   req_data = kmalloc(sizeof(*req_data), GFP_KERNEL);
+   if (!req_data)
+   return -ENOMEM;
+
+   req_data->done = false;
+   init_waitqueue_head(_data->host_acked);
+   init_waitqueue_head(_data->wq_buf);
+   INIT_LIST_HEAD(_data->list);
+   req_data->req.type = cpu_to_virtio32(vdev, VIRTIO_PMEM_REQ_TYPE_FLUSH);
+   sg_init_one(, _data->req, sizeof(req_data->req));
+   sgs[0] = 
+   sg_init_one(, _data->resp.ret, sizeof(req_data->resp));
+   sgs[1] = 
+
+   spin_lock_irqsave(>pmem_lock, flags);
+/*
+ * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual
+ * queue does not have free descriptor. We add the request
+ * to req_list and wait for host_ack to wake us up when free
+ * slots are available.
+ */
+   while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req_data,
+   GFP_ATOMIC)) == -ENOSPC) {
+
+   dev_info(>dev, "failed to send command to virtio pmem 
device, no free slots in the virtqueue\n");
+   req_data->wq_buf_avail = false;
+   list_add_tail(_data->list, >req_list);
+   spin_unlock_irqrestore(>pmem_lock, flags);
+
+   /* A host response results in "host_ack" getting called */
+   wait_event(req_data->wq_buf, re

[PATCH v13 1/7] libnvdimm: nd_region flush callback support

2019-06-12 Thread Pankaj Gupta
This patch adds functionality to perform flush from guest
to host over VIRTIO. We are registering a callback based
on 'nd_region' type. virtio_pmem driver requires this special
flush function. For rest of the region types we are registering
existing flush function. Report error returned by host fsync
failure to userspace.

Signed-off-by: Pankaj Gupta 
---
 drivers/acpi/nfit/core.c |  4 ++--
 drivers/nvdimm/claim.c   |  6 --
 drivers/nvdimm/nd.h  |  1 +
 drivers/nvdimm/pmem.c| 13 -
 drivers/nvdimm/region_devs.c | 26 --
 include/linux/libnvdimm.h|  9 -
 6 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index f1ed0befe303..9ddd8667153e 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2434,7 +2434,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, 
unsigned int bw,
offset = to_interleave_offset(offset, mmio);
 
writeq(cmd, mmio->addr.base + offset);
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL);
 
if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
readq(mmio->addr.base + offset);
@@ -2483,7 +2483,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk 
*nfit_blk,
}
 
if (rw)
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL);
 
rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf469c7..13510bae1e6f 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
struct nd_namespace_io *nsio = to_nd_namespace_io(>dev);
unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
sector_t sector = offset >> 9;
-   int rc = 0;
+   int rc = 0, ret = 0;
 
if (unlikely(!size))
return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
}
 
memcpy_flushcache(nsio->addr + offset, buf, size);
-   nvdimm_flush(to_nd_region(ndns->dev.parent));
+   ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL);
+   if (ret)
+   rc = ret;
 
return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index a5ac3b240293..0c74d2428bd7 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -159,6 +159,7 @@ struct nd_region {
struct badblocks bb;
struct nd_interleave_set *nd_set;
struct nd_percpu_lane __percpu *lane;
+   int (*flush)(struct nd_region *nd_region, struct bio *bio);
struct nd_mapping mapping[0];
 };
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 0279eb1da3ef..c757a47183b8 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, 
struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+   int ret = 0;
blk_status_t rc = 0;
bool do_acct;
unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
struct nd_region *nd_region = to_region(pmem);
 
if (bio->bi_opf & REQ_PREFLUSH)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio);
 
do_acct = nd_iostat_start(bio, );
bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
nd_iostat_end(bio, start);
 
if (bio->bi_opf & REQ_FUA)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio);
+
+   if (ret)
+   bio->bi_status = errno_to_blk_status(ret);
 
bio_endio(bio);
return BLK_QC_T_NONE;
@@ -469,7 +473,6 @@ static int pmem_attach_disk(struct device *dev,
}
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
pmem->dax_dev = dax_dev;
-
gendev = disk_to_dev(disk);
gendev->groups = pmem_attribute_groups;
 
@@ -527,14 +530,14 @@ static int nd_pmem_remove(struct device *dev)
sysfs_put(pmem->bb_state);
pmem->bb_state = NULL;
}
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL);
 
return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c

[PATCH v13 0/7] virtio pmem driver

2019-06-12 Thread Pankaj Gupta
d MST's ack in patch 2.

Changes from PATCH v7:
 - Corrected pending request queue logic (patch 2) - Jakub Staroń
 - Used unsigned long flags for passing DAXDEV_F_SYNC (patch 3) - Dan
 - Fixed typo =>  vma 'flag' to 'vm_flag' (patch 4)
 - Added rob in patch 6 & patch 2

Changes from PATCH v6: 
 - Corrected comment format in patch 5 & patch 6. [Dave]
 - Changed variable declaration indentation in patch 6 [Darrick]
 - Add Reviewed-by tag by 'Jan Kara' in patch 4 & patch 5

Changes from PATCH v5: 
  Changes suggested in by - [Cornelia, Yuval]
- Remove assignment chaining in virtio driver
- Better error message and remove not required free
- Check nd_region before use

  Changes suggested by - [Jan Kara]
- dax_synchronous() for !CONFIG_DAX
- Correct 'daxdev_mapping_supported' comment and non-dax implementation

  Changes suggested by - [Dan Williams]
- Pass meaningful flag 'DAXDEV_F_SYNC' to alloc_dax
- Gate nvdimm_flush instead of additional async parameter
- Move block chaining logic to flush callback than common nvdimm_flush
- Use NULL flush callback for generic flush for better readability [Dan, Jan]

- Use virtio device id 27 from 25(already used) - [MST]

Changes from PATCH v4:
- Factor out MAP_SYNC supported functionality to a common helper
[Dave, Darrick, Jan]
- Comment, indentation and virtqueue_kick failure handle - Yuval Shaia

Changes from PATCH v3: 
- Use generic dax_synchronous() helper to check for DAXDEV_SYNC 
  flag - [Dan, Darrick, Jan]
- Add 'is_nvdimm_async' function
- Document page cache side channel attacks implications & 
  countermeasures - [Dave Chinner, Michael]

Changes from PATCH v2: 
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] 
- Use name 'virtio pmem' in place of 'fake dax' 

Changes from PATCH v1: 
- 0-day build test for build dependency on libnvdimm 

 Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem  
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Remove BSD license text with SPDX license text

- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow

Pankaj Gupta (7):
   libnvdimm: nd_region flush callback support
   virtio-pmem: Add virtio-pmem guest driver
   libnvdimm: add nd_region buffered dax_dev flag
   dax: check synchronous mapping is supported
   dm: dm: Enable synchronous dax
   ext4: disable map_sync for virtio pmem
   xfs: disable map_sync for virtio pmem

[1] https://lkml.org/lkml/2019/6/11/831
[2] https://lkml.org/lkml/2019/6/10/209
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html  
[5] https://marc.info/?l=qemu-devel=155860751202202=2
[6] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html
[7] https://lkml.org/lkml/2019/1/9/1191

 drivers/acpi/nfit/core.c |4 -
 drivers/dax/bus.c|2 
 drivers/dax/super.c  |   19 +
 drivers/md/dm-table.c|   24 +--
 drivers/md/dm.c  |5 -
 drivers/md/dm.h  |5 +
 drivers/nvdimm/Makefile  |1 
 drivers/nvdimm/claim.c   |6 +
 drivers/nvdimm/nd.h  |1 
 drivers/nvdimm/nd_virtio.c   |  125 +++
 drivers/nvdimm/pmem.c|   18 +++--
 drivers/nvdimm/region_devs.c |   33 +-
 drivers/nvdimm/virtio_pmem.c |  122 ++
 drivers/nvdimm/virtio_pmem.h |   55 +
 drivers/virtio/Kconfig   |   11 +++
 fs/ext4/file.c   |   10 +--
 fs/xfs/xfs_file.c|9 +-
 include/linux/dax.h  |   26 +++-
 include/linux/libnvdimm.h|   10 ++-
 include/uapi/linux/virtio_ids.h  |1 
 include/uapi/linux/virtio_pmem.h |   35 ++
 21 files changed, 489 insertions(+), 33 deletions(-)
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [Qemu-devel] [PATCH v12 2/7] virtio-pmem: Add virtio pmem driver

2019-06-12 Thread Pankaj Gupta


> 
> Hi Pankaj,
> 
> On Tue, 11 Jun 2019 23:34:50 -0400 (EDT)
> Pankaj Gupta  wrote:
> 
> > Hi Cornelia,
> > 
> > > On Tue, 11 Jun 2019 22:07:57 +0530
> > > Pankaj Gupta  wrote:
> 
> 
> > > > +   err1 = virtqueue_kick(vpmem->req_vq);
> > > > +   spin_unlock_irqrestore(>pmem_lock, flags);
> > > > +   /*
> > > > +* virtqueue_add_sgs failed with error different than -ENOSPC, 
> > > > we
> > > > can't
> > > > +* do anything about that.
> > > > +*/
> > > 
> > > Does it make sense to kick if you couldn't add at all?
> > 
> > When we could not add because of -ENOSPC we are waiting and when buffer is
> > added
> > then only we do a kick. For any other error which might be a rare
> > occurrence, I think
> > kick is harmless here and keeps the code clean?
> 
> Yes, I agree it does not hurt. Let's keep it as-is.

Sure.

> 
> 
> > Sure, Thank you. Attaching below on top changes on current patch2 based on
> > your suggestions. Let me know if these are okay and then will send official
> > v13 to for upstream merging.
> 
> Looks good to me, except for one change.

Sure. Will send v13 shortly.

> 
> [Again sorry for the late review, did not want to get the version
> numbers up :)]

Thank you :)

> 
> > 
> > Thanks,
> > Pankaj
> > 
> > ===
> > 
> > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> > index efc535723517..5b8d2367da0b 100644
> > --- a/drivers/nvdimm/nd_virtio.c
> > +++ b/drivers/nvdimm/nd_virtio.c
> > @@ -10,7 +10,7 @@
> >  #include "nd.h"
> >  
> >   /* The interrupt handler */
> > -void host_ack(struct virtqueue *vq)
> > +void virtio_pmem_host_ack(struct virtqueue *vq)
> >  {
> > struct virtio_pmem *vpmem = vq->vdev->priv;
> > struct virtio_pmem_request *req_data, *req_buf;
> > @@ -32,10 +32,10 @@ void host_ack(struct virtqueue *vq)
> > }
> > spin_unlock_irqrestore(>pmem_lock, flags);
> >  }
> > -EXPORT_SYMBOL_GPL(host_ack);
> > +EXPORT_SYMBOL_GPL(virtio_pmem_host_ack);
> >  
> >   /* The request submission function */
> > -int virtio_pmem_flush(struct nd_region *nd_region)
> > +static int virtio_pmem_flush(struct nd_region *nd_region)
> >  {
> > struct virtio_device *vdev = nd_region->provider_data;
> > struct virtio_pmem *vpmem  = vdev->priv;
> > @@ -69,7 +69,7 @@ int virtio_pmem_flush(struct nd_region *nd_region)
> > while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req_data,
> > GFP_ATOMIC)) == -ENOSPC) {
> >  
> > -   dev_err(>dev, "failed to send command to virtio pmem
> > device, no free slots in the virtqueue\n");
> > +   dev_info(>dev, "failed to send command to virtio pmem
> > device, no free slots in the virtqueue\n");
> > req_data->wq_buf_avail = false;
> > list_add_tail(_data->list, >req_list);
> > spin_unlock_irqrestore(>pmem_lock, flags);
> > @@ -90,7 +90,8 @@ int virtio_pmem_flush(struct nd_region *nd_region)
> > } else {
> > /* A host repsonse results in "host_ack" getting called */
> > wait_event(req_data->host_acked, req_data->done);
> > -   err = virtio32_to_cpu(vdev, req_data->resp.ret);
> > +   if ((err = virtio32_to_cpu(vdev, req_data->resp.ret)))
> > +   err = -EIO;
> 
> Hm, why are you making this change? I think the previous code was fine.

Yes, Something came to my mind while making the change but I agree will keep
this as it was before.

> 
> > }
> >  
> > kfree(req_data);
> > @@ -100,7 +101,8 @@ int virtio_pmem_flush(struct nd_region *nd_region)
> >  /* The asynchronous flush callback function */
> >  int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> >  {
> > -   /* Create child bio for asynchronous flush and chain with
> > +   /*
> > +* Create child bio for asynchronous flush and chain with
> >  * parent bio. Otherwise directly call nd_region flush.
> >  */
> > if (bio && bio->bi_iter.bi_sector != -1) {
> > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > index b60ebd8cd2fd..5e3d07b47e0c 100644
> > --- a

Re: [PATCH v12 2/7] virtio-pmem: Add virtio pmem driver

2019-06-11 Thread Pankaj Gupta


Hi Cornelia,

> On Tue, 11 Jun 2019 22:07:57 +0530
> Pankaj Gupta  wrote:
> 
> > This patch adds virtio-pmem driver for KVM guest.
> > 
> > Guest reads the persistent memory range information from
> > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > creates a nd_region object with the persistent memory
> > range information so that existing 'nvdimm/pmem' driver
> > can reserve this into system memory map. This way
> > 'virtio-pmem' driver uses existing functionality of pmem
> > driver to register persistent memory compatible for DAX
> > capable filesystems.
> > 
> > This also provides function to perform guest flush over
> > VIRTIO from 'pmem' driver when userspace performs flush
> > on DAX memory range.
> > 
> > Signed-off-by: Pankaj Gupta 
> > Reviewed-by: Yuval Shaia 
> > Acked-by: Michael S. Tsirkin 
> > Acked-by: Jakub Staron 
> > Tested-by: Jakub Staron 
> > ---
> >  drivers/nvdimm/Makefile  |   1 +
> >  drivers/nvdimm/nd_virtio.c   | 124 +++
> >  drivers/nvdimm/virtio_pmem.c | 122 ++
> >  drivers/nvdimm/virtio_pmem.h |  55 ++
> >  drivers/virtio/Kconfig   |  11 +++
> >  include/uapi/linux/virtio_ids.h  |   1 +
> >  include/uapi/linux/virtio_pmem.h |  35 +
> >  7 files changed, 349 insertions(+)
> >  create mode 100644 drivers/nvdimm/nd_virtio.c
> >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> >  create mode 100644 drivers/nvdimm/virtio_pmem.h
> >  create mode 100644 include/uapi/linux/virtio_pmem.h
> 
> Sorry about being late to the party; this one has been sitting in my
> 'to review' queue for far too long :(
> 
> (...)
> 
> > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> > new file mode 100644
> > index ..efc535723517
> > --- /dev/null
> > +++ b/drivers/nvdimm/nd_virtio.c
> > @@ -0,0 +1,124 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + */
> > +#include "virtio_pmem.h"
> > +#include "nd.h"
> > +
> > + /* The interrupt handler */
> > +void host_ack(struct virtqueue *vq)
> > +{
> > +   struct virtio_pmem *vpmem = vq->vdev->priv;
> > +   struct virtio_pmem_request *req_data, *req_buf;
> > +   unsigned long flags;
> > +   unsigned int len;
> > +
> > +   spin_lock_irqsave(>pmem_lock, flags);
> > +   while ((req_data = virtqueue_get_buf(vq, )) != NULL) {
> > +   req_data->done = true;
> > +   wake_up(_data->host_acked);
> > +
> > +   if (!list_empty(>req_list)) {
> > +   req_buf = list_first_entry(>req_list,
> > +   struct virtio_pmem_request, list);
> > +   req_buf->wq_buf_avail = true;
> > +   wake_up(_buf->wq_buf);
> > +   list_del(_buf->list);
> > +   }
> > +   }
> > +   spin_unlock_irqrestore(>pmem_lock, flags);
> > +}
> > +EXPORT_SYMBOL_GPL(host_ack);
> 
> Nit: 'host_ack' looks a bit generic for an exported function... would
> 'virtio_pmem_host_ack' maybe be better?

Yes, this looks better. Changed.

> 
> > +
> > + /* The request submission function */
> > +int virtio_pmem_flush(struct nd_region *nd_region)
> 
> I don't see an EXPORT_SYMBOL_GPL() for this function... should it get
> one, or should it be made static?

Made static. Leftover from last refactor of 'asyc_pmem_flush'.

> 
> > +{
> > +   struct virtio_device *vdev = nd_region->provider_data;
> > +   struct virtio_pmem *vpmem  = vdev->priv;
> > +   struct virtio_pmem_request *req_data;
> > +   struct scatterlist *sgs[2], sg, ret;
> > +   unsigned long flags;
> > +   int err, err1;
> > +
> > +   might_sleep();
> > +   req_data = kmalloc(sizeof(*req_data), GFP_KERNEL);
> > +   if (!req_data)
> > +   return -ENOMEM;
> > +
> > +   req_data->done = false;
> > +   init_waitqueue_head(_data->host_acked);
> > +   init_waitqueue_head(_data->wq_buf);
> > +   INIT_LIST_HEAD(_data->list);
> > +   req_data->req.type = cpu_to_virtio32(vdev, VIRTIO_PMEM_REQ_TYPE_FLUSH);
> > +   sg_init_one(, _data->req, sizeof(req_data->req));
> > +   sgs[0] = 
> >

Re: [Qemu-devel] [PATCH v12 4/7] dm: enable synchronous dax

2019-06-11 Thread Pankaj Gupta


> On Tue, Jun 11 2019 at 12:37pm -0400,
> Pankaj Gupta  wrote:
> 
> > This patch sets dax device 'DAXDEV_SYNC' flag if all the target
> > devices of device mapper support synchrononous DAX. If device
> > mapper consists of both synchronous and asynchronous dax devices,
> > we don't set 'DAXDEV_SYNC' flag.
> > 
> > 'dm_table_supports_dax' is refactored to pass 'iterate_devices_fn'
> > as argument so that the callers can pass the appropriate functions.
> > 
> > Suggested-by: Mike Snitzer 
> > Signed-off-by: Pankaj Gupta 
> 
> Thanks, and for the benefit of others, passing function pointers like
> this is perfectly fine IMHO because this code is _not_ in the fast
> path.  These methods are only for device creation.
> 
> Reviewed-by: Mike Snitzer 

Thank you, Mike

Best regards,
Pankaj

> 
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v12 7/7] xfs: disable map_sync for async flush

2019-06-11 Thread Pankaj Gupta
Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and xfs.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Darrick J. Wong 
---
 fs/xfs/xfs_file.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a7ceae90110e..f17652cca5ff 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1203,11 +1203,14 @@ xfs_file_mmap(
struct file *filp,
struct vm_area_struct *vma)
 {
+   struct dax_device   *dax_dev;
+
+   dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
/*
-* We don't support synchronous mappings for non-DAX files. At least
-* until someone comes with a sensible use case.
+* We don't support synchronous mappings for non-DAX files and
+* for DAX files if underneath dax_device is not synchronous.
 */
-   if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
+   if (!daxdev_mapping_supported(vma, dax_dev))
return -EOPNOTSUPP;
 
file_accessed(filp);
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v12 6/7] ext4: disable map_sync for async flush

2019-06-11 Thread Pankaj Gupta
Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and ext4.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Jan Kara 
---
 fs/ext4/file.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 98ec11f69cd4..dee549339e13 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,15 +360,17 @@ static const struct vm_operations_struct ext4_file_vm_ops 
= {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
struct inode *inode = file->f_mapping->host;
+   struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+   struct dax_device *dax_dev = sbi->s_daxdev;
 
-   if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb
+   if (unlikely(ext4_forced_shutdown(sbi)))
return -EIO;
 
/*
-* We don't support synchronous mappings for non-DAX files. At least
-* until someone comes with a sensible use case.
+* We don't support synchronous mappings for non-DAX files and
+* for DAX files if underneath dax_device is not synchronous.
 */
-   if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
+   if (!daxdev_mapping_supported(vma, dax_dev))
return -EOPNOTSUPP;
 
file_accessed(file);
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v12 5/7] dax: check synchronous mapping is supported

2019-06-11 Thread Pankaj Gupta
This patch introduces 'daxdev_mapping_supported' helper
which checks if 'MAP_SYNC' is supported with filesystem
mapping. It also checks if corresponding dax_device is
synchronous. Virtio pmem device is asynchronous and
does not not support VM_SYNC.

Suggested-by: Jan Kara 
Signed-off-by: Pankaj Gupta 
Reviewed-by: Jan Kara 
---
 include/linux/dax.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index 2b106752b1b8..267251a394fa 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -42,6 +42,18 @@ void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
 bool dax_synchronous(struct dax_device *dax_dev);
 void set_dax_synchronous(struct dax_device *dax_dev);
+/*
+ * Check if given mapping is supported by the file / underlying device.
+ */
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+   struct dax_device *dax_dev)
+{
+   if (!(vma->vm_flags & VM_SYNC))
+   return true;
+   if (!IS_DAX(file_inode(vma->vm_file)))
+   return false;
+   return dax_synchronous(dax_dev);
+}
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
@@ -69,6 +81,11 @@ static inline bool dax_write_cache_enabled(struct dax_device 
*dax_dev)
 {
return false;
 }
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+   struct dax_device *dax_dev)
+{
+   return !(vma->vm_flags & VM_SYNC);
+}
 #endif
 
 struct writeback_control;
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v12 4/7] dm: enable synchronous dax

2019-06-11 Thread Pankaj Gupta
This patch sets dax device 'DAXDEV_SYNC' flag if all the target
devices of device mapper support synchrononous DAX. If device
mapper consists of both synchronous and asynchronous dax devices,
we don't set 'DAXDEV_SYNC' flag.

'dm_table_supports_dax' is refactored to pass 'iterate_devices_fn'
as argument so that the callers can pass the appropriate functions.

Suggested-by: Mike Snitzer 
Signed-off-by: Pankaj Gupta 
---
 drivers/md/dm-table.c | 24 ++--
 drivers/md/dm.c   |  2 +-
 drivers/md/dm.h   |  5 -
 3 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 350cf0451456..81c55304c4fa 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -881,7 +881,7 @@ void dm_table_set_type(struct dm_table *t, enum 
dm_queue_mode type)
 EXPORT_SYMBOL_GPL(dm_table_set_type);
 
 /* validate the dax capability of the target device span */
-static int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
+int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
   sector_t start, sector_t len, void *data)
 {
int blocksize = *(int *) data;
@@ -890,7 +890,15 @@ static int device_supports_dax(struct dm_target *ti, 
struct dm_dev *dev,
start, len);
 }
 
-bool dm_table_supports_dax(struct dm_table *t, int blocksize)
+/* Check devices support synchronous DAX */
+static int device_synchronous(struct dm_target *ti, struct dm_dev *dev,
+  sector_t start, sector_t len, void *data)
+{
+   return dax_synchronous(dev->dax_dev);
+}
+
+bool dm_table_supports_dax(struct dm_table *t,
+ iterate_devices_callout_fn iterate_fn, int *blocksize)
 {
struct dm_target *ti;
unsigned i;
@@ -903,8 +911,7 @@ bool dm_table_supports_dax(struct dm_table *t, int 
blocksize)
return false;
 
if (!ti->type->iterate_devices ||
-   !ti->type->iterate_devices(ti, device_supports_dax,
-   ))
+   !ti->type->iterate_devices(ti, iterate_fn, blocksize))
return false;
}
 
@@ -940,6 +947,7 @@ static int dm_table_determine_type(struct dm_table *t)
struct dm_target *tgt;
struct list_head *devices = dm_table_get_devices(t);
enum dm_queue_mode live_md_type = dm_get_md_type(t->md);
+   int page_size = PAGE_SIZE;
 
if (t->type != DM_TYPE_NONE) {
/* target already set the table's type */
@@ -984,7 +992,7 @@ static int dm_table_determine_type(struct dm_table *t)
 verify_bio_based:
/* We must use this table as bio-based */
t->type = DM_TYPE_BIO_BASED;
-   if (dm_table_supports_dax(t, PAGE_SIZE) ||
+   if (dm_table_supports_dax(t, device_supports_dax, _size) ||
(list_empty(devices) && live_md_type == 
DM_TYPE_DAX_BIO_BASED)) {
t->type = DM_TYPE_DAX_BIO_BASED;
} else {
@@ -1883,6 +1891,7 @@ void dm_table_set_restrictions(struct dm_table *t, struct 
request_queue *q,
   struct queue_limits *limits)
 {
bool wc = false, fua = false;
+   int page_size = PAGE_SIZE;
 
/*
 * Copy table's limits to the DM device's request_queue
@@ -1910,8 +1919,11 @@ void dm_table_set_restrictions(struct dm_table *t, 
struct request_queue *q,
}
blk_queue_write_cache(q, wc, fua);
 
-   if (dm_table_supports_dax(t, PAGE_SIZE))
+   if (dm_table_supports_dax(t, device_supports_dax, _size)) {
blk_queue_flag_set(QUEUE_FLAG_DAX, q);
+   if (dm_table_supports_dax(t, device_synchronous, NULL))
+   set_dax_synchronous(t->md->dax_dev);
+   }
else
blk_queue_flag_clear(QUEUE_FLAG_DAX, q);
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b1caa7188209..b92c42a72ad4 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1119,7 +1119,7 @@ static bool dm_dax_supported(struct dax_device *dax_dev, 
struct block_device *bd
if (!map)
return false;
 
-   ret = dm_table_supports_dax(map, blocksize);
+   ret = dm_table_supports_dax(map, device_supports_dax, );
 
dm_put_live_table(md, srcu_idx);
 
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 17e3db54404c..0475673337f3 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -72,7 +72,10 @@ bool dm_table_bio_based(struct dm_table *t);
 bool dm_table_request_based(struct dm_table *t);
 void dm_table_free_md_mempools(struct dm_table *t);
 struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
-bool dm_table_supports_dax(struct dm_table *t, int blocksize);
+bool dm_table_supports_dax(struct dm_table *t, iterate_devices_callout_fn fn,
+  i

[PATCH v12 3/7] libnvdimm: add dax_dev sync flag

2019-06-11 Thread Pankaj Gupta
This patch adds 'DAXDEV_SYNC' flag which is set
for nd_region doing synchronous flush. This later
is used to disable MAP_SYNC functionality for
ext4 & xfs filesystem for devices don't support
synchronous flush.

Signed-off-by: Pankaj Gupta 
---
 drivers/dax/bus.c|  2 +-
 drivers/dax/super.c  | 19 ++-
 drivers/md/dm.c  |  3 ++-
 drivers/nvdimm/pmem.c|  5 -
 drivers/nvdimm/region_devs.c |  7 +++
 include/linux/dax.h  |  9 +++--
 include/linux/libnvdimm.h|  1 +
 7 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 2109cfe80219..5f184e751c82 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -388,7 +388,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region 
*dax_region, int id,
 * No 'host' or dax_operations since there is no access to this
 * device outside of mmap of the resulting character device.
 */
-   dax_dev = alloc_dax(dev_dax, NULL, NULL);
+   dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
if (!dax_dev)
goto err;
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index bbd57ca0634a..93b3718b5cfa 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -186,6 +186,8 @@ enum dax_device_flags {
DAXDEV_ALIVE,
/* gate whether dax_flush() calls the low level flush routine */
DAXDEV_WRITE_CACHE,
+   /* flag to check if device supports synchronous flush */
+   DAXDEV_SYNC,
 };
 
 /**
@@ -354,6 +356,18 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+bool dax_synchronous(struct dax_device *dax_dev)
+{
+   return test_bit(DAXDEV_SYNC, _dev->flags);
+}
+EXPORT_SYMBOL_GPL(dax_synchronous);
+
+void set_dax_synchronous(struct dax_device *dax_dev)
+{
+   set_bit(DAXDEV_SYNC, _dev->flags);
+}
+EXPORT_SYMBOL_GPL(set_dax_synchronous);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
lockdep_assert_held(_srcu);
@@ -508,7 +522,7 @@ static void dax_add_host(struct dax_device *dax_dev, const 
char *host)
 }
 
 struct dax_device *alloc_dax(void *private, const char *__host,
-   const struct dax_operations *ops)
+   const struct dax_operations *ops, unsigned long flags)
 {
struct dax_device *dax_dev;
const char *host;
@@ -531,6 +545,9 @@ struct dax_device *alloc_dax(void *private, const char 
*__host,
dax_add_host(dax_dev, host);
dax_dev->ops = ops;
dax_dev->private = private;
+   if (flags & DAXDEV_F_SYNC)
+   set_dax_synchronous(dax_dev);
+
return dax_dev;
 
  err_dev:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 1fb1333fefec..7eee7ddc73a8 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1970,7 +1970,8 @@ static struct mapped_device *alloc_dev(int minor)
sprintf(md->disk->disk_name, "dm-%d", minor);
 
if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
-   md->dax_dev = alloc_dax(md, md->disk->disk_name, _dax_ops);
+   md->dax_dev = alloc_dax(md, md->disk->disk_name,
+   _dax_ops, 0);
if (!md->dax_dev)
goto bad;
}
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 0279eb1da3ef..bdbd2b414d3d 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -365,6 +365,7 @@ static int pmem_attach_disk(struct device *dev,
struct gendisk *disk;
void *addr;
int rc;
+   unsigned long flags = 0UL;
 
pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
if (!pmem)
@@ -462,7 +463,9 @@ static int pmem_attach_disk(struct device *dev,
nvdimm_badblocks_populate(nd_region, >bb, _res);
disk->bb = >bb;
 
-   dax_dev = alloc_dax(pmem, disk->disk_name, _dax_ops);
+   if (is_nvdimm_sync(nd_region))
+   flags = DAXDEV_F_SYNC;
+   dax_dev = alloc_dax(pmem, disk->disk_name, _dax_ops, flags);
if (!dax_dev) {
put_disk(disk);
return -ENOMEM;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b4ef7d9ff22e..f3ea5369d20a 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1197,6 +1197,13 @@ int nvdimm_has_cache(struct nd_region *nd_region)
 }
 EXPORT_SYMBOL_GPL(nvdimm_has_cache);
 
+bool is_nvdimm_sync(struct nd_region *nd_region)
+{
+   return is_nd_pmem(_region->dev) &&
+   !test_bit(ND_REGION_ASYNC, _region->flags);
+}
+EXPORT_SYMBOL_GPL(is_nvdimm_sync);
+
 struct conflict_context {
struct nd_region *nd_region;
resource_size_t start, size;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0dd316a74a29..2b106752b1b8 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -7,6 +7,9 @@

[PATCH v12 1/7] libnvdimm: nd_region flush callback support

2019-06-11 Thread Pankaj Gupta
This patch adds functionality to perform flush from guest
to host over VIRTIO. We are registering a callback based
on 'nd_region' type. virtio_pmem driver requires this special
flush function. For rest of the region types we are registering
existing flush function. Report error returned by host fsync
failure to userspace.

Signed-off-by: Pankaj Gupta 
---
 drivers/acpi/nfit/core.c |  4 ++--
 drivers/nvdimm/claim.c   |  6 --
 drivers/nvdimm/nd.h  |  1 +
 drivers/nvdimm/pmem.c| 13 -
 drivers/nvdimm/region_devs.c | 26 --
 include/linux/libnvdimm.h|  9 -
 6 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index f1ed0befe303..9ddd8667153e 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2434,7 +2434,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, 
unsigned int bw,
offset = to_interleave_offset(offset, mmio);
 
writeq(cmd, mmio->addr.base + offset);
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL);
 
if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
readq(mmio->addr.base + offset);
@@ -2483,7 +2483,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk 
*nfit_blk,
}
 
if (rw)
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL);
 
rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf469c7..13510bae1e6f 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
struct nd_namespace_io *nsio = to_nd_namespace_io(>dev);
unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
sector_t sector = offset >> 9;
-   int rc = 0;
+   int rc = 0, ret = 0;
 
if (unlikely(!size))
return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
}
 
memcpy_flushcache(nsio->addr + offset, buf, size);
-   nvdimm_flush(to_nd_region(ndns->dev.parent));
+   ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL);
+   if (ret)
+   rc = ret;
 
return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index a5ac3b240293..0c74d2428bd7 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -159,6 +159,7 @@ struct nd_region {
struct badblocks bb;
struct nd_interleave_set *nd_set;
struct nd_percpu_lane __percpu *lane;
+   int (*flush)(struct nd_region *nd_region, struct bio *bio);
struct nd_mapping mapping[0];
 };
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 0279eb1da3ef..c757a47183b8 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, 
struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+   int ret = 0;
blk_status_t rc = 0;
bool do_acct;
unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
struct nd_region *nd_region = to_region(pmem);
 
if (bio->bi_opf & REQ_PREFLUSH)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio);
 
do_acct = nd_iostat_start(bio, );
bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
nd_iostat_end(bio, start);
 
if (bio->bi_opf & REQ_FUA)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio);
+
+   if (ret)
+   bio->bi_status = errno_to_blk_status(ret);
 
bio_endio(bio);
return BLK_QC_T_NONE;
@@ -469,7 +473,6 @@ static int pmem_attach_disk(struct device *dev,
}
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
pmem->dax_dev = dax_dev;
-
gendev = disk_to_dev(disk);
gendev->groups = pmem_attribute_groups;
 
@@ -527,14 +530,14 @@ static int nd_pmem_remove(struct device *dev)
sysfs_put(pmem->bb_state);
pmem->bb_state = NULL;
}
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL);
 
return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c

[PATCH v12 2/7] virtio-pmem: Add virtio pmem driver

2019-06-11 Thread Pankaj Gupta
This patch adds virtio-pmem driver for KVM guest.

Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.

This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Yuval Shaia 
Acked-by: Michael S. Tsirkin 
Acked-by: Jakub Staron 
Tested-by: Jakub Staron 
---
 drivers/nvdimm/Makefile  |   1 +
 drivers/nvdimm/nd_virtio.c   | 124 +++
 drivers/nvdimm/virtio_pmem.c | 122 ++
 drivers/nvdimm/virtio_pmem.h |  55 ++
 drivers/virtio/Kconfig   |  11 +++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  35 +
 7 files changed, 349 insertions(+)
 create mode 100644 drivers/nvdimm/nd_virtio.c
 create mode 100644 drivers/nvdimm/virtio_pmem.c
 create mode 100644 drivers/nvdimm/virtio_pmem.h
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 6f2a088afad6..cefe233e0b52 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_ND_BTT) += nd_btt.o
 obj-$(CONFIG_ND_BLK) += nd_blk.o
 obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
 obj-$(CONFIG_OF_PMEM) += of_pmem.o
+obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
 
 nd_pmem-y := pmem.o
 
diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
new file mode 100644
index ..efc535723517
--- /dev/null
+++ b/drivers/nvdimm/nd_virtio.c
@@ -0,0 +1,124 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include "virtio_pmem.h"
+#include "nd.h"
+
+ /* The interrupt handler */
+void host_ack(struct virtqueue *vq)
+{
+   struct virtio_pmem *vpmem = vq->vdev->priv;
+   struct virtio_pmem_request *req_data, *req_buf;
+   unsigned long flags;
+   unsigned int len;
+
+   spin_lock_irqsave(>pmem_lock, flags);
+   while ((req_data = virtqueue_get_buf(vq, )) != NULL) {
+   req_data->done = true;
+   wake_up(_data->host_acked);
+
+   if (!list_empty(>req_list)) {
+   req_buf = list_first_entry(>req_list,
+   struct virtio_pmem_request, list);
+   req_buf->wq_buf_avail = true;
+   wake_up(_buf->wq_buf);
+   list_del(_buf->list);
+   }
+   }
+   spin_unlock_irqrestore(>pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(host_ack);
+
+ /* The request submission function */
+int virtio_pmem_flush(struct nd_region *nd_region)
+{
+   struct virtio_device *vdev = nd_region->provider_data;
+   struct virtio_pmem *vpmem  = vdev->priv;
+   struct virtio_pmem_request *req_data;
+   struct scatterlist *sgs[2], sg, ret;
+   unsigned long flags;
+   int err, err1;
+
+   might_sleep();
+   req_data = kmalloc(sizeof(*req_data), GFP_KERNEL);
+   if (!req_data)
+   return -ENOMEM;
+
+   req_data->done = false;
+   init_waitqueue_head(_data->host_acked);
+   init_waitqueue_head(_data->wq_buf);
+   INIT_LIST_HEAD(_data->list);
+   req_data->req.type = cpu_to_virtio32(vdev, VIRTIO_PMEM_REQ_TYPE_FLUSH);
+   sg_init_one(, _data->req, sizeof(req_data->req));
+   sgs[0] = 
+   sg_init_one(, _data->resp.ret, sizeof(req_data->resp));
+   sgs[1] = 
+
+   spin_lock_irqsave(>pmem_lock, flags);
+/*
+ * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual
+ * queue does not have free descriptor. We add the request
+ * to req_list and wait for host_ack to wake us up when free
+ * slots are available.
+ */
+   while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req_data,
+   GFP_ATOMIC)) == -ENOSPC) {
+
+   dev_err(>dev, "failed to send command to virtio pmem 
device, no free slots in the virtqueue\n");
+   req_data->wq_buf_avail = false;
+   list_add_tail(_data->list, >req_list);
+   spin_unlock_irqrestore(>pmem_lock, flags);
+
+   /* A host response results in "host_ack" getting called */
+   wait_event(req_data->wq_buf, req_data->wq_buf_avail);
+   spin_lock

[PATCH v12 0/7] virtio pmem driver

2019-06-11 Thread Pankaj Gupta
TCH v7:
 - Corrected pending request queue logic (patch 2) - Jakub Staroń
 - Used unsigned long flags for passing DAXDEV_F_SYNC (patch 3) - Dan
 - Fixed typo =>  vma 'flag' to 'vm_flag' (patch 4)
 - Added rob in patch 6 & patch 2

Changes from PATCH v6: 
 - Corrected comment format in patch 5 & patch 6. [Dave]
 - Changed variable declaration indentation in patch 6 [Darrick]
 - Add Reviewed-by tag by 'Jan Kara' in patch 4 & patch 5

Changes from PATCH v5: 
  Changes suggested in by - [Cornelia, Yuval]
- Remove assignment chaining in virtio driver
- Better error message and remove not required free
- Check nd_region before use

  Changes suggested by - [Jan Kara]
- dax_synchronous() for !CONFIG_DAX
- Correct 'daxdev_mapping_supported' comment and non-dax implementation

  Changes suggested by - [Dan Williams]
- Pass meaningful flag 'DAXDEV_F_SYNC' to alloc_dax
- Gate nvdimm_flush instead of additional async parameter
- Move block chaining logic to flush callback than common nvdimm_flush
- Use NULL flush callback for generic flush for better readability [Dan, Jan]

- Use virtio device id 27 from 25(already used) - [MST]

Changes from PATCH v4:
- Factor out MAP_SYNC supported functionality to a common helper
[Dave, Darrick, Jan]
- Comment, indentation and virtqueue_kick failure handle - Yuval Shaia

Changes from PATCH v3: 
- Use generic dax_synchronous() helper to check for DAXDEV_SYNC 
  flag - [Dan, Darrick, Jan]
- Add 'is_nvdimm_async' function
- Document page cache side channel attacks implications & 
  countermeasures - [Dave Chinner, Michael]

Changes from PATCH v2: 
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] 
- Use name 'virtio pmem' in place of 'fake dax' 

Changes from PATCH v1: 
- 0-day build test for build dependency on libnvdimm 

 Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem  
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Remove BSD license text with SPDX license text

- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow

Pankaj Gupta (7):
   libnvdimm: nd_region flush callback support
   virtio-pmem: Add virtio-pmem guest driver
   libnvdimm: add nd_region buffered dax_dev flag
   dax: check synchronous mapping is supported
   dm: dm: Enable synchronous dax
   ext4: disable map_sync for virtio pmem
   xfs: disable map_sync for virtio pmem

[1] https://lkml.org/lkml/2019/6/10/209
[2] https://lkml.org/lkml/2019/5/21/569
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html  
[5] https://marc.info/?l=qemu-devel=155860751202202=2
[6] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html
[7] https://lkml.org/lkml/2019/1/9/1191

 drivers/acpi/nfit/core.c |4 -
 drivers/dax/bus.c|2 
 drivers/dax/super.c  |   19 +
 drivers/md/dm-table.c|   24 +--
 drivers/md/dm.c  |5 -
 drivers/md/dm.h  |5 +
 drivers/nvdimm/Makefile  |1 
 drivers/nvdimm/claim.c   |6 +
 drivers/nvdimm/nd.h  |1 
 drivers/nvdimm/nd_virtio.c   |  124 +++
 drivers/nvdimm/pmem.c|   18 +++--
 drivers/nvdimm/region_devs.c |   33 +-
 drivers/nvdimm/virtio_pmem.c |  122 ++
 drivers/nvdimm/virtio_pmem.h |   55 +
 drivers/virtio/Kconfig   |   11 +++
 fs/ext4/file.c   |   10 +--
 fs/xfs/xfs_file.c|9 +-
 include/linux/dax.h  |   26 +++-
 include/linux/libnvdimm.h|   10 ++-
 include/uapi/linux/virtio_ids.h  |1 
 include/uapi/linux/virtio_pmem.h |   35 +++
 21 files changed, 488 insertions(+), 33 deletions(-)


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [Qemu-devel] [PATCH v11 4/7] dm: enable synchronous dax

2019-06-11 Thread Pankaj Gupta



> 
> > Hi Mike,
> > 
> > Thanks for the review Please find my reply inline.
> > 
> > > 
> > > dm_table_supports_dax() is called multiple times (from
> > > dm_table_set_restrictions and dm_table_determine_type).  It is strange
> > > to have a getter have a side-effect of being a setter too.  Overloading
> > > like this could get you in trouble in the future.
> > > 
> > > Are you certain this is what you want?
> > 
> > I agree with you.
> > 
> > > 
> > > Or would it be better to refactor dm_table_supports_dax() to take an
> > > iterate_devices_fn arg and have callers pass the appropriate function?
> > > Then have dm_table_set_restrictions() caller do:
> > > 
> > >  if (dm_table_supports_dax(t, device_synchronous, NULL))
> > >set_dax_synchronous(t->md->dax_dev);
> > > 
> > > (NULL arg implies dm_table_supports_dax() refactoring would take a int
> > > *data pointer rather than int type).
> > > 
> > > Mike
> > > 
> > 
> > I am sending below patch as per your suggestion. Does it look
> > near to what you have in mind?
> 
> Yes, it does.. just one nit I noticed inlined below.
> 
> > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > index 350cf0451456..8d89acc8b8c2 100644
> > --- a/drivers/md/dm-table.c
> > +++ b/drivers/md/dm-table.c
> 
> ...
> 
> > @@ -1910,8 +1919,13 @@ void dm_table_set_restrictions(struct dm_table *t,
> > struct request_queue *q,
> > }
> > blk_queue_write_cache(q, wc, fua);
> > 
> > -   if (dm_table_supports_dax(t, PAGE_SIZE))
> > +   if (dm_table_supports_dax(t, device_supports_dax, _size)) {
> > +
> 
> No need for an empty newline here ^

Sure. Will remove this and send official v12 patchset with the updated patch 4.

Thanks,
Pankaj

> 
> > blk_queue_flag_set(QUEUE_FLAG_DAX, q);
> > +   if (dm_table_supports_dax(t, device_synchronous, NULL))
> > +   set_dax_synchronous(t->md->dax_dev);
> > +   }
> > else
> > blk_queue_flag_clear(QUEUE_FLAG_DAX, q);
> > 
> 
> Thanks,
> Mike
> 
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Qemu-devel] [PATCH v11 4/7] dm: enable synchronous dax

2019-06-11 Thread Pankaj Gupta
Hi Mike,

Thanks for the review Please find my reply inline.

> 
> dm_table_supports_dax() is called multiple times (from
> dm_table_set_restrictions and dm_table_determine_type).  It is strange
> to have a getter have a side-effect of being a setter too.  Overloading
> like this could get you in trouble in the future.
> 
> Are you certain this is what you want?

I agree with you.

> 
> Or would it be better to refactor dm_table_supports_dax() to take an
> iterate_devices_fn arg and have callers pass the appropriate function?
> Then have dm_table_set_restrictions() caller do:
> 
>  if (dm_table_supports_dax(t, device_synchronous, NULL))
>set_dax_synchronous(t->md->dax_dev);
> 
> (NULL arg implies dm_table_supports_dax() refactoring would take a int
> *data pointer rather than int type).
> 
> Mike
> 

I am sending below patch as per your suggestion. Does it look
near to what you have in mind?

Thank you,
Pankaj

===

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 350cf0451456..8d89acc8b8c2 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -881,7 +881,7 @@ void dm_table_set_type(struct dm_table *t, enum 
dm_queue_mode type)
 EXPORT_SYMBOL_GPL(dm_table_set_type);

 /* validate the dax capability of the target device span */
-static int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
+int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
   sector_t start, sector_t len, void *data)
 {
int blocksize = *(int *) data;
@@ -890,7 +890,15 @@ static int device_supports_dax(struct dm_target *ti, 
struct dm_dev *dev,
start, len);
 }

-bool dm_table_supports_dax(struct dm_table *t, int blocksize)
+/* Check devices support synchronous DAX */
+static int device_synchronous(struct dm_target *ti, struct dm_dev *dev,
+  sector_t start, sector_t len, void *data)
+{
+   return dax_synchronous(dev->dax_dev);
+}
+
+bool dm_table_supports_dax(struct dm_table *t,
+ iterate_devices_callout_fn iterate_fn, int *blocksize)
 {
struct dm_target *ti;
unsigned i;
@@ -903,8 +911,7 @@ bool dm_table_supports_dax(struct dm_table *t, int 
blocksize)
return false;

if (!ti->type->iterate_devices ||
-   !ti->type->iterate_devices(ti, device_supports_dax,
-   ))
+   !ti->type->iterate_devices(ti, iterate_fn, blocksize))
return false;
}

@@ -940,6 +947,7 @@ static int dm_table_determine_type(struct dm_table *t)
struct dm_target *tgt;
struct list_head *devices = dm_table_get_devices(t);
enum dm_queue_mode live_md_type = dm_get_md_type(t->md);
+   int page_size = PAGE_SIZE;

if (t->type != DM_TYPE_NONE) {
/* target already set the table's type */
@@ -984,7 +992,7 @@ static int dm_table_determine_type(struct dm_table *t)
 verify_bio_based:
/* We must use this table as bio-based */
t->type = DM_TYPE_BIO_BASED;
-   if (dm_table_supports_dax(t, PAGE_SIZE) ||
+   if (dm_table_supports_dax(t, device_supports_dax, _size) ||
(list_empty(devices) && live_md_type == 
DM_TYPE_DAX_BIO_BASED)) {
t->type = DM_TYPE_DAX_BIO_BASED;
} else {
@@ -1883,6 +1891,7 @@ void dm_table_set_restrictions(struct dm_table *t, struct 
request_queue *q,
   struct queue_limits *limits)
 {
bool wc = false, fua = false;
+   int page_size = PAGE_SIZE;

/*
 * Copy table's limits to the DM device's request_queue
@@ -1910,8 +1919,13 @@ void dm_table_set_restrictions(struct dm_table *t, 
struct request_queue *q,
}
blk_queue_write_cache(q, wc, fua);

-   if (dm_table_supports_dax(t, PAGE_SIZE))
+   if (dm_table_supports_dax(t, device_supports_dax, _size)) {
+
blk_queue_flag_set(QUEUE_FLAG_DAX, q);
+   if (dm_table_supports_dax(t, device_synchronous, NULL))
+   set_dax_synchronous(t->md->dax_dev);
+   }
else
blk_queue_flag_clear(QUEUE_FLAG_DAX, q);

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b1caa7188209..b92c42a72ad4 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1119,7 +1119,7 @@ static bool dm_dax_supported(struct dax_device *dax_dev, 
struct block_device *bd
if (!map)
return false;

-   ret = dm_table_supports_dax(map, blocksize);
+   ret = dm_table_supports_dax(map, device_supports_dax, );

dm_put_live_table(md, srcu_idx);

diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 17e3db54404c..0475673337f3 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -72,7 +72,10 @@ bool dm_table_bio_based(struct dm_table *t);
 bool 

[PATCH v11 7/7] xfs: disable map_sync for async flush

2019-06-10 Thread Pankaj Gupta
Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and xfs.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Darrick J. Wong 
---
 fs/xfs/xfs_file.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a7ceae90110e..f17652cca5ff 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1203,11 +1203,14 @@ xfs_file_mmap(
struct file *filp,
struct vm_area_struct *vma)
 {
+   struct dax_device   *dax_dev;
+
+   dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
/*
-* We don't support synchronous mappings for non-DAX files. At least
-* until someone comes with a sensible use case.
+* We don't support synchronous mappings for non-DAX files and
+* for DAX files if underneath dax_device is not synchronous.
 */
-   if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
+   if (!daxdev_mapping_supported(vma, dax_dev))
return -EOPNOTSUPP;
 
file_accessed(filp);
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v11 6/7] ext4: disable map_sync for async flush

2019-06-10 Thread Pankaj Gupta
Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and ext4.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Jan Kara 
---
 fs/ext4/file.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 98ec11f69cd4..dee549339e13 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,15 +360,17 @@ static const struct vm_operations_struct ext4_file_vm_ops 
= {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
struct inode *inode = file->f_mapping->host;
+   struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+   struct dax_device *dax_dev = sbi->s_daxdev;
 
-   if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb
+   if (unlikely(ext4_forced_shutdown(sbi)))
return -EIO;
 
/*
-* We don't support synchronous mappings for non-DAX files. At least
-* until someone comes with a sensible use case.
+* We don't support synchronous mappings for non-DAX files and
+* for DAX files if underneath dax_device is not synchronous.
 */
-   if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
+   if (!daxdev_mapping_supported(vma, dax_dev))
return -EOPNOTSUPP;
 
file_accessed(file);
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v11 5/7] dax: check synchronous mapping is supported

2019-06-10 Thread Pankaj Gupta
This patch introduces 'daxdev_mapping_supported' helper
which checks if 'MAP_SYNC' is supported with filesystem
mapping. It also checks if corresponding dax_device is
synchronous. Virtio pmem device is asynchronous and
does not not support VM_SYNC.

Suggested-by: Jan Kara 
Signed-off-by: Pankaj Gupta 
Reviewed-by: Jan Kara 
---
 include/linux/dax.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index 2b106752b1b8..267251a394fa 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -42,6 +42,18 @@ void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
 bool dax_synchronous(struct dax_device *dax_dev);
 void set_dax_synchronous(struct dax_device *dax_dev);
+/*
+ * Check if given mapping is supported by the file / underlying device.
+ */
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+   struct dax_device *dax_dev)
+{
+   if (!(vma->vm_flags & VM_SYNC))
+   return true;
+   if (!IS_DAX(file_inode(vma->vm_file)))
+   return false;
+   return dax_synchronous(dax_dev);
+}
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
@@ -69,6 +81,11 @@ static inline bool dax_write_cache_enabled(struct dax_device 
*dax_dev)
 {
return false;
 }
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+   struct dax_device *dax_dev)
+{
+   return !(vma->vm_flags & VM_SYNC);
+}
 #endif
 
 struct writeback_control;
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v11 4/7] dm: enable synchronous dax

2019-06-10 Thread Pankaj Gupta
 This patch sets dax device 'DAXDEV_SYNC' flag if all the target
 devices of device mapper support synchrononous DAX. If device
 mapper consists of both synchronous and asynchronous dax devices,
 we don't set 'DAXDEV_SYNC' flag.

Signed-off-by: Pankaj Gupta 
---
 drivers/md/dm-table.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 350cf0451456..c5160d846fe6 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -890,10 +890,17 @@ static int device_supports_dax(struct dm_target *ti, 
struct dm_dev *dev,
start, len);
 }
 
+static int device_synchronous(struct dm_target *ti, struct dm_dev *dev,
+  sector_t start, sector_t len, void *data)
+{
+   return dax_synchronous(dev->dax_dev);
+}
+
 bool dm_table_supports_dax(struct dm_table *t, int blocksize)
 {
struct dm_target *ti;
unsigned i;
+   bool dax_sync = true;
 
/* Ensure that all targets support DAX. */
for (i = 0; i < dm_table_get_num_targets(t); i++) {
@@ -906,7 +913,14 @@ bool dm_table_supports_dax(struct dm_table *t, int 
blocksize)
!ti->type->iterate_devices(ti, device_supports_dax,
))
return false;
+
+   /* Check devices support synchronous DAX */
+   if (dax_sync &&
+   !ti->type->iterate_devices(ti, device_synchronous, NULL))
+   dax_sync = false;
}
+   if (dax_sync)
+   set_dax_synchronous(t->md->dax_dev);
 
return true;
 }
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v11 3/7] libnvdimm: add dax_dev sync flag

2019-06-10 Thread Pankaj Gupta
This patch adds 'DAXDEV_SYNC' flag which is set
for nd_region doing synchronous flush. This later
is used to disable MAP_SYNC functionality for
ext4 & xfs filesystem for devices don't support
synchronous flush.

Signed-off-by: Pankaj Gupta 
---
 drivers/dax/bus.c|  2 +-
 drivers/dax/super.c  | 19 ++-
 drivers/md/dm.c  |  3 ++-
 drivers/nvdimm/pmem.c|  5 -
 drivers/nvdimm/region_devs.c |  7 +++
 include/linux/dax.h  |  9 +++--
 include/linux/libnvdimm.h|  1 +
 7 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 2109cfe80219..5f184e751c82 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -388,7 +388,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region 
*dax_region, int id,
 * No 'host' or dax_operations since there is no access to this
 * device outside of mmap of the resulting character device.
 */
-   dax_dev = alloc_dax(dev_dax, NULL, NULL);
+   dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
if (!dax_dev)
goto err;
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index bbd57ca0634a..93b3718b5cfa 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -186,6 +186,8 @@ enum dax_device_flags {
DAXDEV_ALIVE,
/* gate whether dax_flush() calls the low level flush routine */
DAXDEV_WRITE_CACHE,
+   /* flag to check if device supports synchronous flush */
+   DAXDEV_SYNC,
 };
 
 /**
@@ -354,6 +356,18 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+bool dax_synchronous(struct dax_device *dax_dev)
+{
+   return test_bit(DAXDEV_SYNC, _dev->flags);
+}
+EXPORT_SYMBOL_GPL(dax_synchronous);
+
+void set_dax_synchronous(struct dax_device *dax_dev)
+{
+   set_bit(DAXDEV_SYNC, _dev->flags);
+}
+EXPORT_SYMBOL_GPL(set_dax_synchronous);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
lockdep_assert_held(_srcu);
@@ -508,7 +522,7 @@ static void dax_add_host(struct dax_device *dax_dev, const 
char *host)
 }
 
 struct dax_device *alloc_dax(void *private, const char *__host,
-   const struct dax_operations *ops)
+   const struct dax_operations *ops, unsigned long flags)
 {
struct dax_device *dax_dev;
const char *host;
@@ -531,6 +545,9 @@ struct dax_device *alloc_dax(void *private, const char 
*__host,
dax_add_host(dax_dev, host);
dax_dev->ops = ops;
dax_dev->private = private;
+   if (flags & DAXDEV_F_SYNC)
+   set_dax_synchronous(dax_dev);
+
return dax_dev;
 
  err_dev:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 1fb1333fefec..7eee7ddc73a8 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1970,7 +1970,8 @@ static struct mapped_device *alloc_dev(int minor)
sprintf(md->disk->disk_name, "dm-%d", minor);
 
if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
-   md->dax_dev = alloc_dax(md, md->disk->disk_name, _dax_ops);
+   md->dax_dev = alloc_dax(md, md->disk->disk_name,
+   _dax_ops, 0);
if (!md->dax_dev)
goto bad;
}
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 0279eb1da3ef..bdbd2b414d3d 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -365,6 +365,7 @@ static int pmem_attach_disk(struct device *dev,
struct gendisk *disk;
void *addr;
int rc;
+   unsigned long flags = 0UL;
 
pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
if (!pmem)
@@ -462,7 +463,9 @@ static int pmem_attach_disk(struct device *dev,
nvdimm_badblocks_populate(nd_region, >bb, _res);
disk->bb = >bb;
 
-   dax_dev = alloc_dax(pmem, disk->disk_name, _dax_ops);
+   if (is_nvdimm_sync(nd_region))
+   flags = DAXDEV_F_SYNC;
+   dax_dev = alloc_dax(pmem, disk->disk_name, _dax_ops, flags);
if (!dax_dev) {
put_disk(disk);
return -ENOMEM;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b4ef7d9ff22e..f3ea5369d20a 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1197,6 +1197,13 @@ int nvdimm_has_cache(struct nd_region *nd_region)
 }
 EXPORT_SYMBOL_GPL(nvdimm_has_cache);
 
+bool is_nvdimm_sync(struct nd_region *nd_region)
+{
+   return is_nd_pmem(_region->dev) &&
+   !test_bit(ND_REGION_ASYNC, _region->flags);
+}
+EXPORT_SYMBOL_GPL(is_nvdimm_sync);
+
 struct conflict_context {
struct nd_region *nd_region;
resource_size_t start, size;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0dd316a74a29..2b106752b1b8 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -7,6 +7,9 @@

[PATCH v11 2/7] virtio-pmem: Add virtio pmem driver

2019-06-10 Thread Pankaj Gupta
This patch adds virtio-pmem driver for KVM guest.

Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.

This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Yuval Shaia 
Acked-by: Michael S. Tsirkin 
Acked-by: Jakub Staron 
Tested-by: Jakub Staron 
---
 drivers/nvdimm/Makefile  |   1 +
 drivers/nvdimm/nd_virtio.c   | 124 +++
 drivers/nvdimm/virtio_pmem.c | 122 ++
 drivers/nvdimm/virtio_pmem.h |  55 ++
 drivers/virtio/Kconfig   |  11 +++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  35 +
 7 files changed, 349 insertions(+)
 create mode 100644 drivers/nvdimm/nd_virtio.c
 create mode 100644 drivers/nvdimm/virtio_pmem.c
 create mode 100644 drivers/nvdimm/virtio_pmem.h
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 6f2a088afad6..cefe233e0b52 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_ND_BTT) += nd_btt.o
 obj-$(CONFIG_ND_BLK) += nd_blk.o
 obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
 obj-$(CONFIG_OF_PMEM) += of_pmem.o
+obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
 
 nd_pmem-y := pmem.o
 
diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
new file mode 100644
index ..efc535723517
--- /dev/null
+++ b/drivers/nvdimm/nd_virtio.c
@@ -0,0 +1,124 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include "virtio_pmem.h"
+#include "nd.h"
+
+ /* The interrupt handler */
+void host_ack(struct virtqueue *vq)
+{
+   struct virtio_pmem *vpmem = vq->vdev->priv;
+   struct virtio_pmem_request *req_data, *req_buf;
+   unsigned long flags;
+   unsigned int len;
+
+   spin_lock_irqsave(>pmem_lock, flags);
+   while ((req_data = virtqueue_get_buf(vq, )) != NULL) {
+   req_data->done = true;
+   wake_up(_data->host_acked);
+
+   if (!list_empty(>req_list)) {
+   req_buf = list_first_entry(>req_list,
+   struct virtio_pmem_request, list);
+   req_buf->wq_buf_avail = true;
+   wake_up(_buf->wq_buf);
+   list_del(_buf->list);
+   }
+   }
+   spin_unlock_irqrestore(>pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(host_ack);
+
+ /* The request submission function */
+int virtio_pmem_flush(struct nd_region *nd_region)
+{
+   struct virtio_device *vdev = nd_region->provider_data;
+   struct virtio_pmem *vpmem  = vdev->priv;
+   struct virtio_pmem_request *req_data;
+   struct scatterlist *sgs[2], sg, ret;
+   unsigned long flags;
+   int err, err1;
+
+   might_sleep();
+   req_data = kmalloc(sizeof(*req_data), GFP_KERNEL);
+   if (!req_data)
+   return -ENOMEM;
+
+   req_data->done = false;
+   init_waitqueue_head(_data->host_acked);
+   init_waitqueue_head(_data->wq_buf);
+   INIT_LIST_HEAD(_data->list);
+   req_data->req.type = cpu_to_virtio32(vdev, VIRTIO_PMEM_REQ_TYPE_FLUSH);
+   sg_init_one(, _data->req, sizeof(req_data->req));
+   sgs[0] = 
+   sg_init_one(, _data->resp.ret, sizeof(req_data->resp));
+   sgs[1] = 
+
+   spin_lock_irqsave(>pmem_lock, flags);
+/*
+ * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual
+ * queue does not have free descriptor. We add the request
+ * to req_list and wait for host_ack to wake us up when free
+ * slots are available.
+ */
+   while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req_data,
+   GFP_ATOMIC)) == -ENOSPC) {
+
+   dev_err(>dev, "failed to send command to virtio pmem 
device, no free slots in the virtqueue\n");
+   req_data->wq_buf_avail = false;
+   list_add_tail(_data->list, >req_list);
+   spin_unlock_irqrestore(>pmem_lock, flags);
+
+   /* A host response results in "host_ack" getting called */
+   wait_event(req_data->wq_buf, req_data->wq_buf_avail);
+   spin_lock

[PATCH v11 1/7] libnvdimm: nd_region flush callback support

2019-06-10 Thread Pankaj Gupta
This patch adds functionality to perform flush from guest
to host over VIRTIO. We are registering a callback based
on 'nd_region' type. virtio_pmem driver requires this special
flush function. For rest of the region types we are registering
existing flush function. Report error returned by host fsync
failure to userspace.

Signed-off-by: Pankaj Gupta 
---
 drivers/acpi/nfit/core.c |  4 ++--
 drivers/nvdimm/claim.c   |  6 --
 drivers/nvdimm/nd.h  |  1 +
 drivers/nvdimm/pmem.c| 13 -
 drivers/nvdimm/region_devs.c | 26 --
 include/linux/libnvdimm.h|  9 -
 6 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index f1ed0befe303..9ddd8667153e 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2434,7 +2434,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, 
unsigned int bw,
offset = to_interleave_offset(offset, mmio);
 
writeq(cmd, mmio->addr.base + offset);
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL);
 
if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
readq(mmio->addr.base + offset);
@@ -2483,7 +2483,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk 
*nfit_blk,
}
 
if (rw)
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL);
 
rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf469c7..13510bae1e6f 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
struct nd_namespace_io *nsio = to_nd_namespace_io(>dev);
unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
sector_t sector = offset >> 9;
-   int rc = 0;
+   int rc = 0, ret = 0;
 
if (unlikely(!size))
return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
}
 
memcpy_flushcache(nsio->addr + offset, buf, size);
-   nvdimm_flush(to_nd_region(ndns->dev.parent));
+   ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL);
+   if (ret)
+   rc = ret;
 
return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index a5ac3b240293..0c74d2428bd7 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -159,6 +159,7 @@ struct nd_region {
struct badblocks bb;
struct nd_interleave_set *nd_set;
struct nd_percpu_lane __percpu *lane;
+   int (*flush)(struct nd_region *nd_region, struct bio *bio);
struct nd_mapping mapping[0];
 };
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 0279eb1da3ef..c757a47183b8 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, 
struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+   int ret = 0;
blk_status_t rc = 0;
bool do_acct;
unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
struct nd_region *nd_region = to_region(pmem);
 
if (bio->bi_opf & REQ_PREFLUSH)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio);
 
do_acct = nd_iostat_start(bio, );
bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
nd_iostat_end(bio, start);
 
if (bio->bi_opf & REQ_FUA)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio);
+
+   if (ret)
+   bio->bi_status = errno_to_blk_status(ret);
 
bio_endio(bio);
return BLK_QC_T_NONE;
@@ -469,7 +473,6 @@ static int pmem_attach_disk(struct device *dev,
}
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
pmem->dax_dev = dax_dev;
-
gendev = disk_to_dev(disk);
gendev->groups = pmem_attribute_groups;
 
@@ -527,14 +530,14 @@ static int nd_pmem_remove(struct device *dev)
sysfs_put(pmem->bb_state);
pmem->bb_state = NULL;
}
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL);
 
return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c

[PATCH v11 0/7] virtio pmem driver

2019-06-10 Thread Pankaj Gupta
b Staroń
 - Used unsigned long flags for passing DAXDEV_F_SYNC (patch 3) - Dan
 - Fixed typo =>  vma 'flag' to 'vm_flag' (patch 4)
 - Added rob in patch 6 & patch 2

Changes from PATCH v6: 
 - Corrected comment format in patch 5 & patch 6. [Dave]
 - Changed variable declaration indentation in patch 6 [Darrick]
 - Add Reviewed-by tag by 'Jan Kara' in patch 4 & patch 5

Changes from PATCH v5: 
  Changes suggested in by - [Cornelia, Yuval]
- Remove assignment chaining in virtio driver
- Better error message and remove not required free
- Check nd_region before use

  Changes suggested by - [Jan Kara]
- dax_synchronous() for !CONFIG_DAX
- Correct 'daxdev_mapping_supported' comment and non-dax implementation

  Changes suggested by - [Dan Williams]
- Pass meaningful flag 'DAXDEV_F_SYNC' to alloc_dax
- Gate nvdimm_flush instead of additional async parameter
- Move block chaining logic to flush callback than common nvdimm_flush
- Use NULL flush callback for generic flush for better readability [Dan, Jan]

- Use virtio device id 27 from 25(already used) - [MST]

Changes from PATCH v4:
- Factor out MAP_SYNC supported functionality to a common helper
[Dave, Darrick, Jan]
- Comment, indentation and virtqueue_kick failure handle - Yuval Shaia

Changes from PATCH v3: 
- Use generic dax_synchronous() helper to check for DAXDEV_SYNC 
  flag - [Dan, Darrick, Jan]
- Add 'is_nvdimm_async' function
- Document page cache side channel attacks implications & 
  countermeasures - [Dave Chinner, Michael]

Changes from PATCH v2: 
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] 
- Use name 'virtio pmem' in place of 'fake dax' 

Changes from PATCH v1: 
- 0-day build test for build dependency on libnvdimm 

 Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem  
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Remove BSD license text with SPDX license text

- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow

Pankaj Gupta (7):
   libnvdimm: nd_region flush callback support
   virtio-pmem: Add virtio-pmem guest driver
   libnvdimm: add nd_region buffered dax_dev flag
   dax: check synchronous mapping is supported
   dm: dm: Enable synchronous dax
   ext4: disable map_sync for virtio pmem
   xfs: disable map_sync for virtio pmem

[1] https://lkml.org/lkml/2019/5/21/569
[2] https://lkml.org/lkml/2019/5/14/465
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html  
[5] https://marc.info/?l=qemu-devel=155860751202202=2
[6] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html
[7] https://lkml.org/lkml/2019/1/9/1191

 drivers/acpi/nfit/core.c |4 -
 drivers/dax/bus.c|2 
 drivers/dax/super.c  |   19 +
 drivers/md/dm-table.c|   14 
 drivers/md/dm.c  |3 
 drivers/nvdimm/Makefile  |1 
 drivers/nvdimm/claim.c   |6 +
 drivers/nvdimm/nd.h  |1 
 drivers/nvdimm/nd_virtio.c   |  124 +++
 drivers/nvdimm/pmem.c|   18 +++--
 drivers/nvdimm/region_devs.c |   33 +-
 drivers/nvdimm/virtio_pmem.c |  122 ++
 drivers/nvdimm/virtio_pmem.h |   55 +
 drivers/virtio/Kconfig   |   11 +++
 fs/ext4/file.c   |   10 +--
 fs/xfs/xfs_file.c|9 +-
 include/linux/dax.h  |   26 +++-
 include/linux/libnvdimm.h|   10 ++-
 include/uapi/linux/virtio_ids.h  |1 
 include/uapi/linux/virtio_pmem.h |   35 +++
 20 files changed, 479 insertions(+), 25 deletions(-)

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v10 4/7] dm: enable synchronous dax

2019-06-09 Thread Pankaj Gupta


> On Tue, May 21, 2019 at 6:43 AM Pankaj Gupta  wrote:
> >
> >  This patch sets dax device 'DAXDEV_SYNC' flag if all the target
> >  devices of device mapper support synchrononous DAX. If device
> >  mapper consists of both synchronous and asynchronous dax devices,
> >  we don't set 'DAXDEV_SYNC' flag.
> >
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  drivers/md/dm-table.c | 14 ++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > index cde3b49b2a91..1cce626ff576 100644
> > --- a/drivers/md/dm-table.c
> > +++ b/drivers/md/dm-table.c
> > @@ -886,10 +886,17 @@ static int device_supports_dax(struct dm_target *ti,
> > struct dm_dev *dev,
> > return bdev_dax_supported(dev->bdev, PAGE_SIZE);
> >  }
> >
> > +static int device_synchronous(struct dm_target *ti, struct dm_dev *dev,
> > +  sector_t start, sector_t len, void *data)
> > +{
> > +   return dax_synchronous(dev->dax_dev);
> > +}
> > +
> >  static bool dm_table_supports_dax(struct dm_table *t)
> >  {
> > struct dm_target *ti;
> > unsigned i;
> > +   bool dax_sync = true;
> >
> > /* Ensure that all targets support DAX. */
> > for (i = 0; i < dm_table_get_num_targets(t); i++) {
> > @@ -901,7 +908,14 @@ static bool dm_table_supports_dax(struct dm_table *t)
> > if (!ti->type->iterate_devices ||
> > !ti->type->iterate_devices(ti, device_supports_dax,
> > NULL))
> > return false;
> > +
> > +   /* Check devices support synchronous DAX */
> > +   if (dax_sync &&
> > +   !ti->type->iterate_devices(ti, device_synchronous,
> > NULL))
> > +   dax_sync = false;
> 
> Looks like this needs to be rebased on the current state of v5.2-rc,
> and then we can nudge Mike for an ack.

Sorry! for late reply due to vacations. I will rebase the series on v5.2-rc4 and
send a v11.

Thanks,
Pankaj
Yes, 
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Qemu-devel] [PATCH v10 2/7] virtio-pmem: Add virtio pmem driver

2019-05-21 Thread Pankaj Gupta


> > diff --git a/include/uapi/linux/virtio_pmem.h
> > b/include/uapi/linux/virtio_pmem.h
> > new file mode 100644
> > index ..7a3e2fe52415
> > --- /dev/null
> > +++ b/include/uapi/linux/virtio_pmem.h
> > @@ -0,0 +1,35 @@
> > +/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
> > +/*
> > + * Definitions for virtio-pmem devices.
> > + *
> > + * Copyright (C) 2019 Red Hat, Inc.
> > + *
> > + * Author(s): Pankaj Gupta 
> > + */
> > +
> > +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> > +#define _UAPI_LINUX_VIRTIO_PMEM_H
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +struct virtio_pmem_config {
> > +   __le64 start;
> > +   __le64 size;
> > +};
> > +
> 
> config generally should be __u64.
> Are you sure sparse does not complain?

I used this because VIRTIO 1.1 spec says: 
"The device configuration space uses the little-endian format for multi-byte 
fields. "

and __le64 looks ok to me. Also, its used in other driver config as welle.g 
virtio-vsock

> 
> 
> > +#define VIRTIO_PMEM_REQ_TYPE_FLUSH  0
> > +
> > +struct virtio_pmem_resp {
> > +   /* Host return status corresponding to flush request */
> > +   __virtio32 ret;
> > +};
> > +
> > +struct virtio_pmem_req {
> > +   /* command type */
> > +   __virtio32 type;
> > +};
> > +
> > +#endif
> > --
> > 2.20.1
> 
> Sorry why are these __virtio32 not __le32?

I used __virtio32 for data fields for guest and host supporting different 
endianess.
 
Thanks,
Pankaj
> 
> --
> MST
> 
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v10 7/7] xfs: disable map_sync for async flush

2019-05-21 Thread Pankaj Gupta
Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and xfs.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Darrick J. Wong 
---
 fs/xfs/xfs_file.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a7ceae90110e..f17652cca5ff 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1203,11 +1203,14 @@ xfs_file_mmap(
struct file *filp,
struct vm_area_struct *vma)
 {
+   struct dax_device   *dax_dev;
+
+   dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
/*
-* We don't support synchronous mappings for non-DAX files. At least
-* until someone comes with a sensible use case.
+* We don't support synchronous mappings for non-DAX files and
+* for DAX files if underneath dax_device is not synchronous.
 */
-   if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
+   if (!daxdev_mapping_supported(vma, dax_dev))
return -EOPNOTSUPP;
 
file_accessed(filp);
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v10 5/7] dax: check synchronous mapping is supported

2019-05-21 Thread Pankaj Gupta
This patch introduces 'daxdev_mapping_supported' helper
which checks if 'MAP_SYNC' is supported with filesystem
mapping. It also checks if corresponding dax_device is
synchronous. Virtio pmem device is asynchronous and
does not not support VM_SYNC.

Suggested-by: Jan Kara 
Signed-off-by: Pankaj Gupta 
Reviewed-by: Jan Kara 
---
 include/linux/dax.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index 2b106752b1b8..267251a394fa 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -42,6 +42,18 @@ void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
 bool dax_synchronous(struct dax_device *dax_dev);
 void set_dax_synchronous(struct dax_device *dax_dev);
+/*
+ * Check if given mapping is supported by the file / underlying device.
+ */
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+   struct dax_device *dax_dev)
+{
+   if (!(vma->vm_flags & VM_SYNC))
+   return true;
+   if (!IS_DAX(file_inode(vma->vm_file)))
+   return false;
+   return dax_synchronous(dax_dev);
+}
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
@@ -69,6 +81,11 @@ static inline bool dax_write_cache_enabled(struct dax_device 
*dax_dev)
 {
return false;
 }
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+   struct dax_device *dax_dev)
+{
+   return !(vma->vm_flags & VM_SYNC);
+}
 #endif
 
 struct writeback_control;
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v10 6/7] ext4: disable map_sync for async flush

2019-05-21 Thread Pankaj Gupta
Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and ext4.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Jan Kara 
---
 fs/ext4/file.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 98ec11f69cd4..dee549339e13 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,15 +360,17 @@ static const struct vm_operations_struct ext4_file_vm_ops 
= {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
struct inode *inode = file->f_mapping->host;
+   struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+   struct dax_device *dax_dev = sbi->s_daxdev;
 
-   if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb
+   if (unlikely(ext4_forced_shutdown(sbi)))
return -EIO;
 
/*
-* We don't support synchronous mappings for non-DAX files. At least
-* until someone comes with a sensible use case.
+* We don't support synchronous mappings for non-DAX files and
+* for DAX files if underneath dax_device is not synchronous.
 */
-   if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
+   if (!daxdev_mapping_supported(vma, dax_dev))
return -EOPNOTSUPP;
 
file_accessed(file);
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v10 4/7] dm: enable synchronous dax

2019-05-21 Thread Pankaj Gupta
 This patch sets dax device 'DAXDEV_SYNC' flag if all the target
 devices of device mapper support synchrononous DAX. If device
 mapper consists of both synchronous and asynchronous dax devices,
 we don't set 'DAXDEV_SYNC' flag. 

Signed-off-by: Pankaj Gupta 
---
 drivers/md/dm-table.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index cde3b49b2a91..1cce626ff576 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -886,10 +886,17 @@ static int device_supports_dax(struct dm_target *ti, 
struct dm_dev *dev,
return bdev_dax_supported(dev->bdev, PAGE_SIZE);
 }
 
+static int device_synchronous(struct dm_target *ti, struct dm_dev *dev,
+  sector_t start, sector_t len, void *data)
+{
+   return dax_synchronous(dev->dax_dev);
+}
+
 static bool dm_table_supports_dax(struct dm_table *t)
 {
struct dm_target *ti;
unsigned i;
+   bool dax_sync = true;
 
/* Ensure that all targets support DAX. */
for (i = 0; i < dm_table_get_num_targets(t); i++) {
@@ -901,7 +908,14 @@ static bool dm_table_supports_dax(struct dm_table *t)
if (!ti->type->iterate_devices ||
!ti->type->iterate_devices(ti, device_supports_dax, NULL))
return false;
+
+   /* Check devices support synchronous DAX */
+   if (dax_sync &&
+   !ti->type->iterate_devices(ti, device_synchronous, NULL))
+   dax_sync = false;
}
+   if (dax_sync)
+   set_dax_synchronous(t->md->dax_dev);
 
return true;
 }
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v10 3/7] libnvdimm: add dax_dev sync flag

2019-05-21 Thread Pankaj Gupta
This patch adds 'DAXDEV_SYNC' flag which is set
for nd_region doing synchronous flush. This later
is used to disable MAP_SYNC functionality for
ext4 & xfs filesystem for devices don't support
synchronous flush.

Signed-off-by: Pankaj Gupta 
---
 drivers/dax/bus.c|  2 +-
 drivers/dax/super.c  | 19 ++-
 drivers/md/dm.c  |  3 ++-
 drivers/nvdimm/pmem.c|  5 -
 drivers/nvdimm/region_devs.c |  7 +++
 include/linux/dax.h  |  9 +++--
 include/linux/libnvdimm.h|  1 +
 7 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 2109cfe80219..5f184e751c82 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -388,7 +388,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region 
*dax_region, int id,
 * No 'host' or dax_operations since there is no access to this
 * device outside of mmap of the resulting character device.
 */
-   dax_dev = alloc_dax(dev_dax, NULL, NULL);
+   dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
if (!dax_dev)
goto err;
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index bbd57ca0634a..93b3718b5cfa 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -186,6 +186,8 @@ enum dax_device_flags {
DAXDEV_ALIVE,
/* gate whether dax_flush() calls the low level flush routine */
DAXDEV_WRITE_CACHE,
+   /* flag to check if device supports synchronous flush */
+   DAXDEV_SYNC,
 };
 
 /**
@@ -354,6 +356,18 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+bool dax_synchronous(struct dax_device *dax_dev)
+{
+   return test_bit(DAXDEV_SYNC, _dev->flags);
+}
+EXPORT_SYMBOL_GPL(dax_synchronous);
+
+void set_dax_synchronous(struct dax_device *dax_dev)
+{
+   set_bit(DAXDEV_SYNC, _dev->flags);
+}
+EXPORT_SYMBOL_GPL(set_dax_synchronous);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
lockdep_assert_held(_srcu);
@@ -508,7 +522,7 @@ static void dax_add_host(struct dax_device *dax_dev, const 
char *host)
 }
 
 struct dax_device *alloc_dax(void *private, const char *__host,
-   const struct dax_operations *ops)
+   const struct dax_operations *ops, unsigned long flags)
 {
struct dax_device *dax_dev;
const char *host;
@@ -531,6 +545,9 @@ struct dax_device *alloc_dax(void *private, const char 
*__host,
dax_add_host(dax_dev, host);
dax_dev->ops = ops;
dax_dev->private = private;
+   if (flags & DAXDEV_F_SYNC)
+   set_dax_synchronous(dax_dev);
+
return dax_dev;
 
  err_dev:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 1fb1333fefec..7eee7ddc73a8 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1970,7 +1970,8 @@ static struct mapped_device *alloc_dev(int minor)
sprintf(md->disk->disk_name, "dm-%d", minor);
 
if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
-   md->dax_dev = alloc_dax(md, md->disk->disk_name, _dax_ops);
+   md->dax_dev = alloc_dax(md, md->disk->disk_name,
+   _dax_ops, 0);
if (!md->dax_dev)
goto bad;
}
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 0279eb1da3ef..bdbd2b414d3d 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -365,6 +365,7 @@ static int pmem_attach_disk(struct device *dev,
struct gendisk *disk;
void *addr;
int rc;
+   unsigned long flags = 0UL;
 
pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
if (!pmem)
@@ -462,7 +463,9 @@ static int pmem_attach_disk(struct device *dev,
nvdimm_badblocks_populate(nd_region, >bb, _res);
disk->bb = >bb;
 
-   dax_dev = alloc_dax(pmem, disk->disk_name, _dax_ops);
+   if (is_nvdimm_sync(nd_region))
+   flags = DAXDEV_F_SYNC;
+   dax_dev = alloc_dax(pmem, disk->disk_name, _dax_ops, flags);
if (!dax_dev) {
put_disk(disk);
return -ENOMEM;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b4ef7d9ff22e..f3ea5369d20a 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1197,6 +1197,13 @@ int nvdimm_has_cache(struct nd_region *nd_region)
 }
 EXPORT_SYMBOL_GPL(nvdimm_has_cache);
 
+bool is_nvdimm_sync(struct nd_region *nd_region)
+{
+   return is_nd_pmem(_region->dev) &&
+   !test_bit(ND_REGION_ASYNC, _region->flags);
+}
+EXPORT_SYMBOL_GPL(is_nvdimm_sync);
+
 struct conflict_context {
struct nd_region *nd_region;
resource_size_t start, size;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0dd316a74a29..2b106752b1b8 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -7,6 +7,9 @@

[PATCH v10 2/7] virtio-pmem: Add virtio pmem driver

2019-05-21 Thread Pankaj Gupta
This patch adds virtio-pmem driver for KVM guest.

Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.

This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Yuval Shaia 
Acked-by: Michael S. Tsirkin 
Acked-by: Jakub Staron 
Tested-by: Jakub Staron 
---
 drivers/nvdimm/Makefile  |   1 +
 drivers/nvdimm/nd_virtio.c   | 124 +++
 drivers/nvdimm/virtio_pmem.c | 122 ++
 drivers/nvdimm/virtio_pmem.h |  55 ++
 drivers/virtio/Kconfig   |  11 +++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  35 +
 7 files changed, 349 insertions(+)
 create mode 100644 drivers/nvdimm/nd_virtio.c
 create mode 100644 drivers/nvdimm/virtio_pmem.c
 create mode 100644 drivers/nvdimm/virtio_pmem.h
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 6f2a088afad6..cefe233e0b52 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_ND_BTT) += nd_btt.o
 obj-$(CONFIG_ND_BLK) += nd_blk.o
 obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
 obj-$(CONFIG_OF_PMEM) += of_pmem.o
+obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
 
 nd_pmem-y := pmem.o
 
diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
new file mode 100644
index ..efc535723517
--- /dev/null
+++ b/drivers/nvdimm/nd_virtio.c
@@ -0,0 +1,124 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include "virtio_pmem.h"
+#include "nd.h"
+
+ /* The interrupt handler */
+void host_ack(struct virtqueue *vq)
+{
+   struct virtio_pmem *vpmem = vq->vdev->priv;
+   struct virtio_pmem_request *req_data, *req_buf;
+   unsigned long flags;
+   unsigned int len;
+
+   spin_lock_irqsave(>pmem_lock, flags);
+   while ((req_data = virtqueue_get_buf(vq, )) != NULL) {
+   req_data->done = true;
+   wake_up(_data->host_acked);
+
+   if (!list_empty(>req_list)) {
+   req_buf = list_first_entry(>req_list,
+   struct virtio_pmem_request, list);
+   req_buf->wq_buf_avail = true;
+   wake_up(_buf->wq_buf);
+   list_del(_buf->list);
+   }
+   }
+   spin_unlock_irqrestore(>pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(host_ack);
+
+ /* The request submission function */
+int virtio_pmem_flush(struct nd_region *nd_region)
+{
+   struct virtio_device *vdev = nd_region->provider_data;
+   struct virtio_pmem *vpmem  = vdev->priv;
+   struct virtio_pmem_request *req_data;
+   struct scatterlist *sgs[2], sg, ret;
+   unsigned long flags;
+   int err, err1;
+
+   might_sleep();
+   req_data = kmalloc(sizeof(*req_data), GFP_KERNEL);
+   if (!req_data)
+   return -ENOMEM;
+
+   req_data->done = false;
+   init_waitqueue_head(_data->host_acked);
+   init_waitqueue_head(_data->wq_buf);
+   INIT_LIST_HEAD(_data->list);
+   req_data->req.type = cpu_to_virtio32(vdev, VIRTIO_PMEM_REQ_TYPE_FLUSH);
+   sg_init_one(, _data->req, sizeof(req_data->req));
+   sgs[0] = 
+   sg_init_one(, _data->resp.ret, sizeof(req_data->resp));
+   sgs[1] = 
+
+   spin_lock_irqsave(>pmem_lock, flags);
+/*
+ * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual
+ * queue does not have free descriptor. We add the request
+ * to req_list and wait for host_ack to wake us up when free
+ * slots are available.
+ */
+   while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req_data,
+   GFP_ATOMIC)) == -ENOSPC) {
+
+   dev_err(>dev, "failed to send command to virtio pmem 
device, no free slots in the virtqueue\n");
+   req_data->wq_buf_avail = false;
+   list_add_tail(_data->list, >req_list);
+   spin_unlock_irqrestore(>pmem_lock, flags);
+
+   /* A host response results in "host_ack" getting called */
+   wait_event(req_data->wq_buf, req_data->wq_buf_avail);
+   spin_lock

[PATCH v10 1/7] libnvdimm: nd_region flush callback support

2019-05-21 Thread Pankaj Gupta
This patch adds functionality to perform flush from guest
to host over VIRTIO. We are registering a callback based
on 'nd_region' type. virtio_pmem driver requires this special
flush function. For rest of the region types we are registering
existing flush function. Report error returned by host fsync
failure to userspace.

Signed-off-by: Pankaj Gupta 
---
 drivers/acpi/nfit/core.c |  4 ++--
 drivers/nvdimm/claim.c   |  6 --
 drivers/nvdimm/nd.h  |  1 +
 drivers/nvdimm/pmem.c| 13 -
 drivers/nvdimm/region_devs.c | 26 --
 include/linux/libnvdimm.h|  9 -
 6 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index f1ed0befe303..9ddd8667153e 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2434,7 +2434,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, 
unsigned int bw,
offset = to_interleave_offset(offset, mmio);
 
writeq(cmd, mmio->addr.base + offset);
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL);
 
if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
readq(mmio->addr.base + offset);
@@ -2483,7 +2483,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk 
*nfit_blk,
}
 
if (rw)
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL);
 
rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf469c7..13510bae1e6f 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
struct nd_namespace_io *nsio = to_nd_namespace_io(>dev);
unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
sector_t sector = offset >> 9;
-   int rc = 0;
+   int rc = 0, ret = 0;
 
if (unlikely(!size))
return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
}
 
memcpy_flushcache(nsio->addr + offset, buf, size);
-   nvdimm_flush(to_nd_region(ndns->dev.parent));
+   ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL);
+   if (ret)
+   rc = ret;
 
return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index a5ac3b240293..0c74d2428bd7 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -159,6 +159,7 @@ struct nd_region {
struct badblocks bb;
struct nd_interleave_set *nd_set;
struct nd_percpu_lane __percpu *lane;
+   int (*flush)(struct nd_region *nd_region, struct bio *bio);
struct nd_mapping mapping[0];
 };
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 0279eb1da3ef..c757a47183b8 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, 
struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+   int ret = 0;
blk_status_t rc = 0;
bool do_acct;
unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
struct nd_region *nd_region = to_region(pmem);
 
if (bio->bi_opf & REQ_PREFLUSH)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio);
 
do_acct = nd_iostat_start(bio, );
bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
nd_iostat_end(bio, start);
 
if (bio->bi_opf & REQ_FUA)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio);
+
+   if (ret)
+   bio->bi_status = errno_to_blk_status(ret);
 
bio_endio(bio);
return BLK_QC_T_NONE;
@@ -469,7 +473,6 @@ static int pmem_attach_disk(struct device *dev,
}
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
pmem->dax_dev = dax_dev;
-
gendev = disk_to_dev(disk);
gendev->groups = pmem_attribute_groups;
 
@@ -527,14 +530,14 @@ static int nd_pmem_remove(struct device *dev)
sysfs_put(pmem->bb_state);
pmem->bb_state = NULL;
}
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL);
 
return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c

[PATCH v10 0/7] virtio pmem driver

2019-05-21 Thread Pankaj Gupta
- DavidH
 - Added MST's ack in patch 2.

Changes from PATCH v7:
 - Corrected pending request queue logic (patch 2) - Jakub Staroń
 - Used unsigned long flags for passing DAXDEV_F_SYNC (patch 3) - Dan
 - Fixed typo =>  vma 'flag' to 'vm_flag' (patch 4)
 - Added rob in patch 6 & patch 2

Changes from PATCH v6: 
 - Corrected comment format in patch 5 & patch 6. [Dave]
 - Changed variable declaration indentation in patch 6 [Darrick]
 - Add Reviewed-by tag by 'Jan Kara' in patch 4 & patch 5

Changes from PATCH v5: 
  Changes suggested in by - [Cornelia, Yuval]
- Remove assignment chaining in virtio driver
- Better error message and remove not required free
- Check nd_region before use

  Changes suggested by - [Jan Kara]
- dax_synchronous() for !CONFIG_DAX
- Correct 'daxdev_mapping_supported' comment and non-dax implementation

  Changes suggested by - [Dan Williams]
- Pass meaningful flag 'DAXDEV_F_SYNC' to alloc_dax
- Gate nvdimm_flush instead of additional async parameter
- Move block chaining logic to flush callback than common nvdimm_flush
- Use NULL flush callback for generic flush for better readability [Dan, Jan]

- Use virtio device id 27 from 25(already used) - [MST]

Changes from PATCH v4:
- Factor out MAP_SYNC supported functionality to a common helper
[Dave, Darrick, Jan]
- Comment, indentation and virtqueue_kick failure handle - Yuval Shaia

Changes from PATCH v3: 
- Use generic dax_synchronous() helper to check for DAXDEV_SYNC 
  flag - [Dan, Darrick, Jan]
- Add 'is_nvdimm_async' function
- Document page cache side channel attacks implications & 
  countermeasures - [Dave Chinner, Michael]

Changes from PATCH v2: 
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] 
- Use name 'virtio pmem' in place of 'fake dax' 

Changes from PATCH v1: 
- 0-day build test for build dependency on libnvdimm 

 Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem  
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Remove BSD license text with SPDX license text

- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow

Pankaj Gupta (7):
   libnvdimm: nd_region flush callback support
   virtio-pmem: Add virtio-pmem guest driver
   libnvdimm: add nd_region buffered dax_dev flag
   dax: check synchronous mapping is supported
   dm: dm: Enable synchronous dax
   ext4: disable map_sync for virtio pmem
   xfs: disable map_sync for virtio pmem

[1] https://lkml.org/lkml/2019/5/14/465
[2] https://lkml.org/lkml/2019/5/10/447
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html  
[5] https://lkml.org/lkml/2018/8/31/413
[6] https://marc.info/?l=linux-kernel=153572228719237=2 
[7] https://marc.info/?l=qemu-devel=153555721901824=2
[8] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html
[9] https://lkml.org/lkml/2019/1/9/1191

 20 files changed, 468 insertions(+), 25 deletions(-)
 drivers/acpi/nfit/core.c |4 -
 drivers/dax/bus.c|2 
 drivers/dax/super.c  |   19 +
 drivers/md/dm-table.c|   14 
 drivers/md/dm.c  |3 
 drivers/nvdimm/Makefile  |1 
 drivers/nvdimm/claim.c   |6 +
 drivers/nvdimm/nd.h  |1 
 drivers/nvdimm/nd_virtio.c   |  124 +++
 drivers/nvdimm/pmem.c|   18 +++--
 drivers/nvdimm/region_devs.c |   33 +-
 drivers/nvdimm/virtio_pmem.c |  122 ++
 drivers/nvdimm/virtio_pmem.h |   55 +
 drivers/virtio/Kconfig   |   11 +++
 fs/ext4/file.c   |   10 +--
 fs/xfs/xfs_file.c|9 +-
 include/linux/dax.h  |   26 +++-
 include/linux/libnvdimm.h|   10 ++-
 include/uapi/linux/virtio_ids.h  |1 
 include/uapi/linux/virtio_pmem.h |   35 +++
 20 files changed, 479 insertions(+), 25 deletions(-)

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [Qemu-devel] [PATCH v9 2/7] virtio-pmem: Add virtio pmem driver

2019-05-19 Thread Pankaj Gupta


> 
> 
> > On 5/16/19 10:35 PM, Pankaj Gupta wrote:
> > > Can I take it your reviewed/acked-by? or tested-by tag? for the virtio
> > > patch :)I don't feel that I have enough expertise to give the reviewed-by
> > > tag, but you can
> > take my acked-by + tested-by.
> > 
> > Acked-by: Jakub Staron 
> > Tested-by: Jakub Staron 
> > 
> > No kernel panics/stalls encountered during testing this patches (v9) with
> > QEMU + xfstests.
> 
> Thank you for testing and confirming the results. I will add your tested &
> acked-by in v10.
> 
> > Some CPU stalls encountered while testing with crosvm instead of QEMU with
> > xfstests
> > (test generic/464) but no repro for QEMU, so the fault may be on the side
> > of
> > crosvm.
> 
> yes, looks like crosvm related as we did not see any of this in my and your
> testing with Qemu.

Also, they don't seem to be related with virtio-pmem.

Thanks,
Pankaj
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Qemu-devel] [PATCH v9 2/7] virtio-pmem: Add virtio pmem driver

2019-05-19 Thread Pankaj Gupta


> On 5/16/19 10:35 PM, Pankaj Gupta wrote:
> > Can I take it your reviewed/acked-by? or tested-by tag? for the virtio
> > patch :)I don't feel that I have enough expertise to give the reviewed-by
> > tag, but you can
> take my acked-by + tested-by.
> 
> Acked-by: Jakub Staron 
> Tested-by: Jakub Staron 
> 
> No kernel panics/stalls encountered during testing this patches (v9) with
> QEMU + xfstests.

Thank you for testing and confirming the results. I will add your tested &
acked-by in v10.

> Some CPU stalls encountered while testing with crosvm instead of QEMU with
> xfstests
> (test generic/464) but no repro for QEMU, so the fault may be on the side of
> crosvm.

yes, looks like crosvm related as we did not see any of this in my and your
testing with Qemu. 

> 
> 
> The dump for the crosvm/xfstests stall:
> [ 2504.175276] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 2504.176681] rcu: 0-...!: (1 GPs behind) idle=9b2/1/0x4000
> softirq=1089198/1089202 fqs=0
> [ 2504.178270] rcu: 2-...!: (1 ticks this GP)
> idle=cfe/1/0x4002 softirq=1055108/1055110 fqs=0
> [ 2504.179802] rcu: 3-...!: (1 GPs behind) idle=1d6/1/0x4002
> softirq=1046798/1046802 fqs=0
> [ 2504.181215] rcu: 4-...!: (2 ticks this GP)
> idle=522/1/0x4002 softirq=1249063/1249064 fqs=0
> [ 2504.182625] rcu: 5-...!: (1 GPs behind) idle=6da/1/0x4000
> softirq=1131036/1131047 fqs=0
> [ 2504.183955]  (detected by 3, t=0 jiffies, g=1232529, q=1370)
> [ 2504.184762] Sending NMI from CPU 3 to CPUs 0:
> [ 2504.186400] NMI backtrace for cpu 0
> [ 2504.186401] CPU: 0 PID: 6670 Comm: 464 Not tainted 5.1.0+ #1
> [ 2504.186401] Hardware name: ChromiumOS crosvm, BIOS 0
> [ 2504.186402] RIP: 0010:queued_spin_lock_slowpath+0x1c/0x1e0
> [ 2504.186402] Code: e7 89 c8 f0 44 0f b1 07 39 c1 75 dc f3 c3 0f 1f 44 00 00
> ba 01 00 00 00 8b 07 85 c0 75 0a f0 0f b1 17 85 c0 75 f2 f3 c3 f3 90  ec
> 81 fe 00 01 00 00 0f 84 ab 00 00 00 81 e6 00 ff ff ff 75 44
> [ 2504.186403] RSP: 0018:c9003ee8 EFLAGS: 0002
> [ 2504.186404] RAX: 0001 RBX: 0246 RCX:
> 00404044
> [ 2504.186404] RDX: 0001 RSI: 0001 RDI:
> 8244a280
> [ 2504.186405] RBP: 8244a280 R08: 000f4200 R09:
> 024709ed6c32
> [ 2504.186405] R10:  R11: 0001 R12:
> 8244a280
> [ 2504.186405] R13: 0009 R14: 0009 R15:
> 
> [ 2504.186406] FS:  () GS:8880cc60()
> knlGS:
> [ 2504.186406] CS:  0010 DS:  ES:  CR0: 80050033
> [ 2504.186406] CR2: 7efd6b0f15d8 CR3: 0260a006 CR4:
> 00360ef0
> [ 2504.186407] DR0:  DR1:  DR2:
> 
> [ 2504.186407] DR3:  DR6: fffe0ff0 DR7:
> 0400
> [ 2504.186407] Call Trace:
> [ 2504.186408]  
> [ 2504.186408]  _raw_spin_lock_irqsave+0x1d/0x30
> [ 2504.186408]  rcu_core+0x3b6/0x740
> [ 2504.186408]  ? __hrtimer_run_queues+0x133/0x280
> [ 2504.186409]  ? recalibrate_cpu_khz+0x10/0x10
> [ 2504.186409]  __do_softirq+0xd8/0x2e4
> [ 2504.186409]  irq_exit+0xa3/0xb0
> [ 2504.186410]  smp_apic_timer_interrupt+0x67/0x120
> [ 2504.186410]  apic_timer_interrupt+0xf/0x20
> [ 2504.186410]  
> [ 2504.186410] RIP: 0010:unmap_page_range+0x47a/0x9b0
> [ 2504.186411] Code: 0f 46 46 10 49 39 6e 18 49 89 46 10 48 89 e8 49 0f 43 46
> 18 41 80 4e 20 08 4d 85 c9 49 89 46 18 0f 84 68 ff ff ff 49 8b 51 08 <48> 8d
> 42 ff 83 e2 01 49 0f 44 c1 f6 40 18 01 75 38 48 ba ff 0f 00
> [ 2504.186411] RSP: 0018:c900036cbcc8 EFLAGS: 0282 ORIG_RAX:
> ff13
> [ 2504.186412] RAX:  RBX: 80003751d045 RCX:
> 0001
> [ 2504.186413] RDX: ea0002e09288 RSI: 0269b000 RDI:
> 8880b6525e40
> [ 2504.186413] RBP: 0269c000 R08:  R09:
> eadd4740
> [ 2504.186413] R10: ea0001755700 R11: 8880cc62d120 R12:
> 02794000
> [ 2504.186414] R13: 0269b000 R14: c900036cbdf0 R15:
> 8880572434d8
> [ 2504.186414]  ? unmap_page_range+0x420/0x9b0
> [ 2504.186414]  ? release_pages+0x175/0x390
> [ 2504.186414]  unmap_vmas+0x7c/0xe0
> [ 2504.186415]  exit_mmap+0xa4/0x190
> [ 2504.186415]  mmput+0x3b/0x100
> [ 2504.186415]  do_exit+0x276/0xc10
> [ 2504.186415]  do_group_exit+0x35/0xa0
> [ 2504.186415]  __x64_sys_exit_group+0xf/0x10
> [ 2504.186416]  do_syscall_64+0x43/0x120
> [ 2504.186416]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 2504.186416] RIP: 0033:0x7efd6ae10618
> [ 2504.186416] Code: Bad RIP value.
> [ 2504.18

Re: [Qemu-devel] [PATCH v9 2/7] virtio-pmem: Add virtio pmem driver

2019-05-16 Thread Pankaj Gupta



Hi Jakub,

> 
> On 5/14/19 7:54 AM, Pankaj Gupta wrote:
> > +   if (!list_empty(>req_list)) {
> > +   req_buf = list_first_entry(>req_list,
> > +   struct virtio_pmem_request, list);
> > +   req_buf->wq_buf_avail = true;
> > +   wake_up(_buf->wq_buf);
> > +   list_del(_buf->list);
> Yes, this change is the right one, thank you!

Thank you for the confirmation.

> 
> > +/*
> > + * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual
> > + * queue does not have free descriptor. We add the request
> > + * to req_list and wait for host_ack to wake us up when free
> > + * slots are available.
> > + */
> > +   while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req,
> > +   GFP_ATOMIC)) == -ENOSPC) {
> > +
> > +   dev_err(>dev, "failed to send command to virtio pmem" \
> > +   "device, no free slots in the virtqueue\n");
> > +   req->wq_buf_avail = false;
> > +   list_add_tail(>list, >req_list);
> > +   spin_unlock_irqrestore(>pmem_lock, flags);
> > +
> > +   /* A host response results in "host_ack" getting called */
> > +   wait_event(req->wq_buf, req->wq_buf_avail);
> > +   spin_lock_irqsave(>pmem_lock, flags);
> > +   }
> > +   err1 = virtqueue_kick(vpmem->req_vq);
> > +   spin_unlock_irqrestore(>pmem_lock, flags);
> > +
> > +   /*
> > +* virtqueue_add_sgs failed with error different than -ENOSPC, we can't
> > +* do anything about that.
> > +*/
> > +   if (err || !err1) {
> > +   dev_info(>dev, "failed to send command to virtio pmem 
> > device\n");
> > +   err = -EIO;
> > +   } else {
> > +   /* A host repsonse results in "host_ack" getting called */
> > +   wait_event(req->host_acked, req->done);
> > +   err = req->ret;
> > +I confirm that the failures I was facing with the `-ENOSPC` error path are
> > not present in v9.

Can I take it your reviewed/acked-by? or tested-by tag? for the virtio patch :)

Thank you,
Pankaj

> 
> Best,
> Jakub Staron
> 
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Qemu-devel] [PATCH v9 2/7] virtio-pmem: Add virtio pmem driver

2019-05-16 Thread Pankaj Gupta


> 
> On Wed, May 15, 2019 at 10:46:00PM +0200, David Hildenbrand wrote:
> > > + vpmem->vdev = vdev;
> > > + vdev->priv = vpmem;
> > > + err = init_vq(vpmem);
> > > + if (err) {
> > > + dev_err(>dev, "failed to initialize virtio pmem vq's\n");
> > > + goto out_err;
> > > + }
> > > +
> > > + virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > + start, >start);
> > > + virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > + size, >size);
> > > +
> > > + res.start = vpmem->start;
> > > + res.end   = vpmem->start + vpmem->size-1;
> > 
> > nit: " - 1;"
> > 
> > > + vpmem->nd_desc.provider_name = "virtio-pmem";
> > > + vpmem->nd_desc.module = THIS_MODULE;
> > > +
> > > + vpmem->nvdimm_bus = nvdimm_bus_register(>dev,
> > > + >nd_desc);
> > > + if (!vpmem->nvdimm_bus) {
> > > + dev_err(>dev, "failed to register device with 
> > > nvdimm_bus\n");
> > > + err = -ENXIO;
> > > + goto out_vq;
> > > + }
> > > +
> > > + dev_set_drvdata(>dev, vpmem->nvdimm_bus);
> > > +
> > > + ndr_desc.res = 
> > > + ndr_desc.numa_node = nid;
> > > + ndr_desc.flush = async_pmem_flush;
> > > + set_bit(ND_REGION_PAGEMAP, _desc.flags);
> > > + set_bit(ND_REGION_ASYNC, _desc.flags);
> > > + nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, _desc);
> > > + if (!nd_region) {
> > > + dev_err(>dev, "failed to create nvdimm region\n");
> > > + err = -ENXIO;
> > > + goto out_nd;
> > > + }
> > > + nd_region->provider_data =
> > > dev_to_virtio(nd_region->dev.parent->parent);
> > > + return 0;
> > > +out_nd:
> > > + nvdimm_bus_unregister(vpmem->nvdimm_bus);
> > > +out_vq:
> > > + vdev->config->del_vqs(vdev);
> > > +out_err:
> > > + return err;
> > > +}
> > > +
> > > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > > +{
> > > + struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(>dev);
> > > +
> > > + nvdimm_bus_unregister(nvdimm_bus);
> > > + vdev->config->del_vqs(vdev);
> > > + vdev->config->reset(vdev);
> > > +}
> > > +
> > > +static struct virtio_driver virtio_pmem_driver = {
> > > + .driver.name= KBUILD_MODNAME,
> > > + .driver.owner   = THIS_MODULE,
> > > + .id_table   = id_table,
> > > + .probe  = virtio_pmem_probe,
> > > + .remove = virtio_pmem_remove,
> > > +};
> > > +
> > > +module_virtio_driver(virtio_pmem_driver);
> > > +MODULE_DEVICE_TABLE(virtio, id_table);
> > > +MODULE_DESCRIPTION("Virtio pmem driver");
> > > +MODULE_LICENSE("GPL");
> > > diff --git a/drivers/nvdimm/virtio_pmem.h b/drivers/nvdimm/virtio_pmem.h
> > > new file mode 100644
> > > index ..ab1da877575d
> > > --- /dev/null
> > > +++ b/drivers/nvdimm/virtio_pmem.h
> > > @@ -0,0 +1,60 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +/*
> > > + * virtio_pmem.h: virtio pmem Driver
> > > + *
> > > + * Discovers persistent memory range information
> > > + * from host and provides a virtio based flushing
> > > + * interface.
> > > + **/
> > > +
> > > +#ifndef _LINUX_VIRTIO_PMEM_H
> > > +#define _LINUX_VIRTIO_PMEM_H
> > > +
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +struct virtio_pmem_request {
> > > + /* Host return status corresponding to flush request */
> > > + int ret;
> > > +
> > > + /* command name*/
> > > + char name[16];
> > 
> > So ... why are we sending string commands and expect native-endianess
> > integers and don't define a proper request/response structure + request
> > types in include/uapi/linux/virtio_pmem.h like
> 
> passing names could be ok.
> I missed the fact we return a native endian int.
> Pls fix that.

Sure. will fix this.

> 
> 
> > 
> > struct virtio_pmem_resp {
> > __virtio32 ret;
> > }
> > 
> > #define VIRTIO_PMEM_REQ_TYPE_FLUSH  1
> > struct virtio_pmem_req {
> > __virtio16 type;
> > }
> > 
> > ... and this way we also define a proper endianess format for exchange
> > and keep it extensible
> > 
> > @MST, what's your take on this?
> 
> Extensions can always use feature bits so I don't think
> it's a problem.

That was exactly my thought when I implemented this. Though I am
fine with separate structures for request/response and I made the
change. 

Thank you for all the comments.

Best regards,
Pankaj 
> > 
> > --
> > 
> > Thanks,
> > 
> > David / dhildenb
> 
> 


Re: [PATCH v9 2/7] virtio-pmem: Add virtio pmem driver

2019-05-16 Thread Pankaj Gupta


> >> +  vpmem->vdev = vdev;
> >> +  vdev->priv = vpmem;
> >> +  err = init_vq(vpmem);
> >> +  if (err) {
> >> +  dev_err(>dev, "failed to initialize virtio pmem vq's\n");
> >> +  goto out_err;
> >> +  }
> >> +
> >> +  virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> >> +  start, >start);
> >> +  virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> >> +  size, >size);
> >> +
> >> +  res.start = vpmem->start;
> >> +  res.end   = vpmem->start + vpmem->size-1;
> > 
> > nit: " - 1;"
> > 
> >> +  vpmem->nd_desc.provider_name = "virtio-pmem";
> >> +  vpmem->nd_desc.module = THIS_MODULE;
> >> +
> >> +  vpmem->nvdimm_bus = nvdimm_bus_register(>dev,
> >> +  >nd_desc);
> >> +  if (!vpmem->nvdimm_bus) {
> >> +  dev_err(>dev, "failed to register device with 
> >> nvdimm_bus\n");
> >> +  err = -ENXIO;
> >> +  goto out_vq;
> >> +  }
> >> +
> >> +  dev_set_drvdata(>dev, vpmem->nvdimm_bus);
> >> +
> >> +  ndr_desc.res = 
> >> +  ndr_desc.numa_node = nid;
> >> +  ndr_desc.flush = async_pmem_flush;
> >> +  set_bit(ND_REGION_PAGEMAP, _desc.flags);
> >> +  set_bit(ND_REGION_ASYNC, _desc.flags);
> >> +  nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, _desc);
> >> +  if (!nd_region) {
> >> +  dev_err(>dev, "failed to create nvdimm region\n");
> >> +  err = -ENXIO;
> >> +  goto out_nd;
> >> +  }
> >> +  nd_region->provider_data = dev_to_virtio(nd_region->dev.parent->parent);
> >> +  return 0;
> >> +out_nd:
> >> +  nvdimm_bus_unregister(vpmem->nvdimm_bus);
> >> +out_vq:
> >> +  vdev->config->del_vqs(vdev);
> >> +out_err:
> >> +  return err;
> >> +}
> >> +
> >> +static void virtio_pmem_remove(struct virtio_device *vdev)
> >> +{
> >> +  struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(>dev);
> >> +
> >> +  nvdimm_bus_unregister(nvdimm_bus);
> >> +  vdev->config->del_vqs(vdev);
> >> +  vdev->config->reset(vdev);
> >> +}
> >> +
> >> +static struct virtio_driver virtio_pmem_driver = {
> >> +  .driver.name= KBUILD_MODNAME,
> >> +  .driver.owner   = THIS_MODULE,
> >> +  .id_table   = id_table,
> >> +  .probe  = virtio_pmem_probe,
> >> +  .remove = virtio_pmem_remove,
> >> +};
> >> +
> >> +module_virtio_driver(virtio_pmem_driver);
> >> +MODULE_DEVICE_TABLE(virtio, id_table);
> >> +MODULE_DESCRIPTION("Virtio pmem driver");
> >> +MODULE_LICENSE("GPL");
> >> diff --git a/drivers/nvdimm/virtio_pmem.h b/drivers/nvdimm/virtio_pmem.h
> >> new file mode 100644
> >> index ..ab1da877575d
> >> --- /dev/null
> >> +++ b/drivers/nvdimm/virtio_pmem.h
> >> @@ -0,0 +1,60 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +/*
> >> + * virtio_pmem.h: virtio pmem Driver
> >> + *
> >> + * Discovers persistent memory range information
> >> + * from host and provides a virtio based flushing
> >> + * interface.
> >> + **/
> >> +
> >> +#ifndef _LINUX_VIRTIO_PMEM_H
> >> +#define _LINUX_VIRTIO_PMEM_H
> >> +
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +
> >> +struct virtio_pmem_request {
> >> +  /* Host return status corresponding to flush request */
> >> +  int ret;
> >> +
> >> +  /* command name*/
> >> +  char name[16];
> > 
> > So ... why are we sending string commands and expect native-endianess
> > integers and don't define a proper request/response structure + request
> > types in include/uapi/linux/virtio_pmem.h like
> > 
> > struct virtio_pmem_resp {
> > __virtio32 ret;
> > }
> 
> FWIW, I wonder if we should even properly translate return values and
> define types like
> 
> VIRTIO_PMEM_RESP_TYPE_OK  0
> VIRTIO_PMEM_RESP_TYPE_EIO 1

Don't think these are required as only failure and success
return types easy to understand.

Thanks,
Pankaj
> 
> ..
> 
> > 
> > #define VIRTIO_PMEM_REQ_TYPE_FLUSH  1
> > struct virtio_pmem_req {
> > __virtio16 type;
> > }
> > 
> > ... and this way we also define a proper endianess format for exchange
> > and keep it extensible
> > 
> > @MST, what's your take on this?
> > 
> > 
> 
> 
> --
> 
> Thanks,
> 
> David / dhildenb
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v9 2/7] virtio-pmem: Add virtio pmem driver

2019-05-16 Thread Pankaj Gupta


> 
> > +   vpmem->vdev = vdev;
> > +   vdev->priv = vpmem;
> > +   err = init_vq(vpmem);
> > +   if (err) {
> > +   dev_err(>dev, "failed to initialize virtio pmem vq's\n");
> > +   goto out_err;
> > +   }
> > +
> > +   virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +   start, >start);
> > +   virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +   size, >size);
> > +
> > +   res.start = vpmem->start;
> > +   res.end   = vpmem->start + vpmem->size-1;
> 
> nit: " - 1;"

Sure.

> 
> > +   vpmem->nd_desc.provider_name = "virtio-pmem";
> > +   vpmem->nd_desc.module = THIS_MODULE;
> > +
> > +   vpmem->nvdimm_bus = nvdimm_bus_register(>dev,
> > +   >nd_desc);
> > +   if (!vpmem->nvdimm_bus) {
> > +   dev_err(>dev, "failed to register device with 
> > nvdimm_bus\n");
> > +   err = -ENXIO;
> > +   goto out_vq;
> > +   }
> > +
> > +   dev_set_drvdata(>dev, vpmem->nvdimm_bus);
> > +
> > +   ndr_desc.res = 
> > +   ndr_desc.numa_node = nid;
> > +   ndr_desc.flush = async_pmem_flush;
> > +   set_bit(ND_REGION_PAGEMAP, _desc.flags);
> > +   set_bit(ND_REGION_ASYNC, _desc.flags);
> > +   nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, _desc);
> > +   if (!nd_region) {
> > +   dev_err(>dev, "failed to create nvdimm region\n");
> > +   err = -ENXIO;
> > +   goto out_nd;
> > +   }
> > +   nd_region->provider_data = dev_to_virtio(nd_region->dev.parent->parent);
> > +   return 0;
> > +out_nd:
> > +   nvdimm_bus_unregister(vpmem->nvdimm_bus);
> > +out_vq:
> > +   vdev->config->del_vqs(vdev);
> > +out_err:
> > +   return err;
> > +}
> > +
> > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > +{
> > +   struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(>dev);
> > +
> > +   nvdimm_bus_unregister(nvdimm_bus);
> > +   vdev->config->del_vqs(vdev);
> > +   vdev->config->reset(vdev);
> > +}
> > +
> > +static struct virtio_driver virtio_pmem_driver = {
> > +   .driver.name= KBUILD_MODNAME,
> > +   .driver.owner   = THIS_MODULE,
> > +   .id_table   = id_table,
> > +   .probe  = virtio_pmem_probe,
> > +   .remove = virtio_pmem_remove,
> > +};
> > +
> > +module_virtio_driver(virtio_pmem_driver);
> > +MODULE_DEVICE_TABLE(virtio, id_table);
> > +MODULE_DESCRIPTION("Virtio pmem driver");
> > +MODULE_LICENSE("GPL");
> > diff --git a/drivers/nvdimm/virtio_pmem.h b/drivers/nvdimm/virtio_pmem.h
> > new file mode 100644
> > index ..ab1da877575d
> > --- /dev/null
> > +++ b/drivers/nvdimm/virtio_pmem.h
> > @@ -0,0 +1,60 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * virtio_pmem.h: virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + **/
> > +
> > +#ifndef _LINUX_VIRTIO_PMEM_H
> > +#define _LINUX_VIRTIO_PMEM_H
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +struct virtio_pmem_request {
> > +   /* Host return status corresponding to flush request */
> > +   int ret;
> > +
> > +   /* command name*/
> > +   char name[16];
> 
> So ... why are we sending string commands and expect native-endianess
> integers and don't define a proper request/response structure + request
> types in include/uapi/linux/virtio_pmem.h like
> 
> struct virtio_pmem_resp {
>   __virtio32 ret;
> }
> 
> #define VIRTIO_PMEM_REQ_TYPE_FLUSH1
> struct virtio_pmem_req {
>   __virtio16 type;
> }
> 
> ... and this way we also define a proper endianess format for exchange
> and keep it extensible

Done the suggested change.

Thank you,
Pankaj

> 
> @MST, what's your take on this?
> 
> 
> --
> 
> Thanks,
> 
> David / dhildenb
> 


Re: [Qemu-devel] [PATCH v9 2/7] virtio-pmem: Add virtio pmem driver

2019-05-14 Thread Pankaj Gupta


> On 5/14/19 7:54 AM, Pankaj Gupta wrote:
> > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > index 35897649c24f..94bad084ebab 100644
> > --- a/drivers/virtio/Kconfig
> > +++ b/drivers/virtio/Kconfig
> > @@ -42,6 +42,17 @@ config VIRTIO_PCI_LEGACY
> >  
> >   If unsure, say Y.
> >  
> > +config VIRTIO_PMEM
> > +   tristate "Support for virtio pmem driver"
> > +   depends on VIRTIO
> > +   depends on LIBNVDIMM
> > +   help
> > +   This driver provides access to virtio-pmem devices, storage devices
> > +   that are mapped into the physical address space - similar to NVDIMMs
> > +- with a virtio-based flushing interface.
> > +
> > +   If unsure, say M.
> 
> 
> from Documentation/process/coding-style.rst:
> "Lines under a ``config`` definition
> are indented with one tab, while help text is indented an additional two
> spaces."

ah... I changed help text and 'checkpatch' did not say anything :( .

Will wait for Dan, If its possible to add two spaces to help text while applying
the series.

Thanks,
Pankaj
  

> 
> > +
> >  config VIRTIO_BALLOON
> > tristate "Virtio balloon driver"
> > depends on VIRTIO
> 
> thanks.
> --
> ~Randy
> 
> 


[PATCH v9 7/7] xfs: disable map_sync for async flush

2019-05-14 Thread Pankaj Gupta
Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and xfs.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Darrick J. Wong 
---
 fs/xfs/xfs_file.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a7ceae90110e..f17652cca5ff 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1203,11 +1203,14 @@ xfs_file_mmap(
struct file *filp,
struct vm_area_struct *vma)
 {
+   struct dax_device   *dax_dev;
+
+   dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
/*
-* We don't support synchronous mappings for non-DAX files. At least
-* until someone comes with a sensible use case.
+* We don't support synchronous mappings for non-DAX files and
+* for DAX files if underneath dax_device is not synchronous.
 */
-   if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
+   if (!daxdev_mapping_supported(vma, dax_dev))
return -EOPNOTSUPP;
 
file_accessed(filp);
-- 
2.20.1



[PATCH v9 6/7] ext4: disable map_sync for async flush

2019-05-14 Thread Pankaj Gupta
Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and ext4.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Jan Kara 
---
 fs/ext4/file.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 98ec11f69cd4..dee549339e13 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,15 +360,17 @@ static const struct vm_operations_struct ext4_file_vm_ops 
= {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
struct inode *inode = file->f_mapping->host;
+   struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+   struct dax_device *dax_dev = sbi->s_daxdev;
 
-   if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb
+   if (unlikely(ext4_forced_shutdown(sbi)))
return -EIO;
 
/*
-* We don't support synchronous mappings for non-DAX files. At least
-* until someone comes with a sensible use case.
+* We don't support synchronous mappings for non-DAX files and
+* for DAX files if underneath dax_device is not synchronous.
 */
-   if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
+   if (!daxdev_mapping_supported(vma, dax_dev))
return -EOPNOTSUPP;
 
file_accessed(file);
-- 
2.20.1



[PATCH v9 5/7] dax: check synchronous mapping is supported

2019-05-14 Thread Pankaj Gupta
This patch introduces 'daxdev_mapping_supported' helper
which checks if 'MAP_SYNC' is supported with filesystem
mapping. It also checks if corresponding dax_device is
synchronous. Virtio pmem device is asynchronous and
does not not support VM_SYNC.

Suggested-by: Jan Kara 
Signed-off-by: Pankaj Gupta 
Reviewed-by: Jan Kara 
---
 include/linux/dax.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index 2b106752b1b8..267251a394fa 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -42,6 +42,18 @@ void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
 bool dax_synchronous(struct dax_device *dax_dev);
 void set_dax_synchronous(struct dax_device *dax_dev);
+/*
+ * Check if given mapping is supported by the file / underlying device.
+ */
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+   struct dax_device *dax_dev)
+{
+   if (!(vma->vm_flags & VM_SYNC))
+   return true;
+   if (!IS_DAX(file_inode(vma->vm_file)))
+   return false;
+   return dax_synchronous(dax_dev);
+}
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
@@ -69,6 +81,11 @@ static inline bool dax_write_cache_enabled(struct dax_device 
*dax_dev)
 {
return false;
 }
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+   struct dax_device *dax_dev)
+{
+   return !(vma->vm_flags & VM_SYNC);
+}
 #endif
 
 struct writeback_control;
-- 
2.20.1



[PATCH v9 4/7] dm: enable synchronous dax

2019-05-14 Thread Pankaj Gupta
 This patch sets dax device 'DAXDEV_SYNC' flag if all the target
 devices of device mapper support synchrononous DAX. If device
 mapper consists of both synchronous and asynchronous dax devices,
 we don't set 'DAXDEV_SYNC' flag. 

Signed-off-by: Pankaj Gupta 
---
 drivers/md/dm-table.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index cde3b49b2a91..1cce626ff576 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -886,10 +886,17 @@ static int device_supports_dax(struct dm_target *ti, 
struct dm_dev *dev,
return bdev_dax_supported(dev->bdev, PAGE_SIZE);
 }
 
+static int device_synchronous(struct dm_target *ti, struct dm_dev *dev,
+  sector_t start, sector_t len, void *data)
+{
+   return dax_synchronous(dev->dax_dev);
+}
+
 static bool dm_table_supports_dax(struct dm_table *t)
 {
struct dm_target *ti;
unsigned i;
+   bool dax_sync = true;
 
/* Ensure that all targets support DAX. */
for (i = 0; i < dm_table_get_num_targets(t); i++) {
@@ -901,7 +908,14 @@ static bool dm_table_supports_dax(struct dm_table *t)
if (!ti->type->iterate_devices ||
!ti->type->iterate_devices(ti, device_supports_dax, NULL))
return false;
+
+   /* Check devices support synchronous DAX */
+   if (dax_sync &&
+   !ti->type->iterate_devices(ti, device_synchronous, NULL))
+   dax_sync = false;
}
+   if (dax_sync)
+   set_dax_synchronous(t->md->dax_dev);
 
return true;
 }
-- 
2.20.1



[PATCH v9 3/7] libnvdimm: add dax_dev sync flag

2019-05-14 Thread Pankaj Gupta
This patch adds 'DAXDEV_SYNC' flag which is set
for nd_region doing synchronous flush. This later
is used to disable MAP_SYNC functionality for
ext4 & xfs filesystem for devices don't support
synchronous flush.

Signed-off-by: Pankaj Gupta 
---
 drivers/dax/bus.c|  2 +-
 drivers/dax/super.c  | 19 ++-
 drivers/md/dm.c  |  3 ++-
 drivers/nvdimm/pmem.c|  5 -
 drivers/nvdimm/region_devs.c |  7 +++
 include/linux/dax.h  |  9 +++--
 include/linux/libnvdimm.h|  1 +
 7 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 2109cfe80219..5f184e751c82 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -388,7 +388,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region 
*dax_region, int id,
 * No 'host' or dax_operations since there is no access to this
 * device outside of mmap of the resulting character device.
 */
-   dax_dev = alloc_dax(dev_dax, NULL, NULL);
+   dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
if (!dax_dev)
goto err;
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index bbd57ca0634a..93b3718b5cfa 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -186,6 +186,8 @@ enum dax_device_flags {
DAXDEV_ALIVE,
/* gate whether dax_flush() calls the low level flush routine */
DAXDEV_WRITE_CACHE,
+   /* flag to check if device supports synchronous flush */
+   DAXDEV_SYNC,
 };
 
 /**
@@ -354,6 +356,18 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+bool dax_synchronous(struct dax_device *dax_dev)
+{
+   return test_bit(DAXDEV_SYNC, _dev->flags);
+}
+EXPORT_SYMBOL_GPL(dax_synchronous);
+
+void set_dax_synchronous(struct dax_device *dax_dev)
+{
+   set_bit(DAXDEV_SYNC, _dev->flags);
+}
+EXPORT_SYMBOL_GPL(set_dax_synchronous);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
lockdep_assert_held(_srcu);
@@ -508,7 +522,7 @@ static void dax_add_host(struct dax_device *dax_dev, const 
char *host)
 }
 
 struct dax_device *alloc_dax(void *private, const char *__host,
-   const struct dax_operations *ops)
+   const struct dax_operations *ops, unsigned long flags)
 {
struct dax_device *dax_dev;
const char *host;
@@ -531,6 +545,9 @@ struct dax_device *alloc_dax(void *private, const char 
*__host,
dax_add_host(dax_dev, host);
dax_dev->ops = ops;
dax_dev->private = private;
+   if (flags & DAXDEV_F_SYNC)
+   set_dax_synchronous(dax_dev);
+
return dax_dev;
 
  err_dev:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 043f0761e4a0..ee007b75d9fd 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1969,7 +1969,8 @@ static struct mapped_device *alloc_dev(int minor)
sprintf(md->disk->disk_name, "dm-%d", minor);
 
if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
-   dax_dev = alloc_dax(md, md->disk->disk_name, _dax_ops);
+   dax_dev = alloc_dax(md, md->disk->disk_name, _dax_ops,
+DAXDEV_F_SYNC);
if (!dax_dev)
goto bad;
}
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 0279eb1da3ef..bdbd2b414d3d 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -365,6 +365,7 @@ static int pmem_attach_disk(struct device *dev,
struct gendisk *disk;
void *addr;
int rc;
+   unsigned long flags = 0UL;
 
pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
if (!pmem)
@@ -462,7 +463,9 @@ static int pmem_attach_disk(struct device *dev,
nvdimm_badblocks_populate(nd_region, >bb, _res);
disk->bb = >bb;
 
-   dax_dev = alloc_dax(pmem, disk->disk_name, _dax_ops);
+   if (is_nvdimm_sync(nd_region))
+   flags = DAXDEV_F_SYNC;
+   dax_dev = alloc_dax(pmem, disk->disk_name, _dax_ops, flags);
if (!dax_dev) {
put_disk(disk);
return -ENOMEM;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b4ef7d9ff22e..f3ea5369d20a 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1197,6 +1197,13 @@ int nvdimm_has_cache(struct nd_region *nd_region)
 }
 EXPORT_SYMBOL_GPL(nvdimm_has_cache);
 
+bool is_nvdimm_sync(struct nd_region *nd_region)
+{
+   return is_nd_pmem(_region->dev) &&
+   !test_bit(ND_REGION_ASYNC, _region->flags);
+}
+EXPORT_SYMBOL_GPL(is_nvdimm_sync);
+
 struct conflict_context {
struct nd_region *nd_region;
resource_size_t start, size;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0dd316a74a29..2b106752b1b8 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -7

[PATCH v9 2/7] virtio-pmem: Add virtio pmem driver

2019-05-14 Thread Pankaj Gupta
This patch adds virtio-pmem driver for KVM guest.

Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.

This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Yuval Shaia 
Acked-by: Michael S. Tsirkin 
---
 drivers/nvdimm/Makefile  |   1 +
 drivers/nvdimm/nd_virtio.c   | 126 +++
 drivers/nvdimm/virtio_pmem.c | 122 ++
 drivers/nvdimm/virtio_pmem.h |  60 +++
 drivers/virtio/Kconfig   |  11 +++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  10 +++
 7 files changed, 331 insertions(+)
 create mode 100644 drivers/nvdimm/nd_virtio.c
 create mode 100644 drivers/nvdimm/virtio_pmem.c
 create mode 100644 drivers/nvdimm/virtio_pmem.h
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 6f2a088afad6..cefe233e0b52 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_ND_BTT) += nd_btt.o
 obj-$(CONFIG_ND_BLK) += nd_blk.o
 obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
 obj-$(CONFIG_OF_PMEM) += of_pmem.o
+obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
 
 nd_pmem-y := pmem.o
 
diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
new file mode 100644
index ..dd95cfecfe4c
--- /dev/null
+++ b/drivers/nvdimm/nd_virtio.c
@@ -0,0 +1,126 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include "virtio_pmem.h"
+#include "nd.h"
+
+ /* The interrupt handler */
+void host_ack(struct virtqueue *vq)
+{
+   struct virtio_pmem *vpmem = vq->vdev->priv;
+   struct virtio_pmem_request *req, *req_buf;
+   unsigned long flags;
+   unsigned int len;
+
+   spin_lock_irqsave(>pmem_lock, flags);
+   while ((req = virtqueue_get_buf(vq, )) != NULL) {
+   req->done = true;
+   wake_up(>host_acked);
+
+   if (!list_empty(>req_list)) {
+   req_buf = list_first_entry(>req_list,
+   struct virtio_pmem_request, list);
+   req_buf->wq_buf_avail = true;
+   wake_up(_buf->wq_buf);
+   list_del(_buf->list);
+   }
+   }
+   spin_unlock_irqrestore(>pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(host_ack);
+
+ /* The request submission function */
+int virtio_pmem_flush(struct nd_region *nd_region)
+{
+   struct virtio_device *vdev = nd_region->provider_data;
+   struct virtio_pmem *vpmem = vdev->priv;
+   struct scatterlist *sgs[2], sg, ret;
+   struct virtio_pmem_request *req;
+   unsigned long flags;
+   int err, err1;
+
+   might_sleep();
+   req = kmalloc(sizeof(*req), GFP_KERNEL);
+   if (!req)
+   return -ENOMEM;
+
+   req->done = false;
+   strcpy(req->name, "FLUSH");
+   init_waitqueue_head(>host_acked);
+   init_waitqueue_head(>wq_buf);
+   INIT_LIST_HEAD(>list);
+   sg_init_one(, req->name, strlen(req->name));
+   sgs[0] = 
+   sg_init_one(, >ret, sizeof(req->ret));
+   sgs[1] = 
+
+   spin_lock_irqsave(>pmem_lock, flags);
+/*
+ * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual
+ * queue does not have free descriptor. We add the request
+ * to req_list and wait for host_ack to wake us up when free
+ * slots are available.
+ */
+   while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req,
+   GFP_ATOMIC)) == -ENOSPC) {
+
+   dev_err(>dev, "failed to send command to virtio pmem" \
+   "device, no free slots in the virtqueue\n");
+   req->wq_buf_avail = false;
+   list_add_tail(>list, >req_list);
+   spin_unlock_irqrestore(>pmem_lock, flags);
+
+   /* A host response results in "host_ack" getting called */
+   wait_event(req->wq_buf, req->wq_buf_avail);
+   spin_lock_irqsave(>pmem_lock, flags);
+   }
+   err1 = virtqueue_kick(vpmem->req_vq);
+   spin_unlock_irqrestore(>pmem_lock, flags);
+

[PATCH v9 1/7] libnvdimm: nd_region flush callback support

2019-05-14 Thread Pankaj Gupta
This patch adds functionality to perform flush from guest
to host over VIRTIO. We are registering a callback based
on 'nd_region' type. virtio_pmem driver requires this special
flush function. For rest of the region types we are registering
existing flush function. Report error returned by host fsync
failure to userspace.

Signed-off-by: Pankaj Gupta 
---
 drivers/acpi/nfit/core.c |  4 ++--
 drivers/nvdimm/claim.c   |  6 --
 drivers/nvdimm/nd.h  |  1 +
 drivers/nvdimm/pmem.c| 13 -
 drivers/nvdimm/region_devs.c | 26 --
 include/linux/libnvdimm.h|  8 +++-
 6 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 5a389a4f4f65..08dde76cf459 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2434,7 +2434,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, 
unsigned int bw,
offset = to_interleave_offset(offset, mmio);
 
writeq(cmd, mmio->addr.base + offset);
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL);
 
if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
readq(mmio->addr.base + offset);
@@ -2483,7 +2483,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk 
*nfit_blk,
}
 
if (rw)
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL);
 
rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf469c7..13510bae1e6f 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
struct nd_namespace_io *nsio = to_nd_namespace_io(>dev);
unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
sector_t sector = offset >> 9;
-   int rc = 0;
+   int rc = 0, ret = 0;
 
if (unlikely(!size))
return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
}
 
memcpy_flushcache(nsio->addr + offset, buf, size);
-   nvdimm_flush(to_nd_region(ndns->dev.parent));
+   ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL);
+   if (ret)
+   rc = ret;
 
return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index a5ac3b240293..0c74d2428bd7 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -159,6 +159,7 @@ struct nd_region {
struct badblocks bb;
struct nd_interleave_set *nd_set;
struct nd_percpu_lane __percpu *lane;
+   int (*flush)(struct nd_region *nd_region, struct bio *bio);
struct nd_mapping mapping[0];
 };
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index bc2f700feef8..f719245da170 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, 
struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+   int ret = 0;
blk_status_t rc = 0;
bool do_acct;
unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
struct nd_region *nd_region = to_region(pmem);
 
if (bio->bi_opf & REQ_PREFLUSH)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio);
 
do_acct = nd_iostat_start(bio, );
bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
nd_iostat_end(bio, start);
 
if (bio->bi_opf & REQ_FUA)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio);
+
+   if (ret)
+   bio->bi_status = errno_to_blk_status(ret);
 
bio_endio(bio);
return BLK_QC_T_NONE;
@@ -469,7 +473,6 @@ static int pmem_attach_disk(struct device *dev,
}
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
pmem->dax_dev = dax_dev;
-
gendev = disk_to_dev(disk);
gendev->groups = pmem_attribute_groups;
 
@@ -527,14 +530,14 @@ static int nd_pmem_remove(struct device *dev)
sysfs_put(pmem->bb_state);
pmem->bb_state = NULL;
}
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL);
 
return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b

[PATCH v9 0/7] virtio pmem driver

2019-05-14 Thread Pankaj Gupta
amp; patch 5

Changes from PATCH v5: 
  Changes suggested in by - [Cornelia, Yuval]
- Remove assignment chaining in virtio driver
- Better error message and remove not required free
- Check nd_region before use

  Changes suggested by - [Jan Kara]
- dax_synchronous() for !CONFIG_DAX
- Correct 'daxdev_mapping_supported' comment and non-dax implementation

  Changes suggested by - [Dan Williams]
- Pass meaningful flag 'DAXDEV_F_SYNC' to alloc_dax
- Gate nvdimm_flush instead of additional async parameter
- Move block chaining logic to flush callback than common nvdimm_flush
- Use NULL flush callback for generic flush for better readability [Dan, Jan]

- Use virtio device id 27 from 25(already used) - [MST]

Changes from PATCH v4:
- Factor out MAP_SYNC supported functionality to a common helper
[Dave, Darrick, Jan]
- Comment, indentation and virtqueue_kick failure handle - Yuval Shaia

Changes from PATCH v3: 
- Use generic dax_synchronous() helper to check for DAXDEV_SYNC 
  flag - [Dan, Darrick, Jan]
- Add 'is_nvdimm_async' function
- Document page cache side channel attacks implications & 
  countermeasures - [Dave Chinner, Michael]

Changes from PATCH v2: 
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] 
- Use name 'virtio pmem' in place of 'fake dax' 

Changes from PATCH v1: 
- 0-day build test for build dependency on libnvdimm 

 Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem  
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Remove BSD license text with SPDX license text

- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow

Pankaj Gupta (7):
   libnvdimm: nd_region flush callback support
   virtio-pmem: Add virtio-pmem guest driver
   libnvdimm: add nd_region buffered dax_dev flag
   dax: check synchronous mapping is supported
   dm: dm: Enable synchronous dax
   ext4: disable map_sync for virtio pmem
   xfs: disable map_sync for virtio pmem

[1] https://lkml.org/lkml/2019/5/10/447
[2] https://lkml.org/lkml/2019/4/26/36
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html  
[5] https://lkml.org/lkml/2018/8/31/413
[6] https://marc.info/?l=linux-kernel=153572228719237=2 
[7] https://marc.info/?l=qemu-devel=153555721901824=2
[8] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html
[9] https://lkml.org/lkml/2019/1/9/1191

 drivers/acpi/nfit/core.c |4 -
 drivers/dax/bus.c|2 
 drivers/dax/super.c  |   19 +
 drivers/md/dm-table.c|   14 
 drivers/md/dm.c  |3 
 drivers/nvdimm/Makefile  |1 
 drivers/nvdimm/claim.c   |6 +
 drivers/nvdimm/nd.h  |1 
 drivers/nvdimm/nd_virtio.c   |  126 +++
 drivers/nvdimm/pmem.c|   18 +++--
 drivers/nvdimm/region_devs.c |   33 +-
 drivers/nvdimm/virtio_pmem.c |  122 +
 drivers/nvdimm/virtio_pmem.h |   60 ++
 drivers/virtio/Kconfig   |   11 +++
 fs/ext4/file.c   |   10 +--
 fs/xfs/xfs_file.c|9 +-
 include/linux/dax.h  |   26 +++-
 include/linux/libnvdimm.h|9 ++
 include/uapi/linux/virtio_ids.h  |1 
 include/uapi/linux/virtio_pmem.h |   10 +++
 20 files changed, 460 insertions(+), 25 deletions(-)
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v8 2/6] virtio-pmem: Add virtio pmem driver

2019-05-14 Thread Pankaj Gupta


> >>
> >> I think you should do the same here, vdev->priv is allocated in
> >> virtio_pmem_probe.
> >>
> >> But maybe I am missing something important here :)
> > 
> > Because virtio_balloon use "kzalloc" for allocation and needs to be freed.
> > But virtio pmem uses "devm_kzalloc" which takes care of automatically
> > deleting
> > the device memory when associated device is detached.
> 
> Hehe, thanks, that was the part that I was missing!

Thank you for the review.

Best regards,
Pankaj
> 
> --
> 
> Thanks,
> 
> David / dhildenb
> 


Re: [PATCH v8 2/6] virtio-pmem: Add virtio pmem driver

2019-05-14 Thread Pankaj Gupta


> 
> >>
> >>> + }
> >>> +
> >>> + /* When host has read buffer, this completes via host_ack */
> >>
> >> "A host repsonse results in "host_ack" getting called" ... ?
> >>
> >>> + wait_event(req->host_acked, req->done);
> >>> + err = req->ret;
> >>> +ret:
> >>> + kfree(req);
> >>> + return err;
> >>> +};
> >>> +
> >>> +/* The asynchronous flush callback function */
> >>> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> >>> +{
> >>> + int rc = 0;
> >>> +
> >>> + /* Create child bio for asynchronous flush and chain with
> >>> +  * parent bio. Otherwise directly call nd_region flush.
> >>> +  */
> >>> + if (bio && bio->bi_iter.bi_sector != -1) {
> >>> + struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> >>> +
> >>> + if (!child)
> >>> + return -ENOMEM;
> >>> + bio_copy_dev(child, bio);
> >>> + child->bi_opf = REQ_PREFLUSH;
> >>> + child->bi_iter.bi_sector = -1;
> >>> + bio_chain(child, bio);
> >>> + submit_bio(child);
> >>
> >> return 0;
> >>
> >> Then, drop the "else" case and "int rc" and do directly
> >>
> >> if (virtio_pmem_flush(nd_region))
> >>return -EIO;
> > 
> > and another 'return 0' here :)
> > 
> > I don't like return from multiple places instead I prefer
> > single exit point from function.
> 
> Makes this function more complicated than necessary. I agree when there
> are locks involved.

o.k. I will change as you suggest :)

> 
> >  
> >>
> >>> +
> >>> + return 0;
> >>> +};
> >>> +
> >>> +static int virtio_pmem_probe(struct virtio_device *vdev)
> >>> +{
> >>> + int err = 0;
> >>> + struct resource res;
> >>> + struct virtio_pmem *vpmem;
> >>> + struct nd_region_desc ndr_desc = {};
> >>> + int nid = dev_to_node(>dev);
> >>> + struct nd_region *nd_region;
> >>
> >> Nit: use reverse Christmas tree layout :)
> > 
> > Done.
> > 
> >>
> >>> +
> >>> + if (!vdev->config->get) {
> >>> + dev_err(>dev, "%s failure: config access disabled\n",
> >>> + __func__);
> >>> + return -EINVAL;
> >>> + }
> >>> +
> >>> + vpmem = devm_kzalloc(>dev, sizeof(*vpmem), GFP_KERNEL);
> >>> + if (!vpmem) {
> >>> + err = -ENOMEM;
> >>> + goto out_err;
> >>> + }
> >>> +
> >>> + vpmem->vdev = vdev;
> >>> + vdev->priv = vpmem;
> >>> + err = init_vq(vpmem);
> >>> + if (err)
> >>> + goto out_err;
> >>> +
> >>> + virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> >>> + start, >start);
> >>> + virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> >>> + size, >size);
> >>> +
> >>> + res.start = vpmem->start;
> >>> + res.end   = vpmem->start + vpmem->size-1;
> >>> + vpmem->nd_desc.provider_name = "virtio-pmem";
> >>> + vpmem->nd_desc.module = THIS_MODULE;
> >>> +
> >>> + vpmem->nvdimm_bus = nvdimm_bus_register(>dev,
> >>> + >nd_desc);
> >>> + if (!vpmem->nvdimm_bus)
> >>> + goto out_vq;
> >>> +
> >>> + dev_set_drvdata(>dev, vpmem->nvdimm_bus);
> >>> +
> >>> + ndr_desc.res = 
> >>> + ndr_desc.numa_node = nid;
> >>> + ndr_desc.flush = async_pmem_flush;
> >>> + set_bit(ND_REGION_PAGEMAP, _desc.flags);
> >>> + set_bit(ND_REGION_ASYNC, _desc.flags);
> >>> + nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, _desc);
> >>> +
> >>
> >> I'd drop this empty line.
> > 
> > hmm.
> > 
> 
> The common pattern after allocating something, immediately check for it
> in the next line (like you do throughout this patch ;) )

Right. But rare times when I see space will beauty the code I tend to
add it. Maybe I should not :)

> 
> ...
> >> You are not freeing "vdev->priv".
> > 
> > vdev->priv is vpmem which is allocated using devm API.
> 
> I'm confused. Looking at drivers/virtio/virtio_balloon.c:
> 
> static void virtballoon_remove(struct virtio_device *vdev)
> {
>   struct virtio_balloon *vb = vdev->priv;
> 
>   ...
> 
>   kfree(vb);
> }
> 
> I think you should do the same here, vdev->priv is allocated in
> virtio_pmem_probe.
> 
> But maybe I am missing something important here :)

Because virtio_balloon use "kzalloc" for allocation and needs to be freed. 
But virtio pmem uses "devm_kzalloc" which takes care of automatically deleting 
the device memory when associated device is detached.

Thanks,
Pankaj
> 
> >>
> >>> + nvdimm_bus_unregister(nvdimm_bus);
> >>> + vdev->config->del_vqs(vdev);
> >>> + vdev->config->reset(vdev);
> >>> +}
> >>> +
> 
> --
> 
> Thanks,
> 
> David / dhildenb
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 2/6] virtio-pmem: Add virtio pmem driver

2019-05-14 Thread Pankaj Gupta


Hi David,

Thank you for the review.

> On 10.05.19 17:51, Pankaj Gupta wrote:
> > This patch adds virtio-pmem driver for KVM guest.
> > 
> > Guest reads the persistent memory range information from
> > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > creates a nd_region object with the persistent memory
> > range information so that existing 'nvdimm/pmem' driver
> > can reserve this into system memory map. This way
> > 'virtio-pmem' driver uses existing functionality of pmem
> > driver to register persistent memory compatible for DAX
> > capable filesystems.
> > 
> > This also provides function to perform guest flush over
> > VIRTIO from 'pmem' driver when userspace performs flush
> > on DAX memory range.
> > 
> > Signed-off-by: Pankaj Gupta 
> > Reviewed-by: Yuval Shaia 
> > ---
> >  drivers/nvdimm/Makefile  |   1 +
> >  drivers/nvdimm/nd_virtio.c   | 129 +++
> >  drivers/nvdimm/virtio_pmem.c | 117 
> >  drivers/virtio/Kconfig   |  10 +++
> >  include/linux/virtio_pmem.h  |  60 ++
> >  include/uapi/linux/virtio_ids.h  |   1 +
> >  include/uapi/linux/virtio_pmem.h |  10 +++
> >  7 files changed, 328 insertions(+)
> >  create mode 100644 drivers/nvdimm/nd_virtio.c
> >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> >  create mode 100644 include/linux/virtio_pmem.h
> >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > 
> > diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
> > index 6f2a088afad6..cefe233e0b52 100644
> > --- a/drivers/nvdimm/Makefile
> > +++ b/drivers/nvdimm/Makefile
> > @@ -5,6 +5,7 @@ obj-$(CONFIG_ND_BTT) += nd_btt.o
> >  obj-$(CONFIG_ND_BLK) += nd_blk.o
> >  obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
> >  obj-$(CONFIG_OF_PMEM) += of_pmem.o
> > +obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
> >  
> >  nd_pmem-y := pmem.o
> >  
> > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> > new file mode 100644
> > index ..ed7ddcc5a62c
> > --- /dev/null
> > +++ b/drivers/nvdimm/nd_virtio.c
> > @@ -0,0 +1,129 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + */
> > +#include 
> > +#include "nd.h"
> > +
> > + /* The interrupt handler */
> > +void host_ack(struct virtqueue *vq)
> > +{
> > +   unsigned int len;
> > +   unsigned long flags;
> > +   struct virtio_pmem_request *req, *req_buf;
> > +   struct virtio_pmem *vpmem = vq->vdev->priv;
> 
> Nit: use reverse Christmas tree layout :)

o.k

> 
> > +
> > +   spin_lock_irqsave(>pmem_lock, flags);
> > +   while ((req = virtqueue_get_buf(vq, )) != NULL) {
> > +   req->done = true;
> > +   wake_up(>host_acked);
> > +
> > +   if (!list_empty(>req_list)) {
> > +   req_buf = list_first_entry(>req_list,
> > +   struct virtio_pmem_request, list);
> > +   req_buf->wq_buf_avail = true;
> > +   wake_up(_buf->wq_buf);
> > +   list_del(_buf->list);
> > +   }
> > +   }
> > +   spin_unlock_irqrestore(>pmem_lock, flags);
> > +}
> > +EXPORT_SYMBOL_GPL(host_ack);
> > +
> > + /* The request submission function */
> > +int virtio_pmem_flush(struct nd_region *nd_region)
> > +{
> > +   int err, err1;
> > +   unsigned long flags;
> > +   struct scatterlist *sgs[2], sg, ret;
> > +   struct virtio_device *vdev = nd_region->provider_data;
> > +   struct virtio_pmem *vpmem = vdev->priv;
> > +   struct virtio_pmem_request *req;
> 
> Nit: use reverse Christmas tree layout :)

o.k

> 
> > +
> > +   might_sleep();
> > +   req = kmalloc(sizeof(*req), GFP_KERNEL);
> > +   if (!req)
> > +   return -ENOMEM;
> > +
> > +   req->done = false;
> > +   strcpy(req->name, "FLUSH");
> > +   init_waitqueue_head(>host_acked);
> > +   init_waitqueue_head(>wq_buf);
> > +   INIT_LIST_HEAD(>list);
> > +   sg_init_one(, req->name, strlen(req->name));
> > +   sgs[0] = 
> > +   sg_init_one(, >ret, sizeof(req->ret));
> > +   sgs[1] = 
> > +
> > +   spin_loc

Re: [Qemu-devel] [PATCH v8 3/6] libnvdimm: add dax_dev sync flag

2019-05-13 Thread Pankaj Gupta


> >
> >
> > Hi Dan,
> >
> > While testing device mapper with DAX, I faced a bug with the commit:
> >
> > commit ad428cdb525a97d15c0349fdc80f3d58befb50df
> > Author: Dan Williams 
> > Date:   Wed Feb 20 21:12:50 2019 -0800
> >
> > When I reverted the condition to old code[1] it worked for me. I
> > am thinking when we map two different devices (e.g with device mapper),
> > will
> > start & end pfn still point to same pgmap? Or there is something else which
> > I am missing here.
> >
> > Note: I tested only EXT4.
> >
> > [1]
> >
> > -   if (pgmap && pgmap->type == MEMORY_DEVICE_FS_DAX)
> > +   end_pgmap = get_dev_pagemap(pfn_t_to_pfn(end_pfn), NULL);
> > +   if (pgmap && pgmap == end_pgmap && pgmap->type ==
> > MEMORY_DEVICE_FS_DAX
> > +   && pfn_t_to_page(pfn)->pgmap == pgmap
> > +   && pfn_t_to_page(end_pfn)->pgmap == pgmap
> > +   && pfn_t_to_pfn(pfn) ==
> > PHYS_PFN(__pa(kaddr))
> > +   && pfn_t_to_pfn(end_pfn) ==
> > PHYS_PFN(__pa(end_kaddr)))
> 
> Ugh, yes, device-mapper continues to be an awkward fit for dax (or
> vice versa). We would either need a way to have a multi-level pfn to
> pagemap lookup for composite devices, or a way to discern that even
> though the pagemap is different that the result is still valid / not
> an indication that we have leaked into an unassociated address range.
> Perhaps a per-daxdev callback for ->dax_supported() so that
> device-mapper internals can be used for this validation.

Yes, Will look at it.

> 
> We need to get that fixed up, but I don't see it as a blocker /
> pre-requisite for virtio-pmem.

Agree. Will send virtio-pmem patch series.

Thank you,
Pankaj
> 
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Qemu-devel] [PATCH v8 3/6] libnvdimm: add dax_dev sync flag

2019-05-13 Thread Pankaj Gupta


Hi Dan,

While testing device mapper with DAX, I faced a bug with the commit:

commit ad428cdb525a97d15c0349fdc80f3d58befb50df
Author: Dan Williams 
Date:   Wed Feb 20 21:12:50 2019 -0800

When I reverted the condition to old code[1] it worked for me. I 
am thinking when we map two different devices (e.g with device mapper), will 
start & end pfn still point to same pgmap? Or there is something else which
I am missing here. 

Note: I tested only EXT4.

[1]

-   if (pgmap && pgmap->type == MEMORY_DEVICE_FS_DAX)
+   end_pgmap = get_dev_pagemap(pfn_t_to_pfn(end_pfn), NULL);
+   if (pgmap && pgmap == end_pgmap && pgmap->type == 
MEMORY_DEVICE_FS_DAX
+   && pfn_t_to_page(pfn)->pgmap == pgmap
+   && pfn_t_to_page(end_pfn)->pgmap == pgmap
+   && pfn_t_to_pfn(pfn) == PHYS_PFN(__pa(kaddr))
+   && pfn_t_to_pfn(end_pfn) == 
PHYS_PFN(__pa(end_kaddr)))
dax_enabled = true;
put_dev_pagemap(pgmap);

Thanks,
Pankaj




___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Qemu-devel] [PATCH v8 2/6] virtio-pmem: Add virtio pmem driver

2019-05-12 Thread Pankaj Gupta


> > Guest reads the persistent memory range information from
> > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > creates a nd_region object with the persistent memory
> > range information so that existing 'nvdimm/pmem' driver
> > can reserve this into system memory map. This way
> > 'virtio-pmem' driver uses existing functionality of pmem
> > driver to register persistent memory compatible for DAX
> > capable filesystems.
> > 
> > This also provides function to perform guest flush over
> > VIRTIO from 'pmem' driver when userspace performs flush
> > on DAX memory range.
> > 
> > Signed-off-by: Pankaj Gupta 
> > Reviewed-by: Yuval Shaia 
> 
> Acked-by: Michael S. Tsirkin 

Thank you, Michael.

Best regards,
Pankaj

> 
> > ---
> >  drivers/nvdimm/Makefile  |   1 +
> >  drivers/nvdimm/nd_virtio.c   | 129 +++
> >  drivers/nvdimm/virtio_pmem.c | 117 
> >  drivers/virtio/Kconfig   |  10 +++
> >  include/linux/virtio_pmem.h  |  60 ++
> >  include/uapi/linux/virtio_ids.h  |   1 +
> >  include/uapi/linux/virtio_pmem.h |  10 +++
> >  7 files changed, 328 insertions(+)
> >  create mode 100644 drivers/nvdimm/nd_virtio.c
> >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> >  create mode 100644 include/linux/virtio_pmem.h
> >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > 
> > diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
> > index 6f2a088afad6..cefe233e0b52 100644
> > --- a/drivers/nvdimm/Makefile
> > +++ b/drivers/nvdimm/Makefile
> > @@ -5,6 +5,7 @@ obj-$(CONFIG_ND_BTT) += nd_btt.o
> >  obj-$(CONFIG_ND_BLK) += nd_blk.o
> >  obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
> >  obj-$(CONFIG_OF_PMEM) += of_pmem.o
> > +obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
> >  
> >  nd_pmem-y := pmem.o
> >  
> > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> > new file mode 100644
> > index ..ed7ddcc5a62c
> > --- /dev/null
> > +++ b/drivers/nvdimm/nd_virtio.c
> > @@ -0,0 +1,129 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + */
> > +#include 
> > +#include "nd.h"
> > +
> > + /* The interrupt handler */
> > +void host_ack(struct virtqueue *vq)
> > +{
> > +   unsigned int len;
> > +   unsigned long flags;
> > +   struct virtio_pmem_request *req, *req_buf;
> > +   struct virtio_pmem *vpmem = vq->vdev->priv;
> > +
> > +   spin_lock_irqsave(>pmem_lock, flags);
> > +   while ((req = virtqueue_get_buf(vq, )) != NULL) {
> > +   req->done = true;
> > +   wake_up(>host_acked);
> > +
> > +   if (!list_empty(>req_list)) {
> > +   req_buf = list_first_entry(>req_list,
> > +   struct virtio_pmem_request, list);
> > +   req_buf->wq_buf_avail = true;
> > +   wake_up(_buf->wq_buf);
> > +   list_del(_buf->list);
> > +   }
> > +   }
> > +   spin_unlock_irqrestore(>pmem_lock, flags);
> > +}
> > +EXPORT_SYMBOL_GPL(host_ack);
> > +
> > + /* The request submission function */
> > +int virtio_pmem_flush(struct nd_region *nd_region)
> > +{
> > +   int err, err1;
> > +   unsigned long flags;
> > +   struct scatterlist *sgs[2], sg, ret;
> > +   struct virtio_device *vdev = nd_region->provider_data;
> > +   struct virtio_pmem *vpmem = vdev->priv;
> > +   struct virtio_pmem_request *req;
> > +
> > +   might_sleep();
> > +   req = kmalloc(sizeof(*req), GFP_KERNEL);
> > +   if (!req)
> > +   return -ENOMEM;
> > +
> > +   req->done = false;
> > +   strcpy(req->name, "FLUSH");
> > +   init_waitqueue_head(>host_acked);
> > +   init_waitqueue_head(>wq_buf);
> > +   INIT_LIST_HEAD(>list);
> > +   sg_init_one(, req->name, strlen(req->name));
> > +   sgs[0] = 
> > +   sg_init_one(, >ret, sizeof(req->ret));
> > +   sgs[1] = 
> > +
> > +   spin_lock_irqsave(>pmem_lock, flags);
> > +/*
> > + * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual
> > + * queue does not have free descri

Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver

2019-05-10 Thread Pankaj Gupta


> >
> > Hi Dan,
> >
> > Thank you for the review. Please see my reply inline.
> >
> > >
> > > Hi Pankaj,
> > >
> > > Some minor file placement comments below.
> >
> > Sure.
> >
> > >
> > > On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta  wrote:
> > > >
> > > > This patch adds virtio-pmem driver for KVM guest.
> > > >
> > > > Guest reads the persistent memory range information from
> > > > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > > > creates a nd_region object with the persistent memory
> > > > range information so that existing 'nvdimm/pmem' driver
> > > > can reserve this into system memory map. This way
> > > > 'virtio-pmem' driver uses existing functionality of pmem
> > > > driver to register persistent memory compatible for DAX
> > > > capable filesystems.
> > > >
> > > > This also provides function to perform guest flush over
> > > > VIRTIO from 'pmem' driver when userspace performs flush
> > > > on DAX memory range.
> > > >
> > > > Signed-off-by: Pankaj Gupta 
> > > > ---
> > > >  drivers/nvdimm/virtio_pmem.c | 114 +
> > > >  drivers/virtio/Kconfig   |  10 +++
> > > >  drivers/virtio/Makefile  |   1 +
> > > >  drivers/virtio/pmem.c| 118 +++
> > > >  include/linux/virtio_pmem.h  |  60 
> > > >  include/uapi/linux/virtio_ids.h  |   1 +
> > > >  include/uapi/linux/virtio_pmem.h |  10 +++
> > > >  7 files changed, 314 insertions(+)
> > > >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> > > >  create mode 100644 drivers/virtio/pmem.c
> > > >  create mode 100644 include/linux/virtio_pmem.h
> > > >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > > >
> > > > diff --git a/drivers/nvdimm/virtio_pmem.c
> > > > b/drivers/nvdimm/virtio_pmem.c
> > > > new file mode 100644
> > > > index ..66b582f751a3
> > > > --- /dev/null
> > > > +++ b/drivers/nvdimm/virtio_pmem.c
> > > > @@ -0,0 +1,114 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/*
> > > > + * virtio_pmem.c: Virtio pmem Driver
> > > > + *
> > > > + * Discovers persistent memory range information
> > > > + * from host and provides a virtio based flushing
> > > > + * interface.
> > > > + */
> > > > +#include 
> > > > +#include "nd.h"
> > > > +
> > > > + /* The interrupt handler */
> > > > +void host_ack(struct virtqueue *vq)
> > > > +{
> > > > +   unsigned int len;
> > > > +   unsigned long flags;
> > > > +   struct virtio_pmem_request *req, *req_buf;
> > > > +   struct virtio_pmem *vpmem = vq->vdev->priv;
> > > > +
> > > > +   spin_lock_irqsave(>pmem_lock, flags);
> > > > +   while ((req = virtqueue_get_buf(vq, )) != NULL) {
> > > > +   req->done = true;
> > > > +   wake_up(>host_acked);
> > > > +
> > > > +   if (!list_empty(>req_list)) {
> > > > +   req_buf = list_first_entry(>req_list,
> > > > +   struct virtio_pmem_request,
> > > > list);
> > > > +   list_del(>req_list);
> > > > +   req_buf->wq_buf_avail = true;
> > > > +   wake_up(_buf->wq_buf);
> > > > +   }
> > > > +   }
> > > > +   spin_unlock_irqrestore(>pmem_lock, flags);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(host_ack);
> > > > +
> > > > + /* The request submission function */
> > > > +int virtio_pmem_flush(struct nd_region *nd_region)
> > > > +{
> > > > +   int err;
> > > > +   unsigned long flags;
> > > > +   struct scatterlist *sgs[2], sg, ret;
> > > > +   struct virtio_device *vdev = nd_region->provider_data;
> > > > +   struct virtio_pmem *vpmem = vdev->priv;
> > > > +   struct virtio_pmem_request *req;
> > > > +
> > > > +   might_sleep();
> > >

Re: [Qemu-devel] [PATCH v8 3/6] libnvdimm: add dax_dev sync flag

2019-05-10 Thread Pankaj Gupta


> > > >
> > > > This patch adds 'DAXDEV_SYNC' flag which is set
> > > > for nd_region doing synchronous flush. This later
> > > > is used to disable MAP_SYNC functionality for
> > > > ext4 & xfs filesystem for devices don't support
> > > > synchronous flush.
> > > >
> > > > Signed-off-by: Pankaj Gupta 
> > > > ---
> > > >  drivers/dax/bus.c|  2 +-
> > > >  drivers/dax/super.c  | 13 -
> > > >  drivers/md/dm.c  |  3 ++-
> > > >  drivers/nvdimm/pmem.c|  5 -
> > > >  drivers/nvdimm/region_devs.c |  7 +++
> > > >  include/linux/dax.h  |  8 ++--
> > > >  include/linux/libnvdimm.h|  1 +
> > > >  7 files changed, 33 insertions(+), 6 deletions(-)
> > > [..]
> > > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > > > index 043f0761e4a0..ee007b75d9fd 100644
> > > > --- a/drivers/md/dm.c
> > > > +++ b/drivers/md/dm.c
> > > > @@ -1969,7 +1969,8 @@ static struct mapped_device *alloc_dev(int minor)
> > > > sprintf(md->disk->disk_name, "dm-%d", minor);
> > > >
> > > > if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
> > > > -   dax_dev = alloc_dax(md, md->disk->disk_name,
> > > > _dax_ops);
> > > > +   dax_dev = alloc_dax(md, md->disk->disk_name,
> > > > _dax_ops,
> > > > +
> > > > DAXDEV_F_SYNC);
> > >
> > > Apologies for not realizing this until now, but this is broken.
> > > Imaging a device-mapper configuration composed of both 'async'
> > > virtio-pmem and 'sync' pmem. The 'sync' flag needs to be unified
> > > across all members. I would change this argument to '0' and then
> > > arrange for it to be set at dm_table_supports_dax() time after
> > > validating that all components support synchronous dax.
> >
> > o.k. Need to set 'DAXDEV_F_SYNC' flag after verifying all the target
> > components support synchronous DAX.
> >
> > Just a question, If device mapper configuration have composed of both
> > virtio-pmem or pmem devices, we want to configure device mapper for async
> > flush?
> 
> If it's composed of both then, yes, it needs to be async flush at the
> device-mapper level. Otherwise MAP_SYNC will succeed and fail to
> trigger fsync on the host file when necessary for the virtio-pmem
> backed portion of the device-mapper device.

o.k. Agree.

Thanks you,
Pankaj
> 
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 3/6] libnvdimm: add dax_dev sync flag

2019-05-10 Thread Pankaj Gupta



> >
> > This patch adds 'DAXDEV_SYNC' flag which is set
> > for nd_region doing synchronous flush. This later
> > is used to disable MAP_SYNC functionality for
> > ext4 & xfs filesystem for devices don't support
> > synchronous flush.
> >
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  drivers/dax/bus.c|  2 +-
> >  drivers/dax/super.c  | 13 -
> >  drivers/md/dm.c  |  3 ++-
> >  drivers/nvdimm/pmem.c|  5 -
> >  drivers/nvdimm/region_devs.c |  7 +++
> >  include/linux/dax.h  |  8 ++--
> >  include/linux/libnvdimm.h|  1 +
> >  7 files changed, 33 insertions(+), 6 deletions(-)
> [..]
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index 043f0761e4a0..ee007b75d9fd 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -1969,7 +1969,8 @@ static struct mapped_device *alloc_dev(int minor)
> > sprintf(md->disk->disk_name, "dm-%d", minor);
> >
> > if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
> > -   dax_dev = alloc_dax(md, md->disk->disk_name, _dax_ops);
> > +   dax_dev = alloc_dax(md, md->disk->disk_name, _dax_ops,
> > +DAXDEV_F_SYNC);
> 
> Apologies for not realizing this until now, but this is broken.
> Imaging a device-mapper configuration composed of both 'async'
> virtio-pmem and 'sync' pmem. The 'sync' flag needs to be unified
> across all members. I would change this argument to '0' and then
> arrange for it to be set at dm_table_supports_dax() time after
> validating that all components support synchronous dax.

o.k. Need to set 'DAXDEV_F_SYNC' flag after verifying all the target
components support synchronous DAX.

Just a question, If device mapper configuration have composed of both 
virtio-pmem or pmem devices, we want to configure device mapper for async flush?

Thank you,
Pankaj 
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


  1   2   3   >