Re: [Xen-devel] [PATCH v2] xen/grant-table: correctly initialize grant table version 1

2013-01-08 Thread ANNIE LI

Thanks so much for posting this.

On 2013-1-6 19:14, Matt Wilson wrote:

Commit 85ff6acb075a484780b3d763fdf41596d8fc0970 (xen/granttable: Grant
tables V2 implementation) changed the GREFS_PER_GRANT_FRAME macro from
a constant to a conditional expression. The expression depends on
grant_table_version being appropriately set. Unfortunately, at init
time grant_table_version will be 0. The GREFS_PER_GRANT_FRAME
conditional expression checks for "grant_table_version == 1"; that
check fails while the version is still 0, so the expression yields the
number of grant references per frame for v2.

This causes gnttab_init() to allocate fewer pages for gnttab_list, as
a frame can hold only half as many v2 entries as v1 entries. After
gnttab_resume() is called, grant_table_version is appropriately
set. nr_init_grefs will then be miscalculated and gnttab_free_count
will hold a value larger than the actual number of free gref entries.

If a guest is heavily utilizing improperly initialized v1 grant
tables, memory corruption can occur. One common manifestation is
corruption of the vmalloc list, resulting in a poisoned pointer
dereference when accessing /proc/meminfo or /proc/vmallocinfo:

[   40.770064] BUG: unable to handle kernel paging request at 20021407
[   40.770083] IP: [] get_vmalloc_info+0x70/0x110
[   40.770102] PGD 0
[   40.770107] Oops:  [#1] SMP
[   40.770114] CPU 10

This patch introduces a static variable, grefs_per_grant_frame, to
cache the calculated value. gnttab_init() now calls
gnttab_request_version() early so that grant_table_version and
grefs_per_grant_frame can be appropriately set. A few BUG_ON()s have
been added to prevent this type of bug from reoccurring in the future.

Signed-off-by: Matt Wilson
Reviewed-and-Tested-by: Steven Noonan
Cc: Ian Campbell
Cc: Konrad Rzeszutek Wilk
Cc: Annie Li
Cc: xen-de...@lists.xen.org
Cc: linux-kernel@vger.kernel.org
Cc: sta...@vger.kernel.org # v3.3 and newer
---
Changes since v1:
* introduced a new gnttab_setup() function and moved all of the
   initialization code from gnttab_resume() there.
---
  drivers/xen/grant-table.c |   52 ++--
  1 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 043bf07..53715de 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -55,10 +55,6 @@
  /* External tools reserve first few grant table entries. */
  #define NR_RESERVED_ENTRIES 8
  #define GNTTAB_LIST_END 0x
-#define GREFS_PER_GRANT_FRAME \
-(grant_table_version == 1 ?  \
-(PAGE_SIZE / sizeof(struct grant_entry_v1)) :   \
-(PAGE_SIZE / sizeof(union grant_entry_v2)))

  static grant_ref_t **gnttab_list;
  static unsigned int nr_grant_frames;
@@ -153,6 +149,7 @@ static struct gnttab_ops *gnttab_interface;
  static grant_status_t *grstatus;

  static int grant_table_version;
+static int grefs_per_grant_frame;

  static struct gnttab_free_callback *gnttab_free_callback_list;

@@ -766,12 +763,14 @@ static int grow_gnttab_list(unsigned int more_frames)
unsigned int new_nr_grant_frames, extra_entries, i;
unsigned int nr_glist_frames, new_nr_glist_frames;

+   BUG_ON(grefs_per_grant_frame == 0);
+
new_nr_grant_frames = nr_grant_frames + more_frames;
-   extra_entries   = more_frames * GREFS_PER_GRANT_FRAME;
+   extra_entries   = more_frames * grefs_per_grant_frame;

-   nr_glist_frames = (nr_grant_frames * GREFS_PER_GRANT_FRAME + RPP - 1) / RPP;
+   nr_glist_frames = (nr_grant_frames * grefs_per_grant_frame + RPP - 1) / RPP;
new_nr_glist_frames =
-   (new_nr_grant_frames * GREFS_PER_GRANT_FRAME + RPP - 1) / RPP;
+   (new_nr_grant_frames * grefs_per_grant_frame + RPP - 1) / RPP;
for (i = nr_glist_frames; i < new_nr_glist_frames; i++) {
gnttab_list[i] = (grant_ref_t *)__get_free_page(GFP_ATOMIC);
if (!gnttab_list[i])
@@ -779,12 +778,12 @@ static int grow_gnttab_list(unsigned int more_frames)
}


-   for (i = GREFS_PER_GRANT_FRAME * nr_grant_frames;
-        i < GREFS_PER_GRANT_FRAME * new_nr_grant_frames - 1; i++)
+   for (i = grefs_per_grant_frame * nr_grant_frames;
+        i < grefs_per_grant_frame * new_nr_grant_frames - 1; i++)
gnttab_entry(i) = i + 1;

gnttab_entry(i) = gnttab_free_head;
-   gnttab_free_head = GREFS_PER_GRANT_FRAME * nr_grant_frames;
+   gnttab_free_head = grefs_per_grant_frame * nr_grant_frames;
gnttab_free_count += extra_entries;

nr_grant_frames = new_nr_grant_frames;
@@ -904,7 +903,8 @@ EXPORT_SYMBOL_GPL(gnttab_unmap_refs);

  static unsigned nr_status_frames(unsigned nr_grant_frames)
  {
-   return (nr_grant_frames * GREFS_PER_GRANT_FRAME + SPP - 1) / SPP;
+   BUG_ON(grefs_per_grant_frame == 0);
+   return (nr_grant_frames * grefs_per_grant_frame + SPP - 1) / SPP;
  }

  static int gnttab_map_frames_v

Re: [Xen-devel] [PATCH v2] xen/grant-table: correctly initialize grant table version 1

2013-01-09 Thread annie li



On 2013-1-9 20:02, Ian Campbell wrote:

On Wed, 2013-01-09 at 02:40 +, ANNIE LI wrote:
  

@@ -1080,18 +1081,18 @@ static void gnttab_request_version(void)
panic("we need grant tables version 2, but only version 1 is available");
} else {
grant_table_version = 1;
+   grefs_per_grant_frame = PAGE_SIZE / sizeof(struct grant_entry_v1);
gnttab_interface = &gnttab_v1_ops;
}
-   printk(KERN_INFO "Grant tables using version %d layout.\n",
-   grant_table_version);
  }
  
  
Is it better to keep the printk here? In your last patch you removed it 
because gnttab_init() called both gnttab_request_version() and 
gnttab_resume(), and gnttab_resume() also calls 
gnttab_request_version(), so the message would have been printed twice. 
But this patch uses gnttab_setup() instead, so it no longer has that 
issue.



Yes, I think we want to print this at both start of day and resume?
  

Right.

Either by adding a print to gnttab_resume()
Adding a print only into gnttab_resume() would miss the print at start 
of day: in gnttab_init(), gnttab_request_version() and gnttab_setup() 
are called, not gnttab_resume().

 or by keeping the existing
one here in preference to moving it to gnttab_setup(). I'd prefer the
latter to avoid the duplication, unless I'm mistaken and request_version
is called in more than those two locations.
  
Yes, I'd prefer the latter. gnttab_request_version() is only called in 
those two locations.


Thanks
Annie

Ian.


___
Xen-devel mailing list
xen-de...@lists.xen.org
http://lists.xen.org/xen-devel
  

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v2] xen/grant-table: correctly initialize grant table version 1

2013-01-10 Thread ANNIE LI



On 2013-1-10 17:16, Matt Wilson wrote:

On Wed, Jan 09, 2013 at 12:02:09PM +, Ian Campbell wrote:

On Wed, 2013-01-09 at 02:40 +, ANNIE LI wrote:

@@ -1080,18 +1081,18 @@ static void gnttab_request_version(void)
panic("we need grant tables version 2, but only version 1 is available");
} else {
grant_table_version = 1;
+   grefs_per_grant_frame = PAGE_SIZE / sizeof(struct grant_entry_v1);
gnttab_interface = &gnttab_v1_ops;
}
-   printk(KERN_INFO "Grant tables using version %d layout.\n",
-   grant_table_version);
   }


Is it better to keep the printk here? In your last patch you removed it
because gnttab_init() called both gnttab_request_version() and
gnttab_resume(), and gnttab_resume() also calls
gnttab_request_version(), so the message would have been printed twice.
But this patch uses gnttab_setup() instead, so it no longer has that
issue.

Yes, I think we want to print this at both start of day and resume?

Either by adding a print to gnttab_resume() or by keeping the existing
one here in preference to moving it to gnttab_setup(). I'd prefer the
latter to avoid the duplication, unless I'm mistaken and request_version
is called in more than those two locations.

In this version the printk() is in gnttab_setup(), which is called
both at start of day (gnttab_init()) and resume (gnttab_resume()). I
can move it back to gnttab_request_version() if you'd like, but this
patch should be behaving as expected already.


I was thinking to change as little of the original source code as 
possible; this patch moves the printk without a specific purpose, even 
though it shows the same behavior as the original code. It is OK with 
me if you'd like to keep it.


Thanks
Annie


Matt



[PATCH 1/1] xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect

2017-08-17 Thread Annie Li
If there is inflight I/O in any non-last queue, blkback returns -EBUSY
directly and never stops the threads of the remaining queues to process
them. When removing a vbd device under heavy disk I/O load, some queues
with inflight I/O still have a blkback thread running even though the
corresponding vbd device or guest is gone.
This can cause problems: for example, if the backend device type is
file, some loop devices and blkback threads linger forever after the
guest is destroyed, which makes unmounting the backing repositories fail
unless dom0 is rebooted. So stop all threads properly and return -EBUSY
if any queue has inflight I/O.

Signed-off-by: Annie Li 
Reviewed-by: Herbert van den Bergh 
Reviewed-by: Bhavesh Davda 
Reviewed-by: Adnan Misherfi 
---
 drivers/block/xen-blkback/xenbus.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 792da68..2adb859 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -244,6 +244,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 {
struct pending_req *req, *n;
unsigned int j, r;
+   bool busy = false;
 
for (r = 0; r < blkif->nr_rings; r++) {
struct xen_blkif_ring *ring = &blkif->rings[r];
@@ -261,8 +262,10 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 * don't have any discard_io or other_io requests. So, checking
 * for inflight IO is enough.
 */
-   if (atomic_read(&ring->inflight) > 0)
-   return -EBUSY;
+   if (atomic_read(&ring->inflight) > 0) {
+   busy = true;
+   continue;
+   }
 
if (ring->irq) {
unbind_from_irqhandler(ring->irq, ring);
@@ -300,6 +303,9 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
ring->active = false;
}
+   if (busy)
+   return -EBUSY;
+
blkif->nr_ring_pages = 0;
/*
 * blkif->rings was allocated in connect_ring, so we should free it in
-- 
1.9.3



Re: [Xen-devel] [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy

2013-12-15 Thread annie li


On 2013/12/13 7:48, Zoltan Kiss wrote:

A long-known problem of the upstream netback implementation is that on the
TX path (from guest to Dom0) it copies the whole packet from guest memory
into Dom0. That simply became a bottleneck with 10Gb NICs, and generally
it's a huge performance penalty. The classic kernel version of netback used
grant mapping, and to get notified when a page could be unmapped, it used
page destructors. Unfortunately that destructor is not an upstreamable
solution. Ian Campbell's skb fragment destructor patch series [1] tried to
solve this problem, but it seems to be very invasive on the network stack's
code, and therefore hasn't progressed very well.
This patch series uses the SKBTX_DEV_ZEROCOPY flag to tell the stack it
needs to know when the skb is freed. That is the way KVM solved the same
problem, and based on my initial tests it can do the same for us. Avoiding
the extra copy boosted TX throughput from 6.8 Gbps to 7.9 (I used a slower
Interlagos box, both Dom0 and guest on an upstream kernel, on the same NUMA
node, running iperf 2.0.5, and the remote end was a bare-metal box on the
same 10Gb switch).

Sounds good.
Was the TX throughput measured between one VM and one bare-metal box, or 
between multiple VMs and bare metal? Do you have any test results with 
netperf?


Thanks
Annie

Based on my investigation, the packet only gets copied if it is delivered
to the Dom0 stack, which is due to this [2] patch. That's a bit
unfortunate, but luckily it doesn't cause a major regression for this use
case. In the future we should try to eliminate that copy somehow.
There are a few spinoff tasks which will be addressed in separate patches:
- grant copy the header directly instead of map and memcpy. This should help
   us avoid TLB flushing
- use something else than ballooned pages
- fix grant map to use page->index properly
I will run some more extensive tests, but some basic XenRT tests have
already passed with good results.
I've tried to break it down into smaller patches, with mixed results, so I
welcome suggestions on that part as well:
1: Introduce TX grant map definitions
2: Change TX path from grant copy to mapping
3: Remove old TX grant copy definitions and fix indentations
4: Change RX path for mapped SKB fragments
5: Add stat counters for zerocopy
6: Handle guests with too many frags
7: Add stat counters for frag_list skbs
8: Timeout packets in RX path
9: Aggregate TX unmap operations

v2: I've fixed some smaller things, see the individual patches. I've added
a few new stat counters, and handling for the important use case when an
older guest sends lots of slots. Instead of delayed copy we now time out
packets on the RX path, based on the assumption that otherwise packets
shouldn't get stuck anywhere else. Finally, some unmap batching to avoid
too much TLB flushing.

[1] http://lwn.net/Articles/491522/
[2] https://lkml.org/lkml/2012/7/20/363

Signed-off-by: Zoltan Kiss 




[PATCH v2 1/1] xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect

2017-08-24 Thread Annie Li
In xen_blkif_disconnect(), before checking inflight I/O, the following
code stops the blkback thread:
if (ring->xenblkd) {
kthread_stop(ring->xenblkd);
wake_up(&ring->shutdown_wq);
}
If there is inflight I/O in any non-last queue, blkback returns -EBUSY
directly, and the code above is never reached to stop the threads of the
remaining queues and process them. When removing a vbd device under heavy
disk I/O load, some queues with inflight I/O still have a blkback thread
running even though the corresponding vbd device or guest is gone.
This can cause problems: for example, if the backend device type is file,
some loop devices and blkback threads linger forever after the guest is
destroyed, which makes unmounting the backing repositories fail unless
dom0 is rebooted.
This patch gives the thread of every queue the chance to be stopped.
Otherwise, only the threads of the queues up to (and including) the first
busy one get stopped, while the threads of the remaining queues keep
running. So stop all threads properly and return -EBUSY if any queue has
inflight I/O.

Signed-off-by: Annie Li 
Reviewed-by: Herbert van den Bergh 
Reviewed-by: Bhavesh Davda 
Reviewed-by: Adnan Misherfi 
Acked-by: Roger Pau Monné 
---
  v2: enhance patch description and add Acked-by
---
 drivers/block/xen-blkback/xenbus.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 792da68..2adb859 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -244,6 +244,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 {
struct pending_req *req, *n;
unsigned int j, r;
+   bool busy = false;
 
for (r = 0; r < blkif->nr_rings; r++) {
struct xen_blkif_ring *ring = &blkif->rings[r];
@@ -261,8 +262,10 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 * don't have any discard_io or other_io requests. So, checking
 * for inflight IO is enough.
 */
-   if (atomic_read(&ring->inflight) > 0)
-   return -EBUSY;
+   if (atomic_read(&ring->inflight) > 0) {
+   busy = true;
+   continue;
+   }
 
if (ring->irq) {
unbind_from_irqhandler(ring->irq, ring);
@@ -300,6 +303,9 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
ring->active = false;
}
+   if (busy)
+   return -EBUSY;
+
blkif->nr_ring_pages = 0;
/*
 * blkif->rings was allocated in connect_ring, so we should free it in
-- 
1.9.3



Re: [Xen-devel] [PATCH 1/1] xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect

2017-08-24 Thread annie li


On 8/23/2017 5:20 PM, Konrad Rzeszutek Wilk wrote:

..snip..

Acked-by: Roger Pau Monné 

Forgot to add, this needs to be backported to stable branches, so:

Annie, could you resend the patch with the tags and an update
to the description to me please?

Done

Thanks
Annie

Cc: sta...@vger.kernel.org

Roger.





Re: [Xen-devel] [PATCH 1/1] xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect

2017-08-18 Thread annie li


On 8/18/2017 5:14 AM, Roger Pau Monné wrote:

On Thu, Aug 17, 2017 at 06:43:46PM -0400, Annie Li wrote:

If there is inflight I/O in any non-last queue, blkback returns -EBUSY
directly and never stops the threads of the remaining queues to process
them. When removing a vbd device under heavy disk I/O load, some queues
with inflight I/O still have a blkback thread running even though the
corresponding vbd device or guest is gone.
This can cause problems: for example, if the backend device type is
file, some loop devices and blkback threads linger forever after the
guest is destroyed, which makes unmounting the backing repositories fail
unless dom0 is rebooted. So stop all threads properly and return -EBUSY
if any queue has inflight I/O.

Signed-off-by: Annie Li 
Reviewed-by: Herbert van den Bergh 
Reviewed-by: Bhavesh Davda 
Reviewed-by: Adnan Misherfi 
---
  drivers/block/xen-blkback/xenbus.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 792da68..2adb859 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -244,6 +244,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
  {
struct pending_req *req, *n;
unsigned int j, r;
+   bool busy = false;
  
  	for (r = 0; r < blkif->nr_rings; r++) {

struct xen_blkif_ring *ring = &blkif->rings[r];
@@ -261,8 +262,10 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 * don't have any discard_io or other_io requests. So, checking
 * for inflight IO is enough.
 */
-   if (atomic_read(&ring->inflight) > 0)
-   return -EBUSY;
+   if (atomic_read(&ring->inflight) > 0) {
+   busy = true;
+   continue;
+   }

I guess I'm missing something, but I don't see how this is solving the
problem described in the description.

If the problem is that xen_blkif_disconnect returns without cleaning
all the queues, this patch keeps the current behavior, just that it
will try to remove more queues before returning, as opposed to
returning when finding the first busy queue.

Before checking inflight I/O, the following code stops the blkback thread:
if (ring->xenblkd) {
kthread_stop(ring->xenblkd);
wake_up(&ring->shutdown_wq);
}
This patch gives the thread of every queue the chance to be stopped. 
Otherwise, only the threads of the queues up to (and including) the 
first busy one get stopped; the threads of the remaining queues keep 
running, and those blkback threads and the corresponding loop devices 
linger forever even after the guest is destroyed.


Thanks
Annie