Re: [patch] Re: Interaction between Xen and XFS: stray RW mappings

2007-10-23 Thread Jeremy Fitzhardinge
David Chinner wrote:
> Nick, Jeremy, (others?) any objections to this approach to solve
> the problem?

Seems sound in principle.   I think Nick's shootdown-stray-mappings mm
API call is a better long-term answer, but this will do for now.  I'll
test it out today.

J


Re: [patch] Re: Interaction between Xen and XFS: stray RW mappings

2007-10-23 Thread David Chinner
On Tue, Oct 23, 2007 at 11:30:35AM +0200, Andi Kleen wrote:
> On Tue, Oct 23, 2007 at 05:04:14PM +1000, David Chinner wrote:
> > > You mean like vmap() could record the pages passed to it in the 
> > > area->pages
> > > array, and we walk and release them in __vunmap() like it already does
> > > for vfree()?
> > > 
> > > If we did this, it would probably be best to pass a page release function
> > > into the vmap or vunmap call - we'd need page_cache_release() called on
> > > the page rather than __free_page()
> > > 
> > > The solution belongs behind the vmap/vunmap interface, not in XFS
> > 
> > Lightly tested(*) patch that does this with lazy unmapping
> > below for comment.
> 
> Thanks
> 
> > 
> > (*) a) kernel boots, b) made an XFS filesystem with 64k directory
> > blocks, created ~100,000 files in a directory to get a wide btree
> > (~1700 blocks, still only a single level) and ran repeated finds
> > across it dropping caches in between.  Each traversal maps and
> > unmaps every btree block.
> 
> Hmm, the __free_page -> page_cache_release() change in vfree() would
> have been simpler, wouldn't it?

Yes, it would, but I tried to leave vmalloc doing the same thing as
it was before. I think that it would be safe simply to call put_page()
directly in the __vunmap() code and drop all the release function
passing, but I don't know enough about that code to say for certain.
I'll spin up another patch tomorrow that does this and see how it goes.
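For illustration, a minimal sketch of that simplification against the 2.6.x
__vunmap() page-release loop -- an assumption about what the follow-up patch
could look like, not the posted code:

	if (deallocate_pages) {
		int i;

		for (i = 0; i < area->nr_pages; i++) {
			BUG_ON(!area->pages[i]);
			/*
			 * was __free_page(); put_page() frees on the last
			 * reference, so vmalloc()ed pages behave as before
			 * while vmap()ed pagecache pages just drop the
			 * reference taken at map time
			 */
			put_page(area->pages[i]);
		}
		kfree(area->pages);
	}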

> But if it works it is fine.

Seems to - it's passed XFSQA with 64k directories and a bunch of
dirstress workouts as well.

Nick, Jeremy, (others?) any objections to this approach to solve
the problem?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [patch] Re: Interaction between Xen and XFS: stray RW mappings

2007-10-23 Thread Andi Kleen
On Tue, Oct 23, 2007 at 05:04:14PM +1000, David Chinner wrote:
> On Tue, Oct 23, 2007 at 10:36:41AM +1000, David Chinner wrote:
> > On Tue, Oct 23, 2007 at 01:35:14AM +0200, Andi Kleen wrote:
> > > On Tue, Oct 23, 2007 at 08:32:25AM +1000, David Chinner wrote:
> > > > Could vmap()/vunmap() take references to the pages that are mapped? That
> > > > way delaying the unmap would also delay the freeing of the pages and 
> > > > hence
> > > > we'd have no problems with the pages being reused before the mapping is
> > > > torn down. That'd work for Xen even with XFS's lazy unmapping scheme, 
> > > > and
> > > > would allow Nick's more advanced methods to work as well
> > > 
> > > You could always just keep around an array of the pages and then drop the
> > > reference count after unmap. Or walk the vmalloc mapping and generate such
> > > an array before freeing, then unmap and then drop the reference counts.
> > 
> > You mean like vmap() could record the pages passed to it in the area->pages
> > array, and we walk and release them in __vunmap() like it already does
> > for vfree()?
> > 
> > If we did this, it would probably be best to pass a page release function
> > into the vmap or vunmap call - we'd need page_cache_release() called on
> > the page rather than __free_page()
> > 
> > The solution belongs behind the vmap/vunmap interface, not in XFS
> 
> Lightly tested(*) patch that does this with lazy unmapping
> below for comment.

Thanks

> 
> (*) a) kernel boots, b) made an XFS filesystem with 64k directory
> blocks, created ~100,000 files in a directory to get a wide btree
> (~1700 blocks, still only a single level) and ran repeated finds
> across it dropping caches in between.  Each traversal maps and
> unmaps every btree block.

Hmm, the __free_page -> page_cache_release() change in vfree() would
have been simpler, wouldn't it?

But if it works it is fine.

-Andi


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-23 Thread Andi Kleen
On Tue, Oct 23, 2007 at 10:36:41AM +1000, David Chinner wrote:
> > That doesn't mean it is correct.
> 
> Right, but it also points to the fact that it's not causing problems
> for 99.999% of ppl out there.

So you're waiting for someone to take months to debug this again? 

> You mean like vmap() could record the pages passed to it in the area->pages
> array, 

The page tables contain pointers to the pages anyways. vunmap() has to walk 
them.  It would not be very difficult to store them in an array during
the walk.
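A sketch of that walk (a hypothetical helper built on vmalloc_to_page(), not
code from this thread): generate the array while the mapping still exists, so
the refcounts can be dropped after the unmap.

	static struct page **collect_vmap_pages(void *addr, unsigned int count,
						gfp_t gfp_mask)
	{
		struct page **pages;
		unsigned int i;

		pages = kmalloc(count * sizeof(*pages), gfp_mask);
		if (!pages)
			return NULL;
		/* resolve each virtual page back to its struct page */
		for (i = 0; i < count; i++)
			pages[i] = vmalloc_to_page(addr + (i << PAGE_SHIFT));
		return pages;
	}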

> 
> If we did this, it would probably be best to pass a page release function
> into the vmap or vunmap call - we'd need page_cache_release() called on
> the page rather than __free_page()

Possible. Normally vmalloc pages should not be on the LRU, except yours,
so it would probably be fine to just change it.

> The solution belongs behind the vmap/vunmap interface, not in XFS

You could also just keep the array from map time around yourself;
then you could do the release yourself.

-Andi


[patch] Re: Interaction between Xen and XFS: stray RW mappings

2007-10-23 Thread David Chinner
On Tue, Oct 23, 2007 at 10:36:41AM +1000, David Chinner wrote:
> On Tue, Oct 23, 2007 at 01:35:14AM +0200, Andi Kleen wrote:
> > On Tue, Oct 23, 2007 at 08:32:25AM +1000, David Chinner wrote:
> > > Could vmap()/vunmap() take references to the pages that are mapped? That
> > > way delaying the unmap would also delay the freeing of the pages and hence
> > > we'd have no problems with the pages being reused before the mapping is
> > > torn down. That'd work for Xen even with XFS's lazy unmapping scheme, and
> > > would allow Nick's more advanced methods to work as well
> > 
> > You could always just keep around an array of the pages and then drop the
> > reference count after unmap. Or walk the vmalloc mapping and generate such
> > an array before freeing, then unmap and then drop the reference counts.
> 
> You mean like vmap() could record the pages passed to it in the area->pages
> array, and we walk and release them in __vunmap() like it already does
> for vfree()?
> 
> If we did this, it would probably be best to pass a page release function
> into the vmap or vunmap call - we'd need page_cache_release() called on
> the page rather than __free_page()
> 
> The solution belongs behind the vmap/vunmap interface, not in XFS

Lightly tested(*) patch that does this with lazy unmapping
below for comment.

(*) a) kernel boots, b) made an XFS filesystem with 64k directory
blocks, created ~100,000 files in a directory to get a wide btree
(~1700 blocks, still only a single level) and ran repeated finds
across it dropping caches in between.  Each traversal maps and
unmaps every btree block.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

---
 fs/xfs/linux-2.6/xfs_buf.c |   21 +
 include/linux/vmalloc.h    |    4 +
 mm/vmalloc.c               |  160 +++--
 3 files changed, 133 insertions(+), 52 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_buf.c	2007-10-18 17:11:06.0 +1000
+++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_buf.c	2007-10-23 14:48:32.131088616 +1000
@@ -187,19 +187,6 @@ free_address(
 {
a_list_t	*aentry;
 
-#ifdef CONFIG_XEN
-   /*
-* Xen needs to be able to make sure it can get an exclusive
-* RO mapping of pages it wants to turn into a pagetable.  If
-* a newly allocated page is also still being vmap()ed by xfs,
-* it will cause pagetable construction to fail.  This is a
-* quick workaround to always eagerly unmap pages so that Xen
-* is happy.
-*/
-   vunmap(addr);
-   return;
-#endif
-
aentry = kmalloc(sizeof(a_list_t), GFP_NOWAIT);
if (likely(aentry)) {
spin_lock(&as_lock);
@@ -209,7 +196,7 @@ free_address(
as_list_len++;
spin_unlock(&as_lock);
} else {
-   vunmap(addr);
+   vunmap_pages(addr, put_page);
}
 }
 
@@ -228,7 +215,7 @@ purge_addresses(void)
spin_unlock(&as_lock);
 
while ((old = aentry) != NULL) {
-   vunmap(aentry->vm_addr);
+   vunmap_pages(aentry->vm_addr, put_page);
aentry = aentry->next;
kfree(old);
}
@@ -456,8 +443,8 @@ _xfs_buf_map_pages(
} else if (flags & XBF_MAPPED) {
if (as_list_len > 64)
purge_addresses();
-   bp->b_addr = vmap(bp->b_pages, bp->b_page_count,
-   VM_MAP, PAGE_KERNEL);
+   bp->b_addr = vmap_pages(bp->b_pages, bp->b_page_count,
+   VM_MAP, PAGE_KERNEL, xb_to_gfp(flags));
if (unlikely(bp->b_addr == NULL))
return -ENOMEM;
bp->b_addr += bp->b_offset;
Index: 2.6.x-xfs-new/include/linux/vmalloc.h
===================================================================
--- 2.6.x-xfs-new.orig/include/linux/vmalloc.h	2007-09-12 15:41:41.0 +1000
+++ 2.6.x-xfs-new/include/linux/vmalloc.h	2007-10-23 14:36:56.264860921 +1000
@@ -51,6 +51,10 @@ extern void *vmap(struct page **pages, u
unsigned long flags, pgprot_t prot);
 extern void vunmap(void *addr);
 
+extern void *vmap_pages(struct page **pages, unsigned int count,
+   unsigned long flags, pgprot_t prot, gfp_t gfp_mask);
+extern void vunmap_pages(void *addr, void (*release)(struct page *page));
+
 extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
unsigned long pgoff);
 void vmalloc_sync_all(void);
Index: 2.6.x-xfs-new/mm/vmalloc.c
===================================================================
--- 2.6.x-xfs-new.orig/mm/vmalloc.c	2007-09-12 15:41:48.0 +1000
+++ 2.6.x-xfs-new/mm/vmalloc.c	2007-10-23 16:29:40.130384957 +1000
@@ -318,7 +318,56 @@
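The mm/vmalloc.c hunk is truncated in the archive. A minimal sketch of what
the two new entry points could look like, inferred from the
include/linux/vmalloc.h declarations above and the surrounding discussion --
the structure and error handling here are assumptions, not the posted code:

	void *vmap_pages(struct page **pages, unsigned int count,
			 unsigned long flags, pgprot_t prot, gfp_t gfp_mask)
	{
		struct vm_struct *area;
		struct page **map = pages;
		unsigned int i;

		area = get_vm_area(count << PAGE_SHIFT, flags);
		if (!area)
			return NULL;

		/* remember the pages so a (possibly lazy) unmap can release them */
		area->pages = kmalloc(count * sizeof(struct page *), gfp_mask);
		if (!area->pages) {
			remove_vm_area(area->addr);
			kfree(area);
			return NULL;
		}
		for (i = 0; i < count; i++) {
			get_page(pages[i]);		/* pin until vunmap_pages() */
			area->pages[i] = pages[i];
		}
		area->nr_pages = count;

		if (map_vm_area(area, prot, &map)) {
			vunmap_pages(area->addr, put_page);
			return NULL;
		}
		return area->addr;
	}

	void vunmap_pages(void *addr, void (*release)(struct page *page))
	{
		struct vm_struct *area = remove_vm_area(addr);
		unsigned int i;

		BUG_ON(!area);
		for (i = 0; i < area->nr_pages; i++)
			release(area->pages[i]);	/* e.g. put_page() */
		kfree(area->pages);
		kfree(area);
	}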


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread David Chinner
On Tue, Oct 23, 2007 at 01:35:14AM +0200, Andi Kleen wrote:
> On Tue, Oct 23, 2007 at 08:32:25AM +1000, David Chinner wrote:
> > On Mon, Oct 22, 2007 at 09:07:40PM +0200, Andi Kleen wrote:
> > It's hidden now so it doesn't cause any obvious failures any more. Just
> > subtle ones, which is much worse.
> > 
> > There's no evidence of any problems ever being caused by this code.
.
> Also the corruption will not be on the XFS side, but on the uncached
> mapping, so typically some data sent to some device gets a few corrupted
> cache-line-sized blocks.  Depending on the device this might also not be
> fatal -- e.g. if it is texture data, some corrupted pixels occasionally are
> not that visible. But there can still be cases where it can cause real
> failures when it hits something critical (the failures where it was tracked
> down were usually it hitting some command ring and the hardware erroring
> out).

There seems to be little evidence of that occurring around the place.

> > It's been there for years.
> 
> That doesn't mean it is correct.

Right, but it also points to the fact that it's not causing problems
for 99.999% of ppl out there.

> Basically you're saying: I don't care if I corrupt device driver data.
> That's not a very nice thing to say.

No, I did not say that. I said that there's no evidence that points to
this causing problems anywhere. 

> > > But why not just disable it?  It's not critical functionality, just an
> > > optimization that unfortunately turned out to be incorrect.
> > 
> > It is critical functionality for larger machines because vmap()/vunmap()
> > really does suck in terms of scalability.
> 
> Critical perhaps, but also broken.
> 
> And if it's that big a problem, would it really be that difficult to change
> only the time-critical paths to not use it?  I assume it's only the
> directory code where it is a problem?  That would likely be even faster in
> general.

Difficult - yes. All the btree code in XFS would have to be rewritten
to remove the assumption that the buffer structures are contiguous in
memory. Any bug we introduce in doing this will result in on-disk corruption,
which is far worse than occasionally crashing a machine with a stray
mapping.

> > > It could be made correct short term by not freeing the pages until
> > > vunmap for example.
> > 
> > Could vmap()/vunmap() take references to the pages that are mapped? That
> > way delaying the unmap would also delay the freeing of the pages and hence
> > we'd have no problems with the pages being reused before the mapping is
> > torn down. That'd work for Xen even with XFS's lazy unmapping scheme, and
> > would allow Nick's more advanced methods to work as well
> 
> You could always just keep around an array of the pages and then drop the
> reference count after unmap. Or walk the vmalloc mapping and generate such
> an array before freeing, then unmap and then drop the reference counts.

You mean like vmap() could record the pages passed to it in the area->pages
array, and we walk and release them in __vunmap() like it already does
for vfree()?

If we did this, it would probably be best to pass a page release function
into the vmap or vunmap call - we'd need page_cache_release() called on
the page rather than __free_page()

The solution belongs behind the vmap/vunmap interface, not in XFS

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread Zachary Amsden
On Tue, 2007-10-23 at 01:35 +0200, Andi Kleen wrote:
> On Tue, Oct 23, 2007 at 08:32:25AM +1000, David Chinner wrote:
> > On Mon, Oct 22, 2007 at 09:07:40PM +0200, Andi Kleen wrote:
> > > On Mon, Oct 22, 2007 at 11:40:52AM -0700, Jeremy Fitzhardinge wrote:
> > > > Andi Kleen wrote:
> > > > > Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:
> > > > >   
> > > > >> Yes, that's precisely the problem.  xfs does delay the unmap, leaving
> > > > >> stray mappings, which upsets Xen.
> > > > >> 
> > > > >
> > > > > Again, it doesn't just upset Xen: keeping mappings to freed pages is
> > > > > generally wrong and violates the x86 (and likely others like PPC)
> > > > > architecture because it can cause illegal caching attribute aliases.

It is a serious offense to leave stray mappings for memory which can get
re-mapped to I/O devices... especially with PCI and other device
hotplug.  I have to back up Andi on this one unconditionally.

On architectures where you have multibyte, non-wordsize updates to
hardware page tables, you even have races here when setting, updating
and clearing PTEs that must be done in a sequence where no aliasing of
mappings to partially written PTEs can result in I/O memory getting
mapped in a cacheable state.  The window here is only one instruction,
and yes, it is possible for a window this small to create a problem.  A
large (or permanently lazy) window is extremely frightening.
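For concreteness, the ordering being described, modelled on the i386 PAE
helpers -- a sketch of the pattern, not code from this thread. The present
bit lives in the low word of a 64-bit PTE, so it must become visible last
when setting and be killed first when clearing:

	static inline void pae_set_pte(pte_t *ptep, pte_t pte)
	{
		ptep->pte_high = pte.pte_high;	/* new frame number first */
		smp_wmb();
		ptep->pte_low = pte.pte_low;	/* present bit visible last */
	}

	static inline void pae_clear_pte(pte_t *ptep)
	{
		ptep->pte_low = 0;		/* kill the present bit first */
		smp_wmb();
		ptep->pte_high = 0;
	}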

These things do cause bugs.  The bugs take a very long time to show up
and are very difficult to track down, since they can basically cause
random behavior, such as hanging the machine or losing orbit and
crashing into the moon.

Zach



Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread Andi Kleen
On Tue, Oct 23, 2007 at 08:32:25AM +1000, David Chinner wrote:
> On Mon, Oct 22, 2007 at 09:07:40PM +0200, Andi Kleen wrote:
> > On Mon, Oct 22, 2007 at 11:40:52AM -0700, Jeremy Fitzhardinge wrote:
> > > Andi Kleen wrote:
> > > > Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:
> > > >   
> > > >> Yes, that's precisely the problem.  xfs does delay the unmap, leaving
> > > >> stray mappings, which upsets Xen.
> > > >> 
> > > >
> > > > Again, it doesn't just upset Xen: keeping mappings to freed pages is
> > > > generally wrong and violates the x86 (and likely others like PPC)
> > > > architecture because it can cause illegal caching attribute aliases.
> > > >
> > > > The patch that went into the tree was really not correct -- this
> > > > bogus optimization should have been unconditionally removed
> > > > or if you really wanted an ifdef made dependent on !CONFIG_XEN &&
> > > > !CONFIG_AGP (and likely && !CONFIG_DRM && !CONFIG_anything else that
> > > > uses uncached mappings in memory).
> > > >
> > > > You just worked around the obvious failure and left the non-obvious
> > > > rare corruptions in, which isn't a good strategy.
> > > 
> > > Well, at least it becomes a known issue and/or placeholder for when Nick
> > 
It's hidden now so it doesn't cause any obvious failures any more. Just
subtle ones, which is much worse.
> 
> There's no evidence of any problems ever being caused by this code.

We had a fairly nasty bug a couple of years ago with such
a mapping that took months to track down. Luckily we had 
help from a hardware debug lab; without that it would have
been very nearly impossible. But just stabilizing the problem was
a nightmare.

You will only see problems if the memory you're still mapping
is allocated by someone who turns it into an uncached
mapping. And even then it doesn't always happen, because
the CPU does not always do the necessary prefetches.

Also the corruption will not be on the XFS side,
but on the uncached mapping, so typically some data sent
to some device gets a few corrupted cache-line-sized blocks.
Depending on the device this might also not be fatal -- e.g.
if it is texture data, some corrupted pixels occasionally are
not that visible. But there can still be cases where
it can cause real failures when it hits something critical
(the failures where it was tracked down were usually it hitting
some command ring and the hardware erroring out).

Also every time I broke change_page_attr() flushing (which 
handles exactly this problem and is tricky so it got broken
a few times) I eventually got complaints/reports from some users who ran 
into such data inconsistencies. Granted, it was mostly from closed
3d driver users, who seem to be the most aggressive in using
uncached mappings. But with more systems running more aggressive
3d drivers it'll likely get worse.
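For reference, a sketch of the usage pattern that flushing protects (2.6.x
i386 API; an illustration, not code from this thread): a driver retargets a
kernel page to uncached, which is only safe once no stale cached alias --
such as a lazily kept vmap -- remains.

	static void *make_uncached_page(void)
	{
		struct page *page = alloc_page(GFP_KERNEL);

		if (!page)
			return NULL;
		/* retarget the kernel mapping of this page to uncached */
		if (change_page_attr(page, 1, PAGE_KERNEL_NOCACHE) < 0) {
			__free_page(page);
			return NULL;
		}
		/* flush TLBs and caches so no cached alias survives */
		global_flush_tlb();
		return page_address(page);
	}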

> It's been there for years.

That doesn't mean it is correct.

Basically you're saying: I don't care if I corrupt device driver
data. That's not a very nice thing to say.

> > But why not just disable it?  It's not critical functionality,
> > just an optimization that unfortunately turned out to be incorrect.
> 
> It is critical functionality for larger machines because vmap()/vunmap()
> really does suck in terms of scalability.

Critical perhaps, but also broken.

And if it's that big a problem, would it really be that difficult
to change only the time-critical paths to not use it?  I assume
it's only the directory code where it is a problem?  That would
likely be even faster in general.

> > It could be made correct short term by not freeing the pages until
> > vunmap for example.
> 
> Could vmap()/vunmap() take references to the pages that are
> mapped? That way delaying the unmap would also delay the freeing
> of the pages and hence we'd have no problems with the pages
> being reused before the mapping is torn down. That'd work for
> Xen even with XFS's lazy unmapping scheme, and would allow
> Nick's more advanced methods to work as well

You could always just keep around an array of the pages and 
then drop the reference count after unmap. Or walk the vmalloc mapping
and generate such an array before freeing, then unmap and then
drop the reference counts.

Or the other alternative would be to change the time-critical code
that uses it not to.

-Andi



Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread David Chinner
On Mon, Oct 22, 2007 at 09:07:40PM +0200, Andi Kleen wrote:
> On Mon, Oct 22, 2007 at 11:40:52AM -0700, Jeremy Fitzhardinge wrote:
> > Andi Kleen wrote:
> > > Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:
> > >   
> > >> Yes, that's precisely the problem.  xfs does delay the unmap, leaving
> > >> stray mappings, which upsets Xen.
> > >> 
> > >
> > > Again, it doesn't just upset Xen: keeping mappings to freed pages is
> > > generally wrong and violates the x86 (and likely others like PPC)
> > > architecture because it can cause illegal caching attribute aliases.
> > >
> > > The patch that went into the tree was really not correct -- this
> > > bogus optimization should have been unconditionally removed
> > > or if you really wanted an ifdef made dependent on !CONFIG_XEN &&
> > > !CONFIG_AGP (and likely && !CONFIG_DRM && !CONFIG_anything else that
> > > uses uncached mappings in memory). 
> > >
> > > You just worked around the obvious failure and left the non-obvious
> > > rare corruptions in, which isn't a good strategy.
> > 
> > Well, at least it becomes a known issue and/or placeholder for when Nick
> 
> It's hidden now so it doesn't cause any obvious failures any more. Just
> subtle ones, which is much worse.

There's no evidence of any problems ever being caused by this code.
It's been there for years.

> But why not just disable it?  It's not critical functionality,
> just an optimization that unfortunately turned out to be incorrect.

It is critical functionality for larger machines because vmap()/vunmap()
really does suck in terms of scalability.

> It could be made correct short term by not freeing the pages until
> vunmap for example.

Could vmap()/vunmap() take references to the pages that are
mapped? That way delaying the unmap would also delay the freeing
of the pages and hence we'd have no problems with the pages
being reused before the mapping is torn down. That'd work for
Xen even with XFS's lazy unmapping scheme, and would allow
Nick's more advanced methods to work as well

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread Jeremy Fitzhardinge
Andi Kleen wrote:
> It's hidden now so it doesn't cause any obvious failures any more. Just
> subtle ones, which is much worse.
>   

I think anything detected by Xen is still classed as "obscure" ;)

> But why not just disable it?  It's not critical functionality,
> just an optimization that unfortunately turned out to be incorrect.
>
> It could be made correct short term by not freeing the pages until
> vunmap for example.
>   

I think it only ends up holding 64 pages (or is it 64 mappings?), so it's
not a horrible use of memory.  Particularly since it responds to memory
pressure callbacks.

>> does his grand unified vmap manager.  I guess a clean workaround would
>> be to add a CONFIG_XFS_LAZY_UNMAP, and do it at the Kconfig level... 
>> I'll cook up a patch.
>> 
>
> This option could only be safely set in architectures that don't
> care about caching attribute aliases or never remap kernel pages
> uncached. That's not x86, powerpc, ia64 at a minimum.
>
> I think the only good short term fix is to turn your ifdef CONFIG_XEN
> into a #if 0
>   

#if 1, but yes, I see your point.

J


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread Andi Kleen
On Mon, Oct 22, 2007 at 11:40:52AM -0700, Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> > Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:
> >   
> >> Yes, that's precisely the problem.  xfs does delay the unmap, leaving
> >> stray mappings, which upsets Xen.
> >> 
> >
> > Again, it doesn't just upset Xen: keeping mappings to freed pages is
> > generally wrong and violates the x86 (and likely others like PPC)
> > architecture because it can cause illegal caching attribute aliases.
> >
> > The patch that went into the tree was really not correct -- this
> > bogus optimization should have been unconditionally removed
> > or if you really wanted an ifdef made dependent on !CONFIG_XEN &&
> > !CONFIG_AGP (and likely && !CONFIG_DRM && !CONFIG_anything else that
> > uses uncached mappings in memory). 
> >
> > You just worked around the obvious failure and left the non-obvious
> > rare corruptions in, which isn't a good strategy.
> 
> Well, at least it becomes a known issue and/or placeholder for when Nick

It's hidden now so it doesn't cause any obvious failures any more. Just
subtle ones, which is much worse.

But why not just disable it?  It's not critical functionality,
just an optimization that unfortunately turned out to be incorrect.

It could be made correct short term by not freeing the pages until
vunmap for example.

> does his grand unified vmap manager.  I guess a clean workaround would
> be to add a CONFIG_XFS_LAZY_UNMAP, and do it at the Kconfig level... 
> I'll cook up a patch.

This option could only be safely set on architectures that don't
care about caching attribute aliases or never remap kernel pages
uncached. That's not x86, powerpc, ia64 at a minimum.

I think the only good short term fix is to turn your ifdef CONFIG_XEN
into a #if 0

-Andi



Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread Jeremy Fitzhardinge
Andi Kleen wrote:
> Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:
>   
>> Yes, that's precisely the problem.  xfs does delay the unmap, leaving
>> stray mappings, which upsets Xen.
>> 
>
> Again, it doesn't just upset Xen: keeping mappings to freed pages is
> generally wrong and violates the x86 (and likely others like PPC)
> architecture because it can cause illegal caching attribute aliases.
>
> The patch that went into the tree was really not correct -- this
> bogus optimization should have been unconditionally removed
> or if you really wanted an ifdef made dependent on !CONFIG_XEN &&
> !CONFIG_AGP (and likely && !CONFIG_DRM && !CONFIG_anything else that
> uses uncached mappings in memory). 
>
> You just worked around the obvious failure and left the non-obvious
> rare corruptions in, which isn't a good strategy.

Well, at least it becomes a known issue and/or placeholder for when Nick
does his grand unified vmap manager.  I guess a clean workaround would
be to add a CONFIG_XFS_LAZY_UNMAP, and do it at the Kconfig level... 
I'll cook up a patch.

J


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread Jeremy Fitzhardinge
Nick Piggin wrote:
> You could call it a bug I think. I don't know much about Xen though,
> whether or not it expects to be able to run an arbitrary OS kernel.
>   

Xen's paravirtualized mode always requires a guest OS to be modified;
certainly some operating systems would be very hard to make work with
Xen.  But you can always fall back to using shadow pagetables or full
hvm (VT/SVM) mode.

> Presumably, the hypervisor _could_ write protect and trap writes to
> *all* page table page mappings.
>   

Xen manages this stuff with refcounts; it doesn't maintain an rmap for
these pages, so it would have to exhaustively search to do this.  But
aside from that, Xen never actively modifies pagetables, so this would
be a new behaviour in the interface.

J


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread Jeremy Fitzhardinge
dean gaudet wrote:
> sounds like a bug in xen to me :)
>   

I explained at the head of this thread how and why Xen works in this
manner.  It's certainly a change from native execution; whether you
consider it to be a bug is a different matter.

But it turns out that leaving stray mappings around on pages which get
later used in special ways is a bad idea, and can manifest in all kinds
of random unpleasant ways.  Getting a clear error out of Xen is probably
the nicest way to discover and debug this problem.

J


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread Andi Kleen
Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:
>
> Yes, that's precisely the problem.  xfs does delay the unmap, leaving
> stray mappings, which upsets Xen.

Again, it doesn't just upset Xen: keeping mappings to freed pages is
generally wrong and violates the x86 (and likely others like PPC)
architecture because it can cause illegal caching attribute aliases.

The patch that went into the tree was really not correct -- this
bogus optimization should have been unconditionally removed
or if you really wanted an ifdef made dependent on !CONFIG_XEN &&
!CONFIG_AGP (and likely && !CONFIG_DRM && !CONFIG_anything else that
uses uncached mappings in memory). 

You just worked around the obvious failure and left the non-obvious
rare corruptions in, which isn't a good strategy.

-Andi


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread Andi Kleen
On Mon, Oct 22, 2007 at 08:16:01AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2007-10-15 at 13:07 +0200, Andi Kleen wrote:
> > On Tue, Oct 16, 2007 at 12:56:46AM +1000, Nick Piggin wrote:
> > > Is this true even if you don't write through those old mappings?
> > 
> > I think it happened for reads too.  It is a little counter-intuitive
> > because in theory the CPU doesn't need to write back non-dirty lines,
> > but in the one case which took so long to debug exactly this happened
> > somehow.
> 
> The problem also exists on ppc and, afaik, is due to the line being in
> the cache at all (either dirty (write) or not (read)), thus causing the
> snoop logic to hit; that is what's causing the problem vs. non-cached
> accesses.

That makes sense. Snoop can effectively turn a read into a write.

> Also, on some processors, the simple fact of having the page mapped can
> cause the CPU to prefetch from it even if it's not actually accessed
> (speculative prefetch can cross page boundaries if things are mapped).

Exactly that happens on x86. Normally prefetches stop on TLB miss,
but the CPU can do speculative TLB fetches too.

Also even without any prefetching the CPU does speculative execution
and that can lead to random addresses being followed.

-Andi


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread Andi Kleen
On Mon, Oct 22, 2007 at 08:16:01AM +1000, Benjamin Herrenschmidt wrote:
 On Mon, 2007-10-15 at 13:07 +0200, Andi Kleen wrote:
  On Tue, Oct 16, 2007 at 12:56:46AM +1000, Nick Piggin wrote:
   Is this true even if you don't write through those old mappings?
  
  I think it happened for reads too.  It is a little counter intuitive
  because in theory the CPU doesn't need to write back non dirty lines,
  but in the one case which took so long to debug exactly this happened
  somehow.
 
 The problem exist also on ppc, and afaik, is due to the line being in
 the cache at all (either dirty (write) or not (read)), thus causing the
 snoop logic to hit, that is what's causing the problem vs. non cached
 accesses.

That makes sense. Snoop can effectively turn a read into a write.

 Also, on some processors, the simple fact of having the page mapped can
 cause the CPU to prefetch from it even if it's not actually accessed
 (speculative prefetch can cross page boundaries if things are mapped).

Exactly that happens on x86. Normally prefetches stop on TLB miss,
but the CPU can do speculative TLB fetches too.

Also even without any prefetching the CPU does speculative execution
and that can lead to random addresses being followed.

-Andi
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread Andi Kleen
Jeremy Fitzhardinge [EMAIL PROTECTED] writes:

 Yes, that's precisely the problem.  xfs does delay the unmap, leaving
 stray mappings, which upsets Xen.

Again it not just upsets Xen, keeping mappings to freed pages is wrong 
generally 
and violates the x86 (and likely others like PPC) architecture because it can 
cause illegal caching attribute aliases.

The patch that went into the tree was really not correct -- this
bogus optimization should have been unconditionally removed
or if you really wanted an ifdef made dependent on !CONFIG_XEN 
!CONFIG_AGP (and likely  !CONFIG_DRM  !CONFIG_anything else that
uses uncached mappings in memory). 

You just worked around the obvious failure and leave the non obvious
rare corruptions in, which isn't a good strategy.

-Andi
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread Jeremy Fitzhardinge
dean gaudet wrote:
 sounds like a bug in xen to me :)
   

I explained at the head of this thread how and why Xen works in this
manner.  It's certainly a change from native execution; whether you
consider it to be a bug is a different matter.

But it turns out that leaving stray mappings around on pages which get
later used in special ways is a bad idea, and can manifest in all kinds
of random unpleasant ways.  Getting a clear error out of Xen is probably
nicest way to discover and debug this problem.

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread Jeremy Fitzhardinge
Nick Piggin wrote:
 You could call it a bug I think. I don't know much about Xen though,
 whether or not it expects to be able to run an arbitrary OS kernel.
   

Xen's paravirtualized mode always requires a guest OS to be modified;
certainly some operating systems would be very hard to make work with
Xen.  But you can always fall back to using shadow pagetables or full
hvm (VT/SVM) mode.

 Presumably, the hypervisor _could_ write protect and trap writes to
 *all* page table page mappings.
   

Xen manages this stuff with refcounts; it doesn't maintain an rmap for
these pages, so it would have to exhaustively search to do this.  But
aside from that, Xen never actively modifies pagetables, so this would
be a new behaviour in the interface.

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread Jeremy Fitzhardinge
Andi Kleen wrote:
 Jeremy Fitzhardinge [EMAIL PROTECTED] writes:
   
 Yes, that's precisely the problem.  xfs does delay the unmap, leaving
 stray mappings, which upsets Xen.
 

 Again it not just upsets Xen, keeping mappings to freed pages is wrong 
 generally 
 and violates the x86 (and likely others like PPC) architecture because it can 
 cause illegal caching attribute aliases.

 The patch that went into the tree was really not correct -- this
 bogus optimization should have been unconditionally removed
 or if you really wanted an ifdef made dependent on !CONFIG_XEN 
 !CONFIG_AGP (and likely  !CONFIG_DRM  !CONFIG_anything else that
 uses uncached mappings in memory). 

 You just worked around the obvious failure and leave the non obvious
 rare corruptions in, which isn't a good strategy.

Well, at least it becomes a known issue and/or placeholder for when Nick
does his grand unified vmap manager.  I guess a clean workaround would
be to add a CONFIG_XFS_LAZY_UNMAP, and do it at the Kconfig level... 
I'll cook up a patch.

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread Andi Kleen
On Mon, Oct 22, 2007 at 11:40:52AM -0700, Jeremy Fitzhardinge wrote:
 Andi Kleen wrote:
  Jeremy Fitzhardinge [EMAIL PROTECTED] writes:

  Yes, that's precisely the problem.  xfs does delay the unmap, leaving
  stray mappings, which upsets Xen.
  
 
  Again it not just upsets Xen, keeping mappings to freed pages is wrong 
  generally 
  and violates the x86 (and likely others like PPC) architecture because it 
  can 
  cause illegal caching attribute aliases.
 
  The patch that went into the tree was really not correct -- this
  bogus optimization should have been unconditionally removed
  or if you really wanted an ifdef made dependent on !CONFIG_XEN 
  !CONFIG_AGP (and likely  !CONFIG_DRM  !CONFIG_anything else that
  uses uncached mappings in memory). 
 
  You just worked around the obvious failure and leave the non obvious
  rare corruptions in, which isn't a good strategy.
 
 Well, at least it becomes a known issue and/or placeholder for when Nick

It's hidden now so it causes any obvious failures any more. Just
subtle ones which is much worse.

But why not just disable it?  It's not critical functionality,
just a optimization that unfortunately turned out to be incorrect.

It could be made correct short term by not freeing the pages until
vunmap for example.

 does his grand unified vmap manager.  I guess a clean workaround would
 be to add a CONFIG_XFS_LAZY_UNMAP, and do it at the Kconfig level... 
 I'll cook up a patch.

This option could only be safely set in architectures that don't
care about caching attribute aliases or never remap kernel pages
uncached. That's not x86, powerpc, ia64 at a minimum.

I think the only good short term fix is to turn your ifdef CONFIG_XEN
into a #if 0

-Andi

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread Jeremy Fitzhardinge
Andi Kleen wrote:
 It's hidden now so it causes any obvious failures any more. Just
 subtle ones which is much worse.
   

I think anything detected by Xen is still classed as obscure ;)

 But why not just disable it?  It's not critical functionality,
 just a optimization that unfortunately turned out to be incorrect.

 It could be made correct short term by not freeing the pages until
 vunmap for example.
   

I think it only ends up holding 64 pages (or is it 64 mappings?), so its
not a horrible use of memory.  Particularly since it responds to memory
pressure callbacks.

 does his grand unified vmap manager.  I guess a clean workaround would
 be to add a CONFIG_XFS_LAZY_UNMAP, and do it at the Kconfig level... 
 I'll cook up a patch.
 

 This option could only be safely set in architectures that don't
 care about caching attribute aliases or never remap kernel pages
 uncached. That's not x86, powerpc, ia64 at a minimum.

 I think the only good short term fix is to turn your ifdef CONFIG_XEN
 into a #if 0
   

#if 1, but yes, I see your point.

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread David Chinner
On Mon, Oct 22, 2007 at 09:07:40PM +0200, Andi Kleen wrote:
 On Mon, Oct 22, 2007 at 11:40:52AM -0700, Jeremy Fitzhardinge wrote:
  Andi Kleen wrote:
   Jeremy Fitzhardinge [EMAIL PROTECTED] writes:
 
   Yes, that's precisely the problem.  xfs does delay the unmap, leaving
   stray mappings, which upsets Xen.
   
  
   Again it not just upsets Xen, keeping mappings to freed pages is wrong 
   generally 
   and violates the x86 (and likely others like PPC) architecture because it 
   can 
   cause illegal caching attribute aliases.
  
   The patch that went into the tree was really not correct -- this
   bogus optimization should have been unconditionally removed
   or if you really wanted an ifdef made dependent on !CONFIG_XEN 
   !CONFIG_AGP (and likely  !CONFIG_DRM  !CONFIG_anything else that
   uses uncached mappings in memory). 
  
   You just worked around the obvious failure and leave the non obvious
   rare corruptions in, which isn't a good strategy.
  
  Well, at least it becomes a known issue and/or placeholder for when Nick
 
 It's hidden now so it causes any obvious failures any more. Just
 subtle ones which is much worse.

There's no evidence of any problems ever being caused by this code.
It's been there for years.

 But why not just disable it?  It's not critical functionality,
 just a optimization that unfortunately turned out to be incorrect.

It is critical functionality for larger machines because vmap()/vunmap()
really does suck in terms of scalability.

 It could be made correct short term by not freeing the pages until
 vunmap for example.

Could vmap()/vunmap() take references to the pages that are
mapped? That way delaying the unmap would also delay the freeing
of the pages and hence we'd have no problems with the pages
being reused before the mapping is torn down. That'd work for
Xen even with XFS's lazy unmapping scheme, and would allow
Nick's more advanced methods to work as well

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread Andi Kleen
On Tue, Oct 23, 2007 at 08:32:25AM +1000, David Chinner wrote:
 On Mon, Oct 22, 2007 at 09:07:40PM +0200, Andi Kleen wrote:
  On Mon, Oct 22, 2007 at 11:40:52AM -0700, Jeremy Fitzhardinge wrote:
   Andi Kleen wrote:
Jeremy Fitzhardinge [EMAIL PROTECTED] writes:
  
Yes, that's precisely the problem.  xfs does delay the unmap, leaving
stray mappings, which upsets Xen.

   
Again it not just upsets Xen, keeping mappings to freed pages is wrong 
generally 
and violates the x86 (and likely others like PPC) architecture because 
it can 
cause illegal caching attribute aliases.
   
The patch that went into the tree was really not correct -- this
bogus optimization should have been unconditionally removed
or if you really wanted an ifdef made dependent on !CONFIG_XEN 
!CONFIG_AGP (and likely  !CONFIG_DRM  !CONFIG_anything else that
uses uncached mappings in memory). 
   
You just worked around the obvious failure and leave the non obvious
rare corruptions in, which isn't a good strategy.
   
   Well, at least it becomes a known issue and/or placeholder for when Nick
  
  It's hidden now so it causes any obvious failures any more. Just
  subtle ones which is much worse.
 
 There's no evidence of any problems ever being caused by this code.

We had a fairly nasty bug a couple of years ago with such
a mapping that took months to track down. Luckily we had 
help from a hardware debug lab, without that it would have
been very near impossible. But just stabilizing the problem was
a nightmare.

You will only see problems if the memory you're still mapping
will be allocated by someone who turns it into an uncached
mapping. And even then it doesn't happen always because
the CPU does not always do the necessary prefetches.

Also the corruption will not be on the XFS side,
but on the uncached mapping so typically some data sent
to some device gets a few corrupted cache line sized blocks.
Depending on the device this might also not be fatal -- e.g.
if it is texture data some corrupted pixels occasionally are 
not that visible. But there can be still cases where
it can cause real failures when it hits something critical
(the failures where it was tracked down were usually it hitting
some command ring and the hardware erroring out).

Also every time I broke change_page_attr() flushing (which 
handles exactly this problem and is tricky so it got broken
a few times) I eventually got complaints/reports from some users who ran 
into such data inconsistencies. Granted, it was mostly from closed
3d driver users, who seem to be most aggressive in using
uncached mappings. But with more systems running more aggressive
3d drivers it'll likely get worse.

> It's been there for years.

That doesn't mean it is correct.

Basically you're saying: I don't care if I corrupt device driver
data. That's not a very nice thing to say.

> > But why not just disable it?  It's not critical functionality,
> > just an optimization that unfortunately turned out to be incorrect.
 
> It is critical functionality for larger machines because vmap()/vunmap()
> really does suck in terms of scalability.

Critical perhaps, but also broken.

And if it's that big a problem would it be really that difficult
to change only the time critical paths using it to not?  I assume
it's only the directory code where it is a problem?  That would
be likely even faster in general.

> > It could be made correct short term by not freeing the pages until
> > vunmap for example.
 
> Could vmap()/vunmap() take references to the pages that are
> mapped? That way delaying the unmap would also delay the freeing
> of the pages and hence we'd have no problems with the pages
> being reused before the mapping is torn down. That'd work for
> Xen even with XFS's lazy unmapping scheme, and would allow
> Nick's more advanced methods to work as well

You could always just keep around an array of the pages and 
then drop the reference count after unmap. Or walk the vmalloc mapping
and generate such an array before freeing, then unmap and then
drop the reference counts.

Or the other alternative would be to change the time-critical code
that uses it to not.

-Andi
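
For illustration, a rough sketch of the second alternative Andi mentions
(walking the mapping to build the array), using vmalloc_to_page(); the
function name is invented and error handling is elided:

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

/*
 * Collect the pages behind a vmap()ed range, tear the mapping down,
 * and only then drop the page references, so there is no window in
 * which a reusable page is still reachable through the kernel alias.
 */
static void unmap_then_release(void *addr, unsigned int nr_pages)
{
        struct page **pages;
        unsigned int i;

        pages = kcalloc(nr_pages, sizeof(*pages), GFP_KERNEL);
        if (!pages)
                return;

        for (i = 0; i < nr_pages; i++)
                pages[i] = vmalloc_to_page(addr + i * PAGE_SIZE);

        vunmap(addr);                   /* unmap and flush TLBs first */

        for (i = 0; i < nr_pages; i++)
                put_page(pages[i]);     /* now the pages may be reused */
        kfree(pages);
}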



Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread Zachary Amsden
On Tue, 2007-10-23 at 01:35 +0200, Andi Kleen wrote:
> On Tue, Oct 23, 2007 at 08:32:25AM +1000, David Chinner wrote:
> > On Mon, Oct 22, 2007 at 09:07:40PM +0200, Andi Kleen wrote:
> > > On Mon, Oct 22, 2007 at 11:40:52AM -0700, Jeremy Fitzhardinge wrote:
> > > > Andi Kleen wrote:
> > > > > Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:
> > > > >
> > > > > > Yes, that's precisely the problem.  xfs does delay the unmap, leaving
> > > > > > stray mappings, which upsets Xen.
> > > > >
> > > > > Again, it doesn't just upset Xen; keeping mappings to freed pages is
> > > > > wrong generally and violates the x86 (and likely others like PPC)
> > > > > architecture because it can cause illegal caching attribute aliases.

It is a serious offense to leave stray mappings for memory which can get
re-mapped to I/O devices... especially with PCI and other device
hotplug.  I have to back up Andi on this one unconditionally.

On architectures where you have multibyte, non-wordsize updates to
hardware page tables, you even have races here when setting, updating
and clearing PTEs that must be done in a sequence where no aliasing of
mappings to partially written PTEs can result in I/O memory getting
mapped in a cacheable state.  The window here is only one instruction,
and yes, it is possible for a window this small to create a problem.  A
large (or permanently lazy) window is extremely frightening.

These things do cause bugs.  The bugs take a very long time to show up
and are very difficult to track down, since they can basically cause
random behavior, such as hanging the machine or losing orbit and
crashing into the moon.

Zach
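
As a concrete example of the ordering problem Zach describes, here is the
shape of a safe 64-bit PTE update done with two 32-bit stores, modeled on
the i386 PAE convention of making the entry not-present before touching
the other half (a simplified sketch, not the mainline code):

#include <asm/barrier.h>

/* A 64-bit PAE PTE written with two 32-bit stores. */
struct pae_pte {
        unsigned int low;       /* contains the present bit */
        unsigned int high;
};

static void pae_set_pte(struct pae_pte *ptep, unsigned long long val)
{
        ptep->low = 0;          /* 1: not-present; walkers see nothing */
        smp_wmb();
        ptep->high = (unsigned int)(val >> 32);
        smp_wmb();              /* 2: high half visible before ... */
        ptep->low = (unsigned int)val;  /* 3: ... the entry goes live */
}

If the present bit were written while the high word still held stale
contents, a hardware walker could momentarily see a mapping to the wrong
frame -- the one-instruction window mentioned above.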



Re: Interaction between Xen and XFS: stray RW mappings

2007-10-22 Thread David Chinner
On Tue, Oct 23, 2007 at 01:35:14AM +0200, Andi Kleen wrote:
> On Tue, Oct 23, 2007 at 08:32:25AM +1000, David Chinner wrote:
> > On Mon, Oct 22, 2007 at 09:07:40PM +0200, Andi Kleen wrote:
> > > It's hidden now so it doesn't cause any obvious failures any more. Just
> > > subtle ones, which is much worse.
> >
> > There's no evidence of any problems ever being caused by this code.
.....
> Also the corruption will not be on the XFS side, but on the uncached mapping
> so typically some data sent to some device gets a few corrupted cache line
> sized blocks.  Depending on the device this might also not be fatal -- e.g.
> if it is texture data some corrupted pixels occasionally are not that
> visible. But there can be still cases where it can cause real failures when
> it hits something critical (the failures where it was tracked down were
> usually it hitting some command ring and the hardware erroring out)

There seems to be little evidence of that occurring around the place.

> > It's been there for years.
>
> That doesn't mean it is correct.

Right, but it also points to the fact that it's not causing problems
for 99.999% of ppl out there.

> Basically you're saying: I don't care if I corrupt device driver data.
> That's not a very nice thing to say.

No, I did not say that. I said that there's no evidence that points to
this causing problems anywhere. 

> > > But why not just disable it?  It's not critical functionality, just an
> > > optimization that unfortunately turned out to be incorrect.
> >
> > It is critical functionality for larger machines because vmap()/vunmap()
> > really does suck in terms of scalability.
 
> Critical perhaps, but also broken.
>
> And if it's that big a problem would it be really that difficult to change
> only the time critical paths using it to not?  I assume it's only the
> directory code where it is a problem?  That would be likely even faster in
> general.

Difficult - yes. All the btree code in XFS would have to be rewritten
to remove the assumption that the buffer structures are contiguous in
memory. Any bug we introduce in doing this will result in on-disk corruption,
which is far worse than occasionally crashing a machine with a stray
mapping.

> > > It could be made correct short term by not freeing the pages until
> > > vunmap for example.
> >
> > Could vmap()/vunmap() take references to the pages that are mapped? That
> > way delaying the unmap would also delay the freeing of the pages and hence
> > we'd have no problems with the pages being reused before the mapping is
> > torn down. That'd work for Xen even with XFS's lazy unmapping scheme, and
> > would allow Nick's more advanced methods to work as well
 
> You could always just keep around an array of the pages and then drop the
> reference count after unmap. Or walk the vmalloc mapping and generate such
> an array before freeing, then unmap and then drop the reference counts.

You mean like vmap() could record the pages passed to it in the area->pages
array, and we walk and release them in __vunmap() like it already does
for vfree()?

If we did this, it would probably be best to pass a page release function
into the vmap or vunmap call - we'd need page_cache_release() called on
the page rather than __free_page().

The solution belongs behind the vmap/vunmap interface, not in XFS.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
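
A sketch of the interface being floated here, with hypothetical names
(__vunmap() at this point only knows how to __free_page() the pages of a
vfree()d area):

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

typedef void (*page_release_fn)(struct page *page);

/*
 * Like vfree(), but the caller says how each page is released once the
 * mapping and its TLB entries are gone -- e.g. page_cache_release()
 * for pagecache pages instead of __free_page().
 */
static void vunmap_release(void *addr, page_release_fn release)
{
        struct vm_struct *area = remove_vm_area(addr);  /* unmaps, flushes */
        unsigned int i;

        if (!area)
                return;
        if (area->pages) {      /* assumes vmap() recorded the pages */
                for (i = 0; i < area->nr_pages; i++)
                        release(area->pages[i]);
                kfree(area->pages);
        }
        kfree(area);
}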


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-21 Thread Nick Piggin
On Monday 22 October 2007 14:28, dean gaudet wrote:
> On Sun, 21 Oct 2007, Jeremy Fitzhardinge wrote:
> > dean gaudet wrote:
> > > On Mon, 15 Oct 2007, Nick Piggin wrote:
> > >> Yes, as Dave said, vmap (more specifically: vunmap) is very expensive
> > >> because it generally has to invalidate TLBs on all CPUs.
> > >
> > > why is that?  ignoring 32-bit archs we have heaps of address space
> > > available... couldn't the kernel just burn address space and delay
> > > global TLB invalidate by some relatively long time (say 1 second)?
> >
> > Yes, that's precisely the problem.  xfs does delay the unmap, leaving
> > stray mappings, which upsets Xen.
>
> sounds like a bug in xen to me :)

You could call it a bug I think. I don't know much about Xen though,
whether or not it expects to be able to run an arbitrary OS kernel.

Presumably, the hypervisor _could_ write protect and trap writes to
*all* page table page mappings.


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-21 Thread dean gaudet
On Sun, 21 Oct 2007, Jeremy Fitzhardinge wrote:

> dean gaudet wrote:
> > On Mon, 15 Oct 2007, Nick Piggin wrote:
> >
> >   
> >> Yes, as Dave said, vmap (more specifically: vunmap) is very expensive
> >> because it generally has to invalidate TLBs on all CPUs.
> >> 
> >
> > why is that?  ignoring 32-bit archs we have heaps of address space 
> > available... couldn't the kernel just burn address space and delay global 
> > TLB invalidate by some relatively long time (say 1 second)?
> >   
> 
> Yes, that's precisely the problem.  xfs does delay the unmap, leaving
> stray mappings, which upsets Xen.

sounds like a bug in xen to me :)

-dean


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-21 Thread Jeremy Fitzhardinge
dean gaudet wrote:
> On Mon, 15 Oct 2007, Nick Piggin wrote:
>
>   
>> Yes, as Dave said, vmap (more specifically: vunmap) is very expensive
>> because it generally has to invalidate TLBs on all CPUs.
>> 
>
> why is that?  ignoring 32-bit archs we have heaps of address space 
> available... couldn't the kernel just burn address space and delay global 
> TLB invalidate by some relatively long time (say 1 second)?
>   

Yes, that's precisely the problem.  xfs does delay the unmap, leaving
stray mappings, which upsets Xen.

J


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-21 Thread dean gaudet
On Mon, 15 Oct 2007, Nick Piggin wrote:

> Yes, as Dave said, vmap (more specifically: vunmap) is very expensive
> because it generally has to invalidate TLBs on all CPUs.

why is that?  ignoring 32-bit archs we have heaps of address space 
available... couldn't the kernel just burn address space and delay global 
TLB invalidate by some relatively long time (say 1 second)?

-dean
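
dean's idea sketched with invented names: clear the PTEs eagerly but
batch the expensive cross-CPU flush, so one shootdown covers many unmaps
(assume a timer also calls the purge once a second or so):

#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>
#include <asm/tlbflush.h>

struct lazy_range {
        struct list_head list;
        unsigned long start, end;
};

static LIST_HEAD(lazy_ranges);
static DEFINE_SPINLOCK(lazy_lock);
static unsigned int lazy_nr;

/* One global shootdown covers every range queued since the last purge. */
static void purge_lazy_ranges(void)
{
        struct lazy_range *r, *next;
        unsigned long start = ULONG_MAX, end = 0;
        LIST_HEAD(batch);

        spin_lock(&lazy_lock);
        list_splice_init(&lazy_ranges, &batch);
        lazy_nr = 0;
        spin_unlock(&lazy_lock);

        list_for_each_entry_safe(r, next, &batch, list) {
                start = min(start, r->start);
                end = max(end, r->end);
                kfree(r);
        }
        if (end)
                flush_tlb_kernel_range(start, end);
}

/* The caller has already cleared the PTEs; only the flush is deferred. */
static void lazy_unmap(struct lazy_range *r)
{
        bool purge;

        spin_lock(&lazy_lock);
        list_add_tail(&r->list, &lazy_ranges);
        purge = ++lazy_nr > 64;
        spin_unlock(&lazy_lock);
        if (purge)
                purge_lazy_ranges();
}

The catch, as the rest of the thread shows, is exactly the stray-mapping
window this leaves open while the flush is pending.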


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-21 Thread Benjamin Herrenschmidt

On Mon, 2007-10-15 at 13:07 +0200, Andi Kleen wrote:
> On Tue, Oct 16, 2007 at 12:56:46AM +1000, Nick Piggin wrote:
> > Is this true even if you don't write through those old mappings?
> 
> I think it happened for reads too.  It is a little counter intuitive
> because in theory the CPU doesn't need to write back non dirty lines,
> but in the one case which took so long to debug exactly this happened
> somehow.

The problem exists on ppc as well, and afaik is due to the line being in
the cache at all (either dirty (write) or not (read)), thus causing the
snoop logic to hit; that is what's causing the problem vs. non-cached
accesses.

Also, on some processors, the simple fact of having the page mapped can
cause the CPU to prefetch from it even if it's not actually accessed
(speculative prefetch can cross page boundaries if things are mapped).

Ben.




Re: Interaction between Xen and XFS: stray RW mappings

2007-10-21 Thread Dave Airlie
On 10/15/07, Andi Kleen <[EMAIL PROTECTED]> wrote:
> > Hmm, OK. It looks like DRM vmallocs memory (which gives highmem).
>
> I meant I'm not sure if it uses that memory uncached. I admit
> not quite understanding that code. There used to be at least
> one place where it set UC for an user mapping though.

Currently the only DRM memory handed to userspace is vmalloc_32 in drm_scatter.c

I notice the VIA driver does its own vmalloc for dmablit, so it may
have an issue with this if highmem is involved.

This will change with the upcoming memory manager so I'll need to
investigate it a bit perhaps...

Dave.


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-15 Thread Andi Kleen
> Hmm, OK. It looks like DRM vmallocs memory (which gives highmem).

I meant I'm not sure if it uses that memory uncached. I admit
not quite understanding that code. There used to be at least
one place where it set UC for an user mapping though.

-Andi


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-15 Thread Nick Piggin
On Monday 15 October 2007 21:07, Andi Kleen wrote:
> On Tue, Oct 16, 2007 at 12:56:46AM +1000, Nick Piggin wrote:
> > Is this true even if you don't write through those old mappings?
>
> I think it happened for reads too.  It is a little counter intuitive
> because in theory the CPU doesn't need to write back non dirty lines,
> but in the one case which took so long to debug exactly this happened
> somehow.
>
> As it is undefined for reads and writes in the architecture,
> better be safe than sorry.

Yes, typo. I meant reads or writes.


> And x86 CPUs are out of order and do speculative execution
> and that can lead to arbitrary memory accesses even if the code
> never touches a particular address.
>
> Newer Intel CPUs have something called self-snoop which was supposed
> to handle this; but in some situations it doesn't seem to catch it
> either.

Fair enough, so we have to have this lazy tlb flush hook for
Xen/PAT/etc. I don't think it should be much problem to
implement.


> > Is DRM or AGP then not also broken with lazy highmem flushing, or
> > how do they solve that?
>
> AGP doesn't allocate highmem pages.  Not sure about the DRM code.

Hmm, OK. It looks like DRM vmallocs memory (which gives highmem).


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-15 Thread Andi Kleen
On Tue, Oct 16, 2007 at 12:56:46AM +1000, Nick Piggin wrote:
> Is this true even if you don't write through those old mappings?

I think it happened for reads too.  It is a little counter intuitive
because in theory the CPU doesn't need to write back non dirty lines,
but in the one case which took so long to debug exactly this happened
somehow.

As it is undefined for reads and writes in the architecture,
better be safe than sorry.

And x86 CPUs are out of order and do speculative execution
and that can lead to arbitrary memory accesses even if the code
never touches a particular address.

Newer Intel CPUs have something called self-snoop which was supposed
to handle this; but in some situations it doesn't seem to catch it
either.

> Is DRM or AGP then not also broken with lazy highmem flushing, or
> how do they solve that?

AGP doesn't allocate highmem pages.  Not sure about the DRM code.

-Andi


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-15 Thread Nick Piggin
On Monday 15 October 2007 19:36, Andi Kleen wrote:
> David Chinner <[EMAIL PROTECTED]> writes:
> > And yes, we delay unmapping pages until we have a batch of them
> > to unmap. vmap and vunmap do not scale, so this is batching helps
> > alleviate some of the worst of the problems.
>
> You're keeping vmaps around for already freed pages?

> That will be a big problem for proper PAT support, which needs
> to track all mappings to memory. It's not just a problem for Xen.
>
> In fact I suspect it is already broken with DRM or AGP for example which
> can use UC and WC mappings -- if you keep the mapping around and
> DRM or AGP turns the page in another mapping uncacheable you're
> creating an illegal cache attribute alias. These are known to occasionally
> create cache corruptions on several x86s; giving ___VERY___ hard to debug
> bugs once in a blue moon.

Is this true even if you don't write through those old mappings?
Is DRM or AGP then not also broken with lazy highmem flushing, or
how do they solve that?


> Probably it'll require some generic VM batching mechanism where
> Xen or PAT code can hook into the list or force unmap the mappings
> as needed.

Definitely.


> Definitely needs to be fixed if true. You're lucky that Xen caught it
> in time.

I've been thinking that a simple debug mode could be a good idea.
Have a new field in the struct page to count the number of possible
unflushed mappings to the page, so that things like PAT could
BUG_ON under appropriate conditions. Would that be helpful?
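
A sketch of what that debug mode might look like; the config option, the
struct page field, and the helpers are all hypothetical:

#include <linux/atomic.h>
#include <linux/bug.h>
#include <linux/mm_types.h>

#ifdef CONFIG_DEBUG_STRAY_MAPPINGS      /* hypothetical option */
/* Assumes a new atomic_t stray_mappings field in struct page. */
static inline void page_add_stray(struct page *page)
{
        atomic_inc(&page->stray_mappings);
}

static inline void page_remove_stray(struct page *page)
{
        atomic_dec(&page->stray_mappings);
}

/* e.g. PAT would call this before changing caching attributes. */
static inline void page_assert_no_strays(struct page *page)
{
        BUG_ON(atomic_read(&page->stray_mappings) != 0);
}
#endif

vmap()/kmap() would bump the count when they leave a lazy alias behind
and drop it when the alias is finally flushed.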


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-15 Thread Andi Kleen
David Chinner <[EMAIL PROTECTED]> writes:
>
> And yes, we delay unmapping pages until we have a batch of them
> to unmap. vmap and vunmap do not scale, so this is batching helps
> alleviate some of the worst of the problems.

You're keeping vmaps around for already freed pages?

That will be a big problem for proper PAT support, which needs
to track all mappings to memory. It's not just a problem for Xen.

In fact I suspect it is already broken with DRM or AGP for example which
can use UC and WC mappings -- if you keep the mapping around and
DRM or AGP turns the page in another mapping uncacheable you're
creating an illegal cache attribute alias. These are known to occasionally
create cache corruptions on several x86s; giving ___VERY___ hard to debug
bugs once in a blue moon.

Probably it'll require some generic VM batching mechanism where
Xen or PAT code can hook into the list or force unmap the mappings
as needed.

Definitely needs to be fixed if true. You're lucky that Xen caught it
in time.

-Andi


Re: [xfs-masters] Re: Interaction between Xen and XFS: stray RW mappings

2007-10-15 Thread Christoph Hellwig
On Mon, Oct 15, 2007 at 02:25:46PM +1000, David Chinner wrote:
> > Hm, well I saw the problem with a filesystem made with mkfs.xfs with no
> > options, so there must be at least *some* vmapping going on there.
> 
> Sorry - I should have been more precise - vmap should never be used in
> performance critical paths on default configs.  Log recovery will
> trigger vmap/vunmap usage, so this is probably what you are seeing.

The iclogs are also vmapped, but they aren't unmapped until unmount so
this optimization doesn't matter either.


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread David Chinner
On Sun, Oct 14, 2007 at 09:18:17PM -0700, Jeremy Fitzhardinge wrote:
> David Chinner wrote:
> > With defaults - little effect as vmap should never be used. It's
> > only when you start using larger block sizes for metadata that this
> > becomes an issue. The CONFIG_XEN workaround should be fine until we
> > get a proper vmap cache
> 
> Hm, well I saw the problem with a filesystem made with mkfs.xfs with no
> options, so there must be at least *some* vmapping going on there.

Sorry - I should have been more precise - vmap should never be used in
performance critical paths on default configs.  Log recovery will
trigger vmap/vunmap usage, so this is probably what you are seeing.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread Jeremy Fitzhardinge
David Chinner wrote:
> With defaults - little effect as vmap should never be used. It's
> only when you start using larger block sizes for metadata that this
> becomes an issue. The CONFIG_XEN workaround should be fine until we
> get a proper vmap cache

Hm, well I saw the problem with a filesystem made with mkfs.xfs with no
options, so there must be at least *some* vmapping going on there.

J


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread David Chinner
On Sun, Oct 14, 2007 at 08:42:34PM -0700, Jeremy Fitzhardinge wrote:
> Nick Piggin wrote:
> >  That's not going to
> > happen for at least a cycle or two though, so in the meantime maybe
> > an ifdef for that XFS vmap batching code would help?
> >   
> 
> For now I've proposed a patch to simply eagerly vunmap everything when
> CONFIG_XEN is set.  It certainly works, but I don't have a good feel for
> how much of a performance hit that imposes on XFS.

With defaults - little effect as vmap should never be used. It's
only when you start using larger block sizes for metadata that this
becomes an issue. The CONFIG_XEN workaround should be fine until we
get a proper vmap cache

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread Jeremy Fitzhardinge
Nick Piggin wrote:
> Yeah, it would be possible. The easiest way would just be to shoot down
> all lazy vmaps (because you're doing the global IPIs anyway, which are
> the expensive thing, at which point you may as well purge the rest of
> your lazy mappings).
>   

Sure.

> If it is sufficiently rare, then it could be the simplest thing to do.
>   

Yes.  If there's some way to tell whether a particular page is in a lazy
mapping then that would help, since we could easily tell whether we need
to do the whole shootdown thing.  I would expect the population of
lazily mapped pages in the whole freepage pool to be pretty small, but
if the allocator tends to return the most recently freed pages you might
hit them fairly regularly (shoving them at the other end of the freelist
might be useful).

> OK, I see. Because even though it is technically safe where we are
> using it (because nothing writes through the mappings after the page
> is freed), a corrupted guest could use the same window to do bad
> things with the pagetables?
>   

That's right.  The hypervisor doesn't trust the guests, so it prevents
them from getting into a state where they can do bad things.

> For Xen -- shouldn't be a big deal. We can have a single Linux mm API
> to call, and we can do the right thing WRT vmap/kamp. I should try to
> merge my current lazy vmap patches which replace the XFS stuff, so we
> can implement such an API and fix your XFS issue?

Sounds good.

>  That's not going to
> happen for at least a cycle or two though, so in the meantime maybe
> an ifdef for that XFS vmap batching code would help?
>   

For now I've proposed a patch to simply eagerly vunmap everything when
CONFIG_XEN is set.  It certainly works, but I don't have a good feel for
how much of a performance hit that imposes on XFS.  A slightly more
subtle change would be to test to see if we're actually running under
Xen before taking the eager path, so at least the performance burden
only affects actual Xen users (and I presume xfs+xen is a fairly rare
combination, or this problem would have turned up earlier, or perhaps
the old xenified kernels have some other workaround for it).

J


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread Nick Piggin
On Monday 15 October 2007 10:57, Jeremy Fitzhardinge wrote:
> Nick Piggin wrote:
> > Yes, as Dave said, vmap (more specifically: vunmap) is very expensive
> > because it generally has to invalidate TLBs on all CPUs.
>
> I see.
>
> > I'm looking at some more general solutions to this (already have some
> > batching / lazy unmapping that replaces the XFS specific one), however
> > they are still likely going to leave vmap mappings around after freeing
> > the page.
>
> Hm.  Could there be a call to shoot down any lazy mappings of a page, so
> the Xen pagetable code could use it on any pagetable page?  Ideally one
> that could be used on any page, but only causes expensive operations
> where needed.

Yeah, it would be possible. The easiest way would just be to shoot down
all lazy vmaps (because you're doing the global IPIs anyway, which are
the expensive thing, at which point you may as well purge the rest of
your lazy mappings).

If it is sufficiently rare, then it could be the simplest thing to do.


> > We _could_ hold on to the pages as well, but that's pretty inefficient.
> > The memory cost of keeping the mappings around tends to be well under
> > 1% the cost of the page itself. OTOH we could also avoid lazy flushes
> > on architectures where it is not costly. Either way, it probably would
> > require an arch hook or even a couple of ifdefs in mm/vmalloc.c for
> > Xen. Although... it would be nice if Xen could take advantage of some
> > of these optimisations as well.
>
> In general the lazy unmappings won't worry Xen.  It's only for the
> specific case of allocating memory for pagetables.  Xen can do a bit of
> extra optimisation for cross-cpu tlb flushes (if the target vcpus are
> not currently running, then you don't need to do anything), but they're
> still an expensive operation, so the optimisation is definitely useful.

OK.


> > What's the actual problem for Xen? Anything that can be changed?
>
> Not easily.  Xen doesn't use shadow pagetables.  Instead, it gives the
> guest domains direct access to the real CPU's pagetable, but makes sure
> they're always mapped RO so that the hypervisor can control updates to
> the pagetables (either by trapping writes or via explicit hypercalls).
> This means that when constructing a new pagetable, Xen will verify that
> all the mappings of pages making up the new pagetable are RO before
> allowing it to be used.  If there are stray RW mappings of those pages,
> pagetable construction will fail.

OK, I see. Because even though it is technically safe where we are
using it (because nothing writes through the mappings after the page
is freed), a corrupted guest could use the same window to do bad
things with the pagetables?


> Aside from XFS, the only other case I've found where there could be
> stray RW mappings is when using high pages which are still in the kmap
> cache; I added an explicit call to flush the kmap cache to handle this.
> If vmap and kmap can be unified (at least the lazy unmap aspects of
> them), then that would be a nice little cleanup.

vmap is slightly harder than kmap in some respects. However it would
be really nice to get vmap fast and general enough to completely
replace all the kmap crud -- that's one goal, but the first thing
I'm doing is to concentrate on just vmap to work out how to make it
as fast as possible.

For Xen -- shouldn't be a big deal. We can have a single Linux mm API
to call, and we can do the right thing WRT vmap/kmap. I should try to
merge my current lazy vmap patches which replace the XFS stuff, so we
can implement such an API and fix your XFS issue? That's not going to
happen for at least a cycle or two though, so in the meantime maybe
an ifdef for that XFS vmap batching code would help?
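
For reference, the mm API discussed here is essentially what later landed
in mainline as vm_unmap_aliases(), as part of the vmap rework merged in
2.6.28. A usage sketch from a pagetable-pinning path, with a hypothetical
helper:

#include <linux/mm.h>
#include <linux/vmalloc.h>

static void pin_pagetable_page(struct page *pt_page)
{
        /*
         * Flush every lazily kept kernel alias first, so no stray RW
         * mapping of a freed page can make the hypervisor refuse the pin.
         */
        vm_unmap_aliases();

        make_page_readonly(pt_page);    /* hypothetical helper */
        /* ... then issue the pin hypercall ... */
}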


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread Jeremy Fitzhardinge
Nick Piggin wrote:
> Yes, as Dave said, vmap (more specifically: vunmap) is very expensive
> because it generally has to invalidate TLBs on all CPUs.
>   

I see.

> I'm looking at some more general solutions to this (already have some
> batching / lazy unmapping that replaces the XFS specific one), however
> they are still likely going to leave vmap mappings around after freeing
> the page.
>   

Hm.  Could there be a call to shoot down any lazy mappings of a page, so
the Xen pagetable code could use it on any pagetable page?  Ideally one
that could be used on any page, but only causes expensive operations
where needed.

> We _could_ hold on to the pages as well, but that's pretty inefficient.
> The memory cost of keeping the mappings around tends to be well under
> 1% the cost of the page itself. OTOH we could also avoid lazy flushes
> on architectures where it is not costly. Either way, it probably would
> require an arch hook or even a couple of ifdefs in mm/vmalloc.c for
> Xen. Although... it would be nice if Xen could take advantage of some
> of these optimisations as well.
>   

In general the lazy unmappings won't worry Xen.  It's only for the
specific case of allocating memory for pagetables.  Xen can do a bit of
extra optimisation for cross-cpu tlb flushes (if the target vcpus are
not currently running, then you don't need to do anything), but they're
still an expensive operation, so the optimisation is definitely useful.

> What's the actual problem for Xen? Anything that can be changed?
>   

Not easily.  Xen doesn't use shadow pagetables.  Instead, it gives the
guest domains direct access to the real CPU's pagetable, but makes sure
they're always mapped RO so that the hypervisor can control updates to
the pagetables (either by trapping writes or via explicit hypercalls). 
This means that when constructing a new pagetable, Xen will verify that
all the mappings of pages making up the new pagetable are RO before
allowing it to be used.  If there are stray RW mappings of those pages,
pagetable construction will fail.

Aside from XFS, the only other case I've found where there could be
stray RW mappings is when using high pages which are still in the kmap
cache; I added an explicit call to flush the kmap cache to handle this. 
If vmap and kmap can be unified (at least the lazy unmap aspects of
them), then that would be a nice little cleanup.

J
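
In pseudocode, the verification Jeremy describes looks roughly like this
(all names hypothetical; the real accounting lives in the hypervisor's
page type tracking, not in Linux):

#include <linux/errno.h>

/*
 * Pinning fails while any page of the new pagetable still has a
 * writable alias anywhere in the guest.
 */
static int pin_pagetable(struct pagetable *pt)
{
        struct pt_page *pg;

        for_each_pagetable_page(pt, pg) {       /* hypothetical iterator */
                if (writable_mapping_count(pg) != 0)
                        return -EBUSY;          /* stray RW mapping */
                mark_mapping_readonly(pg);
        }
        return 0;       /* hypervisor can now trust these PTEs */
}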


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread David Chinner
On Sun, Oct 14, 2007 at 04:12:20PM -0700, Jeremy Fitzhardinge wrote:
> David Chinner wrote:
> > You mean xfs_buf.c.
> >   
> 
> Yes, sorry.
> 
> > And yes, we delay unmapping pages until we have a batch of them
> > to unmap. vmap and vunmap do not scale, so this is batching helps
> > alleviate some of the worst of the problems.
> >   
> 
> How much performance does it cost?

Every vunmap() call causes a global TLB sync, and the region lists
are global with a spin lock protecting them. I think Nick has shown
a 64p Altix with ~60 cpus spinning on the vmap locks under a
simple workload.

> What kind of workloads would it show
> up under?

A directory traversal when using large directory block sizes
with large directories.


> > Realistically, if this delayed release of vmaps is a problem for
> > Xen, then I think that some generic VM solution is needed to this
> > problem as vmap() is likely to become more common in future (think
> > large blocks in filesystems). Nick - any comments?
> >   
> 
> Well, the only real problem is that the pages are returned to the free
> pool and reallocated while still being part of a mapping.  If the pages
> are still owned by the filesystem/pagecache, then there's no problem.

The pages are still attached to the blockdev address space mapping,
but there's nothing stopping them from being reclaimed before they are
unmapped.

> What's the lifetime of things being vmapped/unmapped in xfs?  Are they
> necessarily being freed when they're unmapped, or could unmapping of
> freed memory be more immediate than other memory?

It's all "freed memory". At the time we pull the buffer down, there are
no further references to the buffer. The pages are released and the mapping
is never used again until it is torn down. It is torn down either on the
next xfsbufd run (either memory pressure or every 15s) or every 64th
new vmap() call to map new buffers.

> Maybe it just needs a notifier chain or something.

We've already got a memory shrinker hook that triggers this reclaim.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
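
A simplified sketch of that lifecycle, modeled on the batching described
above (names, locking and the xfsbufd hook are simplified from the real
xfs_buf.c):

#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/vmalloc.h>

struct dead_map {
        struct list_head list;
        void *addr;
};

static LIST_HEAD(dead_maps);
static DEFINE_SPINLOCK(dead_lock);
static unsigned int dead_nr;

/* Run from xfsbufd (timer or memory pressure) and every 64th vmap(). */
static void purge_dead_maps(void)
{
        struct dead_map *dm, *next;
        LIST_HEAD(batch);

        spin_lock(&dead_lock);
        list_splice_init(&dead_maps, &batch);
        dead_nr = 0;
        spin_unlock(&dead_lock);

        list_for_each_entry_safe(dm, next, &batch, list) {
                vunmap(dm->addr);       /* TLB cost paid here, batched */
                kfree(dm);
        }
}

/* Buffer teardown: defer the expensive vunmap() instead of doing it now. */
static void defer_unmap(void *addr)
{
        struct dead_map *dm = kmalloc(sizeof(*dm), GFP_KERNEL);

        if (!dm) {
                vunmap(addr);           /* fall back to an eager unmap */
                return;
        }
        dm->addr = addr;
        spin_lock(&dead_lock);
        list_add_tail(&dm->list, &dead_maps);
        dead_nr++;
        spin_unlock(&dead_lock);
}

Between defer_unmap() and the next purge the pages are already free --
which is precisely the window Xen trips over.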


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread Nick Piggin
On Monday 15 October 2007 09:12, Jeremy Fitzhardinge wrote:
> David Chinner wrote:
> > You mean xfs_buf.c.
>
> Yes, sorry.
>
> > And yes, we delay unmapping pages until we have a batch of them
> > to unmap. vmap and vunmap do not scale, so this is batching helps
> > alleviate some of the worst of the problems.
>
> How much performance does it cost?  What kind of workloads would it show
> up under?
>
> > Realistically, if this delayed release of vmaps is a problem for
> > Xen, then I think that some generic VM solution is needed to this
> > problem as vmap() is likely to become more common in future (think
> > large blocks in filesystems). Nick - any comments?
>
> Well, the only real problem is that the pages are returned to the free
> pool and reallocated while still being part of a mapping.  If the pages
> are still owned by the filesystem/pagecache, then there's no problem.
>
> What's the lifetime of things being vmapped/unmapped in xfs?  Are they
> necessarily being freed when they're unmapped, or could unmapping of
> freed memory be more immediate than other memory?

Yes, as Dave said, vmap (more specifically: vunmap) is very expensive
because it generally has to invalidate TLBs on all CPUs.

I'm looking at some more general solutions to this (already have some
batching / lazy unmapping that replaces the XFS specific one), however
they are still likely going to leave vmap mappings around after freeing
the page.

We _could_ hold on to the pages as well, but that's pretty inefficient.
The memory cost of keeping the mappings around tends to be well under
1% the cost of the page itself. OTOH we could also avoid lazy flushes
on architectures where it is not costly. Either way, it probably would
require an arch hook or even a couple of ifdefs in mm/vmalloc.c for
Xen. Although... it would be nice if Xen could take advantage of some
of these optimisations as well.

What's the actual problem for Xen? Anything that can be changed?


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread Jeremy Fitzhardinge
David Chinner wrote:
> You mean xfs_buf.c.
>   

Yes, sorry.

> And yes, we delay unmapping pages until we have a batch of them
> to unmap. vmap and vunmap do not scale, so this batching helps
> alleviate some of the worst of the problems.
>   

How much performance does it cost?  What kind of workloads would it show
up under?

> Realistically, if this delayed release of vmaps is a problem for
> Xen, then I think that some generic VM solution is needed to this
> problem as vmap() is likely to become more common in future (think
> large blocks in filesystems). Nick - any comments?
>   

Well, the only real problem is that the pages are returned to the free
pool and reallocated while still being part of a mapping.  If the pages
are still owned by the filesystem/pagecache, then there's no problem.

What's the lifetime of things being vmapped/unmapped in xfs?  Are they
necessarily being freed when they're unmapped, or could unmapping of
freed memory be more immediate than other memory?

Maybe it just needs a notifier chain or something.

J


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread David Chinner
On Fri, Oct 12, 2007 at 09:58:43AM -0700, Jeremy Fitzhardinge wrote:
> Hi Dave & other XFS folk,
> 
> I'm tracking down a bug which appears to be a bad interaction between XFS
> and Xen.  It looks like XFS is holding RW mappings on free pages, which Xen
> is trying to get an exclusive RO mapping on so it can turn them into
> pagetables.
> 
> I'm assuming the pages are actually free, and this isn't a use after free
> problem.  From a quick poke around, the most likely pieces of XFS code is
> the stuff in xfs_map.c which creates a virtually contiguous mapping of pages
> with vmap, and seems to delay unmapping them.

You mean xfs_buf.c.

And yes, we delay unmapping pages until we have a batch of them
to unmap. vmap and vunmap do not scale, so this batching helps
alleviate some of the worst of the problems.

> When pinning a pagetable, Xen tries to eliminate any RW aliases of the pages
> its using.  This is generally trivial because pages returned by
> get_free_page don't have any mappings aside from the normal kernel mapping.
> High pages, when using CONFIG_HIGHPTE, may have a residual kmap mapping, so
> we clear out the kmap cache if we encounter a highpage in the pagetable.
> 
> I guess we could create a special-case interface to do the same thing with
> XFS mappings, but it would be nicer to have something more generic.

*nod*

> Is my analysis correct?  Or should XFS not be holding stray mappings?  Or is
> there already some kind of generic mechanism I can use to get it to release
> its mappings?

The xfsbufd cleans out any stale mappings - it's woken by the memory
shrinker interface (i.e. calls xfsbufd_wakeup()). Otherwise every
64th buffer being vmap()d will flush out stale mappings.

Realistically, if this delayed release of vmaps is a problem for
Xen, then I think that some generic VM solution is needed to this
problem as vmap() is likely to become more common in future (think
large blocks in filesystems). Nick - any comments?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread Jeremy Fitzhardinge
David Chinner wrote:
> You mean xfs_buf.c.

Yes, sorry.

> And yes, we delay unmapping pages until we have a batch of them
> to unmap. vmap and vunmap do not scale, so this batching helps
> alleviate some of the worst of the problems.

How much performance does it cost?  What kind of workloads would it show
up under?

> Realistically, if this delayed release of vmaps is a problem for
> Xen, then I think a generic VM solution to this problem is needed,
> as vmap() is likely to become more common in future (think
> large blocks in filesystems). Nick - any comments?

Well, the only real problem is that the pages are returned to the free
pool and reallocated while still being part of a mapping.  If the pages
are still owned by the filesystem/pagecache, then there's no problem.

What's the lifetime of things being vmapped/unmapped in xfs?  Are they
necessarily being freed when they're unmapped, or could unmapping of
freed memory be more immediate than other memory?

Maybe it just needs a notifier chain or something.

J


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread Nick Piggin
On Monday 15 October 2007 09:12, Jeremy Fitzhardinge wrote:
> David Chinner wrote:
> > You mean xfs_buf.c.
>
> Yes, sorry.
>
> > And yes, we delay unmapping pages until we have a batch of them
> > to unmap. vmap and vunmap do not scale, so this batching helps
> > alleviate some of the worst of the problems.
>
> How much performance does it cost?  What kind of workloads would it show
> up under?
>
> > Realistically, if this delayed release of vmaps is a problem for
> > Xen, then I think a generic VM solution to this problem is needed,
> > as vmap() is likely to become more common in future (think
> > large blocks in filesystems). Nick - any comments?
>
> Well, the only real problem is that the pages are returned to the free
> pool and reallocated while still being part of a mapping.  If the pages
> are still owned by the filesystem/pagecache, then there's no problem.
>
> What's the lifetime of things being vmapped/unmapped in xfs?  Are they
> necessarily being freed when they're unmapped, or could unmapping of
> freed memory be more immediate than other memory?

Yes, as Dave said, vmap (more specifically: vunmap) is very expensive
because it generally has to invalidate TLBs on all CPUs.

I'm looking at some more general solutions to this (already have some
batching / lazy unmapping that replaces the XFS specific one), however
they are still likely going to leave vmap mappings around after freeing
the page.

We _could_ hold on to the pages as well, but that's pretty inefficient.
The memory cost of keeping the mappings around tends to be well under
1% the cost of the page itself. OTOH we could also avoid lazy flushes
on architectures where it is not costly. Either way, it probably would
require an arch hook or even a couple of ifdefs in mm/vmalloc.c for
Xen. Although... it would be nice if Xen could take advantage of some
of these optimisations as well.
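To illustrate the shape of that, something like the following (both the
hook and the queueing helper are invented for this sketch; nothing of
the kind exists today):

#include <linux/vmalloc.h>

/* Hypothetical deferred path, along the lines of XFS's free_address(). */
static void queue_lazy_vunmap(void *addr);

/*
 * Hypothetical per-arch/per-hypervisor opt-out: a port that cannot
 * tolerate freed pages keeping kernel aliases forces the eager path.
 */
#ifdef CONFIG_XEN
#define VMAP_LAZY_UNMAP	0
#else
#define VMAP_LAZY_UNMAP	1
#endif

static void release_vm_mapping(void *addr)
{
	if (VMAP_LAZY_UNMAP)
		queue_lazy_vunmap(addr);	/* flush later, in one batch */
	else
		vunmap(addr);			/* pay the cross-CPU flush now */
}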

What's the actual problem for Xen? Anything that can be changed?


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread David Chinner
On Sun, Oct 14, 2007 at 04:12:20PM -0700, Jeremy Fitzhardinge wrote:
> David Chinner wrote:
> > You mean xfs_buf.c.
>
> Yes, sorry.
>
> > And yes, we delay unmapping pages until we have a batch of them
> > to unmap. vmap and vunmap do not scale, so this batching helps
> > alleviate some of the worst of the problems.
>
> How much performance does it cost?

Every vunmap() call causes a global TLB sync, and the region lists
are global with a spin lock protecting them. I think Nick has shown
a 64p Altix with ~60 CPUs spinning on the vmap locks under a
simple workload.
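Schematically, each vunmap() boils down to this (a simplified shape of
the path, not the verbatim mm/vmalloc.c code):

#include <linux/vmalloc.h>
#include <asm/tlbflush.h>

static void vunmap_shape(void *addr, unsigned long size)
{
	/*
	 * One global rwlock around a linear walk of one global list:
	 * every vmap/vunmap in the machine serialises here.
	 */
	write_lock(&vmlist_lock);
	/* ... find and unlink the vm_struct covering addr ... */
	write_unlock(&vmlist_lock);

	/* ... clear the kernel page table entries for the range ... */

	/*
	 * The expensive part: interrupt every CPU so stale TLB entries
	 * for the range are invalidated everywhere.
	 */
	flush_tlb_kernel_range((unsigned long)addr,
			       (unsigned long)addr + size);
}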

> What kind of workloads would it show
> up under?

A directory traversal when using large directory block sizes
with large directories.


> > Realistically, if this delayed release of vmaps is a problem for
> > Xen, then I think a generic VM solution to this problem is needed,
> > as vmap() is likely to become more common in future (think
> > large blocks in filesystems). Nick - any comments?
>
> Well, the only real problem is that the pages are returned to the free
> pool and reallocated while still being part of a mapping.  If the pages
> are still owned by the filesystem/pagecache, then there's no problem.

The pages are still attached to the blockdev address space mapping,
but there's nothing stopping them from being reclaimed before they are
unmapped.

> What's the lifetime of things being vmapped/unmapped in xfs?  Are they
> necessarily being freed when they're unmapped, or could unmapping of
> freed memory be more immediate than other memory?

It's all freed memory. At the time we pull the buffer down, there are
no further references to the buffer. The pages are released and the mapping
is never used again until it is torn down. It is torn down either on the
next xfsbufd run (due to memory pressure or every 15s) or on every 64th
new vmap() call to map new buffers.

> Maybe it just needs a notifier chain or something.

We've already got a memory shrinker hook that triggers this reclaim.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread Jeremy Fitzhardinge
Nick Piggin wrote:
> Yes, as Dave said, vmap (more specifically: vunmap) is very expensive
> because it generally has to invalidate TLBs on all CPUs.

I see.

 I'm looking at some more general solutions to this (already have some
 batching / lazy unmapping that replaces the XFS specific one), however
 they are still likely going to leave vmap mappings around after freeing
 the page.
   

Hm.  Could there be a call to shoot down any lazy mappings of a page, so
the Xen pagetable code could use it on any pagetable page?  Ideally one
that could be used on any page, but only causes expensive operations
where needed.
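Concretely, the sort of call I mean, with invented names (a sketch
only; purge_lazy_vmaps() stands in for whatever the lazy-unmap
machinery would provide):

#include <linux/mm.h>
#include <linux/highmem.h>
#include <asm/tlbflush.h>

static void purge_lazy_vmaps(void);	/* hypothetical: vunmap all queued regions */

/*
 * Tear down every deferred mapping and flush once, so that no freed
 * page is left with a stray kernel alias.
 */
void vm_shootdown_lazy_aliases(void)
{
	purge_lazy_vmaps();
	flush_tlb_all();
}

/* The Xen pagetable code could then scrub a prospective pagetable page: */
static void xen_scrub_aliases(struct page *page)
{
	if (PageHighMem(page))
		kmap_flush_unused();	/* residual kmap-cache aliases */
	vm_shootdown_lazy_aliases();	/* residual vmap aliases */
}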

> We _could_ hold on to the pages as well, but that's pretty inefficient.
> The memory cost of keeping the mappings around tends to be well under
> 1% the cost of the page itself. OTOH we could also avoid lazy flushes
> on architectures where it is not costly. Either way, it probably would
> require an arch hook or even a couple of ifdefs in mm/vmalloc.c for
> Xen. Although... it would be nice if Xen could take advantage of some
> of these optimisations as well.

In general the lazy unmappings won't worry Xen.  It's only for the
specific case of allocating memory for pagetables.  Xen can do a bit of
extra optimisation for cross-cpu tlb flushes (if the target vcpus are
not currently running, then you don't need to do anything), but they're
still an expensive operation, so the optimisation is definitely useful.

> What's the actual problem for Xen? Anything that can be changed?

Not easily.  Xen doesn't use shadow pagetables.  Instead, it gives the
guest domains direct access to the real CPU's pagetable, but makes sure
they're always mapped RO so that the hypervisor can control updates to
the pagetables (either by trapping writes or via explicit hypercalls). 
This means that when constructing a new pagetable, Xen will verify that
all the mappings of pages making up the new pagetable are RO before
allowing it to be used.  If there are stray RW mappings of those pages,
pagetable construction will fail.
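In pseudo-C, the sequence is roughly this (helper names invented,
details heavily simplified):

#include <asm/pgtable.h>
#include <asm/page.h>

static void make_pagetable_readonly(pgd_t *pgd);	/* hypothetical */
static int  hypervisor_pin_table(unsigned long pa);	/* hypothetical */

static int pin_new_pagetable(pgd_t *pgd)
{
	/*
	 * 1. Downgrade every kernel mapping of every constituent page,
	 *    including any vmap or kmap alias, to read-only.
	 */
	make_pagetable_readonly(pgd);

	/*
	 * 2. Ask the hypervisor to pin it.  Xen re-verifies that no page
	 *    in the table has a writable alias left anywhere; one stray
	 *    RW vmap mapping is enough to make this step fail.
	 */
	return hypervisor_pin_table(__pa(pgd));
}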

Aside from XFS, the only other case I've found where there could be
stray RW mappings is when using high pages which are still in the kmap
cache; I added an explicit call to flush the kmap cache to handle this. 
If vmap and kmap can be unified (at least the lazy unmap aspects of
them), then that would be a nice little cleanup.

J


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread Nick Piggin
On Monday 15 October 2007 10:57, Jeremy Fitzhardinge wrote:
> Nick Piggin wrote:
> > Yes, as Dave said, vmap (more specifically: vunmap) is very expensive
> > because it generally has to invalidate TLBs on all CPUs.
>
> I see.
>
> > I'm looking at some more general solutions to this (already have some
> > batching / lazy unmapping that replaces the XFS specific one), however
> > they are still likely going to leave vmap mappings around after freeing
> > the page.
>
> Hm.  Could there be a call to shoot down any lazy mappings of a page, so
> the Xen pagetable code could use it on any pagetable page?  Ideally one
> that could be used on any page, but only causes expensive operations
> where needed.

Yeah, it would be possible. The easiest way would just be to shoot down
all lazy vmaps (because you're doing the global IPIs anyway, which are
the expensive thing, at which point you may as well purge the rest of
your lazy mappings).

If it is sufficiently rare, then it could be the simplest thing to do.


> > We _could_ hold on to the pages as well, but that's pretty inefficient.
> > The memory cost of keeping the mappings around tends to be well under
> > 1% the cost of the page itself. OTOH we could also avoid lazy flushes
> > on architectures where it is not costly. Either way, it probably would
> > require an arch hook or even a couple of ifdefs in mm/vmalloc.c for
> > Xen. Although... it would be nice if Xen could take advantage of some
> > of these optimisations as well.
>
> In general the lazy unmappings won't worry Xen.  It's only for the
> specific case of allocating memory for pagetables.  Xen can do a bit of
> extra optimisation for cross-cpu tlb flushes (if the target vcpus are
> not currently running, then you don't need to do anything), but they're
> still an expensive operation, so the optimisation is definitely useful.

OK.


> > What's the actual problem for Xen? Anything that can be changed?
>
> Not easily.  Xen doesn't use shadow pagetables.  Instead, it gives the
> guest domains direct access to the real CPU's pagetable, but makes sure
> they're always mapped RO so that the hypervisor can control updates to
> the pagetables (either by trapping writes or via explicit hypercalls).
> This means that when constructing a new pagetable, Xen will verify that
> all the mappings of pages making up the new pagetable are RO before
> allowing it to be used.  If there are stray RW mappings of those pages,
> pagetable construction will fail.

OK, I see. Because even though it is technically safe where we are
using it (because nothing writes through the mappings after the page
is freed), a corrupted guest could use the same window to do bad
things with the pagetables?


> Aside from XFS, the only other case I've found where there could be
> stray RW mappings is when using high pages which are still in the kmap
> cache; I added an explicit call to flush the kmap cache to handle this.
> If vmap and kmap can be unified (at least the lazy unmap aspects of
> them), then that would be a nice little cleanup.

vmap is slightly harder than kmap in some respects. However it would
be really nice to get vmap fast and general enough to completely
replace all the kmap crud -- that's one goal, but the first thing
I'm doing is to concentrate on just vmap to work out how to make it
as fast as possible.

For Xen -- shouldn't be a big deal. We can have a single Linux mm API
to call, and we can do the right thing WRT vmap/kmap. I should try to
merge my current lazy vmap patches which replace the XFS stuff, so we
can implement such an API and fix your XFS issue? That's not going to
happen for at least a cycle or two though, so in the meantime maybe
an ifdef for that XFS vmap batching code would help?


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread Jeremy Fitzhardinge
Nick Piggin wrote:
> Yeah, it would be possible. The easiest way would just be to shoot down
> all lazy vmaps (because you're doing the global IPIs anyway, which are
> the expensive thing, at which point you may as well purge the rest of
> your lazy mappings).

Sure.

> If it is sufficiently rare, then it could be the simplest thing to do.

Yes.  If there's some way to tell whether a particular page is in a lazy
mapping then that would help, since we could easily tell whether we need
to do the whole shootdown thing.  I would expect the population of
lazily mapped pages in the whole freepage pool to be pretty small, but
if the allocator tends to return the most recently freed pages you might
hit them fairly regularly (shoving them at the other end of the freelist
might be useful).
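Something like this, say (the flag and both helpers are invented; no
such flag exists):

#include <linux/mm.h>

#define PG_lazy_vmapped	20	/* hypothetical page-flag bit */

static void vm_shootdown_lazy_aliases(void);	/* cf. the earlier sketch */

static void make_page_alias_free(struct page *page)
{
	/* Cheap per-page test first; the global shootdown only when needed. */
	if (test_and_clear_bit(PG_lazy_vmapped, &page->flags))
		vm_shootdown_lazy_aliases();
}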

> OK, I see. Because even though it is technically safe where we are
> using it (because nothing writes through the mappings after the page
> is freed), a corrupted guest could use the same window to do bad
> things with the pagetables?

That's right.  The hypervisor doesn't trust the guests, so it prevents
them from getting into a state where they can do bad things.

> For Xen -- shouldn't be a big deal. We can have a single Linux mm API
> to call, and we can do the right thing WRT vmap/kmap. I should try to
> merge my current lazy vmap patches which replace the XFS stuff, so we
> can implement such an API and fix your XFS issue?

Sounds good.

> That's not going to
> happen for at least a cycle or two though, so in the meantime maybe
> an ifdef for that XFS vmap batching code would help?

For now I've proposed a patch to simply eagerly vunmap everything when
CONFIG_XEN is set.  It certainly works, but I don't have a good feel for
how much of a performance hit that imposes on XFS.  A slightly more
subtle change would be to test whether we're actually running under
Xen before taking the eager path, so at least the performance burden
only affects actual Xen users (and I presume xfs+xen is a fairly rare
combination, or this problem would have turned up earlier, or perhaps
the old xenified kernels have some other workaround for it).
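That is, instead of the compile-time test, free_address() would start
with something like this (the predicate name is illustrative; the
various Xen trees have spelled "are we actually running on Xen"
differently):

static void
free_address(
	void		*addr)
{
	a_list_t	*aentry;

	if (is_running_on_xen()) {	/* illustrative runtime test */
		vunmap(addr);		/* eager unmap only where Xen needs it */
		return;
	}

	/* ... the existing lazy batching path, unchanged ... */
}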

J


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread David Chinner
On Sun, Oct 14, 2007 at 08:42:34PM -0700, Jeremy Fitzhardinge wrote:
> Nick Piggin wrote:
> > That's not going to
> > happen for at least a cycle or two though, so in the meantime maybe
> > an ifdef for that XFS vmap batching code would help?
>
> For now I've proposed a patch to simply eagerly vunmap everything when
> CONFIG_XEN is set.  It certainly works, but I don't have a good feel for
> how much of a performance hit that imposes on XFS.

With defaults - little effect as vmap should never be used. It's
only when you start using larger block sizes for metadata that this
becomes an issue. The CONFIG_XEN workaround should be fine until we
get a proper vmap cache.
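For example, the configuration that makes XFS vmap aggressively is a
directory block size bigger than the page size, i.e. a filesystem made
with something like (device name illustrative):

  mkfs.xfs -n size=64k /dev/sdX

With the default 4k directory blocks, metadata buffers are single pages
and don't need vmap in normal operation.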

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread Jeremy Fitzhardinge
David Chinner wrote:
> With defaults - little effect as vmap should never be used. It's
> only when you start using larger block sizes for metadata that this
> becomes an issue. The CONFIG_XEN workaround should be fine until we
> get a proper vmap cache.

Hm, well I saw the problem with a filesystem made with mkfs.xfs with no
options, so there must be at least *some* vmapping going on there.

J


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread David Chinner
On Sun, Oct 14, 2007 at 09:18:17PM -0700, Jeremy Fitzhardinge wrote:
> David Chinner wrote:
> > With defaults - little effect as vmap should never be used. It's
> > only when you start using larger block sizes for metadata that this
> > becomes an issue. The CONFIG_XEN workaround should be fine until we
> > get a proper vmap cache.
>
> Hm, well I saw the problem with a filesystem made with mkfs.xfs with no
> options, so there must be at least *some* vmapping going on there.

Sorry - I should have been more precise - vmap should never be used in
performance critical paths on default configs.  Log recovery will
trigger vmap/vunmap usage, so this is probably what you are seeing.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-12 Thread Jeremy Fitzhardinge
Jeremy Fitzhardinge wrote:
> I guess we could create a special-case interface to do the same thing
> with XFS mappings, but it would be nicer to have something more generic.
>
> Is my analysis correct?  Or should XFS not be holding stray mappings? 
> Or is there already some kind of generic mechanism I can use to get it
> to release its mappings?

This test patch confirms my theory:

diff -r 36a518c1fb4b fs/xfs/linux-2.6/xfs_buf.c
--- a/fs/xfs/linux-2.6/xfs_buf.c	Fri Oct 12 10:03:56 2007 -0700
+++ b/fs/xfs/linux-2.6/xfs_buf.c	Fri Oct 12 10:07:03 2007 -0700
@@ -186,6 +186,11 @@ free_address(
 	void		*addr)
 {
 	a_list_t	*aentry;
+
+#ifdef CONFIG_XEN
+	vunmap(addr);
+	return;
+#endif
 
 	aentry = kmalloc(sizeof(a_list_t), GFP_NOWAIT);
 	if (likely(aentry)) {


With this in place, the problem goes away.

J