[kvm-devel] [PATCH] virtio_ring: make structure defines packed

2008-02-08 Thread Christian Borntraeger
Currently the virtio_ring structures are not declared packed, but they 
describe a hardware-like interface. We should not allow compilers to make 
alignment choices and optimizations that can differ between the guest and 
host compilers.

I propose to declare all structures that are in shared memory as packed.

Does anybody see a problem with packed?
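One way to check what packing changes is to compare offsets and alignment with offsetof() and _Alignof(). The sketch below is a hypothetical, userspace-only copy of the vring_used layouts from the patch below (before and after), using stdint stand-ins (u16/u32, used_elem, used_unpacked, used_packed) rather than the kernel's own names. Under these assumptions, ring[] already sits at offset 4 with natural alignment, while the patched layout, because of the added padding word, puts it at offset 8, so old and new headers would disagree on where the used entries live. Packing also drops the structure alignment to 1, which on architectures without efficient unaligned access may lead GCC to emit byte-wise loads and stores.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Stand-ins for the kernel's __u16/__u32 (assumption: plain stdint types). */
typedef uint16_t u16;
typedef uint32_t u32;

struct used_elem {
	u32 id;
	u32 len;
};

/* Layout before the patch: natural alignment, no explicit padding. */
struct used_unpacked {
	u16 flags;
	u16 idx;
	struct used_elem ring[];
};

/* Layout as proposed: packed, with the extra padding word. */
struct used_elem_packed {
	u32 id;
	u32 len;
} __attribute__ ((packed));

struct used_packed {
	u16 flags;
	u16 idx;
	u32 padding;
	struct used_elem_packed ring[];
} __attribute__ ((packed));

/* Where ring[] starts in each variant. */
size_t ring_offset_unpacked(void) { return offsetof(struct used_unpacked, ring); }
size_t ring_offset_packed(void)   { return offsetof(struct used_packed, ring); }

/* Alignment the compiler assumes for one used element. */
size_t elem_align_unpacked(void) { return _Alignof(struct used_elem); }
size_t elem_align_packed(void)   { return _Alignof(struct used_elem_packed); }
```

Such checks could also be written as _Static_assert() lines in a header so that a guest and host built with different compilers fail at build time rather than at run time.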

Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
---
 include/linux/virtio_ring.h |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Index: kvm/include/linux/virtio_ring.h
===
--- kvm.orig/include/linux/virtio_ring.h
+++ kvm/include/linux/virtio_ring.h
@@ -35,14 +35,14 @@ struct vring_desc
__u16 flags;
/* We chain unused descriptors via this, too */
__u16 next;
-};
+} __attribute__ ((packed));
 
 struct vring_avail
 {
__u16 flags;
__u16 idx;
__u16 ring[];
-};
+} __attribute__ ((packed));
 
 /* u32 is used here for ids for padding reasons. */
 struct vring_used_elem
@@ -51,14 +51,15 @@ struct vring_used_elem
__u32 id;
/* Total length of the descriptor chain which was used (written to) */
__u32 len;
-};
+} __attribute__ ((packed));
 
 struct vring_used
 {
__u16 flags;
__u16 idx;
+   __u32 padding;
struct vring_used_elem ring[];
-};
+} __attribute__ ((packed));
 
 struct vring {
unsigned int num;

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] Fwd: [PATCH] boot a linux kernel from non-ide device

2008-02-08 Thread Anthony Liguori
This patch seems reasonable to me.

But FWIW, with extboot, it's possible to implement the -kernel option in 
a saner way.  extboot already has code to take over int19 and load a 
kernel from memory on boot.  It was based on the old -kernel support in 
QEMU (prior to hpa's rewrite) so it's not enabled at the moment.  It 
should be pretty easy to update it though.

This approach would allow -kernel to be used without any disk (which 
also solves your problem, but in a different way).  We can also 
eliminate all the boot sector hijacking silliness.

Regards,

Anthony Liguori

Glauber de Oliveira Costa wrote:
 Reposting to kvm-devel, since aliguori noticed that I'm relying on
 non-upstream features of qemu

 -- Forwarded message --
 From: Glauber de Oliveira Costa [EMAIL PROTECTED]
 Date: Feb 8, 2008 5:05 AM
 Subject: [PATCH] boot a linux kernel from non-ide device
 To: [EMAIL PROTECTED]


 Since it's now possible to use the -drive option, the test for something
 in the index 0 of the IDE bus is too restrictive.

 A better idea, IMHO, is to check if the user specified any bootable device,
 and only if not, fallback to the default, compatible behaviour of checking
 hda regardless of the presence of a boot=on arg.

 --
 Glauber de Oliveira Costa.
 Free as in Freedom
 http://glommer.net

 The less confident you are, the more serious you have to act.
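Glauber's proposed selection rule (prefer any drive explicitly marked boot=on, otherwise fall back to the old behaviour of using hda) can be sketched as a small lookup. The drive_opts structure and pick_boot_drive() below are hypothetical simplifications for illustration, not QEMU's actual data structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one -drive entry (not QEMU's struct). */
struct drive_opts {
	bool is_ide;   /* attached to the IDE bus */
	int  index;    /* index on that bus; IDE index 0 is hda */
	bool boot;     /* boot=on was given */
};

/*
 * Return the position in drives[] of the device to boot -kernel from,
 * or -1 if none qualifies.
 *  1. Prefer the first drive explicitly marked boot=on, on any bus.
 *  2. Otherwise fall back to IDE index 0 (hda), regardless of boot=on.
 */
int pick_boot_drive(const struct drive_opts *drives, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (drives[i].boot)
			return (int)i;

	for (size_t i = 0; i < n; i++)
		if (drives[i].is_ide && drives[i].index == 0)
			return (int)i;

	return -1;
}
```

Checking for an explicit boot=on first is what removes the hard dependency on something being present at IDE index 0, while the second loop keeps existing command lines working.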



   




[kvm-devel] [patch 3/6] mmu_notifier: invalidate_page callbacks

2008-02-08 Thread Christoph Lameter
Two callbacks to remove individual pages as done in rmap code

invalidate_page()

Called from the inner loop of rmap walks to invalidate pages.

age_page()

Called for the determination of the page referenced status.

If we do not care about the page referenced status then the age_page callback
may be omitted. The page lock and the pte lock are held when either of the
functions is called.
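In the rmap.c hunks below, the Linux pte test and mmu_notifier_age_page() are joined with bitwise `|` rather than `||`. Because `|` does not short-circuit, both sides are always evaluated, presumably so that the external referenced state is also cleared even when the Linux pte was already young. The sketch below illustrates the difference with hypothetical stand-ins (clear_young(), external_age()) for ptep_clear_flush_young() and mmu_notifier_age_page().

```c
#include <assert.h>

/* Hypothetical counters recording which test actually ran. */
static int young_calls, notifier_calls;

/* Stand-in for ptep_clear_flush_young(): clear the bit, report old value. */
static int clear_young(int *young)
{
	int was = *young;
	young_calls++;
	*young = 0;
	return was;
}

/* Stand-in for mmu_notifier_age_page(): same idea for an external MMU. */
static int external_age(int *ext_young)
{
	int was = *ext_young;
	notifier_calls++;
	*ext_young = 0;
	return was;
}

/* Bitwise OR: both helpers always run, so both referenced bits get cleared. */
int referenced_bitwise(int *young, int *ext_young)
{
	return clear_young(young) | external_age(ext_young);
}

/* Logical OR: external_age() is skipped whenever the Linux pte was young. */
int referenced_logical(int *young, int *ext_young)
{
	return clear_young(young) || external_age(ext_young);
}
```

With `||`, a page that is hot on the CPU side would never have its external referenced bit cleared, so later aging decisions could be skewed.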

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]
Signed-off-by: Robin Holt [EMAIL PROTECTED]
Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

---
 mm/rmap.c |   13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/rmap.c
===
--- linux-2.6.orig/mm/rmap.c 2008-02-07 16:49:32.0 -0800
+++ linux-2.6/mm/rmap.c 2008-02-07 17:25:25.0 -0800
@@ -49,6 +49,7 @@
 #include <linux/module.h>
 #include <linux/kallsyms.h>
 #include <linux/memcontrol.h>
+#include <linux/mmu_notifier.h>
 
 #include <asm/tlbflush.h>
 
@@ -287,7 +288,8 @@ static int page_referenced_one(struct pa
if (vma->vm_flags & VM_LOCKED) {
referenced++;
*mapcount = 1;  /* break early from loop */
-   } else if (ptep_clear_flush_young(vma, address, pte))
+   } else if (ptep_clear_flush_young(vma, address, pte) |
+  mmu_notifier_age_page(mm, address))
referenced++;
 
/* Pretend the page is referenced if the task has the
@@ -455,6 +457,7 @@ static int page_mkclean_one(struct page 
 
flush_cache_page(vma, address, pte_pfn(*pte));
entry = ptep_clear_flush(vma, address, pte);
+   mmu_notifier(invalidate_page, mm, address);
entry = pte_wrprotect(entry);
entry = pte_mkclean(entry);
set_pte_at(mm, address, pte, entry);
@@ -712,7 +715,8 @@ static int try_to_unmap_one(struct page 
 * skipped over this mm) then we should reactivate it.
 */
if (!migration && ((vma->vm_flags & VM_LOCKED) ||
-   (ptep_clear_flush_young(vma, address, pte)))) {
+   (ptep_clear_flush_young(vma, address, pte) |
+   mmu_notifier_age_page(mm, address)))) {
ret = SWAP_FAIL;
goto out_unmap;
}
@@ -720,6 +724,7 @@ static int try_to_unmap_one(struct page 
/* Nuke the page table entry. */
flush_cache_page(vma, address, page_to_pfn(page));
pteval = ptep_clear_flush(vma, address, pte);
+   mmu_notifier(invalidate_page, mm, address);
 
/* Move the dirty bit to the physical page now the pte is gone. */
if (pte_dirty(pteval))
@@ -844,12 +849,14 @@ static void try_to_unmap_cluster(unsigne
page = vm_normal_page(vma, address, *pte);
BUG_ON(!page || PageAnon(page));
 
-   if (ptep_clear_flush_young(vma, address, pte))
+   if (ptep_clear_flush_young(vma, address, pte) |
+   mmu_notifier_age_page(mm, address))
continue;
 
/* Nuke the page table entry. */
flush_cache_page(vma, address, pte_pfn(*pte));
pteval = ptep_clear_flush(vma, address, pte);
+   mmu_notifier(invalidate_page, mm, address);
 
/* If nonlinear, store the file page offset in the pte. */
if (page->index != linear_page_index(vma, address))

-- 



[kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-02-08 Thread Christoph Lameter
The invalidation of address ranges in a mm_struct needs to be
performed when pages are removed or permissions etc change.

If invalidate_range_begin() is called with locks held then we
pass a flag into invalidate_range() to indicate that no sleeping is
possible. Locks are only held for truncate and huge pages.
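The atomic flag can be read as a contract with the driver: when it is set, the handler must complete its work without sleeping. The sketch below is a hypothetical driver-side handler (ext_mmu, flush_polling() and flush_and_sleep() are illustrative names, not part of the patch) that picks a non-blocking flush path when called with locks held.

```c
#include <assert.h>

/* Hypothetical driver bookkeeping; a real driver would need locking. */
struct ext_mmu {
	int nosleep_flushes;   /* flushes completed by polling (atomic context) */
	int sleeping_flushes;  /* flushes that waited on a sleeping primitive */
};

/* Stand-in for issuing a TLB flush and busy-polling for completion. */
static void flush_polling(struct ext_mmu *m)
{
	m->nosleep_flushes++;
}

/* Stand-in for issuing a flush and sleeping until acks arrive. */
static void flush_and_sleep(struct ext_mmu *m)
{
	m->sleeping_flushes++;
}

/*
 * Hypothetical invalidate_range_begin() handler. A nonzero 'atomic' means
 * the caller holds locks (truncate, huge pages), so the handler must not
 * sleep and has to complete the flush some other way.
 */
void ext_invalidate_range_begin(struct ext_mmu *m, unsigned long start,
				unsigned long end, int atomic)
{
	(void)start;   /* a real driver would flush only [start, end) */
	(void)end;
	if (atomic)
		flush_polling(m);
	else
		flush_and_sleep(m);
}
```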

In two cases we use invalidate_range_begin/end to invalidate
single pages because the pair allows holding off new references
(idea by Robin Holt).

do_wp_page(): We hold off new references while we update the pte.

xip_unmap: We are not taking the PageLock so we cannot
use the invalidate_page mmu_rmap_notifier. invalidate_range_begin/end
stands in.
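Holding off new references only works if every invalidate_range_begin() is eventually matched by an invalidate_range_end(). The sketch below models that with a hypothetical counter (ext_mmu, range_begin(), range_end(), ext_fault() and update_range() are illustrative names): while the counter is nonzero the external fault path refuses to create new mappings, and the caller routes every exit, including failures, through one label. As posted, the `return -ENOMEM` inside the copy_page_range() loop below appears to return after invalidate_range_begin() without reaching the matching invalidate_range_end(); a single exit path like this is one way to avoid that.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical external-MMU state. 'active' counts range_begin() calls not
 * yet matched by range_end(); while nonzero, no new external mappings are
 * created. Single-threaded sketch: real code needs a lock or atomics.
 */
struct ext_mmu {
	int active;
	int mapped;   /* external mappings currently held */
};

void range_begin(struct ext_mmu *m)
{
	m->active++;
	m->mapped = 0;   /* drop external mappings in the range (simplified) */
}

void range_end(struct ext_mmu *m)
{
	assert(m->active > 0);
	m->active--;
}

/* External fault path: 1 if a mapping was created, 0 if the caller must retry. */
int ext_fault(struct ext_mmu *m)
{
	if (m->active)
		return 0;   /* invalidation in progress: hold off */
	m->mapped++;
	return 1;
}

/* Caller pattern keeping begin/end balanced; 'fail' simulates -ENOMEM. */
int update_range(struct ext_mmu *m, int fail)
{
	int err = 0;

	range_begin(m);
	if (fail) {
		err = -ENOMEM;
		goto out;
	}
	/* ... modify ptes here ... */
out:
	range_end(m);
	return err;
}
```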

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]
Signed-off-by: Robin Holt [EMAIL PROTECTED]
Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

---
 mm/filemap_xip.c |5 +
 mm/fremap.c  |3 +++
 mm/hugetlb.c |3 +++
 mm/memory.c  |   35 +--
 mm/mmap.c|2 ++
 mm/mprotect.c|3 +++
 mm/mremap.c  |7 ++-
 7 files changed, 51 insertions(+), 7 deletions(-)

Index: linux-2.6/mm/fremap.c
===
--- linux-2.6.orig/mm/fremap.c  2008-02-08 13:18:58.0 -0800
+++ linux-2.6/mm/fremap.c   2008-02-08 13:25:22.0 -0800
@@ -15,6 +15,7 @@
 #include <linux/rmap.h>
 #include <linux/module.h>
 #include <linux/syscalls.h>
+#include <linux/mmu_notifier.h>
 
 #include <asm/mmu_context.h>
 #include <asm/cacheflush.h>
@@ -214,7 +215,9 @@ asmlinkage long sys_remap_file_pages(uns
spin_unlock(&mapping->i_mmap_lock);
}
 
+   mmu_notifier(invalidate_range_begin, mm, start, start + size, 0);
err = populate_range(mm, vma, start, size, pgoff);
+   mmu_notifier(invalidate_range_end, mm, start, start + size, 0);
if (!err && !(flags & MAP_NONBLOCK)) {
if (unlikely(has_write_lock)) {
downgrade_write(&mm->mmap_sem);
Index: linux-2.6/mm/memory.c
===
--- linux-2.6.orig/mm/memory.c  2008-02-08 13:22:14.0 -0800
+++ linux-2.6/mm/memory.c   2008-02-08 13:25:22.0 -0800
@@ -51,6 +51,7 @@
 #include <linux/init.h>
 #include <linux/writeback.h>
 #include <linux/memcontrol.h>
+#include <linux/mmu_notifier.h>
 
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
@@ -611,6 +612,9 @@ int copy_page_range(struct mm_struct *ds
if (is_vm_hugetlb_page(vma))
return copy_hugetlb_page_range(dst_mm, src_mm, vma);
 
+   if (is_cow_mapping(vma->vm_flags))
+   mmu_notifier(invalidate_range_begin, src_mm, addr, end, 0);
+
dst_pgd = pgd_offset(dst_mm, addr);
src_pgd = pgd_offset(src_mm, addr);
do {
@@ -621,6 +625,11 @@ int copy_page_range(struct mm_struct *ds
vma, addr, next))
return -ENOMEM;
} while (dst_pgd++, src_pgd++, addr = next, addr != end);
+
+   if (is_cow_mapping(vma->vm_flags))
+   mmu_notifier(invalidate_range_end, src_mm,
+   vma->vm_start, end, 0);
+
return 0;
 }
 
@@ -893,13 +902,16 @@ unsigned long zap_page_range(struct vm_a
struct mmu_gather *tlb;
unsigned long end = address + size;
unsigned long nr_accounted = 0;
+   int atomic = details ? (details->i_mmap_lock != 0) : 0;
 
lru_add_drain();
tlb = tlb_gather_mmu(mm, 0);
update_hiwater_rss(mm);
+   mmu_notifier(invalidate_range_begin, mm, address, end, atomic);
end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details);
if (tlb)
tlb_finish_mmu(tlb, address, end);
+   mmu_notifier(invalidate_range_end, mm, address, end, atomic);
return end;
 }
 
@@ -1337,7 +1349,7 @@ int remap_pfn_range(struct vm_area_struc
 {
pgd_t *pgd;
unsigned long next;
-   unsigned long end = addr + PAGE_ALIGN(size);
+   unsigned long start = addr, end = addr + PAGE_ALIGN(size);
struct mm_struct *mm = vma->vm_mm;
int err;
 
@@ -1371,6 +1383,7 @@ int remap_pfn_range(struct vm_area_struc
pfn -= addr >> PAGE_SHIFT;
pgd = pgd_offset(mm, addr);
flush_cache_range(vma, addr, end);
+   mmu_notifier(invalidate_range_begin, mm, start, end, 0);
do {
next = pgd_addr_end(addr, end);
err = remap_pud_range(mm, pgd, addr, next,
@@ -1378,6 +1391,7 @@ int remap_pfn_range(struct vm_area_struc
if (err)
break;
} while (pgd++, addr = next, addr != end);
+   mmu_notifier(invalidate_range_end, mm, start, end, 0);
return err;
 }
 EXPORT_SYMBOL(remap_pfn_range);
@@ -1461,10 +1475,11 @@ int apply_to_page_range(struct mm_struct
 {
pgd_t *pgd;
unsigned long next;
-   unsigned long end = addr + size;
+   unsigned long 

Re: [kvm-devel] [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Robin Holt
On Fri, Feb 08, 2008 at 03:32:19PM -0800, Christoph Lameter wrote:
 On Fri, 8 Feb 2008, Andrew Morton wrote:
 
  What about ib_umem_get()?
 
 Ok. It pins using an elevated refcount. Same as XPmem right now. With that 
 we effectively pin a page (page migration will fail) but we will 
 continually be reclaiming the page and may repeatedly try to move it. We 
 have issues with XPmem causing too many pages to be pinned and thus the 
 OOM getting into weird behavior modes (OOM or stop lru scanning due to 
 all_reclaimable set).
 
 An elevated refcount will also not be noticed by any of the schemes under 
 consideration to improve LRU scanning performance.

Christoph, I am not sure what you are saying here.  With v4 and later,
I thought we were able to use the rmap invalidation to remove the ref
count that XPMEM was holding and therefore be able to swapout.  Did I miss
something?  I agree the existing XPMEM does pin.  I hope we are not saying
the XPMEM based upon these patches will not be able to swap/migrate.

Thanks,
Robin



Re: [kvm-devel] [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Fri, 8 Feb 2008, Robin Holt wrote:

   What about ib_umem_get()?
  
  Ok. It pins using an elevated refcount. Same as XPmem right now. With that 
  we effectively pin a page (page migration will fail) but we will 
  continually be reclaiming the page and may repeatedly try to move it. We 
  have issues with XPmem causing too many pages to be pinned and thus the 
  OOM getting into weird behavior modes (OOM or stop lru scanning due to 
  all_reclaimable set).
  
  An elevated refcount will also not be noticed by any of the schemes under 
  consideration to improve LRU scanning performance.
 
 Christoph, I am not sure what you are saying here.  With v4 and later,
 I thought we were able to use the rmap invalidation to remove the ref
 count that XPMEM was holding and therefore be able to swapout.  Did I miss
 something?  I agree the existing XPMEM does pin.  I hope we are not saying
 the XPMEM based upon these patches will not be able to swap/migrate.

Correct.

You missed the turn of the conversation to how ib_umem_get() works. 
Currently it seems to pin the same way that the SLES10 XPmem works.






Re: [kvm-devel] [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Andrew Morton
On Fri, 08 Feb 2008 14:06:16 -0800
Christoph Lameter [EMAIL PROTECTED] wrote:

 This is a patchset implementing MMU notifier callbacks based on Andrea's
 earlier work. These are needed if Linux pages are referenced from something
 other than what is tracked by the kernel's rmaps (an external MMU). MMU
 notifiers allow us to get rid of the page pinning for RDMA and various
 other purposes. It gets rid of the broken use of mlock for page pinning.
 (mlock really does *not* pin pages)
 
 More information on the rationale and the technical details can be found in
 the first patch and the README provided by that patch in
 Documentation/mmu_notifiers.
 
 The known immediate users are
 
 KVM
 - Establishes a refcount to the page via get_user_pages().
 - External references are called spte.
 - Has page tables to track pages whose refcount was elevated but
   no reverse maps.
 
 GRU
 - Simple additional hardware TLB (possibly covering multiple instances of
   Linux)
 - Needs TLB shootdown when the VM unmaps pages.
 - Determines page address via follow_page (from interrupt context) but can
   fall back to get_user_pages().
 - No page reference possible since no page status is kept.
 
 XPmem
 - Allows use of a process's memory by remote instances of Linux.
 - Provides its own reverse mappings to track remote pte.
 - Establishes refcounts on the exported pages.
 - Must sleep in order to wait for remote acks of ptes that are being
   cleared.
 

What about ib_umem_get()?



Re: [kvm-devel] KVM binary incompatibility

2008-02-08 Thread Anthony Liguori
Stephen Hemminger wrote:
 I notice that recent KVM is incompatible with older versions.

 Using a KVM image created on 2.6.24 will crash on 2.6.25 (or
 vice versa). It appears that Ubuntu Hardy has incorporated the 2.6.25
 update even though it claims to be 2.6.24.
   

This isn't intentional.  What is the guest and how does it crash?

I've been using the same image for most of KVM's development life cycle 
without having issues.

Regards,

Anthony Liguori

 This is reproducible on Intel (64bit) kernel.  Was this intentional?
 Is it documented?





Re: [kvm-devel] [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Andrew Morton
On Fri, 8 Feb 2008 17:43:02 -0600 Robin Holt [EMAIL PROTECTED] wrote:

 On Fri, Feb 08, 2008 at 03:41:24PM -0800, Christoph Lameter wrote:
  On Fri, 8 Feb 2008, Robin Holt wrote:
  
 What about ib_umem_get()?
  
  Correct.
  
  You missed the turn of the conversation to how ib_umem_get() works. 
  Currently it seems to pin the same way that the SLES10 XPmem works.
 
 Ah.  I took Andrew's question as more of a probe about whether we had
 worked with the IB folks to ensure this fits the ib_umem_get needs
 as well.
 

You took it correctly, and I didn't understand the answer ;)



[kvm-devel] [patch 5/6] mmu_notifier: Support for drivers with reverse maps (e.g. for XPmem)

2008-02-08 Thread Christoph Lameter
These special additional callbacks are required because XPmem (and likely
other mechanisms) use their own rmaps (multiple processes on a series
of remote Linux instances may be accessing the memory of a process).
For example, XPmem may have to send out notifications to remote Linux
instances and receive confirmation before a page can be freed.

So we handle this like an additional Linux reverse map that is walked after
the existing rmaps have been walked. We leave the walking to the driver, which
is then able to use something other than a spinlock to walk its reverse
maps. So we can actually call the driver without holding spinlocks while
we hold the page lock.

However, we cannot determine the mm_struct that a page belongs to at
that point. The mm_struct can only be determined from the rmaps by the
device driver.

We add another page flag (PageExternalRmap) that is set if a page has
been remotely mapped (e.g. by a process from another Linux instance).
We can then perform the callbacks only for pages that are actually in
remote use.

Rmap notifiers need an extra page bit and are only available
on 64 bit platforms. This functionality is not available on 32 bit!

A notifier that uses the reverse map callbacks does not need to provide
the invalidate_page() method that is called when locks are held.
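The gating described above (walk the driver rmaps only for pages flagged as remotely mapped, after the kernel's own rmap walk) can be sketched in userspace as follows. The bit number, fake_page, rmap_notifier and the helper functions are hypothetical stand-ins chosen for illustration; they mirror the idea of PG_external_rmap and mmu_rmap_notifier but are not the patch's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit number and types, for illustration only. */
#define EXT_RMAP_BIT 30

struct fake_page {
	unsigned long flags;
};

struct rmap_notifier {
	struct rmap_notifier *next;
	void (*invalidate_page)(struct rmap_notifier *n, struct fake_page *page);
	int calls;
};

static struct rmap_notifier *rmap_notifiers;   /* registered drivers */

void set_external_rmap(struct fake_page *p)
{
	p->flags |= 1UL << EXT_RMAP_BIT;
}

int page_external_rmap(const struct fake_page *p)
{
	return (p->flags >> EXT_RMAP_BIT) & 1;
}

void rmap_notifier_register(struct rmap_notifier *n)
{
	n->next = rmap_notifiers;
	rmap_notifiers = n;
}

/*
 * Intended to run after the kernel's own rmap walk, with the page lock held
 * but no spinlocks, so a driver may sleep while walking its own reverse map.
 * Pages never exported remotely skip the driver walk entirely.
 */
void rmap_notify_invalidate(struct fake_page *page)
{
	if (!page_external_rmap(page))
		return;
	for (struct rmap_notifier *n = rmap_notifiers; n; n = n->next)
		n->invalidate_page(n, page);
}

/* Example driver callback: only counts invocations. */
void count_invalidate(struct rmap_notifier *n, struct fake_page *page)
{
	(void)page;
	n->calls++;
}
```

The flag check keeps the common case (a page that no remote instance has mapped) as cheap as a single bit test.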

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

---
 include/linux/mmu_notifier.h |   65 +++
 include/linux/page-flags.h   |   11 +++
 mm/mmu_notifier.c|   34 ++
 mm/rmap.c|9 +
 4 files changed, 119 insertions(+)

Index: linux-2.6/include/linux/page-flags.h
===
--- linux-2.6.orig/include/linux/page-flags.h   2008-02-08 12:35:14.0 -0800
+++ linux-2.6/include/linux/page-flags.h    2008-02-08 12:44:33.0 -0800
@@ -105,6 +105,7 @@
  * 64 bit  |   FIELDS | ?? FLAGS |
  *         63          32          0
  */
+#define PG_external_rmap   30  /* Page has external rmap */
 #define PG_uncached31  /* Page has been mapped as uncached */
 #endif
 
@@ -296,6 +297,16 @@ static inline void __ClearPageTail(struc
 #define SetPageUncached(page)  set_bit(PG_uncached, &(page)->flags)
 #define ClearPageUncached(page)    clear_bit(PG_uncached, &(page)->flags)
 
+#if defined(CONFIG_MMU_NOTIFIER) && defined(CONFIG_64BIT)
+#define PageExternalRmap(page) test_bit(PG_external_rmap, &(page)->flags)
+#define SetPageExternalRmap(page) set_bit(PG_external_rmap, &(page)->flags)
+#define ClearPageExternalRmap(page) clear_bit(PG_external_rmap, \
+   &(page)->flags)
+#else
+#define ClearPageExternalRmap(page) do {} while (0)
+#define PageExternalRmap(page) 0
+#endif
+
 struct page;   /* forward declaration */
 
 extern void cancel_dirty_page(struct page *page, unsigned int account_size);
Index: linux-2.6/include/linux/mmu_notifier.h
===
--- linux-2.6.orig/include/linux/mmu_notifier.h 2008-02-08 12:35:14.0 -0800
+++ linux-2.6/include/linux/mmu_notifier.h  2008-02-08 12:44:33.0 -0800
@@ -23,6 +23,18 @@
  * where sleeping is allowed or in atomic contexts. A flag is passed
  * to indicate an atomic context.
  *
+ *
+ * 2. mmu_rmap_notifier
+ *
+ * Callbacks for subsystems that provide their own rmaps. These
+ * need to walk their own rmaps for a page. The invalidate_page
+ * callback is outside of locks so that we are not in a strictly
+ * atomic context (but we may be in a PF_MEMALLOC context if the
+ * notifier is called from reclaim code) and are able to sleep.
+ *
+ * Rmap notifiers need an extra page bit and are only available
+ * on 64 bit platforms.
+ *
  * Pages must be marked dirty if dirty bits are found to be set in
  * the external ptes.
  */
@@ -89,6 +101,23 @@ struct mmu_notifier_ops {
 int atomic);
 };
 
+struct mmu_rmap_notifier_ops;
+
+struct mmu_rmap_notifier {
+   struct hlist_node hlist;
+   const struct mmu_rmap_notifier_ops *ops;
+};
+
+struct mmu_rmap_notifier_ops {
+   /*
+* Called with the page lock held after ptes are modified or removed
+* so that a subsystem with its own rmaps can remove remote ptes
+* mapping a page.
+*/
+   void (*invalidate_page)(struct mmu_rmap_notifier *mrn,
+   struct page *page);
+};
+
 #ifdef CONFIG_MMU_NOTIFIER
 
 /*
@@ -139,6 +168,27 @@ static inline void mmu_notifier_head_ini
}   \
} while (0)
 
+extern void mmu_rmap_notifier_register(struct mmu_rmap_notifier *mrn);
+extern void mmu_rmap_notifier_unregister(struct mmu_rmap_notifier *mrn);
+
+/* Must 

Re: [kvm-devel] [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Robin Holt
On Fri, Feb 08, 2008 at 03:41:24PM -0800, Christoph Lameter wrote:
 On Fri, 8 Feb 2008, Robin Holt wrote:
 
What about ib_umem_get()?
 
 Correct.
 
 You missed the turn of the conversation to how ib_umem_get() works. 
 Currently it seems to pin the same way that the SLES10 XPmem works.

Ah.  I took Andrew's question as more of a probe about whether we had
worked with the IB folks to ensure this fits the ib_umem_get needs
as well.

Thanks,
Robin



Re: [kvm-devel] [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Fri, 8 Feb 2008, Roland Dreier wrote:

 In general, this MMU notifier stuff will only be useful to a subset of
 InfiniBand/RDMA hardware.  Some adapters are smart enough to handle
 changing the IO virtual -> bus/physical mapping on the fly, but some
 aren't.  For the dumb adapters, I think the current ib_umem_get() is
 pretty close to as good as we can get: we have to keep the physical
 pages pinned for as long as the adapter is allowed to DMA into the
 memory region.

I thought the adaptor can always remove the mapping by renegotiating 
with the remote side? Even if it's dumb, a callback could notify the 
driver that it may be required to tear down the mapping. We then hold the 
pages until the driver confirms that the mapping has been removed.

We could also let the unmapping fail if the driver indicates that the 
mapping must stay.



Re: [kvm-devel] [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Fri, 8 Feb 2008, Andrew Morton wrote:

 Quite possibly none of the infiniband developers even know about it..

Well Andrea's initial approach was even featured on LWN a couple of 
weeks back.




[kvm-devel] trying to get off all lists

2008-02-08 Thread R S
unsubscribe



Re: [kvm-devel] [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Roland Dreier
  I thought the adaptor can always remove the mapping by renegotiating 
  with the remote side? Even if it's dumb, a callback could notify the 
  driver that it may be required to tear down the mapping. We then hold the 
  pages until the driver confirms that the mapping has been removed.

Of course we can always destroy the memory region but that would break
the semantics that applications expect.  Basically an application can
register some chunk of its memory and get a key that it can pass to a
remote peer to let the remote peer operate on its memory via RDMA.
And that memory region/key is expected to stay valid until there is an
application-level operation to destroy it (or until the app crashes or
gets killed, etc).

  We could also let the unmapping fail if the driver indicates that the 
  mapping must stay.

That would of course work -- dumb adapters would just always fail,
which might be inefficient.
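The "let the unmapping fail" idea can be sketched as an invalidate callback that is allowed to refuse. Everything here (adapter, adapter_invalidate(), try_unmap()) is hypothetical and models the suggestion in the thread, not an existing kernel API. It shows Roland's point directly: a dumb adapter refuses every time, so any page it maps can never be unmapped, and earlier smart adapters may already have dropped their mapping and will simply refault.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical adapter model: 'can_remap' distinguishes smart from dumb. */
struct adapter {
	int can_remap;   /* 1: smart adapter, 0: dumb adapter */
	int mapped;      /* 1 while the adapter maps the page */
};

/* Return 0 if the adapter dropped its mapping, nonzero if it must stay. */
int adapter_invalidate(struct adapter *a)
{
	if (!a->can_remap)
		return -1;   /* dumb adapter: mapping must stay */
	a->mapped = 0;
	return 0;
}

/*
 * Try to unmap a page used by n adapters; give up at the first refusal.
 * Adapters already invalidated stay unmapped and would refault later.
 */
int try_unmap(struct adapter *adapters, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (adapter_invalidate(&adapters[i]) != 0)
			return -1;   /* page stays; reclaim/migration must skip it */
	return 0;
}
```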



Re: [kvm-devel] [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Andrew Morton
On Fri, 8 Feb 2008 16:05:00 -0800 (PST) Christoph Lameter [EMAIL PROTECTED] 
wrote:

 On Fri, 8 Feb 2008, Andrew Morton wrote:
 
  You took it correctly, and I didn't understand the answer ;)
 
 We have done several rounds of discussion on linux-kernel about this so 
 far and the IB folks have not shown up to join in. I have tried to make 
 this as general as possible.

infiniband would appear to be the major present in-kernel client of this new
interface.  So as a part of proving its usefulness, correctness, etc we
should surely work on converting infiniband to use it, and prove its
goodness.

Quite possibly none of the infiniband developers even know about it..



Re: [kvm-devel] KVM binary incompatibility

2008-02-08 Thread Stephen Hemminger
On Fri, 08 Feb 2008 16:22:12 -0600
Anthony Liguori [EMAIL PROTECTED] wrote:

 Stephen Hemminger wrote:
  I notice that recent KVM is incompatible with older versions.
 
  Using a KVM image created on 2.6.24 will crash on 2.6.25 (or
  vice versa). It appears that Ubuntu Hardy has incorporated the 2.6.25
  update even though it claims to be 2.6.24.

 
 This isn't intentional.  What is the guest and how does it crash?
 
 I've been using the same image for most of KVM's development life cycle 
 without having issues.
 
 Regards,
 
 Anthony Liguori

I'll see if I can get a backtrace, it isn't reliably reproducible.

-- 
Stephen Hemminger [EMAIL PROTECTED]



Re: [kvm-devel] [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Fri, 8 Feb 2008, Andrew Morton wrote:

 You took it correctly, and I didn't understand the answer ;)

We have done several rounds of discussion on linux-kernel about this so 
far and the IB folks have not shown up to join in. I have tried to make 
this as general as possible.



Re: [kvm-devel] [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Roland Dreier
  We have done several rounds of discussion on linux-kernel about this so 
  far and the IB folks have not shown up to join in. I have tried to make 
  this as general as possible.

Sorry, this has been on my things to look at list for a while, but I
haven't gotten a chance to really understand where things are yet.

In general, this MMU notifier stuff will only be useful to a subset of
InfiniBand/RDMA hardware.  Some adapters are smart enough to handle
changing the IO virtual -> bus/physical mapping on the fly, but some
aren't.  For the dumb adapters, I think the current ib_umem_get() is
pretty close to as good as we can get: we have to keep the physical
pages pinned for as long as the adapter is allowed to DMA into the
memory region.

For the smart adapters, we just need a chance to change the adapter's
page table when the kernel/CPU's mapping changes, and naively, this
stuff looks like it would work.
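For a smart adapter, "a chance to change the adapter's page table" could look like the sketch below: an invalidate handler clears device page-table entries for the affected range, and the driver refills entries once the CPU mapping is re-established. The dev_pt table, its size, the 4 KiB page size and both helpers are assumptions for illustration only, not any real adapter's interface.

```c
#include <assert.h>
#include <stdint.h>

#define DEV_PT_ENTRIES    16
#define PAGE_SHIFT_SKETCH 12   /* assumption: 4 KiB pages */

/*
 * Hypothetical adapter page table: IO-virtual page number -> bus address.
 * 0 means "not present"; access to such an entry must be refilled first.
 */
struct dev_pt {
	uint64_t bus_addr[DEV_PT_ENTRIES];
};

/* Handler for an invalidate-range notification: clear [start, end). */
void dev_pt_invalidate(struct dev_pt *pt, unsigned long start, unsigned long end)
{
	for (unsigned long va = start; va < end; va += 1UL << PAGE_SHIFT_SKETCH) {
		unsigned long idx = va >> PAGE_SHIFT_SKETCH;
		if (idx < DEV_PT_ENTRIES)
			pt->bus_addr[idx] = 0;
	}
}

/* Refill one entry once the CPU-side mapping exists again. */
void dev_pt_map(struct dev_pt *pt, unsigned long va, uint64_t bus)
{
	unsigned long idx = va >> PAGE_SHIFT_SKETCH;
	if (idx < DEV_PT_ENTRIES)
		pt->bus_addr[idx] = bus;
}
```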

Andrew, does that help?

- R.



Re: [kvm-devel] [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Fri, 8 Feb 2008, Roland Dreier wrote:

> That would of course work -- dumb adapters would just always fail,
> which might be inefficient.

Hmmm... that means we need something that actually pins pages for good so 
that the VM can avoid reclaiming them and so that page migration can avoid 
trying to migrate them. Something like yet another page flag.

Ccing Rik.
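A rough sketch of what "yet another page flag" could mean, assuming a toy page struct with a flags word and an on-LRU marker. SIM_PG_PINNED and the helper functions are hypothetical and are not existing kernel page flags or interfaces: a permanently pinned page is marked and taken off the LRU, so migration can refuse it up front and reclaim stops scanning it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit; not a real kernel page flag. */
#define SIM_PG_PINNED (1u << 0)

struct sim_page {
    unsigned int flags;
    int on_lru;
};

/* Mark a page as pinned for good and take it off the simulated LRU,
 * so reclaim never scans it again. */
static void sim_pin_permanently(struct sim_page *p)
{
    p->flags |= SIM_PG_PINNED;
    p->on_lru = 0;
}

/* Migration checks the flag up front instead of retrying on a refcount
 * it hopes will eventually drop. Returns 1 if the page may be migrated. */
static int sim_may_migrate(const struct sim_page *p)
{
    return !(p->flags & SIM_PG_PINNED);
}

/* Count the pages a reclaim pass would actually look at. */
static size_t sim_lru_scan(const struct sim_page *pages, size_t n)
{
    size_t scanned = 0;
    for (size_t i = 0; i < n; i++)
        if (pages[i].on_lru)
            scanned++;
    return scanned;
}
```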



Re: [kvm-devel] [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Andrea Arcangeli
On Fri, Feb 08, 2008 at 04:36:16PM -0800, Christoph Lameter wrote:
> On Fri, 8 Feb 2008, Roland Dreier wrote:
> 
> > That would of course work -- dumb adapters would just always fail,
> > which might be inefficient.
> 
> Hmmm... that means we need something that actually pins pages for good so 
> that the VM can avoid reclaiming them and so that page migration can avoid 
> trying to migrate them. Something like yet another page flag.

What's wrong with pinning with the page count like now? Dumb adapters
would simply not register themselves in the mmu notifier list, no?

 
> Ccing Rik.
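Andrea's alternative can be sketched as a simple notifier list, assuming hypothetical sim_mm and sim_notifier types (not the structures in the patchset): only adapters that can update their translations register a callback, while a dumb adapter never registers and keeps relying on the elevated page count, as the current ib_umem_get() pinning does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical notifier: a callback plus a link to the next registrant. */
struct sim_notifier {
    void (*invalidate)(struct sim_notifier *n, unsigned long start, unsigned long end);
    struct sim_notifier *next;
    int calls;   /* how many invalidations this registrant has seen */
};

/* Hypothetical per-address-space list of registered notifiers. */
struct sim_mm {
    struct sim_notifier *head;
};

/* Only adapters that can change their translations on the fly register. */
static void sim_notifier_register(struct sim_mm *mm, struct sim_notifier *n)
{
    n->next = mm->head;
    mm->head = n;
}

/* Called by the simulated VM before it unmaps [start, end). */
static void sim_invalidate(struct sim_mm *mm, unsigned long start, unsigned long end)
{
    for (struct sim_notifier *n = mm->head; n; n = n->next)
        n->invalidate(n, start, end);
}

/* Example callback that just records that it was invoked. */
static void count_invalidate(struct sim_notifier *n, unsigned long start, unsigned long end)
{
    (void)start;
    (void)end;
    n->calls++;
}
```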


Re: [kvm-devel] [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Andrea Arcangeli
On Fri, Feb 08, 2008 at 05:27:03PM -0800, Christoph Lameter wrote:
> Pages will still be on the LRU and cycle through rmap again and again. 
> If page migration is used on those pages then the code may make repeated 
> attempts to migrate the page thinking that the page count must at some 
> point drop.
> 
> I do not think that the page count was intended to be used to pin pages 
> permanently. If we had a marker on such pages then we could take them off 
> the LRU and not try to migrate them.

The VM shouldn't break if try_to_unmap doesn't actually make the page
freeable for whatever reason. Permanent pins shouldn't happen anyway,
so defining an ad-hoc API for that doesn't sound too appealing. Not
sure if old hardware deserves those special lru-size-reduction
optimizations, but it's not my call (certainly swapoff/mlock would get
higher priority in that lru-size-reduction area).
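To illustrate Andrea's point that the VM should cope with a page it cannot free, here is a toy migration loop, assuming a page is movable only when its refcount is 1. The retry bound SIM_MIGRATE_PASSES and all names are hypothetical, chosen for illustration rather than taken from mm/migrate.c: pages with an extra reference are retried a bounded number of times and then left in place, which also shows the repeated wasted attempts Christoph is worried about.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical retry bound, chosen for illustration only. */
#define SIM_MIGRATE_PASSES 3

struct sim_page {
    int refcount;   /* 1 == only the mapping holds it */
    int migrated;
};

/* Try to migrate every page, retrying those with extra references for a
 * bounded number of passes. A long-term pin never drops, so such a page
 * is left behind instead of being retried forever. Returns the number of
 * pages that could not be migrated; *attempts counts every try. */
static size_t sim_migrate_pages(struct sim_page *pages, size_t n, size_t *attempts)
{
    size_t failed = 0;
    *attempts = 0;
    for (int pass = 0; pass < SIM_MIGRATE_PASSES; pass++) {
        failed = 0;
        for (size_t i = 0; i < n; i++) {
            if (pages[i].migrated)
                continue;
            (*attempts)++;
            if (pages[i].refcount == 1)
                pages[i].migrated = 1;
            else
                failed++;   /* extra pin: try again next pass */
        }
        if (failed == 0)
            break;
    }
    return failed;
}
```

With one unpinned page and one page holding an extra reference, the unpinned page moves on the first pass while the pinned one is tried on all three passes and then abandoned, for four attempts in total.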


Re: [kvm-devel] [ofa-general] Re: [patch 0/6] MMU Notifiers V6

2008-02-08 Thread Christoph Lameter
On Sat, 9 Feb 2008, Andrea Arcangeli wrote:

> The VM shouldn't break if try_to_unmap doesn't actually make the page
> freeable for whatever reason. Permanent pins shouldn't happen anyway,

Right now the VM livelocks if too many pages are pinned that way. The 
more processors per node, the higher the risk of livelock, because more 
processors are cycling through pages that have an elevated refcount.
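A toy illustration, not a reproduction of the real livelock, of why more processors per node make this worse: each reclaiming CPU walks the same LRU, and every pinned page it examines is a wasted scan. sim_reclaim() and its parameters are hypothetical; the model ignores locking and page rotation and simply counts scans of pages that can never be freed.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_MAX_PAGES 64

/* Toy reclaim on one node: an LRU of n pages, the first npinned of which
 * have an elevated refcount and can never be freed. Each of ncpus CPUs
 * walks the LRU from the start until it has freed `want` pages. Pinned
 * pages stay on the LRU, so every CPU pays for them again. Returns the
 * number of wasted scans; *freed receives the pages actually freed. */
static size_t sim_reclaim(size_t n, size_t npinned, size_t ncpus,
                          size_t want, size_t *freed)
{
    int pinned[SIM_MAX_PAGES];
    int gone[SIM_MAX_PAGES] = {0};
    size_t wasted = 0;

    assert(n <= SIM_MAX_PAGES && npinned <= n);
    *freed = 0;
    for (size_t i = 0; i < n; i++)
        pinned[i] = i < npinned;

    for (size_t cpu = 0; cpu < ncpus; cpu++) {
        size_t got = 0;
        for (size_t i = 0; i < n && got < want; i++) {
            if (gone[i])
                continue;           /* already freed by an earlier CPU */
            if (pinned[i]) {
                wasted++;           /* elevated refcount: cannot free */
                continue;
            }
            gone[i] = 1;
            got++;
            (*freed)++;
        }
    }
    return wasted;
}
```

With 8 pages, 4 of them pinned, and each CPU wanting 2 pages, one CPU wastes 4 scans while two CPUs waste 8: the wasted work grows with the number of CPUs even though the set of pinned pages stays the same.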

> so defining an ad-hoc API for that doesn't sound too appealing. Not
> sure if old hardware deserves those special lru-size-reduction
> optimizations but it's not my call (certainly swapoff/mlock would get
> higher priority in that lru-size-reduction area).

Rik has a patchset under development that addresses issues like this. The 
elevated refcount pin problem is not really relevant to the patchset we 
are discussing here.
 
