Re: [kvm-devel] [PATCH] gfxboot VMX workaround v2

2008-04-08 Thread Guillaume Thouvenin
On Mon, 07 Apr 2008 11:05:06 -0500
Anthony Liguori [EMAIL PROTECTED] wrote:

 Perhaps a viable way to fix this upstream would be to catch the vmentry 
 failure, look to see if SS.CPL != CS.CPL, and if so, invoke 
 x86_emulate() in a loop until SS.CPL == CS.CPL.

I tried this solution some time ago, but unfortunately x86_emulate()
failed. I suspected a problem with the guest EIP, which could differ
between the vmentry catch and the emulation. I will rebase my patch and
post it on the mailing list.
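For illustration, the loop Anthony describes can be sketched in plain C. This is a toy model, not KVM code: vcpu_state and emulate_one_insn() are hypothetical stand-ins for the real vcpu segment state and x86_emulate().

```c
/* Toy model of the suggested recovery path: after a failed vmentry with
 * SS.CPL != CS.CPL, single-step the guest through a software emulator
 * until the segment state is consistent again.  vcpu_state and
 * emulate_one_insn() are hypothetical stand-ins, not real KVM API. */

struct vcpu_state {
    int cs_cpl;            /* CPL cached in the CS register */
    int ss_cpl;            /* CPL cached in the SS register */
    unsigned long rip;
};

/* Stand-in for x86_emulate(): each call "executes" one instruction; we
 * pretend the instruction at 0x1004 reloads SS so the CPLs match. */
static int emulate_one_insn(struct vcpu_state *v)
{
    if (v->rip == 0x1004)
        v->ss_cpl = v->cs_cpl;
    v->rip += 2;           /* fake instruction length */
    return 0;              /* 0 == emulated successfully */
}

/* Loop invoked on vmentry failure when SS.CPL != CS.CPL. */
int emulate_until_consistent(struct vcpu_state *v)
{
    while (v->ss_cpl != v->cs_cpl) {
        if (emulate_one_insn(v) != 0)
            return -1;     /* emulator hit an unsupported instruction */
    }
    return 0;
}
```

The real loop would of course emulate actual guest instructions; the structure (emulate until SS.CPL == CS.CPL, bail out on emulation failure) is the point.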

Regards,
Guillaume

-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] gfxboot VMX workaround v2

2008-04-08 Thread Anthony Liguori
Guillaume Thouvenin wrote:
 On Mon, 07 Apr 2008 11:05:06 -0500
 Anthony Liguori [EMAIL PROTECTED] wrote:

   
 Perhaps a viable way to fix this upstream would be to catch the vmentry 
 failure, look to see if SS.CPL != CS.CPL, and if so, invoke 
 x86_emulate() in a loop until SS.CPL == CS.CPL.
 

 I tried this solution some time ago, but unfortunately x86_emulate()
 failed. I suspected a problem with the guest EIP, which could differ
 between the vmentry catch and the emulation. I will rebase my patch and
 post it on the mailing list.
   

x86_emulate is missing support for far jmp, which is used to switch into 
protected mode.  It just needs to be added.

Regards,

Anthony Liguori

 Regards,
 Guillaume
   




Re: [kvm-devel] Compilation problems with git tree

2008-04-08 Thread Marcelo Tosatti
On Tue, Apr 08, 2008 at 01:03:58AM +0200, Zdenek Kabelac wrote:
 Hi
 
 I've tried to compile git tree for kvm-userspace.git
 I've used these configure options:
 
 --disable-gcc-check --with-patched-kernel
 
 using x86-64 platform
 
 I've got this error:
 
 ar rcs libqemu.a exec.o kqemu.o cpu-exec.o host-utils.o
 translate-all.o translate.o op.o tcg/tcg.o tcg/tcg-dyngen.o
 tcg/tcg-runtime.o qemu-kvm.o fpu/softfloat-native.o helper.o helper2.o
 qemu-kvm-x86.o kvm-tpr-opt.o qemu-kvm-helper.o disas.o i386-dis.o
 gcc -L /home/kabi/export/kvm-userspace/qemu/../libkvm  -g  -m64 -o
 qemu-system-x86_64 vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o
 migration.o block-raw-posix.o lsi53c895a.o esp.o usb-ohci.o
 eeprom93xx.o eepro100.o ne2000.o pcnet.o rtl8139.o e1000.o hypercall.o
 virtio.o virtio-net.o virtio-blk.o device-hotplug.o ide.o pckbd.o
 ps2.o vga.o sb16.o es1370.o dma.o fdc.o mc146818rtc.o serial.o i8259.o
 i8254.o pcspk.o pc.o cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
 usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o gdbstub.o
 ../libqemu_common.a libqemu.a  -lm -lz -lkvm -lgnutls   -lrt -lpthread
 -lutil -lSDL -lpthread  -lcurses
 pc.o: In function `pc_init1':
 /home/kabi/export/kvm-userspace/qemu/hw/pc.c:987: undefined reference
 to `kvm_pit_init'
 collect2: ld returned 1 exit status
 
 
 Obviously kvm_pit_init seems to be compiled in only for i386 - I've
 disabled this code with #if 0

Update your host kernel. It seems backward compatibility is broken.

 
 But then during code run I've got this coredump:
 'ti' seems to contain some garbage - am I using the latest code??
 (as this is the last commit I could see:
 
 commit 5208ce19dca268f84a2b9441c2fbb6129161e44c
 Author: Marcelo Tosatti [EMAIL PROTECTED]
 Date:   Thu Apr 3 20:24:37 2008 -0300)
 
 
 Core was generated by `qemu-kvm -s -m 320 -smp 2 -net nic,model=pcnet
 -net user -redir'.
 Program terminated with signal 11, Segmentation fault.
 
 #0  0x004849a7 in tcp_reass (tp=0x7fabec000d60, ti=0xec000d60,
 m=0x0) at slirp/tcp_input.c:208
 208   if (ti == (struct tcpiphdr *)tp || ti->ti_seq != tp->rcv_nxt)
 Missing separate debuginfos, use: debuginfo-install SDL.x86_64
 glibc.x86_64 gnutls.x86_64 libX11.x86_64 libXau.x86_64
 libXcursor.x86_64 libXdmcp.x86_64 libXext.x86_64 libXfixes.x86_64
 libXrandr.x86_64 libXrender.x86_64 libgcrypt.x86_64
 libgpg-error.x86_64 libtasn1.x86_64 libxcb.x86_64 ncurses.x86_64
 zlib.x86_64
 (gdb) bt
 #0  0x004849a7 in tcp_reass (tp=0x7fabec000d60, ti=0xec000d60,
 m=0x0) at slirp/tcp_input.c:208
 #1  0x00485c3b in tcp_input (m=0x2ba7260, iphlen=<value
 optimized out>, inso=<value optimized out>)
 at slirp/tcp_input.c:1052
 #2  0x00406aa1 in qemu_send_packet (vc1=0x2b9b0b0,
 buf=0x2c9dd58 RT, size=54)
 at /home/kabi/export/kvm-userspace/qemu/vl.c:3758
 #3  0x00426211 in pcnet_transmit (s=0x2c9d990) at
 /home/kabi/export/kvm-userspace/qemu/hw/pcnet.c:1272
 #4  0x00426898 in pcnet_poll_timer (opaque=<value optimized
 out>) at /home/kabi/export/kvm-userspace/qemu/hw/pcnet.c:1335
 #5  0x00426f30 in pcnet_ioport_writew (opaque=0x7fabec000d60,
 addr=0, val=0)
 at /home/kabi/export/kvm-userspace/qemu/hw/pcnet.c:1617
 #6  0x005050f1 in kvm_outw (opaque=<value optimized out>,
 addr=0, data=0)
 at /home/kabi/export/kvm-userspace/qemu/qemu-kvm.c:515
 #7  0x005252b4 in handle_io (kvm=0x2ac4000,
 run=0x7fac0bc73000, vcpu=1) at libkvm.c:721
 #8  0x00525972 in kvm_run (kvm=0x2ac4000, vcpu=1) at libkvm.c:889
 #9  0x00505636 in kvm_cpu_exec (env=<value optimized out>) at
 /home/kabi/export/kvm-userspace/qemu/qemu-kvm.c:146
 #10 0x005058e0 in ap_main_loop (_env=<value optimized out>) at
 /home/kabi/export/kvm-userspace/qemu/qemu-kvm.c:330
 #11 0x00371600740a in start_thread () from /lib64/libpthread.so.0
 #12 0x0037154e678d in clone () from /lib64/libc.so.6




Re: [kvm-devel] [PATCH] gfxboot VMX workaround v2

2008-04-08 Thread Guillaume Thouvenin
On Tue, 08 Apr 2008 07:14:13 -0500
Anthony Liguori [EMAIL PROTECTED] wrote:

 Guillaume Thouvenin wrote:
  On Mon, 07 Apr 2008 11:05:06 -0500
  Anthony Liguori [EMAIL PROTECTED] wrote:
 

  Perhaps a viable way to fix this upstream would be to catch the vmentry 
  failure, look to see if SS.CPL != CS.CPL, and if so, invoke 
  x86_emulate() in a loop until SS.CPL == CS.CPL.
  
 
  I tried this solution some time ago, but unfortunately x86_emulate()
  failed. I suspected a problem with the guest EIP, which could differ
  between the vmentry catch and the emulation. I will rebase my patch and
  post it on the mailing list.

 
 x86_emulate is missing support for far jmp, which is used to switch into 
 protected mode.  It just needs to be added.

Ok, I see. I now understand why you said in a previous email that KVM
needs a proper load_seg() function like the one in Xen's x86_emulate.
This function is used to load a segment during a far jmp. I will look at
how it is done in Xen and try to adapt it the way you did.
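For reference, the segment-load half of emulating a far jmp (what a load_seg()-style helper has to do) can be sketched as follows. This is a simplified userspace model: the descriptor decoding follows the documented 8-byte GDT descriptor layout, but the structure and function names are illustrative, not Xen's or KVM's actual code.

```c
#include <stdint.h>

/* Hypothetical cached-segment state, loosely modeled on what a VMM
 * keeps per segment register.  Not KVM's or Xen's real structures. */
struct seg_cache {
    uint32_t base;
    uint32_t limit;
    int dpl;
    int present;
};

/* Decode a flat 8-byte code/data descriptor (no long mode, no TSS). */
static void decode_descriptor(uint64_t d, struct seg_cache *s)
{
    s->limit   = (uint32_t)(d & 0xffff) | ((uint32_t)((d >> 48) & 0xf) << 16);
    s->base    = (uint32_t)((d >> 16) & 0xffffff) | (uint32_t)((d >> 56) << 24);
    s->dpl     = (int)((d >> 45) & 0x3);
    s->present = (int)((d >> 47) & 0x1);
    if ((d >> 55) & 1)                 /* G bit: 4K granularity */
        s->limit = (s->limit << 12) | 0xfff;
}

/* Emulate the segment-load step of "jmp far sel:off": look the selector
 * up in the GDT and refresh the cached CS state from the descriptor. */
int load_seg(const uint64_t *gdt, int gdt_entries,
             uint16_t sel, struct seg_cache *cs)
{
    int idx = sel >> 3;                /* selector -> GDT index */
    if (idx >= gdt_entries)
        return -1;
    decode_descriptor(gdt[idx], cs);
    return cs->present ? 0 : -1;
}
```

A real implementation additionally has to check the descriptor type, privilege levels, and set the hidden CPL, which is exactly the part gfxboot trips over.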

Regards,
Guillaume



Re: [kvm-devel] Compilation problems with git tree

2008-04-08 Thread Zdenek Kabelac
2008/4/8, Marcelo Tosatti [EMAIL PROTECTED]:
 On Tue, Apr 08, 2008 at 01:03:58AM +0200, Zdenek Kabelac wrote:
   Hi
  
   I've tried to compile git tree for kvm-userspace.git
   I've used these configure options:
  
   --disable-gcc-check --with-patched-kernel
  
   using x86-64 platform
  
   I've got this error:
  
   pc.o: In function `pc_init1':
   /home/kabi/export/kvm-userspace/qemu/hw/pc.c:987: undefined reference
   to `kvm_pit_init'
   collect2: ld returned 1 exit status
  
  
   Obviously kvm_pit_init seems to be compiled in only for i386 - I've
   disabled this code with #if 0



 Update your host kernel. It seems backward compatibility is broken.



   Core was generated by `qemu-kvm -s -m 320 -smp 2 -net nic,model=pcnet
   -net user -redir'.
   Program terminated with signal 11, Segmentation fault.
  
   #0  0x004849a7 in tcp_reass (tp=0x7fabec000d60, ti=0xec000d60,
   m=0x0) at slirp/tcp_input.c:208


Hmm - will that fix the first problem (compilation), or the second one (the coredump crash)?

Because I need to use a combination of other kernel trees for now,
I'll stay with the Linux git tree 2.6.25-rc8 - hopefully the patches from
the kvm git tree will get there soon.

I think I'll survive the occasional crash (2x/day) caused by this
backward incompatibility.

Compared with kvm-64, I no longer experience the sudden qemu-kvm stops
that I had to resolve by attaching strace to the qemu process - that
magically 'unfroze' qemu, and it was happening quite often.

Zdenek



[kvm-devel] [PATCH 3 of 9] Moves all mmu notifier methods outside the PT lock (first and not last

2008-04-08 Thread Andrea Arcangeli
# HG changeset patch
# User Andrea Arcangeli [EMAIL PROTECTED]
# Date 1207666463 -7200
# Node ID 33de2e17d0f5670515833bf8d3d2ea19e2a85b09
# Parent  baceb322b45ed43280654dac6c964c9d3d8a936f
Moves all mmu notifier methods outside the PT lock (first and not last
step to make them sleep capable).

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -117,27 +117,6 @@
INIT_HLIST_HEAD(&mm->mmu_notifier_list);
 }
 
-#define ptep_clear_flush_notify(__vma, __address, __ptep)  \
-({ \
-   pte_t __pte;\
-   struct vm_area_struct *___vma = __vma;  \
-   unsigned long ___address = __address;   \
-   __pte = ptep_clear_flush(___vma, ___address, __ptep);   \
-   mmu_notifier_invalidate_page(___vma->vm_mm, ___address);\
-   __pte;  \
-})
-
-#define ptep_clear_flush_young_notify(__vma, __address, __ptep)   \
-({ \
-   int __young;\
-   struct vm_area_struct *___vma = __vma;  \
-   unsigned long ___address = __address;   \
-   __young = ptep_clear_flush_young(___vma, ___address, __ptep);   \
-   __young |= mmu_notifier_clear_flush_young(___vma->vm_mm,\
- ___address);  \
-   __young;\
-})
-
 #else /* CONFIG_MMU_NOTIFIER */
 
 static inline void mmu_notifier_release(struct mm_struct *mm)
@@ -169,9 +148,6 @@
 {
 }
 
-#define ptep_clear_flush_young_notify ptep_clear_flush_young
-#define ptep_clear_flush_notify ptep_clear_flush
-
 #endif /* CONFIG_MMU_NOTIFIER */
 
 #endif /* _LINUX_MMU_NOTIFIER_H */
diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c
--- a/mm/filemap_xip.c
+++ b/mm/filemap_xip.c
@@ -194,11 +194,13 @@
if (pte) {
/* Nuke the page table entry. */
flush_cache_page(vma, address, pte_pfn(*pte));
-   pteval = ptep_clear_flush_notify(vma, address, pte);
+   pteval = ptep_clear_flush(vma, address, pte);
page_remove_rmap(page, vma);
dec_mm_counter(mm, file_rss);
BUG_ON(pte_dirty(pteval));
pte_unmap_unlock(pte, ptl);
+   /* must invalidate_page _before_ freeing the page */
+   mmu_notifier_invalidate_page(mm, address);
page_cache_release(page);
}
}
diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1626,9 +1626,10 @@
 */
page_table = pte_offset_map_lock(mm, pmd, address,
 ptl);
-   page_cache_release(old_page);
+   new_page = NULL;
if (!pte_same(*page_table, orig_pte))
goto unlock;
+   page_cache_release(old_page);
 
page_mkwrite = 1;
}
@@ -1644,6 +1645,7 @@
if (ptep_set_access_flags(vma, address, page_table, entry,1))
update_mmu_cache(vma, address, entry);
ret |= VM_FAULT_WRITE;
+   old_page = new_page = NULL;
goto unlock;
}
 
@@ -1688,7 +1690,7 @@
 * seen in the presence of one thread doing SMC and another
 * thread doing COW.
 */
-   ptep_clear_flush_notify(vma, address, page_table);
+   ptep_clear_flush(vma, address, page_table);
set_pte_at(mm, address, page_table, entry);
update_mmu_cache(vma, address, entry);
lru_cache_add_active(new_page);
@@ -1700,12 +1702,18 @@
} else
mem_cgroup_uncharge_page(new_page);
 
-   if (new_page)
+unlock:
+   pte_unmap_unlock(page_table, ptl);
+
+   if (new_page) {
+   if (new_page == old_page)
+   /* cow happened, notify before releasing old_page */
+   mmu_notifier_invalidate_page(mm, address);
page_cache_release(new_page);
+   }
if (old_page)
page_cache_release(old_page);
-unlock:
-   pte_unmap_unlock(page_table, ptl);
+
if (dirty_page) {
if (vma->vm_file)
file_update_time(vma->vm_file);
diff --git 

[kvm-devel] [PATCH 2 of 9] Core of mmu notifiers

2008-04-08 Thread Andrea Arcangeli
# HG changeset patch
# User Andrea Arcangeli [EMAIL PROTECTED]
# Date 1207666462 -7200
# Node ID baceb322b45ed43280654dac6c964c9d3d8a936f
# Parent  ec6d8f91b299cf26cce5c3d49bb25d35ee33c137
Core of mmu notifiers.

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]
Signed-off-by: Nick Piggin [EMAIL PROTECTED]
Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -225,6 +225,9 @@
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
struct mem_cgroup *mem_cgroup;
 #endif
+#ifdef CONFIG_MMU_NOTIFIER
+   struct hlist_head mmu_notifier_list;
+#endif
 };
 
 #endif /* _LINUX_MM_TYPES_H */
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
new file mode 100644
--- /dev/null
+++ b/include/linux/mmu_notifier.h
@@ -0,0 +1,177 @@
+#ifndef _LINUX_MMU_NOTIFIER_H
+#define _LINUX_MMU_NOTIFIER_H
+
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/mm_types.h>
+
+struct mmu_notifier;
+struct mmu_notifier_ops;
+
+#ifdef CONFIG_MMU_NOTIFIER
+
+struct mmu_notifier_ops {
+   /*
+* Called when nobody can register any more notifier in the mm
+* and after the mn notifier has been disarmed already.
+*/
+   void (*release)(struct mmu_notifier *mn,
+   struct mm_struct *mm);
+
+   /*
+* clear_flush_young is called after the VM is
+* test-and-clearing the young/accessed bitflag in the
+* pte. This way the VM will provide proper aging to the
+* accesses to the page through the secondary MMUs and not
+* only to the ones through the Linux pte.
+*/
+   int (*clear_flush_young)(struct mmu_notifier *mn,
+struct mm_struct *mm,
+unsigned long address);
+
+   /*
+* Before this is invoked any secondary MMU is still ok to
+* read/write to the page previously pointed by the Linux pte
+* because the old page hasn't been freed yet.  If required
+* set_page_dirty has to be called internally to this method.
+*/
+   void (*invalidate_page)(struct mmu_notifier *mn,
+   struct mm_struct *mm,
+   unsigned long address);
+
+   /*
+* invalidate_range_start() and invalidate_range_end() must be
+* paired. Multiple invalidate_range_start/ends may be nested
+* or called concurrently.
+*/
+   void (*invalidate_range_start)(struct mmu_notifier *mn,
+  struct mm_struct *mm,
+  unsigned long start, unsigned long end);
+   void (*invalidate_range_end)(struct mmu_notifier *mn,
+struct mm_struct *mm,
+unsigned long start, unsigned long end);
+};
+
+struct mmu_notifier {
+   struct hlist_node hlist;
+   const struct mmu_notifier_ops *ops;
+};
+
+static inline int mm_has_notifiers(struct mm_struct *mm)
+{
+   return unlikely(!hlist_empty(&mm->mmu_notifier_list));
+}
+
+extern int mmu_notifier_register(struct mmu_notifier *mn,
+struct mm_struct *mm);
+extern int mmu_notifier_unregister(struct mmu_notifier *mn,
+  struct mm_struct *mm);
+extern void __mmu_notifier_release(struct mm_struct *mm);
+extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
+ unsigned long address);
+extern void __mmu_notifier_invalidate_page(struct mm_struct *mm,
+ unsigned long address);
+extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
+ unsigned long start, unsigned long end);
+extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
+ unsigned long start, unsigned long end);
+
+
+static inline void mmu_notifier_release(struct mm_struct *mm)
+{
+   if (mm_has_notifiers(mm))
+   __mmu_notifier_release(mm);
+}
+
+static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm,
+ unsigned long address)
+{
+   if (mm_has_notifiers(mm))
+   return __mmu_notifier_clear_flush_young(mm, address);
+   return 0;
+}
+
+static inline void mmu_notifier_invalidate_page(struct mm_struct *mm,
+ unsigned long address)
+{
+   if (mm_has_notifiers(mm))
+   __mmu_notifier_invalidate_page(mm, address);
+}
+
+static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
+ unsigned long start, unsigned long end)
+{
+   if (mm_has_notifiers(mm))
+   __mmu_notifier_invalidate_range_start(mm, start, end);
+}
+
+static inline void 
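The dispatch pattern in this patch (a cheap mm_has_notifiers() inline check guarding a slow path that walks the notifier list and calls each registered hook) can be mimicked in userspace C. A plain singly linked list stands in for the kernel hlist; the names mirror the patch but nothing here is kernel code.

```c
#include <stddef.h>

struct mmu_notifier;

/* Ops table, as in the patch: here only invalidate_page for brevity. */
struct mmu_notifier_ops {
    void (*invalidate_page)(struct mmu_notifier *mn, unsigned long address);
};

struct mmu_notifier {
    const struct mmu_notifier_ops *ops;
    struct mmu_notifier *next;          /* stand-in for the kernel hlist */
};

struct mm_struct {
    struct mmu_notifier *notifiers;
};

/* Fast path: one branch when nothing is registered. */
static int mm_has_notifiers(struct mm_struct *mm)
{
    return mm->notifiers != NULL;
}

static void mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm)
{
    mn->next = mm->notifiers;
    mm->notifiers = mn;
}

/* Slow path: walk the list and invoke every registered callback. */
static void mmu_notifier_invalidate_page(struct mm_struct *mm,
                                         unsigned long address)
{
    struct mmu_notifier *mn;

    if (!mm_has_notifiers(mm))
        return;
    for (mn = mm->notifiers; mn; mn = mn->next)
        mn->ops->invalidate_page(mn, address);
}

/* Example notifier used below: just counts invalidations. */
static int invalidations;
static void count_invalidate(struct mmu_notifier *mn, unsigned long address)
{
    (void)mn; (void)address;
    invalidations++;
}
static const struct mmu_notifier_ops count_ops = {
    .invalidate_page = count_invalidate,
};
```

The point of the split into `mm_has_notifiers()` plus a `__mmu_notifier_*` slow path is that an mm with no notifiers (the overwhelmingly common case) pays only one predictable branch.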

[kvm-devel] [PATCH 6 of 9] We no longer abort unmapping in unmap vmas because we can reschedule while

2008-04-08 Thread Andrea Arcangeli
# HG changeset patch
# User Andrea Arcangeli [EMAIL PROTECTED]
# Date 1207666893 -7200
# Node ID b0cb674314534b9cc4759603f123474d38427b2d
# Parent  20e829e35dfeceeb55a816ef495afda10cd50b98
We no longer abort unmapping in unmap_vmas because we can reschedule while
unmapping, since we are holding a semaphore. This allows moving more
of the tlb flushing into unmap_vmas, reducing code in various places.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

diff --git a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -723,8 +723,7 @@
 struct page *vm_normal_page(struct vm_area_struct *, unsigned long, pte_t);
 unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
unsigned long size, struct zap_details *);
-unsigned long unmap_vmas(struct mmu_gather **tlb,
-   struct vm_area_struct *start_vma, unsigned long start_addr,
+unsigned long unmap_vmas(struct vm_area_struct *start_vma, unsigned long start_addr,
unsigned long end_addr, unsigned long *nr_accounted,
struct zap_details *);
 
diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -805,7 +805,6 @@
 
 /**
  * unmap_vmas - unmap a range of memory covered by a list of vma's
- * @tlbp: address of the caller's struct mmu_gather
  * @vma: the starting vma
  * @start_addr: virtual address at which to start unmapping
  * @end_addr: virtual address at which to end unmapping
@@ -817,20 +816,13 @@
  * Unmap all pages in the vma list.
  *
  * We aim to not hold locks for too long (for scheduling latency reasons).
- * So zap pages in ZAP_BLOCK_SIZE bytecounts.  This means we need to
- * return the ending mmu_gather to the caller.
+ * So zap pages in ZAP_BLOCK_SIZE bytecounts.
  *
  * Only addresses between `start' and `end' will be unmapped.
  *
  * The VMA list must be sorted in ascending virtual address order.
- *
- * unmap_vmas() assumes that the caller will flush the whole unmapped address
- * range after unmap_vmas() returns.  So the only responsibility here is to
- * ensure that any thus-far unmapped pages are flushed before unmap_vmas()
- * drops the lock and schedules.
  */
-unsigned long unmap_vmas(struct mmu_gather **tlbp,
-   struct vm_area_struct *vma, unsigned long start_addr,
+unsigned long unmap_vmas(struct vm_area_struct *vma, unsigned long start_addr,
unsigned long end_addr, unsigned long *nr_accounted,
struct zap_details *details)
 {
@@ -838,7 +830,15 @@
unsigned long tlb_start = 0;/* For tlb_finish_mmu */
int tlb_start_valid = 0;
unsigned long start = start_addr;
-   int fullmm = (*tlbp)->fullmm;
+   int fullmm;
+   struct mmu_gather *tlb;
+   struct mm_struct *mm = vma->vm_mm;
+
+   mmu_notifier_invalidate_range_start(mm, start_addr, end_addr);
+   lru_add_drain();
+   tlb = tlb_gather_mmu(mm, 0);
+   update_hiwater_rss(mm);
+   fullmm = tlb->fullmm;
 
for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) {
unsigned long end;
@@ -865,7 +865,7 @@
(HPAGE_SIZE / PAGE_SIZE);
start = end;
} else
-   start = unmap_page_range(*tlbp, vma,
+   start = unmap_page_range(tlb, vma,
start, end, zap_work, details);
 
if (zap_work > 0) {
@@ -873,13 +873,15 @@
break;
}
 
-   tlb_finish_mmu(*tlbp, tlb_start, start);
+   tlb_finish_mmu(tlb, tlb_start, start);
cond_resched();
-   *tlbp = tlb_gather_mmu(vma-vm_mm, fullmm);
+   tlb = tlb_gather_mmu(vma-vm_mm, fullmm);
tlb_start_valid = 0;
zap_work = ZAP_BLOCK_SIZE;
}
}
+   tlb_finish_mmu(tlb, start_addr, end_addr);
+   mmu_notifier_invalidate_range_end(mm, start_addr, end_addr);
return start;   /* which is now the end (or restart) address */
 }
 
@@ -893,20 +895,10 @@
 unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
unsigned long size, struct zap_details *details)
 {
-   struct mm_struct *mm = vma->vm_mm;
-   struct mmu_gather *tlb;
unsigned long end = address + size;
unsigned long nr_accounted = 0;
 
-   lru_add_drain();
-   tlb = tlb_gather_mmu(mm, 0);
-   update_hiwater_rss(mm);
-   mmu_notifier_invalidate_range_start(mm, address, end);
-   end = unmap_vmas(tlb, vma, address, end, nr_accounted, details);
-   mmu_notifier_invalidate_range_end(mm, address, end);
-   if (tlb)
-   tlb_finish_mmu(tlb, address, end);
-   return end;
+   return 
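The ZAP_BLOCK_SIZE batching that unmap_vmas() keeps (zap a bounded amount of work, then flush, reschedule, and regather) can be modeled outside the kernel as follows. The chunked loop is the point; the constants and the flush counter are illustrative only.

```c
/* Toy model of unmap_vmas()'s batching: process an address range in
 * ZAP_BLOCK_SIZE chunks, and between chunks do the equivalent of
 * tlb_finish_mmu() + cond_resched() + tlb_gather_mmu(), so that locks
 * are never held for too long.  "flushes" counts how many such breaks
 * a given range causes. */

#define ZAP_BLOCK_SIZE 4096UL

unsigned long zap_range_batched(unsigned long start, unsigned long end,
                                int *flushes)
{
    unsigned long zap_work = ZAP_BLOCK_SIZE;

    *flushes = 0;
    while (start < end) {
        unsigned long chunk = end - start;

        if (chunk > zap_work)
            chunk = zap_work;
        start += chunk;          /* stand-in for unmap_page_range() */
        zap_work -= chunk;
        if (zap_work == 0) {     /* batch exhausted: flush, resched, regather */
            (*flushes)++;
            zap_work = ZAP_BLOCK_SIZE;
        }
    }
    return start;                /* the end (or restart) address */
}
```

What this patch changes is only *where* the gather/flush bookkeeping lives: it moves from every caller into unmap_vmas() itself, bracketed by the invalidate_range_start/end notifier calls.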

[kvm-devel] [PATCH 8 of 9] XPMEM would have used sys_madvise() except that madvise_dontneed()

2008-04-08 Thread Andrea Arcangeli
# HG changeset patch
# User Andrea Arcangeli [EMAIL PROTECTED]
# Date 1207666972 -7200
# Node ID 3b14e26a4e0491f00bb989be04d8b7e0755ed2d7
# Parent  a0c52e4b9b71e2627238b69c0a58905097973279
XPMEM would have used sys_madvise() except that madvise_dontneed()
returns an -EINVAL if VM_PFNMAP is set, which is always true for the pages
XPMEM imports from other partitions and is also true for uncached pages
allocated locally via the mspec allocator.  XPMEM needs zap_page_range()
functionality for these types of pages as well as 'normal' pages.

Signed-off-by: Dean Nelson [EMAIL PROTECTED]

diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -900,6 +900,7 @@
 
return unmap_vmas(vma, address, end, nr_accounted, details);
 }
+EXPORT_SYMBOL_GPL(zap_page_range);
 
 /*
  * Do a quick page-table lookup for a single page.



[kvm-devel] [PATCH 4 of 9] Move the tlb flushing into free_pgtables. The conversion of the locks

2008-04-08 Thread Andrea Arcangeli
# HG changeset patch
# User Andrea Arcangeli [EMAIL PROTECTED]
# Date 1207666463 -7200
# Node ID 2c2ed514f294dbbfc66157f771bc900789ac6005
# Parent  33de2e17d0f5670515833bf8d3d2ea19e2a85b09
Move the tlb flushing into free_pgtables. The conversion of the locks
taken for reverse map scanning would require taking sleeping locks
in free_pgtables(). Moving the tlb flushing into free_pgtables allows
sleeping in parts of free_pgtables().

This means that we do a tlb_finish_mmu() before freeing the page tables.
Strictly speaking there may not be the need to do another tlb flush after
freeing the tables. But it's the only way to free a series of page table
pages from the tlb list. And we do not want to call into the page allocator
for performance reasons. Aim9 numbers look okay after this patch.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

diff --git a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -751,8 +751,8 @@
void *private);
 void free_pgd_range(struct mmu_gather **tlb, unsigned long addr,
unsigned long end, unsigned long floor, unsigned long ceiling);
-void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *start_vma,
-   unsigned long floor, unsigned long ceiling);
+void free_pgtables(struct vm_area_struct *start_vma, unsigned long floor,
+   unsigned long ceiling);
 int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
struct vm_area_struct *vma);
 void unmap_mapping_range(struct address_space *mapping,
diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -272,9 +272,11 @@
} while (pgd++, addr = next, addr != end);
 }
 
-void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *vma,
-   unsigned long floor, unsigned long ceiling)
+void free_pgtables(struct vm_area_struct *vma, unsigned long floor,
+   unsigned long ceiling)
 {
+   struct mmu_gather *tlb;
+
while (vma) {
struct vm_area_struct *next = vma->vm_next;
unsigned long addr = vma->vm_start;
@@ -286,8 +288,10 @@
unlink_file_vma(vma);
 
if (is_vm_hugetlb_page(vma)) {
-   hugetlb_free_pgd_range(&tlb, addr, vma->vm_end,
+   tlb = tlb_gather_mmu(vma->vm_mm, 0);
+   hugetlb_free_pgd_range(&tlb, addr, vma->vm_end,
floor, next? next->vm_start: ceiling);
+   tlb_finish_mmu(tlb, addr, vma->vm_end);
} else {
/*
 * Optimization: gather nearby vmas into one call down
@@ -299,8 +303,10 @@
anon_vma_unlink(vma);
unlink_file_vma(vma);
}
-   free_pgd_range(&tlb, addr, vma->vm_end,
+   tlb = tlb_gather_mmu(vma->vm_mm, 0);
+   free_pgd_range(&tlb, addr, vma->vm_end,
floor, next? next->vm_start: ceiling);
+   tlb_finish_mmu(tlb, addr, vma->vm_end);
}
vma = next;
}
diff --git a/mm/mmap.c b/mm/mmap.c
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1752,9 +1752,9 @@
mmu_notifier_invalidate_range_start(mm, start, end);
unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
vm_unacct_memory(nr_accounted);
-   free_pgtables(tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS,
+   tlb_finish_mmu(tlb, start, end);
+   free_pgtables(vma, prev? prev->vm_end: FIRST_USER_ADDRESS,
 next? next->vm_start: 0);
-   tlb_finish_mmu(tlb, start, end);
mmu_notifier_invalidate_range_end(mm, start, end);
 }
 
@@ -2051,8 +2051,8 @@
/* Use -1 here to ensure all VMAs in the mm are unmapped */
end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
vm_unacct_memory(nr_accounted);
-   free_pgtables(tlb, vma, FIRST_USER_ADDRESS, 0);
tlb_finish_mmu(tlb, 0, end);
+   free_pgtables(vma, FIRST_USER_ADDRESS, 0);
 
/*
 * Walk the list again, actually closing and freeing it,



[kvm-devel] [PATCH 7 of 9] Convert the anon_vma spinlock to a rw semaphore. This allows concurrent

2008-04-08 Thread Andrea Arcangeli
# HG changeset patch
# User Andrea Arcangeli [EMAIL PROTECTED]
# Date 1207666968 -7200
# Node ID a0c52e4b9b71e2627238b69c0a58905097973279
# Parent  b0cb674314534b9cc4759603f123474d38427b2d
Convert the anon_vma spinlock to a rw semaphore. This allows concurrent
traversal of reverse maps for try_to_unmap and page_mkclean. It also
allows the calling of sleeping functions from reverse map traversal.

An additional complication is that rcu is used in some context to guarantee
the presence of the anon_vma while we acquire the lock. We cannot take a
semaphore within an rcu critical section. Add a refcount to the anon_vma
structure which allows us to give an existence guarantee for the anon_vma
structure independent of the spinlock or the list contents.

The refcount can then be taken within the RCU section. If it has been
taken successfully then the refcount guarantees the existence of the
anon_vma. The refcount in anon_vma also allows us to fix a nasty
issue in page migration where we fudged by using rcu for a long code
path to guarantee the existence of the anon_vma.

The refcount in general allows a shortening of RCU critical sections since
we can do an rcu_read_unlock() after taking the refcount. This is particularly
relevant if the anon_vma chains contain hundreds of entries.

Issues:
- Atomic overhead increases in situations where a new reference
  to the anon_vma has to be established or removed. Overhead also increases
  when a speculative reference is used (try_to_unmap,
  page_mkclean, page migration). There are also more frequent processor
  changes due to up_xxx letting waiting tasks run first.
  This causes e.g. the Aim9 brk performance test to go down by 10-15%.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

diff --git a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1051,9 +1051,9 @@
 
 struct mm_lock_data {
struct rw_semaphore **i_mmap_sems;
-   spinlock_t **anon_vma_locks;
+   struct rw_semaphore **anon_vma_sems;
unsigned long nr_i_mmap_sems;
-   unsigned long nr_anon_vma_locks;
+   unsigned long nr_anon_vma_sems;
 };
 extern struct mm_lock_data *mm_lock(struct mm_struct * mm);
 extern void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data);
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -25,7 +25,8 @@
  * pointing to this anon_vma once its vma list is empty.
  */
 struct anon_vma {
-   spinlock_t lock;/* Serialize access to vma list */
+   atomic_t refcount;  /* vmas on the list */
+   struct rw_semaphore sem;/* Serialize access to vma list */
struct list_head head;  /* List of private related vmas */
 };
 
@@ -43,18 +44,31 @@
kmem_cache_free(anon_vma_cachep, anon_vma);
 }
 
+struct anon_vma *grab_anon_vma(struct page *page);
+
+static inline void get_anon_vma(struct anon_vma *anon_vma)
+{
+	atomic_inc(&anon_vma->refcount);
+}
+
+static inline void put_anon_vma(struct anon_vma *anon_vma)
+{
+	if (atomic_dec_and_test(&anon_vma->refcount))
+		anon_vma_free(anon_vma);
+}
+
 static inline void anon_vma_lock(struct vm_area_struct *vma)
 {
 	struct anon_vma *anon_vma = vma->anon_vma;
 	if (anon_vma)
-		spin_lock(&anon_vma->lock);
+		down_write(&anon_vma->sem);
 }
 
 static inline void anon_vma_unlock(struct vm_area_struct *vma)
 {
 	struct anon_vma *anon_vma = vma->anon_vma;
 	if (anon_vma)
-		spin_unlock(&anon_vma->lock);
+		up_write(&anon_vma->sem);
 }
 }
 
 /*
diff --git a/mm/migrate.c b/mm/migrate.c
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -235,15 +235,16 @@
return;
 
/*
-* We hold the mmap_sem lock. So no need to call page_lock_anon_vma.
+* We hold either the mmap_sem lock or a reference on the
+* anon_vma. So no need to call page_lock_anon_vma.
 */
 	anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON);
-	spin_lock(&anon_vma->lock);
+	down_read(&anon_vma->sem);
 
 	list_for_each_entry(vma, &anon_vma->head, anon_vma_node)
 		remove_migration_pte(vma, old, new);
 
-	spin_unlock(&anon_vma->lock);
+	up_read(&anon_vma->sem);
 }
 
 /*
@@ -623,7 +624,7 @@
int rc = 0;
int *result = NULL;
struct page *newpage = get_new_page(page, private, result);
-   int rcu_locked = 0;
+   struct anon_vma *anon_vma = NULL;
int charge = 0;
 
if (!newpage)
@@ -647,16 +648,14 @@
}
/*
 	 * By try_to_unmap(), page->mapcount goes down to 0 here. In this case,
-	 * we cannot notice that anon_vma is freed while we migrates a page.
+	 * we cannot notice that anon_vma is freed while we migrate a page.
 * This rcu_read_lock() delays freeing anon_vma pointer until the end
 * of migration. File cache pages are no problem because of page_lock()
 * 

[kvm-devel] [PATCH 9 of 9] This patch adds a lock ordering rule to avoid a potential deadlock when

2008-04-08 Thread Andrea Arcangeli
# HG changeset patch
# User Andrea Arcangeli [EMAIL PROTECTED]
# Date 1207666972 -7200
# Node ID bd55023b22769ecb14b26c2347947f7d6d63bcea
# Parent  3b14e26a4e0491f00bb989be04d8b7e0755ed2d7
This patch adds a lock ordering rule to avoid a potential deadlock when
multiple mmap_sems need to be locked.

Signed-off-by: Dean Nelson [EMAIL PROTECTED]

diff --git a/mm/filemap.c b/mm/filemap.c
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -79,6 +79,9 @@
  *
  *  -i_mutex  (generic_file_buffered_write)
  *-mmap_sem   (fault_in_pages_readable-do_page_fault)
+ *
+ *When taking multiple mmap_sems, one should lock the lowest-addressed
+ *one first proceeding on up to the highest-addressed one.
  *
  *  -i_mutex
  *-i_alloc_sem (various)



[kvm-devel] [PATCH 5 of 9] The conversion to a rwsem allows callbacks during rmap traversal

2008-04-08 Thread Andrea Arcangeli
# HG changeset patch
# User Andrea Arcangeli [EMAIL PROTECTED]
# Date 1207666463 -7200
# Node ID 20e829e35dfeceeb55a816ef495afda10cd50b98
# Parent  2c2ed514f294dbbfc66157f771bc900789ac6005
The conversion to a rwsem allows callbacks during rmap traversal
for files in a non-atomic context. A rw-style lock also allows concurrent
walking of the reverse map. This is fairly straightforward if one removes
pieces of the resched checking.

[Restarting unmapping is an issue to be discussed].

This slightly increases Aim9 performance results on an 8p.

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]
Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -69,7 +69,7 @@
if (!vma_shareable(vma, addr))
return;
 
-	spin_lock(&mapping->i_mmap_lock);
+	down_read(&mapping->i_mmap_sem);
 	vma_prio_tree_foreach(svma, iter, &mapping->i_mmap, idx, idx) {
 		if (svma == vma)
 			continue;
@@ -94,7 +94,7 @@
 	put_page(virt_to_page(spte));
 	spin_unlock(&mm->page_table_lock);
 out:
-	spin_unlock(&mapping->i_mmap_lock);
+	up_read(&mapping->i_mmap_sem);
 }
 
 /*
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -454,10 +454,10 @@
 	pgoff = offset >> PAGE_SHIFT;
 
 	i_size_write(inode, offset);
-	spin_lock(&mapping->i_mmap_lock);
+	down_read(&mapping->i_mmap_sem);
 	if (!prio_tree_empty(&mapping->i_mmap))
 		hugetlb_vmtruncate_list(&mapping->i_mmap, pgoff);
-	spin_unlock(&mapping->i_mmap_lock);
+	up_read(&mapping->i_mmap_sem);
truncate_hugepages(inode, offset);
return 0;
 }
diff --git a/fs/inode.c b/fs/inode.c
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -210,7 +210,7 @@
 	INIT_LIST_HEAD(&inode->i_devices);
 	INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC);
 	rwlock_init(&inode->i_data.tree_lock);
-	spin_lock_init(&inode->i_data.i_mmap_lock);
+	init_rwsem(&inode->i_data.i_mmap_sem);
 	INIT_LIST_HEAD(&inode->i_data.private_list);
 	spin_lock_init(&inode->i_data.private_lock);
 	INIT_RAW_PRIO_TREE_ROOT(&inode->i_data.i_mmap);
diff --git a/include/linux/fs.h b/include/linux/fs.h
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -503,7 +503,7 @@
 	unsigned int		i_mmap_writable;/* count VM_SHARED mappings */
 	struct prio_tree_root	i_mmap;		/* tree of private and shared mappings */
 	struct list_head	i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
-	spinlock_t		i_mmap_lock;	/* protect tree, count, list */
+	struct rw_semaphore	i_mmap_sem;	/* protect tree, count, list */
 	unsigned int		truncate_count;	/* Cover race condition with truncate */
 	unsigned long		nrpages;	/* number of total pages */
 	pgoff_t			writeback_index;/* writeback starts here */
diff --git a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -716,7 +716,7 @@
 	struct address_space *check_mapping;	/* Check page->mapping if set */
 	pgoff_t	first_index;			/* Lowest page->index to unmap */
 	pgoff_t	last_index;			/* Highest page->index to unmap */
-	spinlock_t *i_mmap_lock;		/* For unmap_mapping_range: */
+	struct rw_semaphore *i_mmap_sem;	/* For unmap_mapping_range: */
 	unsigned long truncate_count;		/* Compare vm_truncate_count */
 };
 
@@ -1051,9 +1051,9 @@
   unsigned long flags, struct page **pages);
 
 struct mm_lock_data {
-   spinlock_t **i_mmap_locks;
+   struct rw_semaphore **i_mmap_sems;
spinlock_t **anon_vma_locks;
-   unsigned long nr_i_mmap_locks;
+   unsigned long nr_i_mmap_sems;
unsigned long nr_anon_vma_locks;
 };
 extern struct mm_lock_data *mm_lock(struct mm_struct * mm);
diff --git a/kernel/fork.c b/kernel/fork.c
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -274,12 +274,12 @@
 		atomic_dec(&inode->i_writecount);
 
 	/* insert tmp into the share list, just after mpnt */
-	spin_lock(&file->f_mapping->i_mmap_lock);
+	down_write(&file->f_mapping->i_mmap_sem);
 	tmp->vm_truncate_count = mpnt->vm_truncate_count;
 	flush_dcache_mmap_lock(file->f_mapping);
 	vma_prio_tree_add(tmp, mpnt);
 	flush_dcache_mmap_unlock(file->f_mapping);
-	spin_unlock(&file->f_mapping->i_mmap_lock);
+	up_write(&file->f_mapping->i_mmap_sem);
}
 
/*
diff --git a/mm/filemap.c b/mm/filemap.c
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ 

[kvm-devel] [PATCH 0 of 9] mmu notifier #v12

2008-04-08 Thread Andrea Arcangeli
The difference from #v11 is a different implementation of mm_lock that
guarantees handling signals in O(N). It's also more low-latency friendly.

Note that mmu_notifier_unregister may also fail with -EINTR if a signal is
pending or the system runs out of vmalloc space or physical memory;
only exit_mmap guarantees that any kernel module can be unloaded in the
presence of an oom condition.

Either #v11 or the first three #v12 patches (1,2,3) are suitable for inclusion
in -mm; pick what you prefer looking at the mmu_notifier_register retval and
mm_lock retval difference, I implemented and slightly tested both. GRU and KVM
only need 1,2,3; XPMEM needs the rest of the patchset too (4, ...) but all
patches from 4 to the end can be deferred to a second merge window.



[kvm-devel] [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen

2008-04-08 Thread Andrea Arcangeli
# HG changeset patch
# User Andrea Arcangeli [EMAIL PROTECTED]
# Date 1207666462 -7200
# Node ID ec6d8f91b299cf26cce5c3d49bb25d35ee33c137
# Parent  d4c25404de6376297ed34fada14cd6b894410eb0
Lock the entire mm to prevent any mmu related operation from happening.

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]

diff --git a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1050,6 +1050,15 @@
   unsigned long addr, unsigned long len,
   unsigned long flags, struct page **pages);
 
+struct mm_lock_data {
+   spinlock_t **i_mmap_locks;
+   spinlock_t **anon_vma_locks;
+   unsigned long nr_i_mmap_locks;
+   unsigned long nr_anon_vma_locks;
+};
+extern struct mm_lock_data *mm_lock(struct mm_struct * mm);
+extern void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data);
+
 extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
 
 extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
diff --git a/mm/mmap.c b/mm/mmap.c
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -26,6 +26,7 @@
 #include <linux/mount.h>
 #include <linux/mempolicy.h>
 #include <linux/rmap.h>
+#include <linux/vmalloc.h>
 
 #include <asm/uaccess.h>
 #include <asm/cacheflush.h>
@@ -2242,3 +2243,140 @@
 
return 0;
 }
+
+/*
+ * This operation locks against the VM for all pte/vma/mm related
+ * operations that could ever happen on a certain mm. This includes
+ * vmtruncate, try_to_unmap, and all page faults. The holder
+ * must not hold any mm related lock. A single task can't take more
+ * than one mm lock in a row or it would deadlock.
+ */
+struct mm_lock_data *mm_lock(struct mm_struct * mm)
+{
+   struct vm_area_struct *vma;
+   spinlock_t *i_mmap_lock_last, *anon_vma_lock_last;
+   unsigned long nr_i_mmap_locks, nr_anon_vma_locks, i;
+   struct mm_lock_data *data;
+   int err;
+
+	down_write(&mm->mmap_sem);
+
+	err = -EINTR;
+	nr_i_mmap_locks = nr_anon_vma_locks = 0;
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		cond_resched();
+		if (unlikely(signal_pending(current)))
+			goto out;
+
+		if (vma->vm_file && vma->vm_file->f_mapping)
+			nr_i_mmap_locks++;
+		if (vma->anon_vma)
+			nr_anon_vma_locks++;
+	}
+
+	err = -ENOMEM;
+	data = kmalloc(sizeof(struct mm_lock_data), GFP_KERNEL);
+	if (!data)
+		goto out;
+
+	if (nr_i_mmap_locks) {
+		data->i_mmap_locks = vmalloc(nr_i_mmap_locks *
+					     sizeof(spinlock_t));
+		if (!data->i_mmap_locks)
+			goto out_kfree;
+	} else
+		data->i_mmap_locks = NULL;
+
+	if (nr_anon_vma_locks) {
+		data->anon_vma_locks = vmalloc(nr_anon_vma_locks *
+					       sizeof(spinlock_t));
+		if (!data->anon_vma_locks)
+			goto out_vfree;
+	} else
+		data->anon_vma_locks = NULL;
+
+	err = -EINTR;
+	i_mmap_lock_last = NULL;
+	nr_i_mmap_locks = 0;
+	for (;;) {
+		spinlock_t *i_mmap_lock = (spinlock_t *) -1UL;
+		for (vma = mm->mmap; vma; vma = vma->vm_next) {
+			cond_resched();
+			if (unlikely(signal_pending(current)))
+				goto out_vfree_both;
+
+			if (!vma->vm_file || !vma->vm_file->f_mapping)
+				continue;
+			if ((unsigned long) i_mmap_lock >
+			    (unsigned long)
+			    &vma->vm_file->f_mapping->i_mmap_lock &&
+			    (unsigned long)
+			    &vma->vm_file->f_mapping->i_mmap_lock >
+			    (unsigned long) i_mmap_lock_last)
+				i_mmap_lock =
+					&vma->vm_file->f_mapping->i_mmap_lock;
+		}
+		if (i_mmap_lock == (spinlock_t *) -1UL)
+			break;
+		i_mmap_lock_last = i_mmap_lock;
+		data->i_mmap_locks[nr_i_mmap_locks++] = i_mmap_lock;
+	}
+	data->nr_i_mmap_locks = nr_i_mmap_locks;
+
+	anon_vma_lock_last = NULL;
+	nr_anon_vma_locks = 0;
+	for (;;) {
+		spinlock_t *anon_vma_lock = (spinlock_t *) -1UL;
+		for (vma = mm->mmap; vma; vma = vma->vm_next) {
+			cond_resched();
+			if (unlikely(signal_pending(current)))
+				goto out_vfree_both;
+
+			if (!vma->anon_vma)
+				continue;
+			if ((unsigned long) anon_vma_lock >
+			    (unsigned long)

[kvm-devel] [PATCH 0/28] integrate dma_ops

2008-04-08 Thread Glauber Costa
Hi,

This is the final integration of dma_ops between x86_64 and i386.
The final code is closer to x86_64 than to i386, which is obviously expected.

At the end, pci-dma_{32,64}.c are gone, pci-nommu_64.c is gone, and the
temporary pci-base_32.c is gone too.

This patchset received the same level of scrutiny as the others from my side:
compile-tested in at least 6 different random configs, boot-tested on my
hardware.

The final diffstat says:

 Documentation/feature-removal-schedule.txt |7 
 arch/x86/kernel/Makefile   |9 
 arch/x86/kernel/pci-base_32.c  |   72 ---
 arch/x86/kernel/pci-dma.c  |  524 +
 arch/x86/kernel/pci-dma_32.c   |  503 +++
 arch/x86/kernel/pci-dma_64.c   |  443 +---
 arch/x86/kernel/pci-nommu.c|  100 +
 arch/x86/kernel/pci-nommu_64.c |  140 ---
 arch/x86/mm/init_64.c  |4 
 include/asm-x86/dma-mapping.h  |   14 
 include/asm-x86/scatterlist.h  |3 
 11 files changed, 832 insertions(+), 987 deletions(-)





Re: [kvm-devel] [PATCH 2 of 9] Core of mmu notifiers

2008-04-08 Thread Robin Holt
This one does not build on ia64.  I get the following:

[EMAIL PROTECTED] mmu_v12_xpmem_v003_v1]$ make compressed
  CHK include/linux/version.h
  CHK include/linux/utsrelease.h
  CALLscripts/checksyscalls.sh
  CHK include/linux/compile.h
  CC  mm/mmu_notifier.o
In file included from include/linux/mmu_notifier.h:6,
 from mm/mmu_notifier.c:12:
include/linux/mm_types.h:200: error: expected specifier-qualifier-list before 
‘cpumask_t’
In file included from mm/mmu_notifier.c:12:
include/linux/mmu_notifier.h: In function ‘mm_has_notifiers’:
include/linux/mmu_notifier.h:62: error: ‘struct mm_struct’ has no member named 
‘mmu_notifier_list’
include/linux/mmu_notifier.h: In function ‘mmu_notifier_mm_init’:
include/linux/mmu_notifier.h:117: error: ‘struct mm_struct’ has no member named 
‘mmu_notifier_list’
In file included from include/asm/pgtable.h:155,
 from include/linux/mm.h:39,
 from mm/mmu_notifier.c:14:
include/asm/mmu_context.h: In function ‘get_mmu_context’:
include/asm/mmu_context.h:81: error: ‘struct mm_struct’ has no member named 
‘context’
include/asm/mmu_context.h:88: error: ‘struct mm_struct’ has no member named 
‘context’
include/asm/mmu_context.h:90: error: ‘struct mm_struct’ has no member named 
‘cpu_vm_mask’
include/asm/mmu_context.h:99: error: ‘struct mm_struct’ has no member named 
‘context’
include/asm/mmu_context.h: In function ‘init_new_context’:
include/asm/mmu_context.h:120: error: ‘struct mm_struct’ has no member named 
‘context’
include/asm/mmu_context.h: In function ‘activate_context’:
include/asm/mmu_context.h:173: error: ‘struct mm_struct’ has no member named 
‘cpu_vm_mask’
include/asm/mmu_context.h:174: error: ‘struct mm_struct’ has no member named 
‘cpu_vm_mask’
include/asm/mmu_context.h:180: error: ‘struct mm_struct’ has no member named 
‘context’
mm/mmu_notifier.c: In function ‘__mmu_notifier_release’:
mm/mmu_notifier.c:25: error: ‘struct mm_struct’ has no member named 
‘mmu_notifier_list’
mm/mmu_notifier.c:26: error: ‘struct mm_struct’ has no member named 
‘mmu_notifier_list’
mm/mmu_notifier.c: In function ‘__mmu_notifier_clear_flush_young’:
mm/mmu_notifier.c:47: error: ‘struct mm_struct’ has no member named 
‘mmu_notifier_list’
mm/mmu_notifier.c: In function ‘__mmu_notifier_invalidate_page’:
mm/mmu_notifier.c:61: error: ‘struct mm_struct’ has no member named 
‘mmu_notifier_list’
mm/mmu_notifier.c: In function ‘__mmu_notifier_invalidate_range_start’:
mm/mmu_notifier.c:73: error: ‘struct mm_struct’ has no member named 
‘mmu_notifier_list’
mm/mmu_notifier.c: In function ‘__mmu_notifier_invalidate_range_end’:
mm/mmu_notifier.c:85: error: ‘struct mm_struct’ has no member named 
‘mmu_notifier_list’
mm/mmu_notifier.c: In function ‘mmu_notifier_register’:
mm/mmu_notifier.c:102: error: ‘struct mm_struct’ has no member named 
‘mmu_notifier_list’
make[1]: *** [mm/mmu_notifier.o] Error 1
make: *** [mm] Error 2




[kvm-devel] [PATCH 02/28] x86: delete empty functions from pci-nommu_64.c

2008-04-08 Thread Glauber Costa
These functions are now called conditionally on their
presence in the struct. So just delete them, instead
of keeping an empty implementation.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-nommu_64.c |   15 ---
 1 files changed, 0 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/pci-nommu_64.c b/arch/x86/kernel/pci-nommu_64.c
index 6e33076..90a7c40 100644
--- a/arch/x86/kernel/pci-nommu_64.c
+++ b/arch/x86/kernel/pci-nommu_64.c
@@ -35,10 +35,6 @@ nommu_map_single(struct device *hwdev, phys_addr_t paddr, 
size_t size,
return bus;
 }
 
-static void nommu_unmap_single(struct device *dev, dma_addr_t addr,size_t size,
-   int direction)
-{
-}
 
 /* Map a set of buffers described by scatterlist in streaming
  * mode for DMA.  This is the scatter-gather version of the
@@ -71,20 +67,9 @@ static int nommu_map_sg(struct device *hwdev, struct 
scatterlist *sg,
return nents;
 }
 
-/* Unmap a set of streaming mode DMA translations.
- * Again, cpu read rules concerning calls here are the same as for
- * pci_unmap_single() above.
- */
-static void nommu_unmap_sg(struct device *dev, struct scatterlist *sg,
- int nents, int dir)
-{
-}
-
 const struct dma_mapping_ops nommu_dma_ops = {
.map_single = nommu_map_single,
-   .unmap_single = nommu_unmap_single,
.map_sg = nommu_map_sg,
-   .unmap_sg = nommu_unmap_sg,
.is_phys = 1,
 };
 
-- 
1.5.0.6




[kvm-devel] [PATCH 01/28] x86: introduce pci-dma.c

2008-04-08 Thread Glauber Costa
This patch introduces pci-dma.c, a common file for pci dma
between i386 and x86_64. As a start, dma_set_mask() is the same
between architectures, and is placed there.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/Makefile |2 +-
 arch/x86/kernel/pci-dma.c|   14 ++
 arch/x86/kernel/pci-dma_32.c |   12 
 arch/x86/kernel/pci-dma_64.c |9 -
 4 files changed, 15 insertions(+), 22 deletions(-)
 create mode 100644 arch/x86/kernel/pci-dma.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 53c8fa4..befe901 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -24,7 +24,7 @@ obj-$(CONFIG_X86_32)  += sys_i386_32.o i386_ksyms_32.o
 obj-$(CONFIG_X86_64)   += sys_x86_64.o x8664_ksyms_64.o
 obj-$(CONFIG_X86_64)   += syscall_64.o vsyscall_64.o setup64.o
 obj-y  += pci-dma_$(BITS).o  bootflag.o e820_$(BITS).o
-obj-y  += quirks.o i8237.o topology.o kdebugfs.o
+obj-y  += pci-dma.o quirks.o i8237.o topology.o kdebugfs.o
 obj-y  += alternative.o i8253.o
 obj-$(CONFIG_X86_64)   += pci-nommu_64.o bugs_64.o
 obj-$(CONFIG_X86_32)   += pci-base_32.o
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
new file mode 100644
index 000..f1c24d8
--- /dev/null
+++ b/arch/x86/kernel/pci-dma.c
@@ -0,0 +1,14 @@
+#include <linux/dma-mapping.h>
+
+int dma_set_mask(struct device *dev, u64 mask)
+{
+	if (!dev->dma_mask || !dma_supported(dev, mask))
+		return -EIO;
+
+	*dev->dma_mask = mask;
+
+	return 0;
+}
+EXPORT_SYMBOL(dma_set_mask);
+
+
diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index be6b1f6..9e82976 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -182,18 +182,6 @@ dma_supported(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_supported);
 
-int
-dma_set_mask(struct device *dev, u64 mask)
-{
-	if (!dev->dma_mask || !dma_supported(dev, mask))
-		return -EIO;
-
-	*dev->dma_mask = mask;
-
-	return 0;
-}
-EXPORT_SYMBOL(dma_set_mask);
-
 
 static __devinit void via_no_dac(struct pci_dev *dev)
 {
diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index f97a08d..e697b86 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -213,15 +213,6 @@ int dma_supported(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_supported);
 
-int dma_set_mask(struct device *dev, u64 mask)
-{
-	if (!dev->dma_mask || !dma_supported(dev, mask))
-		return -EIO;
-	*dev->dma_mask = mask;
-	return 0;
-}
-EXPORT_SYMBOL(dma_set_mask);
-
 /*
  * See Documentation/x86_64/boot-options.txt for the iommu kernel parameter
  * documentation.
-- 
1.5.0.6




[kvm-devel] [PATCH 03/28] x86: implement mapping_error in pci-nommu_64.c

2008-04-08 Thread Glauber Costa
This patch implements mapping_error for pci-nommu_64.c.
It takes care to keep the same behaviour it already
had. Although this file is not (yet) used for i386, we introduce
the i386 version here. Again, care is taken, even at the expense of
an ifdef, to keep the same behaviour unconditionally.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-nommu_64.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pci-nommu_64.c b/arch/x86/kernel/pci-nommu_64.c
index 90a7c40..a4e8ccf 100644
--- a/arch/x86/kernel/pci-nommu_64.c
+++ b/arch/x86/kernel/pci-nommu_64.c
@@ -67,9 +67,21 @@ static int nommu_map_sg(struct device *hwdev, struct 
scatterlist *sg,
return nents;
 }
 
+/* Make sure we keep the same behaviour */
+static int nommu_mapping_error(dma_addr_t dma_addr)
+{
+#ifdef CONFIG_X86_32
+   return 0;
+#else
+   return (dma_addr == bad_dma_address);
+#endif
+}
+
+
 const struct dma_mapping_ops nommu_dma_ops = {
.map_single = nommu_map_single,
.map_sg = nommu_map_sg,
+   .mapping_error = nommu_mapping_error,
.is_phys = 1,
 };
 
-- 
1.5.0.6




[kvm-devel] [PATCH 05/28] x86: use sg_phys in x86_64

2008-04-08 Thread Glauber Costa
To make the code usable on i386, where we have high memory mappings,
we drop the virt_to_bus(sg_virt()) construction in favour of sg_phys.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-nommu_64.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/pci-nommu_64.c b/arch/x86/kernel/pci-nommu_64.c
index 1da9cf9..c6901e7 100644
--- a/arch/x86/kernel/pci-nommu_64.c
+++ b/arch/x86/kernel/pci-nommu_64.c
@@ -60,7 +60,7 @@ static int nommu_map_sg(struct device *hwdev, struct 
scatterlist *sg,
 
 	for_each_sg(sg, s, nents, i) {
 		BUG_ON(!sg_page(s));
-		s->dma_address = virt_to_bus(sg_virt(s));
+		s->dma_address = sg_phys(s);
 		if (!check_addr("map_sg", hwdev, s->dma_address, s->length))
 			return 0;
 		s->dma_length = s->length;
-- 
1.5.0.6




[kvm-devel] [PATCH 06/28] x86: use dma_length in i386

2008-04-08 Thread Glauber Costa
This is done to get the code closer to x86_64.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-base_32.c |1 +
 include/asm-x86/scatterlist.h |2 --
 2 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/pci-base_32.c b/arch/x86/kernel/pci-base_32.c
index 7caf5c2..837bbe9 100644
--- a/arch/x86/kernel/pci-base_32.c
+++ b/arch/x86/kernel/pci-base_32.c
@@ -24,6 +24,7 @@ static int pci32_dma_map_sg(struct device *dev, struct 
scatterlist *sglist,
 		BUG_ON(!sg_page(sg));
 
 		sg->dma_address = sg_phys(sg);
+		sg->dma_length = sg->length;
}
 
flush_write_buffers();
diff --git a/include/asm-x86/scatterlist.h b/include/asm-x86/scatterlist.h
index d13c197..c043206 100644
--- a/include/asm-x86/scatterlist.h
+++ b/include/asm-x86/scatterlist.h
@@ -11,9 +11,7 @@ struct scatterlist {
unsigned intoffset;
unsigned intlength;
dma_addr_t  dma_address;
-#ifdef CONFIG_X86_64
unsigned intdma_length;
-#endif
 };
 
 #define ARCH_HAS_SG_CHAIN
-- 
1.5.0.6




[kvm-devel] [PATCH 14/28] x86: merge iommu initialization parameters

2008-04-08 Thread Glauber Costa
We merge the iommu initialization parameters into pci-dma.c.
Conveniently, both architectures already recognize the same
parameters.

The i386-only "usedac" parameter is marked for deprecation.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 Documentation/feature-removal-schedule.txt |7 +++
 arch/x86/kernel/pci-dma.c  |   81 
 arch/x86/kernel/pci-dma_32.c   |   12 
 arch/x86/kernel/pci-dma_64.c   |   79 ---
 include/asm-x86/dma-mapping.h  |1 +
 5 files changed, 89 insertions(+), 91 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt 
b/Documentation/feature-removal-schedule.txt
index 1092b2e..537c88b 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -306,3 +306,10 @@ Why:   Not used in-tree. The current out-of-tree users 
used it to
code / infrastructure should be in the kernel and not in some
out-of-tree driver.
 Who:   Thomas Gleixner [EMAIL PROTECTED]
+
+
+
+What:  usedac i386 kernel parameter
+When:  2.6.27
+Why:   replaced by allowdac and no dac combination
+Who:   Glauber Costa [EMAIL PROTECTED]
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 4289a9b..e04f42c 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -24,6 +24,18 @@ int panic_on_overflow __read_mostly = 0;
 int force_iommu __read_mostly = 0;
 #endif
 
+int iommu_merge __read_mostly = 0;
+
+int no_iommu __read_mostly;
+/* Set this to 1 if there is a HW IOMMU in the system */
+int iommu_detected __read_mostly = 0;
+
+/* This tells the BIO block layer to assume merging. Default to off
+   because we cannot guarantee merging later. */
+int iommu_bio_merge __read_mostly = 0;
+EXPORT_SYMBOL(iommu_bio_merge);
+
+
 int dma_set_mask(struct device *dev, u64 mask)
 {
if (!dev-dma_mask || !dma_supported(dev, mask))
@@ -183,3 +195,72 @@ void __init pci_iommu_alloc(void)
 #endif
 }
 #endif
+
+/*
+ * See Documentation/x86_64/boot-options.txt for the iommu kernel parameter
+ * documentation.
+ */
+static __init int iommu_setup(char *p)
+{
+	iommu_merge = 1;
+
+	if (!p)
+		return -EINVAL;
+
+	while (*p) {
+		if (!strncmp(p, "off", 3))
+			no_iommu = 1;
+		/* gart_parse_options has more force support */
+		if (!strncmp(p, "force", 5))
+			force_iommu = 1;
+		if (!strncmp(p, "noforce", 7)) {
+			iommu_merge = 0;
+			force_iommu = 0;
+		}
+
+		if (!strncmp(p, "biomerge", 8)) {
+			iommu_bio_merge = 4096;
+			iommu_merge = 1;
+			force_iommu = 1;
+		}
+		if (!strncmp(p, "panic", 5))
+			panic_on_overflow = 1;
+		if (!strncmp(p, "nopanic", 7))
+			panic_on_overflow = 0;
+		if (!strncmp(p, "merge", 5)) {
+			iommu_merge = 1;
+			force_iommu = 1;
+		}
+		if (!strncmp(p, "nomerge", 7))
+			iommu_merge = 0;
+		if (!strncmp(p, "forcesac", 8))
+			iommu_sac_force = 1;
+		if (!strncmp(p, "allowdac", 8))
+			forbid_dac = 0;
+		if (!strncmp(p, "nodac", 5))
+			forbid_dac = -1;
+		if (!strncmp(p, "usedac", 6)) {
+			forbid_dac = -1;
+			return 1;
+		}
+#ifdef CONFIG_SWIOTLB
+		if (!strncmp(p, "soft", 4))
+			swiotlb = 1;
+#endif
+
+#ifdef CONFIG_GART_IOMMU
+		gart_parse_options(p);
+#endif
+
+#ifdef CONFIG_CALGARY_IOMMU
+		if (!strncmp(p, "calgary", 7))
+			use_calgary = 1;
+#endif /* CONFIG_CALGARY_IOMMU */
+
+		p += strcspn(p, ",");
+		if (*p == ',')
+			++p;
+	}
+	return 0;
+}
+early_param("iommu", iommu_setup);
diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 1d4091a..eea52df 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -153,15 +153,3 @@ void *dma_mark_declared_memory_occupied(struct device *dev,
return mem-virt_base + (pos  PAGE_SHIFT);
 }
 EXPORT_SYMBOL(dma_mark_declared_memory_occupied);
-
-#ifdef CONFIG_PCI
-static int check_iommu(char *s)
-{
-   if (!strcmp(s, "usedac")) {
-   forbid_dac = -1;
-   return 1;
-   }
-   return 0;
-}
-__setup("iommu=", check_iommu);
-#endif
diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index c80da76..e7d45cf 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -14,22 +14,9 @@
 #include <asm/gart.h>
 #include 

[kvm-devel] [PATCH 12/28] x86: move x86_64-specific to common code.

2008-04-08 Thread Glauber Costa
This patch moves the bootmem functions, which are largely
x86_64-specific, into pci-dma.c. The code goes inside an ifdef.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma.c|   73 ++
 arch/x86/kernel/pci-dma_64.c |   68 ---
 2 files changed, 73 insertions(+), 68 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index e81e16f..f6d6a92 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -1,7 +1,10 @@
 #include <linux/dma-mapping.h>
 #include <linux/dmar.h>
 #include <linux/pci.h>
+#include <linux/bootmem.h>
 
+#include <asm/proto.h>
+#include <asm/dma.h>
 #include <asm/gart.h>
 #include <asm/calgary.h>
 
@@ -66,3 +69,73 @@ static __devinit void via_no_dac(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID, via_no_dac);
 #endif
+
+#ifdef CONFIG_X86_64
+static __initdata void *dma32_bootmem_ptr;
+static unsigned long dma32_bootmem_size __initdata = (128ULL<<20);
+
+static int __init parse_dma32_size_opt(char *p)
+{
+   if (!p)
+   return -EINVAL;
+   dma32_bootmem_size = memparse(p, &p);
+   return 0;
+}
+early_param("dma32_size", parse_dma32_size_opt);
+
+void __init dma32_reserve_bootmem(void)
+{
+   unsigned long size, align;
+   if (end_pfn <= MAX_DMA32_PFN)
+   return;
+
+   align = 64ULL<<20;
+   size = round_up(dma32_bootmem_size, align);
+   dma32_bootmem_ptr = __alloc_bootmem_nopanic(size, align,
+__pa(MAX_DMA_ADDRESS));
+   if (dma32_bootmem_ptr)
+   dma32_bootmem_size = size;
+   else
+   dma32_bootmem_size = 0;
+}
+static void __init dma32_free_bootmem(void)
+{
+   int node;
+
+   if (end_pfn <= MAX_DMA32_PFN)
+   return;
+
+   if (!dma32_bootmem_ptr)
+   return;
+
+   for_each_online_node(node)
+   free_bootmem_node(NODE_DATA(node), __pa(dma32_bootmem_ptr),
+ dma32_bootmem_size);
+
+   dma32_bootmem_ptr = NULL;
+   dma32_bootmem_size = 0;
+}
+
+void __init pci_iommu_alloc(void)
+{
+   /* free the range so iommu could get some range less than 4G */
+   dma32_free_bootmem();
+   /*
+* The order of these functions is important for
+* fall-back/fail-over reasons
+*/
+#ifdef CONFIG_GART_IOMMU
+   gart_iommu_hole_init();
+#endif
+
+#ifdef CONFIG_CALGARY_IOMMU
+   detect_calgary();
+#endif
+
+   detect_intel_iommu();
+
+#ifdef CONFIG_SWIOTLB
+   pci_swiotlb_init();
+#endif
+}
+#endif
diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index e194460..7820675 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -268,71 +268,3 @@ static __init int iommu_setup(char *p)
return 0;
 }
 early_param(iommu, iommu_setup);
-
-static __initdata void *dma32_bootmem_ptr;
-static unsigned long dma32_bootmem_size __initdata = (128ULL<<20);
-
-static int __init parse_dma32_size_opt(char *p)
-{
-   if (!p)
-   return -EINVAL;
-   dma32_bootmem_size = memparse(p, &p);
-   return 0;
-}
-early_param("dma32_size", parse_dma32_size_opt);
-
-void __init dma32_reserve_bootmem(void)
-{
-   unsigned long size, align;
-   if (end_pfn <= MAX_DMA32_PFN)
-   return;
-
-   align = 64ULL<<20;
-   size = round_up(dma32_bootmem_size, align);
-   dma32_bootmem_ptr = __alloc_bootmem_nopanic(size, align,
-__pa(MAX_DMA_ADDRESS));
-   if (dma32_bootmem_ptr)
-   dma32_bootmem_size = size;
-   else
-   dma32_bootmem_size = 0;
-}
-static void __init dma32_free_bootmem(void)
-{
-   int node;
-
-   if (end_pfn <= MAX_DMA32_PFN)
-   return;
-
-   if (!dma32_bootmem_ptr)
-   return;
-
-   for_each_online_node(node)
-   free_bootmem_node(NODE_DATA(node), __pa(dma32_bootmem_ptr),
- dma32_bootmem_size);
-
-   dma32_bootmem_ptr = NULL;
-   dma32_bootmem_size = 0;
-}
-
-void __init pci_iommu_alloc(void)
-{
-   /* free the range so iommu could get some range less than 4G */
-   dma32_free_bootmem();
-   /*
-* The order of these functions is important for
-* fall-back/fail-over reasons
-*/
-#ifdef CONFIG_GART_IOMMU
-   gart_iommu_hole_init();
-#endif
-
-#ifdef CONFIG_CALGARY_IOMMU
-   detect_calgary();
-#endif
-
-   detect_intel_iommu();
-
-#ifdef CONFIG_SWIOTLB
-   pci_swiotlb_init();
-#endif
-}
-- 
1.5.0.6



[kvm-devel] [PATCH 15/28] x86: move dma_coherent functions to pci-dma.c

2008-04-08 Thread Glauber Costa
They are placed inside an ifdef, since they are i386-specific.
The structure definition goes to dma-mapping.h.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma.c |   81 +++
 arch/x86/kernel/pci-dma_32.c  |   85 -
 include/asm-x86/dma-mapping.h |8 
 3 files changed, 89 insertions(+), 85 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index e04f42c..d06d8df 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -47,6 +47,87 @@ int dma_set_mask(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_set_mask);
 
+#ifdef CONFIG_X86_32
+int dma_declare_coherent_memory(struct device *dev, dma_addr_t bus_addr,
+   dma_addr_t device_addr, size_t size, int flags)
+{
+   void __iomem *mem_base = NULL;
+   int pages = size >> PAGE_SHIFT;
+   int bitmap_size = BITS_TO_LONGS(pages) * sizeof(long);
+
+   if ((flags & (DMA_MEMORY_MAP | DMA_MEMORY_IO)) == 0)
+   goto out;
+   if (!size)
+   goto out;
+   if (dev->dma_mem)
+   goto out;
+
+   /* FIXME: this routine just ignores DMA_MEMORY_INCLUDES_CHILDREN */
+
+   mem_base = ioremap(bus_addr, size);
+   if (!mem_base)
+   goto out;
+
+   dev->dma_mem = kzalloc(sizeof(struct dma_coherent_mem), GFP_KERNEL);
+   if (!dev->dma_mem)
+   goto out;
+   dev->dma_mem->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+   if (!dev->dma_mem->bitmap)
+   goto free1_out;
+
+   dev->dma_mem->virt_base = mem_base;
+   dev->dma_mem->device_base = device_addr;
+   dev->dma_mem->size = pages;
+   dev->dma_mem->flags = flags;
+
+   if (flags & DMA_MEMORY_MAP)
+   return DMA_MEMORY_MAP;
+
+   return DMA_MEMORY_IO;
+
+ free1_out:
+   kfree(dev->dma_mem);
+ out:
+   if (mem_base)
+   iounmap(mem_base);
+   return 0;
+}
+EXPORT_SYMBOL(dma_declare_coherent_memory);
+
+void dma_release_declared_memory(struct device *dev)
+{
+   struct dma_coherent_mem *mem = dev->dma_mem;
+
+   if (!mem)
+   return;
+   dev->dma_mem = NULL;
+   iounmap(mem->virt_base);
+   kfree(mem->bitmap);
+   kfree(mem);
+}
+EXPORT_SYMBOL(dma_release_declared_memory);
+
+void *dma_mark_declared_memory_occupied(struct device *dev,
+   dma_addr_t device_addr, size_t size)
+{
+   struct dma_coherent_mem *mem = dev->dma_mem;
+   int pos, err;
+   int pages = (size + (device_addr & ~PAGE_MASK) + PAGE_SIZE - 1);
+
+   pages >>= PAGE_SHIFT;
+
+   if (!mem)
+   return ERR_PTR(-EINVAL);
+
+   pos = (device_addr - mem->device_base) >> PAGE_SHIFT;
+   err = bitmap_allocate_region(mem->bitmap, pos, get_order(pages));
+   if (err != 0)
+   return ERR_PTR(err);
+   return mem->virt_base + (pos << PAGE_SHIFT);
+}
+EXPORT_SYMBOL(dma_mark_declared_memory_occupied);
+#endif /* CONFIG_X86_32 */
+
 int dma_supported(struct device *dev, u64 mask)
 {
 #ifdef CONFIG_PCI
diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index eea52df..818d95e 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -18,14 +18,6 @@
 dma_addr_t bad_dma_address __read_mostly = 0x0;
 EXPORT_SYMBOL(bad_dma_address);
 
-struct dma_coherent_mem {
-   void*virt_base;
-   u32 device_base;
-   int size;
-   int flags;
-   unsigned long   *bitmap;
-};
-
 void *dma_alloc_coherent(struct device *dev, size_t size,
   dma_addr_t *dma_handle, gfp_t gfp)
 {
@@ -76,80 +68,3 @@ void dma_free_coherent(struct device *dev, size_t size,
free_pages((unsigned long)vaddr, order);
 }
 EXPORT_SYMBOL(dma_free_coherent);
-
-int dma_declare_coherent_memory(struct device *dev, dma_addr_t bus_addr,
-   dma_addr_t device_addr, size_t size, int flags)
-{
-   void __iomem *mem_base = NULL;
-   int pages = size >> PAGE_SHIFT;
-   int bitmap_size = BITS_TO_LONGS(pages) * sizeof(long);
-
-   if ((flags & (DMA_MEMORY_MAP | DMA_MEMORY_IO)) == 0)
-   goto out;
-   if (!size)
-   goto out;
-   if (dev->dma_mem)
-   goto out;
-
-   /* FIXME: this routine just ignores DMA_MEMORY_INCLUDES_CHILDREN */
-
-   mem_base = ioremap(bus_addr, size);
-   if (!mem_base)
-   goto out;
-
-   dev->dma_mem = kzalloc(sizeof(struct dma_coherent_mem), GFP_KERNEL);
-   if (!dev->dma_mem)
-   goto out;
-   dev->dma_mem->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
-   if (!dev->dma_mem->bitmap)
-   goto free1_out;
-
-   dev->dma_mem->virt_base = mem_base;
-   dev->dma_mem->device_base = device_addr;
-   dev->dma_mem->size = pages;
-   dev->dma_mem->flags = flags;
-
-   if 

[kvm-devel] [PATCH 16/28] x86: isolate coherent mapping functions

2008-04-08 Thread Glauber Costa
i386 implements the declare-coherent-memory API, and x86_64 does not;
this is reflected in pieces of dma_alloc_coherent and dma_free_coherent.
Those pieces are isolated into separate functions, which are declared
as empty macros on x86_64. This way we can make the code the same.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma_32.c |   51 -
 arch/x86/kernel/pci-dma_64.c |   11 -
 2 files changed, 45 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 818d95e..78c7640 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -18,27 +18,50 @@
 dma_addr_t bad_dma_address __read_mostly = 0x0;
 EXPORT_SYMBOL(bad_dma_address);
 
-void *dma_alloc_coherent(struct device *dev, size_t size,
-  dma_addr_t *dma_handle, gfp_t gfp)
+static int dma_alloc_from_coherent_mem(struct device *dev, ssize_t size,
+  dma_addr_t *dma_handle, void **ret)
 {
-   void *ret;
	struct dma_coherent_mem *mem = dev ? dev->dma_mem : NULL;
	int order = get_order(size);
-   /* ignore region specifiers */
-   gfp &= ~(__GFP_DMA | __GFP_HIGHMEM);
 
	if (mem) {
		int page = bitmap_find_free_region(mem->bitmap, mem->size,
						   order);
		if (page >= 0) {
			*dma_handle = mem->device_base + (page << PAGE_SHIFT);
-   ret = mem->virt_base + (page << PAGE_SHIFT);
-   memset(ret, 0, size);
-   return ret;
+   *ret = mem->virt_base + (page << PAGE_SHIFT);
+   memset(*ret, 0, size);
		}
		if (mem->flags & DMA_MEMORY_EXCLUSIVE)
-   return NULL;
+   *ret = NULL;
+   }
+   return (mem != NULL);
+}
+
+static int dma_release_coherent(struct device *dev, int order, void *vaddr)
+{
+   struct dma_coherent_mem *mem = dev ? dev->dma_mem : NULL;
+
+   if (mem && vaddr >= mem->virt_base && vaddr <
+  (mem->virt_base + (mem->size << PAGE_SHIFT))) {
+   int page = (vaddr - mem->virt_base) >> PAGE_SHIFT;
+
+   bitmap_release_region(mem->bitmap, page, order);
+   return 1;
	}
+   return 0;
+}
+
+void *dma_alloc_coherent(struct device *dev, size_t size,
+  dma_addr_t *dma_handle, gfp_t gfp)
+{
+   void *ret = NULL;
+   int order = get_order(size);
+   /* ignore region specifiers */
+   gfp &= ~(__GFP_DMA | __GFP_HIGHMEM);
+
+   if (dma_alloc_from_coherent_mem(dev, size, dma_handle, &ret))
+   return ret;
 
	if (dev == NULL || (dev->coherent_dma_mask < 0xffffffff))
gfp |= GFP_DMA;
@@ -56,15 +79,11 @@ EXPORT_SYMBOL(dma_alloc_coherent);
 void dma_free_coherent(struct device *dev, size_t size,
 void *vaddr, dma_addr_t dma_handle)
 {
-   struct dma_coherent_mem *mem = dev ? dev-dma_mem : NULL;
int order = get_order(size);
 
WARN_ON(irqs_disabled());   /* for portability */
-   if (mem  vaddr = mem-virt_base  vaddr  (mem-virt_base + 
(mem-size  PAGE_SHIFT))) {
-   int page = (vaddr - mem-virt_base)  PAGE_SHIFT;
-
-   bitmap_release_region(mem-bitmap, page, order);
-   } else
-   free_pages((unsigned long)vaddr, order);
+   if (dma_release_coherent(dev, order, vaddr))
+   return;
+   free_pages((unsigned long)vaddr, order);
 }
 EXPORT_SYMBOL(dma_free_coherent);
diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index e7d45cf..6eacd58 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -39,6 +39,8 @@ dma_alloc_pages(struct device *dev, gfp_t gfp, unsigned order)
return page ? page_address(page) : NULL;
 }
 
+#define dma_alloc_from_coherent_mem(dev, size, handle, ret) (0)
+#define dma_release_coherent(dev, order, vaddr) (0)
 /*
  * Allocate memory for a coherent mapping.
  */
@@ -50,6 +52,10 @@ dma_alloc_coherent(struct device *dev, size_t size, 
dma_addr_t *dma_handle,
unsigned long dma_mask = 0;
u64 bus;
 
+
+   if (dma_alloc_from_coherent_mem(dev, size, dma_handle, &memory))
+   return memory;
+
if (!dev)
	dev = &fallback_dev;
dma_mask = dev-coherent_dma_mask;
@@ -141,9 +147,12 @@ EXPORT_SYMBOL(dma_alloc_coherent);
 void dma_free_coherent(struct device *dev, size_t size,
 void *vaddr, dma_addr_t bus)
 {
+   int order = get_order(size);
WARN_ON(irqs_disabled());   /* for portability */
+   if (dma_release_coherent(dev, order, vaddr))
+   return;
	if (dma_ops->unmap_single)
		dma_ops->unmap_single(dev, bus, size, 0);
-   free_pages((unsigned long)vaddr, 

[kvm-devel] [PATCH 18/28] x86: move bad_dma_address

2008-04-08 Thread Glauber Costa
It goes to pci-dma.c, and is removed from the arch-specific files.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma.c|2 ++
 arch/x86/kernel/pci-dma_32.c |4 
 arch/x86/kernel/pci-dma_64.c |2 --
 3 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index d06d8df..d6734ed 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -35,6 +35,8 @@ int iommu_detected __read_mostly = 0;
 int iommu_bio_merge __read_mostly = 0;
 EXPORT_SYMBOL(iommu_bio_merge);
 
+dma_addr_t bad_dma_address __read_mostly = 0;
+EXPORT_SYMBOL(bad_dma_address);
 
 int dma_set_mask(struct device *dev, u64 mask)
 {
diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 49166a4..5ae3470 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -14,10 +14,6 @@
 #include <linux/module.h>
 #include <asm/io.h>
 
-/* For i386, we make it point to the NULL address */
-dma_addr_t bad_dma_address __read_mostly = 0x0;
-EXPORT_SYMBOL(bad_dma_address);
-
 static int dma_alloc_from_coherent_mem(struct device *dev, ssize_t size,
   dma_addr_t *dma_handle, void **ret)
 {
diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index 6eacd58..5f03e41 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -14,8 +14,6 @@
 #include <asm/gart.h>
 #include <asm/calgary.h>
 
-dma_addr_t bad_dma_address __read_mostly;
-EXPORT_SYMBOL(bad_dma_address);
 
 /* Dummy device used for NULL arguments (normally ISA). Better would
be probably a smaller DMA mask, but this is bug-to-bug compatible
-- 
1.5.0.6




[kvm-devel] [PATCH 21/28] x86: retry allocation if failed

2008-04-08 Thread Glauber Costa
This patch puts in the code to retry allocation in case it fails. On its
own it does not make much sense beyond making the code look like x86_64's.
But later patches in this series will make us try to allocate from
zones other than DMA first, which may fail.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma_32.c |   34 +-
 1 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 0d630ae..f6cf434 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -66,6 +66,8 @@ void *dma_alloc_coherent(struct device *dev, size_t size,
struct page *page;
dma_addr_t bus;
int order = get_order(size);
+   unsigned long dma_mask = 0;
+
/* ignore region specifiers */
	gfp &= ~(__GFP_DMA | __GFP_HIGHMEM);
 
@@ -75,15 +77,37 @@ void *dma_alloc_coherent(struct device *dev, size_t size,
	if (dev == NULL || (dev->coherent_dma_mask < 0xffffffff))
gfp |= GFP_DMA;
 
+   dma_mask = dev-coherent_dma_mask;
+   if (dma_mask == 0)
+   dma_mask = DMA_32BIT_MASK;
+
+ again:
page = dma_alloc_pages(dev, gfp, order);
if (page == NULL)
return NULL;
 
-   ret = page_address(page);
-   bus = page_to_phys(page);
-
-   memset(ret, 0, size);
-   *dma_handle = bus;
+   {
+   int high, mmu;
+   bus = page_to_phys(page);
+   ret = page_address(page);
+   high = (bus + size) >= dma_mask;
+   mmu = high;
+   if (force_iommu && !(gfp & GFP_DMA))
+   mmu = 1;
+   else if (high) {
+   free_pages((unsigned long)ret,
+  get_order(size));
+
+   /* Don't use the 16MB ZONE_DMA unless absolutely
+  needed. It's better to use remapping first. */
+   if (dma_mask < DMA_32BIT_MASK && !(gfp & GFP_DMA)) {
+   gfp = (gfp & ~GFP_DMA32) | GFP_DMA;
+   goto again;
+   }
+   }
+   memset(ret, 0, size);
+   *dma_handle = bus;
+   }
 
return ret;
 }
-- 
1.5.0.6




[kvm-devel] [PATCH 17/28] x86: adjust dma_free_coherent for i386

2008-04-08 Thread Glauber Costa
We call unmap_single, if available.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma_32.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 78c7640..49166a4 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -84,6 +84,8 @@ void dma_free_coherent(struct device *dev, size_t size,
WARN_ON(irqs_disabled());   /* for portability */
if (dma_release_coherent(dev, order, vaddr))
return;
+   if (dma_ops->unmap_single)
+   dma_ops->unmap_single(dev, dma_handle, size, 0);
free_pages((unsigned long)vaddr, order);
 }
 EXPORT_SYMBOL(dma_free_coherent);
-- 
1.5.0.6




[kvm-devel] [PATCH 20/28] x86: use numa allocation function in i386

2008-04-08 Thread Glauber Costa
We can do it here too, in the same way x86_64 does.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma_32.c |   27 ++-
 1 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 5ae3470..0d630ae 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -48,10 +48,23 @@ static int dma_release_coherent(struct device *dev, int 
order, void *vaddr)
return 0;
 }
 
+/* Allocate DMA memory on node near device */
+noinline struct page *
+dma_alloc_pages(struct device *dev, gfp_t gfp, unsigned order)
+{
+   int node;
+
+   node = dev_to_node(dev);
+
+   return alloc_pages_node(node, gfp, order);
+}
+
 void *dma_alloc_coherent(struct device *dev, size_t size,
   dma_addr_t *dma_handle, gfp_t gfp)
 {
void *ret = NULL;
+   struct page *page;
+   dma_addr_t bus;
int order = get_order(size);
/* ignore region specifiers */
	gfp &= ~(__GFP_DMA | __GFP_HIGHMEM);
@@ -62,12 +75,16 @@ void *dma_alloc_coherent(struct device *dev, size_t size,
	if (dev == NULL || (dev->coherent_dma_mask < 0xffffffff))
gfp |= GFP_DMA;
 
-   ret = (void *)__get_free_pages(gfp, order);
+   page = dma_alloc_pages(dev, gfp, order);
+   if (page == NULL)
+   return NULL;
+
+   ret = page_address(page);
+   bus = page_to_phys(page);
+
+   memset(ret, 0, size);
+   *dma_handle = bus;
 
-   if (ret != NULL) {
-   memset(ret, 0, size);
-   *dma_handle = virt_to_phys(ret);
-   }
return ret;
 }
 EXPORT_SYMBOL(dma_alloc_coherent);
-- 
1.5.0.6




[kvm-devel] [PATCH 23/28] x86: don't try to allocate from DMA zone at first

2008-04-08 Thread Glauber Costa
If we fail, we'll loop into the allocation again,
and then allocate in the DMA zone.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma_32.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 0e9ec11..11f100a 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -84,9 +84,6 @@ void *dma_alloc_coherent(struct device *dev, size_t size,
if (dma_alloc_from_coherent_mem(dev, size, dma_handle, ret))
return ret;
 
-   if (dev == NULL || (dev->coherent_dma_mask < 0xffffffff))
-   gfp |= GFP_DMA;
-
if (!dev)
	dev = &fallback_dev;
 
-- 
1.5.0.6




[kvm-devel] [PATCH 22/28] x86: use a fallback dev for i386

2008-04-08 Thread Glauber Costa
We can use a fallback dev for cases of a NULL device being passed (mostly ISA).
This comes from the x86_64 implementation.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma_32.c |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index f6cf434..0e9ec11 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -14,6 +14,16 @@
 #include linux/module.h
 #include asm/io.h
 
+/* Dummy device used for NULL arguments (normally ISA). Better would
+   be probably a smaller DMA mask, but this is bug-to-bug compatible
+   to i386. */
+struct device fallback_dev = {
+   .bus_id = "fallback device",
+   .coherent_dma_mask = DMA_32BIT_MASK,
+   .dma_mask = &fallback_dev.coherent_dma_mask,
+};
+
+
 static int dma_alloc_from_coherent_mem(struct device *dev, ssize_t size,
   dma_addr_t *dma_handle, void **ret)
 {
@@ -77,6 +87,9 @@ void *dma_alloc_coherent(struct device *dev, size_t size,
	if (dev == NULL || (dev->coherent_dma_mask < 0xffffffff))
gfp |= GFP_DMA;
 
+   if (!dev)
+   dev = &fallback_dev;
+
dma_mask = dev-coherent_dma_mask;
if (dma_mask == 0)
dma_mask = DMA_32BIT_MASK;
-- 
1.5.0.6




[kvm-devel] [PATCH 25/28] x86: remove kludge from x86_64

2008-04-08 Thread Glauber Costa
The comment claims that i386 does this, but it does not.
So remove it.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma_64.c |4 
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index b956f59..596c8c8 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -68,10 +68,6 @@ dma_alloc_coherent(struct device *dev, size_t size, 
dma_addr_t *dma_handle,
/* Don't invoke OOM killer */
gfp |= __GFP_NORETRY;
 
-   /* Kludge to make it bug-to-bug compatible with i386. i386
-  uses the normal dma_mask for alloc_coherent. */
-   dma_mask = *dev->dma_mask;
-
	/* Why <=? Even when the mask is smaller than 4GB it is often
   larger than 16MB and in this case we have a chance of
   finding fitting memory in the next higher zone first. If
-- 
1.5.0.6




[kvm-devel] [PATCH 26/28] x86: return conditional to mmu

2008-04-08 Thread Glauber Costa
Just return our allocation if we don't have an mmu. On i386, where this patch
is being applied, we never have one. So the goal is just to make the code look
like x86_64's.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma_32.c |   34 --
 1 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 5450bd1..f134de3 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -116,12 +116,42 @@ again:
gfp = (gfp  ~GFP_DMA32) | GFP_DMA;
goto again;
}
+
+   /* Let low level make its own zone decisions */
+   gfp &= ~(GFP_DMA32|GFP_DMA);
+
+   if (dma_ops->alloc_coherent)
+   return dma_ops->alloc_coherent(dev, size,
+  dma_handle, gfp);
+   return NULL;
+
		}
		memset(ret, 0, size);
-   *dma_handle = bus;
+   if (!mmu) {
+   *dma_handle = bus;
+   return ret;
+   }
+   }
+
+   if (dma_ops->alloc_coherent) {
+   free_pages((unsigned long)ret, get_order(size));
+   gfp &= ~(GFP_DMA|GFP_DMA32);
+   return dma_ops->alloc_coherent(dev, size, dma_handle, gfp);
+   }
+
+   if (dma_ops->map_simple) {
+   *dma_handle = dma_ops->map_simple(dev, virt_to_phys(ret),
+ size,
+ PCI_DMA_BIDIRECTIONAL);
+   if (*dma_handle != bad_dma_address)
+   return ret;
	}
 
-   return ret;
+   if (panic_on_overflow)
+   panic("dma_alloc_coherent: IOMMU overflow by %lu bytes\n",
+ (unsigned long)size);
+   free_pages((unsigned long)ret, get_order(size));
+   return NULL;
 }
 EXPORT_SYMBOL(dma_alloc_coherent);
 
-- 
1.5.0.6




[kvm-devel] [PATCH 27/28] x86: don't do dma if mask is NULL.

2008-04-08 Thread Glauber Costa
If the device hasn't provided a mask, abort the allocation.
Note that we're using a fallback device now, so this does not cover
the case of a NULL device: just drivers passing NULL masks around.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma_32.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index f134de3..d2f7074 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -91,6 +91,9 @@ void *dma_alloc_coherent(struct device *dev, size_t size,
if (dma_mask == 0)
dma_mask = DMA_32BIT_MASK;
 
+   if (dev->dma_mask == NULL)
+   return NULL;
+
/* Don't invoke OOM killer */
gfp |= __GFP_NORETRY;
 again:
-- 
1.5.0.6




[kvm-devel] [PATCH 28/28] x86: integrate pci-dma.c

2008-04-08 Thread Glauber Costa
The code in pci-dma_{32,64}.c is now sufficiently
close. We merge the two files into pci-dma.c.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/Makefile |2 +-
 arch/x86/kernel/pci-dma.c|  175 ++
 arch/x86/kernel/pci-dma_32.c |  173 -
 arch/x86/kernel/pci-dma_64.c |  154 -
 4 files changed, 176 insertions(+), 328 deletions(-)
 delete mode 100644 arch/x86/kernel/pci-dma_32.c
 delete mode 100644 arch/x86/kernel/pci-dma_64.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index b2a1358..423e1c4 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -23,7 +23,7 @@ obj-y += setup_$(BITS).o i8259_$(BITS).o 
setup.o
 obj-$(CONFIG_X86_32)   += sys_i386_32.o i386_ksyms_32.o
 obj-$(CONFIG_X86_64)   += sys_x86_64.o x8664_ksyms_64.o
 obj-$(CONFIG_X86_64)   += syscall_64.o vsyscall_64.o setup64.o
-obj-y  += pci-dma_$(BITS).o  bootflag.o e820_$(BITS).o
+obj-y  += bootflag.o e820_$(BITS).o
 obj-y  += pci-dma.o quirks.o i8237.o topology.o kdebugfs.o
 obj-y  += alternative.o i8253.o pci-nommu.o
 obj-$(CONFIG_X86_64)   += bugs_64.o
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index d6734ed..5cc8d5a 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -38,6 +38,15 @@ EXPORT_SYMBOL(iommu_bio_merge);
 dma_addr_t bad_dma_address __read_mostly = 0;
 EXPORT_SYMBOL(bad_dma_address);
 
+/* Dummy device used for NULL arguments (normally ISA). Better would
+   be probably a smaller DMA mask, but this is bug-to-bug compatible
+   to older i386. */
+struct device fallback_dev = {
+   .bus_id = "fallback device",
+   .coherent_dma_mask = DMA_32BIT_MASK,
+   .dma_mask = &fallback_dev.coherent_dma_mask,
+};
+
 int dma_set_mask(struct device *dev, u64 mask)
 {
	if (!dev->dma_mask || !dma_supported(dev, mask))
@@ -128,6 +137,43 @@ void *dma_mark_declared_memory_occupied(struct device *dev,
	return mem->virt_base + (pos << PAGE_SHIFT);
 }
 EXPORT_SYMBOL(dma_mark_declared_memory_occupied);
+
+static int dma_alloc_from_coherent_mem(struct device *dev, ssize_t size,
+  dma_addr_t *dma_handle, void **ret)
+{
+   struct dma_coherent_mem *mem = dev ? dev->dma_mem : NULL;
+   int order = get_order(size);
+
+   if (mem) {
+   int page = bitmap_find_free_region(mem->bitmap, mem->size,
+order);
+   if (page >= 0) {
+   *dma_handle = mem->device_base + (page << PAGE_SHIFT);
+   *ret = mem->virt_base + (page << PAGE_SHIFT);
+   memset(*ret, 0, size);
+   }
+   if (mem->flags & DMA_MEMORY_EXCLUSIVE)
+   *ret = NULL;
+   }
+   return (mem != NULL);
+}
+
+static int dma_release_coherent(struct device *dev, int order, void *vaddr)
+{
+   struct dma_coherent_mem *mem = dev ? dev->dma_mem : NULL;
+
+   if (mem && vaddr >= mem->virt_base && vaddr <
+  (mem->virt_base + (mem->size << PAGE_SHIFT))) {
+   int page = (vaddr - mem->virt_base) >> PAGE_SHIFT;
+
+   bitmap_release_region(mem->bitmap, page, order);
+   return 1;
+   }
+   return 0;
+}
+#else
+#define dma_alloc_from_coherent_mem(dev, size, handle, ret) (0)
+#define dma_release_coherent(dev, order, vaddr) (0)
 #endif /* CONFIG_X86_32 */
 
 int dma_supported(struct device *dev, u64 mask)
@@ -171,6 +217,135 @@ int dma_supported(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_supported);
 
+/* Allocate DMA memory on node near device */
+noinline struct page *
+dma_alloc_pages(struct device *dev, gfp_t gfp, unsigned order)
+{
+   int node;
+
+   node = dev_to_node(dev);
+
+   return alloc_pages_node(node, gfp, order);
+}
+
+/*
+ * Allocate memory for a coherent mapping.
+ */
+void *
+dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
+  gfp_t gfp)
+{
+   void *memory = NULL;
+   struct page *page;
+   unsigned long dma_mask = 0;
+   dma_addr_t bus;
+
+   /* ignore region specifiers */
+   gfp &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
+
+   if (dma_alloc_from_coherent_mem(dev, size, dma_handle, &memory))
+   return memory;
+
+   if (!dev)
+   dev = &fallback_dev;
+   dma_mask = dev->coherent_dma_mask;
+   if (dma_mask == 0)
+   dma_mask = DMA_32BIT_MASK;
+
+   /* Device not DMA able */
+   if (dev->dma_mask == NULL)
+   return NULL;
+
+   /* Don't invoke OOM killer */
+   gfp |= __GFP_NORETRY;
+
+#ifdef CONFIG_X86_64
+   /* Why <=? Even when the mask is smaller than 4GB it is often
+  larger than 16MB and in this case we have a chance of
+

[kvm-devel] [PATCH 10/28] x86: unify pci-nommu

2008-04-08 Thread Glauber Costa
Merge pci-base_32.c and pci-nommu_64.c into pci-nommu.c.
Their code was made the same, so now they can be merged.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/Makefile   |5 +-
 arch/x86/kernel/pci-base_32.c  |   60 
 arch/x86/kernel/pci-dma.c  |8 +++
 arch/x86/kernel/pci-dma_64.c   |8 ---
 arch/x86/kernel/pci-nommu.c|  100 
 arch/x86/kernel/pci-nommu_64.c |  100 
 6 files changed, 110 insertions(+), 171 deletions(-)
 delete mode 100644 arch/x86/kernel/pci-base_32.c
 create mode 100644 arch/x86/kernel/pci-nommu.c
 delete mode 100644 arch/x86/kernel/pci-nommu_64.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index befe901..b2a1358 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -25,9 +25,8 @@ obj-$(CONFIG_X86_64)  += sys_x86_64.o x8664_ksyms_64.o
 obj-$(CONFIG_X86_64)   += syscall_64.o vsyscall_64.o setup64.o
 obj-y  += pci-dma_$(BITS).o  bootflag.o e820_$(BITS).o
 obj-y  += pci-dma.o quirks.o i8237.o topology.o kdebugfs.o
-obj-y  += alternative.o i8253.o
-obj-$(CONFIG_X86_64)   += pci-nommu_64.o bugs_64.o
-obj-$(CONFIG_X86_32)   += pci-base_32.o
+obj-y  += alternative.o i8253.o pci-nommu.o
+obj-$(CONFIG_X86_64)   += bugs_64.o
 obj-y  += tsc_$(BITS).o io_delay.o rtc.o
 
 obj-y  += process.o
diff --git a/arch/x86/kernel/pci-base_32.c b/arch/x86/kernel/pci-base_32.c
deleted file mode 100644
index b44ea51..000
--- a/arch/x86/kernel/pci-base_32.c
+++ /dev/null
@@ -1,60 +0,0 @@
-#include <linux/mm.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/dma-mapping.h>
-#include <asm/dma-mapping.h>
-
-static dma_addr_t pci32_map_single(struct device *dev, phys_addr_t ptr,
-  size_t size, int direction)
-{
-   WARN_ON(size == 0);
-   flush_write_buffers();
-   return ptr;
-}
-
-static int pci32_dma_map_sg(struct device *dev, struct scatterlist *sglist,
-   int nents, int direction)
-{
-   struct scatterlist *sg;
-   int i;
-
-   WARN_ON(nents == 0 || sglist[0].length == 0);
-
-   for_each_sg(sglist, sg, nents, i) {
-   BUG_ON(!sg_page(sg));
-
-   sg->dma_address = sg_phys(sg);
-   sg->dma_length = sg->length;
-   }
-
-   flush_write_buffers();
-   return nents;
-}
-
-/* Make sure we keep the same behaviour */
-static int pci32_map_error(dma_addr_t dma_addr)
-{
-   return 0;
-}
-
-const struct dma_mapping_ops pci32_dma_ops = {
-   .map_single = pci32_map_single,
-   .unmap_single = NULL,
-   .map_sg = pci32_dma_map_sg,
-   .unmap_sg = NULL,
-   .sync_single_for_cpu = NULL,
-   .sync_single_for_device = NULL,
-   .sync_single_range_for_cpu = NULL,
-   .sync_single_range_for_device = NULL,
-   .sync_sg_for_cpu = NULL,
-   .sync_sg_for_device = NULL,
-   .mapping_error = pci32_map_error,
-};
-
-/* this is temporary */
-int __init no_iommu_init(void)
-{
-   dma_ops = &pci32_dma_ops;
-   return 0;
-}
-fs_initcall(no_iommu_init);
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index d30634b..6b77fd8 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -7,6 +7,14 @@
 const struct dma_mapping_ops *dma_ops;
 EXPORT_SYMBOL(dma_ops);
 
+#ifdef CONFIG_IOMMU_DEBUG
+int panic_on_overflow __read_mostly = 1;
+int force_iommu __read_mostly = 1;
+#else
+int panic_on_overflow __read_mostly = 0;
+int force_iommu __read_mostly = 0;
+#endif
+
 int dma_set_mask(struct device *dev, u64 mask)
 {
if (!dev->dma_mask || !dma_supported(dev, mask))
diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index e95f671..4202130 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -27,14 +27,6 @@ EXPORT_SYMBOL(iommu_bio_merge);
 static int iommu_sac_force __read_mostly = 0;
 
 int no_iommu __read_mostly;
-#ifdef CONFIG_IOMMU_DEBUG
-int panic_on_overflow __read_mostly = 1;
-int force_iommu __read_mostly = 1;
-#else
-int panic_on_overflow __read_mostly = 0;
-int force_iommu __read_mostly= 0;
-#endif
-
 /* Set this to 1 if there is a HW IOMMU in the system */
 int iommu_detected __read_mostly = 0;
 
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
new file mode 100644
index 000..aec43d5
--- /dev/null
+++ b/arch/x86/kernel/pci-nommu.c
@@ -0,0 +1,100 @@
+/* Fallback functions when the main IOMMU code is not compiled in. This
+   code is roughly equivalent to i386. */
+#include <linux/mm.h>
+#include <linux/init.h>
+#include <linux/pci.h>
+#include <linux/string.h>
+#include <linux/dma-mapping.h>
+#include <linux/scatterlist.h>
+
+#include <asm/gart.h>
+#include <asm/processor.h>
+#include <asm/dma.h>
+
+static int
+check_addr(char *name, struct device *hwdev, 

[kvm-devel] [PATCH 11/28] x86: move pci fixup to pci-dma.c

2008-04-08 Thread Glauber Costa
via_no_dac provides a fixup that is the same for both
architectures. Move it to pci-dma.c.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma.c |   18 ++
 arch/x86/kernel/pci-dma_32.c  |   13 -
 arch/x86/kernel/pci-dma_64.c  |   15 ---
 include/asm-x86/dma-mapping.h |2 +-
 4 files changed, 19 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 6b77fd8..e81e16f 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -1,9 +1,13 @@
 #include <linux/dma-mapping.h>
 #include <linux/dmar.h>
+#include <linux/pci.h>
 
 #include <asm/gart.h>
 #include <asm/calgary.h>
 
+int forbid_dac __read_mostly;
+EXPORT_SYMBOL(forbid_dac);
+
 const struct dma_mapping_ops *dma_ops;
 EXPORT_SYMBOL(dma_ops);
 
@@ -48,3 +52,17 @@ void pci_iommu_shutdown(void)
 }
 /* Must execute after PCI subsystem */
 fs_initcall(pci_iommu_init);
+
+#ifdef CONFIG_PCI
+/* Many VIA bridges seem to corrupt data for DAC. Disable it here */
+
+static __devinit void via_no_dac(struct pci_dev *dev)
+{
+   if ((dev->class >> 8) == PCI_CLASS_BRIDGE_PCI && forbid_dac == 0) {
+   printk(KERN_INFO "PCI: VIA PCI bridge detected."
+" Disabling DAC.\n");
+   forbid_dac = 1;
+   }
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID, via_no_dac);
+#endif
diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 9e82976..6543bb3 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -157,9 +157,6 @@ EXPORT_SYMBOL(dma_mark_declared_memory_occupied);
 #ifdef CONFIG_PCI
 /* Many VIA bridges seem to corrupt data for DAC. Disable it here */
 
-int forbid_dac;
-EXPORT_SYMBOL(forbid_dac);
-
 int
 dma_supported(struct device *dev, u64 mask)
 {
@@ -182,16 +179,6 @@ dma_supported(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_supported);
 
-
-static __devinit void via_no_dac(struct pci_dev *dev)
-{
-   if ((dev->class >> 8) == PCI_CLASS_BRIDGE_PCI && forbid_dac == 0) {
-   printk(KERN_INFO "PCI: VIA PCI bridge detected. Disabling DAC.\n");
-   forbid_dac = 1;
-   }
-}
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID, via_no_dac);
-
 static int check_iommu(char *s)
 {
if (!strcmp(s, "usedac")) {
diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index 4202130..e194460 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -161,8 +161,6 @@ void dma_free_coherent(struct device *dev, size_t size,
 }
 EXPORT_SYMBOL(dma_free_coherent);
 
-static int forbid_dac __read_mostly;
-
 int dma_supported(struct device *dev, u64 mask)
 {
 #ifdef CONFIG_PCI
@@ -338,16 +336,3 @@ void __init pci_iommu_alloc(void)
pci_swiotlb_init();
 #endif
 }
-
-#ifdef CONFIG_PCI
-/* Many VIA bridges seem to corrupt data for DAC. Disable it here */
-
-static __devinit void via_no_dac(struct pci_dev *dev)
-{
-   if ((dev->class >> 8) == PCI_CLASS_BRIDGE_PCI && forbid_dac == 0) {
-   printk(KERN_INFO "PCI: VIA PCI bridge detected. Disabling DAC.\n");
-   forbid_dac = 1;
-   }
-}
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID, via_no_dac);
-#endif
diff --git a/include/asm-x86/dma-mapping.h b/include/asm-x86/dma-mapping.h
index 914846d..d82517d 100644
--- a/include/asm-x86/dma-mapping.h
+++ b/include/asm-x86/dma-mapping.h
@@ -14,6 +14,7 @@ extern dma_addr_t bad_dma_address;
 extern int iommu_merge;
 extern struct device fallback_dev;
 extern int panic_on_overflow;
+extern int forbid_dac;
 
 struct dma_mapping_ops {
int (*mapping_error)(dma_addr_t dma_addr);
@@ -223,6 +224,5 @@ dma_release_declared_memory(struct device *dev);
 extern void *
 dma_mark_declared_memory_occupied(struct device *dev,
  dma_addr_t device_addr, size_t size);
-extern int forbid_dac;
 #endif /* CONFIG_X86_32 */
 #endif
-- 
1.5.0.6


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


[kvm-devel] [PATCH 09/28] x86: move initialization functions to pci-dma.c

2008-04-08 Thread Glauber Costa
The initcalls that trigger the various possibilities for the
DMA subsystem are moved to pci-dma.c.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma.c|   25 +
 arch/x86/kernel/pci-dma_64.c |   23 ---
 2 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 1323cd8..d30634b 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -1,4 +1,8 @@
 #include <linux/dma-mapping.h>
+#include <linux/dmar.h>
+
+#include <asm/gart.h>
+#include <asm/calgary.h>
 
 const struct dma_mapping_ops *dma_ops;
 EXPORT_SYMBOL(dma_ops);
@@ -14,4 +18,25 @@ int dma_set_mask(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_set_mask);
 
+static int __init pci_iommu_init(void)
+{
+#ifdef CONFIG_CALGARY_IOMMU
+   calgary_iommu_init();
+#endif
+
+   intel_iommu_init();
 
+#ifdef CONFIG_GART_IOMMU
+   gart_iommu_init();
+#endif
+
+   no_iommu_init();
+   return 0;
+}
+
+void pci_iommu_shutdown(void)
+{
+   gart_iommu_shutdown();
+}
+/* Must execute after PCI subsystem */
+fs_initcall(pci_iommu_init);
diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index e697b86..e95f671 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -347,27 +347,6 @@ void __init pci_iommu_alloc(void)
 #endif
 }
 
-static int __init pci_iommu_init(void)
-{
-#ifdef CONFIG_CALGARY_IOMMU
-   calgary_iommu_init();
-#endif
-
-   intel_iommu_init();
-
-#ifdef CONFIG_GART_IOMMU
-   gart_iommu_init();
-#endif
-
-   no_iommu_init();
-   return 0;
-}
-
-void pci_iommu_shutdown(void)
-{
-   gart_iommu_shutdown();
-}
-
 #ifdef CONFIG_PCI
 /* Many VIA bridges seem to corrupt data for DAC. Disable it here */
 
@@ -380,5 +359,3 @@ static __devinit void via_no_dac(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID, via_no_dac);
 #endif
-/* Must execute after PCI subsystem */
-fs_initcall(pci_iommu_init);
-- 
1.5.0.6


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


[kvm-devel] [PATCH 08/28] x86: move definition to pci-dma.c

2008-04-08 Thread Glauber Costa
Move dma_ops structure definition to pci-dma.c, where it
belongs.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-base_32.c |   11 ---
 arch/x86/kernel/pci-dma.c |3 +++
 arch/x86/mm/init_64.c |3 ---
 3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/pci-base_32.c b/arch/x86/kernel/pci-base_32.c
index 837bbe9..b44ea51 100644
--- a/arch/x86/kernel/pci-base_32.c
+++ b/arch/x86/kernel/pci-base_32.c
@@ -37,7 +37,7 @@ static int pci32_map_error(dma_addr_t dma_addr)
return 0;
 }
 
-static const struct dma_mapping_ops pci32_dma_ops = {
+const struct dma_mapping_ops pci32_dma_ops = {
.map_single = pci32_map_single,
.unmap_single = NULL,
.map_sg = pci32_dma_map_sg,
@@ -51,5 +51,10 @@ static const struct dma_mapping_ops pci32_dma_ops = {
.mapping_error = pci32_map_error,
 };
 
-const struct dma_mapping_ops *dma_ops = &pci32_dma_ops;
-EXPORT_SYMBOL(dma_ops);
+/* this is temporary */
+int __init no_iommu_init(void)
+{
+   dma_ops = &pci32_dma_ops;
+   return 0;
+}
+fs_initcall(no_iommu_init);
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index f1c24d8..1323cd8 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -1,5 +1,8 @@
 #include <linux/dma-mapping.h>
 
+const struct dma_mapping_ops *dma_ops;
+EXPORT_SYMBOL(dma_ops);
+
 int dma_set_mask(struct device *dev, u64 mask)
 {
if (!dev->dma_mask || !dma_supported(dev, mask))
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 8c989b8..f06a51e 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -47,9 +47,6 @@
 #include <asm/numa.h>
 #include <asm/cacheflush.h>
 
-const struct dma_mapping_ops *dma_ops;
-EXPORT_SYMBOL(dma_ops);
-
 static unsigned long dma_reserve __initdata;
 
 DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-- 
1.5.0.6


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


[kvm-devel] [PATCH 04/28] x86: Add flush_write_buffers in nommu functions

2008-04-08 Thread Glauber Costa
This patch adds flush_write_buffers() to some functions of pci-nommu_64.c.
They are added wherever i386 would also have them. This is not a problem
for x86_64, since flush_write_buffers() is a nop there.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-nommu_64.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pci-nommu_64.c b/arch/x86/kernel/pci-nommu_64.c
index a4e8ccf..1da9cf9 100644
--- a/arch/x86/kernel/pci-nommu_64.c
+++ b/arch/x86/kernel/pci-nommu_64.c
@@ -32,6 +32,7 @@ nommu_map_single(struct device *hwdev, phys_addr_t paddr, size_t size,
dma_addr_t bus = paddr;
if (!check_addr("map_single", hwdev, bus, size))
return bad_dma_address;
+   flush_write_buffers();
return bus;
 }
 
@@ -64,6 +65,7 @@ static int nommu_map_sg(struct device *hwdev, struct scatterlist *sg,
return 0;
s->dma_length = s->length;
}
+   flush_write_buffers();
return nents;
 }
 
-- 
1.5.0.6


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


[kvm-devel] [PATCH 13/28] x86: merge dma_supported

2008-04-08 Thread Glauber Costa
The code for both arches is very similar, so this patch merges them.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma.c|   44 ++
 arch/x86/kernel/pci-dma_32.c |   24 --
 arch/x86/kernel/pci-dma_64.c |   44 +-
 3 files changed, 45 insertions(+), 67 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index f6d6a92..4289a9b 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -14,6 +14,8 @@ EXPORT_SYMBOL(forbid_dac);
 const struct dma_mapping_ops *dma_ops;
 EXPORT_SYMBOL(dma_ops);
 
+int iommu_sac_force __read_mostly = 0;
+
 #ifdef CONFIG_IOMMU_DEBUG
 int panic_on_overflow __read_mostly = 1;
 int force_iommu __read_mostly = 1;
@@ -33,6 +35,48 @@ int dma_set_mask(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_set_mask);
 
+int dma_supported(struct device *dev, u64 mask)
+{
+#ifdef CONFIG_PCI
+   if (mask > 0xffffffff && forbid_dac > 0) {
+   printk(KERN_INFO "PCI: Disallowing DAC for device %s\n",
+dev->bus_id);
+   return 0;
+   }
+#endif
+
+   if (dma_ops->dma_supported)
+   return dma_ops->dma_supported(dev, mask);
+
+   /* Copied from i386. Doesn't make much sense, because it will
+  only work for pci_alloc_coherent.
+  The caller just has to use GFP_DMA in this case. */
+   if (mask < DMA_24BIT_MASK)
+   return 0;
+
+   /* Tell the device to use SAC when IOMMU force is on.  This
+  allows the driver to use cheaper accesses in some cases.
+
+  Problem with this is that if we overflow the IOMMU area and
+  return DAC as fallback address the device may not handle it
+  correctly.
+
+  As a special case some controllers have a 39bit address
+  mode that is as efficient as 32bit (aic79xx). Don't force
+  SAC for these.  Assume all masks <= 40 bits are of this
+  type. Normally this doesn't make any difference, but gives
+  more gentle handling of IOMMU overflow. */
+   if (iommu_sac_force && (mask >= DMA_40BIT_MASK)) {
+   printk(KERN_INFO "%s: Force SAC with mask %Lx\n",
+dev->bus_id, mask);
+   return 0;
+   }
+
+   return 1;
+}
+EXPORT_SYMBOL(dma_supported);
+
+
 static int __init pci_iommu_init(void)
 {
 #ifdef CONFIG_CALGARY_IOMMU
diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 6543bb3..1d4091a 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -155,30 +155,6 @@ void *dma_mark_declared_memory_occupied(struct device *dev,
 EXPORT_SYMBOL(dma_mark_declared_memory_occupied);
 
 #ifdef CONFIG_PCI
-/* Many VIA bridges seem to corrupt data for DAC. Disable it here */
-
-int
-dma_supported(struct device *dev, u64 mask)
-{
-   /*
-* we fall back to GFP_DMA when the mask isn't all 1s,
-* so we can't guarantee allocations that must be
-* within a tighter range than GFP_DMA..
-*/
-   if (mask < 0x00ffffff)
-   return 0;
-
-   /* Work around chipset bugs */
-   if (forbid_dac > 0 && mask < 0xffffffffULL)
-   return 0;
-
-   if (dma_ops->dma_supported)
-   return dma_ops->dma_supported(dev, mask);
-
-   return 1;
-}
-EXPORT_SYMBOL(dma_supported);
-
 static int check_iommu(char *s)
 {
if (!strcmp(s, "usedac")) {
diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index 7820675..c80da76 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -24,7 +24,7 @@ EXPORT_SYMBOL(bad_dma_address);
 int iommu_bio_merge __read_mostly = 0;
 EXPORT_SYMBOL(iommu_bio_merge);
 
-static int iommu_sac_force __read_mostly = 0;
+extern int iommu_sac_force;
 
 int no_iommu __read_mostly;
 /* Set this to 1 if there is a HW IOMMU in the system */
@@ -161,48 +161,6 @@ void dma_free_coherent(struct device *dev, size_t size,
 }
 EXPORT_SYMBOL(dma_free_coherent);
 
-int dma_supported(struct device *dev, u64 mask)
-{
-#ifdef CONFIG_PCI
-   if (mask > 0xffffffff && forbid_dac > 0) {
-
-
-
-   printk(KERN_INFO "PCI: Disallowing DAC for device %s\n", dev->bus_id);
-   return 0;
-   }
-#endif
-
-   if (dma_ops->dma_supported)
-   return dma_ops->dma_supported(dev, mask);
-
-   /* Copied from i386. Doesn't make much sense, because it will
-  only work for pci_alloc_coherent.
-  The caller just has to use GFP_DMA in this case. */
-if (mask < DMA_24BIT_MASK)
-return 0;
-
-   /* Tell the device to use SAC when IOMMU force is on.  This
-  allows the driver to use cheaper accesses in some cases.
-
-  Problem with this is that if we overflow the IOMMU area and
-  return DAC as fallback address the device may not handle it
-  

[kvm-devel] [PATCH] use NR_IRQS for irq count

2008-04-08 Thread Glauber Costa
Instead of artificially limiting irq numbers, use arch provided NR_IRQS

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 irqhook/irqhook_main.c |   16 +++-
 1 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/irqhook/irqhook_main.c b/irqhook/irqhook_main.c
index 5f414d1..828c70a 100644
--- a/irqhook/irqhook_main.c
+++ b/irqhook/irqhook_main.c
@@ -31,15 +31,13 @@ #define ERROR(fmt, args...) printk(1
 static spinlock_t irqh_lock;
 static wait_queue_head_t irqh_proc_list;
 
-enum {NINTR = 256};
-
-static DECLARE_BITMAP(pending, NINTR);
-static DECLARE_BITMAP(handled, NINTR);
+static DECLARE_BITMAP(pending, NR_IRQS);
+static DECLARE_BITMAP(handled, NR_IRQS);
 
 #define irqh_on(which, bit)	test_bit(bit, which)
 #define irqh_set(which, bit)	set_bit(bit, which)
 #define irqh_clear(which, bit)	clear_bit(bit, which)
-#define irqh_ffs(which)	find_first_bit(which, NINTR)
+#define irqh_ffs(which)	find_first_bit(which, NR_IRQS)
 
 static irqreturn_t
 irqh_interrupt(int irq, void *p)
@@ -92,7 +90,7 @@ irqh_dev_write(struct file *fp, const ch
if (pdp) {
if (pci_enable_device(pdp))
ERROR("device not enabled\n");
-   if ((unsigned)(n = pdp->irq) >= NINTR) {
+   if ((unsigned)(n = pdp->irq) >= NR_IRQS) {
ERROR("device has invalid IRQ set\n");
return -EINVAL;
}
@@ -107,7 +105,7 @@ irqh_dev_write(struct file *fp, const ch
irqh_set(handled, n);
goto done;
}
-   if ((unsigned)n >= NINTR)
+   if ((unsigned)n >= NR_IRQS)
return -EINVAL;
if (arg[0] == '-') {
if (pdp)
@@ -135,7 +133,7 @@ irqh_dev_read(struct file *fp, char *buf
return -EINVAL;
spin_lock_irq(&irqh_lock);
while (!signal_pending(current)) {
-   if ((n = irqh_ffs(pending)) < NINTR) {
+   if ((n = irqh_ffs(pending)) < NR_IRQS) {
if ((m = sprintf(b, "%d", n) + 1) > size)
m = size;
if (copy_to_user(buf, b, m))
@@ -203,7 +201,7 @@ irqh_cleanup(void)
 
DPRINTK(ENTER\n);

-   while ((n = irqh_ffs(handled)) < NINTR) {
+   while ((n = irqh_ffs(handled)) < NR_IRQS) {
irqh_clear(handled, n);
free_irq(n, (void *)irqh_interrupt);
}
-- 
1.4.2


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


[kvm-devel] [PATCH 24/28] x86: unify gfp masks

2008-04-08 Thread Glauber Costa
Use the same gfp masks for x86_64 and i386.
It involves using HIGHMEM or DMA32 where necessary for the sake
of code compatibility (no real effect), and using the NORETRY
mask for i386.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma_32.c |6 --
 arch/x86/kernel/pci-dma_64.c |2 ++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 11f100a..5450bd1 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -79,7 +79,7 @@ void *dma_alloc_coherent(struct device *dev, size_t size,
unsigned long dma_mask = 0;
 
/* ignore region specifiers */
-   gfp &= ~(__GFP_DMA | __GFP_HIGHMEM);
+   gfp &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
 
if (dma_alloc_from_coherent_mem(dev, size, dma_handle, &ret))
return ret;
@@ -91,7 +91,9 @@ void *dma_alloc_coherent(struct device *dev, size_t size,
if (dma_mask == 0)
dma_mask = DMA_32BIT_MASK;
 
- again:
+   /* Don't invoke OOM killer */
+   gfp |= __GFP_NORETRY;
+again:
page = dma_alloc_pages(dev, gfp, order);
if (page == NULL)
return NULL;
diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index 13a31a4..b956f59 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -49,6 +49,8 @@ dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
unsigned long dma_mask = 0;
u64 bus;
 
+   /* ignore region specifiers */
+   gfp &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
 
if (dma_alloc_from_coherent_mem(dev, size, dma_handle, &memory))
return memory;
-- 
1.5.0.6


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 2 of 9] Core of mmu notifiers

2008-04-08 Thread Andrea Arcangeli
On Tue, Apr 08, 2008 at 11:26:19AM -0500, Robin Holt wrote:
 This one does not build on ia64.  I get the following:

I think it's a common code compilation bug not related to my
patch. Can you test this?

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -10,6 +10,7 @@
 #include <linux/rbtree.h>
 #include <linux/rwsem.h>
 #include <linux/completion.h>
+#include <linux/cpumask.h>
 #include <asm/page.h>
 #include <asm/mmu.h>
 


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [patch 02/10] emm: notifier logic

2008-04-08 Thread Christoph Lameter
It may also be useful to allow invalidate_start() to fail in some contexts 
(try_to_unmap f.e., maybe if a certain flag is passed). This may allow the 
device to get out of tight situations (pending I/O f.e. or time out if 
there is no response for network communications). But then that 
complicates the API.


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel



Re: [kvm-devel] [PATCH 0 of 9] mmu notifier #v12

2008-04-08 Thread Avi Kivity
Andrea Arcangeli wrote:
 Note that mmu_notifier_unregister may also fail with -EINTR if there are
 signal pending or the system runs out of vmalloc space or physical memory,
 only exit_mmap guarantees that any kernel module can be unloaded in presence
 of an oom condition.

   

That's unusual.  What happens to the notifier?  Suppose I destroy a vm 
without exiting the process, what happens if it fires?

-- 
Any sufficiently difficult bug is indistinguishable from a feature.


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH RFC 1/5]Add some trace enties and define interface for tracing

2008-04-08 Thread Avi Kivity
Liu, Eric E wrote:

 High order 32 bits of cr2 are lost.

 

 May I use KVMTRACE_3D(PAGE_FAULT, vcpu, error_code, (u32)cr2,
 (u32)((u64)cr2 >> 32), handler) to handle this?
 For a 32-bit guest it traces some excess data, but for a 64-bit
 guest we don't lose the high-order bits.

   

Sure.

   

 }
 -
 +   KVMTRACE_1D(INJ_VIRQ, vcpu, idtv_info_field, handler);

   
 Probably need a different marker than INJ_VIRQ, as this is on exit,
 not entry.

 

 Is the marker REDELIVER_EVT ok for this?

   

Yes.

 @@ -2428,6 +2445,7 @@ void kvm_arch_exit(void)
  int kvm_emulate_halt(struct kvm_vcpu *vcpu)
  {
++vcpu->stat.halt_exits;
 +   KVMTRACE_0D(HLT, vcpu, handler);
 if (irqchip_in_kernel(vcpu-kvm)) {
vcpu->arch.mp_state = VCPU_MP_STATE_HALTED;
 kvm_vcpu_block(vcpu);

   
 Would be nice to have an UNHLT to see how long sleeps are.  But this
 will probably be seen by the irq injection.
 

 I think from the cycles between VMEXIT (caused by halt) and VMENTRY we
 can evaluate how long the sleeps are. 
   

Right.

We'll merge it with the understanding that the data format is not part 
of the ABI and will change between versions.  It will cause some 
headaches when people send us traces, but on the other hand will give us 
some flexibility.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] gfxboot VMX workaround v2

2008-04-08 Thread Avi Kivity
Guillaume Thouvenin wrote:
   
   
 x86 emulate is missing support for jmp far which is used to switch into 
 protected mode.  It just needs to be added.
 

 Ok I see. I understand now why you said in a previous email that KVM
 needs to have a proper load_seg() function like the Xen's x86_emulate.
 This function is used to load the segment in a far jmp. I will look how
 it is done in Xen and I will try to copy the stuff like you did.

   

kvm now has a load_segment_descriptor() function which might help.


-- 
Any sufficiently difficult bug is indistinguishable from a feature.


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] Compilation problems with git tree

2008-04-08 Thread Avi Kivity
Zdenek Kabelac wrote:
 Core was generated by `qemu-kvm -s -m 320 -smp 2 -net nic,model=pcnet
 -net user -redir'.
 Program terminated with signal 11, Segmentation fault.

 #0  0x004849a7 in tcp_reass (tp=0x7fabec000d60, ti=0xec000d60,
 m=0x0) at slirp/tcp_input.c:208
   

User mode networking has known issues on 64-bit hosts.  Try using 
bridged networking.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [patch 0/4] fix SMP migration and loadvm/savevm (V2)

2008-04-08 Thread Avi Kivity
Marcelo Tosatti wrote:
 Avi, I prefer not to fold mpstate into kvm_save_registers() as a hidden
 register because the MPSTATE is only used during migration, whereas 
 save_registers() is not (seems safer)

But that's the point... what about savevm/loadvm, etc?  They deserve to 
work too.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 0 of 9] mmu notifier #v12

2008-04-08 Thread Andrea Arcangeli
On Wed, Apr 09, 2008 at 12:46:49AM +0300, Avi Kivity wrote:
 That's unusual.  What happens to the notifier?  Suppose I destroy a vm 

Yes it's quite unusual.

 without exiting the process, what happens if it fires?

The mmu notifier ops should stop doing anything (if there are no
memslots they will be no-ops), or the ops can be replaced atomically
with null pointers. The important thing is that the module can't go
away until ->release is invoked or until mmu_notifier_unregister
returns 0.

Previously there was no mmu_notifier_unregister, so adding it can't be
a regression compared to #v11, even if it can fail and you may have to
retry later after returning to userland. Retrying from userland is
always safe in OOM-kill terms; only looping inside the kernel is
unsafe, because do_exit has no chance to run.

-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


[kvm-devel] [PATCH] [v2] Move kvm_get_pit to libkvm.c common code

2008-04-08 Thread Jerone Young
# HG changeset patch
# User Jerone Young [EMAIL PROTECTED]
# Date 1207692873 18000
# Branch merge
# Node ID 8ddf560729aac228cd84068e1227e601e68a6840
# Parent  94cbc19df0f0fcab150599b10d859f1a3bc1b7cb
[v2] Move kvm_get_pit to libkvm.c common code

- I am resending this patch with ia64 removed. It apparently fell through 
the cracks.

Don't compile kvm_*_pit() on architectures whose currently supported platforms 
do not contain a PIT.

Signed-off-by: Hollis Blanchard [EMAIL PROTECTED]
Signed-off-by: Jerone Young [EMAIL PROTECTED]

diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
@@ -549,6 +549,7 @@ int kvm_pit_in_kernel(kvm_context_t kvm)
 
 #ifdef KVM_CAP_PIT
 
+#if defined(__i386__) || defined(__x86_64__)
 /*!
  * \brief Get in kernel PIT of the virtual domain
  *
@@ -569,6 +570,7 @@ int kvm_get_pit(kvm_context_t kvm, struc
  * \param s PIT state of the virtual domain
  */
 int kvm_set_pit(kvm_context_t kvm, struct kvm_pit_state *s);
+#endif
 
 #endif
 

-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel




Re: [kvm-devel] [patch 0/4] fix SMP migration and loadvm/savevm (V2)

2008-04-08 Thread Marcelo Tosatti
On Wed, Apr 09, 2008 at 12:59:50AM +0300, Avi Kivity wrote:
 Marcelo Tosatti wrote:
 Avi, I prefer not to fold mpstate into kvm_save_registers() as a hidden
 register because the MPSTATE is only used during migration, whereas 
 save_registers() is not (seems safer)
 
 But that's the point... what about savevm/loadvm, etc?  They deserve to 
 work too.

savevm/loadvm will work through cpu_save/cpu_load just as live migration
does, so they are covered.


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] use NR_IRQS for irq count

2008-04-08 Thread Amit Shah
On Tuesday 08 April 2008 22:34:36 Glauber Costa wrote:
 Instead of artificially limiting irq numbers, use arch provided NR_IRQS

 Signed-off-by: Glauber Costa [EMAIL PROTECTED]

Thanks, applied.

Going forward, we're going to use the in-kernel approach for interrupt 
injection (which already has this change). The irqhook module will stay 
around for testing / debugging purposes (will be used with -no-kvm-irqchip).

Amit.

-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel