Re: [kvm-devel] [PATCH] gfxboot VMX workaround v2

2008-04-08 Thread Guillaume Thouvenin
On Mon, 07 Apr 2008 11:05:06 -0500
Anthony Liguori [EMAIL PROTECTED] wrote:

 Perhaps a viable way to fix this upstream would be to catch the vmentry 
 failure, look to see if SS.CPL != CS.CPL, and if so, invoke 
 x86_emulate() in a loop until SS.CPL == CS.CPL.

I tried this solution some time ago, but unfortunately x86_emulate()
failed. I suspected a problem with the guest EIP, which could differ
between the vmentry catch and the emulation. I will rebase my patch and
post it on the mailing list.
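For illustration, the loop Anthony describes can be sketched in plain C. This is a toy model, not KVM code: vcpu_state and emulate_one_insn() are hypothetical stand-ins for the real vcpu segment state and x86_emulate().

```c
/* Toy model of the suggested recovery path: after a failed vmentry with
 * SS.CPL != CS.CPL, single-step the guest through a software emulator
 * until the segment state is consistent again.  vcpu_state and
 * emulate_one_insn() are hypothetical stand-ins, not real KVM API. */

struct vcpu_state {
    int cs_cpl;            /* CPL cached in the CS register */
    int ss_cpl;            /* CPL cached in the SS register */
    unsigned long rip;
};

/* Stand-in for x86_emulate(): each call "executes" one instruction; we
 * pretend the instruction at 0x1004 reloads SS so the CPLs match. */
static int emulate_one_insn(struct vcpu_state *v)
{
    if (v->rip == 0x1004)
        v->ss_cpl = v->cs_cpl;
    v->rip += 2;           /* fake instruction length */
    return 0;              /* 0 == emulated successfully */
}

/* Loop invoked on vmentry failure when SS.CPL != CS.CPL. */
int emulate_until_consistent(struct vcpu_state *v)
{
    while (v->ss_cpl != v->cs_cpl) {
        if (emulate_one_insn(v) != 0)
            return -1;     /* emulator hit an unsupported instruction */
    }
    return 0;
}
```

The real loop would of course emulate actual guest instructions; the structure (emulate until SS.CPL == CS.CPL, bail out on emulation failure) is the point.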

Regards,
Guillaume

-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] gfxboot VMX workaround v2

2008-04-08 Thread Anthony Liguori
Guillaume Thouvenin wrote:
 On Mon, 07 Apr 2008 11:05:06 -0500
 Anthony Liguori [EMAIL PROTECTED] wrote:

   
 Perhaps a viable way to fix this upstream would be to catch the vmentry 
 failure, look to see if SS.CPL != CS.CPL, and if so, invoke 
 x86_emulate() in a loop until SS.CPL == CS.CPL.
 

 I tried this solution some time ago, but unfortunately x86_emulate()
 failed. I suspected a problem with the guest EIP, which could differ
 between the vmentry catch and the emulation. I will rebase my patch and
 post it on the mailing list.
   

x86_emulate is missing support for far jmp, which is used to switch into 
protected mode.  It just needs to be added.

Regards,

Anthony Liguori

 Regards,
 Guillaume
   




Re: [kvm-devel] Compilation problems with git tree

2008-04-08 Thread Marcelo Tosatti
On Tue, Apr 08, 2008 at 01:03:58AM +0200, Zdenek Kabelac wrote:
 Hi
 
 I've tried to compile git tree for kvm-userspace.git
 I've used these configure options:
 
 --disable-gcc-check --with-patched-kernel
 
 using x86-64 platform
 
 I've got this error:
 
 ar rcs libqemu.a exec.o kqemu.o cpu-exec.o host-utils.o
 translate-all.o translate.o op.o tcg/tcg.o tcg/tcg-dyngen.o
 tcg/tcg-runtime.o qemu-kvm.o fpu/softfloat-native.o helper.o helper2.o
 qemu-kvm-x86.o kvm-tpr-opt.o qemu-kvm-helper.o disas.o i386-dis.o
 gcc -L /home/kabi/export/kvm-userspace/qemu/../libkvm  -g  -m64 -o
 qemu-system-x86_64 vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o
 migration.o block-raw-posix.o lsi53c895a.o esp.o usb-ohci.o
 eeprom93xx.o eepro100.o ne2000.o pcnet.o rtl8139.o e1000.o hypercall.o
 virtio.o virtio-net.o virtio-blk.o device-hotplug.o ide.o pckbd.o
 ps2.o vga.o sb16.o es1370.o dma.o fdc.o mc146818rtc.o serial.o i8259.o
 i8254.o pcspk.o pc.o cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
 usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o gdbstub.o
 ../libqemu_common.a libqemu.a  -lm -lz -lkvm -lgnutls   -lrt -lpthread
 -lutil -lSDL -lpthread  -lcurses
 pc.o: In function `pc_init1':
 /home/kabi/export/kvm-userspace/qemu/hw/pc.c:987: undefined reference
 to `kvm_pit_init'
 collect2: ld returned 1 exit status
 
 
 Obviously kvm_pit_init seems to be compiled in only for i386 - I've
 disabled this code with #if 0

Update your host kernel. It seems backward compatibility is broken.

 
 But then during code run I've got this coredump:
 'ti' seems to contain some garbage - am I using the latest code??
 (as this is the last commit I could see:
 
 commit 5208ce19dca268f84a2b9441c2fbb6129161e44c
 Author: Marcelo Tosatti [EMAIL PROTECTED]
 Date:   Thu Apr 3 20:24:37 2008 -0300)
 
 
 Core was generated by `qemu-kvm -s -m 320 -smp 2 -net nic,model=pcnet
 -net user -redir'.
 Program terminated with signal 11, Segmentation fault.
 
 #0  0x004849a7 in tcp_reass (tp=0x7fabec000d60, ti=0xec000d60,
 m=0x0) at slirp/tcp_input.c:208
 208   if (ti == (struct tcpiphdr *)tp || ti->ti_seq != tp->rcv_nxt)
 Missing separate debuginfos, use: debuginfo-install SDL.x86_64
 glibc.x86_64 gnutls.x86_64 libX11.x86_64 libXau.x86_64
 libXcursor.x86_64 libXdmcp.x86_64 libXext.x86_64 libXfixes.x86_64
 libXrandr.x86_64 libXrender.x86_64 libgcrypt.x86_64
 libgpg-error.x86_64 libtasn1.x86_64 libxcb.x86_64 ncurses.x86_64
 zlib.x86_64
 (gdb) bt
 #0  0x004849a7 in tcp_reass (tp=0x7fabec000d60, ti=0xec000d60,
 m=0x0) at slirp/tcp_input.c:208
 #1  0x00485c3b in tcp_input (m=0x2ba7260, iphlen=<value
 optimized out>, inso=<value optimized out>)
 at slirp/tcp_input.c:1052
 #2  0x00406aa1 in qemu_send_packet (vc1=0x2b9b0b0,
 buf=0x2c9dd58 RT, size=54)
 at /home/kabi/export/kvm-userspace/qemu/vl.c:3758
 #3  0x00426211 in pcnet_transmit (s=0x2c9d990) at
 /home/kabi/export/kvm-userspace/qemu/hw/pcnet.c:1272
 #4  0x00426898 in pcnet_poll_timer (opaque=<value optimized
 out>) at /home/kabi/export/kvm-userspace/qemu/hw/pcnet.c:1335
 #5  0x00426f30 in pcnet_ioport_writew (opaque=0x7fabec000d60,
 addr=0, val=0)
 at /home/kabi/export/kvm-userspace/qemu/hw/pcnet.c:1617
 #6  0x005050f1 in kvm_outw (opaque=<value optimized out>,
 addr=0, data=0)
 at /home/kabi/export/kvm-userspace/qemu/qemu-kvm.c:515
 #7  0x005252b4 in handle_io (kvm=0x2ac4000,
 run=0x7fac0bc73000, vcpu=1) at libkvm.c:721
 #8  0x00525972 in kvm_run (kvm=0x2ac4000, vcpu=1) at libkvm.c:889
 #9  0x00505636 in kvm_cpu_exec (env=<value optimized out>) at
 /home/kabi/export/kvm-userspace/qemu/qemu-kvm.c:146
 #10 0x005058e0 in ap_main_loop (_env=<value optimized out>) at
 /home/kabi/export/kvm-userspace/qemu/qemu-kvm.c:330
 #11 0x00371600740a in start_thread () from /lib64/libpthread.so.0
 #12 0x0037154e678d in clone () from /lib64/libc.so.6




Re: [kvm-devel] [PATCH] gfxboot VMX workaround v2

2008-04-08 Thread Guillaume Thouvenin
On Tue, 08 Apr 2008 07:14:13 -0500
Anthony Liguori [EMAIL PROTECTED] wrote:

 Guillaume Thouvenin wrote:
  On Mon, 07 Apr 2008 11:05:06 -0500
  Anthony Liguori [EMAIL PROTECTED] wrote:
 

  Perhaps a viable way to fix this upstream would be to catch the vmentry 
  failure, look to see if SS.CPL != CS.CPL, and if so, invoke 
  x86_emulate() in a loop until SS.CPL == CS.CPL.
  
 
  I tried this solution some time ago, but unfortunately x86_emulate()
  failed. I suspected a problem with the guest EIP, which could differ
  between the vmentry catch and the emulation. I will rebase my patch and
  post it on the mailing list.

 
 x86_emulate is missing support for far jmp, which is used to switch into 
 protected mode.  It just needs to be added.

Ok, I see. I now understand why you said in a previous email that KVM
needs a proper load_seg() function like the one in Xen's x86_emulate.
This function is used to load a segment during a far jmp. I will look at
how it is done in Xen and try to adapt it the way you did.
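For reference, the segment-load half of emulating a far jmp (what a load_seg()-style helper has to do) can be sketched as follows. This is a simplified userspace model: the descriptor decoding follows the documented 8-byte GDT descriptor layout, but the structure and function names are illustrative, not Xen's or KVM's actual code.

```c
#include <stdint.h>

/* Hypothetical cached-segment state, loosely modeled on what a VMM
 * keeps per segment register.  Not KVM's or Xen's real structures. */
struct seg_cache {
    uint32_t base;
    uint32_t limit;
    int dpl;
    int present;
};

/* Decode a flat 8-byte code/data descriptor (no long mode, no TSS). */
static void decode_descriptor(uint64_t d, struct seg_cache *s)
{
    s->limit   = (uint32_t)(d & 0xffff) | ((uint32_t)((d >> 48) & 0xf) << 16);
    s->base    = (uint32_t)((d >> 16) & 0xffffff) | (uint32_t)((d >> 56) << 24);
    s->dpl     = (int)((d >> 45) & 0x3);
    s->present = (int)((d >> 47) & 0x1);
    if ((d >> 55) & 1)                 /* G bit: 4K granularity */
        s->limit = (s->limit << 12) | 0xfff;
}

/* Emulate the segment-load step of "jmp far sel:off": look the selector
 * up in the GDT and refresh the cached CS state from the descriptor. */
int load_seg(const uint64_t *gdt, int gdt_entries,
             uint16_t sel, struct seg_cache *cs)
{
    int idx = sel >> 3;                /* selector -> GDT index */
    if (idx >= gdt_entries)
        return -1;
    decode_descriptor(gdt[idx], cs);
    return cs->present ? 0 : -1;
}
```

A real implementation additionally has to check the descriptor type, privilege levels, and set the hidden CPL, which is exactly the part gfxboot trips over.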

Regards,
Guillaume



Re: [kvm-devel] Compilation problems with git tree

2008-04-08 Thread Zdenek Kabelac
2008/4/8, Marcelo Tosatti [EMAIL PROTECTED]:
 On Tue, Apr 08, 2008 at 01:03:58AM +0200, Zdenek Kabelac wrote:
   Hi
  
   I've tried to compile git tree for kvm-userspace.git
   I've used these configure options:
  
   --disable-gcc-check --with-patched-kernel
  
   using x86-64 platform
  
   I've got this error:
  
   pc.o: In function `pc_init1':
   /home/kabi/export/kvm-userspace/qemu/hw/pc.c:987: undefined reference
   to `kvm_pit_init'
   collect2: ld returned 1 exit status
  
  
   Obviously kvm_pit_init seems to be compiled in only for i386 - I've
   disabled this code with #if 0



 Update your host kernel. It seems backward compatibility is broken.



   Core was generated by `qemu-kvm -s -m 320 -smp 2 -net nic,model=pcnet
   -net user -redir'.
   Program terminated with signal 11, Segmentation fault.
  
   #0  0x004849a7 in tcp_reass (tp=0x7fabec000d60, ti=0xec000d60,
   m=0x0) at slirp/tcp_input.c:208


Hmm - will that fix the first problem (compilation), or the second one (the coredump crash)?

Because I need to use a combination of other kernel trees for now,
I'll stay with the Linux git tree 2.6.25-rc8 - hopefully the patches from
the kvm git tree will get there soon.

I think I'll survive the occasional crash (2x/day) caused by this
backward incompatibility.

Compared with kvm-64, I no longer experience the sudden qemu-kvm stops
that I had to resolve by attaching strace to the qemu process - that
magically 'unfroze' qemu, and it was happening quite often.

Zdenek



[kvm-devel] [PATCH 3 of 9] Moves all mmu notifier methods outside the PT lock (first and not last

2008-04-08 Thread Andrea Arcangeli
# HG changeset patch
# User Andrea Arcangeli [EMAIL PROTECTED]
# Date 1207666463 -7200
# Node ID 33de2e17d0f5670515833bf8d3d2ea19e2a85b09
# Parent  baceb322b45ed43280654dac6c964c9d3d8a936f
Moves all mmu notifier methods outside the PT lock (first and not last
step to make them sleep capable).

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -117,27 +117,6 @@
INIT_HLIST_HEAD(&mm->mmu_notifier_list);
 }
 
-#define ptep_clear_flush_notify(__vma, __address, __ptep)  \
-({ \
-   pte_t __pte;\
-   struct vm_area_struct *___vma = __vma;  \
-   unsigned long ___address = __address;   \
-   __pte = ptep_clear_flush(___vma, ___address, __ptep);   \
-   mmu_notifier_invalidate_page(___vma->vm_mm, ___address);\
-   __pte;  \
-})
-
-#define ptep_clear_flush_young_notify(__vma, __address, __ptep)   \
-({ \
-   int __young;\
-   struct vm_area_struct *___vma = __vma;  \
-   unsigned long ___address = __address;   \
-   __young = ptep_clear_flush_young(___vma, ___address, __ptep);   \
-   __young |= mmu_notifier_clear_flush_young(___vma->vm_mm,\
- ___address);  \
-   __young;\
-})
-
 #else /* CONFIG_MMU_NOTIFIER */
 
 static inline void mmu_notifier_release(struct mm_struct *mm)
@@ -169,9 +148,6 @@
 {
 }
 
-#define ptep_clear_flush_young_notify ptep_clear_flush_young
-#define ptep_clear_flush_notify ptep_clear_flush
-
 #endif /* CONFIG_MMU_NOTIFIER */
 
 #endif /* _LINUX_MMU_NOTIFIER_H */
diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c
--- a/mm/filemap_xip.c
+++ b/mm/filemap_xip.c
@@ -194,11 +194,13 @@
if (pte) {
/* Nuke the page table entry. */
flush_cache_page(vma, address, pte_pfn(*pte));
-   pteval = ptep_clear_flush_notify(vma, address, pte);
+   pteval = ptep_clear_flush(vma, address, pte);
page_remove_rmap(page, vma);
dec_mm_counter(mm, file_rss);
BUG_ON(pte_dirty(pteval));
pte_unmap_unlock(pte, ptl);
+   /* must invalidate_page _before_ freeing the page */
+   mmu_notifier_invalidate_page(mm, address);
page_cache_release(page);
}
}
diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1626,9 +1626,10 @@
 */
page_table = pte_offset_map_lock(mm, pmd, address,
 ptl);
-   page_cache_release(old_page);
+   new_page = NULL;
if (!pte_same(*page_table, orig_pte))
goto unlock;
+   page_cache_release(old_page);
 
page_mkwrite = 1;
}
@@ -1644,6 +1645,7 @@
if (ptep_set_access_flags(vma, address, page_table, entry,1))
update_mmu_cache(vma, address, entry);
ret |= VM_FAULT_WRITE;
+   old_page = new_page = NULL;
goto unlock;
}
 
@@ -1688,7 +1690,7 @@
 * seen in the presence of one thread doing SMC and another
 * thread doing COW.
 */
-   ptep_clear_flush_notify(vma, address, page_table);
+   ptep_clear_flush(vma, address, page_table);
set_pte_at(mm, address, page_table, entry);
update_mmu_cache(vma, address, entry);
lru_cache_add_active(new_page);
@@ -1700,12 +1702,18 @@
} else
mem_cgroup_uncharge_page(new_page);
 
-   if (new_page)
+unlock:
+   pte_unmap_unlock(page_table, ptl);
+
+   if (new_page) {
+   if (new_page == old_page)
+   /* cow happened, notify before releasing old_page */
+   mmu_notifier_invalidate_page(mm, address);
page_cache_release(new_page);
+   }
if (old_page)
page_cache_release(old_page);
-unlock:
-   pte_unmap_unlock(page_table, ptl);
+
if (dirty_page) {
if (vma->vm_file)
file_update_time(vma->vm_file);
diff --git 

[kvm-devel] [PATCH 2 of 9] Core of mmu notifiers

2008-04-08 Thread Andrea Arcangeli
# HG changeset patch
# User Andrea Arcangeli [EMAIL PROTECTED]
# Date 1207666462 -7200
# Node ID baceb322b45ed43280654dac6c964c9d3d8a936f
# Parent  ec6d8f91b299cf26cce5c3d49bb25d35ee33c137
Core of mmu notifiers.

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]
Signed-off-by: Nick Piggin [EMAIL PROTECTED]
Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -225,6 +225,9 @@
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
struct mem_cgroup *mem_cgroup;
 #endif
+#ifdef CONFIG_MMU_NOTIFIER
+   struct hlist_head mmu_notifier_list;
+#endif
 };
 
 #endif /* _LINUX_MM_TYPES_H */
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
new file mode 100644
--- /dev/null
+++ b/include/linux/mmu_notifier.h
@@ -0,0 +1,177 @@
+#ifndef _LINUX_MMU_NOTIFIER_H
+#define _LINUX_MMU_NOTIFIER_H
+
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/mm_types.h>
+
+struct mmu_notifier;
+struct mmu_notifier_ops;
+
+#ifdef CONFIG_MMU_NOTIFIER
+
+struct mmu_notifier_ops {
+   /*
+* Called when nobody can register any more notifier in the mm
+* and after the mn notifier has been disarmed already.
+*/
+   void (*release)(struct mmu_notifier *mn,
+   struct mm_struct *mm);
+
+   /*
+* clear_flush_young is called after the VM is
+* test-and-clearing the young/accessed bitflag in the
+* pte. This way the VM will provide proper aging to the
+* accesses to the page through the secondary MMUs and not
+* only to the ones through the Linux pte.
+*/
+   int (*clear_flush_young)(struct mmu_notifier *mn,
+struct mm_struct *mm,
+unsigned long address);
+
+   /*
+* Before this is invoked any secondary MMU is still ok to
+* read/write to the page previously pointed by the Linux pte
+* because the old page hasn't been freed yet.  If required
+* set_page_dirty has to be called internally to this method.
+*/
+   void (*invalidate_page)(struct mmu_notifier *mn,
+   struct mm_struct *mm,
+   unsigned long address);
+
+   /*
+* invalidate_range_start() and invalidate_range_end() must be
+* paired. Multiple invalidate_range_start/ends may be nested
+* or called concurrently.
+*/
+   void (*invalidate_range_start)(struct mmu_notifier *mn,
+  struct mm_struct *mm,
+  unsigned long start, unsigned long end);
+   void (*invalidate_range_end)(struct mmu_notifier *mn,
+struct mm_struct *mm,
+unsigned long start, unsigned long end);
+};
+
+struct mmu_notifier {
+   struct hlist_node hlist;
+   const struct mmu_notifier_ops *ops;
+};
+
+static inline int mm_has_notifiers(struct mm_struct *mm)
+{
+   return unlikely(!hlist_empty(&mm->mmu_notifier_list));
+}
+
+extern int mmu_notifier_register(struct mmu_notifier *mn,
+struct mm_struct *mm);
+extern int mmu_notifier_unregister(struct mmu_notifier *mn,
+  struct mm_struct *mm);
+extern void __mmu_notifier_release(struct mm_struct *mm);
+extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
+ unsigned long address);
+extern void __mmu_notifier_invalidate_page(struct mm_struct *mm,
+ unsigned long address);
+extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
+ unsigned long start, unsigned long end);
+extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
+ unsigned long start, unsigned long end);
+
+
+static inline void mmu_notifier_release(struct mm_struct *mm)
+{
+   if (mm_has_notifiers(mm))
+   __mmu_notifier_release(mm);
+}
+
+static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm,
+ unsigned long address)
+{
+   if (mm_has_notifiers(mm))
+   return __mmu_notifier_clear_flush_young(mm, address);
+   return 0;
+}
+
+static inline void mmu_notifier_invalidate_page(struct mm_struct *mm,
+ unsigned long address)
+{
+   if (mm_has_notifiers(mm))
+   __mmu_notifier_invalidate_page(mm, address);
+}
+
+static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
+ unsigned long start, unsigned long end)
+{
+   if (mm_has_notifiers(mm))
+   __mmu_notifier_invalidate_range_start(mm, start, end);
+}
+
+static inline void 
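The dispatch pattern in this patch (a cheap mm_has_notifiers() inline check guarding a slow path that walks the notifier list and calls each registered hook) can be mimicked in userspace C. A plain singly linked list stands in for the kernel hlist; the names mirror the patch but nothing here is kernel code.

```c
#include <stddef.h>

struct mmu_notifier;

/* Ops table, as in the patch: here only invalidate_page for brevity. */
struct mmu_notifier_ops {
    void (*invalidate_page)(struct mmu_notifier *mn, unsigned long address);
};

struct mmu_notifier {
    const struct mmu_notifier_ops *ops;
    struct mmu_notifier *next;          /* stand-in for the kernel hlist */
};

struct mm_struct {
    struct mmu_notifier *notifiers;
};

/* Fast path: one branch when nothing is registered. */
static int mm_has_notifiers(struct mm_struct *mm)
{
    return mm->notifiers != NULL;
}

static void mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm)
{
    mn->next = mm->notifiers;
    mm->notifiers = mn;
}

/* Slow path: walk the list and invoke every registered callback. */
static void mmu_notifier_invalidate_page(struct mm_struct *mm,
                                         unsigned long address)
{
    struct mmu_notifier *mn;

    if (!mm_has_notifiers(mm))
        return;
    for (mn = mm->notifiers; mn; mn = mn->next)
        mn->ops->invalidate_page(mn, address);
}

/* Example notifier used below: just counts invalidations. */
static int invalidations;
static void count_invalidate(struct mmu_notifier *mn, unsigned long address)
{
    (void)mn; (void)address;
    invalidations++;
}
static const struct mmu_notifier_ops count_ops = {
    .invalidate_page = count_invalidate,
};
```

The point of the split into `mm_has_notifiers()` plus a `__mmu_notifier_*` slow path is that an mm with no notifiers (the overwhelmingly common case) pays only one predictable branch.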

[kvm-devel] [PATCH 6 of 9] We no longer abort unmapping in unmap vmas because we can reschedule while

2008-04-08 Thread Andrea Arcangeli
# HG changeset patch
# User Andrea Arcangeli [EMAIL PROTECTED]
# Date 1207666893 -7200
# Node ID b0cb674314534b9cc4759603f123474d38427b2d
# Parent  20e829e35dfeceeb55a816ef495afda10cd50b98
We no longer abort unmapping in unmap_vmas because we can reschedule while
unmapping, since we are holding a semaphore. This allows moving more
of the tlb flushing into unmap_vmas, reducing code in various places.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

diff --git a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -723,8 +723,7 @@
 struct page *vm_normal_page(struct vm_area_struct *, unsigned long, pte_t);
 unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
unsigned long size, struct zap_details *);
-unsigned long unmap_vmas(struct mmu_gather **tlb,
-   struct vm_area_struct *start_vma, unsigned long start_addr,
+unsigned long unmap_vmas(struct vm_area_struct *start_vma, unsigned long start_addr,
unsigned long end_addr, unsigned long *nr_accounted,
struct zap_details *);
 
diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -805,7 +805,6 @@
 
 /**
  * unmap_vmas - unmap a range of memory covered by a list of vma's
- * @tlbp: address of the caller's struct mmu_gather
  * @vma: the starting vma
  * @start_addr: virtual address at which to start unmapping
  * @end_addr: virtual address at which to end unmapping
@@ -817,20 +816,13 @@
  * Unmap all pages in the vma list.
  *
  * We aim to not hold locks for too long (for scheduling latency reasons).
- * So zap pages in ZAP_BLOCK_SIZE bytecounts.  This means we need to
- * return the ending mmu_gather to the caller.
+ * So zap pages in ZAP_BLOCK_SIZE bytecounts.
  *
  * Only addresses between `start' and `end' will be unmapped.
  *
  * The VMA list must be sorted in ascending virtual address order.
- *
- * unmap_vmas() assumes that the caller will flush the whole unmapped address
- * range after unmap_vmas() returns.  So the only responsibility here is to
- * ensure that any thus-far unmapped pages are flushed before unmap_vmas()
- * drops the lock and schedules.
  */
-unsigned long unmap_vmas(struct mmu_gather **tlbp,
-   struct vm_area_struct *vma, unsigned long start_addr,
+unsigned long unmap_vmas(struct vm_area_struct *vma, unsigned long start_addr,
unsigned long end_addr, unsigned long *nr_accounted,
struct zap_details *details)
 {
@@ -838,7 +830,15 @@
unsigned long tlb_start = 0;/* For tlb_finish_mmu */
int tlb_start_valid = 0;
unsigned long start = start_addr;
-   int fullmm = (*tlbp)->fullmm;
+   int fullmm;
+   struct mmu_gather *tlb;
+   struct mm_struct *mm = vma->vm_mm;
+
+   mmu_notifier_invalidate_range_start(mm, start_addr, end_addr);
+   lru_add_drain();
+   tlb = tlb_gather_mmu(mm, 0);
+   update_hiwater_rss(mm);
+   fullmm = tlb->fullmm;
 
for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) {
unsigned long end;
@@ -865,7 +865,7 @@
(HPAGE_SIZE / PAGE_SIZE);
start = end;
} else
-   start = unmap_page_range(*tlbp, vma,
+   start = unmap_page_range(tlb, vma,
start, end, zap_work, details);
 
if (zap_work > 0) {
@@ -873,13 +873,15 @@
break;
}
 
-   tlb_finish_mmu(*tlbp, tlb_start, start);
+   tlb_finish_mmu(tlb, tlb_start, start);
cond_resched();
-   *tlbp = tlb_gather_mmu(vma-vm_mm, fullmm);
+   tlb = tlb_gather_mmu(vma-vm_mm, fullmm);
tlb_start_valid = 0;
zap_work = ZAP_BLOCK_SIZE;
}
}
+   tlb_finish_mmu(tlb, start_addr, end_addr);
+   mmu_notifier_invalidate_range_end(mm, start_addr, end_addr);
return start;   /* which is now the end (or restart) address */
 }
 
@@ -893,20 +895,10 @@
 unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
unsigned long size, struct zap_details *details)
 {
-   struct mm_struct *mm = vma->vm_mm;
-   struct mmu_gather *tlb;
unsigned long end = address + size;
unsigned long nr_accounted = 0;
 
-   lru_add_drain();
-   tlb = tlb_gather_mmu(mm, 0);
-   update_hiwater_rss(mm);
-   mmu_notifier_invalidate_range_start(mm, address, end);
-   end = unmap_vmas(tlb, vma, address, end, nr_accounted, details);
-   mmu_notifier_invalidate_range_end(mm, address, end);
-   if (tlb)
-   tlb_finish_mmu(tlb, address, end);
-   return end;
+   return 
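The ZAP_BLOCK_SIZE batching that unmap_vmas() keeps (zap a bounded amount of work, then flush, reschedule, and regather) can be modeled outside the kernel as follows. The chunked loop is the point; the constants and the flush counter are illustrative only.

```c
/* Toy model of unmap_vmas()'s batching: process an address range in
 * ZAP_BLOCK_SIZE chunks, and between chunks do the equivalent of
 * tlb_finish_mmu() + cond_resched() + tlb_gather_mmu(), so that locks
 * are never held for too long.  "flushes" counts how many such breaks
 * a given range causes. */

#define ZAP_BLOCK_SIZE 4096UL

unsigned long zap_range_batched(unsigned long start, unsigned long end,
                                int *flushes)
{
    unsigned long zap_work = ZAP_BLOCK_SIZE;

    *flushes = 0;
    while (start < end) {
        unsigned long chunk = end - start;

        if (chunk > zap_work)
            chunk = zap_work;
        start += chunk;          /* stand-in for unmap_page_range() */
        zap_work -= chunk;
        if (zap_work == 0) {     /* batch exhausted: flush, resched, regather */
            (*flushes)++;
            zap_work = ZAP_BLOCK_SIZE;
        }
    }
    return start;                /* the end (or restart) address */
}
```

What this patch changes is only *where* the gather/flush bookkeeping lives: it moves from every caller into unmap_vmas() itself, bracketed by the invalidate_range_start/end notifier calls.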

[kvm-devel] [PATCH 8 of 9] XPMEM would have used sys_madvise() except that madvise_dontneed()

2008-04-08 Thread Andrea Arcangeli
# HG changeset patch
# User Andrea Arcangeli [EMAIL PROTECTED]
# Date 1207666972 -7200
# Node ID 3b14e26a4e0491f00bb989be04d8b7e0755ed2d7
# Parent  a0c52e4b9b71e2627238b69c0a58905097973279
XPMEM would have used sys_madvise() except that madvise_dontneed()
returns an -EINVAL if VM_PFNMAP is set, which is always true for the pages
XPMEM imports from other partitions and is also true for uncached pages
allocated locally via the mspec allocator.  XPMEM needs zap_page_range()
functionality for these types of pages as well as 'normal' pages.

Signed-off-by: Dean Nelson [EMAIL PROTECTED]

diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -900,6 +900,7 @@
 
return unmap_vmas(vma, address, end, nr_accounted, details);
 }
+EXPORT_SYMBOL_GPL(zap_page_range);
 
 /*
  * Do a quick page-table lookup for a single page.



[kvm-devel] [PATCH 4 of 9] Move the tlb flushing into free_pgtables. The conversion of the locks

2008-04-08 Thread Andrea Arcangeli
# HG changeset patch
# User Andrea Arcangeli [EMAIL PROTECTED]
# Date 1207666463 -7200
# Node ID 2c2ed514f294dbbfc66157f771bc900789ac6005
# Parent  33de2e17d0f5670515833bf8d3d2ea19e2a85b09
Move the tlb flushing into free_pgtables. The conversion of the locks
taken for reverse map scanning would require taking sleeping locks
in free_pgtables(). Moving the tlb flushing into free_pgtables allows
sleeping in parts of free_pgtables().

This means that we do a tlb_finish_mmu() before freeing the page tables.
Strictly speaking there may not be the need to do another tlb flush after
freeing the tables. But it's the only way to free a series of page table
pages from the tlb list. And we do not want to call into the page allocator
for performance reasons. Aim9 numbers look okay after this patch.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

diff --git a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -751,8 +751,8 @@
void *private);
 void free_pgd_range(struct mmu_gather **tlb, unsigned long addr,
unsigned long end, unsigned long floor, unsigned long ceiling);
-void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *start_vma,
-   unsigned long floor, unsigned long ceiling);
+void free_pgtables(struct vm_area_struct *start_vma, unsigned long floor,
+   unsigned long ceiling);
 int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
struct vm_area_struct *vma);
 void unmap_mapping_range(struct address_space *mapping,
diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -272,9 +272,11 @@
} while (pgd++, addr = next, addr != end);
 }
 
-void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *vma,
-   unsigned long floor, unsigned long ceiling)
+void free_pgtables(struct vm_area_struct *vma, unsigned long floor,
+   unsigned long ceiling)
 {
+   struct mmu_gather *tlb;
+
while (vma) {
struct vm_area_struct *next = vma->vm_next;
unsigned long addr = vma->vm_start;
@@ -286,8 +288,10 @@
unlink_file_vma(vma);
 
if (is_vm_hugetlb_page(vma)) {
-   hugetlb_free_pgd_range(&tlb, addr, vma->vm_end,
+   tlb = tlb_gather_mmu(vma->vm_mm, 0);
+   hugetlb_free_pgd_range(&tlb, addr, vma->vm_end,
floor, next? next->vm_start: ceiling);
+   tlb_finish_mmu(tlb, addr, vma->vm_end);
} else {
/*
 * Optimization: gather nearby vmas into one call down
@@ -299,8 +303,10 @@
anon_vma_unlink(vma);
unlink_file_vma(vma);
}
-   free_pgd_range(&tlb, addr, vma->vm_end,
+   tlb = tlb_gather_mmu(vma->vm_mm, 0);
+   free_pgd_range(&tlb, addr, vma->vm_end,
floor, next? next->vm_start: ceiling);
+   tlb_finish_mmu(tlb, addr, vma->vm_end);
}
vma = next;
}
diff --git a/mm/mmap.c b/mm/mmap.c
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1752,9 +1752,9 @@
mmu_notifier_invalidate_range_start(mm, start, end);
unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
vm_unacct_memory(nr_accounted);
-   free_pgtables(tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS,
+   tlb_finish_mmu(tlb, start, end);
+   free_pgtables(vma, prev? prev->vm_end: FIRST_USER_ADDRESS,
 next? next->vm_start: 0);
-   tlb_finish_mmu(tlb, start, end);
mmu_notifier_invalidate_range_end(mm, start, end);
 }
 
@@ -2051,8 +2051,8 @@
/* Use -1 here to ensure all VMAs in the mm are unmapped */
end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
vm_unacct_memory(nr_accounted);
-   free_pgtables(tlb, vma, FIRST_USER_ADDRESS, 0);
tlb_finish_mmu(tlb, 0, end);
+   free_pgtables(vma, FIRST_USER_ADDRESS, 0);
 
/*
 * Walk the list again, actually closing and freeing it,



[kvm-devel] [PATCH 7 of 9] Convert the anon_vma spinlock to a rw semaphore. This allows concurrent

2008-04-08 Thread Andrea Arcangeli
# HG changeset patch
# User Andrea Arcangeli [EMAIL PROTECTED]
# Date 1207666968 -7200
# Node ID a0c52e4b9b71e2627238b69c0a58905097973279
# Parent  b0cb674314534b9cc4759603f123474d38427b2d
Convert the anon_vma spinlock to a rw semaphore. This allows concurrent
traversal of reverse maps for try_to_unmap and page_mkclean. It also
allows the calling of sleeping functions from reverse map traversal.

An additional complication is that rcu is used in some context to guarantee
the presence of the anon_vma while we acquire the lock. We cannot take a
semaphore within an rcu critical section. Add a refcount to the anon_vma
structure which allows us to give an existence guarantee for the anon_vma
structure independent of the spinlock or the list contents.

The refcount can then be taken within the RCU section. If it has been
taken successfully then the refcount guarantees the existence of the
anon_vma. The refcount in anon_vma also allows us to fix a nasty
issue in page migration where we fudged by using rcu for a long code
path to guarantee the existence of the anon_vma.

The refcount in general allows a shortening of RCU critical sections since
we can do an rcu_read_unlock() after taking the refcount. This is particularly
relevant if the anon_vma chains contain hundreds of entries.

Issues:
- Atomic overhead increases in situations where a new reference
  to the anon_vma has to be established or removed. Overhead also increases
  when a speculative reference is used (try_to_unmap,
  page_mkclean, page migration). There are also more frequent processor
  changes due to up_xxx letting waiting tasks run first.
  This causes e.g. the Aim9 brk performance test to go down by 10-15%.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

diff --git a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1051,9 +1051,9 @@
 
 struct mm_lock_data {
struct rw_semaphore **i_mmap_sems;
-   spinlock_t **anon_vma_locks;
+   struct rw_semaphore **anon_vma_sems;
unsigned long nr_i_mmap_sems;
-   unsigned long nr_anon_vma_locks;
+   unsigned long nr_anon_vma_sems;
 };
 extern struct mm_lock_data *mm_lock(struct mm_struct * mm);
 extern void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data);
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -25,7 +25,8 @@
  * pointing to this anon_vma once its vma list is empty.
  */
 struct anon_vma {
-   spinlock_t lock;/* Serialize access to vma list */
+   atomic_t refcount;  /* vmas on the list */
+   struct rw_semaphore sem;/* Serialize access to vma list */
struct list_head head;  /* List of private related vmas */
 };
 
@@ -43,18 +44,31 @@
kmem_cache_free(anon_vma_cachep, anon_vma);
 }
 
+struct anon_vma *grab_anon_vma(struct page *page);
+
+static inline void get_anon_vma(struct anon_vma *anon_vma)
+{
+	atomic_inc(&anon_vma->refcount);
+}
+
+static inline void put_anon_vma(struct anon_vma *anon_vma)
+{
+	if (atomic_dec_and_test(&anon_vma->refcount))
+		anon_vma_free(anon_vma);
+}
+
 static inline void anon_vma_lock(struct vm_area_struct *vma)
 {
 	struct anon_vma *anon_vma = vma->anon_vma;
 	if (anon_vma)
-		spin_lock(&anon_vma->lock);
+		down_write(&anon_vma->sem);
 }
 
 static inline void anon_vma_unlock(struct vm_area_struct *vma)
 {
 	struct anon_vma *anon_vma = vma->anon_vma;
 	if (anon_vma)
-		spin_unlock(&anon_vma->lock);
+		up_write(&anon_vma->sem);
 }
 }
 
 /*
diff --git a/mm/migrate.c b/mm/migrate.c
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -235,15 +235,16 @@
return;
 
/*
-* We hold the mmap_sem lock. So no need to call page_lock_anon_vma.
+* We hold either the mmap_sem lock or a reference on the
+* anon_vma. So no need to call page_lock_anon_vma.
 */
 	anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON);
-	spin_lock(&anon_vma->lock);
+	down_read(&anon_vma->sem);
 
 	list_for_each_entry(vma, &anon_vma->head, anon_vma_node)
 		remove_migration_pte(vma, old, new);
 
-	spin_unlock(&anon_vma->lock);
+	up_read(&anon_vma->sem);
 }
 
 /*
@@ -623,7 +624,7 @@
int rc = 0;
int *result = NULL;
struct page *newpage = get_new_page(page, private, result);
-   int rcu_locked = 0;
+   struct anon_vma *anon_vma = NULL;
int charge = 0;
 
if (!newpage)
@@ -647,16 +648,14 @@
}
/*
 	 * By try_to_unmap(), page->mapcount goes down to 0 here. In this case,
-	 * we cannot notice that anon_vma is freed while we migrates a page.
+	 * we cannot notice that anon_vma is freed while we migrate a page.
 * This rcu_read_lock() delays freeing anon_vma pointer until the end
 * of migration. File cache pages are no problem because of page_lock()
 * 

[kvm-devel] [PATCH 9 of 9] This patch adds a lock ordering rule to avoid a potential deadlock when

2008-04-08 Thread Andrea Arcangeli
# HG changeset patch
# User Andrea Arcangeli [EMAIL PROTECTED]
# Date 1207666972 -7200
# Node ID bd55023b22769ecb14b26c2347947f7d6d63bcea
# Parent  3b14e26a4e0491f00bb989be04d8b7e0755ed2d7
This patch adds a lock ordering rule to avoid a potential deadlock when
multiple mmap_sems need to be locked.

Signed-off-by: Dean Nelson [EMAIL PROTECTED]

diff --git a/mm/filemap.c b/mm/filemap.c
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -79,6 +79,9 @@
  *
  *  -i_mutex  (generic_file_buffered_write)
  *-mmap_sem   (fault_in_pages_readable-do_page_fault)
+ *
+ *When taking multiple mmap_sems, one should lock the lowest-addressed
+ *one first proceeding on up to the highest-addressed one.
  *
  *  -i_mutex
  *-i_alloc_sem (various)



[kvm-devel] [PATCH 5 of 9] The conversion to a rwsem allows callbacks during rmap traversal

2008-04-08 Thread Andrea Arcangeli
# HG changeset patch
# User Andrea Arcangeli [EMAIL PROTECTED]
# Date 1207666463 -7200
# Node ID 20e829e35dfeceeb55a816ef495afda10cd50b98
# Parent  2c2ed514f294dbbfc66157f771bc900789ac6005
The conversion to a rwsem allows callbacks during rmap traversal
for files in a non-atomic context. A rw-style lock also allows concurrent
walking of the reverse map. This is fairly straightforward if one removes
pieces of the resched checking.

[Restarting unmapping is an issue to be discussed].

This slightly increases Aim9 performance results on an 8p.

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]
Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -69,7 +69,7 @@
if (!vma_shareable(vma, addr))
return;
 
-	spin_lock(&mapping->i_mmap_lock);
+	down_read(&mapping->i_mmap_sem);
 	vma_prio_tree_foreach(svma, iter, &mapping->i_mmap, idx, idx) {
 		if (svma == vma)
 			continue;
@@ -94,7 +94,7 @@
 	put_page(virt_to_page(spte));
 	spin_unlock(&mm->page_table_lock);
 out:
-	spin_unlock(&mapping->i_mmap_lock);
+	up_read(&mapping->i_mmap_sem);
 }
 
 /*
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -454,10 +454,10 @@
 	pgoff = offset >> PAGE_SHIFT;
 
 	i_size_write(inode, offset);
-	spin_lock(&mapping->i_mmap_lock);
+	down_read(&mapping->i_mmap_sem);
 	if (!prio_tree_empty(&mapping->i_mmap))
 		hugetlb_vmtruncate_list(&mapping->i_mmap, pgoff);
-	spin_unlock(&mapping->i_mmap_lock);
+	up_read(&mapping->i_mmap_sem);
truncate_hugepages(inode, offset);
return 0;
 }
diff --git a/fs/inode.c b/fs/inode.c
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -210,7 +210,7 @@
 	INIT_LIST_HEAD(&inode->i_devices);
 	INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC);
 	rwlock_init(&inode->i_data.tree_lock);
-	spin_lock_init(&inode->i_data.i_mmap_lock);
+	init_rwsem(&inode->i_data.i_mmap_sem);
 	INIT_LIST_HEAD(&inode->i_data.private_list);
 	spin_lock_init(&inode->i_data.private_lock);
 	INIT_RAW_PRIO_TREE_ROOT(&inode->i_data.i_mmap);
diff --git a/include/linux/fs.h b/include/linux/fs.h
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -503,7 +503,7 @@
 	unsigned int		i_mmap_writable;/* count VM_SHARED mappings */
 	struct prio_tree_root	i_mmap;		/* tree of private and shared mappings */
 	struct list_head	i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
-	spinlock_t		i_mmap_lock;	/* protect tree, count, list */
+	struct rw_semaphore	i_mmap_sem;	/* protect tree, count, list */
 	unsigned int		truncate_count;	/* Cover race condition with truncate */
 	unsigned long		nrpages;	/* number of total pages */
 	pgoff_t			writeback_index;/* writeback starts here */
diff --git a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -716,7 +716,7 @@
 	struct address_space *check_mapping;	/* Check page->mapping if set */
 	pgoff_t	first_index;			/* Lowest page->index to unmap */
 	pgoff_t	last_index;			/* Highest page->index to unmap */
-	spinlock_t *i_mmap_lock;		/* For unmap_mapping_range: */
+	struct rw_semaphore *i_mmap_sem;	/* For unmap_mapping_range: */
 	unsigned long truncate_count;		/* Compare vm_truncate_count */
 };
 
@@ -1051,9 +1051,9 @@
   unsigned long flags, struct page **pages);
 
 struct mm_lock_data {
-   spinlock_t **i_mmap_locks;
+   struct rw_semaphore **i_mmap_sems;
spinlock_t **anon_vma_locks;
-   unsigned long nr_i_mmap_locks;
+   unsigned long nr_i_mmap_sems;
unsigned long nr_anon_vma_locks;
 };
 extern struct mm_lock_data *mm_lock(struct mm_struct * mm);
diff --git a/kernel/fork.c b/kernel/fork.c
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -274,12 +274,12 @@
 		atomic_dec(&inode->i_writecount);
 
 	/* insert tmp into the share list, just after mpnt */
-	spin_lock(&file->f_mapping->i_mmap_lock);
+	down_write(&file->f_mapping->i_mmap_sem);
 	tmp->vm_truncate_count = mpnt->vm_truncate_count;
 	flush_dcache_mmap_lock(file->f_mapping);
 	vma_prio_tree_add(tmp, mpnt);
 	flush_dcache_mmap_unlock(file->f_mapping);
-	spin_unlock(&file->f_mapping->i_mmap_lock);
+	up_write(&file->f_mapping->i_mmap_sem);
}
 
/*
diff --git a/mm/filemap.c b/mm/filemap.c
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ 

[kvm-devel] [PATCH 0 of 9] mmu notifier #v12

2008-04-08 Thread Andrea Arcangeli
The difference from #v11 is a different implementation of mm_lock that
guarantees handling signals in O(N). It's also more low-latency friendly.

Note that mmu_notifier_unregister may also fail with -EINTR if a signal is
pending or the system runs out of vmalloc space or physical memory;
only exit_mmap guarantees that any kernel module can be unloaded in the
presence of an oom condition.

Either #v11 or the first three #v12 patches (1,2,3) are suitable for inclusion
in -mm; pick what you prefer looking at the mmu_notifier_register retval and
mm_lock retval difference, I implemented and slightly tested both. GRU and KVM
only need 1,2,3; XPMEM needs the rest of the patchset too (4, ...) but all
patches from 4 to the end can be deferred to a second merge window.



[kvm-devel] [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen

2008-04-08 Thread Andrea Arcangeli
# HG changeset patch
# User Andrea Arcangeli [EMAIL PROTECTED]
# Date 1207666462 -7200
# Node ID ec6d8f91b299cf26cce5c3d49bb25d35ee33c137
# Parent  d4c25404de6376297ed34fada14cd6b894410eb0
Lock the entire mm to prevent any mmu related operation from happening.

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]

diff --git a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1050,6 +1050,15 @@
   unsigned long addr, unsigned long len,
   unsigned long flags, struct page **pages);
 
+struct mm_lock_data {
+   spinlock_t **i_mmap_locks;
+   spinlock_t **anon_vma_locks;
+   unsigned long nr_i_mmap_locks;
+   unsigned long nr_anon_vma_locks;
+};
+extern struct mm_lock_data *mm_lock(struct mm_struct * mm);
+extern void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data);
+
 extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
 
 extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
diff --git a/mm/mmap.c b/mm/mmap.c
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -26,6 +26,7 @@
 #include <linux/mount.h>
 #include <linux/mempolicy.h>
 #include <linux/rmap.h>
+#include <linux/vmalloc.h>
 
 #include <asm/uaccess.h>
 #include <asm/cacheflush.h>
@@ -2242,3 +2243,140 @@
 
return 0;
 }
+
+/*
+ * This operation locks against the VM for all pte/vma/mm related
+ * operations that could ever happen on a certain mm. This includes
+ * vmtruncate, try_to_unmap, and all page faults. The holder
+ * must not hold any mm related lock. A single task can't take more
+ * than one mm lock in a row or it would deadlock.
+ */
+struct mm_lock_data *mm_lock(struct mm_struct * mm)
+{
+   struct vm_area_struct *vma;
+   spinlock_t *i_mmap_lock_last, *anon_vma_lock_last;
+   unsigned long nr_i_mmap_locks, nr_anon_vma_locks, i;
+   struct mm_lock_data *data;
+   int err;
+
+	down_write(&mm->mmap_sem);
+
+	err = -EINTR;
+	nr_i_mmap_locks = nr_anon_vma_locks = 0;
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		cond_resched();
+		if (unlikely(signal_pending(current)))
+			goto out;
+
+		if (vma->vm_file && vma->vm_file->f_mapping)
+			nr_i_mmap_locks++;
+		if (vma->anon_vma)
+			nr_anon_vma_locks++;
+	}
+
+	err = -ENOMEM;
+	data = kmalloc(sizeof(struct mm_lock_data), GFP_KERNEL);
+	if (!data)
+		goto out;
+
+	if (nr_i_mmap_locks) {
+		data->i_mmap_locks = vmalloc(nr_i_mmap_locks *
+					     sizeof(spinlock_t));
+		if (!data->i_mmap_locks)
+			goto out_kfree;
+	} else
+		data->i_mmap_locks = NULL;
+
+	if (nr_anon_vma_locks) {
+		data->anon_vma_locks = vmalloc(nr_anon_vma_locks *
+					       sizeof(spinlock_t));
+		if (!data->anon_vma_locks)
+			goto out_vfree;
+	} else
+		data->anon_vma_locks = NULL;
+
+	err = -EINTR;
+	i_mmap_lock_last = NULL;
+	nr_i_mmap_locks = 0;
+	for (;;) {
+		spinlock_t *i_mmap_lock = (spinlock_t *) -1UL;
+		for (vma = mm->mmap; vma; vma = vma->vm_next) {
+			cond_resched();
+			if (unlikely(signal_pending(current)))
+				goto out_vfree_both;
+
+			if (!vma->vm_file || !vma->vm_file->f_mapping)
+				continue;
+			if ((unsigned long) i_mmap_lock >
+			    (unsigned long)
+			    &vma->vm_file->f_mapping->i_mmap_lock &&
+			    (unsigned long)
+			    &vma->vm_file->f_mapping->i_mmap_lock >
+			    (unsigned long) i_mmap_lock_last)
+				i_mmap_lock =
+					&vma->vm_file->f_mapping->i_mmap_lock;
+		}
+		if (i_mmap_lock == (spinlock_t *) -1UL)
+			break;
+		i_mmap_lock_last = i_mmap_lock;
+		data->i_mmap_locks[nr_i_mmap_locks++] = i_mmap_lock;
+	}
+	data->nr_i_mmap_locks = nr_i_mmap_locks;
+
+	anon_vma_lock_last = NULL;
+	nr_anon_vma_locks = 0;
+	for (;;) {
+		spinlock_t *anon_vma_lock = (spinlock_t *) -1UL;
+		for (vma = mm->mmap; vma; vma = vma->vm_next) {
+			cond_resched();
+			if (unlikely(signal_pending(current)))
+				goto out_vfree_both;
+
+			if (!vma->anon_vma)
+				continue;
+			if ((unsigned long) anon_vma_lock >
+			    (unsigned long)

[kvm-devel] [PATCH 0/28] integrate dma_ops

2008-04-08 Thread Glauber Costa
Hi,

This is the final integration of dma_ops between x86_64 and i386.
The final code is closer to x86_64 than to i386, which is obviously expected.

At the end, pci-dma_{32,64}.c are gone, pci-nommu_64.c is gone, and the
temporary pci-base_32.c is gone too.

This patchset received the same level of scrutiny as the others from my side:
compile-tested in at least 6 different random configs, boot-tested on my
hardware.

The final diffstat says:

 Documentation/feature-removal-schedule.txt |7 
 arch/x86/kernel/Makefile   |9 
 arch/x86/kernel/pci-base_32.c  |   72 ---
 arch/x86/kernel/pci-dma.c  |  524 +
 arch/x86/kernel/pci-dma_32.c   |  503 +++
 arch/x86/kernel/pci-dma_64.c   |  443 +---
 arch/x86/kernel/pci-nommu.c|  100 +
 arch/x86/kernel/pci-nommu_64.c |  140 ---
 arch/x86/mm/init_64.c  |4 
 include/asm-x86/dma-mapping.h  |   14 
 include/asm-x86/scatterlist.h  |3 
 11 files changed, 832 insertions(+), 987 deletions(-)





Re: [kvm-devel] [PATCH 2 of 9] Core of mmu notifiers

2008-04-08 Thread Robin Holt
This one does not build on ia64.  I get the following:

[EMAIL PROTECTED] mmu_v12_xpmem_v003_v1]$ make compressed
  CHK include/linux/version.h
  CHK include/linux/utsrelease.h
  CALLscripts/checksyscalls.sh
  CHK include/linux/compile.h
  CC  mm/mmu_notifier.o
In file included from include/linux/mmu_notifier.h:6,
 from mm/mmu_notifier.c:12:
include/linux/mm_types.h:200: error: expected specifier-qualifier-list before 
‘cpumask_t’
In file included from mm/mmu_notifier.c:12:
include/linux/mmu_notifier.h: In function ‘mm_has_notifiers’:
include/linux/mmu_notifier.h:62: error: ‘struct mm_struct’ has no member named 
‘mmu_notifier_list’
include/linux/mmu_notifier.h: In function ‘mmu_notifier_mm_init’:
include/linux/mmu_notifier.h:117: error: ‘struct mm_struct’ has no member named 
‘mmu_notifier_list’
In file included from include/asm/pgtable.h:155,
 from include/linux/mm.h:39,
 from mm/mmu_notifier.c:14:
include/asm/mmu_context.h: In function ‘get_mmu_context’:
include/asm/mmu_context.h:81: error: ‘struct mm_struct’ has no member named 
‘context’
include/asm/mmu_context.h:88: error: ‘struct mm_struct’ has no member named 
‘context’
include/asm/mmu_context.h:90: error: ‘struct mm_struct’ has no member named 
‘cpu_vm_mask’
include/asm/mmu_context.h:99: error: ‘struct mm_struct’ has no member named 
‘context’
include/asm/mmu_context.h: In function ‘init_new_context’:
include/asm/mmu_context.h:120: error: ‘struct mm_struct’ has no member named 
‘context’
include/asm/mmu_context.h: In function ‘activate_context’:
include/asm/mmu_context.h:173: error: ‘struct mm_struct’ has no member named 
‘cpu_vm_mask’
include/asm/mmu_context.h:174: error: ‘struct mm_struct’ has no member named 
‘cpu_vm_mask’
include/asm/mmu_context.h:180: error: ‘struct mm_struct’ has no member named 
‘context’
mm/mmu_notifier.c: In function ‘__mmu_notifier_release’:
mm/mmu_notifier.c:25: error: ‘struct mm_struct’ has no member named 
‘mmu_notifier_list’
mm/mmu_notifier.c:26: error: ‘struct mm_struct’ has no member named 
‘mmu_notifier_list’
mm/mmu_notifier.c: In function ‘__mmu_notifier_clear_flush_young’:
mm/mmu_notifier.c:47: error: ‘struct mm_struct’ has no member named 
‘mmu_notifier_list’
mm/mmu_notifier.c: In function ‘__mmu_notifier_invalidate_page’:
mm/mmu_notifier.c:61: error: ‘struct mm_struct’ has no member named 
‘mmu_notifier_list’
mm/mmu_notifier.c: In function ‘__mmu_notifier_invalidate_range_start’:
mm/mmu_notifier.c:73: error: ‘struct mm_struct’ has no member named 
‘mmu_notifier_list’
mm/mmu_notifier.c: In function ‘__mmu_notifier_invalidate_range_end’:
mm/mmu_notifier.c:85: error: ‘struct mm_struct’ has no member named 
‘mmu_notifier_list’
mm/mmu_notifier.c: In function ‘mmu_notifier_register’:
mm/mmu_notifier.c:102: error: ‘struct mm_struct’ has no member named 
‘mmu_notifier_list’
make[1]: *** [mm/mmu_notifier.o] Error 1
make: *** [mm] Error 2




[kvm-devel] [PATCH 02/28] x86: delete empty functions from pci-nommu_64.c

2008-04-08 Thread Glauber Costa
These functions are now called conditionally on their
presence in the struct. So just delete them, instead
of keeping an empty implementation.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-nommu_64.c |   15 ---
 1 files changed, 0 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/pci-nommu_64.c b/arch/x86/kernel/pci-nommu_64.c
index 6e33076..90a7c40 100644
--- a/arch/x86/kernel/pci-nommu_64.c
+++ b/arch/x86/kernel/pci-nommu_64.c
@@ -35,10 +35,6 @@ nommu_map_single(struct device *hwdev, phys_addr_t paddr, 
size_t size,
return bus;
 }
 
-static void nommu_unmap_single(struct device *dev, dma_addr_t addr,size_t size,
-   int direction)
-{
-}
 
 /* Map a set of buffers described by scatterlist in streaming
  * mode for DMA.  This is the scatter-gather version of the
@@ -71,20 +67,9 @@ static int nommu_map_sg(struct device *hwdev, struct 
scatterlist *sg,
return nents;
 }
 
-/* Unmap a set of streaming mode DMA translations.
- * Again, cpu read rules concerning calls here are the same as for
- * pci_unmap_single() above.
- */
-static void nommu_unmap_sg(struct device *dev, struct scatterlist *sg,
- int nents, int dir)
-{
-}
-
 const struct dma_mapping_ops nommu_dma_ops = {
.map_single = nommu_map_single,
-   .unmap_single = nommu_unmap_single,
.map_sg = nommu_map_sg,
-   .unmap_sg = nommu_unmap_sg,
.is_phys = 1,
 };
 
-- 
1.5.0.6




[kvm-devel] [PATCH 01/28] x86: introduce pci-dma.c

2008-04-08 Thread Glauber Costa
This patch introduces pci-dma.c, a common file for pci dma
between i386 and x86_64. As a start, dma_set_mask() is the same
between architectures, and is placed there.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/Makefile |2 +-
 arch/x86/kernel/pci-dma.c|   14 ++
 arch/x86/kernel/pci-dma_32.c |   12 
 arch/x86/kernel/pci-dma_64.c |9 -
 4 files changed, 15 insertions(+), 22 deletions(-)
 create mode 100644 arch/x86/kernel/pci-dma.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 53c8fa4..befe901 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -24,7 +24,7 @@ obj-$(CONFIG_X86_32)  += sys_i386_32.o i386_ksyms_32.o
 obj-$(CONFIG_X86_64)   += sys_x86_64.o x8664_ksyms_64.o
 obj-$(CONFIG_X86_64)   += syscall_64.o vsyscall_64.o setup64.o
 obj-y  += pci-dma_$(BITS).o  bootflag.o e820_$(BITS).o
-obj-y  += quirks.o i8237.o topology.o kdebugfs.o
+obj-y  += pci-dma.o quirks.o i8237.o topology.o kdebugfs.o
 obj-y  += alternative.o i8253.o
 obj-$(CONFIG_X86_64)   += pci-nommu_64.o bugs_64.o
 obj-$(CONFIG_X86_32)   += pci-base_32.o
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
new file mode 100644
index 000..f1c24d8
--- /dev/null
+++ b/arch/x86/kernel/pci-dma.c
@@ -0,0 +1,14 @@
+#include <linux/dma-mapping.h>
+
+int dma_set_mask(struct device *dev, u64 mask)
+{
+	if (!dev->dma_mask || !dma_supported(dev, mask))
+		return -EIO;
+
+	*dev->dma_mask = mask;
+
+	return 0;
+}
+EXPORT_SYMBOL(dma_set_mask);
+
+
diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index be6b1f6..9e82976 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -182,18 +182,6 @@ dma_supported(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_supported);
 
-int
-dma_set_mask(struct device *dev, u64 mask)
-{
-	if (!dev->dma_mask || !dma_supported(dev, mask))
-		return -EIO;
-
-	*dev->dma_mask = mask;
-
-	return 0;
-}
-EXPORT_SYMBOL(dma_set_mask);
-
 
 static __devinit void via_no_dac(struct pci_dev *dev)
 {
diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index f97a08d..e697b86 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -213,15 +213,6 @@ int dma_supported(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_supported);
 
-int dma_set_mask(struct device *dev, u64 mask)
-{
-	if (!dev->dma_mask || !dma_supported(dev, mask))
-		return -EIO;
-	*dev->dma_mask = mask;
-	return 0;
-}
-EXPORT_SYMBOL(dma_set_mask);
-
 /*
  * See Documentation/x86_64/boot-options.txt for the iommu kernel parameter
  * documentation.
-- 
1.5.0.6




[kvm-devel] [PATCH 03/28] x86: implement mapping_error in pci-nommu_64.c

2008-04-08 Thread Glauber Costa
This patch implements mapping_error for pci-nommu_64.c.
It takes care to keep the same behaviour it already
had. Although this file is not (yet) used for i386, we introduce
the i386 version here. Again, care is taken, even at the expense of
an ifdef, to keep the same behaviour unconditionally.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-nommu_64.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pci-nommu_64.c b/arch/x86/kernel/pci-nommu_64.c
index 90a7c40..a4e8ccf 100644
--- a/arch/x86/kernel/pci-nommu_64.c
+++ b/arch/x86/kernel/pci-nommu_64.c
@@ -67,9 +67,21 @@ static int nommu_map_sg(struct device *hwdev, struct 
scatterlist *sg,
return nents;
 }
 
+/* Make sure we keep the same behaviour */
+static int nommu_mapping_error(dma_addr_t dma_addr)
+{
+#ifdef CONFIG_X86_32
+   return 0;
+#else
+   return (dma_addr == bad_dma_address);
+#endif
+}
+
+
 const struct dma_mapping_ops nommu_dma_ops = {
.map_single = nommu_map_single,
.map_sg = nommu_map_sg,
+   .mapping_error = nommu_mapping_error,
.is_phys = 1,
 };
 
-- 
1.5.0.6




[kvm-devel] [PATCH 05/28] x86: use sg_phys in x86_64

2008-04-08 Thread Glauber Costa
To make the code usable on i386, where we have high memory mappings,
we drop the virt_to_bus(sg_virt()) construction in favour of sg_phys.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-nommu_64.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/pci-nommu_64.c b/arch/x86/kernel/pci-nommu_64.c
index 1da9cf9..c6901e7 100644
--- a/arch/x86/kernel/pci-nommu_64.c
+++ b/arch/x86/kernel/pci-nommu_64.c
@@ -60,7 +60,7 @@ static int nommu_map_sg(struct device *hwdev, struct 
scatterlist *sg,
 
 	for_each_sg(sg, s, nents, i) {
 		BUG_ON(!sg_page(s));
-		s->dma_address = virt_to_bus(sg_virt(s));
+		s->dma_address = sg_phys(s);
 		if (!check_addr("map_sg", hwdev, s->dma_address, s->length))
 			return 0;
 		s->dma_length = s->length;
-- 
1.5.0.6




[kvm-devel] [PATCH 06/28] x86: use dma_length in i386

2008-04-08 Thread Glauber Costa
This is done to get the code closer to x86_64.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-base_32.c |1 +
 include/asm-x86/scatterlist.h |2 --
 2 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/pci-base_32.c b/arch/x86/kernel/pci-base_32.c
index 7caf5c2..837bbe9 100644
--- a/arch/x86/kernel/pci-base_32.c
+++ b/arch/x86/kernel/pci-base_32.c
@@ -24,6 +24,7 @@ static int pci32_dma_map_sg(struct device *dev, struct 
scatterlist *sglist,
 		BUG_ON(!sg_page(sg));
 
 		sg->dma_address = sg_phys(sg);
+		sg->dma_length = sg->length;
}
 
flush_write_buffers();
diff --git a/include/asm-x86/scatterlist.h b/include/asm-x86/scatterlist.h
index d13c197..c043206 100644
--- a/include/asm-x86/scatterlist.h
+++ b/include/asm-x86/scatterlist.h
@@ -11,9 +11,7 @@ struct scatterlist {
unsigned intoffset;
unsigned intlength;
dma_addr_t  dma_address;
-#ifdef CONFIG_X86_64
unsigned intdma_length;
-#endif
 };
 
 #define ARCH_HAS_SG_CHAIN
-- 
1.5.0.6




[kvm-devel] [PATCH 14/28] x86: merge iommu initialization parameters

2008-04-08 Thread Glauber Costa
We merge the iommu initialization parameters into pci-dma.c.
Conveniently, both architectures already recognize the same
parameters.

The i386-only "usedac" parameter is marked for deprecation.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 Documentation/feature-removal-schedule.txt |7 +++
 arch/x86/kernel/pci-dma.c  |   81 
 arch/x86/kernel/pci-dma_32.c   |   12 
 arch/x86/kernel/pci-dma_64.c   |   79 ---
 include/asm-x86/dma-mapping.h  |1 +
 5 files changed, 89 insertions(+), 91 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt 
b/Documentation/feature-removal-schedule.txt
index 1092b2e..537c88b 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -306,3 +306,10 @@ Why:   Not used in-tree. The current out-of-tree users 
used it to
code / infrastructure should be in the kernel and not in some
out-of-tree driver.
 Who:   Thomas Gleixner [EMAIL PROTECTED]
+
+
+
+What:  usedac i386 kernel parameter
+When:  2.6.27
+Why:   replaced by allowdac and no dac combination
+Who:   Glauber Costa [EMAIL PROTECTED]
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 4289a9b..e04f42c 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -24,6 +24,18 @@ int panic_on_overflow __read_mostly = 0;
 int force_iommu __read_mostly = 0;
 #endif
 
+int iommu_merge __read_mostly = 0;
+
+int no_iommu __read_mostly;
+/* Set this to 1 if there is a HW IOMMU in the system */
+int iommu_detected __read_mostly = 0;
+
+/* This tells the BIO block layer to assume merging. Default to off
+   because we cannot guarantee merging later. */
+int iommu_bio_merge __read_mostly = 0;
+EXPORT_SYMBOL(iommu_bio_merge);
+
+
 int dma_set_mask(struct device *dev, u64 mask)
 {
if (!dev-dma_mask || !dma_supported(dev, mask))
@@ -183,3 +195,72 @@ void __init pci_iommu_alloc(void)
 #endif
 }
 #endif
+
+/*
+ * See Documentation/x86_64/boot-options.txt for the iommu kernel parameter
+ * documentation.
+ */
+static __init int iommu_setup(char *p)
+{
+	iommu_merge = 1;
+
+	if (!p)
+		return -EINVAL;
+
+	while (*p) {
+		if (!strncmp(p, "off", 3))
+			no_iommu = 1;
+		/* gart_parse_options has more force support */
+		if (!strncmp(p, "force", 5))
+			force_iommu = 1;
+		if (!strncmp(p, "noforce", 7)) {
+			iommu_merge = 0;
+			force_iommu = 0;
+		}
+
+		if (!strncmp(p, "biomerge", 8)) {
+			iommu_bio_merge = 4096;
+			iommu_merge = 1;
+			force_iommu = 1;
+		}
+		if (!strncmp(p, "panic", 5))
+			panic_on_overflow = 1;
+		if (!strncmp(p, "nopanic", 7))
+			panic_on_overflow = 0;
+		if (!strncmp(p, "merge", 5)) {
+			iommu_merge = 1;
+			force_iommu = 1;
+		}
+		if (!strncmp(p, "nomerge", 7))
+			iommu_merge = 0;
+		if (!strncmp(p, "forcesac", 8))
+			iommu_sac_force = 1;
+		if (!strncmp(p, "allowdac", 8))
+			forbid_dac = 0;
+		if (!strncmp(p, "nodac", 5))
+			forbid_dac = -1;
+		if (!strncmp(p, "usedac", 6)) {
+			forbid_dac = -1;
+			return 1;
+		}
+#ifdef CONFIG_SWIOTLB
+		if (!strncmp(p, "soft", 4))
+			swiotlb = 1;
+#endif
+
+#ifdef CONFIG_GART_IOMMU
+		gart_parse_options(p);
+#endif
+
+#ifdef CONFIG_CALGARY_IOMMU
+		if (!strncmp(p, "calgary", 7))
+			use_calgary = 1;
+#endif /* CONFIG_CALGARY_IOMMU */
+
+		p += strcspn(p, ",");
+		if (*p == ',')
+			++p;
+	}
+	return 0;
+}
+early_param("iommu", iommu_setup);
diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 1d4091a..eea52df 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -153,15 +153,3 @@ void *dma_mark_declared_memory_occupied(struct device *dev,
return mem-virt_base + (pos  PAGE_SHIFT);
 }
 EXPORT_SYMBOL(dma_mark_declared_memory_occupied);
-
-#ifdef CONFIG_PCI
-static int check_iommu(char *s)
-{
-   if (!strcmp(s, "usedac")) {
-   forbid_dac = -1;
-   return 1;
-   }
-   return 0;
-}
-__setup("iommu=", check_iommu);
-#endif
diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index c80da76..e7d45cf 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -14,22 +14,9 @@
 #include <asm/gart.h>
 #include 

[kvm-devel] [PATCH 12/28] x86: move x86_64-specific to common code.

2008-04-08 Thread Glauber Costa
This patch moves the bootmem functions, which are largely
x86_64-specific, into pci-dma.c. The code goes inside an ifdef.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma.c|   73 ++
 arch/x86/kernel/pci-dma_64.c |   68 ---
 2 files changed, 73 insertions(+), 68 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index e81e16f..f6d6a92 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -1,7 +1,10 @@
 #include <linux/dma-mapping.h>
 #include <linux/dmar.h>
 #include <linux/pci.h>
+#include <linux/bootmem.h>
 
+#include <asm/proto.h>
+#include <asm/dma.h>
 #include <asm/gart.h>
 #include <asm/calgary.h>
 
@@ -66,3 +69,73 @@ static __devinit void via_no_dac(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID, via_no_dac);
 #endif
+
+#ifdef CONFIG_X86_64
+static __initdata void *dma32_bootmem_ptr;
+static unsigned long dma32_bootmem_size __initdata = (128ULL<<20);
+
+static int __init parse_dma32_size_opt(char *p)
+{
+   if (!p)
+   return -EINVAL;
+   dma32_bootmem_size = memparse(p, &p);
+   return 0;
+}
+early_param("dma32_size", parse_dma32_size_opt);
+
+void __init dma32_reserve_bootmem(void)
+{
+   unsigned long size, align;
+   if (end_pfn <= MAX_DMA32_PFN)
+   return;
+
+   align = 64ULL<<20;
+   size = round_up(dma32_bootmem_size, align);
+   dma32_bootmem_ptr = __alloc_bootmem_nopanic(size, align,
+__pa(MAX_DMA_ADDRESS));
+   if (dma32_bootmem_ptr)
+   dma32_bootmem_size = size;
+   else
+   dma32_bootmem_size = 0;
+}
+static void __init dma32_free_bootmem(void)
+{
+   int node;
+
+   if (end_pfn <= MAX_DMA32_PFN)
+   return;
+
+   if (!dma32_bootmem_ptr)
+   return;
+
+   for_each_online_node(node)
+   free_bootmem_node(NODE_DATA(node), __pa(dma32_bootmem_ptr),
+ dma32_bootmem_size);
+
+   dma32_bootmem_ptr = NULL;
+   dma32_bootmem_size = 0;
+}
+
+void __init pci_iommu_alloc(void)
+{
+   /* free the range so iommu could get some range less than 4G */
+   dma32_free_bootmem();
+   /*
+* The order of these functions is important for
+* fall-back/fail-over reasons
+*/
+#ifdef CONFIG_GART_IOMMU
+   gart_iommu_hole_init();
+#endif
+
+#ifdef CONFIG_CALGARY_IOMMU
+   detect_calgary();
+#endif
+
+   detect_intel_iommu();
+
+#ifdef CONFIG_SWIOTLB
+   pci_swiotlb_init();
+#endif
+}
+#endif
diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index e194460..7820675 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -268,71 +268,3 @@ static __init int iommu_setup(char *p)
return 0;
 }
 early_param(iommu, iommu_setup);
-
-static __initdata void *dma32_bootmem_ptr;
-static unsigned long dma32_bootmem_size __initdata = (128ULL<<20);
-
-static int __init parse_dma32_size_opt(char *p)
-{
-   if (!p)
-   return -EINVAL;
-   dma32_bootmem_size = memparse(p, &p);
-   return 0;
-}
-early_param("dma32_size", parse_dma32_size_opt);
-
-void __init dma32_reserve_bootmem(void)
-{
-   unsigned long size, align;
-   if (end_pfn <= MAX_DMA32_PFN)
-   return;
-
-   align = 64ULL<<20;
-   size = round_up(dma32_bootmem_size, align);
-   dma32_bootmem_ptr = __alloc_bootmem_nopanic(size, align,
-__pa(MAX_DMA_ADDRESS));
-   if (dma32_bootmem_ptr)
-   dma32_bootmem_size = size;
-   else
-   dma32_bootmem_size = 0;
-}
-static void __init dma32_free_bootmem(void)
-{
-   int node;
-
-   if (end_pfn <= MAX_DMA32_PFN)
-   return;
-
-   if (!dma32_bootmem_ptr)
-   return;
-
-   for_each_online_node(node)
-   free_bootmem_node(NODE_DATA(node), __pa(dma32_bootmem_ptr),
- dma32_bootmem_size);
-
-   dma32_bootmem_ptr = NULL;
-   dma32_bootmem_size = 0;
-}
-
-void __init pci_iommu_alloc(void)
-{
-   /* free the range so iommu could get some range less than 4G */
-   dma32_free_bootmem();
-   /*
-* The order of these functions is important for
-* fall-back/fail-over reasons
-*/
-#ifdef CONFIG_GART_IOMMU
-   gart_iommu_hole_init();
-#endif
-
-#ifdef CONFIG_CALGARY_IOMMU
-   detect_calgary();
-#endif
-
-   detect_intel_iommu();
-
-#ifdef CONFIG_SWIOTLB
-   pci_swiotlb_init();
-#endif
-}
-- 
1.5.0.6



[kvm-devel] [PATCH 15/28] x86: move dma_coherent functions to pci-dma.c

2008-04-08 Thread Glauber Costa
They are placed inside an ifdef, since they are i386-specific.
The structure definition goes to dma-mapping.h.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma.c |   81 +++
 arch/x86/kernel/pci-dma_32.c  |   85 -
 include/asm-x86/dma-mapping.h |8 
 3 files changed, 89 insertions(+), 85 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index e04f42c..d06d8df 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -47,6 +47,87 @@ int dma_set_mask(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_set_mask);
 
+#ifdef CONFIG_X86_32
+int dma_declare_coherent_memory(struct device *dev, dma_addr_t bus_addr,
+   dma_addr_t device_addr, size_t size, int flags)
+{
+   void __iomem *mem_base = NULL;
+   int pages = size >> PAGE_SHIFT;
+   int bitmap_size = BITS_TO_LONGS(pages) * sizeof(long);
+
+   if ((flags & (DMA_MEMORY_MAP | DMA_MEMORY_IO)) == 0)
+   goto out;
+   if (!size)
+   goto out;
+   if (dev->dma_mem)
+   goto out;
+
+   /* FIXME: this routine just ignores DMA_MEMORY_INCLUDES_CHILDREN */
+
+   mem_base = ioremap(bus_addr, size);
+   if (!mem_base)
+   goto out;
+
+   dev->dma_mem = kzalloc(sizeof(struct dma_coherent_mem), GFP_KERNEL);
+   if (!dev->dma_mem)
+   goto out;
+   dev->dma_mem->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+   if (!dev->dma_mem->bitmap)
+   goto free1_out;
+
+   dev->dma_mem->virt_base = mem_base;
+   dev->dma_mem->device_base = device_addr;
+   dev->dma_mem->size = pages;
+   dev->dma_mem->flags = flags;
+
+   if (flags & DMA_MEMORY_MAP)
+   return DMA_MEMORY_MAP;
+
+   return DMA_MEMORY_IO;
+
+ free1_out:
+   kfree(dev->dma_mem);
+ out:
+   if (mem_base)
+   iounmap(mem_base);
+   return 0;
+}
+EXPORT_SYMBOL(dma_declare_coherent_memory);
+
+void dma_release_declared_memory(struct device *dev)
+{
+   struct dma_coherent_mem *mem = dev->dma_mem;
+
+   if (!mem)
+   return;
+   dev->dma_mem = NULL;
+   iounmap(mem->virt_base);
+   kfree(mem->bitmap);
+   kfree(mem);
+}
+EXPORT_SYMBOL(dma_release_declared_memory);
+
+void *dma_mark_declared_memory_occupied(struct device *dev,
+   dma_addr_t device_addr, size_t size)
+{
+   struct dma_coherent_mem *mem = dev->dma_mem;
+   int pos, err;
+   int pages = (size + (device_addr & ~PAGE_MASK) + PAGE_SIZE - 1);
+
+   pages >>= PAGE_SHIFT;
+
+   if (!mem)
+   return ERR_PTR(-EINVAL);
+
+   pos = (device_addr - mem->device_base) >> PAGE_SHIFT;
+   err = bitmap_allocate_region(mem->bitmap, pos, get_order(pages));
+   if (err != 0)
+   return ERR_PTR(err);
+   return mem->virt_base + (pos << PAGE_SHIFT);
+}
+EXPORT_SYMBOL(dma_mark_declared_memory_occupied);
+#endif /* CONFIG_X86_32 */
+
 int dma_supported(struct device *dev, u64 mask)
 {
 #ifdef CONFIG_PCI
diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index eea52df..818d95e 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -18,14 +18,6 @@
 dma_addr_t bad_dma_address __read_mostly = 0x0;
 EXPORT_SYMBOL(bad_dma_address);
 
-struct dma_coherent_mem {
-   void*virt_base;
-   u32 device_base;
-   int size;
-   int flags;
-   unsigned long   *bitmap;
-};
-
 void *dma_alloc_coherent(struct device *dev, size_t size,
   dma_addr_t *dma_handle, gfp_t gfp)
 {
@@ -76,80 +68,3 @@ void dma_free_coherent(struct device *dev, size_t size,
free_pages((unsigned long)vaddr, order);
 }
 EXPORT_SYMBOL(dma_free_coherent);
-
-int dma_declare_coherent_memory(struct device *dev, dma_addr_t bus_addr,
-   dma_addr_t device_addr, size_t size, int flags)
-{
-   void __iomem *mem_base = NULL;
-   int pages = size >> PAGE_SHIFT;
-   int bitmap_size = BITS_TO_LONGS(pages) * sizeof(long);
-
-   if ((flags & (DMA_MEMORY_MAP | DMA_MEMORY_IO)) == 0)
-   goto out;
-   if (!size)
-   goto out;
-   if (dev->dma_mem)
-   goto out;
-
-   /* FIXME: this routine just ignores DMA_MEMORY_INCLUDES_CHILDREN */
-
-   mem_base = ioremap(bus_addr, size);
-   if (!mem_base)
-   goto out;
-
-   dev->dma_mem = kzalloc(sizeof(struct dma_coherent_mem), GFP_KERNEL);
-   if (!dev->dma_mem)
-   goto out;
-   dev->dma_mem->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
-   if (!dev->dma_mem->bitmap)
-   goto free1_out;
-
-   dev->dma_mem->virt_base = mem_base;
-   dev->dma_mem->device_base = device_addr;
-   dev->dma_mem->size = pages;
-   dev->dma_mem->flags = flags;
-
-   if 

[kvm-devel] [PATCH 16/28] x86: isolate coherent mapping functions

2008-04-08 Thread Glauber Costa
i386 implements the declare-coherent-memory API, and x86_64 does not;
this is reflected in pieces of dma_alloc_coherent and dma_free_coherent.
Those pieces are isolated into separate functions, which are declared
as empty macros on x86_64. This way we can make the code the same.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma_32.c |   51 -
 arch/x86/kernel/pci-dma_64.c |   11 -
 2 files changed, 45 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 818d95e..78c7640 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -18,27 +18,50 @@
 dma_addr_t bad_dma_address __read_mostly = 0x0;
 EXPORT_SYMBOL(bad_dma_address);
 
-void *dma_alloc_coherent(struct device *dev, size_t size,
-  dma_addr_t *dma_handle, gfp_t gfp)
+static int dma_alloc_from_coherent_mem(struct device *dev, ssize_t size,
+  dma_addr_t *dma_handle, void **ret)
 {
-   void *ret;
	struct dma_coherent_mem *mem = dev ? dev->dma_mem : NULL;
	int order = get_order(size);
-   /* ignore region specifiers */
-   gfp &= ~(__GFP_DMA | __GFP_HIGHMEM);
 
	if (mem) {
		int page = bitmap_find_free_region(mem->bitmap, mem->size,
						   order);
		if (page >= 0) {
			*dma_handle = mem->device_base + (page << PAGE_SHIFT);
-   ret = mem->virt_base + (page << PAGE_SHIFT);
-   memset(ret, 0, size);
-   return ret;
+   *ret = mem->virt_base + (page << PAGE_SHIFT);
+   memset(*ret, 0, size);
		}
		if (mem->flags & DMA_MEMORY_EXCLUSIVE)
-   return NULL;
+   *ret = NULL;
+   }
+   return (mem != NULL);
+}
+
+static int dma_release_coherent(struct device *dev, int order, void *vaddr)
+{
+   struct dma_coherent_mem *mem = dev ? dev->dma_mem : NULL;
+
+   if (mem && vaddr >= mem->virt_base && vaddr <
+  (mem->virt_base + (mem->size << PAGE_SHIFT))) {
+   int page = (vaddr - mem->virt_base) >> PAGE_SHIFT;
+
+   bitmap_release_region(mem->bitmap, page, order);
+   return 1;
	}
+   return 0;
+}
+
+void *dma_alloc_coherent(struct device *dev, size_t size,
+  dma_addr_t *dma_handle, gfp_t gfp)
+{
+   void *ret = NULL;
+   int order = get_order(size);
+   /* ignore region specifiers */
+   gfp &= ~(__GFP_DMA | __GFP_HIGHMEM);
+
+   if (dma_alloc_from_coherent_mem(dev, size, dma_handle, &ret))
+   return ret;
 
	if (dev == NULL || (dev->coherent_dma_mask < 0xffffffff))
gfp |= GFP_DMA;
@@ -56,15 +79,11 @@ EXPORT_SYMBOL(dma_alloc_coherent);
 void dma_free_coherent(struct device *dev, size_t size,
 void *vaddr, dma_addr_t dma_handle)
 {
-   struct dma_coherent_mem *mem = dev ? dev-dma_mem : NULL;
int order = get_order(size);
 
WARN_ON(irqs_disabled());   /* for portability */
-   if (mem  vaddr = mem-virt_base  vaddr  (mem-virt_base + 
(mem-size  PAGE_SHIFT))) {
-   int page = (vaddr - mem-virt_base)  PAGE_SHIFT;
-
-   bitmap_release_region(mem-bitmap, page, order);
-   } else
-   free_pages((unsigned long)vaddr, order);
+   if (dma_release_coherent(dev, order, vaddr))
+   return;
+   free_pages((unsigned long)vaddr, order);
 }
 EXPORT_SYMBOL(dma_free_coherent);
diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index e7d45cf..6eacd58 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -39,6 +39,8 @@ dma_alloc_pages(struct device *dev, gfp_t gfp, unsigned order)
return page ? page_address(page) : NULL;
 }
 
+#define dma_alloc_from_coherent_mem(dev, size, handle, ret) (0)
+#define dma_release_coherent(dev, order, vaddr) (0)
 /*
  * Allocate memory for a coherent mapping.
  */
@@ -50,6 +52,10 @@ dma_alloc_coherent(struct device *dev, size_t size, 
dma_addr_t *dma_handle,
unsigned long dma_mask = 0;
u64 bus;
 
+
+   if (dma_alloc_from_coherent_mem(dev, size, dma_handle, &memory))
+   return memory;
+
if (!dev)
	dev = &fallback_dev;
dma_mask = dev-coherent_dma_mask;
@@ -141,9 +147,12 @@ EXPORT_SYMBOL(dma_alloc_coherent);
 void dma_free_coherent(struct device *dev, size_t size,
 void *vaddr, dma_addr_t bus)
 {
+   int order = get_order(size);
WARN_ON(irqs_disabled());   /* for portability */
+   if (dma_release_coherent(dev, order, vaddr))
+   return;
	if (dma_ops->unmap_single)
		dma_ops->unmap_single(dev, bus, size, 0);
-   free_pages((unsigned long)vaddr, 

[kvm-devel] [PATCH 18/28] x86: move bad_dma_address

2008-04-08 Thread Glauber Costa
It goes to pci-dma.c, and is removed from the arch-specific files.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma.c|2 ++
 arch/x86/kernel/pci-dma_32.c |4 
 arch/x86/kernel/pci-dma_64.c |2 --
 3 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index d06d8df..d6734ed 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -35,6 +35,8 @@ int iommu_detected __read_mostly = 0;
 int iommu_bio_merge __read_mostly = 0;
 EXPORT_SYMBOL(iommu_bio_merge);
 
+dma_addr_t bad_dma_address __read_mostly = 0;
+EXPORT_SYMBOL(bad_dma_address);
 
 int dma_set_mask(struct device *dev, u64 mask)
 {
diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 49166a4..5ae3470 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -14,10 +14,6 @@
 #include <linux/module.h>
 #include <asm/io.h>
 
-/* For i386, we make it point to the NULL address */
-dma_addr_t bad_dma_address __read_mostly = 0x0;
-EXPORT_SYMBOL(bad_dma_address);
-
 static int dma_alloc_from_coherent_mem(struct device *dev, ssize_t size,
   dma_addr_t *dma_handle, void **ret)
 {
diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index 6eacd58..5f03e41 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -14,8 +14,6 @@
 #include <asm/gart.h>
 #include <asm/calgary.h>
 
-dma_addr_t bad_dma_address __read_mostly;
-EXPORT_SYMBOL(bad_dma_address);
 
 /* Dummy device used for NULL arguments (normally ISA). Better would
be probably a smaller DMA mask, but this is bug-to-bug compatible
-- 
1.5.0.6




[kvm-devel] [PATCH 21/28] x86: retry allocation if failed

2008-04-08 Thread Glauber Costa
This patch puts in the code to retry allocation in case it fails. On its
own it does not make much sense beyond making the code look like x86_64's.
But later patches in this series will make us try to allocate from
zones other than DMA first, which may fail.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma_32.c |   34 +-
 1 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 0d630ae..f6cf434 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -66,6 +66,8 @@ void *dma_alloc_coherent(struct device *dev, size_t size,
struct page *page;
dma_addr_t bus;
int order = get_order(size);
+   unsigned long dma_mask = 0;
+
/* ignore region specifiers */
	gfp &= ~(__GFP_DMA | __GFP_HIGHMEM);
 
@@ -75,15 +77,37 @@ void *dma_alloc_coherent(struct device *dev, size_t size,
	if (dev == NULL || (dev->coherent_dma_mask < 0xffffffff))
gfp |= GFP_DMA;
 
+   dma_mask = dev-coherent_dma_mask;
+   if (dma_mask == 0)
+   dma_mask = DMA_32BIT_MASK;
+
+ again:
page = dma_alloc_pages(dev, gfp, order);
if (page == NULL)
return NULL;
 
-   ret = page_address(page);
-   bus = page_to_phys(page);
-
-   memset(ret, 0, size);
-   *dma_handle = bus;
+   {
+   int high, mmu;
+   bus = page_to_phys(page);
+   ret = page_address(page);
+   high = (bus + size) >= dma_mask;
+   mmu = high;
+   if (force_iommu && !(gfp & GFP_DMA))
+   mmu = 1;
+   else if (high) {
+   free_pages((unsigned long)ret,
+  get_order(size));
+
+   /* Don't use the 16MB ZONE_DMA unless absolutely
+  needed. It's better to use remapping first. */
+   if (dma_mask < DMA_32BIT_MASK && !(gfp & GFP_DMA)) {
+   gfp = (gfp & ~GFP_DMA32) | GFP_DMA;
+   goto again;
+   }
+   }
+   memset(ret, 0, size);
+   *dma_handle = bus;
+   }
 
return ret;
 }
-- 
1.5.0.6




[kvm-devel] [PATCH 17/28] x86: adjust dma_free_coherent for i386

2008-04-08 Thread Glauber Costa
We call unmap_single, if available.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma_32.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 78c7640..49166a4 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -84,6 +84,8 @@ void dma_free_coherent(struct device *dev, size_t size,
WARN_ON(irqs_disabled());   /* for portability */
if (dma_release_coherent(dev, order, vaddr))
return;
+   if (dma_ops->unmap_single)
+   dma_ops->unmap_single(dev, dma_handle, size, 0);
free_pages((unsigned long)vaddr, order);
 }
 EXPORT_SYMBOL(dma_free_coherent);
-- 
1.5.0.6




[kvm-devel] [PATCH 20/28] x86: use numa allocation function in i386

2008-04-08 Thread Glauber Costa
We can do it here too, in the same way x86_64 does.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma_32.c |   27 ++-
 1 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 5ae3470..0d630ae 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -48,10 +48,23 @@ static int dma_release_coherent(struct device *dev, int 
order, void *vaddr)
return 0;
 }
 
+/* Allocate DMA memory on node near device */
+noinline struct page *
+dma_alloc_pages(struct device *dev, gfp_t gfp, unsigned order)
+{
+   int node;
+
+   node = dev_to_node(dev);
+
+   return alloc_pages_node(node, gfp, order);
+}
+
 void *dma_alloc_coherent(struct device *dev, size_t size,
   dma_addr_t *dma_handle, gfp_t gfp)
 {
void *ret = NULL;
+   struct page *page;
+   dma_addr_t bus;
int order = get_order(size);
/* ignore region specifiers */
	gfp &= ~(__GFP_DMA | __GFP_HIGHMEM);
@@ -62,12 +75,16 @@ void *dma_alloc_coherent(struct device *dev, size_t size,
	if (dev == NULL || (dev->coherent_dma_mask < 0xffffffff))
gfp |= GFP_DMA;
 
-   ret = (void *)__get_free_pages(gfp, order);
+   page = dma_alloc_pages(dev, gfp, order);
+   if (page == NULL)
+   return NULL;
+
+   ret = page_address(page);
+   bus = page_to_phys(page);
+
+   memset(ret, 0, size);
+   *dma_handle = bus;
 
-   if (ret != NULL) {
-   memset(ret, 0, size);
-   *dma_handle = virt_to_phys(ret);
-   }
return ret;
 }
 EXPORT_SYMBOL(dma_alloc_coherent);
-- 
1.5.0.6




[kvm-devel] [PATCH 23/28] x86: don't try to allocate from DMA zone at first

2008-04-08 Thread Glauber Costa
If we fail, we'll loop into the allocation again,
and then allocate in the DMA zone.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma_32.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 0e9ec11..11f100a 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -84,9 +84,6 @@ void *dma_alloc_coherent(struct device *dev, size_t size,
if (dma_alloc_from_coherent_mem(dev, size, dma_handle, ret))
return ret;
 
-   if (dev == NULL || (dev->coherent_dma_mask < 0xffffffff))
-   gfp |= GFP_DMA;
-
if (!dev)
	dev = &fallback_dev;
 
-- 
1.5.0.6




[kvm-devel] [PATCH 22/28] x86: use a fallback dev for i386

2008-04-08 Thread Glauber Costa
We can use a fallback dev for cases of a NULL device being passed (mostly ISA).
This comes from the x86_64 implementation.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma_32.c |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index f6cf434..0e9ec11 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -14,6 +14,16 @@
 #include linux/module.h
 #include asm/io.h
 
+/* Dummy device used for NULL arguments (normally ISA). Better would
+   be probably a smaller DMA mask, but this is bug-to-bug compatible
+   to i386. */
+struct device fallback_dev = {
+   .bus_id = "fallback device",
+   .coherent_dma_mask = DMA_32BIT_MASK,
+   .dma_mask = &fallback_dev.coherent_dma_mask,
+};
+
+
 static int dma_alloc_from_coherent_mem(struct device *dev, ssize_t size,
   dma_addr_t *dma_handle, void **ret)
 {
@@ -77,6 +87,9 @@ void *dma_alloc_coherent(struct device *dev, size_t size,
	if (dev == NULL || (dev->coherent_dma_mask < 0xffffffff))
gfp |= GFP_DMA;
 
+   if (!dev)
+   dev = &fallback_dev;
+
dma_mask = dev-coherent_dma_mask;
if (dma_mask == 0)
dma_mask = DMA_32BIT_MASK;
-- 
1.5.0.6




[kvm-devel] [PATCH 25/28] x86: remove kludge from x86_64

2008-04-08 Thread Glauber Costa
The comment claims that i386 does this, but it does not.
So remove it.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma_64.c |4 
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index b956f59..596c8c8 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -68,10 +68,6 @@ dma_alloc_coherent(struct device *dev, size_t size, 
dma_addr_t *dma_handle,
/* Don't invoke OOM killer */
gfp |= __GFP_NORETRY;
 
-   /* Kludge to make it bug-to-bug compatible with i386. i386
-  uses the normal dma_mask for alloc_coherent. */
-   dma_mask = *dev->dma_mask;
-
	/* Why <=? Even when the mask is smaller than 4GB it is often
   larger than 16MB and in this case we have a chance of
   finding fitting memory in the next higher zone first. If
-- 
1.5.0.6




[kvm-devel] [PATCH 26/28] x86: return conditional to mmu

2008-04-08 Thread Glauber Costa
Just return our allocation if we don't have an mmu. On i386, where this patch
is being applied, we never have one. So the goal is just to make the code look
like x86_64's.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma_32.c |   34 --
 1 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 5450bd1..f134de3 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -116,12 +116,42 @@ again:
gfp = (gfp  ~GFP_DMA32) | GFP_DMA;
goto again;
}
+
+   /* Let low level make its own zone decisions */
+   gfp &= ~(GFP_DMA32|GFP_DMA);
+
+   if (dma_ops->alloc_coherent)
+   return dma_ops->alloc_coherent(dev, size,
+  dma_handle, gfp);
+   return NULL;
+
		}
		memset(ret, 0, size);
-   *dma_handle = bus;
+   if (!mmu) {
+   *dma_handle = bus;
+   return ret;
+   }
+   }
+
+   if (dma_ops->alloc_coherent) {
+   free_pages((unsigned long)ret, get_order(size));
+   gfp &= ~(GFP_DMA|GFP_DMA32);
+   return dma_ops->alloc_coherent(dev, size, dma_handle, gfp);
+   }
+
+   if (dma_ops->map_simple) {
+   *dma_handle = dma_ops->map_simple(dev, virt_to_phys(ret),
+ size,
+ PCI_DMA_BIDIRECTIONAL);
+   if (*dma_handle != bad_dma_address)
+   return ret;
	}
 
-   return ret;
+   if (panic_on_overflow)
+   panic("dma_alloc_coherent: IOMMU overflow by %lu bytes\n",
+ (unsigned long)size);
+   free_pages((unsigned long)ret, get_order(size));
+   return NULL;
 }
 EXPORT_SYMBOL(dma_alloc_coherent);
 
-- 
1.5.0.6




[kvm-devel] [PATCH 27/28] x86: don't do dma if mask is NULL.

2008-04-08 Thread Glauber Costa
If the device hasn't provided a mask, abort the allocation.
Note that we're using a fallback device now, so this does not cover
the case of a NULL device: just drivers passing NULL masks around.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma_32.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index f134de3..d2f7074 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -91,6 +91,9 @@ void *dma_alloc_coherent(struct device *dev, size_t size,
if (dma_mask == 0)
dma_mask = DMA_32BIT_MASK;
 
+   if (dev->dma_mask == NULL)
+   return NULL;
+
/* Don't invoke OOM killer */
gfp |= __GFP_NORETRY;
 again:
-- 
1.5.0.6




[kvm-devel] [PATCH 28/28] x86: integrate pci-dma.c

2008-04-08 Thread Glauber Costa
The code in pci-dma_{32,64}.c is now sufficiently
close. We merge the two files into pci-dma.c.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/Makefile |2 +-
 arch/x86/kernel/pci-dma.c|  175 ++
 arch/x86/kernel/pci-dma_32.c |  173 -
 arch/x86/kernel/pci-dma_64.c |  154 -
 4 files changed, 176 insertions(+), 328 deletions(-)
 delete mode 100644 arch/x86/kernel/pci-dma_32.c
 delete mode 100644 arch/x86/kernel/pci-dma_64.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index b2a1358..423e1c4 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -23,7 +23,7 @@ obj-y += setup_$(BITS).o i8259_$(BITS).o 
setup.o
 obj-$(CONFIG_X86_32)   += sys_i386_32.o i386_ksyms_32.o
 obj-$(CONFIG_X86_64)   += sys_x86_64.o x8664_ksyms_64.o
 obj-$(CONFIG_X86_64)   += syscall_64.o vsyscall_64.o setup64.o
-obj-y  += pci-dma_$(BITS).o  bootflag.o e820_$(BITS).o
+obj-y  += bootflag.o e820_$(BITS).o
 obj-y  += pci-dma.o quirks.o i8237.o topology.o kdebugfs.o
 obj-y  += alternative.o i8253.o pci-nommu.o
 obj-$(CONFIG_X86_64)   += bugs_64.o
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index d6734ed..5cc8d5a 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -38,6 +38,15 @@ EXPORT_SYMBOL(iommu_bio_merge);
 dma_addr_t bad_dma_address __read_mostly = 0;
 EXPORT_SYMBOL(bad_dma_address);
 
+/* Dummy device used for NULL arguments (normally ISA). Better would
+   be probably a smaller DMA mask, but this is bug-to-bug compatible
+   to older i386. */
+struct device fallback_dev = {
+   .bus_id = "fallback device",
+   .coherent_dma_mask = DMA_32BIT_MASK,
+   .dma_mask = &fallback_dev.coherent_dma_mask,
+};
+
 int dma_set_mask(struct device *dev, u64 mask)
 {
	if (!dev->dma_mask || !dma_supported(dev, mask))
@@ -128,6 +137,43 @@ void *dma_mark_declared_memory_occupied(struct device *dev,
	return mem->virt_base + (pos << PAGE_SHIFT);
 }
 EXPORT_SYMBOL(dma_mark_declared_memory_occupied);
+
+static int dma_alloc_from_coherent_mem(struct device *dev, ssize_t size,
+  dma_addr_t *dma_handle, void **ret)
+{
+   struct dma_coherent_mem *mem = dev ? dev->dma_mem : NULL;
+   int order = get_order(size);
+
+   if (mem) {
+   int page = bitmap_find_free_region(mem->bitmap, mem->size,
+order);
+   if (page >= 0) {
+   *dma_handle = mem->device_base + (page << PAGE_SHIFT);
+   *ret = mem->virt_base + (page << PAGE_SHIFT);
+   memset(*ret, 0, size);
+   }
+   if (mem->flags & DMA_MEMORY_EXCLUSIVE)
+   *ret = NULL;
+   }
+   return (mem != NULL);
+}
+
+static int dma_release_coherent(struct device *dev, int order, void *vaddr)
+{
+   struct dma_coherent_mem *mem = dev ? dev->dma_mem : NULL;
+
+   if (mem && vaddr >= mem->virt_base && vaddr <
+  (mem->virt_base + (mem->size << PAGE_SHIFT))) {
+   int page = (vaddr - mem->virt_base) >> PAGE_SHIFT;
+
+   bitmap_release_region(mem->bitmap, page, order);
+   return 1;
+   }
+   return 0;
+}
+#else
+#define dma_alloc_from_coherent_mem(dev, size, handle, ret) (0)
+#define dma_release_coherent(dev, order, vaddr) (0)
 #endif /* CONFIG_X86_32 */
 
 int dma_supported(struct device *dev, u64 mask)
@@ -171,6 +217,135 @@ int dma_supported(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_supported);
 
+/* Allocate DMA memory on node near device */
+noinline struct page *
+dma_alloc_pages(struct device *dev, gfp_t gfp, unsigned order)
+{
+   int node;
+
+   node = dev_to_node(dev);
+
+   return alloc_pages_node(node, gfp, order);
+}
+
+/*
+ * Allocate memory for a coherent mapping.
+ */
+void *
+dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
+  gfp_t gfp)
+{
+   void *memory = NULL;
+   struct page *page;
+   unsigned long dma_mask = 0;
+   dma_addr_t bus;
+
+   /* ignore region specifiers */
+   gfp &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
+
+   if (dma_alloc_from_coherent_mem(dev, size, dma_handle, &memory))
+   return memory;
+
+   if (!dev)
+   dev = &fallback_dev;
+   dma_mask = dev->coherent_dma_mask;
+   if (dma_mask == 0)
+   dma_mask = DMA_32BIT_MASK;
+
+   /* Device not DMA able */
+   if (dev->dma_mask == NULL)
+   return NULL;
+
+   /* Don't invoke OOM killer */
+   gfp |= __GFP_NORETRY;
+
+#ifdef CONFIG_X86_64
+   /* Why <=? Even when the mask is smaller than 4GB it is often
+  larger than 16MB and in this case we have a chance of
+

[kvm-devel] [PATCH 10/28] x86: unify pci-nommu

2008-04-08 Thread Glauber Costa
Merge pci-base_32.c and pci-nommu_64.c into pci-nommu.c.
Their code was made the same, so now they can be merged.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/Makefile   |5 +-
 arch/x86/kernel/pci-base_32.c  |   60 
 arch/x86/kernel/pci-dma.c  |8 +++
 arch/x86/kernel/pci-dma_64.c   |8 ---
 arch/x86/kernel/pci-nommu.c|  100 
 arch/x86/kernel/pci-nommu_64.c |  100 
 6 files changed, 110 insertions(+), 171 deletions(-)
 delete mode 100644 arch/x86/kernel/pci-base_32.c
 create mode 100644 arch/x86/kernel/pci-nommu.c
 delete mode 100644 arch/x86/kernel/pci-nommu_64.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index befe901..b2a1358 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -25,9 +25,8 @@ obj-$(CONFIG_X86_64)  += sys_x86_64.o x8664_ksyms_64.o
 obj-$(CONFIG_X86_64)   += syscall_64.o vsyscall_64.o setup64.o
 obj-y  += pci-dma_$(BITS).o  bootflag.o e820_$(BITS).o
 obj-y  += pci-dma.o quirks.o i8237.o topology.o kdebugfs.o
-obj-y  += alternative.o i8253.o
-obj-$(CONFIG_X86_64)   += pci-nommu_64.o bugs_64.o
-obj-$(CONFIG_X86_32)   += pci-base_32.o
+obj-y  += alternative.o i8253.o pci-nommu.o
+obj-$(CONFIG_X86_64)   += bugs_64.o
 obj-y  += tsc_$(BITS).o io_delay.o rtc.o
 
 obj-y  += process.o
diff --git a/arch/x86/kernel/pci-base_32.c b/arch/x86/kernel/pci-base_32.c
deleted file mode 100644
index b44ea51..000
--- a/arch/x86/kernel/pci-base_32.c
+++ /dev/null
@@ -1,60 +0,0 @@
-#include <linux/mm.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/dma-mapping.h>
-#include <asm/dma-mapping.h>
-
-static dma_addr_t pci32_map_single(struct device *dev, phys_addr_t ptr,
-  size_t size, int direction)
-{
-   WARN_ON(size == 0);
-   flush_write_buffers();
-   return ptr;
-}
-
-static int pci32_dma_map_sg(struct device *dev, struct scatterlist *sglist,
-   int nents, int direction)
-{
-   struct scatterlist *sg;
-   int i;
-
-   WARN_ON(nents == 0 || sglist[0].length == 0);
-
-   for_each_sg(sglist, sg, nents, i) {
-   BUG_ON(!sg_page(sg));
-
-   sg->dma_address = sg_phys(sg);
-   sg->dma_length = sg->length;
-   }
-
-   flush_write_buffers();
-   return nents;
-}
-
-/* Make sure we keep the same behaviour */
-static int pci32_map_error(dma_addr_t dma_addr)
-{
-   return 0;
-}
-
-const struct dma_mapping_ops pci32_dma_ops = {
-   .map_single = pci32_map_single,
-   .unmap_single = NULL,
-   .map_sg = pci32_dma_map_sg,
-   .unmap_sg = NULL,
-   .sync_single_for_cpu = NULL,
-   .sync_single_for_device = NULL,
-   .sync_single_range_for_cpu = NULL,
-   .sync_single_range_for_device = NULL,
-   .sync_sg_for_cpu = NULL,
-   .sync_sg_for_device = NULL,
-   .mapping_error = pci32_map_error,
-};
-
-/* this is temporary */
-int __init no_iommu_init(void)
-{
-   dma_ops = &pci32_dma_ops;
-   return 0;
-}
-fs_initcall(no_iommu_init);
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index d30634b..6b77fd8 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -7,6 +7,14 @@
 const struct dma_mapping_ops *dma_ops;
 EXPORT_SYMBOL(dma_ops);
 
+#ifdef CONFIG_IOMMU_DEBUG
+int panic_on_overflow __read_mostly = 1;
+int force_iommu __read_mostly = 1;
+#else
+int panic_on_overflow __read_mostly = 0;
+int force_iommu __read_mostly = 0;
+#endif
+
 int dma_set_mask(struct device *dev, u64 mask)
 {
if (!dev->dma_mask || !dma_supported(dev, mask))
diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index e95f671..4202130 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -27,14 +27,6 @@ EXPORT_SYMBOL(iommu_bio_merge);
 static int iommu_sac_force __read_mostly = 0;
 
 int no_iommu __read_mostly;
-#ifdef CONFIG_IOMMU_DEBUG
-int panic_on_overflow __read_mostly = 1;
-int force_iommu __read_mostly = 1;
-#else
-int panic_on_overflow __read_mostly = 0;
-int force_iommu __read_mostly= 0;
-#endif
-
 /* Set this to 1 if there is a HW IOMMU in the system */
 int iommu_detected __read_mostly = 0;
 
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
new file mode 100644
index 000..aec43d5
--- /dev/null
+++ b/arch/x86/kernel/pci-nommu.c
@@ -0,0 +1,100 @@
+/* Fallback functions when the main IOMMU code is not compiled in. This
+   code is roughly equivalent to i386. */
+#include <linux/mm.h>
+#include <linux/init.h>
+#include <linux/pci.h>
+#include <linux/string.h>
+#include <linux/dma-mapping.h>
+#include <linux/scatterlist.h>
+
+#include <asm/gart.h>
+#include <asm/processor.h>
+#include <asm/dma.h>
+
+static int
+check_addr(char *name, struct device *hwdev, 

[kvm-devel] [PATCH 11/28] x86: move pci fixup to pci-dma.c

2008-04-08 Thread Glauber Costa
via_no_dac provides a fixup that is the same for both
architectures. Move it to pci-dma.c.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma.c |   18 ++
 arch/x86/kernel/pci-dma_32.c  |   13 -
 arch/x86/kernel/pci-dma_64.c  |   15 ---
 include/asm-x86/dma-mapping.h |2 +-
 4 files changed, 19 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 6b77fd8..e81e16f 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -1,9 +1,13 @@
 #include <linux/dma-mapping.h>
 #include <linux/dmar.h>
+#include <linux/pci.h>
 
 #include <asm/gart.h>
 #include <asm/calgary.h>
 
+int forbid_dac __read_mostly;
+EXPORT_SYMBOL(forbid_dac);
+
 const struct dma_mapping_ops *dma_ops;
 EXPORT_SYMBOL(dma_ops);
 
@@ -48,3 +52,17 @@ void pci_iommu_shutdown(void)
 }
 /* Must execute after PCI subsystem */
 fs_initcall(pci_iommu_init);
+
+#ifdef CONFIG_PCI
+/* Many VIA bridges seem to corrupt data for DAC. Disable it here */
+
+static __devinit void via_no_dac(struct pci_dev *dev)
+{
+   if ((dev->class >> 8) == PCI_CLASS_BRIDGE_PCI && forbid_dac == 0) {
+   printk(KERN_INFO "PCI: VIA PCI bridge detected."
+" Disabling DAC.\n");
+   forbid_dac = 1;
+   }
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID, via_no_dac);
+#endif
diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 9e82976..6543bb3 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -157,9 +157,6 @@ EXPORT_SYMBOL(dma_mark_declared_memory_occupied);
 #ifdef CONFIG_PCI
 /* Many VIA bridges seem to corrupt data for DAC. Disable it here */
 
-int forbid_dac;
-EXPORT_SYMBOL(forbid_dac);
-
 int
 dma_supported(struct device *dev, u64 mask)
 {
@@ -182,16 +179,6 @@ dma_supported(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_supported);
 
-
-static __devinit void via_no_dac(struct pci_dev *dev)
-{
-   if ((dev->class >> 8) == PCI_CLASS_BRIDGE_PCI && forbid_dac == 0) {
-   printk(KERN_INFO "PCI: VIA PCI bridge detected. Disabling DAC.\n");
-   forbid_dac = 1;
-   }
-}
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID, via_no_dac);
-
 static int check_iommu(char *s)
 {
if (!strcmp(s, "usedac")) {
diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index 4202130..e194460 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -161,8 +161,6 @@ void dma_free_coherent(struct device *dev, size_t size,
 }
 EXPORT_SYMBOL(dma_free_coherent);
 
-static int forbid_dac __read_mostly;
-
 int dma_supported(struct device *dev, u64 mask)
 {
 #ifdef CONFIG_PCI
@@ -338,16 +336,3 @@ void __init pci_iommu_alloc(void)
pci_swiotlb_init();
 #endif
 }
-
-#ifdef CONFIG_PCI
-/* Many VIA bridges seem to corrupt data for DAC. Disable it here */
-
-static __devinit void via_no_dac(struct pci_dev *dev)
-{
-   if ((dev->class >> 8) == PCI_CLASS_BRIDGE_PCI && forbid_dac == 0) {
-   printk(KERN_INFO "PCI: VIA PCI bridge detected. Disabling DAC.\n");
-   forbid_dac = 1;
-   }
-}
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID, via_no_dac);
-#endif
diff --git a/include/asm-x86/dma-mapping.h b/include/asm-x86/dma-mapping.h
index 914846d..d82517d 100644
--- a/include/asm-x86/dma-mapping.h
+++ b/include/asm-x86/dma-mapping.h
@@ -14,6 +14,7 @@ extern dma_addr_t bad_dma_address;
 extern int iommu_merge;
 extern struct device fallback_dev;
 extern int panic_on_overflow;
+extern int forbid_dac;
 
 struct dma_mapping_ops {
int (*mapping_error)(dma_addr_t dma_addr);
@@ -223,6 +224,5 @@ dma_release_declared_memory(struct device *dev);
 extern void *
 dma_mark_declared_memory_occupied(struct device *dev,
  dma_addr_t device_addr, size_t size);
-extern int forbid_dac;
 #endif /* CONFIG_X86_32 */
 #endif
-- 
1.5.0.6


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


[kvm-devel] [PATCH 09/28] x86: move initialization functions to pci-dma.c

2008-04-08 Thread Glauber Costa
The initcalls that trigger the various possibilities for the
DMA subsystem are moved to pci-dma.c.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma.c|   25 +
 arch/x86/kernel/pci-dma_64.c |   23 ---
 2 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 1323cd8..d30634b 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -1,4 +1,8 @@
 #include <linux/dma-mapping.h>
+#include <linux/dmar.h>
+
+#include <asm/gart.h>
+#include <asm/calgary.h>
 
 const struct dma_mapping_ops *dma_ops;
 EXPORT_SYMBOL(dma_ops);
@@ -14,4 +18,25 @@ int dma_set_mask(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_set_mask);
 
+static int __init pci_iommu_init(void)
+{
+#ifdef CONFIG_CALGARY_IOMMU
+   calgary_iommu_init();
+#endif
+
+   intel_iommu_init();
 
+#ifdef CONFIG_GART_IOMMU
+   gart_iommu_init();
+#endif
+
+   no_iommu_init();
+   return 0;
+}
+
+void pci_iommu_shutdown(void)
+{
+   gart_iommu_shutdown();
+}
+/* Must execute after PCI subsystem */
+fs_initcall(pci_iommu_init);
diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index e697b86..e95f671 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -347,27 +347,6 @@ void __init pci_iommu_alloc(void)
 #endif
 }
 
-static int __init pci_iommu_init(void)
-{
-#ifdef CONFIG_CALGARY_IOMMU
-   calgary_iommu_init();
-#endif
-
-   intel_iommu_init();
-
-#ifdef CONFIG_GART_IOMMU
-   gart_iommu_init();
-#endif
-
-   no_iommu_init();
-   return 0;
-}
-
-void pci_iommu_shutdown(void)
-{
-   gart_iommu_shutdown();
-}
-
 #ifdef CONFIG_PCI
 /* Many VIA bridges seem to corrupt data for DAC. Disable it here */
 
@@ -380,5 +359,3 @@ static __devinit void via_no_dac(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID, via_no_dac);
 #endif
-/* Must execute after PCI subsystem */
-fs_initcall(pci_iommu_init);
-- 
1.5.0.6


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


[kvm-devel] [PATCH 08/28] x86: move definition to pci-dma.c

2008-04-08 Thread Glauber Costa
Move dma_ops structure definition to pci-dma.c, where it
belongs.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-base_32.c |   11 ---
 arch/x86/kernel/pci-dma.c |3 +++
 arch/x86/mm/init_64.c |3 ---
 3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/pci-base_32.c b/arch/x86/kernel/pci-base_32.c
index 837bbe9..b44ea51 100644
--- a/arch/x86/kernel/pci-base_32.c
+++ b/arch/x86/kernel/pci-base_32.c
@@ -37,7 +37,7 @@ static int pci32_map_error(dma_addr_t dma_addr)
return 0;
 }
 
-static const struct dma_mapping_ops pci32_dma_ops = {
+const struct dma_mapping_ops pci32_dma_ops = {
.map_single = pci32_map_single,
.unmap_single = NULL,
.map_sg = pci32_dma_map_sg,
@@ -51,5 +51,10 @@ static const struct dma_mapping_ops pci32_dma_ops = {
.mapping_error = pci32_map_error,
 };
 
-const struct dma_mapping_ops *dma_ops = &pci32_dma_ops;
-EXPORT_SYMBOL(dma_ops);
+/* this is temporary */
+int __init no_iommu_init(void)
+{
+   dma_ops = &pci32_dma_ops;
+   return 0;
+}
+fs_initcall(no_iommu_init);
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index f1c24d8..1323cd8 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -1,5 +1,8 @@
 #include <linux/dma-mapping.h>
 
+const struct dma_mapping_ops *dma_ops;
+EXPORT_SYMBOL(dma_ops);
+
 int dma_set_mask(struct device *dev, u64 mask)
 {
if (!dev->dma_mask || !dma_supported(dev, mask))
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 8c989b8..f06a51e 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -47,9 +47,6 @@
 #include <asm/numa.h>
 #include <asm/cacheflush.h>
 
-const struct dma_mapping_ops *dma_ops;
-EXPORT_SYMBOL(dma_ops);
-
 static unsigned long dma_reserve __initdata;
 
 DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-- 
1.5.0.6


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


[kvm-devel] [PATCH 04/28] x86: Add flush_write_buffers in nommu functions

2008-04-08 Thread Glauber Costa
This patch adds flush_write_buffers() to some functions of pci-nommu_64.c.
They are added wherever i386 would also have them. This is not a problem
for x86_64, since flush_write_buffers() is a nop there.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-nommu_64.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pci-nommu_64.c b/arch/x86/kernel/pci-nommu_64.c
index a4e8ccf..1da9cf9 100644
--- a/arch/x86/kernel/pci-nommu_64.c
+++ b/arch/x86/kernel/pci-nommu_64.c
@@ -32,6 +32,7 @@ nommu_map_single(struct device *hwdev, phys_addr_t paddr, size_t size,
dma_addr_t bus = paddr;
if (!check_addr("map_single", hwdev, bus, size))
return bad_dma_address;
+   flush_write_buffers();
return bus;
 }
 
@@ -64,6 +65,7 @@ static int nommu_map_sg(struct device *hwdev, struct scatterlist *sg,
return 0;
s->dma_length = s->length;
}
+   flush_write_buffers();
return nents;
 }
 
-- 
1.5.0.6


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


[kvm-devel] [PATCH 13/28] x86: merge dma_supported

2008-04-08 Thread Glauber Costa
The code for both arches is very similar, so this patch merges them.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma.c|   44 ++
 arch/x86/kernel/pci-dma_32.c |   24 --
 arch/x86/kernel/pci-dma_64.c |   44 +-
 3 files changed, 45 insertions(+), 67 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index f6d6a92..4289a9b 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -14,6 +14,8 @@ EXPORT_SYMBOL(forbid_dac);
 const struct dma_mapping_ops *dma_ops;
 EXPORT_SYMBOL(dma_ops);
 
+int iommu_sac_force __read_mostly = 0;
+
 #ifdef CONFIG_IOMMU_DEBUG
 int panic_on_overflow __read_mostly = 1;
 int force_iommu __read_mostly = 1;
@@ -33,6 +35,48 @@ int dma_set_mask(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_set_mask);
 
+int dma_supported(struct device *dev, u64 mask)
+{
+#ifdef CONFIG_PCI
+   if (mask > 0xffffffff && forbid_dac > 0) {
+   printk(KERN_INFO "PCI: Disallowing DAC for device %s\n",
+dev->bus_id);
+   return 0;
+   }
+#endif
+
+   if (dma_ops->dma_supported)
+   return dma_ops->dma_supported(dev, mask);
+
+   /* Copied from i386. Doesn't make much sense, because it will
+  only work for pci_alloc_coherent.
+  The caller just has to use GFP_DMA in this case. */
+   if (mask < DMA_24BIT_MASK)
+   return 0;
+
+   /* Tell the device to use SAC when IOMMU force is on.  This
+  allows the driver to use cheaper accesses in some cases.
+
+  Problem with this is that if we overflow the IOMMU area and
+  return DAC as fallback address the device may not handle it
+  correctly.
+
+  As a special case some controllers have a 39bit address
+  mode that is as efficient as 32bit (aic79xx). Don't force
+  SAC for these.  Assume all masks <= 40 bits are of this
+  type. Normally this doesn't make any difference, but gives
+  more gentle handling of IOMMU overflow. */
+   if (iommu_sac_force && (mask >= DMA_40BIT_MASK)) {
+   printk(KERN_INFO "%s: Force SAC with mask %Lx\n",
+dev->bus_id, mask);
+   return 0;
+   }
+
+   return 1;
+}
+EXPORT_SYMBOL(dma_supported);
+
+
 static int __init pci_iommu_init(void)
 {
 #ifdef CONFIG_CALGARY_IOMMU
diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 6543bb3..1d4091a 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -155,30 +155,6 @@ void *dma_mark_declared_memory_occupied(struct device *dev,
 EXPORT_SYMBOL(dma_mark_declared_memory_occupied);
 
 #ifdef CONFIG_PCI
-/* Many VIA bridges seem to corrupt data for DAC. Disable it here */
-
-int
-dma_supported(struct device *dev, u64 mask)
-{
-   /*
-* we fall back to GFP_DMA when the mask isn't all 1s,
-* so we can't guarantee allocations that must be
-* within a tighter range than GFP_DMA..
-*/
-   if (mask < 0x00ffffff)
-   return 0;
-
-   /* Work around chipset bugs */
-   if (forbid_dac > 0 && mask < 0xffffffffULL)
-   return 0;
-
-   if (dma_ops->dma_supported)
-   return dma_ops->dma_supported(dev, mask);
-
-   return 1;
-}
-EXPORT_SYMBOL(dma_supported);
-
 static int check_iommu(char *s)
 {
if (!strcmp(s, "usedac")) {
diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index 7820675..c80da76 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -24,7 +24,7 @@ EXPORT_SYMBOL(bad_dma_address);
 int iommu_bio_merge __read_mostly = 0;
 EXPORT_SYMBOL(iommu_bio_merge);
 
-static int iommu_sac_force __read_mostly = 0;
+extern int iommu_sac_force;
 
 int no_iommu __read_mostly;
 /* Set this to 1 if there is a HW IOMMU in the system */
@@ -161,48 +161,6 @@ void dma_free_coherent(struct device *dev, size_t size,
 }
 EXPORT_SYMBOL(dma_free_coherent);
 
-int dma_supported(struct device *dev, u64 mask)
-{
-#ifdef CONFIG_PCI
-   if (mask > 0xffffffff && forbid_dac > 0) {
-
-
-
-   printk(KERN_INFO "PCI: Disallowing DAC for device %s\n", dev->bus_id);
-   return 0;
-   }
-#endif
-
-   if (dma_ops->dma_supported)
-   return dma_ops->dma_supported(dev, mask);
-
-   /* Copied from i386. Doesn't make much sense, because it will
-  only work for pci_alloc_coherent.
-  The caller just has to use GFP_DMA in this case. */
-if (mask < DMA_24BIT_MASK)
-return 0;
-
-   /* Tell the device to use SAC when IOMMU force is on.  This
-  allows the driver to use cheaper accesses in some cases.
-
-  Problem with this is that if we overflow the IOMMU area and
-  return DAC as fallback address the device may not handle it
-  

[kvm-devel] [PATCH] use NR_IRQS for irq count

2008-04-08 Thread Glauber Costa
Instead of artificially limiting irq numbers, use arch provided NR_IRQS

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 irqhook/irqhook_main.c |   16 +++-
 1 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/irqhook/irqhook_main.c b/irqhook/irqhook_main.c
index 5f414d1..828c70a 100644
--- a/irqhook/irqhook_main.c
+++ b/irqhook/irqhook_main.c
@@ -31,15 +31,13 @@ #define ERROR(fmt, args...) printk(1
 static spinlock_t irqh_lock;
 static wait_queue_head_t irqh_proc_list;
 
-enum {NINTR = 256};
-
-static DECLARE_BITMAP(pending, NINTR);
-static DECLARE_BITMAP(handled, NINTR);
+static DECLARE_BITMAP(pending, NR_IRQS);
+static DECLARE_BITMAP(handled, NR_IRQS);
 
 #define irqh_on(which, bit)	test_bit(bit, which)
 #define irqh_set(which, bit)	set_bit(bit, which)
 #define irqh_clear(which, bit)	clear_bit(bit, which)
-#define irqh_ffs(which)	find_first_bit(which, NINTR)
+#define irqh_ffs(which)	find_first_bit(which, NR_IRQS)
 
 static irqreturn_t
 irqh_interrupt(int irq, void *p)
@@ -92,7 +90,7 @@ irqh_dev_write(struct file *fp, const ch
if (pdp) {
if (pci_enable_device(pdp))
ERROR("device not enabled\n");
-   if ((unsigned)(n = pdp->irq) >= NINTR) {
+   if ((unsigned)(n = pdp->irq) >= NR_IRQS) {
ERROR("device has invalid IRQ set\n");
return -EINVAL;
}
@@ -107,7 +105,7 @@ irqh_dev_write(struct file *fp, const ch
irqh_set(handled, n);
goto done;
}
-   if ((unsigned)n >= NINTR)
+   if ((unsigned)n >= NR_IRQS)
return -EINVAL;
if (arg[0] == '-') {
if (pdp)
@@ -135,7 +133,7 @@ irqh_dev_read(struct file *fp, char *buf
return -EINVAL;
spin_lock_irq(&irqh_lock);
while (!signal_pending(current)) {
-   if ((n = irqh_ffs(pending)) < NINTR) {
+   if ((n = irqh_ffs(pending)) < NR_IRQS) {
if ((m = sprintf(b, "%d", n) + 1) > size)
m = size;
if (copy_to_user(buf, b, m))
@@ -203,7 +201,7 @@ irqh_cleanup(void)
 
DPRINTK(ENTER\n);

-   while ((n = irqh_ffs(handled)) < NINTR) {
+   while ((n = irqh_ffs(handled)) < NR_IRQS) {
irqh_clear(handled, n);
free_irq(n, (void *)irqh_interrupt);
}
-- 
1.4.2


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


[kvm-devel] [PATCH 24/28] x86: unify gfp masks

2008-04-08 Thread Glauber Costa
Use the same gfp masks for x86_64 and i386.
It involves using HIGHMEM or DMA32 where necessary for the sake
of code compatibility (no real effect), and using the NORETRY
mask for i386.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]
---
 arch/x86/kernel/pci-dma_32.c |6 --
 arch/x86/kernel/pci-dma_64.c |2 ++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_32.c b/arch/x86/kernel/pci-dma_32.c
index 11f100a..5450bd1 100644
--- a/arch/x86/kernel/pci-dma_32.c
+++ b/arch/x86/kernel/pci-dma_32.c
@@ -79,7 +79,7 @@ void *dma_alloc_coherent(struct device *dev, size_t size,
unsigned long dma_mask = 0;
 
/* ignore region specifiers */
-   gfp &= ~(__GFP_DMA | __GFP_HIGHMEM);
+   gfp &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
 
if (dma_alloc_from_coherent_mem(dev, size, dma_handle, &ret))
return ret;
@@ -91,7 +91,9 @@ void *dma_alloc_coherent(struct device *dev, size_t size,
if (dma_mask == 0)
dma_mask = DMA_32BIT_MASK;
 
- again:
+   /* Don't invoke OOM killer */
+   gfp |= __GFP_NORETRY;
+again:
page = dma_alloc_pages(dev, gfp, order);
if (page == NULL)
return NULL;
diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index 13a31a4..b956f59 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -49,6 +49,8 @@ dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
unsigned long dma_mask = 0;
u64 bus;
 
+   /* ignore region specifiers */
+   gfp &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
 
if (dma_alloc_from_coherent_mem(dev, size, dma_handle, &memory))
return memory;
-- 
1.5.0.6


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 2 of 9] Core of mmu notifiers

2008-04-08 Thread Andrea Arcangeli
On Tue, Apr 08, 2008 at 11:26:19AM -0500, Robin Holt wrote:
 This one does not build on ia64.  I get the following:

I think it's a common code compilation bug not related to my
patch. Can you test this?

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -10,6 +10,7 @@
 #include <linux/rbtree.h>
 #include <linux/rwsem.h>
 #include <linux/completion.h>
+#include <linux/cpumask.h>
 #include <asm/page.h>
 #include <asm/mmu.h>
 


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [patch 02/10] emm: notifier logic

2008-04-08 Thread Christoph Lameter
It may also be useful to allow invalidate_start() to fail in some contexts 
(try_to_unmap f.e., maybe if a certain flag is passed). This may allow the 
device to get out of tight situations (pending I/O f.e. or time out if 
there is no response for network communications). But then that 
complicates the API.


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel



Re: [kvm-devel] [PATCH 0 of 9] mmu notifier #v12

2008-04-08 Thread Avi Kivity
Andrea Arcangeli wrote:
 Note that mmu_notifier_unregister may also fail with -EINTR if there are
 signal pending or the system runs out of vmalloc space or physical memory,
 only exit_mmap guarantees that any kernel module can be unloaded in presence
 of an oom condition.

   

That's unusual.  What happens to the notifier?  Suppose I destroy a vm 
without exiting the process, what happens if it fires?

-- 
Any sufficiently difficult bug is indistinguishable from a feature.


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH RFC 1/5]Add some trace enties and define interface for tracing

2008-04-08 Thread Avi Kivity
Liu, Eric E wrote:

 High order 32 bits of cr2 are lost.

 

 May I use KVMTRACE_3D(PAGE_FAULT, vcpu, error_code, (u32)cr2,
 (u32)((u64)cr2 >> 32), handler) to handle this?
 For a 32-bit guest it traces some excess data, but for a 64-bit
 guest we don't lose the high-order bits.

   

Sure.

   

 }
 -
 +   KVMTRACE_1D(INJ_VIRQ, vcpu, idtv_info_field, handler);

   
 Probably need a different marker than INJ_VIRQ, as this is on exit,
 not entry.

 

 Is the marker REDELIVER_EVT ok for this?

   

Yes.

 @@ -2428,6 +2445,7 @@ void kvm_arch_exit(void)
  int kvm_emulate_halt(struct kvm_vcpu *vcpu)
  {
++vcpu->stat.halt_exits;
 +   KVMTRACE_0D(HLT, vcpu, handler);
 if (irqchip_in_kernel(vcpu-kvm)) {
vcpu->arch.mp_state = VCPU_MP_STATE_HALTED;
 kvm_vcpu_block(vcpu);

   
 Would be nice to have an UNHLT to see how long sleeps are.  But this
 will probably be seen by the irq injection.
 

 I think from the cycles between VMEXIT (caused by halt) and VMENTRY we
 can evaluate how long the sleeps are. 
   

Right.

We'll merge it with the understanding that the data format is not part 
of the ABI and will change between versions.  It will cause some 
headaches when people send us traces, but on the other hand will give us 
some flexibility.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] gfxboot VMX workaround v2

2008-04-08 Thread Avi Kivity
Guillaume Thouvenin wrote:
   
   
 x86 emulate is missing support for jmp far which is used to switch into 
 protected mode.  It just needs to be added.
 

 Ok I see. I understand now why you said in a previous email that KVM
 needs to have a proper load_seg() function like the Xen's x86_emulate.
 This function is used to load the segment in a far jmp. I will look how
 it is done in Xen and I will try to copy the stuff like you did.

   

kvm now has a load_segment_descriptor() function which might help.


-- 
Any sufficiently difficult bug is indistinguishable from a feature.


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] Compilation problems with git tree

2008-04-08 Thread Avi Kivity
Zdenek Kabelac wrote:
 Core was generated by `qemu-kvm -s -m 320 -smp 2 -net nic,model=pcnet
 -net user -redir'.
 Program terminated with signal 11, Segmentation fault.

 #0  0x004849a7 in tcp_reass (tp=0x7fabec000d60, ti=0xec000d60,
 m=0x0) at slirp/tcp_input.c:208
   

User mode networking has known issues on 64-bit hosts.  Try using 
bridged networking.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [patch 0/4] fix SMP migration and loadvm/savevm (V2)

2008-04-08 Thread Avi Kivity
Marcelo Tosatti wrote:
 Avi, I prefer not to fold mpstate into kvm_save_registers() as a hidden
 register because the MPSTATE is only used during migration, whereas 
 save_registers() is not (seems safer)

But that's the point... what about savevm/loadvm, etc?  They deserve to 
work too.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 0 of 9] mmu notifier #v12

2008-04-08 Thread Andrea Arcangeli
On Wed, Apr 09, 2008 at 12:46:49AM +0300, Avi Kivity wrote:
 That's unusual.  What happens to the notifier?  Suppose I destroy a vm 

Yes it's quite unusual.

 without exiting the process, what happens if it fires?

The mmu notifier ops should stop doing anything (if there are no
memslots they will be no-ops), or the ops can be replaced atomically
with null pointers. The important thing is that the module can't go
away until ->release is invoked or until mmu_notifier_unregister
returns 0.

Previously there was no mmu_notifier_unregister, so adding it can't be
a regression compared to #v11, even if it can fail and you may have to
retry later after returning to userland. Retrying from userland is
always safe in OOM-kill terms; only looping inside the kernel is
unsafe, because do_exit has no chance to run.

-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


[kvm-devel] [PATCH] [v2] Move kvm_get_pit to libkvm.c common code

2008-04-08 Thread Jerone Young
# HG changeset patch
# User Jerone Young [EMAIL PROTECTED]
# Date 1207692873 18000
# Branch merge
# Node ID 8ddf560729aac228cd84068e1227e601e68a6840
# Parent  94cbc19df0f0fcab150599b10d859f1a3bc1b7cb
[v2] Move kvm_get_pit to libkvm.c common code

- I am resending this patch with ia64 removed. It apparently fell through 
the cracks.

Don't compile kvm_*_pit() on architectures whose currently supported platforms 
do not contain a PIT.

Signed-off-by: Hollis Blanchard [EMAIL PROTECTED]
Signed-off-by: Jerone Young [EMAIL PROTECTED]

diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
@@ -549,6 +549,7 @@ int kvm_pit_in_kernel(kvm_context_t kvm)
 
 #ifdef KVM_CAP_PIT
 
+#if defined(__i386__) || defined(__x86_64__)
 /*!
  * \brief Get in kernel PIT of the virtual domain
  *
@@ -569,6 +570,7 @@ int kvm_get_pit(kvm_context_t kvm, struc
  * \param s PIT state of the virtual domain
  */
 int kvm_set_pit(kvm_context_t kvm, struct kvm_pit_state *s);
+#endif
 
 #endif
 

-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Register now and save $200. Hurry, offer ends at 11:59 p.m., 
Monday, April 7! Use priority code J8TLD2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel




Re: [kvm-devel] [patch 0/4] fix SMP migration and loadvm/savevm (V2)

2008-04-08 Thread Marcelo Tosatti
On Wed, Apr 09, 2008 at 12:59:50AM +0300, Avi Kivity wrote:
 Marcelo Tosatti wrote:
 Avi, I prefer not to fold mpstate into kvm_save_registers() as a hidden
 register because the MPSTATE is only used during migration, whereas 
 save_registers() is not (seems safer)
 
 But that's the point... what about savevm/loadvm, etc?  They deserve to 
 work too.

savevm/loadvm will work through cpu_save/cpu_load just as live migration
does, so they are covered.


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] use NR_IRQS for irq count

2008-04-08 Thread Amit Shah
On Tuesday 08 April 2008 22:34:36 Glauber Costa wrote:
 Instead of artificially limiting irq numbers, use arch provided NR_IRQS

 Signed-off-by: Glauber Costa [EMAIL PROTECTED]

Thanks, applied.

Going forward, we're going to use the in-kernel approach for interrupt 
injection (which already has this change). The irqhook module will stay 
around for testing / debugging purposes (will be used with -no-kvm-irqchip).

Amit.

-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel