Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-12 Thread Isaku Yamahata
On Wed, Dec 05, 2007 at 06:15:49PM +, Derek Murray wrote:
 Keir Fraser wrote:
 Yes, this would work okay I suspect. Good enough as a stop-gap measure? Are
 there any other responsibilities that you acquire if you make use of
 VM_FOREIGN (in particular, how would this affect get_user_pages)?
 
 VM_FOREIGN is already set for the gntdev VMA (mostly because it's 
 directly based on the blktap code). That means that it has the array of 
 page_structs in its vm_private_data, which can be used to fulfill a 
 get_user_pages call. I've attached a patch based on this fix.
 
 Regards,
 
 Derek.

Hi Derek. Sorry for the late notice.

This patch breaks blktap and gntdev on ia64.
With auto translated physmap mode enabled, blktap/gntdev update
the pte entry with vm_insert_page() rather than updating it directly
with the hypercall.
So when zapping the pte entry, it is necessary to release the page
reference count, the rmap, and so on. Thus vm_normal_page() has
to return the struct page when auto translated physmap mode is enabled.

How about passing a struct page ** to the zap_pte callback
and setting it to NULL if necessary?
(Or
can the condition be changed to check for auto translated physmap mode?
Or
should the cleanup be done in the zap_pte callback?)
-- 
yamahata
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-06 Thread Gerd Hoffmann
D.G. Murray wrote:
 Hi Mark, 
 
 Maybe a change to the gntdev userspace API to allow batching 
 of mapping requests?
 
 Something along the lines of the following?
 
 void *xc_gnttab_map_grant_refs(int xcg_handle,
uint32_t count,
uint32_t *domids,
uint32_t *refs,
int prot); 

Yes, except that it should actually work ;)

It doesn't for me (Fedora 8 again).  Grab xenner 0.9 (just uploaded),
edit blkbackd.c and flip the BATCH_MAPS from 0 to 1, compile, run, see
it not work.

With BATCH_MAPS being 0 blkbackd works nicely as blktap/tapdisk drop-in
replacement.

cheers,
  Gerd


Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-06 Thread Derek Murray

Gerd Hoffmann wrote:

Yes, except that it should actually work ;)

It doesn't for me (Fedora 8 again).  Grab xenner 0.9 (just uploaded),
edit blkbackd.c and flip the BATCH_MAPS from 0 to 1, compile, run, see
it not work.


Which version of the Xen tools are you using? There was a bug in the 
version released with Xen 3.1, which should have been cleaned up in the 
subsequent minor versions. Try grabbing the patch to libxc at:


http://xenbits.xensource.com/xen-3.1-testing.hg?raw-rev/135d5088909f

Otherwise, if this doesn't work/is some other issue, could you post the 
OOPS and relevant Xen console output?


Thanks,

Derek.


Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-06 Thread Jeremy Fitzhardinge
Derek Murray wrote:
 Keir Fraser wrote:
 You'd need to track pte->grant_handle mappings somewhere, but it could
 certainly be done this way, yes.

 At the moment, blktap and gntdev provide struct pages to
 get_user_pages by smuggling them in the vm_private_data field of the
 relevant vm_area_struct. Could we use this field to get the handles to
 ptep_get_and_clear_full as well?

Yes.  Given the mm and a vaddr passed to ptep_get_and_clear, find_vma()
will return the vm_area_struct.  If we assert that anyone who sets the
"I'm foreign" bit in a pte has a standard format for the vm_private_data
field, then we can stash a callback pointer there and make the
appropriate callback.

 Only downside that I can see is that we would need to find the vma for
 each PTE that needs to be cleared this way (since we don't get this
 passed to ptep_get_and_clear_full), but this is mitigated by (i) it
 only happening in the erroneous, unclean-shutdown case, and (ii)
 getting a hit in the mm->mmap_cache for consecutive runs of mapped
 grants.

Yes.  find_vma is fairly hot, since it's used on every fault, so it
should be reasonably fast.  And it doesn't sound like our case is
particularly performance critical.
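
The lookup-then-callback idea discussed above can be sketched as a small userspace model. This is only an illustration under stated assumptions: the types and names (model_vma, foreign_private, model_find_vma, model_clear_pte) are hypothetical and are not kernel APIs, and the lookup returns only a VMA that actually contains the address, which is simpler than the real find_vma(). The one-entry cache stands in for mm->mmap_cache to show why consecutive runs of mapped grants would mostly hit it.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: these types mimic, but are not, the kernel's. */
struct model_vma;

/* Hypothetical "standard format" for the private data of a foreign VMA. */
struct foreign_private {
    void (*clear_pte)(struct model_vma *vma, unsigned long addr);
    int cleared;                  /* number of callbacks made, for illustration */
};

struct model_vma {
    unsigned long start, end;     /* covers [start, end) */
    struct foreign_private *priv; /* NULL for ordinary VMAs */
};

struct model_mm {
    struct model_vma *vmas;       /* non-overlapping VMAs */
    size_t nr_vmas;
    struct model_vma *cache;      /* last hit, standing in for mm->mmap_cache */
    unsigned long cache_hits;
};

/* Return the VMA containing addr, or NULL. Checks the one-entry cache first. */
static struct model_vma *model_find_vma(struct model_mm *mm, unsigned long addr)
{
    if (mm->cache && addr >= mm->cache->start && addr < mm->cache->end) {
        mm->cache_hits++;
        return mm->cache;
    }
    for (size_t i = 0; i < mm->nr_vmas; i++) {
        struct model_vma *vma = &mm->vmas[i];
        if (addr >= vma->start && addr < vma->end) {
            mm->cache = vma;
            return vma;
        }
    }
    return NULL;
}

/* Example callback: stands in for unmapping the grant behind addr. */
static void model_unmap_grant(struct model_vma *vma, unsigned long addr)
{
    (void)addr;
    vma->priv->cleared++;
}

/* Model of the pte-clear path: returns 1 if the foreign callback ran. */
static int model_clear_pte(struct model_mm *mm, unsigned long addr,
                           int pte_is_foreign)
{
    struct model_vma *vma;

    if (!pte_is_foreign)
        return 0;                 /* ordinary pte: no VMA lookup needed */
    vma = model_find_vma(mm, addr);
    /* Anyone setting the foreign bit must have provided the private data. */
    assert(vma != NULL && vma->priv != NULL);
    vma->priv->clear_pte(vma, addr);
    return 1;
}
```

In this model only ptes carrying the foreign marker pay for the VMA lookup, and clearing three consecutive pages of one foreign VMA costs one search plus two cache hits.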

J


Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray

Hi Gerd,

Gerd Hoffmann wrote:
Want to reproduce?  Here we go:


  * grab xenner 0.8 from http://dl.bytesex.org/releases/xenner/
  * grab a xenified dom0 kernel without blktap driver (either not
compiled or module not loaded).
  * start xend
  * start blkbackd from xenner package (you probably want the -d switch
for debug output, twice for more).
  * run xm block-attach 0 tap:aio:/path/to/some/file xvda r
  * watch it blow up ;)


Thanks for the repro details. I'll have a go at this later. One thing we 
haven't tested AFAIK is mapping grants in the same domain: could you 
check to see if the bug is the same if you attach a block device to a 
domain other than Dom0? Also, could you send any Xen console output, if 
it contains errors or warnings?



I can't help wondering if this is a hint that now is the time to find a
better API, which doesn't have the requirement (a) that seems to be
causing such trouble?  Are other PV guests --- *BSD, Solaris --- going
to have the same problems with their VM layers if they try to implement
this API?  Upstream Linux pv_ops certainly will, and it would be good if
we could avoid tying unprivileged guests to ABIs which cannot hope to be
merged into pv_ops.


And I fear the problems I've run into up to now are only the tip of
the iceberg.  What happens if an application with active grant table
mappings calls fork()?


Ultimately, fork calls dup_mm, which calls dup_mmap, which calls 
copy_{page,pud,pmd,pte}_range, which calls copy_one_pte, which calls 
set_pte_at, which hypercalls HYPERVISOR_update_va_mapping.


The hypercall will not succeed and will return an error code indicating 
the reason for this. Therefore the PTE will not be set. There appears to 
be no way to propagate this error through the Linux VM code, because 
there is no concept of a PTE update failing. I could add return codes to 
all those functions, but I don't fancy their chances upstream.


A possibility for solving that might be to carry out the mappings upon a 
page fault: I believe this would be compatible with copy_page_range.


(In fact, it's possible that a forked process would attempt to 
demand-page in the granted page, bypassing the copy_page_range code. 
Since there is no nopage handler for a gntdev VMA, that would lead to an 
anonymous page being mapped into memory instead.)


So, as far as I can tell, there would be no kernel BUG() or 
domain_crash() in the event of a fork(). It looks like implementing 
nopage in gntdev would enable grants to be remapped after a fork() and 
the correct behaviour to happen.


Regards,

Derek.


Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray

Gerd,

Can you try the attached patch against linux-2.6.18-xen.hg?

I think the problem was that the gntdev VMA is not marked as being 
VM_PFNMAP, therefore it tries to get a struct page_struct for each 
granted page when it is unmapped (and maybe sometimes succeeds 
(incorrectly), which could be why I haven't seen the bug). With this 
flag, vm_normal_page will return NULL in zap_pte_range, and so the code 
that decrements that reference count will not be executed.
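
A very simplified userspace model of the decision described above may help. It is a sketch under assumptions, not the kernel's vm_normal_page(): the names (model_normal_page, model_zap_pte, MODEL_VM_PFNMAP, MODEL_MAX_MAPNR) and flag values are hypothetical, and the real function's handling of COW mappings inside VM_PFNMAP areas is left out. It only shows that the zap path touches the map count when, and only when, a page is returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_VM_PFNMAP 0x1u     /* illustrative value, not the kernel's */
#define MODEL_MAX_MAPNR 4UL      /* number of "local" pages in this model */

struct model_page {
    int mapcount;                /* stands in for the reverse-mapping count */
};

static struct model_page model_mem_map[MODEL_MAX_MAPNR];

/*
 * Simplified stand-in for vm_normal_page(): NULL means there is no
 * struct page that the zap path should account against.
 */
static struct model_page *model_normal_page(unsigned int vm_flags,
                                            unsigned long pfn)
{
    if (vm_flags & MODEL_VM_PFNMAP)
        return NULL;             /* simplification: COW special case omitted */
    if (pfn >= MODEL_MAX_MAPNR)
        return NULL;             /* frame is outside the local memory map */
    return &model_mem_map[pfn];
}

/* Model of zapping one pte: returns true if a map count was decremented. */
static bool model_zap_pte(unsigned int vm_flags, unsigned long pfn)
{
    struct model_page *page = model_normal_page(vm_flags, pfn);

    if (page == NULL)
        return false;
    page->mapcount--;            /* goes wrong if no matching add ever happened */
    return true;
}
```

In the model, zapping a pte that points at a local frame without the flag drives the count below its starting value, which corresponds to the unbalanced decrement discussed in this thread; with the flag set, the decrement is skipped.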


Regards,

Derek.
# HG changeset patch
# User [EMAIL PROTECTED]
# Date 1196860382 0
# Node ID af26b3dd23822190acbec1872a47259e1fed88b8
# Parent  b2768401db943e66af9d64bd610ffa225f560c0b
Set gntdev VMA to be VM_PFNMAP.

diff -r b2768401db94 -r af26b3dd2382 drivers/xen/gntdev/gntdev.c
--- a/drivers/xen/gntdev/gntdev.c	Mon Dec 03 08:50:12 2007 +
+++ b/drivers/xen/gntdev/gntdev.c	Wed Dec 05 13:13:02 2007 +
@@ -501,6 +501,17 @@ static int gntdev_mmap (struct file *fli
 
 	/* The VM area contains pages from another VM. */
 	vma->vm_flags |= VM_FOREIGN;
+
+	/* The VM area contains pages that are not backed by page_structs in
+	 * this domain's memory map.
+	 *
+	 * TODO/FIXME?: We should probably use the VM_FOREIGN workaround as
+	 *  used by get_user_pages() to provide access to the
+	 *  page_structs for each page, but I'm not sure if that's
+	 *  necessary.
+	 */
+	vma->vm_flags |= VM_PFNMAP;
+
 	vma->vm_private_data = kzalloc(size * sizeof(struct page_struct *), 
    GFP_KERNEL);
 	if (vma->vm_private_data == NULL) {

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray

Keir Fraser wrote:

Is this patch to go into linux-2.6.18-xen.hg then?


Yes, even if it doesn't fix the exact bug we're seeing here, I think it 
should go in. I've attached a version with my signed-off-by and a better 
commit comment.


Cheers,

Derek.
# HG changeset patch
# User [EMAIL PROTECTED]
# Date 1196860382 0
# Node ID af26b3dd23822190acbec1872a47259e1fed88b8
# Parent  b2768401db943e66af9d64bd610ffa225f560c0b
Add VM_PFNMAP flag to gntdev-mmaped VM areas. This prevents an attempt in
zap_pte_range to decrement the reverse-mapping count of the non-existent
(but occasionally spuriously present) page_struct associated with the
granted PFN.

Signed-off-by: Derek Murray [EMAIL PROTECTED]

diff -r b2768401db94 -r af26b3dd2382 drivers/xen/gntdev/gntdev.c
--- a/drivers/xen/gntdev/gntdev.c	Mon Dec 03 08:50:12 2007 +
+++ b/drivers/xen/gntdev/gntdev.c	Wed Dec 05 13:13:02 2007 +
@@ -501,6 +501,17 @@ static int gntdev_mmap (struct file *fli
 
 	/* The VM area contains pages from another VM. */
 	vma->vm_flags |= VM_FOREIGN;
+
+	/* The VM area contains pages that are not backed by page_structs in
+	 * this domain's memory map.
+	 *
+	 * TODO/FIXME?: We should probably use the VM_FOREIGN workaround as
+	 *  used by get_user_pages() to provide access to the
+	 *  page_structs for each page, but I'm not sure if that's
+	 *  necessary.
+	 */
+	vma->vm_flags |= VM_PFNMAP;
+
 	vma->vm_private_data = kzalloc(size * sizeof(struct page_struct *), 
    GFP_KERNEL);
 	if (vma->vm_private_data == NULL) {

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray

Keir Fraser wrote:

Yes, this would work okay I suspect. Good enough as a stop-gap measure? Are
there any other responsibilities that you acquire if you make use of
VM_FOREIGN (in particular, how would this affect get_user_pages)?


VM_FOREIGN is already set for the gntdev VMA (mostly because it's 
directly based on the blktap code). That means that it has the array of 
page_structs in its vm_private_data, which can be used to fulfill a 
get_user_pages call. I've attached a patch based on this fix.


Regards,

Derek.
# HG changeset patch
# User [EMAIL PROTECTED]
# Date 1196878124 0
# Node ID df7d0555ec3847bd5915063d8ee79123d6ebc67a
# Parent  ba918cb2cf7520604dee724dd80dad5ce4bee8a1
Changed vm_normal_page to return NULL when presented with a VMA marked
as being VM_FOREIGN.

Signed-off-by: Derek Murray [EMAIL PROTECTED]

diff -r ba918cb2cf75 -r df7d0555ec38 mm/memory.c
--- a/mm/memory.c	Tue Dec 04 11:54:22 2007 +
+++ b/mm/memory.c	Wed Dec 05 18:08:44 2007 +
@@ -395,6 +395,9 @@ struct page *vm_normal_page(struct vm_ar
 		if (!is_cow_mapping(vma->vm_flags))
 			return NULL;
 	}
+
+	if (unlikely(vma->vm_flags & VM_FOREIGN))
+		return NULL;
 
 	/*
 	 * Add some anal sanity checks for now. Eventually,

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray

Jeremy Fitzhardinge wrote:

Could we use one of the software-defined bits in the PTE to indicate
that this is a foreign/granted PTE, and have set_pte_at behave
differently if you pass it a pte with this bit set?


Actually, as Gerd pointed out in his answer to his own question, the use 
of VM_DONTCOPY cuts out this entire code path, so we don't need to worry 
about it.


Mind you, it looks like we're going to go ahead and use one of the PTE 
bits to signify foreign PTEs anyway, per Keir's suggestion. Either way, 
it's going to involve making Xen-specific changes to the mm code... have 
you any ideas how we can either (i) get rid of the zap_pte hook in the 
vm_operations_struct, or (ii) make a really compelling case to the 
kernel maintainers that it really should get in?


Regards,

Derek.


Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray

Keir Fraser wrote:


Actually I'm not so sure now. Presumably you add VM_PFNMAP to make
vm_normal_page() return NULL? But actually I would expect pte_pfn() to
return max_mapnr because the mapped page is not a local page. And that
should cause vm_normal_page() to return NULL always, regardless of whether
you assert VM_PFNMAP. Is gntdev being used to grant-and-map local pages in
the test that causes the crash?


That's right (gntdev is being used to map (but not grant) a local page). 
The test case creates a virtual block device in Dom0, and attempts to 
map its ring buffer in a user-space daemon in Dom0. Therefore pte_pfn 
succeeds.


Regards,

Derek.


Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray

Keir Fraser wrote:

Need to bite the bullet and fix this properly by setting a software flag in
ptes that are not subject to reference counting.


Could we get away with testing the VM_FOREIGN flag in vm_normal_page()? 
Although I get the impression that this wouldn't be easily justified if 
trying to merge with upstream Linux.



Unfortunately that also needs a hypervisor interface change, to allow
setting of those pte flags. Easily done though, and we should definitely get
that piece in for 3.2.0.


Alternatively, could we use the _PAGE_GNTTAB PTE flag that is used for 
debugging? Indeed, if we did this, could we obviate the need for the 
PTE-zapping hook, by instead catching the case where this flag is set, 
and unmapping the grant implicitly?


Otherwise, what would the semantics of this new flag be?

Regards,

Derek.


Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Jeremy Fitzhardinge
Derek Murray wrote:
 Ultimately, fork calls dup_mm, which calls dup_mmap, which calls
 copy_{page,pud,pmd,pte}_range, which calls copy_one_pte, which calls
 set_pte_at, which hypercalls HYPERVISOR_update_va_mapping.

 The hypercall will not succeed and will return an error code
 indicating the reason for this. Therefore the PTE will not be set.
 There appears to be no way to propagate this error through the Linux
 VM code, because there is no concept of a PTE update failing. I could
 add return codes to all those functions, but I don't fancy their
 chances upstream.

Could we use one of the software-defined bits in the PTE to indicate
that this is a foreign/granted PTE, and have set_pte_at behave
differently if you pass it a pte with this bit set?

J


Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Jeremy Fitzhardinge
Derek Murray wrote:
 Jeremy Fitzhardinge wrote:
 Could we use one of the software-defined bits in the PTE to indicate
 that this is a foreign/granted PTE, and have set_pte_at behave
 differently if you pass it a pte with this bit set?

 Actually, as Gerd pointed out in his answer to his own question, the
 use of VM_DONTCOPY cuts out this entire code path, so we don't need to
 worry about it.

 Mind you, it looks like we're going to go ahead and use one of the PTE
 bits to signify foreign PTEs anyway, per Keir's suggestion. Either
 way, it's going to involve making Xen-specific changes to the mm code... 

Sneaking in a user for the otherwise completely unused PTE bits should
be fairly straightforward.

 have you any ideas how we can either (i) get rid of the zap_pte hook
 in the vm_operations_struct, or (ii) make a really compelling case to
 the kernel maintainers that it really should get in? 

Hm, I haven't spent much time looking at how grant tables and their
mappings work yet, so I can't say I really understand all this myself. 
Hence, questions:

Can we take a different approach from the zap_pte hook?  Given that
we're 1) planning on claiming a pte bit for grant mappings, and 2) need
to hook ptep_get_and_clear anyway to solve the mprotect performance
problems, couldn't we just special-case grant mapping pte_clears?

In 2.6.18-xen the only two implementations of zap_pte are
blktap_clear_pte and gntdev_clear_pte.  Given a ptep with the
grant-mapping bit set, could we determine which of these need calling
and do the appropriate thing?  Do we even need separate implementations
of the core pte-clearing functionality?  Could we just say something like:

if (pte & _PAGE_XEN_FOREIGN)
    HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, ...);
else
    xen_set_pte_at(...);


blktap_clear_pte and gntdev_clear_pte do other housekeeping, but do they
have to be done at the same instant as the grant mapping clear?  Could
they be done via some other hook?
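
The special-casing suggested above can be modelled in userspace roughly as follows. This is a sketch only: MODEL_PAGE_XEN_FOREIGN is a hypothetical software-defined bit with an arbitrary value, and the two counters merely stand in for the GNTTABOP_unmap_grant_ref hypercall and the ordinary pte update; no real Xen or kernel interface is called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_PRESENT     0x001u
#define MODEL_PAGE_XEN_FOREIGN 0x200u  /* hypothetical software-defined bit */

typedef uint64_t model_pte_t;

struct model_stats {
    int grant_unmaps;   /* times the grant-unmap path would have been taken */
    int plain_clears;   /* times the ordinary pte-clear path was taken */
};

/*
 * Model of a get-and-clear operation that picks the teardown path from
 * the pte's own flag bits instead of a per-VMA zap_pte hook.
 */
static model_pte_t model_ptep_get_and_clear(model_pte_t *ptep,
                                            struct model_stats *st)
{
    model_pte_t old;

    assert(ptep != NULL && st != NULL);
    old = *ptep;
    if (old & MODEL_PAGE_XEN_FOREIGN)
        st->grant_unmaps++;     /* stands in for GNTTABOP_unmap_grant_ref */
    else
        st->plain_clears++;     /* stands in for the normal pte update */
    *ptep = 0;
    return old;
}
```

The model leaves out the driver-specific housekeeping done by blktap_clear_pte and gntdev_clear_pte, which is exactly the part the questions above ask about.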

(I see Gerd just proposed this, pretty much.)

J


Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Keir Fraser
On 5/12/07 17:17, Derek Murray [EMAIL PROTECTED] wrote:

 Actually I'm not so sure now. Presumably you add VM_PFNMAP to make
 vm_normal_page() return NULL? But actually I would expect pte_pfn() to
 return max_mapnr because the mapped page is not a local page. And that
 should cause vm_normal_page() to return NULL always, regardless of whether
 you assert VM_PFNMAP. Is gntdev being used to grant-and-map local pages in
 the test that causes the crash?
 
 That's right (gntdev is being used to map (but not grant) a local page).
 The test case creates a virtual block device in Dom0, and attempts to
 map its ring buffer in a user-space daemon in Dom0. Therefore pte_pfn
 succeeds.

Need to bite the bullet and fix this properly by setting a software flag in
ptes that are not subject to reference counting.

Unfortunately that also needs a hypervisor interface change, to allow
setting of those pte flags. Easily done though, and we should definitely get
that piece in for 3.2.0.

Setting VM_PFNMAP is bogus. We used to do that for privcmd mappings too, but
we stopped because IIRC it had other unwanted side effects.

 -- Keir




Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Keir Fraser
On 5/12/07 20:15, Jeremy Fitzhardinge [EMAIL PROTECTED] wrote:

 In 2.6.18-xen the only two implementations of zap_pte are
 blktap_clear_pte and gntdev_clear_pte.  Given a ptep with the
 grant-mapping bit set, could we determine which of these need calling
 and do the appropriate thing?  Do we even need separate implementations
 of the core pte-clearing functionality?  Could we just say something like:
 
 if (pte & _PAGE_XEN_FOREIGN)
     HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, ...);
 else
     xen_set_pte_at(...);

You'd need to track pte->grant_handle mappings somewhere, but it could
certainly be done this way, yes.

 -- Keir




Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Gerd Hoffmann
 Alternatively, could we use the _PAGE_GNTTAB PTE flag that is used for
 debugging? Indeed, if we did this, could we obviate the need for the
 PTE-zapping hook, by instead catching the case where this flag is set,
 and unmapping the grant implicitly?
 
 Well, in the general case you don't have enough info to know which grant to
 release (a single page can be granted multiple times).

You'll also get the mm and the addr which should make it sufficiently
unique, so this looks like a doable approach to me.
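
To illustrate why the (mm, addr) pair is enough to pick the right grant even when one page is granted several times, here is a minimal userspace sketch of such a lookup table. All names (model_grant_entry, model_track_grant, model_release_grant) and the fixed table size are assumptions made for the example; a real implementation would need locking and a better data structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_GRANTS 8       /* arbitrary capacity for the sketch */

struct model_grant_entry {
    const void *mm;              /* identifies the address space */
    unsigned long addr;          /* virtual address of the mapping */
    uint32_t handle;             /* grant handle needed for the unmap */
    bool in_use;
};

static struct model_grant_entry model_grants[MODEL_MAX_GRANTS];

/* Remember which handle backs the mapping at (mm, addr). */
static bool model_track_grant(const void *mm, unsigned long addr,
                              uint32_t handle)
{
    assert(mm != NULL);
    for (size_t i = 0; i < MODEL_MAX_GRANTS; i++) {
        if (!model_grants[i].in_use) {
            model_grants[i] =
                (struct model_grant_entry){ mm, addr, handle, true };
            return true;
        }
    }
    return false;                /* table full */
}

/* Find and forget the handle for (mm, addr); false if none is tracked. */
static bool model_release_grant(const void *mm, unsigned long addr,
                                uint32_t *handle_out)
{
    for (size_t i = 0; i < MODEL_MAX_GRANTS; i++) {
        struct model_grant_entry *e = &model_grants[i];

        if (e->in_use && e->mm == mm && e->addr == addr) {
            *handle_out = e->handle;
            e->in_use = false;
            return true;
        }
    }
    return false;
}
```

Two address spaces mapping at the same virtual address, or one address space mapping the same granted page at two addresses, end up with distinct entries and therefore distinct handles.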

ptep_get_and_clear_full() in include/asm-x86/pgtable_32.h needs to be
changed to take care of this, but that should be possible to do and the
change is local to x86 paravirt_ops, which looks much better to me than
touching generic mm code.

cheers,
  Gerd




Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Keir Fraser
On 5/12/07 14:30, Derek Murray [EMAIL PROTECTED] wrote:

 Keir Fraser wrote:
 Is this patch to go into linux-2.6.18-xen.hg then?
 
 Yes, even if it doesn't fix the exact bug we're seeing here, I think it
 should go in. I've attached a version with my signed-off-by and a better
 commit comment.

Actually I'm not so sure now. Presumably you add VM_PFNMAP to make
vm_normal_page() return NULL? But actually I would expect pte_pfn() to
return max_mapnr because the mapped page is not a local page. And that
should cause vm_normal_page() to return NULL always, regardless of whether
you assert VM_PFNMAP. Is gntdev being used to grant-and-map local pages in
the test that causes the crash?

 -- Keir




Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray

Stephen C. Tweedie wrote:

So... the interface (a) cannot be used on the Linux VM without at least
one invasive VM modification, due to the requirement of ptes being
explicitly unmapped via hypercall;


Also there is the use of VM_FOREIGN 
(http://xenbits.xensource.com/linux-2.6.18-xen.hg?file/b2768401db94/mm/memory.c 
lines 1040--1059), which has been used quite happily in blktap since 
2005 
(http://lists.xensource.com/archives/html/xen-changelog/2005-07/msg00053.html). 
While it may not be a priority to get gntdev into pv-ops Linux, I should 
imagine that blktap would be fairly critical.



I can't help wondering if this is a hint that now is the time to find a
better API, which doesn't have the requirement (a) that seems to be
causing such trouble?  Are other PV guests --- *BSD, Solaris --- going
to have the same problems with their VM layers if they try to implement
this API?  Upstream Linux pv_ops certainly will, and it would be good if
we could avoid tying unprivileged guests to ABIs which cannot hope to be
merged into pv_ops.


I'm open to suggestions... but I think it always reduces to needing a 
hook that is called on process exit before the PTEs are zapped.



(Just what is the cost of not having this functionality in blktap,
anyway?)


If tapdisk dies whilst holding a granted page, the page can never be 
ungranted, so we leak that page.


Regards,

Derek.


Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Gerd Hoffmann
  Hi,

 gntdev doesn't even try to handle forking.  I wouldn't be surprised if
 that is a great way to kill Domain-0.  The xen hypervisor will most
 likely not be amused to find a pte referring to a granted (but foreign)
 page which wasn't established using the grant table interface.  Pinning
 the pgd of the child process will most likely fail and make the kernel
 BUG().

OK, it isn't that bad thanks to VM_DONTCOPY.  The child just doesn't
get the grant mapping.
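
The effect of VM_DONTCOPY at fork time can be sketched with a small userspace model. The names (fork_vma, fork_dup_mmap, FORK_VM_DONTCOPY) and the flag value are hypothetical; the only behaviour taken from the kernel is that areas carrying the flag are skipped when the parent's address space is duplicated.

```c
#include <assert.h>
#include <stddef.h>

#define FORK_VM_DONTCOPY 0x1u    /* illustrative value, not the kernel's */

struct fork_vma {
    unsigned long start, end;    /* covers [start, end) */
    unsigned int flags;
};

/*
 * Model of duplicating an address space at fork: copy every area except
 * those marked "don't copy". Returns the number of areas the child gets.
 */
static size_t fork_dup_mmap(const struct fork_vma *parent, size_t nr_parent,
                            struct fork_vma *child, size_t child_cap)
{
    size_t out = 0;

    for (size_t i = 0; i < nr_parent; i++) {
        if (parent[i].flags & FORK_VM_DONTCOPY)
            continue;            /* grant mapping: never reaches the child */
        assert(out < child_cap);
        child[out++] = parent[i];
    }
    return out;
}
```

Because the flagged area is skipped outright, no pte for a granted page is ever copied into the child, so the failing pte update discussed elsewhere in this thread is not reached.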

cheers,
  Gerd



Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-04 Thread Derek Murray

Gerd Hoffmann wrote:

On this point I completely agree with you! If anyone has any less
radical suggestions, then I'd be delighted to refactor the gntdev code
to use them. However, I'm not currently aware of any alternative that
maintains robustness to process crashes.


Oh, for me it isn't robust at all, it crashes on the first munmap
syscall.  It is the Fedora 8 kernel.  See attachment.  Didn't try
xensource 2.6.18 yet.


My gut feeling is that something changed in mm between 2.6.18 and 
2.6.21, but that seems like a cop out so...



Ideas what is wrong?


Since the bug appears to be in page_remove_rmap, that would tend to 
imply that there is never a corresponding page_add_*_rmap 
(page_add_file_rmap?). My knowledge of the Linux mm code is a bit shaky 
here: should gntdev be doing this? Should we be using install_page (or a 
modified version thereof) to set the PTE?


Also, does a simple program that opens gntdev, maps a grant, 
accesses/writes to the page, and unmaps it (all using the xc_gnttab_* 
functions) work?



Who uses the gntdev device right now?


Good question! I'm aware of it being used in a few research projects, 
and it seems to work for them (though I think it is mostly used with the 
linux-2.6.18-xen kernel). Anyone else?



I think this would represent good progress, though I wonder if there
would be a performance penalty due to performing the mapping and
unmapping in user-space (multiple syscalls per mapping versus a single
hypercall).


I'd expect the hard disk (and how I/O is scheduled) to be the
bottleneck, not the syscall overhead.  Nevertheless I plan to benchmark
it once I have it up and running.


Great to hear that you're working on this! Let me know if there's any 
other help I can provide with gntdev.


Cheers,

Derek.


Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-04 Thread Gerd Hoffmann
Derek Murray wrote:
 Gerd Hoffmann wrote:
 Oh, for me it isn't robust at all, it crashes on the first munmap
 syscall.  It is the Fedora 8 kernel.  See attachment.  Didn't try
 xensource 2.6.18 yet.
 
 My gut feeling is that something changed in mm between 2.6.18 and
 2.6.21, but that seems like a cop out so...

Could be.  Cross-checking failed though: 2.6.18 doesn't boot the machine
in question (Intel devel box with ICH9).  It doesn't find the disk.
Probably the ahci driver is too old.

 Ideas what is wrong?
 
 Since the bug appears to be in page_remove_rmap, that would tend to
 imply that there is never a corresponding page_add_*_rmap
 (page_add_file_rmap?). My knowledge of the Linux mm code is a bit shaky
 here: should gntdev be doing this? Should we be using install_page (or a
 modified version thereof) to set the PTE?

Don't know, I'm just trying to use it.  I did some mm handling for
device drivers back in my video4linux days, but for that there was no
need to get involved in setting/clearing pte entries.  I just had a
->nopage handler allocate the pages the way I needed it for the
userspace mappings of video dma buffers.

 Also, does a simple program that opens gntdev, maps a grant,
 accesses/writes to the page, and unmaps it (all using the xc_gnttab_*
 functions) work?

Didn't try yet.  The application in question (blkbackd) does this:

  * map blk shared ring
  * see the first request come in (kernel trying to read the
partition table).
  * map the grants of the request.
  * perform I/O.
  * Try to unmap the grants of the request.  On the first unmap call
the kernel oopses.

All this happens without even starting a guest; I'm just using xm
block-attach to create a blkfront device in Dom0.

 Who uses the gntdev device right now?
 
 Good question! I'm aware of it being used in a few research projects,
 and it seems to work for them (though I think it is mostly used with the
 linux-2.6.18-xen kernel). Anyone else?

So it effectively got no real-world testing yet ...

cheers,
  Gerd




Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-04 Thread Gerd Hoffmann
Stephen C. Tweedie wrote:
 Hi,
 
 On Tue, 2007-12-04 at 13:01 +0100, Gerd Hoffmann wrote:
 
 Who uses the gntdev device right now?
 Good question! I'm aware of it being used in a few research projects,
 and it seems to work for them (though I think it is mostly used with the
 linux-2.6.18-xen kernel). Anyone else?
 So it effectively got no real-world testing yet ...
 
 So... the interface (a) cannot be used on the Linux VM without at least
 one invasive VM modification, due to the requirement of ptes being
 explicitly unmapped via hypercall; and (b) isn't used significantly in
 real life yet.

(c) seems not to work for anything non-trivial.  I've compiled and
tested a xensource 2.6.18 kernel (3.1 testing mercurial tree head,
should be 3.1.2-release), it fails in a similar way.  See attachment.

Want to reproduce?  Here we go:

  * grab xenner 0.8 from http://dl.bytesex.org/releases/xenner/
  * grab a xenified dom0 kernel without blktap driver (either not
compiled or module not loaded).
  * start xend
  * start blkbackd from xenner package (you probably want the -d switch
for debug output, twice for more).
  * run xm block-attach 0 tap:aio:/path/to/some/file xvda r
  * watch it blow up ;)

 I can't help wondering if this is a hint that now is the time to find a
 better API, which doesn't have the requirement (a) that seems to be
 causing such trouble?  Are other PV guests --- *BSD, Solaris --- going
 to have the same problems with their VM layers if they try to implement
 this API?  Upstream Linux pv_ops certainly will, and it would be good if
 we could avoid tying unprivileged guests to ABIs which cannot hope to be
 merged into pv_ops.

And I fear the problems I've run into up to now are only the tip of
the iceberg.  What happens if an application with active grant table
mappings calls fork()?

cheers,
  Gerd
Linux version 2.6.18-xen ([EMAIL PROTECTED]) (gcc version 4.1.2 20070925 (Red 
Hat 4.1.2-33)) #1 SMP Tue Dec 4 18:17:24 CET 2007
BIOS-provided physical RAM map:
 Xen:  - 0adc3000 (usable)
0MB HIGHMEM available.
173MB LOWMEM available.
On node 0 totalpages: 44483
  DMA zone: 44483 pages, LIFO batch:7
DMI 2.3 present.
ACPI: RSDP (v000 OID_00) @ 0x000f0010
ACPI: RSDT (v001 OID_00 RSDT_000 0x30303030  0x0001) @ 0x0bfffbd0
ACPI: FADT (v001 OID_00 FACP_000 0x30303030  0x0001) @ 0x0bfffb20
ACPI: BOOT (v001 OID_00 BOOT_000 0x30303030  0x0001) @ 0x0bfffba0
ACPI: DSDT (v001 INT440 SYSFexxx 0x1001 MSFT 0x010b) @ 0x
ACPI: Vendor INT440 System SYSFexxx Revision 0x1001 has a known ACPI BIOS 
problem.
ACPI: Reason: Does not use _REG to protect EC OpRegions. This is a 
non-recoverable error
ACPI: Disabling ACPI support
Allocating PCI resources starting at 1000 (gap: 0c00:f3fc)
Detected 600.047 MHz processor.
Built 1 zonelists.  Total pages: 44483
Kernel command line: ro root=/dev/zen/rhel5 apm=off vga=0x317 panic=30
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 1024 (order: 10, 4096 bytes)
Xen reported: 600.034 MHz processor.
Console: colour VGA+ 80x50
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
Software IO TLB enabled: 
 Aperture: 2 megabytes
 Kernel range: c0aad000 - c0cad000
 Address size: 24 bits
vmalloc area: cb80-f51fe000, maxmem 2d7fe000
Memory: 155572k/177932k available (1972k kernel code, 14020k reserved, 693k data, 192k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 1502.07 BogoMIPS (lpj=7510358)
Security Framework v1.0.0 initialized
Capability LSM initialized
Mount-cache hash table entries: 512
CPU: After generic identify, caps: 0387d1f1     
 
CPU: After vendor identify, caps: 0387d1f1     
 
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
CPU serial number disabled.
CPU: After all inits, caps: 0383d1f1   0040  
 
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
Freeing SMP alternatives: 12k freed
Brought up 1 CPUs
migration_cost=0
checking if image is initramfs... it is
Freeing initrd memory: 6538k freed
NET: Registered protocol family 16
PCI: Using configuration type 1
Setting up standard PCI resources
ACPI: Interpreter disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI: disabled
xen_mem: Initialising balloon driver.
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI quirk: region 1000-103f claimed by PIIX4 ACPI
PCI quirk: region 1400-140f claimed by PIIX4 SMB
PIIX4 devres C PIO at 0398-0399
Boot video device is :00:09.0
PCI: Using IRQ router PIIX/ICH [8086/7198] at :00:07.0
PCI: Cannot allocate resource 

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-03 Thread Gerd Hoffmann
Derek Murray wrote:
 I take the blame for that one. I added the hook because, if a process
 were to die whilst holding one or more grants, there were no hooks that
 would make it possible to carry out the grant-unmap. All existing hooks
 on either the device or the VMA were called *after* the PTEs were cleared.

Hmm.  What exactly is the issue here?

This is about *userspace* mappings, right?  As far as I can see from a
quick scan of the code, there is an additional kernel-space mapping for
the grants, and the userspace mapping is optional.  I don't see any
problems with the userspace mapping going away without *instant*
notification.  Cleaning up a bit later, perhaps from the
file_ops->release callback, should work ok.

The problem I see with the additional vm_ops callback is that I suspect
you'll have to come up with some *very* good arguments to get it
accepted by the VM (as in virtual memory) folks and merged mainline.

 It gets better, though. The same hook is used in the version of blktap
 in linux-2.6.18-xen (not, as far as I can see, in the sparse tree for
 xen-3.1-testing):

Oh, I'm thinking more in the direction of killing blktap altogether in
favor of a pure userspace implementation on top of gntdev.

cheers,
  Gerd




Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-03 Thread Derek Murray

Gerd Hoffmann wrote:

Derek Murray wrote:

I take the blame for that one. I added the hook because, if a process
were to die whilst holding one or more grants, there were no hooks that
would make it possible to carry out the grant-unmap. All existing hooks
on either the device or the VMA were called *after* the PTEs were cleared.


Hmm.  What exactly is the issue here?

This is about *userspace* mappings, right?  As far as I can see from a
quick scan of the code, there is an additional kernel-space mapping for
the grants, and the userspace mapping is optional.  I don't see any
problems with the userspace mapping going away without *instant*
notification.  Cleaning up a bit later, perhaps from the
file_ops->release callback, should work ok.


If we let Linux zap the page tables before we unmap the grant reference, 
then it is not possible to unmap the grant reference. The 
unmap_grant_ref hypercall ultimately calls destroy_grant_pte_mapping in 
xen/arch/x86/mm.c, which ensures that the PTE does in fact point to the 
granted frame. Note also the comment further up in that file (in 
put_page_from_l1e):


/*
 * Check if this is a mapping that was established via a grant reference.
 * If it was then we should not be here: we require that such mappings are
 * explicitly destroyed via the grant-table interface.
 *
 * The upshot of this is that the guest can end up with active grants that
 * it cannot destroy (because it no longer has a PTE to present to the
 * grant-table interface). This can lead to subtle hard-to-catch bugs,
 * hence a special grant PTE flag can be enabled to catch the bug early.
 *
 * (Note that the undestroyable active grants are not a security hole in
 * Xen. All active grants can safely be cleaned up when the domain dies.)
 */

Effectively, there is a debug option that sets a bit in PTEs that map 
granted pages, and this can be used to force a domain_crash in the event 
that a VM tries to zap the entries normally. The normal behaviour is to 
silently accept the zap operation, and leak granted pages until the 
grantee domain is killed.
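
The ordering constraint described above can be illustrated with a toy model. This is not Xen or Linux code: `toy_pte`, `toy_grant`, `toy_unmap_grant`, `toy_zap_pte` and `toy_grant_leaked` are hypothetical names, and the check in `toy_unmap_grant` only mimics the idea that the unmap path wants to see a PTE that still points at the granted frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these types exist in Xen or Linux. */
struct toy_pte {
    bool     present;
    uint64_t frame;   /* machine frame the PTE points at */
};

struct toy_grant {
    bool     active;  /* grant still counts as mapped */
    uint64_t frame;   /* frame that was granted */
};

/*
 * Mimics the unmap check: the caller must present a PTE that still maps
 * the granted frame. Returns 0 on success, -1 if the PTE is already gone
 * or points somewhere else.
 */
int toy_unmap_grant(struct toy_grant *g, struct toy_pte *pte)
{
    assert(g != NULL && pte != NULL);
    if (!g->active || !pte->present || pte->frame != g->frame)
        return -1;
    pte->present = false;
    g->active = false;
    return 0;
}

/* Mimics the kernel clearing the PTE on its own, e.g. at process exit. */
void toy_zap_pte(struct toy_pte *pte)
{
    assert(pte != NULL);
    pte->present = false;
}

/* True if the grant is leaked: still active, but no PTE left to present. */
bool toy_grant_leaked(const struct toy_grant *g, const struct toy_pte *pte)
{
    assert(g != NULL && pte != NULL);
    return g->active && !pte->present;
}
```

In this model, unmapping first leaves nothing behind, while calling `toy_zap_pte` first makes every later `toy_unmap_grant` fail, which corresponds to the leak that lasts until the domain is killed.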



The problem I see with the additional vm_ops callback is that I suspect
you'll have to come up with some *very* good arguments to get it
accepted by the VM (as in virtual memory) folks and merged mainline.


On this point I completely agree with you! If anyone has any less 
radical suggestions, then I'd be delighted to refactor the gntdev code 
to use them. However, I'm not currently aware of any alternative that 
maintains robustness to process crashes.



It gets better, though. The same hook is used in the version of blktap
in linux-2.6.18-xen (not, as far as I can see, in the sparse tree for
xen-3.1-testing):


Oh, I'm thinking more in the direction of killing blktap altogether in
favor of a pure userspace implementation on top of gntdev.


I think this would represent good progress, though I wonder if there 
would be a performance penalty due to performing the mapping and 
unmapping in user-space (multiple syscalls per mapping versus a single 
hypercall).


Cheers,

Derek Murray.


Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-03 Thread Derek Murray
I take the blame for that one. I added the hook because, if a process 
were to die whilst holding one or more grants, there were no hooks that 
would make it possible to carry out the grant-unmap. All existing hooks 
on either the device or the VMA were called *after* the PTEs were cleared.


It gets better, though. The same hook is used in the version of blktap 
in linux-2.6.18-xen (not, as far as I can see, in the sparse tree for 
xen-3.1-testing):


http://xenbits.xensource.com/linux-2.6.18-xen.hg?file/fd879c0688bf/drivers/xen/blktap/blktap.c

Reverting back to the old (hookless) behaviour would be a retrograde 
step IMHO.


Cheers,

Derek Murray.

Gerd Hoffmann wrote:

Stephen C. Tweedie wrote:

Hi all,

  driver domains


Looked at the gntdev (grant table mappings for user space) driver,
noticed that one is not self-contained.  It needs a hook for page unmapping:

  http://xenbits.xensource.com/xen-3.1-testing.hg?rev/7180d2e61f92
  plus an s/ptep_get_and_clear_full/zap_pte/ fixup a few changesets
  later.

Upstreaming that one could become *uhm* interesting.  Nevertheless the
gntdev functionality is quite useful for writing pure userspace
backend drivers ...

cheers,
  Gerd

___
Xen-devel mailing list
[EMAIL PROTECTED]
http://lists.xensource.com/xen-devel




Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-03 Thread Mark Williamson
  It gets better, though. The same hook is used in the version of blktap
  in linux-2.6.18-xen (not, as far as I can see, in the sparse tree for
  xen-3.1-testing):
 
  Oh, I'm thinking more in the direction of killing blktap altogether in
  favor of a pure userspace implementation on top of gntdev.

 I think this would represent good progress, though I wonder if there
 would be a performance penalty due to performing the mapping and
 unmapping in user-space (multiple syscalls per mapping versus a single
 hypercall).

Maybe a change to the gntdev userspace API to allow batching of mapping 
requests?

I'm not aware of a batched mmap interface, which would seem to be the ideal 
solution; but it should be possible to batch this stuff somehow.  Although it 
seems like some kind of really weird ioctl might be needed :-S to do it 
*without* such a batched interface...

blktap in userspace, if any performance problems can be addressed, would seem 
to be a far nicer way of doing things.  And it's less code to merge 
upstream ;-)

Cheers,
Mark

-- 
Dave: Just a question. What use is a unicyle with no seat?  And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!


RE: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-03 Thread D.G. Murray
Hi Mark, 

 Maybe a change to the gntdev userspace API to allow batching 
 of mapping requests?

Something along the lines of the following?

/**
 * Memory maps one or more grant references from one or more domains to a
 * contiguous local address range. Mappings should be unmapped with
 * xc_gnttab_munmap. Returns NULL on failure.
 *
 * @parm xcg_handle a handle on an open grant table interface
 * @parm count the number of grant references to be mapped
 * @parm domids an array of @count domain IDs by which the corresponding
 *              @refs were granted
 * @parm refs an array of @count grant references to be mapped
 * @parm prot same flag as in mmap()
 */
void *xc_gnttab_map_grant_refs(int xcg_handle,
                               uint32_t count,
                               uint32_t *domids,
                               uint32_t *refs,
                               int prot);

http://xenbits.xensource.com/xen-unstable.hg?file/3057f813da14/tools/libxc/xenctrl.h
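
As a rough sketch of how a caller might prepare a batch for this interface: the `grant_req` record and the `fill_batch` helper below are hypothetical and not part of libxc. They only show the two parallel arrays that the prototype above expects; the actual call is left as a comment because it needs a Xen system to run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-request record, e.g. one segment of a block request. */
struct grant_req {
    uint32_t domid;  /* domain that issued the grant */
    uint32_t ref;    /* grant reference to map */
};

/*
 * Split an array of requests into the parallel domids/refs arrays.
 * Returns the number of entries written, which is min(nreqs, cap).
 */
uint32_t fill_batch(const struct grant_req *reqs, uint32_t nreqs,
                    uint32_t *domids, uint32_t *refs, uint32_t cap)
{
    assert(reqs != NULL && domids != NULL && refs != NULL);
    uint32_t n = nreqs < cap ? nreqs : cap;
    for (uint32_t i = 0; i < n; i++) {
        domids[i] = reqs[i].domid;
        refs[i]   = reqs[i].ref;
    }
    return n;
}

/*
 * With the arrays filled, a single call would map the whole batch:
 *
 *     void *base = xc_gnttab_map_grant_refs(xcg_handle, n,
 *                                           domids, refs, prot);
 *
 * Presumably page i of the returned range then corresponds to the pair
 * (domids[i], refs[i]); that ordering is an assumption here.
 */
```

The point of the shape is that the number of syscalls no longer grows with the number of grant references in a request.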

 blktap in userspace, if any performance problems can be 
 addressed, would seem to be a far nicer way of doing things.  
 And it's less code to merge upstream ;-)

Agreed.

Cheers,

Derek.



Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-03 Thread Mark Williamson
 Hi Mark,

  Maybe a change to the gntdev userspace API to allow batching
  of mapping requests?

 Something along the lines of the following?

Just like that :-D

When you said "multiple syscalls per mapping" I assumed you meant that we'd 
lose the batching you get by doing a multicall.  If it's just a couple of 
syscalls (plus, presumably, a couple of hypercalls) per batch of mappings, my 
gut says it's probably not going to hurt block performance.  My guts have 
been wrong in (many!) ways before of course...

I guess the overhead *could* be reduced even more by just having a magic ioctl 
that did all the mmap-ing stuff in one operation, but that'd probably be 
really gross if it wasn't necessary!  And I doubt it'd make upstream very 
happy...

We'll also be eliminating the overheads involved in having a blktap ring for 
talking to userspace and having to move requests between that ring and the 
real block ring, so there's some definite wins in overheads as well.

Cheers,
Mark

-- 
Dave: Just a question. What use is a unicyle with no seat?  And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!


Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-11-27 Thread Jan Beulich
 It breaks with:

 Intel machine check architecture supported.
 (XEN) traps.c:1734:d0 Domain attempted WRMSR 0404 from :0001 
 to
 :.
 Intel machine check reporting enabled on CPU#0.
 general protection fault:  [#1] SMP
 Modules linked in:
   

Hm.  Looks like Xen is getting upset about dom0 trying to disable
caching.  No, wait: 0x:?  That's strange; I wonder if
it's just misreporting the value, because the code doesn't look like it's
trying to write that.

Either way, the fix is to implement xen_write_cr0, and mask off any bits
that Xen won't want us to set/clear (or if it doesn't allow dom0 to
change cr0, just ignore all updates).

Why do you think that's a CR0 write? The messages clearly indicate an
MSR write, and these writes are clearly visible in intel_p{4,6}_mcheck_init()
and amd_mcheck_init(). The question is why intel_p4_mcheck_init() doesn't
check CPUID bits before trying to touch any registers... (And similarly
amd_mcheck_init() is checking only the MCE bit, not the MCA one.)
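
The kind of check being asked for could look roughly like the following. A minimal sketch only: `mca_usable` is a hypothetical helper, not a kernel function, and it takes the EDX value of CPUID leaf 1 as a plain argument rather than executing CPUID itself. The bit positions (MCE is bit 7, MCA is bit 14) are the architectural ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, EDX feature bits. */
#define CPUID_EDX_MCE (1u << 7)   /* machine check exception */
#define CPUID_EDX_MCA (1u << 14)  /* machine check architecture */

/*
 * Hypothetical guard: only touch the machine check MSRs when both
 * feature bits are advertised. `edx` is the EDX value from CPUID leaf 1.
 */
bool mca_usable(uint32_t edx)
{
    return (edx & CPUID_EDX_MCE) != 0 && (edx & CPUID_EDX_MCA) != 0;
}
```

An init routine guarded this way would return early when a hypervisor hides either bit, instead of issuing WRMSRs that get rejected.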

But then I just noticed that Xen itself doesn't clear the MCE/MCA bits either
in emulate_forced_invalid_op(), apparently under the assumption that PV
guests wouldn't try to make use of this feature.

A simple workaround would be to force mce_disabled to 1 in early Xen
initialization.

Jan




Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-11-27 Thread Jeremy Fitzhardinge
Jan Beulich wrote:
 It breaks with:

 Intel machine check architecture supported.
 (XEN) traps.c:1734:d0 Domain attempted WRMSR 0404 from 
 :0001 to
 :.
 Intel machine check reporting enabled on CPU#0.
 general protection fault:  [#1] SMP
 Modules linked in:
   
   
 Hm.  Looks like Xen is getting upset about dom0 trying to disable
 caching.  No, wait: 0x:?  That's strange; I wonder if
 it's just misreporting the value, because the code doesn't look like it's
 trying to write that.

 Either way, the fix is to implement xen_write_cr0, and mask off any bits
 that Xen won't want us to set/clear (or if it doesn't allow dom0 to
 change cr0, just ignore all updates).
 

 Why do you think that's a CR0 write? 

Well, the oops says EIP is at native_write_cr0+0x0/0x4, and the caller
is prepare_set(), which does:

/*  Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */
cr0 = read_cr0() | X86_CR0_CD;
write_cr0(cr0);
wbinvd();

This is in preparation to setting up the MTRRs, which needs to be all
skipped anyway.
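
Masking off bits on a CR0 write, as suggested for xen_write_cr0, could be sketched like this. `filter_cr0` is a hypothetical helper, and which bits Xen would actually want left alone is not stated in this thread, so the `locked` mask is a parameter. CD and NW are defined only because prepare_set() is the caller in question.

```c
#include <assert.h>
#include <stdint.h>

#define X86_CR0_NW 0x20000000u  /* not write-through */
#define X86_CR0_CD 0x40000000u  /* cache disable */

/*
 * Hypothetical filter for a paravirtualized CR0 write: bits set in
 * `locked` keep their current value no matter what was requested;
 * all other bits take the requested value.
 */
uint32_t filter_cr0(uint32_t cur, uint32_t requested, uint32_t locked)
{
    return (requested & ~locked) | (cur & locked);
}
```

With `locked` set to `X86_CR0_CD | X86_CR0_NW`, the `read_cr0() | X86_CR0_CD` value computed in prepare_set() filters back to the unchanged register contents, so the write becomes a no-op rather than a fault.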

 The messages clearly indicate an
 MSR write, and these writes are clearly visible in intel_p{4,6}_mcheck_init()
 and amd_mcheck_init(). The question is why intel_p4_mcheck_init() doesn't
 check CPUID bits before trying to touch any registers... (And similarly
 amd_mcheck_init() is checking only the MCE bit, not the MCA one.)
   

The oops and backtrace don't suggest it's an MSR write.  Does a crX
write take the same path through the emulator as an MSR write?

 A simple workaround would be to force mce_disabled to 1 in early Xen
 initialization.
   

That's probably necessary too.

J


Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-11-27 Thread Jan Beulich
 The oops and backtrace don't suggest it's an MSR write.  Does a crX

Oh, right, the MSR write is being ignored, not failed.

 write take the same path through the emulator as an MSR write?

No, the two operations take different paths.

Jan




Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-11-26 Thread Juan Quintela
Hi,

Your console works great, but the rest of the patches assume:

arch/x86/boot/compressed/notes-xen.c
arch/x86/xen/early.c

at least.  It looks as if another patch is missing; could you
take a look, please?
Otherwise, I will take a look at what is missing.

It breaks with:

Intel machine check architecture supported.
(XEN) traps.c:1734:d0 Domain attempted WRMSR 0404 from :0001 to
:.
Intel machine check reporting enabled on CPU#0.
general protection fault:  [#1] SMP
Modules linked in:

Pid: 1, comm: swapper Not tainted (2.6.24-rc3-q2 #10)
EIP: 0061:[c0101790] EFLAGS: 00010082 CPU: 0
EIP is at native_write_cr0+0x0/0x4
EAX: c005003b EBX: c03902a0 ECX: ed03f288 EDX: 0005
ESI: c1c10c80 EDI: ed054200 EBP: 0001 ESP: ed027eb8
 DS: 007b ES: 007b FS: 00d8 GS:  SS: e021
Process swapper (pid: 1, ti=ed027000 task=ed03ebb0 task.ti=ed027000)
Stack: c01125e9  c03902a0 c1c10c80 ed054200 c01128c6 c03900a0 0008
   c010e0aa c037b48d  ed00efa0 ed027f24 000a c035215c c01e20a7
   c1c10c80 8008 06f4 00020800 c0143563 ed03ebb0 017fe000 c03902a0
Call Trace:
 [c01125e9] prepare_set+0x20/0x86
 [c01128c6] generic_set_all+0x28/0x34a
 [c010e0aa] identify_cpu+0x525/0x52d
 [c01e20a7] kvasprintf+0x3f/0x48
 [c0143563] trace_hardirqs_off+0x28/0xa1
 [c0111ac6] mtrr_ap_init+0x33/0x5d
 [c0117547] smp_store_cpu_info+0x32/0xb9
 [c0104e78] xen_cpu_up+0x22c/0x3b4
 [c0148fdf] _cpu_up+0xab/0x120
 [c014913e] cpu_up+0x4e/0x61
 [c03d33f8] kernel_init+0x9e/0x2c6
 [c0107017] restore_nocheck+0x12/0x15
 [c03d335a] kernel_init+0x0/0x2c6
 [c03d335a] kernel_init+0x0/0x2c6
 [c0107c7f] kernel_thread_helper+0x7/0x10
 ===
Code: 53 89 cb 83 ec 08 89 14 24 89 da 8b 04 24 89 4c 24 04 89 f9 0f 30 31 c0 5a
 59 5b 5e 5f c3 0f 31 c3 0f 33 c3 0f 06 c3 0f 20 c0 c3 0f 22 c0 c3 0f 20 e0 c3
 31 c0 0f 20 e0 c3 0f 09 c3 0f 01 00 c3
EIP: [c0101790] native_write_cr0+0x0/0x4 SS:ESP e021:ed027eb8
Kernel panic - not syncing: Attempted to kill init!


Later, Juan.


On Nov 22, 2007 12:12 AM, Jeremy Fitzhardinge [EMAIL PROTECTED] wrote:
 Stephen C. Tweedie wrote:
  I've been looking at the next steps to try to get Xen running fully on
  top of pv_ops.  To that end, I've (just) started looking at one of the
  next major jobs --- i686 dom0 on pv_ops.
 

 Great!

  There are still a number of things needing done to reach parity with
  xen-unstable:
 
x86_64 xen on pv_ops
 

 I think once pvops has been unified, Xen support should be fairly
 straightforward.  I wrote most of the existing code with 64-bit in mind,
 so I'm hoping I got it right...

Paravirt framebuffer/keyboard
CPU hotplug
Balloon
 

 I've done some preliminary work on balloon and hotplug.  I think balloon
 should make more use of memory hotplug, but a straight port would be a
 good first step.

kexec
driver domains
 
  but it looks like these can largely proceed in parallel if desired.
 
  My short-term goal with this is simply to come up with a first-pass
  merge of the linux-2.6.18-xen.hg dom0 support into the current
  kernel.org tree's pv_ops support.  No major refactoring in the first
  pass, but absolutely no *-xen.c code copying.
 

 Yes.  #ifdefs are the way to go here.

  I'm just starting this, but at least with the version magic check (see
 

  http://lists.xensource.com/archives/html/xen-devel/2007-11/msg00601.html
 

 I was just about to post a fix for this.

  ) out of the way, an SMP dom0 running pv_ops gets all the way through
  start_kernel() and into rest_init() before dying with an unsupported cr0
  write.  (I'm using direct console hypercalls for printk for now, full
  xencons is not working yet.)
 

 I have some early dom0 patches already, though they're a few months old
 now.  Not much there, but I did do an early console implementation.

  I'm happy to put up a git tree for this once it gets anywhere.  We'd
  need to decide which tree to track for that purpose --- Linus's, or
  perhaps the tglx or mingo x86 merge tree might make more sense.
 

 Yes, I think the x86 tree is where we need to be, since there's a lot of
 activity there.

 I'll attach my dom0 patches for whatever use you can make of them.  They
 definitely won't apply to anything, not least because of the arch merge
 (though it looks like they did get converted by script), but also
 because they're based on some defunct experimental booting-from-bzImage
 patches.  But perhaps there's some useful stuff in there.

 I've also attached my xen-balloon and hotplug patches as-is.  They don't
 work completely, but they should be closer to applying.

 J


