Re: [kvm-devel] [PATCH 3/3] virtio PCI device

2007-11-26 Thread Anthony Liguori

Avi Kivity wrote:

rx and tx are closely related. You rarely have one without the other.

In fact, a tuned implementation should have zero kicks or interrupts 
for bulk transfers. The rx interrupt on the host will process new tx 
descriptors and fill the guest's rx queue; the guest's transmit 
function can also check the receive queue. I don't know if that's 
achievable for Linux guests currently, but we should aim to make it 
possible.


ATM, the net driver does a pretty good job of disabling kicks/interrupts 
unless they are needed.  Checking for rx on tx and vice versa is a good 
idea and could further help there.  I'll give it a try this week.


Another point is that virtio still has a lot of leading zeros in its 
mileage counter. We need to keep things flexible and learn from others 
as much as possible, especially when talking about the ABI.


Yes, after thinking about it over the holiday, I agree that we should at 
least introduce a virtio-pci feature bitmask.  I'm not inclined to 
attempt to define a hypercall ABI or anything like that right now but 
having the feature bitmask will at least make it possible to do such a 
thing in the future.
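
A minimal sketch of how such a feature bitmask might be negotiated, 
assuming the host advertises a 32-bit mask and the guest keeps only the 
bits it also understands.  The bit names and function names below are 
hypothetical and are not taken from the posted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; nothing in the patch defines these. */
#define VIRTIO_PCI_F_HYPERCALL_KICK (UINT32_C(1) << 0)
#define VIRTIO_PCI_F_RESERVED_1     (UINT32_C(1) << 1)

static_assert(VIRTIO_PCI_F_HYPERCALL_KICK != VIRTIO_PCI_F_RESERVED_1,
              "feature bits must be distinct");

/* The negotiated set is whatever both the host and the driver support. */
uint32_t virtio_pci_negotiate(uint32_t host_features, uint32_t driver_features)
{
    return host_features & driver_features;
}

/* True if the given feature bit survived negotiation. */
bool virtio_pci_has_feature(uint32_t negotiated, uint32_t bit)
{
    return (negotiated & bit) != 0;
}
```

An old driver that knows no feature bits would negotiate an empty set 
and keep using plain PIO kicks, which is what makes the bitmask a cheap 
way to leave the door open.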


I'm wary of introducing the notion of hypercalls to this device 
because it makes the device VMM specific.  Maybe we could have the 
device provide an option ROM that was treated as the device "BIOS" 
that we could use for kicking and interrupt acking?  Any idea of how 
that would map to Windows?  Are there real PCI devices that use the 
option ROM space to provide what's essentially firmware?  
Unfortunately, I don't think an option ROM BIOS would map well to 
other architectures.


  


The BIOS wouldn't work even on x86 because it isn't mapped to the 
guest address space (at least not consistently), and doesn't know the 
guest's programming model (16, 32, or 64-bits? segmented or flat?)


Xen uses a hypercall page to abstract these details out. However, I'm 
not proposing that. Simply indicate that we support hypercalls, and 
use some layer below to actually send them. It is the responsibility 
of this layer to detect if hypercalls are present and how to call them.


Hey, I think the best place for it is in paravirt_ops. We can even 
patch the hypercall instruction inline, and the driver doesn't need to 
know about it.


Yes, paravirt_ops is attractive for abstracting the hypercall calling 
mechanism but it's still necessary to figure out how hypercalls would be 
identified.  I think it would be necessary to define a virtio specific 
hypercall space and use the virtio device ID to claim subspaces.


For instance, the hypercall number could be (virtio_devid << 16) | (call 
number).  How that translates into a hypercall would then be part of the 
paravirt_ops abstraction.  In KVM, we may have a single virtio hypercall 
where we pass the virtio hypercall number as one of the arguments or 
something like that.
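
To make the numbering concrete, here is a rough sketch of the encoding.  
It assumes both the device ID and the per-device call number fit in 16 
bits; the function names are made up for illustration, and how the 
resulting number reaches the hypervisor is left to paravirt_ops.

```c
#include <assert.h>
#include <stdint.h>

/* Device ID in the upper 16 bits, per-device call number in the lower 16. */
uint32_t virtio_hypercall_nr(uint32_t virtio_devid, uint32_t call_nr)
{
    /* Assumption: both fields are limited to 16 bits. */
    assert(virtio_devid <= 0xffff && call_nr <= 0xffff);
    return (virtio_devid << 16) | call_nr;
}

/* Recover the device ID that owns this hypercall subspace. */
uint16_t virtio_hypercall_devid(uint32_t nr)
{
    return (uint16_t)(nr >> 16);
}

/* Recover the call number within the device's subspace. */
uint16_t virtio_hypercall_call(uint32_t nr)
{
    return (uint16_t)(nr & 0xffff);
}
```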



Not much of an argument, I know.


wrt. number of queues, 8 queues will consume 32 bytes of pci space 
if all you store is the ring pfn.

You also at least need a num argument which takes you to 48 or 64 
depending on whether you care about strange formatting.  8 queues 
may not be enough either.  Eric and I have discussed whether the 9p 
virtio device should support multiple mounts per-virtio device and 
if so, whether each one should have its own queue.  Any device 
that supports this sort of multiplexing will very quickly start 
using a lot of queues.
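
The arithmetic behind the 32/48/64 figures can be sketched as below.  
The field widths are assumptions inferred from the byte counts (a 32-bit 
ring pfn and a 16-bit num), the struct names are invented, and the 
packed attribute is GCC/Clang-specific.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_QUEUES 8

/* Ring pfn only: 4 bytes per queue. */
struct queue_pfn_only {
    uint32_t pfn;
};

/* pfn plus a 16-bit num, tightly packed: 6 bytes per queue. */
struct queue_packed {
    uint32_t pfn;
    uint16_t num;
} __attribute__((packed));

/* pfn plus num, padded to keep each entry 8-byte sized. */
struct queue_aligned {
    uint32_t pfn;
    uint16_t num;
    uint16_t pad;
};

static_assert(sizeof(struct queue_pfn_only) * NUM_QUEUES == 32, "pfn only");
static_assert(sizeof(struct queue_packed)   * NUM_QUEUES == 48, "packed");
static_assert(sizeof(struct queue_aligned)  * NUM_QUEUES == 64, "aligned");
```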

Make it appear as a pci function?  (though my feeling is that 
multiple mounts should be different devices; we can then hotplug 
mountpoints).



We may run out of PCI slots though :-/
  


Then we can start selling virtio extension chassis.


:-)  Do you know if there is a hard limit on the number of devices on a 
PCI bus?  My concern was that it was limited by something stupid like an 
8-bit identifier.


Regards,

Anthony Liguori

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-11-26 Thread Jeremy Fitzhardinge
Juan Quintela wrote:
> Hi,
>
> your console works great, but the rest of the patches assume:
>
> arch/x86/boot/compressed/notes-xen.c
> arch/x86/xen/early.c
>   

Yes, those are leftovers from a somewhat unsuccessful attempt at getting
ELF-in-bzImage booting working.  I need to go back and make bzImage
booting work properly.

I posted those patches as a source of possibly useful code
snippets/summary of things I've looked at so far, rather than something
that can be directly used.

> at least.  It looks as if another patch is missing; could you
> take a look, please?
> Otherwise, I will take a look at what is missing.
>
> It breaks with:
>
> Intel machine check architecture supported.
> (XEN) traps.c:1734:d0 Domain attempted WRMSR 0404 from :0001 to :.
> Intel machine check reporting enabled on CPU#0.
> general protection fault:  [#1] SMP
> Modules linked in:
>   

Hm.  Looks like Xen is getting upset about dom0 trying to disable
caching.  No, wait: 0x:?  That's strange; I wonder if
it's just misreporting the value, because the code doesn't look like it's
trying to write that.

Either way, the fix is to implement xen_write_cr0, and mask off any bits
that Xen won't want us to set/clear (or if it doesn't allow dom0 to
change cr0, just ignore all updates).
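
Roughly, the masking could look like the sketch below.  The CR0 bit
positions are the architectural ones, but the set of bits the guest is
allowed to change (only TS here) is purely an assumption for
illustration, and xen_filter_cr0 is a made-up helper name.  A real
xen_write_cr0 would still have to hand the filtered value to Xen via
whatever hypercall applies; that part is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural CR0 bits relevant to this sketch. */
#define X86_CR0_TS (UINT32_C(1) << 3)   /* task switched */
#define X86_CR0_CD (UINT32_C(1) << 30)  /* cache disable */

/* Assumption: the hypervisor lets the guest toggle TS and nothing else. */
#define CR0_GUEST_MUTABLE X86_CR0_TS

static_assert((CR0_GUEST_MUTABLE & X86_CR0_CD) == 0,
              "CD is not guest-writable in this sketch");

/*
 * Keep every bit outside the mutable mask at its current value and take
 * only the mutable bits from the requested value.  A mask of 0 means
 * "ignore all updates".
 */
uint32_t xen_filter_cr0(uint32_t current, uint32_t requested,
                        uint32_t mutable_mask)
{
    return (current & ~mutable_mask) | (requested & mutable_mask);
}
```

With this, a request that only adds CD to the current value is silently
dropped, while a request that clears TS goes through.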

J


Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-11-26 Thread Juan Quintela
Hi,

your console works great, but the rest of the patches assume:

arch/x86/boot/compressed/notes-xen.c
arch/x86/xen/early.c

at least.  It looks as if another patch is missing; could you
take a look, please?
Otherwise, I will take a look at what is missing.

It breaks with:

Intel machine check architecture supported.
(XEN) traps.c:1734:d0 Domain attempted WRMSR 0404 from :0001 to :.
Intel machine check reporting enabled on CPU#0.
general protection fault:  [#1] SMP
Modules linked in:

Pid: 1, comm: swapper Not tainted (2.6.24-rc3-q2 #10)
EIP: 0061:[] EFLAGS: 00010082 CPU: 0
EIP is at native_write_cr0+0x0/0x4
EAX: c005003b EBX: c03902a0 ECX: ed03f288 EDX: 0005
ESI: c1c10c80 EDI: ed054200 EBP: 0001 ESP: ed027eb8
 DS: 007b ES: 007b FS: 00d8 GS:  SS: e021
Process swapper (pid: 1, ti=ed027000 task=ed03ebb0 task.ti=ed027000)
Stack: c01125e9  c03902a0 c1c10c80 ed054200 c01128c6 c03900a0 0008
   c010e0aa c037b48d  ed00efa0 ed027f24 000a c035215c c01e20a7
   c1c10c80 8008 06f4 00020800 c0143563 ed03ebb0 017fe000 c03902a0
Call Trace:
 [] prepare_set+0x20/0x86
 [] generic_set_all+0x28/0x34a
 [] identify_cpu+0x525/0x52d
 [] kvasprintf+0x3f/0x48
 [] trace_hardirqs_off+0x28/0xa1
 [] mtrr_ap_init+0x33/0x5d
 [] smp_store_cpu_info+0x32/0xb9
 [] xen_cpu_up+0x22c/0x3b4
 [] _cpu_up+0xab/0x120
 [] cpu_up+0x4e/0x61
 [] kernel_init+0x9e/0x2c6
 [] restore_nocheck+0x12/0x15
 [] kernel_init+0x0/0x2c6
 [] kernel_init+0x0/0x2c6
 [] kernel_thread_helper+0x7/0x10
 ===
Code: 53 89 cb 83 ec 08 89 14 24 89 da 8b 04 24 89 4c 24 04 89 f9 0f 30 31 c0 5a
 59 5b 5e 5f c3 0f 31 c3 0f 33 c3 0f 06 c3 0f 20 c0 c3 <0f> 22 c0 c3 0f 20 e0 c3
 31 c0 0f 20 e0 c3 0f 09 c3 0f 01 00 c3
EIP: [] native_write_cr0+0x0/0x4 SS:ESP e021:ed027eb8
Kernel panic - not syncing: Attempted to kill init!


Later, Juan.


On Nov 22, 2007 12:12 AM, Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
> Stephen C. Tweedie wrote:
> > I've been looking at the next steps to try to get Xen running fully on
> > top of pv_ops.  To that end, I've (just) started looking at one of the
> > next major jobs --- i686 dom0 on pv_ops.
> >
>
> Great!
>
> > There are still a number of things needing done to reach parity with
> > xen-unstable:
> >
> >   x86_64 xen on pv_ops
> >
>
> I think once pvops has been unified, Xen support should be fairly
> straightforward.  I wrote most of the existing code with 64-bit in mind,
> so I'm hoping I got it right...
>
> >   Paravirt framebuffer/keyboard
> >   CPU hotplug
> >   Balloon
> >
>
> I've done some preliminary work on balloon and hotplug.  I think balloon
> should make more use of memory hotplug, but a straight port would be a
> good first step.
>
> >   kexec
> >   driver domains
> >
> > but it looks like these can largely proceed in parallel if desired.
> >
> > My short-term goal with this is simply to come up with a first-pass
> > merge of the linux-2.6.18-xen.hg dom0 support into the current
> > kernel.org tree's pv_ops support.  No major refactoring in the first
> > pass, but absolutely no *-xen.c code copying.
> >
>
> Yes.  #ifdefs are the way to go here.
>
> > I'm just starting this, but at least with the version magic check (see
> >
> >   
> > http://lists.xensource.com/archives/html/xen-devel/2007-11/msg00601.html
> >
>
> I was just about to post a fix for this.
>
> > ) out of the way, an SMP dom0 running pv_ops gets all the way through
> > start_kernel() and into rest_init() before dying with an unsupported cr0
> > write.  (I'm using direct console hypercalls for printk for now, full
> > xencons is not working yet.)
> >
>
> I have some early dom0 patches already, though they're a few months old
> now.  Not much there, but I did do an early console implementation.
>
> > I'm happy to put up a git tree for this once it gets anywhere.  We'd
> > need to decide which tree to track for that purpose --- Linus's, or
> > perhaps the tglx or mingo x86 merge tree might make more sense.
> >
>
> Yes, I think the x86 tree is where we need to be, since there's a lot of
> activity there.
>
> I'll attach my dom0 patches for whatever use you can make of them.  They
> definitely won't apply to anything, not least because of the arch merge
> (though it looks like they did get converted by script), but also
> because they're based on some defunct experimental booting-from-bzImage
> patches.  But perhaps there's some useful stuff in there.
>
> I've also attached my xen-balloon and hotplug patches as-is.  They don't
> work completely, but they should be closer to applying.
>
> J
>
> ___
> Xen-devel mailing list
> [EMAIL PROTECTED]
> http://lists.xensource.com/xen-devel
>
>