Re: [RFC v5 0/5] Add virtio transport for AF_VSOCK
For some reason your mails in this thread only appear in the Gmail web UI and not in the IMAP version of my mailbox (my own and Michael's mails are fine). So I'm replying via the web interface, sorry for the inevitable formatting mess :-/ I've CCd another mailbox in the hope of getting your mails in that IMAP folder instead/as well so I can avoid this next time.

On 12 April 2016 at 14:59, Stefan Hajnoczi wrote:
> > > > One wrinkle I came across, which I'm not sure if it is by design or a
> > > > problem is that I can see this sequence coming from the guest (with
> > > > other activity in between):
> > > >
> > > > 1) OP_SHUTDOWN w/ flags == SHUTDOWN_RX
> > > > 2) OP_SHUTDOWN w/ flags == SHUTDOWN_TX
> > > > 3) OP_SHUTDOWN w/ flags == SHUTDOWN_TX|SHUTDOWN_RX
>
> How did you trigger this sequence? I'd like to reproduce it.

Nothing magic. I've written some logging into my backend and captured the result for a simple backend initiated connection.

In the log "TX" and "RX" indicate the thread doing the processing (with "TX" being the one which processes the guest's TX ring, i.e. data coming from the guest to the host). "<=" indicates a buffer going from guest to host and "=>" is from host to guest.

NB that guest to host replies are queued synchronously by the TX thread onto the RX ring which is why the somewhat odd looking "TX: =>" combination can occur. A host initiated connection also happens from the TX thread in the same way.
The trace is of a simple request response (which both fit in one buffer in each direction), the lines without an "?X:" prefix are my annotations/guesses as to what is going on:

TX: =>SRC:0002.00010002 DST:0003.0948
TX: LEN: TYPE:0001 OP:1=REQUEST
TX: FLAGS: BUF_ALLOC:8000 FWD_CNT:
TX: <=SRC:0003.0948 DST:0002.00010002
TX: LEN: TYPE:0001 OP:2=RESPONSE
TX: FLAGS: BUF_ALLOC:0004 FWD_CNT:

REQUEST + RESPONSE == Channel open successfully

RX: =>SRC:0002.00010002 DST:0003.0948
RX: LEN:005e TYPE:0001 OP:5=RW
RX: FLAGS: BUF_ALLOC:8000 FWD_CNT:

Host sends a request to the guest

TX: <=SRC:0003.0948 DST:0002.00010002
TX: LEN: TYPE:0001 OP:6=CREDIT_UPDATE
TX: FLAGS: BUF_ALLOC:0004 FWD_CNT:005e

Guest replies with a credit update

TX: <=SRC:0003.0948 DST:0002.00010002
TX: LEN:0091 TYPE:0001 OP:5=RW
TX: FLAGS: BUF_ALLOC:0004 FWD_CNT:005e

Guest replies with the answer to the request

RX: =>SRC:0002.00010002 DST:0003.0948
RX: LEN: TYPE:0001 OP:4=SHUTDOWN
RX: FLAGS:0002 BUF_ALLOC:8000 FWD_CNT:0091

Host has sent its only request, so host app must have done shutdown(SHUT_WR) I suppose and host therefore sends SHUTDOWN_TX.

TX: <=SRC:0003.0948 DST:0002.00010002
TX: LEN: TYPE:0001 OP:4=SHUTDOWN
TX: FLAGS:0001 BUF_ALLOC:0004 FWD_CNT:005e

Guest SHUTDOWN_RX. I'm not sure if this is a direct kernel response to the SHUTDOWN_TX or if the application inside the guest saw an EOF when reading the socket and did the corresponding shutdown(SHUT_RD).

TX: <=SRC:0003.0948 DST:0002.00010002
TX: LEN: TYPE:0001 OP:4=SHUTDOWN
TX: FLAGS:0002 BUF_ALLOC:0004 FWD_CNT:005e

Guest SHUTDOWN_TX, I presume that having sent the only response it is going to it then does shutdown(SHUT_WR).

TX: <=SRC:0003.0948 DST:0002.00010002
TX: LEN: TYPE:0001 OP:4=SHUTDOWN
TX: FLAGS:0003 BUF_ALLOC:0004 FWD_CNT:005e

Guest shuts down both directions. Perhaps the guest end is turning shutdown(foo) directly into a vsock message without or-ing in the current state?
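To make the or-ing question concrete, here is a minimal sketch of a sender that keeps the shutdown bits sticky. The flag values are the ones seen in the FLAGS field of the trace (0001 for SHUTDOWN_RX, 0002 for SHUTDOWN_TX); the struct and function names are hypothetical and not taken from the vsock code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as they appear in the FLAGS field of the trace. */
#define SHUTDOWN_RX   0x1u
#define SHUTDOWN_TX   0x2u
#define SHUTDOWN_BOTH (SHUTDOWN_RX | SHUTDOWN_TX)

/* Hypothetical per-connection state kept by the side sending OP_SHUTDOWN. */
struct conn_state {
	uint32_t local_shutdown; /* sticky: bits are only ever added */
};

/*
 * Record a shutdown request and return the flags to place in the
 * OP_SHUTDOWN packet. The packet carries the accumulated state rather
 * than only the bits requested by this particular call.
 */
static uint32_t shutdown_flags_to_send(struct conn_state *c, uint32_t how)
{
	c->local_shutdown |= how & SHUTDOWN_BOTH;
	return c->local_shutdown;
}

/* True once both directions have been shut down. */
static bool fully_shut_down(const struct conn_state *c)
{
	return c->local_shutdown == SHUTDOWN_BOTH;
}
```

With this shape a shutdown(SHUT_RD) followed by a shutdown(SHUT_WR) would put 0001 and then 0003 on the wire, with no intermediate packet carrying 0002 alone.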
> > > I orignally had my backend close things down at #2, however this meant > > > > that when #3 arrived it was for a non-existent socket (or, worse, an > > > > active one if the ports got reused). I checked v5 of the spec > > > > proposal[0] which says: > > > > If these bits are set and there are no more virtqueue buffers > > > > pending the socket is disconnected. > > > > > > > > but I'm not entirely sure if this behaviour contradicts this or not > > > > (the bits have both been set at #2, but not at the same time). > > > > > > > > BTW, how does one tell if there are no more virtqueue buffers pending > > > > or not while processing the op? > > > > > > #2 is odd. The shutdown bits are sticky so they cannot be cleared once > > > set. I would have expected just #1 and #3. The behavior you observe > > > look like a bug. > > > > > > The spec text does not convey the meaning of OP_SHUTDOWN well. > > > OP_SHUTDOWN SHUTDOWN_TX|SHUTDOWN_RX means no further rx/tx is possible > > > for this connection. "there are no more virtqueue buffers pending the > > > socket" really means that this isn't an immediate close from the > > > perspective of the application. If the application still has unread rx > > >
Re: [RFC v5 0/5] Add virtio transport for AF_VSOCK
Somehow Stefan's reply disappeared from my INBOX (although I did see it) so replying here.

On Mon, 2016-04-11 at 15:54 +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 11, 2016 at 11:45:48AM +0100, Stefan Hajnoczi wrote:
> > On Fri, Apr 08, 2016 at 04:35:05PM +0100, Ian Campbell wrote:
> > > On Fri, 2016-04-01 at 15:23 +0100, Stefan Hajnoczi wrote:
> > > > This series is based on Michael Tsirkin's vhost branch (v4.5-rc6).
> > > >
> > > > I'm about to process Claudio Imbrenda's locking fixes for virtio-vsock
> > > > but first I want to share the latest version of the code. Several people
> > > > are playing with vsock now so sharing the latest code should avoid
> > > > duplicate work.
> > > Thanks for this, I've been using it in my project and it mostly seems
> > > fine.
> > >
> > > One wrinkle I came across, which I'm not sure if it is by design or a
> > > problem is that I can see this sequence coming from the guest (with
> > > other activity in between):
> > >
> > > 1) OP_SHUTDOWN w/ flags == SHUTDOWN_RX
> > > 2) OP_SHUTDOWN w/ flags == SHUTDOWN_TX
> > > 3) OP_SHUTDOWN w/ flags == SHUTDOWN_TX|SHUTDOWN_RX
> > >
> > > I originally had my backend close things down at #2, however this meant
> > > that when #3 arrived it was for a non-existent socket (or, worse, an
> > > active one if the ports got reused). I checked v5 of the spec
> > > proposal[0] which says:
> > >     If these bits are set and there are no more virtqueue buffers
> > >     pending the socket is disconnected.
> > >
> > > but I'm not entirely sure if this behaviour contradicts this or not
> > > (the bits have both been set at #2, but not at the same time).
> > >
> > > BTW, how does one tell if there are no more virtqueue buffers pending
> > > or not while processing the op?
> > #2 is odd. The shutdown bits are sticky so they cannot be cleared once
> > set. I would have expected just #1 and #3. The behavior you observe
> > looks like a bug.
> > > > The spec text does not convey the meaning of OP_SHUTDOWN well. > > OP_SHUTDOWN SHUTDOWN_TX|SHUTDOWN_RX means no further rx/tx is possible > > for this connection. "there are no more virtqueue buffers pending the > > socket" really means that this isn't an immediate close from the > > perspective of the application. If the application still has unread rx > > buffers then the socket stays readable until the rx data has been fully > > read. Thanks, distinguishing the local buffer to the application from the vring would make that clearer. Perhaps by not talking about "virtqueue buffers" since they sound like a vring thing. However, as Michael observes I'm not sure that's the whole story. > Yes but you also wrote: > If these bits are set and there are no more virtqueue buffers > pending the socket is disconnected. > > how does remote know that there are no buffers pending and so it's safe > to reuse the same source/destination address now? Indeed this is one of the things I struggled with. e.g. If I send a SHUTDOWN_RX to my peer am I supposed to wait for that buffer to come back (so I know the peer has seen it) and then wait for an entire "cycle" of the TX ring to know there is nothing still in flight? That's some tricky book-keeping. > Maybe destination > should send RST at that point? i.e. upon receipt of SHUTDOWN_RX|SHUTDOWN_TX from the peer you are expected to send a RST. When the peer observes that then they know there is no further data in that connection on the ring? That sounds like it would be helpful. > > > Another thing I noticed, which is really more to do with the generic > > > AF_VSOCK bits than anything to do with your patches is that there is no > > > limitations on which vsock ports a non-privileged user can bind to and > > > relatedly that there is no netns support so e.g. users in unproivileged > > > containers can bind to any vsock port and talk to the host, which might > > > be undesirable. 
For my use for now I just went with the big hammer > > > approach of denying access from anything other than init_net > > > namespace[1] while I consider what the right answer is. > > From the vhost point of view each netns should have its own AF_VSOCK > > namespace. This way two containers could act as "the host" (CID 2) for > > their respective guests. When you say "should" you mean that's the intended design as opposed to what the current code is actually doing, right? Ian. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
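For what it's worth, the RST suggestion above could look something like the following on the receiving side. This is only a sketch of the idea under discussion, not anything the spec proposal or the patches define: the enum, struct and function names are hypothetical, and the flag values are assumed to be 1 for SHUTDOWN_RX and 2 for SHUTDOWN_TX.

```c
#include <stdbool.h>
#include <stdint.h>

#define SHUTDOWN_RX   0x1u
#define SHUTDOWN_TX   0x2u
#define SHUTDOWN_BOTH (SHUTDOWN_RX | SHUTDOWN_TX)

/* What the receiver should queue in response to an incoming OP_SHUTDOWN. */
enum reply { REPLY_NONE, REPLY_RST };

/* Hypothetical receiver-side view of one connection. */
struct peer_conn {
	uint32_t peer_shutdown; /* bits seen from the peer so far */
	bool rst_sent;          /* RST is sent at most once */
};

/*
 * Handle an OP_SHUTDOWN from the peer. An RST is requested only when a
 * single packet carries both bits, so a peer that sends RX, then TX, then
 * RX|TX is not torn down before its final packet arrives.
 */
static enum reply on_peer_shutdown(struct peer_conn *c, uint32_t flags)
{
	flags &= SHUTDOWN_BOTH;
	c->peer_shutdown |= flags;

	if (flags == SHUTDOWN_BOTH && !c->rst_sent) {
		c->rst_sent = true;
		return REPLY_RST;
	}
	return REPLY_NONE;
}
```

The sender of the SHUTDOWN would then keep its own connection state around until the RST comes back, at which point it knows nothing further for that source/destination pair is in flight on the ring.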
Re: [RFC v5 0/5] Add virtio transport for AF_VSOCK
On Fri, 2016-04-01 at 15:23 +0100, Stefan Hajnoczi wrote:
> This series is based on Michael Tsirkin's vhost branch (v4.5-rc6).
>
> I'm about to process Claudio Imbrenda's locking fixes for virtio-vsock but
> first I want to share the latest version of the code. Several people are
> playing with vsock now so sharing the latest code should avoid duplicate work.

Thanks for this, I've been using it in my project and it mostly seems fine.

One wrinkle I came across, which I'm not sure if it is by design or a problem, is that I can see this sequence coming from the guest (with other activity in between):

1) OP_SHUTDOWN w/ flags == SHUTDOWN_RX
2) OP_SHUTDOWN w/ flags == SHUTDOWN_TX
3) OP_SHUTDOWN w/ flags == SHUTDOWN_TX|SHUTDOWN_RX

I originally had my backend close things down at #2, however this meant that when #3 arrived it was for a non-existent socket (or, worse, an active one if the ports got reused). I checked v5 of the spec proposal[0] which says:

    If these bits are set and there are no more virtqueue buffers
    pending the socket is disconnected.

but I'm not entirely sure if this behaviour contradicts this or not (the bits have both been set at #2, but not at the same time).

BTW, how does one tell if there are no more virtqueue buffers pending or not while processing the op?

Another thing I noticed, which is really more to do with the generic AF_VSOCK bits than anything to do with your patches, is that there are no limitations on which vsock ports a non-privileged user can bind to and relatedly that there is no netns support, so e.g. users in unprivileged containers can bind to any vsock port and talk to the host, which might be undesirable. For my use for now I just went with the big hammer approach of denying access from anything other than the init_net namespace[1] while I consider what the right answer is.

Ian.
[0] http://thread.gmane.org/gmane.comp.emulators.virtio.devel/1092

[1]
From 366c9c42afb9bd54f92f72518470c09e46f12e88 Mon Sep 17 00:00:00 2001
From: Ian Campbell <ian.campb...@docker.com>
Date: Mon, 4 Apr 2016 14:50:10 +0100
Subject: [PATCH] VSOCK: Only allow host network namespace to use AF_VSOCK.

The VSOCK addressing schema does not really lend itself to simply creating an
alternative end point address within a namespace.

Signed-off-by: Ian Campbell <ian.campb...@docker.com>
---
 net/vmw_vsock/af_vsock.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 1e5f5ed..cdb3dd3 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1840,6 +1840,9 @@ static const struct proto_ops vsock_stream_ops = {
 static int vsock_create(struct net *net, struct socket *sock,
 			int protocol, int kern)
 {
+	if (!net_eq(net, &init_net))
+		return -EAFNOSUPPORT;
+
 	if (!sock)
 		return -EINVAL;
 
-- 
2.8.0.rc3
Re: [Xen-devel] [RFC] Hypervisor RNG and enumeration
On Thu, 2014-10-30 at 07:45 -0700, Andy Lutomirski wrote:

Xen does not have a continual source of entropy and the only feasible way is for the toolstack to provide each guest with a fixed size pool of random data during guest creation.

Xen could seed a very simple per-guest DRBG at guest startup and then let the rdmsr call read from it.

I think I'm a bit confused by the intended scope of this facility. The original spec said:

    Note that the CommonHV RNG is not intended to replace stronger,
    asynchronous paravirtual random number generator interfaces. It is
    intended primarily for seeding guest RNGs early in boot.

Which to me reads that the guest should be using this facility to seed its own simple DRBG on boot (with some finite amount of seed data from the hv) and then using that until it can switch to something better. Is that not the intention?

I think it's important to nail down the intended scope of this interface, since it has quite an impact on what would be considered a reasonable common design. Post boot I would, as you say, expect most OSes to switch over to something more capable, not continue to rely on this facility for the duration.

Ian.
Re: [PATCH next] xen: Use more current logging styles
On Thu, 2013-06-27 at 21:57 -0700, Joe Perches wrote:
> Instead of mixing printk and pr_<level> forms,
> just use pr_<level>
>
> Miscellaneous changes around these conversions:
>
> Add a missing newline to avoid message interleaving,
> coalesce formats, reflow modified lines to 80 columns.
>
> Signed-off-by: Joe Perches <j...@perches.com>

Acked-by: Ian Campbell <ian.campb...@citrix.com>

> ---
>  drivers/net/xen-netback/netback.c |  7 +++
>  drivers/net/xen-netfront.c        | 28 +---
>  2 files changed, 16 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 130bcb2..64828de 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1890,9 +1890,8 @@ static int __init netback_init(void)
>  		return -ENODEV;
>  
>  	if (fatal_skb_slots < XEN_NETBK_LEGACY_SLOTS_MAX) {
> -		printk(KERN_INFO
> -		       "xen-netback: fatal_skb_slots too small (%d), bump it to XEN_NETBK_LEGACY_SLOTS_MAX (%d)\n",
> -		       fatal_skb_slots, XEN_NETBK_LEGACY_SLOTS_MAX);
> +		pr_info("fatal_skb_slots too small (%d), bump it to XEN_NETBK_LEGACY_SLOTS_MAX (%d)\n",
> +			fatal_skb_slots, XEN_NETBK_LEGACY_SLOTS_MAX);
>  		fatal_skb_slots = XEN_NETBK_LEGACY_SLOTS_MAX;
>  	}
>  
> @@ -1921,7 +1920,7 @@ static int __init netback_init(void)
>  						     "netback/%u", group);
>  
>  		if (IS_ERR(netbk->task)) {
> -			printk(KERN_ALERT "kthread_create() fails at netback\n");
> +			pr_alert("kthread_create() fails at netback\n");
>  			del_timer(&netbk->net_timer);
>  			rc = PTR_ERR(netbk->task);
>  			goto failed_init;
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 76a2236..ff7f111 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -29,6 +29,8 @@
>   * IN THE SOFTWARE.
>   */
>  
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
>  #include <linux/module.h>
>  #include <linux/kernel.h>
>  #include <linux/netdevice.h>
> @@ -385,9 +387,8 @@ static void xennet_tx_buf_gc(struct net_device *dev)
>  			skb = np->tx_skbs[id].skb;
>  			if (unlikely(gnttab_query_foreign_access(
>  				np->grant_tx_ref[id]) != 0)) {
> -				printk(KERN_ALERT "xennet_tx_buf_gc: warning "
> -				       "-- grant still in use by backend "
> -				       "domain.\n");
> +				pr_alert("%s: warning -- grant still in use by backend domain\n",
> +					 __func__);
>  				BUG();
>  			}
>  			gnttab_end_foreign_access_ref(
> @@ -804,14 +805,14 @@ static int xennet_set_skb_gso(struct sk_buff *skb,
>  {
>  	if (!gso->u.gso.size) {
>  		if (net_ratelimit())
> -			printk(KERN_WARNING "GSO size must not be zero.\n");
> +			pr_warn("GSO size must not be zero\n");
>  		return -EINVAL;
>  	}
>  
>  	/* Currently only TCPv4 S.O. is supported. */
>  	if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4) {
>  		if (net_ratelimit())
> -			printk(KERN_WARNING "Bad GSO type %d.\n", gso->u.gso.type);
> +			pr_warn("Bad GSO type %d\n", gso->u.gso.type);
>  		return -EINVAL;
>  	}
>  
> @@ -910,9 +911,8 @@ static int checksum_setup(struct net_device *dev, struct sk_buff *skb)
>  		break;
>  	default:
>  		if (net_ratelimit())
> -			printk(KERN_ERR "Attempting to checksum a non-"
> -			       "TCP/UDP packet, dropping a protocol"
> -			       " %d packet", iph->protocol);
> +			pr_err("Attempting to checksum a non-TCP/UDP packet, dropping a protocol %d packet\n",
> +			       iph->protocol);
>  		goto out;
>  	}
>  
> @@ -1359,14 +1359,14 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
>  	/* A grant for every tx ring slot */
>  	if (gnttab_alloc_grant_references(TX_MAX_TARGET,
>  					  &np->gref_tx_head) < 0) {
> -		printk(KERN_ALERT "netfront can't alloc tx grant refs\n");
> +		pr_alert("can't alloc tx grant refs\n");
>  		err = -ENOMEM;
>  		goto exit_free_stats;
>  	}
>  	/* A grant for every rx ring slot */
>  	if (gnttab_alloc_grant_references(RX_MAX_TARGET,
>  					  &np->gref_rx_head) < 0) {
> -		printk(KERN_ALERT "netfront can't alloc rx grant refs\n");
> +		pr_alert("can't alloc rx grant refs\n");
>  		err = -ENOMEM;
>  		goto exit_free_tx;
>  	}
> @@ -1430,16 +1430,14 @@ static int netfront_probe(struct
Re: [PATCH next] xen: Convert printks to pr_level
On Fri, 2013-06-28 at 03:21 -0700, Joe Perches wrote: Convert printks to pr_level (excludes printk(KERN_DEBUG...) to be more consistent throughout the xen subsystem. Add pr_fmt with KBUILD_MODNAME or xen: KBUILD_MODNAME Coalesce formats and add missing word spaces Add missing newlines Align arguments and reflow to 80 columns Remove DRV_NAME from formats as pr_fmt adds the same content This does change some of the prefixes of these messages but it also does make them more consistent. Signed-off-by: Joe Perches j...@perches.com --- On Fri, 2013-06-28 at 09:02 +0100, Wei Liu wrote: Do you also need to replace other printk occurences in xen-netback directory, say, interface.c and xenbus.c? Well, I don't _need_ to but if you want it I think Wei just mean drivers/net/xen-blkback/{interface.c,xenbus.c} in addition to the netback.c you were patching in your previous patch. this is what I suggest. Wow ;-) drivers/xen/balloon.c | 6 +++-- drivers/xen/cpu_hotplug.c | 6 +++-- drivers/xen/events.c| 23 +- drivers/xen/evtchn.c| 6 +++-- drivers/xen/gntalloc.c | 6 +++-- drivers/xen/gntdev.c| 8 --- drivers/xen/grant-table.c | 17 +++--- drivers/xen/manage.c| 23 +- drivers/xen/mcelog.c| 36 +++-- drivers/xen/pcpu.c | 12 +- drivers/xen/privcmd.c | 4 +++- drivers/xen/swiotlb-xen.c | 12 ++ drivers/xen/tmem.c | 10 drivers/xen/xen-acpi-cpuhotplug.c | 2 ++ drivers/xen/xen-acpi-memhotplug.c | 2 ++ drivers/xen/xen-acpi-pad.c | 2 ++ drivers/xen/xen-acpi-processor.c| 25 ++-- drivers/xen/xen-balloon.c | 6 +++-- drivers/xen/xen-pciback/conf_space_header.c | 16 ++--- drivers/xen/xen-pciback/pci_stub.c | 25 +--- drivers/xen/xen-pciback/pciback_ops.c | 9 +--- drivers/xen/xen-pciback/vpci.c | 10 drivers/xen/xen-pciback/xenbus.c| 8 --- drivers/xen/xen-selfballoon.c | 11 - drivers/xen/xenbus/xenbus_comms.c | 13 ++- drivers/xen/xenbus/xenbus_dev_backend.c | 4 +++- drivers/xen/xenbus/xenbus_dev_frontend.c| 4 +++- drivers/xen/xenbus/xenbus_probe.c | 30 +++- drivers/xen/xenbus/xenbus_probe_backend.c | 8 --- 
drivers/xen/xenbus/xenbus_probe_frontend.c | 35 ++-- drivers/xen/xenbus/xenbus_xs.c | 22 -- drivers/xen/xencomm.c | 2 ++ drivers/xen/xenfs/super.c | 4 +++- include/xen/hvm.h | 4 ++-- 34 files changed, 215 insertions(+), 196 deletions(-) ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
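For anyone following along who hasn't met pr_fmt before, the prefixing described in the changelog works by string literal concatenation at compile time. Below is a userspace sketch of the mechanism only: KBUILD_MODNAME is normally supplied by the kernel build system, so the value here is a stand-in, and pr_to_buf is a hypothetical helper that writes into a buffer instead of the kernel log. It relies on the GNU `##__VA_ARGS__` extension (GCC and Clang), as the kernel's own macros do.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for the module name the kernel build system would provide. */
#define KBUILD_MODNAME "xen_netfront"

/* Same shape as the define added by the patch: prefix every format. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/*
 * Hypothetical userspace stand-in for pr_info() and friends. The prefix
 * and the caller's format are adjacent string literals, so the compiler
 * joins them into a single format string.
 */
#define pr_to_buf(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), ##__VA_ARGS__)
```

A call such as pr_to_buf(buf, sizeof(buf), "Bad GSO type %d\n", 7) expands to snprintf(buf, sizeof(buf), "xen_netfront: " "Bad GSO type %d\n", 7), which is why the hand-written "netfront" and DRV_NAME prefixes can be dropped from the individual messages.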
Re: [Xen-devel] [PATCH] arch/x86/xen: remove depends on CONFIG_EXPERIMENTAL
On Sat, 2013-02-23 at 20:47 +, Stefano Stabellini wrote: On Sat, 23 Feb 2013, Konrad Rzeszutek Wilk wrote: On Sat, Feb 23, 2013 at 09:03:20AM -0800, Kees Cook wrote: On Sat, Feb 23, 2013 at 3:59 AM, Dongsheng Song dongsheng.s...@gmail.com wrote: On Sat, Feb 23, 2013 at 3:29 PM, Kees Cook keesc...@chromium.org wrote: The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any depends on lines in Kconfigs. Signed-off-by: Kees Cook keesc...@chromium.org Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Mukesh Rathor mukesh.rat...@oracle.com Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- arch/x86/xen/Kconfig |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig index 93ff4e1..8cada4c 100644 --- a/arch/x86/xen/Kconfig +++ b/arch/x86/xen/Kconfig @@ -53,7 +53,7 @@ config XEN_DEBUG_FS config XEN_X86_PVH bool Support for running as a PVH guest (EXPERIMENTAL) Why not remove this 'EXPERIMENTAL' too ? It was unclear to me if the feature was actually considered unstable. I can resend with the text removed from the title too, if that's the correct action here? It certainly is unstable right now (which is why it was unstaged from the v3.9 train). I hope that by v3.10 it won't be - at which point this patch (and the EXPERIMENTAL) makes sense. So could you respin it please with the text removed as well - and I will queue it up in the branch that carries the PVH feature? We also have the same flag on Xen ARM, and the reason is that the ABI is not stable yet. As soon as it is (I think soon now), I'll send a patch to remove EXPERIMENTAL from there too. 
In the meantime if the depends EXPERIMENTAL is going away perhaps we should explain the EXPERIMENTAL in the title: 8 From bc22bd0f7b20296c449a05d82be950922042bc92 Mon Sep 17 00:00:00 2001 From: Ian Campbell ian.campb...@citrix.com Date: Thu, 4 Oct 2012 09:12:51 +0100 Subject: [PATCH] arm: xen: explain the EXPERIMENTAL dependency in the Kconfig help Signed-off-by: Ian Campbell ian.campb...@citrix.com Cc: Russell King li...@arm.linux.org.uk Cc: linux-arm-ker...@lists.infradead.org --- arch/arm/Kconfig |8 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 67874b8..ef14873 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1865,6 +1865,14 @@ config XEN help Say Y if you want to run Linux in a Virtual Machine on Xen on ARM. + + This option is EXPERIMENTAL because the hypervisor + interfaces which it uses are not yet considered stable + therefore backwards and forwards compatibility is not yet + guaranteed. + + If unsure, say N. + endmenu menu Boot options -- 1.7.2.5 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
On Fri, 2013-01-04 at 19:11 +, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 04, 2013 at 06:07:51PM +0100, Daniel Kiper wrote:
> > On Fri, Jan 04, 2013 at 02:41:17PM +, Jan Beulich wrote:
> > > On 04.01.13 at 15:22, Daniel Kiper <daniel.ki...@oracle.com> wrote:
> > > > On Wed, Jan 02, 2013 at 11:26:43AM +, Andrew Cooper wrote:
> > > > > /sbin/kexec can load the Xen crash kernel itself by issuing
> > > > > hypercalls using /dev/xen/privcmd. This would remove the need for
> > > > > the dom0 kernel to distinguish between loading a crash kernel for
> > > > > itself and loading a kernel for Xen.
> > > > >
> > > > > Or is this just a silly idea complicating the matter?
> > > >
> > > > This is impossible with current Xen kexec/kdump interface.
> > >
> > > Why?
> >
> > Because current KEXEC_CMD_kexec_load does not load kernel
> > image and other things into Xen memory. It means that it
> > should live somewhere in dom0 Linux kernel memory.
>
> We could have a very simple hypercall which would have:
>
> struct fancy_new_hypercall {
> 	xen_pfn_t payload; // IN

This would have to be XEN_GUEST_HANDLE(something) since userspace cannot figure out what pfns back its memory. In any case since the hypervisor is going to want to copy the data into the crashkernel space a virtual address is convenient to have.

> 	ssize_t len; // IN
> #define DATA (1<<1)
> #define DATA_EOF (1<<2)
> #define DATA_KERNEL (1<<3)
> #define DATA_RAMDISK (1<<4)
> 	unsigned int flags; // IN
> 	unsigned int status; // OUT
> };
>
> which would in a loop just iterate over the payloads and
> let the hypervisor stick it in the crashkernel space.
>
> This is all hand-waving of course. There probably would be a need
> to figure out how much space you have in the reserved Xen's
> 'crashkernel' memory region too.

This is probably a mad idea but it's Monday morning and I'm sleep deprived so I'll throw it out there... What about adding DOMID_KEXEC (similar to DOMID_IO etc)? This would allow dom0 to map the kexec memory space with the usual privcmd mmap hypercalls and build things in it directly.
OK, I suspect this might not be practical for a variety of reasons (lack of a p2m for such domains so no way to find out the list of mfns, dom0 userspace simply doesn't have sufficient context to write sensible things here, etc) but maybe someone has a better head on today... Ian. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
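To put some flesh on the "iterate over the payloads" loop hand-waved above, here is a userspace mock of how a loader might feed an image to such a hypercall in pieces. Everything in it is an assumption for illustration: the flag values, the struct layout (using a virtual address for the payload, as suggested in the reply), the signed status field, and mock_load_hypercall, which stands in for the hypervisor copying into the crashkernel region.

```c
#include <stddef.h>
#include <string.h>

/* Assumed flag values, one bit per meaning. */
#define DATA         (1u << 1)
#define DATA_EOF     (1u << 2)
#define DATA_KERNEL  (1u << 3)
#define DATA_RAMDISK (1u << 4)

/* Hypothetical per-call argument: one piece of the image. */
struct load_chunk {
	const void *payload; /* IN: virtual address of this piece */
	size_t len;          /* IN: number of bytes in this piece */
	unsigned int flags;  /* IN: DATA, DATA_EOF and the image kind */
	int status;          /* OUT: 0 on success, -1 if out of space */
};

/* Stand-in for the reserved 'crashkernel' region owned by the hypervisor. */
struct crash_region {
	unsigned char *base;
	size_t size;
	size_t used;
};

/* Mock of the hypervisor side: append the chunk if there is room. */
static void mock_load_hypercall(struct crash_region *r, struct load_chunk *c)
{
	if (c->len > r->size - r->used) {
		c->status = -1;
		return;
	}
	memcpy(r->base + r->used, c->payload, c->len);
	r->used += c->len;
	c->status = 0;
}

/* Caller side: split the image into chunks and mark the last one. */
static int load_image(struct crash_region *r, const void *image, size_t len,
		      size_t chunk_size, unsigned int kind)
{
	const unsigned char *p = image;

	if (chunk_size == 0)
		return -1;

	while (len > 0) {
		size_t n = len < chunk_size ? len : chunk_size;
		struct load_chunk c = { p, n, DATA | kind, 0 };

		if (n == len)
			c.flags |= DATA_EOF; /* final piece of this image */

		mock_load_hypercall(r, &c);
		if (c.status != 0)
			return -1;

		p += n;
		len -= n;
	}
	return 0;
}
```

The out-of-space status is the hook for the "how much space do you have" question: a real interface would presumably want a way to query the region size up front rather than discovering it by failure.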
Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
On Mon, 2013-01-07 at 10:46 +, Andrew Cooper wrote: Given that /sbin/kexec creates a binary blob in memory, surely the most simple thing is to get it to suitably mlock() the region and give a list of VAs to the hypervisor. More than likely. The DOMID_KEXEC thing was just a radon musing ;-) This way, Xen can properly take care of what it does with information and where. For example, at the moment, allowing dom0 to choose where gets overwritten in the Xen crash area is a recipe for disaster if a crash occurs midway through loading/reloading the crash kernel. That's true. I think there is a double buffering scheme in the current thing and we should preserve that in any new implementation. Ian. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
On Mon, 2013-01-07 at 12:34 +, Daniel Kiper wrote: I think that new kexec hypercall function should mimics kexec syscall. We want to have an interface can be used by non-Linux domains (both dom0 and domU) as well though, so please bear this in mind. Historically we've not always been good at this when the hypercall interface is strongly tied to a particular guest implementation (in some sense this is the problem with the current kexec hypercall). Also what makes for a good syscall interface does not necessarily make for a good hypercall interface. Ian. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
On Fri, 2013-01-04 at 14:22 +, Daniel Kiper wrote: On Wed, Jan 02, 2013 at 11:26:43AM +, Andrew Cooper wrote: On 27/12/12 18:02, Eric W. Biederman wrote: Andrew Cooperandrew.coop...@citrix.com writes: On 27/12/2012 07:53, Eric W. Biederman wrote: The syscall ABI still has the wrong semantics. Aka totally unmaintainable and umergeable. The concept of domU support is also strange. What does domU support even mean, when the dom0 support is loading a kernel to pick up Xen when Xen falls over. There are two requirements pulling at this patch series, but I agree that we need to clarify them. It probably make sense to split them apart a little even. Thinking about this split, there might be a way to simply it even more. /sbin/kexec can load the Xen crash kernel itself by issuing hypercalls using /dev/xen/privcmd. This would remove the need for the dom0 kernel to distinguish between loading a crash kernel for itself and loading a kernel for Xen. Or is this just a silly idea complicating the matter? This is impossible with current Xen kexec/kdump interface. It should be changed to do that. However, I suppose that Xen community would not be interested in such changes. The current HYPERVISOR_kexec interface is pretty fricken bad (it basically hardcodes the Linux Circa-2.6.18 internal interface!). I'd be all for a new HYPERVISOR_kexec (with the old gaining a _compat suffix) which implements something more generic that isn't tied to a particular dom0 kernel implementation (be it differing versions of Linux or e.g. *BSD). If that enables /sbin/kexec to load the kernel directly then so much the better, assuming the /sbin/kexec maintainers are happy with that approach. Ian. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [Xen-devel] [PATCH v3 06/11] x86/xen: Add i386 kexec/kdump implementation
On Fri, 2012-12-28 at 03:16 +, Eric W. Biederman wrote: Hasn't 32bit dom0 been retired? The 32 bit hypervisor has been but 32 bit (PAE) guests (which includes dom0) are still supported on top of a 64 bit hypervisor. There are no plans to remove that support. Ian ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v3 00/11] xen: Initial kexec/kdump implementation
On Thu, 2012-12-27 at 14:18 +, Andrew Cooper wrote: Many cloud customers and service providers want the ability for a VM administrator to be able to load a kdump/kexec kernel within a domain[1]. This allows the VM administrator to take more proactive steps to isolate the cause of a crash, the state of which is most likely discarded while tearing down the domain. The result being that as far as Xen is concerned, the domain is still alive, while the kdump kernel/environment can work its usual magic. I am not aware of any feature like this existing in the past. I have a feeling that some versions of the classic-Xen port supported domU kexec as well. Certainly there was some work on that back in 2005, although I can't see much evidence that that attempt ever went anywhere so maybe I'm imagining things. It's possible that I'm confusing domU kexec support with support for domU kexec in some dom0 kernels. That was/is used to support kexec from a PV bootloader into the real kernel (which looks to the host a lot like a domU kexec would). Ian. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [Xen-devel] [PATCH v2 01/11] kexec: introduce kexec_ops struct
On Fri, 2012-11-23 at 10:37 +, Daniel Kiper wrote: On Fri, Nov 23, 2012 at 09:53:37AM +, Jan Beulich wrote: On 23.11.12 at 02:56, Andrew Cooper andrew.coop...@citrix.com wrote: The crash region (as specified by crashkernel= on the Xen command line) is isolated from dom0. [...] But all of this _could_ be done completely independent of the Dom0 kernel's kexec infrastructure (i.e. fully from user space, invoking the necessary hypercalls through the privcmd driver). No, this is impossible. kexec/kdump image lives in dom0 kernel memory until execution. Are you sure? I could have sworn they lived in the hypervisor owned memory set aside by the crashkernel= parameter as Andy suggested. Ian. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [Xen-devel] [PATCH v2 01/11] kexec: introduce kexec_ops struct
On Fri, 2012-11-23 at 09:56 +, Jan Beulich wrote:
> On 22.11.12 at 18:37, H. Peter Anvin <h...@zytor.com> wrote:
> > I actually talked to Ian Jackson at LCE, and mentioned among other

That was me actually (this happens surprisingly often ;-)).

> > things the bogosity of requiring a PUD page for three-level paging in
> > Linux -- a bogosity which has spread from Xen into native. It's a page
> > wasted for no good reason, since it only contains 32 bytes worth of
> > data, *inherently*. Furthermore, contrary to popular belief, it is
> > *not* a page table per se.
> >
> > Ian told me: I didn't know we did that, and we shouldn't have to. Here
> > we have suffered this overhead for at least six years, ...
>
> Even the Xen kernel only needs the full page when running on a
> 64-bit hypervisor (now that we don't have a 32-bit hypervisor
> anymore, that of course basically means always).

I took an, admittedly very brief, look at it on the plane on the way home and it seems like the requirement for a complete page on the pvops-xen side comes from the !SHARED_KERNEL_PMD stuff (so still a Xen related thing). This requires a struct page for the list_head it contains (see pgd_list_add et al) rather than because of the use of the page as a pgd as such.

> But yes, I too never liked this enforced over-allocation for native
> kernels (and was surprised that it was allowed in at all).

Completely agreed.

I did wonder if just doing something like:

-	pgd = (pgd_t *)__get_free_page(PGALLOC_GFP);
+	if (SHARED_KERNEL_PMD)
+		pgd = some_appropriate_allocation_primitive(sizeof(*pgd));
+	else
+		pgd = (pgd_t *)__get_free_page(PGALLOC_GFP);

to pgd_alloc (+ the equivalent for the error path free case, create helper funcs as desired etc) would be sufficient to remove the over allocation for the native case but haven't had time to properly investigate.

Alternatively push the allocation down into paravirt_pgd_alloc to taste :-/

Ian.
Re: [Xen-devel] [PATCH 09/14] xen: events: Remove redundant check on unsigned variable
On Mon, 2012-11-19 at 03:52 +0000, Tushar Behera wrote:
> On 11/16/2012 10:23 PM, Jeremy Fitzhardinge wrote:
> > To be honest I'd nack this kind of patch. The test is only redundant
> > in the most trivial sense that the compiler can easily optimise
> > away. The point of the test is to make sure that the range is OK
> > even if the type subsequently becomes signed (to hold a -ve error,
> > for example).
> >
> > J
>
> The check is on the function argument which is unsigned, so checking
> '< 0' doesn't make sense. We should force signed check only if the
> argument is of signed type. In any case, even if irq has been assigned
> some error value, that would be caught by the check irq >= nr_irqs.

Jeremy is (I think) arguing that this check is not redundant because someone might change the type of the argument to be signed and until then the compiler can trivially optimise the check away, so what's the harm in it?

I'm somewhat inclined to agree with him.

Ian.
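To make the two positions above concrete, here is a small standalone sketch (not the kernel code under discussion). The names irq_in_range_unsigned and irq_in_range_signed are made up for illustration, and nr_irqs is an arbitrary constant standing in for the kernel variable. It shows why the upper-bound test alone catches a negative error value once it has been converted to unsigned, and why the lower-bound test starts to matter if the type becomes signed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's nr_irqs; the value is arbitrary. */
static const unsigned int nr_irqs = 256;

/*
 * Unsigned argument: a negative error code converted to unsigned becomes
 * a very large value, so the upper-bound test rejects it and a separate
 * "< 0" test can never fire.
 */
static bool irq_in_range_unsigned(unsigned int irq)
{
    return irq < nr_irqs;
}

/*
 * Signed argument: here the ">= 0" test does real work, which is the
 * situation the "keep the check" argument is guarding against.
 */
static bool irq_in_range_signed(int irq)
{
    return irq >= 0 && (unsigned int)irq < nr_irqs;
}
```

Both variants reject -22 (an errno-style value); only the signed one needs the explicit lower bound to do so.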
Re: memory corruption in HYPERVISOR_physdev_op()
On Fri, 2012-09-14 at 14:24 +0300, Dan Carpenter wrote:
> Hi Jeremy,

Jeremy doesn't work on Xen much any more. Adding Konrad and the xen-devel@ list.

> My static analyzer complains about potential memory corruption in
> HYPERVISOR_physdev_op()
>
> arch/x86/include/asm/xen/hypercall.h
>
> static inline int
> HYPERVISOR_physdev_op(int cmd, void *arg)
> {
>         int rc = _hypercall2(int, physdev_op, cmd, arg);
>         if (unlikely(rc == -ENOSYS)) {
>                 struct physdev_op op;
>                 op.cmd = cmd;
>                 memcpy(&op.u, arg, sizeof(op.u));
>                 rc = _hypercall1(int, physdev_op_compat, &op);
>                 memcpy(arg, &op.u, sizeof(op.u));
>
> Some of the arg buffers are not as large as sizeof(op.u) which is
> either 12 or 16 depending on the size of longs in struct physdev_apic.

Nasty!

>         }
>         return rc;
> }
>
> One example of this is in xen_initdom_restore_msi_irqs().
>
> arch/x86/pci/xen.c
>
>         struct physdev_pci_device restore_ext;
>
>         restore_ext.seg = pci_domain_nr(dev->bus);
>         restore_ext.bus = dev->bus->number;
>         restore_ext.devfn = dev->devfn;
>         ret = HYPERVISOR_physdev_op(PHYSDEVOP_restore_msi_ext,
>                                     &restore_ext);
>
> There are only 4 bytes here.
>
>         if (ret == -ENOSYS)
>
> If we hit this condition, we have corrupted some memory.

I can see the memory corruption but how does it relate to ret == -ENOSYS?

>                 pci_seg_supported = false;
>
> regards,
> dan carpenter

-- 
Ian Campbell
Current Noise: Therapy? - Femtex

Riffle West Virginia is so small that the Boy Scout had to double as the town drunk.
Re: potential integer overflow in xenbus_file_write()
On Thu, 2012-09-13 at 19:00 +0300, Dan Carpenter wrote:
> Hi,

Thanks Dan. I'm not sure anyone from Xen-land really monitors virtualization@. Adding xen-devel and Konrad.

> I was reading some code and had a question in xenbus_file_write()
>
> drivers/xen/xenbus/xenbus_dev_frontend.c
>
>         if ((len + u->len) > sizeof(u->u.buffer)) {
>
> Can this addition overflow?

len is a size_t and u->len is an unsigned int, so I expect so.

> Should the test be something like:
>
>         if (len > sizeof(u->u.buffer) ||
>             len + u->len > sizeof(u->u.buffer)) {

I think that would do it.

Ian.

>                 /* On error, dump existing buffer */
>                 u->len = 0;
>                 rc = -EINVAL;
>                 goto out;
>         }
>
>         ret = copy_from_user(u->u.buffer + u->len, ubuf, len);
>
> regards,
> dan carpenter
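For reference, a standalone sketch of the three shapes of this bounds check, outside the kernel. BUF_SIZE and the function names are made up here; BUF_SIZE stands in for sizeof(u->u.buffer). The first function is the original test, the second is the form proposed above, and the third is a subtraction-based variant that is not from the thread but avoids the addition entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer capacity, standing in for sizeof(u->u.buffer). */
#define BUF_SIZE ((size_t)4096)

/* Original shape: the sum can wrap, so a huge len may appear to fit. */
static bool fits_naive(size_t len, unsigned int used)
{
    return !((len + used) > BUF_SIZE);
}

/* Shape proposed in the thread: reject an oversized len before adding. */
static bool fits_checked(size_t len, unsigned int used)
{
    if (len > BUF_SIZE || len + used > BUF_SIZE)
        return false;
    return true;
}

/* Alternative (an assumption of this sketch): compare against the
 * remaining space, so no addition is performed at all. */
static bool fits_subtract(size_t len, unsigned int used)
{
    return used <= BUF_SIZE && len <= BUF_SIZE - used;
}
```

With len equal to SIZE_MAX and 10 bytes already buffered, the naive sum wraps to 9 and passes, while the other two reject it.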
Re: [Xen-devel] memory corruption in HYPERVISOR_physdev_op()
On Mon, 2012-10-15 at 11:48 +0100, Jan Beulich wrote:
> On 15.10.12 at 12:27, Ian Campbell ian.campb...@citrix.com wrote:
> > On Fri, 2012-09-14 at 14:24 +0300, Dan Carpenter wrote:
> > > My static analyzer complains about potential memory corruption in
> > > HYPERVISOR_physdev_op()
> > >
> > > arch/x86/include/asm/xen/hypercall.h
> > >
> > > static inline int
> > > HYPERVISOR_physdev_op(int cmd, void *arg)
> > > {
> > >         int rc = _hypercall2(int, physdev_op, cmd, arg);
> > >         if (unlikely(rc == -ENOSYS)) {
> > >                 struct physdev_op op;
> > >                 op.cmd = cmd;
> > >                 memcpy(&op.u, arg, sizeof(op.u));
> > >                 rc = _hypercall1(int, physdev_op_compat, &op);
> > >                 memcpy(arg, &op.u, sizeof(op.u));
> > >
> > > Some of the arg buffers are not as large as sizeof(op.u) which is
> > > either 12 or 16 depending on the size of longs in struct
> > > physdev_apic.
> >
> > Nasty!
>
> Wasn't it that pv-ops expects Xen 4.0.1 or newer anyway? If so, what
> does this code exist for in the first place (it's framed by
> #if CONFIG_XEN_COMPAT <= 0x030002 in the Xenified kernel)?

I think the 4.0.1 or newer requirement is for dom0 only. I guess physdev op is only used in dom0 though? Or does passthrough need it?

> > >         }
> > >         return rc;
> > > }
> > >
> > > One example of this is in xen_initdom_restore_msi_irqs().
> > >
> > > arch/x86/pci/xen.c
> > >
> > >         struct physdev_pci_device restore_ext;
> > >
> > >         restore_ext.seg = pci_domain_nr(dev->bus);
> > >         restore_ext.bus = dev->bus->number;
> > >         restore_ext.devfn = dev->devfn;
> > >         ret = HYPERVISOR_physdev_op(PHYSDEVOP_restore_msi_ext,
> > >                                     &restore_ext);
> > >
> > > There are only 4 bytes here.
> > >
> > >         if (ret == -ENOSYS)
> > >
> > > If we hit this condition, we have corrupted some memory.
> >
> > I can see the memory corruption but how does it relate to
> > ret == -ENOSYS?
>
> The (supposedly) corrupting code sits inside an
>
>         if (unlikely(rc == -ENOSYS)) {

Ah, for some reason I assumed this was in the eventual caller, even though it was staring me right in the face in the full quote.
> Supposedly because as long as the argument passed to the function is
> in memory accessed by the local CPU only and doesn't overlap with
> storage used for rc (e.g. living in a register), there's no
> corruption possible afaict - the second memcpy() would just copy back
> what the first one obtained from there.
>
> Fixing this other than by removing the broken code would be pretty
> hard I'm afraid (and I tend to leave the code untouched altogether in
> the Xenified tree).

Given that it is compat code the list of subops which needs to be supported in this case is small and finite so a simple lookup table or even switch stmt for the size might be an option.

Ian.
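A rough sketch of what such a per-subop size switch could look like, written as standalone C rather than against the real headers. Everything named here (the OP_* numbers, the argument structs, compat_arg_size, copy_in_arg) is hypothetical and is not the Xen ABI; the point is only that the copy length comes from the sub-op rather than from sizeof the whole union.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sub-op numbers and argument layouts; NOT the real Xen ABI. */
enum { OP_SET_IOPL = 1, OP_APIC_READ = 2 };

struct set_iopl_arg { unsigned int iopl; };
struct apic_arg { unsigned long apic_physbase; unsigned int reg; unsigned int value; };

/* Stand-in for the union inside the compat request structure. */
union compat_payload {
    struct set_iopl_arg set_iopl;
    struct apic_arg apic;
};

/* Size of the caller's argument for a sub-op, or 0 if the compat path
 * does not know the sub-op. */
static size_t compat_arg_size(int cmd)
{
    switch (cmd) {
    case OP_SET_IOPL:  return sizeof(struct set_iopl_arg);
    case OP_APIC_READ: return sizeof(struct apic_arg);
    default:           return 0;
    }
}

/* Copy in only as many bytes as the caller's buffer really holds.
 * Returns 0 on success, -1 for an unknown sub-op. */
static int copy_in_arg(union compat_payload *dst, int cmd, const void *arg)
{
    size_t n = compat_arg_size(cmd);

    if (n == 0)
        return -1;
    memset(dst, 0, sizeof(*dst));
    memcpy(dst, arg, n);
    return 0;
}
```

The copy back to the caller would use the same size, so a 4-byte argument is never read or written past its end.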
Re: [Xen-devel] [PATCH] xen: do not disable netfront in dom0
On Tue, 2012-05-22 at 20:13 +0100, David Miller wrote:
> From: Marek Marczykowski marma...@invisiblethingslab.com
> Date: Sun, 20 May 2012 13:45:10 +0200
>
> > Netfront driver can be also useful in dom0, eg when all NICs are
> > assigned to some domU (aka driver domain). Then using netback in
> > domU and netfront in dom0 is the only way to get network access in
> > dom0.
> >
> > Signed-off-by: Marek Marczykowski marma...@invisiblethingslab.com
>
> Someone please review this and I can merge it in via the 'net' tree
> if it looks OK to XEN folks.

Konrad is Xen folks and has acked it already but FWIW:

Acked-by: Ian Campbell ian.campb...@citrix.com

Ian.

___ Xen-devel mailing list xen-de...@lists.xen.org http://lists.xen.org/xen-devel
Re: [PATCHv2] x86info: dump kvm cpuid's
On Tue, 2012-05-01 at 16:04 +0300, Gleb Natapov wrote:
> > BTW, according to arch/x86/include/asm/kvm_para.h unsurprisingly
> > KVM has a signature too 'KVMKVMKVM'.
> >
> > >         cpu->stepping = eax & 0xf;
> > >         cpu->model = (eax >> 4) & 0xf;
> > >         cpu->family = (eax >> 8) & 0xf;
> > > @@ -29,6 +29,19 @@ void get_cpu_info_basics(struct cpudata *cpu)
> > >         cpuid(cpu->number, 0xC0000000, &maxei, NULL, NULL, NULL);
> > >         cpu->maxei2 = maxei;
> > > +       if (ecx & 0x80000000) {
> > > +               cpuid(cpu->number, 0x40000000, &maxhv, NULL, NULL, NULL);
> > > +               /*
> > > +                * KVM up to linux 3.4 reports 0 as the max hypervisor leaf,
> > > +                * where it really means 0x40000001.
> >
> > This is something where I definitely think you want to check the
> > signature first.
>
> In theory yes, but in practice what will this break?

I've got no idea -- but what's the harm in checking?

Ian.

-- 
Ian Campbell
Current Noise: Hypocrisy - Roswell 47

Angels we have heard on High
Tell us to go out and Buy.
                -- Tom Lehrer
Re: [Xen-devel] [PATCHv2] x86info: dump kvm cpuid's
On Wed, 2012-05-02 at 10:50 +0100, Michael S. Tsirkin wrote:
> On Wed, May 02, 2012 at 10:45:27AM +0100, Ian Campbell wrote:
> > On Tue, 2012-05-01 at 16:04 +0300, Gleb Natapov wrote:
> > > > BTW, according to arch/x86/include/asm/kvm_para.h unsurprisingly
> > > > KVM has a signature too 'KVMKVMKVM'.
> > > >
> > > > >         cpu->stepping = eax & 0xf;
> > > > >         cpu->model = (eax >> 4) & 0xf;
> > > > >         cpu->family = (eax >> 8) & 0xf;
> > > > > @@ -29,6 +29,19 @@ void get_cpu_info_basics(struct cpudata *cpu)
> > > > >         cpuid(cpu->number, 0xC0000000, &maxei, NULL, NULL, NULL);
> > > > >         cpu->maxei2 = maxei;
> > > > > +       if (ecx & 0x80000000) {
> > > > > +               cpuid(cpu->number, 0x40000000, &maxhv, NULL, NULL, NULL);
> > > > > +               /*
> > > > > +                * KVM up to linux 3.4 reports 0 as the max hypervisor leaf,
> > > > > +                * where it really means 0x40000001.
> > > >
> > > > This is something where I definitely think you want to check the
> > > > signature first.
> > >
> > > In theory yes, but in practice what will this break?
> >
> > I've got no idea -- but what's the harm in checking?
> >
> > Ian.
>
> Users can set kvm signature to anything, if they do debugging will be
> a bit harder for them.

Ah, right, someone already mentioned that and I forgot, sorry.

And, just to complete my train of thought, cpuid just returns reserved values for requests for non-existent leaves (rather than #GP for example) so it's safe enough even if you do end up trying to read an eax=0x40000001 when it doesn't exist.

Seems fine to me then.

Ian.

-- 
Ian Campbell
Current Noise: Hypocrisy - Buried

He's like a function -- he returns a value, in the form of his opinion.
It's up to you to cast it into a void or not.
                -- Phil Lapsley
Re: [Xen-devel] x86info: dump kvm cpuid's
On Mon, 2012-04-30 at 10:38 +0100, Michael S. Tsirkin wrote:
> On Mon, Apr 30, 2012 at 11:43:19AM +0300, Gleb Natapov wrote:
> > On Sun, Apr 29, 2012 at 01:10:21PM +0300, Michael S. Tsirkin wrote:
> > > The following makes 'x86info -r' dump kvm cpu ids
> > > (signature+features) when running in a vm.
> > >
> > > On the guest we see the signature and the features:
> > > eax in: 0x40000000, eax = 00000000 ebx = 4b4d564b ecx = 564b4d56 edx = 0000004d
> > > eax in: 0x40000001, eax = 017b ebx = 00000000 ecx = 00000000 edx = 00000000
> > >
> > > On the host it just adds a couple of zero lines:
> > > eax in: 0x40000000, eax = 00000000 ebx = 00000000 ecx = 00000000 edx = 00000000
> > > eax in: 0x40000001, eax = 00000000 ebx = 00000000 ecx = 00000000 edx = 00000000
> >
> > This is too KVM specific.
>
> That's what I have. I scratch my own itch.
>
> > Other hypervisors may use more cpuid leafs.
>
> But not less so no harm's done.
>
> > As far as I see Hyper-V uses 5 and use cpuid.0x40000000.eax as max
> > cpuid leaf available. Haven't checked Xen or VMWare.

Xen does the same, documentation in the Xen public interfaces header: http://xenbits.xen.org/docs/unstable/hypercall/include,public,arch-x86,cpuid.h.html.

If compat mode for another h/v is enabled then those leaves will appear at 0x40000000 and Xen's will be bumped up, so a fully Xen aware set of drivers (or detection routine, etc) should check at 0x100 intervals until 0x40010000 for the appropriate signatures (I realise that the docs are somewhat lacking in this regard, I should cook up a patch).

Ian.
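A minimal sketch of the scan described above: probe candidate base leaves from 0x40000000 in steps of 0x100, stopping before 0x40010000, and compare EBX/ECX/EDX against a 12-byte signature. To keep it runnable anywhere, the CPUID instruction is hidden behind a callback; the cpuid_fn type, find_hv_base and the fake_cpuid stub are all made up for this illustration, and the memcmp over the register array assumes a little-endian layout as on x86.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical callback: runs CPUID for 'leaf' and stores EAX, EBX, ECX,
 * EDX in regs[0..3]. Real code would execute the instruction here. */
typedef void (*cpuid_fn)(uint32_t leaf, uint32_t regs[4]);

#define HV_BASE_FIRST 0x40000000u
#define HV_BASE_LIMIT 0x40010000u
#define HV_BASE_STEP  0x100u

/* Returns the base leaf whose EBX/ECX/EDX spell the 12-byte signature,
 * or 0 if no candidate base matches. */
static uint32_t find_hv_base(cpuid_fn cpuid, const char *sig)
{
    uint32_t base;

    for (base = HV_BASE_FIRST; base < HV_BASE_LIMIT; base += HV_BASE_STEP) {
        uint32_t regs[4];

        cpuid(base, regs);
        /* regs[1..3] are contiguous, so they can be compared as 12 bytes. */
        if (memcmp(&regs[1], sig, 12) == 0)
            return base;
    }
    return 0;
}

/* Stub simulating a guest where another hypervisor's leaves occupy
 * 0x40000000 and Xen's have been bumped up to 0x40000100. The EAX value
 * is invented for the stub. */
static void fake_cpuid(uint32_t leaf, uint32_t regs[4])
{
    memset(regs, 0, 4 * sizeof(regs[0]));
    if (leaf == 0x40000100u) {
        regs[0] = 0x40000102u;
        memcpy(&regs[1], "XenVMMXenVMM", 12);
    }
}
```

With the stub, find_hv_base(fake_cpuid, "XenVMMXenVMM") lands on 0x40000100, while a search for a signature that is not present returns 0.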
Re: [Xen-devel] x86info: dump kvm cpuid's
On Tue, 2012-05-01 at 11:50 +0100, Michael S. Tsirkin wrote:
> On Tue, May 01, 2012 at 11:29:04AM +0100, Ian Campbell wrote:
> > On Mon, 2012-04-30 at 10:38 +0100, Michael S. Tsirkin wrote:
> > > On Mon, Apr 30, 2012 at 11:43:19AM +0300, Gleb Natapov wrote:
> > > > On Sun, Apr 29, 2012 at 01:10:21PM +0300, Michael S. Tsirkin wrote:
> > > > > The following makes 'x86info -r' dump kvm cpu ids
> > > > > (signature+features) when running in a vm.
> > > > >
> > > > > On the guest we see the signature and the features:
> > > > > eax in: 0x40000000, eax = 00000000 ebx = 4b4d564b ecx = 564b4d56 edx = 0000004d
> > > > > eax in: 0x40000001, eax = 017b ebx = 00000000 ecx = 00000000 edx = 00000000
> > > > >
> > > > > On the host it just adds a couple of zero lines:
> > > > > eax in: 0x40000000, eax = 00000000 ebx = 00000000 ecx = 00000000 edx = 00000000
> > > > > eax in: 0x40000001, eax = 00000000 ebx = 00000000 ecx = 00000000 edx = 00000000
> > > >
> > > > This is too KVM specific.
> > >
> > > That's what I have. I scratch my own itch.
> > >
> > > > Other hypervisors may use more cpuid leafs.
> > >
> > > But not less so no harm's done.
> > >
> > > > As far as I see Hyper-V uses 5 and use cpuid.0x40000000.eax as
> > > > max cpuid leaf available. Haven't checked Xen or VMWare.
> >
> > Xen does the same, documentation in the Xen public interfaces header:
> > http://xenbits.xen.org/docs/unstable/hypercall/include,public,arch-x86,cpuid.h.html.
>
> So ack to my patch?

I didn't see the patch, where should I be looking?

> > If compat mode for another h/v is enabled then those leaves will
> > appear at 0x40000000 and Xen's will be bumped up, so a fully Xen
> > aware set of drivers (or detection routine, etc) should check at
> > 0x100 intervals until 0x40010000 for the appropriate signatures (I
> > realise that the docs are somewhat lacking in this regard, I should
> > cook up a patch).
> >
> > Ian.
>
> How does guest know that the data at 0x40000100 makes sense?

http://xenbits.xen.org/docs/unstable/hypercall/include,public,arch-x86,cpuid.h.html

EBX, ECX and EDX contain a signature "XenVMMXenVMM".

I'm fairly certain that hyperv has its own magic number here.

Ian.
Re: [PATCHv2] x86info: dump kvm cpuid's
On Mon, 2012-04-30 at 17:38 +0300, Michael S. Tsirkin wrote: The following makes 'x86info -r' dump hypervisor leaf cpu ids (for kvm this is signature+features) when running in a vm. On the guest we see the signature and the features: eax in: 0x4000, eax = ebx = 4b4d564b ecx = 564b4d56 edx = 004d eax in: 0x4001, eax = 017b ebx = ecx = edx = Hypervisor flag is checked to avoid output changes when not running on a VM. Signed-off-by: Michael S. Tsirkin m...@redhat.com Changes from v1: Make work on non KVM hypervisors (only KVM was tested). Avi Kivity said kvm will in the future report max HV leaf in eax. For now it reports eax = 0 so add a work around for that. --- diff --git a/identify.c b/identify.c index 33f35de..a4a3763 100644 --- a/identify.c +++ b/identify.c @@ -9,8 +9,8 @@ void get_cpu_info_basics(struct cpudata *cpu) { - unsigned int maxi, maxei, vendor, address_bits; - unsigned int eax; + unsigned int maxi, maxei, maxhv, vendor, address_bits; + unsigned int eax, ebx, ecx; cpuid(cpu-number, 0, maxi, vendor, NULL, NULL); maxi = 0x; /* The high-order word is non-zero on some Cyrix CPUs */ @@ -19,7 +19,7 @@ void get_cpu_info_basics(struct cpudata *cpu) return; /* Everything that supports cpuid supports these. */ - cpuid(cpu-number, 1, eax, NULL, NULL, NULL); + cpuid(cpu-number, 1, eax, ebx, ecx, NULL); You probably want to check ebx, ecx, edx for the signatures of the hypervisor's you are willing to support and which you know do something sane with eax? Also it would be something worth reporting in its own right? BTW, according to arch/x86/include/asm/kvm_para.h unsurprisingly KVM has a signature too 'KVMKVMKVM'. 
cpu-stepping = eax 0xf; cpu-model = (eax 4) 0xf; cpu-family = (eax 8) 0xf; @@ -29,6 +29,19 @@ void get_cpu_info_basics(struct cpudata *cpu) cpuid(cpu-number, 0xC000, maxei, NULL, NULL, NULL); cpu-maxei2 = maxei; + if (ecx 0x8000) { + cpuid(cpu-number, 0x4000, maxhv, NULL, NULL, NULL); + /* + * KVM up to linux 3.4 reports 0 as the max hypervisor leaf, + * where it really means 0x4001. This is something where I definitely think you want to check the signature first. Ian. + * Most (all?) hypervisors have at least one CPUID besides + * the vendor ID so assume that. + */ + cpu-maxhv = maxhv ? maxhv : 0x4001; + } else { + /* Suppress hypervisor cpuid unless running on a hypervisor */ + cpu-maxhv = 0; + } cpuid(cpu-number, 0x8008,address_bits, NULL, NULL, NULL); cpu-phyaddr_bits = address_bits 0xFF; diff --git a/x86info.c b/x86info.c index 22c4734..80cae36 100644 --- a/x86info.c +++ b/x86info.c @@ -44,6 +44,10 @@ static void display_detailed_info(struct cpudata *cpu) if (cpu-maxei2 =0xC000) dump_raw_cpuid(cpu-number, 0xC000, cpu-maxei2); + + if (cpu-maxhv = 0x4000) + dump_raw_cpuid(cpu-number, 0x4000, cpu-maxhv); + } if (show_cacheinfo) { diff --git a/x86info.h b/x86info.h index 7d2a455..c4f5d81 100644 --- a/x86info.h +++ b/x86info.h @@ -84,7 +84,7 @@ struct cpudata { unsigned int cachesize_trace; unsigned int phyaddr_bits; unsigned int viraddr_bits; - unsigned int cpuid_level, maxei, maxei2; + unsigned int cpuid_level, maxei, maxei2, maxhv; char name[CPU_NAME_LEN]; enum connector connector; unsigned int flags_ecx; ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization -- Ian Campbell Your own qualities will help prevent your advancement in the world. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [Xen-devel] [PATCH RFC V7 0/12] Paravirtualized ticketlocks
On Thu, 2012-04-19 at 21:12 +0100, Raghavendra K T wrote: From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com This series replaces the existing paravirtualized spinlock mechanism with a paravirtualized ticketlock mechanism. (targeted for 3.5 window) Which tree is this series going through, tip.git I guess? I don't see it there. Ian. Changes in V7: - Reabsed patches to 3.4-rc3 - Added jumplabel split patch (originally from Andrew Jones rebased to 3.4-rc3 - jumplabel changes from Ingo and Jason taken and now using static_key_* instead of static_branch. - using UNINLINE_SPIN_UNLOCK (which was splitted as per suggestion from Linus) - This patch series is rebased on debugfs patch (that sould be already in Xen/linux-next https://lkml.org/lkml/2012/3/23/51) Ticket locks have an inherent problem in a virtualized case, because the vCPUs are scheduled rather than running concurrently (ignoring gang scheduled vCPUs). This can result in catastrophic performance collapses when the vCPU scheduler doesn't schedule the correct next vCPU, and ends up scheduling a vCPU which burns its entire timeslice spinning. (Note that this is not the same problem as lock-holder preemption, which this series also addresses; that's also a problem, but not catastrophic). (See Thomas Friebel's talk Prevent Guests from Spinning Around http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.) Currently we deal with this by having PV spinlocks, which adds a layer of indirection in front of all the spinlock functions, and defining a completely new implementation for Xen (and for other pvops users, but there are none at present). PV ticketlocks keeps the existing ticketlock implemenentation (fastpath) as-is, but adds a couple of pvops for the slow paths: - If a CPU has been waiting for a spinlock for SPIN_THRESHOLD iterations, then call out to the __ticket_lock_spinning() pvop, which allows a backend to block the vCPU rather than spinning. This pvop can set the lock into slowpath state. 
- When releasing a lock, if it is in slowpath state, the call __ticket_unlock_kick() to kick the next vCPU in line awake. If the lock is no longer in contention, it also clears the slowpath flag. The slowpath state is stored in the LSB of the within the lock tail ticket. This has the effect of reducing the max number of CPUs by half (so, a small ticket can deal with 128 CPUs, and large ticket 32768). This series provides a Xen implementation, KVM implementation will be posted in next 2-3 days. Overall, it results in a large reduction in code, it makes the native and virtualized cases closer, and it removes a layer of indirection around all the spinlock functions. The fast path (taking an uncontended lock which isn't in slowpath state) is optimal, identical to the non-paravirtualized case. The inner part of ticket lock code becomes: inc = xadd(lock-tickets, inc); inc.tail = ~TICKET_SLOWPATH_FLAG; if (likely(inc.head == inc.tail)) goto out; for (;;) { unsigned count = SPIN_THRESHOLD; do { if (ACCESS_ONCE(lock-tickets.head) == inc.tail) goto out; cpu_relax(); } while (--count); __ticket_lock_spinning(lock, inc.tail); } out:barrier(); which results in: push %rbp mov%rsp,%rbp mov$0x200,%eax lock xadd %ax,(%rdi) movzbl %ah,%edx cmp%al,%dl jne1f # Slowpath if lock in contention pop%rbp retq ### SLOWPATH START 1: and$-2,%edx movzbl %dl,%esi 2: mov$0x800,%eax jmp4f 3: pause sub$0x1,%eax je 5f 4: movzbl (%rdi),%ecx cmp%cl,%dl jne3b pop%rbp retq 5: callq *__ticket_lock_spinning jmp2b ### SLOWPATH END with CONFIG_PARAVIRT_SPINLOCKS=n, the code has changed slightly, where the fastpath case is straight through (taking the lock without contention), and the spin loop is out of line: push %rbp mov%rsp,%rbp mov$0x100,%eax lock xadd %ax,(%rdi) movzbl %ah,%edx cmp%al,%dl jne1f pop%rbp retq ### SLOWPATH START 1: pause movzbl (%rdi),%eax cmp%dl,%al jne1b pop%rbp retq ### SLOWPATH END The unlock code is complicated by the need to both add to the lock's head and fetch the slowpath flag 
from tail. This version of the patch uses a locked add to do this, followed by a test to see if the slowflag is set. The lock prefix acts as a full memory barrier, so we
Re: [Xen-devel] [PATCH RFC V6 0/11] Paravirtualized ticketlocks
On Mon, 2012-04-16 at 16:44 +0100, Konrad Rzeszutek Wilk wrote: On Sat, Mar 31, 2012 at 09:37:45AM +0530, Srivatsa Vaddagiri wrote: * Thomas Gleixner t...@linutronix.de [2012-03-31 00:07:58]: I know that Peter is going to go berserk on me, but if we are running a paravirt guest then it's simple to provide a mechanism which allows the host (aka hypervisor) to check that in the guest just by looking at some global state. So if a guest exits due to an external event it's easy to inspect the state of that guest and avoid to schedule away when it was interrupted in a spinlock held section. That guest/host shared state needs to be modified to indicate the guest to invoke an exit when the last nested lock has been released. I had attempted something like that long back: http://lkml.org/lkml/2010/6/3/4 The issue is with ticketlocks though. VCPUs could go into a spin w/o a lock being held by anybody. Say VCPUs 1-99 try to grab a lock in that order (on a host with one cpu). VCPU1 wins (after VCPU0 releases it) and releases the lock. VCPU1 is next eligible to take the lock. If that is not scheduled early enough by host, then remaining vcpus would keep spinning (even though lock is technically not held by anybody) w/o making forward progress. In that situation, what we really need is for the guest to hint to host scheduler to schedule VCPU1 early (via yield_to or something similar). The current pv-spinlock patches however does not track which vcpu is spinning at what head of the ticketlock. I suppose we can consider that optimization in future and see how much benefit it provides (over plain yield/sleep the way its done now). Right. I think Jeremy played around with this some time? 5/11 xen/pvticketlock: Xen implementation for PV ticket locks tracks which vcpus are waiting for a lock in cpumask_t waiting_cpus and tracks which lock each is waiting for in per-cpu lock_waiting. This is used in xen_unlock_kick to kick the right CPU. 
There's a loop over only the waiting cpus to figure out who to kick. Do you see any issues if we take in what we have today and address the finer-grained optimization as next step? I think that is the proper course - these patches show that on baremetal we don't incur performance regressions and in virtualization case we benefit greatly. Since these are the basic building blocks of a kernel - taking it slow and just adding this set of patches for v3.5 is a good idea - and then building on top of that for further refinement. - vatsa ___ Xen-devel mailing list xen-de...@lists.xen.org http://lists.xen.org/xen-devel ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [Xen-devel] [PATCH RFC V6 0/11] Paravirtualized ticketlocks
On Fri, 2012-03-30 at 23:07 +0100, Thomas Gleixner wrote:
> So if we need to fiddle with the scheduler and frankly that's the only
> way to get a real gain (the numbers, which are achieved by this
> patches, are not that impressive) then the question arises whether we
> should turn the whole thing around.

It probably doesn't materially affect your core point (which seems valid to me) but it's worth pointing out that the numbers presented in this thread are AFAICT mostly focused on ensuring that the impact of this infrastructure is acceptable on native rather than showing the benefits for virtualized workloads.

Ian.
Re: [Xen-devel] [PATCH 3/4] xen kconfig: add dom0 support help text
On Fri, 2012-01-06 at 09:26 +, Andrew Jones wrote: - Original Message - On Fri, 2012-01-06 at 08:57 +, Andrew Jones wrote: Describe dom0 support in the config menu and supply help text for it. This turns a non-user visible symbol into a user visible one. Previously if Xen was enabled and the other prerequisites were met you would get dom0 support automatically -- do we really want to change that? According to 6b0661a5e6fbf it was a deliberate decision to have it this way. I think it's a necessary evil in order to give users the ability to compile kernels without the support. I know it doesn't make much sense for most users, but... Who actually wants to do this though and why? Do you have a bug report requesting this change? Almost all of the things which dom0 needs (e.g. PCI device management etc) is also required by a domU with passthrough enabled so the savings are really very slight. We are talking less than 1k of code AFAICT, 319 bytes for arch/x86/xen/vga.o and 573 for drivers/xen/xenfs/xenstored.o plus whatever xen_register_gsi (a couple of dozen lines of code) adds to arch/x86/pci/xen.o. grep doesn't show CONFIG_XEN_DOM0 being used anywhere else. What savings do you see in practice from disabling just this symbol? We need to weigh up the size change against the complexity of asking the user yet another question, I'm not convinced the question is worth it on balance. BTW, you forgot a Signed-off-by and the appropriate CCs (please use MAINTAINERS or ./scripts/get-maintainer.pl). Sorry, I'll resend properly. I've added those CC's to this reply too. Ian. Drew Ian. --- arch/x86/xen/Kconfig |7 ++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig index 26c731a..88862d5 100644 --- a/arch/x86/xen/Kconfig +++ b/arch/x86/xen/Kconfig @@ -14,9 +14,14 @@ config XEN Xen hypervisor. 
config XEN_DOM0 - def_bool y + bool Xen Initial Domain (Dom0) support + default y depends on XEN PCI_XEN SWIOTLB_XEN depends on X86_LOCAL_APIC X86_IO_APIC ACPI PCI + help + This allows the kernel to be used for the initial Xen domain, + Domain0. This is a privileged guest that supplies backends + and is used to manage the other Xen domains. # Dummy symbol since people have come to rely on the PRIVILEGED_GUEST # name in tools. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [Xen-devel] [PATCH] xen: remove CONFIG_XEN_DOM0 compile option
On Fri, 2012-01-06 at 16:39 +, Andrew Jones wrote: remove XEN_PRIVILEGED_GUEST as it's just an alias for XEN_DOM0. Hmm, this one is used by tools like update-grub to know when it is ok to create xen+kernel entries so I think it needs to stay, or we at least need to give lengthly warning to distros etc that it is going away. Perhaps you can just patch it out locally? Ian. I compile tested this on a latest pull using an F16 config. The compile succeeded and 'make oldconfig' only removed these two options as expected. CONFIG_XEN_DOM0=y CONFIG_XEN_PRIVILEGED_GUEST=y Signed-off-by: Andrew Jones drjo...@redhat.com --- arch/x86/include/asm/xen/pci.h | 21 + arch/x86/pci/xen.c |6 -- arch/x86/xen/Kconfig | 10 -- arch/x86/xen/Makefile |3 +-- arch/x86/xen/xen-ops.h |7 --- drivers/xen/Kconfig|3 ++- drivers/xen/Makefile |3 +-- drivers/xen/xenfs/Makefile |3 +-- include/xen/xen.h | 11 +++ 9 files changed, 9 insertions(+), 58 deletions(-) diff --git a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h index 968d57d..b423889 100644 --- a/arch/x86/include/asm/xen/pci.h +++ b/arch/x86/include/asm/xen/pci.h @@ -13,30 +13,11 @@ static inline int pci_xen_hvm_init(void) return -1; } #endif -#if defined(CONFIG_XEN_DOM0) + int __init pci_xen_initial_domain(void); int xen_find_device_domain_owner(struct pci_dev *dev); int xen_register_device_domain_owner(struct pci_dev *dev, uint16_t domain); int xen_unregister_device_domain_owner(struct pci_dev *dev); -#else -static inline int __init pci_xen_initial_domain(void) -{ - return -1; -} -static inline int xen_find_device_domain_owner(struct pci_dev *dev) -{ - return -1; -} -static inline int xen_register_device_domain_owner(struct pci_dev *dev, -uint16_t domain) -{ - return -1; -} -static inline int xen_unregister_device_domain_owner(struct pci_dev *dev) -{ - return -1; -} -#endif #if defined(CONFIG_PCI_MSI) #if defined(CONFIG_PCI_XEN) diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c index 492ade8..e298726 100644 --- 
a/arch/x86/pci/xen.c +++ b/arch/x86/pci/xen.c @@ -108,7 +108,6 @@ static int acpi_register_gsi_xen_hvm(struct device *dev, u32 gsi, false /* no mapping of GSI to PIRQ */); } -#ifdef CONFIG_XEN_DOM0 static int xen_register_gsi(u32 gsi, int gsi_override, int triggering, int polarity) { int rc, irq; @@ -143,7 +142,6 @@ static int acpi_register_gsi_xen(struct device *dev, u32 gsi, return xen_register_gsi(gsi, -1 /* no GSI override */, trigger, polarity); } #endif -#endif #if defined(CONFIG_PCI_MSI) #include linux/msi.h @@ -251,7 +249,6 @@ error: return irq; } -#ifdef CONFIG_XEN_DOM0 static bool __read_mostly pci_seg_supported = true; static int xen_initdom_setup_msi_irqs(struct pci_dev *dev, int nvec, int type) @@ -324,7 +321,6 @@ static int xen_initdom_setup_msi_irqs(struct pci_dev *dev, int nvec, int type) out: return ret; } -#endif static void xen_teardown_msi_irqs(struct pci_dev *dev) { @@ -392,7 +388,6 @@ int __init pci_xen_hvm_init(void) return 0; } -#ifdef CONFIG_XEN_DOM0 static __init void xen_setup_acpi_sci(void) { int rc; @@ -539,4 +534,3 @@ int xen_unregister_device_domain_owner(struct pci_dev *dev) return 0; } EXPORT_SYMBOL_GPL(xen_unregister_device_domain_owner); -#endif diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig index 26c731a..3c7e89a 100644 --- a/arch/x86/xen/Kconfig +++ b/arch/x86/xen/Kconfig @@ -13,16 +13,6 @@ config XEN kernel to boot in a paravirtualized environment under the Xen hypervisor. -config XEN_DOM0 - def_bool y - depends on XEN PCI_XEN SWIOTLB_XEN - depends on X86_LOCAL_APIC X86_IO_APIC ACPI PCI - -# Dummy symbol since people have come to rely on the PRIVILEGED_GUEST -# name in tools. 
-config XEN_PRIVILEGED_GUEST - def_bool XEN_DOM0 - config XEN_PVHVM def_bool y depends on XEN PCI X86_LOCAL_APIC diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index add2c2d..b2d4c4b 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -13,12 +13,11 @@ CFLAGS_mmu.o := $(nostackp) obj-y:= enlighten.o setup.o multicalls.o mmu.o irq.o \ time.o xen-asm.o xen-asm_$(BITS).o \ grant-table.o suspend.o platform-pci-unplug.o \ - p2m.o + p2m.o vga.o obj-$(CONFIG_EVENT_TRACING) += trace.o obj-$(CONFIG_SMP)+= smp.o obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= spinlock.o obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o -obj-$(CONFIG_XEN_DOM0)
Re: [PATCH] XEN: xenbus: integer overflow in process_msg()
On Tue, 2012-01-03 at 19:42 +0000, Haogang Chen wrote:
> There is a potential integer overflow in process_msg() that could result
> in cross-domain attack.
>
> 	body = kmalloc(msg->hdr.len + 1, GFP_NOIO | __GFP_HIGH);
>
> When a malicious guest passes 0xffffffff in msg->hdr.len, the subsequent
> call to xb_read() would write to a zero-length buffer.

The other end of this connection is always the xenstore backend daemon
so there is no guest (malicious or otherwise) which can do this. The
xenstore daemon is a trusted component in the system.

However this seems like a reasonable robustness improvement so we should
have it.

> This causes kernel oops in the receiving guest and hangs its xenbus
> kernel thread. The patch returns -EINVAL in that case.
>
> Signed-off-by: Haogang Chen <haogangc...@gmail.com>

Acked-by: Ian Campbell <ian.campb...@citrix.com>

> ---
>  drivers/xen/xenbus/xenbus_xs.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
> index ede860f..e32aefb 100644
> --- a/drivers/xen/xenbus/xenbus_xs.c
> +++ b/drivers/xen/xenbus/xenbus_xs.c
> @@ -801,6 +801,12 @@ static int process_msg(void)
>  		goto out;
>  	}
>
> +	if (msg->hdr.len == UINT_MAX) {
> +		kfree(msg);
> +		err = -EINVAL;
> +		goto out;
> +	}
> +
>  	body = kmalloc(msg->hdr.len + 1, GFP_NOIO | __GFP_HIGH);
>  	if (body == NULL) {
>  		kfree(msg);

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
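The wraparound discussed in this mail can be reproduced in userspace. The following is a minimal sketch, assuming a 32-bit unsigned length field as in the patch; alloc_body() and unchecked_size() are hypothetical names that stand in for the kmalloc() call in process_msg() and are not kernel functions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* What the unchecked expression "len + 1" evaluates to. */
static unsigned int unchecked_size(unsigned int len)
{
	return len + 1;		/* unsigned arithmetic wraps modulo 2^N */
}

/*
 * Hypothetical userspace stand-in for the allocation in process_msg().
 * Returns NULL, and leaves *out_size untouched, when len + 1 would wrap.
 */
static char *alloc_body(unsigned int len, size_t *out_size)
{
	if (len == UINT_MAX)	/* same guard as the patch: reject before adding 1 */
		return NULL;

	*out_size = (size_t)len + 1;	/* room for a trailing NUL */
	return malloc(*out_size);
}
```

With len == UINT_MAX the unchecked expression yields 0, so the allocator would be asked for a zero-length buffer while the subsequent read still uses the original length. The guard rejects that single value before the addition happens.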
[PATCH 0/2] xen: Miscellaneous xenbus cleanups
Just a couple of things I noticed while reviewing Haogang's patch.

Applies on top of my suggested replacement for that patch (in 1325669689.25206.181.ca...@zakaz.uk.xensource.com).

Not extensively tested but I did run it in dom0 and start both an HVM and a PV guest.

Ian.
[PATCH 1/2] xenbus: maximum buffer size is XENSTORE_PAYLOAD_MAX
Use this now that it is defined even though it happens to be == PAGE_SIZE.

The code which takes requests from userspace already validates against the size of this buffer so no further checks are required to ensure that userspace requests comply with the protocol in this respect.

Signed-off-by: Ian Campbell <ian.campb...@citrix.com>
Cc: Haogang Chen <haogangc...@gmail.com>
Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Cc: Jeremy Fitzhardinge <jer...@goop.org>
Cc: xen-de...@lists.xensource.com
Cc: virtualization@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
---
 drivers/xen/xenbus/xenbus_dev_frontend.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_dev_frontend.c b/drivers/xen/xenbus/xenbus_dev_frontend.c
index fb30cff..1fe4324 100644
--- a/drivers/xen/xenbus/xenbus_dev_frontend.c
+++ b/drivers/xen/xenbus/xenbus_dev_frontend.c
@@ -104,7 +104,7 @@ struct xenbus_file_priv {
 	unsigned int len;
 	union {
 		struct xsd_sockmsg msg;
-		char buffer[PAGE_SIZE];
+		char buffer[XENSTORE_PAYLOAD_MAX];
 	} u;
 
 	/* Response queue. */
-- 
1.7.2.5
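The commit message above relies on the write path checking incoming data against the size of the buffer. A rough sketch of that kind of check follows; struct file_priv, append_request() and PAYLOAD_MAX are hypothetical stand-ins rather than the kernel's code, and the value 4096 is assumed for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define PAYLOAD_MAX 4096	/* assumed stand-in for XENSTORE_PAYLOAD_MAX */

struct file_priv {
	size_t len;			/* bytes accumulated so far */
	char buffer[PAYLOAD_MAX];	/* sized by the protocol limit */
};

/* Append a chunk of a request, refusing anything that would not fit. */
static int append_request(struct file_priv *u, const void *data, size_t len)
{
	/* u->len never exceeds sizeof(u->buffer), so this cannot wrap. */
	if (len > sizeof(u->buffer) - u->len)
		return -EINVAL;

	memcpy(u->buffer + u->len, data, len);
	u->len += len;
	return 0;
}
```

Because the check is written against sizeof(u->buffer), changing the array bound from one constant to another changes the enforced limit with it, which is why no additional check is needed.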
[PATCH 2/2] xen/xenbus: don't reimplement kvasprintf via a fixed size buffer
Signed-off-by: Ian Campbell <ian.campb...@citrix.com>
Cc: Haogang Chen <haogangc...@gmail.com>
Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Cc: Jeremy Fitzhardinge <jer...@goop.org>
Cc: xen-de...@lists.xensource.com
Cc: virtualization@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
---
 drivers/xen/xenbus/xenbus_xs.c |   17 +++++++----------
 1 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
index 6f0121e..226d1ac 100644
--- a/drivers/xen/xenbus/xenbus_xs.c
+++ b/drivers/xen/xenbus/xenbus_xs.c
@@ -532,21 +532,18 @@ int xenbus_printf(struct xenbus_transaction t,
 {
 	va_list ap;
 	int ret;
-#define PRINTF_BUFFER_SIZE 4096
-	char *printf_buffer;
-
-	printf_buffer = kmalloc(PRINTF_BUFFER_SIZE, GFP_NOIO | __GFP_HIGH);
-	if (printf_buffer == NULL)
-		return -ENOMEM;
+	char *buf;
 
 	va_start(ap, fmt);
-	ret = vsnprintf(printf_buffer, PRINTF_BUFFER_SIZE, fmt, ap);
+	buf = kvasprintf(GFP_NOIO | __GFP_HIGH, fmt, ap);
 	va_end(ap);
 
-	BUG_ON(ret > PRINTF_BUFFER_SIZE-1);
-	ret = xenbus_write(t, dir, node, printf_buffer);
+	if (!buf)
+		return -ENOMEM;
+
+	ret = xenbus_write(t, dir, node, buf);
 
-	kfree(printf_buffer);
+	kfree(buf);
 
 	return ret;
 }
-- 
1.7.2.5
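The idea behind replacing the fixed 4096-byte buffer can be sketched in userspace C: measure the formatted length first, then allocate exactly that much. format_alloc() below is a hypothetical helper written for this illustration; it is an analogue of what kvasprintf() provides in the kernel, not its implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format into a freshly allocated buffer of exactly the right size. */
static char *format_alloc(const char *fmt, ...)
{
	va_list ap, aq;
	int len;
	char *buf;

	va_start(ap, fmt);
	va_copy(aq, ap);			/* the list is consumed twice */
	len = vsnprintf(NULL, 0, fmt, aq);	/* first pass: measure only */
	va_end(aq);
	if (len < 0) {
		va_end(ap);
		return NULL;
	}

	buf = malloc((size_t)len + 1);		/* +1 for the terminating NUL */
	if (buf)
		vsnprintf(buf, (size_t)len + 1, fmt, ap);
	va_end(ap);
	return buf;				/* caller frees; NULL on failure */
}
```

With this shape there is no arbitrary size limit to assert on, so the failure mode is an ordinary allocation failure rather than a BUG_ON().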
Re: [PATCH v3 REPOST] xen-netfront: delay gARP until backend switches to Connected
On Fri, 2011-12-09 at 18:45 +0000, David Miller wrote:
> From: Laszlo Ersek <ler...@redhat.com>
> Date: Fri, 9 Dec 2011 12:38:58 +0100
>
> > These two together provide complete ordering. Sub-condition (1) is
> > satisfied by pvops commit 43223efd9bfd.
>
> I don't see this commit in Linus's tree,

The referenced commit is in
git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git#xen/next-2.6.32
which some people call the pvops tree, but there's no reason to expect someone outside the Xen world to know that...

A better reference would have been 6b0b80ca7165 in
git://xenbits.xen.org/people/ianc/linux-2.6.git#upstream/dom0/backend/netback-history
which is the precise branch that was flattened to make f942dc2552b8, the upstream commit that added netback, so this change is already in upstream.

Ian.
Re: [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
On Wed, 2011-11-30 at 18:32 +0000, Stefano Stabellini wrote:
> On Wed, 30 Nov 2011, Arnd Bergmann wrote:
> > KVM and Xen at least both fall into the single-return-value category, so we should be able to agree on a calling convention. KVM does not have an hcall API on ARM yet, and I see no reason not to use the same implementation that you have in the Xen guest.
> >
> > Stefano, can you split out the generic parts of your asm/xen/hypercall.h file into a common asm/hypercall.h and submit it for review to the arm kernel list?
>
> Sure, I can do that. Usually the hypercall calling convention is very hypervisor specific, but if it turns out that we have the same requirements I'm happy to design a common interface.

I expect the only real decision to be made is hypercall page vs. raw hvc instruction.

The page was useful on x86 where there is a variety of instructions which could be used (at least for PV there was sysenter/syscall/int, I think the vmcall instruction differs between AMD and Intel also) and gives some additional flexibility. It's hard to predict but I don't think I'd expect that to be necessary on ARM.

Another reason for having a hypercall page instead of a raw instruction might be wanting to support 32 bit guests (from ~today) on a 64 bit hypervisor in the future and perhaps needing to do some shimming/arg translation. It would be better to aim for having the interface just be 32/64 agnostic but mistakes do happen.

Ian.
Re: [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
On Wed, 2011-11-30 at 18:15 +0000, Arnd Bergmann wrote:
> On Wednesday 30 November 2011, Ian Campbell wrote:
> > On Wed, 2011-11-30 at 14:32 +0000, Arnd Bergmann wrote:
> > > What I suggested to the KVM developers is to start out with the vexpress platform, but then generalize it to the point where it fits your needs. All hardware that one expects a guest to have (GIC, timer, ...) will still show up in the same location as on a real vexpress, while anything that makes no sense or is better paravirtualized (LCD, storage, ...) just becomes optional and has to be described in the device tree if it's actually there.
> >
> > That's along the lines of what I was thinking as well.
> >
> > The DT contains the address of GIC, timer etc as well right? So at least in principle we needn't provide e.g. the GIC at the same address as any real platform but in practice I expect we will.
>
> Yes.
>
> > In principle we could also offer the user options as to which particular platform a guest looks like.
>
> At least when using a qemu based simulation. Most platforms have some characteristics that are not meaningful in a classic virtualization scenario, but it would certainly be helpful to use the virtualization extensions to run a kernel that was built for a particular platform faster than with pure qemu, when you want to test that kernel image.
>
> It has been suggested in the past that it would be nice to run the guest kernel built for the same platform as the host kernel by default, but I think it would be much better to have just one platform that we end up using for guests on any host platform, unless there is a strong reason to do otherwise.

Yes, I agree, certainly that is what we were planning to target in the first instance. Doing this means that we can get away with minimal emulation of actual hardware, relying instead on PV drivers or hardware virtualisation features.

Supporting specific board platforms as guests would be nice to have eventually. We would need to do more emulation (e.g. running qemu as a device model) for that case.

> There is also ongoing restructuring in the ARM Linux kernel to allow running the same kernel binary on multiple platforms. While there is still a lot of work to be done, you should assume that we will finish it before you see lots of users in production, there is no need to plan for the current one-kernel-per-board case.

We were absolutely banking on targeting the results of this work, so that's good ;-)

Ian.
Re: [Android-virt] [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
On Thu, 2011-12-01 at 15:10 +0000, Catalin Marinas wrote:
> On Thu, Dec 01, 2011 at 10:26:37AM +0000, Ian Campbell wrote:
> > On Wed, 2011-11-30 at 18:32 +0000, Stefano Stabellini wrote:
> > > On Wed, 30 Nov 2011, Arnd Bergmann wrote:
> > > > KVM and Xen at least both fall into the single-return-value category, so we should be able to agree on a calling convention. KVM does not have an hcall API on ARM yet, and I see no reason not to use the same implementation that you have in the Xen guest.
> > > >
> > > > Stefano, can you split out the generic parts of your asm/xen/hypercall.h file into a common asm/hypercall.h and submit it for review to the arm kernel list?
> > >
> > > Sure, I can do that. Usually the hypercall calling convention is very hypervisor specific, but if it turns out that we have the same requirements I'm happy to design a common interface.
> >
> > I expect the only real decision to be made is hypercall page vs. raw hvc instruction.
> >
> > The page was useful on x86 where there is a variety of instructions which could be used (at least for PV there was sysenter/syscall/int, I think the vmcall instruction differs between AMD and Intel also) and gives some additional flexibility. It's hard to predict but I don't think I'd expect that to be necessary on ARM.
> >
> > Another reason for having a hypercall page instead of a raw instruction might be wanting to support 32 bit guests (from ~today) on a 64 bit hypervisor in the future and perhaps needing to do some shimming/arg translation. It would be better to aim for having the interface just be 32/64 agnostic but mistakes do happen.
>
> Given the way register banking is done on AArch64, issuing an HVC on a 32-bit guest OS doesn't require translation on a 64-bit hypervisor.

The issue I was thinking about was struct packing for arguments passed as pointers etc rather than the argument registers themselves. Since the preference appears to be for raw hvc we should just be careful that they are agnostic in these.

Ian.

> We have a similar implementation at the SVC level (for 32-bit user apps on a 64-bit kernel), the only modification was where a 32-bit SVC takes a 64-bit parameter in two separate 32-bit registers, so packing needs to be done in a syscall wrapper.
>
> I'm not closely involved with any of the Xen or KVM work but I would vote for using HVC rather than a hypercall page.
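The two concerns raised in this exchange, struct layout for arguments passed by pointer and 64-bit values split across two 32-bit registers, can be illustrated with a small C sketch. The struct and function names below are hypothetical and are not taken from the Xen or Linux ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout depends on the ABI: pointer and long are 4 bytes on a 32-bit
 * guest and 8 bytes on a 64-bit one, so the two sides would disagree. */
struct hcall_arg_fragile {
	void *buf;
	unsigned long len;
};

/* Same information with fixed-width fields and explicit padding, so a
 * 32-bit guest and a 64-bit hypervisor see an identical layout. */
struct hcall_arg_agnostic {
	uint64_t buf;	/* guest address, always carried as 64 bits */
	uint32_t len;
	uint32_t pad;	/* explicit, so no compiler-chosen tail padding */
};

_Static_assert(sizeof(struct hcall_arg_agnostic) == 16,
	       "layout must not depend on the ABI");
_Static_assert(offsetof(struct hcall_arg_agnostic, len) == 8,
	       "len must sit at a fixed offset");

/* The wrapper-side packing mentioned for 32-bit callers: a 64-bit
 * parameter arrives as two 32-bit register values. */
static uint64_t join_halves(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}
```

The static assertions make the layout assumption fail at build time on any ABI where it does not hold, instead of silently corrupting arguments at run time.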
Re: [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
On Wed, 2011-11-30 at 13:03 +0000, Arnd Bergmann wrote:
> On Wednesday 30 November 2011, Stefano Stabellini wrote:
> > On Tue, 29 Nov 2011, Arnd Bergmann wrote:
> > > Do you have a pointer to the kernel sources for the Linux guest?
> >
> > We have very few changes to the Linux kernel at the moment (only 3 commits!), just enough to be able to issue hypercalls and start a PV console.
> >
> > A git branch is available here (not ready for submission):
> >
> > git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git arm
>
> Ok, interesting. There really isn't much of the platform support that I was expecting there. I finally found the information I was looking for in the xen construct_dom0() function:
>
> 	regs->r0 = 0; /* SBZ */
> 	regs->r1 = 2272; /* Machine NR: Versatile Express */
> 	regs->r2 = 0xc100; /* ATAGS */
>
> What this means is that you are emulating the current ARM/Keil reference board, at least to the degree that is necessary to get the guest started.
>
> This is the same choice people have made for KVM, but it's not necessarily the best option in the long run. In particular, this board has a lot of hardware that you claim to have by putting the machine number there, when you don't really want to emulate it.

This code is actually setting up dom0 which (for the most part) sees the real hardware.

The hardcoding of the platform is just a short term hack.

> Pawel Moll is working on a variant of the vexpress code that uses the flattened device tree to describe the present hardware [1], and I think that would be a much better target for an official release. Ideally, the hypervisor should provide the device tree binary (dtb) to the guest OS describing the hardware that is actually there.

Agreed. Our intention was to use DT so this fits perfectly with our plans.

For dom0 we would expose a (possibly filtered) version of the DT given to us by the firmware (e.g. we might hide a serial port to reserve it for Xen's use, we'd likely fiddle with the memory map etc).

For domU the DT would presumably be constructed by the toolstack (in dom0 userspace) as appropriate for the guest configuration. I guess this needn't correspond to any particular real hardware platform.

> This would also be the place where you tell the guest that it should look for PV devices. I'm not familiar with how Xen announces PV devices to the guest on other architectures, but you have the choice between providing a full binding, i.e. a formal specification in device tree format for the guest to detect PV devices in the same way as physical or emulated devices, or just providing a single place in the device tree in which the guest detects the presence of a xen device bus and then uses hcalls to find the devices on that bus.

On x86 there is an emulated PCI device which serves as the hooking point for the PV drivers. For ARM I don't think it would be unreasonable to have a DT entry instead. I think it would be fine to just represent the root of the xenbus and further discovery would occur using the normal xenbus mechanisms (so not a full binding). AIUI for buses which are enumerable this is the preferred DT scheme to use.

> Another topic is the question whether there are any hcalls that we should try to standardize before we get another architecture with multiple conflicting hcall APIs as we have on x86 and powerpc.

The hcall API we are currently targeting is the existing Xen API (at least the generic parts of it). These generally deal with fairly Xen specific concepts like grant tables etc.

Ian.

> Arnd
>
> [1] http://www.spinics.net/lists/arm-kernel/msg149604.html
>
> ___
> Xen-devel mailing list
> xen-de...@lists.xensource.com
> http://lists.xensource.com/xen-devel
Re: [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
On Wed, 2011-11-30 at 14:32 +0000, Arnd Bergmann wrote:
> On Wednesday 30 November 2011, Ian Campbell wrote:
> > For domU the DT would presumably be constructed by the toolstack (in dom0 userspace) as appropriate for the guest configuration. I guess this needn't correspond to any particular real hardware platform.
>
> Correct, but it needs to correspond to some platform that is supported by the guest OS, which leaves the choice between emulating a real hardware platform, adding a completely new platform specifically for virtual machines, or something in between the two.
>
> What I suggested to the KVM developers is to start out with the vexpress platform, but then generalize it to the point where it fits your needs. All hardware that one expects a guest to have (GIC, timer, ...) will still show up in the same location as on a real vexpress, while anything that makes no sense or is better paravirtualized (LCD, storage, ...) just becomes optional and has to be described in the device tree if it's actually there.

That's along the lines of what I was thinking as well.

The DT contains the address of GIC, timer etc as well right? So at least in principle we needn't provide e.g. the GIC at the same address as any real platform but in practice I expect we will.

In principle we could also offer the user options as to which particular platform a guest looks like.

> > > This would also be the place where you tell the guest that it should look for PV devices. I'm not familiar with how Xen announces PV devices to the guest on other architectures, but you have the choice between providing a full binding, i.e. a formal specification in device tree format for the guest to detect PV devices in the same way as physical or emulated devices, or just providing a single place in the device tree in which the guest detects the presence of a xen device bus and then uses hcalls to find the devices on that bus.
> >
> > On x86 there is an emulated PCI device which serves as the hooking point for the PV drivers. For ARM I don't think it would be unreasonable to have a DT entry instead. I think it would be fine to just represent the root of the xenbus and further discovery would occur using the normal xenbus mechanisms (so not a full binding). AIUI for buses which are enumerable this is the preferred DT scheme to use.
>
> In general that is the case, yes. One could argue that any software protocol between Xen and the guest is as good as any other, so it makes sense to use the device tree to describe all devices here. The counterargument to that is that Linux and other OSs already support Xenbus, so there is no need to come up with a new binding.

Right.

> I don't care much either way, but I think it would be good to use similar solutions across all hypervisors. The two options that I've seen discussed for KVM were to use either a virtual PCI bus with individual virtio-pci devices as on the PC, or to use the new virtio-mmio driver and individually put virtio devices into the device tree.
>
> > > Another topic is the question whether there are any hcalls that we should try to standardize before we get another architecture with multiple conflicting hcall APIs as we have on x86 and powerpc.
> >
> > The hcall API we are currently targeting is the existing Xen API (at least the generic parts of it). These generally deal with fairly Xen specific concepts like grant tables etc.
>
> Ok. It would of course still be possible to agree on an argument passing convention so that we can share the macros used to issue the hcalls, even if the individual commands are all different.

I think it likely that we can all agree on a common calling convention for N-argument hypercalls. I doubt there are that many useful choices with conflicting requirements yet strongly compelling advantages.

> I think I also remember talk about the need for a set of hypervisor independent calls that everyone should implement, but I can't remember what those were.

I'd not heard of this, maybe I just wasn't looking the right way though. Maybe we can split the number space into a range of some generic and some vendor specific hcalls?

Ian.
[PATCH 63/75] xen: netfront: convert to SKB paged frag API.
Signed-off-by: Ian Campbell <ian.campb...@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardi...@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Cc: xen-de...@lists.xensource.com
Cc: virtualization@lists.linux-foundation.org
Cc: net...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
---
 drivers/net/xen-netfront.c |   28 +++++++++++++++++-----------
 1 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index d7c8a98..882a957 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -275,7 +275,7 @@ no_skb:
 			break;
 		}
 
-		skb_shinfo(skb)->frags[0].page = page;
+		skb_frag_set_page(skb, 0, page);
 		skb_shinfo(skb)->nr_frags = 1;
 		__skb_queue_tail(&np->rx_batch, skb);
 	}
@@ -309,8 +309,8 @@ no_skb:
 		BUG_ON((signed short)ref < 0);
 		np->grant_rx_ref[id] = ref;
 
-		pfn = page_to_pfn(skb_shinfo(skb)->frags[0].page);
-		vaddr = page_address(skb_shinfo(skb)->frags[0].page);
+		pfn = page_to_pfn(skb_frag_page(&skb_shinfo(skb)->frags[0]));
+		vaddr = page_address(skb_frag_page(&skb_shinfo(skb)->frags[0]));
 
 		req = RING_GET_REQUEST(&np->rx, req_prod + i);
 		gnttab_grant_foreign_access_ref(ref,
@@ -461,7 +461,7 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
 		ref = gnttab_claim_grant_reference(&np->gref_tx_head);
 		BUG_ON((signed short)ref < 0);
 
-		mfn = pfn_to_mfn(page_to_pfn(frag->page));
+		mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag)));
 		gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
 						mfn, GNTMAP_readonly);
 
@@ -768,8 +768,9 @@ static RING_IDX xennet_fill_frags(struct netfront_info *np,
 	while ((nskb = __skb_dequeue(list))) {
 		struct xen_netif_rx_response *rx =
 			RING_GET_RESPONSE(&np->rx, ++cons);
+		skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0];
 
-		frag->page = skb_shinfo(nskb)->frags[0].page;
+		__skb_frag_set_page(frag, skb_frag_page(nfrag));
 		frag->page_offset = rx->offset;
 		frag->size = rx->status;
 
@@ -873,7 +874,7 @@ static int handle_incoming_queue(struct net_device *dev,
 		memcpy(skb->data, vaddr + offset,
 		       skb_headlen(skb));
 
-		if (page != skb_shinfo(skb)->frags[0].page)
+		if (page != skb_frag_page(&skb_shinfo(skb)->frags[0]))
 			__free_page(page);
 
 		/* Ethernet work: Delayed to here as it peeks the header. */
@@ -954,7 +955,8 @@ err:
 			}
 		}
 
-		NETFRONT_SKB_CB(skb)->page = skb_shinfo(skb)->frags[0].page;
+		NETFRONT_SKB_CB(skb)->page =
+			skb_frag_page(&skb_shinfo(skb)->frags[0]);
 		NETFRONT_SKB_CB(skb)->offset = rx->offset;
 
 		len = rx->status;
@@ -968,7 +970,7 @@ err:
 			skb_shinfo(skb)->frags[0].size = rx->status - len;
 			skb->data_len = rx->status - len;
 		} else {
-			skb_shinfo(skb)->frags[0].page = NULL;
+			skb_frag_set_page(skb, 0, NULL);
 			skb_shinfo(skb)->nr_frags = 0;
 		}
 
@@ -1143,7 +1145,8 @@ static void xennet_release_rx_bufs(struct netfront_info *np)
 
 		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
 			/* Remap the page. */
-			struct page *page = skb_shinfo(skb)->frags[0].page;
+			const struct page *page =
+				skb_frag_page(&skb_shinfo(skb)->frags[0]);
 			unsigned long pfn = page_to_pfn(page);
 			void *vaddr = page_address(page);
 
@@ -1650,6 +1653,8 @@ static int xennet_connect(struct net_device *dev)
 
 	/* Step 2: Rebuild the RX buffer freelist and the RX ring itself. */
 	for (requeue_idx = 0, i = 0; i < NET_RX_RING_SIZE; i++) {
+		skb_frag_t *frag;
+		const struct page *page;
 		if (!np->rx_skbs[i])
 			continue;
 
@@ -1657,10 +1662,11 @@ static int xennet_connect(struct net_device *dev)
 		ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, i);
 		req = RING_GET_REQUEST(&np->rx, requeue_idx);
 
+		frag = &skb_shinfo(skb)->frags[0];
+		page = skb_frag_page(frag);
 		gnttab_grant_foreign_access_ref(
 			ref, np->xbdev->otherend_id,
-			pfn_to_mfn(page_to_pfn(skb_shinfo(skb)->
-						frags->page)),
+			pfn_to_mfn
[PATCH 59/75] virtionet: convert to SKB paged frag API.
Signed-off-by: Ian Campbell <ian.campb...@citrix.com>
Cc: Rusty Russell <ru...@rustcorp.com.au>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: net...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
---
 drivers/net/virtio_net.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0c7321c..52667a8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -149,7 +149,7 @@ static void set_skb_frag(struct sk_buff *skb, struct page *page,
 	f = &skb_shinfo(skb)->frags[i];
 	f->size = min((unsigned)PAGE_SIZE - offset, *len);
 	f->page_offset = offset;
-	f->page = page;
+	__skb_frag_set_page(f, page);
 
 	skb->data_len += f->size;
 	skb->len += f->size;
-- 
1.7.2.5
Re: [Xen-devel] Re: [PATCH 5/7] Xen: fix whitespaces, tabs coding style issue in drivers/xen/xenbus/xenbus_client.c
On Tue, 2011-07-26 at 12:17 -0400, Jeremy Fitzhardinge wrote:
> On 07/26/2011 04:16 AM, ruslanpisa...@gmail.com wrote:
> > @@ -43,15 +43,15 @@ const char *xenbus_strstate(enum xenbus_state state)
> >  {
> >  	static const char *const name[] = {
> > -		[ XenbusStateUnknown ] = "Unknown",
> > -		[ XenbusStateInitialising ] = "Initialising",
> > -		[ XenbusStateInitWait ] = "InitWait",
> > -		[ XenbusStateInitialised ] = "Initialised",
> > -		[ XenbusStateConnected] = "Connected",
> > -		[ XenbusStateClosing ] = "Closing",
> > -		[ XenbusStateClosed ] = "Closed",
> > -		[XenbusStateReconfiguring] = "Reconfiguring",
> > -		[XenbusStateReconfigured] = "Reconfigured",
> > +		[XenbusStateUnknown] = "Unknown",
> > +		[XenbusStateInitialising] = "Initialising",
> > +		[XenbusStateInitWait] = "InitWait",
> > +		[XenbusStateInitialised] = "Initialised",
> > +		[XenbusStateConnected] =	"Connected",
> > +		[XenbusStateClosing] = "Closing",
> > +		[XenbusStateClosed] = "Closed",
> > +		[XenbusStateReconfiguring] =	"Reconfiguring",
> > +		[XenbusStateReconfigured] = "Reconfigured",
> >  	};
>
> Eh, I think this looks worse now.

Me too. If we're going to change this to anything I'd suggest

	#define N(x) [XenbusState#x] = ##x
	...
	N(Connected),
	N(Closing),
	...
	#undef N

(modulo my never quite remembering the cpp stringification rules first time)

Ian.
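For reference, in the C preprocessor `##` pastes tokens and `#` turns an argument into a string literal, so a working form of the suggested macro uses them the other way round from the spelling in the mail. The sketch below is a standalone userspace illustration; the enum is a local copy made for the example and the "INVALID" fallback is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local copy of the state names, for illustration only. */
enum xenbus_state {
	XenbusStateUnknown,
	XenbusStateInitialising,
	XenbusStateInitWait,
	XenbusStateInitialised,
	XenbusStateConnected,
	XenbusStateClosing,
	XenbusStateClosed,
	XenbusStateReconfiguring,
	XenbusStateReconfigured,
};

static const char *xenbus_strstate(enum xenbus_state state)
{
/* "##" pastes XenbusState and x into one identifier; "#" stringifies x. */
#define N(x) [XenbusState##x] = #x
	static const char *const name[] = {
		N(Unknown),
		N(Initialising),
		N(InitWait),
		N(Initialised),
		N(Connected),
		N(Closing),
		N(Closed),
		N(Reconfiguring),
		N(Reconfigured),
	};
#undef N
	/* Out-of-range values (including negative ones) fall through. */
	return ((size_t)state < sizeof(name) / sizeof(name[0]))
		? name[state] : "INVALID";
}
```

Each table entry then names the state exactly once, which removes the alignment question that the whitespace patch was arguing about.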
Re: [Xen-devel] Re: [TOME] Re: [PATCH] Modpost section mismatch fix
is what acpi_gsi_to_irq ends up calling when starting the
+	 * the ACPI interpreter and keels over since IRQ 9 has not been
+	 * setup as we had setup IRQ 20 for it).
+	 */
+	/* Check whether the GSI != IRQ */
+	if (acpi_gsi_to_irq(gsi, &irq) == 0) {
+		if (irq >= 0 && irq != gsi)
+			/* Bugger, we MUST have that IRQ. */
+			gsi_override = irq;
+	}
+
+	gsi = xen_register_gsi(gsi, gsi_override, trigger, polarity);
 	printk(KERN_INFO "xen: acpi sci %d\n", gsi);
 
 	return;
@@ -450,7 +450,7 @@ static __init void xen_setup_acpi_sci(void)
 static int acpi_register_gsi_xen(struct device *dev, u32 gsi,
 				 int trigger, int polarity)
 {
-	return xen_register_gsi(gsi, trigger, polarity);
+	return xen_register_gsi(gsi, -1 /* no GSI override */, trigger, polarity);
 }
 
 static int __init pci_xen_initial_domain(void)
@@ -489,7 +489,7 @@ void __init xen_setup_pirqs(void)
 		if (acpi_get_override_irq(irq, &trigger, &polarity) == -1)
 			continue;
 
-		xen_register_pirq(irq,
+		xen_register_pirq(irq, -1 /* no GSI override */,
 			trigger ? ACPI_LEVEL_SENSITIVE : ACPI_EDGE_SENSITIVE);
 	}
 }

-- 
Ian Campbell

While having never invented a sin, I'm trying to perfect several.
Re: [PATCH] Modpost section mismatch fix
On Mon, 2011-07-04 at 04:55 +0530, Raghavendra D Prabhu wrote:
> [Sorry if duplicate, one earlier was corrupt]
>
> Hi,
>
> I got section mismatches reported by modpost in latest build. It got reported for xen_register_pirq and xen_unplug_emulated_devices functions. xen_register_pirq makes reference to acpi_sci_override_gsi in init.data section; marking xen_register_pirq with __init is not feasible since calls are made to it from acpi_register_gsi in non-init contexts. So marking it __refdata based on assumption that when acpi_sci_override_gsi is referenced, it is in early stages where it is alive.

I don't think this assumption holds, since xen_register_pirq can be called at any time and basically unconditionally references acpi_sci_override_gsi.

If we don't want to remove the __init from acpi_sci_override_gsi then perhaps xen_setup_acpi_sci needs to stash it somewhere? Or maybe xen_register_pirq could take an int force_irq which, if not -1, would force a particular IRQ. The callsite in xen_setup_acpi_sci (actually via xen_register_gsi so the param would need to be propagated there) would be the only actual user?

The xen_unplug_emulated_devices change looks correct to me since xen_unplug_emulated_devices is called from xen_arch_hvm_post_suspend.

Ian.

> --
> Raghavendra Prabhu
> GPG Id : 0xD72BE977
> Fingerprint: B93F EBCB 8E05 7039 CD3C A4B8 A616 DCA1 D72B E977
> www: wnohang.net

-- 
Ian Campbell
Current Noise: Crowbar - Remember Tomorrow (A Tribute To Iron Maiden)

SANTA CLAUS comes down a FIRE ESCAPE wearing bright blue LEG WARMERS ... He scrubs the POPE with a mild soap or detergent for 15 minutes, starring JANE FONDA!!
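The "-1 means no override" parameter proposed in this mail can be sketched as a small pure function. choose_irq() and NO_GSI_OVERRIDE are hypothetical names used only for this illustration; the real registration functions do considerably more than pick a number.

```c
#include <assert.h>

#define NO_GSI_OVERRIDE (-1)	/* hypothetical name for the -1 sentinel */

/*
 * Pick the IRQ to bind for a GSI. Ordinary callers pass NO_GSI_OVERRIDE
 * and get the GSI back; the one caller that knows about an ACPI override
 * passes the IRQ it must have.
 */
static int choose_irq(unsigned int gsi, int gsi_override)
{
	if (gsi_override >= 0)
		return gsi_override;	/* forced by the caller */
	return (int)gsi;		/* default: IRQ == GSI */
}
```

Passing the override down as an argument means the non-init registration path no longer has to read an __init variable itself, which is the section mismatch the thread is trying to resolve.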
Re: [Xen-devel] [PATCH 1/2] xen: Populate xenbus device attributes
On Fri, 2011-06-24 at 22:51 +0100, Bastian Blank wrote:
> The xenbus bus type uses device_create_file to assign all used device
> attributes. However it does not remove them when the device goes away.

Doesn't the cleanup happen automatically in the device core when the
device goes away? Either way this is a good cleanup in its own right.

> This patch uses the dev_attrs field of the bus type to specify default
> attributes for all devices.
>
> Signed-off-by: Bastian Blank wa...@debian.org

Acked-by: Ian Campbell ian.campb...@citrix.com

Thanks Bastian.

Ian.

> ---
>  drivers/xen/xenbus/xenbus_probe.c          |   41 +--
>  drivers/xen/xenbus/xenbus_probe.h          |    2 +
>  drivers/xen/xenbus/xenbus_probe_backend.c  |    6 +---
>  drivers/xen/xenbus/xenbus_probe_frontend.c |    6 +---
>  4 files changed, 18 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
> index 7397695..2ed0b04 100644
> --- a/drivers/xen/xenbus/xenbus_probe.c
> +++ b/drivers/xen/xenbus/xenbus_probe.c
> @@ -378,26 +378,31 @@ static void xenbus_dev_release(struct device *dev)
>                  kfree(to_xenbus_device(dev));
>  }
>
> -static ssize_t xendev_show_nodename(struct device *dev,
> -                                    struct device_attribute *attr, char *buf)
> +static ssize_t nodename_show(struct device *dev,
> +                             struct device_attribute *attr, char *buf)
>  {
>          return sprintf(buf, "%s\n", to_xenbus_device(dev)->nodename);
>  }
> -static DEVICE_ATTR(nodename, S_IRUSR | S_IRGRP | S_IROTH, xendev_show_nodename, NULL);
>
> -static ssize_t xendev_show_devtype(struct device *dev,
> -                                   struct device_attribute *attr, char *buf)
> +static ssize_t devtype_show(struct device *dev,
> +                            struct device_attribute *attr, char *buf)
>  {
>          return sprintf(buf, "%s\n", to_xenbus_device(dev)->devicetype);
>  }
> -static DEVICE_ATTR(devtype, S_IRUSR | S_IRGRP | S_IROTH, xendev_show_devtype, NULL);
>
> -static ssize_t xendev_show_modalias(struct device *dev,
> -                                    struct device_attribute *attr, char *buf)
> +static ssize_t modalias_show(struct device *dev,
> +                             struct device_attribute *attr, char *buf)
>  {
>          return sprintf(buf, "xen:%s\n", to_xenbus_device(dev)->devicetype);
>  }
> -static DEVICE_ATTR(modalias, S_IRUSR | S_IRGRP | S_IROTH, xendev_show_modalias, NULL);
> +
> +struct device_attribute xenbus_dev_attrs[] = {
> +        __ATTR_RO(nodename),
> +        __ATTR_RO(devtype),
> +        __ATTR_RO(modalias),
> +        __ATTR_NULL
> +};
> +EXPORT_SYMBOL_GPL(xenbus_dev_attrs);
>
>  int xenbus_probe_node(struct xen_bus_type *bus,
>                        const char *type,
> @@ -449,25 +454,7 @@ int xenbus_probe_node(struct xen_bus_type *bus,
>          if (err)
>                  goto fail;
>
> -        err = device_create_file(&xendev->dev, &dev_attr_nodename);
> -        if (err)
> -                goto fail_unregister;
> -
> -        err = device_create_file(&xendev->dev, &dev_attr_devtype);
> -        if (err)
> -                goto fail_remove_nodename;
> -
> -        err = device_create_file(&xendev->dev, &dev_attr_modalias);
> -        if (err)
> -                goto fail_remove_devtype;
> -
>          return 0;
> -fail_remove_devtype:
> -        device_remove_file(&xendev->dev, &dev_attr_devtype);
> -fail_remove_nodename:
> -        device_remove_file(&xendev->dev, &dev_attr_nodename);
> -fail_unregister:
> -        device_unregister(&xendev->dev);
>  fail:
>          kfree(xendev);
>          return err;
> diff --git a/drivers/xen/xenbus/xenbus_probe.h b/drivers/xen/xenbus/xenbus_probe.h
> index 888b990..b814935 100644
> --- a/drivers/xen/xenbus/xenbus_probe.h
> +++ b/drivers/xen/xenbus/xenbus_probe.h
> @@ -48,6 +48,8 @@ struct xen_bus_type
>          struct bus_type bus;
>  };
>
> +extern struct device_attribute xenbus_dev_attrs[];
> +
>  extern int xenbus_match(struct device *_dev, struct device_driver *_drv);
>  extern int xenbus_dev_probe(struct device *_dev);
>  extern int xenbus_dev_remove(struct device *_dev);
> diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
> index 6cf467b..ec510e5 100644
> --- a/drivers/xen/xenbus/xenbus_probe_backend.c
> +++ b/drivers/xen/xenbus/xenbus_probe_backend.c
> @@ -183,10 +183,6 @@ static void frontend_changed(struct xenbus_watch *watch,
>          xenbus_otherend_changed(watch, vec, len, 0);
>  }
>
> -static struct device_attribute xenbus_backend_dev_attrs[] = {
> -        __ATTR_NULL
> -};
> -
>  static struct xen_bus_type xenbus_backend = {
>          .root = "backend",
>          .levels = 3,            /* backend/type/frontend/id */
> @@ -200,7 +196,7 @@ static struct xen_bus_type xenbus_backend = {
>                  .probe = xenbus_dev_probe,
>                  .remove = xenbus_dev_remove,
>                  .shutdown = xenbus_dev_shutdown,
> -                .dev_attrs = xenbus_backend_dev_attrs,
> +                .dev_attrs = xenbus_dev_attrs,
>          },
>  };
>
> diff --git
Re: [Xen-devel] [PATCH 2/2] xen: Add alias to autoload backend drivers
On Fri, 2011-06-24 at 22:51 +0100, Bastian Blank wrote:
> All the Xen backend drivers are assigned to a special bus type
> xen-backend. This allows userspace to load the modules on request.
>
> This patch defines xen-backend:* aliases on the modules and exports
> these names through modalias and uevent.

Excellent, this was a big missing piece of functionality for distros.
Thanks!

> Signed-off-by: Bastian Blank wa...@debian.org

Acked-by: Ian Campbell ian.campb...@citrix.com

> ---
>  drivers/block/xen-blkback/blkback.c       |    1 +
>  drivers/net/xen-netback/netback.c         |    1 +
>  drivers/xen/xenbus/xenbus_probe.c         |    3 ++-
>  drivers/xen/xenbus/xenbus_probe_backend.c |    3 +++
>  4 files changed, 7 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 5cf2993..ed62008 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -824,3 +824,4 @@ static int __init xen_blkif_init(void)
>  module_init(xen_blkif_init);
>
>  MODULE_LICENSE("Dual BSD/GPL");
> +MODULE_ALIAS("xen-backend:vbd");
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 0e4851b..fd00f25 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1743,3 +1743,4 @@ failed_init:
>  module_init(netback_init);
>
>  MODULE_LICENSE("Dual BSD/GPL");
> +MODULE_ALIAS("xen-backend:vif");
> diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
> index 2ed0b04..bd2f90c 100644
> --- a/drivers/xen/xenbus/xenbus_probe.c
> +++ b/drivers/xen/xenbus/xenbus_probe.c
> @@ -393,7 +393,8 @@ static ssize_t devtype_show(struct device *dev,
>  static ssize_t modalias_show(struct device *dev,
>                               struct device_attribute *attr, char *buf)
>  {
> -        return sprintf(buf, "xen:%s\n", to_xenbus_device(dev)->devicetype);
> +        return sprintf(buf, "%s:%s\n", dev->bus->name,
> +                       to_xenbus_device(dev)->devicetype);
>  }
>
>  struct device_attribute xenbus_dev_attrs[] = {
> diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
> index ec510e5..60adf91 100644
> --- a/drivers/xen/xenbus/xenbus_probe_backend.c
> +++ b/drivers/xen/xenbus/xenbus_probe_backend.c
> @@ -107,6 +107,9 @@ static int xenbus_uevent_backend(struct device *dev,
>          if (xdev == NULL)
>                  return -ENODEV;
>
> +        if (add_uevent_var(env, "MODALIAS=xen-backend:%s", xdev->devicetype))
> +                return -ENOMEM;
> +
>          /* stuff we want to pass to /sbin/hotplug */
>          if (add_uevent_var(env, "XENBUS_TYPE=%s", xdev->devicetype))
>                  return -ENOMEM;
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization
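The naming scheme in the patch above (a modalias of the form "bus:devicetype", matched against a module's alias string) can be illustrated with a small user-space sketch. This is not kernel code: format_modalias() and alias_matches() are hypothetical helper names, and the use of fnmatch() assumes that the module loader treats aliases as shell-style patterns.

```c
#include <fnmatch.h>
#include <stdio.h>

/*
 * Build "<bus>:<devicetype>", mirroring the "%s:%s" format used by
 * modalias_show() in the patch. Returns 0 on success, -1 on truncation.
 * (Hypothetical helper, for illustration only.)
 */
int format_modalias(char *buf, size_t len, const char *bus, const char *devtype)
{
    int n = snprintf(buf, len, "%s:%s", bus, devtype);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Non-zero if a module alias pattern (which may contain wildcards such
 * as '*') matches the modalias reported for a device.
 */
int alias_matches(const char *pattern, const char *modalias)
{
    return fnmatch(pattern, modalias, 0) == 0;
}
```

With these helpers, a device of type "vbd" on the "xen-backend" bus yields the string "xen-backend:vbd", which matches the alias added to blkback but not the "xen-backend:vif" alias added to netback.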
Re: [Xen-devel] Re: [PATCH] xen: drop anti-dependency on X86_VISWS
On Fri, 2011-04-08 at 11:24 -0700, Jeremy Fitzhardinge wrote:
> On 04/08/2011 08:42 AM, Jan Beulich wrote:
> > On 08.04.11 at 17:25, Jeremy Fitzhardinge jer...@goop.org wrote:
> > > On 04/07/2011 11:38 PM, Ian Campbell wrote:
> > > > Is there any downside to this patch (is X86_CMPXCHG in the same
> > > > sort of boat?)
> > >
> > > Only if we don't use cmpxchg in shared memory with other domains or
> > > the hypervisor. (I don't think it will dynamically switch between
> > > real and emulated cmpxchg depending on availability.)
> > >
> > > We do use cmpxchg in the grant table code at least (actually,
> > > sync_cmpxchg in that case).
> >
> > Actually it does - see the #ifndef CONFIG_X86_CMPXCHG section
> > in asm/cmpxchg_32.h.
>
> Hm, OK. Still, I'm happiest with that dependency in case someone
> knobbles the cpu to exclude cmpxchg and breaks things.

Dropping the TSC patch is sensible though?

Ian.
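The concern about cmpxchg on memory shared with other domains can be sketched in portable C11. Presumably the worry is that an emulated compare-and-swap is only safe against code on the local CPU, whereas a word shared with another domain needs a genuinely atomic instruction. The names below (ENTRY_IN_USE_MASK, try_release_entry) and the flag layout are hypothetical and are not taken from the Xen grant table code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bits meaning "the peer is currently using this entry". */
#define ENTRY_IN_USE_MASK 0x18u

/*
 * Try to clear a shared 16-bit flags word, but only if the peer is not
 * using the entry. The compare-and-swap ensures that a concurrent
 * update made by the other side is never silently overwritten.
 */
bool try_release_entry(_Atomic uint16_t *flags)
{
    uint16_t seen = atomic_load(flags);

    do {
        if (seen & ENTRY_IN_USE_MASK)
            return false;   /* peer still reading or writing */
        /* On failure, 'seen' is refreshed with the current value. */
    } while (!atomic_compare_exchange_weak(flags, &seen, 0));

    return true;
}
```

The loop retries only when the word changed between the load and the swap, so a flag set by the peer in that window is seen on the next iteration.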
Re: [PATCH] xen: drop anti-dependency on X86_VISWS
(dropping netdev and the visws list)

On Thu, 2011-04-07 at 11:07 -0700, Jeremy Fitzhardinge wrote:
> On 04/06/2011 11:58 PM, Ian Campbell wrote:
> > On Wed, 2011-04-06 at 22:45 +0100, David Miller wrote:
> > > From: Ian Campbell ian.campb...@eu.citrix.com
> > > Date: Mon, 4 Apr 2011 10:55:55 +0100
> > >
> > > > You mean the !X86_VISWS I presume? It doesn't make sense to me either.
> > >
> > > No, I think 32-bit x86 allmodconfig elides XEN because of its X86_TSC
> > > dependency.
> >
> > TSC is a real dependency of the Xen interfaces.
>
> Not really. The TSC register is a requirement, but that's going to be
> present on any CPU which can boot Xen. We don't need any of the
> kernel's TSC machinery though.

So why the Kconfig dependency then? In principle a kernel compiled for a
non-TSC processor (which meets the other requirements for Xen, such as
PAE support) will run just fine under Xen on a newer piece of hardware.

Is there any downside to this patch (is X86_CMPXCHG in the same sort of
boat?)

8<--
From 7204945696a927d281366f2a57baee37e2b43ca3 Mon Sep 17 00:00:00 2001
From: Ian Campbell i...@hellion.org.uk
Date: Fri, 8 Apr 2011 07:33:21 +0100
Subject: [PATCH] xen: remove Kconfig dependency on X86_TSC

The TSC register is a requirement when running under Xen, but that's
going to be present on any CPU which can boot Xen. We don't need any of
the kernel's TSC machinery, since the usage is contained within the Xen
interfaces, and therefore XEN does not need to depend on CONFIG_X86_TSC.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
 arch/x86/xen/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 1c7121b..ac69c5b 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -7,7 +7,7 @@ config XEN
         select PARAVIRT
         select PARAVIRT_CLOCK
         depends on X86_64 || (X86_32 && X86_PAE && !X86_VISWS)
-        depends on X86_CMPXCHG && X86_TSC
+        depends on X86_CMPXCHG
         help
           This is the Linux Xen port. Enabling this will allow the
           kernel to boot in a paravirtualized environment under the
--
1.7.4.1
Re: [PATCH] xen: drop anti-dependency on X86_VISWS
(dropping netdev and visws list)

On Thu, 2011-04-07 at 18:00 +0100, H. Peter Anvin wrote:
> On 04/06/2011 11:58 PM, Ian Campbell wrote:
> > I'm not sure why ELAN belongs in the EXTENDED_PLATFORM option space
> > rather than in the CPU choice option, since its only impact seems to be
> > on -march, MODULE_PROC_FAMILY and some cpufreq drivers which doesn't
> > sound like an extended platform to me but it does appear to be
> > deliberate (see 9e111f3e167a "x86: move ELAN to the
> > NON_STANDARD_PLATFORM section", that was the old name for
> > EXTENDED_PLATFORM).
>
> Historic... we used to have nonstandard A20M# handling on Elan, until it
> was discovered that we could make it work without it.

Any reason not to switch it over at this point then?

8<--
From b1942fa168aee77537bf467e4c68c6f181b8fdee Mon Sep 17 00:00:00 2001
From: Ian Campbell i...@hellion.org.uk
Date: Fri, 8 Apr 2011 07:42:29 +0100
Subject: [PATCH] x86: move AMD Elan Kconfig under Processor family

Currently the option resides under X86_EXTENDED_PLATFORM due to historical
nonstandard A20M# handling. However that is no longer the case and so Elan can
be treated as part of the standard processor choice Kconfig option.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Cc: H. Peter Anvin h...@zytor.com
---
 arch/x86/Kconfig                    |   11 ---
 arch/x86/Kconfig.cpu                |   16 ++--
 arch/x86/Makefile_32.cpu            |    2 +-
 arch/x86/include/asm/module.h       |    2 +-
 arch/x86/kernel/cpu/cpufreq/Kconfig |    4 ++--
 5 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cc6c53a..f00a3f3 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -365,17 +365,6 @@ config X86_UV
 # Following is an alphabetically sorted list of 32 bit extended platforms
 # Please maintain the alphabetic order if and when there are additions

-config X86_ELAN
-        bool "AMD Elan"
-        depends on X86_32
-        depends on X86_EXTENDED_PLATFORM
-        ---help---
-          Select this for an AMD Elan processor.
-
-          Do not use this option for K6/Athlon/Opteron processors!
-
-          If unsure, choose "PC-compatible" instead.
-
 config X86_INTEL_CE
         bool "CE4100 TV platform"
         depends on PCI
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index d161e93..6a7cfdf 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -1,6 +1,4 @@
 # Put here option for CPU selection and depending optimization
-if !X86_ELAN
-
 choice
         prompt "Processor family"
         default M686 if X86_32
@@ -203,6 +201,14 @@ config MWINCHIP3D
           stores for this CPU, which can increase performance of some
           operations.

+config MELAN
+        bool "AMD Elan"
+        depends on X86_32
+        ---help---
+          Select this for an AMD Elan processor.
+
+          Do not use this option for K6/Athlon/Opteron processors!
+
 config MGEODEGX1
         bool "GeodeGX1"
         depends on X86_32
@@ -292,8 +298,6 @@ config X86_GENERIC
           This is really intended for distributors who need more
           generic optimizations.

-endif
-
 #
 # Define implied options from the CPU selection here
 config X86_INTERNODE_CACHE_SHIFT
@@ -312,7 +316,7 @@ config X86_L1_CACHE_SHIFT
         int
         default "7" if MPENTIUM4 || MPSC
         default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
-        default "4" if X86_ELAN || M486 || M386 || MGEODEGX1
+        default "4" if MELAN || M486 || M386 || MGEODEGX1
         default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX

 config X86_XADD
@@ -358,7 +362,7 @@ config X86_POPAD_OK

 config X86_ALIGNMENT_16
         def_bool y
-        depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || X86_ELAN || MK6 || M586MMX || M586TSC || M586 || M486 || MVIAC3_2 || MGEODEGX1
+        depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MELAN || MK6 || M586MMX || M586TSC || M586 || M486 || MVIAC3_2 || MGEODEGX1

 config X86_INTEL_USERCOPY
         def_bool y
diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
index f2ee1ab..86cee7b 100644
--- a/arch/x86/Makefile_32.cpu
+++ b/arch/x86/Makefile_32.cpu
@@ -37,7 +37,7 @@ cflags-$(CONFIG_MATOM)+= $(call cc-option,-march=atom,$(call cc-option,-march=
         $(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))

 # AMD Elan support
-cflags-$(CONFIG_X86_ELAN) += -march=i486
+cflags-$(CONFIG_MELAN) += -march=i486

 # Geode GX1 support
 cflags-$(CONFIG_MGEODEGX1) += -march=pentium-mmx
diff --git a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h
index 67763c5..9eae775 100644
--- a/arch/x86/include/asm/module.h
+++ b/arch/x86/include/asm/module.h
@@ -35,7 +35,7 @@
 #define MODULE_PROC_FAMILY "K7 "
 #elif defined CONFIG_MK8
 #define MODULE_PROC_FAMILY "K8 "
-#elif
Re: [PATCH] xen: drop anti-dependency on X86_VISWS
On Wed, 2011-04-06 at 22:45 +0100, David Miller wrote:
> From: Ian Campbell ian.campb...@eu.citrix.com
> Date: Mon, 4 Apr 2011 10:55:55 +0100
>
> > You mean the !X86_VISWS I presume? It doesn't make sense to me either.
>
> No, I think 32-bit x86 allmodconfig elides XEN because of its X86_TSC
> dependency.

TSC is a real dependency of the Xen interfaces.

> And, well, you could type "make allmodconfig" on your tree and see
> for yourself instead of asking me :-)

True. X86_TSC not being enabled appears to be due to CONFIG_ELAN being
enabled, which causes the processor selection option (which defaults to
M686, which is a sane choice and enables TSC etc) to be gated at the top
level in arch/x86/Kconfig.cpu. Disabling the ELAN option then leaves
X86_TSC gated on !CONFIG_NUMAQ but removing that results in a generally
useful looking config.

It's a shame that these sorts of minority options cause allmodconfig to
omit support for more interesting configurations, such as modern
processors. Other than negating the semantics of such options I'm not
really sure what can be done about it though. On the other hand
compiling all the unusual stuff in an allmodconfig is probably a
positive thing.

I'm not sure why ELAN belongs in the EXTENDED_PLATFORM option space
rather than in the CPU choice option, since its only impact seems to be
on -march, MODULE_PROC_FAMILY and some cpufreq drivers which doesn't
sound like an extended platform to me but it does appear to be
deliberate (see 9e111f3e167a "x86: move ELAN to the
NON_STANDARD_PLATFORM section", that was the old name for
EXTENDED_PLATFORM).

Hrm, what about the following? (doesn't actually make a difference to
Xen since allmodconfig chooses HIGHMEM4G instead of HIGHMEM64G in the
!NUMAQ case but I stopped worrying about that several paragraphs ago)

8<--
x86: invert X86_EXTENDED_PLATFORM to X86_STANDARD_PLATFORM

Having the =y choice be the more standard configuration causes
all*config to provide greater coverage of usual configurations.

Signed-off-by: Ian Campbell ian.campb...@citrix.com

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cc6c53a..6d8a404 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -299,15 +299,15 @@ config X86_BIGSMP
           This option is needed for the systems that have more than 8 CPUs

 if X86_32
-config X86_EXTENDED_PLATFORM
-        bool "Support for extended (non-PC) x86 platforms"
+config X86_STANDARD_PLATFORM
+        bool "Restrict support to standard (PC) x86 platforms"
         default y
         ---help---
-          If you disable this option then the kernel will only support
+          If you enable this option then the kernel will only support
           standard PC platforms. (which covers the vast majority of
           systems out there.)

-          If you enable this option then you'll be able to select support
+          If you disable this option then you'll be able to select support
           for the following (non-PC) 32 bit x86 platforms:
                 AMD Elan
                 NUMAQ (IBM/Sequent)
@@ -318,25 +318,25 @@ config X86_EXTENDED_PLATFORM
                 Moorestown MID devices

           If you have one of these systems, or if you want to build a
-          generic distribution kernel, say Y here - otherwise say N.
+          generic distribution kernel, say N here - otherwise say Y.
 endif

 if X86_64
-config X86_EXTENDED_PLATFORM
-        bool "Support for extended (non-PC) x86 platforms"
+config X86_STANDARD_PLATFORM
+        bool "Restrict support to standard (PC) x86 platforms"
         default y
         ---help---
-          If you disable this option then the kernel will only support
+          If you enable this option then the kernel will only support
           standard PC platforms. (which covers the vast majority of
           systems out there.)

-          If you enable this option then you'll be able to select support
+          If you disable this option then you'll be able to select support
           for the following (non-PC) 64 bit x86 platforms:
                 ScaleMP vSMP
                 SGI Ultraviolet

           If you have one of these systems, or if you want to build a
-          generic distribution kernel, say Y here - otherwise say N.
+          generic distribution kernel, say N here - otherwise say Y.
 endif
 # This is an alphabetically sorted list of 64 bit extended platforms
 # Please maintain the alphabetic order if and when there are additions
@@ -346,7 +346,7 @@ config X86_VSMP
         select PARAVIRT_GUEST
         select PARAVIRT
         depends on X86_64 && PCI
-        depends on X86_EXTENDED_PLATFORM
+        depends on !X86_STANDARD_PLATFORM
         ---help---
           Support for ScaleMP vSMP systems. Say 'Y' here if this kernel is
           supposed to run on these EM64T-based machines. Only choose this option
@@ -355,7 +355,7 @@ config X86_VSMP
 config X86_UV
         bool "SGI Ultraviolet"
         depends on X86_64
-        depends on X86_EXTENDED_PLATFORM
+        depends
Re: Signed bit field; int have_hotplug_status_watch:1
On Sun, 2011-04-03 at 22:32 +0100, Dr. David Alan Gilbert wrote:
> Hi Ian,
> I've been going through some sparse scans of the kernel and
> it threw up:
>
>   CHECK   drivers/net/xen-netback/xenbus.c
> drivers/net/xen-netback/xenbus.c:29:40: error: dubious one-bit signed bitfield
>
> int have_hotplug_status_watch:1;
>
> from your patch f942dc2552b8bfdee607be867b12a8971bb9cd85
>
> It does look like that should be an unsigned (given it's assigned 0 and 1)

I agree.

8<--
From 38fdb7199a0c3c5eb18ec27d2380e21116c97e29 Mon Sep 17 00:00:00 2001
From: Ian Campbell ian.campb...@citrix.com
Date: Mon, 4 Apr 2011 09:18:35 +0100
Subject: [PATCH] xen: netback: use unsigned type for one-bit bitfield.

Fixes error from sparse:
  CHECK   drivers/net/xen-netback/xenbus.c
drivers/net/xen-netback/xenbus.c:29:40: error: dubious one-bit signed bitfield

int have_hotplug_status_watch:1;

Reported-by: Dr. David Alan Gilbert li...@treblig.org
Signed-off-by: Ian Campbell ian.campb...@citrix.com
Cc: net...@vger.kernel.org
Cc: xen-de...@lists.xensource.com
---
 drivers/net/xen-netback/xenbus.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 22b8c35..1ce729d 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -26,7 +26,7 @@ struct backend_info {
         struct xenvif *vif;
         enum xenbus_state frontend_state;
         struct xenbus_watch hotplug_status_watch;
-        int have_hotplug_status_watch:1;
+        u8 have_hotplug_status_watch:1;
 };

 static int connect_rings(struct backend_info *);
--
1.7.2.5
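Why sparse calls a one-bit signed bitfield "dubious" can be shown with a short user-space sketch. A signed field of width one has room only for the sign bit, so it cannot represent the value 1. The struct and function names here are made up for illustration, and the example uses plain `unsigned int` rather than the kernel's `u8` so that it stays within standard C.

```c
/* One-bit flags, once signed and once unsigned (illustrative types). */
struct flags_signed   { signed int   have_watch:1; };
struct flags_unsigned { unsigned int have_watch:1; };

/* Store v in a signed one-bit field and read it back. */
int roundtrip_signed(int v)
{
    struct flags_signed f = { 0 };

    f.have_watch = v;   /* 1 does not fit; the result is implementation-defined */
    return f.have_watch;
}

/* Store v in an unsigned one-bit field and read it back. */
int roundtrip_unsigned(unsigned int v)
{
    struct flags_unsigned f = { 0 };

    f.have_watch = v;   /* reduced modulo 2, so 0 and 1 survive */
    return f.have_watch;
}
```

On common compilers the signed variant reads back as -1 after storing 1, so a later test such as `flag == 1` would be false; the unsigned variant reads back 1 as expected.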
[PATCH] xen: drop anti-dependency on X86_VISWS (Was: Re: [PATCH] xen: netfront: fix declaration order)
On Mon, 2011-04-04 at 01:24 +0100, David Miller wrote:
> From: Eric Dumazet eric.duma...@gmail.com
> Date: Sun, 03 Apr 2011 13:07:19 +0200
>
> > [PATCH] xen: netfront: fix declaration order
> >
> > Must declare xennet_fix_features() and xennet_set_features() before
> > using them.
> >
> > Signed-off-by: Eric Dumazet eric.duma...@gmail.com
> > Cc: Michał Mirosław mirq-li...@rere.qmqm.pl
>
> Ugh, it makes no sense that XEN won't make it into the x86_32
> allmodconfig build. Those dependencies in arch/x86/xen/Kconfig
> are terrible.

You mean the !X86_VISWS I presume? It doesn't make sense to me either.
Or at least I'm not sure why this single X86_32_NON_STANDARD machine is
more special than the others to require an anti-dependency like this.

It seems to have originally appeared from f0f32fccbffa on
CONFIG_PARAVIRT due to a conflict around ARCH_SETUP() and subsequently
got pushed down to CONFIG_XEN. However ARCH_SETUP doesn't exist any more
and I think the subarch stuff has been much improved since then so there
should be no conflict any more.

I dropped the dependency and, with a bit of fiddling, was able to build
a kernel with both CONFIG_X86_VISWS and CONFIG_XEN which booted as a Xen
domU.

tglx, Andrey, to get VISWS to build I had to comment out some code in
arch/x86/platform/visws/visws_quirks.c which seems to have been missed
during some irq_chip update or something?

  CC      arch/x86/platform/visws/visws_quirks.o
arch/x86/platform/visws/visws_quirks.c: In function 'startup_piix4_master_irq':
arch/x86/platform/visws/visws_quirks.c:474: warning: no return statement in function returning non-void
arch/x86/platform/visws/visws_quirks.c: At top level:
arch/x86/platform/visws/visws_quirks.c:495: error: unknown field 'mask' specified in initializer
arch/x86/platform/visws/visws_quirks.c:495: warning: initialization from incompatible pointer type
arch/x86/platform/visws/visws_quirks.c: In function 'set_piix4_virtual_irq_type':
arch/x86/platform/visws/visws_quirks.c:583: error: 'struct irq_chip' has no member named 'enable'
arch/x86/platform/visws/visws_quirks.c:583: error: 'struct irq_chip' has no member named 'unmask'
arch/x86/platform/visws/visws_quirks.c:584: error: 'struct irq_chip' has no member named 'disable'
arch/x86/platform/visws/visws_quirks.c:584: error: 'struct irq_chip' has no member named 'mask'
arch/x86/platform/visws/visws_quirks.c:585: error: 'struct irq_chip' has no member named 'unmask'
arch/x86/platform/visws/visws_quirks.c:585: error: 'struct irq_chip' has no member named 'unmask'
arch/x86/platform/visws/visws_quirks.c: In function 'visws_pre_intr_init':
arch/x86/platform/visws/visws_quirks.c:602: error: expected expression before '' token
make[4]: *** [arch/x86/platform/visws/visws_quirks.o] Error 1

Ian

8<--
From db0ae26f479306ee8ebcfe2a08aa56a6dfe63987 Mon Sep 17 00:00:00 2001
From: Ian Campbell ian.campb...@citrix.com
Date: Mon, 4 Apr 2011 10:27:47 +0100
Subject: [PATCH] xen: drop anti-dependency on X86_VISWS

This seems to have been added in f0f32fccbffa to avoid a conflict arising from
the long deceased ARCH_SETUP() macro and subsequently pushed down to the XEN
option.

As far as I can tell the conflict is no longer present and by dropping the
dependency I was able to build a kernel which has both CONFIG_XEN and
CONFIG_X86_VISWS enabled and boot it on Xen. I didn't try it on the VISWS
platform.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Cc: Jeremy Fitzhardinge jer...@goop.org
Cc: konrad.w...@oracle.com
Cc: xen-de...@lists.xensource.com
Cc: Randy Dunlap randy.dun...@oracle.com
Cc: Andrey Panin pa...@donpac.ru
Cc: linux-visws-de...@lists.sf.net
Cc: Thomas Gleixner t...@linutronix.de
Cc: Ingo Molnar mi...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Cc: x...@kernel.org
---
 arch/x86/xen/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 1c7121b..65d7b13 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -6,7 +6,7 @@ config XEN
         bool "Xen guest support"
         select PARAVIRT
         select PARAVIRT_CLOCK
-        depends on X86_64 || (X86_32 && X86_PAE && !X86_VISWS)
+        depends on X86_64 || (X86_32 && X86_PAE)
         depends on X86_CMPXCHG && X86_TSC
         help
           This is the Linux Xen port. Enabling this will allow the
--
1.7.2.5
Re: [PATCH RESEND] net: convert xen-netfront to hw_features
On Sat, 2011-04-02 at 04:54 +0100, David Miller wrote:
> From: Michał Mirosław mirq-li...@rere.qmqm.pl
> Date: Thu, 31 Mar 2011 13:01:35 +0200 (CEST)
>
> > Not tested in any way. The original code for offload setting seems broken
> > as it resets the features on every netback reconnect.
> >
> > This will set GSO_ROBUST at device creation time (earlier than connect time).
> >
> > RX checksum offload is forced on - so advertise as it is.
> >
> > Signed-off-by: Michał Mirosław mirq-li...@rere.qmqm.pl
>
> Applied.

Thanks, but unfortunately the patch results in the features all being
disabled by default, since they are not set in the initial dev->features
and the initial dev->wanted_features is based on features & hw_features.
The ndo_fix_features hook only clears features and doesn't add new
features (nor should it AFAICT).

Features cannot be negotiated with the backend until xennet_connect().
The carrier is not enabled until the end of that function, therefore I
think it is safe to start with a full set of features in dev->features
and rely on the call to netdev_update_features() in xennet_connect() to
clear those which turn out to be unavailable.

The following works for me, I guess the alternative is for
xennet_connect() to expand dev->features based on what it detects? Or is
there a mechanism for a driver to inform the core that a new hardware
feature has become available (I doubt that really happens on physical
h/w so I guess not).

Ian.

8<--
From 0b56469abe56efae415b4603ef508ce9aec0e4c1 Mon Sep 17 00:00:00 2001
From: Ian Campbell ian.campb...@citrix.com
Date: Mon, 4 Apr 2011 10:58:50 +0100
Subject: [PATCH] xen: netfront: assume all hw features are available until backend connection setup

We need to assume that all features will be available when registering the
netdev otherwise they are omitted from the initial set of
dev->wanted_features. When we connect to the backend we reduce the set as
necessary due to the call to netdev_update_features() in xennet_connect().

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Cc: mirq-li...@rere.qmqm.pl
Cc: net...@vger.kernel.org net...@vger.kernel.org
Cc: Jeremy Fitzhardinge jer...@goop.org
Cc: konrad.w...@oracle.com
Cc: Eric Dumazet eric.duma...@gmail.com
Cc: xen-de...@lists.xensource.com
---
 drivers/net/xen-netfront.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 0cfe4cc..db9a763 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1251,6 +1251,14 @@ static struct net_device * __devinit xennet_create_dev(struct xenbus_device *dev
                                   NETIF_F_GSO_ROBUST;
         netdev->hw_features     = NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO;

+        /*
+         * Assume that all hw features are available for now. This set
+         * will be adjusted by the call to netdev_update_features() in
+         * xennet_connect() which is the earliest point where we can
+         * negotiate with the backend regarding supported features.
+         */
+        netdev->features |= netdev->hw_features;
+
         SET_ETHTOOL_OPS(netdev, &xennet_ethtool_ops);
         SET_NETDEV_DEV(netdev, &dev->dev);

--
1.7.2.5
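The reasoning in this message (the wanted set is seeded from features & hw_features, and the fix hook can only clear bits) can be modelled with plain bitmasks. The sketch below is a simplified user-space model and not the kernel's implementation: the flag values, fix_features() and effective_features() are hypothetical, and only the user-toggleable part of the feature set is tracked.

```c
#include <stdint.h>

/* Hypothetical flag values for illustration only; not the kernel's. */
#define F_IP_CSUM 0x1u
#define F_SG      0x2u
#define F_TSO     0x4u

/*
 * Model of a fix_features-style hook: it may clear SG and TSO when the
 * backend lacks them, but it never adds a bit that was not requested.
 */
uint32_t fix_features(uint32_t wanted, uint32_t backend_supported)
{
    uint32_t negotiable = F_SG | F_TSO;

    return wanted & ~(negotiable & ~backend_supported);
}

/*
 * Model of the core behaviour described in the message: the wanted set
 * starts as (initial features & hw_features) and is then filtered.
 */
uint32_t effective_features(uint32_t initial_features, uint32_t hw_features,
                            uint32_t backend_supported)
{
    uint32_t wanted = initial_features & hw_features;

    return fix_features(wanted, backend_supported);
}
```

In this model, starting from F_IP_CSUM alone leaves SG and TSO off even when the backend supports both, whereas starting from the full hw_features set leaves exactly the bits the backend supports, which is the effect the patch aims for.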
Re: [PATCH RESEND] net: convert xen-netfront to hw_features
On Thu, 2011-03-31 at 12:01 +0100, Michał Mirosław wrote:
> Not tested in any way. The original code for offload setting seems broken
> as it resets the features on every netback reconnect.

Thanks, I've got a pending TODO item to test this and propagate similar
changes to netback. I hope to get to it soon...

Is this urgent (for 2.6.39) IYHO? I think it's been broken this way for
a long time now...

Ian.

> This will set GSO_ROBUST at device creation time (earlier than connect time).
>
> RX checksum offload is forced on - so advertise as it is.
>
> Signed-off-by: Michał Mirosław mirq-li...@rere.qmqm.pl
> ---
> [I don't know Xen code enough to say this is correct. There is Xen netback
> driver coming in, that has similar changes to be made. Please match them
> up if you can.]
>
>  drivers/net/xen-netfront.c |   57 +--
>  1 files changed, 23 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 5c8d9c3..2a71c9f 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -1148,6 +1148,8 @@ static const struct net_device_ops xennet_netdev_ops = {
>          .ndo_change_mtu      = xennet_change_mtu,
>          .ndo_set_mac_address = eth_mac_addr,
>          .ndo_validate_addr   = eth_validate_addr,
> +        .ndo_fix_features    = xennet_fix_features,
> +        .ndo_set_features    = xennet_set_features,
>  };
>
>  static struct net_device * __devinit xennet_create_dev(struct xenbus_device *dev)
> @@ -1209,7 +1211,9 @@ static struct net_device * __devinit xennet_create_dev(struct xenbus_device *dev
>          netdev->netdev_ops      = &xennet_netdev_ops;
>
>          netif_napi_add(netdev, &np->napi, xennet_poll, 64);
> -        netdev->features        = NETIF_F_IP_CSUM;
> +        netdev->features        = NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
> +                                  NETIF_F_GSO_ROBUST;
> +        netdev->hw_features     = NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO;
>
>          SET_ETHTOOL_OPS(netdev, &xennet_ethtool_ops);
>          SET_NETDEV_DEV(netdev, &dev->dev);
> @@ -1510,52 +1514,40 @@ again:
>          return err;
>  }
>
> -static int xennet_set_sg(struct net_device *dev, u32 data)
> +static u32 xennet_fix_features(struct net_device *dev, u32 features)
>  {
> -        if (data) {
> -                struct netfront_info *np = netdev_priv(dev);
> -                int val;
> +        struct netfront_info *np = netdev_priv(dev);
> +        int val;
>
> +        if (features & NETIF_F_SG) {
>                  if (xenbus_scanf(XBT_NIL, np->xbdev->otherend, "feature-sg",
>                                   "%d", &val) < 0)
>                          val = 0;
> +
>                  if (!val)
> -                        return -ENOSYS;
> -        } else if (dev->mtu > ETH_DATA_LEN)
> -                dev->mtu = ETH_DATA_LEN;
> -
> -        return ethtool_op_set_sg(dev, data);
> -}
> -
> -static int xennet_set_tso(struct net_device *dev, u32 data)
> -{
> -        if (data) {
> -                struct netfront_info *np = netdev_priv(dev);
> -                int val;
> +                        features &= ~NETIF_F_SG;
> +        }
>
> +        if (features & NETIF_F_TSO) {
>                  if (xenbus_scanf(XBT_NIL, np->xbdev->otherend,
>                                   "feature-gso-tcpv4", "%d", &val) < 0)
>                          val = 0;
> +
>                  if (!val)
> -                        return -ENOSYS;
> +                        features &= ~NETIF_F_TSO;
>          }
>
> -        return ethtool_op_set_tso(dev, data);
> +        return features;
>  }
>
> -static void xennet_set_features(struct net_device *dev)
> +static int xennet_set_features(struct net_device *dev, u32 features)
>  {
> -        /* Turn off all GSO bits except ROBUST. */
> -        dev->features &= ~NETIF_F_GSO_MASK;
> -        dev->features |= NETIF_F_GSO_ROBUST;
> -        xennet_set_sg(dev, 0);
> +        if (!(features & NETIF_F_SG) && dev->mtu > ETH_DATA_LEN) {
> +                netdev_info(dev, "Reducing MTU because no SG offload");
> +                dev->mtu = ETH_DATA_LEN;
> +        }
>
> -        /* We need checksum offload to enable scatter/gather and TSO. */
> -        if (!(dev->features & NETIF_F_IP_CSUM))
> -                return;
> -
> -        if (!xennet_set_sg(dev, 1))
> -                xennet_set_tso(dev, 1);
> +        return 0;
>  }
>
>  static int xennet_connect(struct net_device *dev)
> @@ -1582,7 +1574,7 @@ static int xennet_connect(struct net_device *dev)
>          if (err)
>                  return err;
>
> -        xennet_set_features(dev);
> +        netdev_update_features(dev);
>
>          spin_lock_bh(&np->rx_lock);
>          spin_lock_irq(&np->tx_lock);
> @@ -1710,9 +1702,6 @@ static void xennet_get_strings(struct net_device *dev, u32 stringset, u8 * data)
>
>  static const struct ethtool_ops xennet_ethtool_ops =
>  {
> -        .set_tx_csum = ethtool_op_set_tx_csum,
> -        .set_sg = xennet_set_sg,
> -        .set_tso = xennet_set_tso,
>          .get_link = ethtool_op_get_link,
>
>          .get_sset_count = xennet_get_sset_count,
Re: [Xen-devel] Re: [PATCH] x86/pvclock-xen: zero last_value on resume
On Wed, 2010-10-27 at 13:59 -0700, H. Peter Anvin wrote:
> I'll check it this evening when I'm at a working network again :(

Did this get applied? It seems to affect 2.6.32.x too
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=602273) so can we tag
it for stable as well?

Thanks,
Ian.

> Jeremy Fitzhardinge jer...@goop.org wrote:
>
> > On 10/26/2010 10:48 AM, Glauber Costa wrote:
> > > On Tue, 2010-10-26 at 09:59 -0700, Jeremy Fitzhardinge wrote:
> > > > If the guest domain has been suspend/resumed or migrated, then the
> > > > system clock backing the pvclock clocksource may revert to a smaller
> > > > value (ie, can be non-monotonic across the migration/save-restore).
> > > > Make sure we zero last_value in that case so that the domain
> > > > continues to see clock updates.
> > > >
> > > > [ I don't know if kvm needs an analogous fix or not. ]
> > >
> > > After migration, save/restore, etc, we issue an ioctl where we tell
> > > the host the last clock value. That (in theory) guarantees monotonicity.
> > >
> > > I am not opposed to this patch in any way, however.
> >
> > Thanks.
> >
> > HPA, do you want to take this, or shall I send it on?
> >
> > Thanks,
> > J

--
Ian Campbell

BOFH excuse #191:

Just type 'mv * /dev/null'.
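The problem described in the quoted patch text can be reproduced with a small user-space model of a "never go backwards" clock filter. This is only a sketch of the idea, assuming the clocksource keeps a high-water mark named last_value as the patch description suggests; monotonic_read() and clock_resume() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Highest timestamp handed out so far (the high-water mark). */
static _Atomic uint64_t last_value;

/* Return a reading that never goes backwards relative to earlier callers. */
uint64_t monotonic_read(uint64_t raw)
{
    uint64_t last = atomic_load(&last_value);

    do {
        if (raw < last)
            return last;    /* clock appears to have stepped back */
        /* On failure, 'last' is refreshed and the check is repeated. */
    } while (!atomic_compare_exchange_weak(&last_value, &last, raw));

    return raw;
}

/* After resume or migration: forget the pre-suspend high-water mark. */
void clock_resume(void)
{
    atomic_store(&last_value, 0);
}
```

If the underlying clock restarts from a smaller value after a restore, monotonic_read() keeps returning the old high-water mark until the raw clock catches up, so time appears frozen. Calling clock_resume() first lets the new, smaller readings through.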
Re: [PATCHv3 1/3] x86: use ELF format in compressed images.
On Thu, 2008-02-14 at 11:34 +, Mark McLoughlin wrote:
> On Wed, 2008-02-13 at 20:54 +, Ian Campbell wrote:
> > This allows other boot loaders such as the Xen domain builder the
> > opportunity to extract the ELF file.
>
> Right, Xen currently can't boot bzImage (it needs the ELF image) so you
> still can't use the same kernel image on Xen as bare-metal.

I have a xen domain builder patch as well. I was waiting for the Linux
side to gain some traction before putting it forward (I'd attach it now
but it's at home on a laptop which is sleeping).

> > +Field name:	compressed_payload_offset
> > +Type:		read
> > +Offset/size:	0x248/4
> > +Protocol:	2.08+
> > +
> > +  If non-zero then this field contains the offset from the end of the
> > +  real-mode code to the compressed payload. The compression format
> > +  should be determined using the standard magic number, currently only
> > +  gzip is used.
>
> Should probably mention that the payload format is expected to be ELF.

Agreed. Probably the same deal as the compression format, i.e. use the
magic number but only ELF is possible today (even less likely to change
than the compression format I guess...).

> How about this?
>
> +sed-offsets := -e 's/^00*/0/' \
> +        -e 's/^\([0-9a-fA-F]*\) . \(input_data\|input_data_end\)$$/-D\2=0x\1 /p'
> +
> +$(obj)/header.o: AFLAGS_header.o += $(shell $(NM) $(obj)/compressed/vmlinux | sed -n $(sed-offsets))
> +$(obj)/header.o: $(obj)/compressed/vmlinux FORCE

That's probably a neater way of doing it. Although the
	.../header.o: AFLAGS_header.o
is redundant, either
	header.o: AFLAGS += foo
or
	AFLAGS_header.o += foo
with the second being preferred in Linux Makefiles I think.

I'll try and get an updated patch out before I head for my flight
tomorrow.

Ian.

-- 
Ian Campbell
Current Noise: Reverend Bizarre - The Festival

While money can't buy happiness, it certainly lets you choose your own
form of misery.
Re: [PATCHv3 1/3] x86: use ELF format in compressed images.
On Thu, 2008-02-14 at 17:01 +, Ian Campbell wrote:
> > > +Field name:	compressed_payload_offset
> > > +Type:		read
> > > +Offset/size:	0x248/4
> > > +Protocol:	2.08+
> > > +
> > > +  If non-zero then this field contains the offset from the end of the
> > > +  real-mode code to the compressed payload. The compression format
> > > +  should be determined using the standard magic number, currently only
> > > +  gzip is used.
> >
> > Should probably mention that the payload format is expected to be ELF.
>
> Agreed. Probably the same deal as the compression format, i.e. use the
> magic number but only ELF is possible today (even less likely to change
> than the compression format I guess...).

Updated with a note about ELF format payload.

I've also changed the fields to just payload_{offset,length} and
adjusted the description to allow for the possibility of non-compressed
ELF payloads. I don't have a use for it myself but I can see how it
might be useful (embedded systems?) so it seems reasonable not to rule
it out. ELF-in-gzip and plain ELF can both be identified by magic
numbers.

Ian.

---
From 544c003d4067d895556180fc11a951e211202d0d Mon Sep 17 00:00:00 2001
From: Ian Campbell [EMAIL PROTECTED]
Date: Thu, 14 Feb 2008 18:29:01 +
Subject: [PATCH] x86: use ELF format in compressed images.

This allows other boot loaders such as the Xen domain builder the
opportunity to extract the ELF file.

Signed-off-by: Ian Campbell [EMAIL PROTECTED]
Cc: Thomas Gleixner [EMAIL PROTECTED]
Cc: Ingo Molnar [EMAIL PROTECTED]
Cc: H. Peter Anvin [EMAIL PROTECTED]
Cc: Jeremy Fitzhardinge [EMAIL PROTECTED]
Cc: virtualization@lists.linux-foundation.org
---
 Documentation/i386/boot.txt       |   20 +
 arch/x86/boot/Makefile            |   14 +
 arch/x86/boot/compressed/Makefile |    2 +-
 arch/x86/boot/compressed/misc.c   |   56 +
 arch/x86/boot/header.S            |    4 ++
 5 files changed, 95 insertions(+), 1 deletions(-)

diff --git a/Documentation/i386/boot.txt b/Documentation/i386/boot.txt
index fc49b79..f2e54e5 100644
--- a/Documentation/i386/boot.txt
+++ b/Documentation/i386/boot.txt
@@ -170,6 +170,8 @@ Offset	Proto	Name		Meaning
 0238/4	2.06+	cmdline_size	Maximum size of the kernel command line
 023C/4	2.07+	hardware_subarch Hardware subarchitecture
 0240/8	2.07+	hardware_subarch_data Subarchitecture-specific data
+0248/4	2.08+	payload_offset	Offset of kernel payload
+024C/4	2.08+	payload_length	Length of kernel payload

 (1) For backwards compatibility, if the setup_sects field contains 0, the
     real value is 4.
@@ -512,6 +514,24 @@ Protocol:	2.07+

   A pointer to data that is specific to hardware subarch

+Field name:	payload_offset
+Type:		read
+Offset/size:	0x248/4
+Protocol:	2.08+
+
+  If non-zero then this field contains the offset from the end of the
+  real-mode code to the payload.
+
+  The payload may be compressed. The format of both the compressed and
+  uncompressed data should be determined using the standard magic
+  numbers. Currently only gzip compressed ELF is used.
+
+Field name:	payload_length
+Type:		read
+Offset/size:	0x24c/4
+Protocol:	2.08+
+
+  The length of the payload.

 THE KERNEL COMMAND LINE

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index f88458e..9695aff 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -94,6 +94,20 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE

 SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))

+sed-offsets := -e 's/^00*/0/' \
+        -e 's/^\([0-9a-fA-F]*\) . \(input_data\|input_data_end\)$$/\#define \2 0x\1/p'
+
+quiet_cmd_offsets = OFFSETS $@
+      cmd_offsets = $(NM) $< | sed -n $(sed-offsets) > $@
+
+$(obj)/offsets.h: $(obj)/compressed/vmlinux FORCE
+	$(call if_changed,offsets)
+
+targets += offsets.h
+
+AFLAGS_header.o += -I$(obj)
+$(obj)/header.o: $(obj)/offsets.h
+
 LDFLAGS_setup.elf	:= -T
 $(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
 	$(call if_changed,ld)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index d2b9f3b..92fdd35 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -22,7 +22,7 @@ $(obj)/vmlinux: $(src)/vmlinux_$(BITS).lds $(obj)/head_$(BITS).o $(obj)/misc.o $
 	$(call if_changed,ld)
 	@:

-OBJCOPYFLAGS_vmlinux.bin := -O binary -R .note -R .comment -S
+OBJCOPYFLAGS_vmlinux.bin := -R .comment -S
 $(obj)/vmlinux.bin: vmlinux FORCE
 	$(call if_changed,objcopy)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 8182e32..69aec2f 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -15,6 +15,10 @@
  * we just keep it from happening
  */
 #undef CONFIG_PARAVIRT
+#ifdef CONFIG_X86_32
+#define _ASM_DESC_H_ 1
+#endif
+
 #ifdef CONFIG_X86_64
 #define _LINUX_STRING_H_ 1
 #define
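To illustrate how a boot loader might consume the two new header fields, here is a minimal sketch, not taken from any real loader. `locate_payload()` and the enum are hypothetical names. It assumes the real-mode code ends at (setup_sects + 1) * 512 with setup_sects at offset 0x1f1 (0 meaning 4, as the quoted boot.txt says), and it skips the protocol-version check a real loader would need before trusting the 2.08+ fields.

```c
#include <stddef.h>
#include <stdint.h>

enum payload_kind { PAYLOAD_NONE, PAYLOAD_GZIP, PAYLOAD_ELF, PAYLOAD_UNKNOWN };

/* 0x248 and 0x24c come from the patch; 0x1f1 is assumed from the boot protocol. */
#define HDR_SETUP_SECTS    0x1f1
#define HDR_PAYLOAD_OFFSET 0x248
#define HDR_PAYLOAD_LENGTH 0x24c

static uint32_t le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Find the payload inside a bzImage-style buffer and classify it by magic. */
enum payload_kind locate_payload(const unsigned char *img, size_t img_len,
				 size_t *start, size_t *len)
{
	size_t setup_sects, real_mode_end;
	uint32_t off, plen;
	const unsigned char *p;

	if (img_len < HDR_PAYLOAD_LENGTH + 4)
		return PAYLOAD_NONE;

	setup_sects = img[HDR_SETUP_SECTS];
	if (setup_sects == 0)		/* backwards compatibility: 0 means 4 */
		setup_sects = 4;
	real_mode_end = (setup_sects + 1) * 512;

	off = le32(img + HDR_PAYLOAD_OFFSET);
	plen = le32(img + HDR_PAYLOAD_LENGTH);
	if (off == 0)			/* field unset: no payload advertised */
		return PAYLOAD_NONE;
	if (real_mode_end > img_len || off > img_len - real_mode_end ||
	    plen > img_len - real_mode_end - off)
		return PAYLOAD_NONE;	/* payload would run off the image */

	*start = real_mode_end + off;
	*len = plen;
	p = img + *start;

	if (plen >= 2 && p[0] == 0x1f && p[1] == 0x8b)
		return PAYLOAD_GZIP;	/* gzip magic */
	if (plen >= 4 && p[0] == 0x7f && p[1] == 'E' && p[2] == 'L' && p[3] == 'F')
		return PAYLOAD_ELF;	/* uncompressed ELF */
	return PAYLOAD_UNKNOWN;
}
```

A loader would gunzip the PAYLOAD_GZIP case and then run the same ELF magic check on the result, which is how both ELF-in-gzip and plain ELF can be handled without any extra header field.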
[PATCHv3 1/3] x86: use ELF format in compressed images.
This allows other boot loaders such as the Xen domain builder the
opportunity to extract the ELF file.

Signed-off-by: Ian Campbell [EMAIL PROTECTED]
Cc: Thomas Gleixner [EMAIL PROTECTED]
Cc: Ingo Molnar [EMAIL PROTECTED]
Cc: H. Peter Anvin [EMAIL PROTECTED]
Cc: Jeremy Fitzhardinge [EMAIL PROTECTED]
Cc: virtualization@lists.linux-foundation.org
---
 Documentation/i386/boot.txt       |   18
 arch/x86/boot/Makefile            |   14 +
 arch/x86/boot/compressed/Makefile |    2 +-
 arch/x86/boot/compressed/misc.c   |   56 +
 arch/x86/boot/header.S            |    6
 5 files changed, 95 insertions(+), 1 deletions(-)

diff --git a/Documentation/i386/boot.txt b/Documentation/i386/boot.txt
index fc49b79..b5f5ba1 100644
--- a/Documentation/i386/boot.txt
+++ b/Documentation/i386/boot.txt
@@ -170,6 +170,8 @@ Offset	Proto	Name		Meaning
 0238/4	2.06+	cmdline_size	Maximum size of the kernel command line
 023C/4	2.07+	hardware_subarch Hardware subarchitecture
 0240/8	2.07+	hardware_subarch_data Subarchitecture-specific data
+0248/4	2.08+	compressed_payload_offset
+024C/4	2.08+	compressed_payload_length

 (1) For backwards compatibility, if the setup_sects field contains 0, the
     real value is 4.
@@ -512,6 +514,22 @@ Protocol:	2.07+

   A pointer to data that is specific to hardware subarch

+Field name:	compressed_payload_offset
+Type:		read
+Offset/size:	0x248/4
+Protocol:	2.08+
+
+  If non-zero then this field contains the offset from the end of the
+  real-mode code to the compressed payload. The compression format
+  should be determined using the standard magic number, currently only
+  gzip is used.
+
+Field name:	compressed_payload_length
+Type:		read
+Offset/size:	0x24c/4
+Protocol:	2.08+
+
+  The length of the compressed payload.

 THE KERNEL COMMAND LINE

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index f88458e..9695aff 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -94,6 +94,20 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE

 SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))

+sed-offsets := -e 's/^00*/0/' \
+        -e 's/^\([0-9a-fA-F]*\) . \(input_data\|input_data_end\)$$/\#define \2 0x\1/p'
+
+quiet_cmd_offsets = OFFSETS $@
+      cmd_offsets = $(NM) $< | sed -n $(sed-offsets) > $@
+
+$(obj)/offsets.h: $(obj)/compressed/vmlinux FORCE
+	$(call if_changed,offsets)
+
+targets += offsets.h
+
+AFLAGS_header.o += -I$(obj)
+$(obj)/header.o: $(obj)/offsets.h
+
 LDFLAGS_setup.elf	:= -T
 $(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
 	$(call if_changed,ld)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index d2b9f3b..92fdd35 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -22,7 +22,7 @@ $(obj)/vmlinux: $(src)/vmlinux_$(BITS).lds $(obj)/head_$(BITS).o $(obj)/misc.o $
 	$(call if_changed,ld)
 	@:

-OBJCOPYFLAGS_vmlinux.bin := -O binary -R .note -R .comment -S
+OBJCOPYFLAGS_vmlinux.bin := -R .comment -S
 $(obj)/vmlinux.bin: vmlinux FORCE
 	$(call if_changed,objcopy)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 8182e32..69aec2f 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -15,6 +15,10 @@
  * we just keep it from happening
  */
 #undef CONFIG_PARAVIRT
+#ifdef CONFIG_X86_32
+#define _ASM_DESC_H_ 1
+#endif
+
 #ifdef CONFIG_X86_64
 #define _LINUX_STRING_H_ 1
 #define __LINUX_BITMAP_H 1
@@ -22,6 +26,7 @@

 #include <linux/linkage.h>
 #include <linux/screen_info.h>
+#include <linux/elf.h>
 #include <asm/io.h>
 #include <asm/page.h>
 #include <asm/boot.h>
@@ -365,6 +370,56 @@ static void error(char *x)
 		asm("hlt");
 }

+static void parse_elf(void *output)
+{
+#ifdef CONFIG_X86_64
+	Elf64_Ehdr ehdr;
+	Elf64_Phdr *phdrs, *phdr;
+#else
+	Elf32_Ehdr ehdr;
+	Elf32_Phdr *phdrs, *phdr;
+#endif
+	void *dest;
+	int i;
+
+	memcpy(&ehdr, output, sizeof(ehdr));
+	if(ehdr.e_ident[EI_MAG0] != ELFMAG0 ||
+	   ehdr.e_ident[EI_MAG1] != ELFMAG1 ||
+	   ehdr.e_ident[EI_MAG2] != ELFMAG2 ||
+	   ehdr.e_ident[EI_MAG3] != ELFMAG3)
+	{
+		error("Kernel is not a valid ELF file");
+		return;
+	}
+
+	putstr("Parsing ELF... ");
+
+	phdrs = malloc(sizeof(*phdrs) * ehdr.e_phnum);
+	if (!phdrs)
+		error("Failed to allocate space for phdrs");
+
+	memcpy(phdrs, output + ehdr.e_phoff, sizeof(*phdrs) * ehdr.e_phnum);
+
+	for (i=0; i<ehdr.e_phnum; i++) {
+		phdr = &phdrs[i];
+
+		switch (phdr->p_type) {
+		case PT_LOAD:
+#ifdef CONFIG_RELOCATABLE
+			dest = output;
+			dest += (phdr->p_paddr - LOAD_PHYSICAL_ADDR);
+#else
+			dest = (void
[PATCHv2 1/3] x86: use ELF format in compressed images.
This allows other boot loaders such as the Xen domain builder the
opportunity to extract the ELF file.

Signed-off-by: Ian Campbell [EMAIL PROTECTED]
Cc: Thomas Gleixner [EMAIL PROTECTED]
Cc: Ingo Molnar [EMAIL PROTECTED]
Cc: H. Peter Anvin [EMAIL PROTECTED]
Cc: Jeremy Fitzhardinge [EMAIL PROTECTED]
Cc: virtualization@lists.linux-foundation.org
---
 Documentation/i386/boot.txt       |   18
 arch/x86/boot/Makefile            |   14 +
 arch/x86/boot/compressed/Makefile |    2 +-
 arch/x86/boot/compressed/misc.c   |   56 +
 arch/x86/boot/header.S            |    6
 5 files changed, 95 insertions(+), 1 deletions(-)

diff --git a/Documentation/i386/boot.txt b/Documentation/i386/boot.txt
index fc49b79..b5f5ba1 100644
--- a/Documentation/i386/boot.txt
+++ b/Documentation/i386/boot.txt
@@ -170,6 +170,8 @@ Offset	Proto	Name		Meaning
 0238/4	2.06+	cmdline_size	Maximum size of the kernel command line
 023C/4	2.07+	hardware_subarch Hardware subarchitecture
 0240/8	2.07+	hardware_subarch_data Subarchitecture-specific data
+0248/4	2.08+	compressed_payload_offset
+024C/4	2.08+	compressed_payload_length

 (1) For backwards compatibility, if the setup_sects field contains 0, the
     real value is 4.
@@ -512,6 +514,22 @@ Protocol:	2.07+

   A pointer to data that is specific to hardware subarch

+Field name:	compressed_payload_offset
+Type:		read
+Offset/size:	0x248/4
+Protocol:	2.08+
+
+  If non-zero then this field contains the offset from the end of the
+  real-mode code to the compressed payload. The compression format
+  should be determined using the standard magic number, currently only
+  gzip is used.
+
+Field name:	compressed_payload_length
+Type:		read
+Offset/size:	0x24c/4
+Protocol:	2.08+
+
+  The length of the compressed payload.

 THE KERNEL COMMAND LINE

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index f88458e..9695aff 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -94,6 +94,20 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE

 SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))

+sed-offsets := -e 's/^00*/0/' \
+        -e 's/^\([0-9a-fA-F]*\) . \(input_data\|input_data_end\)$$/\#define \2 0x\1/p'
+
+quiet_cmd_offsets = OFFSETS $@
+      cmd_offsets = $(NM) $< | sed -n $(sed-offsets) > $@
+
+$(obj)/offsets.h: $(obj)/compressed/vmlinux FORCE
+	$(call if_changed,offsets)
+
+targets += offsets.h
+
+AFLAGS_header.o += -I$(obj)
+$(obj)/header.o: $(obj)/offsets.h
+
 LDFLAGS_setup.elf	:= -T
 $(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
 	$(call if_changed,ld)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index d2b9f3b..92fdd35 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -22,7 +22,7 @@ $(obj)/vmlinux: $(src)/vmlinux_$(BITS).lds $(obj)/head_$(BITS).o $(obj)/misc.o $
 	$(call if_changed,ld)
 	@:

-OBJCOPYFLAGS_vmlinux.bin := -O binary -R .note -R .comment -S
+OBJCOPYFLAGS_vmlinux.bin := -R .comment -S
 $(obj)/vmlinux.bin: vmlinux FORCE
 	$(call if_changed,objcopy)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 8182e32..69aec2f 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -15,6 +15,10 @@
  * we just keep it from happening
  */
 #undef CONFIG_PARAVIRT
+#ifdef CONFIG_X86_32
+#define _ASM_DESC_H_ 1
+#endif
+
 #ifdef CONFIG_X86_64
 #define _LINUX_STRING_H_ 1
 #define __LINUX_BITMAP_H 1
@@ -22,6 +26,7 @@

 #include <linux/linkage.h>
 #include <linux/screen_info.h>
+#include <linux/elf.h>
 #include <asm/io.h>
 #include <asm/page.h>
 #include <asm/boot.h>
@@ -365,6 +370,56 @@ static void error(char *x)
 		asm("hlt");
 }

+static void parse_elf(void *output)
+{
+#ifdef CONFIG_X86_64
+	Elf64_Ehdr ehdr;
+	Elf64_Phdr *phdrs, *phdr;
+#else
+	Elf32_Ehdr ehdr;
+	Elf32_Phdr *phdrs, *phdr;
+#endif
+	void *dest;
+	int i;
+
+	memcpy(&ehdr, output, sizeof(ehdr));
+	if(ehdr.e_ident[EI_MAG0] != ELFMAG0 ||
+	   ehdr.e_ident[EI_MAG1] != ELFMAG1 ||
+	   ehdr.e_ident[EI_MAG2] != ELFMAG2 ||
+	   ehdr.e_ident[EI_MAG3] != ELFMAG3)
+	{
+		error("Kernel is not a valid ELF file");
+		return;
+	}
+
+	putstr("Parsing ELF... ");
+
+	phdrs = malloc(sizeof(*phdrs) * ehdr.e_phnum);
+	if (!phdrs)
+		error("Failed to allocate space for phdrs");
+
+	memcpy(phdrs, output + ehdr.e_phoff, sizeof(*phdrs) * ehdr.e_phnum);
+
+	for (i=0; i<ehdr.e_phnum; i++) {
+		phdr = &phdrs[i];
+
+		switch (phdr->p_type) {
+		case PT_LOAD:
+#ifdef CONFIG_RELOCATABLE
+			dest = output;
+			dest += (phdr->p_paddr - LOAD_PHYSICAL_ADDR);
+#else
+			dest = (void
[PATCH] x86: use ELF format in compressed images.
This allows other boot loaders such as the Xen domain builder the
opportunity to extract the ELF file.

Signed-off-by: Ian Campbell [EMAIL PROTECTED]
Cc: Thomas Gleixner [EMAIL PROTECTED]
Cc: Ingo Molnar [EMAIL PROTECTED]
Cc: H. Peter Anvin [EMAIL PROTECTED]
Cc: Jeremy Fitzhardinge [EMAIL PROTECTED]
Cc: virtualization@lists.linux-foundation.org
---
 Documentation/i386/boot.txt       |   18 +
 arch/x86/boot/Makefile            |   14 ++
 arch/x86/boot/compressed/Makefile |    2 +-
 arch/x86/boot/compressed/misc.c   |   49 +
 arch/x86/boot/header.S            |    6
 5 files changed, 88 insertions(+), 1 deletions(-)

diff --git a/Documentation/i386/boot.txt b/Documentation/i386/boot.txt
index fc49b79..b5f5ba1 100644
--- a/Documentation/i386/boot.txt
+++ b/Documentation/i386/boot.txt
@@ -170,6 +170,8 @@ Offset	Proto	Name		Meaning
 0238/4	2.06+	cmdline_size	Maximum size of the kernel command line
 023C/4	2.07+	hardware_subarch Hardware subarchitecture
 0240/8	2.07+	hardware_subarch_data Subarchitecture-specific data
+0248/4	2.08+	compressed_payload_offset
+024C/4	2.08+	compressed_payload_length

 (1) For backwards compatibility, if the setup_sects field contains 0, the
     real value is 4.
@@ -512,6 +514,22 @@ Protocol:	2.07+

   A pointer to data that is specific to hardware subarch

+Field name:	compressed_payload_offset
+Type:		read
+Offset/size:	0x248/4
+Protocol:	2.08+
+
+  If non-zero then this field contains the offset from the end of the
+  real-mode code to the compressed payload. The compression format
+  should be determined using the standard magic number, currently only
+  gzip is used.
+
+Field name:	compressed_payload_length
+Type:		read
+Offset/size:	0x24c/4
+Protocol:	2.08+
+
+  The length of the compressed payload.

 THE KERNEL COMMAND LINE

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 254a583..0c629dc 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -86,6 +86,20 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE

 SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))

+sed-offsets := -e 's/^00*/0/' \
+        -e 's/^\([0-9a-fA-F]*\) . \(input_data\|input_data_end\)$$/\#define \2 0x\1/p'
+
+quiet_cmd_offsets = OFFSETS $@
+      cmd_offsets = $(NM) $< | sed -n $(sed-offsets) > $@
+
+$(obj)/offsets.h: $(obj)/compressed/vmlinux FORCE
+	$(call if_changed,offsets)
+
+targets += offsets.h
+
+AFLAGS_header.o += -I$(obj)
+$(obj)/header.o: $(obj)/offsets.h
+
 LDFLAGS_setup.elf	:= -T
 $(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
 	$(call if_changed,ld)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index d2b9f3b..92fdd35 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -22,7 +22,7 @@ $(obj)/vmlinux: $(src)/vmlinux_$(BITS).lds $(obj)/head_$(BITS).o $(obj)/misc.o $
 	$(call if_changed,ld)
 	@:

-OBJCOPYFLAGS_vmlinux.bin := -O binary -R .note -R .comment -S
+OBJCOPYFLAGS_vmlinux.bin := -R .comment -S
 $(obj)/vmlinux.bin: vmlinux FORCE
 	$(call if_changed,objcopy)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 8182e32..8a5daf5 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -15,6 +15,10 @@
  * we just keep it from happening
  */
 #undef CONFIG_PARAVIRT
+#ifdef CONFIG_X86_32
+#define _ASM_DESC_H_ 1
+#endif
+
 #ifdef CONFIG_X86_64
 #define _LINUX_STRING_H_ 1
 #define __LINUX_BITMAP_H 1
@@ -22,6 +26,7 @@

 #include <linux/linkage.h>
 #include <linux/screen_info.h>
+#include <linux/elf.h>
 #include <asm/io.h>
 #include <asm/page.h>
 #include <asm/boot.h>
@@ -365,6 +370,49 @@ static void error(char *x)
 		asm("hlt");
 }

+static void parse_elf(void *output)
+{
+#ifdef CONFIG_X86_64
+	Elf64_Ehdr ehdr;
+	Elf64_Phdr *phdrs, *phdr;
+#else
+	Elf32_Ehdr ehdr;
+	Elf32_Phdr *phdrs, *phdr;
+#endif
+	int i;
+
+	memcpy(&ehdr, output, sizeof(ehdr));
+	if(ehdr.e_ident[EI_MAG0] != ELFMAG0 ||
+	   ehdr.e_ident[EI_MAG1] != ELFMAG1 ||
+	   ehdr.e_ident[EI_MAG2] != ELFMAG2 ||
+	   ehdr.e_ident[EI_MAG3] != ELFMAG3)
+	{
+		error("Kernel is not a valid ELF file");
+		return;
+	}
+
+	putstr("Parsing ELF... ");
+
+	phdrs = malloc(sizeof(*phdrs) * ehdr.e_phnum);
+	if (!phdrs)
+		error("Failed to allocate space for phdrs");
+
+	memcpy(phdrs, output + ehdr.e_phoff, sizeof(*phdrs) * ehdr.e_phnum);
+
+	for (i=0; i<ehdr.e_phnum; i++) {
+		phdr = &phdrs[i];
+
+		switch (phdr->p_type) {
+		case PT_LOAD:
+			memcpy((void*)phdr->p_paddr,
+			       output + phdr->p_offset,
+			       phdr->p_filesz);
+			break
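For readers who want to experiment with the PT_LOAD walk outside the decompressor, the loop above can be approximated in user space as follows. This is only a sketch under stated assumptions: `load_elf64()` is a hypothetical helper, it handles 64-bit images only, it uses the glibc `<elf.h>` definitions rather than `<linux/elf.h>`, and instead of writing to physical addresses it copies into a caller-supplied buffer with p_paddr taken relative to `base`. Unlike the patch, it bounds-checks every offset before copying.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy every PT_LOAD segment of a 64-bit ELF image into 'dest', treating
 * p_paddr as an offset from 'base'. Returns the number of segments copied,
 * or -1 if the image is malformed or a segment does not fit.
 */
int load_elf64(const unsigned char *image, size_t image_len,
	       unsigned char *dest, size_t dest_len, uint64_t base)
{
	Elf64_Ehdr ehdr;
	Elf64_Phdr phdr;
	int i, loaded = 0;

	if (image_len < sizeof(ehdr))
		return -1;
	memcpy(&ehdr, image, sizeof(ehdr));

	/* Same check as the four e_ident comparisons in the patch. */
	if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
		return -1;
	if (ehdr.e_phentsize != sizeof(phdr))
		return -1;
	if (ehdr.e_phoff > image_len ||
	    (uint64_t)ehdr.e_phnum * sizeof(phdr) > image_len - ehdr.e_phoff)
		return -1;

	for (i = 0; i < ehdr.e_phnum; i++) {
		memcpy(&phdr, image + ehdr.e_phoff + (size_t)i * sizeof(phdr),
		       sizeof(phdr));
		if (phdr.p_type != PT_LOAD)
			continue;

		/* Source range must lie inside the image. */
		if (phdr.p_offset > image_len ||
		    phdr.p_filesz > image_len - phdr.p_offset)
			return -1;
		/* Destination range must lie inside the output buffer. */
		if (phdr.p_paddr < base || phdr.p_paddr - base > dest_len ||
		    phdr.p_filesz > dest_len - (phdr.p_paddr - base))
			return -1;

		memcpy(dest + (phdr.p_paddr - base), image + phdr.p_offset,
		       phdr.p_filesz);
		loaded++;
	}
	return loaded;
}
```

Passing the load address as `base` mirrors the relocatable case in the later revisions of the patch, where the destination is computed relative to LOAD_PHYSICAL_ADDR rather than taken as an absolute address.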
[PATCH] Implement getgeo for Xen virtual block device.
Hi Jeremy,

The below implements the getgeo hook for Xen block devices. Extracted
from the xen-unstable tree where it has been used for ages.

It is useful to have because it allows things like grub2 (used by the
Debian installer images) to work in a guest domain without having to
sprinkle Xen specific hacks around the place.

Signed-off-by: Ian Campbell [EMAIL PROTECTED]

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 2bdebcb..b0a2e69 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -37,6 +37,7 @@

 #include <linux/interrupt.h>
 #include <linux/blkdev.h>
+#include <linux/hdreg.h>
 #include <linux/module.h>

 #include <xen/xenbus.h>
@@ -135,6 +136,22 @@ static void blkif_restart_queue_callback(void *arg)
 	schedule_work(&info->work);
 }

+int blkif_getgeo(struct block_device *bd, struct hd_geometry *hg)
+{
+	/* We don't have real geometry info, but let's at least return
+	   values consistent with the size of the device */
+	sector_t nsect = get_capacity(bd->bd_disk);
+	sector_t cylinders = nsect;
+
+	hg->heads = 0xff;
+	hg->sectors = 0x3f;
+	sector_div(cylinders, hg->heads * hg->sectors);
+	hg->cylinders = cylinders;
+	if ((sector_t)(hg->cylinders + 1) * hg->heads * hg->sectors < nsect)
+		hg->cylinders = 0xffff;
+	return 0;
+}
+
 /*
  * blkif_queue_request
  *
@@ -939,6 +956,7 @@ static struct block_device_operations xlvbd_block_fops =
 	.owner = THIS_MODULE,
 	.open = blkif_open,
 	.release = blkif_release,
+	.getgeo = blkif_getgeo,
 };

-- 
Ian Campbell

'Martyrdom' is the only way a person can become famous without ability.
		-- George Bernard Shaw
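The arithmetic in the hook is easy to try outside the kernel. The following is a user-space sketch only: `struct fake_geometry` and `fake_getgeo()` are hypothetical stand-ins, with field widths assumed to match struct hd_geometry (8-bit heads and sectors, 16-bit cylinders), and plain 64-bit division replaces sector_div().

```c
#include <stdint.h>

/* Hypothetical mirror of the struct hd_geometry fields used by the patch. */
struct fake_geometry {
	unsigned char heads;
	unsigned char sectors;
	unsigned short cylinders;
};

/* Derive a fake CHS geometry that is consistent with the device size. */
void fake_getgeo(uint64_t nsect, struct fake_geometry *hg)
{
	uint64_t cylinders;

	hg->heads = 0xff;	/* 255 heads */
	hg->sectors = 0x3f;	/* 63 sectors per track */
	cylinders = nsect / ((uint64_t)hg->heads * hg->sectors);

	/* The field is only 16 bits wide, so large disks truncate here. */
	hg->cylinders = (unsigned short)cylinders;

	/* If the truncated geometry describes fewer sectors than the disk
	   really has, saturate instead of reporting a wrapped value. */
	if ((uint64_t)(hg->cylinders + 1) * hg->heads * hg->sectors < nsect)
		hg->cylinders = 0xffff;
}
```

With 255 heads and 63 sectors each cylinder covers 16065 sectors, so a 160650-sector device reports 10 cylinders, while a device of 70000 * 16065 sectors would wrap to 4464 in the 16-bit field and is clamped to 0xffff by the final check.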