Re: [RFC v5 0/5] Add virtio transport for AF_VSOCK

2016-04-12 Thread Ian Campbell
For some reason your mails in this thread only appear in the gmail web UI
and not on the IMAP version of my mailbox (my own and Michael's mails are
fine).

So I'm replying via the web interface, sorry for the inevitable formatting
mess :-/

I've CC'd another mailbox in the hope of getting your mails in that IMAP
folder instead (or as well), so I can avoid this next time.

On 12 April 2016 at 14:59, Stefan Hajnoczi wrote:
>
> > > > One wrinkle I came across, which I'm not sure if it is by design or a
> > > > problem is that I can see this sequence coming from the guest (with
> > > > other activity in between):
> > > >
> > > > 1) OP_SHUTDOWN w/ flags == SHUTDOWN_RX
> > > > 2) OP_SHUTDOWN w/ flags == SHUTDOWN_TX
> > > > 3) OP_SHUTDOWN w/ flags == SHUTDOWN_TX|SHUTDOWN_RX
>
> How did you trigger this sequence?  I'd like to reproduce it.
>

Nothing magic. I've written some logging into my backend and captured the
result for a simple backend initiated connection.

In the log "TX" and "RX" indicate the thread doing the processing (with
"TX" being the one which processes the guest's TX ring, i.e. data coming
from the guest to the host). "<=" indicates a buffer going from guest to
host and "=>" is from host to guest. NB that guest to host replies are
queued synchronously by the TX thread onto the RX ring which is why the
somewhat odd looking "TX: =>" combination can occur. A host initiated
connection also happens from the TX thread in the same way.

The trace is of a simple request response (which both fit in one buffer in
each direction), the lines without an "?X:" prefix are my
annotations/guesses as to what is going on:

TX: =>SRC:0002.00010002 DST:0003.0948
TX:   LEN: TYPE:0001 OP:1=REQUEST
TX:  FLAGS: BUF_ALLOC:8000 FWD_CNT:

TX: <=SRC:0003.0948 DST:0002.00010002
TX:   LEN: TYPE:0001 OP:2=RESPONSE
TX:  FLAGS: BUF_ALLOC:0004 FWD_CNT:

REQUEST + RESPONSE == Channel open successfully

RX: =>SRC:0002.00010002 DST:0003.0948
RX:   LEN:005e TYPE:0001 OP:5=RW
RX:  FLAGS: BUF_ALLOC:8000 FWD_CNT:

Host sends a request to the guest

TX: <=SRC:0003.0948 DST:0002.00010002
TX:   LEN: TYPE:0001 OP:6=CREDIT_UPDATE
TX:  FLAGS: BUF_ALLOC:0004 FWD_CNT:005e

Guest replies with a credit update

TX: <=SRC:0003.0948 DST:0002.00010002
TX:   LEN:0091 TYPE:0001 OP:5=RW
TX:  FLAGS: BUF_ALLOC:0004 FWD_CNT:005e

Guest replies with the answer to the request

RX: =>SRC:0002.00010002 DST:0003.0948
RX:   LEN: TYPE:0001 OP:4=SHUTDOWN
RX:  FLAGS:0002 BUF_ALLOC:8000 FWD_CNT:0091

Host has sent its only request, so host app must have done
shutdown(SHUT_WR) I suppose and host therefore sends SHUTDOWN_TX.

TX: <=SRC:0003.0948 DST:0002.00010002
TX:   LEN: TYPE:0001 OP:4=SHUTDOWN
TX:  FLAGS:0001 BUF_ALLOC:0004 FWD_CNT:005e

Guest SHUTDOWN_RX. I'm not sure if this is a direct kernel response to the
SHUTDOWN_TX or if the application inside the guest saw an EOF when reading
the socket and did the corresponding shutdown(SHUT_RD).

TX: <=SRC:0003.0948 DST:0002.00010002
TX:   LEN: TYPE:0001 OP:4=SHUTDOWN
TX:  FLAGS:0002 BUF_ALLOC:0004 FWD_CNT:005e

Guest SHUTDOWN_TX. I presume that, having sent the only response it is going
to send, it then does shutdown(SHUT_WR).

TX: <=SRC:0003.0948 DST:0002.00010002
TX:   LEN: TYPE:0001 OP:4=SHUTDOWN
TX:  FLAGS:0003 BUF_ALLOC:0004 FWD_CNT:005e

Guest shuts down both directions.
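
As an aside on the BUF_ALLOC and FWD_CNT fields in the trace: a minimal
sketch, in C, of how a sender might work out its remaining credit from them.
The struct and function names are hypothetical, made up for illustration; the
arithmetic follows the credit scheme of the virtio-vsock proposal (the peer's
buf_alloc minus the bytes sent but not yet forwarded by the peer).

```c
#include <stdint.h>

/* Hypothetical sender-side view of one connection's flow-control state. */
struct credit_state {
	uint32_t peer_buf_alloc; /* BUF_ALLOC last advertised by the peer */
	uint32_t peer_fwd_cnt;   /* FWD_CNT last advertised by the peer */
	uint32_t tx_cnt;         /* total payload bytes we have sent so far */
};

/*
 * Bytes we may still send before the peer's receive buffer is full.
 * Assumes the peer never advertises a buf_alloc smaller than the amount
 * already in flight.
 */
static uint32_t credit_available(const struct credit_state *s)
{
	/* Unsigned subtraction stays correct when the counters wrap. */
	uint32_t in_flight = s->tx_cnt - s->peer_fwd_cnt;

	return s->peer_buf_alloc - in_flight;
}
```

Taking the printed values at face value, the guest has sent 0x91 bytes against
the host's BUF_ALLOC:8000, so it has 0x7f6f bytes of credit left until the
host's FWD_CNT:0091 arrives, at which point it is back to 0x8000.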

Perhaps the guest end is turning shutdown(foo) directly into a vsock
message without or-ing in the current state?
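
To make that guess concrete, here is a minimal sketch of what or-ing in the
current state could look like on the sending side. It is not taken from the
patches; the struct and function names are hypothetical, and the flag values
are simply the ones seen in the trace (FLAGS:0001 for RX, FLAGS:0002 for TX).

```c
#include <stdint.h>

/* Flag values as they appear in the trace above. */
#define SHUTDOWN_RX 0x1u
#define SHUTDOWN_TX 0x2u

/* Hypothetical per-connection state kept by the sender. */
struct conn_state {
	uint32_t shutdown_sent; /* union of all shutdown bits sent so far */
};

/*
 * Return the flags to put in the next OP_SHUTDOWN packet.
 * Bits are sticky: once sent they stay set in every later packet.
 */
static uint32_t next_shutdown_flags(struct conn_state *c, uint32_t requested)
{
	c->shutdown_sent |= requested & (SHUTDOWN_RX | SHUTDOWN_TX);
	return c->shutdown_sent;
}
```

With this, shutdown(SHUT_RD) followed by shutdown(SHUT_WR) would put
SHUTDOWN_RX and then SHUTDOWN_TX|SHUTDOWN_RX on the wire, with no packet
carrying SHUTDOWN_TX alone.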

> > > > I originally had my backend close things down at #2, however this meant
> > > > that when #3 arrived it was for a non-existent socket (or, worse, an
> > > > active one if the ports got reused). I checked v5 of the spec
> > > > proposal[0] which says:
> > > > If these bits are set and there are no more virtqueue buffers
> > > > pending the socket is disconnected.
> > > >
> > > > but I'm not entirely sure if this behaviour contradicts this or not
> > > > (the bits have both been set at #2, but not at the same time).
> > > >
> > > > BTW, how does one tell if there are no more virtqueue buffers pending
> > > > or not while processing the op?
> > >
> > > #2 is odd.  The shutdown bits are sticky so they cannot be cleared once
> > > set.  I would have expected just #1 and #3.  The behavior you observe
> > > looks like a bug.
> > >
> > > The spec text does not convey the meaning of OP_SHUTDOWN well.
> > > OP_SHUTDOWN SHUTDOWN_TX|SHUTDOWN_RX means no further rx/tx is possible
> > > for this connection.  "there are no more virtqueue buffers pending the
> > > socket" really means that this isn't an immediate close from the
> > > perspective of the application.  If the application still has unread rx
> > > 

Re: [RFC v5 0/5] Add virtio transport for AF_VSOCK

2016-04-12 Thread Ian Campbell
Somehow Stefan's reply disappeared from my INBOX (although I did see
it), so I'm replying here.

On Mon, 2016-04-11 at 15:54 +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 11, 2016 at 11:45:48AM +0100, Stefan Hajnoczi wrote:
> > 
> > On Fri, Apr 08, 2016 at 04:35:05PM +0100, Ian Campbell wrote:
> > > 
> > > On Fri, 2016-04-01 at 15:23 +0100, Stefan Hajnoczi wrote:
> > > > 
> > > > This series is based on Michael Tsirkin's vhost branch (v4.5-rc6).
> > > > 
> > > > > I'm about to process Claudio Imbrenda's locking fixes for virtio-vsock but
> > > > > first I want to share the latest version of the code.  Several people are
> > > > > playing with vsock now so sharing the latest code should avoid duplicate work.
> > > Thanks for this, I've been using it in my project and it mostly seems
> > > fine.
> > > 
> > > One wrinkle I came across, which I'm not sure if it is by design or a
> > > problem is that I can see this sequence coming from the guest (with
> > > other activity in between):
> > > 
> > >     1) OP_SHUTDOWN w/ flags == SHUTDOWN_RX
> > > 2) OP_SHUTDOWN w/ flags == SHUTDOWN_TX
> > > 3) OP_SHUTDOWN w/ flags == SHUTDOWN_TX|SHUTDOWN_RX
> > > 
> > > I orignally had my backend close things down at #2, however this meant
> > > that when #3 arrived it was for a non-existent socket (or, worse, an
> > > active one if the ports got reused). I checked v5 of the spec
> > > proposal[0] which says:
> > > If these bits are set and there are no more virtqueue buffers
> > > pending the socket is disconnected.
> > > 
> > > but I'm not entirely sure if this behaviour contradicts this or not
> > > (the bits have both been set at #2, but not at the same time).
> > > 
> > > BTW, how does one tell if there are no more virtqueue buffers pending
> > > or not while processing the op?
> > #2 is odd.  The shutdown bits are sticky so they cannot be cleared once
> > set.  I would have expected just #1 and #3.  The behavior you observe
> > looks like a bug.
> > 
> > The spec text does not convey the meaning of OP_SHUTDOWN well.
> > OP_SHUTDOWN SHUTDOWN_TX|SHUTDOWN_RX means no further rx/tx is possible
> > for this connection.  "there are no more virtqueue buffers pending the
> > socket" really means that this isn't an immediate close from the
> > perspective of the application.  If the application still has unread rx
> > buffers then the socket stays readable until the rx data has been fully
> > read.

Thanks, distinguishing the local buffer to the application from the
vring would make that clearer. Perhaps by not talking about "virtqueue
buffers" since they sound like a vring thing.

However, as Michael observes I'm not sure that's the whole story.

> Yes but you also wrote:
>   If these bits are set and there are no more virtqueue buffers
>   pending the socket is disconnected.
> 
> how does remote know that there are no buffers pending and so it's safe
> to reuse the same source/destination address now?

Indeed this is one of the things I struggled with. e.g. If I send a
SHUTDOWN_RX to my peer am I supposed to wait for that buffer to come
back (so I know the peer has seen it) and then wait for an entire
"cycle" of the TX ring to know there is nothing still in flight? That's
some tricky book-keeping.

>   Maybe destination
> should send RST at that point?

i.e. upon receipt of SHUTDOWN_RX|SHUTDOWN_TX from the peer you are
expected to send a RST. When the peer observes that then they know
there is no further data in that connection on the ring?

That sounds like it would be helpful.
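
A minimal sketch of that idea from the receiving side, assuming the RST
handshake were adopted. Nothing here comes from the patches or the spec
proposal: the struct, the function names and the decision to accumulate the
bits stickily are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SHUTDOWN_RX   0x1u
#define SHUTDOWN_TX   0x2u
#define SHUTDOWN_BOTH (SHUTDOWN_RX | SHUTDOWN_TX)

/* Hypothetical connection record kept by a backend. */
struct conn {
	uint32_t peer_shutdown; /* sticky union of bits received from the peer */
	bool rst_sent;          /* we have already answered with an RST */
	bool closed;            /* record may be freed and the address reused */
};

/*
 * Handle an incoming OP_SHUTDOWN. Returns true when the caller should
 * queue an RST to the peer, which happens once, the first time both
 * directions are seen to be shut down.
 */
static bool on_peer_shutdown(struct conn *c, uint32_t flags)
{
	c->peer_shutdown |= flags & SHUTDOWN_BOTH;
	if (c->peer_shutdown == SHUTDOWN_BOTH && !c->rst_sent) {
		c->rst_sent = true;
		return true;
	}
	return false;
}

/* On the side that sent the shutdown: the RST says the ring is drained. */
static void on_rst(struct conn *c)
{
	c->closed = true;
}
```

Because the bits are accumulated, the RX, TX, TX|RX sequence from the guest
would produce a single RST at the second packet, and the third would be
absorbed without touching any other connection.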

> > > Another thing I noticed, which is really more to do with the generic
> > > AF_VSOCK bits than anything to do with your patches is that there are no
> > > limitations on which vsock ports a non-privileged user can bind to and
> > > relatedly that there is no netns support so e.g. users in unprivileged
> > > containers can bind to any vsock port and talk to the host, which might
> > > be undesirable. For my use for now I just went with the big hammer
> > > approach of denying access from anything other than init_net
> > > namespace[1] while I consider what the right answer is.
> > From the vhost point of view each netns should have its own AF_VSOCK
> > namespace.  This way two containers could act as "the host" (CID 2) for
> > their respective guests.

When you say "should" you mean that's the intended design as opposed to
what the current code is actually doing, right?

Ian.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC v5 0/5] Add virtio transport for AF_VSOCK

2016-04-08 Thread Ian Campbell
On Fri, 2016-04-01 at 15:23 +0100, Stefan Hajnoczi wrote:
> This series is based on Michael Tsirkin's vhost branch (v4.5-rc6).
> 
> I'm about to process Claudio Imbrenda's locking fixes for virtio-vsock but
> first I want to share the latest version of the code.  Several people are
> playing with vsock now so sharing the latest code should avoid duplicate work.

Thanks for this, I've been using it in my project and it mostly seems
fine.

One wrinkle I came across, which I'm not sure if it is by design or a
problem is that I can see this sequence coming from the guest (with
other activity in between):

1) OP_SHUTDOWN w/ flags == SHUTDOWN_RX
2) OP_SHUTDOWN w/ flags == SHUTDOWN_TX
3) OP_SHUTDOWN w/ flags == SHUTDOWN_TX|SHUTDOWN_RX

I originally had my backend close things down at #2, however this meant
that when #3 arrived it was for a non-existent socket (or, worse, an
active one if the ports got reused). I checked v5 of the spec
proposal[0] which says:
If these bits are set and there are no more virtqueue buffers
pending the socket is disconnected.

but I'm not entirely sure if this behaviour contradicts this or not
(the bits have both been set at #2, but not at the same time).

BTW, how does one tell if there are no more virtqueue buffers pending
or not while processing the op?

Another thing I noticed, which is really more to do with the generic
AF_VSOCK bits than anything to do with your patches is that there are no
limitations on which vsock ports a non-privileged user can bind to and
relatedly that there is no netns support so e.g. users in unprivileged
containers can bind to any vsock port and talk to the host, which might
be undesirable. For my use for now I just went with the big hammer
approach of denying access from anything other than init_net
namespace[1] while I consider what the right answer is.

Ian.

[0] http://thread.gmane.org/gmane.comp.emulators.virtio.devel/1092
[1]
From 366c9c42afb9bd54f92f72518470c09e46f12e88 Mon Sep 17 00:00:00 2001
From: Ian Campbell <ian.campb...@docker.com>
Date: Mon, 4 Apr 2016 14:50:10 +0100
Subject: [PATCH] VSOCK: Only allow host network namespace to use AF_VSOCK.

The VSOCK addressing schema does not really lend itself to simply creating an
alternative end point address within a namespace.

Signed-off-by: Ian Campbell <ian.campb...@docker.com>
---
 net/vmw_vsock/af_vsock.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 1e5f5ed..cdb3dd3 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1840,6 +1840,9 @@ static const struct proto_ops vsock_stream_ops = {
 static int vsock_create(struct net *net, struct socket *sock,
 			int protocol, int kern)
 {
+	if (!net_eq(net, &init_net))
+		return -EAFNOSUPPORT;
+
 	if (!sock)
 		return -EINVAL;
 
-- 
2.8.0.rc3


Re: [Xen-devel] [RFC] Hypervisor RNG and enumeration

2014-10-30 Thread Ian Campbell
On Thu, 2014-10-30 at 07:45 -0700, Andy Lutomirski wrote:
> > Xen does not have a continual source of entropy and the only feasible
> > way is for the toolstack to provide each guest with a fixed size pool of
> > random data during guest creation.
>
>
> Xen could seed a very simple per-guest DRBG at guest startup and then
> let the rdmsr call read from it.

I think I'm a bit confused by the intended scope of this facility. The
original spec said:

Note that the CommonHV RNG is not intended to replace stronger,
asynchronous paravirtual random number generator interfaces.  It is
intended primarily for seeding guest RNGs early in boot.

Which to me reads as saying that the guest should be using this facility to
seed its own simple DRBG on boot (with some finite amount of seed data from
the hv) and then using that until it can switch to something better. Is
that not the intention?

I think it's important to nail down the intended scope of this
interface, since it has quite an impact on what would be considered a
reasonable common design. 

Post boot I would as you say expect most OSes to switch over to
something more capable, not continue to rely on this facility for the
duration.

Ian.



Re: [PATCH next] xen: Use more current logging styles

2013-06-28 Thread Ian Campbell
On Thu, 2013-06-27 at 21:57 -0700, Joe Perches wrote:
> Instead of mixing printk and pr_<level> forms,
> just use pr_<level>
>
> Miscellaneous changes around these conversions:
>
> Add a missing newline to avoid message interleaving,
> coalesce formats, reflow modified lines to 80 columns.
>
> Signed-off-by: Joe Perches <j...@perches.com>

Acked-by: Ian Campbell <ian.campb...@citrix.com>

> ---
>  drivers/net/xen-netback/netback.c |  7 +++
>  drivers/net/xen-netfront.c        | 28 +---
>  2 files changed, 16 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 130bcb2..64828de 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1890,9 +1890,8 @@ static int __init netback_init(void)
>  		return -ENODEV;
>  
>  	if (fatal_skb_slots < XEN_NETBK_LEGACY_SLOTS_MAX) {
> -		printk(KERN_INFO
> -		       "xen-netback: fatal_skb_slots too small (%d), bump it to XEN_NETBK_LEGACY_SLOTS_MAX (%d)\n",
> -		       fatal_skb_slots, XEN_NETBK_LEGACY_SLOTS_MAX);
> +		pr_info("fatal_skb_slots too small (%d), bump it to XEN_NETBK_LEGACY_SLOTS_MAX (%d)\n",
> +			fatal_skb_slots, XEN_NETBK_LEGACY_SLOTS_MAX);
>  		fatal_skb_slots = XEN_NETBK_LEGACY_SLOTS_MAX;
>  	}
>  
> @@ -1921,7 +1920,7 @@ static int __init netback_init(void)
>  						     "netback/%u", group);
>  
>  		if (IS_ERR(netbk->task)) {
> -			printk(KERN_ALERT "kthread_create() fails at netback\n");
> +			pr_alert("kthread_create() fails at netback\n");
>  			del_timer(&netbk->net_timer);
>  			rc = PTR_ERR(netbk->task);
>  			goto failed_init;
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 76a2236..ff7f111 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -29,6 +29,8 @@
>   * IN THE SOFTWARE.
>   */
>  
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
>  #include <linux/module.h>
>  #include <linux/kernel.h>
>  #include <linux/netdevice.h>
> @@ -385,9 +387,8 @@ static void xennet_tx_buf_gc(struct net_device *dev)
>  			skb = np->tx_skbs[id].skb;
>  			if (unlikely(gnttab_query_foreign_access(
>  				np->grant_tx_ref[id]) != 0)) {
> -				printk(KERN_ALERT "xennet_tx_buf_gc: warning "
> -				       "-- grant still in use by backend "
> -				       "domain.\n");
> +				pr_alert("%s: warning -- grant still in use by backend domain\n",
> +					 __func__);
>  				BUG();
>  			}
>  			gnttab_end_foreign_access_ref(
> @@ -804,14 +805,14 @@ static int xennet_set_skb_gso(struct sk_buff *skb,
>  {
>  	if (!gso->u.gso.size) {
>  		if (net_ratelimit())
> -			printk(KERN_WARNING "GSO size must not be zero.\n");
> +			pr_warn("GSO size must not be zero\n");
>  		return -EINVAL;
>  	}
>  
>  	/* Currently only TCPv4 S.O. is supported. */
>  	if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4) {
>  		if (net_ratelimit())
> -			printk(KERN_WARNING "Bad GSO type %d.\n", gso->u.gso.type);
> +			pr_warn("Bad GSO type %d\n", gso->u.gso.type);
>  		return -EINVAL;
>  	}
>  
> @@ -910,9 +911,8 @@ static int checksum_setup(struct net_device *dev, struct sk_buff *skb)
>  		break;
>  	default:
>  		if (net_ratelimit())
> -			printk(KERN_ERR "Attempting to checksum a non-"
> -			       "TCP/UDP packet, dropping a protocol"
> -			       " %d packet", iph->protocol);
> +			pr_err("Attempting to checksum a non-TCP/UDP packet, dropping a protocol %d packet\n",
> +			       iph->protocol);
>  		goto out;
>  	}
>  
> @@ -1359,14 +1359,14 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
>  	/* A grant for every tx ring slot */
>  	if (gnttab_alloc_grant_references(TX_MAX_TARGET,
>  					  &np->gref_tx_head) < 0) {
> -		printk(KERN_ALERT " netfront can't alloc tx grant refs\n");
> +		pr_alert("can't alloc tx grant refs\n");
>  		err = -ENOMEM;
>  		goto exit_free_stats;
>  	}
>  	/* A grant for every rx ring slot */
>  	if (gnttab_alloc_grant_references(RX_MAX_TARGET,
>  					  &np->gref_rx_head) < 0) {
> -		printk(KERN_ALERT " netfront can't alloc rx grant refs\n");
> +		pr_alert("can't alloc rx grant refs\n");
>  		err = -ENOMEM;
>  		goto exit_free_tx;
>  	}
> @@ -1430,16 +1430,14 @@ static int netfront_probe(struct

Re: [PATCH next] xen: Convert printks to pr_<level>

2013-06-28 Thread Ian Campbell
On Fri, 2013-06-28 at 03:21 -0700, Joe Perches wrote:
> Convert printks to pr_<level> (excludes printk(KERN_DEBUG...)
> to be more consistent throughout the xen subsystem.
>
> Add pr_fmt with KBUILD_MODNAME or "xen:" KBUILD_MODNAME
> Coalesce formats and add missing word spaces
> Add missing newlines
> Align arguments and reflow to 80 columns
> Remove DRV_NAME from formats as pr_fmt adds the same content
>
> This does change some of the prefixes of these messages
> but it also does make them more consistent.
>
> Signed-off-by: Joe Perches <j...@perches.com>
> ---
>
> On Fri, 2013-06-28 at 09:02 +0100, Wei Liu wrote:
> > Do you also need to replace other printk occurrences in xen-netback
> > directory, say, interface.c and xenbus.c?
>
> Well, I don't _need_ to but if you want it

I think Wei just meant drivers/net/xen-netback/{interface.c,xenbus.c} in
addition to the netback.c you were patching in your previous patch.

> this is what I suggest.

Wow ;-)

>  drivers/xen/balloon.c                       |  6 +++--
>  drivers/xen/cpu_hotplug.c                   |  6 +++--
>  drivers/xen/events.c                        | 23 +-
>  drivers/xen/evtchn.c                        |  6 +++--
>  drivers/xen/gntalloc.c                      |  6 +++--
>  drivers/xen/gntdev.c                        |  8 ---
>  drivers/xen/grant-table.c                   | 17 +++---
>  drivers/xen/manage.c                        | 23 +-
>  drivers/xen/mcelog.c                        | 36 +++--
>  drivers/xen/pcpu.c                          | 12 +-
>  drivers/xen/privcmd.c                       |  4 +++-
>  drivers/xen/swiotlb-xen.c                   | 12 ++
>  drivers/xen/tmem.c                          | 10 
>  drivers/xen/xen-acpi-cpuhotplug.c           |  2 ++
>  drivers/xen/xen-acpi-memhotplug.c           |  2 ++
>  drivers/xen/xen-acpi-pad.c                  |  2 ++
>  drivers/xen/xen-acpi-processor.c            | 25 ++--
>  drivers/xen/xen-balloon.c                   |  6 +++--
>  drivers/xen/xen-pciback/conf_space_header.c | 16 ++---
>  drivers/xen/xen-pciback/pci_stub.c          | 25 +---
>  drivers/xen/xen-pciback/pciback_ops.c       |  9 +---
>  drivers/xen/xen-pciback/vpci.c              | 10 
>  drivers/xen/xen-pciback/xenbus.c            |  8 ---
>  drivers/xen/xen-selfballoon.c               | 11 -
>  drivers/xen/xenbus/xenbus_comms.c           | 13 ++-
>  drivers/xen/xenbus/xenbus_dev_backend.c     |  4 +++-
>  drivers/xen/xenbus/xenbus_dev_frontend.c    |  4 +++-
>  drivers/xen/xenbus/xenbus_probe.c           | 30 +++-
>  drivers/xen/xenbus/xenbus_probe_backend.c   |  8 ---
>  drivers/xen/xenbus/xenbus_probe_frontend.c  | 35 ++--
>  drivers/xen/xenbus/xenbus_xs.c              | 22 --
>  drivers/xen/xencomm.c                       |  2 ++
>  drivers/xen/xenfs/super.c                   |  4 +++-
>  include/xen/hvm.h                           |  4 ++--
>  34 files changed, 215 insertions(+), 196 deletions(-)




Re: [Xen-devel] [PATCH] arch/x86/xen: remove depends on CONFIG_EXPERIMENTAL

2013-02-25 Thread Ian Campbell
On Sat, 2013-02-23 at 20:47 +0000, Stefano Stabellini wrote:
> On Sat, 23 Feb 2013, Konrad Rzeszutek Wilk wrote:
> > On Sat, Feb 23, 2013 at 09:03:20AM -0800, Kees Cook wrote:
> > > On Sat, Feb 23, 2013 at 3:59 AM, Dongsheng Song
> > > <dongsheng.s...@gmail.com> wrote:
> > > > On Sat, Feb 23, 2013 at 3:29 PM, Kees Cook <keesc...@chromium.org> wrote:
> > > > >
> > > > > The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
> > > > > while now and is almost always enabled by default. As agreed during the
> > > > > Linux kernel summit, remove it from any "depends on" lines in Kconfigs.
> > > > >
> > > > > Signed-off-by: Kees Cook <keesc...@chromium.org>
> > > > > Cc: Stefano Stabellini <stefano.stabell...@eu.citrix.com>
> > > > > Cc: Mukesh Rathor <mukesh.rat...@oracle.com>
> > > > > Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
> > > > > ---
> > > > >  arch/x86/xen/Kconfig |    2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
> > > > > index 93ff4e1..8cada4c 100644
> > > > > --- a/arch/x86/xen/Kconfig
> > > > > +++ b/arch/x86/xen/Kconfig
> > > > > @@ -53,7 +53,7 @@ config XEN_DEBUG_FS
> > > > >
> > > > >  config XEN_X86_PVH
> > > > >         bool "Support for running as a PVH guest (EXPERIMENTAL)"
> > > >
> > > > Why not remove this 'EXPERIMENTAL' too ?
> > >
> > > It was unclear to me if the feature was actually considered unstable.
> > > I can resend with the text removed from the title too, if that's the
> > > correct action here?
> >
> > It certainly is unstable right now (which is why it was unstaged from
> > the v3.9 train). I hope that by v3.10 it won't be - at which
> > point this patch (and the EXPERIMENTAL) makes sense.
> >
> > So could you respin it please with the text removed as well - and I will
> > queue it up in the branch that carries the PVH feature?
>
> We also have the same flag on Xen ARM, and the reason is that the ABI is
> not stable yet. As soon as it is (I think soon now), I'll send a patch
> to remove EXPERIMENTAL from there too.

In the meantime if the depends EXPERIMENTAL is going away perhaps we
should explain the EXPERIMENTAL in the title:

8<--------

From bc22bd0f7b20296c449a05d82be950922042bc92 Mon Sep 17 00:00:00 2001
From: Ian Campbell <ian.campb...@citrix.com>
Date: Thu, 4 Oct 2012 09:12:51 +0100
Subject: [PATCH] arm: xen: explain the EXPERIMENTAL dependency in the Kconfig help

Signed-off-by: Ian Campbell <ian.campb...@citrix.com>
Cc: Russell King <li...@arm.linux.org.uk>
Cc: linux-arm-ker...@lists.infradead.org
---
 arch/arm/Kconfig |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 67874b8..ef14873 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1865,6 +1865,14 @@ config XEN
 	help
 	  Say Y if you want to run Linux in a Virtual Machine on Xen on ARM.
 
+
+	  This option is EXPERIMENTAL because the hypervisor
+	  interfaces which it uses are not yet considered stable
+	  therefore backwards and forwards compatibility is not yet
+	  guaranteed.
+
+	  If unsure, say N.
+
 endmenu
 
 menu "Boot options"
-- 
1.7.2.5





Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-07 Thread Ian Campbell
On Fri, 2013-01-04 at 19:11 +0000, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 04, 2013 at 06:07:51PM +0100, Daniel Kiper wrote:
> > On Fri, Jan 04, 2013 at 02:41:17PM +0000, Jan Beulich wrote:
> > > On 04.01.13 at 15:22, Daniel Kiper <daniel.ki...@oracle.com> wrote:
> > > > On Wed, Jan 02, 2013 at 11:26:43AM +0000, Andrew Cooper wrote:
> > > > > /sbin/kexec can load the Xen crash kernel itself by issuing
> > > > > hypercalls using /dev/xen/privcmd.  This would remove the need for
> > > > > the dom0 kernel to distinguish between loading a crash kernel for
> > > > > itself and loading a kernel for Xen.
> > > > >
> > > > > Or is this just a silly idea complicating the matter?
> > > >
> > > > This is impossible with current Xen kexec/kdump interface.
> > >
> > > Why?
> >
> > Because current KEXEC_CMD_kexec_load does not load kernel
> > image and other things into Xen memory. It means that it
> > should live somewhere in dom0 Linux kernel memory.
>
> We could have a very simple hypercall which would have:
>
> struct fancy_new_hypercall {
>   xen_pfn_t payload; // IN

This would have to be XEN_GUEST_HANDLE(something) since userspace cannot
figure out what pfns back its memory. In any case since the hypervisor
is going to want to copy the data into the crashkernel space a virtual
address is convenient to have.

>   ssize_t len; // IN
> #define DATA (1<<1)
> #define DATA_EOF (1<<2)
> #define DATA_KERNEL (1<<3)
> #define DATA_RAMDISK (1<<4)
>   unsigned int flags; // IN
>   unsigned int status; // OUT
> };
>
> which would in a loop just iterate over the payloads and
> let the hypervisor stick it in the crashkernel space.
>
> This is all hand-waving of course. There probably would be a need
> to figure out how much space you have in the reserved Xen's
> 'crashkernel' memory region too.
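
For illustration only, the loop Konrad describes might look roughly like the
sketch below from the userspace side. Everything beyond the struct and flag
names in his mail is an assumption: the flags are taken to be distinct bits
(1 << n), the payload is a plain virtual address standing in for a guest
handle, and the hypercall itself is replaced by a caller-supplied function
(issue_fn), with a stub that merely tallies what it is handed.

```c
#include <stddef.h>

#define DATA         (1 << 1)
#define DATA_EOF     (1 << 2)
#define DATA_KERNEL  (1 << 3)
#define DATA_RAMDISK (1 << 4)

struct fancy_new_hypercall {
	const void *payload; /* IN: virtual address, stand-in for a guest handle */
	size_t len;          /* IN */
	unsigned int flags;  /* IN */
	unsigned int status; /* OUT */
};

/* Hypothetical hook standing in for the real hypercall. */
typedef int (*issue_fn)(struct fancy_new_hypercall *op, void *ctx);

/* Feed one image to the hypervisor in chunks, marking the last with EOF. */
static int load_image(const void *image, size_t size, size_t chunk,
		      unsigned int kind, issue_fn issue, void *ctx)
{
	const unsigned char *p = image;
	size_t off = 0;

	if (chunk == 0)
		return -1;
	do {
		size_t n = size - off < chunk ? size - off : chunk;
		struct fancy_new_hypercall op = {
			.payload = p + off,
			.len = n,
			.flags = DATA | kind,
			.status = 0,
		};
		int rc;

		off += n;
		if (off == size)
			op.flags |= DATA_EOF;
		rc = issue(&op, ctx);
		if (rc != 0)
			return rc;
	} while (off < size);
	return 0;
}

/* Stub "hypercall" that only records what it was given. */
struct tally {
	size_t calls;
	size_t bytes;
	unsigned int last_flags;
};

static int tally_issue(struct fancy_new_hypercall *op, void *ctx)
{
	struct tally *t = ctx;

	t->calls++;
	t->bytes += op->len;
	t->last_flags = op->flags;
	return 0;
}
```

Passing a function pointer keeps the chunking logic testable without a
hypervisor; a real caller would issue the hypercall via privcmd instead.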

This is probably a mad idea but it's Monday morning and I'm sleep
deprived so I'll throw it out there...

What about adding DOMID_KEXEC (similar to DOMID_IO etc)? This would allow
dom0 to map the kexec memory space with the usual privcmd mmap
hypercalls and build things in it directly.

OK, I suspect this might not be practical for a variety of reasons (lack
of a p2m for such domains so no way to find out the list of mfns, dom0
userspace simply doesn't have sufficient context to write sensible
things here, etc) but maybe someone has a better head on today...

Ian.



Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-07 Thread Ian Campbell
On Mon, 2013-01-07 at 10:46 +0000, Andrew Cooper wrote:

> Given that /sbin/kexec creates a binary blob in memory, surely the most
> simple thing is to get it to suitably mlock() the region and give a list
> of VAs to the hypervisor.

More than likely. The DOMID_KEXEC thing was just a random musing ;-)

> This way, Xen can properly take care of what it does with information
> and where.  For example, at the moment, allowing dom0 to choose where
> gets overwritten in the Xen crash area is a recipe for disaster if a
> crash occurs midway through loading/reloading the crash kernel.

That's true. I think there is a double buffering scheme in the current
thing and we should preserve that in any new implementation.
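
To illustrate the general idea (this is not how Xen implements it, and the
names and the tiny slot size are invented for the example), a double-buffered
load writes the new image into whichever slot is not live and only then
switches over, so a crash midway still finds the previous image intact:

```c
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 64 /* illustrative only; a real crash area is far larger */

/* Hypothetical crash area with two image slots. */
struct crash_area {
	unsigned char slot[2][SLOT_SIZE];
	size_t len[2];
	int active; /* slot a crash would boot from, or -1 if none loaded */
};

/* Load a new image into the spare slot, then make it the active one. */
static int crash_area_load(struct crash_area *a, const void *img, size_t len)
{
	int spare = (a->active == 0) ? 1 : 0;

	if (len > SLOT_SIZE)
		return -1; /* too big: the active image is left untouched */

	memcpy(a->slot[spare], img, len);
	a->len[spare] = len;
	a->active = spare; /* single store: the switch-over point */
	return 0;
}
```

A real implementation would also need the appropriate memory barriers around
the switch-over, which are omitted here.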

Ian.



Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-07 Thread Ian Campbell
On Mon, 2013-01-07 at 12:34 +0000, Daniel Kiper wrote:
> I think that new kexec hypercall function should mimic kexec syscall.

We want to have an interface that can be used by non-Linux domains (both
dom0 and domU) as well though, so please bear this in mind.

Historically we've not always been good at this when the hypercall
interface is strongly tied to a particular guest implementation (in some
sense this is the problem with the current kexec hypercall).

Also what makes for a good syscall interface does not necessarily make
for a good hypercall interface.

Ian.



Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-04 Thread Ian Campbell
On Fri, 2013-01-04 at 14:22 +0000, Daniel Kiper wrote:
> On Wed, Jan 02, 2013 at 11:26:43AM +0000, Andrew Cooper wrote:
> > On 27/12/12 18:02, Eric W. Biederman wrote:
> > > Andrew Cooper <andrew.coop...@citrix.com> writes:
> > >
> > > > On 27/12/2012 07:53, Eric W. Biederman wrote:
> > > > > The syscall ABI still has the wrong semantics.
> > > > >
> > > > > Aka totally unmaintainable and unmergeable.
> > > > >
> > > > > The concept of domU support is also strange.  What does domU support
> > > > > even mean, when the dom0 support is loading a kernel to pick up Xen when
> > > > > Xen falls over.
> > > > There are two requirements pulling at this patch series, but I agree
> > > > that we need to clarify them.
> > > It probably makes sense to split them apart a little even.
> >
> > Thinking about this split, there might be a way to simplify it even more.
> >
> > /sbin/kexec can load the Xen crash kernel itself by issuing
> > hypercalls using /dev/xen/privcmd.  This would remove the need for
> > the dom0 kernel to distinguish between loading a crash kernel for
> > itself and loading a kernel for Xen.
> >
> > Or is this just a silly idea complicating the matter?
>
> This is impossible with current Xen kexec/kdump interface.
> It should be changed to do that. However, I suppose that
> Xen community would not be interested in such changes.

The current HYPERVISOR_kexec interface is pretty fricken bad (it
basically hardcodes the Linux Circa-2.6.18 internal interface!).

I'd be all for a new HYPERVISOR_kexec (with the old gaining a _compat
suffix) which implements something more generic that isn't tied to a
particular dom0 kernel implementation (be it differing versions of Linux
or e.g. *BSD).

If that enables /sbin/kexec to load the kernel directly then so much the
better, assuming the /sbin/kexec maintainers are happy with that
approach.

Ian.



Re: [Xen-devel] [PATCH v3 06/11] x86/xen: Add i386 kexec/kdump implementation

2013-01-02 Thread Ian Campbell
On Fri, 2012-12-28 at 03:16 +0000, Eric W. Biederman wrote:
> Hasn't 32bit dom0 been retired?

The 32 bit hypervisor has been but 32 bit (PAE) guests (which includes
dom0) are still supported on top of a 64 bit hypervisor. There are no
plans to remove that support.

Ian



Re: [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-02 Thread Ian Campbell
On Thu, 2012-12-27 at 14:18 +0000, Andrew Cooper wrote:
> Many cloud customers and service providers want the ability for a VM
> administrator to be able to load a kdump/kexec kernel within a
> domain[1].  This allows the VM administrator to take more proactive
> steps to isolate the cause of a crash, the state of which is most likely
> discarded while tearing down the domain.  The result being that as far
> as Xen is concerned, the domain is still alive, while the kdump
> kernel/environment can work its usual magic.  I am not aware of any
> feature like this existing in the past.

I have a feeling that some versions of the classic-Xen port supported
domU kexec as well. Certainly there was some work on that back in 2005,
although I can't see much evidence that that attempt ever went anywhere
so maybe I'm imagining things.

It's possible that I'm confusing domU kexec support with support for
domU kexec in some dom0 kernels. That was/is used to support kexec
from a PV bootloader into the real kernel (which looks to the host a lot
like a domU kexec would).

Ian.



Re: [Xen-devel] [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-23 Thread Ian Campbell
On Fri, 2012-11-23 at 10:37 +0000, Daniel Kiper wrote:
> On Fri, Nov 23, 2012 at 09:53:37AM +0000, Jan Beulich wrote:
> > On 23.11.12 at 02:56, Andrew Cooper <andrew.coop...@citrix.com> wrote:
> > > The crash region (as specified by crashkernel= on the Xen command line)
> > > is isolated from dom0.
> > > [...]
> >
> > But all of this _could_ be done completely independent of the
> > Dom0 kernel's kexec infrastructure (i.e. fully from user space,
> > invoking the necessary hypercalls through the privcmd driver).
>
> No, this is impossible. kexec/kdump image lives in dom0 kernel memory
> until execution.

Are you sure? I could have sworn they lived in the hypervisor owned
memory set aside by the crashkernel= parameter as Andy suggested.

Ian.



Re: [Xen-devel] [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-23 Thread Ian Campbell
On Fri, 2012-11-23 at 09:56 +0000, Jan Beulich wrote:
> >>> On 22.11.12 at 18:37, H. Peter Anvin <h...@zytor.com> wrote:
> > I actually talked to Ian Jackson at LCE, and mentioned among other

That was me actually (this happens surprisingly often ;-)).

> > things the bogosity of requiring a PUD page for three-level paging in
> > Linux -- a bogosity which has spread from Xen into native.  It's a page
> > wasted for no good reason, since it only contains 32 bytes worth of
> > data, *inherently*.  Furthermore, contrary to popular belief, it is
> > *not* a page table per se.
> >
> > Ian told me: "I didn't know we did that, and we shouldn't have to."
> > Here we have suffered this overhead for at least six years, ...
>
> Even the Xen kernel only needs the full page when running on a
> 64-bit hypervisor (now that we don't have a 32-bit hypervisor
> anymore, that of course basically means always).

I took an, admittedly very brief, look at it on the plane on the way
home and it seems like the requirement for a complete page on the
pvops-xen side comes from the !SHARED_KERNEL_PMD stuff (so still a Xen
related thing). This requires a struct page for the list_head it
contains (see pgd_list_add et al) rather than because of the use of the
page as a pgd as such.

> But yes, I too
> never liked this enforced over-allocation for native kernels (and
> was surprised that it was allowed in at all).

Completely agreed.

I did wonder if just doing something like:
-       pgd = (pgd_t *)__get_free_page(PGALLOC_GFP);
+       if (SHARED_KERNEL_PMD)
+               pgd = some_appropriate_allocation_primitive(sizeof(*pgd));
+       else
+               pgd = (pgd_t *)__get_free_page(PGALLOC_GFP);

to pgd_alloc (+ the equivalent for the error path & free case, create
helper funcs as desired etc) would be sufficient to remove the over
allocation for the native case but haven't had time to properly
investigate.

Alternatively push the allocation down into paravirt_pgd_alloc to
taste :-/

Ian.



Re: [Xen-devel] [PATCH 09/14] xen: events: Remove redundant check on unsigned variable

2012-11-19 Thread Ian Campbell
On Mon, 2012-11-19 at 03:52 +0000, Tushar Behera wrote:
> On 11/16/2012 10:23 PM, Jeremy Fitzhardinge wrote:
> > To be honest I'd nack this kind of patch. The test is only redundant in the
> > most trivial sense that the compiler can easily optimise away. The point of
> > the test is to make sure that the range is OK even if the type subsequently
> > becomes signed (to hold a -ve error, for example).
> >
> > J
> >
>
> The check is on the function argument which is unsigned, so checking '<
> 0' doesn't make sense. We should force signed check only if the argument
> is of signed type. In any case, even if irq has been assigned some error
> value, that would be caught by the check irq >= nr_irqs.

Jeremy is (I think) arguing that this check is not redundant because
someone might change the type of the argument to be signed and until
then the compiler can trivially optimise the check away, so what's the
harm in it?

I'm somewhat inclined to agree with him.

Ian.
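
As a rough illustration of the two positions above (a sketch only, not the
actual events code; nr_irqs_example and both helper names are made up for
this note): with an unsigned argument the lower-bound test can never fire,
and a negative error value arrives as a very large number that the upper
bound rejects. With a signed argument the lower bound has to be spelled out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's nr_irqs, for illustration only. */
static const unsigned int nr_irqs_example = 256;

/*
 * Unsigned argument: "irq < 0" is always false, so only the upper bound
 * does any work. An error code such as -22 converted to unsigned becomes
 * a huge value and fails this test.
 */
static bool irq_in_range_unsigned(unsigned int irq)
{
    return irq < nr_irqs_example;
}

/*
 * Signed argument: the lower bound is needed, otherwise a negative value
 * would have to rely on an implicit conversion to be caught.
 */
static bool irq_in_range_signed(int irq)
{
    return irq >= 0 && (unsigned int)irq < nr_irqs_example;
}
```

Keeping the lower-bound test in the unsigned version costs nothing at run
time, which is the argument for leaving it in place in case the type changes.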



Re: memory corruption in HYPERVISOR_physdev_op()

2012-10-15 Thread Ian Campbell
On Fri, 2012-09-14 at 14:24 +0300, Dan Carpenter wrote:
> Hi Jeremy,

Jeremy doesn't work on Xen much any more. Adding Konrad and the
xen-devel@ list.

> My static analyzer complains about potential memory corruption in
> HYPERVISOR_physdev_op()
>
> arch/x86/include/asm/xen/hypercall.h
> 389  static inline int
> 390  HYPERVISOR_physdev_op(int cmd, void *arg)
> 391  {
> 392          int rc = _hypercall2(int, physdev_op, cmd, arg);
> 393          if (unlikely(rc == -ENOSYS)) {
> 394                  struct physdev_op op;
> 395                  op.cmd = cmd;
> 396                  memcpy(&op.u, arg, sizeof(op.u));
> 397                  rc = _hypercall1(int, physdev_op_compat, &op);
> 398                  memcpy(arg, &op.u, sizeof(op.u));
>                                          ^^^^^^^^^^^^^
> Some of the arg buffers are not as large as sizeof(op.u) which is either
> 12 or 16 depending on the size of longs in struct physdev_apic.

Nasty!

 
> 399          }
> 400          return rc;
> 401  }
>
> One example of this is in xen_initdom_restore_msi_irqs().
>
> arch/x86/pci/xen.c
> 337                  struct physdev_pci_device restore_ext;
> 338
> 339                  restore_ext.seg = pci_domain_nr(dev->bus);
> 340                  restore_ext.bus = dev->bus->number;
> 341                  restore_ext.devfn = dev->devfn;
> 342                  ret = HYPERVISOR_physdev_op(PHYSDEVOP_restore_msi_ext,
> 343                                          &restore_ext);
>
> There are only 4 bytes here.
>
> 344                  if (ret == -ENOSYS)
>                          ^^^^^^^^^^^^^^
> If we hit this condition, we have corrupted some memory.

I can see the memory corruption but how does it relate to ret ==
-ENOSYS?

 
> 345                          pci_seg_supported = false;
>
> regards,
> dan carpenter

-- 
Ian Campbell
Current Noise: Therapy? - Femtex

Riffle West Virginia is so small that the Boy Scout had to double as the
town drunk.



Re: potential integer overflow in xenbus_file_write()

2012-10-15 Thread Ian Campbell
On Thu, 2012-09-13 at 19:00 +0300, Dan Carpenter wrote:
> Hi,

Thanks Dan. I'm not sure anyone from Xen-land really monitors
virtualization@. Adding xen-devel and Konrad.

 
> I was reading some code and had a question in xenbus_file_write()
>
> drivers/xen/xenbus/xenbus_dev_frontend.c
> 461          if ((len + u->len) > sizeof(u->u.buffer)) {
>                   ^^^^^^^^^^^^
> Can this addition overflow?

len is a size_t and u->len is an unsigned int, so I expect so.

> Should the test be something like:
>
>         if (len > sizeof(u->u.buffer) || len + u->len > sizeof(u->u.buffer)) {

I think that would do it.

Ian.
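
For reference, a small stand-alone sketch of the overflow being discussed.
The function names are made up for this note and the code is user-space C,
not the xenbus driver. It contrasts the wrapping sum with a check that
compares against the remaining space, which is an alternative formulation to
the two-comparison test suggested above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Naive form: the sum is computed in size_t and silently wraps for very
 * large len, so an oversized write can look as if it fits.
 */
static bool naive_fits(size_t cap, unsigned int used, size_t len)
{
    return len + used <= cap;
}

/*
 * Wrap-free form: compare len against the space that is left. No
 * intermediate value can exceed cap, so nothing can overflow.
 */
static bool fits_in_buffer(size_t cap, unsigned int used, size_t len)
{
    if (used > cap)
        return false;
    return len <= cap - used;
}
```

With a 4096 byte buffer holding 100 bytes, a length of SIZE_MAX - 50 passes
naive_fits() because the sum wraps to 49, while fits_in_buffer() rejects it.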

> 462                  /* On error, dump existing buffer */
> 463                  u->len = 0;
> 464                  rc = -EINVAL;
> 465                  goto out;
> 466          }
> 467
> 468          ret = copy_from_user(u->u.buffer + u->len, ubuf, len);
> 469
>
> regards,
> dan carpenter
 




Re: [Xen-devel] memory corruption in HYPERVISOR_physdev_op()

2012-10-15 Thread Ian Campbell
On Mon, 2012-10-15 at 11:48 +0100, Jan Beulich wrote:
  On 15.10.12 at 12:27, Ian Campbell <ian.campb...@citrix.com> wrote:
  On Fri, 2012-09-14 at 14:24 +0300, Dan Carpenter wrote:
  My static analyzer complains about potential memory corruption in
  HYPERVISOR_physdev_op()
  
  arch/x86/include/asm/xen/hypercall.h
 389  static inline int
 390  HYPERVISOR_physdev_op(int cmd, void *arg)
 391  {
 392  int rc = _hypercall2(int, physdev_op, cmd, arg);
 393  if (unlikely(rc == -ENOSYS)) {
 394  struct physdev_op op;
 395  op.cmd = cmd;
 396  memcpy(&op.u, arg, sizeof(op.u));
 397  rc = _hypercall1(int, physdev_op_compat, &op);
 398  memcpy(arg, &op.u, sizeof(op.u));
  ^
  Some of the arg buffers are not as large as sizeof(op.u) which is either
  12 or 16 depending on the size of longs in struct physdev_apic.
  
  Nasty!
 
 Wasn't it that pv-ops expects Xen 4.0.1 or newer anyway? If so,
 what does this code exist for in the first place (it's framed by
 #if CONFIG_XEN_COMPAT <= 0x030002 in the Xenified kernel)?

I think the 4.0.1 or newer requirement is for dom0 only. I guess physdev
op is only used in dom0 though? Or does passthrough need it?

 
 399  }
 400  return rc;
 401  }
  
  One example of this is in xen_initdom_restore_msi_irqs().
  
  arch/x86/pci/xen.c
 337  struct physdev_pci_device restore_ext;
 338  
 339  restore_ext.seg = pci_domain_nr(dev->bus);
 340  restore_ext.bus = dev->bus->number;
 341  restore_ext.devfn = dev->devfn;
 342  ret = HYPERVISOR_physdev_op(PHYSDEVOP_restore_msi_ext,
 343  &restore_ext);
  
  There are only 4 bytes here.
  
 344  if (ret == -ENOSYS)
  ^^
  If we hit this condition, we have corrupted some memory.
  
  I can see the memory corruption but how does it relate to ret ==
  -ENOSYS?
 
 The (supposedly) corrupting code sits inside an
 
   if (unlikely(rc == -ENOSYS)) {

Ah, for some reason I assumed this was in the eventual caller, even
though it was staring me right in the face in the full quote.

 Supposedly because as long as the argument passed to the
 function is in memory accessed by the local CPU only and
 doesn't overlap with storage used for rc (e.g. living in a
 register), there's no corruption possible afaict - the second
 memcpy() would just copy back what the first one obtained
 from there.
 
 Fixing this other than by removing the broken code would be
 pretty hard I'm afraid (and I tend to leave the code untouched
 altogether in the Xenified tree).

Given that it is compat code the list of subops which need to be supported
in this case is small and finite so a simple lookup table or even a switch
statement for the size might be an option.

Ian.
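
A minimal sketch of the switch-for-the-size idea, written as stand-alone
user-space C. Every name below (the EX_OP_* numbers, the ex_* structs and
helpers) is hypothetical and chosen for illustration; none of them are the
real Xen physdev definitions, and the real list of compat sub-ops would have
to come from the interface headers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sub-op numbers and argument layouts, illustration only. */
enum ex_subop { EX_OP_IRQ_STATUS = 1, EX_OP_APIC_READ = 2 };

struct ex_irq_status { unsigned int irq; unsigned int status; };
struct ex_apic { unsigned long apic_physbase; unsigned int reg; unsigned int value; };

union ex_op_u {
    struct ex_irq_status irq_status;
    struct ex_apic apic;
};

/* Size of the caller's buffer for a given sub-op, or 0 if it is unknown. */
static size_t ex_arg_size(int cmd)
{
    switch (cmd) {
    case EX_OP_IRQ_STATUS: return sizeof(struct ex_irq_status);
    case EX_OP_APIC_READ:  return sizeof(struct ex_apic);
    default:               return 0;
    }
}

/* Copy in only as many bytes as the caller's buffer is known to hold. */
static int ex_copy_in(union ex_op_u *u, int cmd, const void *arg)
{
    size_t n = ex_arg_size(cmd);

    if (n == 0)
        return -1;              /* unknown sub-op: refuse rather than guess */
    memset(u, 0, sizeof(*u));
    memcpy(u, arg, n);
    return 0;
}

/* Copy back the same number of bytes, never the full union. */
static int ex_copy_out(void *arg, int cmd, const union ex_op_u *u)
{
    size_t n = ex_arg_size(cmd);

    if (n == 0)
        return -1;
    memcpy(arg, u, n);
    return 0;
}
```

The point of the design is that both copies are bounded by the per-sub-op
size instead of sizeof(op.u), so a small caller buffer is never over-read or
overwritten.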



Re: [Xen-devel] [PATCH] xen: do not disable netfront in dom0

2012-05-22 Thread Ian Campbell
On Tue, 2012-05-22 at 20:13 +0100, David Miller wrote:
> From: Marek Marczykowski <marma...@invisiblethingslab.com>
> Date: Sun, 20 May 2012 13:45:10 +0200
>
> > Netfront driver can be also useful in dom0, eg when all NICs are assigned to
> > some domU (aka driver domain). Then using netback in domU and netfront in dom0
> > is the only way to get network access in dom0.
> >
> > Signed-off-by: Marek Marczykowski <marma...@invisiblethingslab.com>
>
> Someone please review this and I can merge it in via the 'net' tree if
> it looks OK to XEN folks.

Konrad is Xen folks and has acked it already but FWIW:

Acked-by: Ian Campbell <ian.campb...@citrix.com>

Ian.

 
 ___
 Xen-devel mailing list
 xen-de...@lists.xen.org
 http://lists.xen.org/xen-devel




Re: [PATCHv2] x86info: dump kvm cpuid's

2012-05-02 Thread Ian Campbell
On Tue, 2012-05-01 at 16:04 +0300, Gleb Natapov wrote:
> > BTW, according to arch/x86/include/asm/kvm_para.h unsurprisingly KVM has
> > a signature too 'KVMKVMKVM'.
> >
> > >    cpu->stepping = eax & 0xf;
> > >    cpu->model = (eax >> 4) & 0xf;
> > >    cpu->family = (eax >> 8) & 0xf;
> > > @@ -29,6 +29,19 @@ void get_cpu_info_basics(struct cpudata *cpu)
> > >
> > >    cpuid(cpu->number, 0xC0000000, &maxei, NULL, NULL, NULL);
> > >    cpu->maxei2 = maxei;
> > > +  if (ecx & 0x80000000) {
> > > +          cpuid(cpu->number, 0x40000000, &maxhv, NULL, NULL, NULL);
> > > +          /*
> > > +           * KVM up to linux 3.4 reports 0 as the max hypervisor leaf,
> > > +           * where it really means 0x40000001.
> >
> > This is something where I definitely think you want to check the
> > signature first.
> In theory yes, but in practice what will this break?

I've got no idea -- but what's the harm in checking?

Ian.

-- 
Ian Campbell
Current Noise: Hypocrisy - Roswell 47

Angels we have heard on High
Tell us to go out and Buy.
-- Tom Lehrer



Re: [Xen-devel] [PATCHv2] x86info: dump kvm cpuid's

2012-05-02 Thread Ian Campbell
On Wed, 2012-05-02 at 10:50 +0100, Michael S. Tsirkin wrote:
 On Wed, May 02, 2012 at 10:45:27AM +0100, Ian Campbell wrote:
  On Tue, 2012-05-01 at 16:04 +0300, Gleb Natapov wrote:
BTW, according to arch/x86/include/asm/kvm_para.h unsurprisingly KVM has
a signature too 'KVMKVMKVM'.

   cpu->stepping = eax & 0xf;
   cpu->model = (eax >> 4) & 0xf;
   cpu->family = (eax >> 8) & 0xf;
 @@ -29,6 +29,19 @@ void get_cpu_info_basics(struct cpudata *cpu)
  
   cpuid(cpu->number, 0xC0000000, &maxei, NULL, NULL, NULL);
   cpu->maxei2 = maxei;
 + if (ecx & 0x80000000) {
 + cpuid(cpu->number, 0x40000000, &maxhv, NULL, NULL, NULL);
 + /*
 +  * KVM up to linux 3.4 reports 0 as the max hypervisor leaf,
 +  * where it really means 0x40000001.

This is something where I definitely think you want to check the
signature first.
   In theory yes, but in practice what will this break?
  
  I've got no idea -- but what's the harm in checking?
  
  Ian.
 
 Users can set kvm signature to anything, if they do
 debugging will be a bit harder for them.

Ah, right, someone already mentioned that and I forgot, sorry.

And, just to complete my train of thought, cpuid just returns reserved
values for requests for non-existent leaves (rather than #GP for
example) so it's safe enough even if you do end up trying to read
eax=0x40000001 when it doesn't exist.

Seems fine to me then.

Ian.

-- 
Ian Campbell
Current Noise: Hypocrisy - Buried

He's like a function -- he returns a value, in the form of his opinion.
It's up to you to cast it into a void or not.
-- Phil Lapsley



Re: [Xen-devel] x86info: dump kvm cpuid's

2012-05-01 Thread Ian Campbell
On Mon, 2012-04-30 at 10:38 +0100, Michael S. Tsirkin wrote:
> On Mon, Apr 30, 2012 at 11:43:19AM +0300, Gleb Natapov wrote:
> > On Sun, Apr 29, 2012 at 01:10:21PM +0300, Michael S. Tsirkin wrote:
> > > The following makes 'x86info -r' dump kvm cpu ids
> > > (signature+features) when running in a vm.
> > >
> > > On the guest we see the signature and the features:
> > > eax in: 0x40000000, eax = 00000000 ebx = 4b4d564b ecx = 564b4d56 edx = 0000004d
> > > eax in: 0x40000001, eax = 0100007b ebx = 00000000 ecx = 00000000 edx = 00000000
> > >
> > > On the host it just adds a couple of zero lines:
> > > eax in: 0x40000000, eax = 00000000 ebx = 00000000 ecx = 00000000 edx = 00000000
> > > eax in: 0x40000001, eax = 00000000 ebx = 00000000 ecx = 00000000 edx = 00000000
> > >
> > This is too KVM specific.
>
> That's what I have. I scratch my own itch.
>
> > Other hypervisors may use more cpuid leafs.
>
> But not less so no harm's done.
>
> > As far as I see Hyper-V uses 5 and use cpuid.0x40000000.eax as max cpuid
> > leaf available. Haven't checked Xen or VMWare.

Xen does the same, documentation in the Xen public interfaces header:
http://xenbits.xen.org/docs/unstable/hypercall/include,public,arch-x86,cpuid.h.html.

If compat mode for another h/v is enabled then those leaves will appear
at 0x40000000 and Xen's will be bumped up, so a fully Xen aware set of
drivers (or detection routine, etc) should check at 0x100 intervals
until 0x40010000 for the appropriate signatures (I realise that the docs
are somewhat lacking in this regard, I should cook up a patch).

Ian.
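
The scan described above could look roughly like the sketch below. It is a
stand-alone illustration, not code from Xen or x86info: the cpuid_fn
callback, hv_signature(), hv_find_base() and fake_cpuid() are names invented
here, and treating 0x40010000 as an exclusive upper bound is an assumption.
On real x86 hardware the callback could wrap the __cpuid() macro from the
compiler's <cpuid.h>; a fake is used so the logic can be exercised anywhere.

```c
#include <stdint.h>
#include <string.h>

/* Executes CPUID for one leaf and fills regs with eax, ebx, ecx, edx. */
typedef void (*cpuid_fn)(uint32_t leaf, uint32_t regs[4]);

/* Assemble the 12-byte signature held in EBX, ECX, EDX (low byte first). */
static void hv_signature(const uint32_t regs[4], char out[13])
{
    int r, b;

    for (r = 0; r < 3; r++)
        for (b = 0; b < 4; b++)
            out[4 * r + b] = (char)((regs[1 + r] >> (8 * b)) & 0xff);
    out[12] = '\0';
}

/*
 * Walk the hypervisor range in 0x100 steps and return the base leaf whose
 * signature matches, or 0 if none does.
 */
static uint32_t hv_find_base(cpuid_fn cpuid, const char *want)
{
    uint32_t base;

    for (base = 0x40000000u; base < 0x40010000u; base += 0x100u) {
        uint32_t regs[4] = { 0, 0, 0, 0 };
        char sig[13];

        cpuid(base, regs);
        hv_signature(regs, sig);
        if (strcmp(sig, want) == 0)
            return base;
    }
    return 0;
}

/*
 * Fake CPUID modelling a guest where another hypervisor's leaves sit at
 * 0x40000000 ("Microsoft Hv") and Xen's have been bumped to 0x40000100.
 */
static void fake_cpuid(uint32_t leaf, uint32_t regs[4])
{
    memset(regs, 0, 4 * sizeof(regs[0]));
    if (leaf == 0x40000000u) {
        regs[1] = 0x7263694du; regs[2] = 0x666f736fu; regs[3] = 0x76482074u;
    } else if (leaf == 0x40000100u) {
        regs[1] = 0x566e6558u; regs[2] = 0x65584d4du; regs[3] = 0x4d4d566eu;
    }
}
```

Calling hv_find_base(fake_cpuid, "XenVMMXenVMM") on this fake finds the Xen
leaves at the bumped base rather than at 0x40000000.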



Re: [Xen-devel] x86info: dump kvm cpuid's

2012-05-01 Thread Ian Campbell
On Tue, 2012-05-01 at 11:50 +0100, Michael S. Tsirkin wrote:
 On Tue, May 01, 2012 at 11:29:04AM +0100, Ian Campbell wrote:
  On Mon, 2012-04-30 at 10:38 +0100, Michael S. Tsirkin wrote:
   On Mon, Apr 30, 2012 at 11:43:19AM +0300, Gleb Natapov wrote:
On Sun, Apr 29, 2012 at 01:10:21PM +0300, Michael S. Tsirkin wrote:
 The following makes 'x86info -r' dump kvm cpu ids
 (signature+features) when running in a vm.
 
 On the guest we see the signature and the features:
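 eax in: 0x40000000, eax = 00000000 ebx = 4b4d564b ecx = 564b4d56 edx = 0000004d
 eax in: 0x40000001, eax = 0100007b ebx = 00000000 ecx = 00000000 edx = 00000000
 
 On the host it just adds a couple of zero lines:
 eax in: 0x40000000, eax = 00000000 ebx = 00000000 ecx = 00000000 edx = 00000000
 eax in: 0x40000001, eax = 00000000 ebx = 00000000 ecx = 00000000 edx = 00000000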
 
This is too KVM specific.
   
   That's what I have. I scratch my own itch.
   
Other hypervisors may use more cpuid leafs.
   
   But not less so no harm's done.
   
As far as I see Hyper-V uses 5 and use cpuid.0x40000000.eax as max cpuid
leaf available. Haven't checked Xen or VMWare.
  
  Xen does the same, documentation in the Xen public interfaces header:
  http://xenbits.xen.org/docs/unstable/hypercall/include,public,arch-x86,cpuid.h.html.
 
 So ack to my patch?

I didn't see the patch, where should I be looking?

  If compat mode for another h/v is enabled then those leaves will appear
  at 0x40000000 and Xen's will be bumped up, so a fully Xen aware set of
  drivers (or detection routine, etc) should check at 0x100 intervals
  until 0x40010000 for the appropriate signatures (I realise that the docs
  are somewhat lacking in this regard, I should cook up a patch).
  
  Ian.
 
 How does guest know that the data at 0x40000100 makes sense?

http://xenbits.xen.org/docs/unstable/hypercall/include,public,arch-x86,cpuid.h.html
EBX, ECX and EDX contain a signature "XenVMMXenVMM". I'm fairly certain
that hyperv has its own magic number here.

Ian.



Re: [PATCHv2] x86info: dump kvm cpuid's

2012-05-01 Thread Ian Campbell
On Mon, 2012-04-30 at 17:38 +0300, Michael S. Tsirkin wrote:
 The following makes 'x86info -r' dump hypervisor leaf cpu ids
 (for kvm this is signature+features) when running in a vm.
 
 On the guest we see the signature and the features:
 eax in: 0x40000000, eax = 00000000 ebx = 4b4d564b ecx = 564b4d56 edx = 0000004d
 eax in: 0x40000001, eax = 0100007b ebx = 00000000 ecx = 00000000 edx = 00000000
 
 
 Hypervisor flag is checked to avoid output changes when not
 running on a VM.
 
 Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
 
 Changes from v1:
   Make work on non KVM hypervisors (only KVM was tested).
   Avi Kivity said kvm will in the future report
   max HV leaf in eax. For now it reports eax = 0
 so add a work around for that.
 
 ---
 
 diff --git a/identify.c b/identify.c
 index 33f35de..a4a3763 100644
 --- a/identify.c
 +++ b/identify.c
 @@ -9,8 +9,8 @@
  
  void get_cpu_info_basics(struct cpudata *cpu)
  {
 - unsigned int maxi, maxei, vendor, address_bits;
 - unsigned int eax;
 + unsigned int maxi, maxei, maxhv, vendor, address_bits;
 + unsigned int eax, ebx, ecx;
  
   cpuid(cpu->number, 0, &maxi, &vendor, NULL, NULL);
   maxi &= 0xffff; /* The high-order word is non-zero on some Cyrix CPUs */
 @@ -19,7 +19,7 @@ void get_cpu_info_basics(struct cpudata *cpu)
   return;
  
   /* Everything that supports cpuid supports these. */
 - cpuid(cpu->number, 1, &eax, NULL, NULL, NULL);
 + cpuid(cpu->number, 1, &eax, &ebx, &ecx, NULL);

You probably want to check ebx, ecx, edx for the signatures of the
hypervisors you are willing to support and which you know do something
sane with eax? Also it would be something worth reporting in its own
right?

BTW, according to arch/x86/include/asm/kvm_para.h unsurprisingly KVM has
a signature too 'KVMKVMKVM'.

   cpu->stepping = eax & 0xf;
   cpu->model = (eax >> 4) & 0xf;
   cpu->family = (eax >> 8) & 0xf;
 @@ -29,6 +29,19 @@ void get_cpu_info_basics(struct cpudata *cpu)
  
   cpuid(cpu->number, 0xC0000000, &maxei, NULL, NULL, NULL);
   cpu->maxei2 = maxei;
 + if (ecx & 0x80000000) {
 + cpuid(cpu->number, 0x40000000, &maxhv, NULL, NULL, NULL);
 + /*
 +  * KVM up to linux 3.4 reports 0 as the max hypervisor leaf,
 +  * where it really means 0x40000001.

This is something where I definitely think you want to check the
signature first.

Ian.
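
One way the "check the signature first" suggestion could be expressed is
sketched below. This is not the x86info patch; hv_max_leaf() and
HV_LEAF_BASE are names made up for this note, and the signature string is
assumed to have already been assembled from EBX, ECX and EDX of leaf
0x40000000.

```c
#include <stdint.h>
#include <string.h>

#define HV_LEAF_BASE 0x40000000u

/*
 * sig:          12-byte signature string from leaf 0x40000000
 * max_leaf_eax: EAX returned by the same leaf
 * Returns the highest hypervisor leaf to dump, or 0 for none.
 */
static uint32_t hv_max_leaf(const char *sig, uint32_t max_leaf_eax)
{
    /* Apply the "0 really means 0x40000001" workaround only to KVM. */
    if (max_leaf_eax == 0 && strcmp(sig, "KVMKVMKVM") == 0)
        return HV_LEAF_BASE + 1;

    /* Anything below the hypervisor range is not a usable maximum. */
    if (max_leaf_eax < HV_LEAF_BASE)
        return 0;

    return max_leaf_eax;
}
```

The trade-off is that a guest whose KVM signature has been changed by the
user would no longer get the workaround, which is the objection raised later
in the thread.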

 +  * Most (all?) hypervisors have at least one CPUID besides
 +  * the vendor ID so assume that.
 +  */
 + cpu->maxhv = maxhv ? maxhv : 0x40000001;
 + } else {
 + /* Suppress hypervisor cpuid unless running on a hypervisor */
 + cpu->maxhv = 0;
 + }
  
   cpuid(cpu->number, 0x80000008, &address_bits, NULL, NULL, NULL);
   cpu->phyaddr_bits = address_bits & 0xFF;
 diff --git a/x86info.c b/x86info.c
 index 22c4734..80cae36 100644
 --- a/x86info.c
 +++ b/x86info.c
 @@ -44,6 +44,10 @@ static void display_detailed_info(struct cpudata *cpu)
  
   if (cpu->maxei2 >= 0xC0000000)
   dump_raw_cpuid(cpu->number, 0xC0000000, cpu->maxei2);
 +
 + if (cpu->maxhv >= 0x40000000)
 + dump_raw_cpuid(cpu->number, 0x40000000, cpu->maxhv);
 +
   }
  
   if (show_cacheinfo) {
 diff --git a/x86info.h b/x86info.h
 index 7d2a455..c4f5d81 100644
 --- a/x86info.h
 +++ b/x86info.h
 @@ -84,7 +84,7 @@ struct cpudata {
   unsigned int cachesize_trace;
   unsigned int phyaddr_bits;
   unsigned int viraddr_bits;
 - unsigned int cpuid_level, maxei, maxei2;
 + unsigned int cpuid_level, maxei, maxei2, maxhv;
   char name[CPU_NAME_LEN];
   enum connector connector;
   unsigned int flags_ecx;
 

-- 
Ian Campbell

Your own qualities will help prevent your advancement in the world.



Re: [Xen-devel] [PATCH RFC V7 0/12] Paravirtualized ticketlocks

2012-05-01 Thread Ian Campbell
On Thu, 2012-04-19 at 21:12 +0100, Raghavendra K T wrote:
 From: Jeremy Fitzhardinge <jeremy.fitzhardi...@citrix.com>
 
 This series replaces the existing paravirtualized spinlock mechanism
 with a paravirtualized ticketlock mechanism. (targeted for 3.5 window)

Which tree is this series going through, tip.git I guess?

I don't see it there.

Ian.

 
 Changes in V7:
  - Rebased patches to 3.4-rc3
  - Added jumplabel split patch (originally from Andrew Jones rebased to
 3.4-rc3)
  - jumplabel changes from Ingo and Jason taken and now using static_key_*
 instead of static_branch.
  - using UNINLINE_SPIN_UNLOCK (which was split as per suggestion from
 Linus)
  - This patch series is rebased on debugfs patch (that should be already in
 Xen/linux-next https://lkml.org/lkml/2012/3/23/51)
 
 Ticket locks have an inherent problem in a virtualized case, because
 the vCPUs are scheduled rather than running concurrently (ignoring
 gang scheduled vCPUs).  This can result in catastrophic performance
 collapses when the vCPU scheduler doesn't schedule the correct next
 vCPU, and ends up scheduling a vCPU which burns its entire timeslice
 spinning.  (Note that this is not the same problem as lock-holder
 preemption, which this series also addresses; that's also a problem,
 but not catastrophic).
 
 (See Thomas Friebel's talk "Prevent Guests from Spinning Around"
 http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)
 
 Currently we deal with this by having PV spinlocks, which adds a layer
 of indirection in front of all the spinlock functions, and defining a
 completely new implementation for Xen (and for other pvops users, but
 there are none at present).
 
 PV ticketlocks keep the existing ticketlock implementation
 (fastpath) as-is, but adds a couple of pvops for the slow paths:
 
 - If a CPU has been waiting for a spinlock for SPIN_THRESHOLD
   iterations, then call out to the __ticket_lock_spinning() pvop,
   which allows a backend to block the vCPU rather than spinning.  This
   pvop can set the lock into slowpath state.
 
 - When releasing a lock, if it is in slowpath state, then call
   __ticket_unlock_kick() to kick the next vCPU in line awake.  If the
   lock is no longer in contention, it also clears the slowpath flag.
 
 The slowpath state is stored in the LSB of the lock tail
 ticket.  This has the effect of reducing the max number of CPUs by
 half (so, a small ticket can deal with 128 CPUs, and large ticket
 32768).
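
The halving can be checked with a few lines of arithmetic. The sketch below
is a stand-alone illustration: the helper names are invented, and the two
constants are assumed to match the series (a flag in bit 0, tickets
advancing in steps of 2), which is consistent with the $0x200 increment in
the disassembly further down.

```c
#include <stdint.h>

#define TICKET_SLOWPATH_FLAG 1u  /* LSB of the tail carries the slowpath state */
#define TICKET_LOCK_INC      2u  /* tickets step by 2 so the LSB stays free */

/* Distinct tickets, and so max CPUs, for a ticket field of `bits` bits. */
static uint32_t max_cpus_for_ticket_bits(unsigned int bits)
{
    return (1u << bits) / TICKET_LOCK_INC;
}

/* Ticket value with the slowpath flag masked off, as the lock path does. */
static uint16_t tail_without_flag(uint16_t tail)
{
    return (uint16_t)(tail & (uint16_t)~TICKET_SLOWPATH_FLAG);
}

/* Non-zero when the lock has been put into slowpath state. */
static int tail_in_slowpath(uint16_t tail)
{
    return (tail & TICKET_SLOWPATH_FLAG) != 0;
}
```

An 8-bit ticket gives 256 / 2 = 128 CPUs and a 16-bit ticket gives
65536 / 2 = 32768, matching the figures quoted above.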
 
 This series provides a Xen implementation, KVM implementation will be
 posted in next 2-3 days.
 
 Overall, it results in a large reduction in code, it makes the native
 and virtualized cases closer, and it removes a layer of indirection
 around all the spinlock functions.
 
 The fast path (taking an uncontended lock which isn't in slowpath
 state) is optimal, identical to the non-paravirtualized case.
 
 The inner part of ticket lock code becomes:
         inc = xadd(&lock->tickets, inc);
         inc.tail &= ~TICKET_SLOWPATH_FLAG;
 
         if (likely(inc.head == inc.tail))
                 goto out;
         for (;;) {
                 unsigned count = SPIN_THRESHOLD;
                 do {
                         if (ACCESS_ONCE(lock->tickets.head) == inc.tail)
                                 goto out;
                         cpu_relax();
                 } while (--count);
                 __ticket_lock_spinning(lock, inc.tail);
         }
 out:    barrier();
 which results in:
         push   %rbp
         mov    %rsp,%rbp
 
         mov    $0x200,%eax
         lock xadd %ax,(%rdi)
         movzbl %ah,%edx
         cmp    %al,%dl
         jne    1f       # Slowpath if lock in contention
 
         pop    %rbp
         retq
 
         ### SLOWPATH START
 1:      and    $-2,%edx
         movzbl %dl,%esi
 
 2:      mov    $0x800,%eax
         jmp    4f
 
 3:      pause
         sub    $0x1,%eax
         je     5f
 
 4:      movzbl (%rdi),%ecx
         cmp    %cl,%dl
         jne    3b
 
         pop    %rbp
         retq
 
 5:      callq  *__ticket_lock_spinning
         jmp    2b
         ### SLOWPATH END
 
 with CONFIG_PARAVIRT_SPINLOCKS=n, the code has changed slightly, where
 the fastpath case is straight through (taking the lock without
 contention), and the spin loop is out of line:
 
         push   %rbp
         mov    %rsp,%rbp
 
         mov    $0x100,%eax
         lock xadd %ax,(%rdi)
         movzbl %ah,%edx
         cmp    %al,%dl
         jne    1f
 
         pop    %rbp
         retq
 
         ### SLOWPATH START
 1:      pause
         movzbl (%rdi),%eax
         cmp    %dl,%al
         jne    1b
 
         pop    %rbp
         retq
         ### SLOWPATH END
 
 The unlock code is complicated by the need to both add to the lock's
 head and fetch the slowpath flag from tail.  This version of the
 patch uses a locked add to do this, followed by a test to see if the
 slowflag is set.  The lock prefix acts as a full memory barrier, so we
 

Re: [Xen-devel] [PATCH RFC V6 0/11] Paravirtualized ticketlocks

2012-04-16 Thread Ian Campbell
On Mon, 2012-04-16 at 16:44 +0100, Konrad Rzeszutek Wilk wrote:
 On Sat, Mar 31, 2012 at 09:37:45AM +0530, Srivatsa Vaddagiri wrote:
  * Thomas Gleixner <t...@linutronix.de> [2012-03-31 00:07:58]:
  
   I know that Peter is going to go berserk on me, but if we are running
   a paravirt guest then it's simple to provide a mechanism which allows
   the host (aka hypervisor) to check that in the guest just by looking
   at some global state.
   
   So if a guest exits due to an external event it's easy to inspect the
   state of that guest and avoid to schedule away when it was interrupted
   in a spinlock held section. That guest/host shared state needs to be
   modified to indicate the guest to invoke an exit when the last nested
   lock has been released.
  
  I had attempted something like that long back:
  
  http://lkml.org/lkml/2010/6/3/4
  
  The issue is with ticketlocks though. VCPUs could go into a spin w/o
  a lock being held by anybody. Say VCPUs 1-99 try to grab a lock in
  that order (on a host with one cpu). VCPU1 wins (after VCPU0 releases it)
  and releases the lock. VCPU1 is next eligible to take the lock. If 
  that is not scheduled early enough by host, then remaining vcpus would keep 
  spinning (even though lock is technically not held by anybody) w/o making 
  forward progress.
  
  In that situation, what we really need is for the guest to hint to host
  scheduler to schedule VCPU1 early (via yield_to or something similar). 
  
  The current pv-spinlock patches however do not track which vcpu is
  spinning at what head of the ticketlock. I suppose we can consider 
  that optimization in future and see how much benefit it provides (over
  plain yield/sleep the way it's done now).
 
 Right. I think Jeremy played around with this some time?

5/11 "xen/pvticketlock: Xen implementation for PV ticket locks" tracks
which vcpus are waiting for a lock in "cpumask_t waiting_cpus" and
tracks which lock each is waiting for in per-cpu "lock_waiting". This is
used in xen_unlock_kick to kick the right CPU. There's a loop over only
the waiting cpus to figure out who to kick.
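
A user-space model of that bookkeeping might look like the following. It is
only a sketch under assumptions: the ex_* names, the fixed EX_NR_CPUS and
the 32-bit mask are invented for illustration and stand in for the per-cpu
data and cpumask used in the actual patch.

```c
#include <stdint.h>

#define EX_NR_CPUS 8

/* What one CPU is waiting for: which lock and which ticket value. */
struct ex_lock_waiting {
    const void *lock;
    unsigned int want;
};

static struct ex_lock_waiting ex_waiting[EX_NR_CPUS]; /* models per-cpu lock_waiting */
static uint32_t ex_waiting_cpus;                      /* models cpumask_t waiting_cpus */

/* Record that `cpu` is about to block waiting for ticket `want` on `lock`. */
static void ex_start_waiting(unsigned int cpu, const void *lock, unsigned int want)
{
    ex_waiting[cpu].lock = lock;
    ex_waiting[cpu].want = want;
    ex_waiting_cpus |= 1u << cpu;
}

/* Clear the record once `cpu` has been woken or has taken the lock. */
static void ex_stop_waiting(unsigned int cpu)
{
    ex_waiting_cpus &= ~(1u << cpu);
    ex_waiting[cpu].lock = 0;
}

/*
 * On unlock, find the CPU waiting for ticket `next` on `lock`.
 * Only CPUs flagged in the waiting mask are examined.
 * Returns the CPU number to kick, or -1 if nobody is waiting for it.
 */
static int ex_unlock_kick(const void *lock, unsigned int next)
{
    unsigned int cpu;

    for (cpu = 0; cpu < EX_NR_CPUS; cpu++) {
        if (!(ex_waiting_cpus & (1u << cpu)))
            continue;
        if (ex_waiting[cpu].lock == lock && ex_waiting[cpu].want == next)
            return (int)cpu;
    }
    return -1;
}
```

Because each waiter records the ticket it wants, the unlocker can wake
exactly the next holder instead of yielding blindly.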

  
  Do you see any issues if we take in what we have today and address the
  finer-grained optimization as next step?
 
 I think that is the proper course - these patches show
 that on baremetal we don't incur performance regressions and in
 virtualization case we benefit greatly. Since these are the basic
 building blocks of a kernel - taking it slow and just adding
 this set of patches for v3.5 is a good idea - and then building on top
 of that for further refinement.
 
  
  - vatsa 
 




Re: [Xen-devel] [PATCH RFC V6 0/11] Paravirtualized ticketlocks

2012-04-02 Thread Ian Campbell
On Fri, 2012-03-30 at 23:07 +0100, Thomas Gleixner wrote:
> So if we need to fiddle with the scheduler and frankly that's the only
> way to get a real gain (the numbers, which are achieved by this
> patches, are not that impressive) then the question arises whether we
> should turn the whole thing around.

It probably doesn't materially affect your core point (which seems valid
to me) but it's worth pointing out that the numbers presented in this
thread are AFAICT mostly focused on ensuring that the impact of
this infrastructure is acceptable on native rather than showing the
benefits for virtualized workloads.

Ian.



Re: [Xen-devel] [PATCH 3/4] xen kconfig: add dom0 support help text

2012-01-06 Thread Ian Campbell
On Fri, 2012-01-06 at 09:26 +0000, Andrew Jones wrote:
 
 - Original Message -
  On Fri, 2012-01-06 at 08:57 +0000, Andrew Jones wrote:
   Describe dom0 support in the config menu and supply help text for
   it.
  
  This turns a non-user visible symbol into a user visible one.
  Previously
  if Xen was enabled and the other prerequisites were met you would get
  dom0 support automatically -- do we really want to change that?
  According to 6b0661a5e6fbf it was a deliberate decision to have it
  this
  way.
 
 I think it's a necessary evil in order to give users the ability to
 compile kernels without the support. I know it doesn't make much sense
 for most users, but...

Who actually wants to do this though and why? Do you have a bug report
requesting this change?

Almost all of the things which dom0 needs (e.g. PCI device management
etc) are also required by a domU with passthrough enabled so the savings
are really very slight.

We are talking less than 1k of code AFAICT, 319 bytes for
arch/x86/xen/vga.o and 573 for drivers/xen/xenfs/xenstored.o plus
whatever xen_register_gsi (a couple of dozen lines of code) adds to
arch/x86/pci/xen.o. grep doesn't show CONFIG_XEN_DOM0 being used
anywhere else. What savings do you see in practice from disabling just
this symbol?

We need to weigh up the size change against the complexity of asking the
user yet another question, I'm not convinced the question is worth it on
balance.

  
  BTW, you forgot a Signed-off-by and the appropriate CCs (please use
  MAINTAINERS or ./scripts/get_maintainer.pl).
  
 
 Sorry, I'll resend properly.

I've added those CC's to this reply too.

Ian.

 
 Drew
 
  Ian.
  
   ---
arch/x86/xen/Kconfig |7 ++-
1 files changed, 6 insertions(+), 1 deletions(-)
   
   diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
   index 26c731a..88862d5 100644
   --- a/arch/x86/xen/Kconfig
   +++ b/arch/x86/xen/Kconfig
   @@ -14,9 +14,14 @@ config XEN
   Xen hypervisor.

config XEN_DOM0
   - def_bool y
   + bool "Xen Initial Domain (Dom0) support"
   + default y
 depends on XEN && PCI_XEN && SWIOTLB_XEN
 depends on X86_LOCAL_APIC && X86_IO_APIC && ACPI && PCI
   + help
   +   This allows the kernel to be used for the initial Xen domain,
   +   Domain0. This is a privileged guest that supplies backends
   +   and is used to manage the other Xen domains.

# Dummy symbol since people have come to rely on the
PRIVILEGED_GUEST
# name in tools.
  
  
  


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] [PATCH] xen: remove CONFIG_XEN_DOM0 compile option

2012-01-06 Thread Ian Campbell
On Fri, 2012-01-06 at 16:39 +, Andrew Jones wrote:
 remove XEN_PRIVILEGED_GUEST as it's just an alias for XEN_DOM0.

Hmm, this one is used by tools like update-grub to know when it is ok to
create xen+kernel entries so I think it needs to stay, or we at least
need to give lengthy warning to distros etc. that it is going away.

Perhaps you can just patch it out locally?

Ian.

 
 I compile tested this on a latest pull using an F16 config. The compile
 succeeded and 'make oldconfig' only removed these two options as
 expected.
 
 CONFIG_XEN_DOM0=y
 CONFIG_XEN_PRIVILEGED_GUEST=y
 
 Signed-off-by: Andrew Jones drjo...@redhat.com
 ---
  arch/x86/include/asm/xen/pci.h |   21 +
  arch/x86/pci/xen.c |6 --
  arch/x86/xen/Kconfig   |   10 --
  arch/x86/xen/Makefile  |3 +--
  arch/x86/xen/xen-ops.h |7 ---
  drivers/xen/Kconfig|3 ++-
  drivers/xen/Makefile   |3 +--
  drivers/xen/xenfs/Makefile |3 +--
  include/xen/xen.h  |   11 +++
  9 files changed, 9 insertions(+), 58 deletions(-)
 
 diff --git a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h
 index 968d57d..b423889 100644
 --- a/arch/x86/include/asm/xen/pci.h
 +++ b/arch/x86/include/asm/xen/pci.h
 @@ -13,30 +13,11 @@ static inline int pci_xen_hvm_init(void)
   return -1;
  }
  #endif
 -#if defined(CONFIG_XEN_DOM0)
 +
  int __init pci_xen_initial_domain(void);
  int xen_find_device_domain_owner(struct pci_dev *dev);
  int xen_register_device_domain_owner(struct pci_dev *dev, uint16_t domain);
  int xen_unregister_device_domain_owner(struct pci_dev *dev);
 -#else
 -static inline int __init pci_xen_initial_domain(void)
 -{
 - return -1;
 -}
 -static inline int xen_find_device_domain_owner(struct pci_dev *dev)
 -{
 - return -1;
 -}
 -static inline int xen_register_device_domain_owner(struct pci_dev *dev,
 -uint16_t domain)
 -{
 - return -1;
 -}
 -static inline int xen_unregister_device_domain_owner(struct pci_dev *dev)
 -{
 - return -1;
 -}
 -#endif
  
  #if defined(CONFIG_PCI_MSI)
  #if defined(CONFIG_PCI_XEN)
 diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
 index 492ade8..e298726 100644
 --- a/arch/x86/pci/xen.c
 +++ b/arch/x86/pci/xen.c
 @@ -108,7 +108,6 @@ static int acpi_register_gsi_xen_hvm(struct device *dev, u32 gsi,
     false /* no mapping of GSI to PIRQ */);
  }
  
 -#ifdef CONFIG_XEN_DOM0
  static int xen_register_gsi(u32 gsi, int gsi_override, int triggering, int polarity)
  {
   int rc, irq;
 @@ -143,7 +142,6 @@ static int acpi_register_gsi_xen(struct device *dev, u32 gsi,
   return xen_register_gsi(gsi, -1 /* no GSI override */, trigger, polarity);
  }
  #endif
 -#endif
  
  #if defined(CONFIG_PCI_MSI)
  #include <linux/msi.h>
 @@ -251,7 +249,6 @@ error:
   return irq;
  }
  
 -#ifdef CONFIG_XEN_DOM0
  static bool __read_mostly pci_seg_supported = true;
  
  static int xen_initdom_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 @@ -324,7 +321,6 @@ static int xen_initdom_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
  out:
   return ret;
  }
 -#endif
  
  static void xen_teardown_msi_irqs(struct pci_dev *dev)
  {
 @@ -392,7 +388,6 @@ int __init pci_xen_hvm_init(void)
   return 0;
  }
  
 -#ifdef CONFIG_XEN_DOM0
  static __init void xen_setup_acpi_sci(void)
  {
   int rc;
 @@ -539,4 +534,3 @@ int xen_unregister_device_domain_owner(struct pci_dev *dev)
   return 0;
  }
  EXPORT_SYMBOL_GPL(xen_unregister_device_domain_owner);
 -#endif
 diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
 index 26c731a..3c7e89a 100644
 --- a/arch/x86/xen/Kconfig
 +++ b/arch/x86/xen/Kconfig
 @@ -13,16 +13,6 @@ config XEN
 kernel to boot in a paravirtualized environment under the
 Xen hypervisor.
  
 -config XEN_DOM0
 - def_bool y
 - depends on XEN && PCI_XEN && SWIOTLB_XEN
 - depends on X86_LOCAL_APIC && X86_IO_APIC && ACPI && PCI
 -
 -# Dummy symbol since people have come to rely on the PRIVILEGED_GUEST
 -# name in tools.
 -config XEN_PRIVILEGED_GUEST
 - def_bool XEN_DOM0
 -
  config XEN_PVHVM
   def_bool y
   depends on XEN && PCI && X86_LOCAL_APIC
 diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
 index add2c2d..b2d4c4b 100644
 --- a/arch/x86/xen/Makefile
 +++ b/arch/x86/xen/Makefile
 @@ -13,12 +13,11 @@ CFLAGS_mmu.o  := $(nostackp)
  obj-y:= enlighten.o setup.o multicalls.o mmu.o irq.o \
   time.o xen-asm.o xen-asm_$(BITS).o \
   grant-table.o suspend.o platform-pci-unplug.o \
 - p2m.o
 + p2m.o vga.o
  
  obj-$(CONFIG_EVENT_TRACING) += trace.o
  
  obj-$(CONFIG_SMP)+= smp.o
  obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= spinlock.o
  obj-$(CONFIG_XEN_DEBUG_FS)   += debugfs.o
 -obj-$(CONFIG_XEN_DOM0)

Re: [PATCH] XEN: xenbus: integer overflow in process_msg()

2012-01-04 Thread Ian Campbell
On Tue, 2012-01-03 at 19:42 +, Haogang Chen wrote:
 There is a potential integer overflow in process_msg() that could result
 in cross-domain attack.
 
   body = kmalloc(msg->hdr.len + 1, GFP_NOIO | __GFP_HIGH);
 
 When a malicious guest passes 0x in msg->hdr.len, the subsequent
 call to xb_read() would write to a zero-length buffer.

The other end of this connection is always the xenstore backend daemon
so there is no guest (malicious or otherwise) which can do this. The
xenstore daemon is a trusted component in the system.

However, this seems like a reasonable robustness improvement, so we should
have it.
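
For readers following the arithmetic: the length field is a 32-bit
unsigned value, so adding 1 to its maximum wraps to 0 and the allocation
ends up zero-sized. Below is a minimal userspace sketch of the guard the
patch adds. It assumes a hypothetical helper, alloc_body(), with malloc()
standing in for kmalloc(); it is not the kernel's process_msg().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not the kernel's process_msg()): allocate room
 * for a message body of `len` bytes plus a trailing NUL.
 */
static int alloc_body(uint32_t len, char **out)
{
	assert(out != NULL);

	/* len + 1 computed in 32 bits wraps to 0 when len == UINT32_MAX. */
	if (len == UINT32_MAX)
		return -EINVAL;

	*out = malloc((size_t)len + 1);
	if (*out == NULL)
		return -ENOMEM;

	(*out)[len] = '\0';
	return 0;
}
```

Rejecting the single wrapping value mirrors the check in the quoted
patch; whether a tighter upper bound would be preferable is a separate
question.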

 This causes
 kernel oops in the receiving guest and hangs its xenbus kernel thread.
 The patch returns -EINVAL in that case.
 
 Signed-off-by: Haogang Chen haogangc...@gmail.com

Acked-by: Ian Campbell ian.campb...@citrix.com

 ---
  drivers/xen/xenbus/xenbus_xs.c |6 ++
  1 files changed, 6 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
 index ede860f..e32aefb 100644
 --- a/drivers/xen/xenbus/xenbus_xs.c
 +++ b/drivers/xen/xenbus/xenbus_xs.c
 @@ -801,6 +801,12 @@ static int process_msg(void)
   goto out;
   }
  
 + if (msg->hdr.len == UINT_MAX) {
 + kfree(msg);
 + err = -EINVAL;
 + goto out;
 + }
 +
   body = kmalloc(msg->hdr.len + 1, GFP_NOIO | __GFP_HIGH);
   if (body == NULL) {
   kfree(msg);




[PATCH 0/2] xen: Miscelaneous xenbus cleanups

2012-01-04 Thread Ian Campbell
Just a couple of things I noticed while reviewing Haogang's patch.
Applies on top of my suggested replacement for that patch (in
1325669689.25206.181.ca...@zakaz.uk.xensource.com).

Not extensively tested, but I did run it in dom0 and start both an HVM
and a PV guest.

Ian.



[PATCH 1/2] xenbus: maximum buffer size is XENSTORE_PAYLOAD_MAX

2012-01-04 Thread Ian Campbell
Use this now that it is defined even though it happens to be == PAGE_SIZE.

The code which takes requests from userspace already validates against the size
of this buffer so no further checks are required to ensure that userspace
requests comply with the protocol in this respect.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Cc: Haogang Chen haogangc...@gmail.com
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Jeremy Fitzhardinge jer...@goop.org
Cc: xen-de...@lists.xensource.com
Cc: virtualization@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
---
 drivers/xen/xenbus/xenbus_dev_frontend.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_dev_frontend.c 
b/drivers/xen/xenbus/xenbus_dev_frontend.c
index fb30cff..1fe4324 100644
--- a/drivers/xen/xenbus/xenbus_dev_frontend.c
+++ b/drivers/xen/xenbus/xenbus_dev_frontend.c
@@ -104,7 +104,7 @@ struct xenbus_file_priv {
unsigned int len;
union {
struct xsd_sockmsg msg;
-   char buffer[PAGE_SIZE];
+   char buffer[XENSTORE_PAYLOAD_MAX];
} u;
 
/* Response queue. */
-- 
1.7.2.5



[PATCH 2/2] xen/xenbus: don't reimplement kvasprintf via a fixed size buffer

2012-01-04 Thread Ian Campbell
Signed-off-by: Ian Campbell ian.campb...@citrix.com
Cc: Haogang Chen haogangc...@gmail.com
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Jeremy Fitzhardinge jer...@goop.org
Cc: xen-de...@lists.xensource.com
Cc: virtualization@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
---
 drivers/xen/xenbus/xenbus_xs.c |   17 +++--
 1 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
index 6f0121e..226d1ac 100644
--- a/drivers/xen/xenbus/xenbus_xs.c
+++ b/drivers/xen/xenbus/xenbus_xs.c
@@ -532,21 +532,18 @@ int xenbus_printf(struct xenbus_transaction t,
 {
va_list ap;
int ret;
-#define PRINTF_BUFFER_SIZE 4096
-   char *printf_buffer;
-
-   printf_buffer = kmalloc(PRINTF_BUFFER_SIZE, GFP_NOIO | __GFP_HIGH);
-   if (printf_buffer == NULL)
-   return -ENOMEM;
+   char *buf;
 
va_start(ap, fmt);
-   ret = vsnprintf(printf_buffer, PRINTF_BUFFER_SIZE, fmt, ap);
+   buf = kvasprintf(GFP_NOIO | __GFP_HIGH, fmt, ap);
va_end(ap);
 
-   BUG_ON(ret > PRINTF_BUFFER_SIZE-1);
-   ret = xenbus_write(t, dir, node, printf_buffer);
+   if (!buf)
+   return -ENOMEM;
+
+   ret = xenbus_write(t, dir, node, buf);
 
-   kfree(printf_buffer);
+   kfree(buf);
 
return ret;
 }
-- 
1.7.2.5
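
As an aside for readers unfamiliar with kvasprintf(): it sizes the buffer
from the formatted output instead of guessing a fixed size up front,
which is what lets the patch drop PRINTF_BUFFER_SIZE and the BUG_ON.
Below is a rough userspace analogue. It assumes a hypothetical
format_alloc() built on two vsnprintf() passes; the kernel implementation
may differ in detail.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Userspace stand-in for kvasprintf() (hypothetical name): format into a
 * heap buffer sized to fit the output. Returns NULL on failure; the
 * caller frees the result.
 */
static char *format_alloc(const char *fmt, ...)
{
	va_list ap, ap2;
	char *buf;
	int len;

	assert(fmt != NULL);

	va_start(ap, fmt);
	va_copy(ap2, ap);

	/* First pass: a NULL buffer of size 0 only measures the output. */
	len = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);
	if (len < 0) {
		va_end(ap2);
		return NULL;
	}

	/* Second pass: format into a buffer of exactly the right size. */
	buf = malloc((size_t)len + 1);
	if (buf != NULL)
		vsnprintf(buf, (size_t)len + 1, fmt, ap2);
	va_end(ap2);

	return buf;
}
```

With this shape there is no fixed limit to overrun, so the only failure
left for the caller to handle is allocation failure.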



Re: [PATCH v3 REPOST] xen-netfront: delay gARP until backend switches to Connected

2011-12-09 Thread Ian Campbell
On Fri, 2011-12-09 at 18:45 +, David Miller wrote:
 From: Laszlo Ersek ler...@redhat.com
 Date: Fri,  9 Dec 2011 12:38:58 +0100
 
  These two together provide complete ordering. Sub-condition (1) is
  satisfied by pvops commit 43223efd9bfd.
 
 I don't see this commit in Linus's tree,

The referenced commit is in
git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git#xen/next-2.6.32
which some people call the pvops tree, but there's no reason to expect
someone outside the Xen world to know that...

A better reference would have been 6b0b80ca7165 in
git://xenbits.xen.org/people/ianc/linux-2.6.git#upstream/dom0/backend/netback-history
which is the precise branch that was flattened to make f942dc2552b8, the
upstream commit that added netback, so this change is already in upstream.

Ian.



Re: [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions

2011-12-01 Thread Ian Campbell
On Wed, 2011-11-30 at 18:32 +, Stefano Stabellini wrote:
 On Wed, 30 Nov 2011, Arnd Bergmann wrote:

 
  KVM and Xen at least both fall into the single-return-value category,
  so we should be able to agree on a calling convention. KVM does not
  have an hcall API on ARM yet, and I see no reason not to use the
  same implementation that you have in the Xen guest.
  
  Stefano, can you split out the generic parts of your asm/xen/hypercall.h
  file into a common asm/hypercall.h and submit it for review to the
  arm kernel list?
 
 Sure, I can do that.
 Usually the hypercall calling convention is very hypervisor specific,
 but if it turns out that we have the same requirements I am happy to design
 a common interface.

I expect the only real decision to be made is hypercall page vs. raw hvc
instruction.

The page was useful on x86 where there is a variety of instructions
which could be used (at least for PV there was sysenter/syscall/int, I
think the vmcall instruction differs between AMD and Intel also) and gives
some additional flexibility. It's hard to predict but I don't think I'd
expect that to be necessary on ARM.

Another reason for having a hypercall page instead of a raw instruction
might be wanting to support 32 bit guests (from ~today) on a 64 bit
hypervisor in the future and perhaps needing to do some shimming/arg
translation. It would be better to aim for having the interface just be
32/64 agnostic but mistakes do happen.

Ian.



Re: [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions

2011-12-01 Thread Ian Campbell
On Wed, 2011-11-30 at 18:15 +, Arnd Bergmann wrote:
 On Wednesday 30 November 2011, Ian Campbell wrote:
  On Wed, 2011-11-30 at 14:32 +, Arnd Bergmann wrote:
   On Wednesday 30 November 2011, Ian Campbell wrote:
   What I suggested to the KVM developers is to start out with the
   vexpress platform, but then generalize it to the point where it fits
   your needs. All hardware that one expects a guest to have (GIC, timer,
   ...) will still show up in the same location as on a real vexpress,
   while anything that makes no sense or is better paravirtualized (LCD,
   storage, ...) just becomes optional and has to be described in the
   device tree if it's actually there.
  
  That's along the lines of what I was thinking as well.
  
  The DT contains the address of GIC, timer etc as well right? So at least
  in principle we needn't provide e.g. the GIC at the same address as any
  real platform but in practice I expect we will.
 
 Yes.
 
  In principle we could also offer the user options as to which particular
  platform a guest looks like.
 
 At least when using a qemu based simulation. Most platforms have some
 characteristics that are not meaningful in a classic virtualization
 scenario, but it would certainly be helpful to use the virtualization
 extensions to run a kernel that was built for a particular platform
 faster than with pure qemu, when you want to test that kernel image.
 
 It has been suggested in the past that it would be nice to run the
 guest kernel built for the same platform as the host kernel by
 default, but I think it would be much better to have just one
 platform that we end up using for guests on any host platform,
 unless there is a strong reason to do otherwise.

Yes, I agree, certainly that is what we were planning to target in the
first instance. Doing this means that we can get away with minimal
emulation of actual hardware, relying instead on PV drivers or hardware
virtualisation features.

Supporting specific board platforms as guests would be nice to have
eventually. We would need to do more emulation (e.g. running qemu as a
device model) for that case.

 There is also ongoing restructuring in the ARM Linux kernel to
 allow running the same kernel binary on multiple platforms. While
 there is still a lot of work to be done, you should assume that
 we will finish it before you see lots of users in production, there
 is no need to plan for the current one-kernel-per-board case.

We were absolutely banking on targeting the results of this work, so
that's good ;-)

Ian.




Re: [Android-virt] [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions

2011-12-01 Thread Ian Campbell
On Thu, 2011-12-01 at 15:10 +, Catalin Marinas wrote:
 On Thu, Dec 01, 2011 at 10:26:37AM +, Ian Campbell wrote:
  On Wed, 2011-11-30 at 18:32 +, Stefano Stabellini wrote:
   On Wed, 30 Nov 2011, Arnd Bergmann wrote:
KVM and Xen at least both fall into the single-return-value category,
so we should be able to agree on a calling convention. KVM does not
have an hcall API on ARM yet, and I see no reason not to use the
same implementation that you have in the Xen guest.

Stefano, can you split out the generic parts of your asm/xen/hypercall.h
file into a common asm/hypercall.h and submit it for review to the
arm kernel list?
   
   Sure, I can do that.
   Usually the hypercall calling convention is very hypervisor specific,
   but if it turns out that we have the same requirements I am happy to design
   a common interface.
  
  I expect the only real decision to be made is hypercall page vs. raw hvc
  instruction.
  
  The page was useful on x86 where there is a variety of instructions
  which could be used (at least for PV there was sysenter/syscall/int, I
  think vmcall instruction differs between AMD and Intel also) and gives
  some additional flexibility. It's hard to predict but I don't think I'd
  expect that to be necessary on ARM.
  
  Another reason for having a hypercall page instead of a raw instruction
  might be wanting to support 32 bit guests (from ~today) on a 64 bit
  hypervisor in the future and perhaps needing to do some shimming/arg
  translation. It would be better to aim for having the interface just be
  32/64 agnostic but mistakes do happen.
 
 Given the way register banking is done on AArch64, issuing an HVC on a
 32-bit guest OS doesn't require translation on a 64-bit hypervisor.

The issue I was thinking about was struct packing for arguments passed
as pointers etc. rather than the argument registers themselves. Since the
preference appears to be for raw hvc, we should just be careful that the
argument structures are 32/64 agnostic.
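
The struct packing concern can be sketched as follows. This is only an
illustration: struct demo_hcall_arg, its fields, and fill_arg() are
hypothetical names invented here, not part of the Xen ABI. The idea is to
use fixed-width fields, carry pointers as 64-bit integers, and make all
padding explicit so that 32-bit and 64-bit builds agree on the layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hypercall argument block (illustrative, not the Xen ABI). */
struct demo_hcall_arg {
	uint32_t domid;    /* fixed width, never "long" or a bare pointer */
	uint32_t pad;      /* explicit padding keeps the next field at offset 8 */
	uint64_t buffer;   /* guest address carried as a 64-bit integer */
	uint64_t nr_bytes; /* length as 64 bits regardless of size_t width */
};

/* The layout must be identical for 32-bit and 64-bit builds. */
static_assert(sizeof(struct demo_hcall_arg) == 24, "size differs between ABIs");
static_assert(offsetof(struct demo_hcall_arg, buffer) == 8, "unexpected padding");
static_assert(offsetof(struct demo_hcall_arg, nr_bytes) == 16, "unexpected padding");

/* Fill the block from native types, widening pointer and length explicitly. */
static void fill_arg(struct demo_hcall_arg *arg, uint32_t domid,
		     const void *buf, size_t len)
{
	arg->domid = domid;
	arg->pad = 0;
	arg->buffer = (uint64_t)(uintptr_t)buf;
	arg->nr_bytes = (uint64_t)len;
}
```

The compile-time checks are the useful part: a layout that silently
depends on the build's word size fails the build instead of failing at
run time under a hypervisor of a different width.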

Ian.

  We
 have a similar implementation at the SVC level (for 32-bit user apps on
 a 64-bit kernel), the only modification was where a 32-bit SVC takes a
 64-bit parameter in two separate 32-bit registers, so packing needs to
 be done in a syscall wrapper.
 
 I'm not closely involved with any of the Xen or KVM work but I would
 vote for using HVC rather than a hypercall page.
 




Re: [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions

2011-11-30 Thread Ian Campbell
On Wed, 2011-11-30 at 13:03 +, Arnd Bergmann wrote:
 On Wednesday 30 November 2011, Stefano Stabellini wrote:
  On Tue, 29 Nov 2011, Arnd Bergmann wrote:
   On Tuesday 29 November 2011, Stefano Stabellini wrote:
   
   Do you have a pointer to the kernel sources for the Linux guest?
  
  We have very few changes to the Linux kernel at the moment (only 3
  commits!), just enough to be able to issue hypercalls and start a PV
  console.
  
  A git branch is available here (not ready for submission):
  
  git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git arm
 
 Ok, interesting. There really isn't much of the platform support
 that I was expecting there. I finally found the information
 I was looking for in the xen construct_dom0() function:
 
  regs->r0 = 0; /* SBZ */
  regs->r1 = 2272; /* Machine NR: Versatile Express */
  regs->r2 = 0xc100; /* ATAGS */
 
 What this means is that you are emulating the current ARM/Keil reference
 board, at least to the degree that is necessary to get the guest started.
 
 This is the same choice people have made for KVM, but it's not
 necessarily the best option in the long run. In particular, this
 board has a lot of hardware that you claim to have by putting the
 machine number there, when you don't really want to emulate it.

This code is actually setting up dom0 which (for the most part) sees the
real hardware.

The hardcoding of the platform is just a short term hack.

 Pawel Moll is working on a variant of the vexpress code that uses
 the flattened device tree to describe the present hardware [1], and
 I think that would be a much better target for an official release.
 Ideally, the hypervisor should provide the device tree binary (dtb)
 to the guest OS describing the hardware that is actually there.

Agreed. Our intention was to use DT so this fits perfectly with our
plans.

For dom0 we would expose a (possibly filtered) version of the DT given
to us by the firmware (e.g. we might hide a serial port to reserve it
for Xen's use, we'd likely fiddle with the memory map etc).

For domU the DT would presumably be constructed by the toolstack (in
dom0 userspace) as appropriate for the guest configuration. I guess this
needn't correspond to any particular real hardware platform.

 This would also be the place where you tell the guest that it should
 look for PV devices. I'm not familiar with how Xen announces PV
 devices to the guest on other architectures, but you have the
 choice between providing a full binding, i.e. a formal specification
 in device tree format for the guest to detect PV devices in the
 same way as physical or emulated devices, or just providing a single
 place in the device tree in which the guest detects the presence
 of a xen device bus and then uses hcalls to find the devices on that
 bus.

On x86 there is an emulated PCI device which serves as the hooking point
for the PV drivers. For ARM I don't think it would be unreasonable to
have a DT entry instead. I think it would be fine to just represent the
root of the xenbus and further discovery would occur using the normal
xenbus mechanisms (so not a full binding). AIUI for buses which are
enumerable this is the preferred DT scheme to use.

 Another topic is the question whether there are any hcalls that
 we should try to standardize before we get another architecture
 with multiple conflicting hcall APIs as we have on x86 and powerpc.

The hcall API we are currently targeting is the existing Xen API (at
least the generic parts of it). These generally deal with fairly Xen
specific concepts like grant tables etc.

Ian.

 
   Arnd
 
 [1] http://www.spinics.net/lists/arm-kernel/msg149604.html
 
 ___
 Xen-devel mailing list
 xen-de...@lists.xensource.com
 http://lists.xensource.com/xen-devel




Re: [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions

2011-11-30 Thread Ian Campbell
On Wed, 2011-11-30 at 14:32 +, Arnd Bergmann wrote:
 On Wednesday 30 November 2011, Ian Campbell wrote:
  On Wed, 2011-11-30 at 13:03 +, Arnd Bergmann wrote:
  For domU the DT would presumably be constructed by the toolstack (in
  dom0 userspace) as appropriate for the guest configuration. I guess this
  needn't correspond to any particular real hardware platform.
 
 Correct, but it needs to correspond to some platform that is supported
 by the guest OS, which leaves the choice between emulating a real
 hardware platform, adding a completely new platform specifically for
 virtual machines, or something in between the two.
 
 What I suggested to the KVM developers is to start out with the
 vexpress platform, but then generalize it to the point where it fits
 your needs. All hardware that one expects a guest to have (GIC, timer,
 ...) will still show up in the same location as on a real vexpress,
 while anything that makes no sense or is better paravirtualized (LCD,
 storage, ...) just becomes optional and has to be described in the
 device tree if it's actually there.

That's along the lines of what I was thinking as well.

The DT contains the address of GIC, timer etc as well right? So at least
in principle we needn't provide e.g. the GIC at the same address as any
real platform but in practice I expect we will.

In principle we could also offer the user options as to which particular
platform a guest looks like.

   This would also be the place where you tell the guest that it should
   look for PV devices. I'm not familiar with how Xen announces PV
   devices to the guest on other architectures, but you have the
   choice between providing a full binding, i.e. a formal specification
   in device tree format for the guest to detect PV devices in the
   same way as physical or emulated devices, or just providing a single
   place in the device tree in which the guest detects the presence
   of a xen device bus and then uses hcalls to find the devices on that
   bus.
  
  On x86 there is an emulated PCI device which serves as the hooking point
  for the PV drivers. For ARM I don't think it would be unreasonable to
  have a DT entry instead. I think it would be fine to just represent the
  root of the xenbus and further discovery would occur using the normal
  xenbus mechanisms (so not a full binding). AIUI for buses which are
  enumerable this is the preferred DT scheme to use.
 
 In general that is the case, yes. One could argue that any software
 protocol between Xen and the guest is as good as any other, so it
 makes sense to use the device tree to describe all devices here.
 The counterargument to that is that Linux and other OSs already
 support Xenbus, so there is no need to come up with a new binding.

Right.

 I don't care much either way, but I think it would be good to
 use similar solutions across all hypervisors. The two options
 that I've seen discussed for KVM were to use either a virtual PCI
 bus with individual virtio-pci devices as on the PC, or to
 use the new virtio-mmio driver and individually put virtio devices
 into the device tree.
 
   Another topic is the question whether there are any hcalls that
   we should try to standardize before we get another architecture
   with multiple conflicting hcall APIs as we have on x86 and powerpc.
  
  The hcall API we are currently targeting is the existing Xen API (at
  least the generic parts of it). These generally deal with fairly Xen
  specific concepts like grant tables etc.
 
 Ok. It would of course still be possible to agree on an argument passing
 convention so that we can share the macros used to issue the hcalls,
 even if the individual commands are all different.

I think it likely that we can all agree on a common calling convention
for N-argument hypercalls. I doubt there are that many useful choices
with conflicting requirements yet strongly compelling advantages.

  I think I also
 remember talk about the need for a set of hypervisor independent calls
 that everyone should implement, but I can't remember what those were.

I'd not heard of this, maybe I just wasn't looking the right way though.

 Maybe we can split the number space into a range of some generic and
 some vendor specific hcalls?

Ian.



[PATCH 63/75] xen: netfront: convert to SKB paged frag API.

2011-08-19 Thread Ian Campbell
Signed-off-by: Ian Campbell ian.campb...@citrix.com
Cc: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: xen-de...@lists.xensource.com
Cc: virtualization@lists.linux-foundation.org
Cc: net...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
---
 drivers/net/xen-netfront.c |   28 +---
 1 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index d7c8a98..882a957 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -275,7 +275,7 @@ no_skb:
break;
}
 
-   skb_shinfo(skb)->frags[0].page = page;
+   skb_frag_set_page(skb, 0, page);
skb_shinfo(skb)->nr_frags = 1;
__skb_queue_tail(&np->rx_batch, skb);
}
@@ -309,8 +309,8 @@ no_skb:
BUG_ON((signed short)ref < 0);
np->grant_rx_ref[id] = ref;
 
-   pfn = page_to_pfn(skb_shinfo(skb)->frags[0].page);
-   vaddr = page_address(skb_shinfo(skb)->frags[0].page);
+   pfn = page_to_pfn(skb_frag_page(&skb_shinfo(skb)->frags[0]));
+   vaddr = page_address(skb_frag_page(&skb_shinfo(skb)->frags[0]));
 
req = RING_GET_REQUEST(&np->rx, req_prod + i);
gnttab_grant_foreign_access_ref(ref,
@@ -461,7 +461,7 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
ref = gnttab_claim_grant_reference(&np->gref_tx_head);
BUG_ON((signed short)ref < 0);
 
-   mfn = pfn_to_mfn(page_to_pfn(frag->page));
+   mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag)));
gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
mfn, GNTMAP_readonly);
 
@@ -768,8 +768,9 @@ static RING_IDX xennet_fill_frags(struct netfront_info *np,
while ((nskb = __skb_dequeue(list))) {
struct xen_netif_rx_response *rx =
RING_GET_RESPONSE(&np->rx, ++cons);
+   skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0];
 
-   frag->page = skb_shinfo(nskb)->frags[0].page;
+   __skb_frag_set_page(frag, skb_frag_page(nfrag));
frag->page_offset = rx->offset;
frag->size = rx->status;
 
@@ -873,7 +874,7 @@ static int handle_incoming_queue(struct net_device *dev,
memcpy(skb->data, vaddr + offset,
   skb_headlen(skb));
 
-   if (page != skb_shinfo(skb)->frags[0].page)
+   if (page != skb_frag_page(&skb_shinfo(skb)->frags[0]))
__free_page(page);
 
/* Ethernet work: Delayed to here as it peeks the header. */
@@ -954,7 +955,8 @@ err:
}
}
 
-   NETFRONT_SKB_CB(skb)->page = skb_shinfo(skb)->frags[0].page;
+   NETFRONT_SKB_CB(skb)->page =
+   skb_frag_page(&skb_shinfo(skb)->frags[0]);
NETFRONT_SKB_CB(skb)->offset = rx->offset;
 
len = rx->status;
@@ -968,7 +970,7 @@ err:
skb_shinfo(skb)->frags[0].size = rx->status - len;
skb->data_len = rx->status - len;
} else {
-   skb_shinfo(skb)->frags[0].page = NULL;
+   skb_frag_set_page(skb, 0, NULL);
skb_shinfo(skb)->nr_frags = 0;
}
 
@@ -1143,7 +1145,8 @@ static void xennet_release_rx_bufs(struct netfront_info *np)
 
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
/* Remap the page. */
-   struct page *page = skb_shinfo(skb)->frags[0].page;
+   const struct page *page =
+   skb_frag_page(&skb_shinfo(skb)->frags[0]);
unsigned long pfn = page_to_pfn(page);
void *vaddr = page_address(page);
 
@@ -1650,6 +1653,8 @@ static int xennet_connect(struct net_device *dev)
 
/* Step 2: Rebuild the RX buffer freelist and the RX ring itself. */
for (requeue_idx = 0, i = 0; i < NET_RX_RING_SIZE; i++) {
+   skb_frag_t *frag;
+   const struct page *page;
if (!np->rx_skbs[i])
continue;
 
@@ -1657,10 +1662,11 @@ static int xennet_connect(struct net_device *dev)
ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, i);
req = RING_GET_REQUEST(&np->rx, requeue_idx);
 
+   frag = &skb_shinfo(skb)->frags[0];
+   page = skb_frag_page(frag);
gnttab_grant_foreign_access_ref(
ref, np->xbdev->otherend_id,
-   pfn_to_mfn(page_to_pfn(skb_shinfo(skb)->
-  frags->page)),
+   pfn_to_mfn

[PATCH 59/75] virtionet: convert to SKB paged frag API.

2011-08-19 Thread Ian Campbell
Signed-off-by: Ian Campbell ian.campb...@citrix.com
Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Cc: virtualization@lists.linux-foundation.org
Cc: net...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
---
 drivers/net/virtio_net.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0c7321c..52667a8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -149,7 +149,7 @@ static void set_skb_frag(struct sk_buff *skb, struct page *page,
f = &skb_shinfo(skb)->frags[i];
f->size = min((unsigned)PAGE_SIZE - offset, *len);
f->page_offset = offset;
-   f->page = page;
+   __skb_frag_set_page(f, page);
 
skb->data_len += f->size;
skb->len += f->size;
-- 
1.7.2.5



Re: [Xen-devel] Re: [PATCH 5/7] Xen: fix whitespaces, tabs coding style issue in drivers/xen/xenbus/xenbus_client.c

2011-07-26 Thread Ian Campbell
On Tue, 2011-07-26 at 12:17 -0400, Jeremy Fitzhardinge wrote:
 On 07/26/2011 04:16 AM, ruslanpisa...@gmail.com wrote:
  @@ -43,15 +43,15 @@
   const char *xenbus_strstate(enum xenbus_state state)
   {
  static const char *const name[] = {
  -   [ XenbusStateUnknown  ] = "Unknown",
  -   [ XenbusStateInitialising ] = "Initialising",
  -   [ XenbusStateInitWait ] = "InitWait",
  -   [ XenbusStateInitialised  ] = "Initialised",
  -   [ XenbusStateConnected] = "Connected",
  -   [ XenbusStateClosing  ] = "Closing",
  -   [ XenbusStateClosed   ] = "Closed",
  -   [XenbusStateReconfiguring] = "Reconfiguring",
  -   [XenbusStateReconfigured] = "Reconfigured",
  +   [XenbusStateUnknown] =  "Unknown",
  +   [XenbusStateInitialising] = "Initialising",
  +   [XenbusStateInitWait] = "InitWait",
  +   [XenbusStateInitialised] =  "Initialised",
  +   [XenbusStateConnected] ="Connected",
  +   [XenbusStateClosing] =  "Closing",
  +   [XenbusStateClosed] =   "Closed",
  +   [XenbusStateReconfiguring] ="Reconfiguring",
  +   [XenbusStateReconfigured] = "Reconfigured",
  };
 
 Eh, I think this looks worse now.

Me too.

If we're going to change this to anything I'd suggest
#define N(x) [XenbusState#x] = ##x
...
 N(Connected),
 N(Closing),
...
#undef N

(modulo my never quite remembering the cpp stringification rules first
time)
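
For reference, a sketch of how the suggested macro would look with the two
preprocessor operators the right way round: "##" pastes tokens together and
"#" turns an argument into a string literal. The enum is re-declared here
only so the snippet stands alone (an assumption for illustration; the real
definition comes from the Xen interface headers), and the "INVALID" fallback
is likewise just a placeholder.

```c
#include <stddef.h>

/* Stand-alone copy of the state names from the quoted patch. */
enum xenbus_state {
	XenbusStateUnknown,
	XenbusStateInitialising,
	XenbusStateInitWait,
	XenbusStateInitialised,
	XenbusStateConnected,
	XenbusStateClosing,
	XenbusStateClosed,
	XenbusStateReconfiguring,
	XenbusStateReconfigured,
};

const char *xenbus_strstate(enum xenbus_state state)
{
	/* N(Closing) expands to [XenbusStateClosing] = "Closing" */
#define N(x) [XenbusState##x] = #x
	static const char *const name[] = {
		N(Unknown),
		N(Initialising),
		N(InitWait),
		N(Initialised),
		N(Connected),
		N(Closing),
		N(Closed),
		N(Reconfiguring),
		N(Reconfigured),
	};
#undef N
	/* Out-of-range values (including negative ones) get a placeholder. */
	return ((size_t)state < sizeof(name) / sizeof(name[0]))
		? name[state] : "INVALID";
}
```

With this form each state is named once, so the index and the string cannot
drift apart and there is no column alignment left to argue about.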

Ian.



Re: [Xen-devel] Re: [TOME] Re: [PATCH] Modpost section mismatch fix

2011-07-06 Thread Ian Campbell
 is what acpi_gsi_to_irq ends up calling when starting the
 +  * the ACPI interpreter and keels over since IRQ 9 has not been
 +  * setup as we had setup IRQ 20 for it).
 +  */
 + /* Check whether the GSI != IRQ */
 + if (acpi_gsi_to_irq(gsi, &irq) == 0) {
 + if (irq >= 0 && irq != gsi)
 + /* Bugger, we MUST have that IRQ. */
 + gsi_override = irq;
 + }
 +
 + gsi = xen_register_gsi(gsi, gsi_override, trigger, polarity);
   printk(KERN_INFO "xen: acpi sci %d\n", gsi);
  
   return;
 @@ -450,7 +450,7 @@ static __init void xen_setup_acpi_sci(void)
  static int acpi_register_gsi_xen(struct device *dev, u32 gsi,
int trigger, int polarity)
  {
 - return xen_register_gsi(gsi, trigger, polarity);
 + return xen_register_gsi(gsi, -1 /* no GSI override */, trigger, 
 polarity);
  }
  
  static int __init pci_xen_initial_domain(void)
 @@ -489,7 +489,7 @@ void __init xen_setup_pirqs(void)
   if (acpi_get_override_irq(irq, &trigger, &polarity) == -1)
   continue;
  
 - xen_register_pirq(irq,
 + xen_register_pirq(irq, -1 /* no GSI override */,
   trigger ? ACPI_LEVEL_SENSITIVE : ACPI_EDGE_SENSITIVE);
   }
  }
 
 ___
 Xen-devel mailing list
 xen-de...@lists.xensource.com
 http://lists.xensource.com/xen-devel

-- 
Ian Campbell

While having never invented a sin, I'm trying to perfect several.



Re: [PATCH] Modpost section mismatch fix

2011-07-04 Thread Ian Campbell
On Mon, 2011-07-04 at 04:55 +0530, Raghavendra D Prabhu wrote:
 [Sorry if duplicate, one earlier was corrupt]
 
 Hi,
  I got section mismatches reported by modpost in latest build. It got
  reported for xen_register_pirq and xen_unplug_emulated_devices
  functions.


  xen_register_pirq makes reference to
  acpi_sci_override_gsi in init.data section; marking
  xen_register_pirq with __init is not feasible since calls are made
  to it from acpi_register_gsi in non-init contexts. So marking it
  __refdata based on assumption that when acpi_sci_override_gsi is
  referenced, it is in  early stages where it is alive.

I don't think this assumption holds, since xen_register_pirq can be
called at any time and basically unconditionally references
acpi_sci_override_gsi.

If we don't want to remove the __init from acpi_sci_override_gsi then
perhaps xen_setup_acpi_sci needs to stash it somewhere?

Or maybe xen_register_pirq could take an int force_irq which, if not
-1, would force a particular IRQ. The callsite in xen_setup_acpi_sci
(actually via xen_register_gsi so the param would need to be propagated
there) would be the only actual user?
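
A rough sketch of that second idea, to make the shape of the interface
concrete. The function names and bodies below are hypothetical stand-ins,
not the real xen_register_pirq()/xen_register_gsi(); the only point is that
the override arrives as a parameter, so the non-init function never has to
look at init-only data itself.

```c
/* Sentinel meaning "the caller has no particular IRQ in mind". */
#define NO_IRQ_OVERRIDE (-1)

/*
 * Hypothetical stand-in for the pirq registration function. It returns
 * the IRQ that would be used: the forced one if given, otherwise the
 * identity mapping IRQ == GSI.
 */
static int register_pirq_sketch(int gsi, int force_irq)
{
	return (force_irq != NO_IRQ_OVERRIDE) ? force_irq : gsi;
}

/* Hypothetical stand-in for the GSI wrapper: it only passes the value on. */
static int register_gsi_sketch(int gsi, int force_irq)
{
	return register_pirq_sketch(gsi, force_irq);
}
```

Only the __init caller would ever pass something other than -1, having
consulted the override while it is still valid; every other call site
passes -1.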

The xen_unplug_emulated_devices change looks correct to me since
xen_unplug_emulated_devices is called from xen_arch_hvm_post_suspend.

Ian.

 
 
 --
 Raghavendra Prabhu
 GPG Id : 0xD72BE977
 Fingerprint: B93F EBCB 8E05 7039 CD3C A4B8 A616 DCA1 D72B E977
 www: wnohang.net

-- 
Ian Campbell
Current Noise: Crowbar - Remember Tomorrow (A Tribute To Iron Maiden)

SANTA CLAUS comes down a FIRE ESCAPE wearing bright blue LEG WARMERS
... He scrubs the POPE with a mild soap or detergent for 15 minutes,
starring JANE FONDA!!



Re: [Xen-devel] [PATCH 1/2] xen: Populate xenbus device attributes

2011-06-27 Thread Ian Campbell
On Fri, 2011-06-24 at 22:51 +0100, Bastian Blank wrote:
 The xenbus bus type uses device_create_file to assign all used device
 attributes. However it does not remove them when the device goes away.

Doesn't the cleanup happen automatically in the device core when the
device goes away? Either way this is a good cleanup in its own right.

 This patch uses the dev_attrs field of the bus type to specify default
 attributes for all devices.
 
 Signed-off-by: Bastian Blank wa...@debian.org

Acked-by: Ian Campbell ian.campb...@citrix.com

Thanks Bastian.

Ian.

 ---
  drivers/xen/xenbus/xenbus_probe.c  |   41 +--
  drivers/xen/xenbus/xenbus_probe.h  |2 +
  drivers/xen/xenbus/xenbus_probe_backend.c  |6 +---
  drivers/xen/xenbus/xenbus_probe_frontend.c |6 +---
  4 files changed, 18 insertions(+), 37 deletions(-)
 
 diff --git a/drivers/xen/xenbus/xenbus_probe.c 
 b/drivers/xen/xenbus/xenbus_probe.c
 index 7397695..2ed0b04 100644
 --- a/drivers/xen/xenbus/xenbus_probe.c
 +++ b/drivers/xen/xenbus/xenbus_probe.c
 @@ -378,26 +378,31 @@ static void xenbus_dev_release(struct device *dev)
   kfree(to_xenbus_device(dev));
  }
  
 -static ssize_t xendev_show_nodename(struct device *dev,
 - struct device_attribute *attr, char *buf)
 +static ssize_t nodename_show(struct device *dev,
 +  struct device_attribute *attr, char *buf)
  {
   return sprintf(buf, "%s\n", to_xenbus_device(dev)->nodename);
  }
 -static DEVICE_ATTR(nodename, S_IRUSR | S_IRGRP | S_IROTH, xendev_show_nodename, NULL);
  
 -static ssize_t xendev_show_devtype(struct device *dev,
 -struct device_attribute *attr, char *buf)
 +static ssize_t devtype_show(struct device *dev,
 + struct device_attribute *attr, char *buf)
  {
   return sprintf(buf, "%s\n", to_xenbus_device(dev)->devicetype);
  }
 -static DEVICE_ATTR(devtype, S_IRUSR | S_IRGRP | S_IROTH, xendev_show_devtype, NULL);
  
 -static ssize_t xendev_show_modalias(struct device *dev,
 - struct device_attribute *attr, char *buf)
 +static ssize_t modalias_show(struct device *dev,
 +  struct device_attribute *attr, char *buf)
  {
   return sprintf(buf, "xen:%s\n", to_xenbus_device(dev)->devicetype);
  }
 -static DEVICE_ATTR(modalias, S_IRUSR | S_IRGRP | S_IROTH, xendev_show_modalias, NULL);
 +
 +struct device_attribute xenbus_dev_attrs[] = {
 + __ATTR_RO(nodename),
 + __ATTR_RO(devtype),
 + __ATTR_RO(modalias),
 + __ATTR_NULL
 +};
 +EXPORT_SYMBOL_GPL(xenbus_dev_attrs);
  
  int xenbus_probe_node(struct xen_bus_type *bus,
 const char *type,
 @@ -449,25 +454,7 @@ int xenbus_probe_node(struct xen_bus_type *bus,
   if (err)
   goto fail;
  
 - err = device_create_file(&xendev->dev, &dev_attr_nodename);
 - if (err)
 - goto fail_unregister;
 -
 - err = device_create_file(&xendev->dev, &dev_attr_devtype);
 - if (err)
 - goto fail_remove_nodename;
 -
 - err = device_create_file(&xendev->dev, &dev_attr_modalias);
 - if (err)
 - goto fail_remove_devtype;
 -
   return 0;
 -fail_remove_devtype:
 - device_remove_file(&xendev->dev, &dev_attr_devtype);
 -fail_remove_nodename:
 - device_remove_file(&xendev->dev, &dev_attr_nodename);
 -fail_unregister:
 - device_unregister(&xendev->dev);
  fail:
   kfree(xendev);
   return err;
 diff --git a/drivers/xen/xenbus/xenbus_probe.h 
 b/drivers/xen/xenbus/xenbus_probe.h
 index 888b990..b814935 100644
 --- a/drivers/xen/xenbus/xenbus_probe.h
 +++ b/drivers/xen/xenbus/xenbus_probe.h
 @@ -48,6 +48,8 @@ struct xen_bus_type
   struct bus_type bus;
  };
  
 +extern struct device_attribute xenbus_dev_attrs[];
 +
  extern int xenbus_match(struct device *_dev, struct device_driver *_drv);
  extern int xenbus_dev_probe(struct device *_dev);
  extern int xenbus_dev_remove(struct device *_dev);
 diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
 b/drivers/xen/xenbus/xenbus_probe_backend.c
 index 6cf467b..ec510e5 100644
 --- a/drivers/xen/xenbus/xenbus_probe_backend.c
 +++ b/drivers/xen/xenbus/xenbus_probe_backend.c
 @@ -183,10 +183,6 @@ static void frontend_changed(struct xenbus_watch *watch,
   xenbus_otherend_changed(watch, vec, len, 0);
  }
  
 -static struct device_attribute xenbus_backend_dev_attrs[] = {
 - __ATTR_NULL
 -};
 -
  static struct xen_bus_type xenbus_backend = {
   .root = "backend",
   .levels = 3,/* backend/type/frontend/id */
 @@ -200,7 +196,7 @@ static struct xen_bus_type xenbus_backend = {
   .probe  = xenbus_dev_probe,
   .remove = xenbus_dev_remove,
   .shutdown   = xenbus_dev_shutdown,
 - .dev_attrs  = xenbus_backend_dev_attrs,
 + .dev_attrs  = xenbus_dev_attrs,
   },
  };
  
 diff --git

Re: [Xen-devel] [PATCH 2/2] xen: Add alias to autoload backend drivers

2011-06-27 Thread Ian Campbell
On Fri, 2011-06-24 at 22:51 +0100, Bastian Blank wrote:
 All the Xen backend drivers are assigned to a special bus type
 xen-backend. This allows userspace to load the modules on request.
 
 This patch defines xen-backend:* aliases on the modules and exports these
 names through modalias and uevent.

Excellent, this was a big missing piece of functionality for distros.
Thanks!
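
As an illustration of what ends up visible to userspace: the modalias is
just the bus name and the device type joined by a colon, which is the same
string the modules declare via MODULE_ALIAS. The helper below is a userspace
sketch with a made-up name, not kernel code.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build "<bus>:<devicetype>\n", mirroring the format string used by the
 * modalias attribute in the patch. Returns what snprintf() returns.
 */
static int format_modalias(char *buf, size_t len,
			   const char *bus_name, const char *devicetype)
{
	return snprintf(buf, len, "%s:%s\n", bus_name, devicetype);
}
```

For a block backend on the xen-backend bus this yields "xen-backend:vbd",
which is the alias the blkback module now carries, so the usual
modalias-driven module loading can pick it up.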

 Signed-off-by: Bastian Blank wa...@debian.org

Acked-by: Ian Campbell ian.campb...@citrix.com

 ---
  drivers/block/xen-blkback/blkback.c   |1 +
  drivers/net/xen-netback/netback.c |1 +
  drivers/xen/xenbus/xenbus_probe.c |3 ++-
  drivers/xen/xenbus/xenbus_probe_backend.c |3 +++
  4 files changed, 7 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/block/xen-blkback/blkback.c 
 b/drivers/block/xen-blkback/blkback.c
 index 5cf2993..ed62008 100644
 --- a/drivers/block/xen-blkback/blkback.c
 +++ b/drivers/block/xen-blkback/blkback.c
 @@ -824,3 +824,4 @@ static int __init xen_blkif_init(void)
  module_init(xen_blkif_init);
  
  MODULE_LICENSE("Dual BSD/GPL");
 +MODULE_ALIAS("xen-backend:vbd");
 diff --git a/drivers/net/xen-netback/netback.c 
 b/drivers/net/xen-netback/netback.c
 index 0e4851b..fd00f25 100644
 --- a/drivers/net/xen-netback/netback.c
 +++ b/drivers/net/xen-netback/netback.c
 @@ -1743,3 +1743,4 @@ failed_init:
  module_init(netback_init);
  
  MODULE_LICENSE("Dual BSD/GPL");
 +MODULE_ALIAS("xen-backend:vif");
 diff --git a/drivers/xen/xenbus/xenbus_probe.c 
 b/drivers/xen/xenbus/xenbus_probe.c
 index 2ed0b04..bd2f90c 100644
 --- a/drivers/xen/xenbus/xenbus_probe.c
 +++ b/drivers/xen/xenbus/xenbus_probe.c
 @@ -393,7 +393,8 @@ static ssize_t devtype_show(struct device *dev,
  static ssize_t modalias_show(struct device *dev,
struct device_attribute *attr, char *buf)
  {
 - return sprintf(buf, "xen:%s\n", to_xenbus_device(dev)->devicetype);
 + return sprintf(buf, "%s:%s\n", dev->bus->name,
 +to_xenbus_device(dev)->devicetype);
  }
  
  struct device_attribute xenbus_dev_attrs[] = {
 diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
 b/drivers/xen/xenbus/xenbus_probe_backend.c
 index ec510e5..60adf91 100644
 --- a/drivers/xen/xenbus/xenbus_probe_backend.c
 +++ b/drivers/xen/xenbus/xenbus_probe_backend.c
 @@ -107,6 +107,9 @@ static int xenbus_uevent_backend(struct device *dev,
   if (xdev == NULL)
   return -ENODEV;
  
 + if (add_uevent_var(env, "MODALIAS=xen-backend:%s", xdev->devicetype))
 + return -ENOMEM;
 +
   /* stuff we want to pass to /sbin/hotplug */
   if (add_uevent_var(env, "XENBUS_TYPE=%s", xdev->devicetype))
   return -ENOMEM;




Re: [Xen-devel] Re: [PATCH] xen: drop anti-dependency on X86_VISWS

2011-04-09 Thread Ian Campbell
On Fri, 2011-04-08 at 11:24 -0700, Jeremy Fitzhardinge wrote:
 On 04/08/2011 08:42 AM, Jan Beulich wrote:
  On 08.04.11 at 17:25, Jeremy Fitzhardinge jer...@goop.org wrote:
  On 04/07/2011 11:38 PM, Ian Campbell wrote:
  Is there any downside to this patch (is X86_CMPXCHG in the same sort of
  boat?)
  Only if we don't use cmpxchg in shared memory with other domains or the
  hypervisor.  (I don't think it will dynamically switch between real and
  emulated cmpxchg depending on availability.)

We do use cmpxchg in the grant table code at least (actually,
sync_cmpxchg in that case).
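
To show the sort of use I mean, here is a userspace sketch of the
compare-and-swap loop used when revoking a grant: the entry's flags word is
shared with the other end, so it may only be cleared if nobody is in the
middle of using it. This uses C11 atomics in place of sync_cmpxchg, the
function name is made up, and the flag values are quoted from memory of the
public grant table header, so treat all of it as illustrative.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Status bits set by the remote end while it has the page mapped. */
#define GTF_reading (1U << 3)
#define GTF_writing (1U << 4)

/*
 * Try to clear a shared flags word. Returns false, leaving the word
 * untouched, if the remote end is currently reading or writing.
 */
static bool end_foreign_access_sketch(_Atomic uint16_t *pflags)
{
	uint16_t flags = atomic_load(pflags);

	do {
		if (flags & (GTF_reading | GTF_writing))
			return false;
		/* On failure the CAS reloads 'flags' with the current value. */
	} while (!atomic_compare_exchange_weak(pflags, &flags, 0));

	return true;
}
```

An emulated, non-atomic cmpxchg would be no good here because the other
party updating the word is not this kernel.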

  Actually it does - see the #ifndef CONFIG_X86_CMPXCHG section
  in asm/cmpxchg_32.h.
 
 Hm, OK.  Still, I'm happiest with that dependency in case someone
 knobbles the cpu to exclude cmpxchg and breaks things.

Dropping the TSC patch is sensible though?

Ian.




Re: [PATCH] xen: drop anti-dependency on X86_VISWS

2011-04-08 Thread Ian Campbell
(dropping netdev and the visws list)

On Thu, 2011-04-07 at 11:07 -0700, Jeremy Fitzhardinge wrote:
 On 04/06/2011 11:58 PM, Ian Campbell wrote:
  On Wed, 2011-04-06 at 22:45 +0100, David Miller wrote:
  From: Ian Campbell ian.campb...@eu.citrix.com
  Date: Mon, 4 Apr 2011 10:55:55 +0100
 
  You mean the !X86_VISWS I presume? It doesn't make sense to me either.
  No, I think 32-bit x86 allmodconfig elides XEN because of its X86_TSC 
  dependency.
  TSC is a real dependency of the Xen interfaces.
 
 Not really.  The TSC register is a requirement, but that's going to be
 present on any CPU which can boot Xen.  We don't need any of the
 kernel's TSC machinery though.

So why the Kconfig dependency then? In principle a kernel compiled for a
non-TSC processor (which meets the other requirements for Xen, such as
PAE support) will run just fine under Xen on a newer piece of hardware.

Is there any downside to this patch (is X86_CMPXCHG in the same sort of
boat?)

8--

From 7204945696a927d281366f2a57baee37e2b43ca3 Mon Sep 17 00:00:00 2001
From: Ian Campbell i...@hellion.org.uk
Date: Fri, 8 Apr 2011 07:33:21 +0100
Subject: [PATCH] xen: remove Kconfig dependency on X86_TSC

The TSC register is a requirement when running under Xen, but that's going to
be present on any CPU which can boot Xen. We don't need any of the kernel's TSC
machinery, since the usage is contained within the Xen interfaces, and therefore
XEN does not need to depend on CONFIG_X86_TSC.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
 arch/x86/xen/Kconfig |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 1c7121b..ac69c5b 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -7,7 +7,7 @@ config XEN
select PARAVIRT
select PARAVIRT_CLOCK
depends on X86_64 || (X86_32 && X86_PAE && !X86_VISWS)
-   depends on X86_CMPXCHG && X86_TSC
+   depends on X86_CMPXCHG
help
  This is the Linux Xen port.  Enabling this will allow the
  kernel to boot in a paravirtualized environment under the
-- 
1.7.4.1





Re: [PATCH] xen: drop anti-dependency on X86_VISWS

2011-04-08 Thread Ian Campbell
(dropping netdev and visws list)
On Thu, 2011-04-07 at 18:00 +0100, H. Peter Anvin wrote:
 On 04/06/2011 11:58 PM, Ian Campbell wrote:
  
  I'm not sure why ELAN belongs in the EXTENDED_PLATFORM option space
  rather than in the CPU choice option, since its only impact seems to be
  on -march, MODULE_PROC_FAMILY and some cpufreq drivers, which doesn't
  sound like an extended platform to me, but it does appear to be
  deliberate (see 9e111f3e167a "x86: move ELAN to the
  NON_STANDARD_PLATFORM section"; that was the old name for
  EXTENDED_PLATFORM).
  
 
 Historic... we used to have nonstandard A20M# handling on Elan, until it
 was discovered that we could make it work without it.

Any reason not to switch it over at this point then?

8--

From b1942fa168aee77537bf467e4c68c6f181b8fdee Mon Sep 17 00:00:00 2001
From: Ian Campbell i...@hellion.org.uk
Date: Fri, 8 Apr 2011 07:42:29 +0100
Subject: [PATCH] x86: move AMD Elan Kconfig under Processor family

Currently the option resides under X86_EXTENDED_PLATFORM due to historical
nonstandard A20M# handling. However that is no longer the case and so Elan can
be treated as part of the standard processor choice Kconfig option.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Cc: H. Peter Anvin h...@zytor.com
---
 arch/x86/Kconfig|   11 ---
 arch/x86/Kconfig.cpu|   16 ++--
 arch/x86/Makefile_32.cpu|2 +-
 arch/x86/include/asm/module.h   |2 +-
 arch/x86/kernel/cpu/cpufreq/Kconfig |4 ++--
 5 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cc6c53a..f00a3f3 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -365,17 +365,6 @@ config X86_UV
 # Following is an alphabetically sorted list of 32 bit extended platforms
 # Please maintain the alphabetic order if and when there are additions
 
-config X86_ELAN
-   bool AMD Elan
-   depends on X86_32
-   depends on X86_EXTENDED_PLATFORM
-   ---help---
- Select this for an AMD Elan processor.
-
- Do not use this option for K6/Athlon/Opteron processors!
-
- If unsure, choose PC-compatible instead.
-
 config X86_INTEL_CE
bool CE4100 TV platform
depends on PCI
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index d161e93..6a7cfdf 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -1,6 +1,4 @@
 # Put here option for CPU selection and depending optimization
-if !X86_ELAN
-
 choice
prompt Processor family
default M686 if X86_32
@@ -203,6 +201,14 @@ config MWINCHIP3D
  stores for this CPU, which can increase performance of some
  operations.
 
+config MELAN
+   bool AMD Elan
+   depends on X86_32
+   ---help---
+ Select this for an AMD Elan processor.
+
+ Do not use this option for K6/Athlon/Opteron processors!
+
 config MGEODEGX1
bool GeodeGX1
depends on X86_32
@@ -292,8 +298,6 @@ config X86_GENERIC
  This is really intended for distributors who need more
  generic optimizations.
 
-endif
-
 #
 # Define implied options from the CPU selection here
 config X86_INTERNODE_CACHE_SHIFT
@@ -312,7 +316,7 @@ config X86_L1_CACHE_SHIFT
int
default 7 if MPENTIUM4 || MPSC
default 6 if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || 
X86_GENERIC || GENERIC_CPU
-   default 4 if X86_ELAN || M486 || M386 || MGEODEGX1
+   default 4 if MELAN || M486 || M386 || MGEODEGX1
default 5 if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || 
MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || 
M586 || MVIAC3_2 || MGEODE_LX
 
 config X86_XADD
@@ -358,7 +362,7 @@ config X86_POPAD_OK
 
 config X86_ALIGNMENT_16
def_bool y
-   depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || X86_ELAN || MK6 || 
M586MMX || M586TSC || M586 || M486 || MVIAC3_2 || MGEODEGX1
+   depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MELAN || MK6 || 
M586MMX || M586TSC || M586 || M486 || MVIAC3_2 || MGEODEGX1
 
 config X86_INTEL_USERCOPY
def_bool y
diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
index f2ee1ab..86cee7b 100644
--- a/arch/x86/Makefile_32.cpu
+++ b/arch/x86/Makefile_32.cpu
@@ -37,7 +37,7 @@ cflags-$(CONFIG_MATOM)+= $(call 
cc-option,-march=atom,$(call cc-option,-march=
$(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
 
 # AMD Elan support
-cflags-$(CONFIG_X86_ELAN)  += -march=i486
+cflags-$(CONFIG_MELAN) += -march=i486
 
 # Geode GX1 support
 cflags-$(CONFIG_MGEODEGX1) += -march=pentium-mmx
diff --git a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h
index 67763c5..9eae775 100644
--- a/arch/x86/include/asm/module.h
+++ b/arch/x86/include/asm/module.h
@@ -35,7 +35,7 @@
 #define MODULE_PROC_FAMILY "K7 "
 #elif defined CONFIG_MK8
 #define MODULE_PROC_FAMILY "K8 "
-#elif

Re: [PATCH] xen: drop anti-dependency on X86_VISWS

2011-04-07 Thread Ian Campbell
On Wed, 2011-04-06 at 22:45 +0100, David Miller wrote:
 From: Ian Campbell ian.campb...@eu.citrix.com
 Date: Mon, 4 Apr 2011 10:55:55 +0100
 
  You mean the !X86_VISWS I presume? It doesn't make sense to me either.
 
 No, I think 32-bit x86 allmodconfig elides XEN because of its X86_TSC 
 dependency.

TSC is a real dependency of the Xen interfaces.

 And, well, you could type make allmodconfig on your tree and see for
 yourself instead of asking me :-)

True.

X86_TSC not being enabled appears to be due to CONFIG_ELAN being enabled,
which causes the processor selection option (which defaults to M686,
which is a sane choice and enables TSC etc) to be gated at the top level
in arch/x86/Kconfig.cpu. Disabling the ELAN option then leaves X86_TSC
gated on !CONFIG_NUMAQ but removing that results in a generally useful
looking config.

It's a shame that these sorts of minority options cause allmodconfig to
omit support for more interesting configurations, such as modern
processors. Other than negating the semantics of such options I'm not
really sure what can be done about it though. On the other hand
compiling all the unusual stuff in an allmodconfig is probably a
positive thing.

I'm not sure why ELAN belongs in the EXTENDED_PLATFORM option space
rather than in the CPU choice option, since its only impact seems to be
on -march, MODULE_PROC_FAMILY and some cpufreq drivers, which doesn't
sound like an extended platform to me, but it does appear to be
deliberate (see 9e111f3e167a "x86: move ELAN to the
NON_STANDARD_PLATFORM section"; that was the old name for
EXTENDED_PLATFORM).

Hrm, what about the following? (doesn't actually make a difference to
Xen since allmodconfig chooses HIGHMEM4G instead of HIGHMEM64G in the !
NUMAQ case but I stopped worrying about that several paragraphs ago)

8

x86: invert X86_EXTENDED_PLATFORM to X86_STANDARD_PLATFORM

Having the =y choice be the more standard configuration causes
all*config to provide greater coverage of usual configurations.

Signed-off-by: Ian Campbell ian.campb...@citrix.com

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cc6c53a..6d8a404 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -299,15 +299,15 @@ config X86_BIGSMP
  This option is needed for the systems that have more than 8 CPUs
 
 if X86_32
-config X86_EXTENDED_PLATFORM
-   bool Support for extended (non-PC) x86 platforms
+config X86_STANDARD_PLATFORM
+   bool Restrict support to standard (PC) x86 platforms
default y
---help---
- If you disable this option then the kernel will only support
+ If you enable this option then the kernel will only support
  standard PC platforms. (which covers the vast majority of
  systems out there.)
 
- If you enable this option then you'll be able to select support
+ If you disable this option then you'll be able to select support
  for the following (non-PC) 32 bit x86 platforms:
AMD Elan
NUMAQ (IBM/Sequent)
@@ -318,25 +318,25 @@ config X86_EXTENDED_PLATFORM
Moorestown MID devices
 
  If you have one of these systems, or if you want to build a
- generic distribution kernel, say Y here - otherwise say N.
+ generic distribution kernel, say N here - otherwise say Y.
 endif
 
 if X86_64
-config X86_EXTENDED_PLATFORM
-   bool Support for extended (non-PC) x86 platforms
+config X86_STANDARD_PLATFORM
+   bool Restrict support to standard (PC) x86 platforms
default y
---help---
- If you disable this option then the kernel will only support
+ If you enable this option then the kernel will only support
  standard PC platforms. (which covers the vast majority of
  systems out there.)
 
- If you enable this option then you'll be able to select support
+ If you disable this option then you'll be able to select support
  for the following (non-PC) 64 bit x86 platforms:
ScaleMP vSMP
SGI Ultraviolet
 
  If you have one of these systems, or if you want to build a
- generic distribution kernel, say Y here - otherwise say N.
+ generic distribution kernel, say N here - otherwise say Y.
 endif
 # This is an alphabetically sorted list of 64 bit extended platforms
 # Please maintain the alphabetic order if and when there are additions
@@ -346,7 +346,7 @@ config X86_VSMP
select PARAVIRT_GUEST
select PARAVIRT
depends on X86_64 && PCI
-   depends on X86_EXTENDED_PLATFORM
+   depends on !X86_STANDARD_PLATFORM
---help---
  Support for ScaleMP vSMP systems.  Say 'Y' here if this kernel is
  supposed to run on these EM64T-based machines.  Only choose this 
option
@@ -355,7 +355,7 @@ config X86_VSMP
 config X86_UV
bool SGI Ultraviolet
depends on X86_64
-   depends on X86_EXTENDED_PLATFORM
+   depends

Re: Signed bit field; int have_hotplug_status_watch:1

2011-04-04 Thread Ian Campbell
On Sun, 2011-04-03 at 22:32 +0100, Dr. David Alan Gilbert wrote:
 Hi Ian,
I've been going through some sparse scans of the kernel and
 it threw up:
 
   CHECK   drivers/net/xen-netback/xenbus.c
 drivers/net/xen-netback/xenbus.c:29:40: error: dubious one-bit signed bitfield
 
 int have_hotplug_status_watch:1;
 
 from your patch f942dc2552b8bfdee607be867b12a8971bb9cd85 
 
 It does look like that should be an unsigned (given it's assigned
 0 and 1)

I agree.
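
The reasoning, as I understand it: a signed bitfield that is one bit wide
has only a sign bit, so the values it can represent are 0 and -1, and what
you read back after storing 1 is implementation-defined (with gcc I would
expect -1). Truth tests still work, but a comparison against 1 would not.
The unsigned form has no such trap. A minimal sketch, with a made-up struct
name:

```c
/* Cut-down illustration; not the real struct backend_info. */
struct watch_flags {
	unsigned int have_watch:1;	/* can hold exactly 0 or 1 */
};

/* Returns 1 if a stored 1 reads back as 1, which holds for unsigned. */
static int watch_flag_roundtrips(void)
{
	struct watch_flags f = { 0 };

	f.have_watch = 1;
	return f.have_watch == 1;
}
```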

8--

From 38fdb7199a0c3c5eb18ec27d2380e21116c97e29 Mon Sep 17 00:00:00 2001
From: Ian Campbell ian.campb...@citrix.com
Date: Mon, 4 Apr 2011 09:18:35 +0100
Subject: [PATCH] xen: netback: use unsigned type for one-bit bitfield.

Fixes error from sparse:
  CHECK   drivers/net/xen-netback/xenbus.c
drivers/net/xen-netback/xenbus.c:29:40: error: dubious one-bit signed bitfield

int have_hotplug_status_watch:1;

Reported-by: Dr. David Alan Gilbert li...@treblig.org
Signed-off-by: Ian Campbell ian.campb...@citrix.com
Cc: net...@vger.kernel.org
Cc: xen-de...@lists.xensource.com
---
 drivers/net/xen-netback/xenbus.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 22b8c35..1ce729d 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -26,7 +26,7 @@ struct backend_info {
struct xenvif *vif;
enum xenbus_state frontend_state;
struct xenbus_watch hotplug_status_watch;
-   int have_hotplug_status_watch:1;
+   u8 have_hotplug_status_watch:1;
 };
 
 static int connect_rings(struct backend_info *);
-- 
1.7.2.5




[PATCH] xen: drop anti-dependency on X86_VISWS (Was: Re: [PATCH] xen: netfront: fix declaration order)

2011-04-04 Thread Ian Campbell
On Mon, 2011-04-04 at 01:24 +0100, David Miller wrote:
 From: Eric Dumazet eric.duma...@gmail.com
 Date: Sun, 03 Apr 2011 13:07:19 +0200
 
  [PATCH] xen: netfront: fix declaration order
  
  Must declare xennet_fix_features() and xennet_set_features() before
  using them.
  
  Signed-off-by: Eric Dumazet eric.duma...@gmail.com
  Cc: Michał Mirosław mirq-li...@rere.qmqm.pl
 
 Ugh, it makes no sense that XEN won't make it into the x86_32
 allmodconfig build.  Those dependencies in arch/x86/xen/Kconfig
 are terrible.

You mean the !X86_VISWS I presume? It doesn't make sense to me either.
Or at least I'm not sure why this single X86_32_NON_STANDARD machine is
more special than the others to require an anti-dependency like this.

It seems to have originally appeared from f0f32fccbffa on
CONFIG_PARAVIRT due to a conflict around ARCH_SETUP() and subsequently
got pushed down to CONFIG_XEN. However ARCH_SETUP doesn't exist any more
and I think the subarch stuff has been much improved since then so there
should be no conflict any more.

I dropped the dependency and, with a bit of fiddling, was able to build
a kernel with both CONFIG_X86_VISWS and CONFIG_XEN which booted as a Xen
domU.

tglx, Andrey, to get VISWS to build I had to comment out some code in
arch/x86/platform/visws/visws_quirks.c which seems to have been missed
during some irq_chip update or something?

  CC  arch/x86/platform/visws/visws_quirks.o
arch/x86/platform/visws/visws_quirks.c: In function 
'startup_piix4_master_irq':
arch/x86/platform/visws/visws_quirks.c:474: warning: no return 
statement in function returning non-void
arch/x86/platform/visws/visws_quirks.c: At top level:
arch/x86/platform/visws/visws_quirks.c:495: error: unknown field 'mask' 
specified in initializer
arch/x86/platform/visws/visws_quirks.c:495: warning: initialization 
from incompatible pointer type
arch/x86/platform/visws/visws_quirks.c: In function 
'set_piix4_virtual_irq_type':
arch/x86/platform/visws/visws_quirks.c:583: error: 'struct irq_chip' 
has no member named 'enable'
arch/x86/platform/visws/visws_quirks.c:583: error: 'struct irq_chip' 
has no member named 'unmask'
arch/x86/platform/visws/visws_quirks.c:584: error: 'struct irq_chip' 
has no member named 'disable'
arch/x86/platform/visws/visws_quirks.c:584: error: 'struct irq_chip' 
has no member named 'mask'
arch/x86/platform/visws/visws_quirks.c:585: error: 'struct irq_chip' 
has no member named 'unmask'
arch/x86/platform/visws/visws_quirks.c:585: error: 'struct irq_chip' 
has no member named 'unmask'
arch/x86/platform/visws/visws_quirks.c: In function 
'visws_pre_intr_init':
arch/x86/platform/visws/visws_quirks.c:602: error: expected expression 
before '' token
make[4]: *** [arch/x86/platform/visws/visws_quirks.o] Error 1

Ian

8

From db0ae26f479306ee8ebcfe2a08aa56a6dfe63987 Mon Sep 17 00:00:00 2001
From: Ian Campbell ian.campb...@citrix.com
Date: Mon, 4 Apr 2011 10:27:47 +0100
Subject: [PATCH] xen: drop anti-dependency on X86_VISWS

This seems to have been added in f0f32fccbffa to avoid a conflict arising from
the long deceased ARCH_SETUP() macro and subsequently pushed down to the XEN
option.

As far as I can tell the conflict is no longer present and by dropping the
dependency I was able to build a kernel which has both CONFIG_XEN and
CONFIG_X86_VISWS enabled and boot it on Xen. I didn't try it on the VISWS
platform.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Cc: Jeremy Fitzhardinge jer...@goop.org
Cc: konrad.w...@oracle.com
Cc: xen-de...@lists.xensource.com
Cc: Randy Dunlap randy.dun...@oracle.com
Cc: Andrey Panin pa...@donpac.ru
Cc: linux-visws-de...@lists.sf.net
Cc: Thomas Gleixner t...@linutronix.de
Cc: Ingo Molnar mi...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Cc: x...@kernel.org
---
 arch/x86/xen/Kconfig |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 1c7121b..65d7b13 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -6,7 +6,7 @@ config XEN
bool "Xen guest support"
select PARAVIRT
select PARAVIRT_CLOCK
-   depends on X86_64 || (X86_32 && X86_PAE && !X86_VISWS)
+   depends on X86_64 || (X86_32 && X86_PAE)
depends on X86_CMPXCHG && X86_TSC
help
  This is the Linux Xen port.  Enabling this will allow the
-- 
1.7.2.5





Re: [PATCH RESEND] net: convert xen-netfront to hw_features

2011-04-04 Thread Ian Campbell
On Sat, 2011-04-02 at 04:54 +0100, David Miller wrote:
 From: Michał Mirosław mirq-li...@rere.qmqm.pl
 Date: Thu, 31 Mar 2011 13:01:35 +0200 (CEST)
 
  Not tested in any way. The original code for offload setting seems broken
  as it resets the features on every netback reconnect.
  
  This will set GSO_ROBUST at device creation time (earlier than connect 
  time).
  
  RX checksum offload is forced on - so advertise as it is.
  
  Signed-off-by: Michał Mirosław mirq-li...@rere.qmqm.pl
 
 Applied.

Thanks, but unfortunately the patch results in the features all being
disabled by default, since they are not set in the initial dev-features
and the initial dev-wanted_features is based on features  hw_features.
The ndo_fix_features hook only clears features and doesn't add new
features (nor should it AFAICT).

Features cannot be negotiated with the backend until xennet_connect().
The carrier is not enabled until the end of that function, therefore I
think it is safe to start with a full set of features in dev->features
and rely on the call to netdev_update_features() in xennet_connect() to
clear those which turn out to be unavailable.

The following works for me, I guess the alternative is for
xennet_connect() to expand dev->features based on what it detects? Or is
there a mechanism for a driver to inform the core that a new hardware
feature has become available (I doubt that really happens on physical
h/w so I guess not).
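
To make the failure mode above concrete, here is a minimal userspace sketch of the feature bookkeeping. It is not the kernel code: the struct, the toy_* helpers and the bit values are hypothetical stand-ins, and the update step only approximates what netdev_update_features() does (take the wanted set, then let the fix hook clear bits).

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bits only; the real NETIF_F_* values differ. */
#define F_IP_CSUM (1u << 0)
#define F_SG      (1u << 1)
#define F_TSO     (1u << 2)

/* Hypothetical stand-in for the three feature words of a net_device. */
struct toy_netdev {
	uint32_t features;        /* currently active features */
	uint32_t hw_features;     /* features the user may toggle */
	uint32_t wanted_features; /* features the user asked for */
};

/* Like ndo_fix_features: may only clear bits the backend lacks. */
static uint32_t toy_fix_features(uint32_t features, uint32_t backend)
{
	if ((features & F_SG) && !(backend & F_SG))
		features &= ~F_SG;
	if ((features & F_TSO) && !(backend & F_TSO))
		features &= ~F_TSO;
	return features;
}

/* At registration the wanted set is seeded from features & hw_features. */
static void toy_register(struct toy_netdev *d)
{
	assert(d);
	d->wanted_features = d->features & d->hw_features;
}

/* Simplified update: start from the wanted set, let the hook clear bits. */
static void toy_update_features(struct toy_netdev *d, uint32_t backend)
{
	uint32_t f = (d->features & ~d->hw_features) | d->wanted_features;

	d->features = toy_fix_features(f, backend);
}
```

With only F_IP_CSUM set in features at registration, SG and TSO never make it into the wanted set, so no later update can turn them on. OR-ing hw_features into features before registration lets the update after connect trim the set down to what the backend offers.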

Ian.

8<------------------------------------------------------------

From 0b56469abe56efae415b4603ef508ce9aec0e4c1 Mon Sep 17 00:00:00 2001
From: Ian Campbell ian.campb...@citrix.com
Date: Mon, 4 Apr 2011 10:58:50 +0100
Subject: [PATCH] xen: netfront: assume all hw features are available until 
backend connection setup

We need to assume that all features will be available when registering the
netdev, otherwise they are omitted from the initial set of
dev->wanted_features. When we connect to the backend we reduce the set as
necessary due to the call to netdev_update_features() in xennet_connect().

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Cc: mirq-li...@rere.qmqm.pl
Cc: net...@vger.kernel.org net...@vger.kernel.org
Cc: Jeremy Fitzhardinge jer...@goop.org
Cc: konrad.w...@oracle.com
Cc: Eric Dumazet eric.duma...@gmail.com
Cc: xen-de...@lists.xensource.com
---
 drivers/net/xen-netfront.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 0cfe4cc..db9a763 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1251,6 +1251,14 @@ static struct net_device * __devinit xennet_create_dev(struct xenbus_device *dev
 				  NETIF_F_GSO_ROBUST;
 	netdev->hw_features	= NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO;
 
+	/*
+         * Assume that all hw features are available for now. This set
+         * will be adjusted by the call to netdev_update_features() in
+         * xennet_connect() which is the earliest point where we can
+         * negotiate with the backend regarding supported features.
+         */
+	netdev->features |= netdev->hw_features;
+
 	SET_ETHTOOL_OPS(netdev, &xennet_ethtool_ops);
 	SET_NETDEV_DEV(netdev, &dev->dev);
 
-- 
1.7.2.5




Re: [PATCH RESEND] net: convert xen-netfront to hw_features

2011-03-31 Thread Ian Campbell
On Thu, 2011-03-31 at 12:01 +0100, Michał Mirosław wrote:
> Not tested in any way. The original code for offload setting seems broken
> as it resets the features on every netback reconnect.

Thanks, I've got a pending TODO item to test this and propagate similar
changes to netback. I hope to get to it soon...

Is this urgent (for 2.6.39) IYHO? I think it's been broken this way for
a long time now...

Ian.

 
> This will set GSO_ROBUST at device creation time (earlier than connect time).
>
> RX checksum offload is forced on - so advertise as it is.
>
> Signed-off-by: Michał Mirosław mirq-li...@rere.qmqm.pl
> ---
> [I don't know Xen code enough to say this is correct. There is Xen netback
> driver coming in, that has similar changes to be made. Please match
> them up if you can.]
>
>  drivers/net/xen-netfront.c |   57 +--
>  1 files changed, 23 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 5c8d9c3..2a71c9f 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -1148,6 +1148,8 @@ static const struct net_device_ops xennet_netdev_ops = {
>  	.ndo_change_mtu	     = xennet_change_mtu,
>  	.ndo_set_mac_address = eth_mac_addr,
>  	.ndo_validate_addr   = eth_validate_addr,
> +	.ndo_fix_features    = xennet_fix_features,
> +	.ndo_set_features    = xennet_set_features,
>  };
>  
>  static struct net_device * __devinit xennet_create_dev(struct xenbus_device *dev)
> @@ -1209,7 +1211,9 @@ static struct net_device * __devinit xennet_create_dev(struct xenbus_device *dev
>  	netdev->netdev_ops	= &xennet_netdev_ops;
>  
>  	netif_napi_add(netdev, &np->napi, xennet_poll, 64);
> -	netdev->features        = NETIF_F_IP_CSUM;
> +	netdev->features        = NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
> +				  NETIF_F_GSO_ROBUST;
> +	netdev->hw_features	= NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO;
>  
>  	SET_ETHTOOL_OPS(netdev, &xennet_ethtool_ops);
>  	SET_NETDEV_DEV(netdev, &dev->dev);
> @@ -1510,52 +1514,40 @@ again:
>  	return err;
>  }
>  
> -static int xennet_set_sg(struct net_device *dev, u32 data)
> +static u32 xennet_fix_features(struct net_device *dev, u32 features)
>  {
> -	if (data) {
> -		struct netfront_info *np = netdev_priv(dev);
> -		int val;
> +	struct netfront_info *np = netdev_priv(dev);
> +	int val;
>  
> +	if (features & NETIF_F_SG) {
>  		if (xenbus_scanf(XBT_NIL, np->xbdev->otherend, "feature-sg",
>  				 "%d", &val) < 0)
>  			val = 0;
> +
>  		if (!val)
> -			return -ENOSYS;
> -	} else if (dev->mtu > ETH_DATA_LEN)
> -		dev->mtu = ETH_DATA_LEN;
> -
> -	return ethtool_op_set_sg(dev, data);
> -}
> -
> -static int xennet_set_tso(struct net_device *dev, u32 data)
> -{
> -	if (data) {
> -		struct netfront_info *np = netdev_priv(dev);
> -		int val;
> +			features &= ~NETIF_F_SG;
> +	}
>  
> +	if (features & NETIF_F_TSO) {
>  		if (xenbus_scanf(XBT_NIL, np->xbdev->otherend,
>  				 "feature-gso-tcpv4", "%d", &val) < 0)
>  			val = 0;
> +
>  		if (!val)
> -			return -ENOSYS;
> +			features &= ~NETIF_F_TSO;
>  	}
>  
> -	return ethtool_op_set_tso(dev, data);
> +	return features;
>  }
>  
> -static void xennet_set_features(struct net_device *dev)
> +static int xennet_set_features(struct net_device *dev, u32 features)
>  {
> -	/* Turn off all GSO bits except ROBUST. */
> -	dev->features &= ~NETIF_F_GSO_MASK;
> -	dev->features |= NETIF_F_GSO_ROBUST;
> -	xennet_set_sg(dev, 0);
> +	if (!(features & NETIF_F_SG) && dev->mtu > ETH_DATA_LEN) {
> +		netdev_info(dev, "Reducing MTU because no SG offload");
> +		dev->mtu = ETH_DATA_LEN;
> +	}
>  
> -	/* We need checksum offload to enable scatter/gather and TSO. */
> -	if (!(dev->features & NETIF_F_IP_CSUM))
> -		return;
> -
> -	if (!xennet_set_sg(dev, 1))
> -		xennet_set_tso(dev, 1);
> +	return 0;
>  }
>  
>  static int xennet_connect(struct net_device *dev)
> @@ -1582,7 +1574,7 @@ static int xennet_connect(struct net_device *dev)
>  	if (err)
>  		return err;
>  
> -	xennet_set_features(dev);
> +	netdev_update_features(dev);
>  
>  	spin_lock_bh(&np->rx_lock);
>  	spin_lock_irq(&np->tx_lock);
> @@ -1710,9 +1702,6 @@ static void xennet_get_strings(struct net_device *dev, u32 stringset, u8 * data)
>  
>  static const struct ethtool_ops xennet_ethtool_ops =
>  {
> -	.set_tx_csum = ethtool_op_set_tx_csum,
> -	.set_sg = xennet_set_sg,
> -	.set_tso = xennet_set_tso,
>  	.get_link = ethtool_op_get_link,
>  
>  	.get_sset_count = xennet_get_sset_count,



Re: [Xen-devel] Re: [PATCH] x86/pvclock-xen: zero last_value on resume

2010-11-03 Thread Ian Campbell
On Wed, 2010-10-27 at 13:59 -0700, H. Peter Anvin wrote:
 I'll check it this evening when I'm at a working network again :(

Did this get applied? It seems to affect 2.6.32.x too
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=602273) so can we tag
it for stable as well?

Thanks,
Ian.

 
 Jeremy Fitzhardinge jer...@goop.org wrote:
 
  On 10/26/2010 10:48 AM, Glauber Costa wrote:
  On Tue, 2010-10-26 at 09:59 -0700, Jeremy Fitzhardinge wrote:
  If the guest domain has been suspend/resumed or migrated, then the
  system clock backing the pvclock clocksource may revert to a smaller
  value (ie, can be non-monotonic across the migration/save-restore).
  Make sure we zero last_value in that case so that the domain
  continues to see clock updates.
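
As a rough illustration of why a stale last_value stalls the clock, here is a single-threaded userspace sketch. The toy_* names are hypothetical, and the real pvclock code keeps last_value in an atomic variable updated with a compare-and-exchange loop, which this sketch deliberately leaves out.

```c
#include <assert.h>
#include <stdint.h>

/* Highest value handed out so far; plays the role of last_value. */
static uint64_t last_value;

/* Return a reading that never goes below an earlier reading. */
static uint64_t toy_clock_read(uint64_t raw)
{
	uint64_t ret = raw < last_value ? last_value : raw;

	assert(ret >= last_value); /* monotonic by construction */
	last_value = ret;
	return ret;
}

/* On resume the backing clock may have restarted from a smaller value,
 * so forget the pre-suspend high-water mark. */
static void toy_clock_resume(void)
{
	last_value = 0;
}
```

Without the reset, every reading taken after the backing clock drops returns the old high-water mark until the new clock catches up with it, so the guest sees time stand still.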
 
  [ I don't know if kvm needs an analogous fix or not. ]
  After migration, save/restore, etc, we issue an ioctl where we tell
  the host the last clock value. That (in theory) guarantees
 monotonicity.
 
  I am not opposed to this patch in any way, however.
 
 Thanks.
 
 HPA, do you want to take this, or shall I send it on?
 
 Thanks,
 J
 

-- 
Ian Campbell

BOFH excuse #191:

Just type 'mv * /dev/null'.



Re: [PATCHv3 1/3] x86: use ELF format in compressed images.

2008-02-14 Thread Ian Campbell

On Thu, 2008-02-14 at 11:34 +0000, Mark McLoughlin wrote:
> On Wed, 2008-02-13 at 20:54 +0000, Ian Campbell wrote:
> > This allows other boot loaders such as the Xen domain builder the
> > opportunity to extract the ELF file.
>
>   Right, Xen currently can't boot bzImage (it needs the ELF image) so you
> still can't use the same kernel image on Xen as bare-metal.

I have a xen domain builder patch as well. I was waiting for the Linux
side to gain some traction before putting it forward (I'd attach it now
but it's at home on a laptop which is sleeping).

> > +Field name:	compressed_payload_offset
> > +Type:		read
> > +Offset/size:	0x248/4
> > +Protocol:	2.08+
> > +
> > +  If non-zero then this field contains the offset from the end of the
> > +  real-mode code to the compressed payload. The compression format
> > +  should be determined using the standard magic number, currently only
> > +  gzip is used.
>
>   Should probably mention that the payload format is expected to be ELF.

Agreed. Probably the same deal as the compression format, i.e. use the
magic number but only ELF is possible today (even less likely to change
than the compression format I guess...).
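
A minimal sketch of what "use the magic number" could look like on the loader side, assuming only the two well-known signatures (gzip starts with 0x1f 0x8b, ELF with 0x7f 'E' 'L' 'F'). The enum and function names are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum payload_kind { PAYLOAD_UNKNOWN, PAYLOAD_GZIP, PAYLOAD_ELF };

/* Classify a buffer by its leading magic bytes. */
static enum payload_kind classify_payload(const unsigned char *buf, size_t len)
{
	static const unsigned char gzip_magic[] = { 0x1f, 0x8b };
	static const unsigned char elf_magic[] = { 0x7f, 'E', 'L', 'F' };

	assert(buf != NULL || len == 0);

	if (len >= sizeof(gzip_magic) &&
	    memcmp(buf, gzip_magic, sizeof(gzip_magic)) == 0)
		return PAYLOAD_GZIP;
	if (len >= sizeof(elf_magic) &&
	    memcmp(buf, elf_magic, sizeof(elf_magic)) == 0)
		return PAYLOAD_ELF;
	return PAYLOAD_UNKNOWN;
}
```

A loader would run the check once on the raw payload and, if that turns out to be gzip, again on the first bytes of the decompressed data.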

>   How about this?
>
> +sed-offsets := -e 's/^00*/0/' \
> +        -e 's/^\([0-9a-fA-F]*\) . \(input_data\|input_data_end\)$$/-D\2=0x\1 /p'
> +
> +$(obj)/header.o: AFLAGS_header.o += $(shell $(NM) $(obj)/compressed/vmlinux | sed -n $(sed-offsets))
> +$(obj)/header.o: $(obj)/compressed/vmlinux FORCE

That's probably a neater way of doing it. Although the .../header.o:
AFLAGS_header.o is redundant, either 
header.o: AFLAGS += foo
or
AFLAGS_header.o += foo
with the second being preferred in Linux Makefiles I think.

I'll try and get an updated patch out before I head for my flight
tomorrow.

Ian.

-- 
Ian Campbell
Current Noise: Reverend Bizarre - The Festival

While money can't buy happiness, it certainly lets you choose your own
form of misery.



Re: [PATCHv3 1/3] x86: use ELF format in compressed images.

2008-02-14 Thread Ian Campbell

On Thu, 2008-02-14 at 17:01 +0000, Ian Campbell wrote:
>
> > > +Field name:	compressed_payload_offset
> > > +Type:		read
> > > +Offset/size:	0x248/4
> > > +Protocol:	2.08+
> > > +
> > > +  If non-zero then this field contains the offset from the end of the
> > > +  real-mode code to the compressed payload. The compression format
> > > +  should be determined using the standard magic number, currently only
> > > +  gzip is used.
> >
> >   Should probably mention that the payload format is expected to be ELF.
>
> Agreed. Probably the same deal as the compression format, i.e. use the
> magic number but only ELF is possible today (even less likely to
> change than the compression format I guess...).

Updated with a note about ELF format payload.

I've also changed the fields to just payload_{offset,length} and
adjusted the description to allow for the possibility of non-compressed
ELF payloads. I don't have a use for it myself but I can see how it
might be useful (embedded systems?) so it seems reasonable not to rule
it out. ELF-in-gzip and plain ELF can both be identified by magic
numbers.
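
As a sketch of how a loader might consume the two fields, the helper below reads them from an in-memory bzImage. The field offsets (0x248 and 0x24c) and the "0 means 4" rule for setup_sects come from the patch's boot.txt text; the setup_sects location (0x1f1) and the (setup_sects + 1) * 512 size of the real-mode part come from the existing boot protocol. The function name is hypothetical, and checks of the "HdrS" signature and of the protocol version (2.08 or later) are omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value. */
static uint32_t le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Locate the payload inside a bzImage held in memory.
 * Returns 0 and fills *start / *len, or -1 if the fields are unset
 * or point outside the image. */
static int find_payload(const unsigned char *img, size_t img_len,
			size_t *start, size_t *len)
{
	size_t setup_sects, rm_size;
	uint32_t off, plen;

	assert(img && start && len);

	if (img_len < 0x250)
		return -1;

	setup_sects = img[0x1f1];
	if (setup_sects == 0)
		setup_sects = 4; /* backwards compatibility rule */
	rm_size = (setup_sects + 1) * 512; /* boot sector + setup code */

	off = le32(img + 0x248);  /* payload_offset */
	plen = le32(img + 0x24c); /* payload_length */
	if (off == 0 || rm_size > img_len)
		return -1;
	if (off > img_len - rm_size || plen > img_len - rm_size - off)
		return -1;

	*start = rm_size + off;
	*len = plen;
	return 0;
}
```

The bytes at img + start can then be identified by magic number, decompressed if they are gzip, and identified by magic number again to confirm that the result is ELF.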

Ian.
--- 
From 544c003d4067d895556180fc11a951e211202d0d Mon Sep 17 00:00:00 2001
From: Ian Campbell [EMAIL PROTECTED]
Date: Thu, 14 Feb 2008 18:29:01 +0000
Subject: [PATCH] x86: use ELF format in compressed images.

This allows other boot loaders such as the Xen domain builder the
opportunity to extract the ELF file.

Signed-off-by: Ian Campbell [EMAIL PROTECTED]
Cc: Thomas Gleixner [EMAIL PROTECTED]
Cc: Ingo Molnar [EMAIL PROTECTED]
Cc: H. Peter Anvin [EMAIL PROTECTED]
Cc: Jeremy Fitzhardinge [EMAIL PROTECTED]
Cc: virtualization@lists.linux-foundation.org
---
 Documentation/i386/boot.txt   |   20 +
 arch/x86/boot/Makefile|   14 +
 arch/x86/boot/compressed/Makefile |2 +-
 arch/x86/boot/compressed/misc.c   |   56 +
 arch/x86/boot/header.S|4 ++
 5 files changed, 95 insertions(+), 1 deletions(-)

diff --git a/Documentation/i386/boot.txt b/Documentation/i386/boot.txt
index fc49b79..f2e54e5 100644
--- a/Documentation/i386/boot.txt
+++ b/Documentation/i386/boot.txt
@@ -170,6 +170,8 @@ Offset  Proto   NameMeaning
 0238/4 2.06+   cmdline_sizeMaximum size of the kernel command line
 023C/4 2.07+   hardware_subarch Hardware subarchitecture
 0240/8 2.07+   hardware_subarch_data Subarchitecture-specific data
+0248/4 2.08+   payload_offset  Offset of kernel payload
+024C/4 2.08+   payload_length  Length of kernel payload
 
 (1) For backwards compatibility, if the setup_sects field contains 0, the
 real value is 4.
@@ -512,6 +514,24 @@ Protocol:  2.07+
 
   A pointer to data that is specific to hardware subarch
 
+Field name:payload_offset
+Type:  read
+Offset/size:   0x248/4
+Protocol:  2.08+
+
+  If non-zero then this field contains the offset from the end of the
+  real-mode code to the payload.
+
+  The payload may be compressed. The format of both the compressed and
+  uncompressed data should be determined using the standard magic
+  numbers. Currently only gzip compressed ELF is used.
+  
+Field name:payload_length
+Type:  read
+Offset/size:   0x24c/4
+Protocol:  2.08+
+
+  The length of the payload.
 
  THE KERNEL COMMAND LINE
 
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index f88458e..9695aff 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -94,6 +94,20 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE
 
 SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))
 
+sed-offsets := -e 's/^00*/0/' \
+        -e 's/^\([0-9a-fA-F]*\) . \(input_data\|input_data_end\)$$/\#define \2 0x\1/p'
+
+quiet_cmd_offsets = OFFSETS $@
+      cmd_offsets = $(NM) $< | sed -n $(sed-offsets) > $@
+
+$(obj)/offsets.h: $(obj)/compressed/vmlinux FORCE
+   $(call if_changed,offsets)
+
+targets += offsets.h
+
+AFLAGS_header.o += -I$(obj)
+$(obj)/header.o: $(obj)/offsets.h
+
 LDFLAGS_setup.elf  := -T
 $(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
$(call if_changed,ld)
diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index d2b9f3b..92fdd35 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -22,7 +22,7 @@ $(obj)/vmlinux: $(src)/vmlinux_$(BITS).lds 
$(obj)/head_$(BITS).o $(obj)/misc.o $
$(call if_changed,ld)
@:
 
-OBJCOPYFLAGS_vmlinux.bin := -O binary -R .note -R .comment -S
+OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
 $(obj)/vmlinux.bin: vmlinux FORCE
$(call if_changed,objcopy)
 
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 8182e32..69aec2f 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -15,6 +15,10 @@
  * we just keep it from happening
  */
 #undef CONFIG_PARAVIRT
+#ifdef CONFIG_X86_32
+#define _ASM_DESC_H_ 1
+#endif
+
 #ifdef CONFIG_X86_64
 #define _LINUX_STRING_H_ 1
 #define

[PATCHv3 1/3] x86: use ELF format in compressed images.

2008-02-13 Thread Ian Campbell
This allows other boot loaders such as the Xen domain builder the
opportunity to extract the ELF file.

Signed-off-by: Ian Campbell [EMAIL PROTECTED]
Cc: Thomas Gleixner [EMAIL PROTECTED]
Cc: Ingo Molnar [EMAIL PROTECTED]
Cc: H. Peter Anvin [EMAIL PROTECTED]
Cc: Jeremy Fitzhardinge [EMAIL PROTECTED]
Cc: virtualization@lists.linux-foundation.org
---
 Documentation/i386/boot.txt   |   18 
 arch/x86/boot/Makefile|   14 +
 arch/x86/boot/compressed/Makefile |2 +-
 arch/x86/boot/compressed/misc.c   |   56 +
 arch/x86/boot/header.S|6 
 5 files changed, 95 insertions(+), 1 deletions(-)

diff --git a/Documentation/i386/boot.txt b/Documentation/i386/boot.txt
index fc49b79..b5f5ba1 100644
--- a/Documentation/i386/boot.txt
+++ b/Documentation/i386/boot.txt
@@ -170,6 +170,8 @@ Offset  Proto   NameMeaning
 0238/4 2.06+   cmdline_sizeMaximum size of the kernel command line
 023C/4 2.07+   hardware_subarch Hardware subarchitecture
 0240/8 2.07+   hardware_subarch_data Subarchitecture-specific data
+0248/4 2.08+   compressed_payload_offset
+024C/4 2.08+   compressed_payload_length
 
 (1) For backwards compatibility, if the setup_sects field contains 0, the
 real value is 4.
@@ -512,6 +514,22 @@ Protocol:  2.07+
 
   A pointer to data that is specific to hardware subarch
 
+Field name:compressed_payload_offset
+Type:  read
+Offset/size:   0x248/4
+Protocol:  2.08+
+
+  If non-zero then this field contains the offset from the end of the
+  real-mode code to the compressed payload. The compression format
+  should be determined using the standard magic number, currently only
+  gzip is used.
+  
+Field name:compressed_payload_length
+Type:  read
+Offset/size:   0x24c/4
+Protocol:  2.08+
+
+  The length of the compressed payload.
 
  THE KERNEL COMMAND LINE
 
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index f88458e..9695aff 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -94,6 +94,20 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE
 
 SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))
 
+sed-offsets := -e 's/^00*/0/' \
+        -e 's/^\([0-9a-fA-F]*\) . \(input_data\|input_data_end\)$$/\#define \2 0x\1/p'
+
+quiet_cmd_offsets = OFFSETS $@
+      cmd_offsets = $(NM) $< | sed -n $(sed-offsets) > $@
+
+$(obj)/offsets.h: $(obj)/compressed/vmlinux FORCE
+   $(call if_changed,offsets)
+
+targets += offsets.h
+
+AFLAGS_header.o += -I$(obj)
+$(obj)/header.o: $(obj)/offsets.h
+
 LDFLAGS_setup.elf  := -T
 $(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
$(call if_changed,ld)
diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index d2b9f3b..92fdd35 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -22,7 +22,7 @@ $(obj)/vmlinux: $(src)/vmlinux_$(BITS).lds 
$(obj)/head_$(BITS).o $(obj)/misc.o $
$(call if_changed,ld)
@:
 
-OBJCOPYFLAGS_vmlinux.bin := -O binary -R .note -R .comment -S
+OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
 $(obj)/vmlinux.bin: vmlinux FORCE
$(call if_changed,objcopy)
 
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 8182e32..69aec2f 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -15,6 +15,10 @@
  * we just keep it from happening
  */
 #undef CONFIG_PARAVIRT
+#ifdef CONFIG_X86_32
+#define _ASM_DESC_H_ 1
+#endif
+
 #ifdef CONFIG_X86_64
 #define _LINUX_STRING_H_ 1
 #define __LINUX_BITMAP_H 1
@@ -22,6 +26,7 @@
 
 #include <linux/linkage.h>
 #include <linux/screen_info.h>
+#include <linux/elf.h>
 #include <asm/io.h>
 #include <asm/page.h>
 #include <asm/boot.h>
@@ -365,6 +370,56 @@ static void error(char *x)
 		asm("hlt");
 }
 
+static void parse_elf(void *output)
+{
+#ifdef CONFIG_X86_64
+	Elf64_Ehdr ehdr;
+	Elf64_Phdr *phdrs, *phdr;
+#else
+	Elf32_Ehdr ehdr;
+	Elf32_Phdr *phdrs, *phdr;
+#endif
+	void *dest;
+	int i;
+
+	memcpy(&ehdr, output, sizeof(ehdr));
+	if(ehdr.e_ident[EI_MAG0] != ELFMAG0 ||
+	   ehdr.e_ident[EI_MAG1] != ELFMAG1 ||
+	   ehdr.e_ident[EI_MAG2] != ELFMAG2 ||
+	   ehdr.e_ident[EI_MAG3] != ELFMAG3)
+	{
+		error("Kernel is not a valid ELF file");
+		return;
+	}
+
+	putstr("Parsing ELF... ");
+
+	phdrs = malloc(sizeof(*phdrs) * ehdr.e_phnum);
+	if (!phdrs)
+		error("Failed to allocate space for phdrs");
+
+	memcpy(phdrs, output + ehdr.e_phoff, sizeof(*phdrs) * ehdr.e_phnum);
+
+	for (i=0; i<ehdr.e_phnum; i++) {
+		phdr = &phdrs[i];
+
+		switch (phdr->p_type) {
+		case PT_LOAD:
+#ifdef CONFIG_RELOCATABLE
+			dest = output;
+			dest += (phdr->p_paddr - LOAD_PHYSICAL_ADDR);
+#else
+			dest = (void

[PATCHv2 1/3] x86: use ELF format in compressed images.

2008-02-06 Thread Ian Campbell
This allows other boot loaders such as the Xen domain builder the
opportunity to extract the ELF file.

Signed-off-by: Ian Campbell [EMAIL PROTECTED]
Cc: Thomas Gleixner [EMAIL PROTECTED]
Cc: Ingo Molnar [EMAIL PROTECTED]
Cc: H. Peter Anvin [EMAIL PROTECTED]
Cc: Jeremy Fitzhardinge [EMAIL PROTECTED]
Cc: virtualization@lists.linux-foundation.org
---
 Documentation/i386/boot.txt   |   18 
 arch/x86/boot/Makefile|   14 +
 arch/x86/boot/compressed/Makefile |2 +-
 arch/x86/boot/compressed/misc.c   |   56 +
 arch/x86/boot/header.S|6 
 5 files changed, 95 insertions(+), 1 deletions(-)

diff --git a/Documentation/i386/boot.txt b/Documentation/i386/boot.txt
index fc49b79..b5f5ba1 100644
--- a/Documentation/i386/boot.txt
+++ b/Documentation/i386/boot.txt
@@ -170,6 +170,8 @@ Offset  Proto   NameMeaning
 0238/4 2.06+   cmdline_sizeMaximum size of the kernel command line
 023C/4 2.07+   hardware_subarch Hardware subarchitecture
 0240/8 2.07+   hardware_subarch_data Subarchitecture-specific data
+0248/4 2.08+   compressed_payload_offset
+024C/4 2.08+   compressed_payload_length
 
 (1) For backwards compatibility, if the setup_sects field contains 0, the
 real value is 4.
@@ -512,6 +514,22 @@ Protocol:  2.07+
 
   A pointer to data that is specific to hardware subarch
 
+Field name:compressed_payload_offset
+Type:  read
+Offset/size:   0x248/4
+Protocol:  2.08+
+
+  If non-zero then this field contains the offset from the end of the
+  real-mode code to the compressed payload. The compression format
+  should be determined using the standard magic number, currently only
+  gzip is used.
+  
+Field name:compressed_payload_length
+Type:  read
+Offset/size:   0x24c/4
+Protocol:  2.08+
+
+  The length of the compressed payload.
 
  THE KERNEL COMMAND LINE
 
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index f88458e..9695aff 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -94,6 +94,20 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE
 
 SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))
 
+sed-offsets := -e 's/^00*/0/' \
+        -e 's/^\([0-9a-fA-F]*\) . \(input_data\|input_data_end\)$$/\#define \2 0x\1/p'
+
+quiet_cmd_offsets = OFFSETS $@
+      cmd_offsets = $(NM) $< | sed -n $(sed-offsets) > $@
+
+$(obj)/offsets.h: $(obj)/compressed/vmlinux FORCE
+   $(call if_changed,offsets)
+
+targets += offsets.h
+
+AFLAGS_header.o += -I$(obj)
+$(obj)/header.o: $(obj)/offsets.h
+
 LDFLAGS_setup.elf  := -T
 $(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
$(call if_changed,ld)
diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index d2b9f3b..92fdd35 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -22,7 +22,7 @@ $(obj)/vmlinux: $(src)/vmlinux_$(BITS).lds 
$(obj)/head_$(BITS).o $(obj)/misc.o $
$(call if_changed,ld)
@:
 
-OBJCOPYFLAGS_vmlinux.bin := -O binary -R .note -R .comment -S
+OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
 $(obj)/vmlinux.bin: vmlinux FORCE
$(call if_changed,objcopy)
 
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 8182e32..69aec2f 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -15,6 +15,10 @@
  * we just keep it from happening
  */
 #undef CONFIG_PARAVIRT
+#ifdef CONFIG_X86_32
+#define _ASM_DESC_H_ 1
+#endif
+
 #ifdef CONFIG_X86_64
 #define _LINUX_STRING_H_ 1
 #define __LINUX_BITMAP_H 1
@@ -22,6 +26,7 @@
 
 #include <linux/linkage.h>
 #include <linux/screen_info.h>
+#include <linux/elf.h>
 #include <asm/io.h>
 #include <asm/page.h>
 #include <asm/boot.h>
@@ -365,6 +370,56 @@ static void error(char *x)
 		asm("hlt");
 }
 
+static void parse_elf(void *output)
+{
+#ifdef CONFIG_X86_64
+	Elf64_Ehdr ehdr;
+	Elf64_Phdr *phdrs, *phdr;
+#else
+	Elf32_Ehdr ehdr;
+	Elf32_Phdr *phdrs, *phdr;
+#endif
+	void *dest;
+	int i;
+
+	memcpy(&ehdr, output, sizeof(ehdr));
+	if(ehdr.e_ident[EI_MAG0] != ELFMAG0 ||
+	   ehdr.e_ident[EI_MAG1] != ELFMAG1 ||
+	   ehdr.e_ident[EI_MAG2] != ELFMAG2 ||
+	   ehdr.e_ident[EI_MAG3] != ELFMAG3)
+	{
+		error("Kernel is not a valid ELF file");
+		return;
+	}
+
+	putstr("Parsing ELF... ");
+
+	phdrs = malloc(sizeof(*phdrs) * ehdr.e_phnum);
+	if (!phdrs)
+		error("Failed to allocate space for phdrs");
+
+	memcpy(phdrs, output + ehdr.e_phoff, sizeof(*phdrs) * ehdr.e_phnum);
+
+	for (i=0; i<ehdr.e_phnum; i++) {
+		phdr = &phdrs[i];
+
+		switch (phdr->p_type) {
+		case PT_LOAD:
+#ifdef CONFIG_RELOCATABLE
+			dest = output;
+			dest += (phdr->p_paddr - LOAD_PHYSICAL_ADDR);
+#else
+			dest = (void

[PATCH] x86: use ELF format in compressed images.

2008-01-31 Thread Ian Campbell
This allows other boot loaders such as the Xen domain builder the
opportunity to extract the ELF file.

Signed-off-by: Ian Campbell [EMAIL PROTECTED]
Cc: Thomas Gleixner [EMAIL PROTECTED]
Cc: Ingo Molnar [EMAIL PROTECTED]
Cc: H. Peter Anvin [EMAIL PROTECTED]
Cc: Jeremy Fitzhardinge [EMAIL PROTECTED]
Cc: virtualization@lists.linux-foundation.org
---
 Documentation/i386/boot.txt   |   18 +
 arch/x86/boot/Makefile|   14 ++
 arch/x86/boot/compressed/Makefile |2 +-
 arch/x86/boot/compressed/misc.c   |   49 +
 arch/x86/boot/header.S|6 
 5 files changed, 88 insertions(+), 1 deletions(-)

diff --git a/Documentation/i386/boot.txt b/Documentation/i386/boot.txt
index fc49b79..b5f5ba1 100644
--- a/Documentation/i386/boot.txt
+++ b/Documentation/i386/boot.txt
@@ -170,6 +170,8 @@ Offset  Proto   NameMeaning
 0238/4 2.06+   cmdline_sizeMaximum size of the kernel command line
 023C/4 2.07+   hardware_subarch Hardware subarchitecture
 0240/8 2.07+   hardware_subarch_data Subarchitecture-specific data
+0248/4 2.08+   compressed_payload_offset
+024C/4 2.08+   compressed_payload_length
 
 (1) For backwards compatibility, if the setup_sects field contains 0, the
 real value is 4.
@@ -512,6 +514,22 @@ Protocol:  2.07+
 
   A pointer to data that is specific to hardware subarch
 
+Field name:compressed_payload_offset
+Type:  read
+Offset/size:   0x248/4
+Protocol:  2.08+
+
+  If non-zero then this field contains the offset from the end of the
+  real-mode code to the compressed payload. The compression format
+  should be determined using the standard magic number, currently only
+  gzip is used.
+  
+Field name:compressed_payload_length
+Type:  read
+Offset/size:   0x24c/4
+Protocol:  2.08+
+
+  The length of the compressed payload.
 
  THE KERNEL COMMAND LINE
 
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 254a583..0c629dc 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -86,6 +86,20 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE
 
 SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))
 
+sed-offsets := -e 's/^00*/0/' \
+        -e 's/^\([0-9a-fA-F]*\) . \(input_data\|input_data_end\)$$/\#define \2 0x\1/p'
+
+quiet_cmd_offsets = OFFSETS $@
+      cmd_offsets = $(NM) $< | sed -n $(sed-offsets) > $@
+
+$(obj)/offsets.h: $(obj)/compressed/vmlinux FORCE
+   $(call if_changed,offsets)
+
+targets += offsets.h
+
+AFLAGS_header.o += -I$(obj)
+$(obj)/header.o: $(obj)/offsets.h
+
 LDFLAGS_setup.elf  := -T
 $(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
$(call if_changed,ld)
diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index d2b9f3b..92fdd35 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -22,7 +22,7 @@ $(obj)/vmlinux: $(src)/vmlinux_$(BITS).lds 
$(obj)/head_$(BITS).o $(obj)/misc.o $
$(call if_changed,ld)
@:
 
-OBJCOPYFLAGS_vmlinux.bin := -O binary -R .note -R .comment -S
+OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
 $(obj)/vmlinux.bin: vmlinux FORCE
$(call if_changed,objcopy)
 
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 8182e32..8a5daf5 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -15,6 +15,10 @@
  * we just keep it from happening
  */
 #undef CONFIG_PARAVIRT
+#ifdef CONFIG_X86_32
+#define _ASM_DESC_H_ 1
+#endif
+
 #ifdef CONFIG_X86_64
 #define _LINUX_STRING_H_ 1
 #define __LINUX_BITMAP_H 1
@@ -22,6 +26,7 @@
 
 #include <linux/linkage.h>
 #include <linux/screen_info.h>
+#include <linux/elf.h>
 #include <asm/io.h>
 #include <asm/page.h>
 #include <asm/boot.h>
@@ -365,6 +370,49 @@ static void error(char *x)
 		asm("hlt");
 }
 
+static void parse_elf(void *output)
+{
+#ifdef CONFIG_X86_64
+	Elf64_Ehdr ehdr;
+	Elf64_Phdr *phdrs, *phdr;
+#else
+	Elf32_Ehdr ehdr;
+	Elf32_Phdr *phdrs, *phdr;
+#endif
+	int i;
+
+	memcpy(&ehdr, output, sizeof(ehdr));
+	if(ehdr.e_ident[EI_MAG0] != ELFMAG0 ||
+	   ehdr.e_ident[EI_MAG1] != ELFMAG1 ||
+	   ehdr.e_ident[EI_MAG2] != ELFMAG2 ||
+	   ehdr.e_ident[EI_MAG3] != ELFMAG3)
+	{
+		error("Kernel is not a valid ELF file");
+		return;
+	}
+
+	putstr("Parsing ELF... ");
+
+	phdrs = malloc(sizeof(*phdrs) * ehdr.e_phnum);
+	if (!phdrs)
+		error("Failed to allocate space for phdrs");
+
+	memcpy(phdrs, output + ehdr.e_phoff, sizeof(*phdrs) * ehdr.e_phnum);
+
+	for (i=0; i<ehdr.e_phnum; i++) {
+		phdr = &phdrs[i];
+
+		switch (phdr->p_type) {
+		case PT_LOAD:
+			memcpy((void*)phdr->p_paddr,
+			       output + phdr->p_offset,
+			       phdr->p_filesz);
+			break

[PATCH] Implement getgeo for Xen virtual block device.

2007-12-16 Thread Ian Campbell
Hi Jeremy,

The below implements the getgeo hook for Xen block devices. Extracted
from the xen-unstable tree where it has been used for ages.

It is useful to have because it allows things like grub2 (used by the
Debian installer images) to work in a guest domain without having to
sprinkle Xen specific hacks around the place.
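
The geometry calculation in the patch below can be tried out in userspace with the following sketch. The struct and function names are stand-ins for struct hd_geometry and the driver hook, plain 64-bit division replaces sector_div(), and the clamp value 0xffff is an assumption, since that constant is cut short in the archived copy of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for struct hd_geometry (field widths as in <linux/hdreg.h>). */
struct toy_geometry {
	unsigned char heads;
	unsigned char sectors;
	unsigned short cylinders;
};

/* Fake a CHS geometry that is consistent with the device size. */
static void toy_getgeo(uint64_t nsect, struct toy_geometry *hg)
{
	uint64_t cylinders;

	assert(hg);

	hg->heads = 0xff;
	hg->sectors = 0x3f;
	cylinders = nsect / ((uint64_t)hg->heads * hg->sectors);
	hg->cylinders = (unsigned short)cylinders; /* may truncate */

	/* If the 16-bit field cannot describe the whole disk, report the
	 * maximum cylinder count instead (0xffff is assumed here). */
	if ((uint64_t)(hg->cylinders + 1) * hg->heads * hg->sectors < nsect)
		hg->cylinders = 0xffff;
}
```

For a disk of 255 * 63 * 100 sectors this yields 100 cylinders; for a 1 TiB disk (2^31 sectors) the cylinder count overflows 16 bits and the clamp applies.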

Signed-off-by: Ian Campbell [EMAIL PROTECTED]

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 2bdebcb..b0a2e69 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -37,6 +37,7 @@
 
 #include <linux/interrupt.h>
 #include <linux/blkdev.h>
+#include <linux/hdreg.h>
 #include <linux/module.h>
 
 #include <xen/xenbus.h>
@@ -135,6 +136,22 @@ static void blkif_restart_queue_callback(void *arg)
 	schedule_work(&info->work);
 }
 
+int blkif_getgeo(struct block_device *bd, struct hd_geometry *hg)
+{
+	/* We don't have real geometry info, but let's at least return
+	   values consistent with the size of the device */
+	sector_t nsect = get_capacity(bd->bd_disk);
+	sector_t cylinders = nsect;
+
+	hg->heads = 0xff;
+	hg->sectors = 0x3f;
+	sector_div(cylinders, hg->heads * hg->sectors);
+	hg->cylinders = cylinders;
+	if ((sector_t)(hg->cylinders + 1) * hg->heads * hg->sectors < nsect)
+		hg->cylinders = 0xffff;
+	return 0;
+}
+
 /*
  * blkif_queue_request
  *
@@ -939,6 +956,7 @@ static struct block_device_operations xlvbd_block_fops =
.owner = THIS_MODULE,
.open = blkif_open,
.release = blkif_release,
+   .getgeo = blkif_getgeo,
 };
 



-- 
Ian Campbell

'Martyrdom' is the only way a person can become famous without ability.
-- George Bernard Shaw

