Re: [PATCH] maintainers: drop Chris Wright from pvops

2017-10-26 Thread Chris Wright
(resend w/out html damage that triggers lkml reject)

On Thu, Oct 26, 2017 at 3:17 PM, Rusty Russell  wrote:
> Chris CC'd: He wasn't that hard to find.
>
> (linkedin says he's CTO of RedHat now.  I feel like an underachiever!)
>
> Cheers,
> Rusty.
>
> Juergen Gross  writes:
>
>> Mails to chr...@sous-sol.org are not deliverable since several months.
>> Drop him as PARAVIRT_OPS maintainer.
>>
>> Signed-off-by: Juergen Gross 

Acked-by: Chris Wright 

;)

thanks,
-chris

>> ---
>>  MAINTAINERS | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index d85c08956875..af0cb69f6a3e 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -10179,7 +10179,6 @@ F:Documentation/parport*.txt
>>
>>  PARAVIRT_OPS INTERFACE
>>  M:   Juergen Gross 
>> -M:   Chris Wright 
>>  M:   Alok Kataria 
>>  M:   Rusty Russell 
>>  L:   virtualization@lists.linux-foundation.org
>> --
>> 2.12.3
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] maintainers: drop Chris Wright from pvops

2017-10-26 Thread Chris Wright
Acked-by: Chris Wright 

;)


thanks,
-chris

On Oct 26, 2017 7:41 PM, "Rusty Russell"  wrote:

> Chris CC'd: He wasn't that hard to find.
>
> (linkedin says he's CTO of RedHat now.  I feel like an underachiever!)
>
> Cheers,
> Rusty.
>
> Juergen Gross  writes:
>
> > Mails to chr...@sous-sol.org are not deliverable since several months.
> > Drop him as PARAVIRT_OPS maintainer.
> >
> > Signed-off-by: Juergen Gross 
> > ---
> >  MAINTAINERS | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index d85c08956875..af0cb69f6a3e 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -10179,7 +10179,6 @@ F:Documentation/parport*.txt
> >
> >  PARAVIRT_OPS INTERFACE
> >  M:   Juergen Gross 
> > -M:   Chris Wright 
> >  M:   Alok Kataria 
> >  M:   Rusty Russell 
> >  L:   virtualization@lists.linux-foundation.org
> > --
> > 2.12.3
>

List Administration Help

2016-04-15 Thread Chris Wright
Hi all,

This list is run as a moderated list to minimize spam (gets a decent
amount despite filtering).  I've tended to this for years, and I'd
like to add some folks from community to help so that posts are
quickly approved.  Please let me know if you'd like to help.

thanks,
-chris


Virtualization DevRoom at FOSDEM 2013

2012-11-16 Thread Chris Wright
Following on the heels of a successful KVM Forum and oVirt Workshop,
FOSDEM will be hosting a Virtualization DevRoom in February.  If you've
been to FOSDEM before, you know this is about developers and code, not
products.

Presentation proposals are due by December 16th 2012.

The full details are here:

 http://osvc.v2.cs.unibo.it/index.php/Main_Page

With the relevant topics being:

"Topics covered will include, but not limited to:
 - machine virtualization (e.g. KVM, Xen, VirtualBox,...)
 - network virtualization (e.g. openvstack, vale, vde, Open vSwitch,...)
 - process level virtualization, flexible kernels (e.g. rump anykernel,
   view-os, ...)
 - virt management (e.g. ganeti, libvirt, ovirt, XCP, ...)"

thanks,
-chris


Re: [Xen-devel] Re: xen PV on HVM and initial domain merge in linux-next

2010-10-20 Thread Chris Wright
* Dan Magenheimer (dan.magenhei...@oracle.com) wrote:
> > Not following the Xen development at all, I would like to have a
> > positive reply from the listed Xen contacts, please,
> 
> I am not officially listed as a maintainer for Xen, but fwiw:
> 
> Acked-by: Dan Magenheimer 
> 
> And, Stephen, I think Chris Wright and virtualizat...@lists.osdl.org
> are stale entries in the MAINTAINERS file for Xen development,
> so you are unlikely to receive replies from him/them.
> (Chris, virtualizat...@lists.osdl.org ... please feel free
> to correct me if I am wrong.)

Yeah, I'm not really doing Xen pv stuff these days.
The virtualization list itself is always open, and it's useful for
things that cross over (e.g. virtio, pv clock, pv spinlocks).

thanks,
-chris


KVM Forum 2010: videos online [was Re: KVM Forum 2010: presentations online]

2010-10-19 Thread Chris Wright
* Chris Wright (chr...@redhat.com) wrote:
> We were also able to video the speakers, and will send a note when the
> videos are available.
> (and thanks again to Andrew Cathrow for making this happen)

I don't think a note went out yet.  The videos are available as well.

thanks,
-chris


KVM Forum 2010: presentations online

2010-08-17 Thread Chris Wright
KVM Forum 2010 was quite a success, many thanks to all who participated!

For those who couldn't attend, the presentations are available online now:
(thanks to Andrew Cathrow for pushing them all up)

http://www.linux-kvm.org/page/KVM_Forum_2010#Presentations

We were also able to video the speakers, and will send a note when the
videos are available.
(and thanks again to Andrew Cathrow for making this happen)

thanks,
-chris


Re: RFC: Network Plugin Architecture (NPA) for vmxnet3

2010-05-04 Thread Chris Wright
* Pankaj Thakkar (pthak...@vmware.com) wrote:
> We intend to upgrade the upstreamed vmxnet3 driver to implement NPA so that
> Linux users can exploit the benefits provided by passthrough devices in a
> seamless manner while retaining the benefits of virtualization. The document
> below tries to answer most of the questions which we anticipated. Please let
> us know your comments and queries.

How does the throughput, latency, and host CPU utilization for normal
data path compare with say NetQueue?

And does this obsolete your UPT implementation?

> Network Plugin Architecture
> ---
> 
> VMware has been working on various device passthrough technologies for the
> past few years. Passthrough technology is interesting as it can result in
> better performance/cpu utilization for certain demanding applications. In our
> vSphere product we support direct assignment of PCI devices like networking
> adapters to a guest virtual machine. This allows the guest to drive the device
> using the device drivers installed inside the guest. This is similar to the way
> KVM allows for passthrough of PCI devices to the guests. The hypervisor is
> bypassed for all I/O and control operations and hence it can not provide any
> value add features such as live migration, suspend/resume, etc.
> 
> 
> Network Plugin Architecture (NPA) is an approach which VMware has developed in
> joint partnership with Intel which allows us to retain the best of passthrough
> technology and virtualization. NPA allows for passthrough of the fast data
> (I/O) path and lets the hypervisor deal with the slow control path using
> traditional emulation/paravirtualization techniques. Through this splitting of
> data and control path the hypervisor can still provide the above mentioned
> value add features and exploit the performance benefits of passthrough.

How many cards actually support this NPA interface?  What does it look
like, i.e. where is the NPA specification?  (AFAIK, we never got the UPT
one).

> NPA requires SR-IOV hardware which allows for sharing of one single NIC
> adapter by multiple guests. SR-IOV hardware has many logically separate
> functions called virtual functions (VF) which can be independently assigned to
> the guest OS. They also have one or more physical functions (PF) (managed by a
> PF driver) which are used by the hypervisor to control certain aspects of the
> VFs and the rest of the hardware.

How do you handle hardware which has a more symmetric view of the
SR-IOV world (SR-IOV is only a PCI specification, not a network driver
specification)?  Or hardware which has multiple functions per physical
port (multiqueue, hw filtering, embedded switch, etc.)?

> NPA splits the guest driver into two components called
> the Shell and the Plugin. The shell is responsible for interacting with the
> guest networking stack and funneling the control operations to the hypervisor.
> The plugin is responsible for driving the data path of the virtual function
> exposed to the guest and is specific to the NIC hardware. NPA also requires an
> embedded switch in the NIC to allow for switching traffic among the virtual
> functions. The PF is also used as an uplink to provide connectivity to other
> VMs which are in emulation mode. The figure below shows the major components
> in a block diagram.
> 
> [Block diagram; layout lost in this archive copy, which is also cut off
> after the SR-IOV NIC box. Recoverable labels: Guest VM containing the
> vmxnet3 driver (Shell, with the Plugin inside it); vmxnet3 device;
> virtual switch; PF control / L2 driver; SR-IOV NIC with PF, VF1, VF2 ...
> VFn and an embedded switch; regular nic.]

[PATCH] vhost-net: defer f->private_data until setup succeeds

2009-12-22 Thread Chris Wright
Trivial change, just for readability.  The filp is not installed on
failure, so the current code is not incorrect (also vhost_dev_init
currently has no failure case).  This just treats setting f->private_data
as something with global scope (sure, true only after fd_install).

Signed-off-by: Chris Wright 
---
 drivers/vhost/net.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 22d5fef..0697ab2 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -326,7 +326,6 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 	int r;
 	if (!n)
 		return -ENOMEM;
-	f->private_data = n;
 	n->vqs[VHOST_NET_VQ_TX].handle_kick = handle_tx_kick;
 	n->vqs[VHOST_NET_VQ_RX].handle_kick = handle_rx_kick;
 	r = vhost_dev_init(&n->dev, n->vqs, VHOST_NET_VQ_MAX);
@@ -338,6 +337,9 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT);
 	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN);
 	n->tx_poll_state = VHOST_NET_POLL_DISABLED;
+
+	f->private_data = n;
+
 	return 0;
 }
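
The ordering this patch moves to (allocate, set up, and only then store
the pointer where others can see it) can be shown outside the kernel. The
sketch below is a userspace stand-in, not the vhost code: fake_file,
fake_net, fake_dev_init and the should_fail switch are all hypothetical
names made up for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct file and struct vhost_net. */
struct fake_file {
	void *private_data;
};

struct fake_net {
	int tx_poll_state;
};

/* Stand-in for a setup step that may fail; 0 on success, negative on error. */
static int fake_dev_init(struct fake_net *n, int should_fail)
{
	if (should_fail)
		return -EINVAL;
	n->tx_poll_state = 0;
	return 0;
}

/*
 * Allocate and set up the state first, and publish it through
 * f->private_data only once nothing can fail any more.  On error the
 * caller's f is left untouched.
 */
static int fake_net_open(struct fake_file *f, int should_fail)
{
	struct fake_net *n = malloc(sizeof(*n));
	int r;

	if (!n)
		return -ENOMEM;

	r = fake_dev_init(n, should_fail);
	if (r < 0) {
		free(n);
		return r;
	}

	f->private_data = n;	/* publish last */
	return 0;
}

/* Free the published state and clear the pointer. */
static void fake_net_release(struct fake_file *f)
{
	free(f->private_data);
	f->private_data = NULL;
}
```

After a failed fake_net_open() the file's private_data is still NULL, so
no cleanup path can ever see a pointer to freed memory.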
 


[PATCH] vhost-net: comment use of invalid fd when setting vhost backend

2009-12-22 Thread Chris Wright
This looks like an error case, but it's just a special case to shutdown
the backend.  Clarify with a comment.

Signed-off-by: Chris Wright 
---
 drivers/vhost/net.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 22d5fef..cc92086 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -465,6 +465,7 @@ static struct socket *get_tun_socket(int fd)
 static struct socket *get_socket(int fd)
 {
 	struct socket *sock;
+	/* special case to disable backend */
 	if (fd == -1)
 		return NULL;
 	sock = get_raw_socket(fd);
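
The point of the comment is that fd == -1 is a request, not a failure. A
userspace sketch of the same three-way outcome follows, with an explicit
status instead of a NULL return. The names backend_status, fake_sock,
known_socks and lookup_backend are hypothetical, and the small table only
stands in for the real socket lookup, which is not reproduced here.

```c
#include <stddef.h>

enum backend_status {
	BACKEND_DISABLED,	/* fd == -1: caller asked to detach the backend */
	BACKEND_ATTACHED,	/* fd resolved to a usable backend */
	BACKEND_INVALID,	/* any other fd that does not resolve */
};

/* Hypothetical table standing in for the real socket lookup. */
struct fake_sock {
	int fd;
};

static struct fake_sock known_socks[] = { { 3 }, { 7 } };

static enum backend_status lookup_backend(int fd, struct fake_sock **out)
{
	size_t i;

	*out = NULL;

	/* special case to disable backend: not an error */
	if (fd == -1)
		return BACKEND_DISABLED;

	for (i = 0; i < sizeof(known_socks) / sizeof(known_socks[0]); i++) {
		if (known_socks[i].fd == fd) {
			*out = &known_socks[i];
			return BACKEND_ATTACHED;
		}
	}
	return BACKEND_INVALID;
}
```

Both the "disabled" and the "invalid" case leave *out as NULL, which is
exactly why the original code benefits from a comment: the pointer alone
does not tell the two apart.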


Re: SCSI driver for VMware's virtual HBA - V6.

2009-10-13 Thread Chris Wright
* Alok Kataria (akata...@vmware.com) wrote:
> SCSI driver for VMware's virtual HBA.
> 
> From: Alok N Kataria 
> 
> This is a driver for VMware's paravirtualized SCSI device,
> which should improve disk performance for guests running
> under control of VMware hypervisors that support such devices.
> 
> Signed-off-by: Alok N Kataria 

Looks in good shape to me.

Reviewed-by: Chris Wright 


Re: [Pv-drivers] [PATCH 2.6.31-rc9] net: VMware virtual Ethernet NIC driver: vmxnet3

2009-09-29 Thread Chris Wright
* Bhavesh Davda (bhav...@vmware.com) wrote:
> > > Thanks a bunch for your really thorough review! I'll answer some of
> > > your questions here. Shreyas can respond to your comments about some of
> > > the coding style/comments/etc. in a separate mail.
> > 
> > The style is less important at this stage, but certainly eases review
> > to make it more consistent w/ Linux code.  The StudlyCaps, extra macros
> > (screaming caps) and inconsistent space/tabs are visual distractions,
> > that's all.
> 
> Agreed, but we'll definitely address all the style issues in our subsequent 
> patch posts. Actually Shreyas showed me his raw patch and it had tabs and not 
> spaces, so we're trying to figure out if either Outlook (corporate blessed) 
> or our Exchange server is converting those tabs to spaces or something.

Ah, that's always fun.  You can check by mailing to yourself and looking
at the outcome.

> > > We do have an internal prototype of a Linux vmxnet3 driver with 4 Tx
> > > queues and 4 Rx queues, using 9 MSI-X vectors, but it needs some work
> > > before calling it production ready.
> > 
> > I'd expect once you switch to alloc_etherdev_mq(), make napi work per
> > rx queue, and fix MSI-X allocation (all needed for 4/4), you should
> > have enough to support the max of 16/8 (IOW, 4/4 still sounds like an
> > artificial limitation).
> 
> Absolutely: 4/4 was simply a prototype to see if it helps with performance 
> any with certain benchmarks. So far it looks like there's a small performance 
> gain with microbenchmarks like netperf, but we're hoping having multiple 
> queues with multiple vectors might have some promise with macro benchmarks 
> like SPECjbb. If it pans out, we'll most likely make it a module_param with 
> some reasonable defaults, possibly just 1/1 by default.

Most physical devices that do MSI-X will do queue per cpu (or some
grouping if large number of cpus compared to queues).  Probably
reasonable default here too.
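
A queue-per-CPU default with a cap at what the device offers comes down
to a clamp. This is a minimal sketch of that rule only; the function name
and the fallback to one queue when the CPU count is unknown are
assumptions, not anything the vmxnet3 driver is stated to do.

```c
/*
 * One queue per online CPU, capped at the number of queues the device
 * exposes.  Falls back to a single queue if the CPU count is unknown (0).
 */
static unsigned int default_queue_count(unsigned int online_cpus,
					unsigned int max_queues)
{
	if (online_cpus == 0)
		return 1;
	return online_cpus < max_queues ? online_cpus : max_queues;
}
```

With the 16 Rx queues mentioned in this thread, a 4-CPU guest would get 4
queues and a 32-CPU guest would be capped at 16.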

> > > > How about GRO conversion?
> > >
> > > Looks attractive, and we'll work on that in a subsequent patch.
> > > Again, when we first wrote the driver, the NETIF_F_GRO stuff didn't
> > > exist in Linux.
> > 
> > OK, shouldn't be too much work.
> > 
> > Another thing I forgot to mention is that net_device now has
> > net_device_stats in it.  So you shouldn't need net_device_stats in
> > vmxnet3_adapter.
> 
> Cool. Will do.
> 
> > > > > +#define UPT1_MAX_TX_QUEUES  64
> > > > > +#define UPT1_MAX_RX_QUEUES  64
> > > >
> > > > This is different than the 16/8 described above (and seemingly all moot
> > > > since it becomes a single queue device).
> > >
> > > Nice catch! Those are not even used and are from the earliest days of
> > > our driver development. We'll nuke those.
> > 
> > Could you describe the UPT layer a bit?  There were a number of
> > constants that didn't appear to be used.
> 
> UPT stands for Uniform Pass Thru, a spec/framework VMware developed with its 
> IHV partners to implement the fast path (Tx/Rx) features of vmxnet3 in 
> silicon. Some of these #defines that appear not to be used are based on this 
> initial spec that VMware shared with its IHV partners.
> 
> We divided the emulated vmxnet3 PCIe device's registers into two sets on two 
> separate BARs: BAR 0 for the UPT registers we asked IHV partners to implement 
> that we emulate in our hypervisor if no physical device compliant with the 
> UPT spec is available to pass thru from a virtual machine, and BAR 1 for 
> registers we always emulate for slow path/control operations like setting the 
> MAC address, or activating/quiescing/resetting the device, etc.

Interesting.  Sounds like part of NetQueue and also something that virtio
has looked into to support, e.g. VMDq.  Do you have a more complete
spec?

thanks,
-chris


Re: [Pv-drivers] [PATCH 2.6.31-rc9] net: VMware virtual Ethernet NIC driver: vmxnet3

2009-09-29 Thread Chris Wright
* Bhavesh Davda (bhav...@vmware.com) wrote:
> Hi Chris,
> 
> Thanks a bunch for your really thorough review! I'll answer some of your 
> questions here. Shreyas can respond to your comments about some of the coding 
> style/comments/etc. in a separate mail.

The style is less important at this stage, but certainly eases review
to make it more consistent w/ Linux code.  The StudlyCaps, extra macros
(screaming caps) and inconsistent space/tabs are visual distractions,
that's all.

> > > INTx, MSI, MSI-X (25 vectors) interrupts
> > > 16 Rx queues, 8 Tx queues
> > 
> > Driver doesn't appear to actually support more than a single MSI-X
> > interrupt.
> > What is your plan for doing real multiqueue?
> 
> When we first wrote the driver a couple of years ago, Linux lacked proper 
> multiqueue support, hence we chose to use only a single queue though the 
> emulated device does support 16 Rx and 8 Tx queues, and 25 MSI-X vectors: 16 
> for Rx, 8 for Tx and 1 for other asynchronous event notifications, by design. 
> Actually a driver can repurpose any of the 25 vectors for any notifications; 
> just explaining the rationale for designing the device with 25 MSI-X vectors.

I see, thanks.
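
The vector budget in the quoted paragraph is plain arithmetic: one vector
per Rx queue, one per Tx queue, and one for asynchronous events. A small
sketch of that count follows. The helper name is hypothetical, and as the
quoted text notes, a driver is free to assign the vectors differently.

```c
/* One vector per Rx queue, one per Tx queue, plus one for async events. */
static unsigned int msix_vectors_needed(unsigned int rx_queues,
					unsigned int tx_queues)
{
	return rx_queues + tx_queues + 1;
}
```

This reproduces both figures in the thread: 16 Rx and 8 Tx queues give 25
vectors, and the 4/4 prototype gives 9.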

> We do have an internal prototype of a Linux vmxnet3 driver with 4 Tx queues 
> and 4 Rx queues, using 9 MSI-X vectors, but it needs some work before calling 
> it production ready.

I'd expect once you switch to alloc_etherdev_mq(), make napi work per
rx queue, and fix MSI-X allocation (all needed for 4/4), you should
have enough to support the max of 16/8 (IOW, 4/4 still sounds like an
artificial limitation).

> > How about GRO conversion?
> 
> Looks attractive, and we'll work on that in a subsequent patch. Again, when 
> we first wrote the driver, the NETIF_F_GRO stuff didn't exist in Linux.

OK, shouldn't be too much work.

Another thing I forgot to mention is that net_device now has
net_device_stats in it.  So you shouldn't need net_device_stats in
vmxnet3_adapter.

> > Also, heavy use of BUG_ON() (counted 51 of them), are you sure that none
> > of them can be triggered by guest or remote (esp. the ones that happen
> > in interrupt context)?  Some initial thoughts below.
> 
> We'll definitely audit all the BUG_ONs again to make sure they can't be 
> exploited.
> 
> > > --- /dev/null
> > > +++ b/drivers/net/vmxnet3/upt1_defs.h
> > > +#define UPT1_MAX_TX_QUEUES  64
> > > +#define UPT1_MAX_RX_QUEUES  64
> > 
> > This is different than the 16/8 described above (and seemingly all moot
> > since it becomes a single queue device).
> 
> Nice catch! Those are not even used and are from the earliest days of our 
> driver development. We'll nuke those.

Could you describe the UPT layer a bit?  There were a number of
constants that didn't appear to be used.

> > > +/* interrupt moderation level */
> > > +#define UPT1_IML_NONE 0 /* no interrupt moderation */
> > > +#define UPT1_IML_HIGHEST  7 /* least intr generated */
> > > +#define UPT1_IML_ADAPTIVE 8 /* adpative intr moderation */
> > 
> > enum?  also only appears to support adaptive mode?
> 
> Yes, the Linux driver currently only asks for adaptive mode, but the device 
> supports 8 interrupt moderation levels.
> 
> > > --- /dev/null
> > > +++ b/drivers/net/vmxnet3/vmxnet3_defs.h
> > > +struct Vmxnet3_MiscConf {
> > > +   struct Vmxnet3_DriverInfo driverInfo;
> > > +   uint64_t uptFeatures;
> > > +   uint64_t ddPA; /* driver data PA */
> > > +   uint64_t queueDescPA;  /* queue descriptor table PA */
> > > +   uint32_t ddLen;/* driver data len */
> > > +   uint32_t queueDescLen; /* queue desc. table len in bytes */
> > > +   uint32_t mtu;
> > > +   uint16_t maxNumRxSG;
> > > +   uint8_t  numTxQueues;
> > > +   uint8_t  numRxQueues;
> > > +   uint32_t reserved[4];
> > > +};
> > 
> > should this be packed (or others that are shared w/ device)?  i assume
> > you've already done 32 vs 64 here
> > 
> 
> No need for packing since the fields are naturally 64-bit aligned. True for 
> all structures shared between the driver and device.

I had quickly looked and thought I saw a hole that would lead to
inconsistent layout for 32-bit vs 64-bit.  I figured I'd be wrong
there ;-)
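
One way to settle a layout question like this without reading offsets by
eye is to let the compiler check them. The sketch below copies the fields
quoted above into a hypothetical struct misc_conf_sketch and asserts the
offsets at build time (C11 _Static_assert). The driverInfo member is left
out on purpose because its definition does not appear in this thread, so
the numbers apply to this simplified copy only.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified copy of the quoted struct, without the driverInfo member. */
struct misc_conf_sketch {
	uint64_t uptFeatures;
	uint64_t ddPA;		/* driver data PA */
	uint64_t queueDescPA;	/* queue descriptor table PA */
	uint32_t ddLen;		/* driver data len */
	uint32_t queueDescLen;	/* queue desc. table len in bytes */
	uint32_t mtu;
	uint16_t maxNumRxSG;
	uint8_t  numTxQueues;
	uint8_t  numRxQueues;
	uint32_t reserved[4];
};

/* Each offset equals the sum of the sizes before it, i.e. no holes. */
_Static_assert(offsetof(struct misc_conf_sketch, ddLen) == 24,
	       "hole before ddLen");
_Static_assert(offsetof(struct misc_conf_sketch, maxNumRxSG) == 36,
	       "hole before maxNumRxSG");
_Static_assert(offsetof(struct misc_conf_sketch, reserved) == 40,
	       "hole before reserved");
_Static_assert(sizeof(struct misc_conf_sketch) == 56,
	       "unexpected tail padding");
```

If the same file builds for both a 32-bit and a 64-bit target, the layout
of these fields is identical on both.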

> > > +#define VMXNET3_MAX_TX_QUEUES  8
> > > +#define VMXNET3_MAX_RX_QUEUES  16
> > 
> > different to UPT, I must've missed some layering here
> 
> These are the authoritative #defines. Ignore the UPT ones.
> 
> > > --- /dev/null
> > > +++ b/drivers/net/vmxnet3/vmxnet3_drv.c
> > > +   VMXNET3_WRITE_BAR0_REG(adapter, VMXNET3_REG_IMR + intr_idx * 8, 0);
> > 
> > writel(0, adapter->hw_addr0 + VMXNET3_REG_IMR + intr_idx * 8)
> > seems just as clear to me.
> 
> Fair enough. We were just trying to clearly show which register accesses go 
> to BAR 0 versus BAR 1.
> 
> > only ever num_intrs=1, so there's some p

Re: Paravirtualization on VMware's Platform [VMI].

2009-09-29 Thread Chris Wright
* Alok Kataria (akata...@vmware.com) wrote:
> Mark VMI for removal in feature-removal-schedule.txt.
> 
> From: Alok N Kataria 
> 
> Add text in feature-removal.txt and also modify Kconfig to disable
> vmi by default.
> 
> Signed-off-by: Alok N Kataria 

Acked-by: Chris Wright 


Re: Paravirtualization on VMware's Platform [VMI].

2009-09-29 Thread Chris Wright
* Alok Kataria (akata...@vmware.com) wrote:
> 
> On Mon, 2009-09-28 at 19:25 -0700, H. Peter Anvin wrote:
> > On 09/28/2009 05:45 PM, Alok Kataria wrote:
> > > + bool "VMI Guest support [will be deprecated soon]"
> > > + default n
> > 
> > This is incorrect use of the word "deprecated"... it's *already*
> > deprecated (a word which pretty much means the opposite of "recommended".)
> > 
> > As far as "default n" is concerned... this is usually not necessary; "n"
> > is the default unless anything else is specified.
> 
> How about this ?  Thanks.

Looks good to me (missing Signed-off-by).  I think it's also useful
to generate some runtime noise saying it's a deprecated option.

Even something as simple as:

-   pv_info.name = "vmi";
+   pv_info.name = "vmi [deprecated]";



Re: [PATCH 2.6.31-rc9] net: VMware virtual Ethernet NIC driver: vmxnet3

2009-09-29 Thread Chris Wright
* Shreyas Bhatewara (sbhatew...@vmware.com) wrote:
> Some of the features of vmxnet3 are :
> PCIe 2.0 compliant PCI device: Vendor ID 0x15ad, Device ID 0x07b0
> INTx, MSI, MSI-X (25 vectors) interrupts
> 16 Rx queues, 8 Tx queues

Driver doesn't appear to actually support more than a single MSI-X interrupt.
What is your plan for doing real multiqueue?

> Offloads: TCP/UDP checksum, TSO over IPv4/IPv6,
> 802.1q VLAN tag insertion, filtering, stripping
> Multicast filtering, Jumbo Frames

How about GRO conversion?

> Wake-on-LAN, PCI Power Management D0-D3 states
> PXE-ROM for boot support
> 

Whole thing appears to be space indented, and is fairly noisy w/ printk.
Also, heavy use of BUG_ON() (counted 51 of them), are you sure that none
of them can be triggered by guest or remote (esp. the ones that happen
in interrupt context)?  Some initial thoughts below.


> diff --git a/drivers/net/vmxnet3/upt1_defs.h b/drivers/net/vmxnet3/upt1_defs.h
> new file mode 100644
> index 000..b50f91b
> --- /dev/null
> +++ b/drivers/net/vmxnet3/upt1_defs.h
> @@ -0,0 +1,104 @@
> +/*
> + * Linux driver for VMware's vmxnet3 ethernet NIC.
> + *
> + * Copyright (C) 2008-2009, VMware, Inc. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation; version 2 of the License and no later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT.  See the GNU General Public License for more
> + * details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Maintained by: Shreyas Bhatewara 
> + *
> + */
> +
> +/* upt1_defs.h
> + *
> + *  Definitions for Uniform Pass Through.
> + */

Most of the source files have this format (some include -- after file
name).  Could just keep it all w/in the same comment block.  Since you
went to the trouble of saying what the file does, something a tad more
descriptive would be welcome.

> +
> +#ifndef _UPT1_DEFS_H
> +#define _UPT1_DEFS_H
> +
> +#define UPT1_MAX_TX_QUEUES  64
> +#define UPT1_MAX_RX_QUEUES  64

This is different than the 16/8 described above (and seemingly all moot
since it becomes a single queue device).

> +
> +/* interrupt moderation level */
> +#define UPT1_IML_NONE 0 /* no interrupt moderation */
> +#define UPT1_IML_HIGHEST  7 /* least intr generated */
> +#define UPT1_IML_ADAPTIVE 8 /* adpative intr moderation */

enum?  also only appears to support adaptive mode?
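
For reference, the enum form suggested here could look like the sketch
below. The three named values come from the quoted #defines; the
enum tag, the helper iml_is_valid() and the assumption that the
intermediate levels 1 to 6 are also accepted are illustrative only and
not taken from the patch.

```c
/* Interrupt moderation level, same values as the quoted #defines. */
enum upt1_intr_moderation {
	UPT1_IML_NONE     = 0,	/* no interrupt moderation */
	UPT1_IML_HIGHEST  = 7,	/* least intr generated */
	UPT1_IML_ADAPTIVE = 8,	/* adaptive intr moderation */
};

/*
 * Returns 1 for the fixed levels 0..7 and for adaptive mode, 0 otherwise.
 * Treating 1..6 as valid is an assumption made for this sketch.
 */
static int iml_is_valid(unsigned int level)
{
	return level <= UPT1_IML_ADAPTIVE;
}
```

An enum keeps the related constants together and lets the compiler and
debugger show the names, at no cost in generated code.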

> +/* values for UPT1_RSSConf.hashFunc */
> +enum {
> +   UPT1_RSS_HASH_TYPE_NONE  = 0x0,
> +   UPT1_RSS_HASH_TYPE_IPV4  = 0x01,
> +   UPT1_RSS_HASH_TYPE_TCP_IPV4  = 0x02,
> +   UPT1_RSS_HASH_TYPE_IPV6  = 0x04,
> +   UPT1_RSS_HASH_TYPE_TCP_IPV6  = 0x08,
> +};
> +
> +enum {
> +   UPT1_RSS_HASH_FUNC_NONE  = 0x0,
> +   UPT1_RSS_HASH_FUNC_TOEPLITZ  = 0x01,
> +};
> +
> +#define UPT1_RSS_MAX_KEY_SIZE40
> +#define UPT1_RSS_MAX_IND_TABLE_SIZE  128
> +
> +struct UPT1_RSSConf {
> +   uint16_t   hashType;
> +   uint16_t   hashFunc;
> +   uint16_t   hashKeySize;
> +   uint16_t   indTableSize;
> +   uint8_thashKey[UPT1_RSS_MAX_KEY_SIZE];
> +   uint8_tindTable[UPT1_RSS_MAX_IND_TABLE_SIZE];
> +};
> +
> +/* features */
> +enum {
> +   UPT1_F_RXCSUM  = 0x0001,   /* rx csum verification */
> +   UPT1_F_RSS = 0x0002,
> +   UPT1_F_RXVLAN  = 0x0004,   /* VLAN tag stripping */
> +   UPT1_F_LRO = 0x0008,
> +};
> +#endif
> diff --git a/drivers/net/vmxnet3/vmxnet3_defs.h 
> b/drivers/net/vmxnet3/vmxnet3_defs.h
> new file mode 100644
> index 000..a33a90b
> --- /dev/null
> +++ b/drivers/net/vmxnet3/vmxnet3_defs.h
> @@ -0,0 +1,534 @@
> +/*
> + * Linux driver for VMware's vmxnet3 ethernet NIC.
> + *
> + * Copyright (C) 2008-2009, VMware, Inc. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation; version 2 of the License and no later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT.  See the GNU General Public License for more
> + * details

Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-21 Thread Chris Wright
* Stephen Hemminger (shemmin...@vyatta.com) wrote:
> On Mon, 21 Sep 2009 16:37:22 +0930
> Rusty Russell  wrote:
> 
> > > > > Actually this framework can apply to traditional network adapters
> > > > > which have just one tx/rx queue pair. And applications using the same
> > > > > user/kernel interface can utilize this framework to send/receive
> > > > > network traffic directly thru a tx/rx queue pair in a network adapter.
> > > > 
> 
> More importantly, when virtualization is used with multi-queue NICs the
> virtio-net NIC is a single CPU bottleneck. The virtio-net NIC should preserve
> the parallelism (lock free) using multiple receive/transmit queues. The number
> of queues should equal the number of CPUs.

Yup, multiqueue virtio is on todo list ;-)

thanks,
-chris


Re: Paravirtualization on VMware's Platform [VMI].

2009-09-17 Thread Chris Wright
* Jeremy Fitzhardinge (jer...@goop.org) wrote:
> On 09/17/09 17:34, Chris Wright wrote:
> >> One of the options that I am contemplating is to drop the code from the
> >> tip tree in this release cycle, and given that this should be a low risk
> >> change we can remove it from Linus's tree later in the merge cycle.
> >>
> >> Let me know your views on this or if you think we should do this some
> >> other way.
> >> 
> > Typically we give time measured in multiple release cycles
> > before deprecating a feature.  This means placing an entry in
> > Documentation/feature-removal-schedule.txt, and potentially
> > adding some noise to warn users they are using a deprecated
> > feature.
> 
> That's true if the feature has some functional effect on users.  But at
> first sight, VMI is really just an optimisation, and a non-VMI-equipped
> kernel would be completely functionally equivalent, right?

True.  I'm all for removing code that's got no planned maintenance and
no place to run ;-)

> On the other hand, there could well be a performance regression which
> could affect users.  However they're taking the explicit step of
> withdrawing support for VMI, so I guess they can just take that in their
> stride.

Yeah.  Different than normal deprecation since it's atop VMware's HV
which is all in their domain.

thanks,
-chris


Re: Paravirtualization on VMware's Platform [VMI].

2009-09-17 Thread Chris Wright
* Alok Kataria (akata...@vmware.com) wrote:
> We ran a few experiments to compare performance of VMware's
> paravirtualization technique (VMI) and hardware MMU technologies (HWMMU)
> on VMware's hypervisor.
> 
> To give some background, VMI is VMware's paravirtualization
> specification which tries to optimize CPU and MMU operations of the
> guest operating system. For more information take a look at this
> http://www.vmware.com/interfaces/paravirtualization.html
> 
> In most of the benchmarks, EPT/NPT (hwmmu) technologies are at par or
> provide better performance compared to VMI.
> The experiments included comparing performance across various micro and
> real world like benchmarks.
> 
> Host configuration used for testing.
> * Dell PowerEdge 2970
> * 2 x quad-core AMD Opteron 2384 2.7GHz (Shanghai C2), RVI capable.
> * 8 GB (4 x 2GB) memory, NUMA enabled
> * 2 x 300GB RAID 0 storage
> * 2 x embedded 1Gb NICs (Broadcom NetXtreme II BCM5708 1000Base-T)
> * Running development build of ESX.
> 
> The guest VM was a SLES 10 SP2 based VM for both the VMI and non-VMI
> case. kernel version: 2.6.16.60-0.37_f594963d-vmipae.
> 
> Below is a short summary of performance results between HWMMU and VMI.
> These results are averaged over 9 runs. The memory was sized at 512MB
> per VCPU in all experiments.
> For the ratio results comparing hwmmu technologies to vmi, higher than 1
> means hwmmu is better than vmi.
> 
> compile workloads - 4-way : 1.02, i.e. about 2% better.
> compile workloads - 8-way : 1.14, i.e. 14% better.
> oracle swingbench - 4-way (small pages) : 1.34, i.e. 34% better.
> oracle swingbench - 4-way (large pages) : 1.03, i.e. 3% better.
> specjbb (large pages)   : 0.99, i.e. 1% degradation.

Not entirely surprising.  Curious if you ran specjbb w/ small pages too?

> Please note that specjbb is the worst case benchmark for hwmmu, due to
> the higher TLB miss latency, so it's a good result that the worst case
> benchmark has a degradation of only 1%.
> 
> VMware expects that these hardware virtualization features will be
> ubiquitous by 2011.
> 
> Apart from the performance benefit, VMI was important for Linux on
> VMware's platform, from a timekeeping point of view, but with the tickless
> kernels and TSC improvements that were done for the mainline tree, we
> think VMI has outlived those requirements too.
> 
> In light of these results and availability of such hardware, we have
> decided to stop supporting VMI in our future products.
> 
> Given this new development, I wanted to discuss how we should go about
> retiring the VMI code from mainline Linux, i.e. the vmi_32.c and
> vmiclock_32.c bits.
> 
> One of the options that I am contemplating is to drop the code from the
> tip tree in this release cycle, and given that this should be a low risk
> change we can remove it from Linus's tree later in the merge cycle.
> 
> Let me know your views on this or if you think we should do this some
> other way.

Typically we give time measured in multiple release cycles
before deprecating a feature.  This means placing an entry in
Documentation/feature-removal-schedule.txt, and potentially
adding some noise to warn users they are using a deprecated
feature.

thanks,
-chris


Re: [PATCH] xen: remove driver_data direct access of struct device from more drivers

2009-05-04 Thread Chris Wright
* Greg Kroah-Hartman (gre...@suse.de) wrote:
> From: Greg Kroah-Hartman 
> 
> In the near future, the driver core is going to not allow direct access
> to the driver_data pointer in struct device.  Instead, the functions
> dev_get_drvdata() and dev_set_drvdata() should be used.  These functions
> have been around since the beginning, so are backwards compatible with
> all older kernel versions.
> 
> Cc: xen-de...@lists.xensource.com
> Cc: virtualizat...@lists.osdl.org
> Cc: Chris Wright 
> Cc: Jeremy Fitzhardinge 
> Signed-off-by: Greg Kroah-Hartman 

Acked-by: Chris Wright 


Re: [PATCH] xen block: remove driver_data direct access of struct device

2009-04-30 Thread Chris Wright
* Greg KH (g...@kroah.com) wrote:
> On Thu, Apr 30, 2009 at 03:35:35PM -0700, Chris Wright wrote:
> > * Greg Kroah-Hartman (gre...@suse.de) wrote:
> > > In the near future, the driver core is going to not allow direct access
> > > to the driver_data pointer in struct device.  Instead, the functions
> > > dev_get_drvdata() and dev_set_drvdata() should be used.  These functions
> > > have been around since the beginning, so are backwards compatible with
> > > all older kernel versions.
> > > 
> > > Cc: xen-de...@lists.xensource.com
> > > Cc: virtualizat...@lists.osdl.org
> > > Cc: Chris Wright 
> > > Cc: Jeremy Fitzhardinge 
> > > Signed-off-by: Greg Kroah-Hartman 
> > 
> > Acked-by: Chris Wright 
> 
> Thanks, will add it.  Any objections for this to go through my
> driver-core tree to Linus for 2.6.31?

None from me.


Re: [PATCH] xen block: remove driver_data direct access of struct device

2009-04-30 Thread Chris Wright
* Greg Kroah-Hartman (gre...@suse.de) wrote:
> In the near future, the driver core is going to not allow direct access
> to the driver_data pointer in struct device.  Instead, the functions
> dev_get_drvdata() and dev_set_drvdata() should be used.  These functions
> have been around since the beginning, so are backwards compatible with
> all older kernel versions.
> 
> Cc: xen-de...@lists.xensource.com
> Cc: virtualizat...@lists.osdl.org
> Cc: Chris Wright 
> Cc: Jeremy Fitzhardinge 
> Signed-off-by: Greg Kroah-Hartman 

Acked-by: Chris Wright 

after...

> - dev->dev.driver_data = NULL;
> + dev_det_drvdata(&dev->dev, NULL);

...fixing your fingers/script so it compiles ;-)
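
For reference, the accessor pattern this patch moves to can be modeled in user space. This is only a sketch, not kernel code: `struct mock_device` is a cut-down stand-in for `struct device`, and the `mock_` functions merely mimic the role of dev_get_drvdata() and dev_set_drvdata(). All `mock_` names are hypothetical.

```c
#include <stddef.h>

/* User-space stand-in for the kernel's struct device (hypothetical, cut down). */
struct mock_device {
	void *driver_data;	/* callers should not touch this directly */
};

/* Mirrors the role of dev_set_drvdata(): store the driver's private pointer. */
void mock_dev_set_drvdata(struct mock_device *dev, void *data)
{
	dev->driver_data = data;
}

/* Mirrors the role of dev_get_drvdata(): fetch the driver's private pointer. */
void *mock_dev_get_drvdata(const struct mock_device *dev)
{
	return dev->driver_data;
}

/* Hypothetical teardown path: clear the pointer through the accessor. */
void mock_remove(struct mock_device *dev)
{
	mock_dev_set_drvdata(dev, NULL);	/* was: dev->driver_data = NULL; */
}
```

With every caller going through the two accessors, the field itself can later be moved or hidden without touching the drivers again.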


Re: 2.6.29 git, resume from ram broken on thinkpad

2009-04-03 Thread Chris Wright
* Arkadiusz Miskiewicz (a.miskiew...@gmail.com) wrote:
> On Friday 03 of April 2009, Chris Wright wrote:
> > * Arkadiusz Miskiewicz (a.miskiew...@gmail.com) wrote:
> > > What about 9ea09af3bd3090e8349ca2899ca2011bd94cda85 ?
> > >
> > > stop_machine: introduce stop_machine_create/destroy.
> >
> > That is later fixed in a0e280e0f33f6c859a235fb69a875ed8f3420388.
> >
> > Can you please verify if 2.6.29 works for you? 
> 
> I think that the guilty part is 
> CONFIG_CC_STACKPROTECTOR_ALL=y
> CONFIG_CC_STACKPROTECTOR=y

Indeed, I think you're right.  In fact...this should fix it:

From:  Joseph Cihula 

The __restore_processor_state() fn restores %gs on resume from S3.  As
such, it cannot be protected by the stack-protector guard since %gs will
not be correct on function entry.

There are only a few other fns in this file and it should not negatively
impact kernel security that they will also have the stack-protector
guard removed (and so it's not worth moving them to another file).

Without this change, S3 resume on a kernel built with
CONFIG_CC_STACKPROTECTOR_ALL=y will fail.

Signed-off-by:  Joseph Cihula 

--- ../linux.trees.git/arch/x86/power/Makefile  2009-03-29 
12:12:13.0 -0700
+++ arch/x86/power/Makefile 2009-03-30 12:21:19.0 -0700
@@ -1,2 +1,7 @@
+# __restore_processor_state() restores %gs after S3 resume and so 
should not
+# itself be stack-protected
+nostackp := $(call cc-option, -fno-stack-protector)
+CFLAGS_cpu_$(BITS).o   := $(nostackp)
+
obj-$(CONFIG_PM_SLEEP)  += cpu_$(BITS).o
obj-$(CONFIG_HIBERNATION)   += hibernate_$(BITS).o hibernate_asm_$(BITS).o





Re: 2.6.29 git, resume from ram broken on thinkpad

2009-04-03 Thread Chris Wright
* Ingo Molnar (mi...@elte.hu) wrote:
> 
> * Chris Wright  wrote:
> 
> > * Arkadiusz Miskiewicz (a.miskiew...@gmail.com) wrote:
> > > On Friday 03 of April 2009, Chris Wright wrote:
> > > > * Arkadiusz Miskiewicz (a.miskiew...@gmail.com) wrote:
> > > > > What about 9ea09af3bd3090e8349ca2899ca2011bd94cda85 ?
> > > > >
> > > > > stop_machine: introduce stop_machine_create/destroy.
> > > >
> > > > That is later fixed in a0e280e0f33f6c859a235fb69a875ed8f3420388.
> > > >
> > > > Can you please verify if 2.6.29 works for you? 
> > > 
> > > I think that the guilty part is 
> > > CONFIG_CC_STACKPROTECTOR_ALL=y
> > > CONFIG_CC_STACKPROTECTOR=y
> > 
> > Indeed, I think you're right.  In fact...this should fix it:
> 
> Note that i had to do a manual merge of the patch (it had 3 separate 
> patch corruptions) - the non-damaged version i applied is the one 
> below.

Indeed, I just discovered that ;-)

And, for the record...yes, that fixes it on my laptop.

Tested-by: Chris Wright 



Re: 2.6.29 git, resume from ram broken on thinkpad

2009-04-02 Thread Chris Wright
* Arkadiusz Miskiewicz (a.miskiew...@gmail.com) wrote:
> What about 9ea09af3bd3090e8349ca2899ca2011bd94cda85 ?
> 
> stop_machine: introduce stop_machine_create/destroy.

That is later fixed in a0e280e0f33f6c859a235fb69a875ed8f3420388.

Can you please verify if 2.6.29 works for you?  Your bisects are all
going way back to 2.6.29-rc1.

thanks,
-chris


Re: 2.6.29 git, resume from ram broken on thinkpad

2009-04-01 Thread Chris Wright
* Rafael J. Wysocki (r...@sisk.pl) wrote:
> Sorry for the misunderstanding, I thought the breakage might be introduced
> between 15f7176eb1cccec0a332541285ee752b935c1c85 and
> 0a0c5168df270a50e3518e4f12bddb31f8f5f38f, so I thought it would be a good
> idea to verify if 0a0c5168df270a50e3518e4f12bddb31f8f5f38f fails too.

Ah, sure.  It fails too (both test_suspend=mem and regular suspend/resume).


Re: 2.6.29 git, resume from ram broken on thinkpad

2009-04-01 Thread Chris Wright
* Rafael J. Wysocki (r...@sisk.pl) wrote:
> On Wednesday 01 April 2009, Chris Wright wrote:
> > * Rafael J. Wysocki (r...@sisk.pl) wrote:
> > > This may be caused by the recent PM changes.  Can you please test if 
> > > commit
> > > 8efb8c76fcdccf5050c0ea059dac392789baaff2 is fine?
> > 
> > I just tested on my t400, it's not[1].  See same symptoms as Arkadiusz.
> > Seems as if it responds to initial ACPI event, I see some disk activity,
> > then nothing (as if it tried to wakeup, then went back to sleep).
> > 
> > [1] 15f7176eb1cccec0a332541285ee752b935c1c85 bad
> > fae3e7fba4c664b3a15f2cf15ac439e8d754afc2 good(ish) (display stays blank)
> 
> Hmm, can you also test commit 0a0c5168df270a50e3518e4f12bddb31f8f5f38f, 
> please?

Well, that's already included in 15f7176eb1cccec0a332541285ee752b935c1c85


Re: 2.6.29 git, resume from ram broken on thinkpad

2009-04-01 Thread Chris Wright
* Rafael J. Wysocki (r...@sisk.pl) wrote:
> This may be caused by the recent PM changes.  Can you please test if commit
> 8efb8c76fcdccf5050c0ea059dac392789baaff2 is fine?

I just tested on my t400, it's not[1].  See same symptoms as Arkadiusz.
Seems as if it responds to initial ACPI event, I see some disk activity,
then nothing (as if it tried to wakeup, then went back to sleep).

[1] 15f7176eb1cccec0a332541285ee752b935c1c85 bad
fae3e7fba4c664b3a15f2cf15ac439e8d754afc2 good(ish) (display stays blank)


Re: 2.6.29 git, resume from ram broken on thinkpad

2009-04-01 Thread Chris Wright
* Arkadiusz Miskiewicz (a.miskiew...@gmail.com) wrote:
> and this as bad commit:
> 
> 7f7ace0cda64c99599c23785f8979a072e118058 is first bad commit

Does it make any difference if you roll fwd a couple commits to:

802bf931f2688ad125b73db597ce63cc842fb27a

That fixes a possible problem with the cpumask change and interrupt
migration.

And to be clear, one commit earlier works?

fae3e7fba4c664b3a15f2cf15ac439e8d754afc2

thanks,
-chris


[patch 42/45] lguest: wire up pte_update/pte_update_defer

2009-03-31 Thread Chris Wright
-stable review patch.  If anyone has any objections, please let us know.
-

From: Rusty Russell 

upstream commit: b7ff99ea53cd16de8f6166c0e98f19a7c6ca67ee

Impact: intermittent guest segv/crash fix

I've been seeing random guest bad address crashes and segmentation faults:
bisect led to 4f98a2fee8 (vmscan: split LRU lists into anon & file sets),
but that's a red herring.

It turns out that lguest never hooked up the pte_update/pte_update_defer
calls, so our ptes were not always in sync.  After the vmscan commit, the
bug became reproducible; now a fsck in a 64MB guest causes reproducible
pagetable corruption.

Signed-off-by: Rusty Russell 
Cc: jer...@xensource.com
Cc: virtualizat...@lists.osdl.org
Cc: sta...@kernel.org
Signed-off-by: Chris Wright 
---
 arch/x86/lguest/boot.c |   10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -485,11 +485,17 @@ static void lguest_write_cr4(unsigned lo
  * into a process' address space.  We set the entry then tell the Host the
  * toplevel and address this corresponds to.  The Guest uses one pagetable per
  * process, so we need to tell the Host which one we're changing (mm->pgd). */
+static void lguest_pte_update(struct mm_struct *mm, unsigned long addr,
+  pte_t *ptep)
+{
+   lazy_hcall(LHCALL_SET_PTE, __pa(mm->pgd), addr, ptep->pte_low);
+}
+
 static void lguest_set_pte_at(struct mm_struct *mm, unsigned long addr,
  pte_t *ptep, pte_t pteval)
 {
*ptep = pteval;
-   lazy_hcall(LHCALL_SET_PTE, __pa(mm->pgd), addr, pteval.pte_low);
+   lguest_pte_update(mm, addr, ptep);
 }
 
 /* The Guest calls this to set a top-level entry.  Again, we set the entry then
@@ -1034,6 +1040,8 @@ __init void lguest_init(void)
pv_mmu_ops.read_cr3 = lguest_read_cr3;
pv_mmu_ops.lazy_mode.enter = paravirt_enter_lazy_mmu;
pv_mmu_ops.lazy_mode.leave = lguest_leave_lazy_mode;
+   pv_mmu_ops.pte_update = lguest_pte_update;
+   pv_mmu_ops.pte_update_defer = lguest_pte_update;
 
 #ifdef CONFIG_X86_LOCAL_APIC
/* apic read/write intercepts */



Re: [PATCH 1/2] virtio: block: set max_segment_size and max_sectors to infinite.

2008-12-03 Thread Chris Wright
* Rusty Russell ([EMAIL PROTECTED]) wrote:
> Setting max_segment_size allows more than 64k per sg element, unless
> the host specified a limit.  Setting max_sectors indicates that our
> max_hw_segments is the only limit.

We had been using a simple hardcoded constant to increase max sectors to
improve throughput.  This (along with 2/2) are tested and showing nice numbers.

Acked-by: Chris Wright <[EMAIL PROTECTED]>


Re: [SR-IOV driver example 2/3] PF driver: integrate with SR-IOV core

2008-11-26 Thread Chris Wright
* Greg KH ([EMAIL PROTECTED]) wrote:
> > +static int
> > +igb_virtual(struct pci_dev *pdev, int nr_virtfn)
> > +{
> > +   unsigned char my_mac_addr[6] = {0x00, 0xDE, 0xAD, 0xBE, 0xEF, 0xFF};
> > +   struct net_device *netdev = pci_get_drvdata(pdev);
> > +   struct igb_adapter *adapter = netdev_priv(netdev);
> > +   int i;
> > +
> > +   if (nr_virtfn > 7)
> > +   return -EINVAL;
> 
> Why the check for 7?  Is that the max virtual functions for this card?
> Shouldn't that be a define somewhere so it's easier to fix in future
> versions of this hardware?  :)

IIRC it's 8 for the card, 1 reserved for PF.  I think both notions
should be captured w/ commented constants.
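
A minimal sketch of what such commented constants could look like, assuming the recalled numbers (8 functions on the card, 1 of them the PF) are right. The `IGB_SKETCH_` names and the helper are hypothetical and not taken from the igb driver; the lower-bound check is an addition that the quoted code does not have.

```c
#include <errno.h>

/* Hypothetical names; the values follow the numbers recalled in this thread. */
#define IGB_SKETCH_MAX_FUNCTIONS	8	/* functions the card exposes in total */
#define IGB_SKETCH_PF_RESERVED		1	/* one of them is the physical function */
#define IGB_SKETCH_MAX_VFS \
	(IGB_SKETCH_MAX_FUNCTIONS - IGB_SKETCH_PF_RESERVED)

/* Returns 0 if the requested VF count fits the card, -EINVAL otherwise. */
int igb_sketch_check_nr_virtfn(int nr_virtfn)
{
	if (nr_virtfn < 0 || nr_virtfn > IGB_SKETCH_MAX_VFS)
		return -EINVAL;
	return 0;
}
```

Keeping the two notions separate means a later card with more functions only needs one constant changed, and the magic 7 disappears from the check.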

thanks,
-chris


Re: [PATCH 1/2] virtio: block: set max_segment_size and max_sectors to infinite.

2008-11-26 Thread Chris Wright
* Rusty Russell ([EMAIL PROTECTED]) wrote:
> + /* No real sector limit. */
> + blk_queue_max_sectors(vblk->disk->queue, -1U);
> +

Is that actually legitimate?  I think it'd still work out, but seems
odd, e.g. all the spots that do:

q->max_hw_sectors << 9

will just toss the upper bits...
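
To make the concern concrete, here is a small user-space sketch of the sectors-to-bytes shift. It assumes the sector limit is held in a 32-bit unsigned value, which the -1U in the patch suggests but which is not confirmed in this thread; the function names are made up for illustration.

```c
#include <stdint.h>

/* 32-bit arithmetic: the top 9 bits of the sector count are shifted out. */
uint32_t sectors_to_bytes_u32(uint32_t sectors)
{
	return sectors << 9;
}

/* Widen first, then shift: no bits are lost. */
uint64_t sectors_to_bytes_u64(uint32_t sectors)
{
	return (uint64_t)sectors << 9;
}
```

With a limit of -1U the 32-bit version yields 0xFFFFFE00, still a very large byte count, which is why it probably works out in practice, while the widened version yields 0x1FFFFFFFE00.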

thanks,
-chris


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Chris Wright
* Greg KH ([EMAIL PROTECTED]) wrote:
> On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
> > On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote:
> > > On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
> > > > I have not modified any existing drivers, but instead I threw together
> > > > a bare-bones module enabling me to make a call to pci_iov_register()
> > > > and then poke at an SR-IOV adapter's /sys entries for which no driver
> > > > was loaded.
> > > > 
> > > > It appears from my perusal thus far that drivers using these new
> > > > SR-IOV patches will require modification; i.e. the driver associated
> > > > with the Physical Function (PF) will be required to make the
> > > > pci_iov_register() call along with the requisite notify() function.
> > > > Essentially this suggests to me a model for the PF driver to perform
> > > > any "global actions" or setup on behalf of VFs before enabling them
> > > > after which VF drivers could be associated.
> > > 
> > > Where would the VF drivers have to be associated?  On the "pci_dev"
> > > level or on a higher one?
> > > 
> > > Will all drivers that want to bind to a "VF" device need to be
> > > rewritten?
> > 
> > The current model being implemented by my colleagues has separate
> > drivers for the PF (aka native) and VF devices.  I don't personally
> > believe this is the correct path, but I'm reserving judgement until I
> > see some code.
> 
> Hm, I would like to see that code before we can properly evaluate this
> interface.  Especially as they are all tightly tied together.
> 
> > I don't think we really know what the One True Usage model is for VF
> > devices.  Chris Wright has some ideas, I have some ideas and Yu Zhao has
> > some ideas.  I bet there's other people who have other ideas too.
> 
> I'd love to hear those ideas.

First there's the question of how to represent the VF on the host.
Ideally (IMO) this would show up as a normal interface so that normal tools
can configure the interface.  This is not exactly how the first round of
patches were designed.

Second there's the question of reserving the BDF on the host such that
we don't have two drivers (one in the host and one in a guest) trying to
drive the same device (an issue that shows up for device assignment as
well as VF assignment).

Third there's the question of whether the VF can be used in the host at
all.

Fourth there's the question of whether the VF and PF drivers are the
same or separate.

The typical usecase is assigning the VF to the guest directly, so
there's only enough functionality in the host side to allocate a VF,
configure it, and assign it (and propagate AER).  This is with separate
PF and VF driver.

As Anthony mentioned, we are interested in allowing the host to use the
VF.  This could be useful for containers as well as dedicating a VF (a
set of device resources) to a guest w/out passing it through.

thanks,
-chris


Re: [PATCH 2/2] virtio_net: Improve the recv buffer allocation scheme

2008-10-09 Thread Chris Wright
* Rusty Russell ([EMAIL PROTECTED]) wrote:
> On Thursday 09 October 2008 06:34:59 Mark McLoughlin wrote:
> > From: Herbert Xu <[EMAIL PROTECTED]>
> >
> > If segmentation offload is enabled by the host, we currently allocate
> > maximum sized packet buffers and pass them to the host. This uses up
> > 20 ring entries, allowing us to supply only 20 packet buffers to the
> > host with a 256 entry ring. This is a huge overhead when receiving
> > small packets, and is most keenly felt when receiving MTU sized
> > packets from off-host.
> 
> There are three approaches we should investigate before adding YA feature.  
> Obviously, we can simply increase the number of ring entries.

Tried that, it didn't help much.  I don't have my numbers handy, but
levelled off at about 512 and was a modest boost.  It's still wasteful
to preallocate like that on the off-chance it's a large packet.

thanks,
-chris


Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.

2008-10-01 Thread Chris Wright
* Anthony Liguori ([EMAIL PROTECTED]) wrote:
> And arguably, storing TSC frequency in CPUID is a terrible interface  
> because the TSC frequency can change any time a guest is entered.  It  

True for older hardware, newer hardware should fix this.  I guess the
point is, there are numbers that are easy to measure incorrectly in guest.
Doesn't justify the whole thing..


Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.

2008-10-01 Thread Chris Wright
* Anthony Liguori ([EMAIL PROTECTED]) wrote:
> We've already gone down the road of trying to make standard paravirtual  
> interfaces (via virtio).  No one was sufficiently interested in  
> collaborating.  I don't see why other paravirtualizations are going to  
> be much different.

The point is to be able to support those interfaces.  Presently a Linux guest
will test and find out which HV it's running on, and adapt.  Another
guest will fail to enlighten itself, and perf will suffer...yadda, yadda.
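
A rough sketch of the "find out which HV it's running on" step, for illustration only. It assumes the guest has already read the 12-byte vendor signature from the hypervisor CPUID leaf (conventionally 0x40000000, with the bytes taken from EBX, ECX, EDX in that order), which this thread does not spell out. The enum, the function name and the choice of signatures listed are assumptions, and the list is not exhaustive.

```c
#include <string.h>

enum hv_kind { HV_UNKNOWN, HV_KVM, HV_VMWARE, HV_XEN, HV_HYPERV };

/*
 * sig points at the 12 signature bytes. The byte strings below are the
 * commonly published vendor signatures; treat them as illustrative.
 */
enum hv_kind hv_classify(const char *sig)
{
	if (!memcmp(sig, "KVMKVMKVM\0\0\0", 12))
		return HV_KVM;
	if (!memcmp(sig, "VMwareVMware", 12))
		return HV_VMWARE;
	if (!memcmp(sig, "XenVMMXenVMM", 12))
		return HV_XEN;
	if (!memcmp(sig, "Microsoft Hv", 12))
		return HV_HYPERV;
	return HV_UNKNOWN;	/* guest stays unenlightened */
}
```

The HV_UNKNOWN branch is the case described above: a guest that does not recognize the signature simply runs without its paravirtual optimizations.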

thanks,
-chris


Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.

2008-10-01 Thread Chris Wright
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> "What hypervisor is this?" isn't a very interesting question; if you're  
> even asking it then it suggests that something has gone wrong.

It's essentially already happening.  Everyone wants to be a better
hyperv than hyperv ;-)


Re: [PATCH 1/10] add missing parameter for lookup_address

2008-01-18 Thread Chris Wright
* Glauber de Oliveira Costa ([EMAIL PROTECTED]) wrote:
> lookup_address() receives two parameters, but efi_64.c call
> is passing only one. It's actually preventing the tree from compiling
> 
> Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>

Good catch, I know I don't test with CONFIG_EFI=y


Re: [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation

2008-01-03 Thread Chris Wright
* Steven Rostedt ([EMAIL PROTECTED]) wrote:
> On Thu, 3 Jan 2008, Chris Wright wrote:
> > Yes, paravirt ops have a well-specified calling convention (register
> > based).  There was a cleanup that Andi did that caused the problem
> > because it removed all the "fastcall" annotations since -mregparm=3
> > is now always on for i386.  Since MCOUNT disables REGPARM, the calling
> > convention changes (caller pushes to stack, callee expects register)
> > and chaos ensues.  I sent a patch to fix that quite some months back, but
> > it went stale and I neglected to update it.  Would you like me to dig
> > it up, refresh, and resend?
> 
> Chris, thanks for the refresher.
> 
> I'm going to see if we can remove the REGPARM hack and change the way
> mcount does its calls. Maybe this will fix things for us.

I don't recall why mcount disables regparm, but I think you're on the
right path to remove that dependency.

thanks,
-chris


Re: [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation

2008-01-03 Thread Chris Wright
* Steven Rostedt ([EMAIL PROTECTED]) wrote:
> Hmm, I know paravirt-ops had an issue with mcount in the RT tree. I can't
> remember the exact issues, but it did have something to do with the way
> parameters were passed in.
> 
> Chris, do you remember what the issues were?

Yes, paravirt ops have a well-specified calling convention (register
based).  There was a cleanup that Andi did that caused the problem
because it removed all the "fastcall" annotations since -mregparm=3
is now always on for i386.  Since MCOUNT disables REGPARM, the calling
convention changes (caller pushes to stack, callee expects register)
and chaos ensues.  I sent a patch to fix that quite some months back, but
it went stale and I neglected to update it.  Would you like me to dig
it up, refresh, and resend?

thanks,
-chris


Re: [PATCH] Add I/O hypercalls for i386 paravirt

2007-08-22 Thread Chris Wright
* James Courtier-Dutton ([EMAIL PROTECTED]) wrote:
> Ok, so I need to get a new CPU like the Intel Core Duo that has VT
> features? I have an old Pentium 4 at the moment, without any VT features.

Depends on your goals.  You can certainly give a paravirt Xen guest[1]
physical hardware without any VT extensions.  But that guest will be
able to DMA anywhere in memory without VT-d, so if it's an untrusted
guest you'd be taking a huge risk.

thanks,
-chris

[1] Note: this is with the xenbits.xensource.com kernel, not with a
kernel you'll get from kernel.org ATM.


Re: [PATCH] Add I/O hypercalls for i386 paravirt

2007-08-22 Thread Chris Wright
* James Courtier-Dutton ([EMAIL PROTECTED]) wrote:
> If one could directly expose a device to the guest, this feature could
> be extremely useful for me.
> Is it possible? How would it manage to handle the DMA bus mastering?

Yes it's possible (Xen supports pci pass through).  Without an IOMMU
(like Intel VT-d or AMD IOMMU) it's not DMA safe.

thanks,
-chris


Re: [PATCH 3/25][V3] irq_flags / halt routines

2007-08-15 Thread Chris Wright
* Glauber de Oliveira Costa ([EMAIL PROTECTED]) wrote:
> Only caveat, is that it has to be done before smp gets in the game, and 
> with interrupts disabled. (which makes the function in vsmp.c not eligible).
> 
> My current option is to force VSMP to use PARAVIRT, as said before, and 
> then fill paravirt_arch_setup, which is currently unused, with code to 
> replace the needed paravirt_ops.fn.
> 
> I don't know if there is any method to dynamically determine (at this 
> point) that we are in a vsmp arch, and if there are not, it will have to 
> get ifdefs anyway. But at least, they are far more local.

Between __cacheline_aligned_in_smp and other compile time bits based on
VSMP specific INTERNODE_CACHE, etc. I think compile time is the way to go.

> I am okay with both, but after all the explanation, I don't think that 
> adding a new pvops is a bad idea. It would make things less cumbersome 
> in this case. Also, hacks like this save_fl may require changes to the 
> hypervisor, right? I don't even know where such hypervisor is, and how 
> easy it is to replace it (it may be deeply hidden in firmware)

No hypervisor change needed.  Just the pv backend needs to return 0 or
X86_EFLAGS_IF for save_flags (and similar translation on restore_flags).
Xen uses a simple shared memory flag and does something which you could
roughly translate into this:

xen_save_flags()
if (xen_vcpu_interrupts_enabled)
return X86_EFLAGS_IF;
else
return 0;

This doesn't require any hypervisor changes.  Similarly, VSMP could do
something along the lines of:

vsmp_save_flags()
flags = native_save_flags();
if (flags & X86_EFLAGS_IF) || (flags & X86_EFLAGS_AC)
return X86_EFLAGS_IF;
else
return 0;
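
The two pseudocode fragments above can be written out as compilable C roughly as follows. This is a sketch under stated assumptions, not the Xen or vSMP implementation: the native flags are passed in as a parameter so it runs in user space, and the `sketch_` names are hypothetical. The vSMP condition is an assumption that deliberately differs from the rough pseudocode: it treats a set AC bit as meaning interrupts are masked, so only IF set with AC clear is reported as enabled. If vSMP's convention is different, only that one condition changes.

```c
#define X86_EFLAGS_IF 0x00000200UL	/* interrupt enable flag, bit 9 */
#define X86_EFLAGS_AC 0x00040000UL	/* alignment check flag, bit 18 */

/* Xen-style: a shared-memory flag stands in for the hardware IF bit. */
unsigned long sketch_xen_save_flags(int vcpu_interrupts_enabled)
{
	return vcpu_interrupts_enabled ? X86_EFLAGS_IF : 0;
}

/*
 * vSMP-style (assumed convention): interrupts count as enabled only when
 * IF is set and AC is clear. Generic code sees either 0 or X86_EFLAGS_IF.
 */
unsigned long sketch_vsmp_save_flags(unsigned long native_flags)
{
	int enabled = (native_flags & X86_EFLAGS_IF) &&
		      !(native_flags & X86_EFLAGS_AC);

	return enabled ? X86_EFLAGS_IF : 0;
}
```

Either way the contract is the same: the backend folds its own notion of "interrupts enabled" into the single IF bit, so no hypervisor change and no extra pvop are needed.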

> A question raises here: Would vsmp turn paravirt_enabled to 1 ?

Probably not.  It's mostly native and I'm not sure it would want the
bits disabled from if (paravirt_enabled()) tests.


Re: [PATCH 3/25][V3] irq_flags / halt routines

2007-08-15 Thread Chris Wright
* Glauber de Oliveira Costa ([EMAIL PROTECTED]) wrote:
> As alternatives what we have now, we can either keep the paravirt_ops as 
> it is now for the native case, just hooking the vsmp functions in place 
> of the normal one, (there are just three ops anyway), refill the 
> paravirt_ops entirely in somewhere like vsmp.c, or similar (or maybe 
> even assigning paravirt_ops.fn = vsmp_fn on the fly, but early enough).

This is the best (just override pvops.fn for the few needed for VSMP).
The irq_disabled_flags() is the only problem.  For i386 we dropped it
(disabled_flags) as a pvop and forced the backend to provide a flags
(via save_flags) that conforms to IF only.
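
As an illustration of overriding only the few ops that differ, here is a user-space sketch. The ops table, the simulated flags register and every `sketch_` name are hypothetical stand-ins and not the kernel's paravirt_ops layout; the backend's save_fl simply masks the flags down to the IF bit, which is one way to "conform to IF only".

```c
#define X86_EFLAGS_IF 0x00000200UL	/* interrupt enable flag, bit 9 */

/* Simulated flags register; a real kernel reads the CPU's EFLAGS instead. */
static unsigned long sim_flags = X86_EFLAGS_IF;

static unsigned long sketch_native_save_fl(void) { return sim_flags; }
static void sketch_native_irq_disable(void) { sim_flags &= ~X86_EFLAGS_IF; }
static void sketch_native_irq_enable(void) { sim_flags |= X86_EFLAGS_IF; }

/* Hypothetical, cut-down ops table: starts out fully native. */
struct sketch_pv_irq_ops {
	unsigned long (*save_fl)(void);
	void (*irq_disable)(void);
	void (*irq_enable)(void);
};

static struct sketch_pv_irq_ops sketch_pv_irq_ops = {
	.save_fl	= sketch_native_save_fl,
	.irq_disable	= sketch_native_irq_disable,
	.irq_enable	= sketch_native_irq_enable,
};

/* Backend version: report nothing but the IF bit to generic code. */
static unsigned long sketch_backend_save_fl(void)
{
	return sketch_native_save_fl() & X86_EFLAGS_IF;
}

/* Early setup: replace only the op that differs, leave the rest native. */
void sketch_backend_setup(void)
{
	sketch_pv_irq_ops.save_fl = sketch_backend_save_fl;
}
```

Because callers only ever see 0 or X86_EFLAGS_IF from save_fl, a separate irq_disabled_flags op becomes unnecessary: testing the IF bit is enough.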

thanks,
-chris
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH] [545/2many] MAINTAINERS - XEN HYPERVISOR INTERFACE

2007-08-13 Thread Chris Wright
* Randy Dunlap ([EMAIL PROTECTED]) wrote:
> On Mon, 13 Aug 2007 11:55:36 -0700 Chris Wright wrote:
> > * [EMAIL PROTECTED] ([EMAIL PROTECTED]) wrote:
> > > +F:   arch/i386/xen/
> > > +F:   drivers/*/xen-*front.c
> > > +F:   drivers/xen/
> > > +F:   include/asm-i386/xen/
> > > +F:   include/xen/
> > 
> > I think this data will easily become stale.  What is the point again?
> 
> Agreed.  But not everyone wants to or should have to use git,
> so what are the alternatives?

Between git (or gitweb), existing MAINTAINERS and a bit of common
sense (or extra sleuthing), I never perceived a significant problem.
Alternative could be to place info directly in source files.  If not
all of MAINTAINERS info, it could be a tag to reference the relevant
MAINTAINERS entry.

thanks,
-chris


Re: [PATCH] [545/2many] MAINTAINERS - XEN HYPERVISOR INTERFACE

2007-08-13 Thread Chris Wright
* [EMAIL PROTECTED] ([EMAIL PROTECTED]) wrote:
> Add file pattern to MAINTAINER entry
> 
> Signed-off-by: Joe Perches <[EMAIL PROTECTED]>
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index e4e1cc3..8395aba 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5153,6 +5153,11 @@ M: [EMAIL PROTECTED]
>  L:   [EMAIL PROTECTED]
>  L:   [EMAIL PROTECTED]
>  S:   Supported
> +F:   arch/i386/xen/
> +F:   drivers/*/xen-*front.c
> +F:   drivers/xen/
> +F:   include/asm-i386/xen/
> +F:   include/xen/

I think this data will easily become stale.  What is the point again?


Re: [PATCH] [365/2many] MAINTAINERS - PARAVIRT_OPS INTERFACE

2007-08-13 Thread Chris Wright
* [EMAIL PROTECTED] ([EMAIL PROTECTED]) wrote:
> Add file pattern to MAINTAINER entry
> 
> Signed-off-by: Joe Perches <[EMAIL PROTECTED]>
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 2166416..44768ce 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3519,6 +3519,9 @@ M:  [EMAIL PROTECTED]
>  L:   [EMAIL PROTECTED]
>  L:   [EMAIL PROTECTED]
>  S:   Supported
> +F:   arch/i386/kernel/paravirt.c
> +F:   include/asm-i386/paravirt.h
> +F:   include/asm-um/paravirt.h

Not asm-um.  And it's much more spread out than that, touching many
non-paravirt specific files (as in include/asm-i386/ and search for
CONFIG_PARAVIRT).  I'm failing to see the value of this churn.

thanks,
-chris


Re: [PATCH] VMI: remove CONFIG_DEBUG_PAGE_TYPE and associated bitrotted code

2007-07-06 Thread Chris Wright
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> Zachary Amsden wrote:
> >I'd rather keep it, even with bitrot - it was non-trivial to get 
> >correct, and found many surprises in the code; most notably, it can 
> >detect
> >
> >1) PTE writes to pages not declared as page tables
> >2) Failure to allocate or de-allocate page tables using the 
> >paravirt-ops API
> >3) PTE writes using the wrong level operations
> >
> >These are most useful properties; in fact, I would like to extend the 
> >code for 64-bit paravirt-ops and 4-level paging, so rather not kill it 
> >until then.
> >
> >I never merged the whole bit upstream because it added a field to 
> >struct page. 
> 
> Hm, is that a big problem?  It would be OK for a debug config option, 
> wouldn't it?  Also, it doesn't seem particularly vmi-specific.  Could it 
> be made part of the pvops infrastructure?

I'm pretty sure lguest64 hit some of the problems Zach is trying to
catch, so should generalize well-enough.
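
For illustration only, here is a minimal sketch of the kind of page-type
tracking being discussed, covering the three checks Zach lists. It is not
the VMI code: the enum, the shadow array (standing in for the extra struct
page field) and all function names are hypothetical, and errors are
returned rather than raised with BUG_ON so the sketch can run in userspace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page types; not the real VMI_PAGE_* constants. */
enum pt_type { PT_NORMAL = 0, PT_L1, PT_L2 };

#define NR_PAGES 16

/* Shadow of the per-page type field; zero-initialized to PT_NORMAL. */
static enum pt_type page_type[NR_PAGES];

/* Declare a page as a page table. Returns -1 if it was already declared. */
int pt_alloc(size_t pfn, enum pt_type type)
{
	assert(pfn < NR_PAGES && type != PT_NORMAL);
	if (page_type[pfn] != PT_NORMAL)
		return -1;
	page_type[pfn] = type;
	return 0;
}

/* Release a page table. Returns -1 if the page was never declared. */
int pt_release(size_t pfn)
{
	assert(pfn < NR_PAGES);
	if (page_type[pfn] == PT_NORMAL)
		return -1;
	page_type[pfn] = PT_NORMAL;
	return 0;
}

/*
 * Validate a PTE write: the target page must have been declared, and at
 * the same level as the operation used to write it.
 */
int pt_check_write(size_t pfn, enum pt_type level)
{
	assert(pfn < NR_PAGES);
	return page_type[pfn] == level ? 0 : -1;
}
```

Nothing here is tied to one hypervisor, which is the reason it could
plausibly live in common pvops code.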

thanks,
-chris


[PATCH] VMI: remove CONFIG_DEBUG_PAGE_TYPE and associated bitrotted code

2007-07-06 Thread Chris Wright
* Stefan Richter ([EMAIL PROTECTED]) wrote:
> > -#ifdef CONFIG_DEBUG_PAGE_TYPE
> > +#if 0 /* debug page type */

> This misnamed CONFIG_DEBUG_PAGE_TYPE (it's not a Kconfig variable) has
> about 120 lines debug code dangling on it.  So, replacing it by #if 0
> will hopefully motivate a kind janitor to send a removal patch for that
> debug code eventually.  I don't do so just now because that code went in
> between 2.6.20 and 2.6.21-rc1, i.e. not so long ago.

This is Zach's code, his final call.  I know it was pretty useful early
on, and used to be an actual Kconfig option for VMI.  However, it's
completely disconnected; the setup call to vmi_apply_boot_page_allocations
isn't merged and the page->type field isn't either (no surprise on that),
and some of the VMI_PAGE_ constants have changed names.  Clearly, it is
ripe for bitrot (already has !CONFIG_NEED_MULTIPLE_NODES dependency,
dunno if VMI has the same limitation).  It definitely should not have
a misleading Kconfig name.  I'd nuke it all rather than #if 0.

thanks,
-chris
--

Subject: [PATCH] VMI: remove CONFIG_DEBUG_PAGE_TYPE and associated bitrotted code

From: Chris Wright <[EMAIL PROTECTED]>

Remove the poorly named (non Kconfig) CONFIG_DEBUG_PAGE_TYPE compile
time conditional as well as the associated bitrotted debug code.

Cc: "Robert P. J. Day" <[EMAIL PROTECTED]>
Cc: Stefan Richter <[EMAIL PROTECTED]>
Cc: Zachary Amsden <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---

diff --git a/arch/i386/kernel/vmi.c b/arch/i386/kernel/vmi.c
index c12720d..45c55e4 100644
--- a/arch/i386/kernel/vmi.c
+++ b/arch/i386/kernel/vmi.c
@@ -235,109 +235,6 @@ static void vmi_nop(void)
 {
 }
 
-#ifdef CONFIG_DEBUG_PAGE_TYPE
-
-#ifdef CONFIG_X86_PAE
-#define MAX_BOOT_PTS (2048+4+1)
-#else
-#define MAX_BOOT_PTS (1024+1)
-#endif
-
-/*
- * During boot, mem_map is not yet available in paging_init, so stash
- * all the boot page allocations here.
- */
-static struct {
-   u32 pfn;
-   int type;
-} boot_page_allocations[MAX_BOOT_PTS];
-static int num_boot_page_allocations;
-static int boot_allocations_applied;
-
-void vmi_apply_boot_page_allocations(void)
-{
-   int i;
-   BUG_ON(!mem_map);
-   for (i = 0; i < num_boot_page_allocations; i++) {
-   struct page *page = pfn_to_page(boot_page_allocations[i].pfn);
-   page->type = boot_page_allocations[i].type;
-   page->type = boot_page_allocations[i].type &
-   ~(VMI_PAGE_ZEROED | VMI_PAGE_CLONE);
-   }
-   boot_allocations_applied = 1;
-}
-
-static void record_page_type(u32 pfn, int type)
-{
-   BUG_ON(num_boot_page_allocations >= MAX_BOOT_PTS);
-   boot_page_allocations[num_boot_page_allocations].pfn = pfn;
-   boot_page_allocations[num_boot_page_allocations].type = type;
-   num_boot_page_allocations++;
-}
-
-static void check_zeroed_page(u32 pfn, int type, struct page *page)
-{
-   u32 *ptr;
-   int i;
-   int limit = PAGE_SIZE / sizeof(int);
-
-   if (page_address(page))
-   ptr = (u32 *)page_address(page);
-   else
-   ptr = (u32 *)__va(pfn << PAGE_SHIFT);
-   /*
-* When cloning the root in non-PAE mode, only the userspace
-* pdes need to be zeroed.
-*/
-   if (type & VMI_PAGE_CLONE)
-   limit = USER_PTRS_PER_PGD;
-   for (i = 0; i < limit; i++)
-   BUG_ON(ptr[i]);
-}
-
-/*
- * We stash the page type into struct page so we can verify the page
- * types are used properly.
- */
-static void vmi_set_page_type(u32 pfn, int type)
-{
-   /* PAE can have multiple roots per page - don't track */
-   if (PTRS_PER_PMD > 1 && (type & VMI_PAGE_PDP))
-   return;
-
-   if (boot_allocations_applied) {
-   struct page *page = pfn_to_page(pfn);
-   if (type != VMI_PAGE_NORMAL)
-   BUG_ON(page->type);
-   else
-   BUG_ON(page->type == VMI_PAGE_NORMAL);
-   page->type = type & ~(VMI_PAGE_ZEROED | VMI_PAGE_CLONE);
-   if (type & VMI_PAGE_ZEROED)
-   check_zeroed_page(pfn, type, page);
-   } else {
-   record_page_type(pfn, type);
-   }
-}
-
-static void vmi_check_page_type(u32 pfn, int type)
-{
-   /* PAE can have multiple roots per page - skip checks */
-   if (PTRS_PER_PMD > 1 && (type & VMI_PAGE_PDP))
-   return;
-
-   type &= ~(VMI_PAGE_ZEROED | VMI_PAGE_CLONE);
-   if (boot_allocations_applied) {
-   struct page *page = pfn_to_page(pfn);
-   BUG_ON((page->type ^ type) & VMI_PAGE_PAE);
-   BUG_ON(type == VMI_PAGE_NORMAL && page->type);
-   BUG_ON((type & page-

Re: changing definition of paravirt_ops.iret

2007-05-21 Thread Chris Wright
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> I'm implementing a more efficient version of the Xen iret paravirt_op,
> so that it can use the real iret instruction where possible.  I really
> need to get access to per-cpu variables, so I can set the event mask
> state in the vcpu_info structure, but unfortunately at the point where
> INTERRUPT_RETURN is used in entry.S, the usermode %fs has already been
> restored.
> 
> How would you feel if we changed paravirt_ops.iret to make it also
> responsible for restoring %fs? 

This is definitely an ad-hoc semantic change, but I don't see a better
way to do it (other than have iret be restore_regs_and_iret, which
isn't really an improvement).

thanks,
-chris


Re: [1/2] [NET] link_watch: Move link watch list into net_device

2007-05-10 Thread Chris Wright
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> Yep, this patch gets rid of my spinning thread.  I can't find this patch
> or any discussion on marc.info; is there a better netdev list archive?

See the "linkwatch bustage in git-net" thread on netdev

http://thread.gmane.org/gmane.linux.network/61800/focus=61812


Re: [patch 31/32] xen: --- drivers/net/xen-netfront.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)

2007-05-02 Thread Chris Wright
* Herbert Xu ([EMAIL PROTECTED]) wrote:
> Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
> > ===
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> > @@ -1213,10 +1213,10 @@ static int netif_poll(struct net_device 
> 
> Any reason why xen-netfront isn't just in a single patch? It makes
> it a bit hard to review having it scattered around like this.

It simply maps directly to the patch queue.  We do go back and fold
things in and that should probably be done again, I agree.

thanks,
-chris


[ANNOUNCE] paravirt_ops i686 Fedora rawhide kernel packages

2007-05-02 Thread Chris Wright
If you are interested in trying the new paravirt_ops kernels, I've
created some packages and uploaded them to:

  http://et.redhat.com/~chrisw/paravirt_ops/

There you'll find a yum repo (see yum.README for details) and an
INSTALL file for getting started.  With this package you can use a
single kernel[1] for both native (w/out a hypervisor, or under a full
virt hypervisor), domU paravirt under Xen, and a VMI guest.  There is no
support for dom0 kernels ATM.  Running the kernel native is as simple as
installing and rebooting.  However, it requires some manual intervention
to get a domU started (see INSTALL for details).

thanks,
-chris

[1] It's not quite a single kernel.  It's two separate binaries built from
the identical vmlinux.  This is due to the Xen domain builder's inability
to boot a bzImage format kernel binary.  Fixing this is on the todo list.


Re: [PATCH 10/28] i386: map enough initial memory to create lowmem mappings

2007-04-25 Thread Chris Wright
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> Eric W. Biederman wrote:
> > Then why you had to allocate enough pages to cause a failure has me stumped.
> > Perhaps there is some other bug?
> 
> Perhaps, but nothing comes to mind. I'll see what happens when I boot
> this kernel on real hardware (rather than kvm).

I was using real hardware with your .config when I reproduced it.


Re: paravirt repo rebased to 2.6.21-rc6-mm1

2007-04-12 Thread Chris Wright
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> Seems to work OK for native and Xen.  I had to play a bit with the
> paravirt-sched-clock patch to deal with the VMI changes.  Zach, can you
> check that it still works?

Here's what's working for me on UP.  The boot cpu on UP is never getting
the GDT loaded.  Also __KERNEL_PERCPU is a bad selector on UP (ifdef
was Jeremy's idea, I was just hacking entry.S to get things to boot ;-).

diff -r cde7063e34bd arch/i386/kernel/cpu/common.c
--- a/arch/i386/kernel/cpu/common.c Wed Apr 11 18:07:02 2007 -0700
+++ b/arch/i386/kernel/cpu/common.c Thu Apr 12 01:36:00 2007 -0700
@@ -644,6 +644,18 @@ struct pt_regs * __devinit idle_regs(str
return regs;
 }
 
+/* Current gdt points %fs at the "master" per-cpu area: after this,
+ * it's on the real one. */
+void switch_to_new_gdt(void)
+{
+   struct Xgt_desc_struct gdt_descr;
+
+   gdt_descr.address = (long)get_cpu_gdt_table(smp_processor_id());
+   gdt_descr.size = GDT_SIZE - 1;
+   load_gdt(&gdt_descr);
+   asm("mov %0, %%fs" : : "r" (__KERNEL_PERCPU) : "memory");
+}
+
 /*
  * cpu_init() initializes state that is per-CPU. Some data is already
  * initialized (naturally) in the bootstrap process, such as the GDT
@@ -674,6 +688,7 @@ void __cpuinit cpu_init(void)
}
 
load_idt(&idt_descr);
+   switch_to_new_gdt();
 
/*
 * Set up and load the per-CPU TSS and LDT
diff -r cde7063e34bd arch/i386/kernel/smpboot.c
--- a/arch/i386/kernel/smpboot.c Wed Apr 11 18:07:02 2007 -0700
+++ b/arch/i386/kernel/smpboot.c Thu Apr 12 01:35:05 2007 -0700
@@ -1176,18 +1176,6 @@ void __init native_smp_prepare_cpus(unsi
smp_boot_cpus(max_cpus);
 }
 
-/* Current gdt points %fs at the "master" per-cpu area: after this,
- * it's on the real one. */
-static inline void switch_to_new_gdt(void)
-{
-   struct Xgt_desc_struct gdt_descr;
-
-   gdt_descr.address = (long)get_cpu_gdt_table(smp_processor_id());
-   gdt_descr.size = GDT_SIZE - 1;
-   load_gdt(&gdt_descr);
-   asm("mov %0, %%fs" : : "r" (__KERNEL_PERCPU) : "memory");
-}
-
 void __init native_smp_prepare_boot_cpu(void)
 {
unsigned int cpu = smp_processor_id();
diff -r cde7063e34bd include/asm-i386/processor.h
--- a/include/asm-i386/processor.h  Wed Apr 11 18:07:02 2007 -0700
+++ b/include/asm-i386/processor.h  Thu Apr 12 01:37:44 2007 -0700
@@ -777,6 +777,7 @@ extern int sysenter_setup(void);
 extern int sysenter_setup(void);
 
 extern void cpu_set_gdt(int);
+extern void switch_to_new_gdt(void);
 extern void cpu_init(void);
 
 #endif /* __ASM_I386_PROCESSOR_H */
diff -r cde7063e34bd include/asm-i386/segment.h
--- a/include/asm-i386/segment.h Wed Apr 11 18:07:02 2007 -0700
+++ b/include/asm-i386/segment.h Thu Apr 12 01:45:24 2007 -0700
@@ -75,7 +75,11 @@
 #define __ESPFIX_SS (GDT_ENTRY_ESPFIX_SS * 8)
 
 #define GDT_ENTRY_PERCPU   (GDT_ENTRY_KERNEL_BASE + 15)
+#ifdef CONFIG_SMP
 #define __KERNEL_PERCPU (GDT_ENTRY_PERCPU * 8)
+#else
+#define __KERNEL_PERCPU 0
+#endif
 
 #define GDT_ENTRY_DOUBLEFAULT_TSS  31
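
As a side note on the segment.h hunk, the sketch below shows how an x86
segment selector is built from a GDT index, and why 0 is a safe value on
UP: selector 0 is the null selector, which can be loaded into %fs without
referencing any descriptor. The value 12 for GDT_ENTRY_KERNEL_BASE is an
assumption (it is not visible in the patch), and the function names are
made up for the example.

```c
#include <assert.h>

/* Assumed value; GDT_ENTRY_KERNEL_BASE is not shown in the patch above. */
#define GDT_ENTRY_KERNEL_BASE	12
#define GDT_ENTRY_PERCPU	(GDT_ENTRY_KERNEL_BASE + 15)

/*
 * Selector layout: bits 15..3 = descriptor index, bit 2 = table
 * indicator (0 = GDT, 1 = LDT), bits 1..0 = requested privilege level.
 */
unsigned int make_selector(unsigned int index, unsigned int ti,
			   unsigned int rpl)
{
	assert(ti <= 1 && rpl <= 3);
	return (index << 3) | (ti << 2) | rpl;
}

/* Mirrors the #ifdef CONFIG_SMP split: UP gets the null selector. */
unsigned int kernel_percpu_selector(int smp)
{
	return smp ? make_selector(GDT_ENTRY_PERCPU, 0, 0) : 0;
}
```

For a ring-0 GDT selector the expression reduces to index * 8, which is
the form the header uses.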
 


Re: paravirt repo rebased to 2.6.21-rc6-mm1

2007-04-10 Thread Chris Wright
* Zachary Amsden ([EMAIL PROTECTED]) wrote:
> Jeremy Fitzhardinge wrote:
> >Seems to work OK for native and Xen.  I had to play a bit with the
> >paravirt-sched-clock patch to deal with the VMI changes.  Zach, can you
> >check that it still works?
> 
> I'm on it.

Not sure about cycles_2_ns...

arch/i386/kernel/built-in.o: In function `activate_vmi':
/home/chrisw/hg/xen/linux-2.6-pv/arch/i386/kernel/vmi.c:894: undefined reference to `vmi_sched_clock'

diff -r a5e50a2e914a arch/i386/kernel/vmiclock.c
--- a/arch/i386/kernel/vmiclock.c   Tue Apr 10 16:20:13 2007 -0700
+++ b/arch/i386/kernel/vmiclock.c   Tue Apr 10 17:07:47 2007 -0700
@@ -64,10 +64,10 @@ int vmi_set_wallclock(unsigned long now)
return 0;
 }
 
-/* paravirt_ops.get_scheduled_cycles = vmi_get_sched_cycles */
-unsigned long long vmi_get_sched_cycles(void)
-{
-   return vmi_timer_ops.get_cycle_counter(VMI_CYCLES_AVAILABLE);
+unsigned long long vmi_sched_clock(void)
+{
+   cycle_t cycles = vmi_timer_ops.get_cycle_counter(VMI_CYCLES_AVAILABLE);
+   return cycles_2_ns(cycles);
 }
 
 /* paravirt_ops.get_cpu_khz = vmi_cpu_khz */
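
On the cycles_2_ns question: a rough sketch of the usual fixed-point
cycles-to-nanoseconds conversion follows. It assumes the scale factor is
derived from the CPU frequency in kHz with a 10-bit shift. Whether that
frequency is the right one for the VMI "available" cycle counter is
exactly the open question, so treat the names and the shift as
assumptions rather than the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

#define SCALE_SHIFT 10	/* assumed fixed-point precision */

/*
 * Nanoseconds per cycle is 10^6 / cpu_khz. Pre-multiply by 2^SCALE_SHIFT
 * so the per-call conversion needs only a multiply and a shift.
 */
uint64_t cyc2ns_scale(uint64_t cpu_khz)
{
	assert(cpu_khz != 0);
	return (UINT64_C(1000000) << SCALE_SHIFT) / cpu_khz;
}

/* Convert a cycle count to nanoseconds with a precomputed scale. */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t scale)
{
	return (cycles * scale) >> SCALE_SHIFT;
}
```

With a 2 GHz clock (cpu_khz = 2000000) the scale is 512, so 2000 cycles
convert to 1000 ns. The multiply can overflow for very large cycle
counts, which a real implementation has to account for.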


Re: paravirt repo rebased to 2.6.21-rc6-mm1

2007-04-10 Thread Chris Wright
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> Seems to work OK for native and Xen.  I had to play a bit with the
> paravirt-sched-clock patch to deal with the VMI changes.  Zach, can you
> check that it still works?

Cool, thanks for the rebase.  Here's some small fixes.

Minor issue with CONFIG_STACK_UNWIND in -mm vs. pda-to-percpu:

diff -r a5e50a2e914a include/asm-i386/unwind.h
--- a/include/asm-i386/unwind.h Tue Apr 10 16:20:13 2007 -0700
+++ b/include/asm-i386/unwind.h Tue Apr 10 16:34:58 2007 -0700
@@ -71,7 +71,7 @@ static inline void arch_unw_init_blocked
info->regs.xss = __KERNEL_DS;
info->regs.xds = __USER_DS;
info->regs.xes = __USER_DS;
-   info->regs.xfs = __KERNEL_PDA;
+   info->regs.xfs = __KERNEL_PERCPU;
 }
 
 extern asmlinkage int arch_unwind_init_running(struct unwind_frame_info *,

Another issue when !SMP from xen-smp.patch.  leave_mm was SMP only before,
it basically still is except for one call from xen_exit_mmap().  This is
the simplest sol'n.


diff -r a5e50a2e914a include/asm-i386/mmu_context.h
--- a/include/asm-i386/mmu_context.h Tue Apr 10 16:20:13 2007 -0700
+++ b/include/asm-i386/mmu_context.h Tue Apr 10 16:45:58 2007 -0700
@@ -41,9 +41,11 @@ static inline void enter_lazy_tlb(struct
  */
 static inline void leave_mm (unsigned long cpu)
 {
+#ifdef CONFIG_SMP
if (per_cpu(cpu_tlbstate, cpu).state == TLBSTATE_OK)
BUG();
cpu_clear(cpu, per_cpu(cpu_tlbstate, cpu).active_mm->cpu_vm_mask);
+#endif
load_cr3(swapper_pg_dir);
 }
 


Re: [PATCH 9/10] Vmi timer update.patch

2007-04-10 Thread Chris Wright
* Zachary Amsden ([EMAIL PROTECTED]) wrote:
> Jeremy Fitzhardinge wrote:
> >Why not submit a patch to do what you need here?  (The Geode comment is
> >a bit worrying though.)
> 
> Why should VMI add workaround into PIT code?

I'm not sure it's a workaround, seems more like a subtle diff (perhaps
it's just an oversight).  I need to rectify it anyway when merging in
the x86_64 version.  It's the way the x86_64 code is working already.
Shutdown for most clockevents tells device to stop ticking.


Re: [PATCH 9/10] Vmi timer update.patch

2007-04-10 Thread Chris Wright
* Zachary Amsden ([EMAIL PROTECTED]) wrote:
> Yes, but unfortunately that is a nop:
> 
> /*
>  * Avoid unnecessary state transitions, as it confuses
>  * Geode / Cyrix based boxen.
>  */
> case CLOCK_EVT_MODE_SHUTDOWN:
> if (evt->mode == CLOCK_EVT_MODE_UNUSED)
> break;
> case CLOCK_EVT_MODE_UNUSED:
> if (evt->mode == CLOCK_EVT_MODE_SHUTDOWN)
> break;

This one should be fallthrough case during exchange (mode == PERIODIC)

> case CLOCK_EVT_MODE_ONESHOT:
> /* One shot setup */
> outb_p(0x38, PIT_MODE);
> 
> So switching from PIT to VMI does not disable PIT timer interrupts.  
> Thus I have to keep this part of the patch.

Oh, I was looking at this (x86_64 work I have here):

case CLOCK_EVT_MODE_SHUTDOWN:
case CLOCK_EVT_MODE_UNUSED:
outb_p(0x30, PIT_MODE);
outb_p(0, PIT_CH0); /* LSB */
outb_p(0, PIT_CH0); /* MSB */
break;

That's mode 0, not mode 5, but I think the end result is the same.
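
To make the mode numbers easier to check, here is a small sketch that
packs and unpacks the 8253/8254 PIT control word. The struct and function
names are invented for the example; the bit layout is the standard one
(channel in bits 7-6, access mode in bits 5-4, operating mode in bits
3-1, BCD flag in bit 0).

```c
#include <assert.h>
#include <stdint.h>

struct pit_cmd {
	unsigned int channel;	/* bits 7-6: counter select */
	unsigned int access;	/* bits 5-4: 3 = LSB then MSB */
	unsigned int mode;	/* bits 3-1: operating mode */
	unsigned int bcd;	/* bit 0: 0 = binary, 1 = BCD */
};

/* Split a control word, as written to PIT_MODE, into its fields. */
struct pit_cmd pit_decode(uint8_t cmd)
{
	struct pit_cmd c;

	c.channel = (cmd >> 6) & 0x3;
	c.access  = (cmd >> 4) & 0x3;
	c.mode    = (cmd >> 1) & 0x7;
	c.bcd     = cmd & 0x1;
	return c;
}

/* Build a control word from its fields. */
uint8_t pit_encode(struct pit_cmd c)
{
	assert(c.channel <= 3 && c.access <= 3 && c.mode <= 7 && c.bcd <= 1);
	return (uint8_t)((c.channel << 6) | (c.access << 4) |
			 (c.mode << 1) | c.bcd);
}
```

Decoded this way, 0x30 is channel 0, LSB/MSB, mode 0; 0x3a is the same
with mode 5; and the 0x38 used for the one-shot setup is mode 4.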


Re: [PATCH 9/10] Vmi timer update.patch

2007-04-10 Thread Chris Wright
* Zachary Amsden ([EMAIL PROTECTED]) wrote:
> >>+void __init vmi_time_init(void)
> >>+{
> >>+   /* Disable PIT: BIOSes start PIT CH0 with 18.2hz periodic. */
> >>+   outb_p(0x3a, PIT_MODE); /* binary, mode 5, LSB/MSB, ch 0 */
> >
> >That shouldn't be necessary using clockevents.
> 
> Actually, I'm not so sure.  If clockevents simply masks the PIT when 
> disabling it, we still have overhead of keeping the latch in sync, which 
> requires a timer at the PIT frequency.  I can instrument to see how 
> exactly the PIT gets disabled.

It should switch from pit to vmi-timer, and the switch should do the state
transitions on pit to go to unused mode.

> >>+   vmi_time_init_clockevent();
> >>+   setup_irq(0, &vmi_clock_action);
> >>+}
> >>+
> >>+#ifdef CONFIG_X86_LOCAL_APIC
> >>+void __devinit vmi_time_bsp_init(void)
> >>+{
> >>+   /*
> >>+* On APIC systems, we want local timers to fire on each cpu.  We do
> >>+* this by programming LVTT to deliver timer events to the IRQ handler
> >>+* for IRQ-0, since we can't re-use the APIC local timer handler
> >>+* without interfering with that code.
> >>+*/
> >>+   clockevents_notify(CLOCK_EVT_NOTIFY_SUSPEND, NULL);
> >
> >Why do you do this suspend...
> 
> We need to cancel all pending PIT timer events and restart the local 
> timer, which requires atomically taking over IRQ-0.  We use the IDT gate 
> for IRQ-0 because it is already an exclusive interrupt, but we can't 
> re-use the LVTT IDT gate for local timer since that requires a custom 
> SMP interrupt in entry.S.  So we must be absolutely sure when we 
> get an interrupt on IRQ-0 that it came from the VMI local (rather than 
> PIT) delivery path.

OK, this is why it seems odd.  Clockevents should put pit timer into
unused state.

> >>+   local_irq_disable();
> >>+#ifdef CONFIG_X86_SMP
> >>+   /*
> >>+* XXX handle_percpu_irq only defined for SMP; we need to switch over
> >>+* to using it, since this is a local interrupt, which each CPU must
> >>+* handle individually without locking out or dropping simultaneous
> >>+* local timers on other CPUs.  We also don't want to trigger the
> >>+* quirk workaround code for interrupts which gets invoked from
> >>+* handle_percpu_irq via eoi, so we use our own IRQ chip.
> >>+*/
> >>+   set_irq_chip_and_handler_name(0, &vmi_chip, handle_percpu_irq, "lvtt");
> >>+#else
> >>+   set_irq_chip_and_handler_name(0, &vmi_chip, handle_edge_irq, "lvtt");
> >>+#endif
> >>+   vmi_wiring = VMI_ALARM_WIRED_LVTT;
> >>+   apic_write(APIC_LVTT, vmi_get_timer_vector());
> >>
> >
> >isn't this just your ->startup?
> 
> Which structure has a ->startup function we can use?  Sorry if this 
> seems ignorant, I'm not quite sure what you mean.

The irq_chip.  IOW, it looks like a liberal sprinkling of LVTT vector
initialization.

> >>+   local_irq_enable();
> >>+   clockevents_notify(CLOCK_EVT_NOTIFY_RESUME, NULL);
> >...and resume?  Instead of letting clockevents core handle all of that,
> >and just registering right here?
> 
> It wasn't clear that clockevents would issue a resume notify for us; if 
> so we could handle this setup in the callback, but it has to be done on 
> the correct CPU.  I can try it and see if that works.

I would've expected to simply register the clockevents device right here,
and that should do the proper state transitions on the old device, as well
as the new device.  Why do you need resume notify?
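
A toy model of what that expectation looks like, purely as a sketch: the
types and names below (ce_dev, ce_register and so on) are invented and
much simpler than the real clockevents core, and the ratings are
arbitrary. The point is only that registering a better-rated device is
what drives the old device to the unused state, with no explicit
suspend/resume notification from the driver.

```c
#include <assert.h>
#include <stddef.h>

enum ce_mode { CE_UNUSED, CE_SHUTDOWN, CE_PERIODIC, CE_ONESHOT };

struct ce_dev {
	const char *name;
	int rating;
	enum ce_mode mode;
	/* Driver hook; dev->mode still holds the old mode when called. */
	void (*set_mode)(enum ce_mode mode, struct ce_dev *dev);
};

static struct ce_dev *current_dev;
int ce_hw_writes;	/* counts driver-visible transitions */

/* Trivial driver hook used by both toy devices. */
void ce_count_set_mode(enum ce_mode mode, struct ce_dev *dev)
{
	(void)mode;
	(void)dev;
	ce_hw_writes++;
}

static void ce_set_mode(struct ce_dev *dev, enum ce_mode mode)
{
	if (dev->mode != mode) {
		dev->set_mode(mode, dev);
		dev->mode = mode;
	}
}

/* Register dev; returns 1 if it replaced the current tick device. */
int ce_register(struct ce_dev *dev)
{
	assert(dev && dev->set_mode && dev->mode == CE_UNUSED);
	if (current_dev && current_dev->rating >= dev->rating)
		return 0;
	if (current_dev)
		ce_set_mode(current_dev, CE_UNUSED);	/* old device released */
	ce_set_mode(dev, CE_SHUTDOWN);
	ce_set_mode(dev, CE_PERIODIC);
	current_dev = dev;
	return 1;
}
```

In this model the old device's set_mode hook is where the hardware (the
PIT here) would actually be told to stop ticking.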

thanks,
-chris


Re: [PATCH 9/10] Vmi timer update.patch

2007-04-09 Thread Chris Wright
* Zachary Amsden ([EMAIL PROTECTED]) wrote:
> diff -r c02ab981c99c arch/i386/kernel/vmiclock.c
> --- /dev/null Thu Jan 01 00:00:00 1970 +
> +++ b/arch/i386/kernel/vmiclock.c Mon Apr 09 15:47:17 2007 -0700
> @@ -0,0 +1,318 @@
> +/*
> + * VMI paravirtual timer support routines.
> + *
> + * Copyright (C) 2007, VMware, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT.  See the GNU General Public License for more
> + * details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include "io_ports.h"
> +
> +#define VMI_ONESHOT  (VMI_ALARM_IS_ONESHOT  | VMI_CYCLES_REAL | vmi_get_alarm_wiring())
> +#define VMI_PERIODIC (VMI_ALARM_IS_PERIODIC | VMI_CYCLES_REAL | vmi_get_alarm_wiring())
> +
> +static DEFINE_PER_CPU(struct clock_event_device, local_events);
> +
> +static inline u32 vmi_counter(u32 flags)
> +{
> + /* Given VMI_ONESHOT or VMI_PERIODIC, return the corresponding
> +  * cycle counter. */
> + return flags & VMI_ALARM_COUNTER_MASK;
> +}
> +
> +/* paravirt_ops.get_wallclock = vmi_get_wallclock */

Style nit, these pv_ops.foo = vmi_foo style comments aren't really useful.

> +unsigned long vmi_get_wallclock(void)
> +{
> + unsigned long long wallclock;
> + wallclock = vmi_timer_ops.get_wallclock(); // nsec
> + (void)do_div(wallclock, 1000000000);   // sec
> +
> + return wallclock;
> +}
> +
> +/* paravirt_ops.set_wallclock = vmi_set_wallclock */
> +int vmi_set_wallclock(unsigned long now)
> +{
> + return 0;
> +}
> +
> +/* paravirt_ops.get_scheduled_cycles = vmi_get_sched_cycles */
> +unsigned long long vmi_get_sched_cycles(void)
> +{
> + return vmi_timer_ops.get_cycle_counter(VMI_CYCLES_AVAILABLE);
> +}
> +
> +/* paravirt_ops.get_cpu_khz = vmi_cpu_khz */
> +unsigned long vmi_cpu_khz(void)
> +{
> + unsigned long long khz;
> + khz = vmi_timer_ops.get_cycle_frequency();
> + (void)do_div(khz, 1000);
> + return khz;
> +}
> +
> +static inline unsigned int vmi_get_timer_vector(void)
> +{
> +#ifdef CONFIG_X86_IO_APIC
> + return FIRST_DEVICE_VECTOR;
> +#else
> + return FIRST_EXTERNAL_VECTOR;
> +#endif
> +}
> +
> +/** vmi clockchip */
> +#ifdef CONFIG_X86_LOCAL_APIC
> +static unsigned int startup_timer_irq(unsigned int irq)
> +{
> + unsigned long val = apic_read(APIC_LVTT);
> + apic_write(APIC_LVTT, vmi_get_timer_vector());
> +
> + return (val & APIC_SEND_PENDING);
> +}
> +
> +static void mask_timer_irq(unsigned int irq)
> +{
> + unsigned long val = apic_read(APIC_LVTT);
> + apic_write(APIC_LVTT, val | APIC_LVT_MASKED);
> +}
> +
> +static void unmask_timer_irq(unsigned int irq)
> +{
> + unsigned long val = apic_read(APIC_LVTT);
> + apic_write(APIC_LVTT, val & ~APIC_LVT_MASKED);
> +}
> +
> +static void ack_timer_irq(unsigned int irq)
> +{
> + ack_APIC_irq();
> +}
> +
> +static struct irq_chip vmi_chip __read_mostly = {
> + .name   = "VMI-LOCAL",
> + .startup= startup_timer_irq,
> + .mask   = mask_timer_irq,
> + .unmask = unmask_timer_irq,
> + .ack= ack_timer_irq
> +};
> +#endif
> +
> +/** vmi clockevent */
> +#define VMI_ALARM_WIRED_IRQ00x
> +#define VMI_ALARM_WIRED_LVTT0x0001
> +static int vmi_wiring = VMI_ALARM_WIRED_IRQ0;
> +
> +static inline int vmi_get_alarm_wiring(void)
> +{
> + return vmi_wiring;  
> +}
> +
> +static void vmi_timer_set_mode(enum clock_event_mode mode,
> +struct clock_event_device *evt)
> +{
> + cycle_t now, cycles_per_hz;
> + BUG_ON(!irqs_disabled());
> +
> + switch (mode) {
> + case CLOCK_EVT_MODE_ONESHOT:
> + break;
> + case CLOCK_EVT_MODE_PERIODIC:
> + cycles_per_hz = vmi_timer_ops.get_cycle_frequency();
> + (void)do_div(cycles_per_hz, HZ);
> + now = vmi_timer_ops.get_cycle_counter(vmi_counter(VMI_PERIODIC));
> + vmi_timer_ops.set_alarm(VMI_PERIODIC, now, cycles_per_hz);
> + break;
> + case CLOCK_EVT_MODE_UNUSED:
> + case CLOCK_EVT_MODE_SHUTDOWN:
> + switch (evt->mode) {
> + case CLOCK_EVT_MODE_ONESHOT:
> + vmi_timer_ops.cancel_alarm

Re: New CPUID/MSR driver; virtualization hooks

2007-04-05 Thread Chris Wright
* Zachary Amsden ([EMAIL PROTECTED]) wrote:
> H. Peter Anvin wrote:
> >
> >I actually expect rdmsr/wrmsr to have a higher likelihood of weird 
> >things.  I don't know how many hypervisors can trap and handle those.
> 
> Or implement anything useful at all ... perhaps it might be useful to 
> simply wrap them with an -ENOSYS in some cases.
> 
> Rusty, Jeremy, Chris - any feedback on MSR support?

Drop 'em as far as I'm concerned.  T&E works fine, need to double check
lguest.


Re: [patch 07/20] Allow paravirt backend to choose kernel PMD sharing

2007-04-04 Thread Chris Wright
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> Acked-by: Christoph Lameter <[EMAIL PROTECTED]>
> 
> for all that's worth, since I am not an i386 specialist.
> 
> How much of the issues with page struct sharing between slab and arch code 
> does this address?

I think the answer is 'none yet.'  It uses page sized slab and still
needs pgd_list, for example.  But the mm_list chaining should work too,
so it shouldn't make things any worse.

thanks,
-chris


Re: New CPUID/MSR driver; virtualization hooks

2007-04-04 Thread Chris Wright
* H. Peter Anvin ([EMAIL PROTECTED]) wrote:
> I have finally gotten off the pot and finished writing up my new 
> CPUID/MSR driver, which contains support for registers that need 
> arbitrary GPRs touched.  For i386 vs x86-64 compatibility, both use an 
> x86-64 register image (16 64-bit register fields); this allows 32-bit 
> userspace to access the full 64-bit image if the kernel is 64 bits.
> 
> Anyway, this presumably requires new paravirtualization hooks.  The 
> patch is at:
> 
> http://www.kernel.org/pub/linux/kernel/people/hpa/new-cpuid-msr.patch

Not mirrored out yet
> 
> ... and a git tree is at ...
> 
> http://git.kernel.org/?p=linux/kernel/git/hpa/linux-2.6-cpuidmsr.git;a=summary

Bleah, and gitweb is unhappy ATM too.

> I'm posting this here to give the paravirt maintainers an opportunity to 
> comment.  Presumably the functions that need to be paravirtualized are 
> the ones represented by the functions do_cpuid(), do_rdmsr() and 
> do_wrmsr(): they take a cpu number, an input register image, and an 
> output register image, and return either 0 or -EIO (in case of a trap.)

Yes, so currently cpuid, for example, is like this:

do_cpuid
  cpuid
   __cpuid

Where __cpuid is
native_cpuid() on !CONFIG_PARAVIRT (include/asm-i386/processor.h)
(and this is real asm("cpuid"))
and
paravirt_ops.cpuid() on CONFIG_PARAVIRT (

Without having seen the patch yet, you'll need to make sure
that the final point which is issuing asm("cpuid") is wrapped
and split to CONFIG_PARAVIRT and non CONFIG_PARAVIRT modes.

Similar for rdmsr:

do_rdmsr
  rdmsr_eio
rdmsr_safe

Where rdmsr is paravirtualized
rdmsr is asm("rdmsr") on !CONFIG_PARAVIRT (include/asm-i386/msr.h)
and
paravirt_ops.read_msr() on CONFIG_PARAVIRT (include/asm-i386/paravirt.h)

Similar for do_wrmsr.
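
A compact sketch of that split, under stated assumptions: SKETCH_PARAVIRT
stands in for CONFIG_PARAVIRT, low_cpuid for the lowest-level wrapper,
and native_cpuid here only writes a marker value instead of executing the
real instruction, so the example stays portable. None of these
definitions are taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PARAVIRT 1	/* stands in for CONFIG_PARAVIRT */

struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

/* Made-up markers so the path taken is observable. */
enum { NATIVE_TAG = 1, HV_TAG = 2 };

/* Stand-in for the function that would issue the real asm("cpuid"). */
void native_cpuid(struct cpuid_regs *r) { r->ebx = NATIVE_TAG; }

/* Hypothetical hypervisor backend. */
void hv_cpuid(struct cpuid_regs *r) { r->ebx = HV_TAG; }

struct pv_ops_sketch { void (*cpuid)(struct cpuid_regs *r); };

/* Defaults to native; a hypervisor backend overrides the pointer at boot. */
struct pv_ops_sketch pv_ops = { .cpuid = native_cpuid };

/* The one place where the paravirt and native builds diverge. */
#if SKETCH_PARAVIRT
#define low_cpuid(r) pv_ops.cpuid(r)
#else
#define low_cpuid(r) native_cpuid(r)
#endif

/* Higher-level callers never issue the instruction themselves. */
int do_cpuid(struct cpuid_regs *r)
{
	assert(r != NULL);
	low_cpuid(r);
	return 0;
}
```

As long as every caller funnels through the one wrapper, a new driver
gets the paravirt behaviour without needing hooks of its own.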

Does that answer your question?

thanks,
-chris


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Chris Wright
* Eric W. Biederman ([EMAIL PROTECTED]) wrote:
> Is it truly critical to inline any of these instructions?

I don't have any current measurements.  But we'd been aiming
at getting irq_{en,dis}able to a simple memory write to pda.
But simplicity, maintenance, etc. win over trimming a couple
cycles, so it's still worth a real look.
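
For context, a sketch of what "a simple memory write" could look like.
The vcpu_state layout, the pending flag and the function names are all
assumptions made for the example; this is not the pda layout or any
existing backend's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-cpu state shared with the hypervisor. */
struct vcpu_state {
	uint8_t irq_mask;	/* 1 = event delivery masked */
	uint8_t irq_pending;	/* an event arrived while masked */
};

static struct vcpu_state this_vcpu;
int upcalls_delivered;

static void deliver_pending(void)
{
	assert(!this_vcpu.irq_mask);
	this_vcpu.irq_pending = 0;
	upcalls_delivered++;
}

/* Disable: a single store, cheap enough to be worth inlining. */
void pv_irq_disable(void)
{
	this_vcpu.irq_mask = 1;
}

/* Enable: a store plus a check for events that arrived while masked. */
void pv_irq_enable(void)
{
	this_vcpu.irq_mask = 0;
	if (this_vcpu.irq_pending)
		deliver_pending();
}

/* Models the hypervisor side raising an event for this vcpu. */
void pv_raise_event(void)
{
	this_vcpu.irq_pending = 1;
	if (!this_vcpu.irq_mask)
		deliver_pending();
}
```

The disable path is the case where an indirect call costs far more than
the operation itself, which is where inlining or patching would pay off.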

thanks,
-chris


Re: [patch 20/26] Xen-paravirt_ops: Core Xen implementation

2007-03-19 Thread Chris Wright
* Eric W. Biederman ([EMAIL PROTECTED]) wrote:
> Chris Wright <[EMAIL PROTECTED]> writes:
> 
> > * Ingo Molnar ([EMAIL PROTECTED]) wrote:
> >> >  ENTRY(swapper_pg_dir)
> >> > +.align PAGE_SIZE_asm
> >> >  .fill 1024,4,0
> >> 
> >> does the native kernel lose memory here?
> >
> > Not in my builds.
> 
> Shouldn't the align be before the label.  Otherwise padding
> would be inserted between the label and the data.

Yes, I think you're right.  Thanks
-chris
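
The arithmetic behind Eric's point, as a small sketch with invented
helper names: a label takes the value of the location counter at the
point where it appears, so a label placed before .align can end up
pointing at padding rather than at the aligned data.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/* "label: .align N; .fill ..." - the label keeps the unaligned address. */
uintptr_t label_before_align(uintptr_t loc)
{
	return loc;
}

/* ".align N; label: .fill ..." - the label lands on the aligned data. */
uintptr_t label_after_align(uintptr_t loc, uintptr_t align)
{
	return align_up(loc, align);
}
```

With a location counter of 0x1234 and 4096-byte alignment, the first
form leaves the label at 0x1234 while the data starts at 0x2000. When
the location counter is already aligned, both forms give the same
address.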


Re: todo

2007-03-16 Thread Chris Wright
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> Chris Wright wrote:
> > Consistently wrap paravirt ops callsites
> > "ugh" - mingo
> 
> Had a thought.  What if we do a kind of reverse/two-way module linkage? 
> Somehow compile each pv-op implementation like a module, and then link
> the appropriate one in at boot time.

This is very similar to something Steve was chatting with me about
this morning.  The idea he was tossing around was something a bit like
an initrd that a load_module analog could link up.  In a sense, it's
similar to the VMI ROM, with the exceptions that the ABI is just created
by the compiler from a normal mutable kernel API and it's linkage with
symbols available on both sides.

> Tricky parts: it would need two-way unresolved references between kernel
> and module, and it would need to be able to run very early in the
> kernel's life.

This is the tricky part, and where Steve and I left off.

> It would also limit us to plain old calls rather than
> any inlining (though that could be done separately).
> 
> On the upside, it removes pv_ops, and it might simplify the question of
> how normal module exports work, since by that time they would just be
> normal kernel functions.  All the calls would be normal direct calls
> rather than indirect.  And it would allow us to free the memory for the
> unused pv-ops backends.

I suspect we could free the unused backends already.  It also has one
negative side-effect, which is promoting external module code that links
with the kernel.  IOW, there's much less incentive to get code merged
if it's just a matter of linking.

thanks,
-chris
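
The two-way linkage idea above can be pictured with a toy resolver. This is only a sketch under stated assumptions: struct demo_sym, struct demo_image and both functions are hypothetical names, nothing here is an existing kernel interface, and the real difficulty mentioned in the thread (running this very early in boot) is not addressed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol record; addr stays NULL while unresolved. */
struct demo_sym {
    const char *name;
    void *addr;
};

/* Hypothetical image: what it needs (imports) and what it offers (exports). */
struct demo_image {
    struct demo_sym *imports;
    size_t n_imports;
    const struct demo_sym *exports;
    size_t n_exports;
};

/* Bind each unresolved import to a matching export; return how many stay unresolved. */
static size_t demo_resolve(struct demo_sym *imports, size_t n_imports,
                           const struct demo_sym *exports, size_t n_exports)
{
    size_t missing = 0;

    for (size_t i = 0; i < n_imports; i++) {
        if (imports[i].addr)
            continue;                       /* already bound */
        for (size_t j = 0; j < n_exports; j++) {
            if (strcmp(imports[i].name, exports[j].name) == 0) {
                imports[i].addr = exports[j].addr;
                break;
            }
        }
        if (!imports[i].addr)
            missing++;
    }
    return missing;
}

/* Two-way link: kernel references bind to the backend, and backend references to the kernel. */
static size_t demo_link_two_way(struct demo_image *kernel, struct demo_image *backend)
{
    return demo_resolve(kernel->imports, kernel->n_imports,
                        backend->exports, backend->n_exports) +
           demo_resolve(backend->imports, backend->n_imports,
                        kernel->exports, kernel->n_exports);
}
```

A nonzero return value means at least one side still has an unresolved reference, which in this toy model would be a boot-time link failure.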


Re: [patch 03/26] Xen-paravirt_ops: use paravirt_nop to consistently mark no-op operations

2007-03-16 Thread Chris Wright
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> Chris Wright wrote:
> > I mean like this (bunch of work, for a type check that we're really ignoring
> > anyway, but this is the idea...)
> 
> Oh, I see.  I think this is the best argument yet for the current
> arrangement...

Heh, like I said it's a bunch of work for literally nothing ;-)
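
For context, the "current arrangement" being defended here can be sketched roughly as below. This is a cut-down illustration that assumes GCC's acceptance of function-pointer to (void *) conversions; struct demo_ops and the demo_ names are hypothetical, with only the field names borrowed from the patch.

```c
/* Hypothetical two-slot ops table with differently typed slots. */
struct demo_ops {
    void (*arch_setup)(void);
    void (*set_lazy_mode)(int mode);
};

/* One shared stub serves every no-op slot. */
static void demo_paravirt_nop(void)
{
}

/* An ordinary implementation, for contrast. */
static void demo_native_setup(void)
{
}

static struct demo_ops demo_ops = {
    .arch_setup    = demo_paravirt_nop,
    /* The (void *) cast hides the signature mismatch from the compiler. */
    .set_lazy_mode = (void *)demo_paravirt_nop,
};

/* The patcher needs just one pointer comparison to spot a no-op slot. */
static int demo_is_nop(void *opfunc)
{
    return opfunc == (void *)demo_paravirt_nop;
}
```

The trade-off is the one discussed in the thread: a single equality test is enough, but the slot types are no longer checked against the stub.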


Re: [patch 03/26] Xen-paravirt_ops: use paravirt_nop to consistently mark no-op operations

2007-03-16 Thread Chris Wright
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> Chris Wright wrote:
> > how about __paravirt_nop_start < func < __paravirt_nop_end  and preserve
> > the types?
> >   
> 
> Er?  The reason for the (void *) cast is to stop gcc complaining about
> mismatched pointer types.

I mean like this (bunch of work, for a type check that we're really ignoring
anyway, but this is the idea...)

diff -r 930fff55070e arch/i386/kernel/paravirt.c
--- a/arch/i386/kernel/paravirt.c   Fri Mar 16 11:09:10 2007 -0700
+++ b/arch/i386/kernel/paravirt.c   Fri Mar 16 14:48:04 2007 -0700
@@ -35,10 +35,60 @@
 #include 
 #include 
 
-/* nop stub */
-void _paravirt_nop(void)
-{
-}
+/* nop stubs */
+void __paravirt_nop pv_nop_arch_setup(void)
+{
+}
+void __paravirt_nop pv_nop_set_lazy_mode(int mode)
+{
+}
+void __paravirt_nop pv_nop_alloc_pt(struct mm_struct *mm, u32 pfn)
+{
+}
+void __paravirt_nop pv_nop_alloc_pd(u32 pfn)
+{
+}
+void __paravirt_nop pv_nop_alloc_pd_clone(u32 pfn, u32 clonepfn, u32 start, u32 count)
+{
+}
+void __paravirt_nop pv_nop_release_pt(u32 pfn)
+{
+}
+void __paravirt_nop pv_nop_release_pd(u32 pfn)
+{
+}
+void __paravirt_nop pv_nop_pte_update(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
+{
+}
+void __paravirt_nop pv_nop_pte_update_defer(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
+{
+}
+void * __paravirt_nop pv_nop_kmap_atomic_pte(struct page *page, enum km_type type)
+{
+}
+void __paravirt_nop pv_nop_load_tr_desc(void)
+{
+}
+void __paravirt_nop pv_nop_activate_mm(struct mm_struct *prev, struct mm_struct *next)
+{
+}
+void __paravirt_nop pv_nop_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
+{
+}
+void __paravirt_nop pv_nop_exit_mmap(struct mm_struct *mm)
+{
+}
+void __paravirt_nop pv_nop_startup_ipi_hook(int phys_apicid, unsigned long start_eip, unsigned long start_esp)
+{
+}
+#ifdef CONFIG_X86_LOCAL_APIC
+void __paravirt_nop pv_nop_setup_boot_clock(void)
+{
+}
+void __paravirt_nop pv_nop_setup_secondary_clock(void)
+{
+}
+#endif
 
 static void __init default_banner(void)
 {
@@ -166,7 +216,7 @@ unsigned paravirt_patch_default(u8 type,
if (opfunc == NULL)
/* If there's no function, patch it with a ud2a (BUG) */
ret = paravirt_patch_insns(site, len, start_ud2a, end_ud2a);
-   else if (opfunc == paravirt_nop)
+   else if (opfunc >= __paravirt_nop_start && opfunc < __paravirt_nop_end)
/* If the operation is a nop, then nop the callsite */
ret = paravirt_patch_nop();
else if (type == PARAVIRT_PATCH(iret) ||
@@ -521,7 +571,7 @@ struct paravirt_ops paravirt_ops = {
 
.patch = native_patch,
.banner = default_banner,
-   .arch_setup = paravirt_nop,
+   .arch_setup = pv_nop_arch_setup,
.memory_setup = machine_specific_memory_setup,
.get_wallclock = native_get_wallclock,
.set_wallclock = native_set_wallclock,
@@ -577,7 +627,7 @@ struct paravirt_ops paravirt_ops = {
.setup_boot_clock = setup_boot_APIC_clock,
.setup_secondary_clock = setup_secondary_APIC_clock,
 #endif
-   .set_lazy_mode = paravirt_nop,
+   .set_lazy_mode = pv_nop_set_lazy_mode,
 
.pagetable_setup_start = native_pagetable_setup_start,
.pagetable_setup_done = native_pagetable_setup_done,
@@ -587,24 +637,24 @@ struct paravirt_ops paravirt_ops = {
.flush_tlb_single = native_flush_tlb_single,
.flush_tlb_others = native_flush_tlb_others,
 
-   .alloc_pt = paravirt_nop,
-   .alloc_pd = paravirt_nop,
-   .alloc_pd_clone = paravirt_nop,
-   .release_pt = paravirt_nop,
-   .release_pd = paravirt_nop,
+   .alloc_pt = pv_nop_alloc_pt,
+   .alloc_pd = pv_nop_alloc_pd,
+   .alloc_pd_clone = pv_nop_alloc_pd_clone,
+   .release_pt = pv_nop_release_pt,
+   .release_pd = pv_nop_release_pd,
 
.set_pte = native_set_pte,
.set_pte_at = native_set_pte_at,
.set_pmd = native_set_pmd,
-   .pte_update = paravirt_nop,
-   .pte_update_defer = paravirt_nop,
+   .pte_update = pv_nop_pte_update,
+   .pte_update_defer = pv_nop_pte_update_defer,
 
.ptep_get_and_clear = native_ptep_get_and_clear,
 
 #ifdef CONFIG_HIGHPTE
.kmap_atomic_pte = native_kmap_atomic_pte,
 #else
-   .kmap_atomic_pte = paravirt_nop,
+   .kmap_atomic_pte = pv_nop_kmap_atomic_pte,
 #endif
 
 #ifdef CONFIG_X86_PAE
@@ -627,11 +677,11 @@ struct paravirt_ops paravirt_ops = {
.irq_enable_sysexit = native_irq_enable_sysexit,
.iret = native_iret,
 
-   .dup_mmap = paravirt_nop,
-   .exit_mmap = paravirt_nop,
-   .activate_mm = paravirt_nop,
-
-   .startup_ipi_hook = paravirt_nop,
+   .dup_mmap = pv_nop_dup_mmap,
+   .exit_mmap = pv_nop_exit_mmap,
+   .activate_mm = pv_nop_activate_mm,
+
+   .startup_ipi_hook = pv_nop_startup_ipi_hook,
 };
 
 /*
diff -r 930fff55070e arch/i386/kernel/vmlinux.l

[ADMINISTRIVIA] lists header change

2007-03-16 Thread Chris Wright
The Linux Foundation is hosting this list now.  Mailing to
virtualization@lists.osdl.org will continue to work,
but you may need to update your procmail, etc. filters,
I know I did.

thanks,
-chris


Re: [patch 03/26] Xen-paravirt_ops: use paravirt_nop to consistently mark no-op operations

2007-03-16 Thread Chris Wright
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> Ingo Molnar wrote:
> > but only as a cleanup of the current open-coded (void *) casts. My 
> > problem with this is that it loses the types. Not that there is much to 
> > check for, but still, this adds some assumptions about what function
> > calls look like.
> 
> I agree.  I don't generally like this kind of hack, but having a single
> test for "func == paravirt_nop" to look for nop pv_ops in the patcher is
> what tipped the balance.

how about __paravirt_nop_start < func < __paravirt_nop_end  and preserve
the types?

thanks,
-chris
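
A minimal userspace sketch of the range-check idea, assuming GCC or Clang with a GNU-style linker on ELF. The section name pv_nop, the DEMO_PV_NOP macro and the demo_ functions are all hypothetical; the __start_pv_nop and __stop_pv_nop symbols are the ones such linkers generate for a section whose name is a valid C identifier, standing in for the __paravirt_nop_start and __paravirt_nop_end markers suggested above.

```c
#include <stdint.h>

/* Hypothetical marker: place a stub in a dedicated "pv_nop" section. */
#define DEMO_PV_NOP __attribute__((section("pv_nop"), noinline, used))

/* Section bounds supplied by the linker (an assumption about the toolchain). */
extern const char __start_pv_nop[];
extern const char __stop_pv_nop[];

/* Typed no-op stubs: each keeps the signature of the slot it would fill. */
DEMO_PV_NOP void demo_nop_arch_setup(void)
{
}

DEMO_PV_NOP void demo_nop_set_lazy_mode(int mode)
{
    (void)mode;
}

/* An ordinary function that lives outside the section. */
__attribute__((noinline)) void demo_native_banner(void)
{
}

/* Half-open range test: start <= func < end. */
static int demo_is_nop(uintptr_t func)
{
    return func >= (uintptr_t)__start_pv_nop &&
           func <  (uintptr_t)__stop_pv_nop;
}
```

The lower bound is inclusive here because the first stub placed in the section starts exactly at the start marker; a strict "<" on both sides would miss it.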