Re: [PATCH] KVM: ARM: enable Cortex A7 hosts

2013-09-24 Thread Simon Horman
On Thu, Sep 19, 2013 at 02:01:48PM +0200, Ulrich Hecht wrote:
> KVM runs fine on Cortex A7 cores, so they should be enabled. Tested on an
> APE6EVM board (r8a73a4 SoC).
> 
> Signed-off-by: Ulrich Hecht 

Hi Ulrich,

I'm not entirely sure, but it seems to me that you should expand the
CC list of this patch somehow, as it doesn't seem to have received any
attention in the week since you sent it.


# ./scripts/get_maintainer.pl -f arch/arm/kvm/guest.c
Christoffer Dall  (supporter:KERNEL VIRTUAL MA...)
Gleb Natapov  (supporter:KERNEL VIRTUAL MA...)
Paolo Bonzini  (supporter:KERNEL VIRTUAL MA...)
Russell King  (maintainer:ARM PORT)
kvm...@lists.cs.columbia.edu (open list:KERNEL VIRTUAL MA...)
kvm@vger.kernel.org (open list:KERNEL VIRTUAL MA...)
linux-arm-ker...@lists.infradead.org (moderated list:ARM PORT)
linux-ker...@vger.kernel.org (open list)



> ---
>  arch/arm/kvm/guest.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
> index 152d036..05c62d5 100644
> --- a/arch/arm/kvm/guest.c
> +++ b/arch/arm/kvm/guest.c
> @@ -192,6 +192,8 @@ int __attribute_const__ kvm_target_cpu(void)
>   switch (part_number) {
>   case ARM_CPU_PART_CORTEX_A15:
>   return KVM_ARM_TARGET_CORTEX_A15;
> + case ARM_CPU_PART_CORTEX_A7:
> + return KVM_ARM_TARGET_CORTEX_A15;
>   default:
>   return -EINVAL;
>   }
> -- 
> 1.8.3.1
> 
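
For reference, the hunk above reuses KVM_ARM_TARGET_CORTEX_A15 for Cortex-A7
hosts. A minimal sketch of kvm_target_cpu() with a dedicated A7 target is
below; KVM_ARM_TARGET_CORTEX_A7 is an assumption here and is not part of the
posted patch.

int __attribute_const__ kvm_target_cpu(void)
{
        unsigned long implementor = read_cpuid_implementor();
        unsigned long part_number = read_cpuid_part_number();

        /* Only ARM Ltd. implementations are considered. */
        if (implementor != ARM_CPU_IMP_ARM)
                return -EINVAL;

        switch (part_number) {
        case ARM_CPU_PART_CORTEX_A7:
                /* Hypothetical dedicated target; the posted patch returns
                 * KVM_ARM_TARGET_CORTEX_A15 here instead. */
                return KVM_ARM_TARGET_CORTEX_A7;
        case ARM_CPU_PART_CORTEX_A15:
                return KVM_ARM_TARGET_CORTEX_A15;
        default:
                return -EINVAL;
        }
}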


Re: KVM induced panic on 2.6.38[2367] & 2.6.39

2011-06-09 Thread Simon Horman
On Thu, Jun 09, 2011 at 01:02:13AM +0800, Brad Campbell wrote:
> On 08/06/11 11:59, Eric Dumazet wrote:
> 
> >Well, a bisection definitely should help, but needs a lot of time in
> >your case.
> 
> Yes. compile, test, crash, walk out to the other building to press
> reset, lather, rinse, repeat.
> 
> I need a reset button on the end of a 50M wire, or a hardware watchdog!

Not strictly on-topic, but in situations where I have machines
that either don't have lights-out facilities or have broken ones,
I find network-controlled power switches to be very useful.

At one point I would have needed an 8000km-long wire to the reset switch :-)


Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net

2011-02-23 Thread Simon Horman
On Wed, Feb 23, 2011 at 10:52:09AM +0530, Krishna Kumar2 wrote:
> Simon Horman  wrote on 02/22/2011 01:17:09 PM:
> 
> Hi Simon,
> 
> 
> > I have a few questions about the results below:
> >
> > 1. Are the (%) comparisons between non-mq and mq virtio?
> 
> Yes - mainline kernel with transmit-only MQ patch.
> 
> > 2. Was UDP or TCP used?
> 
> TCP. I had done some initial testing on UDP, but don't have
> the results now as it is really old. But I will be running
> it again.
> 
> > 3. What was the transmit size (-m option to netperf)?
> 
> I didn't use the -m option, so it defaults to 16K. The
> script does:
> 
> netperf -t TCP_STREAM -c -C -l 60 -H $SERVER
> 
> > Also, I'm interested to know what the status of these patches is.
> > Are you planning a fresh series?
> 
> Yes. Michael Tsirkin had wanted to see what the MQ RX patch
> would look like, so I was in the process of getting the two
> working together. The patch is ready and is being tested.
> Should I send an RFC patch at this time?
> 
> The TX-only patch helped the guest TX path but didn't help
> host->guest much (as tested using TCP_MAERTS from the guest).
> But with the TX+RX patch, both directions are getting
> improvements. Remote testing is still to be done.

Hi Krishna,

Thanks for clarifying the test results.
I'm looking forward to the forthcoming RFC patches.


Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net

2011-02-22 Thread Simon Horman
On Wed, Oct 20, 2010 at 02:24:52PM +0530, Krishna Kumar wrote:
> Following set of patches implement transmit MQ in virtio-net.  Also
> included is the user qemu changes.  MQ is disabled by default unless
> qemu specifies it.

Hi Krishna,

I have a few questions about the results below:

1. Are the (%) comparisons between non-mq and mq virtio?
2. Was UDP or TCP used?
3. What was the transmit size (-m option to netperf)?

Also, I'm interested to know what the status of these patches is.
Are you planning a fresh series?

> 
>   Changes from rev2:
>   --
> 1. Define (in virtio_net.h) the maximum send txqs; and use in
>virtio-net and vhost-net.
> 2. vi->sq[i] is allocated individually, resulting in cache line
>aligned sq[0] to sq[n].  Another option was to define
>'send_queue' as:
>struct send_queue {
>struct virtqueue *svq;
>struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
>} cacheline_aligned_in_smp;
>and to statically allocate 'VIRTIO_MAX_SQ' of those.  I hope
>the submitted method is preferable.
> 3. Changed vhost model such that vhost[0] handles RX and vhost[1-MAX]
>handles TX[0-n].
> 4. Further change TX handling such that vhost[0] handles both RX/TX
>for single stream case.
> 
>   Enabling MQ on virtio:
>   ---
> When following options are passed to qemu:
> - smp > 1
> - vhost=on
> - mq=on (new option, default:off)
> then #txqueues = #cpus.  The #txqueues can be changed by using an
> optional 'numtxqs' option.  e.g. for a smp=4 guest:
> vhost=on   ->   #txqueues = 1
> vhost=on,mq=on ->   #txqueues = 4
> vhost=on,mq=on,numtxqs=2   ->   #txqueues = 2
> vhost=on,mq=on,numtxqs=8   ->   #txqueues = 8
> 
> 
>Performance (guest -> local host):
>---
> System configuration:
> Host:  8 Intel Xeon, 8 GB memory
> Guest: 4 cpus, 2 GB memory
> Test: Each test case runs for 60 secs, sum over three runs (except
> when number of netperf sessions is 1, which has 10 runs of 12 secs
> each).  No tuning (default netperf) other than taskset vhost's to
> cpus 0-3.  numtxqs=32 gave the best results though the guest had
> only 4 vcpus (I haven't tried beyond that).
> 
> __ numtxqs=2, vhosts=3  
> #sessions  BW%  CPU%RCPU%SD%  RSD%
> 
> 1  4.46-1.96 .19 -12.50   -6.06
> 2  4.93-1.162.10  0   -2.38
> 4  46.1764.77   33.72 19.51   -2.48
> 8  47.8970.00   36.23 41.4613.35
> 16 48.9780.44   40.67 21.11   -5.46
> 24 49.0378.78   41.22 20.51   -4.78
> 32 51.1177.15   42.42 15.81   -6.87
> 40 51.6071.65   42.43 9.75-8.94
> 48 50.1069.55   42.85 11.80   -5.81
> 64 46.2468.42   42.67 14.18   -3.28
> 80 46.3763.13   41.62 7.43-6.73
> 96 46.4063.31   42.20 9.36-4.78
> 12850.4362.79   42.16 13.11   -1.23
> 
> BW: 37.2%,  CPU/RCPU: 66.3%,41.6%,  SD/RSD: 11.5%,-3.7%
> 
> __ numtxqs=8, vhosts=5  
> #sessions   BW%  CPU% RCPU% SD%  RSD%
> 
> 1   -.76-1.56 2.33  03.03
> 2   17.4111.1111.41 0   -4.76
> 4   42.1255.1130.20 19.51.62
> 8   54.6980.0039.22 24.39-3.88
> 16  54.7781.6240.89 20.34-6.58
> 24  54.6679.6841.57 15.49-8.99
> 32  54.9276.8241.79 17.59-5.70
> 40  51.7968.5640.53 15.31-3.87
> 48  51.7266.4040.84 9.72 -7.13
> 64  51.1163.9441.10 5.93 -8.82
> 80  46.5159.5039.80 9.33 -4.18
> 96  47.7257.7539.84 4.20 -7.62
> 128 54.3558.9540.66 3.24 -8.63
> 
> BW: 38.9%,  CPU/RCPU: 63.0%,40.1%,  SD/RSD: 6.0%,-7.4%
> 
> __ numtxqs=16, vhosts=5  ___
> #sessions   BW%  CPU% RCPU% SD%  RSD%
> 
> 1   -1.43-3.521.55  0  3.03
> 2   33.09 21.63   20.12-10.00 -9.52
> 4   67.17 94.60   44.28 19.51 -11.80
> 8   75.72 108.14  49.15 25.00 -10.71
> 16  80.34 101.77  52.94 25.93 -4.49
> 24  70.84 93.12   43.62 27.63 -5.03
> 32  69.01   

Re: Flow Control and Port Mirroring Revisited

2011-01-23 Thread Simon Horman
On Sun, Jan 23, 2011 at 12:39:02PM +0200, Michael S. Tsirkin wrote:
> On Sun, Jan 23, 2011 at 05:38:49PM +1100, Simon Horman wrote:
> > On Sat, Jan 22, 2011 at 11:57:42PM +0200, Michael S. Tsirkin wrote:
> > > On Sat, Jan 22, 2011 at 10:11:52AM +1100, Simon Horman wrote:
> > > > On Fri, Jan 21, 2011 at 11:59:30AM +0200, Michael S. Tsirkin wrote:

[snip]

> > > > > Hmm, what is this supposed to measure?  Basically each time you run an
> > > > > un-paced UDP_STREAM you get some random load on the network.
> > > > > You can't tell what it was exactly, only that it was between
> > > > > the send and receive throughput.
> > > > 
> > > > Rick mentioned in another email that I messed up my test parameters a bit,
> > > > so I will re-run the tests, incorporating his suggestions.
> > > > 
> > > > What I was attempting to measure was the effect of an unpaced UDP_STREAM
> > > > on the latency of more moderated traffic, because I am interested in
> > > > what effect an abusive guest has on other guests and how that may be
> > > > mitigated.
> > > > 
> > > > Could you suggest some tests that you feel are more appropriate?
> > > 
> > > Yes. To rephrase my concern in these terms: besides the malicious guest
> > > you have other software on the host (netperf) that interferes with
> > > the traffic, and it cooperates with the malicious guest.
> > > Right?
> > 
> > Yes, that is the scenario in this test.
> 
> Yes but I think that you want to put some controlled load on host.
> Let's assume that we improve the speed somehow and now you can push more
> bytes per second without loss.  Result might be a regression in your
> test because you let the guest push "as much as it can" and suddenly it
> can push more data through.  OTOH with packet loss the load on host is
> anywhere in between send and receive throughput: there's no easy way to
> measure it from netperf: the earlier some buffers overrun, the earlier
> the packets get dropped and the less the load on host.
> 
> This is why I say that to get a specific
> load on host you want to limit the sender
> to a specific BW and then either
> - make sure packet loss % is close to 0.
> - make sure packet loss % is close to 100%.

Thanks, and sorry for being a bit slow.  I now see what you have
been getting at with regard to limiting the tests.
I will see about getting some numbers based on your suggestions.



Re: Flow Control and Port Mirroring Revisited

2011-01-22 Thread Simon Horman
On Sat, Jan 22, 2011 at 11:57:42PM +0200, Michael S. Tsirkin wrote:
> On Sat, Jan 22, 2011 at 10:11:52AM +1100, Simon Horman wrote:
> > On Fri, Jan 21, 2011 at 11:59:30AM +0200, Michael S. Tsirkin wrote:
> > > On Thu, Jan 20, 2011 at 05:38:33PM +0900, Simon Horman wrote:
> > > > [ Trimmed Eric from CC list as vger was complaining that it is too long ]
> > > > 
> > > > On Tue, Jan 18, 2011 at 11:41:22AM -0800, Rick Jones wrote:
> > > > > >So it won't be all that simple to implement well, and before we try,
> > > > > >I'd like to know whether there are applications that are helped
> > > > > >by it. For example, we could try to measure latency at various
> > > > > >pps and see whether the backpressure helps. netperf has -b, -w
> > > > > >flags which might help these measurements.
> > > > > 
> > > > > Those options are enabled when one adds --enable-burst to the
> > > > > pre-compilation ./configure  of netperf (one doesn't have to
> > > > > recompile netserver).  However, if one is also looking at latency
> > > > > statistics via the -j option in the top-of-trunk, or simply at the
> > > > > histogram with --enable-histogram on the ./configure and a verbosity
> > > > > level of 2 (global -v 2) then one wants the very top of trunk
> > > > > netperf from:
> > > > 
> > > > Hi,
> > > > 
> > > > I have constructed a test where I run an un-paced  UDP_STREAM test in
> > > > one guest and a paced omni rr test in another guest at the same time.
> > > 
> > > Hmm, what is this supposed to measure?  Basically each time you run an
> > > un-paced UDP_STREAM you get some random load on the network.
> > > You can't tell what it was exactly, only that it was between
> > > the send and receive throughput.
> > 
> > Rick mentioned in another email that I messed up my test parameters a bit,
> > so I will re-run the tests, incorporating his suggestions.
> > 
> > What I was attempting to measure was the effect of an unpaced UDP_STREAM
> > on the latency of more moderated traffic, because I am interested in
> > what effect an abusive guest has on other guests and how that may be
> > mitigated.
> > 
> > Could you suggest some tests that you feel are more appropriate?
> 
> Yes. To rephrase my concern in these terms: besides the malicious guest
> you have other software on the host (netperf) that interferes with
> the traffic, and it cooperates with the malicious guest.
> Right?

Yes, that is the scenario in this test.

> IMO for a malicious guest you would send
> UDP packets that then get dropped by the host.
> 
> For example block netperf in host so that
> it does not consume packets from the socket.

I'm more interested in rate-limiting netperf than blocking it.
But in any case, do you mean using iptables or tc based on
classification made by net_cls?



Re: Flow Control and Port Mirroring Revisited

2011-01-21 Thread Simon Horman
On Fri, Jan 21, 2011 at 11:59:30AM +0200, Michael S. Tsirkin wrote:
> On Thu, Jan 20, 2011 at 05:38:33PM +0900, Simon Horman wrote:
> > [ Trimmed Eric from CC list as vger was complaining that it is too long ]
> > 
> > On Tue, Jan 18, 2011 at 11:41:22AM -0800, Rick Jones wrote:
> > > >So it won't be all that simple to implement well, and before we try,
> > > >I'd like to know whether there are applications that are helped
> > > >by it. For example, we could try to measure latency at various
> > > >pps and see whether the backpressure helps. netperf has -b, -w
> > > >flags which might help these measurements.
> > > 
> > > Those options are enabled when one adds --enable-burst to the
> > > pre-compilation ./configure  of netperf (one doesn't have to
> > > recompile netserver).  However, if one is also looking at latency
> > > statistics via the -j option in the top-of-trunk, or simply at the
> > > histogram with --enable-histogram on the ./configure and a verbosity
> > > level of 2 (global -v 2) then one wants the very top of trunk
> > > netperf from:
> > 
> > Hi,
> > 
> > I have constructed a test where I run an un-paced  UDP_STREAM test in
> > one guest and a paced omni rr test in another guest at the same time.
> 
> Hmm, what is this supposed to measure?  Basically each time you run an
> un-paced UDP_STREAM you get some random load on the network.
> You can't tell what it was exactly, only that it was between
> the send and receive throughput.

Rick mentioned in another email that I messed up my test parameters a bit,
so I will re-run the tests, incorporating his suggestions.

What I was attempting to measure was the effect of an unpaced UDP_STREAM
on the latency of more moderated traffic, because I am interested in
what effect an abusive guest has on other guests and how that may be
mitigated.

Could you suggest some tests that you feel are more appropriate?



Re: Flow Control and Port Mirroring Revisited

2011-01-20 Thread Simon Horman
[ Trimmed Eric from CC list as vger was complaining that it is too long ]

On Tue, Jan 18, 2011 at 11:41:22AM -0800, Rick Jones wrote:
> >So it won't be all that simple to implement well, and before we try,
> >I'd like to know whether there are applications that are helped
> >by it. For example, we could try to measure latency at various
> >pps and see whether the backpressure helps. netperf has -b, -w
> >flags which might help these measurements.
> 
> Those options are enabled when one adds --enable-burst to the
> pre-compilation ./configure  of netperf (one doesn't have to
> recompile netserver).  However, if one is also looking at latency
> statistics via the -j option in the top-of-trunk, or simply at the
> histogram with --enable-histogram on the ./configure and a verbosity
> level of 2 (global -v 2) then one wants the very top of trunk
> netperf from:

Hi,

I have constructed a test where I run an un-paced UDP_STREAM test in
one guest and a paced omni rr test in another guest at the same time.
Briefly, I get the following results from the omni test:

1. Omni test only:        MEAN_LATENCY=272.00
2. Omni and stream test:  MEAN_LATENCY=3423.00
3. cpu and net_cls group: MEAN_LATENCY=493.00
   As per 2 plus cgroups are created for each guest
   and guest tasks added to the groups
4. 100Mbit/s class:       MEAN_LATENCY=273.00
   As per 3 plus the net_cls groups each have a 100Mbit/s HTB class
5. cpu.shares=128:        MEAN_LATENCY=652.00
   As per 4 plus the cpu groups have cpu.shares set to 128
6. Busy CPUs:             MEAN_LATENCY=15126.00
   As per 5 but the CPUs are made busy using a simple shell while loop

There is a bit of noise in the results as the two netperf invocations
aren't started at exactly the same moment.

For reference, my netperf invocations are:
netperf -c -C -t UDP_STREAM -H 172.17.60.216 -l 12
netperf.omni -p 12866 -D -c -C -H 172.17.60.216 -t omni -j -v 2 -- -r 1 -d rr -k foo -b 1 -w 200 -m 200

foo contains
PROTOCOL
THROUGHPUT,THROUGHPUT_UNITS
LOCAL_SEND_THROUGHPUT
LOCAL_RECV_THROUGHPUT
REMOTE_SEND_THROUGHPUT
REMOTE_RECV_THROUGHPUT
RT_LATENCY,MIN_LATENCY,MEAN_LATENCY,MAX_LATENCY
P50_LATENCY,P90_LATENCY,P99_LATENCY,STDDEV_LATENCY
LOCAL_CPU_UTIL,REMOTE_CPU_UTIL



Re: Flow Control and Port Mirroring Revisited

2011-01-19 Thread Simon Horman
On Tue, Jan 18, 2011 at 10:13:33PM +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 18, 2011 at 11:41:22AM -0800, Rick Jones wrote:
> > >So it won't be all that simple to implement well, and before we try,
> > >I'd like to know whether there are applications that are helped
> > >by it. For example, we could try to measure latency at various
> > >pps and see whether the backpressure helps. netperf has -b, -w
> > >flags which might help these measurements.
> > 
> > Those options are enabled when one adds --enable-burst to the
> > pre-compilation ./configure  of netperf (one doesn't have to
> > recompile netserver).  However, if one is also looking at latency
> > statistics via the -j option in the top-of-trunk, or simply at the
> > histogram with --enable-histogram on the ./configure and a verbosity
> > level of 2 (global -v 2) then one wants the very top of trunk
> > netperf from:
> > 
> > http://www.netperf.org/svn/netperf2/trunk
> > 
> > to get the recently added support for accurate (netperf level) RTT
> > measuremnts on burst-mode request/response tests.
> > 
> > happy benchmarking,
> > 
> > rick jones

Thanks Rick, that is really helpful.

> > PS - the enhanced latency statistics from -j are only available in
> > the "omni" version of the TCP_RR test.  To get that add a
> > --enable-omni to the ./configure - and in this case both netperf and
> > netserver have to be recompiled.
> 
> 
> Is this TCP only? I would love to get latency data from UDP as well.

At a glance, -- -T UDP is what you are after.


Re: Flow Control and Port Mirroring Revisited

2011-01-16 Thread Simon Horman
On Fri, Jan 14, 2011 at 08:54:15AM +0200, Michael S. Tsirkin wrote:
> On Fri, Jan 14, 2011 at 03:35:28PM +0900, Simon Horman wrote:
> > On Fri, Jan 14, 2011 at 06:58:18AM +0200, Michael S. Tsirkin wrote:
> > > On Fri, Jan 14, 2011 at 08:41:36AM +0900, Simon Horman wrote:
> > > > On Thu, Jan 13, 2011 at 10:45:38AM -0500, Jesse Gross wrote:
> > > > > On Thu, Jan 13, 2011 at 1:47 AM, Simon Horman  
> > > > > wrote:
> > > > > > On Mon, Jan 10, 2011 at 06:31:55PM +0900, Simon Horman wrote:
> > > > > >> On Fri, Jan 07, 2011 at 10:23:58AM +0900, Simon Horman wrote:
> > > > > >> > On Thu, Jan 06, 2011 at 05:38:01PM -0500, Jesse Gross wrote:
> > > > > >> >
> > > > > >> > [ snip ]
> > > > > >> > >
> > > > > >> > > I know that everyone likes a nice netperf result but I agree 
> > > > > >> > > with
> > > > > >> > > Michael that this probably isn't the right question to be 
> > > > > >> > > asking.  I
> > > > > >> > > don't think that socket buffers are a real solution to the flow
> > > > > >> > > control problem: they happen to provide that functionality but 
> > > > > >> > > it's
> > > > > >> > > more of a side effect than anything.  It's just that the 
> > > > > >> > > amount of
> > > > > >> > > memory consumed by packets in the queue(s) doesn't really have 
> > > > > >> > > any
> > > > > >> > > implicit meaning for flow control (think multiple physical 
> > > > > >> > > adapters,
> > > > > >> > > all with the same speed instead of a virtual device and a 
> > > > > >> > > physical
> > > > > >> > > device with wildly different speeds).  The analog in the 
> > > > > >> > > physical
> > > > > >> > > world that you're looking for would be Ethernet flow control.
> > > > > >> > > Obviously, if the question is limiting CPU or memory 
> > > > > >> > > consumption then
> > > > > >> > > that's a different story.
> > > > > >> >
> > > > > >> > Point taken. I will see if I can control CPU (and thus memory) 
> > > > > >> > consumption
> > > > > >> > using cgroups and/or tc.
> > > > > >>
> > > > > >> I have found that I can successfully control the throughput using
> > > > > >> the following techniques
> > > > > >>
> > > > > >> 1) Place a tc egress filter on dummy0
> > > > > >>
> > > > > >> 2) Use ovs-ofctl to add a flow that sends skbs to dummy0 and then 
> > > > > >> eth1,
> > > > > >>    this is effectively the same as one of my hacks to the datapath
> > > > > >>    that I mentioned in an earlier mail. The result is that eth1
> > > > > >>    "paces" the connection.
> > > 
> > > This is actually a bug. This means that one slow connection will affect
> > > fast ones. I intend to change the default for qemu to sndbuf=0 : this
> > > will fix it but break your "pacing". So pls do not count on this
> > > behaviour.
> > 
> > Do you have a patch I could test?
> 
> You can (and users already can) just run qemu with sndbuf=0. But if you
> like, below.

Thanks

> > > > > > Further to this, I wonder if there is any interest in providing
> > > > > > a method to switch the action order - using ovs-ofctl is a hack 
> > > > > > imho -
> > > > > > and/or switching the default action order for mirroring.
> > > > > 
> > > > > I'm not sure that there is a way to do this that is correct in the
> > > > > generic case.  It's possible that the destination could be a VM while
> > > > > packets are being mirrored to a physical device or we could be
> > > > > multicasting or some other arbitrarily complex scenario.  Just think
> > > > > of what a physical switch would do if it has ports with two different
> > > > > speeds.
> > > > 
> > > > Yes, I have con

Re: Flow Control and Port Mirroring Revisited

2011-01-13 Thread Simon Horman
On Fri, Jan 14, 2011 at 06:58:18AM +0200, Michael S. Tsirkin wrote:
> On Fri, Jan 14, 2011 at 08:41:36AM +0900, Simon Horman wrote:
> > On Thu, Jan 13, 2011 at 10:45:38AM -0500, Jesse Gross wrote:
> > > On Thu, Jan 13, 2011 at 1:47 AM, Simon Horman  wrote:
> > > > On Mon, Jan 10, 2011 at 06:31:55PM +0900, Simon Horman wrote:
> > > >> On Fri, Jan 07, 2011 at 10:23:58AM +0900, Simon Horman wrote:
> > > >> > On Thu, Jan 06, 2011 at 05:38:01PM -0500, Jesse Gross wrote:
> > > >> >
> > > >> > [ snip ]
> > > >> > >
> > > >> > > I know that everyone likes a nice netperf result but I agree with
> > > >> > > Michael that this probably isn't the right question to be asking.  
> > > >> > > I
> > > >> > > don't think that socket buffers are a real solution to the flow
> > > >> > > control problem: they happen to provide that functionality but it's
> > > >> > > more of a side effect than anything.  It's just that the amount of
> > > >> > > memory consumed by packets in the queue(s) doesn't really have any
> > > >> > > implicit meaning for flow control (think multiple physical 
> > > >> > > adapters,
> > > >> > > all with the same speed instead of a virtual device and a physical
> > > >> > > device with wildly different speeds).  The analog in the physical
> > > >> > > world that you're looking for would be Ethernet flow control.
> > > >> > > Obviously, if the question is limiting CPU or memory consumption 
> > > >> > > then
> > > >> > > that's a different story.
> > > >> >
> > > >> > Point taken. I will see if I can control CPU (and thus memory) 
> > > >> > consumption
> > > >> > using cgroups and/or tc.
> > > >>
> > > >> I have found that I can successfully control the throughput using
> > > >> the following techniques
> > > >>
> > > >> 1) Place a tc egress filter on dummy0
> > > >>
> > > >> 2) Use ovs-ofctl to add a flow that sends skbs to dummy0 and then eth1,
> > > >>    this is effectively the same as one of my hacks to the datapath
> > > >>    that I mentioned in an earlier mail. The result is that eth1
> > > >>    "paces" the connection.
> 
> This is actually a bug. This means that one slow connection will affect
> fast ones. I intend to change the default for qemu to sndbuf=0 : this
> will fix it but break your "pacing". So pls do not count on this
> behaviour.

Do you have a patch I could test?

> > > > Further to this, I wonder if there is any interest in providing
> > > > a method to switch the action order - using ovs-ofctl is a hack imho -
> > > > and/or switching the default action order for mirroring.
> > > 
> > > I'm not sure that there is a way to do this that is correct in the
> > > generic case.  It's possible that the destination could be a VM while
> > > packets are being mirrored to a physical device or we could be
> > > multicasting or some other arbitrarily complex scenario.  Just think
> > > of what a physical switch would do if it has ports with two different
> > > speeds.
> > 
> > Yes, I have considered that case. And I agree that perhaps there
> > is no sensible default. But perhaps we could make it configurable somehow?
> 
> The fix is at the application level. Run netperf with -b and -w flags to
> limit the speed to a sensible value.

Perhaps I should have stated my goals more clearly.
I'm interested in situations where I don't control the application.



Re: Flow Control and Port Mirroring Revisited

2011-01-13 Thread Simon Horman
On Thu, Jan 13, 2011 at 10:45:38AM -0500, Jesse Gross wrote:
> On Thu, Jan 13, 2011 at 1:47 AM, Simon Horman  wrote:
> > On Mon, Jan 10, 2011 at 06:31:55PM +0900, Simon Horman wrote:
> >> On Fri, Jan 07, 2011 at 10:23:58AM +0900, Simon Horman wrote:
> >> > On Thu, Jan 06, 2011 at 05:38:01PM -0500, Jesse Gross wrote:
> >> >
> >> > [ snip ]
> >> > >
> >> > > I know that everyone likes a nice netperf result but I agree with
> >> > > Michael that this probably isn't the right question to be asking.  I
> >> > > don't think that socket buffers are a real solution to the flow
> >> > > control problem: they happen to provide that functionality but it's
> >> > > more of a side effect than anything.  It's just that the amount of
> >> > > memory consumed by packets in the queue(s) doesn't really have any
> >> > > implicit meaning for flow control (think multiple physical adapters,
> >> > > all with the same speed instead of a virtual device and a physical
> >> > > device with wildly different speeds).  The analog in the physical
> >> > > world that you're looking for would be Ethernet flow control.
> >> > > Obviously, if the question is limiting CPU or memory consumption then
> >> > > that's a different story.
> >> >
> >> > Point taken. I will see if I can control CPU (and thus memory) 
> >> > consumption
> >> > using cgroups and/or tc.
> >>
> >> I have found that I can successfully control the throughput using
> >> the following techniques
> >>
> >> 1) Place a tc egress filter on dummy0
> >>
> >> 2) Use ovs-ofctl to add a flow that sends skbs to dummy0 and then eth1,
> >>    this is effectively the same as one of my hacks to the datapath
> >>    that I mentioned in an earlier mail. The result is that eth1
> >>    "paces" the connection.
> >
> > Further to this, I wonder if there is any interest in providing
> > a method to switch the action order - using ovs-ofctl is a hack imho -
> > and/or switching the default action order for mirroring.
> 
> I'm not sure that there is a way to do this that is correct in the
> generic case.  It's possible that the destination could be a VM while
> packets are being mirrored to a physical device or we could be
> multicasting or some other arbitrarily complex scenario.  Just think
> of what a physical switch would do if it has ports with two different
> speeds.

Yes, I have considered that case. And I agree that perhaps there
is no sensible default. But perhaps we could make it configurable somehow?


Re: Flow Control and Port Mirroring Revisited

2011-01-12 Thread Simon Horman
On Mon, Jan 10, 2011 at 06:31:55PM +0900, Simon Horman wrote:
> On Fri, Jan 07, 2011 at 10:23:58AM +0900, Simon Horman wrote:
> > On Thu, Jan 06, 2011 at 05:38:01PM -0500, Jesse Gross wrote:
> > 
> > [ snip ]
> > > 
> > > I know that everyone likes a nice netperf result but I agree with
> > > Michael that this probably isn't the right question to be asking.  I
> > > don't think that socket buffers are a real solution to the flow
> > > control problem: they happen to provide that functionality but it's
> > > more of a side effect than anything.  It's just that the amount of
> > > memory consumed by packets in the queue(s) doesn't really have any
> > > implicit meaning for flow control (think multiple physical adapters,
> > > all with the same speed instead of a virtual device and a physical
> > > device with wildly different speeds).  The analog in the physical
> > > world that you're looking for would be Ethernet flow control.
> > > Obviously, if the question is limiting CPU or memory consumption then
> > > that's a different story.
> > 
> > Point taken. I will see if I can control CPU (and thus memory) consumption
> > using cgroups and/or tc.
> 
> I have found that I can successfully control the throughput using
> the following techniques
> 
> 1) Place a tc egress filter on dummy0
> 
> 2) Use ovs-ofctl to add a flow that sends skbs to dummy0 and then eth1,
>this is effectively the same as one of my hacks to the datapath
>that I mentioned in an earlier mail. The result is that eth1
>"paces" the connection.

Further to this, I wonder if there is any interest in providing
a method to switch the action order - using ovs-ofctl is a hack imho -
and/or switching the default action order for mirroring.

> 3) 2) + place a tc egress filter on eth1
> 
> Which mostly makes sense to me although I am a little confused about
> why 1) needs a filter on dummy0 (a filter on eth1 has no effect)
> but 3) needs a filter on eth1 (a filter on dummy0 has no effect,
> even if the skb is sent to dummy0 last).
> 
> I also had some limited success using CPU cgroups, though obviously
> that targets CPU usage and thus the effect on throughput is fairly coarse.
> In short, it's a useful technique but not one that bears further
> discussion here.
> 


Re: Flow Control and Port Mirroring Revisited

2011-01-10 Thread Simon Horman
On Fri, Jan 07, 2011 at 10:23:58AM +0900, Simon Horman wrote:
> On Thu, Jan 06, 2011 at 05:38:01PM -0500, Jesse Gross wrote:
> 
> [ snip ]
> > 
> > I know that everyone likes a nice netperf result but I agree with
> > Michael that this probably isn't the right question to be asking.  I
> > don't think that socket buffers are a real solution to the flow
> > control problem: they happen to provide that functionality but it's
> > more of a side effect than anything.  It's just that the amount of
> > memory consumed by packets in the queue(s) doesn't really have any
> > implicit meaning for flow control (think multiple physical adapters,
> > all with the same speed instead of a virtual device and a physical
> > device with wildly different speeds).  The analog in the physical
> > world that you're looking for would be Ethernet flow control.
> > Obviously, if the question is limiting CPU or memory consumption then
> > that's a different story.
> 
> Point taken. I will see if I can control CPU (and thus memory) consumption
> using cgroups and/or tc.

I have found that I can successfully control the throughput using
the following techniques

1) Place a tc egress filter on dummy0

2) Use ovs-ofctl to add a flow that sends skbs to dummy0 and then eth1,
   this is effectively the same as one of my hacks to the datapath
   that I mentioned in an earlier mail. The result is that eth1
   "paces" the connection.

3) 2) + place a tc egress filter on eth1

Which mostly makes sense to me although I am a little confused about
why 1) needs a filter on dummy0 (a filter on eth1 has no effect)
but 3) needs a filter on eth1 (a filter on dummy0 has no effect,
even if the skb is sent to dummy0 last).

I also had some limited success using CPU cgroups, though obviously
that targets CPU usage and thus the effect on throughput is fairly coarse.
In short, it's a useful technique but not one that bears further
discussion here.



Re: Flow Control and Port Mirroring Revisited

2011-01-06 Thread Simon Horman
On Thu, Jan 06, 2011 at 05:38:01PM -0500, Jesse Gross wrote:

[ snip ]
> 
> I know that everyone likes a nice netperf result but I agree with
> Michael that this probably isn't the right question to be asking.  I
> don't think that socket buffers are a real solution to the flow
> control problem: they happen to provide that functionality but it's
> more of a side effect than anything.  It's just that the amount of
> memory consumed by packets in the queue(s) doesn't really have any
> implicit meaning for flow control (think multiple physical adapters,
> all with the same speed instead of a virtual device and a physical
> device with wildly different speeds).  The analog in the physical
> world that you're looking for would be Ethernet flow control.
> Obviously, if the question is limiting CPU or memory consumption then
> that's a different story.

Point taken. I will see if I can control CPU (and thus memory) consumption
using cgroups and/or tc.

> This patch also double counts memory, since the full size of the
> packet will be accounted for by each clone, even though they share the
> actual packet data.  Probably not too significant here but it might be
> when flooding/mirroring to many interfaces.  This is at least fixable
> (the Xen-style accounting through page tracking deals with it, though
> it has its own problems).

Agreed on all counts.




Re: Flow Control and Port Mirroring Revisited

2011-01-06 Thread Simon Horman
On Thu, Jan 06, 2011 at 02:28:18PM +0100, Eric Dumazet wrote:
> Le jeudi 06 janvier 2011 à 21:44 +0900, Simon Horman a écrit :
> 
> > Hi Eric !
> > 
> > Thanks for the advice. I had thought about the socket buffer but at some
> > point it slipped my mind.
> > 
> > In any case the following patch seems to implement the change that I had in
> > mind. However my discussions with Michael Tsirkin elsewhere in this thread
> > are beginning to make me think that perhaps this change isn't the
> > best solution.
> > 
> > diff --git a/datapath/actions.c b/datapath/actions.c
> > index 5e16143..505f13f 100644
> > --- a/datapath/actions.c
> > +++ b/datapath/actions.c
> > @@ -384,7 +384,12 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
> >  
> > for (a = actions, rem = actions_len; rem > 0; a = nla_next(a, &rem)) {
> > if (prev_port != -1) {
> > -   do_output(dp, skb_clone(skb, GFP_ATOMIC), prev_port);
> > +   struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
> > +   if (nskb) {
> > +   if (skb->sk)
> > +   skb_set_owner_w(nskb, skb->sk);
> > +   do_output(dp, nskb, prev_port);
> > +   }
> > prev_port = -1;
> > }
> > 
> > I got a rather nasty panic without the if (skb->sk) check;
> > I guess some skbs don't have a socket.
> 
> Indeed, some packets are not linked to a socket.
> 
> (ARP packets for example)
> 
> Sorry, I should have mentioned it :)

Not at all, the occasional panic during hacking is good for the soul.


Re: Flow Control and Port Mirroring Revisited

2011-01-06 Thread Simon Horman
On Thu, Jan 06, 2011 at 11:22:42AM +0100, Eric Dumazet wrote:
> Le jeudi 06 janvier 2011 à 18:33 +0900, Simon Horman a écrit :
> > Hi,
> > 
> > Back in October I reported that I noticed a problem whereby flow control
> > breaks down when openvswitch is configured to mirror a port[1].
> > 
> > I have (finally) looked into this further and the problem appears to relate
> > to cloning of skbs, as Jesse Gross originally suspected.
> > 
> > More specifically, in do_execute_actions[2] the first n-1 times that an skb
> > needs to be transmitted it is cloned first and the final time the original
> > skb is used.
> > 
> > In the case that there is only one action, which is the normal case, then
> > the original skb will be used. But in the case of mirroring the cloning
> > comes into effect. And in my case the cloned skb seems to go to the (slow)
> > eth1 interface while the original skb goes to the (fast) dummy0 interface
> > that I set up to be a mirror. The result is that dummy0 "paces" the flow,
> > and it's a cracking pace at that.
> > 
> > As an experiment I hacked do_execute_actions() to use the original skb
> > for the first action instead of the last one.  In my case the result was
> > that eth1 "paces" the flow, and things work reasonably nicely.
> > 
> > Well, sort of. Things work well for non-GSO skbs but extremely poorly for
> > GSO skbs where only 3 (yes 3, not 3%) end up at the remote host running
> > netserver. I'm unsure why, but I digress.
> > 
> > It seems to me that my hack illustrates the point that the flow ends up
> > being "paced" by one interface. However I think that what would be
> > desirable is that the flow is "paced" by the slowest link. Unfortunately
> > I'm unsure how to achieve that.
> > 
> 
> Hi Simon !
> 
> "pacing" is done because skb is attached to a socket, and a socket has a
> limited (but configurable) sndbuf. sk->sk_wmem_alloc is the current sum
> of all truesize skbs in flight.
> 
> When you enter something that :
> 
> 1) Get a clone of the skb, queue the clone to device X
> 2) queue the original skb to device Y
> 
> Then: socket sndbuf is not affected at all by device X queue.
>   It is the speed of device Y that matters.
> 
> You want to get servo control on both X and Y
> 
> You could try to
> 
> 1) Get a clone of skb
>Attach it to socket too (so that socket get a feedback of final
> orphaning for the clone) with skb_set_owner_w()
>queue the clone to device X
> 
> Unfortunatly, stacked skb->destructor() makes this possible only for
> known destructor (aka sock_wfree())

Hi Eric !

Thanks for the advice. I had thought about the socket buffer but at some
point it slipped my mind.

In any case the following patch seems to implement the change that I had in
mind. However my discussions with Michael Tsirkin elsewhere in this thread
are beginning to make me think that perhaps this change isn't the
best solution.

diff --git a/datapath/actions.c b/datapath/actions.c
index 5e16143..505f13f 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -384,7 +384,12 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 
for (a = actions, rem = actions_len; rem > 0; a = nla_next(a, &rem)) {
if (prev_port != -1) {
-   do_output(dp, skb_clone(skb, GFP_ATOMIC), prev_port);
+   struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
+   if (nskb) {
+   if (skb->sk)
+   skb_set_owner_w(nskb, skb->sk);
+   do_output(dp, nskb, prev_port);
+   }
prev_port = -1;
}

I got a rather nasty panic without the if (skb->sk) check;
I guess some skbs don't have a socket.


Re: Flow Control and Port Mirroring Revisited

2011-01-06 Thread Simon Horman
On Thu, Jan 06, 2011 at 02:07:22PM +0200, Michael S. Tsirkin wrote:
> On Thu, Jan 06, 2011 at 08:30:52PM +0900, Simon Horman wrote:
> > On Thu, Jan 06, 2011 at 12:27:55PM +0200, Michael S. Tsirkin wrote:
> > > On Thu, Jan 06, 2011 at 06:33:12PM +0900, Simon Horman wrote:
> > > > Hi,
> > > > 
> > > > Back in October I reported that I noticed a problem whereby flow control
> > > > breaks down when openvswitch is configured to mirror a port[1].
> > > 
> > > Apropos the UDP flow control.  See this
> > > http://www.spinics.net/lists/netdev/msg150806.html
> > > for some problems it introduces.
> > > Unfortunately UDP does not have built-in flow control.
> > > At some level it's just conceptually broken:
> > > it's not present in physical networks so why should
> > > we try and emulate it in a virtual network?
> > > 
> > > 
> > > Specifically, when you do:
> > > # netperf -c -4 -t UDP_STREAM -H 172.17.60.218 -l 30 -- -m 1472
> > > You are asking: what happens if I push data faster than it can be 
> > > received?
> > > But why is this an interesting question?
> > > Ask 'what is the maximum rate at which I can send data with %X packet
> > > loss' or 'what is the packet loss at rate Y Gb/s'. netperf has
> > > -b and -w flags for this. It needs to be configured
> > > with --enable-intervals=yes for them to work.
> > > 
> > > If you pose the questions this way the problem of pacing
> > > the execution just goes away.
> > 
> > I am aware that UDP inherently lacks flow control.
> 
> Everyone's is aware of that, but this is always followed by a 'however'
> :).
> 
> > The aspect of flow control that I am interested in is situations where the
> > guest can create large amounts of work for the host. However, in the case
> > of virtio with vhost-net the CPU utilisation seems to be almost entirely
> > attributable to the vhost and qemu-system processes, and in the case of
> > virtio without vhost-net the CPU is used by the qemu-system process. In
> > both cases I assume that I could use a cgroup or something similar to
> > limit the guests.
> 
> cgroups, yes. The vhost process inherits the cgroups
> from the qemu process so you can limit them all.
> 
> If you are after limiting the max throughput of the guest
> you can do this with cgroups as well.

Do you mean a CPU cgroup or something else?

> > Assuming all of that is true, then from a resource control point of
> > view, which is mostly what I am concerned about, the problem goes away.
> > However, I still think that it would be nice to resolve the situation I
> > described.
> 
> We need to articulate what's wrong here, otherwise we won't
> be able to resolve the situation. We are sending UDP packets
> as fast as we can and some receivers can't cope. Is this the problem?
> We have made attempts to add a pseudo flow control in the past
> in an attempt to make UDP on the same host work better.
> Maybe they help some but they also sure introduce problems.

In the case where port mirroring is not active, which is the
usual case, to some extent there is flow control in place due to
(as Eric Dumazet pointed out) the socket buffer.

When port mirroring is activated the flow control operates based on
only one port, and which port that is can't be controlled by the
administrator in an obvious way.

I think that it would be more intuitive if flow control were
based on sending a packet to all ports rather than just one.

Though now that I think about it some more, perhaps that isn't the best
approach either. Consider, for instance, the case where data is being sent
to dummy0 and suddenly adding a mirror on eth1 slows everything down.

So perhaps there needs to be another knob to tune when setting
up port-mirroring. Or perhaps the current situation isn't so bad.


Re: Flow Control and Port Mirroring Revisited

2011-01-06 Thread Simon Horman
On Thu, Jan 06, 2011 at 12:27:55PM +0200, Michael S. Tsirkin wrote:
> On Thu, Jan 06, 2011 at 06:33:12PM +0900, Simon Horman wrote:
> > Hi,
> > 
> > Back in October I reported that I noticed a problem whereby flow control
> > breaks down when openvswitch is configured to mirror a port[1].
> 
> Apropos the UDP flow control.  See this
> http://www.spinics.net/lists/netdev/msg150806.html
> for some problems it introduces.
> Unfortunately UDP does not have built-in flow control.
> At some level it's just conceptually broken:
> it's not present in physical networks so why should
> we try and emulate it in a virtual network?
> 
> 
> Specifically, when you do:
> # netperf -c -4 -t UDP_STREAM -H 172.17.60.218 -l 30 -- -m 1472
> You are asking: what happens if I push data faster than it can be received?
> But why is this an interesting question?
> Ask 'what is the maximum rate at which I can send data with %X packet
> loss' or 'what is the packet loss at rate Y Gb/s'. netperf has
> -b and -w flags for this. It needs to be configured
> with --enable-intervals=yes for them to work.
> 
> If you pose the questions this way the problem of pacing
> the execution just goes away.

I am aware that UDP inherently lacks flow control.

The aspect of flow control that I am interested in is situations where the
guest can create large amounts of work for the host. However, in the case
of virtio with vhost-net the CPU utilisation seems to be almost entirely
attributable to the vhost and qemu-system processes, and in the case of
virtio without vhost-net the CPU is used by the qemu-system process. In
both cases I assume that I could use a cgroup or something similar to
limit the guests.

Assuming all of that is true, then from a resource control point of
view, which is mostly what I am concerned about, the problem goes away.
However, I still think that it would be nice to resolve the situation I
described.


Flow Control and Port Mirroring Revisited

2011-01-06 Thread Simon Horman
Hi,

Back in October I reported that I noticed a problem whereby flow control
breaks down when openvswitch is configured to mirror a port[1].

I have (finally) looked into this further and the problem appears to relate
to cloning of skbs, as Jesse Gross originally suspected.

More specifically, in do_execute_actions[2] the first n-1 times that an skb
needs to be transmitted it is cloned first and the final time the original
skb is used.
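
As a rough illustration, the pattern is sketched below in simplified form;
this is not the actual do_execute_actions() code, and the helper name and
port-array argument are assumptions made for the sake of the example.

static void output_to_ports(struct datapath *dp, struct sk_buff *skb,
                            const int *ports, int n)
{
        int i;

        if (n <= 0) {
                kfree_skb(skb);
                return;
        }

        for (i = 0; i < n - 1; i++) {
                /* Every port except the last one gets a clone of the skb. */
                struct sk_buff *clone = skb_clone(skb, GFP_ATOMIC);

                if (clone)
                        do_output(dp, clone, ports[i]);
        }

        /* The last port consumes the original skb, so only that device's
         * queue remains charged against the sender's socket. */
        do_output(dp, skb, ports[n - 1]);
}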

In the case that there is only one action, which is the normal case, then
the original skb will be used. But in the case of mirroring the cloning
comes into effect. And in my case the cloned skb seems to go to the (slow)
eth1 interface while the original skb goes to the (fast) dummy0 interface
that I set up to be a mirror. The result is that dummy0 "paces" the flow,
and it's a cracking pace at that.

As an experiment I hacked do_execute_actions() to use the original skb
for the first action instead of the last one.  In my case the result was
that eth1 "paces" the flow, and things work reasonably nicely.

Well, sort of. Things work well for non-GSO skbs but extremely poorly for
GSO skbs where only 3 (yes 3, not 3%) end up at the remote host running
netserver. I'm unsure why, but I digress.

It seems to me that my hack illustrates the point that the flow ends up
being "paced" by one interface. However I think that what would be
desirable is that the flow is "paced" by the slowest link. Unfortunately
I'm unsure how to achieve that.

One idea that I had was to skb_get() the original skb each time it is
cloned - that is easy enough. But unfortunately it seems to me that
approach would require some sort of callback mechanism in kfree_skb() so
that the cloned skbs can kfree_skb() the original skb.

Ideas would be greatly appreciated.

[1] http://openvswitch.org/pipermail/dev_openvswitch.org/2010-October/003806.html
[2] http://openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob;f=datapath/actions.c;h=5e16143ca402f7da0ee8fc18ee5eb16c3b7598e6;hb=HEAD


Re: [ovs-dev] Flow Control and Port Mirroring

2010-11-07 Thread Simon Horman
On Mon, Nov 08, 2010 at 01:41:23PM +1030, Rusty Russell wrote:
> On Sat, 30 Oct 2010 01:29:33 pm Simon Horman wrote:
> > [ CCed VHOST contacts ]
> > 
> > On Thu, Oct 28, 2010 at 01:22:02PM -0700, Jesse Gross wrote:
> > > On Thu, Oct 28, 2010 at 4:54 AM, Simon Horman  wrote:
> > > > My reasoning is that in the non-mirroring case the guest is
> > > > limited by the external interface through which the packets
> > > > eventually flow - that is 1Gbit/s. But in the mirrored case either
> > > > there is no flow control or the flow control is acting on the
> > > > rate of dummy0, which is essentially infinite.
> > > >
> > > > Before investigating this any further I wanted to ask if
> > > > this behaviour is intentional.
> > > 
> > > It's not intentional but I can take a guess at what is happening.
> > > 
> > > When we send the packet to a mirror, the skb is cloned but only the
> > > original skb is charged to the sender.  If the original packet is
> > > delivered to localhost then it will be freed quickly and no longer
> > > accounted for, despite the fact that the "real" packet is still
> > > sitting in the transmit queue on the NIC.  The UDP stack will then
> > > send the next packet, limited only by the speed of the CPU.
> > 
> > That would explain what I have observed.
> 
> I can't find the thread (what is ovs-dev?),

Sorry, yes it's on ovs-dev.
http://openvswitch.org/pipermail/dev_openvswitch.org/2010-October/003806.html

> but I think the tap device
> has this fundamental feature: you can blast as many packets as you want
> through it.
> 
> If that's a bad thing, we have to look harder...

There does seem to be flow control in the non-mirrored case.
So I suspect it's occurring at the skb level but that breaks down when
a clone occurs. It would seem that fragment level flow control would
help this problem (which is basically what Xen's netback/netfront has),
but by this point I am speculating wildly.  I'll try and find out exactly
where the problem is occurring in order for us to have a more informed
discussion.


Re: [ovs-dev] Flow Control and Port Mirroring

2010-10-29 Thread Simon Horman
[ CCed VHOST contacts ]

On Thu, Oct 28, 2010 at 01:22:02PM -0700, Jesse Gross wrote:
> On Thu, Oct 28, 2010 at 4:54 AM, Simon Horman  wrote:
> > My reasoning is that in the non-mirroring case the guest is
> > limited by the external interface through which the packets
> > eventually flow - that is 1Gbit/s. But in the mirrored case either
> > there is no flow control or the flow control is acting on the
> > rate of dummy0, which is essentially infinite.
> >
> > Before investigating this any further I wanted to ask if
> > this behaviour is intentional.
> 
> It's not intentional but I can take a guess at what is happening.
> 
> When we send the packet to a mirror, the skb is cloned but only the
> original skb is charged to the sender.  If the original packet is
> delivered to localhost then it will be freed quickly and no longer
> accounted for, despite the fact that the "real" packet is still
> sitting in the transmit queue on the NIC.  The UDP stack will then
> send the next packet, limited only by the speed of the CPU.

That would explain what I have observed.

> Normally, this would be tracked by accounting for the memory charged
> to the socket.  However, I know that Xen tracks whether the actual
> pages of memory have been freed, which should avoid this problem since
> the memory won't be released util the last packet has been sent.  I
> don't know what KVM virtio does but I'm guessing that it similar to
> the former, since this problem is occurring.

I am also familiar with how Xen tracks pages but less sure of the
virtio side of things.

> While it would be easy to charge the socket for all clones, I also
> want to be careful about over accounting of the same data, leading to
> a very small effective socket buffer.

Agreed, we don't want to see over-charging.



Re: [PATCH qemu-kvm] device assignment: default requires IOMMU

2009-12-23 Thread Simon Horman
On Thu, Dec 24, 2009 at 02:56:00PM +0800, Sheng Yang wrote:
> On Thursday 24 December 2009 14:51:23 Simon Horman wrote:
> > On Thu, Dec 24, 2009 at 01:45:34AM +0100, Alexander Graf wrote:
> > > Am 23.12.2009 um 23:40 schrieb Chris Wright :
> > > >[ resend, fixing email header, sorry for duplicate ]
> > > >
> > > >The default mode for device assignment is to rely on an IOMMU for
> > > >proper translations and a functioning device in the guest.  The
> > > >current
> > > >logic makes this requirement advisory, and simply disables the request
> > > >for IOMMU if one is not found on the host.  This makes for a confused
> > > >user when the device assignment appears to work, but the device in the
> > > >guest is not functioning  (I've seen about a half-dozen reports with
> > > >this failure mode).
> > > >
> > > >Change the logic such that the default requires the IOMMU.  Period.
> > > >If the host does not have an IOMMU, device assignment will fail.
> > > >
> > > >This is a user visible change, however I think the current
> > > >situation is
> > > >simply broken.
> > > >
> > > >And, of course, disabling the IOMMU requirement using the old:
> > > >
> > > >  -pcidevice host=[addr],dma=none
> > > >
> > > >or the newer:
> > > >
> > > >  -device pci-assign,host=[addr],iommu=0
> > > >
> > > >will do what it always did (not require an IOMMU, and fail to work
> > > >properly).
> > >
> > > Yay!
> > 
> > Sounds good to me. Though I am curious to know the reasoning
> > behind the current logic.
> > 
> Sounds pretty good. :)
> 
> I think maybe it is because we are interested in implementing PV DMA?

Ok, that would explain it.



Re: [PATCH qemu-kvm] device assignment: default requires IOMMU

2009-12-23 Thread Simon Horman
On Thu, Dec 24, 2009 at 01:45:34AM +0100, Alexander Graf wrote:
> 
> On 23.12.2009 at 23:40, Chris Wright  wrote:
> 
> >[ resend, fixing email header, sorry for duplicate ]
> >
> >The default mode for device assignment is to rely on an IOMMU for
> >proper translations and a functioning device in the guest.  The
> >current
> >logic makes this requirement advisory, and simply disables the request
> >for IOMMU if one is not found on the host.  This makes for a confused
> >user when the device assignment appears to work, but the device in the
> >guest is not functioning  (I've seen about a half-dozen reports with
> >this failure mode).
> >
> >Change the logic such that the default requires the IOMMU.  Period.
> >If the host does not have an IOMMU, device assignment will fail.
> >
> >This is a user visible change, however I think the current
> >situation is
> >simply broken.
> >
> >And, of course, disabling the IOMMU requirement using the old:
> >
> >  -pcidevice host=[addr],dma=none
> >
> >or the newer:
> >
> >  -device pci-assign,host=[addr],iommu=0
> >
> >will do what it always did (not require an IOMMU, and fail to work
> >properly).
> 
> Yay!

Sounds good to me. Though I am curious to know the reasoning
behind the current logic.
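
As I read it, the change amounts to roughly the following. This is a
hypothetical sketch of the policy only, not the actual qemu-kvm code; the
helper and its arguments are made up:

#include <errno.h>
#include <stdbool.h>

/* Hypothetical sketch: a missing host IOMMU becomes a hard error unless
 * the user explicitly opted out with dma=none / iommu=0. */
static int check_assignment_iommu(bool host_has_iommu, bool user_disabled_iommu)
{
        if (user_disabled_iommu)
                return 0;          /* old advisory behaviour, only on request */

        if (!host_has_iommu)
                return -ENOTSUP;   /* previously: silently downgraded and "worked" */

        return 0;
}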



Re: copyless virtio net thoughts?

2009-02-19 Thread Simon Horman
On Thu, Feb 19, 2009 at 10:06:17PM +1030, Rusty Russell wrote:
> On Thursday 19 February 2009 10:01:42 Simon Horman wrote:
> > On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
> > > 
> > > 2) Direct NIC attachment This is particularly interesting with SR-IOV or
> > > other multiqueue nics, but for boutique cases or benchmarks, could be for
> > > normal NICs.  So far I have some very sketched-out patches: for the
> > > attached nic dev_alloc_skb() gets an skb from the guest (which supplies
> > > them via some kind of AIO interface), and a branch in netif_receive_skb()
> > > which returned it to the guest.  This bypasses all firewalling in the
> > > host though; we're basically having the guest process drive the NIC
> > > directly.
> > 
> > Hi Rusty,
> > 
> > Can I clarify that the idea with utilising SR-IOV would be to assign
> > virtual functions to guests? That is, something conceptually similar to
> > PCI pass-through in Xen (although I'm not sure that anyone has virtual
> > function pass-through working yet).
> 
> Not quite: I think PCI passthrough IMHO is the *wrong* way to do it: it
> makes migrate complicated (if not impossible), and requires emulation or
> the same NIC on the destination host.
> 
> This would be the *host* seeing the virtual functions as multiple NICs,
> then the ability to attach a given NIC directly to a process.
> 
> This isn't guest-visible: the kvm process is configured to connect
> directly to a NIC, rather than (say) bridging through the host.

Hi Rusty, Hi Chris,

Thanks for the clarification.

I think that the approach Xen recommends for migration is to use a
bonding device that uses the pass-through device when it is present
and falls back to a virtual NIC otherwise.

The idea that you outline above does sound somewhat cleaner :-)
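
If I have understood the receive side of the direct-attachment idea, it
would look something like this. Purely hypothetical pseudo-code to
illustrate; every guest_* name is invented, and dev->ml_priv is just a
convenient place to hang the example off:

#include <linux/skbuff.h>
#include <linux/netdevice.h>

struct guest_ring;                      /* invented: per-device guest rx ring */

static struct guest_ring *guest_ring_of(struct net_device *dev)
{
        return dev->ml_priv;            /* illustrative stash, not a real ABI */
}

static void guest_complete_rx(struct guest_ring *ring, struct sk_buff *skb)
{
        /* Invented: complete the guest AIO request that supplied this
         * buffer, then drop our reference to the skb. */
        (void)ring;
        kfree_skb(skb);
}

/* Would sit near the top of netif_receive_skb(); returns true if consumed. */
static bool guest_direct_rx(struct sk_buff *skb)
{
        struct guest_ring *ring = guest_ring_of(skb->dev);

        if (!ring)
                return false;           /* not guest-attached: normal host path */

        /* Hand the packet straight back to the owning guest, bypassing
         * bridging and netfilter in the host. */
        guest_complete_rx(ring, skb);
        return true;
}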

> > If so, wouldn't this also be useful on machines that have multiple
> > NICs?
> 
> Yes, but mainly as a benchmark hack AFAICT :)

Ok, I was under the impression that at least in the Xen world it
was something people actually used. But I could easily be mistaken.

> Hope that clarifies, Rusty.

On Thu, Feb 19, 2009 at 03:37:52AM -0800, Chris Wright wrote:
> * Simon Horman (ho...@verge.net.au) wrote:
> > On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
> > > 2) Direct NIC attachment This is particularly interesting with SR-IOV or
> > > other multiqueue nics, but for boutique cases or benchmarks, could be for
> > > normal NICs.  So far I have some very sketched-out patches: for the
> > > attached nic dev_alloc_skb() gets an skb from the guest (which supplies
> > > them via some kind of AIO interface), and a branch in netif_receive_skb()
> > > which returned it to the guest.  This bypasses all firewalling in the
> > > host though; we're basically having the guest process drive the NIC
> > > directly.
> > 
> > Can I clarify that the idea with utilising SR-IOV would be to assign
> > virtual functions to guests? That is, something conceptually similar to
> > PCI pass-through in Xen (although I'm not sure that anyone has virtual
> > function pass-through working yet). If so, wouldn't this also be useful
> > on machines that have multiple NICs?
> 
> This would be the typical usecase for sr-iov.  But I think Rusty is
> referring to giving a nic "directly" to a guest but the guest is still
> seeing a virtio nic (not pass-through/device-assignment).  So there's
> no bridge, and zero copy so the dma buffers are supplied by guest,
> but host has the driver for the physical nic or the VF.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en



Re: copyless virtio net thoughts?

2009-02-18 Thread Simon Horman
On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
> 
> 2) Direct NIC attachment This is particularly interesting with SR-IOV or
> other multiqueue nics, but for boutique cases or benchmarks, could be for
> normal NICs.  So far I have some very sketched-out patches: for the
> attached nic dev_alloc_skb() gets an skb from the guest (which supplies
> them via some kind of AIO interface), and a branch in netif_receive_skb()
> which returned it to the guest.  This bypasses all firewalling in the
> host though; we're basically having the guest process drive the NIC
> directly.

Hi Rusty,

Can I clarify that the idea with utilising SR-IOV would be to assign
virtual functions to guests? That is, something conceptually similar to
PCI pass-through in Xen (although I'm not sure that anyone has virtual
function pass-through working yet). If so, wouldn't this also be useful
on machines that have multiple NICs?

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en



Re: [PATCH 2/16 v6] PCI: define PCI resource names in an 'enum'

2008-11-13 Thread Simon Horman
>  #define PCI_DMA_NONE 3
>  
> -#define DEVICE_COUNT_RESOURCE 12
> +/*
> + *  For PCI devices, the region numbers are assigned this way:
> + */
> +enum {
> + /* #0-5: standard PCI regions */
> + PCI_STD_RESOURCES,
> + PCI_STD_RESOURCES_END = 5,
> +
> + /* #6: expansion ROM */
> + PCI_ROM_RESOURCE,
> +
> + /* address space assigned to buses behind the bridge */
> +#ifndef PCI_BRIDGE_RES_NUM
> +#define PCI_BRIDGE_RES_NUM 4
> +#endif


  Is there any intention to ever set PCI_BRIDGE_RES_NUM to any
  value other than 4? I'm confused about why it is protected
  by #ifndef as I can't find it declared anywhere else.

> + PCI_BRIDGE_RESOURCES,
> + PCI_BRIDGE_RES_END = PCI_BRIDGE_RESOURCES + PCI_BRIDGE_RES_NUM - 1,
> +
> + /* total resources associated with a PCI device */
> + PCI_NUM_RESOURCES,
> +
> + /* preserve this for compatibility */
> + DEVICE_COUNT_RESOURCE
> +};
>  
>  typedef int __bitwise pci_power_t;
>  
> @@ -262,18 +285,6 @@ static inline void pci_add_saved_cap(struct pci_dev *pci_dev,
>   hlist_add_head(&new_cap->next, &pci_dev->saved_cap_space);
>  }
>  
> -/*
> - *  For PCI devices, the region numbers are assigned this way:
> - *
> - *   0-5 standard PCI regions
> - *   6   expansion ROM
> - *   7-10bridges: address space assigned to buses behind the bridge
> - */
> -
> -#define PCI_ROM_RESOURCE 6
> -#define PCI_BRIDGE_RESOURCES 7
> -#define PCI_NUM_RESOURCES 11
> -
>  #ifndef PCI_BUS_NUM_RESOURCES
>  #define PCI_BUS_NUM_RESOURCES 16
>  #endif
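
For what it's worth, the arithmetic does work out: with PCI_BRIDGE_RES_NUM
at 4 the enum reproduces the old constants. A quick compile-time check of
my own (not part of the patch):

#include <linux/kernel.h>
#include <linux/pci.h>

/* Not part of the patch: with PCI_BRIDGE_RES_NUM == 4 the new enum
 * matches the #defines being removed above. */
static inline void pci_resno_sanity_check(void)
{
        BUILD_BUG_ON(PCI_ROM_RESOURCE      != 6);
        BUILD_BUG_ON(PCI_BRIDGE_RESOURCES  != 7);
        BUILD_BUG_ON(PCI_BRIDGE_RES_END    != 10);
        BUILD_BUG_ON(PCI_NUM_RESOURCES     != 11);
        BUILD_BUG_ON(DEVICE_COUNT_RESOURCE != 12);
}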

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en



Re: git repository for SR-IOV development?

2008-11-06 Thread Simon Horman
On Thu, Nov 06, 2008 at 11:58:25AM -0800, H L wrote:
> --- On Thu, 11/6/08, Greg KH <[EMAIL PROTECTED]> wrote:
> 
> > On Thu, Nov 06, 2008 at 08:51:09AM -0800, H L wrote:
> > > 
> > > Has anyone initiated or given consideration to the creation of a git
> > > repository (say, on kernel.org) for SR-IOV development?
> > 
> > Why?  It's only a few patches, right?  Why would it
> > need a whole new git
> > tree?
> 
> 
> So as to minimize the time and effort patching a kernel, especially if
> the tree (and/or hash level) against which the patches were created fails
> to be specified on a mailing-list.  Plus, there appear to be questions
> raised on how, precisely, the implementation should ultimately be modeled
> and especially given that, who knows at this point what number of patches
> will ultimately be submitted?  I know I've built the "7-patch" one
> (painfully, by the way), and I'm aware there's another 15-patch set out
> there which I've not yet examined.

FWIW, the v6 patch series (this thread) applied to both 2.6.28-rc3
and the current Linus tree after a minor tweak to the first patch, as below.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en

From: Yu Zhao <[EMAIL PROTECTED]>

[PATCH 1/16 v6] PCI: remove unnecessary arg of pci_update_resource()

This cleanup removes unnecessary argument 'struct resource *res' in
pci_update_resource(), so it takes same arguments as other companion
functions (pci_assign_resource(), etc.).

Cc: Alex Chiang <[EMAIL PROTECTED]>
Cc: Grant Grundler <[EMAIL PROTECTED]>
Cc: Greg KH <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>
Cc: Jesse Barnes <[EMAIL PROTECTED]>
Cc: Matthew Wilcox <[EMAIL PROTECTED]>
Cc: Randy Dunlap <[EMAIL PROTECTED]>
Cc: Roland Dreier <[EMAIL PROTECTED]>
Signed-off-by: Yu Zhao <[EMAIL PROTECTED]>
Upported-by: Simon Horman <[EMAIL PROTECTED]>

---
 drivers/pci/pci.c   |4 ++--
 drivers/pci/setup-res.c |7 ---
 include/linux/pci.h |2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

* Fri, 07 Nov 2008 09:05:18 +1100, Simon Horman
  - Minor rediff of include/linux/pci.h section to apply to 2.6.28-rc3

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 4db261e..ae62f01 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -376,8 +376,8 @@ pci_restore_bars(struct pci_dev *dev)
return;
}
 
-   for (i = 0; i < numres; i ++)
-   pci_update_resource(dev, &dev->resource[i], i);
+   for (i = 0; i < numres; i++)
+   pci_update_resource(dev, i);
 }
 
 static struct pci_platform_pm_ops *pci_platform_pm;
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 2dbd96c..b7ca679 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -26,11 +26,12 @@
 #include "pci.h"
 
 
-void pci_update_resource(struct pci_dev *dev, struct resource *res, int resno)
+void pci_update_resource(struct pci_dev *dev, int resno)
 {
struct pci_bus_region region;
u32 new, check, mask;
int reg;
+   struct resource *res = dev->resource + resno;
 
/*
 * Ignore resources for unimplemented BARs and unused resource slots
@@ -162,7 +163,7 @@ int pci_assign_resource(struct pci_dev *dev, int resno)
} else {
res->flags &= ~IORESOURCE_STARTALIGN;
if (resno < PCI_BRIDGE_RESOURCES)
-   pci_update_resource(dev, res, resno);
+   pci_update_resource(dev, resno);
}
 
return ret;
@@ -197,7 +198,7 @@ int pci_assign_resource_fixed(struct pci_dev *dev, int resno)
dev_err(&dev->dev, "BAR %d: can't allocate %s resource %pR\n",
resno, res->flags & IORESOURCE_IO ? "I/O" : "mem", res);
} else if (resno < PCI_BRIDGE_RESOURCES) {
-   pci_update_resource(dev, res, resno);
+   pci_update_resource(dev, resno);
}
 
return ret;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 085187b..43e1fc1 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -626,7 +626,7 @@ int pcix_get_mmrbc(struct pci_dev *dev);
 int pcie_set_readrq(struct pci_dev *dev, int rq);
 int pci_reset_function(struct pci_dev *dev);
 int pci_execute_reset_function(struct pci_dev *dev);
-void pci_update_resource(struct pci_dev *dev, struct resource *res, int resno);
+void pci_update_resource(struct pci_dev *dev, int resno);
 int __must_check pci_assign_resource(struct pci_dev *dev, int i);
 int pci_select_bars(struct pci_dev *dev, unsigned long flags);
 
-- 
1.5.6.4


Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Simon Horman
On Thu, Nov 06, 2008 at 09:53:08AM -0800, Greg KH wrote:
> On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
> > On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote:
> > > On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
> > > > I have not modified any existing drivers, but instead I threw together
> > > > a bare-bones module enabling me to make a call to pci_iov_register()
> > > > and then poke at an SR-IOV adapter's /sys entries for which no driver
> > > > was loaded.
> > > > 
> > > > It appears from my perusal thus far that drivers using these new
> > > > SR-IOV patches will require modification; i.e. the driver associated
> > > > with the Physical Function (PF) will be required to make the
> > > > pci_iov_register() call along with the requisite notify() function.
> > > > Essentially this suggests to me a model for the PF driver to perform
> > > > any "global actions" or setup on behalf of VFs before enabling them
> > > > after which VF drivers could be associated.
> > > 
> > > Where would the VF drivers have to be associated?  On the "pci_dev"
> > > level or on a higher one?
> > > 
> > > Will all drivers that want to bind to a "VF" device need to be
> > > rewritten?
> > 
> > The current model being implemented by my colleagues has separate
> > drivers for the PF (aka native) and VF devices.  I don't personally
> > believe this is the correct path, but I'm reserving judgement until I
> > see some code.
> 
> Hm, I would like to see that code before we can properly evaluate this
> interface.  Especially as they are all tightly tied together.
> 
> > I don't think we really know what the One True Usage model is for VF
> > devices.  Chris Wright has some ideas, I have some ideas and Yu Zhao has
> > some ideas.  I bet there's other people who have other ideas too.
> 
> I'd love to hear those ideas.
> 
> Rumor has it, there is some Xen code floating around to support this
> already, is that true?

Xen patches were posted to xen-devel by Yu Zhao on the 29th of September [1].
Unfortunately the only responses that I can find are a) that the patches
were mangled and b) they seem to include changes (by others) that have
been merged into Linux. I have confirmed that both of these concerns
are valid.

I have not yet examined the difference, if any, in the approach taken by Yu
to SR-IOV in Linux and Xen. Unfortunately comparison is less than trivial
due to the wide gap in kernel versions between Linux-Xen (2.6.18.8) and
Linux itself.

One approach that I was considering in order to familiarise myself with the
code was to backport the v6 Linux patches (this thread) to Linux-Xen. I made a
start on that, but again due to kernel version differences it is non-trivial.

[1] http://lists.xensource.com/archives/html/xen-devel/2008-09/msg00923.html

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en



Re: [PATCH 0/2] kvm: disable virtualization on kdump

2008-10-23 Thread Simon Horman
[ Added Andrew Morton, Eric Biederman, Vivek Goyal and Haren Myneni to CC ]

On Thu, Oct 23, 2008 at 05:41:29PM -0200, Eduardo Habkost wrote:
> On Thu, Oct 23, 2008 at 10:28:24AM +1100, Simon Horman wrote:
> > On Mon, Oct 20, 2008 at 01:01:32PM -0200, Eduardo Habkost wrote:
> > > The following two patches should make kdump work when the kvm-intel module
> > > is loaded. We need to disable vmx mode before booting the kdump kernel,
> > > so I've introduced a notifier interface where KVM can hook and disable
> > > virtualization on all CPUs just before they are halted.
> > > 
> > > It has the same purpose as the KVM reboot notifier that gets executed
> > > at kexec-time. But in the kdump case, things are not as simple because
> > > the kernel has just crashed.
> > > 
> > > The notifier interface being introduced is x86-specific. I don't know
> > > if an arch-independent interface would be more appropriate for this
> > > case.
> > > 
> > > It was tested only using kvm-intel. Testing on different machines
> > > is welcome.
> > 
> > These changes look fine to me from a kexec/kdump point of view.
> > 
> > Reviewed-by: Simon Horman <[EMAIL PROTECTED]>
> 
> Thanks.
> 
> Considering they touch both KVM and kexec, which tree would be the best way
> to get them in?

As I understand it, there is no kexec tree as such; rather, patches
are either picked up by an arch tree or by Andrew Morton.
I am happy to create and maintain a kexec tree if there is a need.
But in this case it seems that using the KVM tree would be best.

> (Avi: the patches were sent only to kexec and kvm mailing lists,
> initially. If it's better to submit them to your address also so it gets
> on your queue, please let me know)

I won't speak for Avi, but usually it's good to CC the maintainer.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en



Re: [PATCH 0/2] kvm: disable virtualization on kdump

2008-10-22 Thread Simon Horman
On Mon, Oct 20, 2008 at 01:01:32PM -0200, Eduardo Habkost wrote:
> The following two patches should make kdump work when the kvm-intel module
> is loaded. We need to disable vmx mode before booting the kdump kernel,
> so I've introduced a notifier interface where KVM can hook and disable
> virtualization on all CPUs just before they are halted.
> 
> It has the same purpose as the KVM reboot notifier that gets executed
> at kexec-time. But in the kdump case, things are not as simple because
> the kernel has just crashed.
> 
> The notifier interface being introduced is x86-specific. I don't know
> if an arch-independent interface would be more appropriate for this
> case.
> 
> It was tested only using kvm-intel. Testing on different machines
> is welcome.
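
Just so I'm sure I follow the shape of the hook, it is essentially the
following, right? (A hypothetical sketch of the idea only, not the actual
patches; the names are made up.)

/* Hypothetical sketch: kvm-intel registers a per-CPU "disable
 * virtualization" callback, and the crash-shutdown path runs it on every
 * CPU it halts, so the kdump kernel does not start with VMX still on. */

static void (*crash_virt_disable)(void);        /* NULL unless kvm-intel set it */

/* Called by kvm-intel once it has enabled VMX on the host. */
void register_crash_virt_disable(void (*fn)(void))
{
        crash_virt_disable = fn;
}

/* Called on each CPU from the crash-shutdown path, just before halting. */
void crash_cpu_disable_virtualization(void)
{
        void (*fn)(void) = crash_virt_disable;

        if (fn)
                fn();           /* e.g. kvm-intel does a guarded VMXOFF here */
}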

These changes look fine to me from a kexec/kdump point of view.

Reviewed-by: Simon Horman <[EMAIL PROTECTED]>

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en
