Re: [PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts

2014-12-02 Thread Jason Wang



On Tue, Dec 2, 2014 at 11:15 AM, Jason Wang  wrote:



On Mon, Dec 1, 2014 at 6:42 PM, Michael S. Tsirkin  
wrote:

On Mon, Dec 01, 2014 at 06:17:03PM +0800, Jason Wang wrote:

 Hello:
  We used to orphan packets before transmission for virtio-net. 
This breaks

 socket accounting and can lead serveral functions won't work, e.g:
  - Byte Queue Limit depends on tx completion nofication to work.
 - Packet Generator depends on tx completion nofication for the last
   transmitted packet to complete.
 - TCP Small Queue depends on proper accounting of sk_wmem_alloc to 
work.
  This series tries to solve the issue by enabling tx interrupts. 
To minize

 the performance impacts of this, several optimizations were used:
  - In guest side, virtqueue_enable_cb_delayed() was used to delay 
the tx

   interrupt untile 3/4 pending packets were sent.
 - In host side, interrupt coalescing were used to reduce tx 
interrupts.

  Performance test results[1] (tx-frames 16 tx-usecs 16) shows:
  - For guest receiving. No obvious regression on throughput were
   noticed. More cpu utilization were noticed in few cases.
 - For guest transmission. Very huge improvement on througput for 
small
   packet transmission were noticed. This is expected since TSQ and 
other
   optimization for small packet transmission work after tx 
interrupt. But

   will use more cpu for large packets.
 - For TCP_RR, regression (10% on transaction rate and cpu 
utilization) were
   found. Tx interrupt won't help but cause overhead in this case. 
Using
   more aggressive coalescing parameters may help to reduce the 
regression.


OK, you do have posted coalescing patches - does it help any?


Helps a lot.

For RX, it saves about 5% - 10% cpu. (reduce 60%-90% tx intrs)
For small packet TX, it increases 33% - 245% throughput. (reduce 
about 60% inters)
For TCP_RR, it increase the 3%-10% trans.rate. (reduce 40%-80% tx 
intrs)




I'm not sure the regression is due to interrupts.
It would make sense for CPU but why would it
hurt transaction rate?


Anyway guest need to take some cycles to handle tx interrupts.
And transaction rate does increase if we coalesces more tx 
interurpts. 



It's possible that we are deferring kicks too much due to BQL.

As an experiment: do we get any of it back if we do
-if (kick || netif_xmit_stopped(txq))
-virtqueue_kick(sq->vq);
+virtqueue_kick(sq->vq);
?



I will try, but during TCP_RR, at most 1 packets were pending,
I suspect if BQL can help in this case.


Looks like this helps a lot in multiple sessions of TCP_RR.

How about move the BQL patch out of this series?

Let's first converge tx interrupt and then introduce it?
(e.g with kicking after queuing X bytes?)






If yes, we can just kick e.g. periodically, e.g. after queueing each
X bytes.


Okay, let me try to see if this help.



 Changes from RFC V3:
 - Don't free tx packets in ndo_start_xmit()
 - Add interrupt coalescing support for virtio-net
 Changes from RFC v2:
 - clean up code, address issues raised by Jason
 Changes from RFC v1:
 - address comments by Jason Wang, use delayed cb everywhere
 - rebased Jason's patch on top of mine and include it (with some 
tweaks)

  Please reivew. Comments were more than welcomed.
  [1] Performance Test result:
  Environment:
 - Two Intel(R) Xeon(R) CPU E5620 @ 2.40GHz machines connected back 
to back

   with 82599ES cards.
 - Both host and guest were net-next.git plus the patch
 - Coalescing parameters for the card:
   Adaptive RX: off  TX: off
   rx-usecs: 1
   rx-frames: 0
   tx-usecs: 0
   tx-frames: 0
 - Vhost_net was enabled and zerocopy was disabled
 - Tests was done by netperf-2.6
 - Guest has 2 vcpus with single queue virtio-net
  Results:
 - Numbers of square brackets are whose significance is grater than 
95%

  Guest RX:
  size/sessions/+throughput/+cpu/+per_cpu_throughput/
 64/1/+2.0326/[+6.2807%]/-3.9970%/
 64/2/-0.2104%/[+3.2012%]/[-3.3058%]/
 64/4/+1.5956%/+2.2451%/-0.6353%/
 64/8/+1.1732%/+3.5123%/-2.2598%/
 256/1/+3.7619%/[+5.8117%]/-1.9372%/
 256/2/-0.0661%/[+3.2511%]/-3.2127%/
 256/4/+1.1435%/[-8.1842%]/[+10.1591%]/
 256/8/[+2.2447%]/[+6.2044%]/[-3.7283%]/
 1024/1/+9.1479%/[+12.0997%]/[-2.6332%]/
 1024/2/[-17.3341%]/[+0.%]/[-17.3341%]/
 1024/4/[-0.6284%]/-1.0376%/+0.4135%/
 1024/8/+1.1444%/-1.6069%/+2.7961%/
 4096/1/+0.0401%/-0.5993%/+0.6433%/
 4096/2/[-0.5894%]/-2.2071%/+1.6542%/
 4096/4/[-0.5560%]/-1.4969%/+0.9553%/
 4096/8/-0.3362%/+2.7086%/-2.9645%/
 16384/1/-0.0285%/+0.7247%/-0.7478%/
 16384/2/-0.5286%/+0.3287%/-0.8545%/
 16384/4/-0.3297%/-2.0543%/+1.7608%/
 16384/8/+1.0932%/+4.0253%/-2.8187%/
 65535/1/+0.0003%/-0.1502%/+0.1508%/
 65535/2/[-0.6065%]/+0.2309%/-0.8355%/
 65535/4/[-0.6861%]/[+3.9451%]/[-4.4554%]/
 65535/8/+1.8359%/+3.1590%/-1.2825%/
  Guest RX:
 size/sessions/+throughput/+cpu/+per_cpu_throughput/
 64/1/[+65.0961%]/[-8.6807%]/[+80.7900%]/
 64/2/[+6.0288%]/[-2.2823%]/[+8.5052%]/
 64/4/[+5.9038%]/[-2.1834%]/[+8.2677%]/
 64/8/[+5.4154%]/[-2.1804%]/[+7

Re: [PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts

2014-12-02 Thread Michael S. Tsirkin
On Tue, Dec 02, 2014 at 08:15:02AM +0008, Jason Wang wrote:
> 
> 
> On Tue, Dec 2, 2014 at 11:15 AM, Jason Wang  wrote:
> >
> >
> >On Mon, Dec 1, 2014 at 6:42 PM, Michael S. Tsirkin  wrote:
> >>On Mon, Dec 01, 2014 at 06:17:03PM +0800, Jason Wang wrote:
> >>> Hello:
> >>>  We used to orphan packets before transmission for virtio-net. This
> >>>breaks
> >>> socket accounting and can lead serveral functions won't work, e.g:
> >>>  - Byte Queue Limit depends on tx completion nofication to work.
> >>> - Packet Generator depends on tx completion nofication for the last
> >>>   transmitted packet to complete.
> >>> - TCP Small Queue depends on proper accounting of sk_wmem_alloc to
> >>>work.
> >>>  This series tries to solve the issue by enabling tx interrupts. To
> >>>minize
> >>> the performance impacts of this, several optimizations were used:
> >>>  - In guest side, virtqueue_enable_cb_delayed() was used to delay the
> >>>tx
> >>>   interrupt untile 3/4 pending packets were sent.
> >>> - In host side, interrupt coalescing were used to reduce tx
> >>>interrupts.
> >>>  Performance test results[1] (tx-frames 16 tx-usecs 16) shows:
> >>>  - For guest receiving. No obvious regression on throughput were
> >>>   noticed. More cpu utilization were noticed in few cases.
> >>> - For guest transmission. Very huge improvement on througput for
> >>>small
> >>>   packet transmission were noticed. This is expected since TSQ and
> >>>other
> >>>   optimization for small packet transmission work after tx interrupt.
> >>>But
> >>>   will use more cpu for large packets.
> >>> - For TCP_RR, regression (10% on transaction rate and cpu
> >>>utilization) were
> >>>   found. Tx interrupt won't help but cause overhead in this case.
> >>>Using
> >>>   more aggressive coalescing parameters may help to reduce the
> >>>regression.
> >>
> >>OK, you do have posted coalescing patches - does it help any?
> >
> >Helps a lot.
> >
> >For RX, it saves about 5% - 10% cpu. (reduce 60%-90% tx intrs)
> >For small packet TX, it increases 33% - 245% throughput. (reduce about 60%
> >inters)
> >For TCP_RR, it increase the 3%-10% trans.rate. (reduce 40%-80% tx intrs)
> >
> >>
> >>I'm not sure the regression is due to interrupts.
> >>It would make sense for CPU but why would it
> >>hurt transaction rate?
> >
> >Anyway guest need to take some cycles to handle tx interrupts.
> >And transaction rate does increase if we coalesces more tx interurpts.
> >>
> >>
> >>It's possible that we are deferring kicks too much due to BQL.
> >>
> >>As an experiment: do we get any of it back if we do
> >>-if (kick || netif_xmit_stopped(txq))
> >>-virtqueue_kick(sq->vq);
> >>+virtqueue_kick(sq->vq);
> >>?
> >
> >
> >I will try, but during TCP_RR, at most 1 packets were pending,
> >I suspect if BQL can help in this case.
> 
> Looks like this helps a lot in multiple sessions of TCP_RR.

so what's faster
BQL + kick each packet
no BQL
?

> How about move the BQL patch out of this series?
> 
> Let's first converge tx interrupt and then introduce it?
> (e.g with kicking after queuing X bytes?)

Sounds good.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts

2014-12-02 Thread Jason Wang



On Tue, Dec 2, 2014 at 5:43 PM, Michael S. Tsirkin  
wrote:

On Tue, Dec 02, 2014 at 08:15:02AM +0008, Jason Wang wrote:
 
 
 On Tue, Dec 2, 2014 at 11:15 AM, Jason Wang  
wrote:

 >
 >
 >On Mon, Dec 1, 2014 at 6:42 PM, Michael S. Tsirkin 
 wrote:

 >>On Mon, Dec 01, 2014 at 06:17:03PM +0800, Jason Wang wrote:
 >>> Hello:
 >>>  We used to orphan packets before transmission for virtio-net. 
This

 >>>breaks
 >>> socket accounting and can lead serveral functions won't work, 
e.g:

 >>>  - Byte Queue Limit depends on tx completion nofication to work.
 >>> - Packet Generator depends on tx completion nofication for the 
last

 >>>   transmitted packet to complete.
 >>> - TCP Small Queue depends on proper accounting of sk_wmem_alloc 
to

 >>>work.
 >>>  This series tries to solve the issue by enabling tx 
interrupts. To

 >>>minize
 >>> the performance impacts of this, several optimizations were 
used:
 >>>  - In guest side, virtqueue_enable_cb_delayed() was used to 
delay the

 >>>tx
 >>>   interrupt untile 3/4 pending packets were sent.
 >>> - In host side, interrupt coalescing were used to reduce tx
 >>>interrupts.
 >>>  Performance test results[1] (tx-frames 16 tx-usecs 16) shows:
 >>>  - For guest receiving. No obvious regression on throughput were
 >>>   noticed. More cpu utilization were noticed in few cases.
 >>> - For guest transmission. Very huge improvement on througput for
 >>>small
 >>>   packet transmission were noticed. This is expected since TSQ 
and

 >>>other
 >>>   optimization for small packet transmission work after tx 
interrupt.

 >>>But
 >>>   will use more cpu for large packets.
 >>> - For TCP_RR, regression (10% on transaction rate and cpu
 >>>utilization) were
 >>>   found. Tx interrupt won't help but cause overhead in this 
case.

 >>>Using
 >>>   more aggressive coalescing parameters may help to reduce the
 >>>regression.
 >>
 >>OK, you do have posted coalescing patches - does it help any?
 >
 >Helps a lot.
 >
 >For RX, it saves about 5% - 10% cpu. (reduce 60%-90% tx intrs)
 >For small packet TX, it increases 33% - 245% throughput. (reduce 
about 60%

 >inters)
 >For TCP_RR, it increase the 3%-10% trans.rate. (reduce 40%-80% tx 
intrs)

 >
 >>
 >>I'm not sure the regression is due to interrupts.
 >>It would make sense for CPU but why would it
 >>hurt transaction rate?
 >
 >Anyway guest need to take some cycles to handle tx interrupts.
 >And transaction rate does increase if we coalesces more tx 
interurpts.

 >>
 >>
 >>It's possible that we are deferring kicks too much due to BQL.
 >>
 >>As an experiment: do we get any of it back if we do
 >>-if (kick || netif_xmit_stopped(txq))
 >>-virtqueue_kick(sq->vq);
 >>+virtqueue_kick(sq->vq);
 >>?
 >
 >
 >I will try, but during TCP_RR, at most 1 packets were pending,
 >I suspect if BQL can help in this case.
 
 Looks like this helps a lot in multiple sessions of TCP_RR.


so what's faster
BQL + kick each packet
no BQL
?


Quick and manual tests (TCP_RR 64, TCP_STREAM 512) does not 
show obvious differences.


May need a complete benchmark to see.




 How about move the BQL patch out of this series?
 
 Let's first converge tx interrupt and then introduce it?

 (e.g with kicking after queuing X bytes?)


Sounds good.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts

2014-12-02 Thread Michael S. Tsirkin
On Tue, Dec 02, 2014 at 09:59:48AM +0008, Jason Wang wrote:
> 
> 
> On Tue, Dec 2, 2014 at 5:43 PM, Michael S. Tsirkin  wrote:
> >On Tue, Dec 02, 2014 at 08:15:02AM +0008, Jason Wang wrote:
> >> On Tue, Dec 2, 2014 at 11:15 AM, Jason Wang 
> >>wrote:
> >> >
> >> >
> >> >On Mon, Dec 1, 2014 at 6:42 PM, Michael S. Tsirkin 
> >>wrote:
> >> >>On Mon, Dec 01, 2014 at 06:17:03PM +0800, Jason Wang wrote:
> >> >>> Hello:
> >> >>>  We used to orphan packets before transmission for virtio-net. This
> >> >>>breaks
> >> >>> socket accounting and can lead serveral functions won't work, e.g:
> >> >>>  - Byte Queue Limit depends on tx completion nofication to work.
> >> >>> - Packet Generator depends on tx completion nofication for the last
> >> >>>   transmitted packet to complete.
> >> >>> - TCP Small Queue depends on proper accounting of sk_wmem_alloc to
> >> >>>work.
> >> >>>  This series tries to solve the issue by enabling tx interrupts. To
> >> >>>minize
> >> >>> the performance impacts of this, several optimizations were used:
> >> >>>  - In guest side, virtqueue_enable_cb_delayed() was used to delay
> >>the
> >> >>>tx
> >> >>>   interrupt untile 3/4 pending packets were sent.
> >> >>> - In host side, interrupt coalescing were used to reduce tx
> >> >>>interrupts.
> >> >>>  Performance test results[1] (tx-frames 16 tx-usecs 16) shows:
> >> >>>  - For guest receiving. No obvious regression on throughput were
> >> >>>   noticed. More cpu utilization were noticed in few cases.
> >> >>> - For guest transmission. Very huge improvement on througput for
> >> >>>small
> >> >>>   packet transmission were noticed. This is expected since TSQ and
> >> >>>other
> >> >>>   optimization for small packet transmission work after tx
> >>interrupt.
> >> >>>But
> >> >>>   will use more cpu for large packets.
> >> >>> - For TCP_RR, regression (10% on transaction rate and cpu
> >> >>>utilization) were
> >> >>>   found. Tx interrupt won't help but cause overhead in this case.
> >> >>>Using
> >> >>>   more aggressive coalescing parameters may help to reduce the
> >> >>>regression.
> >> >>
> >> >>OK, you do have posted coalescing patches - does it help any?
> >> >
> >> >Helps a lot.
> >> >
> >> >For RX, it saves about 5% - 10% cpu. (reduce 60%-90% tx intrs)
> >> >For small packet TX, it increases 33% - 245% throughput. (reduce about
> >>60%
> >> >inters)
> >> >For TCP_RR, it increase the 3%-10% trans.rate. (reduce 40%-80% tx
> >>intrs)
> >> >
> >> >>
> >> >>I'm not sure the regression is due to interrupts.
> >> >>It would make sense for CPU but why would it
> >> >>hurt transaction rate?
> >> >
> >> >Anyway guest need to take some cycles to handle tx interrupts.
> >> >And transaction rate does increase if we coalesces more tx interurpts.
> >> >>
> >> >>
> >> >>It's possible that we are deferring kicks too much due to BQL.
> >> >>
> >> >>As an experiment: do we get any of it back if we do
> >> >>-if (kick || netif_xmit_stopped(txq))
> >> >>-virtqueue_kick(sq->vq);
> >> >>+virtqueue_kick(sq->vq);
> >> >>?
> >> >
> >> >
> >> >I will try, but during TCP_RR, at most 1 packets were pending,
> >> >I suspect if BQL can help in this case.
> >> Looks like this helps a lot in multiple sessions of TCP_RR.
> >
> >so what's faster
> > BQL + kick each packet
> > no BQL
> >?
> 
> Quick and manual tests (TCP_RR 64, TCP_STREAM 512) does not show obvious
> differences.
> 
> May need a complete benchmark to see.

Okay so going forward something like BQL + kick each packet
might be a good solution.
The advantage of BQL is that it works without GSO.
For example, now that we don't do UFO, you might
see significant gains with UDP.


> >
> >
> >> How about move the BQL patch out of this series?
> >> Let's first converge tx interrupt and then introduce it?
> >> (e.g with kicking after queuing X bytes?)
> >
> >Sounds good.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts

2014-12-02 Thread David Laight
From: Jason Wang
> > On Mon, Dec 01, 2014 at 06:17:03PM +0800, Jason Wang wrote:
> >>  Hello:
> >>
> >>  We used to orphan packets before transmission for virtio-net. This
> >> breaks
> >>  socket accounting and can lead serveral functions won't work, e.g:
> >>
> >>  - Byte Queue Limit depends on tx completion nofication to work.
> >>  - Packet Generator depends on tx completion nofication for the last
> >>transmitted packet to complete.
> >>  - TCP Small Queue depends on proper accounting of sk_wmem_alloc to
> >> work.
> >>
> >>  This series tries to solve the issue by enabling tx interrupts. To
> >> minize
> >>  the performance impacts of this, several optimizations were used:
> >>
> >>  - In guest side, virtqueue_enable_cb_delayed() was used to delay the tx
> >>interrupt untile 3/4 pending packets were sent.

Doesn't that give problems for intermittent transmits?

...

David

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts

2014-12-02 Thread Michael S. Tsirkin
On Tue, Dec 02, 2014 at 10:00:06AM +, David Laight wrote:
> From: Jason Wang
> > > On Mon, Dec 01, 2014 at 06:17:03PM +0800, Jason Wang wrote:
> > >>  Hello:
> > >>
> > >>  We used to orphan packets before transmission for virtio-net. This
> > >> breaks
> > >>  socket accounting and can lead serveral functions won't work, e.g:
> > >>
> > >>  - Byte Queue Limit depends on tx completion nofication to work.
> > >>  - Packet Generator depends on tx completion nofication for the last
> > >>transmitted packet to complete.
> > >>  - TCP Small Queue depends on proper accounting of sk_wmem_alloc to
> > >> work.
> > >>
> > >>  This series tries to solve the issue by enabling tx interrupts. To
> > >> minize
> > >>  the performance impacts of this, several optimizations were used:
> > >>
> > >>  - In guest side, virtqueue_enable_cb_delayed() was used to delay the tx
> > >>interrupt untile 3/4 pending packets were sent.
> 
> Doesn't that give problems for intermittent transmits?
> 
> ...
> 
>   David
> 

No because it has not effect in that case.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


3rd World Conference on Information Systems and Technologies (WorldCIST'15), Deadline: December 7

2014-12-02 Thread M. Lemos
Deadline: December 7, 2014

--
WorldCIST'15 - 3rd World Conference on Information Systems and Technologies 
Ponta Delgada, Azores *, Portugal
1 - 3 April 2015
http://www.aisti.eu/worldcist15/
--
* Azores is ranked as the second most beautiful archipelago in the world by 
National Geographic.



SCOPE

The WorldCIST'15 - 3rd World Conference on Information Systems and 
Technologies, to be held at Ponta Delgada, São Miguel, Azores, Portugal, 1 - 3 
April 2015, is a global forum for researchers and practitioners to present and 
discuss the most recent innovations, trends, results, experiences and concerns 
in the several perspectives of Information Systems and Technologies.

Azores is ranked as the second most beautiful archipelago in the world by 
National Geographic. Consisting of nine distinct islands, each of them special, 
is in fact a place in the world to be visited.

We are pleased to invite you to submit your papers to WorldCISTI'15. All 
submissions will be reviewed on the basis of relevance, originality, importance 
and clarity.


THEMES

Submitted papers should be related with one or more of the main themes proposed 
for the Conference:

A) Information and Knowledge Management (IKM);
B) Organizational Models and Information Systems (OMIS);
C) Intelligent and Decision Support Systems (IDSS);
D) Big Data Analytics and Applications (BDAA);
E) Software Systems, Architectures, Applications and Tools (SSAAT);
F) Multimedia Systems and Applications (MSA);
G) Computer Networks, Mobility and Pervasive Systems (CNMPS);
H) Human-Computer Interaction (HCI);
I) Health Informatics (HIS);
J) Information Technologies in Education (ITE).
K) Information Technologies in Radiocommunications (ITR)


TYPES OF SUBMISSIONS AND DECISIONS

Four types of papers can be submitted:

- Full paper: Finished or consolidated R&D works, to be included in one of the 
Conference themes. These papers are assigned a 10-page limit.

- Short paper: Ongoing works with relevant preliminary results, open to 
discussion. These papers are assigned a 7-page limit.

- Poster paper: Initial work with relevant ideas, open to discussion. These 
papers are assigned to a 4-page limit.

- Company paper: Companies' papers that show practical experience, R & D, 
tools, etc., focused on some topics of the conference. These papers are 
assigned to a 4-page limit.

Submitted papers must comply with the format of Advances in Intelligent Systems 
and Computing Series (see Instructions for Authors at Springer Website or 
download a DOC example) be written in English, must not have been published 
before, not be under review for any other conference or publication and not 
include any information leading to the authors’ identification. Therefore, the 
authors’ names, affiliations and bibliographic references should not be 
included in the version for evaluation by the Program Committee. This 
information should only be included in the camera-ready version, saved in Word 
or Latex format and also in PDF format. These files must be accompanied by the 
Consent to Publication form filled out, in a ZIP file, and uploaded at the 
conference management system.

All papers will be subjected to a “double-blind review” by at least two members 
of the Program Committee.

Based on Program Committee evaluation, a paper can be rejected or accepted by 
the Conference Chairs. In the later case, it can be accepted as the type 
originally submitted or as another type. Thus, full papers can be accepted as 
short papers or poster papers only. Similarly, short papers can be accepted as 
poster papers only. In these cases, the authors will be allowed to maintain the 
original number of pages in the camera-ready version.

The authors of accepted poster papers must also build and print a poster to be 
exhibited during the Conference. This poster must follow an A1 or A2 vertical 
format. The Conference can includes Work Sessions where these posters are 
presented and orally discussed, with a 5 minute limit per poster.

The authors of accepted full papers will have 15 minutes to present their work 
in a Conference Work Session; approximately 5 minutes of discussion will follow 
each presentation. The authors of accepted short papers and company papers will 
have 11 minutes to present their work in a Conference Work Session; 
approximately 4 minutes of discussion will follow each presentation.


PUBLICATION & INDEXING

To ensure that a full paper, short paper, poster paper or company paper is 
published in the Proceedings, at least one of the authors must be fully 
registered by the 6th of January 2015, and the paper must comply with the 
suggested layout and page-limit. Additionally, all recommended changes must be 
addressed by the authors before they submit the camera-ready version.

No more than one paper per registration will be published in the Conference 
Proceedings. An extra fee must be paid for publication of additional papers, 
with a maximum of one addit

Re: [PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts

2014-12-02 Thread Michael S. Tsirkin
On Tue, Dec 02, 2014 at 05:08:35AM -0500, Pankaj Gupta wrote:
> 
> > 
> > On Tue, Dec 02, 2014 at 09:59:48AM +0008, Jason Wang wrote:
> > > 
> > > 
> > > On Tue, Dec 2, 2014 at 5:43 PM, Michael S. Tsirkin  
> > > wrote:
> > > >On Tue, Dec 02, 2014 at 08:15:02AM +0008, Jason Wang wrote:
> > > >> On Tue, Dec 2, 2014 at 11:15 AM, Jason Wang 
> > > >>wrote:
> > > >> >
> > > >> >
> > > >> >On Mon, Dec 1, 2014 at 6:42 PM, Michael S. Tsirkin 
> > > >>wrote:
> > > >> >>On Mon, Dec 01, 2014 at 06:17:03PM +0800, Jason Wang wrote:
> > > >> >>> Hello:
> > > >> >>>  We used to orphan packets before transmission for virtio-net. This
> > > >> >>>breaks
> > > >> >>> socket accounting and can lead serveral functions won't work, e.g:
> > > >> >>>  - Byte Queue Limit depends on tx completion nofication to work.
> > > >> >>> - Packet Generator depends on tx completion nofication for the last
> > > >> >>>   transmitted packet to complete.
> > > >> >>> - TCP Small Queue depends on proper accounting of sk_wmem_alloc to
> > > >> >>>work.
> > > >> >>>  This series tries to solve the issue by enabling tx interrupts. To
> > > >> >>>minize
> > > >> >>> the performance impacts of this, several optimizations were used:
> > > >> >>>  - In guest side, virtqueue_enable_cb_delayed() was used to delay
> > > >>the
> > > >> >>>tx
> > > >> >>>   interrupt untile 3/4 pending packets were sent.
> > > >> >>> - In host side, interrupt coalescing were used to reduce tx
> > > >> >>>interrupts.
> > > >> >>>  Performance test results[1] (tx-frames 16 tx-usecs 16) shows:
> > > >> >>>  - For guest receiving. No obvious regression on throughput were
> > > >> >>>   noticed. More cpu utilization were noticed in few cases.
> > > >> >>> - For guest transmission. Very huge improvement on througput for
> > > >> >>>small
> > > >> >>>   packet transmission were noticed. This is expected since TSQ and
> > > >> >>>other
> > > >> >>>   optimization for small packet transmission work after tx
> > > >>interrupt.
> > > >> >>>But
> > > >> >>>   will use more cpu for large packets.
> > > >> >>> - For TCP_RR, regression (10% on transaction rate and cpu
> > > >> >>>utilization) were
> > > >> >>>   found. Tx interrupt won't help but cause overhead in this case.
> > > >> >>>Using
> > > >> >>>   more aggressive coalescing parameters may help to reduce the
> > > >> >>>regression.
> > > >> >>
> > > >> >>OK, you do have posted coalescing patches - does it help any?
> > > >> >
> > > >> >Helps a lot.
> > > >> >
> > > >> >For RX, it saves about 5% - 10% cpu. (reduce 60%-90% tx intrs)
> > > >> >For small packet TX, it increases 33% - 245% throughput. (reduce about
> > > >>60%
> > > >> >inters)
> > > >> >For TCP_RR, it increase the 3%-10% trans.rate. (reduce 40%-80% tx
> > > >>intrs)
> > > >> >
> > > >> >>
> > > >> >>I'm not sure the regression is due to interrupts.
> > > >> >>It would make sense for CPU but why would it
> > > >> >>hurt transaction rate?
> > > >> >
> > > >> >Anyway guest need to take some cycles to handle tx interrupts.
> > > >> >And transaction rate does increase if we coalesces more tx interurpts.
> > > >> >>
> > > >> >>
> > > >> >>It's possible that we are deferring kicks too much due to BQL.
> > > >> >>
> > > >> >>As an experiment: do we get any of it back if we do
> > > >> >>-if (kick || netif_xmit_stopped(txq))
> > > >> >>-virtqueue_kick(sq->vq);
> > > >> >>+virtqueue_kick(sq->vq);
> > > >> >>?
> > > >> >
> > > >> >
> > > >> >I will try, but during TCP_RR, at most 1 packets were pending,
> > > >> >I suspect if BQL can help in this case.
> > > >> Looks like this helps a lot in multiple sessions of TCP_RR.
> > > >
> > > >so what's faster
> > > > BQL + kick each packet
> > > > no BQL
> > > >?
> > > 
> > > Quick and manual tests (TCP_RR 64, TCP_STREAM 512) does not show obvious
> > > differences.
> > > 
> > > May need a complete benchmark to see.
> > 
> > Okay so going forward something like BQL + kick each packet
> > might be a good solution.
> > The advantage of BQL is that it works without GSO.
> > For example, now that we don't do UFO, you might
> > see significant gains with UDP.
> 
> If I understand correctly, it can also help for small packet
> regr. in multiqueue scenario?

Well BQL generally should only be active for 1:1 mappings.

> Would be nice to see the perf. numbers
> with multi-queue for small packets streams.
> > 
> > 
> > > >
> > > >
> > > >> How about move the BQL patch out of this series?
> > > >> Let's first converge tx interrupt and then introduce it?
> > > >> (e.g with kicking after queuing X bytes?)
> > > >
> > > >Sounds good.
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtu

[PATCH RFC 1/2] virtio_balloon: convert to virtio 1.0 endian-ness

2014-12-02 Thread Michael S. Tsirkin
balloon device is not part of virtio 1.0 spec.  Still, it's easy enough
to make it handle endian-ness exactly as other virtio 1.0 devices: what
we gain from this, is that there's no need to special-case it in virtio
core.

Signed-off-by: Michael S. Tsirkin 
---
 include/uapi/linux/virtio_balloon.h | 5 +++--
 drivers/virtio/virtio_balloon.c | 4 ++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/virtio_balloon.h 
b/include/uapi/linux/virtio_balloon.h
index 5e26f61..5bee71c 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -27,6 +27,7 @@
  * SUCH DAMAGE. */
 #include 
 #include 
+#include 
 
 /* The feature bitmap for virtio balloon */
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST0 /* Tell before reclaiming 
pages */
@@ -52,8 +53,8 @@ struct virtio_balloon_config
 #define VIRTIO_BALLOON_S_NR   6
 
 struct virtio_balloon_stat {
-   __u16 tag;
-   __u64 val;
+   __virtio16 tag;
+   __virtio64 val;
 } __attribute__((packed));
 
 #endif /* _LINUX_VIRTIO_BALLOON_H */
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 4497def..721e32f 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -201,8 +201,8 @@ static inline void update_stat(struct virtio_balloon *vb, 
int idx,
   u16 tag, u64 val)
 {
BUG_ON(idx >= VIRTIO_BALLOON_S_NR);
-   vb->stats[idx].tag = tag;
-   vb->stats[idx].val = val;
+   vb->stats[idx].tag = cpu_to_virtio16(vb->vdev, tag);
+   vb->stats[idx].val = cpu_to_virtio64(vb->vdev, val);
 }
 
 #define pages_to_bytes(x) ((u64)(x) << PAGE_SHIFT)
-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 2/2] virtio_balloon: drop legacy_only

2014-12-02 Thread Michael S. Tsirkin
balloon is the only driver using legacy_only ATM.
It turns out, it's easier to just make balloon support virtio 1.0
endian-ness and drop the flag from core.

Signed-off-by: Michael S. Tsirkin 
---
 include/linux/virtio.h  | 2 --
 drivers/virtio/virtio.c | 4 
 drivers/virtio/virtio_balloon.c | 1 -
 3 files changed, 7 deletions(-)

diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 2bbf626..f70411e 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -132,7 +132,6 @@ int virtio_device_restore(struct virtio_device *dev);
  * @feature_table_size: number of entries in the feature table array.
  * @feature_table_legacy: same as feature_table but when working in legacy 
mode.
  * @feature_table_size_legacy: number of entries in feature table legacy array.
- * @legacy_only: driver does not support virtio 1.0.
  * @probe: the function to call when a device is found.  Returns 0 or -errno.
  * @remove: the function to call when a device is removed.
  * @config_changed: optional function to call when the device configuration
@@ -145,7 +144,6 @@ struct virtio_driver {
unsigned int feature_table_size;
const unsigned int *feature_table_legacy;
unsigned int feature_table_size_legacy;
-   bool legacy_only;
int (*probe)(struct virtio_device *dev);
void (*scan)(struct virtio_device *dev);
void (*remove)(struct virtio_device *dev);
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index fa6b75d..2e836d8 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -197,10 +197,6 @@ static int virtio_dev_probe(struct device *_d)
driver_features_legacy = driver_features;
}
 
-   /* Detect legacy-only drivers and disable VIRTIO_F_VERSION_1. */
-   if (drv->legacy_only)
-   device_features &= ~(1ULL << VIRTIO_F_VERSION_1);
-
if (device_features & (1ULL << VIRTIO_F_VERSION_1))
dev->features = driver_features & device_features;
else
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 721e32f..fed3709 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -518,7 +518,6 @@ static unsigned int features[] = {
 };
 
 static struct virtio_driver virtio_balloon_driver = {
-   .legacy_only = true,
.feature_table = features,
.feature_table_size = ARRAY_SIZE(features),
.driver.name =  KBUILD_MODNAME,
-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v5 01/19] linux-headers/virtio_config: Update with VIRTIO_F_VERSION_1

2014-12-02 Thread Cornelia Huck
From: Thomas Huth 

Add the new VIRTIO_F_VERSION_1 definition to the virtio_config.h
linux header.

Signed-off-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 linux-headers/linux/virtio_config.h |3 +++
 1 file changed, 3 insertions(+)

diff --git a/linux-headers/linux/virtio_config.h 
b/linux-headers/linux/virtio_config.h
index 75dc20b..16aa289 100644
--- a/linux-headers/linux/virtio_config.h
+++ b/linux-headers/linux/virtio_config.h
@@ -54,4 +54,7 @@
 /* Can the device handle any descriptor layout? */
 #define VIRTIO_F_ANY_LAYOUT27
 
+/* v1.0 compliant. */
+#define VIRTIO_F_VERSION_1 32
+
 #endif /* _LINUX_VIRTIO_CONFIG_H */
-- 
1.7.9.5

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v5 02/19] virtio: cull virtio_bus_set_vdev_features

2014-12-02 Thread Cornelia Huck
The only user of this function was virtio-ccw, and it should use
virtio_set_features() like everybody else: We need to make sure
that bad features are masked out properly, which this function did
not do.

Reviewed-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/virtio-ccw.c  |3 +--
 hw/virtio/virtio-bus.c |   14 --
 include/hw/virtio/virtio-bus.h |3 ---
 3 files changed, 1 insertion(+), 19 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index ea236c9..84f17bc 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -400,8 +400,7 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
ccw.cda + sizeof(features.features));
 features.features = ldl_le_phys(&address_space_memory, ccw.cda);
 if (features.index < ARRAY_SIZE(dev->host_features)) {
-virtio_bus_set_vdev_features(&dev->bus, features.features);
-vdev->guest_features = features.features;
+virtio_set_features(vdev, features.features);
 } else {
 /*
  * If the guest supports more feature bits, assert that it
diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index eb77019..a8ffa07 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -109,20 +109,6 @@ uint32_t virtio_bus_get_vdev_features(VirtioBusState *bus,
 return k->get_features(vdev, requested_features);
 }
 
-/* Set the features of the plugged device. */
-void virtio_bus_set_vdev_features(VirtioBusState *bus,
-  uint32_t requested_features)
-{
-VirtIODevice *vdev = virtio_bus_get_device(bus);
-VirtioDeviceClass *k;
-
-assert(vdev != NULL);
-k = VIRTIO_DEVICE_GET_CLASS(vdev);
-if (k->set_features != NULL) {
-k->set_features(vdev, requested_features);
-}
-}
-
 /* Get bad features of the plugged device. */
 uint32_t virtio_bus_get_vdev_bad_features(VirtioBusState *bus)
 {
diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
index 0756545..0d2e7b4 100644
--- a/include/hw/virtio/virtio-bus.h
+++ b/include/hw/virtio/virtio-bus.h
@@ -84,9 +84,6 @@ size_t virtio_bus_get_vdev_config_len(VirtioBusState *bus);
 /* Get the features of the plugged device. */
 uint32_t virtio_bus_get_vdev_features(VirtioBusState *bus,
 uint32_t requested_features);
-/* Set the features of the plugged device. */
-void virtio_bus_set_vdev_features(VirtioBusState *bus,
-  uint32_t requested_features);
 /* Get bad features of the plugged device. */
 uint32_t virtio_bus_get_vdev_bad_features(VirtioBusState *bus);
 /* Get config of the plugged device. */
-- 
1.7.9.5

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v5 00/19] qemu: towards virtio-1 host support

2014-12-02 Thread Cornelia Huck
Another iteration of virtio-1 patches for qemu, as always available on
git://github.com/cohuck/qemu virtio-1

This one seems to work together with the current vhost-next patches
(well, I can ping :)

Changes from v4:
- add helpers for feature bit manipulation and checking
- use 64 bit feature bits instead of 32 bit arrays
- infrastructure to allow devices to offer different sets of feature
  bits for legacy and standard devices
- several fixes (mainly regarding, you guessed it, feature bits)

Cornelia Huck (16):
  virtio: cull virtio_bus_set_vdev_features
  virtio: feature bit manipulation helpers
  virtio: add feature checking helpers
  virtio: support more feature bits
  virtio: endianness checks for virtio 1.0 devices
  virtio: allow virtio-1 queue layout
  dataplane: allow virtio-1 devices
  s390x/virtio-ccw: support virtio-1 set_vq format
  virtio: disallow late feature changes for virtio-1
  virtio: allow to fail setting status
  s390x/virtio-ccw: enable virtio 1.0
  virtio-net: no writeable mac for virtio-1
  virtio-net: support longer header
  virtio-net: enable virtio 1.0
  virtio: support revision-specific features
  virtio-blk: revision specific feature bits

Thomas Huth (3):
  linux-headers/virtio_config: Update with VIRTIO_F_VERSION_1
  s390x/css: Add a callback for when subchannel gets disabled
  s390x/virtio-ccw: add virtio set-revision call

 hw/9pfs/virtio-9p-device.c|4 +-
 hw/block/dataplane/virtio-blk.c   |4 +-
 hw/block/virtio-blk.c |   44 +++--
 hw/char/virtio-serial-bus.c   |6 +-
 hw/net/virtio-net.c   |  100 ++-
 hw/s390x/css.c|   12 ++
 hw/s390x/css.h|1 +
 hw/s390x/s390-virtio-bus.c|3 +-
 hw/s390x/s390-virtio-bus.h|2 +-
 hw/s390x/virtio-ccw.c |  235 ++---
 hw/s390x/virtio-ccw.h |8 +-
 hw/scsi/vhost-scsi.c  |3 +-
 hw/scsi/virtio-scsi-dataplane.c   |2 +-
 hw/scsi/virtio-scsi.c |   12 +-
 hw/virtio/Makefile.objs   |2 +-
 hw/virtio/dataplane/Makefile.objs |2 +-
 hw/virtio/dataplane/vring.c   |   96 +-
 hw/virtio/virtio-balloon.c|4 +-
 hw/virtio/virtio-bus.c|   24 ++-
 hw/virtio/virtio-mmio.c   |6 +-
 hw/virtio/virtio-pci.c|7 +-
 hw/virtio/virtio-pci.h|2 +-
 hw/virtio/virtio-rng.c|2 +-
 hw/virtio/virtio.c|   83 +++--
 include/hw/qdev-properties.h  |   11 ++
 include/hw/virtio/dataplane/vring-accessors.h |   75 
 include/hw/virtio/dataplane/vring.h   |   14 +-
 include/hw/virtio/virtio-access.h |4 +
 include/hw/virtio/virtio-bus.h|   14 +-
 include/hw/virtio/virtio-net.h|   46 ++---
 include/hw/virtio/virtio-scsi.h   |6 +-
 include/hw/virtio/virtio.h|   61 +--
 linux-headers/linux/virtio_config.h   |3 +
 33 files changed, 625 insertions(+), 273 deletions(-)
 create mode 100644 include/hw/virtio/dataplane/vring-accessors.h

-- 
1.7.9.5

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v5 03/19] virtio: feature bit manipulation helpers

2014-12-02 Thread Cornelia Huck
Add virtio_{add,clear}_feature helper functions for manipulating a
feature bits variable. This has some benefits over open coding:
- add check that the bit is in a sane range
- make it obvious at a glance what is going on
- have a central point to change when we want to extend feature bits

Convert existing code manipulating features to use the new helpers.

Signed-off-by: Cornelia Huck 
---
 hw/9pfs/virtio-9p-device.c  |2 +-
 hw/block/virtio-blk.c   |   16 
 hw/char/virtio-serial-bus.c |2 +-
 hw/net/virtio-net.c |   34 +-
 hw/s390x/virtio-ccw.c   |4 ++--
 hw/virtio/virtio-mmio.c |2 +-
 hw/virtio/virtio-pci.c  |4 ++--
 include/hw/virtio/virtio.h  |   12 
 8 files changed, 44 insertions(+), 32 deletions(-)

diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index 2572747..30492ec 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -23,7 +23,7 @@
 
 static uint32_t virtio_9p_get_features(VirtIODevice *vdev, uint32_t features)
 {
-features |= 1 << VIRTIO_9P_MOUNT_TAG;
+virtio_add_feature(&features, VIRTIO_9P_MOUNT_TAG);
 return features;
 }
 
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index b19b102..3f76e2a 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -568,20 +568,20 @@ static uint32_t virtio_blk_get_features(VirtIODevice 
*vdev, uint32_t features)
 {
 VirtIOBlock *s = VIRTIO_BLK(vdev);
 
-features |= (1 << VIRTIO_BLK_F_SEG_MAX);
-features |= (1 << VIRTIO_BLK_F_GEOMETRY);
-features |= (1 << VIRTIO_BLK_F_TOPOLOGY);
-features |= (1 << VIRTIO_BLK_F_BLK_SIZE);
-features |= (1 << VIRTIO_BLK_F_SCSI);
+virtio_add_feature(&features, VIRTIO_BLK_F_SEG_MAX);
+virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY);
+virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
+virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
+virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
 
 if (s->conf.config_wce) {
-features |= (1 << VIRTIO_BLK_F_CONFIG_WCE);
+virtio_add_feature(&features, VIRTIO_BLK_F_CONFIG_WCE);
 }
 if (blk_enable_write_cache(s->blk)) {
-features |= (1 << VIRTIO_BLK_F_WCE);
+virtio_add_feature(&features, VIRTIO_BLK_F_WCE);
 }
 if (blk_is_read_only(s->blk)) {
-features |= 1 << VIRTIO_BLK_F_RO;
+virtio_add_feature(&features, VIRTIO_BLK_F_RO);
 }
 
 return features;
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index a7b1b68..0f637db 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -474,7 +474,7 @@ static uint32_t get_features(VirtIODevice *vdev, uint32_t 
features)
 vser = VIRTIO_SERIAL(vdev);
 
 if (vser->bus.max_nr_ports > 1) {
-features |= (1 << VIRTIO_CONSOLE_F_MULTIPORT);
+virtio_add_feature(&features, VIRTIO_CONSOLE_F_MULTIPORT);
 }
 return features;
 }
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index e574bd4..f1aa100 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -446,23 +446,23 @@ static uint32_t virtio_net_get_features(VirtIODevice 
*vdev, uint32_t features)
 VirtIONet *n = VIRTIO_NET(vdev);
 NetClientState *nc = qemu_get_queue(n->nic);
 
-features |= (1 << VIRTIO_NET_F_MAC);
+virtio_add_feature(&features, VIRTIO_NET_F_MAC);
 
 if (!peer_has_vnet_hdr(n)) {
-features &= ~(0x1 << VIRTIO_NET_F_CSUM);
-features &= ~(0x1 << VIRTIO_NET_F_HOST_TSO4);
-features &= ~(0x1 << VIRTIO_NET_F_HOST_TSO6);
-features &= ~(0x1 << VIRTIO_NET_F_HOST_ECN);
+virtio_clear_feature(&features, VIRTIO_NET_F_CSUM);
+virtio_clear_feature(&features, VIRTIO_NET_F_HOST_TSO4);
+virtio_clear_feature(&features, VIRTIO_NET_F_HOST_TSO6);
+virtio_clear_feature(&features, VIRTIO_NET_F_HOST_ECN);
 
-features &= ~(0x1 << VIRTIO_NET_F_GUEST_CSUM);
-features &= ~(0x1 << VIRTIO_NET_F_GUEST_TSO4);
-features &= ~(0x1 << VIRTIO_NET_F_GUEST_TSO6);
-features &= ~(0x1 << VIRTIO_NET_F_GUEST_ECN);
+virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_CSUM);
+virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO4);
+virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO6);
+virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_ECN);
 }
 
 if (!peer_has_vnet_hdr(n) || !peer_has_ufo(n)) {
-features &= ~(0x1 << VIRTIO_NET_F_GUEST_UFO);
-features &= ~(0x1 << VIRTIO_NET_F_HOST_UFO);
+virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_UFO);
+virtio_clear_feature(&features, VIRTIO_NET_F_HOST_UFO);
 }
 
 if (!get_vhost_net(nc->peer)) {
@@ -477,11 +477,11 @@ static uint32_t virtio_net_bad_features(VirtIODevice 
*vdev)
 
 /* Linux kernel 2.6.25.  It understood MAC (as everyone must),
  * but also these: */
-features |= (1 << VIRTIO_NET_F_MAC);
-feat

[PATCH RFC v5 06/19] virtio: endianness checks for virtio 1.0 devices

2014-12-02 Thread Cornelia Huck
Add code that checks for the VERSION_1 feature bit in order to make
decisions about the device's endianness. This allows us to support
transitional devices.

Signed-off-by: Cornelia Huck 
---
 hw/virtio/virtio.c|6 +-
 include/hw/virtio/virtio-access.h |4 
 include/hw/virtio/virtio.h|8 ++--
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 7f74ae5..8f69ffa 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -881,7 +881,11 @@ static bool virtio_device_endian_needed(void *opaque)
 VirtIODevice *vdev = opaque;
 
 assert(vdev->device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN);
-return vdev->device_endian != virtio_default_endian();
+if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+return vdev->device_endian != virtio_default_endian();
+}
+/* Devices conforming to VIRTIO 1.0 or later are always LE. */
+return vdev->device_endian != VIRTIO_DEVICE_ENDIAN_LITTLE;
 }
 
 static const VMStateDescription vmstate_virtio_device_endian = {
diff --git a/include/hw/virtio/virtio-access.h 
b/include/hw/virtio/virtio-access.h
index 46456fd..ee28c21 100644
--- a/include/hw/virtio/virtio-access.h
+++ b/include/hw/virtio/virtio-access.h
@@ -19,6 +19,10 @@
 
 static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
 {
+if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+/* Devices conforming to VIRTIO 1.0 or later are always LE. */
+return false;
+}
 #if defined(TARGET_IS_BIENDIAN)
 return virtio_is_big_endian(vdev);
 #elif defined(TARGET_WORDS_BIGENDIAN)
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 08141c7..68c40db 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -297,7 +297,11 @@ static inline bool virtio_has_feature(VirtIODevice *vdev, 
unsigned int fbit)
 
 static inline bool virtio_is_big_endian(VirtIODevice *vdev)
 {
-assert(vdev->device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN);
-return vdev->device_endian == VIRTIO_DEVICE_ENDIAN_BIG;
+if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+assert(vdev->device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN);
+return vdev->device_endian == VIRTIO_DEVICE_ENDIAN_BIG;
+}
+/* Devices conforming to VIRTIO 1.0 or later are always LE. */
+return false;
 }
 #endif
-- 
1.7.9.5

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v5 07/19] virtio: allow virtio-1 queue layout

2014-12-02 Thread Cornelia Huck
For virtio-1 devices, we allow a more complex queue layout that doesn't
require descriptor table and rings on a physically-contigous memory area:
add virtio_queue_set_rings() to allow transports to set this up.

Signed-off-by: Cornelia Huck 
---
 hw/virtio/virtio.c |   16 
 include/hw/virtio/virtio.h |2 ++
 2 files changed, 18 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 8f69ffa..508dccf 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -96,6 +96,13 @@ static void virtqueue_init(VirtQueue *vq)
 {
 hwaddr pa = vq->pa;
 
+if (pa == -1ULL) {
+/*
+ * This is a virtio-1 style vq that has already been setup
+ * in virtio_queue_set.
+ */
+return;
+}
 vq->vring.desc = pa;
 vq->vring.avail = pa + vq->vring.num * sizeof(VRingDesc);
 vq->vring.used = vring_align(vq->vring.avail +
@@ -717,6 +724,15 @@ hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n)
 return vdev->vq[n].pa;
 }
 
+void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
+hwaddr avail, hwaddr used)
+{
+vdev->vq[n].pa = -1ULL;
+vdev->vq[n].vring.desc = desc;
+vdev->vq[n].vring.avail = avail;
+vdev->vq[n].vring.used = used;
+}
+
 void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
 {
 /* Don't allow guest to flip queue between existent and
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 68c40db..80ee313 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -224,6 +224,8 @@ void virtio_queue_set_addr(VirtIODevice *vdev, int n, 
hwaddr addr);
 hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n);
 void virtio_queue_set_num(VirtIODevice *vdev, int n, int num);
 int virtio_queue_get_num(VirtIODevice *vdev, int n);
+void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
+hwaddr avail, hwaddr used);
 void virtio_queue_set_align(VirtIODevice *vdev, int n, int align);
 void virtio_queue_notify(VirtIODevice *vdev, int n);
 uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
-- 
1.7.9.5

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v5 05/19] virtio: support more feature bits

2014-12-02 Thread Cornelia Huck
With virtio-1, we support more than 32 feature bits. Let's extend both
host and guest features to 64, which should suffice for a while.

vhost and migration have been ignored for now.

Signed-off-by: Cornelia Huck 
---
 hw/9pfs/virtio-9p-device.c  |2 +-
 hw/block/virtio-blk.c   |2 +-
 hw/char/virtio-serial-bus.c |2 +-
 hw/net/virtio-net.c |   22 +--
 hw/s390x/s390-virtio-bus.c  |3 ++-
 hw/s390x/s390-virtio-bus.h  |2 +-
 hw/s390x/virtio-ccw.c   |   40 --
 hw/s390x/virtio-ccw.h   |5 +
 hw/scsi/vhost-scsi.c|3 +--
 hw/scsi/virtio-scsi.c   |4 ++--
 hw/virtio/virtio-balloon.c  |2 +-
 hw/virtio/virtio-bus.c  |6 ++---
 hw/virtio/virtio-mmio.c |4 ++--
 hw/virtio/virtio-pci.c  |3 ++-
 hw/virtio/virtio-pci.h  |2 +-
 hw/virtio/virtio-rng.c  |2 +-
 hw/virtio/virtio.c  |   13 ++-
 include/hw/qdev-properties.h|   11 ++
 include/hw/virtio/virtio-bus.h  |8 +++
 include/hw/virtio/virtio-net.h  |   46 +++
 include/hw/virtio/virtio-scsi.h |6 ++---
 include/hw/virtio/virtio.h  |   38 ++--
 22 files changed, 126 insertions(+), 100 deletions(-)

diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index 30492ec..60f9ff9 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -21,7 +21,7 @@
 #include "virtio-9p-coth.h"
 #include "hw/virtio/virtio-access.h"
 
-static uint32_t virtio_9p_get_features(VirtIODevice *vdev, uint32_t features)
+static uint64_t virtio_9p_get_features(VirtIODevice *vdev, uint64_t features)
 {
 virtio_add_feature(&features, VIRTIO_9P_MOUNT_TAG);
 return features;
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 27f263a..9cfae66 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -564,7 +564,7 @@ static void virtio_blk_set_config(VirtIODevice *vdev, const 
uint8_t *config)
 aio_context_release(blk_get_aio_context(s->blk));
 }
 
-static uint32_t virtio_blk_get_features(VirtIODevice *vdev, uint32_t features)
+static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features)
 {
 VirtIOBlock *s = VIRTIO_BLK(vdev);
 
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index d49883f..2d2ed9c 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -467,7 +467,7 @@ static void handle_input(VirtIODevice *vdev, VirtQueue *vq)
 {
 }
 
-static uint32_t get_features(VirtIODevice *vdev, uint32_t features)
+static uint64_t get_features(VirtIODevice *vdev, uint64_t features)
 {
 VirtIOSerial *vser;
 
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 9f3c58a..d6d1b98 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -38,16 +38,16 @@
 (offsetof(container, field) + sizeof(((container *)0)->field))
 
 typedef struct VirtIOFeature {
-uint32_t flags;
+uint64_t flags;
 size_t end;
 } VirtIOFeature;
 
 static VirtIOFeature feature_sizes[] = {
-{.flags = 1 << VIRTIO_NET_F_MAC,
+{.flags = 1ULL << VIRTIO_NET_F_MAC,
  .end = endof(struct virtio_net_config, mac)},
-{.flags = 1 << VIRTIO_NET_F_STATUS,
+{.flags = 1ULL << VIRTIO_NET_F_STATUS,
  .end = endof(struct virtio_net_config, status)},
-{.flags = 1 << VIRTIO_NET_F_MQ,
+{.flags = 1ULL << VIRTIO_NET_F_MQ,
  .end = endof(struct virtio_net_config, max_virtqueue_pairs)},
 {}
 };
@@ -441,7 +441,7 @@ static void virtio_net_set_queues(VirtIONet *n)
 
 static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue);
 
-static uint32_t virtio_net_get_features(VirtIODevice *vdev, uint32_t features)
+static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
 NetClientState *nc = qemu_get_queue(n->nic);
@@ -471,9 +471,9 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev, 
uint32_t features)
 return vhost_net_get_features(get_vhost_net(nc->peer), features);
 }
 
-static uint32_t virtio_net_bad_features(VirtIODevice *vdev)
+static uint64_t virtio_net_bad_features(VirtIODevice *vdev)
 {
-uint32_t features = 0;
+uint64_t features = 0;
 
 /* Linux kernel 2.6.25.  It understood MAC (as everyone must),
  * but also these: */
@@ -496,7 +496,7 @@ static void virtio_net_apply_guest_offloads(VirtIONet *n)
 !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_UFO)));
 }
 
-static uint64_t virtio_net_guest_offloads_by_features(uint32_t features)
+static uint64_t virtio_net_guest_offloads_by_features(uint64_t features)
 {
 static const uint64_t guest_offloads_mask =
 (1ULL << VIRTIO_NET_F_GUEST_CSUM) |
@@ -514,7 +514,7 @@ static inline uint64_t 
virtio_net_supported_guest_offloads(VirtIONet *n)
 return virtio_net_guest_offloads_by_feat

[PATCH RFC v5 04/19] virtio: add feature checking helpers

2014-12-02 Thread Cornelia Huck
Add a helper function for checking whether a bit is set in the guest
features for a vdev as well as one that works on a feature bit set.

Convert code that open-coded this: It cleans up the code and makes it
easier to extend the guest feature bits.

Signed-off-by: Cornelia Huck 
---
 hw/block/virtio-blk.c   |7 ++-
 hw/char/virtio-serial-bus.c |2 +-
 hw/net/virtio-net.c |   23 +--
 hw/scsi/virtio-scsi.c   |8 
 hw/virtio/dataplane/vring.c |   10 +-
 hw/virtio/virtio-balloon.c  |2 +-
 hw/virtio/virtio.c  |   10 +-
 include/hw/virtio/virtio.h  |   11 +++
 8 files changed, 42 insertions(+), 31 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 3f76e2a..27f263a 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -590,7 +590,6 @@ static uint32_t virtio_blk_get_features(VirtIODevice *vdev, 
uint32_t features)
 static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t status)
 {
 VirtIOBlock *s = VIRTIO_BLK(vdev);
-uint32_t features;
 
 if (s->dataplane && !(status & (VIRTIO_CONFIG_S_DRIVER |
 VIRTIO_CONFIG_S_DRIVER_OK))) {
@@ -601,8 +600,6 @@ static void virtio_blk_set_status(VirtIODevice *vdev, 
uint8_t status)
 return;
 }
 
-features = vdev->guest_features;
-
 /* A guest that supports VIRTIO_BLK_F_CONFIG_WCE must be able to send
  * cache flushes.  Thus, the "auto writethrough" behavior is never
  * necessary for guests that support the VIRTIO_BLK_F_CONFIG_WCE feature.
@@ -618,10 +615,10 @@ static void virtio_blk_set_status(VirtIODevice *vdev, 
uint8_t status)
  *
  * s->blk would erroneously be placed in writethrough mode.
  */
-if (!(features & (1 << VIRTIO_BLK_F_CONFIG_WCE))) {
+if (!virtio_has_feature(vdev, VIRTIO_BLK_F_CONFIG_WCE)) {
 aio_context_acquire(blk_get_aio_context(s->blk));
 blk_set_enable_write_cache(s->blk,
-   !!(features & (1 << VIRTIO_BLK_F_WCE)));
+   virtio_has_feature(vdev, VIRTIO_BLK_F_WCE));
 aio_context_release(blk_get_aio_context(s->blk));
 }
 }
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 0f637db..d49883f 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -75,7 +75,7 @@ static VirtIOSerialPort *find_port_by_name(char *name)
 static bool use_multiport(VirtIOSerial *vser)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(vser);
-return vdev->guest_features & (1 << VIRTIO_CONSOLE_F_MULTIPORT);
+return virtio_has_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT);
 }
 
 static size_t write_to_port(VirtIOSerialPort *port,
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index f1aa100..9f3c58a 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -86,7 +86,7 @@ static void virtio_net_set_config(VirtIODevice *vdev, const 
uint8_t *config)
 
 memcpy(&netcfg, config, n->config_size);
 
-if (!(vdev->guest_features >> VIRTIO_NET_F_CTRL_MAC_ADDR & 1) &&
+if (!virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR) &&
 memcmp(netcfg.mac, n->mac, ETH_ALEN)) {
 memcpy(n->mac, netcfg.mac, ETH_ALEN);
 qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
@@ -305,7 +305,7 @@ static RxFilterInfo 
*virtio_net_query_rxfilter(NetClientState *nc)
 info->multicast_table = str_list;
 info->vlan_table = get_vlan_table(n);
 
-if (!((1 << VIRTIO_NET_F_CTRL_VLAN) & vdev->guest_features)) {
+if (!virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VLAN)) {
 info->vlan = RX_STATE_ALL;
 } else if (!info->vlan_table) {
 info->vlan = RX_STATE_NONE;
@@ -519,9 +519,12 @@ static void virtio_net_set_features(VirtIODevice *vdev, 
uint32_t features)
 VirtIONet *n = VIRTIO_NET(vdev);
 int i;
 
-virtio_net_set_multiqueue(n, !!(features & (1 << VIRTIO_NET_F_MQ)));
+virtio_net_set_multiqueue(n,
+  __virtio_has_feature(features, VIRTIO_NET_F_MQ));
 
-virtio_net_set_mrg_rx_bufs(n, !!(features & (1 << 
VIRTIO_NET_F_MRG_RXBUF)));
+virtio_net_set_mrg_rx_bufs(n,
+   __virtio_has_feature(features,
+VIRTIO_NET_F_MRG_RXBUF));
 
 if (n->has_vnet_hdr) {
 n->curr_guest_offloads =
@@ -538,7 +541,7 @@ static void virtio_net_set_features(VirtIODevice *vdev, 
uint32_t features)
 vhost_net_ack_features(get_vhost_net(nc->peer), features);
 }
 
-if ((1 << VIRTIO_NET_F_CTRL_VLAN) & features) {
+if (__virtio_has_feature(features, VIRTIO_NET_F_CTRL_VLAN)) {
 memset(n->vlans, 0, MAX_VLAN >> 3);
 } else {
 memset(n->vlans, 0xff, MAX_VLAN >> 3);
@@ -585,7 +588,7 @@ static int virtio_net_handle_offloads(VirtIONet *n, uint8_t 
cmd,
 uint64_t offloads;
 size_t s;
 
-if (!((1 << VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) &

[PATCH RFC v5 10/19] s390x/virtio-ccw: add virtio set-revision call

2014-12-02 Thread Cornelia Huck
From: Thomas Huth 

Handle the virtio-ccw revision according to what the guest sets.
When revision 1 is selected, we have a virtio-1 standard device
with byteswapping for the virtio rings.

When a channel gets disabled, we have to revert to the legacy behavior
in case the next user of the device does not negotiate the revision 1
anymore (e.g. the boot firmware uses revision 1, but the operating
system only uses the legacy mode).

Note that revisions > 0 are still disabled.

Signed-off-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/virtio-ccw.c |   52 +
 hw/s390x/virtio-ccw.h |5 +
 2 files changed, 57 insertions(+)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index e434718..5311d9f 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -20,9 +20,11 @@
 #include "hw/virtio/virtio-net.h"
 #include "hw/sysbus.h"
 #include "qemu/bitops.h"
+#include "hw/virtio/virtio-access.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/s390x/adapter.h"
 #include "hw/s390x/s390_flic.h"
+#include "linux/virtio_config.h"
 
 #include "ioinst.h"
 #include "css.h"
@@ -260,6 +262,12 @@ typedef struct VirtioThinintInfo {
 uint8_t isc;
 } QEMU_PACKED VirtioThinintInfo;
 
+typedef struct VirtioRevInfo {
+uint16_t revision;
+uint16_t length;
+uint8_t data[0];
+} QEMU_PACKED VirtioRevInfo;
+
 /* Specify where the virtqueues for the subchannel are in guest memory. */
 static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t addr, uint32_t align,
   uint16_t index, uint16_t num)
@@ -299,6 +307,7 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 {
 int ret;
 VqInfoBlock info;
+VirtioRevInfo revinfo;
 uint8_t status;
 VirtioFeatDesc features;
 void *config;
@@ -375,6 +384,13 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 features.features = (uint32_t)dev->host_features;
 } else if (features.index == 1) {
 features.features = (uint32_t)(dev->host_features >> 32);
+/*
+ * Don't offer version 1 to the guest if it did not
+ * negotiate at least revision 1.
+ */
+if (dev->revision <= 0) {
+features.features &= ~(1 << (VIRTIO_F_VERSION_1 - 32));
+}
 } else {
 /* Return zeroes if the guest supports more feature bits. */
 features.features = 0;
@@ -406,6 +422,13 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 (vdev->guest_features & 
0x) |
 features.features);
 } else if (features.index == 1) {
+/*
+ * The guest should not set version 1 if it didn't
+ * negotiate a revision >= 1.
+ */
+if (dev->revision <= 0) {
+features.features &= ~(1 << (VIRTIO_F_VERSION_1 - 32));
+}
 virtio_set_features(vdev,
 (vdev->guest_features & 
0x) |
 ((uint64_t)features.features << 32));
@@ -608,6 +631,25 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 }
 }
 break;
+case CCW_CMD_SET_VIRTIO_REV:
+len = sizeof(revinfo);
+if (ccw.count < len || (check_len && ccw.count > len)) {
+ret = -EINVAL;
+break;
+}
+if (!ccw.cda) {
+ret = -EFAULT;
+break;
+}
+cpu_physical_memory_read(ccw.cda, &revinfo, len);
+if (dev->revision >= 0 ||
+revinfo.revision > VIRTIO_CCW_REV_MAX) {
+ret = -ENOSYS;
+break;
+}
+ret = 0;
+dev->revision = revinfo.revision;
+break;
 default:
 ret = -ENOSYS;
 break;
@@ -615,6 +657,13 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 return ret;
 }
 
+static void virtio_sch_disable_cb(SubchDev *sch)
+{
+VirtioCcwDevice *dev = sch->driver_data;
+
+dev->revision = -1;
+}
+
 static int virtio_ccw_device_init(VirtioCcwDevice *dev, VirtIODevice *vdev)
 {
 unsigned int cssid = 0;
@@ -740,6 +789,7 @@ static int virtio_ccw_device_init(VirtioCcwDevice *dev, 
VirtIODevice *vdev)
 css_sch_build_virtual_schib(sch, 0, VIRTIO_CCW_CHPID_TYPE);
 
 sch->ccw_cb = virtio_ccw_cb;
+sch->disable_cb = virtio_sch_disable_cb;
 
 /* Build senseid data. */
 memset(&sch->id, 0, sizeof(SenseId));
@@ -747,6 +797,8 @@ static int virtio_ccw_device_init(VirtioCcwDevice *dev, 
VirtIODevice *vdev)
 sch->id.cu_type = VIRTIO_CCW_CU_TYPE;
 sch->id.cu_model = vdev->device_id;
 
+dev->revision = -1;
+
 /* Set default feature bits that are offered by the host. */
 dev->host_features = 0;
 virtio_add_feature(&dev->host_featur

[PATCH RFC v5 08/19] dataplane: allow virtio-1 devices

2014-12-02 Thread Cornelia Huck
Handle endianness conversion for virtio-1 virtqueues correctly.

Note that dataplane now needs to be built per-target.

Signed-off-by: Cornelia Huck 
---
 hw/block/dataplane/virtio-blk.c   |4 +-
 hw/scsi/virtio-scsi-dataplane.c   |2 +-
 hw/virtio/Makefile.objs   |2 +-
 hw/virtio/dataplane/Makefile.objs |2 +-
 hw/virtio/dataplane/vring.c   |   86 ++---
 include/hw/virtio/dataplane/vring-accessors.h |   75 +
 include/hw/virtio/dataplane/vring.h   |   14 +---
 7 files changed, 131 insertions(+), 54 deletions(-)
 create mode 100644 include/hw/virtio/dataplane/vring-accessors.h

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 1222a37..2d8cc15 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -16,7 +16,9 @@
 #include "qemu/iov.h"
 #include "qemu/thread.h"
 #include "qemu/error-report.h"
+#include "hw/virtio/virtio-access.h"
 #include "hw/virtio/dataplane/vring.h"
+#include "hw/virtio/dataplane/vring-accessors.h"
 #include "sysemu/block-backend.h"
 #include "hw/virtio/virtio-blk.h"
 #include "virtio-blk.h"
@@ -75,7 +77,7 @@ static void complete_request_vring(VirtIOBlockReq *req, 
unsigned char status)
 VirtIOBlockDataPlane *s = req->dev->dataplane;
 stb_p(&req->in->status, status);
 
-vring_push(&req->dev->dataplane->vring, &req->elem,
+vring_push(s->vdev, &req->dev->dataplane->vring, &req->elem,
req->qiov.size + sizeof(*req->in));
 
 /* Suppress notification to guest by BH and its scheduled
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 03a1e8c..418d73b 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -94,7 +94,7 @@ void virtio_scsi_vring_push_notify(VirtIOSCSIReq *req)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(req->vring->parent);
 
-vring_push(&req->vring->vring, &req->elem,
+vring_push(vdev, &req->vring->vring, &req->elem,
req->qsgl.size + req->resp_iov.size);
 
 if (vring_should_notify(vdev, &req->vring->vring)) {
diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index d21c397..19b224a 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -2,7 +2,7 @@ common-obj-y += virtio-rng.o
 common-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
 common-obj-y += virtio-bus.o
 common-obj-y += virtio-mmio.o
-common-obj-$(CONFIG_VIRTIO) += dataplane/
+obj-$(CONFIG_VIRTIO) += dataplane/
 
 obj-y += virtio.o virtio-balloon.o 
 obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
diff --git a/hw/virtio/dataplane/Makefile.objs 
b/hw/virtio/dataplane/Makefile.objs
index 9a8cfc0..753a9ca 100644
--- a/hw/virtio/dataplane/Makefile.objs
+++ b/hw/virtio/dataplane/Makefile.objs
@@ -1 +1 @@
-common-obj-y += vring.o
+obj-y += vring.o
diff --git a/hw/virtio/dataplane/vring.c b/hw/virtio/dataplane/vring.c
index 6e283fc..a44c8c8 100644
--- a/hw/virtio/dataplane/vring.c
+++ b/hw/virtio/dataplane/vring.c
@@ -18,7 +18,9 @@
 #include "hw/hw.h"
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
+#include "hw/virtio/virtio-access.h"
 #include "hw/virtio/dataplane/vring.h"
+#include "hw/virtio/dataplane/vring-accessors.h"
 #include "qemu/error-report.h"
 
 /* vring_map can be coupled with vring_unmap or (if you still have the
@@ -83,7 +85,7 @@ bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
 vring_init(&vring->vr, virtio_queue_get_num(vdev, n), vring_ptr, 4096);
 
 vring->last_avail_idx = virtio_queue_get_last_avail_idx(vdev, n);
-vring->last_used_idx = vring->vr.used->idx;
+vring->last_used_idx = vring_get_used_idx(vdev, vring);
 vring->signalled_used = 0;
 vring->signalled_used_valid = false;
 
@@ -104,7 +106,7 @@ void vring_teardown(Vring *vring, VirtIODevice *vdev, int n)
 void vring_disable_notification(VirtIODevice *vdev, Vring *vring)
 {
 if (!virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX)) {
-vring->vr.used->flags |= VRING_USED_F_NO_NOTIFY;
+vring_set_used_flags(vdev, vring, VRING_USED_F_NO_NOTIFY);
 }
 }
 
@@ -117,10 +119,10 @@ bool vring_enable_notification(VirtIODevice *vdev, Vring 
*vring)
 if (virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX)) {
 vring_avail_event(&vring->vr) = vring->vr.avail->idx;
 } else {
-vring->vr.used->flags &= ~VRING_USED_F_NO_NOTIFY;
+vring_clear_used_flags(vdev, vring, VRING_USED_F_NO_NOTIFY);
 }
 smp_mb(); /* ensure update is seen before reading avail_idx */
-return !vring_more_avail(vring);
+return !vring_more_avail(vdev, vring);
 }
 
 /* This is stolen from linux/drivers/vhost/vhost.c:vhost_notify() */
@@ -134,12 +136,13 @@ bool vring_should_notify(VirtIODevice *vdev, Vring *vring)
 smp_mb();
 
 if (virtio_has_feature(vdev, VIRTIO_F_NOTIFY_ON_EMPTY) &&
-unlikely(vring->vr.avail->idx == vring->last_av

[PATCH RFC v5 09/19] s390x/css: Add a callback for when subchannel gets disabled

2014-12-02 Thread Cornelia Huck
From: Thomas Huth 

We need a possibility to run code when a subchannel gets disabled.
This patch adds the necessary infrastructure.

Signed-off-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/css.c |   12 
 hw/s390x/css.h |1 +
 2 files changed, 13 insertions(+)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index b67c039..735ec55 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -588,6 +588,7 @@ int css_do_msch(SubchDev *sch, SCHIB *orig_schib)
 {
 SCSW *s = &sch->curr_status.scsw;
 PMCW *p = &sch->curr_status.pmcw;
+uint16_t oldflags;
 int ret;
 SCHIB schib;
 
@@ -610,6 +611,7 @@ int css_do_msch(SubchDev *sch, SCHIB *orig_schib)
 copy_schib_from_guest(&schib, orig_schib);
 /* Only update the program-modifiable fields. */
 p->intparm = schib.pmcw.intparm;
+oldflags = p->flags;
 p->flags &= ~(PMCW_FLAGS_MASK_ISC | PMCW_FLAGS_MASK_ENA |
   PMCW_FLAGS_MASK_LM | PMCW_FLAGS_MASK_MME |
   PMCW_FLAGS_MASK_MP);
@@ -625,6 +627,12 @@ int css_do_msch(SubchDev *sch, SCHIB *orig_schib)
 (PMCW_CHARS_MASK_MBFC | PMCW_CHARS_MASK_CSENSE);
 sch->curr_status.mba = schib.mba;
 
+/* Has the channel been disabled? */
+if (sch->disable_cb && (oldflags & PMCW_FLAGS_MASK_ENA) != 0
+&& (p->flags & PMCW_FLAGS_MASK_ENA) == 0) {
+sch->disable_cb(sch);
+}
+
 ret = 0;
 
 out:
@@ -1443,6 +1451,10 @@ void css_reset_sch(SubchDev *sch)
 {
 PMCW *p = &sch->curr_status.pmcw;
 
+if ((p->flags & PMCW_FLAGS_MASK_ENA) != 0 && sch->disable_cb) {
+sch->disable_cb(sch);
+}
+
 p->intparm = 0;
 p->flags &= ~(PMCW_FLAGS_MASK_ISC | PMCW_FLAGS_MASK_ENA |
   PMCW_FLAGS_MASK_LM | PMCW_FLAGS_MASK_MME |
diff --git a/hw/s390x/css.h b/hw/s390x/css.h
index 33104ac..7fa807b 100644
--- a/hw/s390x/css.h
+++ b/hw/s390x/css.h
@@ -81,6 +81,7 @@ struct SubchDev {
 uint8_t ccw_no_data_cnt;
 /* transport-provided data: */
 int (*ccw_cb) (SubchDev *, CCW1);
+void (*disable_cb)(SubchDev *);
 SenseId id;
 void *driver_data;
 };
-- 
1.7.9.5

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v5 13/19] virtio: allow to fail setting status

2014-12-02 Thread Cornelia Huck
virtio-1 allow setting of the FEATURES_OK status bit to fail if
the negotiated feature bits are inconsistent: let's fail
virtio_set_status() in that case and update virtio-ccw to post an
error to the guest.

Signed-off-by: Cornelia Huck 
---
 hw/s390x/virtio-ccw.c  |   20 
 hw/virtio/virtio.c |   24 +++-
 include/hw/virtio/virtio.h |3 ++-
 3 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 75c9ff9..ec492b8 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -555,15 +555,19 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
 virtio_ccw_stop_ioeventfd(dev);
 }
-virtio_set_status(vdev, status);
-if (vdev->status == 0) {
-virtio_reset(vdev);
-}
-if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
-virtio_ccw_start_ioeventfd(dev);
+if (virtio_set_status(vdev, status) == 0) {
+if (vdev->status == 0) {
+virtio_reset(vdev);
+}
+if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+virtio_ccw_start_ioeventfd(dev);
+}
+sch->curr_status.scsw.count = ccw.count - sizeof(status);
+ret = 0;
+} else {
+/* Trigger a command reject. */
+ret = -ENOSYS;
 }
-sch->curr_status.scsw.count = ccw.count - sizeof(status);
-ret = 0;
 }
 break;
 case CCW_CMD_SET_IND:
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 4f2dc48..be128f7 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -548,15 +548,37 @@ void virtio_update_irq(VirtIODevice *vdev)
 virtio_notify_vector(vdev, VIRTIO_NO_VECTOR);
 }
 
-void virtio_set_status(VirtIODevice *vdev, uint8_t val)
+static int virtio_validate_features(VirtIODevice *vdev)
+{
+VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
+
+if (k->validate_features) {
+return k->validate_features(vdev);
+} else {
+return 0;
+}
+}
+
+int virtio_set_status(VirtIODevice *vdev, uint8_t val)
 {
 VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
 trace_virtio_set_status(vdev, val);
 
+if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+if (!(vdev->status & VIRTIO_CONFIG_S_FEATURES_OK) &&
+val & VIRTIO_CONFIG_S_FEATURES_OK) {
+int ret = virtio_validate_features(vdev);
+
+if (ret) {
+return ret;
+}
+}
+}
 if (k->set_status) {
 k->set_status(vdev, val);
 }
 vdev->status = val;
+return 0;
 }
 
 bool target_words_bigendian(void);
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 9a984c2..e7bedd1 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -149,6 +149,7 @@ typedef struct VirtioDeviceClass {
 uint64_t (*get_features)(VirtIODevice *vdev, uint64_t requested_features);
 uint64_t (*bad_features)(VirtIODevice *vdev);
 void (*set_features)(VirtIODevice *vdev, uint64_t val);
+int (*validate_features)(VirtIODevice *vdev);
 void (*get_config)(VirtIODevice *vdev, uint8_t *config);
 void (*set_config)(VirtIODevice *vdev, const uint8_t *config);
 void (*reset)(VirtIODevice *vdev);
@@ -232,7 +233,7 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, int 
align);
 void virtio_queue_notify(VirtIODevice *vdev, int n);
 uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
 void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector);
-void virtio_set_status(VirtIODevice *vdev, uint8_t val);
+int virtio_set_status(VirtIODevice *vdev, uint8_t val);
 void virtio_reset(void *opaque);
 void virtio_update_irq(VirtIODevice *vdev);
 int virtio_set_features(VirtIODevice *vdev, uint64_t val);
-- 
1.7.9.5

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v5 14/19] s390x/virtio-ccw: enable virtio 1.0

2014-12-02 Thread Cornelia Huck
virtio-ccw should now have everything in place to operate virtio 1.0
devices, so let's enable revision 1.

Signed-off-by: Cornelia Huck 
---
 hw/s390x/virtio-ccw.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/s390x/virtio-ccw.h b/hw/s390x/virtio-ccw.h
index fe5c782..d40e3be 100644
--- a/hw/s390x/virtio-ccw.h
+++ b/hw/s390x/virtio-ccw.h
@@ -70,7 +70,7 @@ typedef struct VirtIOCCWDeviceClass {
 } VirtIOCCWDeviceClass;
 
 /* The maximum virtio revision we support. */
-#define VIRTIO_CCW_REV_MAX 0
+#define VIRTIO_CCW_REV_MAX 1
 
 /* Performance improves when virtqueue kick processing is decoupled from the
  * vcpu thread using ioeventfd for some devices. */
-- 
1.7.9.5

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v5 17/19] virtio-net: enable virtio 1.0

2014-12-02 Thread Cornelia Huck
virtio-net (non-vhost) now should have everything in place to support
virtio 1.0: let's enable the feature bit for it.

Note that VIRTIO_F_VERSION_1 is technically a transport feature; once
every device is ready for virtio 1.0, we can move setting this
feature bit out of the individual devices.

Signed-off-by: Cornelia Huck 
---
 hw/net/virtio-net.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 7ee2bd6..b5dd356 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -473,6 +473,7 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, 
uint64_t features)
 }
 
 if (!get_vhost_net(nc->peer)) {
+virtio_add_feature(&features, VIRTIO_F_VERSION_1);
 return features;
 }
 return vhost_net_get_features(get_vhost_net(nc->peer), features);
-- 
1.7.9.5

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v5 11/19] s390x/virtio-ccw: support virtio-1 set_vq format

2014-12-02 Thread Cornelia Huck
Support the new CCW_CMD_SET_VQ format for virtio-1 devices.

While we're at it, refactor the code a bit and enforce big endian
fields (which had always been required, even for legacy).

Reviewed-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/virtio-ccw.c |  114 ++---
 1 file changed, 80 insertions(+), 34 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 5311d9f..75c9ff9 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -238,11 +238,20 @@ VirtualCssBus *virtual_css_bus_init(void)
 }
 
 /* Communication blocks used by several channel commands. */
-typedef struct VqInfoBlock {
+typedef struct VqInfoBlockLegacy {
 uint64_t queue;
 uint32_t align;
 uint16_t index;
 uint16_t num;
+} QEMU_PACKED VqInfoBlockLegacy;
+
+typedef struct VqInfoBlock {
+uint64_t desc;
+uint32_t res0;
+uint16_t index;
+uint16_t num;
+uint64_t avail;
+uint64_t used;
 } QEMU_PACKED VqInfoBlock;
 
 typedef struct VqConfigBlock {
@@ -269,17 +278,20 @@ typedef struct VirtioRevInfo {
 } QEMU_PACKED VirtioRevInfo;
 
 /* Specify where the virtqueues for the subchannel are in guest memory. */
-static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t addr, uint32_t align,
-  uint16_t index, uint16_t num)
+static int virtio_ccw_set_vqs(SubchDev *sch, VqInfoBlock *info,
+  VqInfoBlockLegacy *linfo)
 {
 VirtIODevice *vdev = virtio_ccw_get_vdev(sch);
+uint16_t index = info ? info->index : linfo->index;
+uint16_t num = info ? info->num : linfo->num;
+uint64_t desc = info ? info->desc : linfo->queue;
 
 if (index > VIRTIO_PCI_QUEUE_MAX) {
 return -EINVAL;
 }
 
 /* Current code in virtio.c relies on 4K alignment. */
-if (addr && (align != 4096)) {
+if (linfo && desc && (linfo->align != 4096)) {
 return -EINVAL;
 }
 
@@ -287,8 +299,12 @@ static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t 
addr, uint32_t align,
 return -EINVAL;
 }
 
-virtio_queue_set_addr(vdev, index, addr);
-if (!addr) {
+if (info) {
+virtio_queue_set_rings(vdev, index, desc, info->avail, info->used);
+} else {
+virtio_queue_set_addr(vdev, index, desc);
+}
+if (!desc) {
 virtio_queue_set_vector(vdev, index, 0);
 } else {
 /* Fail if we don't have a big enough queue. */
@@ -303,10 +319,66 @@ static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t 
addr, uint32_t align,
 return 0;
 }
 
-static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
+static int virtio_ccw_handle_set_vq(SubchDev *sch, CCW1 ccw, bool check_len,
+bool is_legacy)
 {
 int ret;
 VqInfoBlock info;
+VqInfoBlockLegacy linfo;
+size_t info_len = is_legacy ? sizeof(linfo) : sizeof(info);
+
+if (check_len) {
+if (ccw.count != info_len) {
+return -EINVAL;
+}
+} else if (ccw.count < info_len) {
+/* Can't execute command. */
+return -EINVAL;
+}
+if (!ccw.cda) {
+return -EFAULT;
+}
+if (is_legacy) {
+linfo.queue = ldq_be_phys(&address_space_memory, ccw.cda);
+linfo.align = ldl_be_phys(&address_space_memory,
+  ccw.cda + sizeof(linfo.queue));
+linfo.index = lduw_be_phys(&address_space_memory,
+   ccw.cda + sizeof(linfo.queue)
+   + sizeof(linfo.align));
+linfo.num = lduw_be_phys(&address_space_memory,
+ ccw.cda + sizeof(linfo.queue)
+ + sizeof(linfo.align)
+ + sizeof(linfo.index));
+ret = virtio_ccw_set_vqs(sch, NULL, &linfo);
+} else {
+info.desc = ldq_be_phys(&address_space_memory, ccw.cda);
+info.index = lduw_be_phys(&address_space_memory,
+  ccw.cda + sizeof(info.desc)
+  + sizeof(info.res0));
+info.num = lduw_be_phys(&address_space_memory,
+ccw.cda + sizeof(info.desc)
+  + sizeof(info.res0)
+  + sizeof(info.index));
+info.avail = ldq_be_phys(&address_space_memory,
+ ccw.cda + sizeof(info.desc)
+ + sizeof(info.res0)
+ + sizeof(info.index)
+ + sizeof(info.num));
+info.used = ldq_be_phys(&address_space_memory,
+ccw.cda + sizeof(info.desc)
++ sizeof(info.res0)
++ sizeof(info.index)
++ sizeof(info.num)
++ sizeof(info.avail));
+ret = virtio_ccw_set_vqs(sch, &info, NULL);
+}
+  

[PATCH RFC v5 12/19] virtio: disallow late feature changes for virtio-1

2014-12-02 Thread Cornelia Huck
For virtio-1 devices, the driver must not attempt to set feature bits
after it set FEATURES_OK in the device status. Simply reject it in
that case.

Signed-off-by: Cornelia Huck 
---
 hw/virtio/virtio.c |   16 ++--
 include/hw/virtio/virtio.h |2 ++
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 508dccf..4f2dc48 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -980,7 +980,7 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f)
 vmstate_save_state(f, &vmstate_virtio, vdev);
 }
 
-int virtio_set_features(VirtIODevice *vdev, uint64_t val)
+static int __virtio_set_features(VirtIODevice *vdev, uint64_t val)
 {
 BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
 VirtioBusClass *vbusk = VIRTIO_BUS_GET_CLASS(qbus);
@@ -996,6 +996,18 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t val)
 return bad ? -1 : 0;
 }
 
+int virtio_set_features(VirtIODevice *vdev, uint64_t val)
+{
+   /*
+ * The driver must not attempt to set features after feature negotiation
+ * has finished.
+ */
+if (vdev->status & VIRTIO_CONFIG_S_FEATURES_OK) {
+return -EINVAL;
+}
+return __virtio_set_features(vdev, val);
+}
+
 int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
 {
 int i, ret;
@@ -1028,7 +1040,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int 
version_id)
 qemu_get_be32s(f, &features);
 
 /* XXX features >= 32 */
-if (virtio_set_features(vdev, features) < 0) {
+if (__virtio_set_features(vdev, features) < 0) {
 supported_features = k->get_features(qbus->parent);
 error_report("Features 0x%x unsupported. Allowed features: 0x%lx",
  features, supported_features);
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 80ee313..9a984c2 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -32,6 +32,8 @@
 #define VIRTIO_CONFIG_S_DRIVER  2
 /* Driver has used its parts of the config, and is happy */
 #define VIRTIO_CONFIG_S_DRIVER_OK   4
+/* Driver has finished configuring features */
+#define VIRTIO_CONFIG_S_FEATURES_OK 8
 /* We've given up on this device. */
 #define VIRTIO_CONFIG_S_FAILED  0x80
 
-- 
1.7.9.5

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v5 19/19] virtio-blk: revision specific feature bits

2014-12-02 Thread Cornelia Huck
Wire up virtio-blk to provide different feature bit sets depending
on whether legacy or v1.0 has been requested.

Note that VERSION_1 is still disabled due to missing ANY_LAYOUT support.

Signed-off-by: Cornelia Huck 
---
 hw/block/virtio-blk.c |   19 +++
 1 file changed, 19 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 9cfae66..fdc236a 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -587,6 +587,24 @@ static uint64_t virtio_blk_get_features(VirtIODevice 
*vdev, uint64_t features)
 return features;
 }
 
+static uint64_t virtio_blk_get_features_rev(VirtIODevice *vdev,
+uint64_t features,
+unsigned int revision)
+{
+if (revision == 0) {
+/* legacy */
+virtio_clear_feature(&features, VIRTIO_F_VERSION_1);
+return virtio_blk_get_features(vdev, features);
+}
+/* virtio 1.0 or later */
+virtio_clear_feature(&features, VIRTIO_BLK_F_SCSI);
+virtio_clear_feature(&features, VIRTIO_BLK_F_CONFIG_WCE);
+virtio_clear_feature(&features, VIRTIO_BLK_F_WCE);
+/* we're still missing ANY_LAYOUT */
+/* virtio_add_feature(&features, VIRTIO_F_VERSION_1); */
+return features;
+}
+
 static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t status)
 {
 VirtIOBlock *s = VIRTIO_BLK(vdev);
@@ -821,6 +839,7 @@ static void virtio_blk_class_init(ObjectClass *klass, void 
*data)
 vdc->get_config = virtio_blk_update_config;
 vdc->set_config = virtio_blk_set_config;
 vdc->get_features = virtio_blk_get_features;
+vdc->get_features_rev = virtio_blk_get_features_rev;
 vdc->set_status = virtio_blk_set_status;
 vdc->reset = virtio_blk_reset;
 vdc->save = virtio_blk_save_device;
-- 
1.7.9.5

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v5 15/19] virtio-net: no writeable mac for virtio-1

2014-12-02 Thread Cornelia Huck
Devices operating as virtio 1.0 may not allow writes to the mac
address in config space.

Signed-off-by: Cornelia Huck 
---
 hw/net/virtio-net.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index d6d1b98..ebbea60 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -87,6 +87,7 @@ static void virtio_net_set_config(VirtIODevice *vdev, const 
uint8_t *config)
 memcpy(&netcfg, config, n->config_size);
 
 if (!virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR) &&
+!virtio_has_feature(vdev, VIRTIO_F_VERSION_1) &&
 memcmp(netcfg.mac, n->mac, ETH_ALEN)) {
 memcpy(n->mac, netcfg.mac, ETH_ALEN);
 qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
-- 
1.7.9.5

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v5 16/19] virtio-net: support longer header

2014-12-02 Thread Cornelia Huck
virtio-1 devices always use num_buffers in the header, even if
mergeable rx buffers have not been negotiated.

Signed-off-by: Cornelia Huck 
---
 hw/net/virtio-net.c |   21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index ebbea60..7ee2bd6 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -373,15 +373,21 @@ static int peer_has_ufo(VirtIONet *n)
 return n->has_ufo;
 }
 
-static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs)
+static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs,
+   int version_1)
 {
 int i;
 NetClientState *nc;
 
 n->mergeable_rx_bufs = mergeable_rx_bufs;
 
-n->guest_hdr_len = n->mergeable_rx_bufs ?
-sizeof(struct virtio_net_hdr_mrg_rxbuf) : sizeof(struct 
virtio_net_hdr);
+if (version_1) {
+n->guest_hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+} else {
+n->guest_hdr_len = n->mergeable_rx_bufs ?
+sizeof(struct virtio_net_hdr_mrg_rxbuf) :
+sizeof(struct virtio_net_hdr);
+}
 
 for (i = 0; i < n->max_queues; i++) {
 nc = qemu_get_subqueue(n->nic, i);
@@ -525,7 +531,9 @@ static void virtio_net_set_features(VirtIODevice *vdev, 
uint64_t features)
 
 virtio_net_set_mrg_rx_bufs(n,
__virtio_has_feature(features,
-VIRTIO_NET_F_MRG_RXBUF));
+VIRTIO_NET_F_MRG_RXBUF),
+   __virtio_has_feature(features,
+VIRTIO_F_VERSION_1));
 
 if (n->has_vnet_hdr) {
 n->curr_guest_offloads =
@@ -1407,7 +1415,8 @@ static int virtio_net_load_device(VirtIODevice *vdev, 
QEMUFile *f,
 qemu_get_buffer(f, n->mac, ETH_ALEN);
 n->vqs[0].tx_waiting = qemu_get_be32(f);
 
-virtio_net_set_mrg_rx_bufs(n, qemu_get_be32(f));
+virtio_net_set_mrg_rx_bufs(n, qemu_get_be32(f),
+   virtio_has_feature(vdev, VIRTIO_F_VERSION_1));
 
 if (version_id >= 3)
 n->status = qemu_get_be16(f);
@@ -1653,7 +1662,7 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
 
 n->vqs[0].tx_waiting = 0;
 n->tx_burst = n->net_conf.txburst;
-virtio_net_set_mrg_rx_bufs(n, 0);
+virtio_net_set_mrg_rx_bufs(n, 0, 0);
 n->promisc = 1; /* for compatibility */
 
 n->mac_table.macs = g_malloc0(MAC_TABLE_ENTRIES * ETH_ALEN);
-- 
1.7.9.5

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v5 18/19] virtio: support revision-specific features

2014-12-02 Thread Cornelia Huck
Devices may support different sets of feature bits depending on which
revision they're operating at. Let's give the transport a way to
re-query the device about its features when the revision has been
changed.

Signed-off-by: Cornelia Huck 
---
 hw/s390x/virtio-ccw.c  |   12 ++--
 hw/virtio/virtio-bus.c |   14 --
 include/hw/virtio/virtio-bus.h |3 +++
 include/hw/virtio/virtio.h |3 +++
 4 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index ec492b8..3826074 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -699,6 +699,10 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 }
 ret = 0;
 dev->revision = revinfo.revision;
+/* Re-evaluate which features the device wants to offer. */
+dev->host_features =
+virtio_bus_get_vdev_features_rev(&dev->bus, dev->host_features,
+ dev->revision >= 1 ? 1 : 0);
 break;
 default:
 ret = -ENOSYS;
@@ -712,6 +716,9 @@ static void virtio_sch_disable_cb(SubchDev *sch)
 VirtioCcwDevice *dev = sch->driver_data;
 
 dev->revision = -1;
+/* Reset the device's features to legacy. */
+dev->host_features =
+virtio_bus_get_vdev_features_rev(&dev->bus, dev->host_features, 0);
 }
 
 static int virtio_ccw_device_init(VirtioCcwDevice *dev, VirtIODevice *vdev)
@@ -854,8 +861,9 @@ static int virtio_ccw_device_init(VirtioCcwDevice *dev, 
VirtIODevice *vdev)
 virtio_add_feature(&dev->host_features, VIRTIO_F_NOTIFY_ON_EMPTY);
 virtio_add_feature(&dev->host_features, VIRTIO_F_BAD_FEATURE);
 
-dev->host_features = virtio_bus_get_vdev_features(&dev->bus,
-  dev->host_features);
+/* All devices start in legacy mode. */
+dev->host_features =
+virtio_bus_get_vdev_features_rev(&dev->bus, dev->host_features, 0);
 
 css_generate_sch_crws(sch->cssid, sch->ssid, sch->schid,
   parent->hotplugged, 1);
diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index 32e3fab..a30826c 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -97,18 +97,28 @@ size_t virtio_bus_get_vdev_config_len(VirtioBusState *bus)
 }
 
 /* Get the features of the plugged device. */
-uint64_t virtio_bus_get_vdev_features(VirtioBusState *bus,
-  uint64_t requested_features)
+uint64_t virtio_bus_get_vdev_features_rev(VirtioBusState *bus,
+  uint64_t requested_features,
+  unsigned int revision)
 {
 VirtIODevice *vdev = virtio_bus_get_device(bus);
 VirtioDeviceClass *k;
 
 assert(vdev != NULL);
 k = VIRTIO_DEVICE_GET_CLASS(vdev);
+if (revision > 0 && k->get_features_rev) {
+return k->get_features_rev(vdev, requested_features, revision);
+}
 assert(k->get_features != NULL);
 return k->get_features(vdev, requested_features);
 }
 
+uint64_t virtio_bus_get_vdev_features(VirtioBusState *bus,
+  uint64_t requested_features)
+{
+return virtio_bus_get_vdev_features_rev(bus, requested_features, 0);
+}
+
 /* Get bad features of the plugged device. */
 uint64_t virtio_bus_get_vdev_bad_features(VirtioBusState *bus)
 {
diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
index 0a4dde1..f0916ef 100644
--- a/include/hw/virtio/virtio-bus.h
+++ b/include/hw/virtio/virtio-bus.h
@@ -84,6 +84,9 @@ size_t virtio_bus_get_vdev_config_len(VirtioBusState *bus);
 /* Get the features of the plugged device. */
 uint64_t virtio_bus_get_vdev_features(VirtioBusState *bus,
   uint64_t requested_features);
+uint64_t virtio_bus_get_vdev_features_rev(VirtioBusState *bus,
+  uint64_t requested_features,
+  unsigned int revision);
 /* Get bad features of the plugged device. */
 uint64_t virtio_bus_get_vdev_bad_features(VirtioBusState *bus);
 /* Get config of the plugged device. */
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index e7bedd1..f31e3df 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -147,6 +147,9 @@ typedef struct VirtioDeviceClass {
 DeviceRealize realize;
 DeviceUnrealize unrealize;
 uint64_t (*get_features)(VirtIODevice *vdev, uint64_t requested_features);
+uint64_t (*get_features_rev)(VirtIODevice *vdev,
+ uint64_t requested_features,
+ unsigned int revision);
 uint64_t (*bad_features)(VirtIODevice *vdev);
 void (*set_features)(VirtIODevice *vdev, uint64_t val);
 int (*validate_features)(VirtIODevice *vdev);
-- 
1.7.9.5

___
Virtualization mailing list
Virtu

Re: [PATCH RFC v5 07/19] virtio: allow virtio-1 queue layout

2014-12-02 Thread Michael S. Tsirkin
On Tue, Dec 02, 2014 at 02:00:15PM +0100, Cornelia Huck wrote:
> For virtio-1 devices, we allow a more complex queue layout that doesn't
> require descriptor table and rings on a physically-contigous memory area:
> add virtio_queue_set_rings() to allow transports to set this up.
> 
> Signed-off-by: Cornelia Huck 
> ---
>  hw/virtio/virtio.c |   16 
>  include/hw/virtio/virtio.h |2 ++
>  2 files changed, 18 insertions(+)
> 
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 8f69ffa..508dccf 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -96,6 +96,13 @@ static void virtqueue_init(VirtQueue *vq)
>  {
>  hwaddr pa = vq->pa;
>  
> +if (pa == -1ULL) {
> +/*
> + * This is a virtio-1 style vq that has already been setup
> + * in virtio_queue_set.
> + */
> +return;
> +}
>  vq->vring.desc = pa;
>  vq->vring.avail = pa + vq->vring.num * sizeof(VRingDesc);
>  vq->vring.used = vring_align(vq->vring.avail +
> @@ -717,6 +724,15 @@ hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n)
>  return vdev->vq[n].pa;
>  }
>  
> +void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
> +hwaddr avail, hwaddr used)
> +{
> +vdev->vq[n].pa = -1ULL;
> +vdev->vq[n].vring.desc = desc;
> +vdev->vq[n].vring.avail = avail;
> +vdev->vq[n].vring.used = used;
> +}
> +
>  void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
>  {
>  /* Don't allow guest to flip queue between existent and

pa == -1ULL tricks look quite ugly.
Can't we set desc/avail/used unconditionally, and drop
the pa value?

> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 68c40db..80ee313 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -224,6 +224,8 @@ void virtio_queue_set_addr(VirtIODevice *vdev, int n, 
> hwaddr addr);
>  hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n);
>  void virtio_queue_set_num(VirtIODevice *vdev, int n, int num);
>  int virtio_queue_get_num(VirtIODevice *vdev, int n);
> +void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
> +hwaddr avail, hwaddr used);
>  void virtio_queue_set_align(VirtIODevice *vdev, int n, int align);
>  void virtio_queue_notify(VirtIODevice *vdev, int n);
>  uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
> -- 
> 1.7.9.5
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC v5 07/19] virtio: allow virtio-1 queue layout

2014-12-02 Thread Cornelia Huck
On Tue, 2 Dec 2014 16:46:28 +0200
"Michael S. Tsirkin"  wrote:

> On Tue, Dec 02, 2014 at 02:00:15PM +0100, Cornelia Huck wrote:
> > For virtio-1 devices, we allow a more complex queue layout that doesn't
> > require descriptor table and rings on a physically-contigous memory area:
> > add virtio_queue_set_rings() to allow transports to set this up.
> > 
> > Signed-off-by: Cornelia Huck 
> > ---
> >  hw/virtio/virtio.c |   16 
> >  include/hw/virtio/virtio.h |2 ++
> >  2 files changed, 18 insertions(+)
> > 
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index 8f69ffa..508dccf 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -96,6 +96,13 @@ static void virtqueue_init(VirtQueue *vq)
> >  {
> >  hwaddr pa = vq->pa;
> >  
> > +if (pa == -1ULL) {
> > +/*
> > + * This is a virtio-1 style vq that has already been setup
> > + * in virtio_queue_set.
> > + */
> > +return;
> > +}
> >  vq->vring.desc = pa;
> >  vq->vring.avail = pa + vq->vring.num * sizeof(VRingDesc);
> >  vq->vring.used = vring_align(vq->vring.avail +
> > @@ -717,6 +724,15 @@ hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n)
> >  return vdev->vq[n].pa;
> >  }
> >  
> > +void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
> > +hwaddr avail, hwaddr used)
> > +{
> > +vdev->vq[n].pa = -1ULL;
> > +vdev->vq[n].vring.desc = desc;
> > +vdev->vq[n].vring.avail = avail;
> > +vdev->vq[n].vring.used = used;
> > +}
> > +
> >  void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
> >  {
> >  /* Don't allow guest to flip queue between existent and
> 
> pa == -1ULL tricks look quite ugly.
> Can't we set desc/avail/used unconditionally, and drop
> the pa value?

And have virtio_queue_get_addr() return desc? Let me see if I can come
up with a patch.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC v5 07/19] virtio: allow virtio-1 queue layout

2014-12-02 Thread Cornelia Huck
On Tue, 2 Dec 2014 15:54:44 +0100
Cornelia Huck  wrote:

> On Tue, 2 Dec 2014 16:46:28 +0200
> "Michael S. Tsirkin"  wrote:

> > pa == -1ULL tricks look quite ugly.
> > Can't we set desc/avail/used unconditionally, and drop
> > the pa value?
> 
> And have virtio_queue_get_addr() return desc? Let me see if I can come
> up with a patch.

I came up with the following (untested) patch, which should hopefully
suit mmio as well. I haven't cared about migration yet.

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 8f69ffa..ac3c615 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -69,7 +69,6 @@ typedef struct VRing
 struct VirtQueue
 {
 VRing vring;
-hwaddr pa;
 uint16_t last_avail_idx;
 /* Last used index value we have signalled on */
 uint16_t signalled_used;
@@ -92,12 +91,13 @@ struct VirtQueue
 };
 
 /* virt queue functions */
-static void virtqueue_init(VirtQueue *vq)
+static void virtqueue_update_rings(VirtQueue *vq)
 {
-hwaddr pa = vq->pa;
-
-vq->vring.desc = pa;
-vq->vring.avail = pa + vq->vring.num * sizeof(VRingDesc);
+if (!vq->vring.desc) {
+/* not yet setup -> nothing to do */
+return;
+}
+vq->vring.avail = vq->vring.desc + vq->vring.num * sizeof(VRingDesc);
 vq->vring.used = vring_align(vq->vring.avail +
  offsetof(VRingAvail, ring[vq->vring.num]),
  vq->vring.align);
@@ -605,7 +605,6 @@ void virtio_reset(void *opaque)
 vdev->vq[i].vring.avail = 0;
 vdev->vq[i].vring.used = 0;
 vdev->vq[i].last_avail_idx = 0;
-vdev->vq[i].pa = 0;
 vdev->vq[i].vector = VIRTIO_NO_VECTOR;
 vdev->vq[i].signalled_used = 0;
 vdev->vq[i].signalled_used_valid = false;
@@ -708,17 +707,34 @@ void virtio_config_writel(VirtIODevice *vdev, uint32_t 
addr, uint32_t data)
 
 void virtio_queue_set_addr(VirtIODevice *vdev, int n, hwaddr addr)
 {
-vdev->vq[n].pa = addr;
-virtqueue_init(&vdev->vq[n]);
+vdev->vq[n].vring.desc = addr;
+virtqueue_update_rings(&vdev->vq[n]);
 }
 
 hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n)
 {
-return vdev->vq[n].pa;
+return vdev->vq[n].vring.desc;
+}
+
+void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
+hwaddr avail, hwaddr used)
+{
+vdev->vq[n].vring.desc = desc;
+vdev->vq[n].vring.avail = avail;
+vdev->vq[n].vring.used = used;
 }
 
 void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
 {
+/*
+ * For virtio-1 devices, the number of buffers may only be
+ * updated if the ring addresses have not yet been set up.
+ */
+if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1) &&
+vdev->vq[n].vring.desc) {
+error_report("tried to modify buffer num for virtio-1 device");
+return;
+}
 /* Don't allow guest to flip queue between existent and
  * nonexistent states, or to set it to an invalid size.
  */
@@ -728,7 +744,7 @@ void virtio_queue_set_num(VirtIODevice *vdev, int n, int 
num)
 return;
 }
 vdev->vq[n].vring.num = num;
-virtqueue_init(&vdev->vq[n]);
+virtqueue_update_rings(&vdev->vq[n]);
 }
 
 int virtio_queue_get_num(VirtIODevice *vdev, int n)
@@ -748,6 +764,11 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, int 
align)
 BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
 VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
 
+/* virtio-1 compliant devices cannot change the aligment */
+if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+error_report("tried to modify queue alignment for virtio-1 device");
+return;
+}
 /* Check that the transport told us it was going to do this
  * (so a buggy transport will immediately assert rather than
  * silently failing to migrate this state)
@@ -755,7 +776,7 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, int 
align)
 assert(k->has_variable_vring_alignment);
 
 vdev->vq[n].vring.align = align;
-virtqueue_init(&vdev->vq[n]);
+virtqueue_update_rings(&vdev->vq[n]);
 }
 
 void virtio_queue_notify_vq(VirtQueue *vq)
@@ -949,7 +970,8 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f)
 if (k->has_variable_vring_alignment) {
 qemu_put_be32(f, vdev->vq[i].vring.align);
 }
-qemu_put_be64(f, vdev->vq[i].pa);
+/* XXX virtio-1 devices */
+qemu_put_be64(f, vdev->vq[i].vring.desc);
 qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
 if (k->save_queue) {
 k->save_queue(qbus->parent, i, f);
@@ -1044,13 +1066,14 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int 
version_id)
 if (k->has_variable_vring_alignment) {
 vdev->vq[i].vring.align = qemu_get_be32(f);
 }
-vdev->vq[i].pa = qemu_get_be64(f);
+vdev->vq[i].vring.desc = qemu_get_be64(f);
 qemu_get_be16s(f, &vdev->vq[i].last_avail_idx);
   

Re: [PATCH net] Revert "drivers/net: Disable UFO through virtio" in macvtap and tun

2014-12-02 Thread Vlad Yasevich
On 11/11/2014 12:12 PM, Ben Hutchings wrote:
> This reverts commit 88e0e0e5aa722b193c8758c8b45d041de5316924 for
> the tap drivers, but leaves UFO disabled in virtio_net.
> 
> libvirt at least assumes that tap features will never be dropped
> in new kernel versions, and doing so prevents migration of VMs to
> the never kernel version while they are running with virtio net
> devices.
> 
> Fixes: 88e0e0e5aa7a ("drivers/net: Disable UFO through virtio")
> Signed-off-by: Ben Hutchings 
> ---
> Compile-tested only.

I ran some migrations tests of different guests between the hosts
with 3.17 and a newly patched kernel and they all worked for me.

Tested-by: Vladislav Yasevich 

-vlad

> 
> Ben.
> 
>  drivers/net/macvtap.c | 13 -
>  drivers/net/tun.c | 19 ---
>  2 files changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 6f226de..aeaeb6d 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -66,7 +66,7 @@ static struct cdev macvtap_cdev;
>  static const struct proto_ops macvtap_socket_ops;
>  
>  #define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \
> -   NETIF_F_TSO6)
> +   NETIF_F_TSO6 | NETIF_F_UFO)
>  #define RX_OFFLOADS (NETIF_F_GRO | NETIF_F_LRO)
>  #define TAP_FEATURES (NETIF_F_GSO | NETIF_F_SG)
>  
> @@ -570,8 +570,6 @@ static int macvtap_skb_from_vnet_hdr(struct sk_buff *skb,
>   gso_type = SKB_GSO_TCPV6;
>   break;
>   case VIRTIO_NET_HDR_GSO_UDP:
> - pr_warn_once("macvtap: %s: using disabled UFO feature; 
> please fix this program\n",
> -  current->comm);
>   gso_type = SKB_GSO_UDP;
>   if (skb->protocol == htons(ETH_P_IPV6))
>   ipv6_proxy_select_ident(skb);
> @@ -619,6 +617,8 @@ static void macvtap_skb_to_vnet_hdr(const struct sk_buff 
> *skb,
>   vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
>   else if (sinfo->gso_type & SKB_GSO_TCPV6)
>   vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
> + else if (sinfo->gso_type & SKB_GSO_UDP)
> + vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_UDP;
>   else
>   BUG();
>   if (sinfo->gso_type & SKB_GSO_TCP_ECN)
> @@ -953,6 +953,9 @@ static int set_offload(struct macvtap_queue *q, unsigned 
> long arg)
>   if (arg & TUN_F_TSO6)
>   feature_mask |= NETIF_F_TSO6;
>   }
> +
> + if (arg & TUN_F_UFO)
> + feature_mask |= NETIF_F_UFO;
>   }
>  
>   /* tun/tap driver inverts the usage for TSO offloads, where
> @@ -963,7 +966,7 @@ static int set_offload(struct macvtap_queue *q, unsigned 
> long arg)
>* When user space turns off TSO, we turn off GSO/LRO so that
>* user-space will not receive TSO frames.
>*/
> - if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6))
> + if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_UFO))
>   features |= RX_OFFLOADS;
>   else
>   features &= ~RX_OFFLOADS;
> @@ -1064,7 +1067,7 @@ static long macvtap_ioctl(struct file *file, unsigned 
> int cmd,
>   case TUNSETOFFLOAD:
>   /* let the user check for future flags */
>   if (arg & ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
> - TUN_F_TSO_ECN))
> + TUN_F_TSO_ECN | TUN_F_UFO))
>   return -EINVAL;
>  
>   rtnl_lock();
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 7302398..a0987d1 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -175,7 +175,7 @@ struct tun_struct {
>   struct net_device   *dev;
>   netdev_features_t   set_features;
>  #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
> -   NETIF_F_TSO6)
> +   NETIF_F_TSO6|NETIF_F_UFO)
>  
>   int vnet_hdr_sz;
>   int sndbuf;
> @@ -1152,20 +1152,10 @@ static ssize_t tun_get_user(struct tun_struct *tun, 
> struct tun_file *tfile,
>   skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
>   break;
>   case VIRTIO_NET_HDR_GSO_UDP:
> - {
> - static bool warned;
> -
> - if (!warned) {
> - warned = true;
> - netdev_warn(tun->dev,
> - "%s: using disabled UFO feature; 
> please fix this program\n",
> - current->comm);
> - }
>   skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
>   if (skb->protocol == htons(ETH_P_IPV

Re: [PATCH RFC 1/2] virtio_balloon: convert to virtio 1.0 endian-ness

2014-12-02 Thread Cornelia Huck
On Tue, 2 Dec 2014 13:44:06 +0200
"Michael S. Tsirkin"  wrote:

> balloon device is not part of virtio 1.0 spec.  Still, it's easy enough
> to make it handle endian-ness exactly as other virtio 1.0 devices: what
> we gain from this, is that there's no need to special-case it in virtio
> core.

Well, the balloon is weird in a number of ways, including its always
little-endian config space.

But I'm not quite sure the spec covers this?

> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  include/uapi/linux/virtio_balloon.h | 5 +++--
>  drivers/virtio/virtio_balloon.c | 4 ++--
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 

>  struct virtio_balloon_stat {
> - __u16 tag;
> - __u64 val;
> + __virtio16 tag;
> + __virtio64 val;
>  } __attribute__((packed));

Would the respective fields in the spec need updating? While it is
actually talking about legacy requirements, the fields are not
specified as __virtio{16,64}.

Also, is changing the stat fields enough? I've not looked into balloon
operation, but does the payload need some endianess conversion as well?

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC v5 07/19] virtio: allow virtio-1 queue layout

2014-12-02 Thread Michael S. Tsirkin
On Tue, Dec 02, 2014 at 04:41:36PM +0100, Cornelia Huck wrote:
>  void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
>  {
> +/*
> + * For virtio-1 devices, the number of buffers may only be
> + * updated if the ring addresses have not yet been set up.

Where does it say that?

> + */
> +if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1) &&
> +vdev->vq[n].vring.desc) {
> +error_report("tried to modify buffer num for virtio-1 device");
> +return;
> +}
>  /* Don't allow guest to flip queue between existent and
>   * nonexistent states, or to set it to an invalid size.
>   */
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v13 10/11] pvqspinlock, x86: Enable PV qspinlock for KVM

2014-12-02 Thread Konrad Rzeszutek Wilk
On Wed, Oct 29, 2014 at 04:19:10PM -0400, Waiman Long wrote:
> This patch adds the necessary KVM specific code to allow KVM to
> support the CPU halting and kicking operations needed by the queue
> spinlock PV code.
> 
> Two KVM guests of 20 CPU cores (2 nodes) were created for performance
> testing in one of the following three configurations:
>  1) Only 1 VM is active
>  2) Both VMs are active and they share the same 20 physical CPUs
> (200% overcommit)
> 
> The tests run included the disk workload of the AIM7 benchmark on
> both ext4 and xfs RAM disks at 3000 users on a 3.17 based kernel. The
> "ebizzy -m" test and futextest was was also run and its performance
> data were recorded.  With two VMs running, the "idle=poll" kernel
> option was added to simulate a busy guest. If PV qspinlock is not
> enabled, unfairlock will be used automically in a guest.

What is the unfairlock? Isn't it just using a bytelock at this point?
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts

2014-12-02 Thread Pankaj Gupta

> 
> On Tue, Dec 02, 2014 at 09:59:48AM +0008, Jason Wang wrote:
> > 
> > 
> > On Tue, Dec 2, 2014 at 5:43 PM, Michael S. Tsirkin  wrote:
> > >On Tue, Dec 02, 2014 at 08:15:02AM +0008, Jason Wang wrote:
> > >> On Tue, Dec 2, 2014 at 11:15 AM, Jason Wang 
> > >>wrote:
> > >> >
> > >> >
> > >> >On Mon, Dec 1, 2014 at 6:42 PM, Michael S. Tsirkin 
> > >>wrote:
> > >> >>On Mon, Dec 01, 2014 at 06:17:03PM +0800, Jason Wang wrote:
> > >> >>> Hello:
> > >> >>>  We used to orphan packets before transmission for virtio-net. This
> > >> >>>breaks
> > >> >>> socket accounting and can lead serveral functions won't work, e.g:
> > >> >>>  - Byte Queue Limit depends on tx completion nofication to work.
> > >> >>> - Packet Generator depends on tx completion nofication for the last
> > >> >>>   transmitted packet to complete.
> > >> >>> - TCP Small Queue depends on proper accounting of sk_wmem_alloc to
> > >> >>>work.
> > >> >>>  This series tries to solve the issue by enabling tx interrupts. To
> > >> >>>minize
> > >> >>> the performance impacts of this, several optimizations were used:
> > >> >>>  - In guest side, virtqueue_enable_cb_delayed() was used to delay
> > >>the
> > >> >>>tx
> > >> >>>   interrupt untile 3/4 pending packets were sent.
> > >> >>> - In host side, interrupt coalescing were used to reduce tx
> > >> >>>interrupts.
> > >> >>>  Performance test results[1] (tx-frames 16 tx-usecs 16) shows:
> > >> >>>  - For guest receiving. No obvious regression on throughput were
> > >> >>>   noticed. More cpu utilization were noticed in few cases.
> > >> >>> - For guest transmission. Very huge improvement on througput for
> > >> >>>small
> > >> >>>   packet transmission were noticed. This is expected since TSQ and
> > >> >>>other
> > >> >>>   optimization for small packet transmission work after tx
> > >>interrupt.
> > >> >>>But
> > >> >>>   will use more cpu for large packets.
> > >> >>> - For TCP_RR, regression (10% on transaction rate and cpu
> > >> >>>utilization) were
> > >> >>>   found. Tx interrupt won't help but cause overhead in this case.
> > >> >>>Using
> > >> >>>   more aggressive coalescing parameters may help to reduce the
> > >> >>>regression.
> > >> >>
> > >> >>OK, you do have posted coalescing patches - does it help any?
> > >> >
> > >> >Helps a lot.
> > >> >
> > >> >For RX, it saves about 5% - 10% cpu. (reduce 60%-90% tx intrs)
> > >> >For small packet TX, it increases 33% - 245% throughput. (reduce about
> > >>60%
> > >> >inters)
> > >> >For TCP_RR, it increase the 3%-10% trans.rate. (reduce 40%-80% tx
> > >>intrs)
> > >> >
> > >> >>
> > >> >>I'm not sure the regression is due to interrupts.
> > >> >>It would make sense for CPU but why would it
> > >> >>hurt transaction rate?
> > >> >
> > >> >Anyway guest need to take some cycles to handle tx interrupts.
> > >> >And transaction rate does increase if we coalesces more tx interurpts.
> > >> >>
> > >> >>
> > >> >>It's possible that we are deferring kicks too much due to BQL.
> > >> >>
> > >> >>As an experiment: do we get any of it back if we do
> > >> >>-if (kick || netif_xmit_stopped(txq))
> > >> >>-virtqueue_kick(sq->vq);
> > >> >>+virtqueue_kick(sq->vq);
> > >> >>?
> > >> >
> > >> >
> > >> >I will try, but during TCP_RR, at most 1 packets were pending,
> > >> >I suspect if BQL can help in this case.
> > >> Looks like this helps a lot in multiple sessions of TCP_RR.
> > >
> > >so what's faster
> > >   BQL + kick each packet
> > >   no BQL
> > >?
> > 
> > Quick and manual tests (TCP_RR 64, TCP_STREAM 512) does not show obvious
> > differences.
> > 
> > May need a complete benchmark to see.
> 
> Okay so going forward something like BQL + kick each packet
> might be a good solution.
> The advantage of BQL is that it works without GSO.
> For example, now that we don't do UFO, you might
> see significant gains with UDP.

If I understand correctly, it can also help for small packet
regr. in multiqueue scenario? Would be nice to see the perf. numbers
with multi-queue for small packets streams.
> 
> 
> > >
> > >
> > >> How about move the BQL patch out of this series?
> > >> Let's first converge tx interrupt and then introduce it?
> > >> (e.g with kicking after queuing X bytes?)
> > >
> > >Sounds good.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v13 10/11] pvqspinlock, x86: Enable PV qspinlock for KVM

2014-12-02 Thread Thomas Gleixner
On Wed, 29 Oct 2014, Waiman Long wrote:
> AIM7 XFS Disk Test (no overcommit)
>   kernel JPMReal Time   Sys TimeUsr Time
>   -  ----   
>   PV ticketlock 25423737.08   98.95   5.44
>   PV qspinlock  25495757.06   98.63   5.40
>   unfairlock  26162796.91   97.05   5.42
> 
> AIM7 XFS Disk Test (200% overcommit)
>   kernel JPMReal Time   Sys TimeUsr Time
>   -  ----   
>   PV ticketlock 64446827.93  415.22   6.33
>   PV qspinlock  64562427.88  419.84   0.39

That number is made up by what? 

>   unfairlock  69551825.88  377.40   4.09
> 
> AIM7 EXT4 Disk Test (no overcommit)
>   kernel JPMReal Time   Sys TimeUsr Time
>   -  ----   
>   PV ticketlock 19955659.02  103.67   5.76
>   PV qspinlock  20111738.95  102.15   5.40
>   unfairlock  20665908.71   98.13   5.46
> 
> AIM7 EXT4 Disk Test (200% overcommit)
>   kernel JPMReal Time   Sys TimeUsr Time
>   -  ----   
>   PV ticketlock 47834137.63  495.81  30.78
>   PV qspinlock  47405837.97  475.74  30.95
>   unfairlock  56022432.13  398.43  26.27
> 
> For the AIM7 disk workload, both PV ticketlock and qspinlock have
> about the same performance. The unfairlock performs slightly better
> than the PV lock.

Slightly?

Taking the PV locks, which are basically the same for the existing
ticket locks and your new fangled qlocks as a reference then the so
called 'unfair locks' which are just the native locks w/o the PV
nonsense are fundamentally better up to a whopping 18% in the
ext4/200% overcommit case. See below.
 
> EBIZZY-m Test (no overcommit)
>   kernelRec/s   Real Time   Sys TimeUsr Time
>   - -   -   
>   PV ticketlock 3255  10.00   60.65   3.62
>   PV qspinlock  3318  10.00   54.27   3.60
>   unfairlock  2833  10.00   26.66   3.09
> 
> EBIZZY-m Test (200% overcommit)
>   kernelRec/s   Real Time   Sys TimeUsr Time
>   - -   -   
>   PV ticketlock  841  10.00   71.03   2.37
>   PV qspinlock   834  10.00   68.27   2.39
>   unfairlock   865  10.00   27.08   1.51
> 
>   futextest (no overcommit)
>   kernel   kops/s
>   ---
>   PV ticketlock11523
>   PV qspinlock 12328
>   unfairlock  9478
> 
>   futextest (200% overcommit)
>   kernel   kops/s
>   ---
>   PV ticketlock 7276
>   PV qspinlock  7095
>   unfairlock  5614
> 
> The ebizzy and futextest have much higher spinlock contention than
> the AIM7 disk workload. In this case, the unfairlock performs worse
> than both the PV ticketlock and qspinlock. The performance of the 2
> PV locks are comparable.

While I can see that the PV lock stuff performs 13% better for the
ebizzy no overcommit case, what about the very interresting numbers
for the same test with 200% overcommit?

The regular lock has a slightly better performance, but significantly
less sys/usr time. How do you explain that?

'Lies, damned lies and statistics' comes to my mind.

Thanks,

tglx
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 08/50] virtio: memory access APIs

2014-12-02 Thread Prabhakar Lad
Hi Michael,

Thanks for the patch.

On Mon, Dec 1, 2014 at 4:03 PM, Michael S. Tsirkin  wrote:
> virtio 1.0 makes all memory structures LE, so
[Snip]
> +/*
> + * Low-level memory accessors for handling virtio in modern little endian 
> and in
> + * compatibility native endian format.
> + */
> +
> +static inline u16 __virtio16_to_cpu(bool little_endian, __virtio16 val)
> +{
> +   if (little_endian)
> +   return le16_to_cpu((__force __le16)val);
> +   else
> +   return (__force u16)val;
> +}
> +
> +static inline __virtio16 __cpu_to_virtio16(bool little_endian, u16 val)
> +{
> +   if (little_endian)
> +   return (__force __virtio16)cpu_to_le16(val);
> +   else
> +   return (__force __virtio16)val;
> +}
> +
> +static inline u32 __virtio32_to_cpu(bool little_endian, __virtio32 val)
> +{
> +   if (little_endian)
> +   return le32_to_cpu((__force __le32)val);
> +   else
> +   return (__force u32)val;
> +}
> +
> +static inline __virtio32 __cpu_to_virtio32(bool little_endian, u32 val)
> +{
> +   if (little_endian)
> +   return (__force __virtio32)cpu_to_le32(val);
> +   else
> +   return (__force __virtio32)val;
> +}
> +
> +static inline u64 __virtio64_to_cpu(bool little_endian, __virtio64 val)
> +{
> +   if (little_endian)
> +   return le64_to_cpu((__force __le64)val);
> +   else
> +   return (__force u64)val;
> +}
> +
> +static inline __virtio64 __cpu_to_virtio64(bool little_endian, u64 val)
> +{
> +   if (little_endian)
> +   return (__force __virtio64)cpu_to_le64(val);
> +   else
> +   return (__force __virtio64)val;
> +}
> +

Nitpicking, could remove the else for the all above functions and
align the return appropriately ?

Apart from that patch looks good.

Acked-by: Lad, Prabhakar 

Thanks,
--Prabhakar Lad
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH net-next] virtio-net: don't do header check for dodgy gso packets

2014-12-02 Thread Jason Wang
There's no need to do header check for virito-net since:

- Host set dodgy for all gso packets from guest and check the header.
- Host should prepare for all kinds of evil packets from guest, since
  malicious guest can send any kinds of packet.

So this patch sets NETIF_F_GSO_ROBUST for virtio-net to skip the check.

Cc: Rusty Russell 
Cc: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
---
 drivers/net/virtio_net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b0bc8ea..4cd242b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1760,6 +1760,8 @@ static int virtnet_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_ECN))
dev->hw_features |= NETIF_F_TSO_ECN;
 
+   dev->features |= NETIF_F_GSO_ROBUST;
+
if (gso)
dev->features |= dev->hw_features & NETIF_F_ALL_TSO;
/* (!csum && gso) case will be fixed by register_netdev() */
-- 
1.9.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization