Re: [PATCH 0/2] virtio_balloon: do not change memory amount visible via /proc/meminfo

2015-08-31 Thread Denis V. Lunev

On 08/20/2015 12:49 AM, Denis V. Lunev wrote:

There is a problem in this setup, though. The end user and the hosting provider
have signed an SLA in which some amount of memory is guaranteed for
the guest. The good thing is that this memory will be given to the guest
when the guest really needs it (e.g. on OOM in the guest, with the
VIRTIO_BALLOON_F_DEFLATE_ON_OOM configuration flag set). The bad thing
is that the end user does not know this.

By default the balloon reduces the amount of memory exposed to the end user
each time a page is stolen from the guest or returned back, by calling
adjust_managed_page_count, and thus /proc/meminfo shows a reduced amount
of memory.

Fortunately the solution is simple: we should just avoid calling
adjust_managed_page_count when VIRTIO_BALLOON_F_DEFLATE_ON_OOM is set.
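
To illustrate the idea (a minimal sketch, not the patch itself; the helper
name and the cut-down struct are stand-ins, while virtio_has_feature() and
adjust_managed_page_count() are the existing kernel interfaces):

#include <linux/mm.h>
#include <linux/virtio_config.h>
#include <linux/virtio_balloon.h>

/* cut-down stand-in for the driver's private state */
struct virtio_balloon {
        struct virtio_device *vdev;
        /* ... */
};

/*
 * Sketch: skip the /proc/meminfo accounting when the balloon can be
 * deflated on demand, so the guaranteed memory stays visible.
 */
static void balloon_adjust_managed(struct virtio_balloon *vb,
                                   struct page *page, long delta)
{
        if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
                adjust_managed_page_count(page, delta);
}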

Please note that neither the VMware balloon nor the Hyper-V balloon cares
about adjust_managed_page_count handling at all.

Signed-off-by: Denis V. Lunev 
CC: Michael S. Tsirkin 

ping

Michael, the issue is important for us. Can you please take a look?

Den


rfc: vhost user enhancements for vm2vm communication

2015-08-31 Thread Michael S. Tsirkin
Hello!
During the KVM forum, we discussed supporting virtio on top
of ivshmem. I have considered it, and came up with an alternative
that has several advantages over that - please see below.
Comments welcome.

-----------------------------------------

Existing solutions to userspace switching between VMs on the
same host are vhost-user and ivshmem.

vhost-user works by mapping memory of all VMs being bridged into the
switch memory space.

By comparison, ivshmem works by exposing a shared region of memory to all VMs.
VMs are required to use this region to store packets. The switch only
needs access to this region.

Another difference between vhost-user and ivshmem surfaces when polling
is used. With vhost-user, the switch is required to handle data movement
between VMs; if polling is used, this means that one host CPU needs to be
dedicated to this task.

This is easiest to understand when one of the VMs is
used with VF pass-through. This can be schematically shown below:

+-- VM1 ----------+            +-- VM2 -----------+
| virtio-pci      +-vhost-user-+ virtio-pci -- VF | -- VFIO -- IOMMU -- NIC
+-----------------+            +------------------+


With ivshmem in theory communication can happen directly, with two VMs
polling the shared memory region.


I won't spend time listing advantages of vhost-user over ivshmem.
Instead, having identified two advantages of ivshmem over vhost-user,
below is a proposal to extend vhost-user to gain the advantages
of ivshmem.


1. virtio in the guest can be extended to allow support
for IOMMUs. This provides the guest with full flexibility
over which memory is readable or writable by each device.
By setting up a virtio device for each other VM we need to
communicate with, the guest gets full control of its security, from
mapping all memory (like with current vhost-user) to only
mapping buffers used for networking (like ivshmem) to
transient mappings for the duration of data transfer only.
This also allows use of VFIO within guests, for improved
security.

vhost-user would need to be extended to send the
mappings programmed by the guest IOMMU.
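
For example, a mapping update from VM1's qemu to VM2's qemu could carry a
record like the one below. This is purely illustrative - no such message
exists in the vhost-user protocol today and the field names are made up:

#include <stdint.h>

/* Hypothetical payload for a "guest IOMMU mapping changed" message. */
struct vhost_user_iommu_map {
        uint64_t iova;   /* bus address the guest programmed into its IOMMU */
        uint64_t size;   /* length of the mapping in bytes */
        uint64_t offset; /* offset into an already-shared memory region */
        uint8_t  perm;   /* bit 0: read allowed, bit 1: write allowed */
};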

2. qemu can be extended to serve as a vhost-user client:
it receives the remote VM's mappings over the vhost-user protocol and
maps them into another VM's memory.
This mapping can take, for example, the form of
a BAR of a PCI device, which I'll call vhost-pci here,
with bus addresses allowed
by VM1's IOMMU mappings being translated into
offsets within this BAR in VM2's physical
memory space.

Since the translation can be a simple one, VM2
can perform it within its vhost-pci device driver.
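
A minimal sketch of that translation, assuming the forwarded mappings are
kept as a small table of windows (all names below are illustrative):

#include <stddef.h>
#include <stdint.h>

/* One forwarded IOMMU mapping: a VM1 bus-address range and where that
 * range shows up inside the vhost-pci BAR exposed to VM2. */
struct vhost_pci_window {
        uint64_t iova;        /* start of the VM1 bus-address range */
        uint64_t size;        /* length of the range */
        uint64_t bar_offset;  /* offset of the range within the BAR */
};

/* Translate a VM1 bus address into a BAR offset; returns 0 on success,
 * -1 if the address is not covered by any forwarded mapping. */
static int vhost_pci_translate(const struct vhost_pci_window *w, size_t n,
                               uint64_t bus_addr, uint64_t *bar_off)
{
        for (size_t i = 0; i < n; i++) {
                if (bus_addr >= w[i].iova &&
                    bus_addr - w[i].iova < w[i].size) {
                        *bar_off = w[i].bar_offset + (bus_addr - w[i].iova);
                        return 0;
                }
        }
        return -1;
}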

While this setup would be the most useful with polling,
VM1's ioeventfd can also be mapped to
VM2's irqfd, and vice versa, so that the VMs
can trigger interrupts in each other without the need
for a helper thread on the host.
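
For illustration only, the host-side wiring could use the existing
KVM_IOEVENTFD and KVM_IRQFD ioctls roughly as below; the doorbell address
and the GSI number are placeholders:

#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Sketch: one eventfd is registered as an ioeventfd on VM1's doorbell
 * address and as an irqfd on VM2, so a doorbell write in VM1 raises an
 * interrupt in VM2 without a helper thread.  Returns the eventfd, or -1. */
static int wire_doorbell(int vm1_fd, int vm2_fd,
                         uint64_t doorbell_gpa, uint32_t vm2_gsi)
{
        int efd = eventfd(0, EFD_NONBLOCK);
        struct kvm_ioeventfd ioev = {
                .addr = doorbell_gpa,  /* VM1 doorbell register (placeholder) */
                .len  = 4,
                .fd   = efd,
        };
        struct kvm_irqfd irq = {
                .fd  = efd,
                .gsi = vm2_gsi,        /* interrupt line in VM2 (placeholder) */
        };

        if (efd < 0 ||
            ioctl(vm1_fd, KVM_IOEVENTFD, &ioev) < 0 ||
            ioctl(vm2_fd, KVM_IRQFD, &irq) < 0)
                return -1;
        return efd;
}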


The resulting channel might look something like the following:

+-- VM1 --------------+  +-- VM2 ----------+
| virtio-pci -- iommu +--+ vhost-pci -- VF | -- VFIO -- IOMMU -- NIC
+---------------------+  +-----------------+

Comparing the two diagrams: a vhost-user thread on the host is
no longer required, reducing host CPU utilization when
polling is active.  At the same time, VM2 cannot access all of VM1's
memory - it is limited by the IOMMU configuration set up by VM1.


Advantages over ivshmem:

- more flexibility: endpoint VMs do not have to place data at any
  specific location to use the device; in practice this likely
  means fewer data copies.
- better standardization/code reuse:
  virtio changes within guests would be fairly easy to implement
  and would also benefit other backends besides vhost-user;
  standard hotplug interfaces can be used to add and remove these
  channels as VMs are added or removed.
- migration support:
  it's easy to implement since ownership of memory is well defined.
  For example, during migration VM2 can notify VM1's hypervisor
  by updating a dirty bitmap each time it writes into VM1's memory
  (a minimal sketch follows below).
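
As a rough illustration of that bookkeeping (an assumption, not a defined
interface), VM2's side could mark the touched pages with one bit per 4 KiB
page of VM1 guest memory:

#include <stdint.h>

/* Illustrative only: mark every 4 KiB page of VM1 guest memory touched
 * by a write of 'len' bytes (len > 0) at 'write_addr' as dirty. */
static void mark_vm1_dirty(unsigned long *bitmap, uint64_t vm1_base,
                           uint64_t write_addr, uint64_t len)
{
        const unsigned int bits = 8 * sizeof(unsigned long);
        uint64_t first = (write_addr - vm1_base) >> 12;
        uint64_t last  = (write_addr - vm1_base + len - 1) >> 12;

        for (uint64_t pfn = first; pfn <= last; pfn++)
                bitmap[pfn / bits] |= 1UL << (pfn % bits);
}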

Thanks,

-- 
MST


CFP: 8th Workshop on Many-Task Computing on Clouds, Grids, and Supercomputers (MTAGS) 2015 -- co-located with IEEE/ACM Supercomputing/SC 2015

2015-08-31 Thread Ioan Raicu

Call for Papers

---
The 8th Workshop on Many-Task Computing on Clouds, Grids, and 
Supercomputers (MTAGS) 2015

http://datasys.cs.iit.edu/events/MTAGS15/
---
November 15th, 2015
Austin, Texas, USA

Co-located with the IEEE/ACM International Conference for
High Performance Computing, Networking, Storage and Analysis (SC15)

===
The 8th workshop on Many-Task Computing on Clouds, Grids, and 
Supercomputers (MTAGS) will provide the scientific community a dedicated 
forum for presenting new research, development, and deployment efforts 
of large-scale many-task computing (MTC) applications on large scale 
clusters, Grids, Supercomputers, and Cloud Computing infrastructure. 
MTC, the theme of the workshop, encompasses loosely coupled applications, 
which are generally composed of many tasks (both independent and 
dependent tasks) to achieve some larger application goal.  This workshop 
will cover challenges that can hamper efficiency and utilization in 
running applications on large-scale systems, such as local resource 
manager scalability and granularity, efficient utilization of raw 
hardware, parallel file system contention and scalability, data 
management, I/O management, reliability at scale, and application 
scalability. We welcome paper submissions on all theoretical, 
simulation, and systems topics related to MTC, but we give special 
consideration to papers addressing petascale to exascale challenges. 
Papers will be peer-reviewed for novelty, scientific merit, and scope 
for the workshop. The workshop will be co-located with the IEEE/ACM 
Supercomputing 2015 Conference in Austin, Texas, on November 15th, 2015. 
For more information, please see http://datasys.cs.iit.edu/events/MTAGS15/.


For more information on past workshops, please see MTAGS14, MTAGS13, 
MTAGS12, MTAGS11, MTAGS10, MTAGS09, and MTAGS08. We also ran a Special 
Issue on Many-Task Computing in the IEEE Transactions on Parallel and 
Distributed Systems (TPDS) which appeared in June 2011; the proceedings 
can be found online at 
http://www.computer.org/portal/web/csdl/abs/trans/td/2011/06/ttd201106toc.htm. 
We are also currently assembling a new special issue in the IEEE 
Transaction on Cloud Computing on Many-Task Computing in the Cloud, see 
 http://datasys.cs.iit.edu/events/TCC-MTC15/index.html for more 
information. We, the workshop organizers, also published a highly 
relevant paper that defines Many-Task Computing, which appeared in 
MTAGS08, titled “Many-Task Computing for Grids and Supercomputers”; we 
encourage potential authors to read this paper and to clearly 
articulate in their submissions how their work relates to 
Many-Task Computing.



Topics
---
We invite the submission of original work that is related to the topics 
below. The papers should be 6 pages, including all figures and 
references. We aim to cover topics related to Many-Task Computing on each 
of the three major distributed-systems paradigms: Cloud Computing, Grid 
Computing, and Supercomputing. Topics of interest include:

* Compute Resource Management
  * Scheduling
  * Job execution frameworks
  * Local resource manager extensions
  * Performance evaluation of resource managers in use on large scale 
systems

  * Dynamic resource provisioning
  * Techniques to manage many-core resources and/or GPUs
  * Challenges and opportunities in running many-task workloads on HPC 
systems
  * Challenges and opportunities in running many-task workloads on 
Cloud Computing infrastructure

* Storage architectures and implementations
  * Distributed file systems
  * Parallel file systems
  * Distributed meta-data management
  * Content distribution systems for large data
  * Data caching frameworks and techniques
  * Data management within and across data centers
  * Data-aware scheduling
  * Data-intensive computing applications
  * Eventual-consistency storage usage and management
* Programming models and tools
  * Map-reduce and its generalizations
  * Many-task computing middleware and applications
  * Parallel programming frameworks
  * Ensemble MPI techniques and frameworks
  * Service-oriented science applications
* Large-Scale Workflow Systems
  * Workflow system performance and scalability analysis
  * Scalability of workflow systems
  * Workflow infrastructure and e-Science middleware
  * Programming Paradigms and Models
* Large-Scale Many-Task Applications
  * High-throughput computing (HTC) applications
  * Data-intensive applications
  * Quasi-supercomputing applications, deployments, and experiences
  * Performance Evaluation
* Performance evaluation
  * Real systems
  * Simulations
  * Reliability of large systems

Re: rfc: vhost user enhancements for vm2vm communication

2015-08-31 Thread Nakajima, Jun
On Mon, Aug 31, 2015 at 7:11 AM, Michael S. Tsirkin  wrote:
> Hello!
> During the KVM forum, we discussed supporting virtio on top
> of ivshmem. I have considered it, and came up with an alternative
> that has several advantages over that - please see below.
> Comments welcome.

Hi Michael,

I like this, and it should be able to achieve what I presented at KVM
Forum (vhost-user-shmem).
Comments below.

>
> -
>
> Existing solutions to userspace switching between VMs on the
> same host are vhost-user and ivshmem.
>
> vhost-user works by mapping memory of all VMs being bridged into the
> switch memory space.
>
> By comparison, ivshmem works by exposing a shared region of memory to all VMs.
> VMs are required to use this region to store packets. The switch only
> needs access to this region.
>
> Another difference between vhost-user and ivshmem surfaces when polling
> is used. With vhost-user, the switch is required to handle
> data movement between VMs, if using polling, this means that 1 host CPU
> needs to be sacrificed for this task.
>
> This is easiest to understand when one of the VMs is
> used with VF pass-through. This can be schematically shown below:
>
> +-- VM1 --++---VM2---+
> | virtio-pci  +-vhost-user-+ virtio-pci -- VF | -- VFIO -- IOMMU -- 
> NIC
> +-++-+
>
>
> With ivshmem in theory communication can happen directly, with two VMs
> polling the shared memory region.
>
>
> I won't spend time listing advantages of vhost-user over ivshmem.
> Instead, having identified two advantages of ivshmem over vhost-user,
> below is a proposal to extend vhost-user to gain the advantages
> of ivshmem.
>
>
> 1: virtio in guest can be extended to allow support
> for IOMMUs. This provides guest with full flexibility
> about memory which is readable or write able by each device.

I assume that you meant VFIO only for virtio by "use of VFIO".  To get
VFIO working for general direct-I/O (including VFs) in guests, as you
know, we need to virtualize IOMMU (e.g. VT-d) and the interrupt
remapping table on x86 (i.e. nested VT-d).

> By setting up a virtio device for each other VM we need to
> communicate to, guest gets full control of its security, from
> mapping all memory (like with current vhost-user) to only
> mapping buffers used for networking (like ivshmem) to
> transient mappings for the duration of data transfer only.

And I think that we can use VMFUNC to have such transient mappings.
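
As a rough guest-side sketch (assuming the hypervisor populates the EPTP
list; the eptp_index value is an assumption), the switch itself is tiny:

/* VMFUNC leaf 0 (EPTP switching): EAX = 0 selects the leaf, ECX selects
 * which EPT view - i.e. which transient mapping set - becomes active. */
static inline void eptp_switch(unsigned int eptp_index)
{
        asm volatile(".byte 0x0f, 0x01, 0xd4"   /* VMFUNC */
                     : : "a" (0), "c" (eptp_index) : "memory");
}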

> This also allows use of VFIO within guests, for improved
> security.
>
> vhost user would need to be extended to send the
> mappings programmed by guest IOMMU.

Right. We need to think about cases where other VMs (VM3, etc.) join
the group or some existing VM leaves.
PCI hot-plug should work there (as you point out at "Advantages over
ivshmem" below).

>
> 2. qemu can be extended to serve as a vhost-user client:
> remote VM mappings over the vhost-user protocol, and
> map them into another VM's memory.
> This mapping can take, for example, the form of
> a BAR of a pci device, which I'll call here vhost-pci -
> with bus address allowed
> by VM1's IOMMU mappings being translated into
> offsets within this BAR within VM2's physical
> memory space.

I think it's sensible.

>
> Since the translation can be a simple one, VM2
> can perform it within its vhost-pci device driver.
>
> While this setup would be the most useful with polling,
> VM1's ioeventfd can also be mapped to
> another VM2's irqfd, and vice versa, such that VMs
> can trigger interrupts to each other without need
> for a helper thread on the host.
>
>
> The resulting channel might look something like the following:
>
> +-- VM1 --+  +---VM2---+
> | virtio-pci -- iommu +--+ vhost-pci -- VF | -- VFIO -- IOMMU -- NIC
> +-+  +-+
>
> comparing the two diagrams, a vhost-user thread on the host is
> no longer required, reducing the host CPU utilization when
> polling is active.  At the same time, VM2 can not access all of VM1's
> memory - it is limited by the iommu configuration setup by VM1.
>
>
> Advantages over ivshmem:
>
> - more flexibility, endpoint VMs do not have to place data at any
>   specific locations to use the device, in practice this likely
>   means less data copies.
> - better standardization/code reuse
>   virtio changes within guests would be fairly easy to implement
>   and would also benefit other backends, besides vhost-user
>   standard hotplug interfaces can be used to add and remove these
>   channels as VMs are added or removed.
> - migration support
>   It's easy to implement since ownership of memory is well defined.
>   For example, during migration VM2 can notify hypervisor of VM1
>   by updating dirty bitmap each time is writes into VM1 memory.

Also, the ivshmem functionality could be implemented by this proposal:
- vswitch (or some VM) allocates memory regions in its address space, and
- it sets up

RE: [Qemu-devel] rfc: vhost user enhancements for vm2vm communication

2015-08-31 Thread Varun Sethi
Hi Michael,
When you talk about VFIO in guest, is it with a purely emulated IOMMU in Qemu?
Also, I am not clear on the following points:
1. How transient memory would be mapped using BAR in the backend VM
2. How would the backend VM update the dirty page bitmap for the frontend VM

Regards
Varun

> -Original Message-
> From: qemu-devel-bounces+varun.sethi=freescale@nongnu.org
> [mailto:qemu-devel-bounces+varun.sethi=freescale@nongnu.org] On
> Behalf Of Nakajima, Jun
> Sent: Monday, August 31, 2015 1:36 PM
> To: Michael S. Tsirkin
> Cc: virtio-...@lists.oasis-open.org; Jan Kiszka;
> claudio.font...@huawei.com; qemu-de...@nongnu.org; Linux
> Virtualization; opnfv-tech-disc...@lists.opnfv.org
> Subject: Re: [Qemu-devel] rfc: vhost user enhancements for vm2vm
> communication
> 
> On Mon, Aug 31, 2015 at 7:11 AM, Michael S. Tsirkin 
> wrote:
> > Hello!
> > During the KVM forum, we discussed supporting virtio on top of
> > ivshmem. I have considered it, and came up with an alternative that
> > has several advantages over that - please see below.
> > Comments welcome.
> 
> Hi Michael,
> 
> I like this, and it should be able to achieve what I presented at KVM Forum
> (vhost-user-shmem).
> Comments below.
> 
> >
> > -
> >
> > Existing solutions to userspace switching between VMs on the same host
> > are vhost-user and ivshmem.
> >
> > vhost-user works by mapping memory of all VMs being bridged into the
> > switch memory space.
> >
> > By comparison, ivshmem works by exposing a shared region of memory to
> all VMs.
> > VMs are required to use this region to store packets. The switch only
> > needs access to this region.
> >
> > Another difference between vhost-user and ivshmem surfaces when
> > polling is used. With vhost-user, the switch is required to handle
> > data movement between VMs, if using polling, this means that 1 host
> > CPU needs to be sacrificed for this task.
> >
> > This is easiest to understand when one of the VMs is used with VF
> > pass-through. This can be schematically shown below:
> >
> > +-- VM1 --++---VM2---+
> > | virtio-pci  +-vhost-user-+ virtio-pci -- VF | -- VFIO -- IOMMU -- 
> > NIC
> > +-++-+
> >
> >
> > With ivshmem in theory communication can happen directly, with two VMs
> > polling the shared memory region.
> >
> >
> > I won't spend time listing advantages of vhost-user over ivshmem.
> > Instead, having identified two advantages of ivshmem over vhost-user,
> > below is a proposal to extend vhost-user to gain the advantages of
> > ivshmem.
> >
> >
> > 1: virtio in guest can be extended to allow support for IOMMUs. This
> > provides guest with full flexibility about memory which is readable or
> > write able by each device.
> 
> I assume that you meant VFIO only for virtio by "use of VFIO".  To get VFIO
> working for general direct-I/O (including VFs) in guests, as you know, we
> need to virtualize IOMMU (e.g. VT-d) and the interrupt remapping table on
> x86 (i.e. nested VT-d).
> 
> > By setting up a virtio device for each other VM we need to communicate
> > to, guest gets full control of its security, from mapping all memory
> > (like with current vhost-user) to only mapping buffers used for
> > networking (like ivshmem) to transient mappings for the duration of
> > data transfer only.
> 
> And I think that we can use VMFUNC to have such transient mappings.
> 
> > This also allows use of VFIO within guests, for improved security.
> >
> > vhost user would need to be extended to send the mappings programmed
> > by guest IOMMU.
> 
> Right. We need to think about cases where other VMs (VM3, etc.) join the
> group or some existing VM leaves.
> PCI hot-plug should work there (as you point out at "Advantages over
> ivshmem" below).
> 
> >
> > 2. qemu can be extended to serve as a vhost-user client:
> > remote VM mappings over the vhost-user protocol, and map them into
> > another VM's memory.
> > This mapping can take, for example, the form of a BAR of a pci device,
> > which I'll call here vhost-pci - with bus address allowed by VM1's
> > IOMMU mappings being translated into offsets within this BAR within
> > VM2's physical memory space.
> 
> I think it's sensible.
> 
> >
> > Since the translation can be a simple one, VM2 can perform it within
> > its vhost-pci device driver.
> >
> > While this setup would be the most useful with polling, VM1's
> > ioeventfd can also be mapped to another VM2's irqfd, and vice versa,
> > such that VMs can trigger interrupts to each other without need for a
> > helper thread on the host.
> >
> >
> > The resulting channel might look something like the following:
> >
> > +-- VM1 --+  +---VM2---+
> > | virtio-pci -- iommu +--+ vhost-pci -- VF | -- VFIO -- IOMMU -- NIC
> > +-+  +-+
> >
> > comparing the two diagrams, a vhost-user thread on the host is no
> > longer required, reducing the host CPU u