On 05/11/2010 09:53 AM, Avi Kivity wrote:
On 05/11/2010 05:17 PM, Cam Macdonell wrote:
The master is the shared memory area. It's a completely separate entity
that is represented by the backing file (or shared memory server handing
out the fd to mmap). It can exist independently of any guest.
I think the master/peer idea would be necessary if we were sharing
guest memory (sharing guest A's memory with guest B). Then if the
master (guest A) dies, perhaps something needs to happen to preserve
the memory contents.
Definitely. But we aren't...
Then transparent live migration is impossible. IMHO, that's a
fundamental mistake that we will regret down the road.
But since we're sharing host memory, the
applications in the guests can race to determine the master by
grabbing a lock at offset 0 or by using lowest VM ID.
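A rough sketch of the "lock at offset 0" race, assuming the first 32-bit word of the shared region is reserved for the election and that 0 means "unclaimed". The name try_become_master, the NO_MASTER constant, and the layout are made up for illustration; nothing here is an existing ivshmem interface.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: word 0 of the shared region holds the ID of the
 * guest that won the election, or NO_MASTER if nobody has claimed it. */
#define NO_MASTER 0u

/* Try to claim mastership. Returns true only for the single caller
 * whose compare-and-swap replaces NO_MASTER with its own ID. */
static bool try_become_master(_Atomic uint32_t *lock_word, uint32_t vm_id)
{
    assert(vm_id != NO_MASTER);
    uint32_t expected = NO_MASTER;
    return atomic_compare_exchange_strong(lock_word, &expected, vm_id);
}
```

In a guest application, lock_word would point at offset 0 of the mapped BAR. That only works if 32-bit atomics are lock-free on the platform, which is an assumption here.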
Looking at it another way, it is the applications using shared memory
that may or may not need a master, the Qemu processes don't need the
concept of a master since the memory belongs to the host.
Exactly. Furthermore, even in a master/slave relationship, there will
be different masters for different sub-areas, it would be a pity to
expose all this in the hardware abstraction. This way we have an
external device, and PCI HBAs which connect to it - just like a
multi-tailed SCSI disk.
To support transparent live migration, it's necessary to do two things:
1) Preserve the memory contents of the PCI BAR after it is disconnected
from a shared memory segment
2) Synchronize any changes made to the PCI BAR with the shared memory
segment upon reconnect/initial connection.
N.B. savevm and loadvm constitute disconnect and reconnect events,
respectively.
Supporting (1) is easy since we just need to memcpy() the contents of
the shared memory segment to a temporary RAM area upon disconnect.
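As a minimal sketch of the disconnect step, assuming a made-up struct shm_bar that tracks what the BAR currently maps (this is not QEMU's actual device state, and shm_bar_disconnect is a hypothetical name):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device state, for illustration only. */
struct shm_bar {
    void  *mem;        /* memory the guest's BAR currently maps */
    size_t size;       /* size of the BAR in bytes */
    int    connected;  /* nonzero while backed by the shared segment */
};

/* Copy the shared segment into private RAM and point the BAR at the
 * copy. Returns 0 on success, -1 if the allocation fails. */
static int shm_bar_disconnect(struct shm_bar *bar)
{
    assert(bar->connected);
    void *priv = malloc(bar->size);
    if (!priv)
        return -1;
    memcpy(priv, bar->mem, bar->size);
    /* A real device would also unmap the shared segment here. */
    bar->mem = priv;
    bar->connected = 0;
    return 0;
}
```

After this call the guest keeps seeing the last contents of the segment, but later writes by other users of the segment no longer reach it.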
Supporting (2) is easy when the shared memory segment is viewed as owned
by the guest since it has the definitive copy of the data. IMHO, this
is what role=master means. However, if we want to support a model where
the guest does not have a definitive copy of the data, upon reconnect,
we need to throw away the guest's changes and make the guest see the
shared memory segment's current contents, as if the whole BAR had been
updated at once. This is what role=peer means.
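The difference between the two roles at reconnect time could be sketched like this. The enum, the function name shm_reconnect, and the idea of passing the private copy in as a plain buffer are all assumptions for illustration, not an existing interface:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum shm_role { ROLE_MASTER, ROLE_PEER };

/* Reconcile the guest's private copy with the shared segment on
 * reconnect. Returns the memory the BAR should map afterwards, which
 * in both roles is the shared segment. */
static void *shm_reconnect(enum shm_role role, void *shared,
                           const void *priv, size_t size)
{
    assert(shared && priv);
    if (role == ROLE_MASTER) {
        /* The guest's copy is definitive: push it into the segment. */
        memcpy(shared, priv, size);
    }
    /* ROLE_PEER: the guest's changes are dropped; the segment wins. */
    return shared;
}
```

For role=master the segment ends up with the guest's data; for role=peer the segment is left untouched and the guest's private changes are lost once the BAR is switched back to it.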
For role=peer, it's necessary to signal to the guest when it's not
connected. This means prior to savevm it's necessary to indicate to the
guest that it's been disconnected.
I think it's important that we build this mechanism in from the start
because, as I've stated in the past, I don't think role=peer is going to
be the dominant use-case. I actually don't think that shared memory
between guests is all that interesting compared to memory shared with an
external process on the host.
Regards,
Anthony Liguori