Hello Avi,
Regarding the VMCSINFO patch: we really need this functionality in our
development work, and YOSHIDA Masanori ([email protected]), a developer from
Hitachi, has said that they need it too. Could you please tell us why the
patch is unacceptable? Do you dislike the whole idea of exporting VMCSINFO, or
only the way we implemented the patch? And do you have any suggestions on how
to proceed?
Below we explain why we need this patch and how we will use it in our
development.
We once ran into an abnormal situation: a host scheduler bug caused a guest
machine's vcpu to stop for a long time, which in turn stopped the guest's
heartbeat (the host itself was still running). We want an efficient way to
analyse such bugs when we hit similar situations, where a guest machine
misbehaves because of something on the host machine. These situations have
actually happened many times, particularly during development.
So here comes the requirement:
To find the root cause, we have to debug both the host and the guest sides.
For that we first need crash dumps of both the host machine and the guest
machine, and they must be taken at the same time, while the abnormal situation
persists. The only way to do this is to panic the host while the abnormal
guest is running on it, so that the guest's image is contained in the host's
crash dump.
Retrieving the guest's crash dump from the host's crash dump is therefore a
very important step toward our goal. Unfortunately, in the kvm implementation,
some of the guest's register values are stored in the vmcs, and the internal
layout of the vmcs is kept hidden by Intel. If we cannot retrieve these
registers from the vmcs, the guest crash dump we produce is incomplete, and
key information is missing when we analyse it.
So we wrote this patch to export the vmcs internals. With the patch applied,
we can write the register values stored in the vmcs into the guest's crash
dump, which is what we want.
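To make the intended use concrete, here is a minimal sketch of how a userland
tool might read a guest register out of a raw VMCS region found in the host
dump. It is only an illustration under stated assumptions: the struct
vmcs_field_offset record, the vmcs_read64() helper and the idea that the
exported information arrives as a table of (field encoding, byte offset) pairs
are hypothetical and do not describe the patch's actual format. Only the three
field encodings are the architectural values from the Intel SDM.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Architectural VMCS field encodings, as listed in the Intel SDM. */
#define GUEST_CR3 0x6802u
#define GUEST_RSP 0x681cu
#define GUEST_RIP 0x681eu

/* Hypothetical record: one exported (field encoding, byte offset) pair. */
struct vmcs_field_offset {
    uint32_t encoding;  /* VMCS field encoding, e.g. GUEST_RIP */
    uint32_t offset;    /* byte offset of the field inside the VMCS region */
};

/*
 * Read a 64-bit field from a raw copy of a VMCS region.
 * Returns 0 on success, or -1 if the field is not in the table or its
 * offset would run past the end of the region.
 */
static int vmcs_read64(const uint8_t *region, size_t region_size,
                       const struct vmcs_field_offset *table, size_t entries,
                       uint32_t encoding, uint64_t *value)
{
    for (size_t i = 0; i < entries; i++) {
        if (table[i].encoding != encoding)
            continue;
        if (region_size < sizeof(*value) ||
            table[i].offset > region_size - sizeof(*value))
            return -1;
        /*
         * memcpy avoids unaligned loads. This assumes the tool runs on a
         * little-endian machine, matching the byte order of the x86 dump.
         */
        memcpy(value, region + table[i].offset, sizeof(*value));
        return 0;
    }
    return -1;
}
```

A tool would call vmcs_read64() once per register it needs (for example
GUEST_RIP, GUEST_RSP and GUEST_CR3) and store the results in the register
notes of the guest's crash dump.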
If a bug is found in a customer's environment, we have two ways to avoid
affecting the other guest machines running on the same host. First, we can
reproduce the buggy situation in another environment and do the analysis
there. Second, we can migrate the other guest machines to other hosts.
After the abnormal situation is reproduced, we panic the host *manually*.
Then we can use userland tools to extract the guest machine's crash dump from
the host machine's crash dump, using the feature provided by this patch.
Finally, we can analyse the two dumps separately to find out which side causes
the problem.