On Wed, May 8, 2013 at 8:47 PM, Konrad Rzeszutek Wilk <konrad.w...@oracle.com> wrote: > On Wed, May 08, 2013 at 11:07:33AM -0700, H. Peter Anvin wrote: >> On 05/08/2013 10:20 AM, Konrad Rzeszutek Wilk wrote: >> > >> > If I am reading you right the #1 issue is that you don't know whether >> > a certain paravirt instruction has any side-effects and as such you don't >> > feel that you can treat it like an equivalent instruction that is defined >> > in the Intel SDM? >> > >> > And that means that any development work you have in the pipeline is >> > affected because you don't have the documentation on hand and are unsure >> > whether you will break something? >> > >> >> That is, indeed, the #1 issue (and you and I have discussed it at >> length, obviously.) >
First of thank you for enumerating the issues and clarifying them. My understanding was (until this email conversation) was that the lack of the hypercall docs was hindering you. However you are not interested in that - you are after the documentation for the paravirt interface. Specifically how it differs from what baremetal would do. That changes my priorities! In that case lguest provides an excellent source of documentation on how the paravirt interfaces should work. It is almost like reading a book. Perhaps moving them to paravirt.h will help. There are of course some differences between the platforms that use it - but that would help identify them. >> There are a few other issues: >> >> 2. some of the paravirt_ops are plain wrong. Most of the really big >> problems are in the MMU-related ones, but as an easily-explained example >> that I ran into the other day: >> >> read_cr4_safe() assumes that there is no useful distinction between "cr4 >> is zero" and "cr4 doesn't exist". Unfortunately, this is an invalid >> assumption. It would be a five-minute fix in the normal case, but since >> it is paravirtualized, fixing it involves grokking the semantics of each >> PV layer, including any of the hypercalls that may be involved. >> >> In this particular example I think the answer is actually reasonable >> simple, because I don't think any of the hypervisors support running on >> pre-cr4 hardware (basically 486 at this point.) > >From my understanding the paravirt interface was a shim for three platforms (VMware, Xen and lguest). And the API was choosen to be broad to encompass everything. That was then and nowadays Xen and lguest are the top users. And there are things that don't make sense anymore (like the example above). Your specific example is interesting as a quick 10 second lookup confirmed that the implementation for different paravirt platforms all point to native_read_cr4(). There are no PV hypercalls involved. Axe! We had a public discussion on this and I think determined that the paravit_ops can benefit from a diet. Certainly the [read|write]_msr_safe that was introduced by Borislav Petkov per your suggestion was a good candidate. So was the store_gdt. I think the read_tsc and this one (read_cr4) can be slashed too. Not sure about the idt ones but I hadn't done a careful examination of that. I hope it is clear that there is no resistance from me in this diet-program. The issue I face to help you are the same you do: priority - those merge windows, darn bugs, regressions, etc. And until this conversation I was under the impression the documentation for hypercalls was #1 - but that is more of a #3 (#2 being the diet program, #1 being the paravirt documentation). >> >> 3. "Let's add another hook" becomes a far too easy solution to new problems. > I would hope that with your help in brainstorming these problems one can figure out that right way of doing it. Perhaps instead of posting the patches first, one can solicit you first for help to design it properly from the start? > >> >> 4. Performance and maintainability impact of having to support multiple >> code flows with different semantics. The semantics of the Xen MMU, in >> particular, is actually quite different from the x86 MMU. > You along with Thomas had designed a nice way of not only making this code nicely but also cleaning up the code. And I hope we all learned more about the Xen RO/pinning/pagetables protection mechanism. The semantics are understood now. In terms of development however, I recall that you as a excellent maintainer was able to delegate the work to an Oracle employee and everybody pitched in to help review the code and test. We did not test a forced branch which had some extra patches and that bit us all in the merge window. The after effect is quite nice code and design. > > >> >> 5. Performance and maintainability impact of a maze of twisty little >> functions, all different. For example, in the case of some of the MSR >> functions, we actually end up telling the compiler to combine and break >> up the two 32-bit halves into a 64-bit register multiple times, because >> the functions don't actually match up. I still don't understand why we >> don't just patch out the rdmsr/wrmsr instructions, using those registers. My understanding is that there are two variants of paravirt patching: inline patching (for a selective bunch of them), and patch in a call. The inline patching is the more efficient one and the functions that were choosen for this were the performance sensitive. I think at the time this was designed the rdmsr/wrmsr instructions were not terribly fast. I think that some of these paravirt interfaces could be optimized now. One idea that I had been toying around and asked the ksplice team to prototype was the usage of the asm goto mechanism. It only did it for the pte_modify call but I never got to run the performance numbers on all the different architectures. In summary I think (and please correct me if I mis-understood you): - The paravirt interfaces documentation should be #1 priority, - Fixes, diet and optimization for paravirt infrastructure is needed. And since we don't want to hinder your development and I would need to reschedule work-items I have to ask - what are the development work that is hindering this? So I have an idea of the timelines (if this needs to be an NDA discussion, pls email me privately your manager's email so I can start the discussion on that). Lastly, a bit seperate topic, but close, I thought that the paravirt interface was under x86 maintainers ownership, but the MAINTAINERS file says otherwise! Jeremy, Rusty, are you OK with this. Rusty, would you have the time to review the patches or would it be easier if I was the sole person ? >From f622669a3b6e05d2962e3886ef8b46add18b1e6c Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk <konrad.w...@oracle.com> Date: Fri, 10 May 2013 10:41:22 -0400 Subject: [PATCH] MAINTAINERS: Become the maintainer of the paravirt interface. Somehow I thought the x86 team owned it. Time to step up and maintain this and put the paravirt interface on a diet! CC: Jeremy Fitzhardinge <jer...@goop.org> CC: Chris Wright <chr...@sous-sol.org> CC: Alok Kataria <akata...@vmware.com> CC: Rusty Russell <ru...@rustcorp.com.au> CC: virtualizat...@lists.linux-foundation.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com> --- MAINTAINERS | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 3d7782b..7e24105 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6044,10 +6044,8 @@ F: drivers/char/ppdev.c F: include/uapi/linux/ppdev.h PARAVIRT_OPS INTERFACE -M: Jeremy Fitzhardinge <jer...@goop.org> -M: Chris Wright <chr...@sous-sol.org> -M: Alok Kataria <akata...@vmware.com> M: Rusty Russell <ru...@rustcorp.com.au> +M: Konrad Rzeszutek Wilk <konrad.w...@oracle.com> L: virtualizat...@lists.linux-foundation.org S: Supported F: Documentation/ia64/paravirt_ops.txt -- 1.8.1.4 _______________________________________________ Lguest mailing list Lguest@lists.ozlabs.org https://lists.ozlabs.org/listinfo/lguest