Alexander Graf wrote: > > On Feb 25, 2008, at 6:40 PM, Avi Kivity wrote: > >> Alexander Graf wrote: >>> >>> The ebx store was done because of PIC code, which does not allow ebx >>> to get clobbered. If we are not in PIC code, =r contains ebx as GPR >>> though, so the assumption that ebx needs to be restored was wrong >>> then. This new version only enables the store/restore code if i386 >>> and PIC code are used. There is no need to distinguish between >>> x86_64 and i386 for the other cases. >>> >>> So does this version work? >>> >> >> It probably will, but it seems fragile to depend on the details of >> PIC. I committed something more generic: >> >> #ifdef __x86_64__ >> asm volatile("cpuid" >> : "=a"(vec[0]), "=b"(vec[1]), >> "=c"(vec[2]), "=d"(vec[3]) >> : "0"(function) : "cc"); > > This code works fine for all targets, including i386. With PIC > enabled, gcc registers the ebx registers and complains about this, > thus errors out. This is the only special case I am aware of, so I > doubt we should treat any case different from the "normal" case but > the PIC one. > >> >> #else >> asm volatile("pusha \n\t" >> >> "cpuid \n\t" >> "mov %%eax, 0(%1) \n\t" >> "mov %%ebx, 4(%1) \n\t" >> "mov %%ecx, 8(%1) \n\t" >> "mov %%edx, 12(%1) \n\t" >> >> "popa" >> : "a"(function), "S"(vec) : "memory", "cc"); >> #endif > > Basically #ifdef __x86_64__ is even wrong, as the problem is not that > too many registers are being used, but that ebx is reserved and can't > be saved/restored automatically. > > Furthermore I believe that the less assembler is used, the better the > code looks. So for cases the snippet above is not required, why use > it? Overusing assembler is imho exactly the reason the previous code > broke. >
I agree with all of this, but I think this case is an exception. gcc doesn't behave well with many register constraints on i386 and the PIC case shows things are not straightforward. I want something I can forget about. > There's one more thing I'd like to add here. Gcc optimizes really well > when one lets it to. So for this exact case with -O2 used, there are > no memory accesses. The vector is simply stored in 4 registers and > thus no more movs are required. > Again I agree, but host_cpuid() is hardly an optimization target. you can add a usleep(10000) there with no noticable effect. btw the cpuid instruction execution time itself will likely overwhelm any instructions around it (since it is microcoded). -- Any sufficiently difficult bug is indistinguishable from a feature. ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel