Re: [PATCH -v3] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
On Thu, 30 Aug 2012 19:49:23 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Fri, Aug 31, 2012 at 01:09:56AM +0900, Takuya Yoshikawa wrote: On Thu, 30 Aug 2012 16:21:31 +0300 Michael S. Tsirkin m...@redhat.com wrote: +static u32 apic_read_reg(int reg_off, void *bitmap) +{ + return *((u32 *)(bitmap + reg_off)); +} + Contrast with apic_set_reg which gets apic, add fact that all callers invoke REG_POS and you will see this is a bad API. I played with some APIs but in the end it's probably better to just open-code this. I don't mind open-coding this. As a bonus, open-coding will avoid the need for cast above, which is good: casts make code more fragile. But I still don't understand why we can eliminate casting: u32 reg_val; reg_val = *((u32 *)(bitmap + REG_POS(vec))); if (reg_val) return __fls(reg_val) + vec; (I'm not sure compilers are allowed to push out the value and do multiple references for this code as explained in https://lwn.net/Articles/508991/ So you *were* talking about concurrency? Yes and no, please see below. And you expect to solve it somehow without barriers explicit or implicit? What I want to make clear is that the value we pass to __fls() is not zero, not any more, to avoid undefined behaviour. So as you showed below, if the value passed to __fls() is exactly from the register, which we did non-zero check, that's fine. Barriers are not related here. But as can be seen in the last part of the article above, that's may theoretically not be guranteed? Anyway, I'm now thinking that we do not care about such things here, and can just follow your advice, yes? ) If you mean u32 *reg; reg = bitmap + REG_POS(vec); if (*reg) return __fls(*reg) + vec; yes I'm still not confident if this is a good style. I rarely see code doing if (*p) __fls(*p); This looks like explicite multiple references: I'm not saying this will actually be compiled to do multiple references. Thanks, Takuya It's just weird. Both versions are exactly equivalent in C. Adding a temporary changes *nothing* so the best readability wins. And IMHO, a version that does not cast wins hands down. I did a small test just to give you an example: Thank you for the example. What you showed is what I wanted to mean by I'm not saying this will actually be compiled to ... Thanks, Takuya [mst@robin ~]$ cat a.c int foo(void *bitmap) { unsigned *reg; reg = bitmap + 4; if (*reg) return *reg + 1; return -1; } [mst@robin ~]$ cat b.c int foo(void *bitmap) { unsigned reg; reg = *((unsigned *)(bitmap + 4)); if (reg) return reg + 1; return -1; } [mst@robin ~]$ gcc -O2 -c a.c [mst@robin ~]$ gcc -O2 -c b.c [mst@robin ~]$ objdump -ld a.o a.o: file format elf32-i386 Disassembly of section .text: foo: foo(): 0: 8b 44 24 04 mov0x4(%esp),%eax 4: 8b 50 04mov0x4(%eax),%edx 7: b8 ff ff ff ff mov$0x,%eax c: 8d 4a 01lea0x1(%edx),%ecx f: 85 d2 test %edx,%edx 11: 0f 45 c1cmovne %ecx,%eax 14: c3 ret [mst@robin ~]$ objdump -ld b.o b.o: file format elf32-i386 Disassembly of section .text: foo: foo(): 0: 8b 44 24 04 mov0x4(%esp),%eax 4: 8b 50 04mov0x4(%eax),%edx 7: b8 ff ff ff ff mov$0x,%eax c: 8d 4a 01lea0x1(%edx),%ecx f: 85 d2 test %edx,%edx 11: 0f 45 c1cmovne %ecx,%eax 14: c3 ret -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -v3] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
On Wed, Sep 05, 2012 at 05:30:31PM +0900, Takuya Yoshikawa wrote: On Thu, 30 Aug 2012 19:49:23 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Fri, Aug 31, 2012 at 01:09:56AM +0900, Takuya Yoshikawa wrote: On Thu, 30 Aug 2012 16:21:31 +0300 Michael S. Tsirkin m...@redhat.com wrote: +static u32 apic_read_reg(int reg_off, void *bitmap) +{ + return *((u32 *)(bitmap + reg_off)); +} + Contrast with apic_set_reg which gets apic, add fact that all callers invoke REG_POS and you will see this is a bad API. I played with some APIs but in the end it's probably better to just open-code this. I don't mind open-coding this. As a bonus, open-coding will avoid the need for cast above, which is good: casts make code more fragile. But I still don't understand why we can eliminate casting: u32 reg_val; reg_val = *((u32 *)(bitmap + REG_POS(vec))); if (reg_val) return __fls(reg_val) + vec; (I'm not sure compilers are allowed to push out the value and do multiple references for this code as explained in https://lwn.net/Articles/508991/ So you *were* talking about concurrency? Yes and no, please see below. And you expect to solve it somehow without barriers explicit or implicit? What I want to make clear is that the value we pass to __fls() is not zero, not any more, to avoid undefined behaviour. So as you showed below, if the value passed to __fls() is exactly from the register, which we did non-zero check, that's fine. Barriers are not related here. But as can be seen in the last part of the article above, that's may theoretically not be guranteed? It's not guaranteed if another thread can modify the bitmap. Is this the case here? If yes we need at least ACCESS_ONCE. Anyway, I'm now thinking that we do not care about such things here, and can just follow your advice, yes? Unless you see an issue with it ... ) If you mean u32 *reg; reg = bitmap + REG_POS(vec); if (*reg) return __fls(*reg) + vec; yes I'm still not confident if this is a good style. I rarely see code doing if (*p) __fls(*p); This looks like explicite multiple references: I'm not saying this will actually be compiled to do multiple references. Thanks, Takuya It's just weird. Both versions are exactly equivalent in C. Adding a temporary changes *nothing* so the best readability wins. And IMHO, a version that does not cast wins hands down. I did a small test just to give you an example: Thank you for the example. What you showed is what I wanted to mean by I'm not saying this will actually be compiled to ... Thanks, Takuya [mst@robin ~]$ cat a.c int foo(void *bitmap) { unsigned *reg; reg = bitmap + 4; if (*reg) return *reg + 1; return -1; } [mst@robin ~]$ cat b.c int foo(void *bitmap) { unsigned reg; reg = *((unsigned *)(bitmap + 4)); if (reg) return reg + 1; return -1; } [mst@robin ~]$ gcc -O2 -c a.c [mst@robin ~]$ gcc -O2 -c b.c [mst@robin ~]$ objdump -ld a.o a.o: file format elf32-i386 Disassembly of section .text: foo: foo(): 0: 8b 44 24 04 mov0x4(%esp),%eax 4: 8b 50 04mov0x4(%eax),%edx 7: b8 ff ff ff ff mov$0x,%eax c: 8d 4a 01lea0x1(%edx),%ecx f: 85 d2 test %edx,%edx 11: 0f 45 c1cmovne %ecx,%eax 14: c3 ret [mst@robin ~]$ objdump -ld b.o b.o: file format elf32-i386 Disassembly of section .text: foo: foo(): 0: 8b 44 24 04 mov0x4(%esp),%eax 4: 8b 50 04mov0x4(%eax),%edx 7: b8 ff ff ff ff mov$0x,%eax c: 8d 4a 01lea0x1(%edx),%ecx f: 85 d2 test %edx,%edx 11: 0f 45 c1cmovne %ecx,%eax 14: c3 ret -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -v3] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
On Wed, 5 Sep 2012 12:26:49 +0300 Michael S. Tsirkin m...@redhat.com wrote: It's not guaranteed if another thread can modify the bitmap. Is this the case here? If yes we need at least ACCESS_ONCE. In this patch, using the wrapper function to read out a register value forces compilers not to do bad things. But I agree that it is not a good API. I would like to use fls() rather than ACCESS_ONCE if it's really needed. Anyway, I'm now thinking that we do not care about such things here, and can just follow your advice, yes? Unless you see an issue with it ... Although I read the code, I'm not sure. But this code is apparently not so critical for performance that we can simply use fls(). If I can remove likely() and use proper macros, that's enough for me. Thanks, Takuya -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -v3] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
On Wed, Sep 05, 2012 at 06:40:26PM +0900, Takuya Yoshikawa wrote: On Wed, 5 Sep 2012 12:26:49 +0300 Michael S. Tsirkin m...@redhat.com wrote: It's not guaranteed if another thread can modify the bitmap. Is this the case here? If yes we need at least ACCESS_ONCE. In this patch, using the wrapper function to read out a register value forces compilers not to do bad things. It's not robust. Compiler is free to optimize it out, and it frequently does that. But I agree that it is not a good API. I would like to use fls() rather than ACCESS_ONCE if it's really needed. Sure, makes sense. Anyway, I'm now thinking that we do not care about such things here, and can just follow your advice, yes? Unless you see an issue with it ... Although I read the code, I'm not sure. But this code is apparently not so critical for performance that we can simply use fls(). Yes. I guess if it becomes critical we'll need to add a cache anyway. If I can remove likely() and use proper macros, that's enough for me. Me too. Thanks, Takuya -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -v3] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
On Thu, Aug 30, 2012 at 09:30:19PM +0900, Takuya Yoshikawa wrote: find_highest_vector() and count_vectors(): - Instead of using magic values, define and use proper interfaces to access registers. find_highest_vector(): - Remove likely() which is there only for historical reasons and not doing correct branch predictions anymore. Using such heuristics to optimize this function is not worth it now. Let CPUs predict things instead. - Stop checking word[0] separately. This was only needed for doing likely() optimization. - Use __fls(), not fls(), since we have checked the value passed to it is not zero. - Use for loop, not while, to iterate over the register array to make the code clearer. Note that we actually confirmed that the likely() did wrong predictions by inserting debug code. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Cc: Michael S. Tsirkin m...@redhat.com --- arch/x86/kvm/lapic.c | 35 +++ 1 files changed, 23 insertions(+), 12 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 18d149d..e44b8ab 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -66,6 +66,7 @@ #define APIC_DEST_NOSHORT0x0 #define APIC_DEST_MASK 0x800 #define MAX_APIC_VECTOR 256 +#define APIC_VECTORS_PER_REG 32 #define VEC_POS(v) ((v) (32 - 1)) #define REG_POS(v) (((v) 5) 4) @@ -78,6 +79,11 @@ static inline void apic_set_reg(struct kvm_lapic *apic, int reg_off, u32 val) *((u32 *) (apic-regs + reg_off)) = val; } +static u32 apic_read_reg(int reg_off, void *bitmap) +{ + return *((u32 *)(bitmap + reg_off)); +} + Contrast with apic_set_reg which gets apic, add fact that all callers invoke REG_POS and you will see this is a bad API. I played with some APIs but in the end it's probably better to just open-code this. As a bonus, open-coding will avoid the need for cast above, which is good: casts make code more fragile. static inline int apic_test_and_set_vector(int vec, void *bitmap) { return test_and_set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec)); @@ -208,25 +214,30 @@ static unsigned int apic_lvt_mask[APIC_LVT_NUM] = { static int find_highest_vector(void *bitmap) { - u32 *word = bitmap; - int word_offset = MAX_APIC_VECTOR 5; + int vec; + u32 reg; - while ((word_offset != 0) (word[(--word_offset) 2] == 0)) - continue; + for (vec = MAX_APIC_VECTOR - APIC_VECTORS_PER_REG; + vec = 0; vec -= APIC_VECTORS_PER_REG) { + reg = apic_read_reg(REG_POS(vec), bitmap); + if (reg) + return __fls(reg) + vec; + } - if (likely(!word_offset !word[0])) - return -1; - else - return fls(word[word_offset 2]) - 1 + (word_offset 5); + return -1; } static u8 count_vectors(void *bitmap) { - u32 *word = bitmap; - int word_offset; + int vec; + u32 reg; u8 count = 0; - for (word_offset = 0; word_offset MAX_APIC_VECTOR 5; ++word_offset) - count += hweight32(word[word_offset 2]); + + for (vec = 0; vec MAX_APIC_VECTOR; vec += APIC_VECTORS_PER_REG) { + reg = apic_read_reg(REG_POS(vec), bitmap); + count += hweight32(reg); + } + return count; } -- 1.7.5.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -v3] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
On Thu, 30 Aug 2012 16:21:31 +0300 Michael S. Tsirkin m...@redhat.com wrote: +static u32 apic_read_reg(int reg_off, void *bitmap) +{ + return *((u32 *)(bitmap + reg_off)); +} + Contrast with apic_set_reg which gets apic, add fact that all callers invoke REG_POS and you will see this is a bad API. I played with some APIs but in the end it's probably better to just open-code this. I don't mind open-coding this. As a bonus, open-coding will avoid the need for cast above, which is good: casts make code more fragile. But I still don't understand why we can eliminate casting: u32 reg_val; reg_val = *((u32 *)(bitmap + REG_POS(vec))); if (reg_val) return __fls(reg_val) + vec; (I'm not sure compilers are allowed to push out the value and do multiple references for this code as explained in https://lwn.net/Articles/508991/ ) If you mean u32 *reg; reg = bitmap + REG_POS(vec); if (*reg) return __fls(*reg) + vec; I'm still not confident if this is a good style. I rarely see code doing if (*p) __fls(*p); This looks like explicite multiple references: I'm not saying this will actually be compiled to do multiple references. Thanks, Takuya -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -v3] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
On Fri, Aug 31, 2012 at 01:09:56AM +0900, Takuya Yoshikawa wrote: On Thu, 30 Aug 2012 16:21:31 +0300 Michael S. Tsirkin m...@redhat.com wrote: +static u32 apic_read_reg(int reg_off, void *bitmap) +{ + return *((u32 *)(bitmap + reg_off)); +} + Contrast with apic_set_reg which gets apic, add fact that all callers invoke REG_POS and you will see this is a bad API. I played with some APIs but in the end it's probably better to just open-code this. I don't mind open-coding this. As a bonus, open-coding will avoid the need for cast above, which is good: casts make code more fragile. But I still don't understand why we can eliminate casting: u32 reg_val; reg_val = *((u32 *)(bitmap + REG_POS(vec))); if (reg_val) return __fls(reg_val) + vec; (I'm not sure compilers are allowed to push out the value and do multiple references for this code as explained in https://lwn.net/Articles/508991/ So you *were* talking about concurrency? And you expect to solve it somehow without barriers explicit or implicit? ) If you mean u32 *reg; reg = bitmap + REG_POS(vec); if (*reg) return __fls(*reg) + vec; yes I'm still not confident if this is a good style. I rarely see code doing if (*p) __fls(*p); This looks like explicite multiple references: I'm not saying this will actually be compiled to do multiple references. Thanks, Takuya It's just weird. Both versions are exactly equivalent in C. Adding a temporary changes *nothing* so the best readability wins. And IMHO, a version that does not cast wins hands down. I did a small test just to give you an example: [mst@robin ~]$ cat a.c int foo(void *bitmap) { unsigned *reg; reg = bitmap + 4; if (*reg) return *reg + 1; return -1; } [mst@robin ~]$ cat b.c int foo(void *bitmap) { unsigned reg; reg = *((unsigned *)(bitmap + 4)); if (reg) return reg + 1; return -1; } [mst@robin ~]$ gcc -O2 -c a.c [mst@robin ~]$ gcc -O2 -c b.c [mst@robin ~]$ objdump -ld a.o a.o: file format elf32-i386 Disassembly of section .text: foo: foo(): 0: 8b 44 24 04 mov0x4(%esp),%eax 4: 8b 50 04mov0x4(%eax),%edx 7: b8 ff ff ff ff mov$0x,%eax c: 8d 4a 01lea0x1(%edx),%ecx f: 85 d2 test %edx,%edx 11: 0f 45 c1cmovne %ecx,%eax 14: c3 ret [mst@robin ~]$ objdump -ld b.o b.o: file format elf32-i386 Disassembly of section .text: foo: foo(): 0: 8b 44 24 04 mov0x4(%esp),%eax 4: 8b 50 04mov0x4(%eax),%edx 7: b8 ff ff ff ff mov$0x,%eax c: 8d 4a 01lea0x1(%edx),%ecx f: 85 d2 test %edx,%edx 11: 0f 45 c1cmovne %ecx,%eax 14: c3 ret -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html