Re: g++ 4.3, troubles with C++ indexing idioms
For performance small arrays should be the same as individual members (I can see the annoying fact that initialization is a headache - this has annoyed me as well). For larger arrays (4 members), aliasing will make a difference possibly, making the array variant slower. Any union variant is expected to be slower for aliasing reasons (we do not do field-sensitive aliasing for unions). I wonder if the fabled union as VIEW_CONVERT_EXPR patch would help... Richi, do you know? Paolo
Re: g++ 4.3, troubles with C++ indexing idioms
On 7/25/07, Paolo Bonzini [EMAIL PROTECTED] wrote: For performance small arrays should be the same as individual members (I can see the annoying fact that initialization is a headache - this has annoyed me as well). For larger arrays (4 members), aliasing will make a difference possibly, making the array variant slower. Any union variant is expected to be slower for aliasing reasons (we do not do field-sensitive aliasing for unions). I wonder if the fabled union as VIEW_CONVERT_EXPR patch would help... Richi, do you know? I don't think so. As long as our memory optimizers are weak and only work well if they are faced with single SFT.xx VUSEs and VDEFs then using a union will be hit by create_variable_info_for not creating field variables for unions at all (and thus memory optimizers not be able to tell apart fields). Richard.
Re: g++ 4.3, troubles with C++ indexing idioms
On 7/21/07, tbp [EMAIL PROTECTED] wrote: On 7/19/07, Richard Guenther [EMAIL PROTECTED] wrote: Of course, if any then the array indexing variant is fixed. It would be nice to see a complete testcase with a pessimization, maybe you can file a bugreport about this? There's many issues for all alternatives and i'm not qualified to pinpoint them further. I've taken http://ompf.org/ray/sphereflake/ which is used as a benchmark already here http://www.suse.de/~gcctest/c++bench/raytracer/, because it's small, self contained and has such a basic 3 component class that's used all over. It doesn't use any kind of array access operator, but it's good enough to show the price one has to pay before even thinking of providing some. It has been adjusted to use floats and access members through accessors (to allow for a straighter comparison of all cases). variation 0 is the reference, a mere struct { float x,y,z; ...};, performs as good as the original, but wouldn't allow for any 'valid' indexing. variation 1 is struct { float f[3]; ... } variations 2,3,4,5 try to use some union # /usr/local/gcc-4.3-20070720/bin/g++ -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: ../configure --prefix=/usr/local/gcc-4.3-20070720 --enable-languages=c,c++ --enable-threads=posix --disable-checking --disable-nls --disable-shared --disable-win32-registry --with-system-zlib --disable-multilib --verbose --with-gcc=gcc-4.2 --with-gnu-ld --with-gnu-as --enable-checking=none --disable-bootstrap Thread model: posix gcc version 4.3.0 20070720 (experimental) # make bench [snip] sf.v0 real0m3.963s user0m3.812s sys 0m0.152s sf.v1 real0m3.972s user0m3.864s sys 0m0.104s sf.v2 real0m10.384s user0m10.261s sys 0m0.120s sf.v3 real0m10.390s user0m10.289s sys 0m0.104s sf.v4 real0m10.388s user0m10.265s sys 0m0.124s sf.v5 real0m10.399s user0m10.281s sys 0m0.116s There's some inlining difference between union variations and the first two, but they clearly stand in their own league anyway. So we can only seriously consider the first two. Variation #0 would ask for invalid c++ (pointer arithmetic abuse, not an option anymore) or forbidding array access operator and going to set/get + memcpy, but pretty optimal. Variation #1 (straight array) is quite annoying in C++ (no initializer list, need to reformulate all access etc...) and already show some slight pessimization, but it's not easy to track. Apparently g++ got a bit better lately in this regard, or it's only blatant on larger data or more complex cases. I hope this shows how problematic it is for the end user. For performance small arrays should be the same as individual members (I can see the annoying fact that initialization is a headache - this has annoyed me as well). For larger arrays (4 members), aliasing will make a difference possibly, making the array variant slower. Any union variant is expected to be slower for aliasing reasons (we do not do field-sensitive aliasing for unions). In the end I would still recommend to go with array variants. Richard.
Re: g++ 4.3, troubles with C++ indexing idioms
On 7/24/07, Richard Guenther [EMAIL PROTECTED] wrote: For performance small arrays should be the same as individual members (I can see the annoying fact that initialization is a headache - this has annoyed me as well). For larger arrays (4 members), aliasing will make a difference possibly, making the array variant slower. Any union variant is expected to be slower for aliasing reasons (we do not do field-sensitive aliasing for unions). Confirmed :) And thanks for the clue about the threshold. In the end I would still recommend to go with array variants. I guess wishful thinking, or heresy, got me asking for a sanctioned address-this-as-an-array idiom.; now i'll go with the flow and use those 2nd class citizens of C++ aka array, even if i'm a bit sceptical about the performance equivalence (granted it isn't as obvious as it used to be, i need to investigate some more). But for sure it can't be as terrible as unions...
Re: g++ 4.3, troubles with C++ indexing idioms
On 7/19/07, Richard Guenther [EMAIL PROTECTED] wrote: Of course, if any then the array indexing variant is fixed. It would be nice to see a complete testcase with a pessimization, maybe you can file a bugreport about this? There's many issues for all alternatives and i'm not qualified to pinpoint them further. I've taken http://ompf.org/ray/sphereflake/ which is used as a benchmark already here http://www.suse.de/~gcctest/c++bench/raytracer/, because it's small, self contained and has such a basic 3 component class that's used all over. It doesn't use any kind of array access operator, but it's good enough to show the price one has to pay before even thinking of providing some. It has been adjusted to use floats and access members through accessors (to allow for a straighter comparison of all cases). variation 0 is the reference, a mere struct { float x,y,z; ...};, performs as good as the original, but wouldn't allow for any 'valid' indexing. variation 1 is struct { float f[3]; ... } variations 2,3,4,5 try to use some union # /usr/local/gcc-4.3-20070720/bin/g++ -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: ../configure --prefix=/usr/local/gcc-4.3-20070720 --enable-languages=c,c++ --enable-threads=posix --disable-checking --disable-nls --disable-shared --disable-win32-registry --with-system-zlib --disable-multilib --verbose --with-gcc=gcc-4.2 --with-gnu-ld --with-gnu-as --enable-checking=none --disable-bootstrap Thread model: posix gcc version 4.3.0 20070720 (experimental) # make bench [snip] sf.v0 real0m3.963s user0m3.812s sys 0m0.152s sf.v1 real0m3.972s user0m3.864s sys 0m0.104s sf.v2 real0m10.384s user0m10.261s sys 0m0.120s sf.v3 real0m10.390s user0m10.289s sys 0m0.104s sf.v4 real0m10.388s user0m10.265s sys 0m0.124s sf.v5 real0m10.399s user0m10.281s sys 0m0.116s There's some inlining difference between union variations and the first two, but they clearly stand in their own league anyway. So we can only seriously consider the first two. Variation #0 would ask for invalid c++ (pointer arithmetic abuse, not an option anymore) or forbidding array access operator and going to set/get + memcpy, but pretty optimal. Variation #1 (straight array) is quite annoying in C++ (no initializer list, need to reformulate all access etc...) and already show some slight pessimization, but it's not easy to track. Apparently g++ got a bit better lately in this regard, or it's only blatant on larger data or more complex cases. I hope this shows how problematic it is for the end user. // sphere flake bvh raytracer (c) 2005, thierry berger-perrin [EMAIL PROTECTED] // this code is released under the GNU Public License. // see http://ompf.org/ray/sphereflake/ // compile with ie g++ -O2 -ffast-math sphereflake.cc // usage: ./sphereflake [lvl=6] pix.ppm #include cmath #include iostream #include cstdlib #include limits #define GIMME_SHADOWS enum { childs = 9, ss= 2, ss_sqr = ss*ss }; /* not really tweakable anymore */ static const float infinity = std::numeric_limitsfloat::infinity(), epsilon = 1e-4f; #if VARIATION == 5 union v_t { // straight union; array left unharmed; just as horrible as the others. struct { float _x, _y, _z; }; float f[3]; v_t(const float a, const float b, const float c) : _x(a), _y(b), _z(c) {} float x() const { return _x; } float x() { return _x; } float y() const { return _y; } float y() { return _y; } float z() const { return _z; } float z() { return _z; } #else struct v_t { #endif #if VARIATION == 0 // best of the breed, but doesn't give way for an 'array access' operator. float _x, _y, _z; v_t(const float a, const float b, const float c) : _x(a), _y(b), _z(c) {} float x() const { return _x; } float x() { return _x; } float y() const { return _y; } float y() { return _y; } float z() const { return _z; } float z() { return _z; } #elif VARIATION == 1 // not as good, obvious 'array access' but forbids initializer lists float f[3]; v_t(const float a, const float b, const float c) { f[0] = a; f[1] = b; f[2] = c; } float x() const { return f[0]; } float x() { return f[0]; } float y() const { return f[1]; } float y() { return f[1]; } float z() const { return f[2]; } float z() { return f[2]; } #elif VARIATION == 2 // Richard Guenther's suggestion, worst of the worst. union { struct { float x, y, z; } a; float b[3]; } u; v_t(const float i, const float j, const float k) { u.a.x = i; u.a.y = j; u.a.z = k; } float x() const { return u.a.x; } float x() { return u.a.x; } float y() const { return u.a.y; } float y() { return u.a.y; } float z() const { return u.a.z; } float z() { return u.a.z; } #elif VARIATION == 3 // slightly better than variation #2, but still terrible. union { struct { float _x, _y, _z; }; float f[3]; }; v_t(const float a, const float b, const float c) : _x(a), _y(b), _z(c)
g++ 4.3, troubles with C++ indexing idioms
I have that usual heavy duty 3 fp components class that needs to be reasonably efficient and takes this form for g++ struct vec_t { float x,y,z; const float operator()(const uint_t i) const { return *(x + i); } float operator()(const uint_t i) { return *(x + i); } // -- guilty [snip ctors, operators related cruft] }; I use this notation because g++ does silly things with straight arrays (and C++ gets in the way), doesn't like union vec_t { struct { float x,y,z; }; float f[3]; const float operator()(const uint_t i) const { return m[i]; } float operator()(const uint_t i) { return m[i]; } }; much either, and seems to enjoy the first form (+ ctors with initializer lists) much. So far, so good. Alas, somewhere between gcc-4.3-20070608 (ok) and gcc-4.3-20070707 (not ok ever since), the non const indexing started to trigger bogus codegen with some skipped stores on x86-64, but of course only in convoluted situations. So, i can't produce a simple testcase. I can kludge around either by: . marking it __attribute__((noinline)) . turning it into a set operation doing a std::memcpy(x + i, f, sizeof(float)) . annoying the optimizer with the entertaining union vec_t { struct { float x,y,z; }; float f[3]; const float operator()(const uint_t i) const { return *(x + i); } float operator()(const uint_t i) { return *(x + i); } }; At this point i'd need some guidance from compiler developers because the compiler itself provides none (no warning whatsoever in any of those variations) and what i thought was acceptable apparently isn't anymore. What kind of idiom am i supposed to write such thing in to get back efficient and correct code?
Re: g++ 4.3, troubles with C++ indexing idioms
On 7/19/07, tbp [EMAIL PROTECTED] wrote: I have that usual heavy duty 3 fp components class that needs to be reasonably efficient and takes this form for g++ struct vec_t { float x,y,z; const float operator()(const uint_t i) const { return *(x + i); } float operator()(const uint_t i) { return *(x + i); } // -- guilty [snip ctors, operators related cruft] }; I use this notation because g++ does silly things with straight arrays (and C++ gets in the way), doesn't like union vec_t { struct { float x,y,z; }; float f[3]; const float operator()(const uint_t i) const { return m[i]; } float operator()(const uint_t i) { return m[i]; } }; much either, and seems to enjoy the first form (+ ctors with initializer lists) much. So far, so good. Alas, somewhere between gcc-4.3-20070608 (ok) and gcc-4.3-20070707 (not ok ever since), the non const indexing started to trigger bogus codegen with some skipped stores on x86-64, but of course only in convoluted situations. So, i can't produce a simple testcase. I can kludge around either by: . marking it __attribute__((noinline)) . turning it into a set operation doing a std::memcpy(x + i, f, sizeof(float)) . annoying the optimizer with the entertaining union vec_t { struct { float x,y,z; }; float f[3]; const float operator()(const uint_t i) const { return *(x + i); } float operator()(const uint_t i) { return *(x + i); } }; At this point i'd need some guidance from compiler developers because the compiler itself provides none (no warning whatsoever in any of those variations) and what i thought was acceptable apparently isn't anymore. What kind of idiom am i supposed to write such thing in to get back efficient and correct code? Well, I always used the array variant, but you should be able to do struct vec { ... union { struct { float x, y, z; } a; float x[3] b; } u; } if you need to (why does the array form not work for you?) Richard.
Re: g++ 4.3, troubles with C++ indexing idioms
On 7/19/07, Richard Guenther [EMAIL PROTECTED] wrote: Well, I always used the array variant, but you should be able to do [snip] if you need to (why does the array form not work for you?) Because if you bench in some non trivial program, on x86/x86-64 at least, those variations (struct { float x,y,z; }, struct { float f[3]; } and some additional union layer) the last 2 consistently come out as slower. In the array case addressing seems to be the main issue (redundant scaling etc...); for the union variant, it's less clear but it seems it prohibits some copy/return value optimizations. Plus gcc apparently likes (well, used to) very much the *(x + i) idiom; all in all i had something to work with. Now i'm seeing *some* stores indexed in this way vanish, array addressing is still as bad as it was, unions still get me some pessimization and using the memcpy idiom asks me to give up on the idea of an array acces operator altogether. So i'm asking, which is going to be fixed in the foreseeable future.
Re: g++ 4.3, troubles with C++ indexing idioms
On Thu, 19 Jul 2007, tbp wrote: On 7/19/07, Richard Guenther [EMAIL PROTECTED] wrote: Well, I always used the array variant, but you should be able to do [snip] if you need to (why does the array form not work for you?) Because if you bench in some non trivial program, on x86/x86-64 at least, those variations (struct { float x,y,z; }, struct { float f[3]; } and some additional union layer) the last 2 consistently come out as slower. In the array case addressing seems to be the main issue (redundant scaling etc...); for the union variant, it's less clear but it seems it prohibits some copy/return value optimizations. Plus gcc apparently likes (well, used to) very much the *(x + i) idiom; all in all i had something to work with. It's worth noting that C++ does not guarantee that (y == x + 1) in your example. The compiler is allowed to insert in arbitrary amount of padding between the fields of a structure. If it works, you got lucky. Now i'm seeing *some* stores indexed in this way vanish, array addressing is still as bad as it was, unions still get me some pessimization and using the memcpy idiom asks me to give up on the idea of an array acces operator altogether. So i'm asking, which is going to be fixed in the foreseeable future.
RE: g++ 4.3, troubles with C++ indexing idioms
On 19 July 2007 14:08, tbp wrote: I have that usual heavy duty 3 fp components class that needs to be reasonably efficient and takes this form for g++ struct vec_t { float x,y,z; const float operator()(const uint_t i) const { return *(x + i); } float operator()(const uint_t i) { return *(x + i); } // -- guilty [snip ctors, operators related cruft] }; I use this notation That's not notation. That's illegal invalid code that invokes undefined behaviour. struct vec_t is not a POD type (since it has ctors) and you don't actually have any right to assume that x, y and z are arranged in consecutive memory locations. Alas, somewhere between gcc-4.3-20070608 (ok) and gcc-4.3-20070707 (not ok ever since), the non const indexing started to trigger bogus codegen with Bogus codegen is the inevitable result of bogus code. Garbage in, garbage out. BTW, the const indexing is completely undefined too. At this point i'd need some guidance from compiler developers No, you need to read a book on how to program in C++. What kind of idiom am i supposed to write such thing in to get back efficient and correct code? This is a question for gcc-help, not for the main gcc list. Legal valid code would be a good start. cheers, DaveK -- Can't think of a witty .sigline today
Re: g++ 4.3, troubles with C++ indexing idioms
On 7/19/07, tbp [EMAIL PROTECTED] wrote: On 7/19/07, Richard Guenther [EMAIL PROTECTED] wrote: Well, I always used the array variant, but you should be able to do [snip] if you need to (why does the array form not work for you?) Because if you bench in some non trivial program, on x86/x86-64 at least, those variations (struct { float x,y,z; }, struct { float f[3]; } and some additional union layer) the last 2 consistently come out as slower. In the array case addressing seems to be the main issue (redundant scaling etc...); for the union variant, it's less clear but it seems it prohibits some copy/return value optimizations. Plus gcc apparently likes (well, used to) very much the *(x + i) idiom; all in all i had something to work with. Now i'm seeing *some* stores indexed in this way vanish, array addressing is still as bad as it was, unions still get me some pessimization and using the memcpy idiom asks me to give up on the idea of an array acces operator altogether. So i'm asking, which is going to be fixed in the foreseeable future. Of course, if any then the array indexing variant is fixed. It would be nice to see a complete testcase with a pessimization, maybe you can file a bugreport about this? Thanks, Richard
Re: g++ 4.3, troubles with C++ indexing idioms
On 7/19/07, Richard Guenther [EMAIL PROTECTED] wrote: Of course, if any then the array indexing variant is fixed. It would be nice to see a complete testcase with a pessimization, maybe you can file a bugreport about this? By essence they're hard to trigger in small testcases (that's not where they matter anyway), and by my own previous experience large hairy bug reports get forgotten on the side of the road. But i'll see if can make up something convincing, provided i got the cause for the relative slowdown right.
Re: g++ 4.3, troubles with C++ indexing idioms
On 7/19/07, Dave Korn [EMAIL PROTECTED] wrote: Bogus codegen is the inevitable result of bogus code. Garbage in, garbage out. BTW, the const indexing is completely undefined too. That's the kind of answer i'd get from gcc-help and at that point i'd be none wiser because i already know that. I also know that up to gcc-4.3-20070608 it was provably giving correct results faster than any other variants. Being no language lawyer, that's the only metric i consider. It's no portability issue either because every compiler asks for a specific work around; which is quite sad considering how mundane that code is.
RE: g++ 4.3, troubles with C++ indexing idioms
On 19 July 2007 20:38, tbp wrote: On 7/19/07, Dave Korn wrote: Bogus codegen is the inevitable result of bogus code. Garbage in, garbage out. BTW, the const indexing is completely undefined too. That's the kind of answer i'd get from gcc-help and at that point i'd be none wiser because i already know that. I also know that up to gcc-4.3-20070608 it was provably giving correct results faster than any other variants. Being no language lawyer, that's the only metric i consider. It's no portability issue either because every compiler asks for a specific work around; which is quite sad considering how mundane that code is. The workaround is to just do it right in the first place: union the three single ints with the array of three. That way you won't mislead the aliasing and dataflow analysis. If you really analysed the generated code in great depth, you'd be able to classify all the differences between the codegen when you do it the right way and the codegen when you do it your original way into two categories: a) missed optimisations, and b) extra instructions that are actually needed for technical correctness in aliasing situations. Those in category a) you can file PRs and they'll be fixed if possible; those in category b) would actually be bad codegen if the compiler generated the same code for the array/union case as it did for your pointer-arithmetic-abuse case, although in some cases you might be able to use restricted pointers to get back to the kind of codegen you're looking for. cheers, DaveK -- Can't think of a witty .sigline today