On Wed, Feb 12, 2014 at 7:21 AM, Albert Cahalan <acaha...@gmail.com> wrote: > I had a problem that got solved in an ugly way. I think gcc ought > to provide a few ways to make a nicer solution. > > There was an array of structs roughly like so: > > struct{int w;float x;char y[4];short z[2];}foo[512][4]; > > The types within the struct are 4 bytes each; I don't actually > remember anything else and it doesn't matter except that they > are distinct. I think it was bitfields actually, neatly grouped > into groups of 32 bits. In other words, like 4 4-byte values > but with more-or-less incompatible types. > > Note that 4 of the structs neatly fill a 64-byte cache line. > An alignment attribute was used to ensure 64-byte alignment. > > The most common operation needed on this array is to compare > the first struct member of 4 of the structs against a given > value, looking to see if there is a match. SSE would be good. > This would then be followed by using the matching entry if > there is one, else picking one of the 4 to recycle and thus use. > > First bad solution: > > One could load up 4 SSE registers, shuffle things around... NO. > > Second bad solution: > > One could simply have 4 distinct arrays. This is bad because > there are different cache lines for w, x, y, and z. > > Third bad solution: > > The array can be viewed as "int foo[512][4][4]" instead, with > the struct forming the third array index. Note that the last two > array indexes are both 4, so you can kind of swap them around. > This groups 4 fields of each type together, allowing SSE. The > problem here is loss of type safety; one must use array indexes > instead of struct field names. Like so: foo[idx][WHERE_W_IS][i] > > Fourth bad solution: > > We lay things out as in the third solution, but we cast pointers > to effectively lay sparse structs over each other like shingles. > { > int w; > int pad_wx[3]; > float x; > int pad_xy[3]; > char y[4]; > int pad_yz[3]; > short z[2]; > } > Performance is hurt by the need for __may_alias__ and of course > the result is painful to look at. We went with this anyway, using > SSE intrinsics, and performance was great. Maintainability... not > so much. > > BTW, an array of 512 structs containing 4-entry arrays was not used > because we wanted to have a simple normal pointer to indicate the > item being operated on. We didn't want to need a pointer,index pair. > > Can something be done to help out here? The first thing that pops > into mind is the ability to tell gcc that the struct-to-struct > byte offset for array indexing is a user-specified value instead > of simply the struct size. > > It's possible we could have safely ignored the warning about aliasing. > I don't know. Perhaps that would give even better performance, but > the casting would still be very ugly. > > Solutions that that be defined away for non-gcc compilers are better.
Do the overlay but use an overlay of type char[<large enough>] and load from that. Should be more maintainable than using may_alias and also work with other compilers. Richard.