On Sat, 2007-10-27 at 12:19 +0100, Thiemo Seufer wrote:
> J. Mayer wrote:
> > The latest patches in clo make gcc 3.4.6 fail to build the mips64
> > targets on my amd64 host (it looks like a register allocation clash
> > in the optimizer code).
>
> Your version is likely faster as well.
>
> > Furthermore, the clz micro-op for MIPS seems very suspect to me,
> > given the changes made in the clo implementation.
>
> It is correct, the sign-extension bits are zero in that case.
OK, you know better than me...

> > I did change the clz / clo implementation to use the same code as the
> > one used for the PowerPC implementation. It seems to me that the
> > result would be correct... And it compiles...
> >
> > Please take a look at the following patch:
>
> We have now clz/clo in several places, so I expanded your patch a
> bit. For now it is only used for the mips target. Comments?

I fully agree with the idea of sharing this code, if it's OK according to
all target specifications. Please commit and I'll update the PowerPC and
Alpha targets to use it.

Oh, I did an optimisation for clz64 used on 32-bit hosts, avoiding the
use of 64-bit logical operations:

static always_inline int clz64 (uint64_t val)
{
    int cnt = 0;

#if HOST_LONG_BITS == 64
    if (!(val & 0xFFFFFFFF00000000ULL)) {
        cnt += 32;
        val <<= 32;
    }
    if (!(val & 0xFFFF000000000000ULL)) {
        cnt += 16;
        val <<= 16;
    }
    if (!(val & 0xFF00000000000000ULL)) {
        cnt += 8;
        val <<= 8;
    }
    if (!(val & 0xF000000000000000ULL)) {
        cnt += 4;
        val <<= 4;
    }
    if (!(val & 0xC000000000000000ULL)) {
        cnt += 2;
        val <<= 2;
    }
    if (!(val & 0x8000000000000000ULL)) {
        cnt++;
        val <<= 1;
    }
    if (!(val & 0x8000000000000000ULL)) {
        cnt++;
    }
#else
    /* Make it easier on 32-bit host machines */
    if (!(val >> 32))
        cnt = _do_cntlzw(val) + 32;
    else
        cnt = _do_cntlzw(val >> 32);
#endif

    return cnt;
}

If gcc is really clever, this would not lead to better code, but the
32-bit path seemed to produce more optimized code on 32-bit hosts. Maybe
this implementation could also be used on 64-bit hosts, avoiding the
#ifdef.
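As a side note, the 32-bit branch above relies on a 32-bit
count-leading-zeros helper (_do_cntlzw is the name from my PowerPC
micro-ops). In case it helps review, here is a minimal, self-contained
sketch of what such a helper does, in the same shift-cascade style; the
clz32 name and plain "static inline" are just illustrative, not code
that exists in the tree:

```c
#include <stdint.h>

/* Illustrative stand-in for _do_cntlzw: count leading zeros in a
   32-bit value with a shift cascade, returning 32 for val == 0. */
static inline int clz32(uint32_t val)
{
    int cnt = 0;

    if (!(val & 0xFFFF0000UL)) {
        cnt += 16;
        val <<= 16;
    }
    if (!(val & 0xFF000000UL)) {
        cnt += 8;
        val <<= 8;
    }
    if (!(val & 0xF0000000UL)) {
        cnt += 4;
        val <<= 4;
    }
    if (!(val & 0xC0000000UL)) {
        cnt += 2;
        val <<= 2;
    }
    if (!(val & 0x80000000UL)) {
        cnt++;
        val <<= 1;
    }
    if (!(val & 0x80000000UL)) {
        cnt++;
    }

    return cnt;
}
```

With this shape, the 32-bit clz64 fallback only ever does one 32-bit
shift and one 32-bit count, which is exactly the point of the #else
branch.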
Count trailing zeros is also implemented on Alpha; it may be a good idea
to share that implementation as well, if needed:

static always_inline int ctz32 (uint32_t val)
{
    int cnt = 0;

    if (!(val & 0x0000FFFFUL)) {
        cnt += 16;
        val >>= 16;
    }
    if (!(val & 0x000000FFUL)) {
        cnt += 8;
        val >>= 8;
    }
    if (!(val & 0x0000000FUL)) {
        cnt += 4;
        val >>= 4;
    }
    if (!(val & 0x00000003UL)) {
        cnt += 2;
        val >>= 2;
    }
    if (!(val & 0x00000001UL)) {
        cnt++;
        val >>= 1;
    }
    if (!(val & 0x00000001UL)) {
        cnt++;
    }

    return cnt;
}

static always_inline int ctz64 (uint64_t val)
{
    int cnt = 0;

    if (!(val & 0x00000000FFFFFFFFULL)) {
        cnt += 32;
        val >>= 32;
    }
    /* Make it easier for 32-bit hosts */
    cnt += ctz32(val);

    return cnt;
}

And of course cto32 and cto64 could also be added. I also have optimized
versions of bit population count which could be shared too:

static always_inline int ctpop32 (uint32_t val)
{
    int i;

    for (i = 0; val != 0; i++)
        val &= val - 1;
    return i;
}

If you prefer, I can add those shared functions (ctz32, ctz64, cto32,
cto64, ctpop32, ctpop64) later, as they do not seem as widely used as
the clz/clo functions.

--
J. Mayer <[EMAIL PROTECTED]>
Never organized
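To make the "cto32 and cto64 could also be added" point concrete:
count-trailing-ones can simply reuse the ctz helper on the complemented
value, and ctpop64 can be built from two ctpop32 calls so 32-bit hosts
never need 64-bit operations. A sketch — the cto32 and ctpop64 bodies
below are only proposals, not committed code, and plain "static inline"
is used so it builds standalone:

```c
#include <stdint.h>

/* ctz32 as quoted above (shift-cascade); returns 32 for val == 0. */
static inline int ctz32(uint32_t val)
{
    int cnt = 0;

    if (!(val & 0x0000FFFFUL)) { cnt += 16; val >>= 16; }
    if (!(val & 0x000000FFUL)) { cnt += 8;  val >>= 8;  }
    if (!(val & 0x0000000FUL)) { cnt += 4;  val >>= 4;  }
    if (!(val & 0x00000003UL)) { cnt += 2;  val >>= 2;  }
    if (!(val & 0x00000001UL)) { cnt++;     val >>= 1;  }
    if (!(val & 0x00000001UL)) { cnt++; }

    return cnt;
}

/* ctpop32 as above: clear the lowest set bit until none remain. */
static inline int ctpop32(uint32_t val)
{
    int i;

    for (i = 0; val != 0; i++)
        val &= val - 1;
    return i;
}

/* Proposed cto32: trailing ones are the trailing zeros of ~val. */
static inline int cto32(uint32_t val)
{
    return ctz32(~val);
}

/* Proposed ctpop64: two 32-bit popcounts, cheap on 32-bit hosts. */
static inline int ctpop64(uint64_t val)
{
    return ctpop32((uint32_t)val) + ctpop32((uint32_t)(val >> 32));
}
```

cto64 would follow the same pattern as ctz64, complementing first; I
left it out to keep the sketch short.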