Il 08/05/2013 17:44, Torbjorn Granlund ha scritto: > Paolo Bonzini <pbonz...@redhat.com> writes: > > I think that would be faster on 32-bit hosts, truncs are cheap. > > And slower perhaps on 64-bit hosts, at least for operations where > additional explicit trunctation will be needed (such as before > comparisions and after right shifts). > > > There could be a disadvantage of this compared to the old code, since > > this has a chained algebraic dependency, while the old code's many > > instructions might have been more independent. > > What about these alternatives: > > setcond LT, t0, arg0, arg1 > setcond EQ, t1, arg0, arg1 > trunc s0, t0 > trunc s1, t1 > shli s0, s0, 1 ; s0 = (arg0 < arg1) ? 2 : 0 > subi s1, s1, 2 ; s1 = (arg0 != arg1) ? -2 : -1 > sub s0, s0, s1 ; < 4 == 1 > 2 > shli s0, s0, 1 ; < 8 == 2 > 4 > > ======= > > setcond LT, t0, arg0, arg1 > setcond NE, t1, arg0, arg1 > trunc s0, t0 > trunc s1, t1 > add s0, s0, s1 ; < 2 == 0 > 1 > movi s1, 1 > add s0, s0, s1 ; < 3 == 1 > 2 > shl s1, s1, s0 ; < 8 == 2 > 4 > > Surely there are many alternative forms. > Is your aim to add micro-parallelism?
Yes, I think in this respect I think the first one is better. The second could be three instructions on machines that have a set-nth-bit instruction _and_ a zero register, but I'm not sure they exist... > (Your sequences look a bit curious. Did you use a super-optimiser?) No, but I am attracted to these curious sequences from my previous life working on compilers. :) I know your superoptimizer and, in fact, we both worked on some parts of GCC (optimization of conditional branches/stores), just 20 years apart. The second is actually not too curious after you look at it for a while, it is a variant of the usual (x > y) + (x >= y) trick used to generate a 0/1/2 result. The first I found by trial and error based on yours; it is basically (x < y) * 2 - (x == y) + 2, with some reordering to get parallelism and avoid the need for subfi-like instructions. Paolo