http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715

           Summary: Could do more efficient unsigned-to-float
                    conversions based on range information
           Product: gcc
           Version: 4.6.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: sgunder...@bigfoot.com


I have code that looks vaguely like this:

float func(unsigned x)
{
    return (x & 0xfffff) * 0.01f;
}

When I compile it, GCC gives a long and relatively slow sequence:

fugl:~> gcc-4.6 -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc-4.6
COLLECT_LTO_WRAPPER=/usr/lib/i386-linux-gnu/gcc/i486-linux-gnu/4.6.1/lto-wrapper
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.1-3'
--with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr
--program-suffix=-4.6 --enable-shared --enable-multiarch
--with-multiarch-defaults=i386-linux-gnu --enable-linker-build-id
--with-system-zlib --libexecdir=/usr/lib/i386-linux-gnu
--without-included-gettext --enable-threads=posix
--with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib/i386-linux-gnu
--enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc
--enable-targets=all --with-arch-32=i586 --with-tune=generic
--enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu
--target=i486-linux-gnu
Thread model: posix
gcc version 4.6.1 (Debian 4.6.1-3) 

fugl:~> gcc-4.6 -O2 -march=pentium3 -msse2 -mfpmath=sse -c test.c
fugl:~> objdump --disassemble test.o                             

test.o:     file format elf32-i386


Disassembly of section .text:

00000000 <func>:
   0:    83 ec 04                 sub    $0x4,%esp
   3:    8b 54 24 08              mov    0x8(%esp),%edx
   7:    89 d0                    mov    %edx,%eax
   9:    81 e2 ff ff 00 00        and    $0xffff,%edx
   f:    25 ff ff 0f 00           and    $0xfffff,%eax
  14:    c1 e8 10                 shr    $0x10,%eax
  17:    f3 0f 2a c0              cvtsi2ss %eax,%xmm0
  1b:    f3 0f 2a ca              cvtsi2ss %edx,%xmm1
  1f:    f3 0f 59 05 00 00 00     mulss  0x0,%xmm0
  26:    00 
  27:    f3 0f 58 c1              addss  %xmm1,%xmm0
  2b:    f3 0f 59 05 04 00 00     mulss  0x4,%xmm0
  32:    00 
  33:    f3 0f 11 04 24           movss  %xmm0,(%esp)
  38:    d9 04 24                 flds   (%esp)
  3b:    58                       pop    %eax
  3c:    c3                       ret    
  3d:    8d 76 00                 lea    0x0(%esi),%esi
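The func listing appears to convert the masked value in two 16-bit halves, because cvtsi2ss accepts only a signed operand. The C below is a sketch that models that sequence, not GCC's own code. It assumes the constant loaded from offset 0x0 is 65536.0f, which the dump does not show (the relocation is unresolved in test.o), and the function names u32_to_float_split and func_model are hypothetical.

```c
/* Model of the split unsigned-to-float conversion seen in the func listing.
   Assumption: the constant at offset 0x0 is 65536.0f (not visible in the dump). */
float u32_to_float_split(unsigned v)
{
    int hi = (int)(v >> 16);      /* upper 16 bits: 0..65535, fits in int */
    int lo = (int)(v & 0xffffu);  /* lower 16 bits: 0..65535, fits in int */
    /* Each half goes through a signed conversion (cvtsi2ss) exactly.
       (float)hi * 65536.0f is also exact, so the only rounding is in the add. */
    return (float)hi * 65536.0f + (float)lo;
}

/* Hypothetical model of func: mask, split conversion, then scale. */
float func_model(unsigned x)
{
    return u32_to_float_split(x & 0xfffff) * 0.01f;
}
```

Because only the final addition rounds, the split version gives the same float as a direct unsigned conversion; the cost is the extra shift, AND, conversion, multiply and add.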

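I assume this is because x is unsigned (I cannot easily change this, as I
depend on wraparound): cvtsi2ss only converts signed integers, so GCC splits
the value into its upper and lower 16 bits, converts each, and recombines
them. However, if I insert a cast to int after the & operation, I get
numerically identical results and a much better sequence: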

00000040 <func2>:
  40:    83 ec 04                 sub    $0x4,%esp
  43:    8b 44 24 08              mov    0x8(%esp),%eax
  47:    25 ff ff 0f 00           and    $0xfffff,%eax
  4c:    f3 0f 2a c0              cvtsi2ss %eax,%xmm0
  50:    f3 0f 59 05 04 00 00     mulss  0x4,%xmm0
  57:    00 
  58:    f3 0f 11 04 24           movss  %xmm0,(%esp)
  5d:    d9 04 24                 flds   (%esp)
  60:    5a                       pop    %edx
  61:    c3                       ret    

For reference, the modified function is:

float func2(unsigned x)
{
    return (int)(x & 0xfffff) * 0.01f;
}
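The claim that func2 returns the same results as func can be checked exhaustively, since both depend only on x & 0xfffff. The sketch below compares them over every value the mask can produce; count_mismatches is a hypothetical helper. Every value up to 0xfffff is below 2^24, so both the signed and the unsigned conversion are exact and the two functions should never differ.

```c
float func(unsigned x)
{
    return (x & 0xfffff) * 0.01f;
}

float func2(unsigned x)
{
    return (int)(x & 0xfffff) * 0.01f;
}

/* Compare func and func2 on 0..0xfffff. Bits above bit 19 are cleared by
   the mask, so this range covers every distinct case. Returns the number
   of inputs where the two disagree. */
unsigned long count_mismatches(void)
{
    unsigned long bad = 0;
    for (unsigned x = 0; x <= 0xfffffu; x++) {
        if (func(x) != func2(x))
            bad++;
    }
    return bad;
}
```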

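GCC should be able to do this automatically whenever its range information
shows that the sign bit cannot be set. Here, x & 0xfffff is at most 0xfffff,
so the signed conversion always gives the same result.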
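The rule being requested can be modelled at source level. In this sketch, and_result_fits_int and masked_to_float are hypothetical names, not GCC interfaces. The set bits of x & mask are a subset of the set bits of mask, so if mask is at most INT_MAX the result cannot have its sign bit set, and converting through int gives the same float. A compiler would apply this test at compile time from range or known-bits information; it is a runtime test here only to keep the example self-contained.

```c
#include <limits.h>

/* Hypothetical helper: (x & mask) can only have bits that are set in mask,
   so mask <= INT_MAX guarantees the result's sign bit is clear. */
int and_result_fits_int(unsigned mask)
{
    return mask <= (unsigned)INT_MAX;
}

/* Source-level model of the requested rewrite. Signed and unsigned
   conversions agree on 0..INT_MAX, so the signed path is safe when
   the range check succeeds. */
float masked_to_float(unsigned x, unsigned mask)
{
    unsigned v = x & mask;
    if (and_result_fits_int(mask))
        return (float)(int)v;   /* signed conversion path */
    return (float)v;            /* general unsigned conversion path */
}
```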
