Re: [Factor-talk] asm intrinsics for speed

Slava Pestov Wed, 03 Sep 2008 05:23:22 -0700

Hi Phil,

I found the time to look at your code and thought about what's going on here.


Here are the main sources of overhead, as far as I can tell:

- The accumulating integer for the count cannot be proven to be
fixnum. This would be the case even if the loop count was known to be
a fixnum (which it isn't, in your case). Consider something like this:

{ fixnum } declare 0 swap [ 1+ ] times

Even though the counter increments in lock-step with the internal
counter of 'times', which is known to be bounded by a fixnum, the
optimizer doesn't notice this and produces the following code:

[
    0 swap 0 swap \ ( gensym ) [
        over over fixnum< [
            >r
            >r 1 +-integer-fixnum r>
            r>
            >r 1 fixnum+fast r>
            ( gensym )
        ] [ 2drop ] if
    ] label
]

1 +-integer-fixnum involves a subroutine call and two conditional
branches and is not as fast as 1 fixnum+fast which is essentially a
single instruction.

A similar situation occurs with a 'count' combinator. The counter is
always less than or equal to the loop induction variable, so if the
latter is a fixnum the counter is too. The optimizer needs to realize
this fact.

While the required induction variable analysis might not appear for a
while, the new code generator that's planned will make
+-integer-fixnum more efficient.

- The other problem is that on 32-bit platforms, there is no inline
intrinsic for alien-unsigned-4 and so a call into the VM is generated,
which is very expensive.

- Even if the code generator could produce inline assembly for
alien-unsigned-4, it wouldn't in your above code, because it cannot
assume that address>> outputs a simple-alien and length>> outputs a
fixnum.

So the bad news is that for this given problem, it won't get much
faster than it is now until I work on the compiler some more.

The good news is that thanks to the new optimizer's tuple unboxing,
ity is possible to implement an algorithm with almost identical
performance but without re-implementing parts of sequences or using
unsafe words:

TUPLE: int-array { alien read-only } { length read-only } ;

: <int-array> ( alien bytes -- int-array )
    "int" heap-size /i int-array boa ; inline

M: int-array length length>> ;

M: int-array nth-unsafe alien>> int-nth ;

INSTANCE: int-array immutable-sequence

: count-items ( alien bytes item -- n )
    [ <int-array> ] dip [ = ] curry count ;

HINTS: count-items simple-alien fixnum fixnum ;

: with-whole-mapped-file ( fname quot -- )
    [ dup file-info size>> ] dip with-mapped-file ; inline

: test-count-items ( fname val -- count )
    [
        [ [ address>> ] [ length>> ] bi ] dip count-items
    ] curry with-whole-mapped-file ;

The key here is that the <int-array> gets unboxed because its
allocation never escapes, and so we end up with essentially the same
code that you started with.

On 64-bit platforms, the alien-signed-4 call (in the int-nth inline
word) should be open-coded as an assembler intrinsic, because we
explicitly declared that the alien is a simple-alien and that the
length is a fixnum in hints, and this would improve performance.

Note that we didn't have to declare types on the tuple slots; they
propagate from the HINTS: through the constructor and out of the
accessors. The read-only declaration is essential though, this is
required in order for tuple slot types to propagate and for unboxing
to take place.

Finally, I added a new benchmark, extra/benchmark/dawes (I hope you
don't mind the name ;-), as a reminder to myself that I need to work
on this in the future.

Slava

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Factor-talk mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/factor-talk

Re: [Factor-talk] asm intrinsics for speed

Reply via email to