Re: [Factor-talk] Naive loop optimization

John Benediktsson Fri, 06 Feb 2015 07:55:58 -0800

Some thoughts for you:

No, ``dup`` does not copy the underlying data; it only duplicates what is
essentially a pointer to the object.
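You can check this in the listener by comparing object identity with ``eq?``. After ``dup``, both stack slots hold the very same array, while ``clone`` makes a new (shallow) copy with equal contents. A minimal sketch using only ``kernel`` and ``prettyprint`` words:

```factor
USING: kernel prettyprint ;

{ 1.0 0.0 } dup eq? .          ! t: both slots refer to the same array
{ 1.0 0.0 } dup clone eq? .    ! f: clone allocated a separate copy
{ 1.0 0.0 } dup clone = .      ! t: ...whose contents are equal
```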


Part of the reason it is slow is that you are operating on a kind of box by
keeping your { x y } pairs in arrays (and in some cases unboxing them with
``first2`` and re-boxing them with ``2array``).  Each of the "math" words
(``v*n``, ``v+``, ``v/n``, etc.) does the same: they aren't in-place
operations, so every call allocates a new array.
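Your paste isn't reproduced here, so the following is only a hypothetical sketch of the array-based style (``lin-osc1`` is a made-up name, not the word from the paste). It performs the same step as the scalar versions further down, but each commented word builds a fresh array, and every float stored into one has to be boxed:

```factor
USING: arrays kernel math math.functions math.vectors sequences ;
IN: scratchpad

! Hypothetical pair-in-an-array step; NOT the code from the paste.
: lin-osc1 ( pair -- pair' )
    dup first2 swap neg 2array  ! pair { y -x }      (2array allocates)
    0.01 v*n                    ! pair { dx dy }     (v*n allocates)
    v+                          ! { x1 y1 }          (v+ allocates)
    dup dup v. sqrt             ! { x1 y1 } norm
    v/n ;                       ! { x' y' }          (v/n allocates)
```

That is four new arrays per iteration, times 10,000,000 iterations, which fits the GC and ``<array>`` time in the profile.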

In addition to ``time``, you can also ``profile`` your program:

    IN: scratchpad gc [ bench1 ] profile
    Running time: 2.656065607 seconds

    IN: scratchpad flat profile.
    depth   time ms  GC %  JIT %  FFI %   FT %
       0    2657.0   5.91   0.00  17.69   0.00 T{ thread f "Listener" ~curry~ ~quotation~ 39 ~box~ f t f H{ } f...
       0    2656.0   5.87   0.00  17.66   0.00   bench1
       0    2655.0   5.88   0.00  17.66   0.00   step
       0     433.0  18.01   0.00  18.24   0.00   *
       0     430.0  18.14   0.00  90.70   0.00   <array>
       0     363.0   0.00   0.00   0.00   0.00   M\ array nth-unsafe
       0     275.0   0.00   0.00   0.00   0.00   /
       0     232.0   0.00   0.00   0.00   0.00   <
       0     194.0   0.00   0.00   0.00   0.00   +
       0     178.0   0.00   0.00   0.00   0.00   M\ array length
       0     141.0   0.00   0.00   0.00   0.00   M\ array set-nth-unsafe
       0     140.0   0.00   0.00   0.00   0.00   M\ sequence nth
       0     113.0   0.00   0.00   0.00   0.00   M\ integer bounds-check?
       0     104.0   0.00   0.00   0.00   0.00   M\ fixnum integer>fixnum

Here are a couple of ideas for speeding it up.

You can "inline" all the math so that you operate on ``x`` and ``y``
directly rather than on ``{ x y }``, avoiding all the array accesses and
memory allocations.

    : lin-osc2 ( x y -- x1 y1 )
        2dup                     ! x y x y
        swap                     ! x y y x
        -1.0 *                   ! x y y -x
        [ 0.01 * ] bi@           ! x y dx*dt dy*dt
        [ + ] bi-curry@ bi*      ! x1 y1
        2dup [ absq ] bi@ + sqrt ! x1 y1 norm
        [ / ] curry bi@ ; inline

    : bench2 ( -- x y )
        1.0 0.0 10,000,000 [ lin-osc2 ] times ;

Note: ``bench2`` has to return something, or too much of the loop gets
"optimized" away.

    IN: scratchpad gc [ bench2 ] time
    Running time: 0.213926035 seconds

Sometimes it is easier to see the flow of the math with locals (this
doesn't affect performance):

    :: lin-osc3 ( x y -- x' y' )
        0.01 :> dt
        y :> dx
        x neg :> dy

        x dx dt * + :> x1
        y dy dt * + :> y1

        x1 y1 [ absq ] bi@ + sqrt :> norm

        x1 norm /
        y1 norm / ; inline

    : bench3 ( -- {x,y} )
        1.0 0.0 10,000,000 [ lin-osc3 ] times 2array ;
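For reference, this is the arithmetic that ``lin-osc2`` and ``lin-osc3`` perform per iteration: one forward-Euler step of dx/dt = y, dy/dt = -x with dt = 0.01, followed by rescaling the point back onto the unit circle (``absq`` of a real number is just its square):

```latex
\begin{aligned}
x_1 &= x + \Delta t \, y, & y_1 &= y - \Delta t \, x, & \Delta t &= 0.01,\\
r &= \sqrt{x_1^2 + y_1^2}, & x' &= x_1 / r, & y' &= y_1 / r.
\end{aligned}
```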

You could also look into the "typed" vocabulary, although because of the
way the inputs are inlined above (the loop starts from float literals), the
compiler should already "know" that it is operating on floats.
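As a sketch, a typed variant of the step could look like this (``lin-osc4`` is a hypothetical name). As I understand it, ``TYPED:`` checks at call time that both inputs are floats, so the body can be compiled for floats even when a caller's types aren't known statically:

```factor
USING: kernel math math.functions typed ;
IN: scratchpad

! Hypothetical typed step: inputs are checked to be floats on entry.
TYPED: lin-osc4 ( x: float y: float -- x': float y': float )
    [ 0.01 * + ] [ swap 0.01 * - ] 2bi  ! x1 y1  (x+dt*y, y-dt*x)
    2dup [ sq ] bi@ + sqrt              ! x1 y1 norm
    [ / ] curry bi@ ;                   ! x' y'
```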

The "typed" vocabulary checks inputs against known types at call time.  If
you don't want to pay for that check, you can just declare your types.  This
is unsafe and lives in a private vocabulary, because if you declare the
wrong type you can hard crash, and we don't want it to be misused:

    { float float } declare
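A small sketch of ``declare`` in use (``xy-norm`` is a hypothetical helper, and this assumes ``declare`` lives in ``kernel.private``). Unlike the typed vocabulary, nothing is checked here, so calling it with anything other than two floats is exactly the crash case described above:

```factor
USING: kernel kernel.private math math.functions ;
IN: scratchpad

! Hypothetical helper: promises (without checking) two floats on the stack.
: xy-norm ( x y -- r )
    { float float } declare  ! unchecked promise to the compiler
    [ sq ] bi@ + sqrt ;      ! sqrt(x^2 + y^2)
```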

Anyway, hope that helps.  Sorry for the spam on the paste site; reCAPTCHA
isn't what it used to be at keeping away the bad robots.

Best,
John.

On Fri, Feb 6, 2015 at 4:07 AM, Marmaduke Woodman <mmwood...@gmail.com>
wrote:

> Hi,
>
> I've attempted to write a perhaps naive ODE integration loop in Factor,
>
>   http://paste.factorcode.org/paste?id=3428
>
> but it seems quite slow: the `bench1` word reports running time of ~3 s,
> which is an order of magnitude off equivalent OCaml & Haskell, so I imagine
> due to my lack of Factor experience there's boxing, unboxing and mixing of
> types leading to poor performance.
>
> Are there some general principles for writing performant numerical code in
> Factor? Do the generic sequence words get optimized, or is explicit use of
> unsafe words required?
>
> A specific question about `dup` on container types: are the underlying
> data duplicated?
>
> If I've missed any potential reading material on this, refs would be much
> appreciated.
>
> Cheers,
> Marmaduke
>
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming. The Go Parallel Website,
> sponsored by Intel and developed in partnership with Slashdot Media, is
> your
> hub for all things parallel software development, from weekly thought
> leadership blogs to news, videos, case studies, tutorials and more. Take a
> look and join the conversation now. http://goparallel.sourceforge.net/
> _______________________________________________
> Factor-talk mailing list
> Factor-talk@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/factor-talk
>
>
