Thanks!

As for absolute performance, I think it's just that I'm on a slower,
older machine: a Core Duo @ 1.8 GHz from ca. 2006. Trying it again on a
Xeon server gives 11.8 ms, and a friend's workstation gives 8.4 ms, much
closer to the speed you're getting.

Matthew Flatt <mfl...@cs.utah.edu> writes:
> These changes look good, and I'll push them.
>
> I also tried a larger revision to lift the mode tests out of the loop;
> it's only worth about 10%, but it might set up further improvements.
>
> When I run your example, though, I get much better absolute performance
> than you're reporting: 34.3 ± 0.2 msec in version 5.3.1, and 8.45 ± 0.1
> msec after the changes. That's running `racket' from a command line and
> on 64-bit Mac OS X, but I get similar results from other machines (and
> other OSes under VirtualBox on Mac OS X). Any idea why your numbers are
> so different?
>
> At Sun, 16 Dec 2012 11:29:58 -0700, Michael Wilber wrote:
>> TL;DR: About 2.8x speedup from using local variables and unsafe
>> functions. Copying each bitmap row could bring speedup to ~20x, but it
>> doesn't quite work and I need your help. Pull request at
>> https://github.com/plt/racket/pull/199
>>
>> Hey there!
>>
>> I'm writing some FFmpeg bindings for Racket. It's fast enough to decode
>> video in real time, but on my machine, set-argb-pixels takes 189.35±1.3
>> msec to run for a 500x500 image, which means I'm limited to displaying
>> frames at ~5fps.
>>
>> Here's a toy benchmark to test set-argb-pixels:
>> https://gist.github.com/4a5661dfad984cfdab19
>>
>> There are some very simple bottlenecks that I've started to address:
>>
>> 1. It turns out that the references to b&w? and alpha-channel-local? for
>>    each pixel are slow slow slow. Making them local variables drops the
>>    time down to 124.8±1.0 msec. This three-line change gives a speedup
>>    factor of about 1.5x.
>>
>> 2. Using unsafe functions everywhere (unsafe-bytes-ref and friends,
>>    unsafe-fx+ and friends) drops it further to 67.05±0.6 msec, which is a
>>    speedup factor of ~2.82 over the original on my machine.
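
To make the two changes above concrete, here is a minimal sketch, not the
code from the pull request. The `pixel-mode` struct and `copy-argb!` are
hypothetical stand-ins (the real flags are fields of the bitmap object);
the sketch only shows the pattern of reading a flag once before the loop
and using the unsafe fixnum and bytes operations inside it.

```racket
#lang racket/base
(require racket/unsafe/ops)

;; Hypothetical stand-in for the bitmap's mode flag.
(struct pixel-mode (alpha?))

;; Hypothetical helper: copy ARGB pixels from `src` to `dst`, forcing the
;; alpha byte to 255 when the mode has no alpha channel.
(define (copy-argb! m dst src)
  (define n (bytes-length src))
  ;; The unsafe operations below skip bounds checks, so check once here.
  (unless (and (zero? (remainder n 4)) (>= (bytes-length dst) n))
    (error 'copy-argb! "src length must be a multiple of 4 and fit in dst"))
  ;; Hoisted: the flag is read once, not once per pixel.
  (define alpha? (pixel-mode-alpha? m))
  (let loop ([i 0])
    (when (unsafe-fx< i n)
      (unsafe-bytes-set! dst i (if alpha? (unsafe-bytes-ref src i) 255))
      (unsafe-bytes-set! dst (unsafe-fx+ i 1)
                         (unsafe-bytes-ref src (unsafe-fx+ i 1)))
      (unsafe-bytes-set! dst (unsafe-fx+ i 2)
                         (unsafe-bytes-ref src (unsafe-fx+ i 2)))
      (unsafe-bytes-set! dst (unsafe-fx+ i 3)
                         (unsafe-bytes-ref src (unsafe-fx+ i 3)))
      (loop (unsafe-fx+ i 4)))))
```

The up-front length check is what makes the unsafe calls acceptable here;
without it, a short `dst` would be written out of bounds.
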
>>
>> A pull request for the above is at
>> https://github.com/plt/racket/pull/199
>>
>> Now, if we can assume that the input bytes already contain pre-clipped,
>> premultiplied data, we don't really have to loop through each pixel. If
>> we copy each row using bytes-copy!, that drops the function to 9.55±6.1
>> msec (!) which is a speedup factor of ~20x over the original.
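
A rough sketch of the row-copy idea, using `bytes-copy!` from racket/base.
`copy-rows!` and its parameters are hypothetical names, and the sketch
assumes the destination may have a row stride larger than the packed row
width; it does no clipping, premultiplication, or byte reordering.

```racket
#lang racket/base

;; Hypothetical helper: copy `h` rows of `row-bytes` bytes each from the
;; tightly packed `src` into `dst`, whose rows start every `dst-stride`
;; bytes.
(define (copy-rows! dst dst-stride src row-bytes h)
  (for ([y (in-range h)])
    (bytes-copy! dst (* y dst-stride)
                 src (* y row-bytes) (* (add1 y) row-bytes))))
```

For a 500x500 ARGB image, `row-bytes` would be 2000 and the loop runs 500
times instead of 250,000.
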
>>
>> The problem is that on my little-endian machine, Cairo expects the
>> input data in BGRA format, not RGBA, so the colors look wrong. Alas,
>> this is why Racket's doing all the byte swizzling manually.
>>
>> Is there a fast native way of switching the endianness of a byte vector
>> assumed to contain 32-bit ints? Or some way to do what we want?
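
For reference, the operation being asked about can be sketched as a plain
Racket loop that reverses every 4-byte group, so A R G B becomes B G R A.
`swap-words!` is a hypothetical name, this is not a native primitive, and
whether it is any faster than the existing per-pixel loop is untested.

```racket
#lang racket/base
(require racket/unsafe/ops)

;; Hypothetical helper: reverse the byte order of each 32-bit word of
;; `src` into `dst`. Passing the same byte string for both is fine,
;; because each word is read in full before it is written.
(define (swap-words! dst src)
  (define n (bytes-length src))
  ;; The unsafe operations skip bounds checks, so validate once up front.
  (unless (and (zero? (remainder n 4)) (>= (bytes-length dst) n))
    (error 'swap-words! "src length must be a multiple of 4 and fit in dst"))
  (let loop ([i 0])
    (when (unsafe-fx< i n)
      (let ([b0 (unsafe-bytes-ref src i)]
            [b1 (unsafe-bytes-ref src (unsafe-fx+ i 1))]
            [b2 (unsafe-bytes-ref src (unsafe-fx+ i 2))]
            [b3 (unsafe-bytes-ref src (unsafe-fx+ i 3))])
        (unsafe-bytes-set! dst i b3)
        (unsafe-bytes-set! dst (unsafe-fx+ i 1) b2)
        (unsafe-bytes-set! dst (unsafe-fx+ i 2) b1)
        (unsafe-bytes-set! dst (unsafe-fx+ i 3) b0))
      (loop (unsafe-fx+ i 4)))))
```
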
>>
>> If there's a way to do this, this could make playing simple
>> low-resolution videos from Racket pretty feasible.
>>
>> ____________________
>>   Racket Users list:
>>   http://lists.racket-lang.org/users

