So 100 million (rand) calls take 20 seconds. By temporarily changing
^BufferedImage chunk (chunk-cache chunk-num) to ^BufferedImage chunk nil, I
determined that 100 million iterations of the *combination* of the cache
lookup (with no misses) and the .getRGB call took 30 seconds. That still
leaves just over two minutes with only the arithmetic left in the loop.
Either something's getting boxed or it's the trig calls.
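
One way to separate those two suspects would be to time the log and atan
calls on their own, in a loop that is known to stay primitive. The sketch
below is only an illustration: time-trig is a hypothetical helper, not part
of the convolution code, and the input values are arbitrary. The sum is
returned so the JIT cannot discard the calls as dead code.

```clojure
;; Hypothetical micro-benchmark, not from the original code.
;; Times n iterations of one Math/log plus one Math/atan, keeping the
;; loop counter a primitive long and the accumulator a primitive double.
(defn time-trig
  "Runs n iterations of log + atan and returns [elapsed-ms sum]."
  [^long n]
  (let [start (System/nanoTime)
        sum   (loop [i 0, acc 0.0]
                (if (< i n)
                  (let [x (+ 1.0 (double i))]   ; arbitrary input, always >= 1.0
                    (recur (inc i)
                           (+ acc (+ (Math/log x) (Math/atan x)))))
                  acc))
        ms    (/ (- (System/nanoTime) start) 1e6)]
    [ms sum]))
```

Calling (time-trig 100000000) a few times, so the JIT has warmed up, gives a
rough figure for the two calls alone. If that comes out far below the
remaining two minutes, boxing in the real loop is the more likely culprit.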


On Thu, Nov 8, 2012 at 8:18 PM, Cedric Greevey <cgree...@gmail.com> wrote:

> (rand) is expensive -- removing the two (rand)s knocks about 40 seconds
> off it, nearly 1/5 the total time. I'll try replacing them with lookup from
> a precalculated grid of randoms -- long-range correlations shouldn't matter
> here.
>
>
>
>
> On Thu, Nov 8, 2012 at 8:00 PM, Cedric Greevey <cgree...@gmail.com> wrote:
>
>> On Thu, Nov 8, 2012 at 3:48 PM, Cedric Greevey <cgree...@gmail.com> wrote:
>>
>>> I have the following code to perform a complicated image convolution. It
>>> takes 10-15 seconds with output dimensions 256x256 and samples 6. No
>>> reflection warnings, and using unchecked math doesn't speed it up any. I've
>>> tried to ensure it uses primitive math inside the loops, aside from
>>> generating the outer loop's values. What cached-load-chunk does shouldn't
>>> matter much, but in most cases it should boil down to a map lookup inside a
>>> swap! and a couple of atom derefs and function calls. The bottleneck is
>>> likely in the math somewhere, and likely something is being boxed, though
>>> I've primitive-coerced every numerical let and loop value and avoided more
>>> than two arguments per arithmetic op.
>>>
>>> Can anyone spot anything I haven't that could be causing boxed
>>> arithmetic inside the loops?
>>>
>>
>> I've now checked for Var lookups (none outside the caching function, and
>> now none inside either) and checked the caching code itself (there's a .get
>> on a closed-over ConcurrentHashMap, a null check, a .get on a
>> SoftReference, and another null check, on each lookup, if there isn't a
>> cache miss on that lookup; plus a couple more method calls for the IFn
>> invokes and an ivar fetch to get the ConcurrentHashMap reference).
>>
>> In the absence of cache misses I'm still seeing ~3.5 *minutes* at
>> 1280x720 with 10 samples (= about 100 million iterations total of the inner
>> loop). The arithmetic in there is 23 floating-point ops, five compares, a
>> log, an atan, and three bitwise ANDs. Without the log and atan a hundred
>> million of that inner loop should take a second on this box. I very much
>> doubt the log and the atan are 209 times slower than the rest of it
>> combined. So there's three likely culprits: boxing, two calls to (rand),
>> and the BufferedImage .getRGB method call (on a 6000x2198 24bpp image,
>> though its size should matter not), unless trig is really that slow.
>>
>>
>
