(rand) is expensive -- removing the two (rand) calls knocks about 40 seconds off the run, nearly 1/5 of the total time. I'll try replacing them with lookups from a precalculated grid of random numbers -- long-range correlations shouldn't matter here.
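A rough sketch of what the precalculated lookup might look like, assuming a flat primitive double array indexed with a bit mask rather than a 2D grid. The names (make-rand-table, table-rand, table-size) and the table size are made up for illustration and are not from the original code. Whether this actually beats (rand) would need measuring, since a table this large may not stay in cache.

```clojure
;; Hypothetical sketch: names and table size are illustrative assumptions.
(set! *warn-on-reflection* true)

(def ^:const table-size 65536)              ; power of two, so a bit mask can wrap the index
(def ^:const table-mask (dec table-size))

(defn make-rand-table
  "Fills a primitive double array with uniform values in [0, 1)."
  ^doubles [^long seed]
  (let [rng (java.util.Random. seed)
        tbl (double-array table-size)]
    (dotimes [i table-size]
      (aset tbl i (.nextDouble rng)))
    tbl))

(defn table-rand
  "Returns the precalculated value at index i, wrapping around the table."
  ^double [^doubles tbl ^long i]
  (aget tbl (bit-and i table-mask)))

;; Usage: average a million lookups inside a primitive loop.
(let [tbl (make-rand-table 42)]
  (loop [i 0 acc 0.0]
    (if (< i 1000000)
      (recur (inc i) (+ acc (table-rand tbl i)))
      (/ acc 1000000.0))))
```

The type hints keep the lookup on primitive doubles and longs, so nothing should get boxed on the way back into the inner loop. The values repeat every table-size lookups, which is the long-range correlation mentioned above.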
On Thu, Nov 8, 2012 at 8:00 PM, Cedric Greevey <cgree...@gmail.com> wrote:

> On Thu, Nov 8, 2012 at 3:48 PM, Cedric Greevey <cgree...@gmail.com> wrote:
>
>> I have the following code to perform a complicated image convolution. It
>> takes 10-15 seconds with output dimensions 256x256 and samples 6. No
>> reflection warnings, and using unchecked math doesn't speed it up any. I've
>> tried to ensure it uses primitive math inside the loops, aside from
>> generating the outer loop's values. What cached-load-chunk does shouldn't
>> matter much, but in most cases it should boil down to a map lookup inside a
>> swap! and a couple of atom derefs and function calls. The bottleneck is
>> likely in the math somewhere, and likely something is being boxed, though
>> I've primitive-coerced every numerical let and loop value and avoided more
>> than two arguments per arithmetic op.
>>
>> Can anyone spot anything I haven't that could be causing boxed arithmetic
>> inside the loops?
>
> I've now checked for Var lookups (none outside the caching function, and
> now none inside either) and checked the caching code itself (there's a .get
> on a closed-over ConcurrentHashMap, a null check, a .get on a
> SoftReference, and another null check, on each lookup, if there isn't a
> cache miss on that lookup; plus a couple more method calls for the IFn
> invokes and an ivar fetch to get the ConcurrentHashMap reference).
>
> In the absence of cache misses I'm still seeing ~3.5 *minutes* at 1280x720
> with 10 samples (= about 100 million iterations total of the inner loop).
> The arithmetic in there is 23 floating-point ops, five compares, a log, an
> atan, and three bitwise ANDs. Without the log and atan a hundred million of
> that inner loop should take a second on this box. I very much doubt the log
> and the atan are 209 times slower than the rest of it combined. So there's
> three likely culprits: boxing, two calls to (rand), and the BufferedImage
> .getRGB method call (on a 6000x2198 24bpp image, though its size should
> matter not), unless trig is really that slow.