Hi Jason,
Have you guys taken a look at core.matrix for any of this stuff? We're also
shooting for near-Java-parity for all of the core operations on large
double arrays.
(use 'clojure.core.matrix)
(require '[criterium.core :as c])

(let [a (double-array (range 10000))]
  (c/quick-bench (esum a)))
WARNING: Final GC required 69.30384798936066 % of runtime
Evaluation count : 45924 in 6 samples of 7654 calls.
Execution time mean : 12.967112 µs
Execution time std-deviation : 326.480900 ns
Execution time lower quantile : 12.629252 µs ( 2.5%)
Execution time upper quantile : 13.348527 µs (97.5%)
Overhead used : 3.622005 ns
All the core.matrix functions get dispatched via protocols, so they work on
any kind of multi-dimensional matrix (not just Java arrays). This adds a
tiny amount of overhead (about 10-15ns), but it is negligible when dealing
with medium-to-large vectors/matrices/arrays.
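
To make the dispatch idea concrete, here is a minimal sketch of how one protocol function can cover both Java double arrays and ordinary Clojure collections. The names PSumSketch and sum-elements are hypothetical and chosen for illustration only; this is not the actual core.matrix implementation, just the general shape of the technique.

```clojure
;; Hypothetical protocol, for illustration only (not core.matrix's own code).
(defprotocol PSumSketch
  (sum-elements [m] "Returns the sum of all elements of m as a double."))

;; Primitive double arrays: a tight loop over primitives.
;; `extend` is used with the class object because array classes
;; have no plain symbol name.
(extend (Class/forName "[D")
  PSumSketch
  {:sum-elements
   (fn [^doubles a]
     (let [n (alength a)]
       (loop [i 0 sum 0.0]
         (if (< i n)
           (recur (unchecked-inc i) (+ sum (aget a i)))
           sum))))})

;; Generic fallback for vectors, lists and other sequential collections.
(extend-protocol PSumSketch
  clojure.lang.Sequential
  (sum-elements [m]
    (reduce + 0.0 m)))

(sum-elements (double-array [1.0 2.0 3.0])) ;; => 6.0
(sum-elements [1 2 3])                      ;; => 6.0
```

The caller pays for one protocol dispatch per call, not per element, which is why the cost fades into the noise once the array has more than a handful of entries.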
I'm interested in feedback and hopefully we can collaborate: I'm keen to
get the best optimised numerical functions we can in Clojure. Also, I think
you may find the core.matrix facilities very helpful when moving to higher
level abstractions (i.e. 2D matrices and higher order multi-dimensional
arrays).
On Thursday, 13 June 2013 21:50:48 UTC+1, Jason Wolfe wrote:
>
> Taking a step back, the core problem we're trying to solve is just to sum
> an array's values as quickly as in Java. (We really want to write a fancier
> macro that allows arbitrary computations beyond summing that can't be
> achieved by just calling into Java, but this simpler task gets at the crux
> of our performance issues).
>
> This Java code:
>
> public static double asum_noop_indexed(double[] arr) {
>     double s = 0;
>     for (int i = 0; i < arr.length; i++) {
>         s += arr[i];
>     }
>     return s;
> }
>
> can run on an array with 10k elements in about 8 microseconds. In
> contrast, this Clojure code (which I believe used to be as fast as the Java
> in a previous Clojure version):
>
> (defn asum-identity [^doubles a]
>   (let [len (long (alength a))]
>     (loop [sum 0.0
>            idx 0]
>       (if (< idx len)
>         (let [ai (aget a idx)]
>           (recur (+ sum ai) (unchecked-inc idx)))
>         sum))))
>
> executes on the same array in about 40 microseconds normally, or 14
> microseconds with *unchecked-math* set to true. (We weren't using
> unchecked-math properly until today, which is why we were doing the hacky
> interface stuff above, please disregard that -- but I think the core point
> about an extra cast is still correct).
>
> For reference, (areduce a1 i r 0.0 (+ (aget a1 i) r)) takes about 23 ms to
> do the same computation (with unchecked-math true).
>
> Does anyone have ideas for how to achieve parity with Java on this task?
> They'd be much appreciated!
>
> Thanks, Jason
>
> On Thursday, June 13, 2013 12:02:56 PM UTC-7, Leon Barrett wrote:
>>
>> Hi. I've been working with people at Prismatic to optimize some simple
>> math code in Clojure. However, it seems that Clojure generates an
>> unnecessary type check that slows our (otherwise-optimized) code by 50%. Is
>> there a good way to avoid this, is it a bug in Clojure 1.5.1, or something
>> else? What should I do to work around this?
>>
>> Here's my example. The aget seems to generate an unnecessary checkcast
>> bytecode. I used Jasper and Jasmin to decompile and recompile Bar.class
>> into Bar_EDITED.class, without that bytecode. The edited version takes
>> about 2/3 the time.
>>
>> (ns demo
>>   (:import demo.Bar_EDITED))
>>
>> (definterface Foo
>>   (arraysum ^double [^doubles a ^int i ^int asize ^double sum]))
>>
>> (deftype Bar []
>>   Foo
>>   (arraysum ^double [this ^doubles a ^int i ^int asize ^double sum]
>>     (if (< i asize)
>>       (recur a (unchecked-inc-int i) asize (+ sum (aget a i)))
>>       sum)))
>>
>> (defn -main [& args]
>>   (let [bar (Bar.)
>>         bar-edited (Bar_EDITED.)
>>         asize 10000
>>         a (double-array asize)
>>         i 0
>>         ntimes 10000]
>>     (time
>>       (dotimes [iter ntimes]
>>         (.arraysum bar a i asize 0)))
>>     (time
>>       (dotimes [iter ntimes]
>>         (.arraysum bar-edited a i asize 0)))))
>>
>>
>> ;; $ lein2 run -m demo
>> ;; Compiling demo
>> ;; "Elapsed time: 191.015885 msecs"
>> ;; "Elapsed time: 129.332 msecs"
>>
>>
>> Here's the bytecode for Bar.arraysum:
>>
>> public java.lang.Object arraysum(double[], int, int, double);
>> Code:
>> 0: iload_2
>> 1: i2l
>> 2: iload_3
>> 3: i2l
>> 4: lcmp
>> 5: ifge 39
>> 8: aload_1
>> 9: iload_2
>> 10: iconst_1
>> 11: iadd
>> 12: iload_3
>> 13: dload 4
>> 15: aload_1
>> 16: aconst_null
>> 17: astore_1
>> 18: checkcast #60 // class "[D"
>> 21: iload_2
>> 22: invokestatic #64 // Method clojure/lang/RT.intCast:(I)I
>> 25: daload
>> 26: dadd
>> 27: dstore 4
>> 29: istore_3
>> 30: istore_2
>> 31: astore_1
>> 32: goto 0
>> 35: goto 44
>> 38: pop
>> 39: dload 4
>> 41: invokestatic #70 // Method java/lang/Double.valueOf:(D)Ljava/lang/Double;
>> 44: areturn
>>
>>
>> As far as I can tell, Clojure generated a checkcast opcode that tests on
>> every loop to make sure the double array is really a double array. When I
>> remove that checkcast, I get a 1/3 speedup (meaning it's a 50% overhead).
>>
>> Can someone help me figure out how to avoid this overhead?
>>
>> Thanks.
>>
>> - Leon Barrett
>>
>