[math] statistics performance boost

Ken Geis Thu, 13 May 2004 03:14:06 -0700

As I explained, I am using commons-math in some data mining algorithms I am writing, and I make heavy use of SummaryStatistics and TTest. Profiling showed me where the code could be optimized, and I ended up with a 15x overall speedup in my application from three changes:

1. Add clone() to SummaryStatisticsImpl. This also requires adding clone() to SecondMoment, Sum, SumOfSquares, Min, Max, SumOfLogs, GeometricMean, Mean, and Variance. Mark, I think the expected behavior of clone() is adequately specified by the Javadoc for java.lang.Object; I was surprised to realize I had never read it before yesterday. Phil, your suggested getSummary() method/bean would indeed solve my problem and give even better performance. (clone() was about 20x faster than the serialize/deserialize hack I had been using, which probably accounts for about 2x of the overall 15x.)

2. Change TTestImpl so that it no longer calls the commons-discovery DiscoverClass.newInstance() on every tTest call. That method is not cheap: after change #1, it accounted for roughly 17% of the runtime of my synthetic benchmark. I added a method that lazily obtains the DistributionFactory and caches it in a transient field.

3. Make ContinuedFraction.evaluate(...) iterative instead of recursive. This made the method 2.25x as fast (a 125% improvement). I think I can optimize it further, hopefully without sacrificing readability.
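
To illustrate the deep-copy requirement behind item 1, here is a minimal sketch. Summary and Accumulator are hypothetical stand-ins, not the actual SummaryStatisticsImpl or its component statistics; the point is only that an aggregate's clone() must also clone each component it holds, or the copy and the original end up sharing state.

```java
/**
 * Hypothetical sketch: Summary and Accumulator are illustrative stand-ins,
 * not the actual commons-math classes.
 */
public class CloneableStatsSketch {

    /** A single running statistic, loosely analogous to Sum or SumOfSquares. */
    static final class Accumulator implements Cloneable {
        double value;

        void add(double x) {
            value += x;
        }

        @Override
        public Accumulator clone() {
            try {
                // Only a primitive field, so Object.clone()'s field-by-field copy suffices.
                return (Accumulator) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // unreachable: this class implements Cloneable
            }
        }
    }

    /** An aggregate of statistics; its clone() must deep-copy each component. */
    static final class Summary implements Cloneable {
        long n;
        Accumulator sum = new Accumulator();
        Accumulator sumOfSquares = new Accumulator();

        void addValue(double x) {
            n++;
            sum.add(x);
            sumOfSquares.add(x * x);
        }

        double mean() {
            return sum.value / n;
        }

        @Override
        public Summary clone() {
            try {
                Summary copy = (Summary) super.clone(); // copies n and the references
                // Replace the shared references with independent copies.
                copy.sum = sum.clone();
                copy.sumOfSquares = sumOfSquares.clone();
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    public static void main(String[] args) {
        Summary base = new Summary();
        for (double x : new double[] {1, 2, 3}) {
            base.addValue(x);
        }
        Summary branch = base.clone(); // snapshot, then let the copy diverge
        branch.addValue(10);
        System.out.println(base.mean());   // 2.0
        System.out.println(branch.mean()); // 4.0
    }
}
```

Forgetting the two component clone() calls would still compile, but adding a value to the copy would then silently change the original's sums.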
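
The caching change in item 2 might look roughly like the sketch below. ExpensiveFactory is a hypothetical placeholder for DistributionFactory and its DiscoverClass.newInstance() lookup, and tStatistic() is an illustrative one-sample t statistic rather than the real TTestImpl API. It assumes the factory is held in a transient instance field and created on first use.

```java
import java.io.Serializable;

/**
 * Hypothetical sketch of caching an expensive factory lookup.
 * ExpensiveFactory is a placeholder, not commons-math's DistributionFactory.
 */
public class LazyFactorySketch implements Serializable {
    private static final long serialVersionUID = 1L;

    /** Placeholder for a factory whose lookup is slow (e.g. service discovery). */
    static final class ExpensiveFactory {
        static int instancesCreated = 0; // counts lookups, for demonstration only

        ExpensiveFactory() {
            instancesCreated++;
        }
    }

    // Not serialized; recreated on first use after deserialization.
    private transient ExpensiveFactory factory;

    /** Lazily creates the factory on the first call and reuses it afterwards. */
    private ExpensiveFactory getFactory() {
        if (factory == null) {
            factory = new ExpensiveFactory();
        }
        return factory;
    }

    /** One-sample t statistic for H0: mean == mu (illustrative only). */
    public double tStatistic(double[] sample, double mu) {
        getFactory(); // real code would obtain a t distribution here to compute a p-value
        int n = sample.length;
        double mean = 0.0;
        for (double x : sample) {
            mean += x;
        }
        mean /= n;
        double sumSq = 0.0;
        for (double x : sample) {
            sumSq += (x - mean) * (x - mean);
        }
        double standardError = Math.sqrt(sumSq / (n - 1) / n);
        return (mean - mu) / standardError;
    }

    public static void main(String[] args) {
        LazyFactorySketch test = new LazyFactorySketch();
        double[] sample = {1, 2, 3, 4, 5};
        for (int i = 0; i < 1000; i++) {
            test.tStatistic(sample, 0.0);
        }
        System.out.println(ExpensiveFactory.instancesCreated); // 1
        System.out.println(test.tStatistic(sample, 0.0));
    }
}
```

The null check is not synchronized, so two threads could each create a factory on first use; that is harmless only if the factory is stateless, which this sketch assumes.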
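
Item 3 does not say how the iterative version was written. One common non-recursive approach for f = a0 + b1/(a1 + b2/(a2 + ...)) is the forward recurrence for convergents, h_n = a_n h_{n-1} + b_n h_{n-2} and k_n = a_n k_{n-1} + b_n k_{n-2}, stopping once h_n/k_n settles. The sketch below assumes getA/getB-style accessors for the a_n and b_n terms; it is not the commons-math implementation, and Lentz's method is a more robust alternative when intermediate denominators can vanish.

```java
/**
 * Hypothetical sketch: iterative evaluation of
 *   f(x) = a0 + b1 / (a1 + b2 / (a2 + b3 / (a3 + ...)))
 * via the forward recurrence for convergents h_n / k_n.
 * Not the commons-math implementation.
 */
public abstract class IterativeContinuedFraction {

    /** n-th partial denominator a_n (n >= 0). */
    protected abstract double getA(int n, double x);

    /** n-th partial numerator b_n (n >= 1). */
    protected abstract double getB(int n, double x);

    public double evaluate(double x, double epsilon, int maxIterations) {
        // Seeds: h_{-1} = 1, h_0 = a_0, k_{-1} = 0, k_0 = 1.
        double hPrev = 1.0;
        double h = getA(0, x);
        double kPrev = 0.0;
        double k = 1.0;
        double previous = h;
        for (int n = 1; n <= maxIterations; n++) {
            double a = getA(n, x);
            double b = getB(n, x);
            double hNext = a * h + b * hPrev;
            double kNext = a * k + b * kPrev;
            hPrev = h;
            h = hNext;
            kPrev = k;
            k = kNext;
            // Rescale all four terms by the same factor to avoid overflow;
            // the recurrence is linear, so the ratios h/k are unchanged.
            double scale = Math.abs(k);
            if (scale > 1e100) {
                h /= scale;
                hPrev /= scale;
                k /= scale;
                kPrev /= scale;
            }
            double current = h / k;
            if (Math.abs(current - previous) <= epsilon * Math.abs(current)) {
                return current;
            }
            previous = current;
        }
        throw new ArithmeticException(
                "continued fraction did not converge in " + maxIterations + " iterations");
    }

    public static void main(String[] args) {
        // sqrt(2) = 1 + 1/(2 + 1/(2 + ...)): a_0 = 1, a_n = 2, b_n = 1.
        IterativeContinuedFraction sqrt2 = new IterativeContinuedFraction() {
            @Override
            protected double getA(int n, double x) {
                return n == 0 ? 1.0 : 2.0;
            }

            @Override
            protected double getB(int n, double x) {
                return 1.0;
            }
        };
        System.out.println(sqrt2.evaluate(0.0, 1e-15, 100));
    }
}
```

A single loop with a handful of local doubles replaces one stack frame per term, which is one plausible reason an iterative version runs faster than a recursive one.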

Patches are available on request. Should I just post patches like these to the list as soon as I have them?
