[ https://issues.apache.org/jira/browse/MATH-224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andre Panisson updated MATH-224:
--------------------------------

    Attachment: commons_math.patch

This is the patch I'm using as a solution to my problem.
I have not implemented aggregation for ThirdMoment and FourthMoment, as my skills in statistics don't go that far...

> Utility method to aggregate Statistics
> --------------------------------------
>
>                 Key: MATH-224
>                 URL: https://issues.apache.org/jira/browse/MATH-224
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Andre Panisson
>            Priority: Minor
>         Attachments: commons_math.patch
>
>
> Below is the conversation related to this topic that was posted to the
> Commons Users group.
> -------------------------------------------------
> Hi,
> >
> > I'm writing a complex validation algorithm that performs a K-fold
> > cross-validation using a data set. The data set is partitioned into K
> > subsamples, and of the K subsamples, a single subsample is retained
> > as the validation data for testing, and the remaining K − 1
> > subsamples are used as training data. The process is then repeated K
> > times, and at the end the K results are aggregated into a single
> > result. The problem is that all K results return Statistics objects
> > (org.apache.commons.math.stat.descriptive.SummaryStatistics), and I
> > need to aggregate all K objects into a single Statistics object.
> > I think it is a common problem in the statistics field. Has
> > anyone already implemented a utility method to do it?
> There is no such feature currently in commons-math. The
> SummaryStatistics class wraps a bunch of specialized statistics classes
> (Sum, Mean, Max, SumOfSquares ...) which can be overridden by
> user-provided StorelessUnivariateStatistic implementations.
> So this feature should be added to the StorelessUnivariateStatistic
> interface and all its implementations, with a signature like this:
>     public void aggregate(StorelessUnivariateStatistic otherStatistic);
> The implementation of this method should only use the
> StorelessUnivariateStatistic methods, i.e. getResult() and getN(). This
> seems feasible for the statistics used by SummaryStatistics, but has not
> been done yet.
> One should be aware that SummaryStatistics does not enforce strong
> typing, so one could call aggregate on a Sum instance and pass it a
> Min instance, which would of course produce meaningless results.
> > Or maybe it would be interesting to request it as an Improvement to
> > the Commons Math developers, adding an "aggregator" to all Statistics
> > implementations?
> If you want to request this improvement, please open a ticket for it
> using our JIRA tracking system:
> http://issues.apache.org/jira/browse/MATH. You'll have to register to be
> able to add your feature request. You can also provide a patch if you
> want to contribute it yourself.
> Luc
> >
> > Thanks in advance,
> >
> > Andre Panisson

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
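
For readers following the thread, here is a minimal sketch of what aggregating the K per-fold results could look like. It is not the attached commons_math.patch and it does not use the Commons Math API: the class PartialStats and all of its methods are hypothetical names invented for this illustration. It assumes each fold keeps its count, sum, mean, and sum of squared deviations, so two folds can be merged without revisiting the raw values (the usual pairwise update for mean and variance). Like the patch, it does not attempt the third and fourth moments.

```java
/**
 * Hypothetical accumulator (not part of Commons Math) that keeps enough
 * state to merge two samples without revisiting the raw values.
 */
final class PartialStats {
    private long n;
    private double sum;
    private double mean;
    private double m2; // sum of squared deviations from the running mean
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    /** Adds one observation using a one-pass (Welford-style) update. */
    void addValue(double x) {
        n++;
        sum += x;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
        min = Math.min(min, x);
        max = Math.max(max, x);
    }

    /** Folds another accumulator into this one; the argument is not modified. */
    void aggregate(PartialStats other) {
        if (other.n == 0) {
            return; // nothing to merge
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        // Combine the squared deviations, correcting for the shift in mean.
        m2 += other.m2 + delta * delta * ((double) n * other.n / total);
        mean += delta * other.n / total;
        sum += other.sum;
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
        n = total;
    }

    long getN() { return n; }
    double getSum() { return sum; }
    double getMean() { return n == 0 ? Double.NaN : mean; }
    double getMin() { return n == 0 ? Double.NaN : min; }
    double getMax() { return n == 0 ? Double.NaN : max; }

    /** Sample variance (divides by n - 1); NaN when empty, 0 for one value. */
    double getVariance() {
        if (n == 0) {
            return Double.NaN;
        }
        return n == 1 ? 0.0 : m2 / (n - 1);
    }
}
```

In a K-fold run, one such accumulator per fold could be filled with addValue and then folded into a single overall instance with aggregate. Because the accumulator is a single concrete type, the Sum-versus-Min mix-up Luc warns about cannot arise here, but that is a property of this sketch, not of the proposed interface method.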