Re: [boost] RE: Any interest in a stats class

2003-03-01 Thread Victor A. Wagner, Jr.
At Tuesday 2003/02/25 09:10, you wrote:
> Please remember that stats can be more general.  I frequently use stats for
> complex types.  In that case, mean is also complex, but var is scalar.  The
> proposed implementation doesn't address this.

You sure lost me.  Would you care to point out _where_ the proposed
implementation falls short?
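
The quoted remark about complex samples can be made concrete with a minimal sketch. It is not code from the thread: `complex_mean` and `complex_variance` are hypothetical names, and the only point is that the mean comes back as a complex value while the variance, built from squared magnitudes, is a real scalar.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration; not part of the proposed stats class.

// The mean of complex samples is itself complex.
std::complex<double> complex_mean(const std::vector<std::complex<double> >& xs) {
    std::complex<double> sum(0.0, 0.0);
    for (std::size_t i = 0; i < xs.size(); ++i) sum += xs[i];
    return sum / static_cast<double>(xs.size());
}

// The sample variance accumulates |x - mean|^2 (std::norm returns the squared
// magnitude), so the result is a real scalar. Assumes at least two samples.
double complex_variance(const std::vector<std::complex<double> >& xs) {
    const std::complex<double> m = complex_mean(xs);
    double ss = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) ss += std::norm(xs[i] - m);
    return ss / static_cast<double>(xs.size() - 1);
}
```

A single `value_type` used for both results, as in the proposed `stats` template, cannot express this split between a complex mean and a real variance.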
Victor A. Wagner Jr.  http://rudbek.com
The five most dangerous words in the English language:
  There oughta be a law

___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost


[boost] Re: Any interest in a stats class

2003-02-25 Thread Hubert Holin
Somewhere in the E.U., le 25/02/2003

Bonjour


In article [EMAIL PROTECTED],
 Jason D Schmidt [EMAIL PROTECTED] wrote:

 I know this is well after the discussion on the stats class has ended,
 but I think I have a good idea here.
 
 Scott Kirkwood proposed a class that behaves something like this:
 
   stats myStats;
   for (int i = 0; i < 100; ++i) {
       myStats.add(i);
   }
   cout << "Average: " << myStats.getAverage() << "\n";
   cout << "Max: " << myStats.getMax() << "\n";
   cout << "Standard deviation: " << myStats.getStd() << "\n";
 
 In one of my classes in grad school, I found it quite useful and
 efficient to do statistics on the fly like this, so this stats class
 interests me.  Anyway, Scott has already alluded to the point I'm about
 to make.  I think it's important and useful for this stats class to
 integrate with the STL well.  This example code was inspired by the
 PointAverage example from Effective STL p. 161:
 
 #include <algorithm>
 #include <cmath>
 #include <cstddef>
 #include <functional>

 // this class reports statistics
 template <typename value_type>
 class stats
 {
 public:
     stats(const size_t n, const value_type sum, const value_type sum_sqr):
         m_n(n), m_sum(sum), m_sum_sqr(sum_sqr)
     {}
     value_type sum() const
     { return m_sum; }
     value_type mean() const
     { return m_sum/m_n; }
     // sample variance: sum of squared deviations divided by (n-1)
     value_type var() const
     { return (m_sum_sqr - m_sum*m_sum/m_n) / (m_n-1); }
     value_type delta() const  // aka, standard dev
     { return std::sqrt(var()); }
 private:
     // n is stored as value_type so that the divisions stay in value_type
     value_type m_n, m_sum, m_sum_sqr;
 };

 // this class accumulates results that can be used to
 // compute meaningful statistics
 // (std::unary_function was removed in C++17; drop the base class there)
 template <typename value_type>
 class stats_accum: public std::unary_function<const value_type&, void>
 {
 public:
     stats_accum(): n(0), sum(0), sum_sqr(0)
     {}
     // use this to operate on each value in a range
     void operator()(const value_type& x)
     {
         ++n;
         sum += x;
         sum_sqr += x*x;
     }
     stats<value_type> result() const
     { return stats<value_type>(n, sum, sum_sqr); }
 private:
     size_t n;
     value_type sum, sum_sqr;
 };

 int main(int argc, char *argv[])
 {
     typedef float value_type;
     const size_t n(10);

     float f[n] = {0, 2, 3, 4, 5, 6, 7, 8, 9, 8};

     // accumulate stats over a range of iterators
     stats<value_type> my_stats = std::for_each(f, f+n,
         stats_accum<value_type>()).result();

     value_type m = my_stats.mean();
     value_type s = my_stats.delta();  // aka, standard deviation

     return 0;
 }

In this example, what is the advantage over filling a valarray 
and using a stat class which uses that as a constructor argument? You 
would get sum for free, and hopefully (yeah, right...) operations on 
valarrays could be hardware accelerated, whereas direct coding might not 
be. That is, at least, one of the ideas I encoded in the file I just 
uploaded on Yahoo (statistical_descriptor.h.gz).
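
A minimal sketch of the valarray-based alternative described above, for comparison with the accumulator. The class name `valarray_stats` and its members are assumptions made for illustration; this is not the contents of statistical_descriptor.h, which the thread does not show.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <valarray>

// Hypothetical class for illustration only.
// Takes the whole data set as a constructor argument; assumes size >= 2.
class valarray_stats {
public:
    explicit valarray_stats(const std::valarray<double>& data)
        : n_(data.size()), sum_(data.sum()),   // sum comes "for free"
          mean_(sum_ / static_cast<double>(n_)), var_(0.0) {
        // Whole-array operations: subtract the mean, square, then sum.
        const std::valarray<double> dev = data - mean_;
        const std::valarray<double> sq = dev * dev;
        var_ = sq.sum() / static_cast<double>(n_ - 1);
    }
    std::size_t count() const { return n_; }
    double sum() const { return sum_; }
    double mean() const { return mean_; }
    double variance() const { return var_; }          // sample variance
    double std_dev() const { return std::sqrt(var_); }
private:
    std::size_t n_;
    double sum_, mean_, var_;
};
```

The trade-off is the one discussed in this exchange: this version needs all the data in memory up front, whereas the accumulator needs only three running values.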

 This seems to be pretty similar to what Scott has proposed, and it turns
 out that this method is very fast.  In my tests it has been nearly as
 fast as if we got rid of the classes and used a hand-written loop.  It's
 certainly much faster than storing the data in a std::valarray object,
 and using functions that calculate the mean & standard deviation
 separately.  This is just a neat application of Scott's idea.
 
 I think this stats could be pretty useful for scientific computing, and
 in this example it works very well with the STL and has great
 performance.  I'd like to see more code like this in Boost, but most of
 my work is numerical.  Take my opinion or leave it.
 
 Jason Schmidt

I agree with you that if the size of the population is not 
known in advance, your approach is still usable whereas mine is not 
realistic. But in that case you might have to reset the class periodically 
(if you are doing statistics on the fly and want to test just a sample). 
Your method might also be useful when the amount of data is too big to 
fit in memory at once.
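
A minimal sketch of an on-the-fly accumulator with the periodic reset mentioned above. Note the swap: instead of the sum and sum-of-squares from Jason's code, this uses Welford's running-mean update, which is a different (and usually more accurate) technique. The class name `running_stats` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical streaming accumulator; needs no knowledge of the population size.
class running_stats {
public:
    running_stats() : n_(0), mean_(0.0), m2_(0.0) {}

    // Welford's update: adjust the mean, then the sum of squared deviations.
    void add(double x) {
        ++n_;
        const double delta = x - mean_;
        mean_ += delta / static_cast<double>(n_);
        m2_ += delta * (x - mean_);
    }

    // Start a fresh sample, e.g. once per time window.
    void reset() { n_ = 0; mean_ = 0.0; m2_ = 0.0; }

    std::size_t count() const { return n_; }
    double mean() const { return mean_; }
    // Sample variance; defined as 0 for fewer than two values.
    double variance() const {
        return n_ > 1 ? m2_ / static_cast<double>(n_ - 1) : 0.0;
    }
private:
    std::size_t n_;
    double mean_, m2_;
};
```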

So, we need classes for sequences, either in memory or via some 
iterator, one dimensional or multi dimensional, and we also need classes 
for (experimental) densities.

We also need generators for the usual densities. Since we already 
have implementations of random, we should hitch our code to it. This 
also ties in with the request for special functions such as erf.
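
A minimal sketch of how a density generator and erf fit together. As an assumption for illustration, the standard `<random>` header and `std::erf` (both C++11) stand in here for Boost's random library and the erf function requested in the thread; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// Cumulative distribution function of a normal density, written with erf.
double normal_cdf(double x, double mu, double sigma) {
    return 0.5 * (1.0 + std::erf((x - mu) / (sigma * std::sqrt(2.0))));
}

// Draw `count` values from a normal density and return their mean.
// The generator is seeded explicitly so that runs are repeatable.
double sampled_normal_mean(unsigned seed, std::size_t count,
                           double mu, double sigma) {
    std::mt19937 engine(seed);
    std::normal_distribution<double> dist(mu, sigma);
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) sum += dist(engine);
    return sum / static_cast<double>(count);
}
```

The generator produces samples from the density, while erf gives the matching cumulative probabilities, which is why the two requests go together.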

Since we now have uBlas, we can also try to aim for more complex 
statistical constructs such as Gaussian Mixture Models, though to train 
the Neural Networks which produce them, we also need good optimisation 
code, which we lack completely at present (and which in turn usually 
needs some linear algebra code).

Anybody want to try to get the COOOL (http://coool.mines.edu/)
people aboard Boost?

A bientot

Hubert Holin



RE: [boost] Re: Any interest in a stats class

2003-02-21 Thread Paul A. Bristow
http://scicomp.ewha.ac.kr/netlib/cephes/

for example, but many others according to Google.

(My attempts in using F2C were less than satisfying from a style point of view.

NOT Fortran to C++, if one wanted that ...)

Paul

Dr Paul A Bristow, hetp Chromatography
Prizet Farmhouse, Kendal, Cumbria, LA8 8AB  UK
+44 1539 561830  Mobile +44 7714 33 02 04
mailto:[EMAIL PROTECTED]


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]]On Behalf Of Hubert Holin
 Sent: Monday, February 17, 2003 12:35 PM
 To: [EMAIL PROTECTED]
 Subject: [boost] Re: Any interest in a stats class


 Somewhere in the E.U., le 17/02/2003

 In article [EMAIL PROTECTED],
  Paul A. Bristow [EMAIL PROTECTED] wrote:

   -Original Message-
   From: [EMAIL PROTECTED]
   [mailto:[EMAIL PROTECTED]]On Behalf Of Hubert Holin
   Sent: Friday, February 14, 2003 1:25 PM
   To: [EMAIL PROTECTED]
   Subject: [boost] Re: Any interest in a stats class
  
  
   Somewhere in the E.U., le 14/02/2003
  
  
 There still is the question of whether similarity with NR is a
   problem or not (the language in which the techniques are implemented is
   different, but implementations of the techniques themselves are of
   course basically similar since they refer to the same math construction).
 
  I cannot see this being a serious problem unless we simply lift the NR in C++
  code verbatim.  (Most of it is still in old C style for one thing, despite
  the recent reissue).

   Yes, on that front we should be safe, but then IANAL...

   I am hoping that with uBlas, we can contribute more numerical
   stuff. I have some Gaussian Mixture Models code that I should be
   rewriting in the not too distant future (currently based on an old
   version of TNT, and most of the important pre-processing needed has to
   be done elsewhere, for the then lack of svd).
 
  This would be a most welcome development. uBLAS seems a good starting point.
 
 My old files provide number_of_samples , max, min,
   first_max_index, first_min_index, mean, median, variance,
   standard_deviation, average_deviation, skewness and kurtosis for
   sequences (where appropriate), number_of_bins, mass, first_mode_value,
   first_mode, mean, median, variance, standard_deviation,
   average_deviation, skewness and kurtosis for densities (where
   appropriate).
 
  Sounds a pretty good selection.

   I'll upload my old file in a moment, for inspirational input, and
 make a note in the Wiki, if I can get that to work.


Finally, there is the unsolved matter of the math functions we still
badly need.
  
 Err, I kind of forgot which ones were requested...
 
  Well, all the items in Stephen Moshier's Cephes collection, say: erf, gamma,
  beta, incomplete, gaussian etc etc.  However, we didn't seem to get far with
  agreeing the format for these.  My naive assumption that double erf(double)
  style functions would be enough was criticised by those who wanted fancier
  solutions, some far fancier.

   I either forgot or missed that thread (I lost quite a bit of data
 and hence memory during my OS upgrade, thanks to a faulty ftp
 server...). Would you have a pointer handy?

  In my view getting this far would be a major step forward.  There are major
  problems in accuracy even at double, let alone long double.
 
  There was also talk of an NIST project but I haven't heard of any progress
  yet.


   I just checked the DLMF website (http://dlmf.nist.gov/), and it seems
 they are moving forward albeit slowly (book and free web document in
 2004). At any rate, that document will not, as I understand, include
 actual implementation in a computer language of the functions & al., so
 we should just go ahead and code, perhaps using existing Fortran
 implementations as guidelines (though obviously having the document
 would make the coding *MUCH* easier :-)  ).

  Paul
 
 
   
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]]On Behalf Of Jeff Garland
 Sent: Tuesday, February 11, 2003 4:19 PM
 To: Boost mailing list
 Subject: RE: [boost] Any interest in a stats class


 Scott K wrote:

  Hi all,
  I have a small family of statistics classes which I have used from
  time to time. The one I use most often is simply called stats.
  Here's an example of its use:
  ...details snipped...

 I'm sure there are folks interested in statistical (and other)
 functions.  I've developed exactly this sort of class in the
 past so I understand the utility.  However, I suspect some of
 us would hope statistical algorithms to be formulated as STL
 Algorithm extensions.  Specifically concerning statistics see:

 http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?STLAlgorithmExtensions/StatisticsAlgorithms

 and more generally:

 http://www.crystalclearsoftware.com/cgi-bin/boost_wiki

[boost] Re: Any interest in a stats class

2003-02-17 Thread Hubert Holin
Somewhere in the E.U., le 17/02/2003

In article [EMAIL PROTECTED],
 Paul A. Bristow [EMAIL PROTECTED] wrote:

  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED]]On Behalf Of Hubert Holin
  Sent: Friday, February 14, 2003 1:25 PM
  To: [EMAIL PROTECTED]
  Subject: [boost] Re: Any interest in a stats class
 
 
  Somewhere in the E.U., le 14/02/2003
 
 
There still is the question of whether similarity with NR is a
  problem or not (the language in which the techniques are implemented is
  different, but implementations of the techniques themselves are of
  course basically similar since they refer to the same math construction).
 
 I cannot see this being a serious problem unless we simply lift the NR in C++
 code verbatim.  (Most of it is still in old C style for one thing, despite
 the recent reissue).

  Yes, on that front we should be safe, but then IANAL...

  I am hoping that with uBlas, we can contribute more numerical
  stuff. I have some Gaussian Mixture Models code that I should be
  rewriting in the not too distant future (currently based on an old
  version of TNT, and most of the important pre-processing needed has to
  be done elsewhere, for the then lack of svd).
 
 This would be a most welcome development. uBLAS seems a good starting point.
 
My old files provide number_of_samples , max, min,
  first_max_index, first_min_index, mean, median, variance,
  standard_deviation, average_deviation, skewness and kurtosis for
  sequences (where appropriate), number_of_bins, mass, first_mode_value,
  first_mode, mean, median, variance, standard_deviation,
  average_deviation, skewness and kurtosis for densities (where
  appropriate).
 
 Sounds a pretty good selection.

  I'll upload my old file in a moment, for inspirational input, and 
make a note in the Wiki, if I can get that to work.


   Finally, there is the unsolved matter of the math functions we still
   badly need.
 
Err, I kind of forgot which ones were requested...
 
 Well, all the items in Stephen Moshier's Cephes collection, say: erf, gamma,
 beta, incomplete, gaussian etc etc.  However, we didn't seem to get far with
 agreeing the format for these.  My naive assumption that double erf(double)
 style functions would be enough was criticised by those who wanted fancier
 solutions, some far fancier.

  I either forgot or missed that thread (I lost quite a bit of data 
and hence memory during my OS upgrade, thanks to a faulty ftp 
server...). Would you have a pointer handy?

 In my view getting this far would be a major step forward.  There are major
 problems in accuracy even at double, let alone long double.
 
 There was also talk of an NIST project but I haven't heard of any progress 
 yet.


  I just checked the DLMF website (http://dlmf.nist.gov/), and it seems
they are moving forward albeit slowly (book and free web document in 
2004). At any rate, that document will not, as I understand, include 
actual implementation in a computer language of the functions & al., so 
we should just go ahead and code, perhaps using existing Fortran 
implementations as guidelines (though obviously having the document 
would make the coding *MUCH* easier :-)  ).

 Paul
 
 
  
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]]On Behalf Of Jeff Garland
Sent: Tuesday, February 11, 2003 4:19 PM
To: Boost mailing list
Subject: RE: [boost] Any interest in a stats class
   
   
Scott K wrote:
   
 Hi all,
 I have a small family of statistics classes which I have used from
 time to time. The one I use most often is simply called stats.
 Here's an example of its use:
 ...details snipped...
   
I'm sure there are folks interested in statistical (and other)
functions.  I've developed exactly this sort of class in the
past so I understand the utility.  However, I suspect some of
us would hope statistical algorithms to be formulated as STL
Algorithm extensions.  Specifically concerning statistics see:
   
http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?STLAlgorithmExtensions/StatisticsAlgorithms
   
and more generally:
   
http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?STLAlgorithmExtensions
   
We definitely need volunteers to take these rough Wiki musings and
convert them into actual documented libraries.  I'm not sure this
is what you had in mind, but I, for one, would welcome your effort
either way!
   
Jeff
 
 A Bientot
 
  HH

Hubert




[boost] Re: Any interest in a stats class

2003-02-14 Thread Hubert Holin
Somewhere in the E.U., le 14/02/2003

   Bonjour

In article [EMAIL PROTECTED],
 Paul A. Bristow [EMAIL PROTECTED] wrote:

 Stats are definitely a must-have for Boost, but as ever, the presentation is
 not so easy to agree upon.

  I agree statistical utilities are a must. As many of us likely do, 
I have a few things I can contribute, which I needed for some past work 
(to work with (multi-dimensional) sequences of values, and with 
densities of distributions).

  There still is the question of whether similarity with NR is a 
problem or not (the language in which the techniques are implemented is 
different, but implementations of the techniques themselves are of 
course basically similar since they refer to the same math construction).

  I am hoping that with uBlas, we can contribute more numerical 
stuff. I have some Gaussian Mixture Models code that I should be 
rewriting in the not too distant future (currently based on an old 
version of TNT, and most of the important pre-processing needed has to 
be done elsewhere, for the then lack of svd).


 But it is also crucial to get the most accurate answer, and be able to prove
 it.  For example, B D McCullough, American Statistician Nov 1998 52(4), 358
 and 1999 53(2) 149-159 assessed several stats packages, and some came out
 rather badly - you can guess which was worst, by far!
 
 NIST provide some test datasets
 
 http://www.itl.nist.gov/div898/strd/
 
 against which code can be judged (and some naive algorithms fail badly).
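
A minimal sketch of why a naive algorithm can fail badly. Both functions compute the sample variance; the function names and the test data suggested in the note below are assumptions for illustration and are not taken from the NIST datasets.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One-pass "textbook" formula: (sum(x^2) - sum(x)^2 / n) / (n - 1).
// Subtracting two huge, nearly equal numbers loses most significant digits.
double naive_variance(const std::vector<double>& xs) {
    double sum = 0.0, sum_sqr = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        sum += xs[i];
        sum_sqr += xs[i] * xs[i];
    }
    const double n = static_cast<double>(xs.size());
    return (sum_sqr - sum * sum / n) / (n - 1.0);
}

// Two-pass formula: compute the mean first, then sum squared deviations.
double two_pass_variance(const std::vector<double>& xs) {
    const double n = static_cast<double>(xs.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) sum += xs[i];
    const double mean = sum / n;
    double ss = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        ss += (xs[i] - mean) * (xs[i] - mean);
    }
    return ss / (n - 1.0);
}
```

For well-scaled data the two agree. For values with a large common offset, such as 1e9+4, 1e9+7, 1e9+13 and 1e9+16 (true sample variance 30), the two-pass version stays accurate, while the one-pass version has to subtract quantities near 4e18, where the spacing between adjacent doubles is far larger than the answer.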
 
 Although I can see the benefits of an STL-style, I also have some difficulty
 in imagining how the results returned can be other than reals? Even if we
 'input' integer types, although sum can sensibly also be integer, I have some
 difficulty in seeing how the mean, variance etc are useful as integer types?
 And to expose the unsuspecting user to the risk of surprise seems unhelpful?
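
A minimal sketch of one way to avoid that surprise: accept any iterator range, including one over integers, but always return a real result. The function name `mean_as_real` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Mean of any numeric range, always computed and returned as double,
// so integer input never triggers integer division.
template <typename InputIt>
double mean_as_real(InputIt first, InputIt last) {
    double sum = 0.0;
    std::size_t n = 0;
    for (; first != last; ++first) {
        sum += static_cast<double>(*first);
        ++n;
    }
    return n == 0 ? 0.0 : sum / static_cast<double>(n);
}
```

With the integers 1, 2 and 2, integer arithmetic gives (1 + 2 + 2) / 3 == 1, whereas this function returns five thirds.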
 
 Benefits from STL-style would be most obvious if it can be applied to a
 circular buffer into which new data can be fed while stats can be
 recalculated Kalman filter style.
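
A minimal sketch of the circular-buffer idea, assuming a fixed-size window over the most recent values. The class name `window_stats` is hypothetical, and the running sums used here are the simple one-pass form, so they inherit its rounding weaknesses on badly scaled data.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sliding-window statistics over the last `capacity` values.
class window_stats {
public:
    // capacity must be greater than zero
    explicit window_stats(std::size_t capacity)
        : buf_(capacity, 0.0), head_(0), count_(0), sum_(0.0), sum_sqr_(0.0) {}

    // Feed one new value; once the buffer is full, the oldest value is
    // overwritten and its contribution is removed from the running sums.
    void add(double x) {
        if (count_ == buf_.size()) {
            const double old = buf_[head_];
            sum_ -= old;
            sum_sqr_ -= old * old;
        } else {
            ++count_;
        }
        buf_[head_] = x;
        head_ = (head_ + 1) % buf_.size();
        sum_ += x;
        sum_sqr_ += x * x;
    }

    double mean() const {
        return count_ ? sum_ / static_cast<double>(count_) : 0.0;
    }
    // Sample variance of the values currently in the window.
    double variance() const {
        if (count_ < 2) return 0.0;
        const double n = static_cast<double>(count_);
        return (sum_sqr_ - sum_ * sum_ / n) / (n - 1.0);
    }
private:
    std::vector<double> buf_;
    std::size_t head_, count_;
    double sum_, sum_sqr_;
};
```

Each new value costs constant work, so the statistics can be read after every update without rescanning the buffer.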
 
 While calculating the mean and variance, it is probably worth calculating the
 two higher moments, skewness and kurtosis, too.
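
A minimal sketch of computing the two higher moments alongside the variance. It assumes the population (divide by n) definitions, with skewness as m3 / m2^1.5 and excess kurtosis as m4 / m2^2 - 3; other conventions exist, and the names used here are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct shape_stats {
    double skewness;
    double excess_kurtosis;
};

// Two-pass central moments; assumes a non-empty range with non-zero variance.
shape_stats shape(const std::vector<double>& xs) {
    const double n = static_cast<double>(xs.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) sum += xs[i];
    const double mean = sum / n;

    double m2 = 0.0, m3 = 0.0, m4 = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        const double d = xs[i] - mean;
        m2 += d * d;
        m3 += d * d * d;
        m4 += d * d * d * d;
    }
    m2 /= n;
    m3 /= n;
    m4 /= n;

    shape_stats result;
    result.skewness = m3 / std::pow(m2, 1.5);
    result.excess_kurtosis = m4 / (m2 * m2) - 3.0;
    return result;
}
```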
 
 And of course the median (and some percentiles) are also often more useful
 than the mean.
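
A minimal sketch of the median and a percentile using `std::nth_element`, which avoids a full sort. The percentile uses the nearest-rank definition, which is only one of several in common use; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of a non-empty sample. Takes a copy because nth_element reorders.
double median(std::vector<double> xs) {
    const std::size_t mid = xs.size() / 2;
    std::nth_element(xs.begin(), xs.begin() + mid, xs.end());
    const double upper = xs[mid];
    if (xs.size() % 2 == 1) return upper;
    // Even count: average the two middle values. After nth_element, the
    // lower middle value is the largest element before position mid.
    const double lower = *std::max_element(xs.begin(), xs.begin() + mid);
    return 0.5 * (lower + upper);
}

// Nearest-rank percentile of a non-empty sample, for p in (0, 100].
double percentile(std::vector<double> xs, double p) {
    std::size_t rank = static_cast<std::size_t>(
        std::ceil(p / 100.0 * static_cast<double>(xs.size())));
    if (rank == 0) rank = 1;
    std::nth_element(xs.begin(), xs.begin() + (rank - 1), xs.end());
    return xs[rank - 1];
}
```

Unlike the mean and variance, these order statistics need the data kept around, so they do not fit a pure accumulate-as-you-go interface.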

  My old files provide number_of_samples , max, min, 
first_max_index, first_min_index, mean, median, variance, 
standard_deviation, average_deviation, skewness and kurtosis for 
sequences (where appropriate), number_of_bins, mass, first_mode_value, 
first_mode, mean, median, variance, standard_deviation, 
average_deviation, skewness and kurtosis for densities (where 
appropriate).

 Finally, there is the unsolved matter of the math functions we still badly
 need.

  Err, I kind of forgot which ones were requested...

 Confidence intervals are more informative than standard deviations etc.
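
A minimal sketch of a confidence interval for the mean. It assumes the normal approximation, with 1.96 as the two-sided 95% quantile; for small samples a Student t quantile would be needed instead, which in turn depends on the special functions discussed in this thread. The names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct interval {
    double lo;
    double hi;
};

// Approximate 95% confidence interval for the mean: mean +/- 1.96 * s / sqrt(n).
// Assumes at least two samples.
interval mean_confidence_95(const std::vector<double>& xs) {
    const double n = static_cast<double>(xs.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) sum += xs[i];
    const double mean = sum / n;

    double ss = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        ss += (xs[i] - mean) * (xs[i] - mean);
    }
    const double std_dev = std::sqrt(ss / (n - 1.0));   // sample std deviation
    const double half_width = 1.96 * std_dev / std::sqrt(n);

    interval result;
    result.lo = mean - half_width;
    result.hi = mean + half_width;
    return result;
}
```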
 
 Paul
 
 Dr Paul A Bristow, hetp Chromatography
 Prizet Farmhouse, Kendal, Cumbria, LA8 8AB  UK
 +44 1539 561830  Mobile +44 7714 33 02 04
 mailto:[EMAIL PROTECTED]
 
 
  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED]]On Behalf Of Jeff Garland
  Sent: Tuesday, February 11, 2003 4:19 PM
  To: Boost mailing list
  Subject: RE: [boost] Any interest in a stats class
 
 
  Scott K wrote:
 
   Hi all,
   I have a small family of statistics classes which I have used from time
   to time. The one I use most often is simply called stats.
   Here's an example of its use:
   ...details snipped...
 
  I'm sure there are folks interested in statistical (and other)
  functions.  I've developed exactly this sort of class in the
  past so I understand the utility.  However, I suspect some of
  us would hope statistical algorithms to be formulated as STL
  Algorithm extensions.  Specifically concerning statistics see:
 
  http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?STLAlgorithmExtensions/StatisticsAlgorithms
 
  and more generally:
 
  http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?STLAlgorithmExtensions
 
  We definitely need volunteers to take these rough Wiki musings and
  convert them into actual documented libraries.  I'm not sure this
  is what you had in mind, but I, for one, would welcome your effort
  either way!
 
  Jeff

   A Bientot

HH




RE: [boost] Re: Any interest in a stats class

2003-02-14 Thread Paul A. Bristow


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]]On Behalf Of Hubert Holin
 Sent: Friday, February 14, 2003 1:25 PM
 To: [EMAIL PROTECTED]
 Subject: [boost] Re: Any interest in a stats class


 Somewhere in the E.U., le 14/02/2003


   There still is the question of whether similarity with NR is a
 problem or not (the language in which the techniques are implemented is
 different, but implementations of the techniques themselves are of
 course basically similar since they refer to the same math construction).

I cannot see this being a serious problem unless we simply lift the NR in C++
code verbatim.  (Most of it is still in old C style for one thing, despite the
recent reissue).


 I am hoping that with uBlas, we can contribute more numerical
 stuff. I have some Gaussian Mixture Models code that I should be
 rewriting in the not too distant future (currently based on an old
 version of TNT, and most of the important pre-processing needed has to
 be done elsewhere, for the then lack of svd).

This would be a most welcome development. uBLAS seems a good starting point.

   My old files provide number_of_samples , max, min,
 first_max_index, first_min_index, mean, median, variance,
 standard_deviation, average_deviation, skewness and kurtosis for
 sequences (where appropriate), number_of_bins, mass, first_mode_value,
 first_mode, mean, median, variance, standard_deviation,
 average_deviation, skewness and kurtosis for densities (where
 appropriate).

Sounds a pretty good selection.

  Finally, there is the unsolved matter of the math functions we still badly
  need.

   Err, I kind of forgot which ones were requested...

Well, all the items in Stephen Moshier's Cephes collection, say: erf, gamma,
beta, incomplete, gaussian etc etc.  However, we didn't seem to get far with
agreeing the format for these.  My naive assumption that double erf(double)
style functions would be enough was criticised by those who wanted fancier
solutions, some far fancier.

In my view getting this far would be a major step forward.  There are major
problems in accuracy even at double, let alone long double.

There was also talk of an NIST project but I haven't heard of any progress yet.

Paul


 
   -Original Message-
   From: [EMAIL PROTECTED]
   [mailto:[EMAIL PROTECTED]]On Behalf Of Jeff Garland
   Sent: Tuesday, February 11, 2003 4:19 PM
   To: Boost mailing list
   Subject: RE: [boost] Any interest in a stats class
  
  
   Scott K wrote:
  
Hi all,
I have a small family of statistics classes which I have used from time
to time. The one I use most often is simply called stats.
Here's an example of its use:
...details snipped...
  
   I'm sure there are folks interested in statistical (and other)
   functions.  I've developed exactly this sort of class in the
   past so I understand the utility.  However, I suspect some of
   us would hope statistical algorithms to be formulated as STL
   Algorithm extensions.  Specifically concerning statistics see:
  
   http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?STLAlgorithmExtensions/StatisticsAlgorithms
  
   and more generally:
  
   http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?STLAlgorithmExtensions
  
   We definitely need volunteers to take these rough Wiki musings and
   convert them into actual documented libraries.  I'm not sure this
   is what you had in mind, but I, for one, would welcome your effort
   either way!
  
   Jeff

A Bientot

 HH




[boost] Re: Any interest in a stats class

2003-02-11 Thread Scott Kirkwood
Well what do you know...
The order_2_accumulator class on that page looks just like my stats
class. I threw in min and max and have more functions, but otherwise
it's the same.

-Scott

Jeff Garland wrote:

...  Specifically concerning statistics see:

http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?STLAlgorithmExtensions/StatisticsAlgorithms


