cvs commit: jakarta-commons-sandbox/math/src/test/org/apache/commons/math/stat CertifiedDataTest.java

mdiggory Mon, 16 Jun 2003 07:29:18 -0700

mdiggory    2003/06/16 07:29:31

  Modified:    math/xdocs developers.xml
               math/src/java/org/apache/commons/math/stat
                        UnivariateImpl.java
               math/src/test/org/apache/commons/math/stat
                        CertifiedDataTest.java
  Log:
  PR: http://nagoya.apache.org/bugzilla/show_bug.cgi?id=20782
  Submitted by: [EMAIL PROTECTED]
  
  I added this, but there are changes I'd like to make in the near future. Only the
  "running" aspects of the variance calculation should be in the insertValue function;
  all other calculation should be in the getVariance function.
  
  Revision  Changes    Path
  1.5       +44 -40    jakarta-commons-sandbox/math/xdocs/developers.xml
  
  Index: developers.xml
  ===================================================================
  RCS file: /home/cvs/jakarta-commons-sandbox/math/xdocs/developers.xml,v
  retrieving revision 1.4
  retrieving revision 1.5
  diff -u -r1.4 -r1.5
  --- developers.xml    4 Jun 2003 02:40:26 -0000       1.4
  +++ developers.xml    16 Jun 2003 14:29:30 -0000      1.5
  @@ -9,87 +9,87 @@
    <body>
     <section name="Aims">
      <p>
  -    Creating and maintaining a mathematical and statistical library that is 
  -    accurate requires a greater degree of communication than might be the 
  -    case for other components. It is important that developers follow 
  -    guidelines laid down by the community to ensure that the code they create 
  +    Creating and maintaining a mathematical and statistical library that is
  +    accurate requires a greater degree of communication than might be the
  +    case for other components. It is important that developers follow
  +    guidelines laid down by the community to ensure that the code they create
       can be successfully maintained by others.
       </p>
      </section>
      <section name='Guidelines'>
      <p>
  -   Developers are asked to comply with the following development guidelines. 
  +   Developers are asked to comply with the following development guidelines.
      Code that does not comply with the guidelines including the word <i>must</i>
      will not be committed.  Our aim will be to fix all of the exceptions to the
      "<i>should</i>" guidelines prior to a release.
      </p>
      <subsection name='Coding Style'>
       <p>
  -     Commons-math follows <a href="http://java.sun.com/docs/codeconv/">Code 
  +     Commons-math follows <a href="http://java.sun.com/docs/codeconv/">Code
        Conventions for the Java Programming Language</a>. As part of the maven
        build process, style checking is performed using the checkStyle plugin,
  -     using the properties specified in <code>checkStyle.properties</code>.  
  +     using the properties specified in <code>checkStyle.properties</code>.
        Committed code <i>should</i> generate no checkStyle errors.
       </p>
      </subsection>
  -   <subsection name='Documentation'>  
  +   <subsection name='Documentation'>
       <ul>
        <li>
  -      Committed code <i>must</i> include full javadoc.</li> 
  +      Committed code <i>must</i> include full javadoc.</li>
        <li>
  -      All component contracts <i>must</i> be fully specified in the javadoc class, 
  -      interface or method comments, including specification of acceptable ranges 
  -      of values, exceptions or special return values.</li>      
  +      All component contracts <i>must</i> be fully specified in the javadoc class,
  +      interface or method comments, including specification of acceptable ranges
  +      of values, exceptions or special return values.</li>
        <li>
  -      References to definitions for all mathematical 
  +      References to definitions for all mathematical
         terms used in component documentation <i>must</i> be provided, preferably
         as HTML links.</li>
        <li>
  -      Implementations <i>should</i> use standard algorithms and 
  +      Implementations <i>should</i> use standard algorithms and
         references to algorithm descriptions <i>should</i> be provided,
         preferably as HTML links.</li>
       </ul>
  -   </subsection>  
  -   <subsection name='Unit Tests'>  
  +   </subsection>
  +   <subsection name='Unit Tests'>
       <ul>
        <li>
  -      Committed code <i>must</i> include unit tests.</li> 
  +      Committed code <i>must</i> include unit tests.</li>
        <li>
  -      Unit tests <i>should</i> provide full path coverage. </li>     
  +      Unit tests <i>should</i> provide full path coverage. </li>
        <li>
  -      Unit tests <i>should</i> verify all boundary conditions specified in 
  +      Unit tests <i>should</i> verify all boundary conditions specified in
         interface contracts, including verification that exceptions are thrown or
  -      special values (e.g. Double.NaN, Double.Infinity) are returned as 
  +      special values (e.g. Double.NaN, Double.Infinity) are returned as
         expected. </li>
  -    </ul> 
  +    </ul>
      </subsection>
      <subsection name='Licensing and copyright'>
       <ul>
        <li>
  -      All new source file submissions <i>must</i> include the Apache Software 
  +      All new source file submissions <i>must</i> include the Apache Software
         License in a comment that begins the file </li>
        <li>
  -      All contributions must comply with the terms of the 
  +      All contributions must comply with the terms of the
         <a href="http://www.apache.org/foundation/ASF_Contributor_License_1_form.pdf">
         Apache Contributor License Agreement (CLA)</a></li>
  -     <li> 
  -       Patches <i>must</i> be accompanied by a clear reference to a "source" 
  -       - if code has been "ported" from another language, clearly state the 
  +     <li>
  +       Patches <i>must</i> be accompanied by a clear reference to a "source"
  +       - if code has been "ported" from another language, clearly state the
          source of the original implementation.  If the "expression" of a given
          algorithm is derivative, please note the original source (textbook,
  -       paper, etc.).</li> 
  +       paper, etc.).</li>
        <li>
          References to source materials covered by restrictive proprietary
          licenses should be avoided.</li>
       </ul>
  -   </subsection>            
  -  </section> 
  +   </subsection>
  +  </section>
     <section name='Recommended Readings'>
      <p>
       Here is a list of relevant materials.  Much of the discussion surrounding
  -    the development of this component will refer to the various sources 
  -    listed below, and frequently the Javadoc for a particular class or 
  -    interface will link to a definition contained in these documents. 
  +    the development of this component will refer to the various sources
  +    listed below, and frequently the Javadoc for a particular class or
  +    interface will link to a definition contained in these documents.
      </p>
      <subsection name='Recommended Readings'>
       <dl>
  @@ -104,21 +104,21 @@
        </dd>
        <dt>Numerical analysis</dt>
        <dd>
  -      <a href="http://www.nr.com/ ">
  +      <a href="http://www.nr.com/">
          Numerical Recipes (NR)
         </a><br/>
         <a href="http://www.mathcom.com/corpdir/techinfo.mdir/scifaq/index.html">
  -       mathcom scientific computing FAQ
  +        Scientific Computing FAQ @ Mathcom
         </a><br/>
         <a href="http://www.ma.man.ac.uk/~higham/asna/asna2.pdf">
          Bibliography of accuracy and stability of numerical algorithms
         </a><br/>
          <a href="http://tonic.physics.sunysb.edu/docs/num_meth.html">
  -       SUNY StonyBrook numerical methods page
  +       SUNY Stony Brook numerical methods page
         </a><br/>
          <a href="http://epubs.siam.org/sam-bin/dbq/toclist/SINUM">
          SIAM Journal of Numerical Analysis Online
  -      </a><br/>    
  +      </a><br/>
        </dd>
        <dt>Probability and statistics</dt>
        <dd>
  @@ -132,10 +132,10 @@
          Online Introductory Statistics (David W. Stockburger)
         </a><br/>
          <a href="http://www.ubmail.ubalt.edu/~harsham/statistics/REFSTAT.HTM">
  -       Probablilty and Statistics Resources
  +       Probablility and Statistics Resources
         </a><br/>
          <a href="http://www.jstatsoft.org/">
  -       Online journal of statistical software
  +       Online Journal of Statistical Software
         </a><br/>
        </dd>
       </dl>
  @@ -147,11 +147,15 @@
         <a href="http://rd11.web.cern.ch/RD11/rkb/titleA.html">
          http://rd11.web.cern.ch/RD11/rkb/titleA.html
         </a><br/>
  -      <a href="http://mathworld.wolfram.com">
  -       http://mathworld.wolfram.com
  +      <a href="http://mathworld.wolfram.com/">
  +       http://mathworld.wolfram.com/
         </a><br/>
         <a href="http://www.itl.nist.gov/div898/handbook">
          http://www.itl.nist.gov/div898/handbook
  +      </a><br/>
  +      <a href="http://doi.acm.org/10.1145/359146.359152">
  +       Chan, T. F. and J. G. Lewis 1979, <i>Communications of the ACM</i>,
  +       vol. 22 no. 9, pp. 526-531.
         </a><br/>
         <a href="http://www.itl.nist.gov/div898/handbook">
          http://www.wikipedia.org/wiki/
  
  
  
  1.5       +95 -80    jakarta-commons-sandbox/math/src/java/org/apache/commons/math/stat/UnivariateImpl.java
  
  Index: UnivariateImpl.java
  ===================================================================
  RCS file: /home/cvs/jakarta-commons-sandbox/math/src/java/org/apache/commons/math/stat/UnivariateImpl.java,v
  retrieving revision 1.4
  retrieving revision 1.5
  diff -u -r1.4 -r1.5
  --- UnivariateImpl.java       14 Jun 2003 04:17:49 -0000      1.4
  +++ UnivariateImpl.java       16 Jun 2003 14:29:30 -0000      1.5
  @@ -60,41 +60,30 @@
   
   /**
    *
  - * Accumulates univariate statistics for values fed in 
  + * Accumulates univariate statistics for values fed in
    * through the addValue() method.  Does not store raw data values.
    * All data are represented internally as doubles.
  - * Integers, floats and longs can be added, but will be converted
  - * to doubles by addValue().  
  + * Integers, floats and longs can be added, but they will be converted
  + * to doubles by addValue().
    *
    * @author Phil Steitz
    * @author <a href="mailto:[EMAIL PROTECTED]">Tim O'Brien</a>
    * @author <a href="mailto:[EMAIL PROTECTED]">Mark Diggory</a>
    * @author Brent Worden
  + * @author <a href="mailto:[EMAIL PROTECTED]">Albert Davidson Chou</a>
    * @version $Revision$ $Date$
  - * 
  + *
   */
   public class UnivariateImpl implements Univariate, Serializable {
   
       /** hold the window size **/
       private int windowSize = Univariate.INFINITE_WINDOW;
   
  -    /** Just in case, the windowSize is not inifinite, we need to
  +    /** Just in case the windowSize is not infinite, we need to
        *  keep an array to remember values 0 to N
        */
       private DoubleArray doubleArray;
   
  -    /** running sum of values that have been added */
  -    private double sum = 0.0;
  -
  -    /** running sum of squares that have been added */
  -    private double sumsq = 0.0;
  -
  -    /** running sum of 3rd powers that have been added */
  -    private double sumCube = 0.0;
  -    
  -    /** running sum of 4th powers that have been added */
  -    private double sumQuad = 0.0;
  -    
       /** count of values that have been added */
       private int n = 0;
   
  @@ -107,18 +96,38 @@
       /** product of values that have been added */
       private double product = Double.NaN;
   
  -    /** Creates new univariate with an inifinite window */
  +    /** mean of values that have been added */
  +    private double mean = Double.NaN ;
  +
  +    /** running ( variance * (n - 1) ) of values that have been added */
  +    private double pre_variance = Double.NaN ;
  +
  +    /** variance of values that have been added */
  +    private double variance = Double.NaN ;
  +
  +    /** running sum of values that have been added */
  +    private double sum = 0.0;
  +
  +    /** running sum of squares that have been added */
  +    private double sumsq = 0.0;
  +
  +    /** running sum of 3rd powers that have been added */
  +    private double sumCube = 0.0;
  +
  +    /** running sum of 4th powers that have been added */
  +    private double sumQuad = 0.0;
  +
  +    /** Creates new univariate with an infinite window */
       public UnivariateImpl() {
           clear();
       }
  -    
  +
       /** Creates a new univariate with a fixed window **/
       public UnivariateImpl(int window) {
           windowSize = window;
           doubleArray = new FixedDoubleArray( window );
       }
   
  -     
       /**
        * @see org.apache.commons.math.stat.Univariate#addValue(double)
        */
  @@ -126,25 +135,19 @@
           insertValue(v);
       }
   
  -    
       /**
        * @see org.apache.commons.math.stat.Univariate#getMean()
        */
       public double getMean() {
  -        if (n == 0) {
  -            return Double.NaN;
  -        } else {
  -            return (sum / (double) n );
  -        }
  -     }
  +        return mean ;
  +    }
   
  -     
       /**
        * @see org.apache.commons.math.stat.Univariate#getGeometricMean()
        */
       public double getGeometricMean() {
           if ((product <= 0.0) || (n == 0)) {
  -            return Double.NaN; 
  +            return Double.NaN;
           } else {
               return Math.pow(product,( 1.0 / (double) n ) );
           }
  @@ -162,76 +165,71 @@
        */
       public double getStandardDeviation() {
           double variance = getVariance();
  +
           if ((variance == 0.0) || (variance == Double.NaN)) {
               return variance;
           } else {
               return Math.sqrt(variance);
           }
       }
  -    
  +
       /**
  -     * Returns the variance of the values that have been added as described by
  -     * <a href="http://mathworld.wolfram.com/k-Statistic.html">Equation (5) for k-Statistics</a>.
  -     * 
  +     * Returns the variance of the values that have been added via West's
  +     * algorithm as described by
  +     * <a href="http://doi.acm.org/10.1145/359146.359152">Chan, T. F. and
  +     * J. G. Lewis 1979, <i>Communications of the ACM</i>,
  +     * vol. 22 no. 9, pp. 526-531.</a>.
  +     *
        * @return The variance of a set of values.  Double.NaN is returned for
        *         an empty set of values and 0.0 is returned for a &lt;= 1 value set.
        */
       public double getVariance() {
  -        double variance = Double.NaN;
  -
  -        if( n == 1 ) {
  -            variance = 0.0;
  -        } else if( n > 1 ) {
  -            variance = (((double) n) * sumsq - (sum * sum)) / (double) (n * (n - 1));    
  -        }
  -
  -        return variance < 0 ? 0.0 : variance;
  +        return variance ;
       }
  -     
  +
       /**
        * Returns the skewness of the values that have been added as described by
        * <a href="http://mathworld.wolfram.com/k-Statistic.html">Equation (6) for k-Statistics</a>.
  -     * 
  +     *
        * @return The skew of a set of values.  Double.NaN is returned for
        *         an empty set of values and 0.0 is returned for a &lt;= 2 value set.
        */
       public double getSkewness() {
  -        
  +
           if( n < 1) return Double.NaN;
  -        if( n <= 2 ) return 0.0;                  
  -            
  -        return ( 2 * Math.pow(sum, 3) - 3 * sum * sumsq + ((double) (n * n)) * sumCube ) / 
  -               ( (double) (n * (n - 1) * (n - 2)) ) ;  
  +        if( n <= 2 ) return 0.0;
  +
  +        return ( 2 * Math.pow(sum, 3) - 3 * sum * sumsq + ((double) (n * n)) * sumCube ) /
  +               ( (double) (n * (n - 1) * (n - 2)) ) ;
       }
  -    
  +
       /**
        * Returns the kurtosis of the values that have been added as described by
        * <a href="http://mathworld.wolfram.com/k-Statistic.html">Equation (7) for k-Statistics</a>.
  -     * 
  +     *
        * @return The kurtosis of a set of values.  Double.NaN is returned for
        *         an empty set of values and 0.0 is returned for a &lt;= 3 value set.
        */
       public double getKurtosis() {
  -        
  +
           if( n < 1) return Double.NaN;
           if( n <= 3 ) return 0.0;
  -        
  +
           double x1 = -6 * Math.pow(sum, 4);
           double x2 = 12 * ((double) n) * Math.pow(sum, 2) * sumsq;
           double x3 = -3 * ((double) (n * (n - 1))) * Math.pow(sumsq,2);
           double x4 = -4 * ((double) (n * (n + 1))) * sum * sumCube;
           double x5 = Math.pow(((double) n),2) * ((double) (n+1)) * sumQuad;
  -        
  -        return (x1 + x2 + x3 + x4 + x5) / 
  +
  +        return (x1 + x2 + x3 + x4 + x5) /
                  ( (double) (n * (n - 1) * (n - 2) * (n - 3)) );
  -    } 
  -    
  +    }
  +
       /**
        * Called in "addValue" to insert a new value into the statistic.
        * @param v The value to be added.
        */
       private void insertValue(double v) {
  -
           // The default value of product is NaN, if you
           // try to retrieve the product for a univariate with
           // no values, we return NaN.
  @@ -239,8 +237,14 @@
           // If this is the first call to insertValue, we want
           // to set product to 1.0, so that our first element
           // is not "cancelled" out by the NaN.
  +        //
  +        // For the first value added, the mean is that value,
  +        // and the variance is zero.
           if( n == 0 ) {
  -            product = 1.0;
  +            product = 1.0 ;
  +            mean = v ;
  +            pre_variance = 0.0 ;
  +            variance = 0.0 ;
           }
   
           if( windowSize != Univariate.INFINITE_WINDOW ) {
  @@ -251,17 +255,17 @@
                   sum -= discarded;
                   sumsq -= discarded * discarded;
                   sumCube -= Math.pow(discarded, 3);
  -                sumQuad -= Math.pow(discarded, 4); 
  -                
  +                sumQuad -= Math.pow(discarded, 4);
  +
                   if(discarded == min) {
                       min = doubleArray.getMin();
                   } else if(discarded == max){
                       max = doubleArray.getMax();
  -                } 
  -                
  +                }
  +
                   if(product != 0.0){
                       // can safely remove discarded value
  -                    product *=  v / discarded;
  +                    product *= v / discarded;
                   } else if(discarded == 0.0){
                       // need to recompute product
                       product = 1.0;
  @@ -272,8 +276,8 @@
                   } // else product = 0 and will still be 0 after discard
   
               } else {
  -                doubleArray.addElement( v );            
  -                n += 1.0;
  +                doubleArray.addElement( v );
  +                n += 1 ;
                   if (v < min) {
                       min = v;
                   }
  @@ -283,19 +287,28 @@
                   product *= v;
               }
           } else {
  -            // If the windowSize is inifinite please don't take the time to
  +            // If the windowSize is infinite please don't take the time to
               // worry about storing any values.  We don't need to discard the
               // influence of any single item.
  -            n += 1.0;
  +            n += 1 ;
               if (v < min) {
                   min = v;
  -            } 
  +            }
               if (v > max) {
                   max = v;
  -            } 
  +            }
               product *= v;
  +
  +            if ( n > 1 )
  +            {
  +                double deviationFromMean = v - mean ;
  +                double deviationFromMean_overN = deviationFromMean / n ;
  +                mean += deviationFromMean_overN ;
  +                pre_variance += (n - 1) * deviationFromMean * deviationFromMean_overN ;
  +                variance = pre_variance / (n - 1) ;
  +            }
           }
  -        
  +
           sum += v;
           sumsq += v * v;
           sumCube += Math.pow(v,3);
  @@ -306,7 +319,7 @@
        * @return Value of property max.
        */
       public double getMax() {
  -        if (n == 0) { 
  +        if (n == 0) {
               return Double.NaN;
           } else {
               return max;
  @@ -317,7 +330,7 @@
        * @return Value of property min.
        */
       public double getMin() {
  -        if (n == 0) { 
  +        if (n == 0) {
               return Double.NaN;
           } else {
               return min;
  @@ -351,16 +364,16 @@
       public double getSumCube() {
           return sumCube;
       }
  -    
  +
       /** Getter for property sumQuad.
        * @return Value of property sumQuad.
        */
       public double getSumQuad() {
           return sumQuad;
       }
  -    
  +
       /**
  -     * Generates a text report displaying 
  +     * Generates a text report displaying
        * univariate statistics from values that
        * have been added.
        * @return String with line feeds displaying statistics
  @@ -377,9 +390,9 @@
           outBuffer.append("kurtosis: " + getKurtosis() + "\n");
           return outBuffer.toString();
       }
  -    
  -    /** 
  -     * Resets all sums to 0, resets min and max 
  +
  +    /**
  +     * Resets all sums, product, mean, and variance to 0; resets min and max.
        */
       public void clear() {
           this.sum = this.sumsq = this.sumCube = this.sumQuad = 0.0;
  @@ -387,6 +400,8 @@
           this.min = Double.MAX_VALUE;
           this.max = Double.MIN_VALUE;
           this.product = Double.NaN;
  +        this.mean = Double.NaN ;
  +        this.variance = this.pre_variance = Double.NaN ;
       }
   
       /* (non-Javadoc)
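
A minimal sketch of the split proposed in the log message, assuming a hypothetical class RunningVariance that is not part of this commit. addValue performs only the running update used in the insertValue change above (West's updating formula), and getVariance performs the final division by n - 1 on request. The class, field, and method names are illustrative only.

```java
/** Hypothetical sketch; not the committed UnivariateImpl. */
class RunningVariance {
    /** count of values that have been added */
    private int n = 0;

    /** running mean of values that have been added */
    private double mean = Double.NaN;

    /** running sum of squared deviations from the mean, i.e. variance * (n - 1) */
    private double preVariance = Double.NaN;

    /** Only the "running" part of the calculation happens here. */
    public void addValue(double v) {
        n += 1;
        if (n == 1) {
            // For the first value, the mean is that value and the spread is zero.
            mean = v;
            preVariance = 0.0;
        } else {
            double deviationFromMean = v - mean;
            double deviationFromMeanOverN = deviationFromMean / n;
            mean += deviationFromMeanOverN;
            preVariance += (n - 1) * deviationFromMean * deviationFromMeanOverN;
        }
    }

    public double getMean() {
        return mean;
    }

    /** The final division is deferred until the variance is requested. */
    public double getVariance() {
        if (n == 0) {
            return Double.NaN;
        }
        if (n == 1) {
            return 0.0;
        }
        return preVariance / (n - 1);
    }

    public static void main(String[] args) {
        RunningVariance rv = new RunningVariance();
        double[] values = {2, 4, 4, 4, 5, 5, 7, 9};
        for (int i = 0; i < values.length; i++) {
            rv.addValue(values[i]);
        }
        System.out.println("mean = " + rv.getMean());
        System.out.println("variance = " + rv.getVariance());
    }
}
```

With this arrangement the stored state is just n, mean, and the running sum of squared deviations; no separate variance field has to be kept in sync on every insert.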
  
  
  
  1.6       +5 -5      jakarta-commons-sandbox/math/src/test/org/apache/commons/math/stat/CertifiedDataTest.java
  
  Index: CertifiedDataTest.java
  ===================================================================
  RCS file: /home/cvs/jakarta-commons-sandbox/math/src/test/org/apache/commons/math/stat/CertifiedDataTest.java,v
  retrieving revision 1.5
  retrieving revision 1.6
  diff -u -r1.5 -r1.6
  --- CertifiedDataTest.java    4 Jun 2003 04:03:55 -0000       1.5
  +++ CertifiedDataTest.java    16 Jun 2003 14:29:30 -0000      1.6
  @@ -118,8 +118,8 @@
                assertEquals("Lottery: mean", mean, u.getMean(), .000000000001);
   
                loadStats("data/PiDigits.txt");
  -             assertEquals("PiDigits: std", std, u.getStandardDeviation(), .00000000000001);
  -             assertEquals("PiDigits: mean", mean, u.getMean(), .00000000000001);
  +             assertEquals("PiDigits: std", std, u.getStandardDeviation(), .0000000000001);
  +             assertEquals("PiDigits: mean", mean, u.getMean(), .0000000000001);
   
                loadStats("data/Mavro.txt");
                assertEquals("Mavro: std", std, u.getStandardDeviation(), .00000000000001);
  @@ -154,8 +154,8 @@
                assertEquals("Lottery: mean", mean, u.getMean(), .000000000001);
   
                loadStats("data/PiDigits.txt");
  -             assertEquals("PiDigits: std", std, u.getStandardDeviation(), .00000000000001);
  -             assertEquals("PiDigits: mean", mean, u.getMean(), .00000000000001);
  +             assertEquals("PiDigits: std", std, u.getStandardDeviation(), .0000000000001);
  +             assertEquals("PiDigits: mean", mean, u.getMean(), .0000000000001);
   
                loadStats("data/Mavro.txt");
                assertEquals("Mavro: std", std, u.getStandardDeviation(), .00000000000001);
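
One detail in the UnivariateImpl diff above may be worth a short illustration: getStandardDeviation tests variance == Double.NaN, and in Java any == comparison involving NaN evaluates to false. The result is still NaN for an empty set, because Math.sqrt(NaN) returns NaN, so this is a readability point rather than a behavior change. The sketch below assumes a hypothetical helper, standardDeviation(double), that only mirrors the shape of that method and uses Double.isNaN instead.

```java
/** Hypothetical sketch; mirrors the shape of getStandardDeviation only. */
class StdDevNaNCheck {

    static double standardDeviation(double variance) {
        // "variance == Double.NaN" is always false, so test with Double.isNaN.
        if (variance == 0.0 || Double.isNaN(variance)) {
            return variance;
        }
        return Math.sqrt(variance);
    }

    public static void main(String[] args) {
        // NaN never compares equal to anything, including itself.
        System.out.println(Double.NaN == Double.NaN);
        System.out.println(Double.isNaN(standardDeviation(Double.NaN)));
        System.out.println(standardDeviation(4.0));
    }
}
```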

