Author: psteitz
Date: Sun Jun 24 14:10:19 2007
New Revision: 550285

URL: http://svn.apache.org/viewvc?view=rev&rev=550285
Log:
Added two-sample (binned comparison) ChiSquare test
JIRA: MATH-160
Thanks to: Matthias Hummel


Modified:
    jakarta/commons/proper/math/trunk/src/java/org/apache/commons/math/stat/inference/ChiSquareTest.java
    jakarta/commons/proper/math/trunk/src/java/org/apache/commons/math/stat/inference/ChiSquareTestImpl.java
    jakarta/commons/proper/math/trunk/src/java/org/apache/commons/math/stat/inference/TestUtils.java
    jakarta/commons/proper/math/trunk/src/test/org/apache/commons/math/stat/inference/ChiSquareTestTest.java
    jakarta/commons/proper/math/trunk/xdocs/changes.xml

Modified: jakarta/commons/proper/math/trunk/src/java/org/apache/commons/math/stat/inference/ChiSquareTest.java
URL: http://svn.apache.org/viewvc/jakarta/commons/proper/math/trunk/src/java/org/apache/commons/math/stat/inference/ChiSquareTest.java?view=diff&rev=550285&r1=550284&r2=550285
==============================================================================
--- jakarta/commons/proper/math/trunk/src/java/org/apache/commons/math/stat/inference/ChiSquareTest.java (original)
+++ jakarta/commons/proper/math/trunk/src/java/org/apache/commons/math/stat/inference/ChiSquareTest.java Sun Jun 24 14:10:19 2007
@@ -211,4 +211,118 @@
      */
     boolean chiSquareTest(long[][] counts, double alpha) 
     throws IllegalArgumentException, MathException;
+
+    /**
+     * <p>Computes a
+     * <a href="http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/chi2samp.htm">
+     * Chi-Square two sample test statistic</a> comparing bin frequency counts
+     * in <code>observed1</code> and <code>observed2</code>.  The
+     * sums of frequency counts in the two samples are not required to be the
+     * same.  The formula used to compute the test statistic is</p>
+     * <p><code>
+     * &sum;[(K * observed1[i] - observed2[i]/K)<sup>2</sup> / (observed1[i] + observed2[i])]
+     * </code> where
+     * <br/><code>K = &radic;[&sum;(observed2) / &sum;(observed1)]</code>
+     * </p>
+     * <p>This statistic can be used to perform a Chi-Square test evaluating the null hypothesis that
+     * both observed counts follow the same distribution.
+     * <p>
+     * <strong>Preconditions</strong>: <ul>
+     * <li>Observed counts must be non-negative.
+     * </li>
+     * <li>Observed counts for a specific bin must not both be zero.
+     * </li>
+     * <li>Observed counts for a specific sample must not all be 0.
+     * </li>
+     * <li>The arrays <code>observed1</code> and <code>observed2</code> must have the same length and
+     * their common length must be at least 2.
+     * </li></ul><p>
+     * If any of the preconditions are not met, an
+     * <code>IllegalArgumentException</code> is thrown.
+     *
+     * @param observed1 array of observed frequency counts of the first data set
+     * @param observed2 array of observed frequency counts of the second data set
+     * @return chiSquare statistic
+     * @throws IllegalArgumentException if preconditions are not met
+     */
+    double chiSquareDataSetsComparison(long[] observed1, long[] observed2)
+       throws IllegalArgumentException;
+
+    /**
+     * <p>Returns the <i>observed significance level</i>, or <a href=
+     * "http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
+     * p-value</a>, associated with a Chi-Square two sample test comparing
+     * bin frequency counts in <code>observed1</code> and 
+     * <code>observed2</code>.
+     * </p>
+     * <p>The number returned is the smallest significance level at which one
+     * can reject the null hypothesis that the observed counts conform to the
+     * same distribution.
+     * </p>
+     * <p>See {@link #chiSquareDataSetsComparison(long[], long[])} for details
+     * on the formula used to compute the test statistic. The number of degrees
+     * of freedom used to perform the test is one less than the common length
+     * of the input observed count arrays.
+     * </p>
+     * <strong>Preconditions</strong>: <ul>
+     * <li>Observed counts must be non-negative.
+     * </li>
+     * <li>Observed counts for a specific bin must not both be zero.
+     * </li>
+     * <li>Observed counts for a specific sample must not all be 0.
+     * </li>
+     * <li>The arrays <code>observed1</code> and <code>observed2</code> must
+     * have the same length and
+     * their common length must be at least 2.
+     * </li></ul><p>
+     * If any of the preconditions are not met, an
+     * <code>IllegalArgumentException</code> is thrown.
+     *
+     * @param observed1 array of observed frequency counts of the first data set
+     * @param observed2 array of observed frequency counts of the second data set
+     * @return p-value
+     * @throws IllegalArgumentException if preconditions are not met
+     * @throws MathException if an error occurs computing the p-value
+     */
+    double chiSquareTestDataSetsComparison(long[] observed1, long[] observed2)
+       throws IllegalArgumentException, MathException;
+
+    /**
+     * <p>Performs a Chi-Square two sample test comparing two binned data
+     * sets. The test evaluates the null hypothesis that the two lists of
+     * observed counts conform to the same frequency distribution, with
+     * significance level <code>alpha</code>.  Returns true iff the null
+     * hypothesis can be rejected with 100 * (1 - alpha) percent confidence.
+     * </p>
+     * <p>See {@link #chiSquareDataSetsComparison(long[], long[])} for
+     * details on the formula used to compute the Chi-Square statistic used
+     * in the test. The number of degrees of freedom used to perform the test is
+     * one less than the common length of the input observed count arrays.
+     * </p>
+     * <strong>Preconditions</strong>: <ul>
+     * <li>Observed counts must be non-negative.
+     * </li>
+     * <li>Observed counts for a specific bin must not both be zero.
+     * </li>
+     * <li>Observed counts for a specific sample must not all be 0.
+     * </li>
+     * <li>The arrays <code>observed1</code> and <code>observed2</code> must
+     * have the same length and their common length must be at least 2.
+     * </li>
+     * <li> <code> 0 &lt; alpha &lt;= 0.5 </code>
+     * </li></ul><p>
+     * If any of the preconditions are not met, an
+     * <code>IllegalArgumentException</code> is thrown.
+     *
+     * @param observed1 array of observed frequency counts of the first data set
+     * @param observed2 array of observed frequency counts of the second data set
+     * @param alpha significance level of the test
+     * @return true iff null hypothesis can be rejected with confidence
+     * 1 - alpha
+     * @throws IllegalArgumentException if preconditions are not met
+     * @throws MathException if an error occurs performing the test
+     */
+    boolean chiSquareTestDataSetsComparison(long[] observed1, long[] observed2, double alpha)
+       throws IllegalArgumentException, MathException;
+
 }
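
For anyone who wants to check the statistic documented above by hand, here is a minimal standalone sketch of the same formula. It is not part of the commit and does not use Commons Math. The class name TwoSampleChiSquareSketch and the method name statistic are made up for illustration, and the checks are limited to the preconditions listed in the javadoc.

```java
/** Illustrative sketch only; not part of the commit. All names are hypothetical. */
class TwoSampleChiSquareSketch {

    /**
     * Sum over bins of (K * o1[i] - o2[i] / K)^2 / (o1[i] + o2[i]),
     * with K = sqrt(sum(o2) / sum(o1)).
     */
    static double statistic(long[] observed1, long[] observed2) {
        if (observed1.length < 2 || observed1.length != observed2.length) {
            throw new IllegalArgumentException(
                "arrays must have the same length, at least 2");
        }
        long sum1 = 0;
        long sum2 = 0;
        for (int i = 0; i < observed1.length; i++) {
            if (observed1[i] < 0 || observed2[i] < 0) {
                throw new IllegalArgumentException(
                    "observed counts must be non-negative");
            }
            if (observed1[i] == 0 && observed2[i] == 0) {
                throw new IllegalArgumentException(
                    "observed counts for a bin must not both be zero");
            }
            sum1 += observed1[i];
            sum2 += observed2[i];
        }
        if (sum1 == 0 || sum2 == 0) {
            throw new IllegalArgumentException(
                "observed counts for a sample must not all be 0");
        }
        // K is exactly 1.0 when both samples have the same total count
        double k = Math.sqrt((double) sum2 / (double) sum1);
        double chiSquare = 0.0;
        for (int i = 0; i < observed1.length; i++) {
            double dev = k * observed1[i] - observed2[i] / k;
            chiSquare += dev * dev / (observed1[i] + observed2[i]);
        }
        return chiSquare;
    }

    public static void main(String[] args) {
        long[] first = {10, 12, 12, 10};
        long[] second = {5, 15, 14, 10};
        System.out.println(statistic(first, second));
    }
}
```

With the bin counts from testChiSquareDataSetsComparisonEqualCounts and testChiSquareDataSetsComparisonUnEqualCounts, this should agree with the 2.153846 and 7.232189 targets in those tests to within 1E-6.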

Modified: jakarta/commons/proper/math/trunk/src/java/org/apache/commons/math/stat/inference/ChiSquareTestImpl.java
URL: http://svn.apache.org/viewvc/jakarta/commons/proper/math/trunk/src/java/org/apache/commons/math/stat/inference/ChiSquareTestImpl.java?view=diff&rev=550285&r1=550284&r2=550285
==============================================================================
--- jakarta/commons/proper/math/trunk/src/java/org/apache/commons/math/stat/inference/ChiSquareTestImpl.java (original)
+++ jakarta/commons/proper/math/trunk/src/java/org/apache/commons/math/stat/inference/ChiSquareTestImpl.java Sun Jun 24 14:10:19 2007
@@ -173,6 +173,99 @@
     }
     
     /**
+     * @param observed1 array of observed frequency counts of the first data set
+     * @param observed2 array of observed frequency counts of the second data set
+     * @return chi-square test statistic
+     * @throws IllegalArgumentException if preconditions are not met
+     */
+    public double chiSquareDataSetsComparison(long[] observed1, long[] observed2)
+        throws IllegalArgumentException {
+        
+        // Make sure lengths are same
+        if ((observed1.length < 2) || (observed1.length != observed2.length)) {
+            throw new IllegalArgumentException(
+                    "observed1, observed2 array lengths incorrect");
+        }
+        // Ensure non-negative counts
+        if (!isNonNegative(observed1) || !isNonNegative(observed2)) {
+            throw new IllegalArgumentException(
+                "observed counts must be non-negative");
+        }
+        // Compute and compare count sums
+        long countSum1 = 0;
+        long countSum2 = 0;
+        boolean unequalCounts = false;
+        double weight = 0.0;
+        for (int i = 0; i < observed1.length; i++) {
+            countSum1 += observed1[i];
+            countSum2 += observed2[i];   
+        }
+        // Ensure neither sample is uniformly 0
+        if (countSum1 == 0 || countSum2 == 0) {
+            throw new IllegalArgumentException(
+             "observed counts cannot all be 0"); 
+        }
+        // Compare and compute weight only if different
+        unequalCounts = (countSum1 != countSum2);
+        if (unequalCounts) {
+            weight = Math.sqrt((double) countSum1 / (double) countSum2);
+        }
+        // Compute ChiSquare statistic
+        double sumSq = 0.0d;
+        double dev = 0.0d;
+        double obs1 = 0.0d;
+        double obs2 = 0.0d;
+        for (int i = 0; i < observed1.length; i++) {
+            if (observed1[i] == 0 && observed2[i] == 0) {
+                throw new IllegalArgumentException(
+                        "observed counts must not both be zero");
+            } else {
+                obs1 = (double) observed1[i];
+                obs2 = (double) observed2[i];
+                if (unequalCounts) { // apply weights
+                    dev = obs1/weight - obs2 * weight;
+                } else {
+                    dev = obs1 - obs2;
+                }
+                sumSq += (dev * dev) / (obs1 + obs2);
+            }
+        }
+        return sumSq;
+    }
+
+    /**
+     * @param observed1 array of observed frequency counts of the first data set
+     * @param observed2 array of observed frequency counts of the second data set
+     * @return p-value
+     * @throws IllegalArgumentException if preconditions are not met
+     * @throws MathException if an error occurs computing the p-value
+     */
+    public double chiSquareTestDataSetsComparison(long[] observed1, long[] observed2)
+        throws IllegalArgumentException, MathException {
+        distribution.setDegreesOfFreedom((double) observed1.length - 1);
+        return 1 - distribution.cumulativeProbability(
+                chiSquareDataSetsComparison(observed1, observed2));
+    }
+
+    /**
+     * @param observed1 array of observed frequency counts of the first data set
+     * @param observed2 array of observed frequency counts of the second data set
+     * @param alpha significance level of the test
+     * @return true iff null hypothesis can be rejected with confidence
+     * 1 - alpha
+     * @throws IllegalArgumentException if preconditions are not met
+     * @throws MathException if an error occurs performing the test
+     */
+    public boolean chiSquareTestDataSetsComparison(long[] observed1, long[] observed2,
+            double alpha) throws IllegalArgumentException, MathException {
+        if ((alpha <= 0) || (alpha > 0.5)) {
+            throw new IllegalArgumentException(
+                    "bad significance level: " + alpha);
+        }
+        return (chiSquareTestDataSetsComparison(observed1, observed2) < alpha);
+    }
+
+    /**
      * Checks to make sure that the input long[][] array is rectangular,
      * has at least 2 rows and 2 columns, and has all non-negative entries,
      * throwing IllegalArgumentException if any of these checks fail.
@@ -281,10 +374,12 @@
         }
         return true;
     }
-    
+ 
     /**
      * Modify the distribution used to compute inference statistics.
-     * @param value the new distribution
+     * 
+     * @param value
+     *            the new distribution
      * @since 1.2
      */
     public void setDistribution(ChiSquaredDistribution value) {
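
The p-value method in this diff sets the degrees of freedom to observed1.length - 1 and returns 1 minus the chi-square CDF at the statistic. As a rough cross-check that avoids Commons Math, the sketch below uses the closed form of the upper tail that holds when the number of degrees of freedom is even. This is an assumption made for the example only: the commit relies on ChiSquaredDistribution, which is not restricted to even degrees of freedom. The names ChiSquareTailSketch, upperTailEvenDof and reject are hypothetical.

```java
/** Illustrative sketch only; not the ChiSquaredDistribution used by the commit. */
class ChiSquareTailSketch {

    /**
     * Upper-tail probability P(X > x) for a chi-square variable with an even
     * number of degrees of freedom:
     * exp(-x/2) * sum over j = 0 .. dof/2 - 1 of (x/2)^j / j!
     */
    static double upperTailEvenDof(double x, int dof) {
        if (dof < 2 || dof % 2 != 0) {
            throw new IllegalArgumentException("dof must be even and at least 2");
        }
        if (x < 0) {
            throw new IllegalArgumentException("statistic must be non-negative");
        }
        double half = x / 2.0;
        double term = 1.0; // (x/2)^0 / 0!
        double sum = 1.0;
        for (int j = 1; j < dof / 2; j++) {
            term *= half / j; // builds (x/2)^j / j! incrementally
            sum += term;
        }
        return Math.exp(-half) * sum;
    }

    /** Same decision rule as the diff: reject iff p-value < alpha, 0 < alpha <= 0.5. */
    static boolean reject(double pValue, double alpha) {
        if (alpha <= 0 || alpha > 0.5) {
            throw new IllegalArgumentException("bad significance level: " + alpha);
        }
        return pValue < alpha;
    }

    public static void main(String[] args) {
        // Statistic from the unequal-counts unit test: 5 bins, so 4 degrees of freedom
        double p = upperTailEvenDof(7.232189, 4);
        System.out.println(p);
        System.out.println(reject(p, 0.13));
        System.out.println(reject(p, 0.12));
    }
}
```

For the five-bin unit test (4 degrees of freedom) this should land close to the 0.124115 p-value asserted there, so the null hypothesis is rejected at alpha = 0.13 but not at 0.12.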

Modified: jakarta/commons/proper/math/trunk/src/java/org/apache/commons/math/stat/inference/TestUtils.java
URL: http://svn.apache.org/viewvc/jakarta/commons/proper/math/trunk/src/java/org/apache/commons/math/stat/inference/TestUtils.java?view=diff&rev=550285&r1=550284&r2=550285
==============================================================================
--- jakarta/commons/proper/math/trunk/src/java/org/apache/commons/math/stat/inference/TestUtils.java (original)
+++ jakarta/commons/proper/math/trunk/src/java/org/apache/commons/math/stat/inference/TestUtils.java Sun Jun 24 14:10:19 2007
@@ -276,4 +276,31 @@
         return chiSquareTest. chiSquareTest(counts);
     }
 
+    /**
+     * @see org.apache.commons.math.stat.inference.ChiSquareTest#chiSquareDataSetsComparison(long[], long[])
+     */
+    public static double chiSquareDataSetsComparison(long[] observed1, long[] observed2)
+        throws IllegalArgumentException {
+        return chiSquareTest.chiSquareDataSetsComparison(observed1, observed2);
+    }
+
+    /**
+     * @see org.apache.commons.math.stat.inference.ChiSquareTest#chiSquareTestDataSetsComparison(long[], long[])
+     */
+    public static double chiSquareTestDataSetsComparison(long[] observed1, long[] observed2)
+        throws IllegalArgumentException, MathException {
+        return chiSquareTest.chiSquareTestDataSetsComparison(observed1, observed2);
+    }
+
+
+    /**
+     * @see org.apache.commons.math.stat.inference.ChiSquareTest#chiSquareTestDataSetsComparison(long[], long[], double)
+     */
+    public static boolean chiSquareTestDataSetsComparison(long[] observed1, long[] observed2,
+        double alpha)
+        throws IllegalArgumentException, MathException {
+        return chiSquareTest.chiSquareTestDataSetsComparison(observed1, observed2, alpha);
+    }
+
+
 }

Modified: jakarta/commons/proper/math/trunk/src/test/org/apache/commons/math/stat/inference/ChiSquareTestTest.java
URL: http://svn.apache.org/viewvc/jakarta/commons/proper/math/trunk/src/test/org/apache/commons/math/stat/inference/ChiSquareTestTest.java?view=diff&rev=550285&r1=550284&r2=550285
==============================================================================
--- jakarta/commons/proper/math/trunk/src/test/org/apache/commons/math/stat/inference/ChiSquareTestTest.java (original)
+++ jakarta/commons/proper/math/trunk/src/test/org/apache/commons/math/stat/inference/ChiSquareTestTest.java Sun Jun 24 14:10:19 2007
@@ -193,4 +193,70 @@
         assertEquals("chi-square p-value", 0.0462835770603,
                 testStatistic.chiSquareTest(counts), 1E-9);       
     }
+    
+    /** Target values verified using DATAPLOT version 2006.3 */
+    public void testChiSquareDataSetsComparisonEqualCounts()
+    throws Exception {
+        long[] observed1 = {10, 12, 12, 10};
+        long[] observed2 = {5, 15, 14, 10};    
+        assertEquals("chi-square p value", 0.541096, 
+                testStatistic.chiSquareTestDataSetsComparison(
+                observed1, observed2), 1E-6);
+        assertEquals("chi-square test statistic", 2.153846,
+                testStatistic.chiSquareDataSetsComparison(
+                observed1, observed2), 1E-6);
+        assertFalse("chi-square test result", 
+                testStatistic.chiSquareTestDataSetsComparison(
+                observed1, observed2, 0.4));
+    }
+    
+    /** Target values verified using DATAPLOT version 2006.3 */
+    public void testChiSquareDataSetsComparisonUnEqualCounts()
+    throws Exception {
+        long[] observed1 = {10, 12, 12, 10, 15};
+        long[] observed2 = {15, 10, 10, 15, 5};    
+        assertEquals("chi-square p value", 0.124115, 
+                testStatistic.chiSquareTestDataSetsComparison(
+                observed1, observed2), 1E-6);
+        assertEquals("chi-square test statistic", 7.232189,
+                testStatistic.chiSquareDataSetsComparison(
+                observed1, observed2), 1E-6);
+        assertTrue("chi-square test result", 
+                testStatistic.chiSquareTestDataSetsComparison(
+                observed1, observed2, 0.13));
+        assertFalse("chi-square test result", 
+                testStatistic.chiSquareTestDataSetsComparison(
+                observed1, observed2, 0.12));
+    }
+    
+    public void testChiSquareDataSetsComparisonBadCounts()
+    throws Exception {
+        long[] observed1 = {10, -1, 12, 10, 15};
+        long[] observed2 = {15, 10, 10, 15, 5};
+        try {
+            testStatistic.chiSquareTestDataSetsComparison(
+                    observed1, observed2);
+            fail("Expecting IllegalArgumentException - negative count");
+        } catch (IllegalArgumentException ex) {
+            // expected
+        }
+        long[] observed3 = {10, 0, 12, 10, 15};
+        long[] observed4 = {15, 0, 10, 15, 5};
+        try {
+            testStatistic.chiSquareTestDataSetsComparison(
+                    observed3, observed4);
+            fail("Expecting IllegalArgumentException - double 0's");
+        } catch (IllegalArgumentException ex) {
+            // expected
+        }
+        long[] observed5 = {10, 10, 12, 10, 15};
+        long[] observed6 = {0, 0, 0, 0, 0};
+        try {
+            testStatistic.chiSquareTestDataSetsComparison(
+                    observed5, observed6);
+            fail("Expecting IllegalArgumentException - vanishing counts");
+        } catch (IllegalArgumentException ex) {
+            // expected
+        }
+    }
 }

Modified: jakarta/commons/proper/math/trunk/xdocs/changes.xml
URL: http://svn.apache.org/viewvc/jakarta/commons/proper/math/trunk/xdocs/changes.xml?view=diff&rev=550285&r1=550284&r2=550285
==============================================================================
--- jakarta/commons/proper/math/trunk/xdocs/changes.xml (original)
+++ jakarta/commons/proper/math/trunk/xdocs/changes.xml Sun Jun 24 14:10:19 2007
@@ -84,6 +84,9 @@
       <action dev="psteitz" type="update" issue="MATH-158" due-to="Hasan Diwan">
         Added log function to MathUtils.
       </action>
+      <action dev="psteitz" type="update" issue="MATH-160" due-to="Matthias Hummel">
+        Added two sample (binned comparison) ChiSquare test.
+      </action>
     </release>
     <release version="1.1" date="2005-12-17"  
  description="This is a maintenance release containing bug fixes and enhancements.


