-----Original Message-----
* The t-test statistic needs to be added, and we should probably add the capability of actually performing t- and chi-square tests at fixed significance levels (.1, .05, .01, .001). -- This is virtually done; we just need to define a nice, convenient interface for doing one- and two-tailed tests. Thanks to Brent, we can actually support user-supplied significance levels (next item).
Anyone have any thoughts on the interface? I was thinking of an Inference interface that supports conducting one- and two-tailed tests as well as constructing their complementary confidence intervals. Or, if we want to separate concerns, create both a HypothesisTest and a ConfidenceInterval interface, one for each type of inference. Either way, I would use the tried-and-true abstract factory pattern for creating inference instances. Comments are welcome.
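For comparison with the concrete methods discussed in the reply, the abstract-factory design proposed here might look roughly like the minimal Java sketch below. All names (`HypothesisTest`, `ConfidenceInterval`, `InferenceFactory`, `DefaultInferenceFactory`, `meanInterval`) are hypothetical. To stay self-contained, the only concrete product is an interval for a mean built from a caller-supplied critical value rather than one taken from a t-distribution.

```java
/** A generic two-sided confidence interval (hypothetical interface). */
interface ConfidenceInterval {
    double lower();
    double upper();
}

/** A generic hypothesis test that reports its p-value (hypothetical interface). */
interface HypothesisTest {
    double pValue();

    /** Rejects the null hypothesis when the p-value falls below alpha. */
    default boolean isSignificant(double alpha) {
        return pValue() < alpha;
    }
}

/** Abstract factory: callers never name the concrete inference classes. */
abstract class InferenceFactory {
    static InferenceFactory newInstance() {
        return new DefaultInferenceFactory();
    }

    /** Interval mean +/- criticalValue * s / sqrt(n); criticalValue is supplied by the caller. */
    abstract ConfidenceInterval meanInterval(double[] sample, double criticalValue);
}

/** The single (hypothetical) concrete factory used by newInstance(). */
final class DefaultInferenceFactory extends InferenceFactory {
    @Override
    ConfidenceInterval meanInterval(double[] sample, double criticalValue) {
        int n = sample.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two observations");
        }
        double mean = 0.0;
        for (double v : sample) mean += v;
        mean /= n;
        double ss = 0.0;
        for (double v : sample) ss += (v - mean) * (v - mean);
        // standard error of the mean uses the unbiased variance ss / (n - 1)
        double halfWidth = criticalValue * Math.sqrt(ss / (n - 1) / n);
        final double lo = mean - halfWidth, hi = mean + halfWidth;
        return new ConfidenceInterval() {
            public double lower() { return lo; }
            public double upper() { return hi; }
        };
    }
}
```

Even in this small sketch the caller goes through a factory and an interval object just to obtain two numbers, which is the overhead the reply weighs against concrete methods.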
I have been thinking about this. If I can stop sending emails long enough to pull the patch together, I am about to submit a patch to BivariateRegression that adds the slope confidence interval computation and significance level, based on the new t-distribution impl (thanks, Brent!). I thought about a generic ConfidenceInterval interface, but then decided that it would be more convenient for users to just return the half-width from double getSlopeConfidenceInterval(). To support the goal of testing model significance, I also added getSignificance().
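The half-width mentioned here is the usual t-based interval for the slope of a simple linear regression, t(alpha/2, n-2) * sqrt(MSE / Sxx), where MSE = SSE / (n - 2). A minimal Java sketch follows, assuming the caller supplies the t critical value, since the t-distribution implementation referred to above is not reproduced here. The class and method names are hypothetical, not the BivariateRegression API.

```java
/** Hypothetical helper: slope and slope confidence-interval half-width for y = a + b*x. */
final class SlopeIntervalSketch {
    private SlopeIntervalSketch() {}

    /** Least-squares slope b = Sxy / Sxx. */
    static double slope(double[] x, double[] y) {
        double[] s = centeredSums(x, y);
        return s[1] / s[0];
    }

    /** Standard error of the slope: sqrt(MSE / Sxx), with MSE = SSE / (n - 2). */
    static double slopeStandardError(double[] x, double[] y) {
        double[] s = centeredSums(x, y);
        double b = s[1] / s[0];
        double sse = s[2] - b * s[1];          // residual sum of squares
        double mse = sse / (x.length - 2);
        return Math.sqrt(mse / s[0]);
    }

    /** Half-width of the slope interval; tCritical is t(alpha/2, n - 2), supplied by the caller. */
    static double slopeHalfWidth(double[] x, double[] y, double tCritical) {
        return tCritical * slopeStandardError(x, y);
    }

    /** Returns {Sxx, Sxy, Syy} computed about the sample means. */
    private static double[] centeredSums(double[] x, double[] y) {
        if (x.length != y.length || x.length < 3) {
            throw new IllegalArgumentException("need matching arrays with at least 3 points");
        }
        int n = x.length;
        double mx = 0.0, my = 0.0;
        for (int i = 0; i < n; i++) {
            mx += x[i];
            my += y[i];
        }
        mx /= n;
        my /= n;
        double sxx = 0.0, sxy = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxx += dx * dx;
            sxy += dx * dy;
            syy += dy * dy;
        }
        return new double[] {sxx, sxy, syy};
    }
}
```

For x = {1, 2, 3, 4, 5} and y = {2, 4, 5, 4, 5} the slope is 0.6 and its standard error is sqrt(0.08); with t(0.025, 3) of about 3.182, the 95% half-width comes out at roughly 0.90.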
I think the concrete stuff is easier to use and all we need at present. Something like:
boolean twoTailedTTest(Univariate, Univariate, signif) or even
boolean twoTailedTTest(double[], double[], signif)
(obviously also adding one-tailed tests, tests against constants, and tests that return doubles representing minimal p-values, possibly called "significance")
boolean chiSquareTest(expected, observed, signif)
boolean chiSquareTest(Freq, Freq, signif)
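The chiSquareTest(expected, observed, signif) idea can be sketched as a goodness-of-fit test: compute the statistic sum((o - e)^2 / e) with k - 1 degrees of freedom and compare its upper-tail p-value with the significance level. In this minimal Java sketch the class and method names are hypothetical, and the p-value uses the regularized upper incomplete gamma function Q(df/2, x/2), evaluated with the standard series / continued-fraction method (as in Numerical Recipes), which is not necessarily the approximation the project adopted.

```java
/** Hypothetical chi-square goodness-of-fit test; not an actual commons-math API. */
final class ChiSquareSketch {
    private ChiSquareSketch() {}

    /** Statistic sum((o - e)^2 / e) over all categories. */
    static double chiSquare(double[] expected, long[] observed) {
        if (expected.length != observed.length || expected.length < 2) {
            throw new IllegalArgumentException("need at least two matching categories");
        }
        double sum = 0.0;
        for (int i = 0; i < expected.length; i++) {
            if (expected[i] <= 0.0) {
                throw new IllegalArgumentException("expected counts must be positive");
            }
            double dev = observed[i] - expected[i];
            sum += dev * dev / expected[i];
        }
        return sum;
    }

    /** Upper-tail p-value with k - 1 degrees of freedom. */
    static double chiSquarePValue(double[] expected, long[] observed) {
        double df = expected.length - 1;
        return regularizedGammaQ(df / 2.0, chiSquare(expected, observed) / 2.0);
    }

    /** True when the null hypothesis is rejected at significance level alpha. */
    static boolean chiSquareTest(double[] expected, long[] observed, double alpha) {
        return chiSquarePValue(expected, observed) < alpha;
    }

    /** Q(a, x) = 1 - P(a, x): series for small x, continued fraction otherwise. */
    static double regularizedGammaQ(double a, double x) {
        if (x <= 0.0) return 1.0;
        double prefactor = Math.exp(-x + a * Math.log(x) - lnGamma(a));
        if (x < a + 1.0) {
            // series for P(a, x)
            double ap = a, del = 1.0 / a, sum = del;
            for (int n = 0; n < 1000 && Math.abs(del) > Math.abs(sum) * 1e-15; n++) {
                ap += 1.0;
                del *= x / ap;
                sum += del;
            }
            return 1.0 - sum * prefactor;
        }
        // continued fraction for Q(a, x), modified Lentz method
        final double tiny = 1e-300;            // guards against division by zero
        double b = x + 1.0 - a, c = 1.0 / tiny, d = 1.0 / b, h = d;
        for (int i = 1; i < 1000; i++) {
            double an = -i * (i - a);
            b += 2.0;
            d = an * d + b;
            if (Math.abs(d) < tiny) d = tiny;
            c = b + an / c;
            if (Math.abs(c) < tiny) c = tiny;
            d = 1.0 / d;
            double del = d * c;
            h *= del;
            if (Math.abs(del - 1.0) < 1e-14) break;
        }
        return prefactor * h;
    }

    /** Lanczos approximation of ln(Gamma(x)) for x > 0 (Numerical Recipes coefficients). */
    static double lnGamma(double x) {
        double[] cof = {76.18009172947146, -86.50532032941677, 24.01409824083091,
                        -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
        double y = x, tmp = x + 5.5;
        tmp -= (x + 0.5) * Math.log(tmp);
        double ser = 1.000000000190015;
        for (double c : cof) ser += c / ++y;
        return -tmp + Math.log(2.5066282746310005 * ser / x);
    }
}
```

For two degrees of freedom the upper tail reduces to exp(-x/2), which gives an easy check: observed counts {13, 7, 10} against expected {10, 10, 10} give a statistic of 1.8 and a p-value of exp(-0.9), about 0.41, so the test does not reject at .05.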
To add the abstractions above meaningfully, we need to convince ourselves that either a) multiple implementation strategies might exist -- for parametric tests, this is not the case -- or b) the abstractions will make development of inferential components easier/more manageable. I am not sure about b). In fact, when I think about it, I think there is not much left when you abstract things to a level high enough to represent hypothesis testing and/or confidence intervals generically. I remember math stat students having a hard time understanding the abstract definitions of these concepts, and I don't think it is a good idea to force our users to think about these things. Therefore, I would recommend sticking with concrete implementations defined "close to" the statistical applications.
Keep the user application use cases in mind. If I want to determine whether the difference in two means is significant, I should be able to do that quickly and intuitively, with one method call using either Univariates or double[]s.
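As a rough illustration of that one-call use case, here is a minimal Java sketch of twoTailedTTest(double[], double[], signif). It assumes Welch's unequal-variance t statistic (the thread does not say which two-sample variant to use) and gets the two-sided p-value from the regularized incomplete beta function, P(|T| > |t|) = I_{df/(df + t^2)}(df/2, 1/2), evaluated with the Numerical Recipes continued fraction. The class name and helpers are hypothetical.

```java
/** Hypothetical two-sample t-test sketch (Welch variant); not an actual commons-math API. */
final class TTestSketch {
    private TTestSketch() {}

    /** True when the means differ significantly at level alpha (two-tailed). */
    static boolean twoTailedTTest(double[] a, double[] b, double alpha) {
        return twoTailedPValue(a, b) < alpha;
    }

    /** Two-sided p-value of Welch's t statistic. */
    static double twoTailedPValue(double[] a, double[] b) {
        double va = variance(a) / a.length, vb = variance(b) / b.length;
        double t = (mean(a) - mean(b)) / Math.sqrt(va + vb);
        // Welch-Satterthwaite degrees of freedom (need not be an integer)
        double df = (va + vb) * (va + vb)
                / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
        return tTwoTailed(t, df);
    }

    /** P(|T| > |t|) for Student's t with df degrees of freedom. */
    static double tTwoTailed(double t, double df) {
        return regularizedBeta(df / (df + t * t), df / 2.0, 0.5);
    }

    static double mean(double[] v) {
        double s = 0.0;
        for (double x : v) s += x;
        return s / v.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double variance(double[] v) {
        double m = mean(v), ss = 0.0;
        for (double x : v) ss += (x - m) * (x - m);
        return ss / (v.length - 1);
    }

    /** Regularized incomplete beta I_x(a, b). */
    static double regularizedBeta(double x, double a, double b) {
        if (x <= 0.0) return 0.0;
        if (x >= 1.0) return 1.0;
        double front = Math.exp(lnGamma(a + b) - lnGamma(a) - lnGamma(b)
                + a * Math.log(x) + b * Math.log(1.0 - x));
        // use the continued fraction where it converges fastest, otherwise the symmetry relation
        if (x < (a + 1.0) / (a + b + 2.0)) return front * betaFraction(x, a, b) / a;
        return 1.0 - front * betaFraction(1.0 - x, b, a) / b;
    }

    /** Continued fraction for I_x(a, b), modified Lentz method. */
    static double betaFraction(double x, double a, double b) {
        double c = 1.0, d = 1.0 / nonZero(1.0 - (a + b) * x / (a + 1.0)), h = d;
        for (int m = 1; m <= 1000; m++) {
            int m2 = 2 * m;
            double aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2));       // even step
            d = 1.0 / nonZero(1.0 + aa * d);
            c = nonZero(1.0 + aa / c);
            h *= d * c;
            aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2));   // odd step
            d = 1.0 / nonZero(1.0 + aa * d);
            c = nonZero(1.0 + aa / c);
            double del = d * c;
            h *= del;
            if (Math.abs(del - 1.0) < 1e-14) break;
        }
        return h;
    }

    /** Replaces values too close to zero so the Lentz recurrences never divide by zero. */
    private static double nonZero(double v) {
        return Math.abs(v) < 1e-300 ? 1e-300 : v;
    }

    /** Lanczos approximation of ln(Gamma(x)) for x > 0 (Numerical Recipes coefficients). */
    static double lnGamma(double x) {
        double[] cof = {76.18009172947146, -86.50532032941677, 24.01409824083091,
                        -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
        double y = x, tmp = x + 5.5;
        tmp -= (x + 0.5) * Math.log(tmp);
        double ser = 1.000000000190015;
        for (double c : cof) ser += c / ++y;
        return -tmp + Math.log(2.5066282746310005 * ser / x);
    }
}
```

For the samples {1, 2, 3, 4, 5} and {2, 3, 4, 5, 6} this gives t = -1 with 8 degrees of freedom and a two-sided p-value of about 0.35, so twoTailedTTest(a, b, 0.05) returns false. An overload taking Univariate arguments could apply the same formula to the mean, variance and count of each Univariate, assuming it exposes them.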
* numerical approximation of the t- and chi-square distributions to enable user-supplied significance levels. See above. Someone just needs to put a fork in this. Tim? Brent?
Done.