Brent Worden wrote:
-----Original Message-----

* t-test statistic needs to be added and we should probably add the
capability of actually performing t- and chi-square tests at fixed
significance levels (.1, .05, .01, .001). -- This is virtually done, just
need to define a nice, convenient interface for doing one- and two-tailed
tests.  Thanks to Brent, we can actually support user-supplied significance
levels (next item)


Anyone have any thoughts on the interface?  I was thinking of an Inference
interface that supports conducting one- and two-tailed tests as well as
constructing their complementary confidence intervals.  Or, if we want to
separate concerns, we could create both a HypothesisTest and a
ConfidenceInterval interface, one for each type of inference.  Either way, I
would use the tried-and-true abstract factory approach to creating inference
instances.  Comments are welcome.

I have been thinking about this. If I can stop sending emails for long enough to pull it together, I am about to submit a patch to BivariateRegression that adds the slope confidence interval computation and significance level, based on the new t-distribution impl (thanks, Brent!). I thought about a generic ConfidenceInterval interface, but then decided that it would be more convenient for users to just get the half-width back as a double from getSlopeConfidenceInterval(). To support the goal of testing model significance, I also added getSignificance().


I think the concrete stuff is easier to use and all we need at present. Something like:

boolean twoTailedTTest(Univariate, Univariate, signif) or even
boolean twoTailedTTest(double[], double[], signif)
(obviously adding one-tailed tests and tests against constants as well, plus tests that return doubles representing minimal p-values, possibly called "significance")
boolean chiSquareTest(expected, observed, signif)
boolean chiSquareTest(Freq, Freq, signif)
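
To make the chiSquareTest(expected, observed, signif) idea concrete, here is a rough sketch of what such a method could look like for a goodness-of-fit test. Everything in it is an assumption for illustration: the class name ChiSquareSketch, the helper names, the choice of df = (number of cells - 1), and the tail probability computed by plain Simpson integration rather than by the distribution implementation discussed in this thread.

```java
/** Illustrative sketch only; names and numerical method are assumptions. */
final class ChiSquareSketch {

    private ChiSquareSketch() { }

    /** Pearson statistic: sum of (observed - expected)^2 / expected. */
    static double statistic(double[] expected, double[] observed) {
        if (expected.length != observed.length || expected.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        double sum = 0.0;
        for (int i = 0; i < expected.length; i++) {
            double d = observed[i] - expected[i];
            sum += d * d / expected[i];
        }
        return sum;
    }

    /**
     * Upper tail P(X > x) for a chi-square variable with df >= 1.
     * With x = u^2 the density is proportional to u^(df-1) * exp(-u^2/2),
     * which is integrated numerically and normalized by the total mass.
     */
    static double upperTail(double x, int df) {
        double a = Math.sqrt(Math.max(x, 0.0));
        // Beyond this cutoff the integrand is negligible.
        double cutoff = Math.max(a, Math.sqrt(df)) + 10.0;
        double tail = simpson(a, cutoff, df);
        double total = simpson(0.0, a, df) + tail;
        return tail / total;
    }

    /** True if the observed counts differ from expected at level alpha. */
    static boolean chiSquareTest(double[] expected, double[] observed, double alpha) {
        int df = expected.length - 1; // assumes no parameters were estimated
        return upperTail(statistic(expected, observed), df) < alpha;
    }

    private static double integrand(double u, int df) {
        if (u == 0.0) {
            return df == 1 ? 1.0 : 0.0;
        }
        return Math.exp((df - 1) * Math.log(u) - 0.5 * u * u);
    }

    /** Composite Simpson's rule with a fixed, even number of intervals. */
    private static double simpson(double a, double b, int df) {
        int n = 4000;
        double h = (b - a) / n;
        double sum = integrand(a, df) + integrand(b, df);
        for (int i = 1; i < n; i++) {
            sum += (i % 2 == 0 ? 2.0 : 4.0) * integrand(a + i * h, df);
        }
        return sum * h / 3.0;
    }
}
```

A caller would write something like ChiSquareSketch.chiSquareTest(expected, observed, 0.05) and get back a boolean. A variant returning the double from upperTail would cover the "significance" flavor mentioned above.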


To add the abstractions above meaningfully, we need to convince ourselves that either a) multiple implementation strategies might exist -- For parametric tests, this is not the case -- or b) the abstractions will make development of inferential components easier/more manageable. I am not sure about b). In fact, when I think about it I think that there is not much left when you abstract things to a high enough level to represent hypothesis testing and/or confidence intervals generically. I remember math stat students having a hard time understanding the abstract definitions of these concepts. I don't think that it is a good idea to force our users to think about these things. Therefore, I would recommend sticking with concrete implementations defined "close to" the statistical applications.

Keep the user application use cases in mind. If I want to determine whether the difference in two means is significant, I should be able to do that quickly and intuitively, with one method call, using either Univariates or double[]s.
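
As a rough illustration of that one-call use case on double[]s, here is a sketch. The class name TTestSketch and its helpers are made up for the example. It assumes Welch's unequal-variance form of the two-sample statistic (the thread does not say which form is intended) and gets the p-value by Simpson integration after the substitution x = sqrt(df) * tan(theta), not from the t-distribution implementation mentioned above.

```java
/** Illustrative sketch only; names, Welch form, and integration are assumptions. */
final class TTestSketch {

    private TTestSketch() { }

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double variance(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) {
            ss += (v - m) * (v - m);
        }
        return ss / (x.length - 1);
    }

    /**
     * Two-tailed p-value P(|T| > |t|) for Student's t with df >= 1.
     * Under x = sqrt(df) * tan(theta) the density becomes proportional to
     * cos(theta)^(df - 1) on (-pi/2, pi/2), so the p-value is a ratio of
     * two integrals over finite ranges.
     */
    static double twoTailedPValue(double t, double df) {
        double theta0 = Math.atan(Math.abs(t) / Math.sqrt(df));
        double tail = simpson(theta0, Math.PI / 2.0, df);
        double half = simpson(0.0, Math.PI / 2.0, df);
        return tail / half;
    }

    /** Minimal p-value ("significance") for the difference in two means. */
    static double tTest(double[] a, double[] b) {
        if (a.length < 2 || b.length < 2) {
            throw new IllegalArgumentException("each sample needs at least 2 values");
        }
        double qa = variance(a) / a.length;
        double qb = variance(b) / b.length;
        double t = (mean(a) - mean(b)) / Math.sqrt(qa + qb);
        // Welch-Satterthwaite approximation to the degrees of freedom.
        double df = (qa + qb) * (qa + qb)
                / (qa * qa / (a.length - 1) + qb * qb / (b.length - 1));
        return twoTailedPValue(t, df);
    }

    /** True if the two means differ significantly at level alpha. */
    static boolean twoTailedTTest(double[] a, double[] b, double alpha) {
        return tTest(a, b) < alpha;
    }

    /** Composite Simpson's rule with a fixed, even number of intervals. */
    private static double simpson(double a, double b, double df) {
        int n = 2000;
        double h = (b - a) / n;
        double sum = f(a, df) + f(b, df);
        for (int i = 1; i < n; i++) {
            sum += (i % 2 == 0 ? 2.0 : 4.0) * f(a + i * h, df);
        }
        return sum * h / 3.0;
    }

    private static double f(double theta, double df) {
        return Math.pow(Math.cos(theta), df - 1.0);
    }
}
```

With this, "is the difference in means significant?" is a single call such as TTestSketch.twoTailedTTest(sample1, sample2, 0.05). A Univariate overload could delegate to the same computation using the stored mean, variance, and count. Note that two samples with zero variance give a NaN statistic here, which the sketch reports as not significant.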



* numerical approximation of the t- and chi-square distributions to enable
user-supplied significance levels.  See above.  Someone just needs to put a
fork in this. Tim? Brent?


Done.



