Forecasting Software

2000-04-12 Thread Brian E. Smith

I will be teaching Time Series and Forecasting (an MBA course) in the Fall.
 I am looking for an inexpensive software package that is good for
forecasting.  Last year I used Minitab 12 and found it easy to use and
accessible to students. It is available on our network with a site license,
so we will use it again this year. In addition, I would like to find a
package that is available to students at a reasonable price and that includes
some more advanced features, particularly the AIC (Akaike's Information
Criterion).  Any ideas?
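
For reference, the AIC itself is simple to compute by hand from any least-squares fit, so a package without a built-in AIC can still be used. A minimal Python sketch (purely illustrative, using the Gaussian form AIC = n*ln(RSS/n) + 2k for an AR(p) model fitted by least squares; strictly, candidate orders should be fitted on a common sample for a fair comparison):

    import numpy as np

    def ar_aic(y, p):
        """AIC of a least-squares AR(p) fit, using AIC = n*ln(RSS/n) + 2*k."""
        y = np.asarray(y, dtype=float)
        n = len(y) - p                                   # usable observations
        X = np.column_stack([np.ones(n)] +
                            [y[p - j:p - j + n] for j in range(1, p + 1)])  # lags 1..p
        target = y[p:]
        beta, rss, _, _ = np.linalg.lstsq(X, target, rcond=None)
        if rss.size == 0:                                # lstsq omits RSS in edge cases
            rss = np.array([np.sum((target - X @ beta) ** 2)])
        k = p + 1                                        # intercept plus p AR coefficients
        return n * np.log(float(rss[0]) / n) + 2 * k

    # Compare candidate orders on a toy series; lower AIC is preferred.
    rng = np.random.default_rng(0)
    y = np.cumsum(rng.normal(size=200))
    for p in (1, 2, 3):
        print(p, round(ar_aic(y, p), 2))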

Brian 
___ 

Brian E. Smith                   TEL: 514-398-4038 (Work)
McGill University                FAX: 514-398-3876 (Work)
1001 Sherbrooke St. West         FAX: 514-482-1639 (Home)
Montreal, QC, Canada H3A 1G5     EMAIL: [EMAIL PROTECTED]

Url: http://www.management.mcgill.ca/homepage/profs/smithb
___

No human investigation can be called real science if it cannot be
demonstrated mathematically.  Leonardo da Vinci 
___ 


===
This list is open to everyone.  Occasionally, less thoughtful
people send inappropriate messages.  Please DO NOT COMPLAIN TO
THE POSTMASTER about these messages because the postmaster has no
way of controlling them, and excessive complaints will result in
termination of the list.

For information about this list, including information about the
problem of inappropriate messages and information about how to
unsubscribe, please see the web page at
http://jse.stat.ncsu.edu/
===



Re: hyp testing

2000-04-12 Thread David A. Heiser

> Except for posterior probability, none of these are tools
> for the actual problems.  And posterior probability is not
> what is wanted; it is the posterior risk of the procedure.
>
> But even this relies on belief.  An approach to rational
> behavior makes the prior a weighting measure, without
> ringing in belief.  I suggest we keep it this way, and
> avoid the philosophical aspects


Disagree.

1. Determination of risk requires a model, which is based on a belief system
(e.g., there is/is not a minimum level of tremolite that causes
mesothelioma). Probability is difficult enough to deal with, let alone an
additional swamp called "risk". Those of us who have thought about
developing outcomes in terms of risk have basically had to give it up. The
difference in interpretation of a risk value between different people is
much too great.

2. Weighting is again based on a belief system. Not everything is equal; some
things are more important than others.

>The data consists of what has been observed.  The likelihood
>principle then mandates that the probabilities of unobserved
>events becomes irrelevant.  This means that the typical test
>procedures (NOT the test STATISTICS) would have to be wrong.

3. This does not make sense. It needs something in addition.

I wrote:
>>Let us suppose there are many plausible hypotheses. These include the
>>"nil hypothesis", any a priori hypotheses, any idea at all that may be
>>considered. Refer to these as the set of all plausible hypotheses
>>(including that of no effect) that are to be tested.

>The set of all plausible hypotheses is generally uncountable,
>even in the discrete case.

>>The process is to pick each hypothesis and test it.

>This cannot be done; there are too many.

4. If this is the result, then you have a really, really bad experiment. You
haven't thought about the problem and defined a finite region for
exploration. I sure could not do a PhD thesis and have it accepted if I
didn't have a defined region and objectives for the research.

5. Let me quote R.A. Fisher: "he (the investigator) should only claim that a
phenomenon is experimentally demonstrated when he knows how to design an
experiment so that it will rarely fail to give a significant result" (Fisher
1929b). The experiment is then the means to obtain data to test the chosen
hypotheses.

DAHeiser









Re: Hypothesis testing and magic - episode 2

2000-04-12 Thread dennis roberts

At 09:30 AM 4/13/00 +1000, Alan McLean wrote:

>In the ‘soft’ sciences it is easy enough to identify a characteristic of
>interest –

alan makes good points as usual ... but i totally object to the term 'soft'
sciences ...

what does soft imply? that the science is bad ... or merely that the
variables are more 'difficult' to measure ... if that is the case, these
ought to be called the 'hard' sciences

the unpleasant associations with the term 'soft' are uncalled for ... there
are excellent 'scientists' (whatEVER that means) in all fields .. and some
pretty weak ones too (and gee ... BOTH kinds get tenure!) ... 

science is science ... and some practice it well ... some don't ... should
it be some demerit against them that they happen to have opted for a field
of interest ... even if many of the variables are difficult to measure?
perhaps that makes it even more challenging ... 

finally, i would not be so quick to claim that in the areas that are non
social science based ... the variables are all that clear and clean cut ...
there seems to be tremendous infighting about theories and how to 'validate'
them in medicine ... astronomy ... physics ... it is not like everything
there is so simple ... maybe don can pop in here with some relevant
examples ... 

i am sure there are 'mean' differences in terms of these things but ...
there is a lot more WITHin variation in terms of hardness/softness ... than
between disciplines
==
dennis roberts, penn state university
educational psychology, 8148632401
http://roberts.ed.psu.edu/users/droberts/droberts.htm





Re: cluster analysis

2000-04-12 Thread T.S. Lim

In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] 
says...
>
>Can anyone help with good resources on the web, journals, books, etc. on
>cluster analysis - similarity and ordination?  Any recommended programs
>for this type of analysis too?
>
>Cheers
>Elisa Wood


For a list of cluster analysis programs, go to

   http://www.recursive-partitioning.com/cluster.shtml

For references, check out CSNA and Warren Sarle's bibliographies. The links 
are at

   http://www.kdcentral.com

-- 
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com
__
Get paid to write a review! http://recursive-partitioning.epinions.com






Re: Data Mining

2000-04-12 Thread T.S. Lim

Data Mining = Statistics reborn with a new name.

You ask the wrong crowd. Go to

   http://www.kdcentral.com

and subscribe to datamine-l mailing list.



In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] 
says...
>
>I suspect in this forum, almost as bad as the F-word or N-word are the 
>DM-words... Data Mining... I agree, but wonder about criteria.
>
>Often in our various research domains we have no choice but to use 
>retrospective data. A classic example might be validating an investment 
>approach by examining historical data, which some call backtesting. 
>
>What are the criteria? How can we know when we have chance findings?
>
>I've argued that if the model is based on an a priori hypothesis, or can 
>be justified by previously established theories, the possibility of data 
>mining may be ignored. When the pre-existing theory is less substantial, 
>one may ask if the discovered model fits data not included in the 
>original model (data which occurs after the model was discovered, or data 
>which precedes the data originally used to create the model).
>
>I'd like to hear the views of people on this forum. 
>
>The specific situation I'm referring to is an investment model called the 
>Foolish Four (http://www.fool.com/school/dowinvesting/dowinvesting.htm) 
>which was found to beat the S&P500 and Dow 30 over the period from 1973 
>through 1993. Since that date, and further backtested to 1961, it has not 
>similarly beaten those traditional benchmark indexes, but also has not 
>performed worse (both of which could be due to lack of power). The 
>Foolish Four is based on a reasonable hypothesis that the worst 
>performing Dow Jones Industrial Average companies are poised to turn 
>around because they are simply too great to fail over the long term. The 
>judgement on poor performance is based on the stock yield (a high 
>yielding stock has a relatively high interest payment compared to price), 
>therefore a reasonable hypothesis is used to justify this approach. 
>Selection of 4 of the 5 worst performing Dow companies (the worst is 
>excluded because often these companies are in actual long term financial 
>trouble) is what makes up the Foolish Four.
>
>I am not affiliated with the Motley Fool (where this investment strategy 
>is touted) nor am I advertising for them. It is just an interesting 
>practical problem which raises a question I think many statisticians face, 
>how to explain when someone has conducted data mining and when they might 
>have sussed out a valid truth.
>
>Paul Bernhardt
>University of Utah
>Department of Educational Psychology

-- 
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com
__
Get paid to write a review! http://recursive-partitioning.epinions.com






Hypothesis testing and magic - episode 2

2000-04-12 Thread Alan McLean

Some more comments on hypothesis testing:

My impression of the ‘hypothesis test controversy’, which seems to exist
primarily in the areas of psychology, education and the like (this is
coming from someone who has been involved in education for all my
working life, but with a scientific/mathematical background), is that it
is at least partly a consequence of the sheer difficulty of carrying out
quantitative research in those fields. A root of the problem seems to be
definitional. I am referring here to the definition of the variables
involved.

In, say, an agricultural research problem it is usually easy enough to
define the variables. For a very simple example, if one is interested in
comparing two strains of a crop for yield, it is very easy to define the
variable of interest. It is reasonably easy to design an experiment to
vary fairly obvious factors and to carry out the experiment.

In the ‘soft’ sciences it is easy enough to identify a characteristic of
interest – the problem is how to measure it. If I am interested in the
relationship between ability in statistics and ethnic background, for
example, I measure the statistics ability using a test of some sort; I
measure ethnic background by defining a set of ethnicities. There are
literally an infinite number of combinations that I can use – infinitely
many different tests, all purporting to measure ‘statistics ability’
(even if I change only one word in a test, I cannot be absolutely
certain of its effect, so it is a different test!), and a very large
number of definitions of ‘ethnicity’.

This is of course not news to anyone reading this. But I am coming to my
point. Suppose I carry out an ‘experiment’ – I apply the test to a group
of people of varying ethnicity, score them on the test and analyse the
results, including a hypothesis test to decide if statistics ability is
related to ethnicity. This test might be a simple ANOVA, or a
Kruskal-Wallis or a chi square test, depending on how I score the test.

As I said earlier, a hypothesis test only helps the user to decide which
of two models is probably better. The point of the above paragraphs is
this: the definition of the models being compared includes the
definition of the variables used. If I reject the null model (a label I
prefer to ‘null hypothesis’) – that is I decide that the alternative
model is (likely to work) better – I am NOT saying that there is a
relationship between statistics ability and ethnicity. All I am saying
is that there is a relationship between the two variables I used.

Please note that the test is not saying this – I am. The test merely
gives me a measure of the strength of the evidence provided by the data
(‘significant at 1%’ or ‘p-value of .0135’); this measure is only
relevant if the models I have used are appropriate. I can use other
evidence (experience is what we usually use! but there may be related
tests that help) to decide if the model is appropriate.

So there are three levels at which judgement is used to make decisions:
1. deciding what variables are to be used to measure the characteristics
of interest, and how any relationship between them relates to the
characteristics
2. deciding on the model to be used, and how to test it
3. deciding the conclusion for the model

In each of these there is evidence we use to help us make the decision.
The hypothesis test itself provides the test for the third.

Finally (at least for the moment) – whether we choose the null or
alternative model, it IS a decision. In research, accepting the null
means that we decide to accept it at least for the moment, so it is not
necessarily a committed decision. On the other hand, if a line of
investigation is not yielding results, the researcher is likely to not
continue on that line – so it is a decision which does lead to an
action.

For non-research applications, such as quality control, accepting the
null model is quite clearly a decision to act on that basis. For
example, with a bottle filling machine which is periodically tested as
to the mean contents, the null is that the machine is filling the
bottles correctly. Rejecting the null entails stopping the machine;
accepting it means the machine will not be stopped.
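To make that concrete, a minimal sketch with invented numbers (a nominal 500 ml fill, a one-sample t test at an illustrative 5% level; none of these figures come from the bottle example itself):

    import numpy as np
    from scipy import stats

    NOMINAL_ML = 500.0     # hypothetical target fill volume
    ALPHA = 0.05           # illustrative significance level

    sample = np.array([498.7, 501.2, 499.4, 497.9, 500.3, 498.1, 499.0, 500.8])  # made-up data
    t_stat, p_value = stats.ttest_1samp(sample, NOMINAL_ML)

    if p_value < ALPHA:
        action = "stop the machine and recalibrate"   # reject the null model
    else:
        action = "leave the machine running"          # accept the null, at least for now
    print(f"t = {t_stat:.2f}, p = {p_value:.3f} -> {action}")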

Traditional hypothesis testing does incorporate a decision-theoretic
loss function – the p-value.

Regards again,
Alan


--
Alan McLean ([EMAIL PROTECTED])
Department of Econometrics and Business Statistics
Monash University, Caulfield Campus, Melbourne
Tel:  +61 03 9903 2102    Fax: +61 03 9903 2007





summing standard errors within polynomial regression

2000-04-12 Thread Dale Glaser


A colleague sent the following to me at work today, and after perusal of
various texts (Neter et al., Pedhazur, Cohen, etc.) I am unable to give
anything but an opinion... here is what he sent:

"Can you answer me the following question. It concerns what is the
appropriate standard error (SE) from a curve fitting program when what one
wants to plot is derived from a COMBINATION of certain parameters, each of
which has its own SE, specifically,
I fit parabolas to some data, sensitivity (y) as a function of pupil
position (x)
y = ax^2 + bx + c
From trivial calculus, the peak of this function is at -b/2a.

Now, I have separate SEs for a, b and c from the fits.
What is the best SE to use for -b/2a? For example, do the SEs add?"

At first he proposed an equation that took the square root of the summed variances
of the estimates, i.e., sqrt(SE_lin^2 + SE_quad^2).  My feeling was that,
given the partialed nature of standard errors in a multiple regression
context, it may be misleading to add the respective standard errors, e.g.,
SE for the linear component + SE for the quadratic component, etc.,
especially given the collinearity (if centering is not performed) of the terms.
However, I also understand that variance components can be additive. Anyway,
if anyone has a general opinion I would be most appreciative as I am a bit
stumped on this one...thank you...dale glaser
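One standard way to attack this kind of question is the delta method, which requires the full covariance matrix of (a, b) from the curve fit rather than the two separate SEs; the covariance term is exactly what simple addition of squared SEs leaves out. A minimal numpy sketch with invented numbers, offered only as a starting point and not as a verdict on this particular data set:

    import numpy as np

    # Fitted parabola y = a*x^2 + b*x + c; the peak is at g(a, b) = -b / (2a).
    a, b = -0.020, 0.160                       # hypothetical estimates
    cov_ab = np.array([[4.0e-6, -1.5e-5],      # hypothetical covariance matrix of (a, b)
                       [-1.5e-5, 1.2e-4]])     # as reported by the curve-fitting program

    grad = np.array([b / (2 * a**2),           # d(-b/2a)/da
                     -1.0 / (2 * a)])          # d(-b/2a)/db

    peak = -b / (2 * a)
    se_peak = float(np.sqrt(grad @ cov_ab @ grad))   # delta-method approximation
    print(f"peak at x = {peak:.2f}, approximate SE = {se_peak:.2f}")

If the fitting program reports only the separate SEs, the off-diagonal covariance has to be obtained or assumed; setting it to zero reduces the formula to the square-root-of-summed-variances idea, weighted by the partial derivatives.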







Data Mining

2000-04-12 Thread Paul Bernhardt

I suspect in this forum, almost as bad as the F-word or N-word are the 
DM-words... Data Mining... I agree, but wonder about criteria.

Often in our various research domains we have no choice but to use 
retrospective data. A classic example might be validating an investment 
approach by examining historical data, which some call backtesting. 

What are the criteria? How can we know when we have chance findings?

I've argued that if the model is based on an a priori hypothesis, or can 
be justified by previously established theories, the possibility of data 
mining may be ignored. When the pre-existing theory is less substantial, 
one may ask if the discovered model fits data not included in the 
original model (data which occurs after the model was discovered, or data 
which precedes the data originally used to create the model).
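
A minimal sketch of that out-of-sample check, using synthetic noise in place of real market data (so any "edge" found in-sample is pure chance; all numbers and the selection rule are invented for illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    returns = rng.normal(0.0, 0.04, size=(40, 30))      # synthetic: 40 years x 30 "stocks"

    fit_years, test_years = returns[:20], returns[20:]  # in-sample period vs. holdout

    # "Discover" a rule on the in-sample period only: keep the 4 stocks with the
    # best in-sample mean return (a stand-in for whatever screen is being tested).
    picks = np.argsort(fit_years.mean(axis=0))[-4:]

    in_edge = fit_years[:, picks].mean() - fit_years.mean()
    out_edge = test_years[:, picks].mean() - test_years.mean()
    print(f"in-sample edge:     {in_edge:+.4f}")
    print(f"out-of-sample edge: {out_edge:+.4f}")       # typically shrinks toward zero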

I'd like to hear the views of people on this forum. 

The specific situation I'm referring to is an investment model called the 
Foolish Four (http://www.fool.com/school/dowinvesting/dowinvesting.htm) 
which was found to beat the S&P500 and Dow 30 over the period from 1973 
through 1993. Since that date, and further backtested to 1961, it has not 
similarly beaten those traditional benchmark indexes, but also has not 
performed worse (both of which could be due to lack of power). The 
Foolish Four is based on a reasonable hypothesis that the worst 
performing Dow Jones Industrial Average companies are poised to turn 
around because they are simply too great to fail over the long term. The 
judgement on poor performance is based on the stock yield (a high 
yielding stock has a relatively high interest payment compared to price), 
therefore a reasonable hypothesis is used to justify this approach. 
Selection of 4 of the 5 worst performing Dow companies (the worst is 
excluded because often these companies are in actual long term financial 
trouble) is what makes up the Foolish Four.

I am not affiliated with the Motley Fool (where this investment strategy 
is touted) nor am I advertising for them. It is just an interesting 
practical problem which raises a question I think many statisticians face, 
how to explain when someone has conducted data mining and when they might 
have sussed out a valid truth.

Paul Bernhardt
University of Utah
Department of Educational Psychology





Re: hyp testing

2000-04-12 Thread Bruce Weaver

On 12 Apr 2000, Herman Rubin wrote:

> >I have often wondered if an integrated course/course sequence might not be
> >better.
> 
> A course sequence of a rather different kind is definitely
> in order.  It would be at least three courses.
> 
> The first course would be a general probability only course,
> with the emphasis on understanding probability, not in carrying
> out computations.  This has nothing to do with the discipline
> of the individual student, although the level should be such
> that it uses as much mathematics as the student is going to know.
> One might, at this stage, introduce the ideas of statistical
> decision making, but most will need a full course in probability
> first to understand probability well enough to use it in any
> sensible manner.  If probability is presented as merely the
> limit of relative frequency, this might be quite difficult.
> 
> The second course should be a course in probability modeling
> in the student's department of application.  The construction
> of probability models, the making of assumptions, and the
> meaning of those assumptions, is almost totally absent in
> those using statistics today.  There should be strong warnings
> about the dangers of those assumptions being false, and that
> in practice these assumptions might not be quite true.
> 
> Only after this can one reasonably deal with the uncertainties
> of inference.


Dr. Rubin,
Are there any textbooks that you would deem suitable for the three
courses you describe above? 

-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/






Re: hyp testing

2000-04-12 Thread Michael Granaas

On Wed, 12 Apr 2000, Robert Dawson wrote:

> 
> I'm afraid that I don't follow your definition of a "plausible null".
> On the one hand, you say that my value (in the simulation I included) of
> 102 for the mean IQ of a population is "a priori false"; you then say that
> 
> "I like interval estimates because they give me a good
> range for my plausibly true values for the null."
> 
> But if I had computed a 95% confidence interval from almost any of those
> simulated data sets, 102 would have been in it.
> 
> Had I said that the mean IQ was actually 102 and that I was testing
> the null hypothesis that it was 100, would you have called _that_ a
> plausible null? My point - that repeated failures to reject the null
> should *not* automatically increase one's belief in its truth - would
> be equally valid.

One of the reasons I have been enjoying this discussion is I am learning
about the unshared assumptions that I have been making.

I have in my own mind been using "plausible" to refer to a hypothesis that
has not been refuted by data.  We may certainly find at some point that
the hypothesis is in fact false, but at the time we propose it, it could be
true.  We may even wish it to be false at the time we propose it.  But as
of the time we propose it, we cannot say with conviction that it is false.

So, the values identified by a confidence interval would fit my usage of
plausible.  They are consistent with current knowledge, but some, indeed most,
of them will eventually be eliminated.  While a value pulled from thin air
is arguably plausible given no prior information, I would include a
requirement for parsimony.  Absent other information, a no-effect hypothesis is
the most parsimonious.

I have been using "a priori false" to refer to a hypothesis that is known
to be inconsistent with current knowledge.  It is not even a reasonable
guess at the correct value.

If I choose to follow up on research in which a non-zero effect is well
established but the parameter estimate has, to me, an unacceptably wide
range, I can use the previous estimate as my null and either find that my
results are explainable as a chance deviation from the existing estimate
or that my results indicate that the existing estimate is too large/small.
That is, my results would either tend to support the status quo or refute
it.

In a discussion about the estimation of the speed of light the authors
(sorry, I can't remember who or where.  If anyone recognizes this example
please point me to the reference) describe how the initial estimates of
the speed of light were too high and had a very wide CI.  Over the course
of years with improved technology the best guess estimate of light speed
changed and the CI narrowed.  While the mechanics that we know as
hypothesis testing were absent, the researchers were clearly using the
established best estimate as the equivalent of a null, modifying it as
better data became available, and letting it stand otherwise.

The only context for your example was that the data were generated with a
specification that they come from a population that was N(100, 15).  We
therefore have prior knowledge that 102 is not the correct answer.

But, if I were in fact trying to guess at the IQ of a population, the data
from a sample of n = 10 provides precious little information, as you
clearly demonstrated.  But, if I had to try, my likely null for an unknown
population would be 100 since that is the normed mean IQ for some
population and therefore is consistent with prior knowledge. That is, a null
of IQ = 100 is a credible true value until I can get better data (it might
even be the correct value).

If n = 10 and I cannot reject a null of 100 I certainly agree that the
corroboration value is low.  But, if n = 100 and I can't reject a null of
100 I am starting to see support for 100 as a correct value.  If n = 500
and I cannot reject a null of 100 would you still demand that I had no
evidence supporting the null?  How about if n = 1000? 10,000?  How much
power has to be present before failure to reject the null is support of
the null?
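
One way to put rough numbers on that question is to look at what the test can and cannot distinguish at each n; a minimal sketch, assuming the conventional sigma = 15 for IQ and a two-sided z test at alpha = .05 (the numbers are illustrative, not a power analysis for any real study):

    import numpy as np
    from scipy import stats

    SIGMA = 15.0   # assumed population SD of IQ scores
    for n in (10, 100, 500, 1000, 10000):
        se = SIGMA / np.sqrt(n)
        half_width = stats.norm.ppf(0.975) * se                            # 95% CI half-width
        detectable = (stats.norm.ppf(0.975) + stats.norm.ppf(0.80)) * se   # shift detectable with 80% power
        print(f"n = {n:6d}: CI half-width = {half_width:5.2f}, "
              f"80%-power detectable shift = {detectable:5.2f}")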

Michael
> 
> -Robert Dawson
> 
> 
> 
> 
> 

***
Michael M. Granaas
Associate Professor        [EMAIL PROTECTED]
Department of Psychology
University of South Dakota Phone: (605) 677-5295
Vermillion, SD  57069  FAX:   (605) 677-6604
***
All views expressed are those of the author and do not necessarily
reflect those of the University of South Dakota, or the South
Dakota Board of Regents.




Re: hyp testing

2000-04-12 Thread Herman Rubin

In article <004101bfa35b$54beb900$[EMAIL PROTECTED]>,
David A. Heiser <[EMAIL PROTECTED]> wrote:


>- Original Message -
>From: Michael Granaas <[EMAIL PROTECTED]>
>> Our current verbal labels leave much to be desired.

>> Depending on who you ask the "null hypothesis" is

>> a) a hypothesis of no effect (nil hypothesis)
>> b) an a priori false hypothesis to be rejected (straw dog hypothesis)
>> c) an a priori plausible hypothesis to be tested and falsified or
>> corroborated (wish I had a term for this usage/real null?)


>The concept of a hypothesis is important. It can be used to teach an 
>important statistical concept.

>Let us suppose there are many plausible hypotheses. These include the
>"nil hypothesis", any a priori hypotheses, any idea at all that may be
>considered. Refer to these as the set of all plausible hypotheses
>(including that of no effect) that are to be tested.

The set of all plausible hypotheses is generally uncountable,
even in the discrete case.  

>The process is to pick each hypothesis and test it.

This cannot be done; there are too many.

>The outcome of the
>test is not only a probability, but a reality check (the investigator's
>belief system).

The data consists of what has been observed.  The likelihood
principle then mandates that the probabilities of unobserved
events becomes irrelevant.  This means that the typical test
procedures (NOT the test STATISTICS) would have to be wrong.
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





hyp test:better def

2000-04-12 Thread dennis roberts

it appears to me that we are having the same kinds of discussions on this 
topic as usual  and we go round and round ... and where we stop depends 
on when people get tired of it

is progress being made? i wonder ...

perhaps some of this time would be better spent defining more what a 
hypothesis is within the general area of doing research ... FORGET ABOUT 
STATISTICS FOR A MOMENT ... then, if we agree that there are times ... 
within the framework of trying to better understand phenomena ... that it 
is helpful, perhaps vital, for us to formulate AND test (gather data about) 
one or more researchable hypotheses

then we might get a better handle on

1. what the hypothesis is
2. what is a frameable version(s) of that hypothesis
3. what are some data handling (statistical?) ways of trying to collect and 
present evidence that will shed some light on how tenable or reasonable it 
is to keep that hypothesis as a work in progress ... or

to decide to abandon it and search for better hypotheses or notions or 
explanations of phenomena

we need to recognize however that

1. truth will not be found by this method ... that is, the absolute truth
2. our efforts at best will move us only closer to better understandings of 
phenomena ..
3. no matter what we find ... we always have to take it with a huge grain 
of salt ...

finally, i WOULD LIKE to offer some summary points that do seem sensible to me

A. the reliance on ... and dominance of ... traditional 'significance' 
testing ... in almost all of printed scientific literature ... across most 
disciplines ... is TOTALLY out of whack in terms of what this 'method' can 
tell us about phenomena

B. the failure of statisticians in general, particularly those (me 
included) who TEACH students about this stuff, to build into their psyches 
'priors', in some form, as herman and others have been preaching ... is 
tantamount to unethical statistical instructional practice

and C.

if we do A and don't do B ... we do a tremendous disservice to students we 
work with

now, how we reinvent our strategies ... is difficult INdeed ... but, we 
must try








Re: hyp testing

2000-04-12 Thread Herman Rubin

In article ,
Magill, Brett <[EMAIL PROTECTED]> wrote:
>Seems to me that hypothesis testing remains an essential step. Take for
>instance the following data that I made up just for the purpose of
>illustration and the correlation matrix it produces:

>VAR1  VAR2
>2.00   2.00
>3.00   2.00
>5.00   6.00
>4.00   2.00
>3.00   1.00

>Correlations
>                              VAR1      VAR2
>VAR1   Pearson Correlation   1.000      .765
>       Sig. (2-tailed)           .      .132
>       N                         5         5

>VAR2   Pearson Correlation    .765     1.000
>       Sig. (2-tailed)        .132         .
>       N                         5         5


>Now, .77 is probably a respectable correlation (depending of course on the
>application).  However, the question here is how much faith we have in this
>estimate.  Accepting the traditional alpha level of .05 (because it is not
>real data and so no reason not to) we would say that this is beyond what we
>will accept as the risk of making a Type I error, so we fail to reject the
>null.  This is not to say that the correlation is zero, but for practical
>purposes with this sample, we must treat it as no effect (and here probably
>take into consideration our power).  Effect size is useless without
>significance.  Significance is meaningless without information on effect
>size.
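
(For the record, the quoted numbers are easy to reproduce; a small scipy check, included only to make the example concrete:)

    from scipy import stats

    var1 = [2, 3, 5, 4, 3]
    var2 = [2, 2, 6, 2, 1]
    r, p = stats.pearsonr(var1, var2)
    print(f"r = {r:.3f}, two-tailed p = {p:.3f}")   # r = 0.765, p = 0.132, as quoted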


One can make a case for hypothesis testing in SOME
situations.  However, the above example is one which shows
some of what is wrong.

Even classical statisticians, faced with the rudiments of
decision theory, will agree that the significance level to
be used should generally decrease with increasing sample
size.  While there can be problems with small samples, the
converse is that it should increase with decreasing sample
size.  The choice of a significance level, without
consideration of the consequences of incorrect acceptance
if the null is false, fails when one considers the real
problem.
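
A small numerical illustration of why a fixed level is questioned: hold a trivially small correlation fixed and let n grow, and the same negligible effect eventually crosses any fixed alpha. (The sketch simply assumes the sample r equals 0.03 at every n; all numbers are invented.)

    import numpy as np
    from scipy import stats

    r = 0.03   # a practically negligible correlation, assumed to be observed at every n
    for n in (100, 1000, 10000, 100000):
        t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)   # test statistic for H0: rho = 0
        p = 2 * stats.t.sf(abs(t), df=n - 2)         # two-tailed p-value
        print(f"n = {n:6d}: t = {t:5.2f}, p = {p:.4f}")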

I have seen a paper claim that an effect was not important
because it came out at the .052 level.  This is bad
statistics, and bad science.
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: hyp testing

2000-04-12 Thread Herman Rubin

In article <[EMAIL PROTECTED]>,
Michael Granaas <[EMAIL PROTECTED]> wrote:
>On Tue, 11 Apr 2000, Robert Dawson wrote:


.

>> and Michael Granaas responded
>> > This (point 4) is certainly what we have been lead to believe, but I
>> > question the assumption.  Do we not in fact teach that we are to act as if
>> > the null is true until we can demonstrate otherwise?

>> I certainly don't.  We *compute* as if the null was true, whether we
>> believe it or not; then we either conclude that (null + data) is implausible
>> or that the data are consistent with the null.

>And if the data are consistent with the null we do what?  Act as if the
>null were true?  Act as if nothing were true?  In a pure interpretation of
>this approach we must act as if there were no knowledge (null not
>rejected) or only very weak knowledge (effect is in the 
>direction).  The first is a complete waste of effort and the second
>provides only the weakest bit of sketchy knowledge.

>Every research project should plausibly add to our knowledge base.  But,
>if the null is a priori false failure to reject is just that a failure and
>waste of time.

One might think that if the null is a priori false, then
we should just go ahead and reject it without looking at
the data.  But we are asking the wrong question; what is
usually wanted is to decide whether it is better to act
as if the null is true.  This is the usual situation; if
statistical testing had been available and used in science in
the early days, most hypotheses would have been rejected.
But without accepting a false hypothesis, it would not be
possible to draw conclusions, and progress would not be
made.  

In the physical sciences, it was sometimes the case that the 
data were sufficiently inexact that the error in the model
got swamped by the errors in the data; in such a case, the
null hypotheses might get through.  Chemists would have had
major problems in the early days if they had been able to weigh their
samples accurately enough; the isotope effect would have messed up
the theories.
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: hyp testing

2000-04-12 Thread Herman Rubin

In article <048a01bfa483$85f46280$[EMAIL PROTECTED]>,
Robert Dawson <[EMAIL PROTECTED]> wrote:
>Michael Granaas wrote (in part):
>> The problem is that interval estimation and null hypothesis testing are
>> seen as distinct species.  An interval that includes zero leads to the
>> same logical problems as failure to reject a false null.

Interval estimation at a fixed coverage probability also
does not meet any decision concept; at best, it can be
considered a descriptive statistic.  If there is enough
data, and there is no real null, one can use a flat prior,
and act as if the posterior distribution of the parameter
is essentially the normalized likelihood function.

But interval estimation should take into account the size
of the interval.  The easiest loss to work with, from a computational
standpoint, happens to be linear in the length of the interval; the action
to be taken then becomes quite simple.
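
A sketch of the simplest such case, stated as a generic textbook-style derivation rather than anything Rubin specifically endorses: with loss k|C| + 1{theta not in C} for a region C and posterior density pi(theta | x), the posterior expected loss is

    E[L \mid x] \;=\; k \int_C d\theta \;+\; 1 - \int_C \pi(\theta \mid x)\, d\theta
               \;=\; 1 + \int_C \bigl( k - \pi(\theta \mid x) \bigr)\, d\theta ,

which is minimized by taking C = {theta : pi(theta | x) > k}: a highest-posterior-density region whose size is governed by the length penalty k rather than by a fixed coverage probability.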
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: hyp testing

2000-04-12 Thread Herman Rubin

In article <[EMAIL PROTECTED]>,
dennis roberts <[EMAIL PROTECTED]> wrote:
>At 01:16 PM 4/10/00 -0300, Robert Dawson wrote:

>>both leave the listener wondering "why 0.5?"  If the only answer is "well,
>>it was a round number close enough to x bar [or "to my guesstimate before
>>the experiment"] not to seem silly, but far enough away that I thought I
>>could reject it." then the test is pointless.

 -Robert Dawson


>YOU HAVE made my case perfectly!  ... this is why the notion of hypothesis 
>testing is outmoded, no longer useful ... not worth the time we put into 
>teaching it ...
>in the case above ... i would ask:

There are cases where .5 makes sense, or rather
approximately .5 makes sense.  This happens in genetic
regression, if one assumes additivity, random mating, and
the contributions of the parents are equal.  However,
Rubin's second commandment is that thou shalt not believe
thy assumptions.

The problem of testing approximate hypotheses is more
difficult.  From a decision-theoretic standpoint, if
the width of the acceptance region in the parameter
space is small compared to the standard error of the
usual estimator, one can use a point null as a good
approximation.  If not, the user's assumptions become
more important.
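
A crude numerical illustration of that approximation (nothing here is specific to the genetics example; the numbers and the conservative interval-null p-value construction are both invented for illustration):

    import numpy as np
    from scipy import stats

    def p_point(z):
        """Two-sided p-value for the point null, z = (estimate - 0.5) / SE."""
        return 2 * stats.norm.sf(abs(z))

    def p_interval(z, delta_over_se):
        """A conservative p-value for the interval null |theta - 0.5| <= delta:
        the largest two-sided tail probability over the interval, attained at
        the boundary value nearest the estimate (crude, but enough to compare)."""
        return 2 * stats.norm.sf(max(abs(z) - delta_over_se, 0.0))

    z = 2.2                                  # illustrative observed standardized distance
    for ratio in (0.01, 0.1, 0.5, 1.0):      # width of the null region as a fraction of the SE
        print(f"delta/SE = {ratio:4.2f}: point-null p = {p_point(z):.4f}, "
              f"interval-null p = {p_interval(z, ratio):.4f}")

When delta is a small fraction of the SE the two p-values are nearly identical; as delta approaches the SE they diverge, which is the point about the user's assumptions mattering more.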
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: scientific method

2000-04-12 Thread Rich Ulrich

On 10 Apr 2000 14:06:32 -0700, [EMAIL PROTECTED] (dennis roberts)
wrote:

> here are a few (fastly found i admit) urls about scientific method ... some 
> are quite interesting
 < snip; so that no one might think that I recommend the citations > 

I saved this note because it had references, but I was disappointed by
them, now that I finally got around to checking -- after the first
three, I quit checking.  

The first one had a tone such that I wondered if the author was
going to point me to Biblical Creationism as what he *recommended*.
Well, it is not the perspective you will see in your Social Science
textbooks.

I recommend, instead, that if you want to understand how a scientist's
mind works, you might want to read a critique of that neurotic U.S.
movement -- try Stephen Jay Gould, when he is writing essays and book
reviews (rather than the excellent Naturalist topics, which make up
most of his books).  I think "An Urchin in the Storm" is a book of
reviews.  He also wrote an excellent piece about the role of biases
in social science history, "The Mismeasure of Man".

For deep consideration of the scientific method, I recommend,
"Criticism and the Growth of Knowledge" (Lakatos, ed.).  This book
happens to be from the proceedings of a symposium devoted to exploring
Thomas Kuhn's thesis about revolutions in scientific discovery, and it
is a modern classic.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: hyp testing

2000-04-12 Thread Herman Rubin

In article <[EMAIL PROTECTED]>,
Michael Granaas <[EMAIL PROTECTED]> wrote:

>In thinking about my own failure to get students to ask follow up
>questions to a null hypothesis test I have formulated a couple of possible
>reasons. Let me know what you think.

>1.  Even when we teach statistics in the discipline areas we fail to
>integrate it with research.  We teach a course in statistics and a course
>in research design/methodology as if they were two distinct topics.  It
>seems to me that this could easily encourage the type of thinking that
>leads to substantive questions not being linked to the statistical
>hypothesis/procedure selected.  

>I have often wondered if an integrated course/course sequence might not be
>better.

A course sequence of a rather different kind is definitely
in order.  It would be at least three courses.

The first course would be a general probability-only course,
with the emphasis on understanding probability, not on carrying
out computations.  This has nothing to do with the discipline
of the individual student, although the level should be such
that it uses as much mathematics as the student is going to know.
One might, at this stage, introduce the ideas of statistical
decision making, but most will need a full course in probability
first to understand probability well enough to use it in any
sensible manner.  If probability is presented as merely the
limit of relative frequency, this might be quite difficult.

The second course should be a course in probability modeling
in the student's department of application.  The construction
of probability models, the making of assumptions, and the
meaning of those assumptions, are almost totally absent among
those using statistics today.  There should be strong warnings
about the dangers of those assumptions being false, and that
in practice these assumptions might not be quite true.

Only after this can one reasonably deal with the uncertainties
of inference.
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: hyp testing

2000-04-12 Thread Robert Dawson

I wrote:
> >(a) that their discipline ought to be a science;
and Herman Rubin responded:
>
> What is a science?  The word means "knowledge".

It did once, and does still in certain uses. I _think_ that everybody
here is aware that the main meaning today is more restricted.

> >Granted, if they did understand statistics, they would not test hypotheses
> >nearly as often as they do. However, that said, I am not entirely persuaded
> >that risk calculation is the whole story, either. In many pure research
> >situations, "risk" is just not well defined. What is the risk involved in
> >believing (say) that the universe is closed rather than open?
>
> Both "hypotheses" are highly composite.  In a situation
> like this, what is the ADVANTAGE of assuming one rather
> than the other?  What action is going to be taken?
>
> There may be a point in investigating the problem, but is
> there one in drawing inferences?

Yes, surely. Let me turn the question around, Herman: your statements
seem to imply that any form of inference designed to do anything but choose
between a set of _actions_ is at least pointless and probably immoral; and
it seems as if you are advocating a philosophy of science from which the
concepts of "fact", "truth", and "falsehood", even in a tentative sense, are
to be eliminated, to be replaced by the concept of "utility".  What benefit
is there in this?

As for the risks, I can see definite disadvantages in proposals such as:

> The only general conclusion would be a summary of the likelihood
> function, or a reduction of the data to a point where the loss of
> information is not critical in computing a good approximation to
> the "best" action.

Part of communicating the results of a piece of research is summarizing
them and interpreting them, so that it takes less time to read a scientific
paper than it took to write it.  If scientific writing were restricted as
you suggest, the unfortunate person who *did* have to make a decision would
have the following Herculean program to carry out:

   First, make up a list of all possible states of the universe;
   Then do the data analysis for all relevant research yourself,
using the list assembled in the first step.

In situations where one actually *is* balancing risks, I quite agree
that one should analyze the data accordingly. However, I do not understand
your apparent implicit claim that no question should be asked or answered in
any other situation.

For instance, supposing one is trying to decide whether to add fluoride
to the drinking water of a town. The final decision should be a risk-benefit
analysis. However, the possibility of ever getting to a final decision
depends to a large extent on the fact that other researchers in the past did
*not* end their papers with phrases like "Does fluorine cause an elevated
risk of chicken pox?  I'm not going to tell you, but if you plug your
personal risk estimates into this calculation [Box IV] you can decide
whether you think the risk outweighs the benefits."


-Robert Dawson









Re: hyp testing

2000-04-12 Thread Herman Rubin

In article <01e301bfa2ee$b69f42b0$[EMAIL PROTECTED]>,
Robert Dawson <[EMAIL PROTECTED]> wrote:
>Dennis Roberts asked, imagining a testing-free universe:

>>> what would the vast majority of folks who either do inferential work
>>and/or
>>> teach it ... DO
>>> what analyses would they be doing? what would they be teaching?

>I wrote:
>> >*  students would be told in their compulsory intro stats that
>> >"a posterior probability of 95% or greater is called
>> > "statistically significant", and we say 'we believe
>> > the hypothesis'. Anything less than that is called
>> >"not statistically significant", and we say 'we disbelieve
>> > the hypothesis'".

>and Herman Rubin responded:

>> Why?  What should be done is to use the risk of the procedure,
>> not the posterior probability.  The term "statistically significant"
>> needs abandoning; it is whether the effect is important enough
>> that it pays to take it into account.

>Dennis asked what _would_ happen, not what _should_.  Most of the abuses we
>see around us are not the fault of hypothesis testing _per_se_, but of
>statistics users who believe:

>(a) that their discipline ought to be a science;

What is a science?  The word means "knowledge".

>(b) that statistics must be used to make this so;

The problem is that they expect statistics to take in their
data and spew out the TRUTH.  The capital letters are not
an exaggeration.

>(c) and that it is unreasonable to expect them to _understand_
>statistics just because of (a) and (b).

They have elevated statistics to a religion, and as in many
religions, the layman only has to carry out the sacrifices
ordered by the priest to get the blessings of the gods.

They do not need to understand statistical CALCULATIONS, 
and they do not have to be able to produce the proofs.  What
they need to understand are the concepts of probability and
decision making, so that they can accurately communicate 
their problems to those who can help with the mechanics.

>Granted, if they did understand statistics, they would not test hypotheses
>nearly as often as they do. However, that said, I am not entirely persuaded
>that risk calculation is the whole story, either. In many pure research
>situations, "risk" is just not well defined. What is the risk involved in
>believing (say) that the universe is closed rather than open?

Both "hypotheses" are highly composite.  In a situation
like this, what is the ADVANTAGE of assuming one rather
than the other?  What action is going to be taken?

There may be a point in investigating the problem, but is
there one in drawing inferences?

>Moreover, suppose we elected Herman to the post of Emperor of Inference,
>(with the power of the "Bars and the Axes"?) to enforce a risk-based
>approach to statistics (not that he'd take it, but bear with me...), would
>the situation realy improve?

>My own feeling is that, in many "soft" science papers of the sort where
>the research is not immediately applied to the real world, but may affect
>public policy and personal politics, a "risk" approach would be disastrous.
>If the researcher had to assign "risks" to outcomes that were merely a
>matter of correct or incorrect belief, it  would be all too tempting to
>assign a large risk to an outcome that "would set back the cause of X fifty
>years" and conversely a small risk to accepting a belief that might be
>considered "if not true, at least a useful myth." (Exercise: provide your
>own examples).  Everything would be lowered to the level of Pascal's Wager -
>surely the canonical example of the limitations of a risk-based approach?

It is precisely in these situations that a risk approach is 
absolutely necessary.  But the input to this, or any other,
sound risk approach must be made by those who will benefit
or suffer from the decision.  In the case of medical 
procedures, unless there is a public health question like
the spread of disease, it is the risk function of the patient
which is the one which should be used.  For public policy,
it is the risk function of the government which is involved.

On this topic, there is a fair book by Clemen, _Making Hard
Decisions_.  

>One might argue that in such a situation the rare reader who intends to
>take action, and not the writer, should do the statistics. Unfortunately, in
>the real world, that won't wash. People want simple answers, and with the
>flood of information that we have to deal with in keeping up with the
>literature in any subject today, this is not entirely a foolish or lazy
>desire.

As Einstein stated, make things as simple as possible, but
NO SIMPLER.  It is a foolish desire, fanned by ignorance.
We should be teaching that statistics is at least as difficult
as the Oracle of Delphi, and that understanding the Oracle
can be as difficult as solving the problem otherwise.

>It is considered the author's responsibility to reach a conclusion,
>not just to present a mass of undigested

Re: hyp testing

2000-04-12 Thread Bruce Weaver

On 11 Apr 2000, Donald F. Burrill wrote:

> On Mon, 10 Apr 2000, Bruce Weaver wrote in part, quoting Bob Frick:
> 
-- >8 ---

> > 
> > To put this argument another way, suppose the question is whether one 
> > variable influences another.  This is a discrete probability space with 
> > only two answers: yes or no.  Therefore, it is natural that both 
> > answers receive a nonzero probability. 
> 
> It may be (or seem) "natural";  that doesn't mean that it's so, 
> especially in view of the subsequent refinement:
> 
> > Now suppose the question is changed into 
> > one concerning the size of the effect.  This creates a continuous 
> > probability space, with the possible answer being any of an infinite 
> > number of real numbers and each one of these real numbers receiving an 
> > essentially zero probability.  A natural tendency is to include 0 in this 
> > continuous probability space and assign it an essentially zero 
> > probability.  However, the "no" answer, which corresponds to a size of 
> > zero, does not change probability just because the question is phrased 
> > differently.  Therefore, it still has its nonzero probability; only the 
> > nonzero probability of the "yes" answer is spread over the real numbers.
> > 
> 
> To this I have two objections:  (1) It is not clear that the "no" answer 
> "does not change probability ...", as Bob puts it.  If the question is 
> one that makes sense in a continuous probability space, it is entirely 
> possible (and indeed more usual than not, one would expect) that 
> constraining it to a two-value discrete situation ("yes" vs. "no") may 
> have entailed condensing a range of what one might call "small" values 
> onto the answer "no".  That is, the question may already, and perhaps 
> unconsciously, have been "coarsened" to permit the discrete expression 
> of the question with which Bob started.

I see your point.  But one of the examples Frick gives concerns the
existence of ESP.  In the discrete space, it does or does not exist.  For
this particular example, I think one could justify using a 1-tailed test
when moving to the continuous space; and so the null hypothesis would
encompass "less than or equal to 0", and the alternative "greater than 0". 
It seems to me that with a one-tailed alternative like this, the null
hypothesis can certainly be true.  


>   (2) My second objection is that if the positive-discrete 
> probability is retained for the value "0" (or whatever value the former 
> "no" is held to represent), the distribution of the observed quantity 
> cannot be one of the standard distributions.  (In particular, it is not 
> normal.)  One then has no basis for asserting the probability of error 
> in rejecting the null hypothesis (at least, not by invoking the standard 
> distributions, as computers do, or the standard tables, as humans do 
> when they aren't relying on computers).  Presumably one could derive the 
> sampling distribution in enough detail to handle simple problems, but 
> that still looks like a lot more work than one can imagine most 
> investigators -- psychologists, say -- cheerfully undertaking.

This would not be a problem if the alternative were one-tailed, would it?

Cheers,
Bruce
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
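The "point mass at zero plus continuous spread" prior in the quoted exchange
can be written down directly. Below is a minimal sketch - my own illustration,
not anything from Frick, Burrill or Weaver - of a spike-and-slab prior:
probability p0 on the exact null mu = 0 and 1 - p0 spread over a normal
density, updated by a normal likelihood for the sample mean. The prior
weights, scales and data are all invented.

# Posterior probability of a point null under a mixed ("spike and slab") prior.
import numpy as np
from scipy import stats

n, sigma = 40, 1.0        # sample size and (assumed known) sd
xbar = 0.35               # observed sample mean (invented)
p0 = 0.5                  # prior mass on the exact null mu = 0
tau = 1.0                 # prior sd of mu under the alternative

se = sigma / np.sqrt(n)

# Marginal density of xbar under the null (mu = 0) and under the slab,
# where xbar ~ N(mu, se^2) and mu ~ N(0, tau^2) marginalises to
# xbar ~ N(0, se^2 + tau^2).
m0 = stats.norm.pdf(xbar, loc=0.0, scale=se)
m1 = stats.norm.pdf(xbar, loc=0.0, scale=np.sqrt(se**2 + tau**2))

post_null = p0 * m0 / (p0 * m0 + (1 - p0) * m1)
print("posterior P(mu = 0 | data) =", round(post_null, 3))

The point is only that the "no" answer keeps a lump of probability while the
"yes" answer is spread over the real line, which is the structure Frick
describes.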







Re: hyp testing

2000-04-12 Thread Herman Rubin

In article <007a01bfa1c7$aa97c460$[EMAIL PROTECTED]>,
David A. Heiser <[EMAIL PROTECTED]> wrote:
>Lots of interesting replies.

>A. The "community" Denis Roberts refers to wants statistics to tell them
>which is better, which of two models is the correct one, how much more will
>method B cost me,then method A, which process do I use that will make me
>more money, which is the best advertisment strategy, which or two positions
>that my candidate can take will get him the most votes, which (of several
>strategies/models) will get me more money when I trade on NASDAQ, what is
>the probability that I can get genome U1 patented, which treatment will get
>patients out of the hospital quicker, etc, etc, etc.

Part of that is because people have been taught that
statistics can give such answers.

>B. Statistics cannot do any of this. It can only tell you what is the
>probability that what you have observed occurred by chance.

It can tell you this for the specific observation, and
in each state of nature.

>C. Whatever you use, hypothesis tests, confidence intervals, posterior probability,
>or any other stat method, are only tools. The bottom line is a probability,
>not a definite answer.

Except for posterior probability, none of these are tools
for the actual problems.  And posterior probability is not
what is wanted; it is the posterior risk of the procedure.

But even this relies on belief.  An approach to rational
behavior makes the prior a weighting measure, without 
ringing in belief.  I suggest we keep it this way, and
avoid the philosophical aspects.  

>D. The only definite answer is if the result works as intended. Or, as Joe
>Ward says, it does a good job of prediction.

>E. The success is a human judgement, which the people in A want a machine and
>software to do, and to be infallible (because human judgement is not).


-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: hyp testing

2000-04-12 Thread dennis roberts

a professor thought that he was producing a test of 50 items at 'about the
50%' difficulty level, that is ... on average, the scores would be about
50% (that is, about 25 items correct). now, he collected data from a random
sample of n=40 of his class ... gave them the test ... and then did a ttest
using 25 as the null ... he found

(now no fair tossing in other considerations like ... well, this is not 
planned properly etc  just take it on face value the way we ACTUALLY 
see it in the vast majority of the literature)

MTB > ttest 25 c1

One-Sample T: C1

Test of mu = 25 vs mu not = 25

Variable          N      Mean     StDev   SE Mean
C1               40     32.20      9.86      1.56

Variable               95.0% CI            T      P
C1              (  29.05,  35.35)       4.62  0.000   <<-- REJECT THE NULL

  0 ------------------------------ 25 ------------------------------ 50

where on the number line might the 'real' level of performance be, based
on the rejected null?


another prof, looking at the data and keeping in mind what the professor in
the course thought, did the following 95% ci ...

MTB >
MTB > tint c1

One-Sample T: C1

Variable          N      Mean     StDev   SE Mean           95.0% CI
C1               40     32.20      9.86      1.56     (  29.05,  35.35)   <-- CI

  0 ------------------------------ 25 ------------------------------ 50

where on the number line do you think the 'real' level of performance is?

now, folks on the list have been trying to argue about what truth is ... or 
whether we actually could find it ... and i would say that in this case, 
one might define 'truth' at least two ways:

first truth: is the null true?

second truth: what is mu?

the first truth is of so little value (and only really says, we don't think 
it is 25) ... but the second gets at the heart of the problem ... what is 
going on with the students' performance ...

the first truth and our 'proof' of or NOT of it ... just says WE were good 
or BAD at formulating the hypothesis ... but does not really get us closer 
to the second truth ... which speaks directly to the parameter ...
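A minimal sketch in Python (scipy assumed), re-computing both Minitab outputs
above from the summary statistics alone; it adds nothing beyond the numbers
already printed.

# Re-compute the one-sample t test (H0: mu = 25) and the 95% CI
# from the summary statistics shown in the Minitab output above.
import math
from scipy import stats

n, xbar, s = 40, 32.20, 9.86
mu0 = 25.0

se = s / math.sqrt(n)                   # standard error of the mean
t = (xbar - mu0) / se                   # t statistic
p = 2 * stats.t.sf(abs(t), df=n - 1)    # two-sided p value

tcrit = stats.t.ppf(0.975, df=n - 1)    # 97.5th percentile of t with 39 df
lo, hi = xbar - tcrit * se, xbar + tcrit * se

print("SE = %.2f, t = %.2f, p = %.4f" % (se, t, p))
print("95%% CI = (%.2f, %.2f)" % (lo, hi))   # approximately (29.05, 35.35)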







Re: hyp testing

2000-04-12 Thread Herman Rubin

In article <[EMAIL PROTECTED]>,
dennis roberts <[EMAIL PROTECTED]> wrote:
>i was not suggesting taking away from our arsenal of tricks ... but, since 
>i was one of those old guys too ... i am wondering if we were mostly led 
>astray ...?

>the more i work with statistical methods, the less i see any meaningful (at 
>the level of dominance that we see it) applications of hypothesis testing ...

>here is a typical problem ... and we teach students this!

>1. we design a new treatment
>2. we do an experiment
>3. our null hypothesis is that both 'methods', new and old, produce the 
>same results

I presume you mean the same distribution of results.

But this is at least next to impossible.  Even if all you are
doing is using a new batch of the same old material, there will
be SOME difference.  This may or may not be important.

>4. we WANT to reject the null (especially if OUR method is better!)

Some do, and some do not.

>5. we DO a two sample t test (our t was 2.98 with 60 df)  and reject the 
>null ... and in our favor!
>6. what has this told us?

>if this is ALL you do ... what it has told you AT BEST is that ... the 
>methods probably are not the same ... but, is that the question of interest 
>to us?

>no ... the real question is: how much difference is there in the two methods?

>our t test does NOT say anything about that

>1 to 6 can be applied to all sorts of hyp tests ... and most lead us 
>essentially into a dead end

One should approach the problem as a decision problem from
the beginning.  The real main question is, should the new
treatment be used?  There are many variations on this, and
what may be the least useful action is to say either that
there is a statistically significant difference, OR that
there is no difference.  It is easy to give reasonable
examples where either variation of the current method is
the opposite of what is wanted.
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558
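A minimal sketch of what "approach the problem as a decision problem from the
beginning" can look like, with a posterior for the treatment difference and a
loss for each action; the posterior, the switching cost and the losses are all
invented for the example, not anything proposed in the thread.

# Toy decision problem: adopt the new treatment or keep the old one?
import numpy as np

rng = np.random.default_rng(0)

# Pretend posterior for the true difference (new minus old), e.g. from a
# conjugate normal analysis: mean 0.3, sd 0.2 on some outcome scale.
delta = rng.normal(loc=0.3, scale=0.2, size=100_000)

switch_cost = 0.1     # fixed cost of changing methods, in outcome units

# Loss of each action as a function of the unknown difference:
#   keep old  -> forgo the benefit delta whenever delta > 0
#   adopt new -> pay the switching cost, and lose -delta whenever delta < 0
loss_keep  = np.maximum(delta, 0.0)
loss_adopt = switch_cost + np.maximum(-delta, 0.0)

print("expected loss, keep old :", round(float(loss_keep.mean()), 3))
print("expected loss, adopt new:", round(float(loss_adopt.mean()), 3))
print("decision:", "adopt new" if loss_adopt.mean() < loss_keep.mean() else "keep old")

The decision turns on the size of the difference and on the losses, not on
whether a null of exactly zero can be rejected - which is the point of the
post above.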





Re: hyp testing

2000-04-12 Thread Robert Dawson

Michael Granaas wrote (in part):
> The problem is that interval estimation and null hypothesis testing are
> seen as distinct species.  An interval that includes zero leads to the
> same logical problems as failure to reject a false null.

No; an interval that includes zero has additional information. Not
(to open another can of worms) because of being a confidence interval;
we can construct a 95% confidence region, the union of two intervals,
consisting of precisely the *least* plausible values, and it is possible
to construct a 95% CI that contains no information whatsoever about the
value of the parameter! But as somebody (Kalbfleisch? George Gabor?) said
once, the reason that confidence intervals as usually computed work as well
as they do is that they are closely related to maximum-likelihood intervals.

I'm afraid that I don't follow your definition of a "plausible null".
On the one hand, you say that my value (in the simulation I included) of
102 for the mean IQ of a population is "a priori false"; you then say that

"I like interval estimates because they give me a good
range for my plausibly true values for the null."

But if I had computed a 95% confidence interval from almost any of those
simulated data sets, 102 would have been in it.

Had I said that the mean IQ was actually 102 and that I was testing
the null hypothesis that it was 100, would you have called _that_ a
plausible null? My point - that repeated failures to reject the null
should *not* automatically increase one's belief in its truth - would
be equally valid.

-Robert Dawson
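Dawson's point is easy to check by simulation. A minimal sketch (my own; the
sample size and sd are assumed, not taken from his earlier post): draw
repeated samples from a population whose mean really is 102, test H0: mu = 100
each time, and count the non-rejections.

# Repeated failures to reject H0: mu = 100 when the true mean is 102.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
true_mu, sd, n, reps = 102.0, 15.0, 25, 2000   # assumed values

pvals = np.empty(reps)
for i in range(reps):
    sample = rng.normal(true_mu, sd, size=n)
    pvals[i] = stats.ttest_1samp(sample, popmean=100.0).pvalue

print("proportion of non-rejections at alpha = .05:",
      round(float((pvals > 0.05).mean()), 3))
# Most of the tests fail to reject even though the null is false; the
# repeated non-rejections reflect low power, not evidence that mu = 100.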









RE: Hypothesis testing and magic

2000-04-12 Thread Silvert, Henry

Finally a voice of sanity!!!

Henry M. Silvert Ph.D.
Research Statistician
The Conference Board
845 Third Ave.
New York, NY 10022
Phone : (212)339-0438
Fax : (212)836-3825
Email : [EMAIL PROTECTED]


> -Original Message-
> From: Alan McLean [SMTP:[EMAIL PROTECTED]]
> Sent: Tuesday, April 11, 2000 7:47 PM
> To:   EDSTAT list
> Subject:  Hypothesis testing and magic
> 
> I have been reading all the back and forth about hypothesis testing with
> some degree of fascination. It's a topic of particular interest to me -
> I presented a paper called 'Hypothesis testing and the Westminster
> System' at the ISI conference in Helsinki last year.
> 
> What I find fascinating is the way that hypothesis testing is regarded
> as a technique for finding out 'truth'. Just wave a magic wand, and
> truth will appear out of a set of data (and mutter the magic number 0.05
> while you are waving it.) Hypothesis testing does nothing of the sort
> - of course.
> 
> First, hypothesis testing is not restricted to statistics or 'research'.
> If you are told some piece of news or gossip, you automatically check it
> out for plausibility against your knowledge and experience. (This is
> known colloquially as a 'shit filter'.) If you are at a seminar, you
> listen to the presenter in the same way. If what you hear is consistent
> with your knowledge and experience you accept that it is probably true.
> If it is very consistent, you may accept that it IS true. If it is not
> consistent, you will question it and conclude that it is probably not true.
> 
> IF the news is something that requires some action on your part, you
> will act according to your assessment of the information.
> 
> If the news is important to you, and you cannot decide which way to go
> on prior knowledge, you will presumably go and get corroborative
> information, hopefully in some sense objective information.
> 
> This describes hypothesis testing almost exactly; the difference is a
> matter of formalism.
> 
> Next - a statistical hypothesis test compares two probability models of
> 'reality'. If you are interested in the possible difference between two
> populations on some numeric variable - for example, between heights of
> men and heights of women in some population group - and you choose to
> express the difference in terms of means, you are comparing a model
> which says
> height of a randomly chosen individual = overall mean + random fluctuation
> with one which says
> height of a randomly chosen individual = overall mean + factor due to sex + random fluctuation
> You then make assumptions about the 'random fluctuations'.
> 
> Note that one of these models is embedded within the other - the first
> model is a particular case of the second. It is only in this situation
> that standard hypothesis testing is applicable.
> 
> Neither of these models is 'true' - but either or both may be good
> descriptions of the two populations. Good in the sense that if you do
> start to randomly select individuals, the results agree acceptably well
> with what the model predicts. The role of hypothesis testing is to help
> you decide which of these is (PROBABLY) the better model - or if neither
> is.
> 
> In standard hypothesis testing, one of these models is 'privileged' in
> that it is assumed 'true' - that is, if neither model is better, then
> you will use the privileged model. In most cases, this means the SIMPLER
> model.
> 
> More accurately - if you decide that the models are equally good (or
> bad) you are saying that you cannot distinguish between them on the
> basis of the information and the statistical technique used! To decide
> between them you will need either to use a different technique, or more
> realistically, some other criterion. For example, in a court case, if
> you cannot decide between the models 'Guilty' and 'Innocent', you may
> always choose 'Innocent'.
> 
> There is no reason why one model is thus privileged. In my paper I
> stressed my belief that this approach reflects our (and Fisher's)
> cultural heritage rather than any need for it to be that way. One can
> for example express the choice as between the embedded model and the
> embedded model suggested by the data. For a test on the difference
> between two means, this considers the models mu(diff) = 0 and mu(diff) =
> xbar. The interesting thing is that this is what we actually do,
> although it is dressed up in the language and technique of the general
> model mu(diff) not= 0.  (This dressing up is a lot of the reason why
> students have trouble with hypothesis testing.)
> 
> To conclude: hypothesis testing is NECESSARY. We do it all the time.
> Assessment of effect sizes is also necessary, but the two should not be
> confused.
> 
> Regards,
> Alan
> 
> --
> Alan McLean ([EMAIL PROTECTED])
> Department of Econometrics and Business Statistics
> Monash University, Caulfield Campus, Melbourne
> Tel:  +61 03 9903 2102    Fax: +61 03 9903 2007
> 
> 
> 
> 
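A minimal sketch of the nested-model comparison described in the message above
(the "overall mean only" model against the "overall mean plus sex factor"
model for height), using simulated data; the sample sizes, means and sds are
invented, and the two-sample t test stands in for the general machinery.

# Two-sample t test read as a comparison of two nested models:
#   model 0: height = overall mean + random fluctuation
#   model 1: height = overall mean + factor due to sex + random fluctuation
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
men   = rng.normal(loc=176.0, scale=7.0, size=60)   # simulated heights, cm
women = rng.normal(loc=163.0, scale=6.5, size=60)

t, p = stats.ttest_ind(men, women, equal_var=True)
print("t = %.2f, p = %.3g" % (t, p))
# A small p says the model with the sex factor describes these data
# appreciably better than the mean-only model; it does not say either
# model is 'true'.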

Re: cluster analysis

2000-04-12 Thread Gene Gallagher

I distribute a program called COMPAH that does Lance-Williams
combinatorial agglomerative clustering with 20-30 different
similarity/dissimilarity indices.  It is a Fortran program
that runs on Windows or DOS.  It will cluster very large
datasets (thousands of items) quickly.  I provide documentation
with references and I also distribute the source code free.
It is set up to write and read Matlab binary files so that
you can use Matlab for ordination and COMPAH for clustering.
The program and documentation can be downloaded at
http://www.es.umb.edu/edgwebp.htm


In article <[EMAIL PROTECTED]>,
  Elisa Wood <[EMAIL PROTECTED]> wrote:
> Can anyone help with good resources on the web, journals, books, etc. on
> cluster analysis - similarity and ordination.  Any recommended programs
> for this type of analysis too.
>
> Cheers
> Elisa Wood
>
>

--
Eugene D. Gallagher
ECOS, UMASS/Boston
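Not COMPAH, but for a quick look at the same family of methods: the standard
agglomerative linkages (single, complete, group average, Ward, ...) are
special cases of the Lance-Williams combinatorial update, and scipy's
hierarchical clustering routines implement them. A minimal sketch with
invented data:

# Agglomerative (hierarchical) clustering sketch -- illustration only.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))               # 30 items, 4 variables (invented)

d = pdist(X, metric="braycurtis")          # one of many dissimilarity choices
Z = linkage(d, method="average")           # group-average (UPGMA) clustering
labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 clusters

print(labels)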







Support Vector Book: Available Now

2000-04-12 Thread svm_news

The Support Vector Book is now distributed and available
(see http://www.support-vector.net for details).


AN INTRODUCTION TO SUPPORT VECTOR MACHINES
(and other kernel-based learning methods)
N. Cristianini and J. Shawe-Taylor
Cambridge University Press, 2000
ISBN: 0 521 78019 5
http://www.support-vector.net

Contents - Overview

1 The Learning Methodology
2 Linear Learning Machines
3 Kernel-Induced Feature Spaces
4 Generalisation Theory
5 Optimisation Theory
6 Support Vector Machines
7 Implementation Techniques
8 Applications of Support Vector Machines
Pseudocode for the SMO Algorithm
Background Mathematics
References
Index


Description

This book is the first comprehensive introduction to Support Vector
Machines (SVMs), a new generation of learning systems based on recent
advances in statistical learning theory. The book also introduces
Bayesian analysis of learning and relates SVMs to Gaussian Processes
and other kernel-based learning methods.
SVMs deliver state-of-the-art performance in real-world applications
such as text categorisation, hand-written character recognition, image
classification, biosequence analysis, etc. Their first introduction in
the early 1990s led to a recent explosion of applications and
deepening theoretical analysis that has now established Support Vector
Machines along with neural networks as one of the standard tools for
machine learning and data mining.

Students will find the book both stimulating and accessible, while
practitioners will be guided smoothly through the material required for
a good grasp of the theory and application of these techniques. The
concepts are introduced gradually in accessible and self-contained
stages, though in each stage the presentation is rigorous and thorough.

Pointers to relevant literature and web sites containing software
ensure that it forms an ideal starting point for further study. These
are also available on-line through an associated web site,
www.support-vector.net, which will be kept updated with pointers to new literature,
applications, and on-line software.
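Separately from the book's own examples, here is a minimal generic sketch
(scikit-learn assumed; it is not a package associated with the book) of the
kind of kernel SVM the text describes, fit to a toy two-class problem:

# Toy support vector machine with an RBF kernel -- illustration only.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # kernel-based classifier
clf.fit(X_tr, y_tr)
print("held-out accuracy:", round(clf.score(X_te, y_te), 3))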



