Re: cigs & figs

2001-07-03 Thread Rich Ulrich

 - in respect of the up-coming U.S. holiday -

On Mon, 25 Jun 2001 11:49:47 GMT, mackeral@remove~this~first~yahoo.com
(J. Williams) wrote:

> On Sun, 24 Jun 2001 16:37:48 -0400, Rich Ulrich <[EMAIL PROTECTED]>
> wrote:
> 
> 
> >What rights are denied to smokers?  
JW > 
> Many smokers, including my late mother, feel being unable to smoke on
> a commerical aircraft, sit anywhere in a restaurant, etc. were
> violation of her "rights."  I don't agree as a non-smoker, but that
> was her viewpoint until the day she died.

What's your point:  She was a crabby old lady, whining (or
whinging) about fancied 'rights'?  

You don't introduce anything that seems "inalienable"  or 
"self-evident" (if I may introduce July-4th language).
Nobody stopped her from smoking as long as she kept it away
from other people-who-would-be-offended.

Okay, we form governments to help assure each other of rights.   
Lately, the law sees fit to stop some assaults from happening, 
even though it did not always do that in the past.  The offender
still has quite a bit of leeway: if you don't cause fatal diseases,
you can legally offend quite a lot.  We finally have laws about
smoking.

But she wants the law to stop at HER convenience?

[ snip, various ]
JW > 
> Talking about confused and/or politically driven,  what do Scalia and
> Thomas have to do with smoking rights?   Please cite the case law.

I mention "rights"  because that did seem to be a attitude you
mentioned that was (as you see) provocative to me.

I toss in S & T, because I think that, to a large extent, they
share your mother's preference for a casual, self-centered 
definition of rights.  And they are Supreme Court justices.
[ Well, they don't say, "This is what *I* want" -- these two
translate the blame/ credit to Nature (euphemism for God).]

So: I don't fault your mother *too*  harshly, when Justices
hardly do better.  Even though a prolonged skew was needed,
to end up with two like this.


-- 
Rich Ulrich, [EMAIL PROTECTED]


http://www.pitt.edu/~wpilib/index.html





Re: about a problem of khi2 test

2001-07-03 Thread Rich Ulrich

On Sun, 01 Jul 2001 14:19:31 +0200, Bruno Facon
<[EMAIL PROTECTED]> wrote:
> I work in the area of intelligence differentiation. I would like to know
> how to use the khi2 statistic to determine whether the number of
> statistically different correlations between two groups is due or not to
> random variations. In particular I would like to know how to determine
> the expected numbers of statistically different correlations due to
> “chance”.
> Let me take an example. Suppose I compare two correlations matrices of
> 45 coefficients obtained from two independent groups (A and B). If there
> is no true difference between the two matrices, the number of
> statistically different correlations should be equal to 1.25 in favor of

Yes, that is the number.   But there is not a legitimate test that I
know of, unless you are willing to make a strong assumption that 
no pair of the variables should be correlated.

I never heard of the khi2 statistic before this.  I searched with
google, and found a respectable number of references, and here
is something that I had not seen with a statistic:  khi2 appears to be
solely French in its use.  Of the first 50 hits, most were in French,
at French ISPs (.fr).  The few that were in English were also from
French sources.  

One article had a reference (not available in my local libraries):
Freilich MH and Chelton DB, J Phys Oceanogr  16, 741-757. 


> 
> group A and equal to 1.25 in favor of group B (in case of  alpha = .05).
> 
> Consequently, the expected number of nonsignificant differences should
> be 42.75. Is my reasoning correct?

It would be nice to test the numbers, but I don't credit that reference
as a good one, yet.  
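
Here is a small simulation sketch of how one might check those expected
numbers (group sizes of n = 100 and ten uncorrelated normal variables,
giving 45 correlations, are my own assumptions for illustration):

import numpy as np
from scipy.stats import norm

# Monte Carlo check: two independent groups, 10 uncorrelated variables
# each (45 correlations), Fisher-z test of each r_A - r_B at alpha = .05.
rng = np.random.default_rng(0)
n, p, alpha, reps = 100, 10, 0.05, 2000
pairs = [(i, j) for i in range(p) for j in range(i + 1, p)]   # the 45 pairs
crit = norm.ppf(1 - alpha / 2)

counts = []
for _ in range(reps):
    ra = np.corrcoef(rng.standard_normal((n, p)), rowvar=False)
    rb = np.corrcoef(rng.standard_normal((n, p)), rowvar=False)
    za = np.arctanh([ra[i, j] for i, j in pairs])      # Fisher z transform
    zb = np.arctanh([rb[i, j] for i, j in pairs])
    z = (za - zb) / np.sqrt(2.0 / (n - 3))             # z for r_A - r_B
    counts.append(int(np.sum(np.abs(z) > crit)))

print("mean count of 'significant' differences:", np.mean(counts))  # ~ 2.25
print("SD of that count over replications:     ", np.std(counts))

Under that strong "no correlation anywhere" assumption the count behaves
roughly like binomial(45, .05); with genuinely correlated variables the 45
comparisons are no longer independent, which is the reservation above.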

I don't remember for sure, but I think you might be able to compare
two correlation matrices with programs from Jim Steiger's site,

http://www.interchg.ubc.ca/steiger/multi.htm

On the other hand, you would be better off if you can compare 
the entire covariance structures, to keep from making accidental
assumptions about variances.  (Does Jim provide for that?)

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Maximum Likelihood

2001-06-29 Thread Rich Ulrich

On 28 Jun 2001 20:39:18 -0700, [EMAIL PROTECTED] (Mark W. Humphries)
wrote:

> Hi,
> 
> Does anyone have references to a simple/intuitive introduction to Maximum
> Log Likelihood methods.
> References to algorithms would also be appreciated.
> 

Look on the Internet.

I used www.google.com to search on 
"maximum likelihood" tutorial  
(put the phrase in quotes to keep it together; 
or you can use Advanced search)

There were MANY hits, and the second reference 
was in a tutorial that begins at
http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_2.html


The third reference was for some programs and examples in Gauss
(a programming language) by Gary King at Harvard, in his application
area.  If these aren't worthwhile (I did not try to download
anything),  there are plenty of other sites to check.
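
For what "maximum likelihood" amounts to in practice, here is a minimal
sketch (made-up data, and a normal model with unknown mean and SD as the
example): write down the log-likelihood and hand it to a numerical
optimizer.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=2.0, size=200)        # fake data

def neg_log_lik(theta):
    mu, log_sigma = theta                             # log-sd keeps sd positive
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

fit = minimize(neg_log_lik, x0=[0.0, 0.0], method="Nelder-Mead")
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
print(mu_hat, sigma_hat)     # close to the sample mean and the ML estimate of sd

The same recipe carries over to any model for which you can write down the
log-likelihood.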

[ I am intrigued by G. King, a little.  This is the fellow who
putatively has a method, not Heckman's, for overcoming or
compensating for aggregation bias.  Which I never found available
for free.  But, too bad, the page says these programs go with 
his 1989 book, and I think his Method is more recent.]

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Help with stats please

2001-06-25 Thread Rich Ulrich

On 24 Jun 2001 13:54:56 -0700, [EMAIL PROTECTED] (dennis roberts) wrote:

> At 12:20 PM 6/24/01 -0700, Melady Preece wrote:
> >Hi.  I am teaching educational statistics for the first time, and although I
> >can go on at length about complex statistical techniques, I find myself at a
> >loss with this multiple choice question in my test bank.  I understand why
> >the range of  (b) is smaller than (a) and (c), but I can't figure out how to
> >prove that it is smaller than (d).
> >
> >If you can explain it to me, I will be humiliated, but grateful.
> >
> >
> >1.  Which one of the following classes had
> >  the smallest range in IQ scores?

dr >
> of course, there is nothing about the shape of the distribution of any 
> class ... so, does the item assume sort of normal? in fact, since each of 
> these classes is probably on the small side ... it would be hard to assume 
> that but, for the sake of the item ... pretend
>  [ snip ]

Good point, about normality.
And who provides the "test bank" of items?

The testee has to  *assume*  a certain amount of normality,
which is not stated; and you have to *assume*  that the N is
greater than 2 -- or else the claim is *not*  true.

It seems to me  that   when the reader has to supply 
unstated technical assumptions like these,
the test-validator should be careful:  I suspect
that success on THIS  item  is context-dependent.

There is less problem, if everyone is always given exactly
the same test.  That *is*  an issue, if different sets
of items are extracted for use, at different times --
which is what I think of, when I hear "item bank."


Could other items clue this answer?  That is, 
Do other items STATE  those assumptions?
Do other items REQUIRE those assumptions if you are
going to answer them?   - If the user has seen 
items in his selection from the "bank",  is he more 
apt to make the intended assumptions here?

 I expect that a conscientious scale developer is
interested in minimizing the work required for validation;
and he would avoid this problem if he noticed it.  
Answer (b)  seems right, if the reader is supposed 
to describe what you would expect
'for moderate sized samples, with scores that are
continuous and approximately normal.'
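
A small simulation sketch of the point (the actual answer options are not
quoted above, so the SDs and class sizes below are hypothetical): for
roughly normal scores, the expected range depends on both the SD and the
class size, which is why the item only works once those assumptions are
supplied.

import numpy as np

rng = np.random.default_rng(2)

def mean_range(sd, n, reps=20000):
    """Average range of a normal sample with the given SD and class size."""
    x = rng.normal(100.0, sd, size=(reps, n))
    return (x.max(axis=1) - x.min(axis=1)).mean()

for sd, n in [(5, 30), (5, 5), (15, 30), (15, 5)]:
    print("SD=%2d, N=%2d: expected range ~ %5.1f" % (sd, n, mean_range(sd, n)))
# For a fixed SD the expected range shrinks with N, and for a fixed N it
# scales with the SD; comparing classes that differ in both requires
# knowing both.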

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: cigs & figs

2001-06-24 Thread Rich Ulrich

  - re: some outstandingly confused thinking.  Or writing.

On Sat, 23 Jun 2001 15:25:31 GMT, mackeral@remove~this~first~yahoo.com
(J. Williams) wrote:

[ snip;  Slate reference, etcetera ]
>   ... My mother was 91 years
> old when she died  a year ago and chain smoked since her college days.
> She defended the tobacco companies for years saying, "it didn't hurt
> me."  She outlived most of her doctors.   Upon quoting statistics and
> research on the subject, her view was that I, like other "do gooders
> and non-smokers," wanted to deny smokers their rights.  

What statistics would her view quote?  to show that someone
wants to deny smokers 'their rights'?
[ Hey, I didn't write the sentence ]

I just love it, how a 'natural right'  works out to be *exactly*
what the speaker wants to do.  And not a whit more.
(Thomas and Scalia are probably going to give us tons 
of that bad philosophy, over the next decades.)

What rights are denied to smokers?  You know, you can't 
build your outhouse right on the riverbank, either.

>Obviously,
> there is a health connection.  How strong that connection is, is what
> makes this a unique statistical conundrum.

How strong is that connection?  Well, quite strong.

I once considered that it might not be so bad to die 9 years
early, owing to smoking, if that cut off years of bad health 
and suffering.  Then I realized, the smoking grants you 
most of the bad health of old age, EARLY.  (You do miss 
the Alzheimer's.)  One day, I might give up smoking my pipe.

What is the statistical conundrum?  I can almost 
imagine an ethical conundrum.  ("How strongly can
we legislate, to encourage cyclists to wear helmets?")
I sure don't spot a statistical conundrum.

Is this word intended?  If so, how so?

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Marijuana

2001-06-24 Thread Rich Ulrich

 - I will delete most, and comment on a few points.
Maybe further posts will delete the sci.stat.*  groups -

On Fri, 22 Jun 2001 20:49:02 GMT, Steve Leibel <[EMAIL PROTECTED]>
wrote:

> In article <[EMAIL PROTECTED]>,
>  Rich Ulrich <[EMAIL PROTECTED]> wrote:
> 
[ ... ]
> 
> Hallucinating?  On pot?  What are YOU smokin'?  Pot doesn't cause 
> hallucinations -- although a lot of anti-drug hysteria certainly does.

I read 30 years ago that the pharmacologists classed it
as a hallucinogen, and then I discovered why.  Then I got
bored and quit.  At least the stuff is not addictive.

Should I conclude from your comments that this domestic, 
sinsemilla stuff I read about is grossly inferior to the 
imports of old?


> A cursory web search turned up these links among many others to support 
> my statement.  Naturally this subject is controversial and there are 
> lots of conflicting studies.  ...

- even the first one you cite includes ample support for what I
posted.  After saying other negative things,

> http://www.norml.org/canorml/myths/myth1.shtml  says
' The second NHTSA study, "Marijuana and Actual Driving
Performance," concluded that the adverse effects of cannabis on
driving appear "relatively small" and are less than those of drunken
driving." '

"... less than those of drunken driving"  is *not*  refutation of what
I wrote.  That article supports me rather fully:  intoxicants help 
people have accidents.  Arguments to the contrary are (it seems
to me) supported by wishful thinking that causes the arguments
to blur before your very eyes.  

> 
> And "stranger named Steve?"  I've been on this newsgroup since 1995.  
> Not as famous as James Harris, maybe, but certainly no stranger.

 - sorry.   Today's check with  groups.google.com  shows me 
you post frequently in sci.math -- which I don't read.  This thread,
you may note (as I just noted),  is posted there, and crossposted 
to three  sci.stat.*  groups, where I do participate.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Marijuana

2001-06-22 Thread Rich Ulrich

On Fri, 22 Jun 2001 18:45:52 GMT, Steve Leibel <[EMAIL PROTECTED]>
wrote:

> In article <[EMAIL PROTECTED]>,
>  [EMAIL PROTECTED] (Eamon) wrote:
> 
> > (c) Reduced motor co-ordination, e.g. when driving a car
> > 
> 
> Numerous studies have shown that marijuana actually improves driving 
> ability.  It makes people more attentive and less aggressive.  You could 
> look it up.

An intoxicant does *that*?  

I think I recall in the literature, that people getting 
stoned, on whatever, occasionally  *think*  that 
their reaction time or sense of humor or other 
performance is getting better.   

Improving your driving by getting mildly stoned 
(omitting the episodes of hallucinating)
seems unlikely enough, to me, 
that  *I*  think the burden of proof is on the stranger named Steve.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: a form of censoring I have not met before

2001-06-21 Thread Rich Ulrich

On 21 Jun 2001 00:35:11 -0700, [EMAIL PROTECTED] (Margaret
Mackisack) wrote:

> I was wondering if anyone could direct me to a reference about the 
> following situation. In a 3-factor experiment, measurements of a continuous 
> variable, which is increasing monotonically over time, are made every 2 
> hours from 0 to 192 hours on the experimental units (this is an engineering 
> experiment). If the response exceeds a set maximum level the unit is not 
> observed any more (so we only know that the response is > that level). If 
> the measuring equipment could do so it would be preferred to observe all 
> units for the full 192 hours. The time to censoring is of no interest as 
> such, the aim is to estimate the form of the response for each unit which 
> is the trace of some curve that we observe every 2 hours. Ignoring the 
> censored traces in the time period after they are censored puts a huge 

Well, it certainly *sounds*  as if the "time to censoring" should be 
of great interest, if you had an adequate model.

Thus, when you say that "ignoring" them gives  "a huge 
downward bias",  it sounds to me as if you are admitting that 
you do not have an acceptable model.

Who can you blame for that?  What leverage do you 
have, if you try to toss out those bad results?  (Surely, 
you do have some ideas about forming estimates
that *do*  take the hours into account.  The problem
belongs in the hands of someone who does.)

 - maybe you want to segregate trials into the ones
with 192 hours, or less than 192 hours; and figure two 
(Maximum Likelihood) estimates for the parameters, which
you then combine.
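
For whoever does take it on, here is a hedged sketch -- not the poster's
model; a single unit, a linear trend, and a made-up ceiling -- of how the
censored times can be kept in a likelihood instead of being ignored: each
unobserved time contributes the probability that the response is above the
ceiling.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)
C = 50.0                                    # ceiling of the measuring equipment
t_all = np.arange(0, 194, 2.0)              # every 2 hours, 0 to 192
a_true, b_true, sd_true = 5.0, 0.4, 2.0
y_all = a_true + b_true * t_all + rng.normal(0.0, sd_true, t_all.size)

first_over = int(np.argmax(y_all > C)) if (y_all > C).any() else t_all.size
t_obs, y_obs = t_all[:first_over], y_all[:first_over]   # followed until y > C
t_cens = t_all[first_over:]                              # only know y > C here

def neg_log_lik(theta):
    a, b, log_sd = theta
    sd = np.exp(log_sd)
    ll = norm.logpdf(y_obs, a + b * t_obs, sd).sum()     # exact observations
    ll += norm.logsf(C, a + b * t_cens, sd).sum()        # censored: P(y > C)
    return -ll

fit = minimize(neg_log_lik, x0=[0.0, 0.3, 1.0],
               method="Nelder-Mead", options={"maxiter": 5000})
print("censored-likelihood slope:", fit.x[1])
print("slope from uncensored points only:", np.polyfit(t_obs, y_obs, 1)[0])

For a single unit the two slopes are usually close; the bias the poster
describes shows up when many units are pooled and the censored traces are
simply dropped from the later hours.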


> downward bias into the results and is clearly not the thing to do although 
> that's what has been done in the past with these experiments. Any 
> suggestions of where people have addressed data of this or related form 
> would be very gratefully received.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Help me, please!

2001-06-19 Thread Rich Ulrich

On 18 Jun 2001 01:18:37 -0700, [EMAIL PROTECTED] (Monica De Stefani)
wrote:

> 1) Are there some conditions which I can apply normality to Kendall
> tau?

tau is *lumpy*  in its distribution for N less than 10.

And all rank-order statistics are a bit problematic when 
you try to use them on rating scales with just a few discrete
scores -- the tied values give you bad scaling intervals, 
and the estimate of variance won't be very good, either.

For correlations, your assumption of 'normality' is usually
applied to the values at zero.
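
The "lumpy" point is easy to see by brute force; for N = 6 the exact null
distribution of tau takes only a handful of values (a small sketch):

from itertools import permutations
from collections import Counter
from scipy.stats import kendalltau

n = 6
x = list(range(n))
dist = Counter()
for perm in permutations(range(n)):          # all 720 equally likely orderings
    tau, _ = kendalltau(x, perm)
    dist[round(tau, 3)] += 1

for value in sorted(dist):
    print("tau = %+.3f   prob = %.4f" % (value, dist[value] / 720.0))

Only 16 distinct values, with big steps between them, so a normal
approximation for N this small is rough.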

> I was wondering if x's observations must be
> independent and y's observations must be independent to apply
> asymptotically normal limiting
> distribution. 
> (null hypothesis = x and y are independent).
> Could you tell me something about?

 - Independence is needed for just about any tests.

I started to say (as a minor piece of exaggeration) that 
independence is needed "absolutely";  
but the correct statement, I think, is that independence
is always demanded  "relative to the error term."

[ snip, non-linear?]

"Monotonic" is the term.


[ snip, T(z):  I don't know what that is.]

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Probability Of an Unknown Event

2001-06-18 Thread Rich Ulrich

On Sat, 16 Jun 2001 23:05:52 GMT, "W. D. Allen Sr."
<[EMAIL PROTECTED]> wrote:

> It's been years since I was in school so I do not remember if I have the
> following statement correct.
> 
> Pascal said that if we know absolutely nothing
> about the probability of occurrence of an event
> then our best estimate for the probability of
> occurrence of that event is one half.
> 
> Do I have it correctly? Any guidance on a source reference would be greatly
> appreciated!

I did a little bit of Web searching and could not find that.

Here is an essay about Bayes, which (dis)credits him and his
contemporaries as assuming something like that, years before Laplace.

I found it with a google search on 
 <"know absolutely nothing"  probability> .

 http://web.onetel.net.uk/~wstanners/bayes.htm
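
As a hedged aside, not from that essay: the "know absolutely nothing, so
one half" idea corresponds to a uniform Beta(1,1) prior on the unknown
probability.  With s successes in n trials the posterior mean is
(s+1)/(n+2), Laplace's rule of succession; with no data at all, that is 1/2.

def rule_of_succession(successes, trials, a=1.0, b=1.0):
    """Posterior mean of a Beta(a, b) prior updated on binomial data."""
    return (a + successes) / (a + b + trials)

print(rule_of_succession(0, 0))    # 0.5 -- nothing observed yet
print(rule_of_succession(7, 10))   # 8/12, about 0.667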

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: individual item analysis

2001-06-17 Thread Rich Ulrich

On 15 Jun 2001 14:24:39 -0700, [EMAIL PROTECTED] (Doug
Sawyer) wrote:

> I am trying to locate a journal article or textbook that addresses
> whether or not exam questions can be normalized, when the questions are
> grouped differently.  For example, could a question bank be developed
> where any subset of questions could be selected, and the assembled exam
> is normalized?
> 
> What is name of this area of statistics?  What authors or keywords would
> I use for such a search?  Do you know whether or not this can be done?


I believe that they do this sort of thing in scholastic achievement
tests, as a matter of course.  Isn't that how they make the transition
from year to year?  I guess this would be "norming".

A few weeks ago, I discovered that there is a whole series of
tech-reports put out by one of the big test companies.  I would 
look back to it, for this sort of question.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: meta-analysis

2001-06-17 Thread Rich Ulrich

On 17 Jun 2001 04:34:26 -0700, [EMAIL PROTECTED] (Marc)
wrote:

> I have to summarize the results of some clinical trials.
> Unfortunately the reported information is not complete.
> The information given in the trials contain:
> 
> (1) Mean effect in the treatment group (days of hospitalization)
> 
> (2) Mean effect in the control group (days of hospitalization)
> 
> (3) Numbers of patients in the control and treatment group
> 
> (4) p-values of a t-test (between the differences of treatment
> and control)
> My question:
> How can I calculate the variance of treatment difference which I need
> to perform meta-analysis? Note that the numbers of patients in the

Aren't you going too far?  You said you have to summarize.
Well, summarize.  The difference is in terms of days.  
Or it is in terms of percentage of increase.

And you have the t-test and p-values.  

You might be right in what you propose, but I think
you are much more likely to produce a useful report 
if you keep it simple.

You are right; meta-analyses are complex.  And a 
majority of the published ones are (in my opinion) awful.
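
If the variance really is wanted, the usual back-calculation -- not a
recommendation, and it assumes the reported p is the two-sided p from the
pooled two-sample t-test on those group means -- recovers t from p and the
degrees of freedom, and then the standard error of the difference.  A
sketch with made-up numbers:

from scipy.stats import t as t_dist

def variance_of_difference(mean_trt, mean_ctl, n_trt, n_ctl, p_two_sided):
    df = n_trt + n_ctl - 2
    t_value = t_dist.ppf(1 - p_two_sided / 2, df)   # |t| implied by the p-value
    se = abs(mean_trt - mean_ctl) / t_value
    return se ** 2

# hypothetical trial: 12 vs 15 days of hospitalization, 40 patients per arm
print(variance_of_difference(12.0, 15.0, 40, 40, p_two_sided=0.04))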
--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Marijuana

2001-06-17 Thread Rich Ulrich

On 15 Jun 2001 02:04:36 -0700, [EMAIL PROTECTED] (Eamon) wrote:

[ snip, Paul Jones.  About marijuana statistics.]

> 
> Surely this whole research is based upon a false premise. Isn't it
> like saying that 90%, say, of heroin users previously used soft drugs.
> Therefore, soft-drug use usually leads to hard-drug use - which does
> not logically follow. (A => B =/= B => A)
> 
> Conclusions drawn from the set of people who have had heart attacks
> cannot be validly applied to the set of people who smoke dope.
> Rather than collect data from a large number of people who had heart
> attacks and look for a backward link, they should monitor a large
> number of people who smoke dope. But, of course this is much more
> expensive.

It is much more expensive, but it is also totally stupid to carry out
the expensive research if the *cheap* and lousy research didn't
give you a hint that there might be something going on.

The numbers that he was asking about do pass the simple
test.  I mean, there were not 1 million people contributing one
hour each, but we should still ask, *Would*  this say something?
If it would not, then the whole question is *totally*  arid.  The 2x2
table is approximately
(dividing the first column by 100; and subtracting from a total):
10687   and  124
   175   and  9

That gives a contingency test of 21.2 or 18.2, with p-values 
under .001.  The Odds Ratio on that is 4.4.
That is pretty convincing that there is SOMETHING
going on, POSSIBLY something that merits an explanation.  
The expectation for the cell with 9  is just 2.2 -- the tiny cell is
the cell that matters for contributions to the test -- which is why it
is okay to lop the "hundreds"  off the first column (to make it
readable).
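
A quick check of those numbers, as a sketch, on the table as printed above:

import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

table = np.array([[10687, 124],
                  [  175,   9]])

chi2_plain, p_plain, _, expected = chi2_contingency(table, correction=False)
chi2_yates, p_yates, _, _ = chi2_contingency(table, correction=True)
odds_ratio, _ = fisher_exact(table)

print(round(chi2_plain, 1), round(chi2_yates, 1))   # about 21.2 and 18.2
print(round(odds_ratio, 1))                         # about 4.4
print(round(expected[1, 1], 1))                     # about 2.2, the small cell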

Now, you may return to your discussion of why the table is
not any good, and what is needed for a proper test.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: multivariate techniques for large datasets

2001-06-14 Thread Rich Ulrich

On 13 Jun 2001 20:32:51 -0700, [EMAIL PROTECTED] (Tracey
Continelli) wrote:

> Sidney Thomas <[EMAIL PROTECTED]> wrote in message 
>news:<[EMAIL PROTECTED]>...
> > srinivas wrote:
> > > 
> > > Hi,
> > > 
> > >   I have a problem in identifying the right multivariate tools to
> > > handle dataset of dimension 1,00,000*500. The problem is still
> > > complicated with lot of missing data. can anyone suggest a way out to
> > > reduce the data set and  also to estimate the missing value. I need to
> > > know which clustering tool is appropriate for grouping the
> > > observations( based on 500 variables ).
> 
> One of the best ways in which to handle missing data is to impute the
> mean for other cases with the selfsame value.  If I'm doing
> psychological research and I am missing some values on my depression
> scale for certain individuals, I can look at their, say, locus of
> control reported and impute the mean value.  Let's say [common
> finding] that I find a pattern - individuals with a high locus of
> control report low levels of depression, and I have a scale ranging
> from 1-100 listing locus of control.  If I have a missing value for
> depression at level 75 for one case, I can take the mean depression
> level for all individuals at level 75 of locus of control and impute
> that for all missing cases in which 75 is the listed locus of control
> value.  I'm not sure why you'd want to reduce the size of the data
> set, since for the most part the larger the "N" the better.

Do you draw numeric limits for a variable, and for a person?
Do you make sure, first, that there is not a pattern?

That is -- Do you do something different depending on
how many are missing?  Say, estimate the value, if it is an
oversight in filling blanks on a form, BUT drop a variable if 
more than 5% of responses are unexpectedly missing, since 
(obviously) there was something wrong in the conception of it, 
or the collection of it.  Psychological research (possibly) 
expects fewer missing than market research.

As to the N -  As I suggested before - my computer takes 
more time to read  50 megabytes than one megabyte.  But
a psychologist should understand that it is easier to look at
and grasp and balance raw numbers that are only two or 
three digits, compared to 5 and 6.

A COMMENT ABOUT HUGE DATA-BASES.

And as a statistician, I keep noticing that HUGE databases
tend to consist of aggregations.  And these are "random"
samples only in the sense that they are uncontrolled, and 
their structure is apt to be ignored.

If you start to sample, you are more likely to ask yourself about 
the structure - by time, geography, what-have-you.  

An N of millions gives you tests that are wrong; estimates 
ignoring "relevant" structure have a spurious report of precision.
To put it another way: the Error  (or real variation) that *exists*
between a fixed number of units (years, or cities, for what I
mentioned above) is something that you want to generalize across.  
With a small N, that error term is (we assume?) small enough to 
ignore.  However, that error term will not decrease with N, 
so with a large N, it will eventually dominate.  The test 
based on N becomes increasingly irrelevant.
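
A simulation sketch of that last point (all numbers hypothetical): twenty
"cities" whose group difference varies from city to city but averages zero.
The pooled test on hundreds of thousands of individuals will usually report
a wildly small p, while a test that treats the twenty city differences as
the data does not.

import numpy as np
from scipy.stats import ttest_ind, ttest_1samp

rng = np.random.default_rng(4)
n_cities, n_per_group = 20, 5000
city_effect = rng.normal(0.0, 1.0, n_cities)   # real city-to-city variation,
                                               # no systematic A-vs-B effect
a_all, b_all, diff_by_city = [], [], []
for e in city_effect:
    a = rng.normal(0.0, 10.0, n_per_group)     # group A in this city
    b = rng.normal(e, 10.0, n_per_group)       # group B, shifted by city effect
    a_all.append(a)
    b_all.append(b)
    diff_by_city.append(b.mean() - a.mean())

naive = ttest_ind(np.concatenate(b_all), np.concatenate(a_all))  # N = 200,000
by_city = ttest_1samp(diff_by_city, 0.0)                         # N = 20 cities

print("pooled over individuals: t = %.1f, p = %.2g" % (naive.statistic, naive.pvalue))
print("city means as the unit:  t = %.1f, p = %.2g" % (by_city.statistic, by_city.pvalue))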

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: multivariate techniques for large datasets

2001-06-12 Thread Rich Ulrich

On 11 Jun 2001 22:18:11 -0700, [EMAIL PROTECTED] (srinivas) wrote:

> Hi,
> 
>   I have a problem in identifying the right multivariate tools to
> handle datset of dimension 1,00,000*500. The problem is still
> complicated with lot of missing data. can anyone suggest a way out to
> reduce the data set and  also to estimate the missing value. I need to
> know which clustering tool is appropriate for grouping the
> observations( based on 500 variables ).

'An intelligent user' with a little experience.

Look at all the data, and figure what comprises a 
'random' subset.  There are not many purposes that
require more than 10,000  cases so long as your 
sampling gives you a few hundred in every interesting
category.  [This can cut down your subsequent 
computer processing time, since 1 million times 500 
could be a couple of hundred megabytes, and might
take some time just for the disk reading.]

Look at the means/ SDs/ # missing for all 500;
look at frequency tabulations for things in categories;
look at cross tabulations between a few variables of
your 'primary'  interest, and the rest.  Throw out what
is relatively useless.
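
A minimal pandas sketch of that screening pass; the file name and column
names are placeholders, not from the post:

import pandas as pd

df = pd.read_csv("big_file.csv")                 # ~100,000 rows x 500 columns
sample = df.sample(n=10000, random_state=0)      # a 'random' working subset

print(sample.describe().T[["mean", "std", "count"]])     # means / SDs / non-missing
print(sample.isna().sum().sort_values(ascending=False).head(20))   # worst missing
print(pd.crosstab(sample["primary_var"], sample["some_category"])) # one crosstab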

For *your*  purposes, how do you combine logical categories?  -
8 ounce size with 24 ounce; chocolate with vanilla; etc.
A computer program won't tell you what makes sense, 
not for another few years.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: About kendall

2001-06-12 Thread Rich Ulrich

On 12 Jun 2001 08:43:53 -0700, [EMAIL PROTECTED] (Monica De Stefani)
wrote:

> When I apply Kendall tau or Kendall's partial tau to a time series do
> I have to calculate ranks or not?
> In fact a time series has a natural temporal order.

 ... but you are not partialing out time.  Surely.

Your program that does the Kendall tau must do some
ranking, as part of the algorithm.  Why do you think you 
might have to calculate ranks?
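
A two-line check, using scipy's kendalltau as a stand-in for "your
program": the raw series and its ranks give the identical tau, because tau
uses only the ordering.

import numpy as np
from scipy.stats import kendalltau, rankdata

rng = np.random.default_rng(5)
x = np.cumsum(rng.normal(size=30))     # a made-up "time series"
y = np.cumsum(rng.normal(size=30))

tau_raw, _ = kendalltau(x, y)
tau_rank, _ = kendalltau(rankdata(x), rankdata(y))
print(tau_raw, tau_rank)               # the same value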

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: please help

2001-06-11 Thread Rich Ulrich

On 10 Jun 2001 07:27:55 -0700, [EMAIL PROTECTED] (Kelly) wrote:

> I have the gage repeatability & reproducibility(gage R&R) analysis
> done on two instruments, what hypothesis test can I use to test that the
> repeatability variance(expected sigma values of repeatability) of the
> two instruments are significantly different form each other or to say
> one has a lower variance than the other.
> Any insight will be greatly appreciated.
> Thanks in advance for your help.

I am not completely sure I understand, but I will make a guess.

There is hardly any power for comparing two ANOVAs that are
done on different samples, until you make strong assumptions 
about samples being equivalent, in various regards.  

If ANOVAs are on the same sample,
then a CHOW test can be used on the "improved prediction"
if one hypothesis consists of an extra d.f.  of prediction.
If ANOVAs are on separate samples, I wonder if you could
compare the residual variances, by the simple variance 
ratio F-test -- well, you could do it, but I don't know what arguments
should be raised against it, for your particular case.
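
For concreteness, a sketch of that simple variance-ratio F-test, with
made-up variances and degrees of freedom; the reservations above still
apply.

from scipy.stats import f as f_dist

var_a, df_a = 0.0016, 24      # instrument A repeatability variance, its df
var_b, df_b = 0.0009, 24      # instrument B

F = var_a / var_b
p_one_sided = f_dist.sf(F, df_a, df_b)       # H1: A has the larger variance
print(F, p_one_sided)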

There are criteria resembling the CHOW test that are used less
formally, for incommensurate ANOVAs (not the same predictors)
 - AKAIKE and others.

If your measures are done on the same (exact) items, you 
might have a paired test.  Instrument A gets closer values on 
more than half of the measurements that are done.

Finally, if you can do a bunch of separate experiments, you
can test whether  A  or B  does better in more than half of them.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Diagnosing and addressing collinearity in Survival Analysis

2001-06-11 Thread Rich Ulrich

On 06 Jun 2001 06:46:55 GMT, [EMAIL PROTECTED] (ELANMEL) wrote:

> Any assistance would be appreciated:  I am attempting to run some survival
> analyses using Stata STCOX, and am getting messages that certain variables are
> collinear and have been dropped.  Unfortunately, these variables are the ones I
> am testing in my analysis!  
> 
If there are 3 groups (classes), then you can have only
two dummy variables to refer to their degrees of freedom.
You can code those in the most convenient and 
informative way.
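
A tiny illustration of the coding, with hypothetical data -- a three-level
group variable carries two degrees of freedom, so only two dummy columns
enter the model:

import pandas as pd

groups = pd.Series(["a", "b", "c", "a", "b", "c"], name="group")
print(pd.get_dummies(groups, drop_first=True))   # dummies for 'b' and 'c' only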

If your problem arises otherwise, then you have a 
fundamental problem in the logic of what is being tested.
Google shows some examples of problems when I search
for  "statistical confounding"  (use the quotes for the search).
And  "confounded designs" seems to obtain discussions.


> I would appreciate any information or recommendations on how best to diagnose
> and explore solutions to this problem.  

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Obtain standard error in nonlinear estimation

2001-06-06 Thread Rich Ulrich

On Mon, 4 Jun 2001 14:54:56 -0400, "Jiwu Rao"
<[EMAIL PROTECTED]> wrote:

> Hi
> 
> I performed a regression analysis on a model nonlinear in parameters.  The
> function is:
>  q = ( k* P^n )  +  (k2 * P2^n2)
> where P and P2 are independent variables,  k, n, k2, n2 are parameters.
> The estimates and their variances can be obtained, as well as correlation
> between any two parameters.
> 
> The question is:  how do I estimate the standard error in the first term of
> the equation?  That is, what is the error in estimating  w = k* P^n?

1) What is this question supposed to mean?  
How do *you* want to interpret the error in adding two variables 
to an equation, if it were an ordinary multiple regression?  

In terms of 'error', does it answer your question, to drop out the 
whole term, and compare the fuller Fit (4 or 5 variables) with a 
model having 2 variables less?  That probably gives you a 
statistical test if you are fitting by Least squares, or by 
Maximum likelihood.
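
A hedged sketch of that comparison, on simulated stand-in data: fit the
full four-parameter model and the model with the first term dropped, both
by least squares, and compare them with an approximate extra-sum-of-squares
F test.

import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import f as f_dist

rng = np.random.default_rng(6)
P  = rng.uniform(1, 10, 80)
P2 = rng.uniform(1, 10, 80)
q  = 2.0 * P**0.7 + 0.5 * P2**1.2 + rng.normal(0, 0.3, 80)   # fake data

full    = lambda X, k, n, k2, n2: k * X[0]**n + k2 * X[1]**n2
reduced = lambda X, k2, n2: k2 * X[1]**n2

pf, _ = curve_fit(full,    (P, P2), q, p0=[1, 1, 1, 1])
pr, _ = curve_fit(reduced, (P, P2), q, p0=[1, 1])

rss_full    = np.sum((q - full((P, P2), *pf))**2)
rss_reduced = np.sum((q - reduced((P, P2), *pr))**2)

df_extra, df_resid = 2, len(q) - 4
F = ((rss_reduced - rss_full) / df_extra) / (rss_full / df_resid)
print(F, f_dist.sf(F, df_extra, df_resid))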

2) If your correlations between parameters are nearly 1.0 (as you
go on to say), that suggests you don't have the model in an 
elegant form.  Reparameterize.

  The form of  q = (k * P^n)  looks like a power-transformation.
If you are trying to solve for the Box-Cox transformation, or 
to do something similar, I think  you want a component that 
multiplies or divides by n,  as part of the constant  before P.
That should get rid of some of the correlation.

3) Are you really committed to that equation?
Who around you knows enough that they should commit
you to that equation?  - Ask *them*  what the proper 
parameterization should be, since the version yielding 
correlations near to 1.0  is (at best) a mistake from 
not paying enough attention.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Need Good book on foundations of statistics

2001-06-04 Thread Rich Ulrich

On 1 Jun 2001 19:07:31 GMT, [EMAIL PROTECTED] wrote:

> 
> Can anyone refer me to a good book on the foundations of statistics?

Stigler's "The History of Statistics"  is the most widely read of
recent popular histories.  It covers pre-1900.  His newer
book is "Statistics on the Table"  and I enjoyed that one, too.
It includes the founding of *modern*  statistics in, say, the 1930s,
in addition to much older anecdotes.


> I want to know of the limitations, assumptions, and philosophy
> behind statistics. A discussion of how the quantum world may have
> different laws of statistics might be a plus.

That last sentence makes me think that you don't know any
answers to the sentence just previous to it.
" ... have different laws"  is certainly not the way statisticians
would put it.  Leptons *obey*  different laws than baryons do
(I think), but the laws are descriptions that were imagined by 
human beings.  

I suppose one way to describe the dilemma of physics might
be, It is trying to force all of these particles into fitting
descriptions that are less than ideal (or, so it keeps working out).


I think it is curious and interesting that the physicists at the 
highest levels of abstraction -- cosmology; and high-energy
particles/relativity -- are beginning to use fairly ordinary 
'statistical tests' to judge whether they have anything.  
"IS there oscillation in the measured background of stars, near 4
degrees kelvin, across the whole universe?"
"IF they continued CERN for another 18 months, would there have
been another dozen or so  *apparent*  particles of the right type, 
so they could conclude that the number observed was 'significant'  
at the one-in-a-million level, instead of just one-in-two-hundred?"

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: fit-ness

2001-06-03 Thread Rich Ulrich

On Thu, 31 May 2001 12:05:24 +0100, "Alexis Gatt" <[EMAIL PROTECTED]>
wrote:

> Hi,
> 
> a basic question from a MSc student in England. First of all, yeah I read
> the FAQ and I didnt find anything answering my question, which is fairly
> simple: I am trying to analyse how well several mathematical methods perform
> to modelize a scanner. So I have, for every input data, the corresponding
> output given by the scanner and the values given by the mathematical models
> I am using.
> First, given the distribution of the errors, I can use the usual mean-StdDev

I can think of two or 3 meanings of 'scanner'  and not a one of 
them would have a simple, indisputable measure of 'error.'
 1) Some measures would be biased toward one  'method'  
or another, so a winner would be obvious.
 2) Some samples to be tested would be biased (similarly)
toward a winner by one method or another.  So you select
your winner by selecting your mix of samples.

If you have fine measures, then you can give histograms of your
results (assuming  1-dimensional, as your alternatives suggest).

Is it enough to have the picture?
What would your audience demand?  What is your need?


> if the distro is normal, or median-95th percentile otherwise. Any other
> known methods to enhance the pertinence of the analysis? Any ideas welcome.

Average squared error (giving SD) is popular.  
Average absolute error de-emphasizes the extremes.
Count of errors beyond a critical limit sometimes fills a need.

A more complicated way is to build in a cost function.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: ONLY ONE

2001-05-29 Thread Rich Ulrich

FYI  - that piece of HTML code is a SPAM advertisement, 
which does seem to evoke other Web addresses.

On 27 May 2001 18:51:32 -0700, [EMAIL PROTECTED]
([EMAIL PROTECTED]) wrote:

> 
> 
> 
> window.location="<A  HREF="http://www.moodysoft.com"">http://www.moodysoft.com"</A>;
> 
> 
> 
> 
> Best screen capture on earth and in cyberspace.In fact the only one.Anything 
>else is just a long learning process.
> SPX® v2.0Everytime you need to select a portion of 
>screen, hold right-click longer than usual until the cursor turns into the "cross" 
>graphical cursor.Make your selection and as soon as you release the mouse, SPX® will 
>send it to the destination of your choice: Clipboard, File, Mail, Printer/Fax
> Very useful, no?http://www.moodysoft.com";>www.moodysoft.com
> 
> 
> 
> 
-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: The False Placebo Effect

2001-05-27 Thread Rich Ulrich

On 26 May 2001 03:50:32 GMT, Elliot Cramer <[EMAIL PROTECTED]>
wrote:

> Rich Ulrich <[EMAIL PROTECTED]> wrote:
> :  - I was a bit surprised by the newspaper coverage.   I tend to 
> : forget that most people, including scientists, do *not*  blame
> : regression-to-the-mean, as the FIRST suspicious cause 
> : whenever there is a pre-post design:  because they have 
> : scarce heard of it.
> 
> I don't see how RTM can explain the average change in a prepost design

 - explanation:  whole experiment is conducted on patients
who are at their *worst*  because the flare-up is what sent 
them to a doctor.  Sorry; I might have been more complete
there.  All the pre-post studies in psychiatric intervention 
(where I work) have this as something to watch for.

I guess I could have said, "first suspicious cause *of 
selective improvement*  in any pre-post design."
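
A small simulation sketch of that explanation, with invented numbers:
people enroll at a flare-up, i.e., when their current score happens to be
bad, so the next measurement is better on average with no treatment at all.

import numpy as np

rng = np.random.default_rng(7)
n = 100000
stable = rng.normal(50, 10, n)           # each person's usual symptom level
pre    = stable + rng.normal(0, 10, n)   # score on the day they seek help
post   = stable + rng.normal(0, 10, n)   # score some weeks later, untreated

enrolled = pre > 65                      # only people at a bad moment enroll
print("mean pre  (enrolled):", pre[enrolled].mean())
print("mean post (enrolled):", post[enrolled].mean())   # noticeably lower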

> those above the pre population mean will tend to be closer to the post
> population mean but this doesn't say anything about the average
> change. Any depression study is apt to show both a placebo AND a no
> treatment effect after 6 weeks

 - I'm not sure what that last phrase means... "both "
30% or so of acutely depressed patients will get quite a bit better.
In psychiatry, I think we have called some effects "placebo" 
even when we know that it is not a very good word.  
The experience of being in a research trial, by the way, seems 
to produce a placebo effect, according to what people have told me.
(I think that careful scientists attribute that one to the extra time
and attention given to those subjects.)
-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: The False Placebo Effect

2001-05-25 Thread Rich Ulrich

On 24 May 2001 21:39:17 -0700, [EMAIL PROTECTED] (David Heiser) wrote:

> 
> Be careful on your assumptions in your models and studies!
> ---
> 
> Placebo Effect An Illusion, Study Says
> By Gina Kolata
> New York Times
> (Published in the Sacramento Bee, Thursday, May 24, 2001)
> 
> In a new report that is being met with a mixture of astonishment and some
> disbelief, two Danish researchers say that the placebo effect is a myth.

Do you think they will not believe in voudon/ voodoo, either?
> 
> The investigators analyzed 114 published studies involving about 7,500
> patients with 40 different conditions. They found no support for the common
> notion that, in general, about one-third of patients will improve if they
> are given a dummy pill and told it is real.
 [ ... ]
The story goes on.  The authors look at studies where the placebo
effect is probably explained by regression-to-the-mean.  
 - I was a bit surprised by the newspaper coverage.   I tend to 
forget that most people, including scientists, do *not*  blame
regression-to-the-mean, as the FIRST suspicious cause 
whenever there is a pre-post design:  because they have 
scarce heard of it.

On the other hand, I have expected for a long time that 
the best that a light-weight placebo will do is a 
light-weight improvement.  



> ... 
> The researchers said they saw a slight effect of placebos on subjective
> outcomes reported by patients, like their descriptions of how much pain they
> experienced. But Hrobjartsson said he questioned that effect. "It could be a
> true effect, but it also could be a reporting bias," he said. "The patient
> wants to please the investigator and tells the investigator, 'I feel
> slightly better. ' "

"Pain"  is a hugely subjective report.   It is notorious.
I would not want to do a summary across the papers of 
the whole field of pain-researchers, since -- based on
difficulty, and not on knowing those researchers -- I expect 
an enormous amount of bad research in that area.
 - I don't know if the researchers are quite unwise here, or
if they only seem that way because of bad news reporting.
 - Oh, I did read a meta-analysis a while ago, that one from
Steve Simon.  It was based on pain research (and, basically,
only relevant to pain research), and the authors insisted 
that the vast majority of studies were not very good.

About the studies these authors found, using 3 groups:

> They found 114, published between 1946 and 1998. When they analyzed the
> data, they could detect no effects of placebos on objective measurements,
> like cholesterol levels or blood pressure.

 - That is interesting.  114 is a big enough number.  Controlled
medical research, however, seemed to undergo big changes
across those decades.  I expect that double-blind and triple-blind
studies did not get much use until halfway through that interval.

If someone does look into the original publication, and will
tell us about it -- 
I am interested, especially, in what the authors say about 
pain studies, and what they say about time trends.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: sample size and sampling error

2001-05-25 Thread Rich Ulrich

Posted to sci.stat.consult,sci.stat.math, sci.stat.edu
where the same questions had been posted.

On Thu, 24 May 2001 09:51:48 -0400, "Mike Tonkovich"
<[EMAIL PROTECTED]> wrote:

> Before I get to the issue at hand, I was hoping someone might explain the
> differences between the following 3 newsgroups: sci.stat.edu, sci.stat.cons,
> and sci.stat.math?  Now that I've found these newsgroups, chances are good I
> will be taking advantage of the powerful resources that exist out there.
> However, I could use some guideance on what tends to get posted where?  Some
> general guidelines would be helpful.
 [ snip - statistical question, which someone has answered with 
plenty of good references and commentary.]

Don't worry a whole lot about where.  But if you want to post to all 
three, you can put all three in your address line.  That way, the 
message only goes out once; people with decent newsreaders
will only see it once; and a person who Replies (with most
newsreaders) will be carried in all three.

Two of these three groups also exist as Mail-lists (sse, ssc).
What I just wrote about Replies  probably doesn't work for them.
Someone reading on a List will (I think) reply just to that list.
Also, the Mail-list readers are less apt to read all the groups.

My stats-FAQ has messages saved from all three.  I never did
pay much attention to what showed up, where, but you could 
scan my site for some indication as of a few years ago, when 
I was compiling those files.  There are a lot of questions
that would suit to any of them, and many are cross-posted, 
or posted separately to each.

The math group tends to get some higher-calculus questions, 
and the questions overlapping with numerical analysis or
computer science.  You might look for cross-posts if you can
examine Headers.

I think it has been in .edu  where we have discussed 
standardized testing; and the philosophical ideas of
hypothesis testing; and how to teach statistical ideas.

The .consult  group seems appropriate for posing questions 
that don't have strong educational implications (that the poser 
notices).  

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Standardized testing in schools

2001-05-25 Thread Rich Ulrich

On Thu, 24 May 2001 17:30:35 -0400, Rich Ulrich <[EMAIL PROTECTED]>
wrote:

> Standardized tests and their problems?  Here was a 
> problem with equating the scores between years.
> 
> The NY Times had a long front-page article on Monday, May 21:
> "When a test fails the schools, careers and reputations suffer."
> It was about a minor screw-up in standardizing, in 1999.  Or, since

I don't see the Sunday NY Times.  But there were letters on 
Thursday, May 24, concerning a story on  May 20, which 
was concerned with scoring errors.
-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Standardized testing in schools

2001-05-25 Thread Rich Ulrich

On Thu, 24 May 2001 23:25:42 GMT, "W. D. Allen Sr."
<[EMAIL PROTECTED]> wrote:

> "And this proved to me , once again,
> why nuclear power plants are too hazardous to trust:..."
> 
> Maybe you better rush to tell the Navy how risky nuclear power plants are!
> They have only been operating nuclear power plants for almost half a century
> with NO, I repeat NO failures that has ever resulted in any radiation
> poisoning or the death of any ship's crew. In fact the most extensive use of
> Navy nuclear power plants has been under the most constrained possible
> conditions, and that is aboard submarines!
> 
> Beware of our imaginary boogy bears!!

As I construct an appropriate sampling frame, one out of two 
nuclear navies has a good long-term record.  Admiral Rickover 
had a fine success.  The other navy was not so lucky, or suffered 
because it was more pressed for resources.

> 
> You are right though. There is nothing really hazardous about the operation
> of nuclear power plants. The real problem has been civilian management's
> ignorance or laziness!
 [...]

I'm glad you see the problem - though I see it more as 'ordinary 
management'  than ignorance or laziness.  It might not even have
to be 'poor'  management by conventional terms; the conventions
don't take into account extraordinarily dangerous materials.  The
Japanese power plant's  nuke-fluke of last year was an illustration of
employee inventiveness and  'shop-floor innovation'.  Unfortunately
for them, they 'solved a problem'  that had been a (too-) cleverly
designed safety precaution.  

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Standardized testing in schools

2001-05-24 Thread Rich Ulrich

Standardized tests and their problems?  Here was a 
problem with equating the scores between years.

The NY Times had a long front-page article on Monday, May 21:
"When a test fails the schools, careers and reputations suffer."
It was about a minor screw-up in standardizing, in 1999.  Or, since
the company stonewalled and refused to admit any problems,
and took a long time to find the problems, it sounds like it 
became a moderately *bad*  screw-up.

The article about CTB/McGraw-Hill starts on page 1, and covers
most of two pages on the inside of the first section.  It seems 
highly relevant to the 'testing' that the Bush administration 
advocates, to substitute for having an education policy.

CTB/McGraw-Hill  runs the tests for a number of states, so they
are one of the major players.  And this proved to me , once again,
why nuclear power plants are too hazardous to trust:  we can't
yet Managements to spot problems, or to react to credible  problem
reports in a responsible way.

In this example, there was one researcher from Tennessee who
had strong longitudinal data to back up his protest to the company;
the company arbitrarily (it sounds like) fiddled with *his*  scores, 
to satisfy that complaint, without ever facing up to the fact that 
they did have a real problem.  Other people, they just talked down.

The company did not necessarily lose much business from the 
episode because, as someone was quoted, all the companies
who sell these tests   have histories of making mistakes.  
(But, do they have the same history of responding so badly?)

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Variance in z test comparing purcenteges

2001-05-23 Thread Rich Ulrich

 - BUT, Robert, 
the equal N case is different from cases with unequal N -
 - or did I lose track of what the topic really is... -

On 22 May 2001 06:52:27 -0700, [EMAIL PROTECTED] (Robert J.
MacG. Dawson) wrote:

> and Rich Ulrich responded: 
> > Aren't we looking at the same contrast as the t-test with
> > pooled and unpooled variance estimates?  Then -
> 
> Similar, but not identical. With the z-for-proportion we 
> have the additional twist that the amount of extra power
> from the unpooled test is linked to the size of the effect 
> we're trying to measure, in such a way that we get it 
> precisely when we don't need it. Or, to avoid being too 
> pessimistic, let's say that the pooled test only costs us 
> power when we can afford to lose some .
> 

- Robert wrote on May 18,"And, clearly, the pooled 
variance is larger; as the function is convex up, the 
linear interpolation is always less."

Back to my example in the previous post:  Whenever you 
do a t-test, you get exactly the same t if the Ns are equal.
For unequal N, you get a bigger t when the group with the 
smaller variance gets more weight.  I think your z-tests
on proportions have to work the same way.

I can do a t-test with a dichotomous variable as the criterion, 
testing 1 of 100  versus 3 of 6:  2x2 table is (1+99), (3+3).
That gives me a pooled t of 6 or 7, that is  p < .001; and  a
separate-variance t  that is p= 0.06.

 - I like that pooled test, but I do think that it has stronger
assumptions than the 2x2 table.
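
For anyone who wants to reproduce the example, a quick sketch on the 0/1
data; the numbers come out close to, though not exactly, the ones quoted
above.

import numpy as np
from scipy.stats import ttest_ind

group1 = np.array([1] + [0] * 99)        # 1 of 100
group2 = np.array([1, 1, 1, 0, 0, 0])    # 3 of 6

pooled   = ttest_ind(group1, group2, equal_var=True)    # pooled variance
separate = ttest_ind(group1, group2, equal_var=False)   # Welch, separate variances

print("pooled:   t = %.1f, p = %.4f" % (pooled.statistic, pooled.pvalue))
print("separate: t = %.1f, p = %.3f" % (separate.statistic, separate.pvalue))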

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Elementary cross-sectional statistics

2001-05-21 Thread Rich Ulrich

On Mon, 21 May 2001 13:41:16 GMT, "Sakke" <[EMAIL PROTECTED]> wrote:

> Hello Everybody!
> 
> We have a probably very simple question. We are doing cross-sectional
> regressions. We are doing one regression per month for a period of ten years,
> resulting in 120 regressions. As we understood, it is possible to just take
> a arithmetic average for every coefficient. 

Well, sure, it is possible to take an arithmetic average
and then you can tell people, "Here is the arithmetic average."
It's a lot harder to have any certainty that the average of a time
series means much.

>   What we do not know, is how to
> calculate the t-statistics for these coefficients. Can we just do the same,
> arithmetic average? Can anybody help us?

No, you certainly can't compute an average of some t-tests
and claim that it is a t-test.  What you absolutely have to have
(in some sense) is a model of what happens over 10 years.

For instance:  If it is the same experience over and over again
(that is your model of 'what happens'),
*maybe* it would be proper to average each Variable over the
120 time points;  and then do the regression.

That is the easiest case I can think of --the mean is supposed
to represent something, and you conclude that it represents
the whole thing.  

Otherwise:  What is there?  What are you
trying to conclude?  Why? (Who cares?)

Are the individual regressions 'significant'?  Highly?
Are there mean-differences over time?  
 - variations between years or seasons?  
Are the lagged correlations near zero?

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Variance in z test comparing purcenteges

2001-05-18 Thread Rich Ulrich

On 18 May 2001 07:51:21 -0700, [EMAIL PROTECTED] (Robert J.
MacG. Dawson) wrote:

 [ ... ] 
>   OK, so what *is* going on here?  Checking a dozen or so sources, I
> found that indeed both versions are used fairly frequently (BTW, I
> myself use the pooled version, and the last few textbooks I've used do
> so).
> 
>   Then I did what I should have done years ago, and I tried a MINITAB
> simulation. I saw that for (say) n1=n2=10, p1=p2=0.5, the unpooled
> statistic tends to have a somewhat heavy-tailed distribution. This makes
> sense: when the sample sizes are small the pooled variance estimator is
> computed using a sample size for which the normal approximation works
> better.
> 
>   The advantage of the unpooled statistic is presumably higher power;
> however, in most cases, this is illusory. When p1 and p2 are close
> together, you do not *get* much extra power.  When they are far apart
> and have moderate sample sizes you don't *need* extra power. And when
[ snip, rest]

Aren't we looking at the same contrast as the t-test with 
pooled and unpooled variance estimates?  Then -

(a) there is exactly the same  t-test value when the Ns are equal; 
the only change is in DF.
(b) Which test is more powerful depends on which group is 
larger, the one with *small*  variance, or the one with *large*
variance.   -- it is a large difference when Ns and variances
are both different by (say) a fourfold factor or more.

If the big N has the small variance, then the advantage
lies with 'pooling'  so that the wild, small group is not weighted
as heavily.  If the big N has the large variance, then the 
separate-variance estimate lets you take advantage of the
precision of the smaller group.  

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Intepreting MANOVA and legitimacy of ANOVA

2001-05-18 Thread Rich Ulrich

The usual problem of MANOVA, which is hard to avoid, is that
even if a test comes out significant, you can't say what you have
shown except 'different.'  

You get a clue by looking at the univariate tests and correlations.
Or drawing up the interesting contrasts and testing them to see
if they account for everything.

I have a problem, here, that might be avoidable -- I can't tell
what you are describing.  Part of that is 'ugly abbreviations,' 
part is 'I do not like the terminology, DV and IV, abbreviated or
not'  so I will not take much time at it.

On Fri, 18 May 2001 14:57:49 -0500, "auda" <[EMAIL PROTECTED]> wrote:

> Hi, all,
> In my experiment, two dependent variables were measured (say, DV1 and DV2).
> I found that when analyzed separately with ANOVA, the independent variable (say,
> IV, with two levels IV_1 and IV_2) modulated DV1 and DV2 differentially:
> 
> mean DV1 in IV_1 > mean DV1 in IV_2
> mean DV2 in IV_1 < mean DV2 in IV_2
> 
> If analyzed with MANOVA, the effect of IV was significant, Rao
> R(2,14)=112.60, p<0.000. How to interpret this result of MANOVA? Can I go
> ahead and claim IV modulated DV1 and DV2 differentially based upon the result
> from MANOVA? Or do I have to do other tests?
> 
> Moreover, can I treat DV1 and DV2 as two levels of a factor, say, "type of
> dependent variable", and then go ahead to test the data with
> repeated-measures ANOVA and see if there is an interaction between IV and
> "type of dependent variable"?

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: bootstrap, and testing the mean of variance ratio

2001-05-17 Thread Rich Ulrich

On Thu, 17 May 2001 02:33:54 + (UTC), [EMAIL PROTECTED]
(rpking) wrote:

[ snip, some of my response and his old and new comments.]

> I use bootstrap to get the confidence intervals for A and B because
> they are both >0 by construction, so the exact distributions of A and 
> B cannot be normal, and thus starndard distribution theory cannot 
> be used to obtain CIs.

Occasionally, someone will say, as a point of theoretical interest, 
such-and-so  cannot be 'normal' because it has a limited range
(above zero, say).  

That is, in the context I think of, a hyper-technical point being
made.  It is to counter some silliness, where someone wants to
work from Perfect Normality.

Now, you have come up with the opposite silliness, and you claim
that normal distribution theory cannot be used for CIs, with that
thin excuse.  

You might consider:  the name of 'normal' was attached because 
of the success in describing sociological data with that shape:  
measures including height, weight, number of births and deaths.
Almost none of them included negative numbers.

> 
> Now I want to test the null hypothesis that A - B=0.  Let D=A-B.  Could
> D have a normal distribution?  I don't know, and that's why I'm asking.
> 
As I suggested - if we are not happy with the normality of variances,
it is usually fine after we take the log.  Ratios are another thing
that are usually dealt with by taking the log.

I posted:
> > ... and that is relevant to what?  Distributions of raw data are
> >seldom (if ever) "asymptotically normal".

I could clarify:  samples do not become 'more normal' when
the N gets larger.  We hope that their *means*  become better
behaved, and they usually do.  They don't have to be normal 
for the means to be used with the usual parametric statistics.
So I will say, one more time, try to apply ordinary (normal)
statistics.

> 
> So no social scientist should ever use asymptotic theory in their
> anaysis of (raw) data?  This is certainly a very extreme view.

?? I don't know what you are attributing to me -- I was trying to tell
you firmly, without being too rude, that only an ignorant amateur 
will start out with bootstrapping, and will refuse to use normal
theory (like you are doing).  I think you are mis-construing 
'asymptotic'  if you think it applies to raw data.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: (none)

2001-05-16 Thread Rich Ulrich

 [ note, Jay:  HTML-formatting makes this hard to read ]

On 11 May 2001 00:30:06 -0700, [EMAIL PROTECTED] (Jay Warner) wrote:
[snip, HTML header]

> I've had occasion to talk with a number of educator types lately, at different
> application and responsibility levels of primary & secondary Ed. 
> Only one recalled the term, regression toward the mean.  Some (granted,
> the less analytically minded) vehemently denied that such could be causing
> the results I was discussing.  Lots of other causes were invoked.
> In an MBA course I teach, which frequently includes teachers wishing
> to escape the trenches, the textbook never once mentions the term. 
> I don't recall any other intro stat book including the term, much less
> an explanation.  The explanation I worked out required some refinement
> to become rational to those educator types (if it has yet :).

 - I am really sorry to learn that -
Not even the texts!  that's bad.  
By the way, there are two relevant chapters in the 1999 history,
"Statistics on the Table" by Stephen Stigler (see pages 157-179).

Stigler documents a big, embarrassing blunder by a noted 
economist, published in 1933.  Horace Secrist wrote a book with
tedious detail, much of it being accidental repetitions of regression
fallacy.  Hotelling panned it in a review in JASA.  Next, Secrist
replied in a letter, calling Hotelling "wholly mistaken."  Hotelling
tromped back, " ... and when one version of the thesis is interesting
but false and  the other is true but trivial, it becomes the duty of
the reviewer to give warning at least against the false version."

Maybe Stigler's user-friendly anecdote will help to spread the
lesson, eventually.

> So I'm not surprised that even the NYT would miss it entirely. 
> Rich, I hope you penned a short note to the editor, pointing out its presence. 
> Someone has to, soon.

I did not write, yet.  But I see an e-mail address, which is not usual
in the NYTimes.  I guess they identify Richard Rothstein as
[EMAIL PROTECTED]  
because this article was laid out as a feature (Lessons) instead of an
ordinary news report.  I'm still considering what I should say, if 
someone else doesn't tell me that they have passed the word.


> BTW, Campbell's text, "A primer on regression artifacts" mentions a
> correction factor/method, which I haven't understood yet.  Does anyone
> in education and other social science circles use this correction, and
> may I have a worked out example?

Since you mentioned it, I checked my new copy of the Campbell/ Kenny
book.  Are you in Chapter 5?  There is a lot going on, but I don't
grasp that there is any well-recommended correction.  Except, maybe, 
Structural-equations-modeling, and they just gesture vaguely in the
direction of that.  

Give me a page number?

I thought that they reinforced my own prejudices, that when two
groups are not matched at Pre, you have a lot of trouble forming clear
conclusions.  You can be a bit assertive if one group "wins" by all
three standards (raw score, change score, regressed-change score), 
but you still can't be 100% sure.

When your groups don't match, you draw the graphs to help you 
clarify trends, since the eyeball is great at pattern analysis.
Then you see if any hostile interpretations can undercut your 
optimistic ones, and you sigh regrets when they do.

> Jay
> Rich Ulrich wrote:
> http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: bootstrap, and testing the mean of variance ratio

2001-05-16 Thread Rich Ulrich



On Wed, 16 May 2001 11:50:07 + (UTC), [EMAIL PROTECTED]
(rpking) wrote:

> For each of the two variance ratios, A=var(x)/var(y) and
> B=var(w)/var(z), I bootstrapped with 2000 replications to obtain
> confidence intervals.  Now I want to test whether the means are
> equal, ie. E(A) = E(B), and I am wondering whether I could just use
> the 2000 data points, calculate the standard deviations, and do a
> simple t test.

This raises questions, questions, questions.

What do you mean by a "data point"?  by "bootstrapping"?
Why do you want ratios of the variances?  If you are concerned with
variances, why aren't you considering the logs of V?  If you are
concerned with ratios, why are you considering the logs of the ratios?

With "2000 replications" each, there would seem to be 4000 points.
Or, what relation is there among x-y-z-w?  If these give you 2000
vectors, then why don't you have a paired comparison in mind?

Bootstrapping is tough enough to figure what's proper, that I
don't want to bother with it.  Direct tests are usually enough:  So,
if you were considering a direct test, What would you be testing?
(I figure there is really good chance that you are wrong in what you
are trying to bootstrap, or how you are doing it.)

> I have concerns because A and B are bounded below at 0 (but not
> bounded above), so the distribution may not be asymptotically
> normal. 
 ... and that is relevant to what?  Distributions of raw data are
seldom (if ever) "asymptotically normal".

>But I also found the bootstrapped A and B are well away from
> zero; the 1% percentile has a value of 0.78.  

 ... well, I should hope they are away from zero.  Relevance?

>   So could t test be
> used in this situation?  Or should I do another bootstrapping for
> the test?

Take your original problem to a statistician.  "Bootstrap" is 
something to retreat to when you can't estimate error directly,
and you have given no clue why you might need it.

-- 
Rich Ulrich,[EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: additional variance explained (SPSS)

2001-05-13 Thread Rich Ulrich

On 11 May 2001 12:04:04 -0700, [EMAIL PROTECTED] (Dianne Worth)
wrote:

> "I have a multiple regression y=a+b1+b2+b3+b4+b5.  My Adj. R-sq is > .403.  
> 
> "I would like to determine how much explanation of variance each IV 
> provides.  I have created individual models (y=a+b1+b2+b3) to obtain 

Unfortunately - that is a F.A.Q.  which has no easy answer.  
Please read up on multiple regression in some textbooks.
The variables are acting together, so there is not actually 
any "amount that each IV provides."

Unless the variables are totally uncorrelated (say, design factors),
there is no satisfactory, unique answer.  

[ One clever partition uses the sum of r_0-times-beta (each
predictor's zero-order correlation times its standardized
coefficient), which does add up to the R-squared.  However, if it
were a *satisfactory*  generalization, it could never have terms
less than zero... which does happen. ]

What we usually get is the "variance AFTER all the others",
which is (for instance) obtained by subtraction, as you suppose;
and that will usually add up to far less than 100%.  
However, beware:  In odd cases, with "suppressor variables," 
these variances may add up to more than 100%.

The regression does give you the t-test (or F-test)  on the
contribution of variance.  Actually, that can be manipulated 
to give precisely the "variance after all the others"  since
the tests are using that variance in the numerator.

As to reporting: you should report the initial, zero-level
correlation.  That is other evidence about whatever is
happening in the equation.

So, you can report, "B3 accounts for <so much variance> by
itself; and it can account for <this much> even after
the contributions of the other <predictors> are taken
into account first."
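
A small numerical sketch of those quantities, on simulated data with
invented coefficients: the zero-order r-squared for one predictor, its
"variance after all the others" by subtraction, and the r-times-beta
partition that reproduces the full R-squared.

import numpy as np

rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=(n, 3))                   # three invented predictors
X[:, 1] += 0.6 * X[:, 0]                      # make them correlated
y = X @ np.array([0.5, 0.3, 0.2]) + rng.normal(size=n)

def r_squared(Xmat, yvec):
    Xd = np.column_stack([np.ones(len(yvec)), Xmat])
    yhat = Xd @ np.linalg.lstsq(Xd, yvec, rcond=None)[0]
    return 1 - np.sum((yvec - yhat) ** 2) / np.sum((yvec - yvec.mean()) ** 2)

R2_full  = r_squared(X, y)
r2_alone = np.corrcoef(X[:, 2], y)[0, 1] ** 2           # "B3 by itself" (zero-order)
r2_after = R2_full - r_squared(X[:, :2], y)             # "B3 after all the others"

Z  = (X - X.mean(0)) / X.std(0)                         # standardize, to get the betas
zy = (y - y.mean()) / y.std()
betas = np.linalg.lstsq(np.column_stack([np.ones(n), Z]), zy, rcond=None)[0][1:]
r0 = np.array([np.corrcoef(Z[:, j], zy)[0, 1] for j in range(3)])
print(R2_full, r2_alone, r2_after, np.sum(betas * r0))  # the last sum equals R2_full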

Hope this helps.  I think my stats-FAQ offers some more
perspective on Multiple regression.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Question

2001-05-13 Thread Rich Ulrich

On 11 May 2001 07:34:38 -0700, [EMAIL PROTECTED] (Magill,
Brett) wrote:

> Don and Dennis,
> 
> Thanks for your comments, I have some points and further questions on the
> issue below.
> 
> For both Dennis and Don:  I think the option of aggregating the information
> is a viable one.  

I would call it "unavoidable"  rather than just "viable."  The data
that you show is basically aggregated  already;  there's just one item
per-person.

>  Yet, I cannot help but think there is some way to do this
> taking into account the fact that there is variation within organizations.
> I mean, if I have an organizational salary mean of .70 (70%) with a very tiny
 [ snip, rest]

 - I agree, you can use the information concerning within-variation.
I think it is totally proper to insist on using it, in order to
validate the conclusions, to whatever degree is possible.  
You might be able to turn around that 'validation'  to incorporate
it into the initial test;  but I think the role as "validation"  is
easier to see by itself, first.

Here's a simple example where the 'variance'  is Poisson.
(Ex.)  A town experiences some crime at a rate that declines 
steadily, from 20 000 incidents to 19 900 incidents, over a 5-year
period.  The linear trend fitted to the several points is "highly
significant"  by a regression test.  Do you believe it?

(Answer)  What I would believe is:  No, there is no trend, but it is
probably true that someone is fudging the numbers.  The 
*observed variation*  in means is far too small for the totals to
be seen be chance.  And the most obvious sources of error
would work in the opposite direction.  

[That is, if there were only a few criminals responsible for many
crimes each, and the number-of-criminals is what was subject 
to Poisson variation, THEN  the number-of-crimes should be 
even more variable.]
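
Here is a sketch of the kind of check I mean, with yearly totals
interpolated to match the example.  Under Poisson variation, the index
of dispersion of the counts should behave like a chi-squared.

import numpy as np
from scipy import stats

counts = np.array([20000, 19975, 19950, 19925, 19900])       # suspiciously smooth totals
dispersion = (len(counts) - 1) * counts.var(ddof=1) / counts.mean()   # ~ chi2, df = 4
p_low = stats.chi2.cdf(dispersion, df=len(counts) - 1)
print(dispersion, p_low)      # small p: far LESS variation than Poisson counts show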

In your present case, I think you can estimate on the basis of
your factory (aggregate) data, and then you figure what you 
can about how consistent those numbers are with the 
un-aggregated data, in terms of means or variances.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Variance in z test comparing percentages

2001-05-13 Thread Rich Ulrich

On 11 May 2001 22:29:37 -0700, [EMAIL PROTECTED] (Donald Burrill)
wrote:

> On Sat, 12 May 2001, Alexandre Kaoukhov (RD <[EMAIL PROTECTED]>) wrote:
> 
> > I am puzzled with the following question:
> > In z test for continuous variables we just use the sum of estimated
> > variances to calculate the variance of a difference of two means i.e.
> >s^2 = s1^2/n1 + s2^2/n2.

[ snip, Q and A,  AK and DB ... ]

> > On the other hand the chi2 is derived from Z^2 as assumed by first 
> > approach.

DB>
>   Sorry;  the relevance of this comment eludes me.

Well -- every (normal) z score can be squared, to produce a
chi-squared score.  One particular formula for a z matches the
Pearson chi-squared test statistic for the 2x2 table.
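
A quick numerical check (2x2 counts invented) that the pooled-variance
z, squared, reproduces that Pearson chi-squared statistic:

import numpy as np
from scipy.stats import chi2_contingency

a, b = 30, 20                    # group 1: successes, failures (invented)
c, d = 18, 32                    # group 2: successes, failures
n1, n2 = a + b, c + d
p1, p2, p = a / n1, c / n2, (a + c) / (n1 + n2)

z = (p1 - p2) / np.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))   # pooled-variance z
chi2, _, _, _ = chi2_contingency([[a, b], [c, d]], correction=False)
print(z ** 2, chi2)              # the two values agree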

> > Finally, I would like to know whether the second formula is ever used
> > and if so does it have any name.

DB>
> "Ever" is a wider universe of discourse than I would dare pretend to. 
> Perhaps colleagues on the list may know of applications.
> I would be surprised if it had been named, though.

I don't remember a name, either.  I think I do remember seeing 
a textbook that presented that t as their preferred "test for
proportions."

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: 2x2 tables in epi. Why Fisher test?

2001-05-10 Thread Rich Ulrich


 - I offer a suggestion of a reference.

On 10 May 2001 17:25:36 GMT, Ronald Bloom <[EMAIL PROTECTED]> wrote:

[ snip, much detail ] 
> It has become the custom, in epidemiological reports
> to use always the hypergeometric inference test --
> The Fisher Exact Test -- when treating 2x2 tables 
> arising from all manner of experimental setups -- e.g.
> 
> a.) the prospective study
> b.) the cross-sectional study
> 3.) the retrospective (or case-control) study
>  [ ... ]

I don't know what you are reading, to conclude that this
has "become the custom."   Is that a standard for some
journals, now?

I would have thought that the Logistic formulation was
what was winning out, if anything.

My stats-FAQ  has mention of the discussion published in
JRSS (Series B)  in the 1980s.  Several statisticians gave 
ambivalent support to Fisher's test.  Yates argued the logic
of the exact test, and he further recommended the  X2 test
computed with his (1935) adjustment factor, as a very accurate 
estimator of Fisher's p-levels.

I suppose that people who hate naked p-levels will have to 
hate Fisher's Exact test, since that is all it gives you.

I like the conventional chisquared test for the 2x2, computed
without Yates's correction --  for pragmatic reasons.  Pragmatically,
it produces a good imitation of what you describe, a randomization
with a fixed N but not fixed margins.  That is ironic, as Yates
points out (cited above) because the test "assumes fixed margins"
when you derive it.
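
For concreteness, here is a small comparison on an invented 2x2 table
of the three p-levels being contrasted: Fisher's exact test, the
chi-squared with Yates's adjustment, and the uncorrected chi-squared.

from scipy.stats import fisher_exact, chi2_contingency

table = [[8, 2], [3, 7]]                                   # made-up 2x2 counts
_, p_fisher = fisher_exact(table)
_, p_yates, _, _ = chi2_contingency(table, correction=True)
_, p_plain, _, _ = chi2_contingency(table, correction=False)
print(p_fisher, p_yates, p_plain)   # Yates tracks Fisher's p; the plain p is smaller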

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: (none)

2001-05-10 Thread Rich Ulrich


 - selecting from CH's article, and re-formatting.  I don't know if 
I am agreeing, disagreeing, or just rambling on.

On 4 May 2001 10:15:23 -0700, [EMAIL PROTECTED] (Carl Huberty)
wrote:

CH:  " Why do articles appear in print when study methods, analyses,
results, and conclusions are somewhat faulty?"

 - I suspect it might be a consequence of "Sturgeon's Law," 
named after the science fiction author.  "Ninety percent of 
everything is crap."  Why do they appear in print when they
are GROSSLY faulty?  Yesterday's NY Times carried a 
report on how the WORST schools have improved 
more than the schools that were only BAD.  That was much-
discussed, if not published.  - One critique was, the 
absence of peer review.  There are comments from statisticians
in the NY Times article; they criticize, but (I thought) they 
don't "get it"  on the simplest point.

The article, while expressing skepticism by numerous 
people, never mentions "REGRESSION TOWARD the MEAN"
which did seem (to me) to account for every single claim of the
original authors whose writing caused the article.


CH: " []  My first, and perhaps overly critical, response  is that
the editorial practices are faulty[ ... ] I can think of two
reasons: 1) journal editors can not or do not send manuscripts to
reviewers with statistical analysis expertise; and 2) manuscript
originators do not regularly seek methodologists as co-authors.  
Which is more prevalent?"

APA Journals have started trying for both, I think.  But I think
that "statistics" only scratches the surface.  A lot of what arises
are issues of design.  And then there are issues of "data analysis".

Becoming a statistician helped me understand those so that I could
articulate them for other people;  but a lot of what I know was never
important in any courses.  I remember taking just one course or
epidemiology, where we students were responsible for reading and
interpreting some published report, for the edification of the whole
class -- I thought I did mine pretty well, but the rest of the class
really did stagger through the exercise.  

Is this "critical reading"  something that can be learned, and
improved?

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Analysis of a time series of categorical data

2001-05-04 Thread Rich Ulrich

On 3 May 2001 09:46:12 -0700, [EMAIL PROTECTED] (R. Mark Sharp; Ext.
476) wrote:

> If there is a better venue for this question, please advise me.

 - an epidemiology mailing list?
[ snip, much detail ] 
>          Time point 1    Time point 2    Time point 3    Time point 4   Hosts
>          Inf  Not-Inf    Inf  Not-Inf    Inf  Not-Inf    Inf  Not-Inf   Tested
> 
> G1-S1     1     14       11     4        11     1        13     2        57
> G1-S2     7      8       12     3        14     2        15     8        69
> G1-S3     1     24        6    18         8    15         9    15        95
> 
> G2-S4     3     12       12     4        10     4        14     2        61
> G2-S5     5     10        5     6         8     7        11    14        57
> G2-S6     2     26       12    12        11    16        14    12       105
> 
> The questions are how can group 1 (G1) be compare to group 2 (G2) and 
> how can subgroups be compared. I maintain that the heterogeneity 
> within each group does not prevent pooling of the subgroup data 
> within each group, because the groupings were made a priori based on 
> genetic similarity.

Mostly, heterogeneity prevents pooling.  
What's an average supposed to mean?

Only if the Ns represent naturally-occurring proportions,
and your hypothesis does too, MIGHT you want to
analyze the numbers that way.

How much do you know about the speed of expected onset,
and offset of the disease?  If this were real, it looks to me like you
would want special software.  Or special evaluation of a likelihood 
function.  

I can put the hypothesis in simple  ANOVA terms, comparing species
(S).  Then, the within-Variability of G1 and G2 -- which is big --
would be used to test the difference Between:  according to some
parameter.   Would that be an estimate of "maximum number afflicted"?

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Please ignore. Only a test

2001-05-02 Thread Rich Ulrich

On 1 May 2001 01:35:58 -0700, [EMAIL PROTECTED] wrote:

> Anyhow, what's going on here is that I've
> been rather dismayed at how my very
> carefully formatted postings have 
> appeared on the list with all their lines 
> truncated in all the wrong spots. (Or at
> least they've shown up that way on my
> server)  Have been advised that I should 
> keep their   length  to 72 characters or 
> less, and am giving that a shot.  Wanna
> see how well it works.


 - I just scanned a number of posts in sci.stat.edu, which is
EDSTAT-L,  and I don't see any that are truncated/wrapped.
I do see messages, now and then, in some groups, that 
have been line-wrapped by some agency.  And I can set 
a VIEW option in Forte Agent that might provide truncation,
or it will line-wrap for those messages where the whole
paragraph has been sent as a single line.

But I don't see a problem today.  And I don't have any
line-problem with a message that you posted a couple 
of days ago.  

So, I think you are reporting on a feature of the mail program
or the Newsreader that you happen to be using, or a feature 
of the OPTIONS of the program.

You can keep lines short to avoid causing those problems,
for whomever.  Or you can keep them short just to make
them easier to read.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: (none)

2001-05-02 Thread Rich Ulrich

On 1 May 2001 16:14:28 -0700, [EMAIL PROTECTED] (SamFaz Consulting)
wrote:

> 

Under the Bill s. 1618 title III passed by the 105th US congress
this letter cannot be considered SPAM as long as the sender includes
contact information and a method of removal. To be removed, hit reply
and type 'remove' in the subject line.


Here was a message posted, that my reader saw as an attachment.
The lines above were at the start of the SPAM.

Ahem.  I am about 100% sure that the above is a lie.  In multiple
ways.  For instance, Is there a legal definition of SPAM?

It has been remarked that you do  *not*  want to use the 
"remove" option when someone sends SPAM that is using
a  *garbage*  mailing list (Such as, the above; and any other
mailing list that includes a Mail-list, or a Usenet group).

That's because the REPLY from you proves that your 
Mail address is real, and current, and that you read the 
message.  So the SPAMmer will save your name, specially.


If you can read the headers, you sometimes can make an
effective complaint to the X-abuse:  address, if there is one,
or to the appropriate <  Postmaster@isp-name  >.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



progress in science [ was: rotations and PCA ]

2001-04-22 Thread Rich Ulrich

 - "progress in science" is the new topic.  I comment.

On 9 Apr 2001 07:12:08 -0700, [EMAIL PROTECTED] (Robert J.
MacG. Dawson) wrote:

> Eric Bohlman wrote:
>In science, it's not enough to
> > say that you have data that's consistent with your hypothesis; you also
> > need to show a) that you don't have data that's inconsistent with your
> > hypothesis and b) that your data is *not* consistent with competing
> > hypotheses.  And there's absolutely nothing controversial about that last
> > sentence [...]
> 
>   Well, I'd want to modify it a little. On the one hand, a certain amount
> of inconsistency can be (and sometimes must be) dealt with by saying
> "every so often something unexpected happens"; otherwise it would only
> take two researchers making inconsistent observations to bring the whole
> structure of science crashing down.  And on the other hand there are

Once upon a time, I spent many hours with the book, "Criticism and the
growth of knowledge."  Various (top) philosophers comment on Thomas
Kuhn's contributions (normal and revolutionary science; paradigms; and
so on), and on each other.

In real science (I. Lakatos argues), models are strongly resistant 
to refutation so long as they remain fertile for research and
speculation.  The pertinent historical model is "phlogiston versus the
caloric theory" -- The honored professors on neither side, it seems,
ever convinced the other;  there was plenty of conflicting data, for
decades.  But one side won new adherents and new researchers.


> _always_ competing hypotheses. [Consider Jaynes' example of the
> policeman seeing one who appears to be a masked burglar exiting from the
> broken window of a jewellery store with a bag of jewellery; he (the
> policeman) does *not* draw the perfectly logical conclusion that this
> might be the owner, returning from a costume party, and, having noticed
> that the window was broken, collecting his stock for safekeeping.] It is
> sufficient to show that your data are not consistent with hypotheses
> that are simpler or more plausible, or at least not much less simple or
> plausible.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Student's t vs. z tests

2001-04-22 Thread Rich Ulrich

On 21 Apr 2001 13:04:55 -0700, [EMAIL PROTECTED] (Will Hopkins)
wrote:

> I've joined this one at the fag end.  I'm with Dennis Roberts.  The way I 
> would put it is this:  the PRINCIPLE of a sampling distribution is actually 
> incredibly simple: keep repeating the study and this is the sort of spread 
> you get for the statistic you're interested in.  What makes it incredibly 
> simple is that I keep well away from test statistics when I teach stats to 
> biomedical researchers.  I deal only with effect (outcome) statistics.  I 
> even forbid my students and colleagues from putting the values of test 
> statistics in their papers.  Test statistics are clutter.
> 
> The actual mathematical form of any given sampling distribution is 
> incredibly complex, but only the really gifted students who want to make 
> careers out of statistical research need to come to terms with that.  The 

So you guys are all giving advice about teaching statistics to 
psychology majors/ graduates, who have no aspirations or 
potential for being anything more than "consumers" (readers)
of statistics?  Or (similar intent) to biomedical researchers?

Don't researchers deserve to be shown a tad more?

A problem that I have run into is that Researchers who are
well-schooled in the names and terms of procedures 
don't always recognize the leap to "good data analysis."
Actually, that can be true about people trained as biostatisticians, 
too, despite a modicum of exposure to case studies and Real Data,
and I suspect it is *usually*  true about people just emerging
from training as mathematical statisticians.

Just a couple of thoughts.

> rest of us just plug numbers into a stats package or spreadsheet.   I'm not 
> sure what would be a good sequence for teaching the mathematical 
> forms.  Binomial --> normal --> t is probably as good as any.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Question: Assumptions for Statistical Clustering (ie. Euclidean distance based)

2001-04-22 Thread Rich Ulrich

On Sun, 22 Apr 2001 16:23:46 GMT, Robert Ehrlich <[EMAIL PROTECTED]>
wrote:

> Clustering has a lot of associated problems.  The first is that of cluster
> validity--most algorithms define the existence of as many clusters as the user
> demands.  A very important problem is homogeneity of variance.  So a Z
> transformation is not a bad idea whether or not the variables are normal.

Unless you want the 0-1 variable to count as 10% as potent as the
variable scored 0-10.  The classical default analysis does let you
WEIGHT the variables, by using arbitrary scaling.  (Years ago, it was
typical, shoddy documentation of the standard default, that they
didn't warn the tyro.  Has it improved?  Has the default changed?)
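
A small sketch of the weighting issue, with toy data: on the raw
scales the 0-10 variable dominates the Euclidean distances; the z
transformation puts the columns on a comparable footing (or lets you
choose weights deliberately).

import numpy as np

X = np.array([[0., 0.], [1., 10.], [0., 9.], [1., 1.]])   # col 1 scored 0-1, col 2 scored 0-10

def pairwise(M):
    return np.round(np.linalg.norm(M[:, None, :] - M[None, :, :], axis=-1), 2)

print(pairwise(X))                     # distances driven almost entirely by the 0-10 column
Z = (X - X.mean(0)) / X.std(0)         # the z transformation
print(pairwise(Z))                     # now the two variables carry comparable weight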

> Quasi-normality is about all you have to assume--the absence of intersample
> polymodality and the approximation of the mean and the mode. However, to my
> knowledge, there is no satisfying "theory" associated with cluster analysis--only
> rules of thumb.
[ snip, original question ]

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: ANCOVA vs. sequential regression

2001-04-20 Thread Rich Ulrich

On Fri, 20 Apr 2001 13:11:02 -0400, "William Levine"
<[EMAIL PROTECTED]> wrote:
 ...
> A study was conducted to assess whether there were age differences in memory
> for order independent of memory for items. Two preexisting groups (younger
> and older adults - let's call this variable A) were tested for memory for
> order information (Y). These groups were also tested for item memory (X).
> 
> Two ways of analyzing these data came to mind. One was to perform an ANCOVA
> treating X as a covariate. But the two groups differ with respect to X,
> which would make interpretation of the ANCOVA difficult. Thus, an ANCOVA did
> not seem like the correct analysis.

 - "potentially problematic" - but not always wrong.

> A second analysis option (suggested by a friend) is to perform a sequential
> regression, entering X first and A second to
> test if there is significant leftover variance explained by A.
 [ snip ...  suggestions? ]

Yes, you are right, that is exactly the same as the ANCOVA.

What can you do?  What can you conclude?  
That depends on  
 - how much you know and trust the *scaling*  of the X measure,
 - how much overlap there is between the groups, and 
 - how much correlation there is, X and Y.

You probably want to start by plotting the data.  When you use
different symbols for Age, what do you see about X and Y? and Age?

Here's a quick example of hard choices when groups don't match.

Assume:
group A improves, on the average, from a mean
score of 4, to 2.  Assume group B improves from 10 to 5

Then:  
 a) A is definitely better in "simple outcome" at 2 vs. 5;
 b) B is definitely better in "points of improvement" at 5 vs. 2;
 c) A and B fared exactly as well, in terms of "50% improvement"
(dropping towards a 0 that is apparently meaningful).

I would probably opt for that 3rd interpretation, given this set of
numbers, since the 3rd answer preserves a null hypothesis.
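
A tiny sketch of the three standards applied to those numbers:

pre  = {"A": 4.0, "B": 10.0}      # the example's pre-treatment means
post = {"A": 2.0, "B": 5.0}       # and post-treatment means (lower = better)

for g in ("A", "B"):
    outcome     = post[g]                   # (a) simple outcome
    improvement = pre[g] - post[g]          # (b) points of improvement
    fraction    = improvement / pre[g]      # (c) proportional improvement
    print(g, outcome, improvement, fraction)
# A wins on (a), B wins on (b), and both show 50% improvement on (c).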

With another single set of numbers in hand, I would lean towards 
*whatever*  preserves the null.  But here is where experience is
supposed to be a teacher -- If you have dozens of numbers, 
eventually you have to read them with consistency, instead of
bending an interpretation to fit the latest set.  But if you do have
masses of data on hand, then you should have extra evidence 
about correlations, and about additive or multiplicative scaling.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Student's t vs. z tests

2001-04-19 Thread Rich Ulrich

On 19 Apr 2001 05:26:25 -0700, [EMAIL PROTECTED] (Robert J.
MacG. Dawson) wrote:

 [ ... ] 
> The z test and interval do have some value as a pedagogical
> scaffold with the better students who are intended to actually
> _understand_ the t test at a mathematical level by the end of the
> course. 
> 
> For the rest, we - like construction crews - have to be careful
> about leaving scaffolding unattended where youngsters might play on it
> in a dangerous fashion.
> 
>   One can also justify teaching advanced students about the Z test so
> that they can read papers that are 50 years out of date. The fact that
> some of those papers may have been written last year - or next-  is,
> however, unfortunate; and we should make it plain to *our* students that
> this is a "deprecated feature included for reverse compatibility only".

Mainly, I disagree.

I had read 3 or 4 statistic books and used several stat programs
before I enrolled in graduate courses.  One of the *big surprises*  to
me was to learn that some statistics were approximations,
through-and-through, whereas others might be 'exact' in some sense.

Using z as the large sample test, in place of t, is approximate.  
Using z as the test-statistic on a dichotomy or ranks is exact, since
the variances are known from the marginal Ns.
Using z for *huge* N  is a desirable simplification, now and then.

Is the 1-df chi-squared equally worthless, in your opinion?  
A lot of those exist, fundamentally, as the square of a z that 
*could*  be used instead (for example, McNemar's test).
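
For instance, a sketch of that McNemar case with invented
discordant-pair counts: the z built from the known margin, squared, is
the 1-df chi-squared, and the p-levels match.

import numpy as np
from scipy import stats

b, c = 18, 9                            # invented counts of discordant pairs
z = (b - c) / np.sqrt(b + c)            # z; the variance is known from the margin b+c
chi2 = z ** 2                           # the 1-df (McNemar) chi-squared
print(z, 2 * stats.norm.sf(abs(z)))     # two-sided p from the normal
print(chi2, stats.chi2.sf(chi2, df=1))  # same p from the 1-df chi-squared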

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: regression hypotheses

2001-04-19 Thread Rich Ulrich

On Thu, 19 Apr 2001 10:27:40 -0400, "Junwook Chi" <[EMAIL PROTECTED]>
wrote:

> Hi everybody!
> I am doing Tobit analysis for my research and want to test regression
> hypotheses (Normality, Constant variance, Independence) using plots of
> residuals. I also want to check outliers and leverage. but I am not sure
> whether I could use these tests for the Tobit model (non-linear) or they
> apply only for linear regression. does anybody know it? thank you!
> 

Does your Tobit model have *testing* as part of it?

Normality might not matter much for any of the models.

Statistical tests - any of them - depend on how you weight your cases.
So, constant variance matters (more or less).  You can measure to see
how you get by.

Independence is a tricky notion, at times, but you surely need it.
It is best to assure independence from your model and your theory 
if you can,  because it can be tough to account for autocorrelation,
etc., after the fact.  This is also, "If you want useful testing."

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Simple ? on standardized regression coeff.

2001-04-18 Thread Rich Ulrich

On Tue, 17 Apr 2001 16:32:06 -0500, "d.u." <[EMAIL PROTECTED]>
wrote:

> Hi, thanks for the reply. But is beta really just b/SD_b? In the standardized
> case, the X and Y variables are centered and scaled. If Rxx is the corr matrix
 [ ... ]
No.  b/SD_b  is the t-test.
Beta is b, after it is scaled by the SD of X and the SD of Y.

Yes, beta is the b you get if X and Y are first 'scaled' to unit variance.
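
A short check of those two statements, on simulated data (in simple
regression the beta also equals the plain correlation):

import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)            # invented simple-regression data

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)        # raw slope
a = y.mean() - b * x.mean()
resid = y - (a + b * x)
se_b = np.sqrt((resid @ resid) / (n - 2) / ((x - x.mean()) @ (x - x.mean())))

beta = b * x.std(ddof=1) / y.std(ddof=1)                  # b rescaled by the SDs
print(beta, np.corrcoef(x, y)[0, 1])                      # equal, in simple regression
print(b / se_b)                                           # and b/SE(b) is the t-test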
-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Simple ? on standardized regression coeff.

2001-04-17 Thread Rich Ulrich

On Mon, 16 Apr 2001 20:24:10 -0500, "d.u." <[EMAIL PROTECTED]>
wrote:

> Hi everyone. In the case of standardized regression coefficients (beta),
> do they have a range that's like a correlation coefficient's? In other
> words, must they be within (-1,+1)? And why if they do? Thanks!
> 
There is no limit on the raw coefficient, b, so there is no limit on
beta = b rescaled by SD(X)/SD(Y).
In practice, b gets large when there is a suppressor relationship, so
that the x1-x2  difference is what matters, e.g.,  (10x1-9x2).

Beta is about the size of the univariate correlation when the
co-predictors balance out in their effects.  I usually want to
consider a different equation if any beta is greater than 1 or 
has the opposite sign from its  corresponding, initial r -- for 
instance, I might combine (X1, X2) in a rational way.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: In realtion to t-tests

2001-04-09 Thread Rich Ulrich

On Mon, 09 Apr 2001 10:44:40 -0400, Paige Miller
<[EMAIL PROTECTED]> wrote:

> "Andrew L." wrote:
> > 
> > I am trying to learn what a t-test will actually tell me, in simple terms.
> > Dennis Roberts and Paige Miller, have helped alot, but i still dont quite
> > understand the significance.
> > 
> > Andy L
> 
> A t-test compares a mean to a specific value...or two means to each
> other...
 [ ... ]

I remember my estimation classes, where the comparison was
always to ZERO for means.  To ONE, I guess, for ratios.
Technically  speaking, or writing.

For instance, if the difference in averages X1, X2 is expected to
be zero, then  {(X1-X2) - 0} / SE(X1-X2)  is distributed as t.  It might
look like a lot of equations with the 'minus zero' seemingly tacked
on, but I consider this to be good form.  It formalizes the statistic
as <the estimate> minus <its hypothesized value>.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Logistic regression advice

2001-04-09 Thread Rich Ulrich

On 6 Apr 2001 13:15:34 -0700, [EMAIL PROTECTED] (Zina Taran)
wrote:

 [ ... on logistic regression ]
ZT: "1). The 'omnibus' chi-squared for the equation.  Is it 
accurate to say that I can interpret individual significant
coefficients if (and only if) the equation itself is significant? "

Confused question.  Why do you label it the omnibus test?
'Omnibus' mainly names a ROLE -- the overall test, or a test that
subsumes a coherent set of several tests;  sometimes you use a
test in that role, and sometimes you don't.

ZT: "2) A few times I added interaction terms and some things 
became significant.  Can I interpret these even if the interaction
variable itself (such as 'age') is not  significant?  Can I interpret
an interaction term if neither variable has a significant beta?"

Probably not.  Assuredly not, unless someone has used care and
attention (and knowledge) in the exact dummy-coding of the effects.


[ ... snip, 'Nagelkerke' that I don't recall; 'massive' regression
which is a term that escapes me, but I think it means, 'no hypotheses,
test everything'; and so I disapprove. ] 
ZT: "5) I know the general rule is 'just the facts'  in the results
section, meaning that there should be no explanation or 
interpretation regarding the results.  When writing the results
section do I specifically draw conclusions as to whether a 
hypothesis is supported or does that get left to the discussion?"

Do you have an Example that is difficult?  - It seems to me that 
if the analyses are straightforward, there should be little question
about what the 'results'  mean, when you lay them out in their own,
minimalist section.  In other words, leave discussion to the 
discussion; but that should be a re-cap of what's apparent.  
You hope.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: rotations and PCA

2001-04-08 Thread Rich Ulrich

 - Intelligence, figuring what it might be, and categorizing it, and
measuring it... I like the topics, so I have to post more.

On Thu, 05 Apr 2001 22:09:33 +0100, Colin Cooper
<[EMAIL PROTECTED]> wrote:

> In article <[EMAIL PROTECTED]>,
>  Rich Ulrich <[EMAIL PROTECTED]> wrote:
> 
> > I liked Gould's book.  I know that he offended people by pointing to
> > gross evidence of racism and sexism in 'scientific reports.'  But he
> > has (I think) offended Carroll in a more subtle way.  Gould is 
> > certainly partial to ideas that Carroll is not receptive to; I think
> > that is what underlies this critique.
> > 
> > ===snip
> 
> I've several problems with Gould's book.
> 
> (1)  Sure - some of the original applications of intelligence testing 
> (screening immigrants who were ignorant of the language using tests 
> which were grossly unfair to them) were unfair, immoral and wrong.  But 
> why impugn the whole area as 'suspect' because of the 
> politically-dubious activities of some researchers a century ago?  It

I think Gould "impugned"  more than just one area.  The message, 
as I read it, was, "Be leery of social scientists who provide
self-congratulatory and self-serving, simplistic conclusions."

In recent decades, I imagine that economists have been bigger 
at that than psychologists.  Historians have quite a bit of 20th
century history-writing to live down, too.

 
> seems to me to be exceptionally surprising to find that ALL abilities - 
> musical, aesthetic, abstract-reasoning, spatial, verbal, memory etc. 
> correlate not just significantly but substantially.

Here is one URL  for references to Howard Gardner, who has
shown some facets of independence of abilities (and who you 
mention, below).
http://www.newhorizons.org/trm_gardner.html


> (2)  Gould's implication is that since Spearman found one factor 
> (general ability) whilst Thurstone found about 9 identifiable factors, 
> then factor analysis is a method of dubious use, since it seems to 
> generate contradictory models.  There are several crucial differences 

 - I read Gould as being more subtle than that.

> between the work of Spearman and Thurstone that may account for these 
> differences.  For example, (a)  Spearman (stupidly) designed tests 
> containing a broad spectrum of abilities: his 'numerical' test, for 
> example, comprised various sorts of problems - addition, fractions, etc.  
> Thurstone used separate tests for each: so Thurstone's factors 
> essentially corresponded to Spearman's tests. (b) Thurstone's work was 
> with students where the limited range of abilities would reduce the 
> magnitude of correlations between tests. (c)  More recent work (e.g., 
> Gustafsson, 1981; Carroll, 1993) using exploratory factoring and CFA 
> finds good evidence for a three-stratum model of abilities: 20+ 
> first-order factors, half a dozen second-order factors, or a single 
> 3rd-order factor.
> 
> (3)  Interestingly, Gardner's recent work has come to almost exactly the 
> same conclusions from a very different starting point.  Gardner 
> identified groups of abilities which, according to the literature, tended 
> to covary - for example, which tend to develop at the same age, all 
> change following drugs or brain injury, which interfere with each other 
> in 'dual-task' experiments and so on.  His list of abilities derived in 
> this way is very similar to the factors identified by Gustafsson, 
> Carroll and others.

 - but Gardner has "groups of abilities" that are, therefore, distinct
from each other.  And also, only a couple of abilities are usually
rewarded (or even measured) in our educational system.  When I read
his book, I thought Gardner was being overly  "scholastic" in his
leaning, and restrictive in his data, too.

> I have a feeling that we're going to get on to the issue of whether 
> factors are merely arbitrary representations of sets of data or whether 
> some solutions are more are more meaningful than others - the rotational 
> indeterminacy problem - but I'm off to bed! 

Well, how much data can you load into one factor analysis? 
How much virtue can you assign to one 'central ability'?
 - I see the problem as philosophical instead of numeric.
What you will  *identify*  as a single factor (by techniques 
of today) will be more trivial than you want.

Daniel Dennett, in "Consciousness Explained," does a clever
job of defining consciousness.  And trivializing it; what I was
interested in (I reflect to myself) was something much grander, 
something more meaningful.  But intelligence and self-awareness 
are separate topics, and big ones.  Julian Jaynes's book was
more use

Re: Fw: statistics question

2001-04-06 Thread Rich Ulrich

I reformatted this.

Quoting a letter from Carmen Cummings to himself,
On 6 Apr 2001 08:48:38 -0700, [EMAIL PROTECTED] wrote:

> The below question was on my Doctorate Comprehensives in
> Education at the University of North Florida.
> 
> Would one of you learned scholars pop me back with 
>possible appropriate answers.

= start of cite:  the question
An educational researcher was interested in developing a
predictive scheme to forecast success in an elementary statistics
course at a local university. He developed an instrument with a
range of scores from 0 to 50. He administered this to 50 incoming
freshmen signed up for the elementary statistics course, before
the class started. At the end of the semester he obtained each of
the 50 students' final averages. 

Describe an appropriate design to collect data to test the
hypothesis. 
= end of cite.

I hope the time of the Comprehensives is past.  Anyway, this
might be better suited for facetious answers, than serious ones.

The "appropriate design" in the strong sense:  

Consult with a statistician  IN ORDER TO "develop an instrument".  
Who decided only a single dimension should be of interest?  
(How else does one interpret a score with a "range" from 0 to 50?)

Consult with a statistician BEFORE administering something to --
selected?  unselected? -- freshmen; and consult (perhaps) 
in order to develop particular hypotheses worth testing.  
I mean, the kids scoring over 700 on Math SATs will ace 
the course,  and the kids under 400 will have trouble.  

Generalizing, of course.  If "final average"  (as suggested) 
is the criterion, instead of "learning."
But you don't need a new study to tell you those results.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Repeated-measures t test for ratio level data

2001-04-06 Thread Rich Ulrich

replying to mine, and catching an error,
(My apologies for the error, please, and my
thanks to Jim for the catch ),

On Tue, 03 Apr 2001 15:05:20 -0700, James H. Steiger
 wrote:

> Things are not always what they seem.
> 
> Consider the following data:
> 
> A    B    A/B     Log A   Log B   Log A - Log B
>  3    1    3       .477    0        .477
>  1    3    .333    0       .477    -.477
>  2    2    1       .301    .301     0
> 
> 
> The t test for the difference of logs obviously 
> gives a value of zero, while the t for the hypothesis
> that the mean ratio is 1 has a positive value.
> 
> This seems to show that the statement that the
> two tests are "precisely, 100% identical"
> is incorrect.
 [ snip, more ... ]


Yep, sorry -- I fear that I left out a step, even as I sat
and read the problem.  And when I read my own 1st draft
of an answer, I saw that it was worded a bit equivocally.
I made that statement firmer, but I forgot to make sure it
was still true, in detail.

{ 1/2, 1/1, 2/1 }  
( equal to .5, 1.0, 2.0)  clearly does not define equal steps.
Except, if you first take log(X).

The automatic advice for "ratios"  -- not always true, but 
always to be considered -- is "take the logs".  When you have 
a ratio ( above zero), there is far less room between 0-1  than 
above 1.  Is this asymmetry ever desirable, for the metric?  
Well, it *ought*  to be desirable, if you are going to use a ratio
without further transformation.  But I think it is not apt to be
desirable for human reaction times.

For log(A) and log(B),  consider:  log(A/B) = log(A) - log(B).
The one-sample test on *LOG*  of A/B  is the same as the 
difference in logs.

Those are the tests I had in mind... or should have had in mind.
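
A quick numerical check, on simulated paired data, that the two tests
agree:

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
A = rng.lognormal(mean=0.2, sigma=0.5, size=30)   # invented paired, positive data
B = rng.lognormal(mean=0.0, sigma=0.5, size=30)

print(stats.ttest_1samp(np.log(A / B), 0.0))      # one-sample t on log of the ratio
print(stats.ttest_rel(np.log(A), np.log(B)))      # paired t on the logs: identical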

-- 
Rich Ulrich, [EMAIL PROTECTED]

http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: attachments

2001-04-06 Thread Rich Ulrich

On Fri, 06 Apr 2001 13:34:03 GMT, Jerry Dallal
<[EMAIL PROTECTED]> wrote:

> "Drake R. Bradley" wrote:
> 
> > While I agree with the sentiments expressed by others that attachments should
> > not be sent to email lists, I take exception that this should apply to small
> > (only a few KB or so) gif or jpeg images. Pictures *are* often worth a
> > thousand words, and certainly it makes sense that the subscribers to a stat
> 
> It's worth noting that some lists have gateways to Usenet groups. 
> Usenet does not support attachments, so they will be lost to Usenet
> readers.  [ break ]

 - my Usenet connection seems to give me all the attachments.
But if I depended on a modem and a 7-bit protocol, I would be 
pleased if my ISP  filtered out the occasional, 100 kilobyte 8-bit
attachment.  (Some folk still use 7-bit protocols, don't they?)

> Also, even in the anything-goes early 21-st Century climate
> of the Internet, one big no-no remains the posting of binaries to
> non-binary groups.

Right; that's partly because of size.  My vendor has the practice,
these days, of saving ordinary groups for a week, binary groups
(which are the BULK of their internet feed) for 24 hours.  Binary
strings may be treated as screen-commands, if your Reader doesn't 
know to package them as an 'attachment' or otherwise ignore them.

Some attachments are binary, some are not.  
Standard HTML files are ASCII, with the added 'risk' 
(I sometimes look at it that way) of invoking an
immediate internet connection.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Repeated-measures t test for ratio level data

2001-04-03 Thread Rich Ulrich


Doing that one-sample t-test on the ratio is not a bad idea.

But it is not a new idea, either.  It is, precisely, 100% identical to
doing a repeated measures test on the logarithm of the raw numbers.
Which is the same as the paired t-test.


On 2 Apr 2001 11:53:11 -0700, [EMAIL PROTECTED] (Dr
Graham D Smith) wrote:

> I would like to start a discussion on a family of procedures 
> that tend not to be emphasised in the literature. The procedures 
> I  have in mind are based upon the ratio between two sets of 
> scores from the same sample.
[ ... snip, detail ]

> My feeling is that the t test for ratios should have a similar 
> status and profile as the repeated measures t test (on 
> differences). I suspect that the t test for differences is often 
> used when the t test for ratios would be more suitable. So 
> why is the procedure not more widely used? Perhaps this 
> is only a problem within psychology where ratio level data 
> is not commonly used.
[ snip, rest ]

Logarithms (if that is what is appropriate) is a more general start 
to a model.  Building directly on ratios is not as convenient.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: rotations and PCA

2001-04-02 Thread Rich Ulrich

On Sun, 01 Apr 2001 22:13:18 +0100, Colin Cooper
<[EMAIL PROTECTED]> wrote:

> ==snip  See Stephen Jay Gould's _The Mismeasure of Man_ for more 
> > details; note that Thurstone adopted varimax rotations because their 
> > results were consistent with *his* pet theories about intelligence.

> Hmm.  Gould's book is generally reckoned to be rather partial and not 
> particularly accurate - see for example JB Carroll's 'editorial review' 
> of the second edition in 'Intelligence' about 4 years ago.  (sorry - 
> haven't got the exact reference to hand).  Comrey & Lee's book is one of 

A google search on < Carroll Gould Intelligence > immediately hit
a copy of the article --

http://www.mugu.com/cgi-bin/Upstream/Issues/psychology/IQ/carroll-gould.html

I liked Gould's book.  I know that he offended people by pointing to
gross evidence of racism and sexism in 'scientific reports.'  But he
has (I think) offended Carroll in a more subtle way.  Gould is 
certainly partial to ideas that Carroll is not receptive to; I think
that is what underlies this critique.

After Google-ing Carroll, I see that he is a long-time researcher in 
"intelligence."  To me, it seems that Gould is in touch with the newer
stream of hypotheses about intelligence -- ideas that tend to
invalidate the basic structures of old-line theorists like Carroll.  

In the article, Carroll eventually seems to express high 
enthusiasm for 'new techniques' (compared to what 
Gould made use of)  in factor analysis.  I can say,
my own experience and reading has not led me to the same 
enthusiasm.   Am I missing something?


> the better introductions - Loehlin 'latent variable Models' is good if 
> you're coming to it from a structural equation modelling background.
> 
> Colin Cooper

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: convergent validity

2001-04-02 Thread Rich Ulrich

I'm coming in at a different slant from what I have seen posted 
on this thread (in sci.stat.edu).

On Thu, 29 Mar 2001 20:30:59 +0200, "H.Goudriaan"
<[EMAIL PROTECTED]> wrote:

 ...
> I have 2 questionnaires assessing (physical and emotional) health of
> heart patients. The 1st measures present state and it's assessed before
> treatment and a couple of months after treatment, so that difference
> scores can be calculated. The 2nd questionnaire is assessed after
> treatment only, and asks respondents how much they have changed on every
> aspect (same aspects as the first questionnaire) since just before
> treatment.
> Respondents received both questionnaires. Now I would like to
> investigate the convergent validity of the two domains assessed with
> both questionnaire versions. Is there a standard, straightforward way of
> doing this? Someone advised me to do a factor analysis (PCA) (on the
> baseline items, the serially measured change scores and the
> retrospectively assessed change scores) and then compare the
> factor loadings (I assume after rotation? (Varimax?)). I haven't got a
> good feeling about this method for two reasons:
> - my questionnaire items are measured on 5- and 7-point Likert scales,
> so they're not measured on an interval level and consequently not
> (bivariate) normally distributed;
 [ snip, about factor loading.]

If items were really Likert, they would be close enough to normal.

But there is no way (that comes to mind) that you should have labels
for "Change"  that are Likert:  Likert range is  "completely disagree"
... "completely agree"  and responses describe attitudes.  You can
claim to have Likert-type labels, if you do have a symmetrical set.
That is more likely to apply to your Present-Status reports, than to
Changes.  At any rate -- despite the fact that I have never found
clean definitions on this -- having a summed score is not enough 
to qualify a scale as Likert.

Thus, you *may*  be well-advised, if someone has advised you so, 
to treat your responses as 'categories' -- at least, until you do the
dual-scaling or other item analyses that will justify regarding them
as "interval."  For someone experienced  in drawing up scales, or 
if you were picking up items from much-used tests, that would 
not be a problem; but people are apt to make mistakes if they 
haven't seen those mistakes well-illustrated.

What is your question about individual items?  Are some, perhaps,
grossly inappropriate?  Or, too rarely marked?  If 11 are intended for
a "physical factor", there *should*  emerge a factor or principal
component to reflect it.  Ditto, for emotional.  Any items that don't
load are duds (that would be my guess).  Or do you imagine 2  strong
factors?  Again -- whatever happens should not come as much 
surprise if you've done this sort of thing before.

IF the items are done in strict-parallel, it seems unnecessary and
obfuscatory to omit a comparison of responses, item by item.
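
As a rough sketch of that item-by-item comparison (Python, with
simulated 5-point ratings -- every number, the 120 patients, and the 11
items are made up), one could correlate the serially measured change
with the retrospective change for each item, and again for the summed
domain score:

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(5)
  n, k = 120, 11                                    # hypothetical: 120 patients, 11 items
  pre   = rng.integers(1, 6, size=(n, k))           # 5-point ratings before treatment
  post  = np.clip(pre + rng.integers(-1, 3, size=(n, k)), 1, 5)
  retro = np.clip((post - pre) + rng.integers(-1, 2, size=(n, k)), -2, 2)  # retrospective change

  serial = post - pre                               # serially measured change, item by item
  for item in range(k):
      rho, p = stats.spearmanr(serial[:, item], retro[:, item])
      print(item + 1, round(rho, 2))

  # and at the domain level: sum the items, then correlate the two change measures
  print(stats.spearmanr(serial.sum(axis=1), retro.sum(axis=1)))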

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: rotations and PCA

2001-03-29 Thread Rich Ulrich

On Thu, 29 Mar 2001 10:17:09 +0200, "Nicolas Voirin"
<[EMAIL PROTECTED]> wrote:

> OK, thanks.
> 
> In fact, it's a "visual" method to see a set of points with the best
> view (maximum of variance).
> It's like swivelling a cube around to see all of its sides ... but this
> in more than 3D.
> When I show points in different planes (F1-F2, F2-F3, F2-F4 ... for
> example), I make rotations, isn't it?

I think I would use the term, "projection"  onto specific planes, if
you are denoting x,y, and z (for instance) with F1, F2, F3  :
You can look at the  x-y plane, the y-z plane,
and so on.

Here is an example in 2 dimensions, which suggests a simplified
version of an old controversy about 'intelligence'--
tests might provide two scores of  Math=110, Verbal= 90.
However, the abilities can be reported, with no loss of detail, as 
General= 100,  M-versus-V= +20.  Historically, Spearman wanted 
us all to conclude that "Spearman's g"   had to exist as a mental 
entity, since its statistical description could be reliably produced.
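
A small numeric illustration of that "no loss of detail" claim, in
Python (the transformation is just the one in the example, written as a
matrix):

  import numpy as np

  scores = np.array([110.0, 90.0])           # Math, Verbal
  T = np.array([[0.5,  0.5],                 # General  = (M + V) / 2
                [1.0, -1.0]])                # M-vs-V   =  M - V
  general_mv = T @ scores                    # [100., 20.]
  recovered = np.linalg.inv(T) @ general_mv  # [110., 90.] -- nothing is lost
  print(general_mv, recovered)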

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: One tailed vs. Two tailed test

2001-03-28 Thread Rich Ulrich

 - I finally get back to this topic -

On Fri, 16 Mar 2001 23:40:07 GMT, [EMAIL PROTECTED] (Jerry Dallal)
wrote:

> Rich Ulrich ([EMAIL PROTECTED]) wrote:
> 
> : Notice, you can take out a 0.1%  test and leave the main
> : test as 4.9%, which is  not effectively different from 5%.
> 
> I've no problem with having different probabilities in the 
> two tails as long as they're specified up front.  I say
> so on my web page about 1-sided tests.  I have concerns about 
> getting investigators to settle on anything other than 
> equal tails, but that's a separate issue.  
> The thing I've found interesting about
> this thread is that everyone who seems to be defending 
> one-tailed tests is proposing something other than a 
> standard one-tailed test!
> 
> FWIW, for large samples, 0.1% in the unexpected tail 
> corresponds to a t statistic of 3.09.  I'd love to 
> be a fly on the wall while someone is explaining to 
> a client why that t = 3.00 is non-significant!  :-)

 = concerning the 5.1% solution; asymmetrical testing
with 0.05  as a one-sided, nominal level of significance, 
and 0.001  as the other side (as a precaution).

Jerry,
In that last line, you are jumping to a conclusion.  
Aren't you jumping to a conclusion?

If the Investigator was seriously headed toward a 1-sided
test -- which (as I imagine it) is how it must have been, that
he could have been talked-around to the prospect of a 5.1%  
combined test instead -- then he won't be eager to 
jump on t= 3.00 as significant.  

I mean, it can be easier to "publish"  if you pass magic size,
but it is easier to avoid "perishing" in the long run, with a series
of connected hypotheses.

I think of the Investigator as torn three ways.

 a) Stick to the plan;  ignore the t=3.0, which is *not quite*  0.001.
'It did not reach the previously stated, 0.001  nominal level, and I
still don't believe it.  (And I don't want to furnish ammunition for
arguments for other things.)'  Practically speaking, the risk of
earning blame for stonewalling like that is not high.

 b) Run with it; claim that a two-sided test always *did*  make 
sense and the statistician was to blame for brain-fever, for
wanting 1-tailed in the first place.  (Or, never mention it.)
The fly on the wall probably would not see this.  
The statistician should have already quit.

 c) Report the outcome in the same diffident style as would have 
been earned by a 0.06  result in the other direction, "not quite
meeting the preset nominal test size, but it is suggestive."  
Unlike the 6% result, this one is unwelcome.  

T=3.00  will stir up investigation to try to undermine the implication
(such as it is).

I have trouble taking the imagined outcome much further without
speculating about where you have the trade-off between effect-size
and N;  and whether the "experimental design" was thoroughly robust --
and there's a different slant to the arguments if you are explaining
or explaining-away the results of uncontrolled observation.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: rotations and PCA

2001-03-28 Thread Rich Ulrich

On Wed, 28 Mar 2001 08:57:36 +0200, "Nicolas V." <[EMAIL PROTECTED]>
wrote:

> Hi,
> 
> What are "rotations" in PCA ?
> What is the difference between "rotated" and "unrotated" PCA ?
> Does it exist in others analysis ?

 = just on 'existence' =
Rotations certainly exist in other analyses, and for other purposes.
Anytime you have a coordinate system, you have potential for drawing
in different axes, and then describing locations in terms of the new
system.  

On a map in 2D, you can describe positions as directions, N-E-S-W.
But if a river cuts along the diagonal, it could be more sensible to
describe cities as "up-river" from the ocean by some amount, 
and by how far they are from the main tributary.  Simplification 
like that is the idea behind rotation.
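
A minimal numeric version of that map idea (Python; the angle of the
river and the city positions are invented):

  import numpy as np

  theta = np.deg2rad(40)                           # assumed angle of the river to due east
  R = np.array([[ np.cos(theta), np.sin(theta)],   # rows are the new axes:
                [-np.sin(theta), np.cos(theta)]])  # 'up-river' and 'off-river'

  cities = np.array([[ 3.0, 2.5],                  # hypothetical (east, north) positions
                     [ 6.0, 5.1],
                     [-1.0, 4.0]])
  print(cities @ R.T)                              # same cities, described on the rotated axes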

Common Factors are usually selected from the full-rank set, and
rotated: so the description will be simpler.

The full-rank set of PCs is often used as a matter of convenience
(vectors are not correlated); and there's no help from rotation if
there's no separate description being used.

The set of "significant" Factors in canonical correlation might be
subjected to rotation, because they are rather like Common Factors;
but that is seldom done (in what I read).

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Data Reduction

2001-03-27 Thread Rich Ulrich

On 26 Mar 2001 19:12:22 -0800, [EMAIL PROTECTED] (Dianne Worth)
wrote:

> I have a regression model with (mostly) identifiable IVs.  In addition, 
> I want to examine another set of responses and have about 15-20 
> questions that relate to that new 'factor.'  
[ ... ]

There is just one factor to be defined?  
And you have a set of proposed questions, just 15 or 20?  
It seems like you should be able to look at your 
correlation matrix, or unrotated PC analysis, or FA, 
and drop the variables that are odd.

You add together the ones that are okay, and that's it.

Do you really need a second factor?  Or more?  
You might have to say why, before you figure out 
which analysis to do, and how many factors to save.
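
Here is a rough sketch of that look-drop-sum sequence (Python, on
simulated item responses; the single factor, the 'dud' item, and the
.30 loading cutoff are all just placeholders):

  import numpy as np

  rng = np.random.default_rng(0)
  factor = rng.normal(size=(200, 1))               # one underlying factor
  X = 0.7 * factor + rng.normal(size=(200, 15))    # 15 hypothetical items
  X[:, 14] = rng.normal(size=200)                  # one 'dud' item, unrelated to the factor

  R = np.corrcoef(X, rowvar=False)                 # inter-item correlation matrix
  vals, vecs = np.linalg.eigh(R)
  loadings = vecs[:, -1] * np.sqrt(vals[-1])       # loadings on the first (unrotated) PC
  keep = np.abs(loadings) > 0.30                   # drop items that do not load
  scale = X[:, keep].sum(axis=1)                   # the scale is just the sum of the rest
  print(np.flatnonzero(~keep))                     # which items were dropped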

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: 1 tail 2 tail mumbo jumbo

2001-03-25 Thread Rich Ulrich

On Mon, 19 Mar 2001 13:14:39 -0500, Bruce Weaver
<[EMAIL PROTECTED]> wrote:

> On Fri, 16 Mar 2001, Rich Ulrich wrote:
[ snip, including earlier post ] 
> > That ANOVA is inherently a 2-sided test.  So is the traditional 2x2
> > contingency table.   That is because,  sides  refer to  hypotheses.
> 
> 
> 
> 
> I agree with you Rich, except that I don't find "2-sided" all that
> appropriate for describing ANOVA.  For an ANOVA with more than 2 groups,
> there are MULTIPLE patterns of means that invalidate the null hypothesis,
> not just 2. With only 3 groups,for example:
> 
>   A < B < C
>   A < C < B
>   B < A < C
 [ ... ]

> And then if you included all of the cases where 2 of the means are equal
> to each other, but not equal to the 3rd mean, there are several more
> possibilities.  And these ways of departing from 3 equal means do not
> correspond to tails in some distribution.
> 
> There's my attempt to add to the confusion.  ;-)

If I convince people that they want only one *contrast*  for their
ANOVA, then it is just two-sided.  I've been talking people out
of blindly testing multiple-groups and multiple periods, for years.

Then I have to start over on the folks, to convince them about 
MANOVA.  If there are two groups and two variables, 
there are FOUR sides -- and that's if you just count what is
'significant' by the single variables.  Most of the possible results
are not useful ones; that is, they are not easily interpretable, when
no variable is 'significant' by itself, or when logical directions
seem to conflict.

We can interpret "group A is better than B."  And we analyze 
measures that have the scaled meaning, where one end is better.
So the sensible analysis uses a defined contrast, the 'composite 
score';  and then you don't have to use the MANOVA packages, 
and you have the improved power of testing just one or two sides.
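
A bare-bones version of that composite-score analysis (Python, with
invented data for two groups and two measures scaled so that high is
better):

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(1)
  a = rng.normal(0.0, 1.0, size=(30, 2))     # group A, two outcome measures
  b = rng.normal(0.3, 1.0, size=(30, 2))     # group B, hypothetically a bit better on both

  both = np.vstack([a, b])
  z = (both - both.mean(axis=0)) / both.std(axis=0, ddof=1)   # standardize each measure
  composite = z.mean(axis=1)                                  # the defined contrast: one composite

  print(stats.ttest_ind(composite[:30], composite[30:]))      # one two-sided test, A versus B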

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: calculating reliability

2001-03-23 Thread Rich Ulrich

On 23 Mar 2001 02:53:11 GMT, John Uebersax <[EMAIL PROTECTED]>
wrote:

> Paul's comment is very apt.  It is very important to consider whether
> a consistent error should or should not count against reliability.
> In some cases, a constant positive or negative bias should not matter.

 - If you have a choice, you design your experiment so that a bias 
will not matter.  Assays may be conducted in large batches, or  the 
same rater may be assigned for both Pre and Post assessment.

> For example, one might be willing to standardize each measure before
> using it in statistical analysis.  The standardization would then
> remove differences due to a constant bias (as well as differences
> associated with a different variance for each measure/rating).

? so that rater A's BPRS  on the patient is divided by something, to
make it compare to rater B's rating?  That sounds hard to justify.
I agree that, conceivably, raters could want to use a scale
differently.  If that's a chance, then:  Before you start the study,
you train the raters to use the scale the same.

Standardizing for variance like that, between *raters,*  is 
something I don't remember  doing.   I do standardize for the 
observed SD  for a variable, when I create a composite score 
across several variables.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: calculating reliability

2001-03-22 Thread Rich Ulrich

On Thu, 22 Mar 2001 08:23:54 -0500, Bruce Weaver
<[EMAIL PROTECTED]> wrote:
> 
> On 21 Mar 2001, Awahab El-Naggar wrote:
> 
> > Dear Colleagues
> > I have been using "test-retest" method for calculating reliability by
> > applying the Pearson Product Moment (PPM) analysis. However, I have been
> told that this is not the right way to calculate reliability, and I should use
> > the ANOVA to calculate the reliability. Would you comment and advise me.
> > Many Thanks.
> > A'Wahab
> > 
> 
> Here are a couple sites that may provide some useful information:
> 
>   http://www.nyu.edu/acf/socsci/Docs/correlate.html
>   http://www.nyu.edu/acf/socsci/Docs/intracls.html

Awahab,

 = what is in your data =
If you want to know what you have in your data, you were doing it the
right way.  To be complete, you do want to look at the paired *t-test*
to check for systematic differences; and you want to confirm that the
variances are not too different.  If you have multiple raters, you
usually want to know about oddities for any single rater.

You can find other comments about reliability in my stats-FAQ.

 = publishing a single  number =
If you want to publish a simple, single number, then editors have been
trained to ask for an IntraClass Correlation (ICC) of some sort.  
The ICC reference Bruce W. cites above tells how SPSS now offers 
10 different ICCs, following some over-used, much-cited studies.  
The most common ICC (between two raters)  does a simple job
of confounding the Pearson correlation with the mean difference
(by assuming the means are equal), instead of inviting you to look at
those two dimensions separately.  It can look pretty good, even when 
a t-test would give you a warning.  That's why I think of an ICC 
as a summary that "only an editor can love."  
   Once you have confirmed that you have good reliability, then you 
might want to do the ANOVA to get the ICC that an editor wants.
But a wise editor or reviewer should be pleased with suitable reports
of Pearson r   and  tests of means.
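
For what it's worth, here is a sketch (Python, with simulated
test-retest scores; all the numbers are invented) of the reports I
would look at first -- Pearson r and the paired t-test -- with one
common two-way ICC computed from the same ANOVA mean squares for the
editor:

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(2)
  true   = rng.normal(50, 10, size=40)              # hypothetical 'true' scores
  test   = true + rng.normal(0, 4, size=40)
  retest = true + 2 + rng.normal(0, 4, size=40)     # retest with a small systematic shift

  print(stats.pearsonr(test, retest))               # what is actually in the data
  print(stats.ttest_rel(test, retest))              # systematic (mean) difference?

  d = np.column_stack([test, retest])               # two-way ANOVA by hand, for one ICC
  n, k = d.shape
  grand = d.mean()
  ss_rows = k * ((d.mean(axis=1) - grand) ** 2).sum()
  ss_cols = n * ((d.mean(axis=0) - grand) ** 2).sum()
  ss_err  = ((d - grand) ** 2).sum() - ss_rows - ss_cols
  msr, msc, mse = ss_rows / (n - 1), ss_cols / (k - 1), ss_err / ((n - 1) * (k - 1))
  icc_2_1 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
  print(icc_2_1)                                    # one two-way, absolute-agreement ICC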


 = ICC for special purposes =
I have seen a study planned, where  3-rater estimates of some X
would be used, in order to increase the precision of X and reduce
the number of cases.  The estimate of the  eventual sample size 
used one particular species of ICC, from the many that are possible.
That's the legitimate reason for computing a special ICC (however,
I do have doubts about its accuracy).  Over the last thirty years, I
remember seeing that once.

-- 
Rich Ulrich, [EMAIL PROTECTED]


http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Seeking engineering data sets

2001-03-21 Thread Rich Ulrich

On Tue, 20 Mar 2001 23:20:55 GMT, "W. D. Allen Sr."
<[EMAIL PROTECTED]> wrote:

> Check out XLStats at,
> 
> http://www.man.deakin.edu.au/rodneyc/XLStats.htm
> 
> I have used a number of stat programs and this one is the easiest to use for
> us non-professional statisticians. It solves what I believe is the biggest
> problem in statistics, i.e., which statistical inference test is appropriate
> for my particular problem.

 - Darn!   Those years of study and practice, all wasted!   Someone
just needed to sell me that magic box

By the way, WD Allen, you are in a horrible fix if you don't read
your  XLStats closer than you read the stats group.  
Jim asked for pointers to  *data*  and not to stat packs:

> "Jim Youngman" <[EMAIL PROTECTED]> wrote in message
> 9xHt6.40968$[EMAIL PROTECTED]">news:9xHt6.40968$[EMAIL PROTECTED]...
> > Can anyone point me to data sets related to (Civil) Engineering  that
> would
> > be suitable for use as examples in an elementary statistics course for
> > engineers?
===
Here is one reference.  It might not help, but it might inspire others
to mention journals, if this is now a popular thing.

I  discovered this a couple of weeks ago.  "Biometrics" has a number
of datasets online which have been used in their articles.  

I don't know what a civil engineer needs, so I can't judge their
relevance, even if I look at more than the title of the article.  


Click on 'data sets' on their main page - 
  http://stat.tamu.edu/Biometrics/ 

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Discriminant analysis group sample sizes

2001-03-21 Thread Rich Ulrich

On 20 Mar 2001 15:50:07 -0800, [EMAIL PROTECTED] (Brian M.
Schott) wrote:

BMS:  " Is it necessary to approximately balance the sample size in
the groups of the validation or holdout sample to  develop a good
discriminant function?"

The answer is generally No, but I'm not sure 
what the alternatives are supposed to be.

The 'holdout sample', if there's just one, doesn't 
do much to develop the function; it illustrates it.
Some points: being representative usually matters
a lot (as opposed to merely being numerous).  
Never throw away free cases that you have in-hand,
just for the sake of achieving balance.


BMS:  " I am a little unclear about the extent to which the prior
probabilities can be used to adjust for sample proportions which do
not represent the population proportions.  I suspect there is a
difference  between predictive and descriptive discriminant analysis
in regard to this question, btw. But I cannot find a textbook that
addresses this question. "

In the usual, ordinary, discriminant function, the 'prior
probabilities'  play absolutely no role in the mathematics 
of the solution.  The DF is, in other terms, a problem in 
canonical correlation; or an eigenvector problem.  There's
no place in the basic problem for those weights to enter in.
[ You might assign weights on the groups, if you look at 
step-wise inclusion of variables -- but that whole prospect is
unappealing.  I have not bothered to see what is implemented. ]

The 'prior probabilities'  are used in the step that describes the
(predicted) group memberships.   You draw lines in particular places. 
In your terms, you might say, it is a part of the 'descriptive
analysis' only.  However, there is NO  *analysis*  in a sense - 
you just have the description.  

Furthermore:  Using the priors is (often) not well understood.  
Most writers avoid saying much, because they haven't 
figured them out, either.

USUALLY, you do not need to (and should not) use priors.  
In the cases where they are used, USUALLY the adjustment 
(away from 50-50) should be less than proportionate to the Ns.
USUALLY, it is fair to draw a cutoff score (almost) arbitrarily.
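
Here is a small sketch of the point (Python, two invented groups): the
discriminant direction comes from the pooled within-group covariance
and the mean difference, with no priors anywhere; the priors, when
used, only slide the cutoff along that direction (under the
equal-covariance normal model assumed below).

  import numpy as np

  rng = np.random.default_rng(3)
  g0 = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=200)   # hypothetical group 0
  g1 = rng.multivariate_normal([1.5, 1.0], np.eye(2), size=40)    # smaller hypothetical group 1

  m0, m1 = g0.mean(axis=0), g1.mean(axis=0)
  resid = np.vstack([g0 - m0, g1 - m1])
  sw = resid.T @ resid / (len(g0) + len(g1) - 2)    # pooled within-group covariance

  w = np.linalg.solve(sw, m1 - m0)                  # discriminant direction: no priors involved

  def cutoff(prior1):
      # under the equal-covariance normal model, the prior only moves the cutoff on w'x
      return w @ (m0 + m1) / 2 - np.log(prior1 / (1 - prior1))

  print(w)
  print(cutoff(0.5), cutoff(len(g1) / (len(g0) + len(g1))))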

I don't have an explicit reference.  The textbooks on my shelf
don't say much.  I can suggest looking for texts that mention
cost-benefit, and Decisions.  You might want to read journal
references on those topics, which are from the 1970s in my
texts.  I have not seen the new ones, but I suspect there are
citations for 1995+, to go along with  multiple-category logistic
regression being available in SPSS and SAS.  -- Try google.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Brogden-Clemens coefficient

2001-03-21 Thread Rich Ulrich

On Wed, 21 Mar 2001 01:08:00 GMT, Ken Reed <[EMAIL PROTECTED]>
wrote:

> Is anyone familiar with the Brogden-Clemens coefficient for measuring index
> reliability?
> 
> How is it calculated?
> 
> What is the original reference?

Is this your subject?
"When is a test valid enough?"
 http://www.aptitude-testing.com/brogden.htm

A google search on your exact subject otherwise drew a blank.
The page above didn't mention reliability, but it
looks like the place to start.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: One tailed vs. Two tailed test

2001-03-19 Thread Rich Ulrich

On 16 Mar 2001 20:32:40 -0800, [EMAIL PROTECTED] (dennis roberts) wrote:

[ ... ]
> seems to me when you fold over (say) a t distribution ... you don't have a 
> t distribution anymore ... mighten you have a chi square if before you fold 
> it over you square the values?
[ ... snip, rest ]

You are forgetting?   A normal  z^2  is chi-squared (1 d.f.).

And  t^2  with xxx degrees of freedom is distributed as F(1, xxx).
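
A two-line check, if anyone wants to see the identities numerically
(Python; the z, t, and df values are arbitrary):

  from scipy import stats

  z, t, df = 1.7, 2.1, 23
  print(2 * stats.norm.sf(z), stats.chi2.sf(z ** 2, 1))    # same two-tailed p
  print(2 * stats.t.sf(t, df), stats.f.sf(t ** 2, 1, df))  # same two-tailed p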

-- 
Rich U.
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: misuses of statistics

2001-03-19 Thread Rich Ulrich

Elliot,  

the Baldus study  *might*  be a poor enough effort
that we shouldn't bother trying to figure what it said, 
and whether one court or another made good use of it --

On Fri, 16 Mar 2001 16:31:15 -0500, Elliot Cramer
<[EMAIL PROTECTED]> wrote:

> On Fri, 16 Mar 2001, Rich Ulrich wrote:
> 
> > Elliot,
> > 
> > It appears to me that Arnold Barnett is guilty 
> > of a serious misuse of statistical argument.

[ snip, various, mine and his.  Elliot had posted an article from
Barnett, concerning statistics offered for a court case.]

EC > 
> The point of the article is that the Supreme Court apparently understood
> the odd ratio to be a probability ratio.  The US district court did not
> make this mistake and issued a devastating critique of the Baldus Study
> which used linear regression instead of logistic regression, amongh other
> things.  It was VERY inadequate in dealing with nature of the crime which
> is the most important consideration in the death penalty.
[ ... ]

When I searched on "Baldus study",  Google included this page by the
Federation of American Scientists, with testimony to Congress in 1989.
The FAS is a lobbying organization whose testimony and data collection
have always been highly credible (and I have contributed money to FAS,
for years).

http://www.fas.org/irp/congress/1989_cr/s891018-drugs.htm

Statement of Edward S.G. Dennis, Jr., Assistant Attorney
General, Criminal Division 

[ ... ] 
"There appears to be a misconception that McCleskey involved
a judicial finding of systemic discrimination in the
imposition of the death penalty, and the upholding of
capital punishment despite such a finding. Any such reading
of the Court's opinion is contrary to fact. As I will
discuss in greater detail below, the district court in
McCleskey found that the empirical study on which the
systemic discrimination claim was based was seriously
flawed. The Supreme Court, in reviewing the case, did not
question the accuracy of the district court's findings. "

"In McCleskey, the defendant submitted a statistical study,
the Baldus study, that purported to show that a disparity in
the imposition of the death penalty in Georgia was
attributable to the race of the murder victim and, to a
lesser extent, the race of the defendant. Id. at 286. The
defendant argued that the Baldus study demonstrated that his
rights had been violated under the Eighth and Fourteenth
Amendments. "

[ ... ] 

"Second, as noted above, the Supreme Court simply assumed
that the Baldus study was statistically accurate in order to
reach the defendant's constitutional arguments. The record
is clear, however, that the Baldus study was significantly
flawed. As the Supreme Court noted, the district court in
the McCleskey case had examined the Baldus study `with
care,' following `an extensive evidentiary hearing.' 481
U.S. at 287. In the course of a thoughtful and exhaustive
opinion, the district court found that the Baldus study was
unpersuasive. Among many other things, the district court
found that the data compiled as the basis for the study was
incomplete and contained `substantial flaws' and that the
defendant had not established by a preponderance of the
evidence that the data was `essentially trustworthy.'
McCleskey v. Zant, 580 F. Supp. 338, 360 (N.D. Ga. 1984). 1 

"1 See also the Supreme Court's summary of the flaws in the
Baldus study found by the district court. 481 U.S. at 288
n.6. "
=== end of cite

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: can you use a t-test with non-interval data?

2001-03-18 Thread Rich Ulrich

On 17 Mar 2001 19:54:27 -0800, [EMAIL PROTECTED] (Will Hopkins)
wrote:

> I just thought of a new justification for doing the usual parametric analyses 
> on the numbered levels of a Likert-scale variable.   Numbering the levels 
> is formally the same as ranking them, and a parametric analysis of a 
> rank-transformed variable is a non-parametric analysis.   If non-parametric 
> analyses are OK, then so are parametric analyses of Likert-scale variables.

Good comment.  

One thing that happened, in recent years, was that Conover, 
et al., showed that you can do the t-test on ranked data and 
get a really good approximation of the "exact" p-level, 
even when the Ns are quite small. 
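
A minimal sketch of that rank-transform idea (Python, with made-up
Likert responses): rank everything together, run the ordinary
two-sample t-test on the ranks, and compare it with the usual rank-sum
test.

  import numpy as np
  from scipy import stats

  a = np.array([2, 3, 3, 4, 5, 5, 5, 2, 4])        # hypothetical Likert responses, group A
  b = np.array([1, 2, 2, 3, 3, 4, 2, 1, 3, 2])     # group B

  ranks = stats.rankdata(np.concatenate([a, b]))   # ties get midranks
  ra, rb = ranks[:len(a)], ranks[len(a):]

  print(stats.ttest_ind(ra, rb))                   # t-test on the ranks
  print(stats.mannwhitneyu(a, b, alternative='two-sided'))   # the usual rank-sum test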

Further:  Ranked data have theoretical problems with *ties* --
which is the chronic condition of Likert-scale items.  In fact, using the
t-test on ranks sometimes gives a better p-value than what your
textbook recommends for "adjusting for ties."  

Further again:  In the cases where there are "odd"  distributions,
in the several categories, you want to check to see what the
rank-transformation assigns to categories as their effective "scores"
and then select between analyses.  For my data, the 1...5
assigned scoring almost always looks better than the intervals
achieved by ranks.

Agresti has a detailed example of arbitrary scoring of categories
in his textbook, "Introduction to categorical data analysis."

> 
> But...  an important condition is that the sampling distribution of your 
> outcome statistic must be normal.  This topic came up on this list a few 
> weeks ago.  In summary, if the majority of your responses are stacked up on 
> one or other extreme value of the Likert scale for one or more groups in 
> the analysis, and if you have less than 10 observations in one or more of 
> those groups, your confidence intervals or p values are untrustworthy.  See 
> http://newstats.org/modelsdetail.html#normal for more.

Good comment, too.  

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: 1 tail 2 tail mumbo jumbo

2001-03-16 Thread Rich Ulrich

On 14 Mar 2001 08:33:29 -0800, [EMAIL PROTECTED] (dennis roberts) wrote:

[ ... ] 
> however, i think that we definitely need some standardization and revamping 
> when it comes to using terms like 1 and 2 tailed tests ...
> the term "tail" ... either 1 tailed or 2 tailed ... should ONLY be used in 
> connection with what the test statistic that you have decided to use ... 
> naturally asks you to do with respect to deciding on critical values ...
[ snip, much]

I agree, that we need to be careful...  Maybe we need some
conventions?  I was less sure, until I read the following:

> 
> when we do a simple ANOVA ... this should be called a 1 tailed test ... no 
> matter what your research predictions are ... when we use chi square on a 
> contingency table ... it should be called a 1 tailed test ... no matter how 
> you think the direction of the relationship should go

Ooh, I don't like it, I don't like any mention of "1"  right here, in
either case.  Sure, "1"   is true, but it is mainly misleading and
irrelevant, right?

That ANOVA is inherently a 2-sided test.  So is the traditional 2x2
contingency table.   That is because,  sides  refer to  hypotheses.

The t-test is inherently 1-sided, like a z:  only the large, 
plus-sign  values have small p's.  But some people *always*  
refer to  2-sided probabilities of  z  and t.  That is, they use a
two-tailed t-test, (two-tailed z)  which is equivalent to using an
ANOVA F (chi-squared with 1 d.f.).

The default-test of any sort, I suggest, is "one-tailed"  and 
we get  p  from its Cumulative Distribution Function; and 
1-tailed does not have to be mentioned.  If we pool the tails, 
that requires special notice, and we should specify that the t
is "two-tailed."  

[ snip, more details;  including  't-test'  suggestions that 
are contrary to what I just wrote.]

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: One tailed vs. Two tailed test

2001-03-16 Thread Rich Ulrich

Sides?  Tails?

There are hypotheses that are one- or two-sided.
There are distributions (like the t)  that are sometimes 
folded over, in order report "two tails" worth of p-level
for the amount of the extreme.

I don't like to write about these, because it is so easy
to be careless and write it wrong -- there is not an official
terminology.

On Thu, 15 Mar 2001 14:29:04 GMT, Jerry Dallal
<[EMAIL PROTECTED]> wrote:

> We don't really disagree.  Any apparent disagreement is probably due
> to the abbreviated kind of discussion that takes place in Usenet.
> See http://www.tufts.edu/~gdallal/onesided.htm
> 
> Alan McLean ([EMAIL PROTECTED]) wrote:
> 
> > My point however is still true - that the person who receives
> > the control treatment is presumably getting an inferior treatment. You
> > certainly don't test a new treatment if you think it is worse than
> > nothing, or worse than current treatments!
> 
> Equipoise demands the investigator be uncertain of the direction.
> The problem with one-tailed tests is that they imply the irrelevance
> of differences in a particular direction.  I've yet to meet the
> researcher who is willing to say they are irrelevant regardless of
> what they might be.
 [ ... ]

"Equipoise"?  I'm not familiar with that as a principle, though I
would guess

When I was taught testing, I was taught that using *one*  tail 
of a distribution is what is statistically intelligible, or natural.
Adding together the opposite extremes of the CDF,  as with a
"two-tailed t-test,"  is an arbitrary act.  It seems to be justified
or explained by pointing to the relation between tests on 
two means, t^2 = F.  Is that explanation enough?

Technically speaking (as I was taught, and as it still 
seems to me), there is nothing wrong with electing to 
take 4.5%  from one tail, and 0.5% from the other tail.
Someone has complained about this:  that is "really"  
what some experimenters do.  They say they plan a 
one-tailed t- test of a one-sided hypothesis.   However, 
they do not  *dismiss*  a big effect in the wrong direction, 
but they want to apply different values to it.  I say, This
does make sense, if you set up the tests like I just said.

That is:  I ask, What is believable?  
Yes, to a 4.4% test (for instance) in the expected direction.  
No, to a test of 2% or 1% or so, in the other  direction;
  - but:  Pay attention, if it is EXTREME enough.

Notice, you can take out a 0.1%  test and leave the main
test as 4.9%, which is  not effectively different from 5%.
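
A quick check of those numbers (Python; the asymmetric split itself is
the only assumption):

  from scipy import stats

  alpha_main, alpha_guard = 0.049, 0.001
  print(stats.norm.ppf(1 - alpha_main))   # about 1.65: cutoff in the expected direction
  print(stats.norm.ppf(alpha_guard))      # about -3.09: cutoff in the 'wrong' direction
  print(stats.norm.ppf(1 - 0.05))         # about 1.64: the plain 5% one-sided cutoff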


-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: misuses of statistics

2001-03-16 Thread Rich Ulrich
 a white-victim case is 0.99/0.01; in other
> words, a death sentence is 99 times as likely as the alternative. But even
> after being cut by a factor of 4.3, the odds ratio in the case of a black
> victim would take the revised value of 99/4.3 = 23, meaning that the
> perpetrator would be 23 times as likely as not to be sentenced to
> death. That is: 
> 
> PB / (1 - PB) = 99 / 4.3 = 23
> 
> Work out the algebra and you find that PB = 0.96. In other words, while a
> death sentence is almost inevitable when the murder victim is white, it is
> also so when the victim is black - a result that few readers of the "four
> times as likely" statistic would infer. While not all Georgia killings are
> so aggravated that PW = 0.99, the quoted study found that the heavy
> majority of capital verdicts came up in circumstances when PW, and thus
> PB, is very high. 
> 
> None of this is to deny that there is some evidence of race-of-victim
> disparity in sentencing. The point is that the improper interchange of two
> apparently similar words greatly exaggerated the general understanding of
> the degree of disparity. 

 - Now, the author is asserting that 1% versus 4%  is far different
from 99% versus 96%.  Statisticians should be leery of that.  

Yes, there are occasions when they differ: 1 versus 4 is an important
difference if you multiply the fractions  times costs or benefits.  
But I don't sense the relevance, when moving a fraction between 
categories of 'life in prison'  and 'death'.  

Steve Simon posted  a few weeks ago to one stats-group.  He rather
likes the likelihood approach, and he was citing someone else who
does;  whereas, I have posted several times about how foolish it seems
to me, both logically and mathematically, to model  'Likely' instead
of using Log-Odds.

>   Blame for the confusion should presumably be
> shared by the judges and the journalists who made the mistake and the
> researchers who did too little to prevent it. 

 - the judges and journalists missed the word; they missed the math
that would have made the word important; so they ended up with the
right conclusion.

> 
> (Despite its uncritical acceptance of an overstated racial disparity, the
> Supreme Court's McClesky v. Kemp decision upheld Georgia's death
> penalty. The court concluded that a defendant must show race prejudice in
> his or her own case to have the death sentence countermanded as
> discriminatory.) 

====
From what I have noticed, omitting the odds ratio is more likely to be
abusive than *using*  it.  For instance, if 98% of whites complete
certain training and 92% of blacks do, that is another (roughly) 4:1
odds ratio.  There is not much difference in terms of success rate (or
money invested for training); but that is a big difference in failure
rate, which did seem to matter. 

 - I have seen that oversight in a newspaper report.
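
For anyone who wants to check the arithmetic in both examples, a few
lines of Python (the 0.99, 4.3, 98% and 92% figures are the ones
already quoted above):

  def odds(p):
      return p / (1 - p)

  def prob(o):
      return o / (1 + o)

  pb = prob(odds(0.99) / 4.3)                # white-victim probability 0.99, odds ratio 4.3
  print(round(pb, 3))                        # about 0.958: '99% versus 96%'

  print(round(odds(0.98) / odds(0.92), 2))   # 98% vs 92% completion: odds ratio about 4.3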

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: On inappropriate hypothesis testing. Was: MIT Sexism & statistical bunk

2001-03-15 Thread Rich Ulrich

 - I hate having to explain jokes -

On 14 Mar 2001 15:34:45 -0800, [EMAIL PROTECTED] (dennis roberts) wrote:

> At 04:10 PM 3/14/01 -0500, Rich Ulrich wrote:
> 
> >Oh, I see.   You do the opposite.  Your own
> >flabby rationalizations might be subtly valid,
> >and, on close examination,
> >*do*  have some relationship to the questions
> 
> 
> could we ALL please lower a notch or two ... the darts and arrows? i can't 
> keep track of who started what and who is tossing the latest flames but ... 
> somehow, i think we can do a little better than this ... 

Dennis,
Please, where is YOUR sense of humor?   

My post was a literary exercise -- I intentionally posted his lines
immediately before mine, so the reader could follow my re-write 
phrase by phrase. 
I'm still hoping "Irving" will lighten up.

You chopped out the original that I was paraphrasing, and you did
*not*  indicate those important [snip]s -- You would mislead the
casual reader to think someone other than JimS is originating lines
like that, or intend them as critique in this group.
 - I'm not always kind, but I think I am never that wild.  
 - It's probably been a dozen years since I purely flamed like that.

(Or maybe I never flamed, if you talk about the really empty ones.  
In the olden days of local Bulletin Boards, with political topics, I
discarded 1/3 of my compositions without ever posting, because of 
poor content or tone.  I still use some judgment in what I post.)


Compare his original line about  'little or no ... relationship'  with
my clever reversal,   "... on close examination, *do*  have some
relationship to the questions."

Well, I was trying for humor, anyway.  Sorry, if I missed.
-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: On inappropriate hypothesis testing. Was: MIT Sexism & statistical bunk

2001-03-15 Thread Rich Ulrich

On 14 Mar 2001 21:55:48 GMT, [EMAIL PROTECTED] (Radford Neal)
wrote:

> In article <[EMAIL PROTECTED]>,
> Rich Ulrich  <[EMAIL PROTECTED]> wrote:
> 
> >(This guy is already posting irrelevant rants as if 
> >I've driven him up the wall or something.  So this 
> >is just another poke in the eye with a blunt stick, to see
> >what he will swing at next)
> 
> I think we may take this as an admission by Mr. Ulrich that he is
> incapable of advancing any sensible argument in favour of his
> position.  Certainly he's never made any sensible response to my
> criticism.  

 - In a new thread, I have now provided a response that is sensible, 
or, at least, somewhat numeric.

I notice that Jim C.  has taken up the cudgel, in trying to explain
the basics of t-tests to Jim S, and that  "furthers my position."

I figure that after I state my position in one post, explicate it in
another, and try that again while refining the language -- then
I may as well call it quits with JS, when he still doesn't get the
points from the first (or from the couple of other people who
were posting them before I was).

I may not be saying it all that well, but I wasn't inventing the
position.

You and I are in agreement, now, on one minor conclusion:  
"The t-test isn't good evidence about a difference in averages."
But for me, that's true because the numbers are crappy 
indicators of performance -- which was clued *first*  by the 
distribution.

Whereas, you seem to have much more respect for crude
averages, compared to the several of us who object.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



MIT numbers. Was: Re: On inappropriate hy.

2001-03-15 Thread Rich Ulrich

Neal, 
I did intend to respond to this post -- you seem  serious about this, 
more so than "Irving."

On 13 Mar 2001 22:36:03 GMT, [EMAIL PROTECTED] (Radford Neal)
wrote:

[ snip,  previous posts on what might be tested ]
> 
> None of you said it explicitly, because none of you made any coherent
> exposition of what should be done.  I had to infer a procedure which
> would make sense of the argument that a significance test should have
> been done.
> 
> NOW, however, you proceed to explicitly say exactly what you claim not
> to be saying:
> 
RU> >
> >I know that I was explicit in saying otherwise.  I said something
> >like,  If your data aren't good enough so you can quantify this mean
> >difference with a t-test, you probably should not be offering means as
> >evidence. 

 - This is a point that you are still missing.  
I am considering the data...  then rejecting the *data*  as lousy.

I'm *not*  drawing substantial conclusions (about the original
hypotheses) from the computed t, or going ahead with further tests.
 
NR>
> In other words, if you can't reject the null hypothesis that the
> performance of male and female faculty does not differ in some
> population from which the actual faculty were supposedly drawn, then
> you should ignore the difference in performance seen with the actual
> faculty, even though this difference would - by standard statistical
> methodology explained in any elementary statistics book - result in a
> higher standard error for the estimate of the gender effect, possibly
> undermining the claim of discrimination.

- Hey, I'm willing to use the honest standard error.  When I have
decent numbers to compare.  But when the numbers are not *worthy*  
of computing a mean, then I resist comparing means.
 
RU> >
> > And,  Many of us statisticians find tests to be useful,
> >even when they are not wholly valid.  
>
NR> 
> It is NOT standard statistical methodology to test the significance of
> correlations between predictors in a regression setting, and to then
> pretend that these correlations are zero if you can't reject the null.

 - again, I don't know where you get this.  

Besides, on these data, "We reject the null..."  once JS finally did 
a t-test.  But it was barely 5%.  

And now I complain that there is a huge gap.  It is hard to pretend
that these numbers were generated as small, independent effects that
are added up to give a distribution that is approximately normal.  

[ snip, some ] 
RN>
> So the bigger the performance differences, the less attention should
> be paid to them?  Strange...
> 
Yep, strange but true.

They would be more convincing if the gap were not there.

The t-tests (Students/ Satterthwaite) give p-values of .044 and .048
for the comparison of raw average values, 7032 versus 1529.
If we subtract off 5000 from each of the 3 large counts (over 10,000),
the t-tests have p-values of .037 and .036,  comparing 4532 versus
1529.  

Subtract 7000 for the three, and the p-values are hardly different, at .043,
.040; comparing counts of 3532 versus 1529.  

In my opinion, this final difference rates (perhaps) higher on the
scale of "huge differences"  than the first one:  the t-tests are
about equal, but the actual numbers (in the second set) don't confirm 
any suspicions about a bad distribution.   The first set is bad enough
that "averages"  are not very meaningful.



-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: On inappropriate hypothesis testing. Was: MIT Sexism & statistical bunk

2001-03-14 Thread Rich Ulrich

On Tue, 13 Mar 2001 14:04:19 -0800, Irving Scheffe (JS)
<[EMAIL PROTECTED]> wrote:

> Actually, in practice, the decisions are seldom made
> on the basis of rational evaluation of data. They
> are usually made on the basis of political pressure,
> with thin, and obviously invalid, pseudo-rationalizations
> on the basis of data that, on close examination, have
> little or no necessary relationship to the questions
> being asked.

Oh, I see.   You do the opposite.  Your own
flabby rationalizations might be subtly valid, 
and, on close examination, 
*do*  have some relationship to the questions

[ snip, one sentence of post, plus irrelevant citation. ]

(This guy is already posting irrelevant rants as if 
I've driven him up the wall or something.  So this 
is just another poke in the eye with a blunt stick, to see
what he will swing at next)
-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: On inappropriate hypothesis testing. Was: MIT Sexism & statistical bunk

2001-03-13 Thread Rich Ulrich

On 12 Mar 2001 14:25:41 GMT, [EMAIL PROTECTED] (Radford Neal)
wrote:

[ snip, baseball game; etc. ] 
> In this context, all that matters is that there is a difference.  As
> explained in many previous posts by myself and others, it is NOT
> appropriate in this context to do a significance test, and ignore the
> difference if you can't reject the null hypothesis of no difference in
> the populations from which these people were drawn (whatever one might
> think those populations are).

So far as I remember, you are the only person who imagined that
procedure,  "do a test and ignore ... if you can't reject"  Oh,
maybe Jim, too.

I know that I was explicit in saying otherwise.  I said something
like,  If your data aren't good enough so you can quantify this mean
difference with a t-test, you probably should not be offering means as
evidence.  And,  Many of us statisticians find tests to be useful,
even when they are not wholly valid.  As evidence, I pointed to the
(over-) acceptance of observational studies in epidemiology.  I think
I made those arguments at least two or three times, each.


As it turns out, the big gap in the "scores" makes those averages
dubious, even though a t-test *is*  nominally significant.  
(That's so, when computed on X  or on log(X),  but not so, on  1/X.)

And then, as I later discovered, the arguments and the 
style of the original report make Jim's criticism tenuous.  
Even if you were to illustrate how all the males have 
out-achieved all the females, by one criterion or by several 
criteria, you would not discredit the decision of the dean --  
Wasn't  the report talking more about 
'what all our faculty deserve'  instead of what's earned by
individuals?  You guys have skipped that half.


-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: On inappropriate hypothesis testing. Was: MIT Sexism & statistical bunk

2001-03-08 Thread Rich Ulrich

On Thu, 08 Mar 2001 10:38:59 -0800, Irving Scheffe
<[EMAIL PROTECTED]> wrote:

> On Fri, 02 Mar 2001 16:28:53 -0500, Rich Ulrich <[EMAIL PROTECTED]>
> wrote:
> 
> >On Tue, 27 Feb 2001 07:49:23 GMT, [EMAIL PROTECTED] (Irving
> >Scheffe) wrote:
> >
> >My comments are written as responses to the technical 
> >comments to Jim Steiger's last post.  This is shorter than his post,
> >since I omit redundancy and mostly ignore his 'venting.'
> >I think I offer a little different perspective on my previous posts. 
> >
> >[ snip, intro. ]
> 
> Mr. Ulrich's latest post is a thinly veiled ad hominem, and
> I'd urge him to rethink this strategy, as it does not
> present him in a favorable light. 

 - I have a different notion of ad-hominem, since I think it is
something directed towards 'the person'  rather than at the
presentation.  Or else, I don't follow what he means by 'thinly
veiled.'

When a belligerent and nasty and arrogant tone seems to be
an essential part of an argument, I don't consider myself to be
reacting 'ad-hominem' when I complain about it -- it's not that I
hate to be ad-hominem, but I don't like to be misconstrued.

I'm willing, at times, to plunk for the 'ad-hominem'.   
For instance, since my last post on the subject, I looked at those
reports. Also, I searched with google for the IWF -- who printed the
anti-MIT critiques.  I see the organization characterized as an
'anti-feminist' organization, with some large funding from Richard
Scaife.  'Anti-feminist'  could mean a reasoned-opposition, or a
reflex opposition.  Given these papers, it appears to me to qualify as
'reflex' or kneejerk opposition.  Oh, ho! I say,  this explains where
the arguments came from, and why Jim keeps on going --  
Now, THIS PARAGRAPH   is what I consider an ad-hominem argument.  
And I'll give you some more.

Scaife is a paranoid moneybags and publisher who infests this
Pittsburgh region (which is why I have noticed him more than a
westerner like Coors).  His cash was important in persecuting Clinton
for his terms in office.   For example, Scaife kept the Vince
Foster suicide story alive for years.  He held out money for anyone willing to
chase down Clinton-scandals.  Oh, he funded the chair at Pepperdine
that Starr had intended to take.

Now:  My comment on the original reports:  I am happy to say that it
looks to me as if MIT is setting a good model for other universities
to follow.  The senior administrator listens to his faculty,
especially his senior faculty, and responds.  

MIT makes no point about numbers in their statements, and it 
does seem to be wise and proper that they don't do so.  

I see now, Jim is not really arguing with MIT.  They won't argue back.

Jim's purpose  is to create a hostile presence, a shadow to threaten 
other administrators.  He goes, like, "If you try to 'cut a break'
for women, we'll be watching and threatening and undermining,
threatening your job if we can."  

I suppose state universities are more vulnerable than the private
universities like MIT.  On the other hand, with the numbers that Jim
has put into the public eye, the next administrator can point to the
precedent of MIT and assert that, clearly, the simple numbers on
'quality' are substantially irrelevant to the issues, since they were
irrelevant at MIT.

Hope this helps.

-- 
Rich Ulrich, [EMAIL PROTECTED]

http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Trend analysis question: follow-up

2001-03-06 Thread Rich Ulrich

On 5 Mar 2001 16:41:22 -0800, [EMAIL PROTECTED] (Donald Burrill)
wrote:

> On Mon, 5 Mar 2001, Philip Cozzolino wrote in part:
> 
> > Yeah, I don't know why I didn't think to compute my eta-squared on the 
> > significant trends. As I said, trend analysis is new to me (psych grad
> > student) and I just got startled by the results.
> > 
> > The "significant" 4th and 5th order trends only account for 1% of the
> > variance each, so I guess that should tell me something. The linear 
> > trend accounts for 44% and the quadratic accounts for 35% more, so 79% 
> > of the original 82% omnibus F (this is all practice data).
> > 
> > I guess, if I am now interpreting this correctly, the quadratic trend 
> > is the best solution.
DB >
>   Well, now, THAT depends in part on what the 
> spectrum of candidate solutions is, doesn't it?  For all that what you 
> have is "practice data", I cannot resist asking:  Are the linear & 
> quadratic components both positive, and is the overall relationship 
> monotonically increasing?  Then, would the context have an interesting 
> interpretation if the relationship were exponential?  Does plotting 
 [ snip, rest ]

"Interesting interpretation" is important.  In this example, the
interest (probably) lies mainly with the variance-explained: 
in the linear and quadratic.

It's hard for me to be highly interested in an order-5 polynomial,
and sometimes a quadratic seems unnecessarily awkward.

What you want is the convenient, natural explanation.  
If "baseline" is far different from what follows, that will induce 
a bunch of high order terms if you insist on modeling all the 
periods in one repeated measures ANOVA.  A sensible
interpretation in that case might be, to describe the "shock effect"
and separately describe what happened later.

Example.
The start of Psychotropic medications has a huge, immediate,
"normalizing"  effect on some aspects of sleep of depressed patients
(sleep latency, REM latency, REM time, etc.).  Various changes 
*after*  the initial jolt can be described as no-change;  continued
improvement;  or  return toward the initial baseline.  

In real life, linear trends worked fine for describing the on-meds
followup observation nights (with - not accidentally - increasing
intervals between them).
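
For the trend bookkeeping itself, here is a bare sketch (Python; the
cell means, the per-cell n, and the five occasions are all invented):
orthogonal polynomial contrasts come out of a QR decomposition, and the
between-occasions sum of squares splits cleanly across them.

  import numpy as np

  t = np.arange(5.0)                                # five equally spaced occasions
  V = np.vander(t - t.mean(), 5, increasing=True)   # columns 1, t, t^2, t^3, t^4
  Q, _ = np.linalg.qr(V)                            # orthonormal polynomial contrasts

  means = np.array([10.0, 14.0, 16.0, 16.5, 16.8])  # hypothetical cell means
  n = 12                                            # hypothetical per-cell n

  ss = n * (Q.T @ means) ** 2                       # SS for constant, linear, quad, ...
  trend_ss = ss[1:]                                 # drop the constant term
  print(dict(zip(['linear', 'quad', 'cubic', 'quartic'],
                 (trend_ss / trend_ss.sum()).round(3))))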
-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Census Bureau nixes sampling on 2000 count

2001-03-04 Thread Rich Ulrich

On Fri, 02 Mar 2001 12:16:42 GMT, [EMAIL PROTECTED] (J. Williams)
wrote:

> The Census Bureau urged Commerce Secretary Don Evans on Thursday not
> to use adjusted results from the 2000 population count.  Evans must
> now weigh the recommendation from the Census Bureau, and will make the
> decision next week.  If the data were adjusted statistically it  could
> be used to redistribute and remap political district lines. William
> Barron, the Bureau Director, said in a letter to Evans that he agreed
> with a Census Bureau committee recommendation "that unadjusted census
> data be released as the Census Bureau's official redistricting data."
> Some say about 3 million or so people make up a disenfranchising
> undercount.  Others disagree viewing sampling as a method to "invent"
> people who have not actually been counted.  Politically, the stakes
> are high on Evans' final decision.

People may wonder, 
"Why did the Census Bureau say this, and why is there little criticism
of them?"

According to the reports of a few weeks ago, the inner-city counts,
etc.,  of this census were quite a bit more accurate than they were 10
years ago.  That means that we couldn't be so sure that adjustment
would make a big improvement, or any improvement.

This frees Republicans of some blame, for this one instance, of
pushing specious technical arguments for short-term GOP gain.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Post-hoc comparisons

2001-03-02 Thread Rich Ulrich

On 2 Mar 2001 07:27:16 -0800, [EMAIL PROTECTED] (Esa M. Rantanen)
wrote:

[ snip, detail ]
> contingency table.  I have used a Chi-Sq. analysis to determine if there is
> a statisitcally significant difference between  the (treatment) groups (all
> 4!), and indeed there is.  I assume, however, that I cannot simply do
> pairwise comparisons between the groups using Chi-Sq. and 2 x 2 matrices
> without inflating the probability of Type 1 error, (1-alpha)^4 in this
> case.  As far as I know, there are no equivalents to Duncan's or Tukey's
> tests for the type of data (binary) I have to deal with.

Well, if you want to do the ANOVA on the dichotomous variable, 
I won't complain.  My reaction is, you are assuming that, somewhere,
great precision matters.  But being precise in your thinking will gain
you most, so that you do and report just ONE important test, that you 
figured out beforehand,  instead of trying to cope with 6 tests that
happen to fall into your lap.

I would probably 
  (a) Let the Overall test justify all my followup testing, where the
followup testing is descriptive, among categories of equal N and
equivalent importance; or  
  (b) Do a few specified tests with Bonferroni correction, and report
those tests (a rough sketch of that route follows).
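
If option (b) is the route taken, here is a minimal sketch of the
mechanics (Python; the 4 x 2 table below is entirely made up):

  import numpy as np
  from itertools import combinations
  from scipy.stats import chi2_contingency

  table = np.array([[18, 12],        # hypothetical success/failure counts
                    [22,  8],        # for four treatment groups
                    [10, 20],
                    [15, 15]])

  chi2, p, dof, _ = chi2_contingency(table)
  print('overall:', round(chi2, 2), round(p, 4))

  pairs = list(combinations(range(4), 2))
  alpha = 0.05 / len(pairs)                    # Bonferroni level for the 6 pairwise tests
  for i, j in pairs:
      _, pp, _, _ = chi2_contingency(table[[i, j]])
      print(i, j, round(pp, 4), pp < alpha)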

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: On inappropriate hypothesis testing. Was: MIT Sexism & statistical bunk

2001-03-02 Thread Rich Ulrich
 that 
judgment is impertinent.  If these data were carefully designed, 
I should expect more qualification and justification to them; 
aren't they a crude number?  - Perhaps I miss something by not 
reading the papers, but, if so, you should have pointed Gene and
Dennis politely to the details, instead of blundering around and 
making it appear that "this one is huge"  is your whole basis.
My commentary is devoted to your presentation, here.

[ snip, "importance of issue" and more redundancy.]

Hope that helps.
-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Regression with repeated measures

2001-03-01 Thread Rich Ulrich

On 28 Feb 2001 09:24:55 -0800, [EMAIL PROTECTED] (Mike Granaas) wrote:

> 
> I have a student coming in later to talk about a regression problem.
> Based on what he's told me so far he is going to be using predicting
> inter-response intervals to predict inter-stimulus intervals (or vice
> versa).
 - Is it just me, or is that sentence hard to parse?
 " ... he is going to be using
   predicting inter-response intervals
   to predict inter-stimulus intervals (or vice versa)."

Since I am accustomed to S -> R,
I assume the 'vice-versa' must be the case; it leaves me with
   "Intervals between stimuli that predict, predicting  intervals
between responses."

Can I drop the word 'predicting' that seems (to me) accidental?

Well, it seems to me that an 'interval'  can be a stimulus or 
a measure of response, but when the problem keeps that 
terminology, it (further) suggests to me that data are 
collected as a time-series.
 - If so, Time-series has to be incorporated, from the start.

> 
> What bothers me is that he will be collecting data from multiple trials
> for each subject and then treating the trials as independent replicates.
> That is, assuming 10 trials/S and 10 S he will act as if he has 100
> independent data points for calculating a bivariate regression.
>  
> Obviously these are not independent data points.
>  
> Is the non-independence likely to be severe enough to warrant concern?
>  
> If yes, is there some method that will allow him to get the prediction
> equation he wants?

- Can he do a prediction equation on one person?
If there's a parameter for a person, then 
he has 10 people, each of whom yields a parameter value.
A test, of sorts, might be possible on the scores for one person.

But the generalization is tested using the 10 scores, 
comparing those parameter values to some null.

The power of his analysis will be much better if he can define
his hypotheses from the start, instead of trying to let a
pattern 'emerge from the data' across the 10 consecutive trials.
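
Here is a minimal sketch, in Python, of the per-person idea above:
fit a slope for each of the 10 subjects, then do ONE test on the 10
slopes.  The interval data and noise levels are simulated, and
numpy/scipy are assumed; it is only meant to show the two-stage
structure, not to stand in for his real analysis.

# Two-stage sketch: a slope per subject, then one test across subjects.
# All data below are simulated for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_trials = 10, 10
slopes = []

for s in range(n_subjects):
    # invented inter-stimulus intervals and noisy inter-response intervals
    isi = rng.uniform(0.5, 2.0, n_trials)
    iri = 0.3 * isi + rng.normal(0, 0.2, n_trials) + rng.normal(0, 0.1)
    slope, intercept, r, p, se = stats.linregress(isi, iri)
    slopes.append(slope)

# one test on 10 subject-level parameters, not on 100 pooled trials
t, p = stats.ttest_1samp(slopes, popmean=0.0)
print(f"mean slope = {np.mean(slopes):.3f}, t(9) = {t:.2f}, p = {p:.4f}")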


-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Cronbach's alpha and sample size

2001-02-28 Thread Rich Ulrich

On Wed, 28 Feb 2001 12:08:55 +0100, Nicolas Sander
<[EMAIL PROTECTED]> wrote:

> How is Cronbach's alpha affected by the sample size apart from questions
> related to generalizability issues?

 - apart from generalizability, "not at all."
> 
> I find it hard to trace down the mathematics related to this question
> clearly, and whether there might be a trade-off between N of items and N
> of subjects (i.e. compensating for lack of subjects by a high number of
> items).

I don't know what you mean by 'trade-off.'   I have trouble trying to
imagine just what it is, that you are trying to trace down.
But, NO.  

Once you assume some variances are equal, Alpha can be seen
as a fairly simple function of the number of items and the average
correlation -- more items, higher alpha.  The average correlation has
a tiny bias that depends on N, but that is typically, and safely, ignored.
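
To be concrete: with equal variances, the 'standardized' alpha is
   alpha = k * rbar / (1 + (k - 1) * rbar),
with k the number of items and rbar the average inter-item
correlation; N of subjects does not appear.  A small Python
illustration (the numbers are arbitrary):

# Standardized alpha under the equal-variance assumption.
# N only affects how well rbar is estimated, not the formula itself.
def std_alpha(k, rbar):
    return k * rbar / (1 + (k - 1) * rbar)

for k in (5, 10, 20):
    print(k, round(std_alpha(k, 0.3), 3))
# more items, higher alpha: about 0.682, 0.811, 0.896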

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Satterthwaite-newbie question

2001-02-28 Thread Rich Ulrich

On Wed, 28 Feb 2001 08:26:30 -0500, Christopher Tong
<[EMAIL PROTECTED]> wrote:

> On Tue, 27 Feb 2001, Allyson Rosen wrote:
> 
> > I need to compare two means with unequal n's. Hayes (1994) suggests using a
> > formula by Satterthwaite, 1946.  I'm about to write up the paper and I can't
> > find the full reference ANYWHERE in the book or in any databases or in my
> > books.  Is this an obscure test and should I be using another?
> 
> Perhaps it refers to:
> 
> F. E. Satterthwaite, 1946:  An approximate distribution of estimates of
> variance components.  Biometrics Bulletin, 2, 110-114.
> 
> According to Casella & Berger (1990, pp. 287-9), "this approximation
> is quite good, and is still widely used today."  However, it still may
> not be valid for your specific analysis:  I suggest reading the
> discussion in Casella & Berger ("Statistical Inference", Duxbury Press,
> 1990).  There are more commonly used methods for comparing means with
> unequal n available, and you should make sure that they can't be used
> in your problem before resorting to Satterthwaite.

I don't have access to Casella & Berger, but I am curious about what
they recommend or suggest.  Compare means with Student's t-test or
logistic regression; or the Satterthwaite t, if you can't avoid it
because both the means and the variances are different enough, and you
wouldn't rather do some transformation (for example, to ranks: then
test the ranks).  And there's randomization and the bootstrap.
Anything else?

Yesterday (so it should still be on your server), there was a post
with comments about the t-tests.
 from the header
From: [EMAIL PROTECTED] (Jay Warner)
Newsgroups: sci.stat.edu
Subject: Re: two sample t


There are *additional* methods for comparing, but the one that is
*more common* is probably the Student's t, which  ignores the
inequality.

Any intro-stat-book with the t-test is likely to have one or another
version of the Satterthwaite t.  The SPSS website includes algorithms
for what that stat-package uses, under t-test, for "unequal
variances."  I find it almost impossible to find the algorithms by
navigating the site, so here is an address --
http://www.spss.com/tech/stat/Algorithms.htm
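
The Satterthwaite-style computation is short enough to show.  Here is
a sketch in Python with invented data; scipy is assumed, and its
packaged version (ttest_ind with equal_var=False) should agree.

# Separate-variance t with Satterthwaite's approximate df.  Data invented.
import numpy as np
from scipy import stats

x = np.array([4.1, 5.0, 6.2, 5.5, 4.8, 5.9])
y = np.array([6.5, 7.1, 8.0, 7.4, 6.9, 7.8, 8.3, 7.0])

vx, vy = x.var(ddof=1), y.var(ddof=1)
nx, ny = len(x), len(y)
se2 = vx / nx + vy / ny
t = (x.mean() - y.mean()) / np.sqrt(se2)
# Satterthwaite's approximation for the degrees of freedom
df = se2**2 / ((vx / nx)**2 / (nx - 1) + (vy / ny)**2 / (ny - 1))
p = 2 * stats.t.sf(abs(t), df)
print(f"t = {t:.3f}, df ~ {df:.1f}, p = {p:.4f}")

# the same test, packaged:
print(stats.ttest_ind(x, y, equal_var=False))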

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: two sample t

2001-02-26 Thread Rich Ulrich

On 26 Feb 2001 12:26:19 -0800, [EMAIL PROTECTED] (dennis roberts) wrote:

> when we do a 2 sample t test ... where we are estimating the population 
> variances ... in the context of comparing means ... the test statistic ...
> 
> diff in means / standard error of differences ... is not exactly like a t 
> distribution with n1-1 + n2-1 degrees of freedom (without using the term 
> non central t)
> 
> would it be fair to tell students, as a thumb rule  ... that in the case where:
> 
>   ns are quite different ... AND, smaller variance associated with larger 
> n, and reverse ... is the situation where the test statistic above is when 
> we are LEAST  comfortable saying that it follows (close to) a t 
> distribution with n1-1 + n2-1 degrees of freedom?
> 
> that is ... i want to set up the "red flag" condition for them ...
> 
> what are guidelines (if any) any of you have used in this situation?

Neither extreme is better than the other.  Student's t-test and that
Satterthwaite test have their problems in the opposite directions.

With unequal Ns and unequal variances, and a one-tailed test,
 - one t-test will be too small (rejecting, approximately, never) and
 - the other will be too big (rejecting about twice as often);
 - making the TWO-tailed versions come out 'robust'!  for size.

Neither direction is better until you decide what bias you want.
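
If you would rather see it than take my word, here is a rough
simulation sketch (Python, scipy assumed, all settings invented) for
one 'red flag' configuration: the larger group has the smaller
variance, the null is true, and the two one-tailed tests are tallied
against a nominal 5% level.

# Size of one-tailed pooled vs. separate-variance t under a true null,
# with unequal Ns and the smaller variance in the larger group.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n1, n2 = 40, 10            # larger n with sd 1, smaller n with sd 3
sd1, sd2 = 1.0, 3.0
reps, alpha = 10_000, 0.05
reject_pooled = reject_welch = 0

for _ in range(reps):
    x = rng.normal(0, sd1, n1)
    y = rng.normal(0, sd2, n2)
    t_p, p_p = stats.ttest_ind(x, y, equal_var=True)
    t_w, p_w = stats.ttest_ind(x, y, equal_var=False)
    # one-tailed, in the direction of x > y
    reject_pooled += (t_p > 0) and (p_p / 2 < alpha)
    reject_welch  += (t_w > 0) and (p_w / 2 < alpha)

print("pooled t, one-tailed size:", reject_pooled / reps)
print("separate-variance t, one-tailed size:", reject_welch / reps)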

-- 
Rich Ulrich, [EMAIL PROTECTED]



http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: On inappropriate hypothesis testing. Was: MIT Sexism & statistical bunk

2001-02-26 Thread Rich Ulrich

 - I want to comment a little more thoroughly about the lines I cited:
what Garson said about inference, and his citation of Oakes.


On Thu, 22 Feb 2001 18:21:41 -0500, Rich Ulrich <[EMAIL PROTECTED]>
wrote:

[ snip, previous discussion ]

me >
> I think that Garson is wrong, and the last 40 years of epidemiological
> research have proven the worth of statistics provided on non-random,
> "observational"  samples.  When handled with care.
> 
> From G. David Garson, "PA 765 Notes: An Online Textbook."
> 
> On Sampling
> http://www2.chass.ncsu.edu/garson/pa765/sampling.htm
> 
> Significance testing is only appropriate for random samples.
> 
> Random sampling is assumed for inferential statistics
> (significance testing). "Inferential" refers to the fact
> that conclusions are drawn about relationships in the data
> based on inference from knowledge of the sampling
> distribution. Significance tests are based on a sampling
> theory which requires that every case have a chance of being
> selected known in advance of sample selection, usually an
> equal chance. Statistical inference assesses the
> significance of estimates made using random samples. For
> enumerations and censuses, such inference is not needed
> since estimates are exact. Sampling error is irrelevant and
> therefore inferential statistics dealing with sampling error
> are irrelevant. 

 - I agree with most of what he says, throughout; the differences
are a matter of nuance in interpretation and action.

For enumerations and censuses -- a limited sort of statistics on 'finite
populations' -- he says sampling error is irrelevant.  Irrelevant is a
good and fitting word here.  This is not 'illegal  and banned,'  but
rather 'unwanted and totally beside the point.'

Garson >
>  Significance tests are sometimes applied
> arbitrarily to non-random samples but there is no existing
> method of assessing the validity of such estimates, though
> analysis of non-response may shed some light. The following
> is typical of a disclaimer footnote in research based on a
> non random sample: 

Here is my perspective on testing, which does not match his.
 - For a randomized experimental design,  a small p-level on 
a "test of hypothesis" establishes that *something*  seemed 
to happen, owing to the treatment; the test might stand 
pretty-much by itself.
 - For a non-random sample, a similar test establishes that
*something*  seems to exist, owing to the factor in question 
*or*  to any of a dozen factors that someone might imagine.  
The test establishes, perhaps, the  _prima facie_  case  but the
investigator has the responsibility of trying to dispute it.  

That is, it is an investigator's responsibility (and not just an
option) to consider potential confounders and covariates.  
If the small p-level stands up robustly, that is good for the 
theory -- but not definitive.  If there are vital aspects or factors
that cannot be tested, then opponents can stay unsatisfied, 
no matter WHAT the available tests may say.


Garson > 
> "Because some authors (ex., Oakes, 1986) note the use of
> inferential statistics is warranted for nonprobability
> samples if the sample seems to represent the population, and
> in deference to the widespread social science practice of
> reporting significance levels for nonprobability samples as
> a convenient if arbitrary assessment criterion, significance
> levels have been reported in the tables included in this
> article." See Michael Oakes (1986). Statistical inference: A
> commentary for social and behavioral sciences. NY: Wiley. 
> 

Garson is telling his readers and would-be statisticians a way to
present p-levels, even when the sampling doesn't justify it --
and, I would say, even when the analysis doesn't justify it.
I am not happy with those lines: the disclaimer does not assume
that a *good* analysis has been done, nor does it point to what
makes up a good analysis.

 '... if the sample seems to represent the population'  
seems to be a weak reminder of the proper effort to overcome 
'confounding factors';  it is not an assurance that the effects 
have proven to be robust.  

So, the disclaimer should recognize that the non random sample 
is potentially open to various interpretations; the present analysis
has attempted to control for several possibilities;  certain effects
do seem robust statistically, in addition to being supported by 
outside chains of inference, and data collected independently.

I suggested earlier that this is the status of epidemiological,
observational studies.  For the most part, those studies have 
been quite fruitful.  But not always.  They have been e

Re: Sample size question

2001-02-23 Thread Rich Ulrich

On 23 Feb 2001 12:08:45 -0800, [EMAIL PROTECTED] (Scheltema,
Karen) wrote:

> I tried the site but received errors trying to download it.  It couldn't
> find the FTP site.  Has anyone else been able to access it?

As of a few minutes ago, it downloaded fine for me, when I clicked on
it with  Internet Explorer.  The  .zip  file expanded okay.  I used
right-click (I just learned that last week) in order to download the
 .pfd  version of the help.

[ ... ]

< Earlier Q and Answer >
"Can anyone point me to software for estimating ANCOVA or regression
sample sizes based on effect size?"
> > Look here:
> > http://www.interchg.ubc.ca/steiger/r2.htm


Hmm.  Placing limits on R^2.  I have't read the 
accompanying documentation.  

On the general principal that you can't compute power
if you don't know what power you are looking for, I suggest reading
the relevant chapters in Jacob Cohen's book (1988+ edition).
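
As one concrete illustration of what those chapters do, here is a
sketch (Python, scipy assumed, numbers invented) of the power of the
overall R^2 test in multiple regression, Cohen-style: effect size
f^2 = R^2 / (1 - R^2), noncentrality lambda = f^2 * (u + v + 1), and
power from the noncentral F.

# Power of the test of R^2 in multiple regression, after Cohen (1988).
from scipy.stats import f, ncf

def regression_power(r2, n_predictors, n, alpha=0.05):
    u = n_predictors              # numerator df
    v = n - n_predictors - 1      # denominator df
    f2 = r2 / (1 - r2)            # Cohen's effect size
    lam = f2 * (u + v + 1)        # noncentrality parameter
    f_crit = f.ppf(1 - alpha, u, v)
    return ncf.sf(f_crit, u, v, lam)

print(regression_power(r2=0.13, n_predictors=3, n=80))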

-- 
Rich Ulrich, [EMAIL PROTECTED]


http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: On inappropriate hypothesis testing. Was: MIT Sexism & statistical bunk

2001-02-22 Thread Rich Ulrich

On Mon, 19 Feb 2001 04:27:24 GMT, [EMAIL PROTECTED] (Irving
Scheffe) wrote:

> In responding to Rich, I'll intersperse selected comments with
> selected portions of his text and append his entire post below.

 - I'm not done with the topic yet.  But it is difficult to go on from
this point.

I think the difficulty is that JS has constructed his straw-man
argument about how "hypotheses" are handled; and since it 
is a stupid strategy, it is easy for him to claim that it is fatally
flawed.

From his insistence on his "examples,"  it seems to me that he
believes that someone else is committed to using p-levels in a strict
way, by beating 5%.  That's certainly not the case for me, and I
doubt if anyone defends or promotes it, outside of carefully designed 
Controlled Random Experiments.

Despite the fact that I could not make sense of WHY he wanted
his example, it turns out -- after he explains it more -- that my own
analysis covered the relevant bases.  I agree, if you don't have
"statistical power,"  then you don't ask for a 5%  test, or (maybe) 
any test at all.  The JUSTIFICATION for having a test on the MIT
data is that the power is sufficient to say something.  

And what it said is that Jim did BAD INFERENCE.  I said that a 
couple of times.  I regret that I may have confused people with
unnecessary words about "inference."
 Outlier =>  No central tendency =>  Mean is BAD  statistic;
careful reader insists on more or better information before asserting
there's a difference.

I asserted that more than once.

Optimistically, my own data analysis technique might be described as 
"starting out with everything Jim might figure out and conclude from
the data, and adding to that, flexible comparisons from related
fields, and other statistical tools."   -- It was quite annoying for
me to read where he implicitly says, "You, idiot, would HAVE to 
decide otherwise."  I mean, I thought I wrote a lot clearer than that.


Now, below is a quotation that describes Jim's justifications, I 
hope, in more detail than Jim does.  This is from a web site which I
just discovered, but which looks quite admirable -- except for this
question of "Sampling".  

I think that Garson is wrong, and the last 40 years of epidemiological
research have proven the worth of statistics provided on non-random,
"observational"  samples.  When handled with care.

From G. David Garson, "PA 765 Notes: An Online Textbook."

On Sampling
http://www2.chass.ncsu.edu/garson/pa765/sampling.htm

Significance testing is only appropriate for random samples.

Random sampling is assumed for inferential statistics
(significance testing). "Inferential" refers to the fact
that conclusions are drawn about relationships in the data
based on inference from knowledge of the sampling
distribution. Significance tests are based on a sampling
theory which requires that every case have a chance of being
selected known in advance of sample selection, usually an
equal chance. Statistical inference assesses the
significance of estimates made using random samples. For
enumerations and censuses, such inference is not needed
since estimates are exact. Sampling error is irrelevant and
therefore inferential statistics dealing with sampling error
are irrelevant. Significance tests are sometimes applied
arbitrarily to non-random samples but there is no existing
method of assessing the validity of such estimates, though
analysis of non-response may shed some light. The following
is typical of a disclaimer footnote in research based on a
non random sample: 

"Because some authors (ex., Oakes, 1986) note the use of
inferential statistics is warranted for nonprobability
samples if the sample seems to represent the population, and
in deference to the widespread social science practice of
reporting significance levels for nonprobability samples as
a convenient if arbitrary assessment criterion, significance
levels have been reported in the tables included in this
article." See Michael Oakes (1986). Statistical inference: A
commentary for social and behavioral sciences. NY: Wiley. 


Maybe we can pick up the discussion from here?
-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: On inappropriate hypothesis testing. Was: MIT Sexism & statistical bunk

2001-02-18 Thread Rich Ulrich

I am going to try to stick to the statistics-related parts, in
replying to Jim Steiger.
With a fake user-name, JS wrote on Thu, 15 Feb 2001 17:34:15 GMT,
[EMAIL PROTECTED] (Irving Scheffe):

JS > "Rich:
  "To be blunt, although your comments in this forum are often
valuable, you fell far short of two cents worth this time.
  "This is not a popularity contest, it is a statistical argument. "

 - I say, if your 'statistical argument'  about 'populations' is
rejected by a large (and growing) fraction of all statisticians, then I
think you do have to go back to defend your textbook, or show how your
argument differs from what I think it is.  That's what I was getting
at by mentioning textbooks.


< snip, verbiage; Jim cited me, RU >
> > - and if you want to know something about how unlikely it was to 
> >get means that extreme, you can randomize.  Do the test.
JS >
 "a. You  do *have* means 'that extreme.'
 "b. There is no 'likelihood' to be considered, because the entire
population is available. We were assessing the original MIT conjecture
that to imply there were important performance differences between
male and female biologists AT MIT would be 'the last refuge of the
bigot.'" 

Given group A and group B, I can do a t-test.  Or something.
That will give me a quantification that I did not have before.

Is such a test interesting?  - If I am really in a 'population' 
circumstance, that question can hardly arise; I would know that
the test tells me nothing.  It has nothing to do with taking a vote,
or providing services to a fixed population.

Why does Jim call some means 'extreme'?  - in a theoretical
'population',  you have means that *exist*.   Right now, I think that
it is difficult to justify applying any such adjectives, if you regard
the set of numbers as a 'population.'  

I am pointing out:  Jim claimed that the productivity of the Men was
impressively greater than that of the women; and that was an act of
inference on his part.  So, his act is screwed up, twice:  He does a
bad deduction / wrong inference (by ignoring p-level -- in this
instance, apparently ignoring the strong impact of an outlier), and
then he wrongly claims immunity from the standards of inference.
That is, he ought NOT to use means when there are huge outliers that
mess up the t-test;  and he ought to find a way to use a p-level for
support.

I have said this a number of times: if you extract meaning, if you 
make inferences, then you are treating the population as a sample.
That is what we do in science, and what we do on almost any occasion
where we are publishing for people who are not 'administration.'  
And that is why we seldom use  the set of statistics for 'finite
populations'  and why we do use tests of inference.
 
JS >
  "So, my countercomments to you are:
  "1. Rather than snipping the Gork example, deal with it.  Explain,
in detail, why the Gork women shouldn't be paid more than the men.
My prediction: you can't, and you won't."

In detail:  I think that it is a wretched example.  
I still can't figure out what it is supposed to exemplify.  
But I can comment on the problem.

= problem summarized
Productivity, 
Females:  (91, 92, 93)
Males: (89.5, 90, 90.5)
 Why should not Females be paid more, if that's what matters?
==
Based on a t-test, Females might test as having a higher mean.
With a few more cases, that difference would be 'significant' with
either parametric or rank-testing.
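
Since the six numbers are right there, here is the randomization test
itself, sketched in Python.  With 3 versus 3 there are only C(6,3) = 20
ways to assign the labels, so the permutation distribution can be
enumerated exactly; the smallest attainable two-sided p is 2/20 = .10,
which is the small-N point again.

# Exact permutation test on the six Gork scores above.
from itertools import combinations

scores = [91, 92, 93, 89.5, 90, 90.5]
females = {0, 1, 2}                      # the observed grouping
obs_diff = (sum(scores[i] for i in females) / 3
            - sum(scores[i] for i in range(6) if i not in females) / 3)

splits = list(combinations(range(6), 3))
count = 0
for grp in splits:
    diff = (sum(scores[i] for i in grp) / 3
            - sum(scores[i] for i in range(6) if i not in grp) / 3)
    if abs(diff) >= abs(obs_diff) - 1e-12:
        count += 1

print(f"observed difference = {obs_diff:.2f}")
print(f"two-sided permutation p = {count}/{len(splits)}")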

But if the natural meaning of production is being used, 
then there would be a natural zero, and one should OBSERVE:
all of these scores are confined to a tiny per-cent range.  
In fact, the range seems too tiny to be real.  Eventually, I 
conclude that I don't understand the mechanism of generating the
scores, and/or someone has been 'cooking the books' or faking
the numbers.   

If there were a few more subjects added to each Sex, in 
the same narrow range and pattern, I would conclude that there
DEFINITELY was something phony going on.

If pay is to be meritocratic, that would seem to justify a TINY 
difference in wages.  Nothing about quality.  Piece work, I assume.

Sampling of  3 versus 3  is  small N;  it is far worse than  6 vs 6.

If this is supposed to be about 'statistical power':  
In the MIT citation data, the "large difference" between M and F 
*would*  be significant if there weren't something fishy. 


< snip, rest.  #2 and #3 - (2) seems to have been answered, and 
(3) seems to be a contentious  followup to the artificial example that
I scarcely understand in the first place. >

-- 
Rich Ulrich, [EMAIL PROTECTED]

http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Survival Analysis - Derivation of Functions?

2001-02-16 Thread Rich Ulrich

On Fri, 16 Feb 2001 14:59:14 -0500, "Michelle White"
<[EMAIL PROTECTED]> wrote:

> Is there any text or article that someone can recommend that clearly goes
> through the derivation of the survival function, density function, and
> hazard functions?  Especially how one is derived from the other?
> 

Kleinbaum, David G. (1996).  Survival Analysis.  Springer-Verlag
New York.  ISBN 0387945431; US $65.00.

If you want the short version, here's a couple of pages on-line
that I found in less than a minute by searching (google) on 
"survival function"  "hazard function"

http://www.itl.nist.gov/div898/handbook/eda/section3/eda362.htm
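
The relations themselves are short:  S(t) = 1 - F(t),
h(t) = f(t)/S(t), and S(t) = exp(-H(t)), where H(t) is the cumulative
hazard (the integral of h up to t).  Here is a small numerical check
in Python (scipy assumed), using a Weibull purely as an example.

# Check S(t) = exp(-H(t)) numerically for one distribution.
import numpy as np
from scipy.stats import weibull_min
from scipy.integrate import quad

dist = weibull_min(1.5)                  # Weibull, shape 1.5, scale 1

def hazard(t):
    return dist.pdf(t) / dist.sf(t)      # h(t) = f(t) / S(t)

t = 2.0
H, _ = quad(hazard, 0, t)                # cumulative hazard H(t)
print("S(t) directly     :", dist.sf(t))
print("exp(-H(t)) from h :", np.exp(-H))  # agrees up to numerical error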

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: On inappropriate hypothesis testing. Was: MIT Sexism & statistical bunk

2001-02-15 Thread Rich Ulrich

I am just tossing in my two cents worth ...

On Thu, 15 Feb 2001 07:53:13 GMT, Jim Steiger, posting as
[EMAIL PROTECTED] (Irving Scheffe) wrote:

< snip, name comment > 

> 2. I tried to make the Detroit Pistons example as obvious as I could.
> The point is, if you want to know whether one population performed
> better than another, and you have the performance information, [under
> the simplying assumption, stated in the example and obviously not
> literally true in basketball, that you have an acceptable
> unidimensional index of performance], you don't do a statistical test,
> you simply compare the groups. 

 - and if you want to know something about how unlikely it was to 
get means that extreme, you can randomize.  Do the test.

> 
> Your question about the randomization test seems
> to reflect a rather common confusion, probably
> deriving from some overly enthusiastic comments 
> about randomization tests in some
> elementary book. 

 - If you are willing, perhaps we could discuss the textbook
examples.  I don't remember seeing what I would call
"overly enthusiastic comments about randomization."  
When I looked a few years ago, I did see one book with an 
opposite fault, exemplified in a problem about planets.  
I thought the authors were pedantic or silly, when they refused
to admit randomization as a first step of assessing whether there
*might*  be something interesting going on.

>Some people seem to
> emerge with vague notions that two-sample randomization tests make
> statistical testing appropriate in any situation in which you have
> two stacks of numbers. That obviously isn't true.
> Your final question asks if "statistical tests" be appropriate
> even when not sampling from a population. In some sense, sure. But not
> in this case.

I can't say that I have absorbed everything that has been argued.  
But as of now, I think Gene has the better of it.  To me, it is not
very appropriate to be highly impressed at the mean-differences, 
when TESTS that are attempted can't show anything.  The samples 
are small-ish, but the means must be wrecked a bit by outliers.

> 
> Maybe the following example will help make
> it clearer:
 < snip rest, including example that brings in "power" but not
convincingly. >

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Simulating T tests for Likert scales

2001-02-13 Thread Rich Ulrich

On 13 Feb 2001 01:38:35 -0800, [EMAIL PROTECTED] (Will Hopkins)
wrote:

> Rich Ulrich wrote:
> >You can use t-tests
> >effectively on outcomes that are dichotomous variables, and you use
> >the pooled version (Student's t) despite any difference in variances.
> >That is the test that gives you the proper p-levels.
WH > 
> Rich, if the sample sizes in the two groups are different, you have to use 
> the t test jigged for unequal variances.  That's what my simulations showed.
> 
> Your other commments about the robustness of t tests for Likert scales are 
> reassuring, and thanks for responding.  I did find that the confidence 
> interval went awry when responses got too stacked up on the first or last 
> level.

And what were the conditions of your simulations, the ones that
seemed to show a need for testing with 'unequal variances'?  
 - I assume that those were for Likert examples, not dichotomies.

I have been pleased with how well the Student's t performed with
dichotomies, and annoyed at how badly the Unequal-var test performed.
I can show those with EXAMPLES rather than randomizations.  

I just re-did a couple, to make sure that I was not remembering them
wrong.  Because I don't remember seeing these comparisons in public
before, I will show the results below:
 - Here are statistics (from SPSS) for the 2x2 table, and 
for the two t-tests that can be performed.  I consider the primary,
useful test to be the Pearson chisquared (no correction for
continuity).  The Student's t and the Pearson chisquared are 
practically identical in the first table;  and in the second table,
the Unequal var. t is again far off the mark by every comparison.


These tables are lined up for fixed font; but the lines
are short enough that they should usually not-wrap.
==  summary of 2x2 statistics
10% (of 20)  vs  1% (of 100)
   18 |  2
   99 |  1

  Chi-Square                         Value     DF   Significance
  --------------------------------   -----     --   ------------
  Pearson                             5.54      1      .0186
  Continuity Correction               2.46      1      .117
  Likelihood Ratio                    3.85      1      .0496
  Mantel-Haenszel test for            5.49      1      .0191
    linear association
  Fisher's Exact Test:
     One-Tail                                          .07
     Two-Tail                                          .07
  - - - - - - - -
  t-test, pooled var                  2.39    118      .018
  t-test, sep. means .01 vs .1        1.29    19.8     .21
  t-test, sep. means 1.84 vs 1.33     1.53    2.04     .26
======== #1
Means of 0.01  vs  0.1
Levene's Test for Equality of Variances:  F = 24.0   P = .000
======== #2
Means of 1.84  vs  1.33
Levene's Test for Equality of Variances:  F = 1.59   P = .210


1% (of 100)  vs  10% (of 200)
   99 |  1
  180 | 20

  Chi-Square                         Value     DF   Significance
  --------------------------------   -----     --   ------------
  Pearson                             8.29      1      .00398
  Continuity Correction               6.97      1      .0083
  Likelihood Ratio                   10.94      1      .00094
  Mantel-Haenszel test for            8.26      1      .00404
    linear association
  - - - - - - - -
  t-test, pooled var                  2.91    298      .009
  t-test, sep. means .01 vs .1        3.83    270.2    .000
  t-test, sep. means 1.54 vs 1.95     5.53    36.8     .000
======== #1
Means of 0.01  vs  0.1
Levene's Test for Equality of Variances:  F = 40.9   P = .000
======== #2
Mean of 1.64  vs  1.95
Levene's Test for Equality of Variances:  F = 127.3  P = .000
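
If anyone wants to reproduce the first comparison from raw 0/1 data
rather than from SPSS, here is a sketch in Python (scipy assumed);
small rounding differences from the output above are possible.

# 2 "successes" of 20 versus 1 of 100, as 0/1 data.
import numpy as np
from scipy import stats

g1 = np.array([1] * 2 + [0] * 18)      # 10% of 20
g2 = np.array([1] * 1 + [0] * 99)      # 1% of 100

# Pearson chi-squared on the 2x2 table, no continuity correction
table = np.array([[2, 18], [1, 99]])
chi2, p_chi, dof, _ = stats.chi2_contingency(table, correction=False)

t_pool, p_pool = stats.ttest_ind(g1, g2, equal_var=True)    # Student's t
t_sep, p_sep = stats.ttest_ind(g1, g2, equal_var=False)     # unequal-var t

print(f"Pearson chi2 = {chi2:.2f}, p = {p_chi:.4f}")
print(f"pooled t     = {t_pool:.2f}, p = {p_pool:.4f}")
print(f"separate t   = {t_sep:.2f}, p = {p_sep:.4f}")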

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: needed clustering algorithm for similar mean groups

2001-02-12 Thread Rich Ulrich

On 12 Feb 2001 11:27:26 -0800, [EMAIL PROTECTED] (EAKIN MARK E)
wrote:

> I have just given a class their first exam and would like to put a class
> of 60 into groups of size three. I would like the groups to have basically
> the same average score on the first exam. Would anyone know of an
> algorithm for doing this? 

I don't imagine that I would be totally happy with a mechanical
algorithm.  How much do I care about the Standard Deviation?

For starters, I guess I would generate 20 teams of 3 
at random, and then evaluate on a criterion or two.

If I generated 1000 or 10,000 sets like this, 
then I could sort them into order.  And see if I like the results.
Maybe it would be the set with the minimum F-test
for the between-groups ANOVA.
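
Here is a rough sketch, in Python, of that brute-force idea, with
simulated exam scores standing in for the real ones.  The criterion
below is the variance of the 20 team means, which amounts to the same
thing as minimizing the between-groups sum of squares.

# Random search for 20 teams of 3 with nearly equal mean exam scores.
import numpy as np

rng = np.random.default_rng(42)
scores = rng.normal(75, 10, 60)        # stand-in for the real exam scores

best_order, best_spread = None, np.inf
for _ in range(10_000):
    order = rng.permutation(60)
    teams = scores[order].reshape(20, 3)    # 20 teams of 3
    spread = teams.mean(axis=1).var()       # variance of the 20 team means
    if spread < best_spread:
        best_order, best_spread = order, spread

best_means = scores[best_order].reshape(20, 3).mean(axis=1)
print("variance of team means:", best_spread)
print("range of team means   :", best_means.max() - best_means.min())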

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Hypothesis Testing where General Limit Theorem doesn't hold?

2001-02-11 Thread Rich Ulrich

On Sun, 11 Feb 2001 01:53:00 GMT, "Neo Sunrider"
<[EMAIL PROTECTED]> wrote:

> I am just taking an undergraduate introductory stats course but now I
> am faced with a somewhat difficult problem (at least for me).
> 
> If I want to test a hypothesis (t-test, z-score etc.) and the underlying
> distribution will under no circumstances approach normal... (i.e. the results
> of the experiment will always be something like 100*10.5, 40*-5 etc.) The
> Central Limit Theorem doesn't help here, or does it?
> 
> Can anyone explain, or point me in the right direction - how can I test in
> these cases?

It reads to me as if "the results"  will be 2-dimensioned, Frequency
(above: 100 or 40) and point-value (10.5 or -5)  and you are combining
them  "unthinkingly" as a product. Or does your notation indicate a
few outcome scores, for instance: 10.5 or -5,  and the number of times
those were manifested?

You don't want to use rank-transformation, if you are rightfully
concerned with the numerical average of the scores or of those
products.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Simulating T tests for Likert scales

2001-02-11 Thread Rich Ulrich

On 1 Feb 2001 01:03:40 -0800, [EMAIL PROTECTED] (Will Hopkins) wrote:

> I have an important (for me) question, but first a preamble and hopefully 
> some useful info for people using Likert scales.
> 
> A week or so ago I initiated a discussion about how non-normal the 
> residuals have to be before you stop trusting analyses based on 
> normality.  Someone quite rightly pointed out that it depends on the sample 
> size, because the sampling distribution of almost every statistic derived 
> from a variable with almost any distribution is near enough to normal for a 
> large enough sample, thanks to the central limit theorem.  Therefore you 
> get believable confidence limits from t statistics.
> 
> But how non-normal, and how big a sample? I have been doing simulations to 
> find out.  I've limited the simulations to t tests for Likert scales with 
> only a few levels, because these crop up often in research, and 
> Likert-scale variables with responses stacked up at one end are not what 
> you call normally distributed.   Yes, I know you can and maybe should 
> analyze these with logistic regression, but it's hard work for 
 [ ... snip, rest ]

Here is an echo of comments I have posted before.  You can use t-tests
effectively on outcomes that are dichotomous variables, and you use
the pooled version (Student's t) despite any difference in variances.
That is the test that gives you the proper p-levels.  

"Likert scales"  are something that I tend to think of as "well
developed"  so they would offer no question to t-testing.  

But, anyway, items with 3 or 4 or 5 scale points are not prone to
having extreme outliers; and if your actual responses across 5 points
are bi-modal, you might want to rethink your response-meanings.
Generally, I generalize from the dichotomous case, to conclude that the
t-test will be robust for items with a few points.  Years ago, I read
an article or two that explicitly asserted that conclusion, based on
some Monte Carlo simulations.

Just a few weeks ago, I read another justification for scoring
categories as integers -- the Mantel paper that is the basis for what
Agresti presents in his "Introduction to Categorical Data Analysis."
That "M^2"  test (page 35) makes use of fixed variances for
proportions.  M^2 is tested as chi squared, and its computation is
almost identical to t.  

So I don't fret about using t on items with Likert-type responses,
even for small N.
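
For anyone curious about that M^2 statistic, its usual computational
form is M^2 = (n - 1) * r^2, with r the Pearson correlation between
the group indicator and the integer scores, referred to chi-squared
with 1 df.  Here is a sketch in Python with invented Likert responses
(scipy assumed); the pooled t on the same data usually lands close by,
which is the point.

# M^2 = (n-1) r^2 versus the pooled t, on invented 5-point responses.
import numpy as np
from scipy import stats

group = np.array([0] * 12 + [1] * 12)
item  = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5,    # group 0
                  1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5])   # group 1

r, _ = stats.pearsonr(group, item)
m2 = (len(item) - 1) * r**2
p_m2 = stats.chi2.sf(m2, df=1)

t, p_t = stats.ttest_ind(item[group == 0], item[group == 1], equal_var=True)
print(f"M^2 = {m2:.3f}, p = {p_m2:.3f}")
print(f"pooled t = {t:.3f}, p = {p_t:.3f}")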

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=


