Re: Salaries and gender

2002-03-04 Thread Wuensch, Karl L

Include in your list of references on salary differences and gender the
article by M. H. Birnbaum, Relationships among models of salary bias,
American Psychologist, 1985, July, 862-866.
 ~~~  
Karl L. Wuensch, Department of Psychology,
East Carolina University, Greenville NC  27858-4353
Voice:  252-328-4102 Fax:  252-328-6283
mailto:[EMAIL PROTECTED]
http://core.ecu.edu/psyc/wuenschk/klw.htm




Re: Normalizing a non-normal distribution

2001-07-06 Thread Thom Baguley

Brian MacDonald wrote:
 
 I am doing a series of analyses using discriminant analysis to predict group
 membership.  Several of the variables I am using show distributions that are
 not normal.  My question is: can these (and for that matter should they) be
 somehow transformed so that the resulting distribution looks (and presumably
 acts in the analyses) like a normal distribution?

It depends. For some distributions it is easy to do the
transformations (e.g., log is often appropriate for +ve skew). An
alternative approach might be to consider logistic regression which
has several advantages over discriminant analysis and doesn't
require normality.
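
A quick illustration of Thom's point about the log transform, on simulated
positively skewed data (an assumed example; scipy's skew function is used):

import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000)   # strongly right-skewed
print(skew(x))           # clearly positive
print(skew(np.log(x)))   # near 0: the log restores approximate normality here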

Thom





Re: Double mediation

2001-07-06 Thread Duncan Smith


Sylvia J. Hysong, Ph.D. [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]...
 Hello,

 I'm hoping someone can help me with this.  I have looked at a
 multitude of resources including the David Kenny page, this and other
 newsgroups, Pedhazur (1982), Cohen & Cohen (1983), and Darlington
 (1990?), to no avail.  I am hoping someone can direct me to the right
 resource.  I am trying to conduct a test of double mediation.  In
 other words, I am trying to test the hypothesis that x --> z1 --> z2 --> y.
 Is there a way to do this (and if so, what is it?), or must I resort
 to a path analysis or a structural equation model?

 Thanks in advance for any help.

If I understand the question correctly, this implies a number of conditional
independence relationships which can be tested: x cond. ind. of z2 and
y given z1; x and z1 cond. ind. of y given z2; x cond. ind. of y given z1
and z2.  If these, and only these, independence relationships hold then you
have either x --> z1 --> z2 --> y or x <-- z1 <-- z2 <-- y.  To decide which,
you need some background knowledge or to conduct an experiment.  You might
want to check out the links at http://www.cs.berkeley.edu/~murphyk/Bayes/bnsoft.html
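
As a concrete illustration of testing those relationships, here is a minimal
sketch (an assumed linear-Gaussian setting; variable names follow the thread)
that simulates the chain x --> z1 --> z2 --> y and checks that the implied
partial correlations are near zero:

import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
z1 = 0.7 * x + rng.normal(size=n)
z2 = 0.7 * z1 + rng.normal(size=n)
y = 0.7 * z2 + rng.normal(size=n)

def partial_corr(a, b, controls):
    # Correlate the residuals of a and b after regressing both on the controls.
    C = np.column_stack([np.ones(len(a))] + list(controls))
    ra = a - C @ np.linalg.lstsq(C, a, rcond=None)[0]
    rb = b - C @ np.linalg.lstsq(C, b, rcond=None)[0]
    return np.corrcoef(ra, rb)[0, 1]

print(partial_corr(x, y, [z1]))    # x ind. of y given z1: near 0
print(partial_corr(x, z2, [z1]))   # x ind. of z2 given z1: near 0
print(partial_corr(z1, y, [z2]))   # z1 ind. of y given z2: near 0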







Re: How calculate 95%=1.96 stdv

2001-07-05 Thread John Uebersax

Hi Stefan,

s.petersson [EMAIL PROTECTED] wrote in message 
news:XBE07.7641$[EMAIL PROTECTED]...

 Let's say I want to calculate this constant with a security level of
 93.4563, how do I do that? Basically I want to unfold a function like
 this:
 
 f(95)=1.96
 
 Where I can replace 95 with any number ranging from 0-100.

To Eric's reply I'd just add that use of a table is unnecessary. 
Especially in a computer program, it is easier to use a numerical
function to calculate the confidence interval.

The tables you've seen are for the cumulative probabilities of the
standard normal curve--otherwise known as the standard normal
cumulative distribution function (cdf).  The standard normal cdf is the
function:

                       z
   p = PHI(z) = INTEGRAL  phi(t) dt
                       -infinity

where:

   z      =  standard normal deviate
   PHI(z) =  the probability (p) of observing a score at or below z
   phi(t) =  the standard normal density:

             1/sqrt(2*pi) * exp(-t^2/2)

Note that PHI() and phi() (the upper-case and lower-case Greek letter,
respectively) are different: PHI() is the cumulative integral of phi().

With the function above, one supplies a value for z, and is given a
cumulative probability.

You seek the inverse function for PHI(), sometimes called the probit
function.  With the probit function, one supplies a value for p and
is returned the value of z such that the area under the standard
normal curve from -inf to z equals p.  (As Eric noted, you may need to
adjust p to handle issues of 1- vs 2-tailed intervals.)

Both the PHI() and probit() functions are well approximated in simple
applications (such as calculating confidence intervals) by simple
polynomial formulas of a few terms.  Some of these take as few as 2 or
3 lines of code.  A good reference for such approximations is:

Abramowitz, M., and I. A. Stegun, 1972: Handbook of Mathematical
Functions. Dover.
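
For example, formula 26.2.23 in that handbook approximates the probit with
absolute error below 4.5e-4, which is ample for confidence intervals.  A
minimal Python sketch (the function name and the final checks are
illustrative):

from math import log, sqrt

def probit(p):
    # Inverse standard normal cdf, Abramowitz & Stegun formula 26.2.23;
    # absolute error < 4.5e-4 for 0 < p < 1.
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    q = p if p <= 0.5 else 1.0 - p           # work in one tail
    t = sqrt(-2.0 * log(q))
    z = t - ((0.010328 * t + 0.802853) * t + 2.515517) / \
            (((0.001308 * t + 0.189269) * t + 1.432788) * t + 1.0)
    return -z if p <= 0.5 else z

print(probit(0.975))               # ~1.96, the familiar 95% two-sided constant
print(probit((1 + 0.934563) / 2))  # ~1.84, for the 93.4563% level asked about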

Hope this helps.

John Uebersax





Re: How to calculate on-line the SD of a population?

2001-07-05 Thread Alan Miller

Luigi Bianchi wrote in message 9i2doj$r61$[EMAIL PROTECTED]...
Hi to all, it's the first time that I post to this NG, so I hope it is the
right place.
I have the following problem: I read data from an A/D board and I have to
provide an estimate of the SD of the population on-line, that is, each time
I read a sample I have to update the mean and SD. While it is really easy to
update the mean, I don't remember how to do the same thing with the SD. I
remember that there was a formula, but I don't remember it. Could anyone
help me?

Thanks in advance,
Luigi

--
Luigi Bianchi
http://www.luigbianchi.com
[EMAIL PROTECTED]
Programming, C++, OWL, VCL, SDK, Dfm2API



You update the mean and the sum of squares of deviations from the mean:
For each new case (new value x)
n = n + 1
dev = x - mean
mean = mean + dev/n
ssq = ssq + dev*(x - mean)

Then the usual st.devn. estimate is:
sd = sqrt(ssq/(n-1))
If you want an approximately unbiased estimate of the std. devn., use
sd = sqrt(ssq/(n-1.5))
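
In Python, the same running update (Welford's algorithm) might be wrapped up
as in this minimal sketch (class and method names are illustrative):

class RunningStats:
    # Online mean / sd, following the update rule given above.
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.ssq = 0.0    # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        dev = x - self.mean
        self.mean += dev / self.n
        self.ssq += dev * (x - self.mean)   # note: uses the updated mean

    def sd(self):
        # the usual (n-1) estimator; replace n-1 with n-1.5 for the
        # approximately unbiased version mentioned above
        return (self.ssq / (self.n - 1)) ** 0.5

stats = RunningStats()
for x in [9.8, 10.1, 10.0, 9.9, 10.2]:    # e.g. successive A/D samples
    stats.add(x)
print(stats.mean, stats.sd())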

--
Alan Miller (Honorary Research Fellow, CSIRO Mathematical
 Information Sciences)
http://www.ozemail.com.au/~milleraj
http://users.bigpond.net.au/amiller/








Re: cigs figs

2001-07-04 Thread J. Williams


Mr. Ulrich complains that my 91-year-old deceased mother's concept of her
right to smoke was provocative to him.  Wow!  Either he has too much
time on his hands or some really serious problems that can't be solved
through a statistics newsgroup.  How my dead mother's attitude toward
smoking created such an emotional tirade is beyond me.  The bizarre
and convoluted allusion to Justices Scalia and Thomas seems to be
another of Mr. Ulrich's hot buttons my mother inadvertently pushed
from her grave.  I suppose he thinks she was a part of the vast
right-wing conspiracy he seems to be railing about.  What Mr. Ulrich
doesn't know is that she was not only a lifelong smoker, but a Democratic
Party activist as well.  As yet, Mr. Ulrich has not provided the case
law attributed to the two Justices re: smoking rights vis-a-vis
Natural Law.  IMHO, his apparent need to spout Democratic Party
ideology would be more appropriate for a political science group.
Possibly, his political ranting plays well to the gallery in
Pittsburgh.  Are they lucky, or what?


On Sun, 01 Jul 2001 19:08:44 -0400, Rich Ulrich [EMAIL PROTECTED]
wrote:

 [quoted message snipped]






Re: Marijuana

2001-07-04 Thread Graaagh the Mighty

On Sun, 01 Jul 2001 17:05:52 GMT, [EMAIL PROTECTED] (John R
Ramsden) sat on a tribble, which squeaked:

One clever use for GPFs in an old OS called Primos (anyone
remember that?) was to detect kernel stack overflows. The
idea was that you positioned the stack in virtual address
space so that its end abutted onto a page marked void in
the address translation tables, in effect a hole in the
virtual address space. Then, if some rogue code overflowed
the stack it would try and reference an address in this page
and immediately throw up a page fault error. I think they
did the same with the (smaller but even more critical)
fault stacks, e.g. to catch recursive page fault errors.

I'd be surprised if the same trick isn't used, even more
extensively, in Windoze these days, since many ex-Primates
probably migrated to Microsoft after Prime Computer Inc's
woes in the early '90s.

Windoze does use a trick like that to detect when it needs to read a
page in from swap. A swapped-out page is marked void in the virtual
address table, and access to it triggers a page fault. The page is
then swapped in, unless it's truly bogus, in which case an application
fault occurs.

On comp.os.msdos.djgpp there's been some discussion about having the
runtime environment detect stack overflows by exactly the mechanism
you just described.

 As opposed to [2] the GPF's this guy is hiding - these
 are not GPF's that are supposed to happen.

Mind you, I can see how this might make more efficient and
streamlined the kind of code in which references through
null pointers were an anticipated but infrequent event.

Yeah, and I can see how this might be the most god-awful kluge in
world history, particularly when you can't distinguish accessing a
null pointer deliberately from doing so due to a bug.


-- 
Bill Gates: No computer will ever need more than 640K of RAM. -- 1980
There's nobody getting rich writing software that I know of. -- 1980
This antitrust thing will blow over. -- 1998
Combine neo, an underscore, and one thousand sixty-one to make my hotmail addy.





Re: about a problem of khi2 test

2001-07-03 Thread Rich Ulrich

On Sun, 01 Jul 2001 14:19:31 +0200, Bruno Facon
[EMAIL PROTECTED] wrote:
 I work in the area of intelligence differentiation. I would like to know
 how to use the khi2 statistic to determine whether the number of
 statistically different correlations between two groups is due or not to
 random variations. In particular I would like to know how to determine
 the expected number of statistically different correlations due to
 “chance”.
 Let me take an example. Suppose I compare two correlation matrices of
 45 coefficients obtained from two independent groups (A and B). If there
 is no true difference between the two matrices, the number of
 statistically different correlations should be equal to 1.125 in favor of

Yes, that is the number.   But there is not a legitimate test that I
know of, unless you are willing to make a strong assumption that 
no pair of the variables should be correlated.

I never heard of the khi2 statistic before this.  I searched with
google, and found a respectable number of references, and here
is something that I had not seen with a statistic:  khi2 appears to be
solely French in its use.  Of the first 50 hits, most were in French,
at French ISPs (.fr).  The few that were in English were also from
French sources.

One article had a reference (not available in my local libraries):
Freilich MH and Chelton DB, J Phys Oceanogr  16, 741-757. 


 
 group A and equal to 1.125 in favor of group B (in the case of alpha = .05).
 
 Consequently, the expected number of nonsignificant differences should
 be 42.75. Is my reasoning correct?

It would be nice to test the numbers, but I don't credit that reference
as a good one, yet.

I don't remember for sure, but I think you might be able to compare
two correlation matrices with programs from Jim Steiger's site,

http://www.interchg.ubc.ca/steiger/multi.htm

On the other hand, you would be better off if you can compare 
the entire covariance structures, to keep from making accidental
assumptions about variances.  (Does Jim provide for that?)
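
For reference, the naive expected counts discussed above follow from treating
the 45 comparisons as independent Bernoulli trials, which (as noted)
correlations sharing variables are not; a small sketch:

from math import comb

n, alpha = 45, 0.05
print(n * alpha)       # 2.25 significant differences expected in total
print(n * alpha / 2)   # 1.125 expected in favor of each group

def binom_tail(n, p, k):
    # P(at least k "significant" results out of n independent tests)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(binom_tail(45, 0.05, 6))   # e.g. the chance of 6 or more at alpha = .05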

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: cigs figs

2001-07-03 Thread Rich Ulrich

 - in respect of the up-coming U.S. holiday -

On Mon, 25 Jun 2001 11:49:47 GMT, mackeral@remove~this~first~yahoo.com
(J. Williams) wrote:

 On Sun, 24 Jun 2001 16:37:48 -0400, Rich Ulrich [EMAIL PROTECTED]
 wrote:
 
 
 What rights are denied to smokers?  
JW  
 Many smokers, including my late mother, feel being unable to smoke on
 a commercial aircraft, sit anywhere in a restaurant, etc. were
 violations of her rights.  I don't agree as a non-smoker, but that
 was her viewpoint until the day she died.

What's your point:  She was a crabby old lady, whining (or
whinging) about fancied 'rights'?  

You don't introduce anything that seems 'inalienable' or
'self-evident' (if I may introduce July-4th language).
Nobody stopped her from smoking as long as she kept it away
from other people-who-would-be-offended.

Okay, we form governments to help assure each other of rights.   
Lately, the law sees fit to stop some assaults from happening, 
even though it did not always do that in the past. - the offender
still has quite a bit of leeway; if you don't cause fatal diseases,
you legally can offend quite a lot.  We finally have laws about
smoking.

But she wants the law to stop at HER convenience?

[ snip, various ]
JW  
 Talking about confused and/or politically driven,  what do Scalia and
 Thomas have to do with smoking rights?   Please cite the case law.

I mention 'rights' because that did seem to be an attitude you
mentioned that was (as you see) provocative to me.

I toss in S & T because I think that, to a large extent, they
share your mother's preference for a casual, self-centered
definition of rights.  And they are Supreme Court justices.
[ Well, they don't say, "This is what *I* want"; these two
translate the blame/credit to Nature (euphemism for God). ]

So: I don't fault your mother *too*  harshly, when Justices
hardly do better.  Even though a prolonged skew was needed,
to end up with two like this.


-- 
Rich Ulrich, [EMAIL PROTECTED]


http://www.pitt.edu/~wpilib/index.html





Re: cigs figs

2001-07-03 Thread Reg Jordan

Actually, the word is unalienable.

reg
- Original Message - 
From: Rich Ulrich [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, July 01, 2001 7:08 PM
Subject: Re: cigs & figs


 [quoted message snipped]







Re: cigs figs

2001-07-03 Thread Jerrold Zar

Yes, historically correct.  Mr. Jefferson and colleagues used
"unalienable" in the Declaration of Independence, though "inalienable"
is the overwhelming preference nowadays.

---Jerry Zar

 Reg Jordan [EMAIL PROTECTED] 07/03/01 04:10PM 
Actually, the word is unalienable.

reg
- Original Message - 
From: Rich Ulrich [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, July 01, 2001 7:08 PM
Subject: Re: cigs & figs


 [quoted message snipped]





Re: Marijuana

2001-07-01 Thread John R Ramsden

[EMAIL PROTECTED] (David C. Ullrich) wrote:

  And yet he never made the connection that maybe Michael
  Caracena's code *is* the code in Windows that regularly GPFs...

 Um, no. In [1] I wasn't talking about the GPF's that we
 see when Windows crashes. I forget the details, but
 these are _intentional_ GPF's that don't give error
 messages - they're part of how the system works.

One clever use for GPFs in an old OS called Primos (anyone
remember that?) was to detect kernel stack overflows. The
idea was that you positioned the stack in virtual address
space so that its end abutted onto a page marked void in
the address translation tables, in effect a hole in the
virtual address space. Then, if some rogue code overflowed
the stack it would try and reference an address in this page
and immediately throw up a page fault error. I think they
did the same with the (smaller but even more critical)
fault stacks, e.g. to catch recursive page fault errors.

I'd be surprised if the same trick isn't used, even more
extensively, in Windoze these days, since many ex-Primates
probably migrated to Microsoft after Prime Computer Inc's
woes in the early '90s.

 As opposed to [2] the GPF's this guy is hiding - these
 are not GPF's that are supposed to happen.

Mind you, I can see how this might make more efficient and
streamlined the kind of code in which references through
null pointers were an anticipated but infrequent event.

The code would need to be well-regulated though, by always
maintaining a context indicator to allow the signal handler
to, say, allocate and initialize the right kind of structure
for the pointer and then restart the offending instruction.


Cheers

---
John R Ramsden ([EMAIL PROTECTED])
---
The new is in the old concealed, the old is in the new revealed.
   St Augustine.
---





Re: Do you known?

2001-07-01 Thread Eric Bohlman

In sci.stat.edu Monica De Stefani [EMAIL PROTECTED] wrote:
 Hi, does anybody know
 Quade, D. (1976). Nonparametric partial correlation. In Measurement in
 the Social Sciences, edited by H.M. Blalock, Jr. Aldine Publishing
 Company: Chicago, 369-398?
 I would like to know how he calculates Kendall's partial tau (precisely),
 please.

Kendall's partial tau is calculated exactly the same way as Pearson's 
partial r.
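
That is, with Kendall taus in place of the Pearson r's in the first-order
partial correlation formula.  A minimal sketch (the function name is
illustrative; scipy's kendalltau is assumed available):

from math import sqrt
from scipy.stats import kendalltau

def partial_tau(x, y, z):
    # tau_xy.z = (tau_xy - tau_xz*tau_yz) / sqrt((1-tau_xz^2)(1-tau_yz^2))
    t_xy = kendalltau(x, y)[0]
    t_xz = kendalltau(x, z)[0]
    t_yz = kendalltau(y, z)[0]
    return (t_xy - t_xz * t_yz) / sqrt((1 - t_xz**2) * (1 - t_yz**2))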






Re: Maximum Likelihood

2001-06-29 Thread Herman Rubin

In article [EMAIL PROTECTED],
Mark W. Humphries [EMAIL PROTECTED] wrote:
Hi,

Does anyone have references to a simple/intuitive introduction to Maximum
Log Likelihood methods.
References to algorithms would also be appreciated.

Cheers,
 Mark W. Humphries


Any decent text on mathematical statistics has this.

As for algorithms, it is a problem of numerical analysis,
not of statistics.
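
For a taste of the numerical side, a minimal sketch (an illustrative example,
not from any particular text): fit a normal distribution by minimizing the
negative log-likelihood.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
data = rng.normal(loc=3.0, scale=2.0, size=500)

def neg_log_lik(params):
    mu, log_sigma = params            # optimize log(sigma) so sigma stays > 0
    sigma = np.exp(log_sigma)
    return 0.5 * np.sum(((data - mu) / sigma) ** 2) + len(data) * log_sigma

res = minimize(neg_log_lik, x0=[0.0, 0.0])
print(res.x[0], np.exp(res.x[1]))     # close to the true mu = 3, sigma = 2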
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: Maximum Likelihood

2001-06-29 Thread Rich Ulrich

On 28 Jun 2001 20:39:18 -0700, [EMAIL PROTECTED] (Mark W. Humphries)
wrote:

 Hi,
 
 Does anyone have references to a simple/intuitive introduction to Maximum
 Log Likelihood methods.
 References to algorithms would also be appreciated.
 

Look on the Internet.

I used www.google.com to search on 
maximum likelihood tutorial  
(put the phrase in quotes to keep it together; 
or you can use Advanced search)

There were MANY hits, and the second reference 
was in a tutorial that begins at
http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_2.html


The third reference was for some programs and examples in Gauss
(a programming language) by Gary King at Harvard, in his application
area.  If these aren't worthwhile (I did not try to download
anything),  there are plenty of other sites to check.

[ I am intrigued by G. King, a little.  This is the fellow who
putatively has a method, not Heckman's, for overcoming or
compensating for aggregation bias.  Which I never found available
for free.  But, too bad, the page says these programs go with 
his 1989 book, and I think his Method is more recent.]

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Marijuana

2001-06-28 Thread Ross Presser

Ellen Hertz [EMAIL PROTECTED] wrote:

 I think you need 8760*(number of subjects followed for a year)
 assuming the 124 heart attacks were from more than one subject. 

If the data show that one subject suffered 124 heart attacks, then 
SOMEBODY'S been smoking marijuana for SURE.

-- 
Ross Presser * [EMAIL PROTECTED]
A free-range shoggoth is a happy shoggoth, and a happy shoggoth is 
 generally less inclined to eat all of you at once. - Tim Morgan





RE: cigs figs

2001-06-27 Thread Jackson,P

The thing is, of course, in the case of the car accident survivors etc, in
each of those individual cases, we can usually gain some insight into what
contributed to the survival.  It would be very interesting to similarly
discover the basis of the long lives of the old lady smokers.



 -Original Message-
 From: Thom Baguley [mailto:[EMAIL PROTECTED]] 
 Sent: 26 June 2001 12:22
 To: [EMAIL PROTECTED]
 Subject: Re: cigs  figs
 
 
 [quoted message snipped]
 





Re: Marijuana

2001-06-27 Thread Ellen Hertz

Paul,
I think you need 8760*(number of subjects followed for a year) assuming the
124 heart attacks were from more than one subject. Then you could do a test
as to whether or not marijuana  in a given hour is associated with heart
attack in that hour.

The hours for a fixed subject are not independent, so you shouldn't lump them
together in a contingency table. One possible approach would be to do a
logistic regression with the person/hours being the observations, so that
there are 8760*(number of subjects followed for a year) observations. A
positive response is a heart attack and the predictors are MJ use that hour
and N-1 dummy variables for the subjects. Then you want to look at the sign
and p-value of the MJ coefficient.

Hope this helps.

Ellen Hertz




Paul Jones [EMAIL PROTECTED] wrote in message
[EMAIL PROTECTED]">news:[EMAIL PROTECTED]...
 There was some research recently linking heart attacks with
 Marijuana smoking.

 I'm trying to work out the correlation and, most
 importantly, its statistical significance.

 In essence the problem comes down to:

 Of 8760 hours in a year, 124 had heart attacks in them, 141
 had MJ smokes in them and 9 had both.

 What statistical tests apply?
 Most importantly, what is the statistical significance of
 the correlation between smoking MJ in any hour and having a
 heart attack in that same hour?
 What is the probablity that the null hypothesis (that
 smoking marijuana and having a heart attack are unrelated)
 can be rejected?
 How reliable are the results from a dataset of this size?

 I'm not very literate in maths and stats - please help me
 out someone. I'm interested in this research from the
 perspective of medicinal marijuana.

 Thanks and take care,
 Paul
 All About MS - the latest MS News and Views
 http://www.mult-sclerosis.org/
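
For reference, the naive 2x2 test on Paul's raw counts looks like the sketch
below; it ignores exactly the within-subject dependence Ellen warns about
above, so treat it as a first pass only (scipy's fisher_exact is assumed
available):

from scipy.stats import fisher_exact

#              heart attack   no heart attack
table = [[9,            141 - 9],                 # hours with MJ
         [124 - 9,      8760 - 141 - (124 - 9)]]  # hours without MJ
print(fisher_exact(table))
# Under independence only about 124*141/8760 ~ 2 joint hours would be
# expected, versus the 9 observed.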







Re: cigs figs

2001-06-26 Thread Thom Baguley

J. Williams wrote:
 She maintained, in spite of the Surgeon General's report and other
 studies I quoted, that smoking doesn't cause cancer or heart
 disease.  Her proof was she and her sister (my aunt) both lived to be
 over 90 and were chain smokers.  She insisted there are other factors
 which accounted for the deaths of long-time smokers.

A common type of argument.

By the same logic, being shot, being run over by a car or falling out
of an aeroplane aren't causes of death. There are documented cases
of people doing all three and not dying. QED.

Thom





Re: Edstat: I. J. Good and Walker

2001-06-26 Thread Jay Warner



dennis roberts wrote:

 At 06:08 PM 6/19/01 +0000, Jerry Dallal wrote:
 >Alex Yu wrote:
 > >
 > > In 1940 Helen M. Walker wrote an article in the Journal of Educational
 > > Psychology regarding the concept degrees of freedom.  In the 1970s,
 > > I. J. Good wrote something to criticize Walker's idea.  I forgot the
 > > citation.  I tried many databases and even searched the internet but
 > > got no result.  Does anyone know the citation?  Thanks in advance.
 >
 >Good, I. J. (1973). What are degrees of freedom? The American
 >Statistician, 27, 227-228.

 answer??? the number of options you have in deciding what courses you can
 take during your freshperson year in college ... MINUS ONE
 the minus one is for the option of 0 courses - dropping out.  It is
 presumed that you have already decided to stick it out.  :)

Cheers,
Jay
--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
 North Green Bay Road
Racine, WI 53404-1216
USA
Ph: (262) 634-9100
FAX: (262) 681-1133
email: [EMAIL PROTECTED]
web: http://www.a2q.com
The A2Q Method (tm) -- What do you want to improve today?







Re: Normality in Factor Analysis

2001-06-25 Thread Glen Barnett


Robert Ehrlich [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]...
 Calculation of eigenvalues and eigenvectors requires no assumption.
 However, evaluation of the results IMHO implicitly assumes at least a
 unimodal distribution and reasonably homogeneous variance, for the same
 reasons as ANOVA or regression.  So think of the consequences of calculating
 means and variances of a strongly bimodal distribution where no sample
 occurs near the mean and all samples are tens of standard deviations
 from the mean.

The largest number of standard deviations all data can be from the mean is 1:
the variance is the average squared deviation, so if every point were more
than one s.d. from the mean, the average squared deviation would exceed the
variance, a contradiction.

To get some data further away than that, some of it has to be less than 1 s.d.
from the mean.
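
A quick numeric illustration of the extreme case (assumed data):

import numpy as np

x = np.array([-1.0, 1.0] * 10)    # half the data at each of two values
z = (x - x.mean()) / x.std()      # standardize with the population sd (ddof=0)
print(np.abs(z).max())            # exactly 1.0: no point can be further out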

Glen








Re: cigs figs

2001-06-25 Thread Herman Rubin

In article [EMAIL PROTECTED],
Rich Ulrich  [EMAIL PROTECTED] wrote:
  - re: some outstandingly confused thinking.  Or writing.

On Sat, 23 Jun 2001 15:25:31 GMT, mackeral@remove~this~first~yahoo.com
(J. Williams) wrote:

[ snip;  Slate reference, etcetera ]
   ... My mother was 91 years
 old when she died a year ago and chain smoked since her college days.
 She defended the tobacco companies for years saying, "it didn't hurt
 me."  She outlived most of her doctors.  Upon quoting statistics and
 research on the subject, her view was that I, like other do-gooders
 and non-smokers, wanted to deny smokers their rights.

What statistics would her view quote?  to show that someone
wants to deny smokers 'their rights'?
[ Hey, I didn't write the sentence ]

NO amount of demographic statistics can PROVE, even
statistically, that smoking is harmful to the person
doing it.  Statistical arguments based on such data
are at most indications, and may even be wrong.  The
woman who died recently at 120, a claimant for the 
title of the oldest living person, gave up smoking at
the age of 114.

I just love it, how a 'natural right'  works out to be *exactly*
what the speaker wants to do.

That is essentially it.  The only meaningful rights are the
rights to do what others do not want you to do.

 And not a whit more.
(Thomas and Scalia are probably going to give us tons 
of that bad philosophy, over the next decades.)

What rights are denied to smokers?  You know, you can't 
build your outhouse right on the riverbank, either.

This only applies to second hand smoke, where the
rights of others are directly involved.  In some
places, you can build your outhouse right on the
riverbank; the only reason that you cannot or should
not do so generally is that it would threaten others.

Obviously,
 there is a health connection.  How strong that connection is, is what
 makes this a unique statistical conundrum.

How strong is that connection?  Well, quite strong.

Personally, I believe that there is a connection.  But it
is a situation where the prior probabilities of the various
states make a big difference.

I once considered that it might not be so bad to die 9 years
early, owing to smoking, if that cut off years of bad health 
and suffering.  Then I realized, the smoking grants you 
most of the bad health of old age, EARLY.  (You do miss 
the Alzheimer's.)  One day, I might give up smoking my pipe.

Why are you smoking a pipe?  Pipe smokers produce second
hand smoke, and lots of objectionable odors.  Can you cite
any benefits which cigarette smokers cannot also claim?
Everything involves risks and benefits, and the individual
should decide.

What is the statistical conundrum?  I can almost 
imagine an ethical conundrum.  (How strongly can
we legislate, to encourage cyclists to wear helmets?)
I sure don't spot a statistical conundrum.

I see no statistical conundrum, either, but merely a
situation where the regulators are using a very large
amount of prior assumptions to justify the legislation.

Now this does not mean that most of those assumptions may
not be correct, but that this is what they are going by.  I
believe that one MUST use prior assumptions, as otherwise
one will be strongly inconsistent, and it is even possible
that nothing will be done.

-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: Help with stats please

2001-06-25 Thread Herman Rubin

In article 006901c0fce2$d07c7640$[EMAIL PROTECTED],
Melady Preece [EMAIL PROTECTED] wrote:
Hi.  I am teaching educational statistics for the first time, and although I
can go on at length about complex statistical techniques, I find myself at a
loss with this multiple choice question in my test bank.  I understand why
the range of  (b) is smaller than (a) and (c), but I can't figure out how to
prove that it is smaller than (d).

If you can explain it to me, I will be humiliated, but grateful.


1.  Which one of the following classes had
 the smallest range in IQ scores?

 A)  Class A has a mean IQ of 106
   and a standard deviation of 11.
 B)  Class B has an IQ range from 93
   to 119.
 C)  Class C has a mean IQ of 110
   with a variance of 200.
  D)  Class D has a median IQ of 100
   with Q1 = 90 and Q3 = 110.

The test bank says the answer is b.

Melady


What are the sizes of the classes?

What are the distributions of the scores in the various
classes?

If the scores are random from some probability
distribution, and other than the sample data there is no
additional information about the actual scores, for other
than extremely small classes (10 is large here), not many
absolute statements can be made.  I CAN tell that class C
cannot have a smaller range than 29: the (population)
variance of any set of numbers is at most (range/2)^2, so
a variance of 200 forces the range to be at least
2*sqrt(200) (about 28.3), and scores are given as integers.
If they are not integers, it goes down slightly.

Even if the model is the totally untenable normal
distribution, the scores are RANDOM, and the samples need
not look at all normal.

As to what was bothering you, what are the quantiles
of the normal distribution?  
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: Marijuana

2001-06-25 Thread David C. Ullrich

On Mon, 25 Jun 2001 09:09:52 GMT, [EMAIL PROTECTED] (Graaagh the
Mighty) wrote:

On Sun, 24 Jun 2001 14:39:06 GMT, [EMAIL PROTECTED] (David C.
Ullrich) sat on a tribble, which squeaked:

[1]That's one scary thing - in fact there are places in
Windows95 where the system _regularly_ creates GPF's;
something to do with thunking or something.

[2]But the scary thing about the quote is that the
guy was advocating _hiding_ AV's in programs we
write instead of fixing them. AV's can be hard to
debug - the eaiest way is to make certain they
don't arise in the first place. And given this
guy's attitude, one of the steps involved in
ensuring that your code contains no hard-to-debug
AV's is making sure you never use anything
he wrote. Hence the sig - it's a public-service
thing.

Sometimes you can have access violations all the 
time and the program still works. (Michael Caracena, 
comp.lang.pascal.delphi.misc 5/1/01)

And yet he never made the connection that maybe Michael Caracena's
code *is* the code in Windows that regularly GPFs...

Um, no. In [1] I wasn't talking about the GPF's that we
see when Windows crashes. I forget the details, but
these are _intentional_ GPF's that don't give error
messages - they're part of how the system works.

As opposed to [2] the GPF's this guy is hiding - these
are not GPF's that are supposed to happen.

(Seriously though -- core parts of Windoze are written in Pascal, and
it is known that Windoze does hide some AVs it commits, especially
those involving reading through a null pointer!)

How do you know some parts are in Pascal, and what does that
have to do with AV's?

-- 
Bill Gates: No computer will ever need more than 640K of RAM. -- 1980
There's nobody getting rich writing software that I know of. -- 1980
This antitrust thing will blow over. -- 1998
Combine neo, an underscore, and one thousand sixty-one to make my hotmail addy.



David C. Ullrich
*
Sometimes you can have access violations all the 
time and the program still works. (Michael Caracena, 
comp.lang.pascal.delphi.misc 5/1/01)





Re: Help with stats please

2001-06-25 Thread Jerry Dallal

Melady Preece wrote:
 
 Hi.  I am teaching educational statistics for the first time, and although I
 can go on at length about complex statistical techniques, I find myself at a
 loss with this multiple choice question in my test bank.  I understand why
 the range of  (b) is smaller than (a) and (c), but I can't figure out how to
 prove that it is smaller than (d).
 
 If you can explain it to me, I will be humiliated, but grateful.

I'm not sure why you would be humiliated, even if the answer were
obvious. You can't prove the range of (b) is smaller than (d). The
question isn't even worded clearly. (b) says "an IQ range from 93 to
119".  The scores range from 93 to 119 and have a range of 26 (subject to
any typographical errors I might make!), but "a range from ... to ..." is
just...sloppy. If (d) were a small class, say 2 students, the upper
and lower quartiles could be 90 and 110, depending on the precise
definition of quartile being used, and the range would be 20, even
with normality, etc.

 
 1.  Which one of the following classes had
  the smallest range in IQ scores?
 
  A)  Class A has a mean IQ of 106
and a standard deviation of 11.
  B)  Class B has an IQ range from 93
to 119.
  C)  Class C has a mean IQ of 110
with a variance of 200.
   D)  Class D has a median IQ of 100
with Q1 = 90 and Q3 = 110.
 
 The test bank says the answer is b.





Re: M/G/1 model

2001-06-25 Thread apharhus

It's the Kendall notation A/B/C/D/E:

A: interarrival time distribution
   M: exponential
   D: deterministic
   E: Erlang-k
   G: general
B: service time distribution
   M: exponential
   D: deterministic
   E: Erlang-k
   G: general
C: number of parallel servers
D: system capacity
E: queueing discipline
   FIFO
   LIFO
   SIRO (service in random order)
   PRI (priority)
   GD (general discipline)

M/G/1 stands for exponential interarrival time / general service time
distribution / 1 server.
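
A toy decoder for the first three fields of the notation (names are
illustrative):

DIST = {"M": "exponential", "D": "deterministic", "E": "Erlang-k", "G": "general"}

def describe(kendall):
    # Decode the A/B/C part of an A/B/C/D/E Kendall string.
    a, b, c = kendall.split("/")[:3]
    return f"{DIST[a]} interarrivals, {DIST[b]} service times, {c} server(s)"

print(describe("M/G/1"))  # exponential interarrivals, general service times, 1 server(s)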

Hope this helps.

JCB
France
- Original Message -
From: *Silvia* [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, June 25, 2001 12:11 PM
Subject: M/G/1 model


 I am studying the M/G/1 model for retrial queues.
 I know that 1 in M/G/1 means that there is a single server.
 Can anyone tell me what M and G exactly stand for?
 Thanks in advance,
 Silvia








Re: Marijuana

2001-06-24 Thread David C. Ullrich

On Sat, 23 Jun 2001 23:35:06 GMT, Tetsuo [EMAIL PROTECTED]
wrote:

in article [EMAIL PROTECTED], Tetsuo at
[EMAIL PROTECTED] wrote on 24-06-2001 00:17:

 in article [EMAIL PROTECTED], David C. Ullrich at
 [EMAIL PROTECTED] wrote on 23-06-2001 16:06:
 
[obvious jokes'

[explanation of why the assertions in the obvious jokes are wrong] 

[...]

Sorry for that indeed; people actually hold this kind of opinion on this
sometimes, so I assumed I had encountered just another one and got irritated. I
should've realized the poster would not spout such stupidity in a serious
manner though, of course... heh, certainly not in this ng.

No problem, actually I enjoyed reading it. Slightly disappointing
that you finally figured out I was being sarcastic - when I read
your post I was looking forward to stringing you along a bit.

Well, sorry again




David C. Ullrich
*
Sometimes you can have access violations all the 
time and the program still works. (Michael Caracena, 
comp.lang.pascal.delphi.misc 5/1/01)





Re: Marijuana

2001-06-24 Thread David C. Ullrich

On Sat, 23 Jun 2001 21:12:40 -0700, Chas F Brown
[EMAIL PROTECTED] wrote:



David C. Ullrich wrote:
 
[...]

In the back-of-envelope calculations I did, this is really the key
missing information. If heart attacks are evenly distributed through the
day, while MJ smoking (as far as I know!) clearly isn't for most users,
then the temporal correlation is going to be a lot more marked.

 Or they tend to smoke
 before meals (I knew some people like that years ago in college)
 and tend to have heart attacks after meals. Or they tend to
 smoke when they start to feel little chest pains, as someone
 suggested.
 

But you're reading something into what I said, that I didn't say - I'm
not saying that the data imply that smoking _causes_ an increased
risk of heart attack in the hour after smoking (although this evidence
would support further investigation that that _may_ be the case).

Ok.

[...]

 
 David C. Ullrich
 *
 Sometimes you can have access violations all the
 time and the program still works. (Michael Caracena,
 comp.lang.pascal.delphi.misc 5/1/01)

The scary thing is - he's right.

That's one scary thing - in fact there are places in
Windows95 where the system _regularly_ creates GPF's;
something to do with thunking or something.

But the scary thing about the quote is that the
guy was advocating _hiding_ AV's in programs we
write instead of fixing them. AV's can be hard to
debug - the easiest way is to make certain they
don't arise in the first place. And given this
guy's attitude, one of the steps involved in
ensuring that your code contains no hard-to-debug
AV's is making sure you never use anything
he wrote. Hence the sig - it's a public-service
thing.

 (Ooops! Netscape just locked up - time
to reboot again...)

Cheers - Chas

---
C Brown Systems Designs
Multimedia Environments for Museums and Theme Parks
---



David C. Ullrich
*
Sometimes you can have access violations all the 
time and the program still works. (Michael Caracena, 
comp.lang.pascal.delphi.misc 5/1/01)





Re: Help with stats please

2001-06-24 Thread dennis roberts

At 12:20 PM 6/24/01 -0700, Melady Preece wrote:
Hi.  I am teaching educational statistics for the first time, and although I
can go on at length about complex statistical techniques, I find myself at a
loss with this multiple choice question in my test bank.  I understand why
the range of  (b) is smaller than (a) and (c), but I can't figure out how to
prove that it is smaller than (d).

If you can explain it to me, I will be humiliated, but grateful.


1.  Which one of the following classes had
  the smallest range in IQ scores?

of course, there is nothing about the shape of the distribution of any 
class ... so, does the item assume sort of normal? in fact, since each of 
these classes is probably on the small side ... it would be hard to assume 
that but, for the sake of the item ... pretend

in addition, it does not say to assume the population of IQ scores has mean 
= 100 and sd about 15 ... so, whether this plays a role or not, i am not 
sure BUT ...


  A)  Class A has a mean IQ of 106
and a standard deviation of 11.

at least about 2 units of 11 = 22 on each side of 106 ... range about 45 or 
so or more

  B)  Class B has an IQ range from 93
to 119.

well, range here is about 26 ... less than in A for sure

  C)  Class C has a mean IQ of 110
with a variance of 200.

variance of 200 means an sd of about 14 ... so 2 units of 14 = 28 on each side
of 110 ...
range must be 50 or more ... similar to A but more than B

   D)  Class D has a median IQ of 100
with Q1 = 90 and Q3 = 110.

25th PR = 90 and 75PR = 110 ... IF we assumed the class was ND ... then the 
mean would be about 100 too ... and since -1 for SD below the mean and +1 
SD above the mean would give your roughly the 16th PR and 84th PR ... Q1 
and Q3 are NOT that far out ... so, the SD must be at least 10 or more ... 
thus, 2 units of at least 10 = 20 on either side of 100 = range of at least 
about 40 ... probably less than A or C ... but, more than B ...

B is probably the best of the lot BUT, i am NOT sure what the real purpose 
of this item is ...


The test bank says the answer is b.

Melady






_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm






Re: cigs figs

2001-06-24 Thread Rich Ulrich

  - re: some outstandingly confused thinking.  Or writing.

On Sat, 23 Jun 2001 15:25:31 GMT, mackeral@remove~this~first~yahoo.com
(J. Williams) wrote:

[ snip;  Slate reference, etcetera ]
   ... My mother was 91 years
 old when she died a year ago and chain smoked since her college days.
 She defended the tobacco companies for years saying, "it didn't hurt
 me."  She outlived most of her doctors.  Upon quoting statistics and
 research on the subject, her view was that I, like other do-gooders
 and non-smokers, wanted to deny smokers their rights.

What statistics would her view quote?  to show that someone
wants to deny smokers 'their rights'?
[ Hey, I didn't write the sentence ]

I just love it, how a 'natural right'  works out to be *exactly*
what the speaker wants to do.  And not a whit more.
(Thomas and Scalia are probably going to give us tons 
of that bad philosophy, over the next decades.)

What rights are denied to smokers?  You know, you can't 
build your outhouse right on the riverbank, either.

Obviously,
 there is a health connection.  How strong that connection is, is what
 makes this a unique statistical conundrum.

How strong is that connection?  Well, quite strong.

I once considered that it might not be so bad to die 9 years
early, owing to smoking, if that cut off years of bad health 
and suffering.  Then I realized, the smoking grants you 
most of the bad health of old age, EARLY.  (You do miss 
the Alzheimer's.)  One day, I might give up smoking my pipe.

What is the statistical conundrum?  I can almost 
imagine an ethical conundrum.  (How strongly can
we legislate, to encourage cyclists to wear helmets?)
I sure don't spot a statistical conundrum.

Is this word intended?  If so, how so?

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Help with stats please

2001-06-24 Thread Donald Burrill

On Sun, 24 Jun 2001, Melady Preece wrote in part:

 I am teaching educational statistics for the first time, and although I 
 can go on at length about complex statistical techniques, I find myself 
 at a loss with this multiple choice question in my test bank.  I 
 understand why the range of (b) is smaller than (a) and (c), but I 
 can't figure out how to prove that it is smaller than (d).
 
 1.  Which of the following classes had the smallest range in IQ scores? 
 
  A)  Class A has a mean IQ of 106 and a standard deviation of 11.
  B)  Class B has an IQ range from 93 to 119.
  C)  Class C has a mean IQ of 110 with a variance of 200.
  D)  Class D has a median IQ of 100 with Q1 = 90 and Q3 = 110.
 
 The test bank says the answer is b.

Right.  Since you're happy that  range(B) < range(A)  and
range(B) < range(C),  I'll focus on  (B) vs. (D).
In (B), the entire _range_ is from 93 to 119:  26 (or 27, 
depending on how you choose to define range) points.
In (D), the central half of the distribution is from 90 to 110: 
the interquartile range (IQR) is 20 points, symmetric about the median;  
the full range must therefore be greater than 20.  Now, _if_ the 
distribution is normal (which may be what we were to assume from the 
allegation that these are IQ scores;  although as Dennis has pointed out, 
ille non sequitur -- unless these are rather large classes AND NOT 
SELECTED BY I.Q. (or by any variable strongly related to I.Q.)), then 10 
points from Q1 to median (or from median to Q3) represents 0.67 standard 
deviation, which implies a standard deviation of about 15, which is 
larger than the standard deviation in (A) and slightly larger than that 
in (C).
However, we need not invoke the normal distribution.  We observe 
that the distribution in (D) is at least approximately symmetric (insofar 
as the quartiles are equidistant from the median).  If we may assume also 
that the distribution is unimodal (which I should think reasonable), it 
then follows (from the tailing off of distributions as one approaches 
the extremes) that the distance from minimum to Q1 (and the distance from 
Q3 to maximum) is greater than the distance from Q1 to median (or median 
to Q3).  This implies that the range of the distribution exceeds twice 
the interquartile range:  that is,  range(D) > 2*20 = 40.  Since the
range in (B) is only 26, clearly the range of (B) is less than the range 
of (D).

If any part of this argument remains unclear, I'd be happy to attack it 
again.  A rough sketch should make things pretty obvious, but it's a bit 
of a nuisance to draw pictures in ASCII characters!
--DFB.
 
 Donald F. Burrill [EMAIL PROTECTED]
 184 Nashua Road, Bedford, NH 03110  603-471-7128
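
A quick simulation (assumed normal scores) illustrating Donald's conclusion
that such a class's range comfortably exceeds twice its interquartile range:

import numpy as np

rng = np.random.default_rng(3)
scores = rng.normal(100, 15, size=200)
q1, q3 = np.percentile(scores, [25, 75])
print(q3 - q1)                        # IQR, about 20 for sd 15
print(scores.max() - scores.min())    # range, typically 80+ here, >> 2 * IQR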





Re: cigs figs

2001-06-23 Thread J. Williams

On 17 Jun 2001 14:47:14 GMT, [EMAIL PROTECTED] (EugeneGall)
wrote:

On Slate, there is quite a good discussion of the meaning and probabilistic
basis of the statement that 1 in 3 teen smokers will die of cancer.  It is
written by a math prof and it is one of the most effective lay discussions I've
seen of the use of probabilities in describing health risks.

http://slate.msn.com/math/01-06-14/math.asp

Maybe, I just notice it more, but it seems to me as I move about that
more and more young people are smoking.  Could it be that even with
all of the negatives, smoking is still popular and/or growing among
teeny boppers and young adults?  Recent jury awards to long-time
smokers seem to intimate that even with printed warnings, etc., the
tobacco companies are ultimately responsible for respiratory and
circulatory ailments.  Smokers it is assumed are addicts and
consequently not responsible for their actions.  A salient point in
Mr. Ellenberg's treatise is the query that of a sample of 100,000
deaths of male smokers, would 60,000 still be alive had they eschewed
"coffin nails" throughout their lifetimes?  My mother was 91 years
old when she died a year ago and chain smoked since her college days.
She defended the tobacco companies for years saying, "it didn't hurt
me."  She outlived most of her doctors.  Upon quoting statistics and
research on the subject, her view was that I, like other do-gooders
and non-smokers, wanted to deny smokers their rights.  Obviously,
there is a health connection.  How strong that connection is, is what
makes this a unique statistical conundrum.






Re: probability that Xi >= X1...Xn

2001-06-23 Thread Richard Beldin



You say the X1...Xn are independent. Are they also identically distributed?
If not, you will have some very cumbersome expressions. If we use f(Xk) as
the density and F(Xk) as the cdf of the k'th r.v., then (in the identically
distributed case) the density for the largest (which we call U) is
n*F(U)^(n-1)*f(U).
That is, the size of the sample times the (n-1) power of the CDF times the
density at U. The most complete reference on such issues is Sarhan and
Greenberg's Contributions to Order Statistics (John Wiley and Sons, 1962).

Fabio Ulisse Pardi wrote:

 Can anybody give me a hint about this problem?

 Let the random variables X1,...,Xn be independent and let M be the index
 of the maximum among them (i.e. M=i implies Xi >= X1,...,Xn).
 We want to find nice formulas that calculate the distribution of M from
 the distributions of X1...Xn, that we suppose belonging to the same
 class
 of distributions:
 for instance if we assume that all of X1...Xn are normally distributed,
 with parameters (m1,v1),...,(mn,vn), we would like to obtain a formula
 of the kind
Pr[M=i] = Fi(m1,...,mn,v1,...,vn)
 for every i=1..n.

 The problem is that the integral that calculates Pr[M=i] is quite
 complicated, and I haven't figured out how to express its value as a
 simple function of the parameters.
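For the normal case the integral is straightforward to evaluate
numerically; a minimal sketch (the function name and parameter values
are mine, purely illustrative):

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def prob_index_of_max(means, sds, i):
    # Pr[M = i] = integral of f_i(x) * prod_{j != i} F_j(x) dx,
    # with f_j, F_j the pdf/cdf of each (independent) normal X_j.
    def integrand(x):
        p = norm.pdf(x, means[i], sds[i])
        for j in range(len(means)):
            if j != i:
                p *= norm.cdf(x, means[j], sds[j])
        return p
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

means, sds = [0.0, 0.5, 1.0], [1.0, 1.0, 1.0]
probs = [prob_index_of_max(means, sds, i) for i in range(3)]
print(probs, sum(probs))   # the three probabilities sum to ~1

There is no simple closed form in the parameters, which is the
difficulty; numerical quadrature sidesteps it.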







Re: Marijuana

2001-06-23 Thread Tetsuo

in article [EMAIL PROTECTED], David C. Ullrich at
[EMAIL PROTECTED] wrote on 23-06-2001 16:06:

 On Fri, 22 Jun 2001 20:49:02 GMT, Steve Leibel [EMAIL PROTECTED]
 wrote:

 Hallucinating?  On pot?  What are YOU smokin'?  Pot doesn't cause
 hallucinations 
 
 Where are you getting your facts from here? You've obviously
 never seen Reefer Madness or you wouldn't spout nonsense like
 this.

Normal pot doesn't cause hallucinations; exceptions have to be made for
allergies to it, or for abnormal-potency pot (artificially enhanced,
sprayed with LSD, etc.). Setting both of these factors aside, reefer
madness is a myth, sorry to disappoint you.

 I wonder if there's any data about correlation between alcohol
 use and traffic fatalities? Probably not, I certainly don't see
 why there would be any connection. I mean if alcohol were
 more dangerous than pot in just about any way a person could
 name that would mean that the laws in this country were all
 backwards.

Alcohol *is* more dangerous than pot. The laws are made to create a perfect
balance between social and economic wealth (although the first invokes the
latter), so alcohol and tobacco are forced into it, so *yes* they are
backwards (that is, in your country). Wake up, please. About the driving,
the other poster already supplied enough resources, but it has to be said
that I have yet to meet an adequate doctor or psychologist who shares
your opinion on this.


-- 
   ---
  | Sig v.1.1 |
  |___|
 /   /|
:-=-=-=-=-=-: |___
|  Tetsuo   |/   /|
|-=-=-=-=-=-|-=-=-=-=-=-=-=-=-=-: |__
| 101001011 | ICQ# : ask me |/  /|
| 001010010 |-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=: |
| 011010011 | Artpage : http://zap.to/m_mortier| |
|  |/
 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=








Re: Marijuana

2001-06-23 Thread Tetsuo

in article [EMAIL PROTECTED], Tetsuo at
[EMAIL PROTECTED] wrote on 24-06-2001 00:17:

 in article [EMAIL PROTECTED], David C. Ullrich at
 [EMAIL PROTECTED] wrote on 23-06-2001 16:06:
 
 On Fri, 22 Jun 2001 20:49:02 GMT, Steve Leibel [EMAIL PROTECTED]
 wrote:
 
 Hallucinating?  On pot?  What are YOU smokin'?  Pot doesn't cause
 hallucinations 
 
 Where are you getting your facts from here? You've obviously
 never seen Reefer Madness or you wouldn't spout nonsense like
 this.
 
 Normal pot doesn't cause hallucinations; exceptions have to be made for
 allergies to it, or for abnormal-potency pot (artificially enhanced,
 sprayed with LSD, etc.). Setting both of these factors aside, reefer
 madness is a myth, sorry to disappoint you.
 
 I wonder if there's any data about correlation between alcohol
 use and traffic fatalities? Probably not, I certainly don't see
 why there would be any connection. I mean if alcohol were
 more dangerous than pot in just about any way a person could
 name that would mean that the laws in this country were all
 backwards.
 
 Alcohol *is* more dangerous than pot. The laws are made to create a perfect
 balance between social and economic wealth (although the first invokes the
 latter), so alcohol and tobacco are forced into it, so *yes* they are
 backwards (that is, in your country). Wake up, please. About the driving,
 the other poster already supplied enough resources, but it has to be said
 that I have yet to meet an adequate doctor or psychologist who shares
 your opinion on this.
 

Sorry for that indeed; people actually have this kind of opinion on this
sometimes, so I assumed I'd encountered just another one and got irritated. I
should've realized the poster would not spout such stupidity in a serious
manner though, of course... heh, certainly not in this ng.

Well, sorry again


-- 
   ---
  | Sig v.1.1 |
  |___|
 /   /|
:-=-=-=-=-=-: |___
|  Tetsuo   |/   /|
|-=-=-=-=-=-|-=-=-=-=-=-=-=-=-=-: |__
| 101001011 | ICQ# : ask me |/  /|
| 001010010 |-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=: |
| 011010011 | Artpage : http://zap.to/m_mortier| |
|  |/
 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=








Re: Marijuana

2001-06-23 Thread Chas F Brown



David C. Ullrich wrote:
 
 On Thu, 21 Jun 2001 21:14:44 -0700, Chas F Brown
 [EMAIL PROTECTED] wrote:
 
 

snip

 That seems to be the type of correlation that was reported here - some
 distribution of MJ smoking, and its *temporal* correlation with heart
 attacks.
 
 Now, that says exactly nothing about whether MJ use increases or
 decreases the likelihood of having a heart attack in general (it could in
 fact in general *decrease* heart attacks, even in our data set);
 
 That's exactly right. When I say that there's nothing we can conclude
 from the data given I didn't mean there's _nothing_ we can conclude,
 rather nothing we can conclude _concerning_ the question of
 whether smoking increases the risk of a heart attack.
 

I'm with you there.

 I don't see how we can even quite conclude that the risk of a heart
 attack is higher among users immediately after smoking, for various
 reasons: I doubt that most users' use is uniformly distributed
 during the 24 hours of the day, 

Well, we all have to sleep _sometime_ :) ...

 I have no idea whether heart
 attacks are uniformly distributed throughout the day, so it could
 well be that the times people tend to smoke are the same as the
 times they tend to have heart attacks. 

In the back-of-envelope calculations I did, this is really the key
missing information. If heart attacks are evenly distributed through the
day, while MJ smoking (as far as I know!) clearly isn't for most users,
then the temporal correlation is going to be a lot more marked.

 Or they tend to smoke
 before meals (I knew some people like that years ago in college)
 and tend to have heart attacks after meals. Or they tend to
 smoke when they start to feel little chest pains, as someone
 suggested.
 

But you're reading something into what I said that I didn't say - I'm
not saying that the data imply that smoking _causes_ an increased
risk of heart attack in the hour after smoking (although this evidence
would support further investigation of whether that _may_ be the case).

I'm saying that it is certainly not unreasonable that the risk is
increased during the hour after smoking. That may very well be because,
purely coincidentally, you habitually puff in exactly the same time when
heart attacks are, for unrelated reasons, most likely to occur. Or
because you have intimations of oncoming heart attack, or for any other
reason.

But we don't need to know the reason for this temporal correlation. I
noted only that without further information, it is logical to be aware
of this temporal correlation, and take action appropriately. That action
wasn't "stop smoking ganja!", but just an awareness of the higher risk
during this time period - perhaps taking extra precautions such as being
near medical assistance (for those for whom this is not just an
intellectual exercise).

 Then even if it _is_ true that a smoker is more likely to
 have a heart attack immediately after smoking a joint, that
 does _NOT_ show that smoking increases the risk! Could be
 as you say that it actually decreases the risk, but regardless
 the time immediately after smoking is the riskiest time.
 

Yah! That's what I was saying! And what does it make sense to do during
the riskiest time? Take actions that reduce your risk. (If we assume
that the report correctly correlated for the obvious elements of heart
attack temporal distribution and MJ usage temporal distribution...etc.)

 So it seems clear to me that there is _nothing_ we can conclude
 about whether smoking increases the risk of a heart attack -
 it also seems clear that that is _the_ question of interest
 here.
 

For me personally, I agree (and I think the reason why this study was so
widely reported was with the unjustified implication in mind). But for
somebody with MS and possibly a whole bunch of other related health
problems, they might have a different perspective.

snip of reasonable description of why "Just Say NO!" is just sad, sad,
sad

 Weaker (but
 much less interesting) conclusions about correlations
 might be possible, but there are _so_ many ways
 that a correlation could exist by accident that
 I don't see why one would care. (Unless one was
 planning on blurring the distinction between
 correlation and causation for political reasons...)
 

Ding! Gee, what kind of modern military/industrial/health care system
could _possibly_ want to do that?

 What I want to know is why alcohol and tobacco are legal:
 
 http://www.drugwarfacts.org/causes.htm
 http://www.drugwarfacts.org/addictiv.htm
 

Gosh! Don't take away my frosty cold malty one, too! (I promise I won't
drive!) And just let me stub out this cigarette before I continue...

snip

 
 David C. Ullrich
 *
 Sometimes you can have access violations all the
 time and the program still works. (Michael Caracena,
 comp.lang.pascal.delphi.misc 5/1/01)

The scary thing is - he's right. (Ooops! Netscape just locked up - time
to reboot again...)

Cheers - Chas


Re: Marijuana

2001-06-23 Thread Jake Wildstrom

Damnit, I promised I wouldn't get involved in this absurd and
off-topic thread, but I've got to set the record straight here:

In article [EMAIL PROTECTED],
Tetsuo  [EMAIL PROTECTED] wrote:
Normal pot doesn't cause hallucinations, exceptions have to be made with
allergies towards it, or not normal potency pot (artificially enhanced,
sprayed with lsd, etc...). If both of these factors are not taken in
consideration, reefer madness is a myth, sorry to disappoint you.

Two things:

(1) High dosage marijuana _can_ cause hallucinations, most commonly
auditory, in a significant minority of the population. This is not
an allergy issue, but a sensitivity to particular psychoactive
effects. Not everyone will experience such, but they definitely
are experienced by some.

(2) Marijuana is never laced with LSD. That's a waste of perfectly good
LSD. Heat-vaporization is not a usable means of LSD ingestion, simply
because LSD breaks down at high temperatures.

+--First Church of Briantology--Order of the Holy Quaternion--+
| A mathematician is a device for turning coffee into |
| theorems.  -Paul Erdos  |
+-+
|   Jake Wildstrom|
+-+





Re: Marijuana

2001-06-22 Thread Steve Leibel

In article [EMAIL PROTECTED],
 [EMAIL PROTECTED] (Eamon) wrote:

 (c) Reduced motor co-ordination, e.g. when driving a car
 

Numerous studies have shown that marijuana actually improves driving 
ability.  It makes people more attentive and less aggressive.  You could 
look it up.





Re: Marijuana

2001-06-22 Thread Rich Ulrich

On Fri, 22 Jun 2001 18:45:52 GMT, Steve Leibel [EMAIL PROTECTED]
wrote:

 In article [EMAIL PROTECTED],
  [EMAIL PROTECTED] (Eamon) wrote:
 
  (c) Reduced motor co-ordination, e.g. when driving a car
  
 
 Numerous studies have shown that marijuana actually improves driving 
 ability.  It makes people more attentive and less aggressive.  You could 
 look it up.

An intoxicant does *that*?  

I think I recall in the literature, that people getting 
stoned, on whatever, occasionally  *think*  that 
their reaction time or sense of humor or other 
performance is getting better.   

Improving your driving by getting mildly stoned 
(omitting the episodes of hallucinating)
seems unlikely enough, to me, 
that  *I*  think the burden of proof is on the stranger named Steve.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Normality in Factor Analysis

2001-06-22 Thread Robert Ehrlich

Calculation of eigenvalues and eigenvectors requires no assumption.
However evaluation of the results IMHO implicitly assumes at least a
unimodal distribution and reasonably homogeneous variance for the same
reasons as ANOVA or regression.  So think of the consequences of calculating
means and variances of a strongly bimodal distribution where no sample
occurs near the mean and all samples are tens of standard deviations
from the mean.
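A two-line illustration of that warning (hypothetical data):

import numpy as np

rng = np.random.default_rng(1)
# Strongly bimodal: two tight modes far apart.
x = np.concatenate([rng.normal(-50, 1, 500), rng.normal(50, 1, 500)])

print(x.mean())                    # ~0, a region containing no data at all
print(np.abs(x - x.mean()).min())  # nearest observation is dozens of
                                   # component standard deviations away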

 Hi,

 I have a question regarding factor analysis: Is normality an important
 precondition for using factor analysis?

 If not, are there any books that justify this?






Re: Marijuana

2001-06-22 Thread Steve Leibel

In article [EMAIL PROTECTED],
 Rich Ulrich [EMAIL PROTECTED] wrote:

 On Fri, 22 Jun 2001 18:45:52 GMT, Steve Leibel [EMAIL PROTECTED]
 wrote:
 
  In article [EMAIL PROTECTED],
   [EMAIL PROTECTED] (Eamon) wrote:
  
   (c) Reduced motor co-ordination, e.g. when driving a car
   
  
  Numerous studies have shown that marijuana actually improves driving 
  ability.  It makes people more attentive and less aggressive.  You could 
  look it up.
 
 An intoxicant does *that*?  
 
 I think I recall in the literature, that people getting 
 stoned, on whatever, occasionally  *think*  that 
 their reaction time or sense of humor or other 
 performance is getting better.   
 
 Improving your driving by getting mildly stoned 
 (omitting the episodes of hallucinating)
 seems unlikely enough, to me, 
 that  *I*  think the burden of proof is the stranger named Steve.



Hallucinating?  On pot?  What are YOU smokin'?  Pot doesn't cause 
hallucinations -- although a lot of anti-drug hysteria certainly does.

A cursory web search turned up these links among many others to support 
my statement.  Naturally this subject is controversial and there are 
lots of conflicting studies.  The consensus is that at worst pot causes 
minor driving impairment similar to many prescription medications.  At 
least one study showed that pot users had FEWER fatal crashes than non 
users!  

And stranger named Steve?  I've been on this newsgroup since 1995.  
Not as famous as James Harris, maybe, but certainly no stranger.

This is a small sample of what came up when I entered "marijuana
driving" into Google.  Read and learn.

http://www.norml.org/canorml/myths/myth1.shtml 

http://www.reconsider.org/issues/marijuana/driving.htm

http://www.cannabisnews.com/news/thread1016.shtml

http://www.marijuana-hemp.com/cin/facts/drivehi.shtml  "When the data
were analyzed, cannabis consumers actually showed a lower likelihood of
being involved in a fatal crash than that of a drug-free control group,
though the difference was not judged to be statistically significant."

http://www.hoboes.com/pub/Prohibition/Drug%20Information/Marijuana/Drivin
g/Driving

http://www.taima.org/en/driving.htm  "It was of some interest that
cannabis tended to show a negative effect on relative risk when other
drug groups showed an increase."

http://www.norml.org.nz/norml/Marijuana/Driving.htm#abc981014

Steve L





Re: Marijuana

2001-06-22 Thread David C. Ullrich

On Thu, 21 Jun 2001 21:14:44 -0700, Chas F Brown
[EMAIL PROTECTED] wrote:



David C. Ullrich wrote:
 
 On Fri, 15 Jun 2001 15:23:03 +0100, Paul Jones
 [EMAIL PROTECTED] wrote:
 
 David C. Ullrich wrote:
 
  But analyzing it this way simply makes no sense. Those
  trials you're talking about are _far_ from independent;
  each trial is associated with a particular person, and
  there will be a very strong correlation between various
  trials for the same person at different hours.
 
 Okay then, how should it be analysed?
 
 I've explained at least twice why I do not believe it
 is _possible_ to draw the sort of inference you want
 to draw from the data you've given us. You must
 be reading _some_ of those posts or you wouldn't
 keep replying.
 

Well, although I've agreed with most of your complaints about trying to
derive any information from the scanty data shown, there is *something*
we can notice about the data set which has some relevance.

Let's say we look at a sampling of 100 people who have both had heart
attacks within the last year and have smoked an aspirin an average of
once a week during that year.

Now, without knowing the average percentage of people who smoke
aspirin each year, and the average percentage of people who have heart
attacks each year without smoking aspirin, these numbers alone would be
pretty useless.

But if 95% of the people in the data set had their 1 heart attack inside
of 1 minute after smoking an aspirin, you'd have some reason to further
examine the hypothesis that, for some segment of the population, smoking
an aspirin could trigger a heart attack. (Of course it could also be
that impending heart attacks bring on the desire to smoke aspirin, or
some other hypothesis that correlates the two phenomena).

On the other hand, one would expect that if there were no immediate
correlation between smoking aspirin and heart attacks, the average time
between smoking aspirin and heart attack would be more like 1/2 week.
This would then indicate that it was not particularly worthwhile to
investigate an immediate link between aspirin smoking and heart
attacks.

That seems to be the type of correlation that was reported here - some
distribution of MJ smoking, and its *temporal* correlation with heart
attacks.

Now, that says exactly nothing about whether MJ use increases or
decreases the likelihood of having a heart attack in general (it could in
fact in general *decrease* heart attacks, even in our data set);

That's exactly right. When I say that there's nothing we can conclude
from the data given I didn't mean there's _nothing_ we can conclude,
rather nothing we can conclude _concerning_ the question of
whether smoking increases the risk of a heart attack.

I don't see how we can even quite conclude that the risk of a heart
attack is higher among users immediately after smoking, for various
reasons: I doubt that most users' use is uniformly distributed
during the 24 hours of the day, I have no idea whether heart
attacks are uniformly distributed throughout the day, so it could
well be that the times people tend to smoke are the same as the
times they tend to have heart attacks. Or they tend to smoke
before meals (I knew some people like that years ago in college)
and tend to have heart attacks after meals. Or they tend to
smoke when they start to feel little chest pains, as someone
suggested.

Then even if it _is_ true that a smoker is more likely to
have a heart attack immediately after smoking a joint, that
does _NOT_ show that smoking increases the risk! Could be
as you say that it actually decreases the risk, but regardless
the time immediately after smoking is the riskiest time.

So it seems clear to me that there is _nothing_ we can conclude
about whether smoking increases the risk of a heart attack -
it also seems clear that that is _the_ question of interest
here.

Not that I'm claiming that it _is_ the case that smoking
decreases the risk of heart attack although the hour
immediately afterwards is the riskiest time. I have no
reason to think that's so. Also no reason to think it's
not so: People who assume such a thing is ridiculous 
think so because they've classified the world into
Good things and Bad things - actual things in the world
are not that simple:

(i) Aspirin is a Good thing. Good for pain and fever relief,
and actually an aspirin a day helps prevent heart attack
or stroke, I forget which. The reason I forget which is
it's irrelevant to me: For me aspirin is a Very Bad thing,
because of other medical problems.

(ii) Alcohol is a Bad thing. Except for that bit about
how a glass of red wine a day is good for you, in terms
of risk of heart attack or stroke, again I forget which.
Alas, it doesn't follow that a quart of whiskey a day
is good for you.

Given that there _are_ plenty of legitimate medical
uses for marijuana and given that the interaction between
the body and chemicals is simply _not_ a matter of some
chemicals Good and some Bad, the idea that 

Re: calculation of an effect size with medians

2001-06-22 Thread Konrad Halupka

Marc wrote:
 
 As a part of a report
 I have to perform a meta-analysis of
 some clinical trials.
 These trials report the median effect in the
 treatment group and the median effect in the control group
 (days of hospitalization). P-values from Mann-Whitney U-Tests
 are reported and the numbers of patients in treatment
 and control.
 My Question: How can I calculate an effect size
 (e.g. median difference between treatment and
 control) and confidence intervals with that data?

You'd need raw data to calculate the effect size for the Mann-Whitney test.
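One rough workaround sometimes used in meta-analysis when only a two-sided
p-value and the group sizes are reported (a sketch; this recovers an
approximate standardized effect r, not the median difference asked
about, and raw data remain preferable):

from math import sqrt
from scipy.stats import norm

def approx_r_from_p(p_two_sided, n1, n2):
    # Convert the reported two-sided p to |z|, then r = z / sqrt(N).
    z = norm.isf(p_two_sided / 2)
    return z / sqrt(n1 + n2)

print(approx_r_from_p(0.03, 40, 38))   # ~0.25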

Regards,
Konrad





Re: Marijuana

2001-06-21 Thread David C. Ullrich

On Fri, 15 Jun 2001 15:23:03 +0100, Paul Jones
[EMAIL PROTECTED] wrote:

David C. Ullrich wrote:
 
 But analyzing it this way simply makes no sense. Those
 trials you're talking about are _far_ from independent;
 each trial is associated with a particular person, and
 there will be a very strong correlation between various
 trials for the same person at different hours.

Okay then, how should it be analysed?

I've explained at least twice why I do not believe it
is _possible_ to draw the sort of inference you want
to draw from the data you've given us. You must
be reading _some_ of those posts or you wouldn't
keep replying.

Take care,
Paul
All About MS - the latest MS News and Views
http://www.mult-sclerosis.org/

David C. Ullrich
***
Sometimes you can have access violations all the
time and the program still works. (Michael Caracena,
comp.lang.pascal.delphi.misc 5/1/01)





Re: a form of censoring I have not met before

2001-06-21 Thread Rich Ulrich

On 21 Jun 2001 00:35:11 -0700, [EMAIL PROTECTED] (Margaret
Mackisack) wrote:

 I was wondering if anyone could direct me to a reference about the 
 following situation. In a 3-factor experiment, measurements of a continuous 
 variable, which is increasing monotonically over time, are made every 2 
 hours from 0 to 192 hours on the experimental units (this is an engineering 
 experiment). If the response exceeds a set maximum level the unit is not 
 observed any more (so we only know that the response is > that level). If
 the measuring equipment could do so it would be preferred to observe all 
 units for the full 192 hours. The time to censoring is of no interest as 
 such, the aim is to estimate the form of the response for each unit which 
 is the trace of some curve that we observe every 2 hours. Ignoring the 
 censored traces in the time period after they are censored puts a huge 

Well, it certainly *sounds*  as if the time to censoring should be 
of great interest, if you had an adequate model.

Thus, when you say that ignoring them gives "a huge
downward bias," it sounds to me as if you are admitting that
you do not have an acceptable model.

Who can you blame for that?  What leverage do you 
have, if you try to toss out those bad results?  (Surely, 
you do have some ideas about forming estimates
that *do*  take the hours into account.  The problem
belongs in the hands of someone who does.)

 - maybe you want to segregate trials into the ones
with 192 hours, or less than 192 hours; and figure two 
(Maximum Likelihood) estimates for the parameters, which
you then combine.


 downward bias into the results and is clearly not the thing to do although 
 that's what has been done in the past with these experiments. Any 
 suggestions of where people have addressed data of this or related form 
 would be very gratefully received.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
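A minimal sketch of the kind of censored likelihood alluded to above,
assuming (purely for illustration) a linear growth curve with Gaussian
noise and a known ceiling c; uncensored points contribute the density,
censored points the probability of exceeding c:

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_lik(params, t, y, censored, c):
    a, b, log_s = params
    mu = a + b * t
    s = np.exp(log_s)
    ll = norm.logpdf(y[~censored], mu[~censored], s).sum()   # observed
    ll += norm.logsf(c, mu[censored], s).sum()               # beyond ceiling
    return -ll

# Hypothetical data: true curve 1 + 0.5*t, readings every 2 h, ceiling 60.
rng = np.random.default_rng(2)
t = np.arange(0, 192, 2.0)
y_true = 1 + 0.5 * t + rng.normal(0, 3, t.size)
c = 60.0
censored = y_true > c
y = np.where(censored, c, y_true)

res = minimize(neg_log_lik, x0=[0.0, 1.0, 1.0],
               args=(t, y, censored, c), method="Nelder-Mead")
print(res.x[:2])   # intercept and slope, without the downward bias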





Re: Marijuana

2001-06-21 Thread Chas F Brown



David C. Ullrich wrote:
 
 On Fri, 15 Jun 2001 15:23:03 +0100, Paul Jones
 [EMAIL PROTECTED] wrote:
 
 David C. Ullrich wrote:
 
  But analyzing it this way simply makes no sense. Those
  trials you're talking about are _far_ from independent;
  each trial is associated with a particular person, and
  there will be a very strong correlation between various
  trials for the same person at different hours.
 
 Okay then, how should it be analysed?
 
 I've explained at least twice why I do not believe it
 is _possible_ to draw the sort of inference you want
 to draw from the data you've given us. You must
 be reading _some_ of those posts or you wouldn't
 keep replying.
 

Well, although I've agreed with most of your complaints about trying to
derive any information from the scanty data shown, there is *something*
we can notice about the data set which has some relevance.

Let's say we look at a sampling of 100 people who have both had heart
attacks within the last year and have smoked an aspirin an average of
once a week during that year.

Now, without knowing the average percentage of people who smoke
aspirin each year, and the average percentage of people who have heart
attacks each year without smoking aspirin, these numbers alone would be
pretty useless.

But if 95% of the people in the data set had their 1 heart attack inside
of 1 minute after smoking an aspirin, you'd have some reason to further
examine the hypothesis that, for some segment of the population, smoking
an aspirin could trigger a heart attack. (Of course it could also be
that impending heart attacks bring on the desire to smoke aspirin, or
some other hypothesis that correlates the two phenomena).

On the other hand, one would expect that if there were no immediate
correlation between smoking aspirin and heart attacks, the average time
between smoking aspirin and heart attack would be more like 1/2 week.
This would then indicate that it was not particularly worthwhile to
investigate an immediate link between aspirin smoking and heart
attacks.
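A toy simulation of that baseline (all numbers invented): if the event
strikes at a random moment and exposure happens once a week, the gap back
to the most recent exposure averages about half a week.

import numpy as np

rng = np.random.default_rng(3)
week_hours = 7 * 24
n = 100_000

phase = rng.uniform(0, week_hours, n)   # weekly exposure at a random phase
event = rng.uniform(0, week_hours, n)   # independent random event time
gap = (event - phase) % week_hours      # hours since the last exposure

print(gap.mean() / 24)   # ~3.5 days, i.e. about half a week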

That seems to be the type of correlation that was reported here - some
distribution of MJ smoking, and its *temporal* correlation with heart
attacks.

Now, that says exactly nothing about whether MJ use increases or
decreases the likelihood of having a heart attack in general (it could in
fact in general *decrease* heart attacks, even in our data set); but
instead would say, there is a segment of the population for whom MJ use
is followed by a high likelihood of a heart attack.

Would those people have had a heart attack anyway? Is this some small
segment of the population that reacts this way? These questions would
still remain without any further figures.

Even in the absence of this data, though, one might want to take some
precautions during the hour following MJ usage, for those with an
otherwise high likelihood of heart attack, such as staying near medical
facilities, etc.

Cheers - Chas

---
C Brown Systems Designs
Multimedia Environments for Museums and Theme Parks
---





Re: comparing 2 slopes

2001-06-20 Thread Tracey Continelli

mccovey@psych [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]...
 in article [EMAIL PROTECTED], Tracey
 Continelli at [EMAIL PROTECTED] wrote on 6/13/01 4:14 PM:
 
  Mike Tonkovich [EMAIL PROTECTED] wrote in message
  news:3b20f210_1@newsfeeds...
  Was hoping someone might be able to confirm that my approach for comparing 2
  slopes was correct.
  
  I ran an analysis of covariance using PROC GLM (in SAS) with an interaction
  statement.  My understanding was that a nonsignificant interaction term
  meant that the slopes were the same, and vice versa for a significant
  interaction term.  Is this correct and is this the best way to approach this
  problem with SAS?  Any help would certainly be appreciated.
  
  Mike Tonkovich
  
  --
  Michael J. Tonkovich, Ph.D.
  Wildlife Research Biologist
  ODNR, Division of Wildlife
  [EMAIL PROTECTED]
  
  The slopes need not be the same if the interaction term is
  non-significant, BUT, the difference between them will not be
  statistically significant.  If the differences between the slopes *are*
  statistically significant, this will be reflected in a statistically
  significant product term.  I have preferred using regression analyses
  with interaction terms, which can be easily incorporated by simply
  multiplying the variables together and then running the regression
  equation with each independent variable plus the product term [which
  is simply another name for the interaction term].  The results are
  much more straightforward in my mind.
  
  Tracey Continelli
  SUNY at Albany
 
 
 I agree completely but there can be problems interpreting the regression
 Output (e.g., mistakes like talking about main effects).  For advice on
 avoiding the common interpretation pitfalls, see
 
 Aiken & West (1991).  Multiple regression: Testing and interpreting
 interactions.  Sage.
 
 Irwin & McClelland (2001).  In Journal of Marketing Research.
 
 Gary McClelland
 Univ of Colorado


Quite so.  Once you add the product term, the interpretation changes,
and the parameter estimates are now known as simple main effects. 
The interpretation is pretty straightforward however.  The parameter
estimate, or slope, for your focal independent variable in the
interaction model simply represents the effect of your independent
variable upon your dependent variable when your moderator variable is
equal to zero, holding constant all other independent variables in
your model.  The same may be said for the slope of your moderator
variable - it represents the effect of that variable upon your
dependent variable when your focal independent variable is equal to
zero.  Because in my research [the social science variety] that
information isn't terribly useful [because most of the time you won't
realistically see the moderator variable at zero, i.e., a zero crime
rate or a zero poverty rate], what I will do is a mean centering
trick.  I'll subtract the mean from the moderator variable, rerun the
equation with the new mean centered variable and product term, and NOW
the parameter estimates of the simple main effects are meaningful for
me.  Now, when I look at the parameter estimates of the focal
independent variable, it is telling me the effect of that independent
variable upon the dependent variable when my moderator variable is at
its mean.  The actual product term remains identical to the original
equation [of course], but now the simple main effects are
realistically meaningful.  I'll also apply the same technique for when
the moderator variable is 2 standard deviations below the mean, 1
below the mean, all the way up to 2 standard deviations above the
mean.  This gives one a nice graphic sense of the way in which the
slope between your focal independent variable and your dependent
variable changes with successive changes in your moderator variable.
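A small numerical illustration of the centering trick (simulated data;
all names and coefficients invented):

import numpy as np

rng = np.random.default_rng(4)
n = 1_000
x = rng.normal(0, 1, n)            # focal independent variable
z = rng.normal(5, 2, n)            # moderator, nowhere near zero
y = 1 + 2 * x + 0.5 * z + 0.8 * x * z + rng.normal(0, 1, n)

def ols(design, y):
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

ones = np.ones(n)
b_raw = ols(np.column_stack([ones, x, z, x * z]), y)
zc = z - z.mean()
b_cen = ols(np.column_stack([ones, x, zc, x * zc]), y)

print(b_raw[1])             # effect of x when z = 0: ~2 (rarely meaningful)
print(b_cen[1])             # effect of x at the mean of z: ~2 + 0.8*5 = ~6
print(b_raw[3], b_cen[3])   # the product term itself is unchanged: ~0.8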


Tracey Continelli
Doctoral candidate
SUNY at Albany





Re: trimming data

2001-06-20 Thread dennis roberts

At 11:24 AM 6/20/01 -0500, Mike Granaas wrote:

A colleague has approached me about locating references discussing the
trimming of data, with primary emphasis on psychological research.  He is
primarily interested in books/chapters/articles that emphasize the when
and how.

I am at a loss on this one and was wondering if anyone could offer a
couple of references.

other than what some software programs do ... i don't have ready references 
... but, the notion is that for some distributions ... particularly with 
some outliers at ONE end ... if you trim say 5% from each end ... it will 
reduce the impact on your descriptive stats of the outliers ...

in minitab, there is a trimmed mean that you get as part of the DESCRIBE 
command which axes 5% from each end and THEN finds the mean for the middle 
90% ...
if you think about it ... you can trim different % values from the ends ... 
and, if you did a full trim of 50% from EACH end ... you are at the median!

clearly, the more you trim the data, the narrower the data set is ...

one should only consider trimming in the broader context of are there 
outliers and if there are, what (if anything) should we do about them? in 
some cases ... you do nothing since, from all accounts, the data are 
legitimate values ... but, if you find BAD data at the ends (due to 
miskeying, scoring error, etc.), then the first thing is to justify WHAT 
values to eliminate if any ...




Thanks,

Michael

***
Michael M. Granaas
Associate Professor[EMAIL PROTECTED]
Department of Psychology
University of South Dakota Phone: (605) 677-5295
Vermillion, SD  57069  FAX:   (605) 677-6604
***
All views expressed are those of the author and do not necessarily
reflect those of the University of South Dakota, or the South
Dakota Board of Regents.




_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm






Re: trimming data

2001-06-20 Thread dennis roberts



here is some help info from minitab about trimmed means ...
===
Trimmed mean

The trimmed mean (TrMean) is like the mean, but it excludes the most 
extreme values in the data set. The highest and lowest 5% of the values 
(rounded to the nearest integer) are dropped, and the mean is calculated 
for the remaining values.
For the precipitation data, 5% of 11 observations is 0.55, which rounds to 
1. Thus, the highest value and the lowest value are dropped, and the mean 
is calculated for the remaining data:
 1  2  2  3  3  3  3  4  4  5  10

This yields a value of 3.222. Like the median, the trimmed mean is less 
sensitive to extreme values than the mean. For example, the trimmed mean of 
this data set would be 3.222 even if there were 30 days with precipitation 
in April instead of 10.
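The same rule takes only a few lines of Python (a sketch, following the
round-to-nearest-integer convention described above):

def trimmed_mean(values, prop=0.05):
    # Drop round(prop * n) values from each end, then average the rest.
    xs = sorted(values)
    k = int(prop * len(xs) + 0.5)          # 5% of 11 -> 0.55 -> 1
    kept = xs[k:len(xs) - k] if k else xs
    return sum(kept) / len(kept)

print(trimmed_mean([1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 10]))   # 3.222...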

© All Rights Reserved. 2000 Minitab, Inc.
==

keep in mind that if the data set is symmetrical ... then, trimming really 
accomplishes nothing ... when it comes to the mean ... even if there are 
extreme values ...

in a seriously + skewed distribution ... then trimming (for the mean) will 
back up the mean more to the LEFT ... compared to non trimming ... and just 
the opposite for a seriously - skewed distribution ...

as i said earlier, trimming will necessarily DECREASE the variability ... 






Re: trimming data

2001-06-20 Thread Gary McClelland

in article [EMAIL PROTECTED], Mike
Granaas at [EMAIL PROTECTED] wrote on 6/20/01 10:56 AM:

 
 A colleague has approached me about locating references discussing the
 trimming of data, with primary emphasis on psychological research.  He is
 primarily interested in books/chapters/articles that emphasize the when
 and how.
 
 I am at a loss on this one and was wondering if anyone could offer a
  couple of references.
 


McClelland, G.H. (2000).  Nasty data: Unruly, ill-mannered observations can
ruin your analysis.  In H.T. Reis & C.M. Judd (Eds.), Handbook of research
methods in social and personality psychology.  [Chpt. 15]

Judd, C.M., & McClelland, G.H. (1989).  Data analysis: A model comparison
approach.  HBJ.  [see Chpt. 9]

Madansky, A. (1988).  Prescriptions for working statisticians.
Springer-Verlag.

Atkinson, A.C. (1985).  Plots, transformations, and regression: An
introduction to graphical methods of diagnostic regression analysis.
Clarendon Press.

Gary McClelland
Univ of Colorado






Re: comparing 2 slopes

2001-06-20 Thread Gary McClelland

in article [EMAIL PROTECTED], Tracey
Continelli at [EMAIL PROTECTED] wrote on 6/20/01 7:06 AM:

 mccovey@psych [EMAIL PROTECTED] wrote in message
 news:[EMAIL PROTECTED]...
 in article [EMAIL PROTECTED], Tracey
 Continelli at [EMAIL PROTECTED] wrote on 6/13/01 4:14 PM:
 
 Mike Tonkovich [EMAIL PROTECTED] wrote in message
 news:3b20f210_1@newsfeeds...
 Was hoping someone might be able to confirm that my approach for comparing
 2
 slopes was correct.
 
 I ran an analysis of covariance using PROC GLM (in SAS) with an interaction
 statement.  My understanding was that a nonsignificant interaction term
 meant that the slopes were the same, and vice versa for a significant
 interaction term.  Is this correct and is this the best way to approach
 this
  problem with SAS?  Any help would certainly be appreciated.
 
 Mike Tonkovich
 
 --
 Michael J. Tonkovich, Ph.D.
 Wildlife Research Biologist
 ODNR, Division of Wildlife
 [EMAIL PROTECTED]
 
 The slopes need not be the same if the interaction term is
 non-significant, BUT, the difference between them will not be
  statistically significant.  If the differences between the slopes *are*
 statistically significant, this will be reflected in a statistically
 significant product term.  I have preferred using regression analyses
 with interaction terms, which can be easily incorporated by simply
 multiplying the variables together and then running the regression
 equation with each independent variable plus the product term [which
 is simply another name for the interaction term].  The results are
 much more straightforward in my mind.
 
 Tracey Continelli
 SUNY at Albany
 
 
 I agree completely but there can be problems interpreting the regression
 Output (e.g., mistakes like talking about main effects).  For advice on
 avoiding the common interpretation pitfalls, see
 
  Aiken & West (1991).  Multiple regression: Testing and interpreting
 interactions.  Sage.
 
  Irwin & McClelland (2001).  In Journal of Marketing Research.
 
 Gary McClelland
 Univ of Colorado
 
 
 Quite so.  Once you add the product term, the interpretation changes,
 and the parameter estimates are now known as simple main effects.
 The interpretation is pretty straightforward however.  The parameter
 estimate, or slope, for your focal independent variable in the
 interaction model simply represents the effect of your independent
 variable upon your dependent variable when your moderator variable is
 equal to zero, holding constant all other independent variables in
 your model.  The same may be said for the slope of your moderator
 variable - it represents the effect of that variable upon your
 dependent variable when your focal independent variable is equal to
 zero.  Because in my research [the social science variety] that
 information isn't terribly useful [because most of the time you won't
 realistically see the moderator variable at zero, i.e., a zero crime
 rate or a zero poverty rate], what I will do is a mean centering
 trick.  I'll subtract the mean from the moderator variable, rerun the
 equation with the new mean centered variable and product term, and NOW
 the parameter estimates of the simple main effects are meaningful for
 me.  Now, when I look at the parameter estimates of the focal
 independent variable, it is telling me the effect of that independent
 variable upon the dependent variable when my moderator variable is at
 its mean.  The actual product term remains identical to the original
 equation [of course], but now the simple main effects are
 realistically meaningful.  I'll also apply the same technique for when
 the moderator variable is 2 standard deviations below the mean, 1
 below the mean, all the way up to 2 standard deviations above the
 mean.  This gives one a nice graphic sense of the way in which the
 slope between your focal independent variable and your dependent
 variable changes with successive changes in your moderator variable.
 
 
 Tracey Continelli
 Doctoral candidate
 SUNY at Albany


I hope everyone in the social sciences using product terms or moderator
regression reads Tracey's thoughtful comments above.  Failing to realize
that the coefficient for one of the components of a product is the effect of
that variable when the other variable of the product is zero is one of my
candidates for the most common statistical error in the social sciences.  Mean
centering is indeed quite useful, even if one does not have products in the
model.  Also note that mean centering will always reduce the correlation
between the product and its components and if the component distributions
are symmetric it will reduce it to zero.  There always exists a change of
origin for the components that will make the correlation zero; hence, the
colinearity warnings when testing products are not meaningful.
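A quick check of that claim with simulated symmetric components
(hypothetical numbers):

import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(10, 2, 100_000)    # symmetric, but far from zero
z = rng.normal(-3, 1, 100_000)

print(np.corrcoef(x, x * z)[0, 1])      # far from zero with raw scores
xc, zc = x - x.mean(), z - z.mean()
print(np.corrcoef(xc, xc * zc)[0, 1])   # ~0: centering removes the
                                        # colinearity with the product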

Gary McClelland
Univ of Colorado




Re: Help me, please!

2001-06-19 Thread Rich Ulrich

On 18 Jun 2001 01:18:37 -0700, [EMAIL PROTECTED] (Monica De Stefani)
wrote:

 1) Are there some conditions under which I can apply normality to Kendall
 tau?

tau is *lumpy*  in its distribution for N less than 10.

And all rank-order statistics are a bit problematic when 
you try to use them on rating scales with just a few discrete
scores -- the tied values give you bad scaling intervals, 
and the estimate of variance won't be very good, either.
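The lumpiness is easy to see by enumeration; for n = 5 only eleven
distinct tau values are attainable (a sketch, assuming no ties):

from collections import Counter
from itertools import permutations

def kendall_tau(perm):
    # Tau of x = 0..n-1 against y = perm: (concordant - discordant)
    # pairs over n*(n-1)/2.
    n = len(perm)
    s = sum(1 if perm[j] > perm[i] else -1
            for i in range(n) for j in range(i + 1, n))
    return s / (n * (n - 1) / 2)

taus = Counter(round(kendall_tau(p), 2) for p in permutations(range(5)))
print(sorted(taus))   # -1.0, -0.8, ..., 0.8, 1.0: a lumpy support indeed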

For correlations, your assumption of 'normality' is usually
applied to the values at zero.

 I was wondering if x's observations must be
 independent and y's observations must be independent to apply
 asymptotically normal limiting
 distribution. 
 (null hypothesis = x and y are independent).
 Could you tell me something about?

 - Independence is needed for just about any test.

I started to say (as a minor piece of exaggeration) that 
independence is needed absolutely;  
but the correct statement, I think, is that independence
is always demanded  relative to the error term.

[ snip, non-linear?]

Monotonic is the term.


[ snip, T(z):  I don't know what that is.]

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Consistency quotation

2001-06-19 Thread Jay Warner


G. B. Shaw - Pygmalion
My Fair Lady, maybe too.
"I can tell a woman's age in half a minute - and I do." Surely
H. Higgins prided himself on consistency :)
Jay
[EMAIL PROTECTED] wrote:
I remember reading something like the following:
 "Consistency alone is not necessarily a virtue.
 One can be consistently obnoxious."
I believe it was in a discussion to an RSS read paper,
maybe from about 30 years ago, but I have not been able
to find it again. A web-search for "consistently obnoxious"
taught me more about asbestos corks than I care to know,
but was otherwise unhelpful.
Can anyone provide the source, or at least a lead?
 Many thanks, Ewart Shaw.
--
J.E.H.Shaw [Ewart Shaw]
[EMAIL PROTECTED] TEL: +44 2476 523069
 Department of Statistics, University of Warwick,
Coventry CV4 7AL, U.K.
 http://www.warwick.ac.uk/statsdept/Staff/JEHS/
The opposite of a profound truth is not also a profound truth.

--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
 North Green Bay Road
Racine, WI 53404-1216
USA
Ph: (262) 634-9100
FAX: (262) 681-1133
email: [EMAIL PROTECTED]
web: http://www.a2q.com
The A2Q Method (tm) -- What do you want to improve today?








Re: Marijuana

2001-06-19 Thread James Hunter



W. D. Allen Sr. wrote:

 There is medical research that shows marijuana is more lethal than tobacco
 regarding lung cancer.

 Maybe there is a correlation between lung cancer susceptibility and heart
 attacks? We know there is for tobacco!

   We know there is a correlation between alcohol, doctors' bills,
   and the tuition for Big Fart science schools and heart attacks too!
   But why is that always swept under the rug by probability theory geniuses?








Re: comparing 2 slopes

2001-06-19 Thread [EMAIL PROTECTED]

in article [EMAIL PROTECTED], Tracey
Continelli at [EMAIL PROTECTED] wrote on 6/13/01 4:14 PM:

 Mike Tonkovich [EMAIL PROTECTED] wrote in message
 news:3b20f210_1@newsfeeds...
 Was hoping someone might be able to confirm that my approach for comparing 2
 slopes was correct.
 
 I ran an analysis of covariance using PROC GLM (in SAS) with an interaction
 statement.  My understanding was that a nonsignificant interaction term
 meant that the slopes were the same, and vice versa for a significant
 interaction term.  Is this correct and is this the best way to approach this
  problem with SAS?  Any help would certainly be appreciated.
 
 Mike Tonkovich
 
 --
 Michael J. Tonkovich, Ph.D.
 Wildlife Research Biologist
 ODNR, Division of Wildlife
 [EMAIL PROTECTED]
 
 The slopes need not be the same if the interaction term is
 non-significant, BUT, the difference between them will not be
  statistically significant.  If the differences between the slopes *are*
 statistically significant, this will be reflected in a statistically
 significant product term.  I have preferred using regression analyses
 with interaction terms, which can be easily incorporated by simply
 multiplying the variables together and then running the regression
 equation with each independent variable plus the product term [which
 is simply another name for the interaction term].  The results are
 much more straightforward in my mind.
 
 Tracey Continelli
 SUNY at Albany


I agree completely but there can be problems interpreting the regression
Output (e.g., mistakes like talking about main effects).  For advice on
avoiding the common interpretation pitfalls, see

Aiken & West (1991).  Multiple regression: Testing and interpreting
interactions.  Sage.

Irwin & McClelland (2001).  In Journal of Marketing Research.

Gary McClelland
Univ of Colorado






Re: multivariate techniques for large datasets

2001-06-18 Thread Art Kendall

you might want to go to http://www.pitt.edu/~csna/
and then cross-post your question to CLASS-L

The Classification Society meeting this weekend had a lot of discussion of
these topics.

My first question is whether you intend to interpret the clusters?

If so, what is the nature of the 500 variables?
What is the nature of your cases?
What does the set of cases represent?
How much data is missing? What kinds of missing data do you have?
What do you want to do with the cluster reults?
Are you interested in a tree or a simple clustering?


Many users of clustering use data reduction techniques such as factor
analysis to summarize the variability of the 500 with a smaller number of
dimensions.
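One common pipeline along those lines, sketched with scikit-learn (the
imputation strategy, number of components, and number of clusters are all
invented placeholders, not recommendations):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for the 100,000 x 500 matrix with missing cells.
rng = np.random.default_rng(6)
X = rng.normal(size=(1_000, 500))
X[rng.random(X.shape) < 0.05] = np.nan    # 5% missing at random

pipe = make_pipeline(
    SimpleImputer(strategy="mean"),   # crude fill-in for missing values
    PCA(n_components=20),             # summarize the 500 variables
    KMeans(n_clusters=5, n_init=10),  # cluster in the reduced space
)
labels = pipe.fit_predict(X)
print(np.bincount(labels))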



srinivas wrote:

 Hi,

   I have a problem in identifying the right multivariate tools to
 handle a dataset of dimension 100,000 x 500. The problem is further
 complicated by a lot of missing data. Can anyone suggest a way to
 reduce the data set and also to estimate the missing values? I need to
 know which clustering tool is appropriate for grouping the
 observations (based on the 500 variables).






Re: Probability Of an Unknown Event

2001-06-18 Thread Art Kendall

The only time I can think of this being meaningful is in determining what size
sample to draw.  If we don't have any prior information about what
proportion of events in a population have a particular characteristic (the
probability of a characteristic), then we assume the worst case (widest
variance) of 50%.
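That worst case is what drives the familiar sample-size formula; a sketch
(95% confidence, margin of error e):

from math import ceil
from scipy.stats import norm

def sample_size(margin, p=0.5, conf=0.95):
    # n = z^2 * p * (1 - p) / e^2; p = 0.5 maximizes p(1 - p).
    z = norm.isf((1 - conf) / 2)
    return ceil(z**2 * p * (1 - p) / margin**2)

print(sample_size(0.03))          # 1068 under the worst-case p = 0.5
print(sample_size(0.03, p=0.1))   # far fewer if we know p is near 0.1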

W. D. Allen Sr. wrote:

 It's been years since I was in school so I do not remember if I have the
 following statement correct.

 Pascal said that if we know absolutely nothing
 about the probability of occurrence of an event
 then our best estimate for the probability of
 occurrence of that event is one half.

 Do I have it correctly? Any guidance on a source reference would be greatly
 appreciated!

 Thanks,

 WDA

 [EMAIL PROTECTED]

 end






Re: 3rd degree polynom curve fitting, correlation needed

2001-06-18 Thread Paige Miller

Matti Overmark wrote:

 I have fitted a 3rd-degree curve to a sample (least squares method), and
 I want to compare this particular R2 with that of
 a (similarly) fitted 2nd-degree polynomial.

I can assure you that the 3rd degree polynomial will fit as well or
better than the 2nd degree polynomial, as measured by R-squared. If you
want a statistical test to test the hypothesis that the 3rd degree model
yields a significantly better fit compared to the second degree model,
then you should do an extra-sums-of-squares test, as explained in the
fine textbook by Draper and Smith, Applied Regression Analysis.
 
 I want to see which of the two models is the best.
 Any suggestion of a good book?

A plot would work just fine, if you want to see how the models fit.
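A sketch of that extra-sums-of-squares comparison on simulated data
(everything here is invented for illustration):

import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(7)
x = np.linspace(0, 10, 60)
y = 1 + 0.5 * x - 0.2 * x**2 + rng.normal(0, 1, x.size)   # truly quadratic

def rss(deg):
    coef = np.polyfit(x, y, deg)
    return np.sum((y - np.polyval(coef, x)) ** 2)

rss2, rss3 = rss(2), rss(3)
df_extra, df_resid = 1, x.size - 4     # one extra term; n - 4 residual df
F = ((rss2 - rss3) / df_extra) / (rss3 / df_resid)
print(F, f_dist.sf(F, df_extra, df_resid))   # large p: cubic adds nothing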

-- 
Paige Miller
Eastman Kodak Company
[EMAIL PROTECTED]

It's nothing until I call it! -- Bill Klem, NL Umpire
When you get the choice to sit it out or dance,
   I hope you dance -- Lee Ann Womack





Re: Probability Of an Unknown Event

2001-06-18 Thread W. D. Allen Sr.

Thanks Robert!

WDA

end

- Original Message -
From: Robert J. MacG. Dawson [EMAIL PROTECTED]
To: W. D. Allen Sr. [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Sunday, June 17, 2001 6:35 PM
Subject: Re: Probability Of an Unknown Event




 W. D. Allen Sr. wrote:
 
  It's been years since I was in school so I do not remember if I have the
  following statement correct.
 
  Pascal said that if we know absolutely nothing
  about the probability of occurrence of an event
  then our best estimate for the probability of
  occurrence of that event is one half.
 
 

 [snipped]








Re: Probability Of an Unknown Event

2001-06-18 Thread Rich Ulrich

On Sat, 16 Jun 2001 23:05:52 GMT, W. D. Allen Sr.
[EMAIL PROTECTED] wrote:

 It's been years since I was in school so I do not remember if I have the
 following statement correct.
 
 Pascal said that if we know absolutely nothing
 about the probability of occurrence of an event
 then our best estimate for the probability of
 occurrence of that event is one half.
 
 Do I have it correctly? Any guidance on a source reference would be greatly
 appreciated!

I did a little bit of Web searching and could not find that.

Here is an essay about Bayes, which (dis)credits him and his
contemporaries as assuming something like that, years before Laplace.

I found it with a google search on
"know absolutely nothing" probability.

 http://web.onetel.net.uk/~wstanners/bayes.htm

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Probability Of an Unknown Event

2001-06-18 Thread Richard Beldin

The problem comes because there is often no unique way of defining events. It
is hard to think of a real example where we literally know nothing. The
equal probability answer is often just a cop-out for not thinking about what
we do know.






Maximum likelihood Was: Re: Factor Analysis

2001-06-18 Thread Herman Rubin

In article [EMAIL PROTECTED],
Ken Reed  [EMAIL PROTECTED] wrote:
It's not really possible to explain this in lay person's terms. The
difference between principal component analysis and common factor analysis is
roughly that PCA uses raw scores, whereas factor analysis uses scores
predicted from the other variables and does not include the residuals.
That's as close to lay terms as I can get.

I have never heard a simple explanation of maximum likelihood estimation,
but --  MLE compares the observed covariance matrix with a  covariance
matrix predicted by probability theory and uses that information to estimate
factor loadings etc that would 'fit' a normal (multivariate) distribution.

MLE factor analysis is commonly used in structural equation modelling, hence
Tracey Continelli's conflation of it with SEM. This is not correct though.

I'd love to hear simple explanation of MLE!

MLE is triviality itself, if you do not make any attempt to
state HOW it is to be carried out.

For each possible value X of the observation, and each state
of nature \theta, there is a probability (or density with 
respect to some base measure) P(X | \theta).  There is no
assumption that X is a single real number; it can be anything;
the same holds about \theta.

What MLE does is to choose the \theta which makes P(X | \theta)
as large as possible.  That is all there is to it.
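
To make that concrete, a toy sketch in Python (scipy assumed; the
exponential sample and its parameterization are made up for illustration),
maximizing P(X | theta) numerically:

# Sketch of MLE: optimize theta to maximize the likelihood
# P(X | theta) for a made-up exponential sample.  scipy assumed.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
X = rng.exponential(scale=2.0, size=200)      # true theta (mean) = 2.0

def neg_log_lik(theta):
    # density f(x|theta) = exp(-x/theta)/theta, so
    # -log L = n*log(theta) + sum(x)/theta
    return X.size * np.log(theta) + X.sum() / theta

res = minimize_scalar(neg_log_lik, bounds=(0.01, 100.0), method="bounded")
print(res.x, X.mean())   # the maximizer coincides with the sample mean

Here the maximizing theta is, as theory says, just the sample mean.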

-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Help me, please!

2001-06-18 Thread Glen Barnett


Monica De Stefani [EMAIL PROTECTED] wrote in message
[EMAIL PROTECTED]">news:[EMAIL PROTECTED]...
 2) Can Kendall discover nonlinear dependence?

He used to be able to, but he died.

(Look at how Kendall's tau is calculated. Notice that it is
not affected by any monotonic increasing transformation. So
Kendall's tau measures monotonic association - the tendency
of two variables to be in the same order.)
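
To see the invariance numerically, a short Python sketch (scipy assumed;
the data are made up):

# Sketch: Kendall's tau is unchanged by a monotone increasing
# transformation of either variable.  scipy assumed; data made up.
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = x**3 + rng.normal(scale=0.5, size=100)    # monotone, nonlinear link

tau_raw, p_raw = kendalltau(x, y)
tau_exp, p_exp = kendalltau(x, np.exp(y))     # exp() is monotone increasing
print(tau_raw, tau_exp)                       # identical values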

Glen





=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: 3rd degree polynom curve fitting, correlation needed

2001-06-18 Thread Mike Granaas


Judd & McClelland, _Data Analysis: A Model Comparison Approach_, chapter 8.

MG

On 18 Jun 2001, Matti Overmark wrote:

 Hi group!
 
 I'm new to this group, so... just so you know.
 
 I have fitted a 3rd-degree curve to a sample (least-squares method), and 
 I want to compare this particular R2 with that of
 a (similarly) fitted 2nd-degree polynomial.
 
 I want to see which of the two models is the best.
 Any suggestion of a good book?
 
 Thanks in advance,
 Matti Ö.
 
 
 =
 Instructions for joining and leaving this list and remarks about
 the problem of INAPPROPRIATE MESSAGES are available at
   http://jse.stat.ncsu.edu/
 =
 

***
Michael M. Granaas
Associate Professor[EMAIL PROTECTED]
Department of Psychology
University of South Dakota Phone: (605) 677-5295
Vermillion, SD  57069  FAX:   (605) 677-6604
***
All views expressed are those of the author and do not necessarily
reflect those of the University of South Dakota, or the South
Dakota Board of Regents.



=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Normality in Factor Analysis

2001-06-17 Thread Herman Rubin

In article 9gg7ht$qa3$[EMAIL PROTECTED],
haytham siala [EMAIL PROTECTED] wrote:
Hi,

I have a question regarding factor analysis: Is normality an important
precondition for using factor analysis?

If no, are there any books that justify this.

Factor analysis is quite robust against non-normality.
The essential factor structure is little affected by it
at all, although the representation may get somewhat
sensitive if data-dependent normalizations are used, such
as using correlations rather than covariances, or forcing
normalization on the covariance matrix of the factors.

Some of this is in my paper with Anderson in the
Proceedings of the Third Berkeley Symposium.  The result
on the asymptotic distribution, not at all difficult to
derive, is in one of my abstracts in _Annals of
Mathematical Statistics_, 1955.  It is basically this:

Suppose the factor model is 

x = \Lambda f + s,

f the common factors and s the specific factors.  Further
suppose that f and s, and also the elements of s, are
uncorrelated, and there is adequate normalization and
smooth identification of the model by the elements of
\Lambda alone.  Now estimate \Lambda, M, the covariance
matrix of f, and S, the diagonal covariance matrix of s.
Assuming the usual assumptions for asymptotic normality of
the sample covariances of the elements of f with s, and of
the pairs of different elements of s, the asymptotic
distribution of the estimates of \Lambda and the SAMPLE
values of M and S from their actual values will have the
expected asymptotic joint normal distribution.  This makes
no assumption about the distribution of M and S about 
their expected values, which is the main place where there
is an effect of normality. 
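
A small simulation is consistent with this robustness. The sketch below
(numpy and scikit-learn assumed; it uses sklearn's generic FactorAnalysis
fit, not the estimator analyzed above) draws markedly non-normal common
factors and still recovers the loading structure:

# Sketch: simulate x = Lambda f + s with non-normal (uniform) common
# factors and see that the estimated loadings recover the structure.
# numpy and scikit-learn assumed; an illustration, not the result itself.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
n, p, k = 5000, 6, 1
Lam = np.array([[0.9, 0.8, 0.7, 0.0, 0.0, 0.0]]).T      # p x k loadings
f = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, k))   # var 1, non-normal
s = rng.normal(scale=0.5, size=(n, p))                  # specific factors
x = f @ Lam.T + s

fa = FactorAnalysis(n_components=k).fit(x)
print(np.round(fa.components_, 2))   # close to Lam.T, up to sign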



-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: meta-analysis

2001-06-17 Thread Rich Ulrich

On 17 Jun 2001 04:34:26 -0700, [EMAIL PROTECTED] (Marc)
wrote:

 I have to summarize the results of some clinical trials.
 Unfortunately the reported information is not complete.
 The information given in the trials contain:
 
 (1) Mean effect in the treatment group (days of hospitalization)
 
 (2) Mean effect in the control group (days of hospitalization)
 
 (3) Numbers of patients in the control and treatment group
 
 (4) p-values of a t-test (between the differences of treatment
 and control)
 My question:
 How can I calculate the variance of treatment difference which I need
 to perform meta-analysis? Note that the numbers of patients in the

Aren't you going too far?  You said you have to summarize.
Well, summarize.  The difference is in terms of days.  
Or it is in terms of percentage of increase.

And you have the t-test and p-values.  

You might be right in what you propose, but I think
you are much more likely to produce a useful report 
if you keep it simple.

You are right; meta-analyses are complex.  And a 
majority of the published ones are (in my opinion) awful.
--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Marijuana

2001-06-17 Thread Rich Ulrich

On 15 Jun 2001 02:04:36 -0700, [EMAIL PROTECTED] (Eamon) wrote:

[ snip, Paul Jones.  About marijuana statistics.]

 
 Surely this whole research is based upon a false premise. Isn't it
 like saying that 90%, say, of heroin users previously used soft drugs.
 Therefore, soft-drug use usually leads to hard-drug use - which does
 not logically follow. (A => B =/=> B => A)
 
 Conclusions drawn from the set of people who have had heart attacks
 cannot be validly applied to the set of people who smoke dope.
 Rather than collect data from a large number of people who had heart
 attacks and look for a backward link, they should monitor a large
 number of people who smoke dope. But, of course this is much more
 expensive.

It is much more expensive, but it is also totally stupid to carry out
the expensive research if the *cheap* and lousy research didn't
give you a hint that there might be something going on.

The numbers that he was asking about do pass the simple
test.  I mean, there were not 1 million people contributing one
hour each, but we should still ask, *Would*  this say something?
If it would not, then the whole question is *totally*  arid.  The 2x2
table is approximately
(dividing the first column by 100; and subtracting from a total):
10687   and  124
   175   and  9

That gives a contingency test of 21.2 or 18.2, with p-values 
under .001.  The Odds Ratio on that is 4.4.
That is pretty convincing that there is SOMETHING
going on, POSSIBLY something that merits an explanation.  
The expectation for the cell with 9 is just 2.2 -- the tiny cell is
the cell that matters for contributions to the test -- which is why it
is okay to lop the hundreds off the first column (to make it
readable).
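
For anyone who wants to check that arithmetic, a short Python sketch
(scipy assumed) reproduces the figures just quoted:

# Sketch: reproduce the contingency test and odds ratio quoted above.
# scipy assumed; the table is exactly the one in the text.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[10687, 124],
                  [  175,   9]])

chi2_plain, p1, _, expected = chi2_contingency(table, correction=False)
chi2_yates, p2, _, _ = chi2_contingency(table, correction=True)
odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])

print(round(chi2_plain, 1), round(chi2_yates, 1))  # 21.2 and 18.2
print(round(odds_ratio, 1))                        # 4.4
print(round(expected[1, 1], 1))                    # 2.2, the tiny cell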

Now, you may return to your discussion of why the table is
not any good, and what is needed for a proper test.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: meta-analysis

2001-06-17 Thread Donald Burrill

On 17 Jun 2001, Marc wrote (edited):

 I have to summarize the results of some clinical trials.
 The information given in the trials contain:
 
 Mean effects (days of hospitalization) in treatment & control groups; 
 numbers of patients in the groups;  p-values of a t-test (of the 
 difference between treatment and control) .
 My question:  How can I calculate the variance of the treatment 
 difference, which I need to perform meta-analysis?  Note that the 
 numbers of patients in the groups are not equal.  
 Is it possible to do it like this:
 
 s^2 = (difference between contr and treatm)^2/ ((1/n1+1/n2)*t^2)

Yes, if you know t.  If all you know is that p < alpha for some alpha, 
you then know only that t > the t corresponding to alpha (AND you need to 
know whether the test had been one-sided or two-sided -- of course, you 
need to know that in any case), you can substitute that corresponding t 
to obtain an upper bound on s^2 -- ASSUMING that the t was calculated 
using a pooled variance (your s^2), not using the expression for separate 
variances in the denominator:  (s1^2/n1 + s2^2/n2).

Note that this s^2 is NOT the variance of the treatment difference, 
which you said you wanted to know;  it is the pooled variance estimate 
of the variance within each group.  
 The variance of the difference in treatment means, which _may_ be what 
you are interested in, would be 

(difference)^2 / t^2 

with the same caveats concerning what you know about t.

 How exact would such an approximation be?

Depends on the precision with which  p  was reported.
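
In code, the back-calculation looks like this (a Python sketch, scipy
assumed; the difference, group sizes, and p-value are made-up
placeholders):

# Sketch: recover t from a reported two-sided p-value and back out the
# pooled within-group variance and the variance of the mean difference,
# per the formulas above.  scipy assumed; the numbers are made up.
from scipy.stats import t as t_dist

diff = 1.8           # reported mean difference (days), hypothetical
n1, n2 = 40, 35      # group sizes, hypothetical
p_two = 0.03         # reported two-sided p, hypothetical

df = n1 + n2 - 2
t_val = t_dist.ppf(1 - p_two / 2, df)          # |t| implied by the p-value

s2_pooled = diff**2 / ((1/n1 + 1/n2) * t_val**2)
var_of_diff = diff**2 / t_val**2

print(t_val, s2_pooled, var_of_diff)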

 
 Donald F. Burrill [EMAIL PROTECTED]
 184 Nashua Road, Bedford, NH 03110  603-471-7128



=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: individual item analysis

2001-06-17 Thread Rich Ulrich

On 15 Jun 2001 14:24:39 -0700, [EMAIL PROTECTED] (Doug
Sawyer) wrote:

 I am trying to locate a journal article or textbook that addresses
 whether or not exam quesitons can be normalized, when the questions are
 grouped differently.  For example, could a question bank be developed
 where any subset of questions could be selected, and the assembled exam
 is normalized?
 
 What is name of this area of statistics?  What authors or keywords would
 I use for such a search?  Do you know whether or not this can be done?


I believe that they do this sort of thing in scholastic achievement
tests, as a matter of course.  Isn't that how they make the transition
from year to year?  I guess this would be norming.

A few weeks ago, I discovered that there is a whole series of
tech-reports put out by one of the big test companies.  I would 
look back to it, for this sort of question.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Factor Analysis

2001-06-17 Thread Ken Reed

It's not really possible to explain this in lay person's terms. The
difference between principal components analysis and common factor analysis is
roughly that PCA uses raw scores, whereas factor analysis uses scores
predicted from the other variables and does not include the residuals.
That's as close to lay terms as I can get.

I have never heard a simple explanation of maximum likelihood estimation,
but --  MLE compares the observed covariance matrix with a  covariance
matrix predicted by probability theory and uses that information to estimate
factor loadings etc that would 'fit' a normal (multivariate) distribution.

MLE factor analysis is commonly used in structural equation modelling, hence
Tracey Continelli's conflation of it with SEM. This is not correct though.

I'd love to hear simple explanation of MLE!



 From: [EMAIL PROTECTED] (Tracey Continelli)
 Organization: http://groups.google.com/
 Newsgroups: sci.stat.consult,sci.stat.edu,sci.stat.math
 Date: 15 Jun 2001 20:26:48 -0700
 Subject: Re: Factor Analysis
 
 Hi there,
 
 would someone please explain in lay person's terms the difference
 betwn.
 principal components, commom factors, and maximum likelihood
 estimation
 procedures for factor analyses?
 
 Should I expect my factors obtained through maximum likelihood
 estimation
 tobe highly correlated?  Why?  When should I use a Maximum likelihood
 estimation procedure, and when should I not use it?
 
 Thanks.
 
 Rita
 
 [EMAIL PROTECTED]
 
 
 Unlike the other methods, maximum likelihood allows you to estimate
 the entire structural model *simultaneously* [i.e., the effects of
 every independent variable upon every dependent variable in your
 model].  Most other methods only permit you to estimate the model in
 pieces, i.e., as a series of regressions whereby you regress every
 dependent variable upon every independent variable that has an arrow
 directly pointing to it.  Moreover, maximum likelihood actually
 provides a statistical test of significance, unlike many other methods
 which only provide generally accepted cut-off points but not an actual
 test of statistical significance.  There are very few cases in which I
 would use anything except a maximum likelihood approach, which you can
 use in either LISREL or if you use SPSS you can add on the module AMOS
 which will do this as well.
 
 
 Tracey



=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Factor Analysis

2001-06-16 Thread Alexandre Moura

Dear Haytham,

Another issue concerning a measure of the latent construct is
unidimensionality.  Hair et al. (1998): "unidimensionality is an assumption
underlying the calculation of reliability and is demonstrated when
indicators of a construct have acceptable fit on a
single-factor (one-dimensional) model. (...) The use of reliability measures,
such as Cronbach's alpha, does not ensure unidimensionality but instead assumes
it exists. The researcher is encouraged to perform unidimensionality tests
on all multiple-indicator constructs before assessing their reliability."

This reference is very important:

Gerbing, David W., & Anderson, James C. An updated paradigm for scale
development incorporating unidimensionality and its assessment.

Best regards,

Alexandre Moura.
P.S. Please accept my apologies for my English mistakes.



- Original Message -
From: haytham siala [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, June 15, 2001 5:40 PM
Subject: Factor Analysis


 Hi,
 I will appreciate if someone can help me with this question: if factors
 extracted from a factor analysis were found to be reliable (using an
 internal consistency test like a Cronbach alpha), can they be used to
 represent a measure of the latent construct? If yes, are there any
 references or books that justify this technique?






 =
 Instructions for joining and leaving this list and remarks about
 the problem of INAPPROPRIATE MESSAGES are available at
   http://jse.stat.ncsu.edu/
 =




=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Factor Analysis

2001-06-16 Thread Alexandre Moura

The complete reference:

Gerbing, David W., & Anderson, James C. An updated paradigm for scale
development incorporating unidimensionality and its assessment. Journal of
Marketing Research. Vol. XXV (May 1988).

Alexandre Moura.

- Original Message -
From: Alexandre Moura [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Saturday, June 16, 2001 9:26 AM
Subject: Re: Factor Analysis


 Dear Haytham,

 Another issue concerning a measure of the latent construct is
 unidimensionality.  Hair et al. (1998): "unidimensionality is an assumption
 underlying the calculation of reliability and is demonstrated when
 indicators of a construct have acceptable fit on a
 single-factor (one-dimensional) model. (...) The use of reliability measures,
 such as Cronbach's alpha, does not ensure unidimensionality but instead assumes
 it exists. The researcher is encouraged to perform unidimensionality tests
 on all multiple-indicator constructs before assessing their reliability."

 This reference is very important:

 Gerbing, David W., & Anderson, James C. An updated paradigm for scale
 development incorporating unidimensionality and its assessment.

 Best regards,

 Alexandre Moura.
 P.S. Please accept my apologies for my English mistakes.



 - Original Message -
 From: haytham siala [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Sent: Friday, June 15, 2001 5:40 PM
 Subject: Factor Analysis


  Hi,
  I will appreciate if someone can help me with this question: if factors
  extracted from a factor analysis were found to be reliable (using an
  internal consistency test like a Cronbach alpha), can they be used to
  represent a measure of the latent construct? If yes, are there any
  references or books that justify this technique?
 
 
 
 
 
 
  =
  Instructions for joining and leaving this list and remarks about
  the problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
  =




 =
 Instructions for joining and leaving this list and remarks about
 the problem of INAPPROPRIATE MESSAGES are available at
   http://jse.stat.ncsu.edu/
 =



=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Marijuana

2001-06-16 Thread W. D. Allen Sr.

There is medical research that shows marijuana is more lethal than tobacco
regarding lung cancer.

Maybe there is a correlation between lung cancer susceptibility and heart
attacks? We know there is for tobacco!

WDA

end

Paul Jones [EMAIL PROTECTED] wrote in message
[EMAIL PROTECTED]">news:[EMAIL PROTECTED]...
 There was some research recently linking heart attacks with
 Marijuana smoking.

 I'm trying to work out the correlation and, most
 importantly, its statistical significance.

 In essence the problem comes down to:

 Of 8760 hours in a year, 124 had heart attacks in them, 141
 had MJ smokes in them and 9 had both.

 What statistical tests apply?
 Most importantly, what is the statistical significance of
 the correlation between smoking MJ in any hour and having a
 heart attack in that same hour?
 What is the probablity that the null hypothesis (that
 smoking marijuana and having a heart attack are unrelated)
 can be rejected?
 How reliable are the results from a dataset of this size?

 I'm not very literate in maths and stats - please help me
 out someone. I'm interested in this research from the
 perspective of medicinal marijuana.

 Thanks and take care,
 Paul
 All About MS - the latest MS News and Views
 http://www.mult-sclerosis.org/




=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Normality in Factor Analysis

2001-06-16 Thread Eric Bohlman

In sci.stat.consult haytham siala [EMAIL PROTECTED] wrote:
 I have a question regarding factor analysis: Is normality an important
 precondition for using factor analysis?

It's necessary for testing hypotheses about factors extracted by 
Joreskog's maximum-likelihood method.  Otherwise, no.

 If no, are there any books that justify this.

Any book on factor analysis or multivariate statistics in general.



=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Marijuana

2001-06-16 Thread Jake Wildstrom

In article XhRW6.14316$[EMAIL PROTECTED],
W. D. Allen Sr. [EMAIL PROTECTED] wrote:
There is medical research that shows marijuana is more lethal than tobacco
regarding lung cancer.

Thanks for playing, but sorry, no.

There's a lot of research which says a lot of different things about
marijuana's deleterious effects on the lungs. Off the top of my head:

A Berkeley study of the late '70s concluded that marijuana is
one-and-a-half times as carcinogenic as tobacco. This assessment took
into account _only_ quantities of tar. Tar, while a carcinogen, is not
the primary cancer-causing agent in tobacco, or even close; polonium
210 and lead 210 are considerably more hazardous and conspicuously
absent from marijuana. Add to this the fact that marijuana smokers are
unlikely to consume nearly as much net weight smokable material as
tobacco smokers, and you're talking apples and oranges.

Actual tests on real live people bear this out. Multiple population
samples show no correlation between marijuana use exclusive of tobacco
use and lung cancer:

Tashkin, D.P. et al, Longitudinal Changes in Respiratory Symptoms and
Lung Function in Non-smokers, Tobacco Smokers, and Heavy, Habitual
Smokers of Marijuana With or Without Tobacco, pp 25-36 in G. Chesher
et al (eds), Marijuana: an International Research Report, Canberra:
Australian Government Publishing Service (1988).

Sherrill, D.L. et al, Respiratory Effects of Non-Tobacco Cigarettes:
A Longitudinal Study in General Population, International Journal of
Epidemiology 20: 132-37 (1991).

Fligiel, S.E.G. et al, Bronchial Pathology in Chronic Marijuana
Smokers: A Light Electron Microscope Study, Journal of Psychoactive
Drugs 20:33-42 (1988).

Maybe there is a correlation between lung cancer susceptibility and heart
attacks? We know there is for tobacco!

Well, inhaling smoke of _any_ sort actually puts some strain on your
heart. I believe specific toxins in tobacco exacerbate the problem,
but it's present for all types of smokables.


Of course, we're very off-topic here. Anyone want to crosspost this
thread to sci.med.*, or talk.politics.drugs?

+--First Church of Briantology--Order of the Holy Quaternion--+
| A mathematician is a device for turning coffee into |
| theorems.  -Paul Erdos  |
+-+
|   Jake Wildstrom|
+-+


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Pooled relative risks

2001-06-15 Thread David Duffy

JFC [EMAIL PROTECTED] wrote:

 I am concerned in the way of calculating  pooled relative risks, since
 it is interesting in some *meta-analytical* applications.

See eg Rothman Modern Epidemiology P 196 et seq.


-- 
| David Duffy. ,-_|\
| email: [EMAIL PROTECTED]  ph: INT+61+7+3362-0217 fax: -0101/ *
| Epidemiology Unit, The Queensland Institute of Medical Research \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia v 


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Marijuana

2001-06-15 Thread Paul Jones

David C. Ullrich wrote:
 
 considerable benefit for neurogenic bladder problems,
 
 I did not know that, but I know that the topic is of considerable
 interest to people with various other conditions. 

Yes, recent work at the National Hospital of Neurology and
Neurosurgery in London, UK has shown that two cannabinoids
administered in a spray considerably reduce urinary
frequency and the number of times PwMS have to get up to pee
during the night (a big problem). The researcher I was
talking to said that there are cannabinoid receptors in the
bladder and the cortex but not in the micturition control
areas of the brainstem nor in the spinal cord.

 As is the
 fact that the Supreme Court seems to have decided that pi = 3
 again...

More like -6.

 Here I get a little lost again. Exactly what does it mean
 to say the relative risk is 4.8?

I assumed it meant event A happened 4.8 times as much as
would be expected if the two events were unrelated.

 And here again I'm _totally_ lost.

Okay, put it like this:

Of 1086240 trials, A happened in 17484 of them, B happened
in 124 and both A and B happened in 9.

I really need to know how to how to calculate the
statistical implications here. Please someone help me!

 What I want to know is what is the correlation between these
 two event?
 Most importantly, how statistically significant is the
 result?
 Can any reasonable conclusions be drawn from these data -
 esp, in view of the small dataset size?

Take care,
Paul
All About MS - the latest MS News and Views
http://www.mult-sclerosis.org/


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Marijuana

2001-06-15 Thread Paul Jones

David Petry wrote:
 
 Keep in mind that correlation is not the same as causation.
 
 That's of particular importance in a study like this one.
 
 That is, if people are taking marijuana to treat pain and
 general discomfort, and if heart attacks are preceded by
 pain and discomfort, then there will be a strong correlation
 between marijuana use and later heart attacks, but it
 won't be proof of causation.

I know the study is flawed in ever so many ways. I just want
to get at the statistical implications. I wish I hadn't
mentioned marijuana or the trial. 

Please help me to find the appropriate statistical test
(e.g. two-tailed t-test, Spearman Rank correlation, chi2
test or whatever) and help me work out the statistical
significance of any correlation between events A and B
where:

In 1086240 trials, A happened in 17484 of them, B happened
in 124 and both A and B happened in 9.

Is there a statistical association between A and B? How
significant is that association? I would be ever so grateful
if someone could help.

Take care,
Paul
All About MS - the latest MS News and Views
http://www.mult-sclerosis.org/


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Marijuana

2001-06-15 Thread Eamon

Paul Jones [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]...

 snip
 
 So the research says that of a large number of people who
 had heart attacks at a centre, 124 people had used MJ in the
 year preceding the HA. Of these 9 reported that they had
 used MJ in the hour preceding the HA. All MJ users were
 questioned on the frequency with which they used MJ. The
 relative risk was reported as 4.8 - I used this to
 back-calculate that the average number of MJ usages per year
 rounded to 141 - (9/n)/(115/(8760-n)) = 4.8

snip


Surely this whole research is based upon a false premise. Isn't it
like saying that 90%, say, of heroin users previously used soft drugs.
Therefore, soft-drug use usually leads to hard-drug use - which does
not logically follow. (A => B =/=> B => A)

Conclusions drawn from the set of people who have had heart attacks
cannot be validly applied to the set of people who smoke dope.
Rather than collect data from a large number of people who had heart
attacks and look for a backward link, they should monitor a large
number of people who smoke dope. But, of course this is much more
expensive.


Just my humble tupennyworth,
Eamon




=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Marijuana

2001-06-15 Thread Socspace
In his redoubtable "Re: Marijuana" dated 6/14/2001 5:47:23 PM Central Daylight 
Time, Jim Ferry wrote

I was surprised to see this subject heading on sci.math. I thought
it might have to do with the following lyrics (I forget the name of
the group and the song):

"I smoke two joints at two o' clock;
I smoke two joints at four.
I smoke two joints before I smoke two joints,
And then I smoke two more."

Given an infinite supply of marijuana, even granting immortality
to Cheech and Chong would not make the above feat possible. One
would need to have existed for an infinite amount of time.

And even then, smoking a joint takes at least one Planck time unit,
so if you plot on a time-line the points at which each joint-pair-
smoking finishes, there can't be any accumulation points. This
would seem to preclude any such feat of pot-smoking . . . unless
you somehow exist in a strange temporal topology (e.g., the long
line).

So then, how much marijuana would one have to smoke to actually
change the nature of (one's personal) time in such a way? I'm
guessing that no finite amount would suffice, but do not hazard
a guess as to the precise cardinality required. 

I read the above message by starting with the first word [I] and ending with 
the last word [required]. Therefore I read the whole thing along some sort of
time line (to use Jim's term).

Now hear this (I don't mean that literally of course).:

I am utterly incapable of understanding the logic adduced in the last three
paragraphs that come after the first paragraph as well as after the lyric which
comes after the first paragraph and before the last three paragraphs. 

So now my question to Jim becomes: is what I'm about to say possible 
according to his line of thinking [pun intended]? 

I perused one paragraph at the beginning, 
I perused three paragraphs at the end. 
I perused one paragraph before I perused three paragraphs 
And then I perused three paragraphs more. 

To my untutored mind 'tis entirely possible because the third and fourth lines
merely iterate the first and second. See the following exegesis:

I perused one paragraph [that at the beginning] before I perused three 
paragraphs [those at the end] and then [having perused the paragraph at the
beginning before going on to peruse the three paragraphs at the end] I perused
three paragraphs more [i.e. the last three paragraphs]

Now if you tell me that the aforegoing is impossible then --- whoopee ---
I have done the impossible; because that is precisely what I did.

Of course, you may retort: "Well hows come ya didunt say ya whatcha meant
from da git go?" To which I could only sigh and reply: "Because I was 
engaging
in a bit of whimsical wordplay, good sir --- and am inconsolably, 
irremediably, 
not to mention insincerely, sorry that you failed to comprehend what I was 
up to."

Now think about this Jim, think real hard!!! Which seems more plausible:
that the unknown author(s) of the lyric you cite were 1) just having some fun
with words, or 2) that they were deliberately constructing a verbal paradox in
the sense of a self-contradictory statement that at first seems true but could
be mathematically demonstrated to be false? 

Oh yes!!! If you opt for the second option, please support your decision 
mathematically. (I won't understand it of course. But what difference does
*that* make? I will nevertheless be tremenjusly impressed)

Su servidor [Sp: your servant --- but don't take that literally]

Harley Upchurch, M.D. (No no!!! Not medical doctor, mathematical dummy.) 
. 


Re: Marijuana

2001-06-15 Thread David C. Ullrich

On Fri, 15 Jun 2001 08:02:23 +0100, Paul Jones
[EMAIL PROTECTED] wrote:

David C. Ullrich wrote:
 
 considerable benefit for neurogenic bladder problems,
 
 I did not know that, but I know that the topic is of considerable
 interest to people with various other conditions. 

Yes, recent work at the National Hospital of Neurology and
Neurosurgery in London, UK has shown that two cannabinoids
administered in a spray considerably reduce urinary
frequency and the number of times PwMS have to get up to pee
during the night (a big problem). The researcher I was
talking to said that there are cannabinoid receptors in the
bladder and the cortex but not in the micturition control
areas of the brainstem nor in the spinal cord.

 As is the
 fact that the Supreme Court seems to have decided that pi = 3
 again...

More like -6.

 Here I get a little lost again. Exactly what does it mean
 to say the relative risk is 4.8?

I assumed it meant event A happened 4.8 times as much as
would be expected if the two events were unrelated.

 And here again I'm _totally_ lost.

Okay, put it like this:

Of 1086240 trials, A happened in 17484 of them, B happened
in 124 and both A and B happened in 9.

But analyzing it this way simply makes no sense. Those
trials you're talking about are _far_ from independent;
each trial is associated with a particular person, and
there will be a very strong correlation between various
trials for the same person at different hours.

I really need to know how to how to calculate the
statistical implications here. Please someone help me!

I know that the way you've been putting things makes
no sense. I suspect, but I don't know for sure, that
to get the sort of information you want you need more
data than what you've told us - you also need data on
how many people in the general population, without
heart attacks, do and do not smoke evil weeds.

 What I want to know is what is the correlation between these
 two event?
 Most importantly, how statistically significant is the
 result?
 Can any reasonable conclusions be drawn from these data -
 esp, in view of the small dataset size?

You keep asking this. The size of the dataset is not the
reason we cannot draw the sort of inferences you're 
interested in.

Take care,
Paul
All About MS - the latest MS News and Views
http://www.mult-sclerosis.org/



David C. Ullrich
*
Sometimes you can have access violations all the 
time and the program still works. (Michael Caracena, 
comp.lang.pascal.delphi.misc 5/1/01)


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Marijuana

2001-06-15 Thread Sturla Molden

On Fri, 15 Jun 2001 08:02:23 +0100, Paul Jones
[EMAIL PROTECTED] wrote:

Of 1086240 trials, A happened in 17484 of them, B happened
in 124 and both A and B happened in 9.

I really need to know how to how to calculate the
statistical implications here. Please someone help me!

It is simple to solve this problem using a Monte Carlo
simulation, that is, an approximate permutation test. 
I would gladly do that, but I need to know the frequency
of pot smoking among those 124. That is, how many hours 
each one spends smoking pot in a year. From this information 
we can calculate how likely it is that there will be 9 or more 
coincidences of smoking pot and having a heart attack 
given statistical independence. 
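
A sketch of what that simulation would look like (Python with numpy;
since the per-patient frequencies are not given in the thread, a single
made-up figure of 141 smoking-hours per year per patient stands in for
them):

# Sketch of the approximate permutation test described above.  The
# per-patient smoking frequencies are NOT given, so a hypothetical
# 141 smoking-hours/year for each of the 124 patients stands in.
import numpy as np

rng = np.random.default_rng(4)
hours_per_year = 8760
smoke_hours = np.full(124, 141)     # hypothetical frequencies

n_sims, hits = 100_000, 0
for _ in range(n_sims):
    # under independence, each heart attack lands in a uniformly
    # random hour; count patients whose attack hits a smoking hour
    coincidences = np.sum(rng.integers(0, hours_per_year, size=124)
                          < smoke_hours)
    hits += coincidences >= 9
print(hits / n_sims)   # Monte Carlo p-value for 9+ coincidences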

Sturla Molden



=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Marijuana

2001-06-15 Thread Axel Harvey

On Thu, 14 Jun 2001, Chas F Brown wrote:

 Jim Ferry wrote:
  
  [ ... ] it might have to do with the following lyrics (I forget the
  name of the group and the song):
  
  I smoke two joints at two o' clock;
   I smoke two joints at four.
   I smoke two joints before I smoke two joints,
   And then I smoke two more.
 
 Suprisingly, the name of this song is Smoke Two Joints (by Sublime,
 available on the Mallrats Sound Track Album).

It is also a vague echo of one the Earl of Rochester's poems which
begins, I rise at Eleven, I dine about Two, / I get drunk before
Sev'n; and the next Thing I do... The rest is unpublishable in this
dignified company.



=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Marijuana

2001-06-15 Thread Wade Ramey

In article 
[EMAIL PROTECTED],
 Axel Harvey [EMAIL PROTECTED] wrote:

 It is also a vague echo of one the Earl of Rochester's poems which
 begins, I rise at Eleven, I dine about Two, / I get drunk before
 Sev'n; and the next Thing I do... The rest is unpublishable in this
 dignified company.

He designs a web site dedicated to his proof of FLT?

Wade


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Factor Analysis

2001-06-15 Thread Tracey Continelli

Hi there,

would someone please explain in lay person's terms the difference
betwn.
principal components, commom factors, and maximum likelihood
estimation
procedures for factor analyses?

Should I expect my factors obtained through maximum likelihood
estimation
tobe highly correlated?  Why?  When should I use a Maximum likelihood
estimation procedure, and when should I not use it?

Thanks.

Rita

[EMAIL PROTECTED]


Unlike the other methods, maximum likelihood allows you to estimate
the entire structural model *simultaneously* [i.e., the effects of
every independent variable upon every dependent variable in your
model].  Most other methods only permit you to estimate the model in
pieces, i.e., as a series of regressions whereby you regress every
dependent variable upon every independent variable that has an arrow
directly pointing to it.  Moreover, maximum likelihood actually
provides a statistical test of significance, unlike many other methods
which only provide generally accepted cut-off points but not an actual
test of statistical significance.  There are very few cases in which I
would use anything except a maximum likelihood approach, which you can
use in either LISREL or if you use SPSS you can add on the module AMOS
which will do this as well.


Tracey


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Factor Analysis

2001-06-15 Thread Timothy W. Victor

_Psychometric Theory_, by Jum Nunnally to name one.

haytham siala wrote:
 
 Hi,
 I will appreciate if someone can help me with this question: if factors
 extracted from a factor analysis were found to be reliable (using an
 internal consistency test like a Cronbach alpha), can they be used to
 represent a measure of the latent construct? If yes, are there any
 references or books that justify this technique?

-- 
Timothy Victor
[EMAIL PROTECTED]
Policy Research, Evaluation, and Measurement
Graduate School of Education
University of Pennsylvania


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: multivariate techniques for large datasets

2001-06-14 Thread Herman Rubin

In article 9g9k9f$h4c$[EMAIL PROTECTED],
Eric Bohlman [EMAIL PROTECTED] wrote:
In sci.stat.consult Tracey Continelli [EMAIL PROTECTED] wrote:
 value.  I'm not sure why you'd want to reduce the size of the data
 set, since for the most part the larger the N the better.

Actually, for datasets of the OP's size, the increase in power from the 
large size is a mixed blessing, for the same reason that many 
hard-of-hearing people don't terribly like wearing hearing aids: they 
bring up the background noise just as much as the signal.  With an N of 
one million, practically *any* effect you can test for is going to be 
significant, regardless of how small it is.


This just points out another stupidity of the use of 
significance testing.  Since the null hypothesis is
false anyhow, why should we care what happens to be
the probability of rejecting when it is true?

State the REAL problem, and attack this.  

-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: multivariate techniques for large datasets

2001-06-14 Thread Rich Ulrich

On 13 Jun 2001 20:32:51 -0700, [EMAIL PROTECTED] (Tracey
Continelli) wrote:

 Sidney Thomas [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]...
  srinivas wrote:
   
   Hi,
   
 I have a problem in identifying the right multivariate tools to
   handle dataset of dimension 1,00,000*500. The problem is still
   complicated with lot of missing data. can anyone suggest a way out to
   reduce the data set and  also to estimate the missing value. I need to
   know which clustering tool is appropriate for grouping the
   observations( based on 500 variables ).
 
 One of the best ways in which to handle missing data is to impute the
 mean for other cases with the selfsame value.  If I'm doing
 psychological research and I am missing some values on my depression
 scale for certain individuals, I can look at their, say, locus of
 control reported and impute the mean value.  Let's say [common
 finding] that I find a pattern - individuals with a high locus of
 control report low levels of depression, and I have a scale ranging
 from 1-100 listing locus of control.  If I have a missing value for
 depression at level 75 for one case, I can take the mean depression
 level for all individuals at level 75 of locus of control and impute
 that for all missing cases in which 75 is the listed locus of control
 value.  I'm not sure why you'd want to reduce the size of the data
 set, since for the most part the larger the N the better.
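
In code, the recipe quoted above amounts to something like this (a pandas
sketch; the column names and values are made up):

# Sketch of the conditional-mean imputation quoted above: fill missing
# depression scores with the mean of cases sharing the same locus-of-
# control value.  pandas assumed; column names are made up.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "locus":      [75, 75, 75, 40, 40],
    "depression": [22, 26, np.nan, 55, np.nan],
})
df["depression"] = (df.groupby("locus")["depression"]
                      .transform(lambda s: s.fillna(s.mean())))
print(df)   # the NaN at locus 75 becomes 24.0; at locus 40, 55.0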

Do you draw numeric limits for a variable, and for a person?
Do you make sure, first, that there is not a pattern?

That is -- Do you do something different depending on
how many are missing?  Say, estimate the value, if it is an
oversight in filling blanks on a form, BUT drop a variable if 
more than 5% of responses are unexpectedly missing, since 
(obviously) there was something wrong in the conception of it, 
or the collection of it.  Psychological research (possibly) 
expects fewer missing than market research.

As to the N -  As I suggested before - my computer takes 
more time to read  50 megabytes than one megabyte.  But
a psychologist should understand that it is easier to look at
and grasp and balance raw numbers that are only two or 
three digits, compared to 5 and 6.

A COMMENT ABOUT HUGE DATA-BASES.

And as a statistician, I keep noticing that HUGE databases
tend to consist of aggregations.  And these are random
samples only in the sense that they are uncontrolled, and 
their structure is apt to be ignored.

If you start to sample, you are more likely to ask yourself about 
the structure - by time, geography, what-have-you.  

An N of millions gives you tests that are wrong; estimates 
ignoring relevant structure have a spurious report of precision.
To put it another way: the Error  (or real variation) that *exists*
between a fixed number of units (years, or cities, for what I
mentioned above) is something that you want to generalize across.  
With a small N, that error term is (we assume?) small enough to 
ignore.  However, that error term will not decrease with N, 
so with a large N, it will eventually dominate.  The test 
based on N becomes increasingly irrelevant.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Marijuana

2001-06-14 Thread Mr Unreliable

David C. Ullrich wrote in message [EMAIL PROTECTED]...

On Thu, 14 Jun 2001 15:22:25 +0100, Paul Jones
[EMAIL PROTECTED] wrote:

There was some research recently linking heart attacks with
Marijuana smoking.

I'm trying to work out the correlation and, most
importantly, its statistical significance.

In essence the problem comes down to:

Of 8760 hours in a year, 124 had heart attacks in them, 141
had MJ smokes in them and 9 had both.

What statistical tests apply?

None. What you've said here makes no sense - what does
it mean for an _hour_ to have MJ smoke?

If you're actually reporting on actual research it
would be interesting to know what the actual researchers
actually said - if there's actual research out there
that talks about the number of hours in a year containing
smoke that will be remarkable.

If otoh this is a homework question you should quote
the question more accurately. (If the homework question
_really_ reads _exactly_ the way you put it then you
should complain to whoever assigned it that it makes
no sense.)

Most importantly, what is the statistical significance of
the correlation between smoking MJ in any hour and having a
heart attack in that same hour?

Now this sounds more like you're talking about one
person. This is an actual person who actually had
124 heart attacks in one year? I doubt it.

What is the probablity that the null hypothesis (that
smoking marijuana and having a heart attack are unrelated)
can be rejected?
How reliable are the results from a dataset of this size?

I'm not very literate in maths and stats - please help me
out someone. I'm interested in this research from the
perspective of medicinal marijuana.

Fascinating topic. If this is not actually homework you
need to explain the question much more accurately.

The data presented may refer to a much-reported study. (See, for example,
http://www.eurekalert.org/releases/bidm-bsf022800.html
)


To quote from there:

The findings are the latest to emerge from a multicenter study of 3,882
patients who survived heart attacks. In this report, 124 people reported
using marijuana regularly. Of these, 37 people reported using marijuana
within 24 hours of their heart attacks, and nine smoked marijuana within an
hour of their heart attacks.

Note:

124 people...
9 within an hour...
And 3882/37 + 37 = 141

MU




=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Marijuana

2001-06-14 Thread Paul Jones

Steve Leibel wrote:
 
 So the people who died from heart attacks weren't even considered in the
 study.  Perhaps of all the people who had heart attacks, recent mj use
 was statistically correlated with saving their lives.  That would be
 consistent with what you just described.  So the methodology sounds
 bogus.

That's not all - the MJ users had an excess of males,
cigarette smokers and obese people - all increased risks for
myocardial infarction. 

These articles rarely show statistical significance and it's
hard to get hold of the full text without paying loads for
it - besides, the full text might not quote p values. I want
to know how statistically significant the association is,
even given the study's obvious weaknesses. I need to know
how to calculate a p value. 

If anyone could help it would be of great value to myself
and a number of other PwMS.

Thanks and take care,
Paul
All About MS - the latest MS News and Views
http://www.mult-sclerosis.org/


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Marijuana

2001-06-14 Thread Steve Leibel

In article 9galk6$fjr$[EMAIL PROTECTED],
 Mr Unreliable [EMAIL PROTECTED] wrote:

 David C. Ullrich wrote in message [EMAIL PROTECTED]...
 
 On Thu, 14 Jun 2001 15:22:25 +0100, Paul Jones
 [EMAIL PROTECTED] wrote:
 
 There was some research recently linking heart attacks with
 Marijuana smoking.
 
 I'm trying to work out the correlation and, most
 importantly, its statistical significance.
 
 In essence the problem comes down to:
 
 Of 8760 hours in a year, 124 had heart attacks in them, 141
 had MJ smokes in them and 9 had both.
 
 What statistical tests apply?
 
 None. What you've said here makes no sense - what does
 it mean for an _hour_ to have MJ smoke?
 
 If you're actually reporting on actual research it
 would be interesting to know what the actual researchers
 actually said - if there's actual research out there
 that talks about the number of hours in a year containing
 smoke that will be remarkable.
 
 If otoh this is a homework question you should quote
 the question more accurately. (If the homework question
 _really_ reads _exactly_ the way you put it then you
 should complain to whoever assigned it that it makes
 no sense.)
 
 Most importantly, what is the statistical significance of
 the correlation between smoking MJ in any hour and having a
 heart attack in that same hour?
 
 Now this sounds more like you're talking about one
 person. This is an actual person who actually had
 124 heart attacks in one year? I doubt it.
 
 What is the probablity that the null hypothesis (that
 smoking marijuana and having a heart attack are unrelated)
 can be rejected?
 How reliable are the results from a dataset of this size?
 
 I'm not very literate in maths and stats - please help me
 out someone. I'm interested in this research from the
 perspective of medicinal marijuana.
 
 Fascinating topic. If this is not actually homework you
 need to explain the question much more accurately.
 
 The data presented may refer to a much-reported study. (See, for example,
 http://www.eurekalert.org/releases/bidm-bsf022800.html
 )
 
 
 To quote from there:
 
 The findings are the latest to emerge from a multicenter study of 3,882
 patients who survived heart attacks. In this report, 124 people reported
 using marijuana regularly. Of these, 37 people reported using marijuana
 within 24 hours of their heart attacks, and nine smoked marijuana within an
 hour of their heart attacks.


So the people who died from heart attacks weren't even considered in the 
study.  Perhaps of all the people who had heart attacks, recent mj use 
was statistically correlated with saving their lives.  That would be 
consistent with what you just described.  So the methodology sounds 
bogus.


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=




Re: Marijuana

2001-06-14 Thread Paul Jones

Thanks for replying, David. I'll try to frame the problem
better.

First, I shall explain my motivations. 

There has recently been some research that implied that
smoking MJ increased risk of heart attack in the hour
following smoking. I haven't got the full text of
the article - I've just seen the abstract, the press
releases and resultant press coverage. There is a lot of
dodgy research and I want to know how statistically
valid this research is.

As you can imagine this topic is of great interest to people
who use medicinal marijuana for multiple sclerosis as it has
considerable benefit for neurogenic bladder problems,
neuropathic pain and muscle spasms. The headline that MJ may
increase heart attack risk in the hour following smoking it
is extremely pertinent to people with MS. This explains my
motives. This is not homework - I have MS.

So the research says that of a large number of people who
had heart attacks at a centre, 124 people had used MJ in the
year preceding the HA. Of these 9 reported that they had
used MJ in the hour preceding the HA. All MJ users were
questioned on the frequency with which they used MJ. The
relative risk was reported as 4.8 - I used this to
back-calculate that the average number of MJ usages per year
rounded to 141 - (9/n)/(115/(8760-n)) = 4.8
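
(Checking that back-calculation: cross-multiplying (9/n)/(115/(8760-n)) = 4.8
gives 9(8760 - n) = 4.8 * 115 * n, i.e. 78840 = 561n, so n = 140.5, which
indeed rounds to 141.)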

I see an immediate mistake in what I wrote before - I have
used the average Med MJ smokes but the total heart attacks.
Restating the problem:

Event A is smoking MJ.
Event B is having HA. 
Let's assume that both events can only happen once per hour
and that each person only had one HA.

Of 1,086,240 hours, A happened 17,484 times, B happened 124
times and both A and B happened 9 times.

What I want to know is what is the correlation between these
two event? 
Most importantly, how statistically significant is the
result? 
Can any reasonable conclusions be drawn from these data -
esp, in view of the small dataset size?
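
One mechanical way to attach a p-value to exactly this formulation - setting
aside the independence objections raised elsewhere in the thread - is
Fisher's exact test on the implied 2x2 table of hours. A Python sketch
(scipy assumed):

# Sketch: Fisher's exact test on the 2x2 table implied by the counts
# above (both=9, A only=17475, B only=115, neither=1068641).  scipy
# assumed; this ignores the fact that the hours are not independent.
from scipy.stats import fisher_exact

both, a_only = 9, 17484 - 9
b_only = 124 - 9
neither = 1086240 - 17484 - b_only

odds_ratio, p_value = fisher_exact([[both, a_only],
                                    [b_only, neither]])
print(round(odds_ratio, 1), p_value)   # odds ratio ~= 4.8

The odds ratio comes out at about 4.8, matching the reported relative risk.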

I would appreciate being corrected. 

Take care,
Paul
All About MS - the latest MS News and Views
http://www.mult-sclerosis.org/


David C. Ullrich wrote:
 
 On Thu, 14 Jun 2001 15:22:25 +0100, Paul Jones
 [EMAIL PROTECTED] wrote:
 
 There was some research recently linking heart attacks with
 Marijuana smoking.
 
 I'm trying to work out the correlation and, most
 importantly, its statistical significance.
 
 In essence the problem comes down to:
 
 Of 8760 hours in a year, 124 had heart attacks in them, 141
 had MJ smokes in them and 9 had both.
 
 What statistical tests apply?
 
 None. What you've said here makes no sense - what does
 it mean for an _hour_ to have MJ smoke?
 
 If you're actually reporting on actual research it
 would be interesting to know what the actual researchers
 actually said - if there's actual research out there
 that talks about the number of hours in a year containing
 smoke that will be remarkable.
 
 If otoh this is a homework question you should quote
 the question more accurately. (If the homework question
 _really_ reads _exactly_ the way you put it then you
 should complain to whoever assigned it that it makes
 no sense.)
 
 Most importantly, what is the statistical significance of
 the correlation between smoking MJ in any hour and having a
 heart attack in that same hour?
 
 Now this sounds more like you're talking about one
 person. This is an actual person who actually had
 124 heart attacks in one year? I doubt it.
 
 What is the probablity that the null hypothesis (that
 smoking marijuana and having a heart attack are unrelated)
 can be rejected?
 How reliable are the results from a dataset of this size?
 
 I'm not very literate in maths and stats - please help me
 out someone. I'm interested in this research from the
 perspective of medicinal marijuana.
 
 Fascinating topic. If this is not actually homework you
 need to explain the question much more accurately.
 
 Thanks and take care,
 Paul
 All About MS - the latest MS News and Views
 http://www.mult-sclerosis.org/
 
 David C. Ullrich
 *
 Sometimes you can have access violations all the
 time and the program still works. (Michael Caracena,
 comp.lang.pascal.delphi.misc 5/1/01)


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: multivariate techniques for large datasets

2001-06-14 Thread S. F. Thomas

Herman Rubin wrote:
 
 In article 9g9k9f$h4c$[EMAIL PROTECTED],
 Eric Bohlman [EMAIL PROTECTED] wrote:
 In sci.stat.consult Tracey Continelli [EMAIL PROTECTED] wrote:
  value.  I'm not sure why you'd want to reduce the size of the data
  set, since for the most part the larger the N the better.
 
 Actually, for datasets of the OP's size, the increase in power from the
 large size is a mixed blessing, for the same reason that many
 hard-of-hearing people don't terribly like wearing hearing aids: they
 bring up the background noise just as much as the signal.  With an N of
 one million, practically *any* effect you can test for is going to be
 significant, regardless of how small it is.
 
 This just points out another stupidity of the use of
 significance testing.  Since the null hypothesis is
 false anyhow, why should we care what happens to be
 the probability of rejecting when it is true?
 
 State the REAL problem, and attack this.

How true! The only real drawback to more rather than less data
for inferential purposes is the extra cost of computation, not
the inconvenience posed to significance testing methodology.
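
Eric's hearing-aid point is easy to demonstrate numerically: with
a million observations, even a negligible effect is declared
significant. A minimal sketch with a hypothetical mean shift of
0.005 standard deviations, assuming numpy and scipy are available:

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(0)
  n = 1_000_000
  x = rng.normal(loc=0.005, scale=1.0, size=n)  # effect size d = 0.005

  t, p = stats.ttest_1samp(x, popmean=0.0)
  print(f"t = {t:.2f}, p = {p:.2g}")  # typically p << 0.05

The effect is real but trivial; the test flags it anyway, which is
exactly the point.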

There is a significant philosophical question lurking here. It is a
reminder of how we get so attached to the tools we use that we
sometimes turn their bugs into features. Significance testing is a
make-do construction of classical statistical inference, in some
sense an indirect way of characterizing the uncertainty surrounding a
parameter estimate. The Bayesian approach of attempting to
characterize such uncertainty directly, rather than indirectly, and
further of characterizing directly, through some function
transformation of the parameter in question, the uncertainty
surrounding some consequential loss or profit function critical to
some real-world decision, is clearly laudable... if it can be
justified. 

Clearly, from a classicist's perspective, the Bayesians have failed
at this attempt at justification, otherwise one would have to be a
masochist to stick with the sheer torture of classical inferential
methods. Besides, the Bayesians indulge not a little in turning bugs
into features themselves. 

At any rate, I say all that to say this: once it is recognized that
there is a valid (extended) likelihood calculus, as easy to
manipulate as the probability calculus in attempting a direct
characterization of the uncertainty surrounding statistical model
parameters, the gap between these two ought to be closed.

I'm not holding my breath, as this may take several generations. We
all reach for the tool we know how to use, not necessarily for the
best tool for the job. 

 Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
 [EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558

Regards,
S. F. Thomas


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Marijuana

2001-06-14 Thread Jim Ferry

I was surprised to see this subject heading on sci.math.  I thought
it might have to do with the following lyrics (I forget the name of
the group and the song):

I smoke two joints at two o' clock;
 I smoke two joints at four.
 I smoke two joints before I smoke two joints,
 And then I smoke two more.

Given an infinite supply of marijuana, even granting immortality
to Cheech and Chong would not make the above feat possible.  One
would need to have existed for an infinite amount of time.

And even then, smoking a joint takes at least one Planck time unit,
so if you plot on a time-line the points at which each joint-pair-
smoking finishes, there can't be any accumulation points.  This
would seem to preclude any such feat of pot-smoking . . . unless
you somehow exist in a strange temporal topology (e.g., the long
line).

So then, how much marijuana would one have to smoke to actually
change the nature of (one's personal) time in such a way?  I'm
guessing that no finite amount would suffice, but do not hazard
a guess as to the precise cardinality required.

| Jim Ferry                          | Center for Simulation  |
+------------------------------------+  of Advanced Rockets   |
| http://www.uiuc.edu/ph/www/jferry/ +------------------------+
| jferry@[delete_this]uiuc.edu       | University of Illinois |


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Marijuana

2001-06-14 Thread David C. Ullrich

On Thu, 14 Jun 2001 16:37:02 +0100, Mr Unreliable
[EMAIL PROTECTED] wrote:

David C. Ullrich wrote in message [EMAIL PROTECTED]...

On Thu, 14 Jun 2001 15:22:25 +0100, Paul Jones
[EMAIL PROTECTED] wrote:

There was some research recently linking heart attacks with
Marijuana smoking.
[...]

Fascinating topic. If this is not actually homework you
need to explain the question much more accurately.

The data presented may refer to a much-reported study. (See, for example,
http://www.eurekalert.org/releases/bidm-bsf022800.html
)


To quote from there:

The findings are the latest to emerge from a multicenter study of 3,882
patients who survived heart attacks. In this report, 124 people reported
using marijuana regularly. Of these, 37 people reported using marijuana
within 24 hours of their heart attacks, and nine smoked marijuana within an
hour of their heart attacks.

Right. Seems to me (although I really know nothing about this
sort of thing) that to draw any reliable conclusions (not that
_you_'d care about that) we need to know a little more, like what
fraction of the people who did _not_ get heart attacks smoke,
regularly or otherwise.

Note:

124 people...
9 within an hour...
And 3882/37 + 37 = 141

MU





David C. Ullrich
*
Sometimes you can have access violations all the 
time and the program still works. (Michael Caracena, 
comp.lang.pascal.delphi.misc 5/1/01)


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Marijuana

2001-06-14 Thread Jay Warner


Brother! That topic sure drew a crowd! :)
Paul Jones wrote:
There was some research recently linking heart attacks with
Marijuana smoking.
[big snip]
Jay
--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
 North Green Bay Road
Racine, WI 53404-1216
USA
Ph: (262) 634-9100
FAX: (262) 681-1133
email: [EMAIL PROTECTED]
web: http://www.a2q.com
The A2Q Method (tm) -- What do you want to improve today?






=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=


Re: Marijuana

2001-06-14 Thread David Petry


Paul Jones wrote ...

So the research says that of a large number of people who
had heart attacks at a centre, 124 people had used MJ in the
year preceding the HA. Of these 9 reported that they had
used MJ in the hour preceding the HA. All MJ users were
questioned on the frequency with which they used MJ.

Keep in mind that correlation is not the same as causation.

That's of particular importance in a study like this one.

That is, if people are taking marijuana to treat pain and
general discomfort, and if heart attacks are preceded by
pain and discomfort, then there will be a strong correlation
between marijuana use and later heart attacks, but it
won't be proof of causation.
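
A minimal simulation sketch of that confounding story, with
made-up rates (pain raises both the chance of smoking and the
chance of a heart attack, while MJ has no causal effect on HA at
all), assuming numpy is available:

  import numpy as np

  rng = np.random.default_rng(1)
  n = 1_000_000  # hypothetical person-hours
  pain = rng.random(n) < 0.10                        # 10% involve pain
  mj = rng.random(n) < np.where(pain, 0.30, 0.02)    # pain raises MJ use
  ha = rng.random(n) < np.where(pain, 0.010, 0.001)  # pain raises HA risk

  # "Relative risk" of HA given MJ, despite no causal arrow MJ -> HA
  rr = ha[mj].mean() / ha[~mj].mean()
  print(f"spurious relative risk: {rr:.1f}")  # comes out around 4

A relative risk in the same ballpark as the published 4.8 falls
out of pure confounding here, which is why the observational
design matters so much.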






=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Average Distance to nearest neighbour

2001-06-13 Thread John Garber

Hello!
Thank you very much for the answers.
But I also want to know the pdf of this distance.
And if you know, please give me references to books
where I can see this formula.

Thanks, John Gerber
  
   John Garber [EMAIL PROTECTED] wrote in message
 I am looking for a solution of the following problem:
 Assume a square area with sides of length L. N points are randomly
 distributed within the area. The location of each point is
 independent of the other points. The location of a point is a
 uniform random variable - a point is equally likely to be anywhere
 within the square.

 Find the expected value of the distance from a randomly selected
 point to its nearest neighbor.

 Thanks, John Gerber
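
For what it's worth: ignoring edge effects, the nearest of the
other N-1 points lies further than r away with probability
P(R > r) = (1 - pi*r^2/L^2)^(N-1), so the pdf is
f(r) = 2*pi*r*(N-1)/L^2 * (1 - pi*r^2/L^2)^(N-2), and for large N
the expected distance is approximately L/(2*sqrt(N)). The classic
reference is, I believe, Clark & Evans (1954), Ecology 35, 445-453.
A minimal Monte Carlo sketch to check the mean, assuming scipy is
available:

  import numpy as np
  from scipy.spatial import cKDTree

  rng = np.random.default_rng(0)
  N, L = 1000, 1.0
  pts = rng.random((N, 2)) * L
  d, _ = cKDTree(pts).query(pts, k=2)  # k=2: nearest point besides self
  nn = d[:, 1]

  print(f"empirical mean : {nn.mean():.4f}")
  print(f"approximation  : {L / (2 * np.sqrt(N)):.4f}")

The empirical mean runs slightly above the approximation because
points near the boundary have fewer close neighbours.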


 
 
 
 


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: please help

2001-06-13 Thread S. F. Thomas

Kelly wrote:
 
 I have the gage repeatability & reproducibility (gage R&R) analysis
 done on two instruments. What hypothesis test can I use to test
 whether the repeatability variances (expected sigma values of
 repeatability) of the two instruments are significantly different
 from each other, or to say that one has a lower variance than the
 other?
 Any insight will be greatly appreciated.
 Thanks in advance for your help.

One approach is to form the likelihood function in each case and to
eliminate the nuisance parameters (the means) by marginalization.
Although it is well known that marginalization by maximization will
give misleading answers for both the location and precision of your
estimate of the variances, I have shown how another method based on
marginalization by the rule of product-sum can avoid the problems
known to exist with respect to the former. (See _Fuzziness and
Probability_ (ACG Press, 1995).)

This method also avoids the assumptions of the Bayesian approach --
effectively a method of marginalization by integration -- which have
been considered and rejected, and with good reason in my opinion, by
those of the classical school.

The product-sum method may be relatively easily implemented within
an extensible stat package such as R, and I would be happy to apply
my implementation of it to your problem if you would send me the two
datasets. Essentially, once the nuisance parameters (the one or more
means) are eliminated, what is left in each case is the (marginal)
likelihood function of the variance, and one could effectively
compare directly the plots of the two variance marginal likelihoods,
and also, if need be, the likelihood function of the difference, to
see how different this is from zero.

This is not a classicist's answer, but tests of hypothesis and all
that can be obviated if the likelihood function can be directly
manipulated in the way I describe. This has been the whole point of
the Bayesian method, except of course for the inadequate
justification provided not only for its insistent subjectiveness,
but also for treating model parameters as though they were random
variables in their own right. Hope this is helpful.
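
For completeness, the textbook classical answer to Kelly's question
is an F-test on the ratio of the two repeatability variances
(assuming approximately normal measurement errors, and collapsing
each gage study to a single sample of repeat measurements - a real
gage R&R would pool within-part variances instead). A minimal
sketch with hypothetical data, assuming scipy is available:

  import numpy as np
  from scipy import stats

  a = np.array([10.1, 9.8, 10.0, 10.3, 9.9, 10.2])  # instrument A repeats
  b = np.array([10.0, 10.6, 9.5, 10.9, 9.4, 10.4])  # instrument B repeats

  f = np.var(b, ddof=1) / np.var(a, ddof=1)  # ratio of sample variances
  dfb, dfa = len(b) - 1, len(a) - 1
  p = 2 * min(stats.f.cdf(f, dfb, dfa), stats.f.sf(f, dfb, dfa))  # two-sided
  print(f"F = {f:.2f}, p = {p:.3f}")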

Regards,
S. F. Thomas


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=


