Re: Applied analysis question
On 28 Feb 2002 07:37:16 -0800, [EMAIL PROTECTED] (Brad Anderson) wrote:

Rich Ulrich [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]...
On 27 Feb 2002 11:59:53 -0800, [EMAIL PROTECTED] (Brad Anderson) wrote:

BA> I have a continuous response variable that ranges from 0 to 750. I only have 90 observations and 26 are at the lower limit of 0, which is the modal category. The mean is about 60 and the median is 3; the distribution is highly skewed, extremely kurtotic, etc. Obviously, none of the power transformations are especially useful. The product [ snip, my own earlier comments ]

BA> I should have been more precise. It's technically a count variable representing the number of times respondents report using dirty needles/syringes after someone else had used them during the past 90 days. Subjects were first asked to report the number of days they had injected drugs, then the average number of times they injected on injection days, and finally, on how many of those total times they had used dirty needles/syringes. All of the subjects are injection drug users, but not all use dirty needles. The reliability of reports near 0 is likely much better than the reliability of estimates near 750. Indeed, substantively, the difference between a 0 and a 1 is much more significant than the difference between a 749 and a 750: 0 represents no risk, 1 represents at least some risk, and high values, regardless of the precision, represent high risk.

Okay, here is a break for some comment by me. There are two immediate aims of analyses: to show that results are extreme enough that they don't happen by chance (statistical testing); and to characterize the results so that people can understand them (estimation). When the mean is 60 and the median is 3, reporting averages, as if they were reports on central tendencies, is not going to help much with either aim.

If you want to look at outcomes, you make groups (as you did) that seem somewhat homogeneous: 0 (if it is); 1; 2-3; ... Eventually, your top group of 90+, which comes out to 'daily', seems reasonable as a top end. Using groups ought to give you a robust test, whatever you are testing, unless those distinctions between 10 and 500 needle-sticks become important. Using groups also lets you inspect, in particular, the means for 0, 1, 2 and 3.

I started thinking that the dimension is something like 'promiscuous use of dirty needles'; and I realized that an analogy to risky sex was not far wrong. Or, at any rate, doesn't seem far wrong to me. But your measure (the one that you mention, anyway) does not distinguish between 1 act each with 100 risky partners, and 100 acts with one.

Anyway, one way to describe the groups is to have some experts place the reports of behaviors into 'risk groups', or assign the risk scores. Assuming that those scores do describe your sample, without great non-normality, you should be able to use averages of risk scores for a technical level of testing and reporting, and convert them back to the verbal anchor descriptions in order to explain what they mean.

[ ...Q about zero; kurtosis.]

RU> Categorizing the values into a few categories labeled none, almost none, ... is one way to convert your scores. If those labels do make sense.

Makes sense at the low end: 0 = no risk. And at the high end I used 90+, representing using a dirty needle/syringe once a day or more often. The 2 middle categories were pretty arbitrary.

[ snip, other procedures ]

One of the other posters asked about the appropriate error term -- I guess that lies at the heart of my inquiry. I have no idea what the appropriate error term would be, or how best to model such data. I often deal with similar response variables that have distributions in which observations are clustered at 1 or both ends of the continuum. In most cases, these distributions are not even approximately unimodal and a bit skewed -- variables for which normalizing power transformations make sense.
Additionally, these typically aren't outcomes that could be thought of as being generated by a Gaussian process. Can you describe them usefully? What is the shape of the behaviors that you observe or expect, corresponding to the drop-off of density near either extreme? In some cases I think it makes sense to consider Poisson and generalizations of Poisson processes, although there is clearly much greater between-subject heterogeneity than assumed by a Poisson process. I estimated Poisson and negative binomial regression models -- there was compelling evidence that the Poisson was overdispersed. I also used a Vuong statistic to compare NB regression [ snip, more detail ] I think a lot of folks just run standard analyses or arbitrarily apply some normalizing transformation because that's what's done in their field. Then report the results without really examining the underlying distributions.
Re: Applied analysis question
Rolf Dalin [EMAIL PROTECTED] wrote: Brad Anderson wrote: I have a continuous response variable that ranges from 0 to 750. I only have 90 observations and 26 are at the lower limit of 0. What if you treated the information collected by that variable as really two variables: one categorical variable indicating a zero or non-zero value. Then the remaining numerical variable could only be analyzed conditionally on the category being non-zero. In many cases when you collect data on consumers' consumption of some commodity, you would end up with a large number of them not using the product at all, while those who used the product would consume different amounts. IIRC, your example is exactly the sort of situation for which Tobit modelling was invented. = Instructions for joining and leaving this list, remarks about the problem of INAPPROPRIATE MESSAGES, and archives are available at http://jse.stat.ncsu.edu/ =
Re: Applied analysis question
[EMAIL PROTECTED] (Eric Bohlman) wrote in message news:a5o5b1$fi0$[EMAIL PROTECTED]... Rolf Dalin [EMAIL PROTECTED] wrote: IIRC, your example is exactly the sort of situation for which Tobit modelling was invented. Considered that (actually estimated a couple of Tobit models, and if I use a log-transformed or Box-Cox-transformed response the results are consistent with the ordinal logit I originally described), but Tobit assumes a normally distributed censored response -- the observed distribution for the non-zero responses is not approximately normal (even with transformations) and I don't think it's reasonable to assume the errors are generated by an underlying Gaussian process. My understanding of the Tobit model is that it's not especially robust to violations of this assumption.
Re: Applied analysis question
Rich Ulrich [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... On 27 Feb 2002 11:59:53 -0800, [EMAIL PROTECTED] (Brad Anderson) wrote: I have a continuous response variable that ranges from 0 to 750. I only have 90 observations and 26 are at the lower limit of 0, which is the modal category. The mean is about 60 and the median is 3; the distribution is highly skewed, extremely kurtotic, etc. Obviously, none of the power transformations are especially useful. The product I guess it is 'continuous' except for having 26 ties at 0. I have to wonder how that set of scores arose, and also, what should a person guess about the *error* associated with those: Are the numbers near 750 measured with as much accuracy as the numbers near 3? I should have been more precise. It's technically a count variable representing the number of times respondents report using dirty needles/syringes after someone else had used them during the past 90 days. Subjects were first asked to report the number of days they had injected drugs, then the average number of times they injected on injection days, and finally, on how many of those total times they had used dirty needles/syringes. All of the subjects are injection drug users, but not all use dirty needles. The reliability of reports near 0 is likely much better than the reliability of estimates near 750. Indeed, substantively, the difference between a 0 and 1 is much more significant than the difference between a 749 and a 750--0 represents no risk, 1 represents at least some risk, and high values--regardless of the precision, represent high risk. How do zero scores arise? Is this truncation; the limit of practical measurement; or just what? Zero scores are logical and represent no risk, negative values are not logical. Extremely kurtotic, you say. That huge lump at 0 and skew is not consistent with what I think of as kurtosis, but I guess I have not paid attention to kurtosis at all, once I know that skewness is extraordinary. 
True, the kurtosis statistic exceeded 11, and a plot against the normal indicates a huge lump in the low end of the tail, and also a larger proportion of very high values than expected. Categorizing the values into a few categories labeled none, almost none, ... is one way to convert your scores. If those labels do make sense. Makes sense at the low end: 0 risk. And at the high end I used 90+, representing using a dirty needle/syringe once a day or more often. The 2 middle categories were pretty arbitrary. If I analyze a contingency table using the 4-category response and a 3-category measure of the primary covariate (categories defined using clinically meaningful categories), the association is quite strong; I used the exact p-value associated with the CMH difference in row means test (using SAS) and the association is significant. I also used the 3-category predictor and the procedures outlined by Stokes et al. (2000) to estimate a rank analysis of covariance -- again with consistent results. I've also run a few other analyses I didn't describe. I used the Box-Cox procedure to find a power transformation. Although the skewness statistic then looks great, the distribution is still not approximately normal. However, a regression using the transformed variable is consistent with the ordered logit and the contingency table analysis. One of the other posters asked about the appropriate error term -- I guess that lies at the heart of my inquiry. I have no idea what the appropriate error term would be, or how best to model such data. I often deal with similar response variables that have distributions in which observations are clustered at 1 or both ends of the continuum. In most cases, these distributions are not even approximately unimodal and a bit skewed -- variables for which normalizing power transformations make sense. Additionally, these typically aren't outcomes that could be thought of as being generated by a Gaussian process.
In some cases I think it makes sense to consider Poisson and generalizations of Poisson processes, although there is clearly much greater between-subject heterogeneity than assumed by a Poisson process. I estimated Poisson and negative binomial regression models -- there was compelling evidence that the Poisson was overdispersed. I also used a Vuong statistic to compare NB regression with zero-inflated NB regression -- the results support the zero-inflated model. The model standard errors for a zero-inflated model are wildly different from the Huber-White sandwich robust standard errors. The latter give results that are fairly consistent with the ordered logit; the model-based standard errors are huge -- given that these are asymptotic statistics and I have a relatively small sample, I don't really trust either. I think a lot of folks just run standard analyses or arbitrarily apply some normalizing transformation because that's what's done in their field.
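The overdispersion diagnostic Brad describes can be sketched numerically. This is a minimal illustration on simulated counts (not the actual needle-sharing data; the sizes 26/40/24 merely mimic the zero spike and long tail described above): for a Poisson outcome the variance equals the mean, so a variance/mean ratio far above 1 is the kind of evidence that pushes one toward negative binomial or zero-inflated alternatives.

```python
import random
import statistics

random.seed(1)

# Hypothetical counts mimicking the shape described in the thread:
# a large spike at zero plus a long right tail (not the actual data).
counts = [0] * 26 \
    + [random.randint(1, 10) for _ in range(40)] \
    + [random.randint(50, 750) for _ in range(24)]

m = statistics.mean(counts)
v = statistics.pvariance(counts)

# For a Poisson variable the variance equals the mean, so the
# variance/mean ratio should be near 1; a large ratio signals
# overdispersion and motivates NB or zero-inflated models instead.
dispersion = v / m
print(f"mean={m:.1f} variance={v:.1f} dispersion ratio={dispersion:.1f}")
```

Formal tests (score tests, or the Vuong comparison mentioned above) refine this, but the raw ratio already tells most of the story.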
Re: Applied analysis question
At 07:37 AM 2/28/02 -0800, Brad Anderson wrote: I think a lot of folks just run standard analyses or arbitrarily apply some normalizing transformation because that's what's done in their field. Then report the results without really examining the underlying distributions. I'm curious how folks proceed when they encounter very goofy distributions. Thanks for your comments.

i think the lesson to be gained from this is that we seem to be focusing on (or the message that students and others get is) getting the analysis DONE and summarized ... and with most standard packages ... that is relatively easy to do

for example, you talk about a simple regression analysis and then show them in minitab that you can do that like:

MTB> regr 'weight' 1 'height'

and, when they do it, lots of output comes out BUT, the first thing is the best-fitting straight line equation like:

The regression equation is Weight = - 205 + 5.09 Height

and THAT's where they start AND stop (more or less)

while software makes it rather easy to do lots of prelim inspection of data, it also makes it very easy to SKIP all that too

before we do any serious analysis ... we need to LOOK at the data ... carefully ... make some scatterplots (to check for outliers, etc.), look at some frequency distributions ON the variables, even just look at the means and sds ... to see if some serious restriction of range issue pops up ... THEN and ONLY then, after we get a feel for what we have ... THEN and ONLY then should we be doing the main part of our analysis ... ie, testing some hypothesis or notion WITH the data (actually, i might call the prelims the MAIN part but, others might disagree)

we put the cart before the horse ... in fact, we don't even pay any attention to the horse

unfortunately, far too much of this is caused by the dominance of and preoccupation with doing significance tests ... so we run routines that give us these p values and are done with it ...
without paying ANY attention to just looking at the data

my 2 cents worth

Dennis Roberts, 208 Cedar Bldg., University Park PA 16802 Emailto: [EMAIL PROTECTED] WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm AC 8148632401
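Dennis's look-before-you-fit advice can be sketched in a few lines. The height/weight numbers below are made up for illustration (they are not Minitab's data); the point is that the summary statistics get examined before anyone reads off the fitted line.

```python
import statistics

# Hypothetical height (in) / weight (lb) data standing in for the
# minitab example above.
height = [60, 62, 64, 66, 68, 70, 72, 74]
weight = [105, 120, 128, 135, 145, 152, 168, 175]

# Step 1: look first -- means, sds, possible restriction of range.
print("height: mean", statistics.mean(height),
      "sd", round(statistics.stdev(height), 2))
print("weight: mean", statistics.mean(weight),
      "sd", round(statistics.stdev(weight), 2))

# Step 2: only then fit the line (ordinary least squares by hand).
mx, my = statistics.mean(height), statistics.mean(weight)
sxy = sum((x - mx) * (y - my) for x, y in zip(height, weight))
sxx = sum((x - mx) ** 2 for x in height)
slope = sxy / sxx
intercept = my - slope * mx
print(f"Weight = {intercept:.0f} + {slope:.2f} Height")
```

A scatterplot of the same data (step 1.5, omitted here) is what catches the outliers and nonlinearity that the fitted equation silently absorbs.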
Re: Applied analysis question
On 27 Feb 2002 14:14:44 -0800, [EMAIL PROTECTED] (Dennis Roberts) wrote: At 04:11 PM 2/27/02 -0500, Rich Ulrich wrote: Categorizing the values into a few categories labeled none, almost none, ... is one way to convert your scores. If those labels do make sense. well, if 750 has the same numerical sort of meaning as 0 (unit wise) ... in terms of what is being measured then i would personally not think so SINCE, the categories above 0 will encompass very wide ranges of possible values [ ... ]

Frankly, the question is about the meaning of numbers, and I would like to ask it. I don't expect a bunch of zeros, with 3 as median, and values up to 750. Numbers like that *might* reflect, say, the amount of gold detected in some assays. Then, you want to know the handful of locations with numbers near 750. If any of the numbers at all are big enough to be interesting. Data like those are *not* apt to be congenial for taking means. And if 750 is meaningful, using ranks is apt to be nonsensical, too.

In this example, the median was 3. Does *that* represent a useful interval from 0? If so, *that* tells me scaling or scoring is probably not well chosen. Is there a large range of 'meaning' between 0 and non-zero? Is there a range of meaning concealed within zero? Zero children as the outcome of a marriage can reflect (a) a question being asked too early; (b) unfortunate happenstance; or (c) personal choice: categories within 0, and none of them is necessarily a good 'interval' from the 1, 2, 3... answers. But that (further) depends on what questions are being asked.

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
Applied analysis question
I have a continuous response variable that ranges from 0 to 750. I only have 90 observations and 26 are at the lower limit of 0, which is the modal category. The mean is about 60 and the median is 3; the distribution is highly skewed, extremely kurtotic, etc. Obviously, none of the power transformations are especially useful. The product moment correlation between the response and the primary covariate is near zero; however, a rank-order correlation coefficient is about .3 and is significant. We have 5 additional control variables. I'm convinced that any attempt to model the conditional mean response is completely inappropriate, yet all of the alternatives appear flawed as well.

Here's what I've done: I've collapsed the outcome into 3- and 4-category ordered response variables and estimated ordered logit models. I dichotomized the response (any vs none) and estimated a binomial logit. All of these approaches yield substantively consistent results using both the model-based standard errors and the Huber-White sandwich robust standard errors. My concerns about this approach are 1) the somewhat arbitrary classification restricts the observed variability, and 2) the estimators assume large sample sizes.

I rank-transformed the response variable and estimated a robust regression (using the rreg procedure in Stata) -- results were consistent with those obtained for the ordered and binomial logit models described above. I know that Stokes, Davis, and Koch have presented procedures to estimate analysis of covariance on ranks, but I've not seen reference to the use of rank-transformed response variables in a regression context. A plot of the rank-transformed response with the primary covariate clearly suggests a meaningful pattern. Contingency table analysis with a collapsed covariate strongly suggests a meaningful pattern. But I'm at something of a loss to know the best way to analyze and report the results. Thanks in advance.
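The contrast above between a near-zero product moment correlation and a rank-order correlation of about .3 arises because the rank transform ignores the wild spacing of the raw counts. Here is a minimal sketch with hypothetical numbers (a zero-heavy, monotone but very nonlinear relationship); Spearman's rho is just Pearson's r computed on midranks.

```python
import statistics

def ranks(xs):
    # Midranks: average ranks for ties, 1-based.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

# Hypothetical data: zeros at the bottom, then explosive growth.
covariate = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
response = [0, 0, 0, 1, 2, 4, 16, 81, 256, 625]

r_pearson = pearson(covariate, response)
r_spearman = pearson(ranks(covariate), ranks(response))
print(round(r_pearson, 2), round(r_spearman, 2))
```

With these made-up numbers the gap runs the other way in magnitude than in the post (both correlations are positive here), but the mechanism is the same: ranking flattens the extreme spacing that dominates the raw-scale covariance.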
Re: Applied analysis question
At 04:11 PM 2/27/02 -0500, Rich Ulrich wrote: Categorizing the values into a few categories labeled none, almost none, ... is one way to convert your scores. If those labels do make sense. well, if 750 has the same numerical sort of meaning as 0 (unit wise) ... in terms of what is being measured then i would personally not think so SINCE, the categories above 0 will encompass very wide ranges of possible values

if the scale was # of emails you look at in a day ... and 1/3 said none or 0 ... we could rename the scale 0 = not any, 1 to 50 = some, and 51 to 750 = many (and recode as 1, 2, and 3) ... i don't think anyone who just saw the labels ... and was then asked to give some extemporaneous 'values' for each of the categories ... would have any clue what to put in for the some and many categories ... but i would predict they would seriously UNderestimate the values compared to the ACTUAL responses

this just highlights that for some scales, we have almost no differentiation at one end where they pile up ... perhaps (not saying one could have in this case) we could have anticipated this ahead of time and put in scale categories that might have anticipated that ... after the fact, we are more or less dead ducks

i would say this though ... treating the data only in terms of ranks ... does not really solve anything ... and clearly represents being able to say LESS about your data or interrelationships (even if the rank order r is .3 compared to the regular pearson of about 0) ...
than if you did not resort to only thinking about the data in rank terms
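The recode Dennis describes can be written directly; the cut-points 1-50 and 51-750 are the ones from his email example, applied here to a handful of made-up counts.

```python
# A minimal sketch of the recode above: 0 = not any, 1-50 = some,
# 51-750 = many (coded 1, 2, 3). The counts are hypothetical.
def risk_code(count):
    if count == 0:
        return 1   # "not any"
    elif count <= 50:
        return 2   # "some"
    else:
        return 3   # "many"

counts = [0, 0, 3, 12, 50, 51, 200, 750]
codes = [risk_code(c) for c in counts]
print(codes)  # -> [1, 1, 2, 2, 2, 3, 3, 3]
```

The loss Dennis worries about is visible here: 51 and 750 land in the same category, so any within-category variation is gone by construction.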
Re: Applied analysis question
Brad Anderson wrote: I have a continuous response variable that ranges from 0 to 750. I only have 90 observations and 26 are at the lower limit of 0. What if you treated the information collected by that variable as really two variables: one categorical variable indicating a zero or non-zero value. Then the remaining numerical variable could only be analyzed conditionally on the category being non-zero. In many cases when you collect data on consumers' consumption of some commodity, you would end up with a large number of them not using the product at all, while those who used the product would consume different amounts. Rolf Dalin ** Rolf Dalin Department of Information Technology and Media Mid Sweden University S-870 51 SUNDSVALL Sweden Phone: 060 148690, international: +46 60 148690 Fax: 060 148970, international: +46 60 148970 Mobile: 0705 947896, international: +46 70 5947896 mailto:[EMAIL PROTECTED] http://www.itk.mh.se/~roldal/ **
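Rolf's two-variable idea can be sketched as follows, on hypothetical counts: a zero/non-zero indicator analyzed for everyone, and the amount analyzed only among the non-zeros. The identity at the end is what ties the two parts back to the unconditional mean.

```python
import statistics

# Sketch of the two-part decomposition on hypothetical counts:
# part 1 is a zero/non-zero indicator, part 2 the amount given non-zero.
counts = [0, 0, 0, 0, 2, 3, 7, 40, 90, 750]

any_use = [1 if c > 0 else 0 for c in counts]
positive = [c for c in counts if c > 0]

p_any = statistics.mean(any_use)        # probability of non-zero use
cond_mean = statistics.mean(positive)   # mean amount among users only

print(p_any, cond_mean)

# The unconditional mean factors as P(non-zero) * E[amount | non-zero].
assert abs(statistics.mean(counts) - p_any * cond_mean) < 1e-9
```

In practice part 1 becomes a logistic regression and part 2 a model for the positive amounts (the "hurdle" formulation); the Tobit raised in the reply instead assumes a single censored Gaussian process behind both parts.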
Re: Question on Conditional PDF
Chia C Chong [EMAIL PROTECTED] wrote in message news:a5d38d$63e$[EMAIL PROTECTED]... Glen [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... Do you want to make any assumptions about the form of the conditional, or the joint, or any of the marginals? Well, X and Y are dependent and hence they are described by a joint PDF. This much is clear. I am not sure what other assumptions I can make though. I merely thought you may have domain-specific knowledge of the variables and their likely relationships which might inform the choice a bit (cut down the space of possibilities). Can you at least indicate whether any of them are restricted to be positive? Glen
Re: Question on Conditional PDF
Glen Barnett [EMAIL PROTECTED] wrote in message news:a5dev7$8jn$[EMAIL PROTECTED]... Chia C Chong [EMAIL PROTECTED] wrote in message news:a5d38d$63e$[EMAIL PROTECTED]... Glen [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... Do you want to make any assumptions about the form of the conditional, or the joint, or any of the marginals? Well, X and Y are dependent and hence they are described by a joint PDF. This much is clear. I am not sure what other assumptions I can make though. I merely thought you may have domain-specific knowledge of the variables and their likely relationships which might inform the choice a bit (cut down the space of possibilities). Can you at least indicate whether any of them are restricted to be positive? All values of X and Z are positive, while Y can have both positive and negative values. In fact, X has a range spanning from 0 to 250 (time), Y has values that span from -60 to +60 (angle), and Z has some positive values. Note that the joint PDF of X and Y was defined as f(X,Y)=f(Y|X)f(X), in which f(Y|X) is a conditional Gaussian PDF and f(X) is an exponential PDF. The plot of the 3rd variable, Z (power), i.e. Z vs X and Z vs Y respectively, shows that Z has some kind of dependency on X and Y; hence, my original post was asking for a possible method of finding the conditional PDF of Z on both X and Y. I hope this makes things a little bit clearer -- or more complicated??? Thanks.. CCC
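The stated factorisation f(X,Y) = f(Y|X) f(X) gives a direct recipe for simulating (X, Y) pairs, which is often a first step toward checking a proposed conditional model for Z. The particular parameter values below are illustrative placeholders, not the ones from CCC's data.

```python
import random

random.seed(3)

def sample_pair():
    # f(X): exponential (assumed mean 40 for illustration);
    # expovariate takes the rate, i.e. 1/mean.
    x = random.expovariate(1 / 40.0)
    # f(Y|X): Gaussian whose spread is allowed to depend on X
    # (the dependence form here is a pure assumption).
    y = random.gauss(0.0, 10.0 + 0.05 * x)
    return x, y

pairs = [sample_pair() for _ in range(5000)]

# X is a time, so exponential draws are always non-negative.
assert all(x >= 0 for x, _ in pairs)
```

A model for f(Z|X,Y) could be layered on the same way: draw (X, Y) as above, then draw Z from whatever conditional family the Z-vs-X and Z-vs-Y plots suggest.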
Re: Question on CDF
Henry [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... On Fri, 22 Feb 2002 08:55:42 +1100, Glen Barnett [EMAIL PROTECTED] wrote: Bob [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... A straight line CDF would imply the data is uniformly distributed, that is, the probability of one event is the same as the probability of any other event. The slope of the line would be the probability of an event. I doubt that - if the data were distributed uniformly on [0,1/2), say, then the slope of the line would be 2! I suspect he meant probability density. I guess that's actually correct - the slope of the pdf is zero. However, I'm fairly certain that's not what he meant. Glen
Re: Question on CDF
On Sat, 23 Feb 2002 00:27:00 +1100, Glen Barnett [EMAIL PROTECTED] wrote: Henry [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... On Fri, 22 Feb 2002 08:55:42 +1100, Glen Barnett [EMAIL PROTECTED] wrote: Bob [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... A straight line CDF would imply the data is uniformly distributed, that is, the probability of one event is the same as the probability of any other event. The slope of the line would be the probability of an event. I doubt that - if the data were distributed uniformly on [0,1/2), say, then the slope of the line would be 2! I suspect he meant probability density. I guess that's actually correct - the slope of the pdf is zero. However, I'm fairly certain that's not what he meant. I was trying to suggest that he meant the slope of the CDF was the height of the PDF.
Re: Question on CDF
Henry [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... I was trying to suggest that he meant the slope of the CDF was the height of the PDF. Oh, okay. Yes, that would be correct, but it shouldn't be called probability! Glen
Re: Question on CDF
Bob [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... [EMAIL PROTECTED] (Linda) wrote in message news:[EMAIL PROTECTED]... Hi! If I plot the CDF of some sample data and this CDF looks like a straight line crossing through 0, what does this imply? Normally, a CDF will not look like a straight line but something like an S shape, won't it? Linda A straight line CDF would imply the data is uniformly distributed, that is, the probability of one event is the same as the probability of any other event. The slope of the line would be the probability of an event. I doubt that - if the data were distributed uniformly on [0,1/2), say, then the slope of the line would be 2! Glen
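Glen's slope-2 counterexample is easy to check empirically: draw from the uniform distribution on [0, 1/2), form the empirical CDF, and measure its slope. The slope recovers the density height (2 here), which is exactly why a CDF slope cannot in general be read as a probability.

```python
import bisect
import random

random.seed(4)

# Sample uniformly on [0, 0.5); the density there is 1/0.5 = 2.
xs = sorted(random.uniform(0, 0.5) for _ in range(100000))

def ecdf(t):
    # Fraction of the sample at or below t.
    return bisect.bisect_right(xs, t) / len(xs)

# Slope of the empirical CDF between two interior points.
slope = (ecdf(0.4) - ecdf(0.1)) / (0.4 - 0.1)
print(round(slope, 2))
```

With 100,000 draws the estimated slope sits very close to 2, comfortably above any probability.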
Re: Question on CDF
[EMAIL PROTECTED] (Linda) wrote in message news:[EMAIL PROTECTED]... Hi! If I plot the CDF of some sample data and this CDF looks like a straight line crossing through 0, what does this imply? Normally, a CDF will not look like a straight line but something like an S shape, won't it? Linda A straight line CDF would imply the data is uniformly distributed, that is, the probability of one event is the same as the probability of any other event. The slope of the line would be the probability of an event. Bob
Re: Question on CDF
On Fri, 22 Feb 2002 08:55:42 +1100, Glen Barnett [EMAIL PROTECTED] wrote: Bob [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... A straight line CDF would imply the data is uniformly distributed, that is, the probability of one event is the same as the probability of any other event. The slope of the line would be the probability of an event. I doubt that - if the data were distributed uniformly on [0,1/2), say, then the slope of the line would be 2! I suspect he meant probability density.
Re: Question on random number generator
Herman Rubin wrote: ExpVar = -ln(UnifVar); It is not a good method in the tails, and is much too slow. If I recall correctly, transcendental operations on a Pentium require only a couple hundred clock cycles and can usually be optimized to take place during other calculations; so a few million simulations per second ought to be possible on the average domestic machine. -Robert Dawson
Re: Question on random number generator
In article [EMAIL PROTECTED], Robert J. MacG. Dawson [EMAIL PROTECTED] wrote: Linda wrote: I want to generate a series of random variables, X, with an exponential PDF with a given mean, MU. However, I only want X to be within some specified lower and upper limit, say between 0 and 150, i.e. rejecting anything outside this range. Does anyone have any ideas how I should do that? For untruncated exponential RVs the negative-log method of converting a uniform [0,1] RV is about as good as you can get: ExpVar = -ln(UnifVar); It is not a good method in the tails, and is much too slow. It can easily be adjusted to censor to any interval [a,b] by prescaling onto [exp(-b),exp(-a)]: TruncExpVar = -ln(exp(-b) + (exp(-a)-exp(-b))*UnifVar); This is efficient but slow, and has the same inaccuracy in the tails if b >> a. It is also unnecessarily complex; equivalent results are obtained by writing it as TruncExpVar = a - ln(exp(a-b) + (1.0-exp(a-b))*UnifVar); -- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399 [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
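Both one-liners from this exchange can be sketched in Python for a unit-rate exponential truncated to [a, b]; multiply the result by MU for a mean-MU scale (whether MU should be the pre- or post-truncation mean is Alan Miller's question elsewhere in the thread). The final loop checks Herman Rubin's claim that his rearrangement is algebraically equivalent.

```python
import math
import random

def trunc_exp(a, b, u):
    # Dawson's prescaling form: map u in [0,1] onto [exp(-b), exp(-a)],
    # then invert the exponential CDF.
    return -math.log(math.exp(-b) + (math.exp(-a) - math.exp(-b)) * u)

def trunc_exp_rubin(a, b, u):
    # Rubin's rearrangement of the same inverse CDF.
    return a - math.log(math.exp(a - b) + (1.0 - math.exp(a - b)) * u)

random.seed(5)
a, b = 0.0, 150.0  # Linda's range
samples = [trunc_exp(a, b, random.random()) for _ in range(10000)]
assert all(a <= s <= b for s in samples)

# The two forms agree to floating-point precision.
for u in (0.0, 0.25, 0.5, 0.75, 1.0):
    assert abs(trunc_exp(2.0, 5.0, u) - trunc_exp_rubin(2.0, 5.0, u)) < 1e-9
```

Note that u = 0 maps to b and u = 1 maps to a, so every draw lands inside the interval with no rejection step at all.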
Re: Newbie question
AP wrote: Hi all: I would appreciate your help in solving this question. How do I calculate the standard deviation of a sample when the mean and standard deviation of the process are provided? E.g. process mean = 150; standard deviation = 20. What is the SD for a sample of 25? The answer suggested is 4.0. Right answer, wrong question... You were, almost certainly, not asked for the standard deviation of the sample, but for the standard deviation of the MEAN of the sample. The thing you need to note here is that the sample is obtained through a random process, so that most things computed from the sample are likewise randomized through the sampling process. It is often helpful to think of taking a lot of samples all of the same size, computing the mean (or whatever) for each of them, and then analyzing that set of numbers. In particular, you can calculate the standard deviation. Probability theory tells us that in the population of ALL samples of size N from a population with mean mu and standard deviation sigma, the sample means will have mean mu and standard deviation sigma/sqrt(N). Moreover, as N gets larger, the sampling distribution gets closer to a normal distribution, which under some circumstances lets us say more about the distribution based on mu and sigma/sqrt(N). -Robert Dawson
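The sigma/sqrt(N) result can be verified both algebraically and by simulation, using the numbers from the question (process mean 150, sd 20, N = 25). The simulation literally does what the reply suggests: take many samples of the same size, compute each sample's mean, and look at the sd of those means.

```python
import random
import statistics

mu, sigma, n = 150.0, 20.0, 25

# Algebraic answer: sd of the sample mean = sigma / sqrt(N).
analytic_se = sigma / n ** 0.5
assert analytic_se == 4.0

# Simulation check: many samples, one mean per sample, then the sd
# of that collection of means.
random.seed(6)
means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(4000)]
simulated_se = statistics.stdev(means)
print(round(simulated_se, 1))
```

The simulated value hovers near 4, and the assumption of a normal process here only affects the shape of the sampling distribution, not the sigma/sqrt(N) spread itself.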
Re: Question on random number generator
Linda wrote: I want to generate a series of random variables X with an exponential PDF with a given mean, MU. However, I only want X to be in some specified lower and upper limit, say between 0 - 150, i.e. reject anything outside this range. Does anyone have any ideas how I should do that?

For untruncated exponential RV's the negative-log method of converting a uniform [0,1] RV is about as good as you can get: ExpVar = -ln(UnifVar); It can easily be adjusted to censor to any interval [a,b] by prescaling onto [exp(-b),exp(-a)]: TruncExpVar = -ln(exp(-b) + (exp(-a)-exp(-b))*UnifVar); -R. Dawson
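Dawson's prescaling formula is straightforward to code. A sketch in Python; his formula is written for the unit-rate case, so the scaling by MU is my addition, and the function name is just for illustration:

```python
import math
import random

def trunc_exp(mu, a, b, rng=random):
    """Exponential with mean mu, truncated to [a, b], via Dawson's prescaling.

    The uniform variate is mapped onto [exp(-b/mu), exp(-a/mu)] before taking
    the negative log, so no draws are ever rejected.
    """
    lo = math.exp(-b / mu)
    hi = math.exp(-a / mu)
    u = rng.random()
    return -mu * math.log(lo + (hi - lo) * u)

random.seed(7)
draws = [trunc_exp(60.0, 0.0, 150.0) for _ in range(1000)]
```

Every draw lands in [0, 150] by construction, so no rejection loop is needed.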
Re: Question on random number generator
Alan Miller wrote (six times): Linda wrote in message [EMAIL PROTECTED]... I want to generate a series of random variables X with an exponential PDF with a given mean, MU. However, I only want X to be in some specified lower and upper limit, say between 0 - 150, i.e. reject anything outside this range. Does anyone have any ideas how I should do that? Regards, Linda

Is MU the mean before truncation? - or afterwards? The ziggurat algorithm seems to be the fastest for generating exponentially-distributed RV's. You can then simply scale them, by multiplying by the mean BEFORE truncation, and then throw away any which exceed the upper bound.

Alternatively, following Herman Rubin's idea, you can post the same message to EDSTAT-L repeatedly and let X be the delay until somebody points this out. This should be geometrically distributed, which will approximate the desired exponential distribution <grin, duck, run>.

For most purposes I do not share Herman's concern about the tails of the distribution. If we use (say) a 64-bit integer as the basis of the uniform distribution, granularity will only be significant for the last few dozen values, which will turn up once every quintillion or so runs. Moreover, fast hardware logarithms are almost a given today. However, his gimmick of randomizing the mantissa and characteristic separately is a good one and well worth remembering. If I recall correctly, math coprocessors use binary logs too, so a super-fast algorithm for (say) a Pentium would probably tie the two approaches together. -Robert Dawson
Re: Question on random number generator
Thanks everyone for helping me... Regards, Linda

Art Kendall [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... [ snip: quoted SPSS syntax ]
Re: Newbie question
On 15 Feb 2002 14:38:49 -0800, [EMAIL PROTECTED] (AP) wrote: Hi all: I would appreciate your help in solving this question. How do I calculate the standard deviation of a sample where the mean and standard deviation from the process are provided? E.g. process mean = 150; standard deviation = 20. What is the SD for a sample of 25? The suggested answer is 4.0.

Here is a vocabulary distinction. Or error. I don't know if you are repeating the problem wrong, or you are speaking from a tradition that I am not familiar with. As I am familiar with it, statisticians say that "the standard deviation" is the standard deviation of the sample. The standard deviation of the sample *mean* is what is frequently referred to as the standard error, and the SD of the mean [or the SE] equals SD/sqrt(N). That is confusing enough. I hope this makes your sources clear. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
Re: Question on random number generator
In article [EMAIL PROTECTED], Bill Rowe [EMAIL PROTECTED] wrote: In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Linda) wrote: I want to generate a series of random variables X with an exponential PDF with a given mean, MU. However, I only want X to be in some specified lower and upper limit, say between 0 - 150, i.e. reject anything outside this range. Does anyone have any ideas how I should do that?

I am a little unclear on what you want. A random variable with an exponential distribution has no upper bound. Are you looking for a random deviate from a truncated distribution? In any case, X = -ln(U) where U is a uniform random deviate will be exponentially distributed with lambda = 1. For a different lambda simply scale -ln(U) by a suitable constant. To have a different minimum, simply add whatever offset you want. To truncate the distribution, simply throw away values above the desired limit. Note this can be made a bit more computationally efficient by truncating the uniform distribution prior to taking the logarithm.

One can use a much faster algorithm than using a logarithm, unless the logarithm is a fast hardware one. Also, the logarithm routine used gives poor accuracy in the tails, and there are reasons for wanting good accuracy there. If one is going to use a logarithm, I suggest using X = -ln(U) + K*ln(2.), where U is uniform (.5, 1) and K is the number of 0's until a 1 in a random bit stream. High quality is needed in K.

Now as to how to generate the distribution wanted: the random variable X is a linear function of an exponential truncated to be between 0 and M. One could take the remainder of an exponential random variable when divided by M, or modify the generating algorithm never to generate one that large. If M is small, the following is a simple method, not necessarily optimal. Let V be uniform (0, M) and T a test exponential. Replace T by T-V. If this is positive, use V as the truncated exponential, and continue.
If not, we lose both V and T. My faster method of generating exponentials is based on this general idea, but with the range divided. A more detailed preliminary description of the process is available, and a student is working on putting it into a program library. -- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399 [EMAIL PROTECTED] Phone: (765) 494-6054 FAX: (765) 494-0558
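Rubin's -ln(U) + K*ln(2) suggestion can be sketched directly (Python; the function and variable names are mine). K counts the 0's before the first 1 in a random bit stream, which makes it geometric, and U is uniform on (.5, 1); the sum is a standard exponential, with K supplying the coarse "octave" and -ln(U) the fine fractional part, which is what preserves accuracy deep in the tail:

```python
import math
import random

def exp_rubin(rng=random):
    # K = the number of 0's until a 1 in a random bit stream (geometric).
    k = 0
    while rng.getrandbits(1) == 0:
        k += 1
    # U uniform on (.5, 1) supplies the fractional part, -ln(U) in (0, ln 2].
    u = 0.5 + 0.5 * rng.random()
    return -math.log(u) + k * math.log(2.0)

random.seed(2002)
xs = [exp_rubin() for _ in range(50_000)]
mean_x = sum(xs) / len(xs)   # should be close to 1 for a unit exponential
```

Scale by the desired mean MU to get Linda's distribution before any truncation step.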
Re: Question on random number generator
Linda wrote in message [EMAIL PROTECTED]... I want to generate a series of random variables X with an exponential PDF with a given mean, MU. However, I only want X to be in some specified lower and upper limit, say between 0 - 150, i.e. reject anything outside this range. Does anyone have any ideas how I should do that? Regards, Linda

Is MU the mean before truncation? - or afterwards? The ziggurat algorithm seems to be the fastest for generating exponentially-distributed RV's. You can then simply scale them, by multiplying by the mean BEFORE truncation, and then throw away any which exceed the upper bound. -- Alan Miller (Honorary Research Fellow, CSIRO Mathematical Information Sciences) http://www.ozemail.com.au/~milleraj http://users.bigpond.net.au/amiller/
Re: Question on random number generator
try this SPSS syntax.

new file.
* this program generates 200 cases
* trims those outside the desired range
* and takes the first 100 of the remaining.
* change lines flagged with /* .
input program.
loop #i = 1 to 200. /* .
compute mu = .005. /* .
compute x = rv.exp(mu).
end case.
end loop.
end file.
end input program.
formats mu (f6.3).
select if x gt 0 and x le 150. /* .
compute seqnum = $casenum.
execute.
select if seqnum le 100. /* .
execute.

Linda wrote: I want to generate a series of random variables X with an exponential PDF with a given mean, MU. However, I only want X to be in some specified lower and upper limit, say between 0 - 150, i.e. reject anything outside this range. Does anyone have any ideas how I should do that? Regards, Linda
Newbie question
Hi all: I would appreciate your help in solving this question. How do I calculate the standard deviation of a sample where the mean and standard deviation from the process are provided? E.g. process mean = 150; standard deviation = 20. What is the SD for a sample of 25? The suggested answer is 4.0. TIA /anil
Re: Question on random number generator
In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Linda) wrote: I want to generate a series of random variables X with an exponential PDF with a given mean, MU. However, I only want X to be in some specified lower and upper limit, say between 0 - 150, i.e. reject anything outside this range. Does anyone have any ideas how I should do that?

I am a little unclear on what you want. A random variable with an exponential distribution has no upper bound. Are you looking for a random deviate from a truncated distribution? In any case, X = -ln(U) where U is a uniform random deviate will be exponentially distributed with lambda = 1. For a different lambda simply scale -ln(U) by a suitable constant. To have a different minimum, simply add whatever offset you want. To truncate the distribution, simply throw away values above the desired limit. Note this can be made a bit more computationally efficient by truncating the uniform distribution prior to taking the logarithm. -- - PGPKey fingerprint: 6DA1 E71F EDFC 7601 0201 9243 E02A C9FD EF09 EAE5
Re: Question on random number generator
Hi, Define Y = X if X <= T, and Y = 0 otherwise. For your problem, T = 150 (the threshold) and X is an exponential random variable with mean MU. So, first generate X, compare it with T, and assign a value to Y as specified in the above rule. Alternatively, find the CDF (distribution function) of Y from the above rule and then use a uniform random variable in (0, 1) to generate Y itself. hope this helps regards Ramesh

Linda wrote: I want to generate a series of random variables X with an exponential PDF with a given mean, MU. However, I only want X to be in some specified lower and upper limit, say between 0 - 150, i.e. reject anything outside this range. Does anyone have any ideas how I should do that? Regards, Linda
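Note that Ramesh's rule censors rather than truncates: draws above T are kept but recoded to 0, so the result has extra mass at 0 instead of a renormalized density on (0, T]. A sketch of the rule exactly as stated (Python; the function name is mine):

```python
import random

def censored_exp(mu, t, rng=random):
    """Y = X if X <= t, and Y = 0 otherwise, with X exponential of mean mu."""
    x = rng.expovariate(1.0 / mu)   # expovariate takes the rate, i.e. 1/mean
    return x if x <= t else 0.0

random.seed(42)
ys = [censored_exp(60.0, 150.0) for _ in range(1000)]
```

If Linda really wants rejection (truncation), the over-threshold draws should be redrawn rather than set to 0.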
Re: one-way ANOVA question
On 13 Feb 2002 09:48:41 -0800, [EMAIL PROTECTED] (Dennis Roberts) wrote: At 09:21 AM 2/13/02 -0600, Mike Granaas wrote: On Fri, 8 Feb 2002, Thomas Souers wrote: 2) Secondly, are contrasts used primarily as planned comparisons? If so, why? I would second those who've already indicated that planned comparisons are superior in answering theoretical questions and add a couple of comments: another way to think about this issue is: what IF we never had ... nor will in the future ... the overall omnibus F test? would this help us or hurt us in the exploration of the experimental/research questions of primary interest?

- not having it available, even abstractly, would HURT, because we would be without that reminder of 'too many hypotheses'. In practice, I *do* consider the number of tests. Just about always. Now, I am not arguing that the particular form of having an ANOVA omnibus-test is essential. Bonferroni correction can do a lot of the same. It just won't always be as efficient.

i really don't see ANY case that it would hurt us ... and, i can't really think of cases where doing the overall F test helps us ...

But, Dennis, I thought you told us before, you don't appreciate hypothesis testing ... I thought you could not think of cases where doing *any* F-test helps us. [ ... ] -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
Re: one-way ANOVA question
On Fri, 8 Feb 2002, Thomas Souers wrote: 2) Secondly, are contrasts used primarily as planned comparisons? If so, why?

I would second those who've already indicated that planned comparisons are superior in answering theoretical questions and add a couple of comments:

1) an omnibus test followed by pairwise comparisons cannot clearly answer theoretical questions involving more than two groups. Trend analysis is one example where planned comparisons can give a relatively unambiguous answer (is there a linear, quadratic, etc. trend?) where pairwise tests leave the researcher trying to interpret the substantive meaning of a particular pattern of pairwise differences.

2) planned comparisons require that the researcher think through the theoretical implications of their research efforts prior to collecting data. It is too common for folks to gather some data appropriate for an ANOVA, without thinking through the theoretical implications of their possible results, analyze it with an omnibus test (Ho: all the means are the same) and rely on post-hoc pairwise comparisons to understand the theoretical meaning of their findings. In a multi-group design, if you cannot think of at least one meaningful contrast code prior to collecting the data, you haven't really thought through your research.

3) your power is better. It is well known that when you toss multiple potential predictors into a multiple regression equation you run the risk of washing out the effect of a single good predictor by combining it with one or more bad predictors. ANOVA is a special case of multiple regression where each df in the between-subjects line represents a predictor (contrast code). By combining two or more contrast codes into a single omnibus test you reduce your ability to detect meaningful differences amongst the collection of non-differences.

Hope this helps. Michael *** Michael M.
Granaas, Associate Professor, [EMAIL PROTECTED] Department of Psychology, University of South Dakota, Vermillion, SD 57069 Phone: (605) 677-5295 FAX: (605) 677-6604 *** All views expressed are those of the author and do not necessarily reflect those of the University of South Dakota, or the South Dakota Board of Regents.
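Granaas's trend-analysis point can be made concrete with a tiny worked example of a planned linear contrast. The data below are hypothetical, invented for illustration; the weights (-1, 0, 1) are the standard linear-trend contrast for three equally spaced groups:

```python
import math
from statistics import fmean, variance

groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]   # hypothetical data; means 2, 3, 4
weights = [-1, 0, 1]                          # linear-trend contrast weights

means = [fmean(g) for g in groups]
ns = [len(g) for g in groups]

# Pooled error term (MSE) from the within-group variances.
df_error = sum(ns) - len(groups)
mse = sum((n - 1) * variance(g) for g, n in zip(groups, ns)) / df_error

# Contrast estimate and its t statistic on df_error degrees of freedom.
L = sum(w * m for w, m in zip(weights, means))
se = math.sqrt(mse * sum(w * w / n for w, n in zip(weights, ns)))
t = L / se   # here sqrt(6), about 2.449, on 6 df
```

One focused test of the linear trend, instead of an omnibus F followed by three pairwise comparisons.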
Re: one-way ANOVA question
Thomas Souers wrote: Hello, I have two questions regarding multiple comparison tests for a one-way ANOVA (fixed effects model). 1) Consider the Protected LSD test, where we first use the F statistic to test the hypothesis of equality of factor level means. Here we have a type I error rate of alpha. If the global F test is significant, we then perform a series of t-tests (pairwise comparisons of factor level means), each at a type I error rate of alpha. This may seem like a stupid question, but how does this test preserve a type I error for the entire experiment?

As you (nearly) say, [Only i]f the global F test is significant, we then perform a series of t-tests

2) Secondly, are contrasts used primarily as planned comparisons? If so, why?

It depends on the research question.
Re: one-way ANOVA question
At 09:21 AM 2/13/02 -0600, Mike Granaas wrote: On Fri, 8 Feb 2002, Thomas Souers wrote: 2) Secondly, are contrasts used primarily as planned comparisons? If so, why? I would second those who've already indicated that planned comparisons are superior in answering theoretical questions and add a couple of comments:

another way to think about this issue is: what IF we never had ... nor will in the future ... the overall omnibus F test? would this help us or hurt us in the exploration of the experimental/research questions of primary interest? i really don't see ANY case that it would hurt us ... and, i can't really think of cases where doing the overall F test helps us ... i think mike's point about planning comparisons making us THINK about what is important to explore in a given study ... is really important because, we have gotten lazy when it comes to this ... we take the easy way out of testing all possible paired comparisons when, it MIGHT be that NONE of these are really the crucial things to be examined

Dennis Roberts, 208 Cedar Bldg., University Park PA 16802 Emailto: [EMAIL PROTECTED] WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm AC 8148632401
one-way ANOVA question
Hello, I have two questions regarding multiple comparison tests for a one-way ANOVA (fixed effects model). 1) Consider the Protected LSD test, where we first use the F statistic to test the hypothesis of equality of factor level means. Here we have a type I error rate of alpha. If the global F test is significant, we then perform a series of t-tests (pairwise comparisons of factor level means), each at a type I error rate of alpha. This may seem like a stupid question, but how does this test preserve a type I error for the entire experiment? I understand that with a Bonferroni-type procedure, we can test each pairwise comparison at a certain rate, so that the overall type I error rate of the experiment will be at most a certain level. But with the Protected LSD test, I don't quite see how the comparisons are being protected. Could someone please explain to me the logic behind the LSD test? 2) Secondly, are contrasts used primarily as planned comparisons? If so, why? I would very much appreciate it if someone could take the time to explain this to me. Many thanks.
Re: one-way ANOVA question
You have to keep in mind that the LSD is concerned with familywise error rate, which is the probability that you will make at least one Type I error in your set of conclusions. For the familywise error rate, 3 errors are no worse than 1. Suppose that you have three groups. If the omnibus null is true, the probability of erroneously rejecting the null with the overall Anova is equal to alpha, which I'll assume you set at .05. IF you reject the null, you have already made one Type I error, so the chances of making more do not matter to the familywise error rate. Your Type I error rate is .05.

Now suppose that the null is false -- mu(1) = mu(2) /= mu(3). Then it is not possible to make a Type I error in the overall F, because the omnibus null is false. There is one chance of making a Type I error in testing individual means, because you could erroneously declare mu(1) /= mu(2). But since the other nulls are false, you can't make an error there. So again, your familywise probability of a Type I error is .05.

Now assume 4 means. Here you have a problem. It is possible that mu(1) = mu(2) /= mu(3) = mu(4). You can't make a Type I error on the omnibus test, because that null is false. But you will be allowed to test mu(1) = mu(2), and to test mu(3) = mu(4), and each of those is true. So you have 2 opportunities to make a Type I error, giving you a familywise rate of 2*.05 = .10.

So with 2 or 3 means, the max. familywise error rate is .05. With 4 or 5 means it is .10, with 6 or 7 means it is .15, etc. But keep in mind that, at least in psychology, the vast majority of experiments have no more than 5 means, and many have only 3. In that case, the effective max error rate for the LSD is .10 or .05, depending on the number of means. On the other hand, if you have many means, the situation truly gets out of hand. Dave Howell

At 10:37 AM 2/8/2002 -0800, you wrote: Hello, I have two questions regarding multiple comparison tests for a one-way ANOVA (fixed effects model).
[ snip: rest of quoted question ]

** David C. Howell Phone: (802) 656-2670 Dept of Psychology Fax: (802) 656-8783 University of Vermont email: [EMAIL PROTECTED] Burlington, VT 05405 http://www.uvm.edu/~dhowell/StatPages/StatHomePage.html http://www.uvm.edu/~dhowell/gradstat/index.html
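Howell's counting argument follows a simple pattern: in the worst case the k means pair off into floor(k/2) true pairwise nulls while the omnibus null stays false, so the familywise bound is floor(k/2)*alpha. The formula below is my generalization of his worked cases (2-3 means, 4-5 means, 6-7 means), not something from his post:

```python
def lsd_familywise_bound(k, alpha=0.05):
    """Max familywise Type I error rate for Fisher's protected LSD with k means.

    Worst case: the means pair off into floor(k/2) equal pairs, so the (false)
    omnibus null offers no protection for those pairwise tests.
    """
    return (k // 2) * alpha

# Reproduces Howell's examples: .05 for 2-3 means, .10 for 4-5, .15 for 6-7.
bounds = {k: lsd_familywise_bound(k) for k in range(2, 8)}
```

The bound grows linearly in the number of means, which is exactly Howell's "the situation truly gets out of hand" for large k.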
Re: one-way ANOVA question
At 10:37 AM 2/8/02 -0800, Thomas Souers wrote: 2) Secondly, are contrasts used primarily as planned comparisons? If so, why?

well, in the typical rather complex study ... all pairs of possible mean differences (as one example) are NOT equally important to the testing of your theory or notions so, why not set up ahead of time ... THOSE that are (not necessarily restricted to pairs) you then follow ... let the other ones alone no law says that if you had a 3 by 4 by 3 design, that the 3 * 4 * 3 = 36 means all need pairs testing ... in fact, some combinations may not even make a whole lot of sense EVEN if it is easier to work them into your design

Dennis Roberts, 208 Cedar Bldg., University Park PA 16802 Emailto: [EMAIL PROTECTED] WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm AC 8148632401
Re: one-way ANOVA question
Hi On 8 Feb 2002, Thomas Souers wrote: 2) Secondly, are contrasts used primarily as planned comparisons? If so, why?

There are a great many possible contrasts even with a relatively small number of means. If you examine the data and then decide what contrasts to do, then you have in some informal sense performed a much larger set of contrasts than you actually formally test. Specifying the contrasts in advance means that you have only performed the number of statistical tests actually calculated. Another (related) way to think of it is that planned contrasts take advantage of pre-existing theory and data to perform tests that favor certain outcomes. To do this, however, contrasts must be specified independently of the data (i.e., planned). Perhaps this could be thought of as some kind of quasi-Bayesian thinking? That is, given a priori factors favoring certain outcomes, the actual data does not need to be as strong to tilt the results in that direction. Best wishes Jim

James M. Clark (204) 786-9757 Department of Psychology (204) 774-4134 Fax University of Winnipeg, 4L05D Winnipeg, Manitoba R3B 2E9 CANADA [EMAIL PROTECTED] http://www.uwinnipeg.ca/~clark
Question on Poisson -- Multinomial Relationship
Hi all, The conditional distribution of Poisson variates given their sum is multinomial. Does anyone know the density of Poisson variates given their partial sums S1, S2, ..., Sk, with each Si possibly overlapping with one or more of the other sums? Thanks in advance. Bhaskara
(Probably Simple) Chi-Square Intuition Question
I'm looking at forced-response answers to a question where there are several possible choices. I'm trying to test the significance of the difference between the proportion choosing answer A and the proportion choosing answer B. I've got the fairly simple formula for a chi-square-distributed test statistic. I'm puzzled, however, by the effect of changing the number of answer options on the chi-square critical value. Suppose 34% of the sample always chooses A and 21% always answers B (no matter how many choices there are). Because the test statistic reportedly has the number of total choices (minus 1) as its degrees of freedom, this implies that as the number of choices goes up, it's going to be harder and harder for me to show that the A and B proportions are statistically different. I would think that the difference between the A and B proportions (13% in my example here) would be more impressive as you increased the number of total options. Please help, I'm totally stumped! Thanks much in advance! andy leventis [EMAIL PROTECTED]
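One standard way around this puzzle (a sketch, not an answer from the thread itself): to compare two cell proportions from the same multinomial sample, you do not need the overall k-category chi-square at all. A direct test of p_A - p_B uses the multinomial variance of the difference, Var(p̂_A - p̂_B) = (p_A + p_B - (p_A - p_B)^2)/n, which accounts for the negative covariance between the two proportions and does not depend on how many other answer options exist. With the poster's 34% vs 21% and a hypothetical n = 200:

```python
from math import sqrt

def z_diff_multinomial(pA, pB, n):
    # z statistic for H0: pA == pB when both proportions come from the
    # SAME multinomial sample of size n (Cov(pA_hat, pB_hat) = -pA*pB/n)
    se = sqrt((pA + pB - (pA - pB) ** 2) / n)
    return (pA - pB) / se

z = z_diff_multinomial(0.34, 0.21, 200)  # n = 200 is hypothetical
print(round(z, 2))
```

The degrees of freedom here are fixed (one comparison), so adding more answer categories does not dilute the A-versus-B comparison.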
Re: cell-counts question
I'm not clear on your level of understanding, so apologies if I repeat ground you have already plowed twice. 1) The symbol 6.27x10^7 means (is mathematically equal to) 62,700,000. Could be the biostatistician counted one heck of a lot of cells, or had some means to estimate the count from a smaller volume than the standard volume used for reporting. 2) When calculating the average and standard deviation, we can 'adjust' the actual measured numbers by adding (or subtracting) a constant to each measurement, or by multiplying (or dividing) each measurement by a constant. Each of these adjustments changes the average and standard deviation in known ways. Thus, we can divide each measurement by 10,000,000 (10^7), do the average and stdev calculation, and then 'adjust' the result back again at the end. The equations for the relationships are: If U = a*X + b (Eq. 1), then xbar(U) = a*xbar(X) + b (Eq. 2) and stdev(U) = a*stdev(X) (Eq. 3; b does not change the stdev). So in your case, the report measured 6.27x10^7, etc.; they divided each measurement by 10^7 to get 6.27, etc. This is using Eq. 1 above, with a = 1/10^7 and b = 0. Then they calculated the average and standard deviation (which is much easier without all those 0's hanging around :) . Then they can multiply xbar and stdev by 10^7, and report the average and stdev of the original measurements for all to see. This is using Eq. 2 and 3, only first solving for xbar(X) and stdev(X) to get Eq. 4 and 5: xbar(X) = (xbar(U) - b)/a (Eq. 4, from Eq. 2) and stdev(X) = stdev(U)/a (Eq. 5, from Eq. 3). Since 1/a = 1/(1/10^7) = 10^7 in your case, stdev(X) = stdev(U)*10^7. Result: easier calculation, easier visualization of the number crunching, easier display on a graph, for example, BUT no change in result. Requirements: a and b must be constants, and Eq. 1 must be applied to _all_ the data used in the calculations. This kind of thing is often done without noticing, when we change the scale of the measurements.
Some length measurements are written in a log book in 'mils' in the USA, where 1 mil = 0.001 inches. The calculations are done in mils, then reported in inches. Hence, an average of 9.4 mils becomes an average of 0.0094 inches. I believe European locomotive (train engine) plans are documented in mm, from one end to the other, but the overall length is reported to management and the public in meters. Does this help? Jay Wei Wang wrote: Dear Friends, Here is an exam question which I don't know how to do. Can anyone help me? The question is: a biostatistician was asked to analyze some data regarding cell counts, and the values were reported like 6.27x10^7, 72.5x10^7, 3.42x10^7, etc. Rather than using the data exactly as reported, the biostatistician used the values as 6.27, 72.5, 3.42, etc. What effect does this have on estimation of the mean and standard deviation? What effect does this have on hypothesis testing about the mean? Why? Thank you very much for your help. Christine -- Jay Warner Principal Scientist Warner Consulting, Inc. North Green Bay Road Racine, WI 53404-1216 USA Ph: (262) 634-9100 FAX: (262) 681-1133 email: [EMAIL PROTECTED] web: http://www.a2q.com The A2Q Method (tm) -- What do you want to improve today?
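A quick numeric check of the rescaling argument above, using the three reported cell counts as toy data. It also answers the hypothesis-testing half of the exam question: a t statistic for the mean is a ratio of two quantities that scale by the same constant, so it is unchanged.

```python
from statistics import mean, stdev
from math import sqrt

x = [6.27e7, 72.5e7, 3.42e7]        # original counts
u = [v / 1e7 for v in x]            # rescaled: U = a*X with a = 1/10^7, b = 0

# mean and stdev both scale by the same constant a ...
assert abs(mean(x) - mean(u) * 1e7) < 1e-3
assert abs(stdev(x) - stdev(u) * 1e7) < 1e-3

# ... so a one-sample t statistic (here vs 0, just for illustration)
# is identical on either scale
n = len(x)
t_x = mean(x) / (stdev(x) / sqrt(n))
t_u = mean(u) / (stdev(u) / sqrt(n))
assert abs(t_x - t_u) < 1e-9
print("rescaling changes the scale of xbar and s, not the t statistic")
```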
cell-counts question
Dear Friends, Here is an exam question which I don't know how to do. Can anyone help me? The question is: a biostatistician was asked to analyze some data regarding cell counts, and the values were reported like 6.27x10^7, 72.5x10^7, 3.42x10^7, etc. Rather than using the data exactly as reported, the biostatistician used the values as 6.27, 72.5, 3.42, etc. What effect does this have on estimation of the mean and standard deviation? What effect does this have on hypothesis testing about the mean? Why? Thank you very much for your help. Christine
Re: SAT Question Selection
[cc'd to previous poster; please follow up in newsgroup] L.C. [EMAIL PROTECTED] wrote in sci.stat.edu: Back in my day (did we have days back then?) I recall talk of test questions on the SAT. That is, these questions were not counted; they were being tested for (I presume) some sort of statistical validity. Does anyone have any statistical insight into the SAT question selection process. Does anyone have a specific lead? I can find virtually nothing. I remember reading a good book about the inner operation of ETS (administers the SATs), with some bits about the test questions you refer to, but I can't quite remember the title. I've searched the catalog of my old library, and this _may_ be it: Lemann, Nicholas. The big test : the secret history of the American meritocracy New York : Farrar, Straus and Giroux, 1999. -- Stan Brown, Oak Road Systems, Cortland County, New York, USA http://oakroadsystems.com/ What in heaven's name brought you to Casablanca? My health. I came to Casablanca for the waters. The waters? What waters? We're in the desert. I was misinformed.
Re: SAT Question Selection
for the SAT ... which is still paper and pencil ... you will find multiple sections ... math and verbal ... as far as i know ... there usually are 3 of one and 2 of the other ... the one with 3 has a section that is NOT operational ... which does NOT count ... but is used for trialing new items ... revised items ... etc. don't expect them to tell you which one that is however ... in a sense ... they are making YOU pay for THEIR pilot work ... and, of course, if you happen to really get fouled up on the section that does not count ... it could carry over emotionally to another section ... and have some (maybe not much) impact on your motivation to do well on that next section unless it has changed ... At 05:19 PM 1/14/02 -0500, you wrote: [cc'd to previous poster; please follow up in newsgroup] L.C. [EMAIL PROTECTED] wrote in sci.stat.edu: Back in my day (did we have days back then?) I recall talk of test questions on the SAT. That is, these questions were not counted; they were being tested for (I presume) some sort of statistical validity. Does anyone have any statistical insight into the SAT question selection process? Does anyone have a specific lead? I can find virtually nothing. I remember reading a good book about the inner operation of ETS (administers the SATs), with some bits about the test questions you refer to, but I can't quite remember the title. I've searched the catalog of my old library, and this _may_ be it: Lemann, Nicholas. The big test : the secret history of the American meritocracy. New York : Farrar, Straus and Giroux, 1999. -- Stan Brown, Oak Road Systems, Cortland County, New York, USA http://oakroadsystems.com/ What in heaven's name brought you to Casablanca? My health. I came to Casablanca for the waters. The waters? What waters? We're in the desert. I was misinformed.
_ dennis roberts, educational psychology, penn state university 208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED] http://roberts.ed.psu.edu/users/droberts/drober~1.htm
SAT Question Selection
Back in my day (did we have days back then?) I recall talk of test questions on the SAT. That is, these questions were not counted; they were being tested for (I presume) some sort of statistical validity. Does anyone have any statistical insight into the SAT question selection process? Does anyone have a specific lead? I can find virtually nothing. Thanks and Regards, -Larry Curcio
Re: SAT Question Selection
On Sun, 13 Jan 2002 13:04:14 GMT, L.C. [EMAIL PROTECTED] wrote: Back in my day (did we have days back then?) I recall talk of test questions on the SAT. That is, these questions were not counted; they were being tested for (I presume) some sort of statistical validity. Does anyone have any statistical insight into the SAT question selection process. Does anyone have a specific lead? I can find virtually nothing. I believe that they have to change their questions a lot more often than they used to, now that they occasionally reveal some questions and answers. The Educational Testing Service has a web site that looks pretty nice, in my 60-second opinion. http://www.ets.org/research/ They do seem to invite communication -- I suggest you e-mail, if you don't find what you are looking for in their 8 research areas, or elsewhere. It seems to me that I found a statistics journal produced by ETS when I was looking up references for scaling, a year or so ago. But I don't remember that for a fact. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
Re: Sorry for question, but how is the english word for @
at Nathaniel [EMAIL PROTECTED] wrote in message news:9v3d79$2rj$[EMAIL PROTECTED]... Hi, Sorry for question, but how is the english word for @ Pleas forgive me. N.
Re: Question on 2-D joint distribution...
Chia C Chong [EMAIL PROTECTED] wrote in message news:a145qk$qfq$[EMAIL PROTECTED]... Hi! I have a series of observations of 2 random variables (say X and Y) from my measurement data. These 2 RVs are not independent, and hence f(X,Y) ~= f(X)f(Y); so I can't investigate f(X) and f(Y) separately. I tried to plot the 2-D kernel density estimate of these 2 RVs, and from the plot it looks like a Laplacian/Gaussian/Generalised Gaussian shape on one side, while the other side looks like a Gamma/Weibull/Exponential shape. My intention is to find the joint 2-D distribution of these 2 RVs so that I can represent it by an equation (so that I could regenerate this plot by simulation later on). I wonder whether anyone has come across this kind of problem, and what method I should use?
Re: Question on 2-D joint distribution...
In article a145qk$qfq$[EMAIL PROTECTED], Chia C Chong [EMAIL PROTECTED] wrote: Hi! I have a series of observations of 2 random variables (say X and Y) from my measurement data. These 2 RVs are not independent, and hence f(X,Y) ~= f(X)f(Y); so I can't investigate f(X) and f(Y) separately. I tried to plot the 2-D kernel density estimate of these 2 RVs, and from the plot it looks like a Laplacian/Gaussian/Generalised Gaussian shape on one side, while the other side looks like a Gamma/Weibull/Exponential shape. My intention is to find the joint 2-D distribution of these 2 RVs so that I can represent it by an equation (so that I could regenerate this plot by simulation later on). I wonder whether anyone has come across this kind of problem, and what method I should use? There is, in the collection by Johnson and Kotz (and others for some of the volumes), a listing of classical bivariate distributions. It is hard enough to estimate one-dimensional distributions; it gets worse as the dimension increases. -- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399 [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
Question on 2-D joint distribution...
Hi! I have a series of observations of 2 random variables (say X and Y) from my measurement data. These 2 RVs are not independent, and hence f(X,Y) ~= f(X)f(Y); so I can't investigate f(X) and f(Y) separately. I tried to plot the 2-D kernel density estimate of these 2 RVs, and from the plot it looks like a Laplacian/Gaussian/Generalised Gaussian shape on one side, while the other side looks like a Gamma/Weibull/Exponential shape. My intention is to find the joint 2-D distribution of these 2 RVs so that I can represent it by an equation (so that I could regenerate this plot by simulation later on). I wonder whether anyone has come across this kind of problem, and what method I should use? Thanks... Regards, CCC
Re: Question on 2-D joint distribution...
Chia C Chong [EMAIL PROTECTED] wrote in message news:a145qk$qfq$[EMAIL PROTECTED]... Hi! I have a series of observations of 2 random variables (say X and Y) from my measurement data. These 2 RVs are not independent, and hence f(X,Y) ~= f(X)f(Y); so I can't investigate f(X) and f(Y) separately. I tried to plot the 2-D kernel density estimate of these 2 RVs, and from the plot it looks like a Laplacian/Gaussian/Generalised Gaussian shape on one side, while the other side looks like a Gamma/Weibull/Exponential shape. My intention is to find the joint 2-D distribution of these 2 RVs so that I can represent it by an equation (so that I could regenerate this plot by simulation later on). I wonder whether anyone has come across this kind of problem, and what method I should use? Thanks... Regards, CCC In plotting the distributions of these two RVs, were you looking at the MARGINAL distributions? If so, it might be more useful to look at a range of CONDITIONAL distributions for each variable, since it is the conditional distributions that you ultimately need to define in order to arrive at a joint distribution. One variable's conditional distribution could conceivably change substantially over the range of the other variable's values. By looking at how each variable's conditional pdf shape changes at different values of the other variable, you may be able to select a distributional form (Weibull, Gamma, etc.) that is able to represent the varying shape of one variable's pdf by a change of parameter values. Whichever variable has a conditional pdf form that seems best suited to representation by a known distributional form (with varying parameters) is the one you can choose as the dependent variable. For example, let's say that, in looking at the conditional distributions for each variable, you decide that the pdf for one of the variables can be represented pretty well by a Gamma distribution, with parameters b and c.
Let Y be the variable whose pdf can be represented by the Gamma distribution, and call the other variable X. Then f(Y) = Gamma[Y,b,c], where Gamma[Y,b,c] denotes the Gamma probability density as a function of Y, with parameters b and c. By changing b and c, you are able to obtain the different shapes that f(Y) assumes over the range of values of X. Thus, you can fit a different Gamma distribution for Y, AT EVERY VALUE OF X. This will give you a set of b and c parameter values for each X. If you plot the different b and c values as functions of X, you can get some idea of what the functional form of the dependence might be. For the sake of simplicity, let's say that it turns out to be linear for both b and c. Then... Gamma parameter b = P0 + P1*X Gamma parameter c = Q0 + Q1*X You can now do regressions to determine the coefficients. Of course, the functional form will probably NOT be linear. And the functional form may also not be the same for both parameters. With the parameters expressed as a function of X, you can write... f(X,Y) = Gamma[Y,b(X),c(X)]. And this is, in fact, the joint distribution you are looking for! WARNING! You will need a LOT of data. You first need to determine a conditional distribution for Y, at every value of X, which is one set of regressions (but, hopefully, you have software that will do the distribution fits automatically for you). Then you have to do another regression for each distribution parameter. And you will probably need fairly good fits to do a reasonable job of reproducing the overall joint pdf. The difficult part of this will probably be trying to find a single distributional form (Weibull, or Gamma, or whatever) that can represent all of the conditional pdf shapes for one of the variables. Of course, if you can't, then you could define several intervals for one of the variables, and apply a different distributional form for each interval. But things can get very messy very quickly! 
This is probably not the only way to approach the problem, but I hope this helps. -- T. Arthur Wheeler MathCraft Consulting Columbus, OH 43017
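The binned version of the recipe above can be sketched quickly with method-of-moments Gamma fits (for a Gamma with shape b and scale c, b = mean^2/variance and c = variance/mean). Everything here is hypothetical: the simulated data uses shape b(X) = 1 + 0.5*X and scale c = 2, and a real analysis would use proper maximum-likelihood fits rather than moments:

```python
import random
from statistics import mean, variance

random.seed(1)

# hypothetical data: Y | X is Gamma with shape b(X) = 1 + 0.5*X, scale c = 2
xs = [random.uniform(0, 4) for _ in range(5000)]
ys = [random.gammavariate(1 + 0.5 * x, 2.0) for x in xs]

# bin on X and fit a Gamma to each conditional slice by method of moments
for lo in range(4):
    ybin = [y for x, y in zip(xs, ys) if lo <= x < lo + 1]
    m, v = mean(ybin), variance(ybin)
    b_hat, c_hat = m * m / v, v / m
    print(f"X in [{lo},{lo+1}): shape ~ {b_hat:.2f}, scale ~ {c_hat:.2f}")

# plotting b_hat against the bin midpoints would reveal the (here linear)
# dependence b(X) = 1 + 0.5*X, which is what you would then regress on X
```

This is exactly the "fit a distribution at every value of X, then regress the parameters on X" idea, in its crudest binned form.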
Re: Measure of Association Question.
[EMAIL PROTECTED] (Petrus Nel) wrote in message news:000201c18fe2$f73aeee0$ed9e22c4@oemcomputer... I require some advice regarding the following: One set of variables is the grades obtained by students for different high school subjects (i.e. the symbols candidates obtained such as A, B, C, D, etc. for each subject). The other set of variables are the scores obtained for a college level subject (i.e. no symbols, just their percentages ... The grades obtained for their high school subjects were coded on the questionnaire as follows - 1=A, 2=B, 3=C, 4=D, 5=E, 6=F. ... How do I proceed? Simpler answer: First, change the coding to 1=F, 2=E, 3=D, 4=C, 5=B, 6=A. In the US, at least, there is no 'E'; if that applies to your system, the correct coding would be 1=F, 2=D, 3=C, 4=B, 5=A. If the latter coding is used, calculate the Spearman rank correlation between the grade in a given high school course and the college score. If the former coding is used, you can use either the Pearson correlation or the Spearman rank correlation; the Pearson correlation would probably be better. More complex answer: The approach above ignores the fact that within each letter grade there is variation--e.g., all students who get a 'B' are not at the same level. Further, there is censoring at the upper and lower ends of the scale--e.g., no matter how well a person does, the highest grade they can get is an 'A'. The polyserial correlation can account for this. The polyserial correlation estimates what the correlation of grade and score would be if grades were measured on a continuous scale. An assumption is that there is a bivariate normal distribution between (1) the continuous latent variable of which grade is a manifest representation and (2) the percentage score. The polyserial correlation is related to the polychoric correlation. For information about the polychoric correlation, see: http://ourworld.compuserve.com/homepages/jsuebersax/tetra.htm Drasgow F. Polychoric and polyserial correlations.
In Kotz L, Johnson NL (Eds.), Encyclopedia of statistical sciences. Vol. 7 (pp. 69-74). New York: Wiley, 1988. I don't know if SPSS will calculate the polyserial correlation--the last I heard it did not. If not, the polyserial correlation can be calculated with the program PRELIS, which is distributed with LISREL. Many universities have copies of LISREL/PRELIS. If you are interested in comparing to see which high school classes best predict college scores, then, as a practical matter, I would expect you would draw the same conclusions regardless of whether you used the Pearson, the Spearman, or the polyserial correlation coefficients. Good luck! John Uebersax, PhD (805) 384-7688 Thousand Oaks, California (805) 383-1726 (fax) email: [EMAIL PROTECTED] Agreement Stats: http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm Latent Structure: http://ourworld.compuserve.com/homepages/jsuebersax Existential Psych: http://members.aol.com/spiritualpsych Diet Fitness: http://members.aol.com/WeightControl101
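On the recoding point in the simpler answer: reversing the 1=A..6=F coding only flips the sign of the correlation, so either coding works as long as you interpret the sign correctly. A small pure-Python illustration, with made-up grade and score data (Spearman's rho is just the Pearson correlation of the ranks):

```python
def ranks(xs):
    # average ranks, with ties sharing the mean of their rank positions
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    # Spearman rho = Pearson correlation of the ranks
    return pearson(ranks(x), ranks(y))

grades = [1, 2, 2, 3, 4, 5, 6]         # original coding: 1=A ... 6=F
scores = [88, 80, 75, 70, 66, 60, 50]  # hypothetical college percentages

rho = spearman(grades, scores)
flipped = spearman([7 - g for g in grades], scores)  # recoded: 1=F ... 6=A
assert abs(rho + flipped) < 1e-12      # recoding just flips the sign
```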
Measure of Association Question.
Dear members, I require some advice regarding the following: One set of variables is the grades obtained by students for different high school subjects (i.e. the symbols candidates obtained such as A, B, C, D, etc. for each subject). The other set of variables are the scores obtained for a college level subject (i.e. no symbols, just their percentages obtained). I want to determine the correlation between their grades for different high school subjects (A, B, C, D, etc.) and their percentage scores for a college level subject. The grades obtained for their high school subjects were coded on the questionnaire as follows - 1=A, 2=B, 3=C, 4=D, 5=E, 6=F. I've entered the data for the grades as 1, 2, 3, etc. to indicate the grade (category), and the percentages (as the other variable), into SPSS. How do I proceed? Any comments are welcome. Regards, Petrus Nel
Re: Maximum Likelihood Question
To all, Thanks so much for all your ideas and insights thus far. To those who have suggested a Bayesian approach: I am interested, but I am weeks away from understanding it well enough to figure out if I can use it. Also, I think I am close to developing a usable technique along my current line. The only constraint on my parameters is that they remain positive. Occasionally one will approach zero, but not often. I am reposting because I have another focused question stemming from the same problem. MY SITUATION: I am studying a time-dependent stochastic Markov process. The conventional method involves fitting data to exponential decay equations and using the F-test to determine the number of components required. The problem (as I am sure you all see) is that the F-test assumes the data are iid, and conflicting results are often observed. As a first step, I have been attempting to fit similar (simulated) data directly to Markov models using the Q-matrix and maximum likelihood methods. The likelihood function is: L = (1/Sqrt(|CV-Matrix|))*exp((-1/2)*(O-E).(CV-Matrix^-1).(O-E)) where |CV-Matrix| is the determinant of the covariance matrix, (O) is the vector of observed values in time order, and (E) is the vector of the values predicted by the Markov model for the corresponding times. The covariance matrix is generated by the Markov model. My two objectives are to determine the number of free parameters, and to estimate the values of the parameters. Because the data is simulated, I know what the number of parameters and their values are. MY PROBLEM: I have been using the Log(Likelihood) method to compare the results of fitting to the correct model and to a simpler sub-hypothesis (H0). I am getting very small Log(Likelihood ratio)s when I know the more complex model is correct (i.e. H0 should be rejected). When I first observed this I tried increasing the N values, and found a decrease rather than an increase in the Log(Likelihood ratio).
When I look at the likelihood function, the weighted sum of squares factor ( (O-E).CV^-1.(O-E) ) is very different between the two hypotheses (i.e. favoring rejection of H0), but the difference in the determinant portion ( 1/Sqrt(|CV-Matrix|) ) is in the opposite direction. As a result, the Log(Likelihood ratio) is below that needed to reject H0. I asked about just fitting (O-E).CV^-1.(O-E) and was reminded that without the determinant factor, the likelihood would be maximized by simply increasing the variance. This appears to be true in practice. In learning about the quadratic form, I read in several places that, for the distribution to approach a chi-square distribution, the covariance matrix must be idempotent (CV^2 = CV). I am almost certain this is not the case here. I am hoping to get feedback on this idea: THE QUESTION: Following maximization of the full likelihood function ( (1/Sqrt(|CV-Matrix|))*exp((-1/2)*(O-E).(CV-Matrix^-1).(O-E)) ) for both models, can I use the F-test to compare the weighted sums of squares (i.e. (O-E).CV^-1.(O-E) ) of the two models, rather than the likelihood ratio test? In other words, does correcting each (O-E) for its variance and covariance legitimize the F-test? Any insight is greatly appreciated. Thanks for your patience and consideration. James Celentano
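The "increasing the variance" failure mode is already visible in one dimension. A toy sketch (not the poster's Markov model): with residual sum of squares SS and Sigma = s2*I, the full Gaussian log-likelihood is -n/2*log(s2) - SS/(2*s2), maximized at the interior point s2 = SS/n, while the "simplified" likelihood exp(-SS/(2*s2)) just keeps growing as s2 increases:

```python
from math import log

SS, n = 10.0, 5  # toy residual sum of squares and sample size

def full_loglik(s2):
    # Gaussian log-likelihood with Sigma = s2 * I (additive constants dropped)
    return -n / 2 * log(s2) - SS / (2 * s2)

def simplified_loglik(s2):
    # log of exp(-SS / (2*s2)): the determinant term has been discarded
    return -SS / (2 * s2)

sigmas = [0.5, 1.0, 2.0, 5.0, 50.0, 5000.0]

simp = [simplified_loglik(s) for s in sigmas]
assert all(a < b for a, b in zip(simp, simp[1:]))  # rises forever with s2

best = max(sigmas, key=full_loglik)
assert best == 2.0  # interior maximum at exactly SS/n = 2, not at the largest s2
```

The -1/2*log|CV-Matrix| term is what penalizes variance inflation, which is why it cannot be dropped when the covariance differs between models.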
Re: Maximum Likelihood Question
Herman Rubin [EMAIL PROTECTED] wrote in message news:9vqoln$[EMAIL PROTECTED]... Maximum likelihood is ASYMPTOTICALLY optimal in LARGE samples. It may not be good for small samples; it pays to look at how the actual likelihood function behaves. The fit is always going to improve with more parameters. This may be the trouble in the actual problem being attempted, but there are other possibilities, besides the potential for having programmed things incorrectly. One such trouble might be that the parameters are constrained and that the maximum-likelihood estimates given such constraints are falling on the edge of the allowed region ... then the usual asymptotics don't apply. David Jones
Re: Maximum Likelihood Question
In article [EMAIL PROTECTED], Jimc10 [EMAIL PROTECTED] wrote: To all who have helped me on the previous thread, thank you very much. I am reposting this because the question has become more focused. I am studying a stochastic Markov process and using a maximum likelihood technique to fit observed data to theoretical models. As a first step I am using a Monte Carlo technique to generate simulated data from a known model to see if my fitting method is accurate. In particular I want to know if I can use this technique to determine the number of free parameters in the Markov model. I have been using the Log(Likelihood) method, which seems to be widely accepted. I am getting very small Log(Likelihood ratios) in cases when I know the more complex model is correct (i.e. H0 should be rejected). When I first observed this I tried increasing the N values, and found a decrease rather than an increase in the Log(Likelihood ratio). I now think I know why. I am posting in hopes of finding out if my proposed solution is 1) statistical heresy, 2) so obvious that I should have realized it 6 months ago, or 3) a plausible idea in need of validation. The likelihood function I have been using up to now, which I will call the FULL likelihood function, is: L = (1/Sqrt(|CV-Matrix|))*exp((-1/2)*(O-E).(CV-Matrix^-1).(O-E)) where |CV-Matrix| is the determinant of the covariance matrix, (O) is the vector of observed values in time order, and (E) is the vector of the values predicted by the Markov model for the corresponding times. The covariance matrix is generated by the Markov model. IN A NUTSHELL: It appears that the factor (1/Sqrt(|CV-Matrix|)) is the source of the problem. In many MLE descriptions this is a constant and drops out. In my case there is a big difference between the (1/Sqrt(|CV-Matrix|)) for different models (several log units). I believe this may be biasing the fit in some way.
MY PROPOSAL: I have begun fitting my data to the following simplified likelihood formula: L = exp((-1/2)*(O-E).(CV-Matrix^-1).(O-E)). Does this seem reasonable? It is highly unlikely that it would give asymptotically optimal estimators, although there are cases where this does happen. It can happen that it will be consistent and have positive efficiency, for example if the parameter effect on E is such that L would be O(n) for any wrong parameter, and O(1) for the true parameter, all this in probability, and the covariance matrix does not blow up in too bad a manner. If the major problem is with the fit of the covariance matrix, it will not be good, and if E does not involve some of the parameters, but the covariance matrix can go to infinity on those, by doing that, L can go to 0, which would maximize it as it is negative. As you say the covariance matrix varies considerably, I would suggest including it. Maximum likelihood is ASYMPTOTICALLY optimal in LARGE samples. It may not be good for small samples; it pays to look at how the actual likelihood function behaves. The fit is always going to improve with more parameters. I believe your best bet would be robust approximate Bayesian analysis. This is hard to describe in a newsgroup posting, and in any case requires some user input. -- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399 [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
Maximum Likelihood Question
To all who have helped me on the previous thread, thank you very much. I am reposting this because the question has become more focused. I am studying a stochastic Markov process and using a maximum likelihood technique to fit observed data to theoretical models. As a first step I am using a Monte Carlo technique to generate simulated data from a known model to see if my fitting method is accurate. In particular I want to know if I can use this technique to determine the number of free parameters in the Markov model. I have been using the Log(Likelihood) method, which seems to be widely accepted. I am getting very small Log(Likelihood ratios) in cases when I know the more complex model is correct (i.e. H0 should be rejected). When I first observed this I tried increasing the N values, and found a decrease rather than an increase in the Log(Likelihood ratio). I now think I know why. I am posting in hopes of finding out if my proposed solution is 1) statistical heresy, 2) so obvious that I should have realized it 6 months ago, or 3) a plausible idea in need of validation. The likelihood function I have been using up to now, which I will call the FULL likelihood function, is: L = (1/Sqrt(|CV-Matrix|))*exp((-1/2)*(O-E).(CV-Matrix^-1).(O-E)) where |CV-Matrix| is the determinant of the covariance matrix, (O) is the vector of observed values in time order, and (E) is the vector of the values predicted by the Markov model for the corresponding times. The covariance matrix is generated by the Markov model. IN A NUTSHELL: It appears that the factor (1/Sqrt(|CV-Matrix|)) is the source of the problem. In many MLE descriptions this is a constant and drops out. In my case there is a big difference between the (1/Sqrt(|CV-Matrix|)) for different models (several log units). I believe this may be biasing the fit in some way. MY PROPOSAL: I have begun fitting my data to the following simplified likelihood formula: L = exp((-1/2)*(O-E).(CV-Matrix^-1).(O-E)). Does this seem reasonable?
Thanks for any insight, James Celentano = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
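For readers following along: the FULL likelihood above is easiest to compare across models on the log scale, where the determinant factor becomes an additive -(1/2) log|CV| term rather than something that can be silently dropped. A minimal NumPy sketch (the function name and toy inputs are mine, not from the post), up to the -(n/2) log(2*pi) constant that really does cancel in ratios:

```python
import numpy as np

def gauss_loglik(obs, pred, cov):
    """Log of the FULL likelihood from the post, up to the additive
    -(n/2)*log(2*pi) constant (which is model-independent and cancels
    in likelihood ratios -- the log-determinant term does not)."""
    r = np.asarray(obs, dtype=float) - np.asarray(pred, dtype=float)
    sign, logdet = np.linalg.slogdet(cov)   # numerically stable log-determinant
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    quad = r @ np.linalg.solve(cov, r)      # (O-E) . CV^-1 . (O-E)
    return -0.5 * (logdet + quad)
```

With this form, a model whose covariance matrix has a larger determinant is penalized by exactly (1/2) log|CV|, which is the behavior the simplified formula throws away.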
Re: Sorry for question, but how is the english word for @
User Nathaniel [EMAIL PROTECTED] wrote in message news:9v3d79$2rj$[EMAIL PROTECTED]... Hi, Sorry for question, but how is the english word for @ Pleas forgive me. N. Thank everyone for valuable information. Nathaniel
Re: Sorry for question, but how is the english word for @
Nathaniel wrote: Hi, Sorry for question, but how is the english word for @ Pleas forgive me. You're forgiven... grin The New Hacker's Dictionary gives: common: at sign; at; strudel. rare (and often facetious): vortex, whorl, whirlpool, cyclone, snail, ape, cat, rose, cabbage. Official ANSI name: commercial at. -Robert Dawson
Re: Sorry for question, but how is the english word for @
Thank everyone for valuable information. Nathaniel User Art Kendall [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... atusually indicate some kind of rate or unit price 10 pounds @ $1 per pound on the net is is used as a separator between the id of an individual and his/her location [EMAIL PROTECTED] id spoken as john dot smith at harvard dot e d u. until the early-80's or so dot was spoken as point as in filname point ext (extension indicating type). Sometimes addresses were given as john.smith at harvard.edu Nathaniel wrote: Hi, Sorry for question, but how is the english word for @ Pleas forgive me. N.
Re: Sorry for question, but how is the english word for @
Nathaniel: The symbol @ belongs to the category of special characters in English. Although it is often rendered as "commercial at" in a technical context, in the vernacular (and on the net) it is most often rendered as simply "at." I can't help but advise that, since English is clearly your second language, you would do very well to utterly ignore the, er, uh, erudite message from Dr. Kendall dated 12/10/2001. It could do damage to your vocabulary. The kindest thing that can be said about said message is that it must have been very hastily written. ('Twas most certainly very carelessly written.) For example, Dr. Kendall's second line reads: on the net is is used as a separator between the id of an individual and his/her location Apart from the fact that the first word should have been capitalized (a very minor matter), the sentence would have been much better written: "On the net it is used as a separator between the screen name and the domain name in an e-mail address." I realize that you might well need definitions for the technical terms "screen name" and "domain name." They can be found in the Webopedia: Online Computer Dictionary for Internet Terms and Technical Support, @ http://www.webopedia.com/ -- a truly excellent online reference work. By the way, I can't help chuckling a bit at Dr. Kendall's use of "id" as an abbreviation for "identification". If you will check with Merriam-Webster OnLine @ http://www.m-w.com/netdict.htm you will find that the correct abbreviation (acronym or initialism) is "ID." Meanwhile, "id" is a psychoanalytical term that has something to do with the psyche. I could go on, but 'nuff said [enough said] for present purposes. Respectfully: Harley Upchurch
Re: Sorry for question, but how is the english word for @
The name given to the symbol @ in international standard character sets is 'commercial at'. See http://www.quinion.com/words/articles/whereat.htm for a history of the symbol. Richard Wright On Mon, 10 Dec 2001 23:34:19 +0100, Nathaniel [EMAIL PROTECTED] wrote: Hi, Sorry for question, but how is the english word for @ Pleas forgive me. N.
Re: Sorry for question, but how is the english word for @
atusually indicate some kind of rate or unit price 10 pounds @ $1 per pound on the net is is used as a separator between the id of an individual and his/her location [EMAIL PROTECTED] id spoken as john dot smith at harvard dot e d u. until the early-80's or so dot was spoken as point as in filname point ext (extension indicating type). Sometimes addresses were given as john.smith at harvard.edu Nathaniel wrote: Hi, Sorry for question, but how is the english word for @ Pleas forgive me. N.
RE: Question about concatenating probability distributions
RE: The Poisson process and lognormal action time. This kind of problem arises a lot in the actuarial literature (a process for the number of claims and a process for the claim size), and the Poisson and the lognormal have been used in this context - it might be worth your while to look there for results. Glen ... This is a very general and important event process. It is also used to describe the general failure-repair process that occurs at any repair shop. The Poisson is a good approximation of the arrival times of equipment to be repaired, and the log-normal is a good approximation of the time it takes to repair it. From an operations standpoint, downtime is approximated by an exponential distribution (occurrence) and a log-normal repair time, which includes diagnosis, replacement and validation. In the Air Force (1982-1995), where the reliability and maintainability of equipment had to be characterized, the means were determined and used in a form called availability. We never got beyond the use of availability and never got into the distribution and confidence interval aspects. As a general approximation, the log-normal distribution approximates human reaction times to events. DAHeiser
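The availability figure mentioned above is conventionally computed from the two means, MTBF (mean time between failures) and MTTR (mean time to repair); a one-line sketch with illustrative numbers of my own, not from the post:

```python
# steady-state availability from mean time between failures (MTBF)
# and mean time to repair (MTTR); the numbers are illustrative only
mtbf, mttr = 200.0, 8.0                      # hours
availability = mtbf / (mtbf + mttr)          # fraction of time the unit is up
```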
Re: Question about concatenating probability distributions
Jacek Gomoluch [EMAIL PROTECTED] wrote in message news:9uqkmv$954$[EMAIL PROTECTED]... In a stochastic process the number of customers arriving at a server (during a time interval) is described by a Poisson distribution: P(n)=exp(-v) * (v^n)/(n!) Each arriving customer has a task to be carried out, whose size (in units) is described by a lognormal distribution: f(u)= exp(-(ln u)^2 / (2*a^2)) / (u*a*SQRT(2*PI)) Question: What is the total number of units (i.e. the size of all tasks) requested during the time interval? I wonder how these distributions can be combined, and if there is a formula for this. If the count variable and the size variable are independent, calculation of the mean and variance of the total is straightforward. This kind of problem arises a lot in the actuarial literature (a process for the number of claims and a process for the claim size), and the Poisson and the lognormal have been used in this context - it might be worth your while to look there for results. Glen
Question about concatenating probability distributions
In a stochastic process the number of customers arriving at a server (during a time interval) is described by a Poisson distribution: P(n)=exp(-v) * (v^n)/(n!) Each arriving customer has a task to be carried out, whose size (in units) is described by a lognormal distribution: f(u)= exp(-(ln u)^2 / (2*a^2)) / (u*a*SQRT(2*PI)) Question: What is the total number of units (i.e. the size of all tasks) requested during the time interval? I wonder how these distributions can be combined, and if there is a formula for this. Thanks for any help! Jacek Gomoluch
Re: Question about concatenating probability distributions
If the Poisson arrival process and the work process are independent, then have a look at Wald's law in (almost) any probability book. For example, the mean amount of work is then simply the product of the means of each RV, in your case: E(amount of work in a fixed time interval) = v*E(U), where U is your lognormal RV. Jacek Gomoluch wrote: In a stochastic process the number of customers arriving at a server (during a time interval) is described by a Poisson distribution: P(n)=exp(-v) * (v^n)/(n!) Each arriving customer has a task to be carried out, whose size (in units) is described by a lognormal distribution: f(u)= exp(-(ln u)^2 / (2*a^2)) / (u*a*SQRT(2*PI)) Question: What is the total number of units (i.e. the size of all tasks) requested during the time interval? I wonder how these distributions can be combined, and if there is a formula for this. Thanks for any help! Jacek Gomoluch -- Peter Rabinovitch
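A quick Monte Carlo check of Wald's law for the Poisson-lognormal case in this thread (the rate v and lognormal shape a are illustrative values I chose; the post's f(u) has log-mean 0): the mean of the total is v*E(U), and, because the count is Poisson, the variance of the total is v*E(U^2).

```python
import numpy as np

rng = np.random.default_rng(0)
v, a = 3.0, 0.5                       # Poisson rate and lognormal shape (illustrative)

# moments of one lognormal task size U with log-mean 0 and log-sd a
mean_U = np.exp(a**2 / 2)
mean_U2 = np.exp(2 * a**2)

mean_S = v * mean_U                   # Wald's law: E(S) = E(N) * E(U)
var_S = v * mean_U2                   # compound-Poisson variance: Var(S) = v * E(U^2)

# simulate the total work S = U_1 + ... + U_N over many intervals
totals = np.array([rng.lognormal(0.0, a, rng.poisson(v)).sum()
                   for _ in range(200_000)])
```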
Re: Stat question
the reality of this is ... sometimes getting notes from other students is helpful ... sometimes it is not ... there is no generalization one can make about this most students who NEED notes are not likely to ask people other than their friends ... and, in doing so, probably know which of their friends they have the best chance of getting good notes from ... (at least READABLE!) ... even lazy students are not likely to ask for notes from people that even THEY know are not going to be able to do them any good but i don't think we can say anything really systematic about this activity other than, sometimes it helps ... sometimes it does not help At 06:24 PM 12/5/01 -0800, Glen wrote: Jon Miller [EMAIL PROTECTED] wrote in message You can ask the top students to look at their notes, but you should be prepared to find that their notes are highly idiosyncratic. Maybe even unusable. Having seen the notes of some top students on a variety of occasions (as a student and as a lecturer), that certainly does happen sometimes. But just about as likely is to find a set of notes that are actually better than the lecturer would prepare themselves. Glen _ dennis roberts, educational psychology, penn state university 208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED] http://roberts.ed.psu.edu/users/droberts/drober~1.htm
Re: Stat question
Stan Brown wrote: Jon Miller [EMAIL PROTECTED] wrote in sci.stat.edu: Stan Brown wrote: I would respectfully suggest that the OP _first_ carefully study the textbook sections that correspond to the missed lectures, get notes from a classmate This part is of doubtful usefulness. Doubtful? It is of doubtful usefulness to get notes from a classmate and study the covered section of the textbook? Huh? Sorry, bad editing on my part. Getting notes from a classmate is of doubtful usefulness. Plenty of anecdotes on request. If Cathy Cheng is in your class, you can just photocopy her notes and use them as a textbook. But most students? Why would you care what someone who is struggling to pass thinks the prof might have said? You can ask the top students to look at their notes, but you should be prepared to find that their notes are highly idiosyncratic. Maybe even unusable. Jon Miller
Re: simple Splus question - plot regression function
Alexander Sirotkin wrote: Hi. After fitting a linear regression model I need to do an extremely simple thing - plot the regression function along with the original data. Splus has a simple way to plot quite a few complex plots and a very complicated way to do this simple one! Is there a simple way to plot the regression function and the data? abline! e.g. reg1 <- lm(y~x) plot(x,y) abline(reg1) I can do naught more than suggest reading Venables & Ripley's Modern Applied Statistics with S-Plus. And seeing as this is going to an Aussie NG, I suspect by doing this I'll warm the cockles of the heart of at least one of the authors. Bob -- Bob O'Hara Metapopulation Research Group Division of Population Biology Department of Ecology and Systematics PO Box 17 (Arkadiankatu 7) FIN-00014 University of Helsinki Finland NOTE: NEW TELEPHONE NUMBER tel: +358 9 191 28779 fax: +358 9 191 28701 email: [EMAIL PROTECTED] To induce catatonia, visit: http://www.helsinki.fi/science/metapop/ It is being said of a certain poet, that though he tortures the English language, he has still never yet succeeded in forcing it to reveal his meaning - Beachcomber
Re: probability question
Hi, This assertion is true. Franck Matt Dobrin wrote: Does P(A*B|C)=P(A|C)*P(B|A*C)? If not, what does it equal? Thanks in advance. -Matt -- Franck Corset, Projet IS2, Inria Rhone-Alpes, ZIRST, 655, avenue de l'Europe, Montbonnot, 38334 Saint Ismier cedex, FRANCE http://www.inrialpes.fr/is2
Re: probability question
It's true. If you want a proof, it follows from the definition of conditional probability, p(a|b) = p(a,b)/p(b): (1) P(A,B|C) = P(A,B,C)/P(C) (2) P(A,B,C) = P(A,C)*P(B|A,C) (3) P(A,C) = P(C)*P(A|C) With (2) and (3) we get (4) P(A,B,C) = P(C)*P(A|C)*P(B|A,C) Taking (1) and (4) we get P(A*B|C) = P(A|C)*P(B|A*C) Hope this helps. Nathaniel User Matt Dobrin [EMAIL PROTECTED] wrote in message news:9uh8ge$5hv$[EMAIL PROTECTED]... Does P(A*B|C)=P(A|C)*P(B|A*C)? If not, what does it equal? Thanks in advance. -Matt
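The identity can also be checked by brute force on a small sample space; a sketch over three fair coin flips (the events A, B, C are arbitrary choices of mine):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product([0, 1], repeat=3))   # three fair flips, each outcome prob 1/8
p = Fraction(1, 8)

A = {w for w in outcomes if w[0] == 1}       # first flip heads
B = {w for w in outcomes if sum(w) >= 2}     # at least two heads
C = {w for w in outcomes if w[2] == 1}       # third flip heads

def P(event):
    return p * len(event)

lhs = P(A & B & C) / P(C)                            # P(A,B|C)
rhs = (P(A & C) / P(C)) * (P(A & B & C) / P(A & C))  # P(A|C) * P(B|A,C)
assert lhs == rhs                                    # exact, by Fraction arithmetic
```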
simple Splus question - plot regression function
Hi. After fitting a linear regression model I need to do an extremely simple thing - plot the regression function along with the original data. Splus has a simple way to plot quite a few complex plots and a very complicated way to do this simple one! Is there a simple way to plot the regression function and the data?
probability question
Does P(A*B|C)=P(A|C)*P(B|A*C)? If not, what does it equal? Thanks in advance. -Matt
Re: Stat question
Elliot Cramer [EMAIL PROTECTED] wrote in sci.stat.edu: Sima [EMAIL PROTECTED] wrote: : I have missed some lectures on statistics due to heavy illness : and now i got an assignment which i cannot solve. We all feel sorry for you Sima, but perhaps you should talk to your instructor about it. He undoubtedly has office hours. While that's the conventional advice, speaking as an instructor I do get tired of students who miss class for whatever reason, don't crack the textbook, and expect me to give them a private lesson that duplicates what was done in class. I don't know what if anything the OP has done about making up the missed material. I would respectfully suggest that the OP _first_ carefully study the textbook sections that correspond to the missed lectures, get notes from a classmate, and _then_ contact the instructor to fill in any remaining gaps or answer any questions. -- Stan Brown, Oak Road Systems, Cortland County, New York, USA http://oakroadsystems.com My reply address is correct as is. The courtesy of providing a correct reply address is more important to me than time spent deleting spam.
Re: Stat question
Stan Brown wrote: I would respectfully suggest that the OP _first_ carefully study the textbook sections that correspond to the missed lectures, get notes from a classmate This part is of doubtful usefulness. , and _then_ contact the instructor to fill in any remaining gaps or answer any questions. Jon Miller
Re: Stat question
At 06:13 PM 12/1/01 -0500, Stan Brown wrote: Jon Miller [EMAIL PROTECTED] wrote in sci.stat.edu: Stan Brown wrote: I would respectfully suggest that the OP _first_ carefully study the textbook sections that correspond to the missed lectures, get notes from a classmate This part is of doubtful usefulness. Doubtful? It is of doubtful usefulness to get notes from a classmate and study the covered section of the textbook? Huh? perhaps doubtful IF the students OP asked to look at were terrible students who took terrible notes ... and/or ... OP when reading the text could not make anything of it ... but, those are two big ifs usually, students won't ask to see the notes of students whom they know are not too swift ... and, also ... usually students who read the book do get something out of it ... maybe not enough the issue here is ... it appeared (though we have no proof of this) that the original poster did little, if anything, on his/her own ... prior to posting a HELP to the list ... stan seemed to be reacting to that assumption and, i don't blame him -- Stan Brown, Oak Road Systems, Cortland County, New York, USA http://oakroadsystems.com/ My theory was a perfectly good one. The facts were misleading. -- /The Lady Vanishes/ (1938)
Re: Stat question
Sima [EMAIL PROTECTED] wrote: : Dear List Members, : I have missed some lectures on statistics due to heavy illness : and now i got an assignment which i cannot solve. We all feel sorry for you Sima, but perhaps you should talk to your instructor about it. He undoubtedly has office hours.
Optimal filtering question
Hi All, Suppose we have a stochastic process with an unknown parameter (the parameter is used in a general sense; it may be the stochastic mean of the process, in which case its current value is also a parameter). We observe the dynamics of this process and update our estimate of this parameter. It may be the case that our estimate of this parameter will always be imprecise, in the sense that the variance of the estimator is greater than zero and does not converge to zero (as in the case of learning about a stochastic mean). However, it seems that if we start from different priors about this parameter, then the estimates x1(t) and x2(t) obtained with priors x1(0) and x2(0) respectively always converge as time t goes to infinity. Is this always true? If yes, is there a theorem stating this? If not, is there a counterexample? Many thanks Alex
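For the simplest version of the question, learning a fixed Gaussian mean from noisy observations with a conjugate normal prior, the posterior means from two very different priors do converge, since the prior's contribution is washed out at rate 1/t. A small sketch (all numbers are illustrative choices of mine, and this is one special case, not a proof of the general claim):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n = 2.0, 20_000                 # true mean and number of observations
y = theta + rng.standard_normal(n)     # unit observation noise

def posterior_mean(prior_mean, prior_var, obs):
    # conjugate normal-normal update with unit noise variance
    post_prec = 1.0 / prior_var + len(obs)
    return (prior_mean / prior_var + obs.sum()) / post_prec

m1 = posterior_mean(-10.0, 1.0, y)     # pessimistic prior
m2 = posterior_mean(+10.0, 1.0, y)     # optimistic prior
# |m1 - m2| = 20 / (1 + n): the two estimates approach each other like 1/n
```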
Stat question
Dear List Members, I have missed some lectures on statistics due to heavy illness and now i got an assignment which i cannot solve. Please help me. Below is assignment text: Question 3 Manufacturers of Xeno fuel additive claim that their product increases fuel efficiency by over 10%. A consumer representative organisation decides to check this claim. (a) Assuming that the consumer organisation is able to obtain 20 identical cars for use in the experiment, draw a diagram outlining an appropriate design for the experiment. What is this type of design called? (b) Assuming that the consumer organisation is able to borrow 10 cars of type A and 10 cars of type B for use in the experiment, draw a diagram outlining an appropriate design for the experiment. What is this type of design called? === Thank you very much for your help, Sincerely, Sima. 25 Nov, 2001
Question on Gaussian distribution
[ This is a repost of the following article: ] [ From: Law Hiu Chung [EMAIL PROTECTED] ] [ Subject: Question on Gaussian distribution ] [ Newsgroups: sci.stat.math ] [ Message-ID: 9ond41$sk6$[EMAIL PROTECTED] ] We define a function f(x) as a Gaussian process if for any n, and for any x1, ... xn, (f(x1), f(x2), ... f(xn)) follows a Gaussian distribution. Can I interpret this definition intuitively as: Given an f(x) in a set X of functions (satisfying some conditions), the projection of f(x) onto a finite set of basis functions { delta(x1), delta(x2), ... delta(xn) } must be Gaussian irrespective of the number of xi's and their values. Then f(x) follows a Gaussian distribution. (The above is meaningless without defining the inner product, but I would like to know if my intuition is correct or not.) Can I generalize the above to: Given an inner product space X (with possibly infinite dimension), I can define a Gaussian distribution (or other appropriate term) on X such that for x \in X, if we project it onto a finite set of orthonormal vectors (phi_1, phi_2, ..., phi_n) and get the projection (a1, a2, ... an), the tuple follows an n-dimensional Gaussian distribution. This should hold for all values of n and all sets of orthonormal vectors. Is this definition legal? I guess X being an inner product space may not be enough. If that is the case, what other conditions are needed? If this looks like a textbook question to you, can you point me to some good introductory books on this topic? I have tried to read some books on Gaussian measures, but they are too technical for me -- an engineering person without a strong background in measure theory. Thank you for your help. -- Martin Law Computer Science Department Hong Kong University of Science and Technology
Simple Median Question
I have a question about averaging medians. My dataset consists of median values for a variable of interest. To find the average, do I average the medians and get a mean median, or do I find the median of the median values?
Re: Simple Median Question
At 12:01 PM 9/24/01 -0500, you wrote: I have a question about averaging medians. My dataset consists of median values for a variable of interest. To find the average, do I average the medians and get a mean median, or do I find the median of the median values? since we don't know how many of these medians you have ... or anything about the shapes of the distributions on which you have (only) median values ... we don't know if it really makes much of or any difference BUT, to be consistent ... if you have collected medians ... ie, Q2 values ... then, it makes most consistent sense (to me anyway) if you need an average of these ... to take the median of these ... by the way ... why would you have the medians of this variable ... and not the means? was there some important reason?
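A small worked example of the difference (the numbers are invented for illustration): with skewed groups, the mean of the medians and the median of the medians can be far apart, and neither is guaranteed to equal the median of the pooled data.

```python
import statistics

groups = [[0, 0, 1, 3, 9], [2, 4, 6], [1, 5, 50, 80, 700]]
meds = [statistics.median(g) for g in groups]          # [1, 4, 50]

mean_of_medians = statistics.mean(meds)                # pulled up by the 50
median_of_medians = statistics.median(meds)            # resistant to it
pooled_median = statistics.median([x for g in groups for x in g])
```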
non stat question: existance of educational programming list?
Besides teaching statistics, I have been teaching programming recently. I know there exists a Visual Basic list but does anyone know of a list similar to this one but for teaching programming? Mark Eakin Associate Professor Information Systems and Management Sciences Department University of Texas at Arlington [EMAIL PROTECTED] or [EMAIL PROTECTED]
Re: question re: problem
@Home wrote: I had the following to solve: 51% of all domestic cars being shipped have power windows. If a lot contains five such cars: a. what is the probability that only one has power windows? b. what is the probability that at least one has power windows? I solved each of these problems in two ways, one using standard probability theory and one using a binomial distribution. I seemingly had no problem w/part b., but in part a. the probability theory did not seem to produce the correct answer. I have listed these below. What is wrong w/the probability equation listed below? Also, is my answer to part b. correct? a. Randomly Draw Five Samples (Cars); Independent Events; Only 1 w/Power Windows P{Only 1 Power} = P(Power) x P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower) = 0.51 x 0.49 x 0.49 x 0.49 x 0.49 What you've got here is the probability that the first car has Power, but the rest do not. You also need the probability that the second, third, fourth or fifth is the one with the Power. Bob
Re: question re: problem
Thanks a lot - it worked. How would you compose a short formula depicting: P{Only 1} = [P(Power) x P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower)] + [P(NotPower) x P(Power) x P(NotPower) x P(NotPower) x P(NotPower)] + [P(NotPower) x P(NotPower) x P(Power) x P(NotPower) x P(NotPower)] + [P(NotPower) x P(NotPower) x P(NotPower) x P(Power) x P(NotPower)] + [P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower) x P(Power)] Anon. [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... @Home wrote: I had the following to solve: 51% of all domestic cars being shipped have power windows. If a lot contains five such cars: a. what is the probability that only one has power windows? b. what is the probability that at least one has power windows? I solved each of these problems in two ways, one using standard probability theory and one using a binomial distribution. I seemingly had no problem w/part b., but in part a. the probability theory did not seem to produce the correct answer. I have listed these below. What is wrong w/the probability equation listed below? Also, is my answer to part b. correct? a. Randomly Draw Five Samples (Cars); Independent Events; Only 1 w/Power Windows P{Only 1 Power} = P(Power) x P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower) = 0.51 x 0.49 x 0.49 x 0.49 x 0.49 What you've got here is the probability that the first car has Power, but the rest do not. You also need the probability that the second, third, fourth or fifth is the one with the Power. 
Re: question re: problem
Your probability distribution is binomial with p = 0.51, q = 0.49. In five trials, the distribution is (p + q)^5 = p^5 + 5 p^4 q + 10 p^3 q^2 + 10 p^2 q^3 + 5 p q^4 + q^5. So the probability of one power and four not is 5 p q^4, and for at least one it is 1 - q^5. Arto Huttunen Anon. [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... @Home wrote: I had the following to solve: 51% of all domestic cars being shipped have power windows. If a lot contains five such cars: a. what is the probability that only one has power windows? b. what is the probability that at least one has power windows? I solved each of these problems in two ways, one using standard probability theory and one using a binomial distribution. I seemingly had no problem w/part b., but in part a. the probability theory did not seem to produce the correct answer. I have listed these below. What is wrong w/the probability equation listed below? Also, is my answer to part b. correct? a. Randomly Draw Five Samples (Cars); Independent Events; Only 1 w/Power Windows P{Only 1 Power} = P(Power) x P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower) = 0.51 x 0.49 x 0.49 x 0.49 x 0.49 What you've got here is the probability that the first car has Power, but the rest do not. You also need the probability that the second, third, fourth or fifth is the one with the Power. 
Re: question re: problem
@Home wrote:
> Thanks a lot - it worked. How would you compose a short formula depicting:
> P{Only 1} = [P(Power) x P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower)]
>           + [P(NotPower) x P(Power) x P(NotPower) x P(NotPower) x P(NotPower)]
>           + [P(NotPower) x P(NotPower) x P(Power) x P(NotPower) x P(NotPower)]
>           + [P(NotPower) x P(NotPower) x P(NotPower) x P(Power) x P(NotPower)]
>           + [P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower) x P(Power)]

Have a look at Arto's reply, and at some simple material on permutations and combinations (it's the combinations part that's relevant). I assume that this is homework, so your course notes should help. Alternatively, an elementary textbook on probability and statistics will derive the binomial distribution for you. But it looks like you've got the basic idea.

Bob
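The long five-term sum above collapses to a short formula because every ordering of one "Power" car among five positions has the same probability, so you only need to count the orderings. A quick Python sketch (not from the original thread) that checks this:

```python
from itertools import permutations
from math import comb

p, q = 0.51, 0.49  # P(power windows), P(no power windows)

# Enumerate the distinct orderings of one p among four q's
# and sum each ordering's probability explicitly, as in the long formula.
orderings = set(permutations([p, q, q, q, q]))
long_sum = sum(a * b * c * d * e for (a, b, c, d, e) in orderings)

# The compact binomial form counts those orderings with C(5, 1) = 5.
compact = comb(5, 1) * p * q**4

assert len(orderings) == 5
assert abs(long_sum - compact) < 1e-12
```

This is exactly why the binomial coefficient appears: it counts how many arrangements share the same probability p^k * q^(n-k).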
Re: question re: problem
(sending to all - @Home is a non-functioning address) - Jay

@Home wrote:
> I had the following to solve: 51% of all domestic cars being shipped have power windows. If a lot contains five such cars:
> a. What is the probability that only one has power windows?
> b. What is the probability that at least one has power windows?
> I solved each of these problems in two ways, one using standard probability theory and one using a binomial distribution. I seemingly had no problem with part b, but in part a the probability theory did not seem to produce the correct answer. I have listed these below. What is wrong with the probability equation listed below? Also, is my answer to part b correct?
> a. Randomly draw five samples (cars); independent events; only 1 with power windows:
> P{Only 1 Power} = P(Power) x P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower)
>                 = 0.51 x 0.49 x 0.49 x 0.49 x 0.49

Don't forget, you listed only 1 way to get 1 PW (power window) and 4 not. There are 5 ways you could get this result, if you don't count the order (which the question doesn't include). C(5,1) = 5!/(4!*1!) = 5. So:

0.51 * 0.49 * 0.49 * 0.49 * 0.49 * 5 = 0.14700

> Also solve using the BINOMDIST function in Excel:
> n = 5, p = 0.51 (success = PW), x = 1, p(x) = 0.14700
> b. At least 1 with power windows:
> P{At Least 1} = 1 - P{0}
> P{0} = P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower)
>      = 0.49 x 0.49 x 0.49 x 0.49 x 0.49 = 0.02825
> 1 - 0.02825 = 0.97175 (at least 1)

In this one, all the outcomes are alike, so there is no combination effect.

> Also solve using the BINOMDIST function in Excel (~97%):
> n = 5, p = 0.49 (success = no power), x = 0, p(x) = 0.02825
> 1 - 0.028247525 = 0.97175 (~97%)

So you got it! Or nearly so.

Cheers,
Jay
--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
North Green Bay Road
Racine, WI 53404-1216 USA
Ph: (262) 634-9100
FAX: (262) 681-1133
email: [EMAIL PROTECTED]
web: http://www.a2q.com
The A2Q Method (tm) -- What do you want to improve today?
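Jay's Excel figures can be reproduced without Excel. A sketch (not from the original thread) of the point-mass calculation that BINOMDIST(x, n, p, FALSE) performs, in plain Python:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p); the same quantity Excel's
    BINOMDIST(x, n, p, FALSE) returns."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Part a: exactly 1 of 5 cars with power windows (p = 0.51)
print(f"{binom_pmf(1, 5, 0.51):.5f}")      # 0.14700

# Part b: at least 1, via the complement of "none with power windows"
print(f"{1 - binom_pmf(0, 5, 0.51):.5f}")  # 0.97175
```

Note that part b can equivalently be computed with p = 0.49 and "success = no power", as in the Excel setup above: 0.49^5 is the same number either way.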
question re: problem
I had the following to solve: 51% of all domestic cars being shipped have power windows. If a lot contains five such cars:
a. What is the probability that only one has power windows?
b. What is the probability that at least one has power windows?

I solved each of these problems in two ways, one using standard probability theory and one using a binomial distribution. I seemingly had no problem with part b, but in part a the probability theory did not seem to produce the correct answer. I have listed these below. What is wrong with the probability equation listed below? Also, is my answer to part b correct?

a. Randomly draw five samples (cars); independent events; only 1 with power windows:
P{Only 1 Power} = P(Power) x P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower)
                = 0.51 x 0.49 x 0.49 x 0.49 x 0.49

Also solve using the BINOMDIST function in Excel:
n = 5, p = 0.51 (success = PW), x = 1, p(x) = 0.14700

b. At least 1 with power windows:
P{At Least 1} = 1 - P{0}
P{0} = P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower)
     = 0.49 x 0.49 x 0.49 x 0.49 x 0.49 = 0.02825
1 - 0.02825 = 0.97175 (at least 1)

Also solve using the BINOMDIST function in Excel (~97%):
n = 5, p = 0.49 (success = no power), x = 0, p(x) = 0.02825
1 - 0.028247525 = 0.97175 (~97%)
Minitab question
Hi, Does anyone know how to run banner points in Minitab? I have a survey, and would like to cross-tabulate it based on responses to certain questions on the survey. Thanks, Erik
Re: canonical correlation question
Gardburyb [EMAIL PROTECTED] wrote:
> Hi all, I'm new to the group. I'm doing my dissertation, and I am doing a canonical
> correlation analysis. My question is, what is the best way to compare canonical

The test of parallelism in MANCOVA is an equivalent test.
Re: canonical correlation question
Elliot Cramer wrote:
> Gardburyb [EMAIL PROTECTED] wrote:
> > Hi all, I'm new to the group. I'm doing my dissertation, and I am doing a canonical
> > correlation analysis. My question is, what is the best way to compare canonical
> The test of parallelism in MANCOVA is an equivalent test.

I'd like to ask a follow-up question, then. MANCOVA uses least squares as its objective function to estimate relationships, while canonical correlation uses a different objective function. They don't seem equivalent to me, so my question is: is there some math that I'm not aware of that shows these two are equivalent? If so, could you provide a reference?

--
Paige Miller
Eastman Kodak Company
[EMAIL PROTECTED]
"It's nothing until I call it!" -- Bill Klem, NL Umpire
"When you get the choice to sit it out or dance, I hope you dance" -- Lee Ann Womack