Re: Applied analysis question
On 28 Feb 2002 07:37:16 -0800, [EMAIL PROTECTED] (Brad Anderson) wrote:

Rich Ulrich [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]...
On 27 Feb 2002 11:59:53 -0800, [EMAIL PROTECTED] (Brad Anderson) wrote:

BA> I have a continuous response variable that ranges from 0 to 750. I only have 90 observations and 26 are at the lower limit of 0, which is the modal category. The mean is about 60 and the median is 3; the distribution is highly skewed, extremely kurtotic, etc. Obviously, none of the power transformations are especially useful. The product [ snip, my own earlier comments ]

BA> I should have been more precise. It's technically a count variable representing the number of times respondents report using dirty needles/syringes after someone else had used them during the past 90 days. Subjects were first asked to report the number of days they had injected drugs, then the average number of times they injected on injection days, and finally, on how many of those total times they had used dirty needles/syringes. All of the subjects are injection drug users, but not all use dirty needles. The reliability of reports near 0 is likely much better than the reliability of estimates near 750. Indeed, substantively, the difference between a 0 and a 1 is much more significant than the difference between a 749 and a 750: 0 represents no risk, 1 represents at least some risk, and high values, regardless of the precision, represent high risk.

Okay, here is a break for some comment by me. There are two immediate aims of analyses: to show that results are extreme enough that they don't happen by chance (statistical testing); and to characterize the results so that people can understand them (estimation). When the mean is 60 and the median is 3, reporting averages, as if they were reports on central tendencies, is not going to help much with either aim.

If you want to look at outcomes, you make groups (as you did) that seem somewhat homogeneous: 0 (if it is); 1; 2-3; ... Eventually, your top group of 90+, which comes out to 'daily', seems reasonable as a top end. Using groups ought to give you a robust test, whatever you are testing, unless those distinctions between 10 and 500 needle-sticks become important. Using groups also lets you inspect, in particular, the means for 0, 1, 2 and 3.

I started thinking that the dimension is something like 'promiscuous use of dirty needles'; and I realized that an analogy to risky sex was not far wrong. Or, at any rate, doesn't seem far wrong to me. But your measure (the one that you mention, anyway) does not distinguish between 1 act each with 100 risky partners, and 100 acts with one.

Anyway, one way to describe the groups is to have some experts place the reports of behaviors into 'risk groups', or assign the risk scores. Assuming that those scores do describe your sample, without great non-normality, you should be able to use averages of risk scores for a technical level of testing and reporting, and convert them back to the verbal anchor descriptions in order to explain what they mean.

[ ...Q about zero; kurtosis.]

RU> Categorizing the values into a few categories labeled none, almost none, ... is one way to convert your scores. If those labels do make sense.

Makes sense at the low end: 0 = no risk. And at the high end I used 90+, representing using a dirty needle/syringe once a day or more often. The 2 middle categories were pretty arbitrary.

[ snip, other procedures ]

One of the other posters asked about the appropriate error term -- I guess that lies at the heart of my inquiry. I have no idea what the appropriate error term would be, or how best to model such data. I often deal with similar response variables that have distributions in which observations are clustered at 1 or both ends of the continuum. In most cases, these distributions are not even approximately unimodal and a bit skewed -- variables for which normalizing power transformations make sense.
Additionally, these typically aren't outcomes that could be thought of as being generated by a Gaussian process. Can you describe them usefully? What is the shape of the behaviors that you observe or expect, corresponding to the drop-off of density near either extreme? In some cases I think it makes sense to consider Poisson and generalizations of Poisson processes, although there is clearly much greater between-subject heterogeneity than assumed by a Poisson process. I estimated Poisson and negative binomial regression models -- there was compelling evidence that the Poisson was overdispersed. I also used a Vuong statistic to compare NB regression [ snip, more detail ] I think a lot of folks just run standard analyses or arbitrarily apply some normalizing transformation because that's what's done in their field. Then report the results without really examining the underlying distributions.
Re: Applied analysis question
Rolf Dalin [EMAIL PROTECTED] wrote: Brad Anderson wrote: I have a continuous response variable that ranges from 0 to 750. I only have 90 observations and 26 are at the lower limit of 0. What if you treated the information collected by that variable as really two variables: one categorical variable indicating a zero or non-zero value. Then the remaining numerical variable could only be analyzed conditionally on the category being non-zero. In many cases when you collect data on consumers' consumption of some commodity, you would end up with a large number of them not using the product at all, while those who used the product would consume different amounts. IIRC, your example is exactly the sort of situation for which Tobit modelling was invented. = Instructions for joining and leaving this list, remarks about the problem of INAPPROPRIATE MESSAGES, and archives are available at http://jse.stat.ncsu.edu/ =
Re: Applied analysis question
[EMAIL PROTECTED] (Eric Bohlman) wrote in message news:a5o5b1$fi0$[EMAIL PROTECTED]... Rolf Dalin [EMAIL PROTECTED] wrote: IIRC, your example is exactly the sort of situation for which Tobit modelling was invented. Considered that (actually estimated a couple of Tobit models, and if I use a log-transformed or Box-Cox-transformed response the results are consistent with the ordinal logit I originally described), but Tobit assumes a normally distributed censored response -- the observed distribution for the non-zero responses is not approximately normal (even with transformations) and I don't think it's reasonable to assume the errors are generated by an underlying Gaussian process. My understanding of the Tobit model is that it's not especially robust to violations of this assumption.
Re: Applied analysis question
Rich Ulrich [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... On 27 Feb 2002 11:59:53 -0800, [EMAIL PROTECTED] (Brad Anderson) wrote: I have a continuous response variable that ranges from 0 to 750. I only have 90 observations and 26 are at the lower limit of 0, which is the modal category. The mean is about 60 and the median is 3; the distribution is highly skewed, extremely kurtotic, etc. Obviously, none of the power transformations are especially useful. The product I guess it is 'continuous' except for having 26 ties at 0. I have to wonder how that set of scores arose, and also, what should a person guess about the *error* associated with those: Are the numbers near 750 measured with as much accuracy as the numbers near 3? I should have been more precise. It's technically a count variable representing the number of times respondents report using dirty needles/syringes after someone else had used them during the past 90 days. Subjects were first asked to report the number of days they had injected drugs, then the average number of times they injected on injection days, and finally, on how many of those total times they had used dirty needles/syringes. All of the subjects are injection drug users, but not all use dirty needles. The reliability of reports near 0 is likely much better than the reliability of estimates near 750. Indeed, substantively, the difference between a 0 and 1 is much more significant than the difference between a 749 and a 750--0 represents no risk, 1 represents at least some risk, and high values--regardless of the precision, represent high risk. How do zero scores arise? Is this truncation; the limit of practical measurement; or just what? Zero scores are logical and represent no risk, negative values are not logical. Extremely kurtotic, you say. That huge lump at 0 and skew is not consistent with what I think of as kurtosis, but I guess I have not paid attention to kurtosis at all, once I know that skewness is extraordinary. 
True, the kurtosis statistic exceeded 11, and a plot against the normal indicates a huge lump in the low end of the tail, and also a larger proportion of very high values than expected. Categorizing the values into a few categories labeled none, almost none, ... is one way to convert your scores. If those labels do make sense. Makes sense at the low end: 0 risk. And at the high end I used 90+, representing using a dirty needle/syringe once a day or more often. The 2 middle categories were pretty arbitrary. If I analyze a contingency table using the 4-category response and a 3-category measure of the primary covariate (categories defined using clinically meaningful categories), the association is quite strong; I used the exact p-value associated with the CMH difference in row means test (using SAS) and the association is significant. I also used the 3-category predictor and the procedures outlined by Stokes et al. (2000) to estimate a rank analysis of covariance -- again with consistent results. I've also run a few other analyses I didn't describe. I used the Box-Cox procedure to find a power transformation. Although the skewness statistic then looks great, the distribution is still not approximately normal. However, a regression using the transformed variable is consistent with the ordered logit and the contingency table analysis. One of the other posters asked about the appropriate error term -- I guess that lies at the heart of my inquiry. I have no idea what the appropriate error term would be, or how best to model such data. I often deal with similar response variables that have distributions in which observations are clustered at 1 or both ends of the continuum. In most cases, these distributions are not even approximately unimodal and a bit skewed -- variables for which normalizing power transformations make sense. Additionally, these typically aren't outcomes that could be thought of as being generated by a Gaussian process.
In some cases I think it makes sense to consider Poisson and generalizations of Poisson processes, although there is clearly much greater between-subject heterogeneity than assumed by a Poisson process. I estimated Poisson and negative binomial regression models -- there was compelling evidence that the Poisson was overdispersed. I also used a Vuong statistic to compare NB regression with zero-inflated NB regression -- the results support the zero-inflated model. The model standard errors for a zero-inflated model are wildly different from the Huber-White sandwich robust standard errors. The latter give results that are fairly consistent with the ordered logit; the model-based standard errors are huge -- given that these are asymptotic statistics and I have a relatively small sample, I don't really trust either. I think a lot of folks just run standard analyses or arbitrarily apply some normalizing transformation because that's what's done in their field.
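The overdispersion diagnostic Brad describes can be sketched numerically. This is a minimal illustration on simulated counts (not the actual needle-sharing data; the sizes 26/40/24 merely mimic the zero spike and long tail described above): for a Poisson outcome the variance equals the mean, so a variance/mean ratio far above 1 is the kind of evidence that pushes one toward negative binomial or zero-inflated alternatives.

```python
import random
import statistics

random.seed(1)

# Hypothetical counts mimicking the shape described in the thread:
# a large spike at zero plus a long right tail (not the actual data).
counts = [0] * 26 \
    + [random.randint(1, 10) for _ in range(40)] \
    + [random.randint(50, 750) for _ in range(24)]

m = statistics.mean(counts)
v = statistics.pvariance(counts)

# For a Poisson variable the variance equals the mean, so the
# variance/mean ratio should be near 1; a large ratio signals
# overdispersion and motivates NB or zero-inflated models instead.
dispersion = v / m
print(f"mean={m:.1f} variance={v:.1f} dispersion ratio={dispersion:.1f}")
```

Formal tests (score tests, or the Vuong comparison mentioned above) refine this, but the raw ratio already tells most of the story.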
Re: Applied analysis question
At 07:37 AM 2/28/02 -0800, Brad Anderson wrote: I think a lot of folks just run standard analyses or arbitrarily apply some normalizing transformation because that's what's done in their field. Then report the results without really examining the underlying distributions. I'm curious how folks proceed when they encounter very goofy distributions. Thanks for your comments.

i think the lesson to be gained from this is that we seem to be focusing on (or the message that students and others get is) getting the analysis DONE and summarized ... and with most standard packages ... that is relatively easy to do

for example, you talk about a simple regression analysis and then show them in minitab that you can do that like:

MTB> regr 'weight' 1 'height'

and, when they do it, lots of output comes out BUT, the first thing is the best-fitting straight line equation like:

The regression equation is Weight = - 205 + 5.09 Height

and THAT's where they start AND stop (more or less)

while software makes it rather easy to do lots of prelim inspection of data, it also makes it very easy to SKIP all that too

before we do any serious analysis ... we need to LOOK at the data ... carefully ... make some scatterplots (to check for outliers, etc.), look at some frequency distributions ON the variables, even just look at the means and sds ... to see if some serious restriction of range issue pops up ... THEN and ONLY then, after we get a feel for what we have ... THEN and ONLY then should we be doing the main part of our analysis ... ie, testing some hypothesis or notion WITH the data (actually, i might call the prelims the MAIN part but, others might disagree)

we put the cart before the horse ... in fact, we don't even pay any attention to the horse

unfortunately, far too much of this is caused by the dominance of and preoccupation with doing significance tests ... so we run routines that give us these p values and are done with it ...
without paying ANY attention to just looking at the data

my 2 cents worth

Dennis Roberts, 208 Cedar Bldg., University Park PA 16802 Emailto: [EMAIL PROTECTED] WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm AC 8148632401
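Dennis's look-before-you-fit advice can be sketched in a few lines. The height/weight numbers below are made up for illustration (they are not Minitab's data); the point is that the summary statistics get examined before anyone reads off the fitted line.

```python
import statistics

# Hypothetical height (in) / weight (lb) data standing in for the
# minitab example above.
height = [60, 62, 64, 66, 68, 70, 72, 74]
weight = [105, 120, 128, 135, 145, 152, 168, 175]

# Step 1: look first -- means, sds, possible restriction of range.
print("height: mean", statistics.mean(height),
      "sd", round(statistics.stdev(height), 2))
print("weight: mean", statistics.mean(weight),
      "sd", round(statistics.stdev(weight), 2))

# Step 2: only then fit the line (ordinary least squares by hand).
mx, my = statistics.mean(height), statistics.mean(weight)
sxy = sum((x - mx) * (y - my) for x, y in zip(height, weight))
sxx = sum((x - mx) ** 2 for x in height)
slope = sxy / sxx
intercept = my - slope * mx
print(f"Weight = {intercept:.0f} + {slope:.2f} Height")
```

A scatterplot of the same data (step 1.5, omitted here) is what catches the outliers and nonlinearity that the fitted equation silently absorbs.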
Re: Applied analysis question
On 27 Feb 2002 14:14:44 -0800, [EMAIL PROTECTED] (Dennis Roberts) wrote: At 04:11 PM 2/27/02 -0500, Rich Ulrich wrote: Categorizing the values into a few categories labeled none, almost none, ... is one way to convert your scores. If those labels do make sense. well, if 750 has the same numerical sort of meaning as 0 (unit wise) ... in terms of what is being measured then i would personally not think so SINCE, the categories above 0 will encompass very wide ranges of possible values [ ... ]

Frankly, the question is about the meaning of numbers, and I would like to ask it. I don't expect a bunch of zeros, with 3 as median, and values up to 750. Numbers like that *might* reflect, say, the amount of gold detected in some assays. Then, you want to know the handful of locations with numbers near 750. If any of the numbers at all are big enough to be interesting. Data like those are *not* apt to be congenial for taking means. And if 750 is meaningful, using ranks is apt to be nonsensical, too.

In this example, the median was 3. Does *that* represent a useful interval from 0? If so, *that* tells me scaling or scoring is probably not well chosen. Is there a large range of 'meaning' between 0 and non-zero? Is there a range of meaning concealed within zero? Zero children as the outcome of a marriage can reflect (a) a question being asked too early; (b) unfortunate happenstance; or (c) personal choice: categories within 0, and none of them is necessarily a good 'interval' from the 1, 2, 3... answers. But that (further) depends on what questions are being asked.

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
Applied analysis question
I have a continuous response variable that ranges from 0 to 750. I only have 90 observations and 26 are at the lower limit of 0, which is the modal category. The mean is about 60 and the median is 3; the distribution is highly skewed, extremely kurtotic, etc. Obviously, none of the power transformations are especially useful. The product moment correlation between the response and the primary covariate is near zero; however, a rank-order correlation coefficient is about .3 and is significant. We have 5 additional control variables. I'm convinced that any attempt to model the conditional mean response is completely inappropriate, yet all of the alternatives appear flawed as well.

Here's what I've done: I've collapsed the outcome into 3- and 4-category ordered response variables and estimated ordered logit models. I dichotomized the response (any vs none) and estimated a binomial logit. All of these approaches yield substantively consistent results using both the model-based standard errors and the Huber-White sandwich robust standard errors. My concerns about this approach are 1) the somewhat arbitrary classification restricts the observed variability, and 2) the estimators assume large sample sizes.

I rank-transformed the response variable and estimated a robust regression (using the rreg procedure in Stata) -- results were consistent with those obtained for the ordered and binomial logit models described above. I know that Stokes, Davis, and Koch have presented procedures to estimate analysis of covariance on ranks, but I've not seen reference to the use of rank-transformed response variables in a regression context. A plot of the rank-transformed response with the primary covariate clearly suggests a meaningful pattern. Contingency table analysis with a collapsed covariate strongly suggests a meaningful pattern. But I'm at something of a loss to know the best way to analyze and report the results. Thanks in advance.
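The contrast above between a near-zero product moment correlation and a rank-order correlation of about .3 arises because the rank transform ignores the wild spacing of the raw counts. Here is a minimal sketch with hypothetical numbers (a zero-heavy, monotone but very nonlinear relationship); Spearman's rho is just Pearson's r computed on midranks.

```python
import statistics

def ranks(xs):
    # Midranks: average ranks for ties, 1-based.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

# Hypothetical data: zeros at the bottom, then explosive growth.
covariate = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
response = [0, 0, 0, 1, 2, 4, 16, 81, 256, 625]

r_pearson = pearson(covariate, response)
r_spearman = pearson(ranks(covariate), ranks(response))
print(round(r_pearson, 2), round(r_spearman, 2))
```

With these made-up numbers the gap runs the other way in magnitude than in the post (both correlations are positive here), but the mechanism is the same: ranking flattens the extreme spacing that dominates the raw-scale covariance.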
Re: Applied analysis question
At 04:11 PM 2/27/02 -0500, Rich Ulrich wrote: Categorizing the values into a few categories labeled none, almost none, ... is one way to convert your scores. If those labels do make sense. well, if 750 has the same numerical sort of meaning as 0 (unit wise) ... in terms of what is being measured then i would personally not think so SINCE, the categories above 0 will encompass very wide ranges of possible values

if the scale was # of emails you look at in a day ... and 1/3 said none or 0 ... we could rename the scale 0 = not any, 1 to 50 = some, and 51 to 750 = many (and recode as 1, 2, and 3) ... i don't think anyone who just saw the labels ... and was then asked to give some extemporaneous 'values' for each of the categories ... would have any clue what to put in for the some and many categories ... but i would predict they would seriously UNderestimate the values compared to the ACTUAL responses

this just highlights that for some scales, we have almost no differentiation at one end where they pile up ... perhaps (not saying one could have in this case) we could have anticipated this ahead of time and put in scale categories that might have anticipated that ... after the fact, we are more or less dead ducks

i would say this though ... treating the data only in terms of ranks ... does not really solve anything ... and clearly represents being able to say LESS about your data or interrelationships (even if the rank order r is .3 compared to the regular pearson of about 0) ...
than if you did not resort to only thinking about the data in rank terms
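The recode Dennis describes can be written directly; the cut-points 1-50 and 51-750 are the ones from his email example, applied here to a handful of made-up counts.

```python
# A minimal sketch of the recode above: 0 = not any, 1-50 = some,
# 51-750 = many (coded 1, 2, 3). The counts are hypothetical.
def risk_code(count):
    if count == 0:
        return 1   # "not any"
    elif count <= 50:
        return 2   # "some"
    else:
        return 3   # "many"

counts = [0, 0, 3, 12, 50, 51, 200, 750]
codes = [risk_code(c) for c in counts]
print(codes)  # -> [1, 1, 2, 2, 2, 3, 3, 3]
```

The loss Dennis worries about is visible here: 51 and 750 land in the same category, so any within-category variation is gone by construction.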
Re: Applied analysis question
Brad Anderson wrote: I have a continuous response variable that ranges from 0 to 750. I only have 90 observations and 26 are at the lower limit of 0. What if you treated the information collected by that variable as really two variables: one categorical variable indicating a zero or non-zero value. Then the remaining numerical variable could only be analyzed conditionally on the category being non-zero. In many cases when you collect data on consumers' consumption of some commodity, you would end up with a large number of them not using the product at all, while those who used the product would consume different amounts. Rolf Dalin ** Rolf Dalin Department of Information Technology and Media Mid Sweden University S-870 51 SUNDSVALL Sweden Phone: 060 148690, international: +46 60 148690 Fax: 060 148970, international: +46 60 148970 Mobile: 0705 947896, international: +46 70 5947896 mailto:[EMAIL PROTECTED] http://www.itk.mh.se/~roldal/ **
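Rolf's two-variable idea can be sketched as follows, on hypothetical counts: a zero/non-zero indicator analyzed for everyone, and the amount analyzed only among the non-zeros. The identity at the end is what ties the two parts back to the unconditional mean.

```python
import statistics

# Sketch of the two-part decomposition on hypothetical counts:
# part 1 is a zero/non-zero indicator, part 2 the amount given non-zero.
counts = [0, 0, 0, 0, 2, 3, 7, 40, 90, 750]

any_use = [1 if c > 0 else 0 for c in counts]
positive = [c for c in counts if c > 0]

p_any = statistics.mean(any_use)        # probability of non-zero use
cond_mean = statistics.mean(positive)   # mean amount among users only

print(p_any, cond_mean)

# The unconditional mean factors as P(non-zero) * E[amount | non-zero].
assert abs(statistics.mean(counts) - p_any * cond_mean) < 1e-9
```

In practice part 1 becomes a logistic regression and part 2 a model for the positive amounts (the "hurdle" formulation); the Tobit raised in the reply instead assumes a single censored Gaussian process behind both parts.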
Re: Question on Conditional PDF
Chia C Chong [EMAIL PROTECTED] wrote in message news:a5d38d$63e$[EMAIL PROTECTED]... Glen [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... Do you want to make any assumptions about the form of the conditional, or the joint, or any of the marginals? Well, X and Y are dependent and hence they are described by a joint PDF. This much is clear. I am not sure what other assumptions I can make though. I merely thought you may have domain-specific knowledge of the variables and their likely relationships which might inform the choice a bit (cut down the space of possibilities). Can you at least indicate whether any of them are restricted to be positive? Glen
Re: Question on Conditional PDF
Glen Barnett [EMAIL PROTECTED] wrote in message news:a5dev7$8jn$[EMAIL PROTECTED]... Chia C Chong [EMAIL PROTECTED] wrote in message news:a5d38d$63e$[EMAIL PROTECTED]... Glen [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... Do you want to make any assumptions about the form of the conditional, or the joint, or any of the marginals? Well, X and Y are dependent and hence they are described by a joint PDF. This much is clear. I am not sure what other assumptions I can make though. I merely thought you may have domain-specific knowledge of the variables and their likely relationships which might inform the choice a bit (cut down the space of possibilities). Can you at least indicate whether any of them are restricted to be positive? All values of X and Z are positive, while Y can have both positive and negative values. In fact, X has a range spanning from 0 to 250 (time), Y has values that span from -60 to +60 (angle), and Z has some positive values. Note that the joint PDF of X and Y was defined as f(X,Y)=f(Y|X)f(X), in which f(Y|X) is a conditional Gaussian PDF and f(X) is an exponential PDF. The plot of the 3rd variable, Z (power), i.e. Z vs X and Z vs Y respectively, shows that Z has some kind of dependency on X and Y; hence, my original post was asking for a possible method of finding the conditional PDF of Z on both X and Y. I hope this makes things a little bit clearer -- or more complicated??? Thanks.. CCC
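The stated factorisation f(X,Y) = f(Y|X) f(X) gives a direct recipe for simulating (X, Y) pairs, which is often a first step toward checking a proposed conditional model for Z. The particular parameter values below are illustrative placeholders, not the ones from CCC's data.

```python
import random

random.seed(3)

def sample_pair():
    # f(X): exponential (assumed mean 40 for illustration);
    # expovariate takes the rate, i.e. 1/mean.
    x = random.expovariate(1 / 40.0)
    # f(Y|X): Gaussian whose spread is allowed to depend on X
    # (the dependence form here is a pure assumption).
    y = random.gauss(0.0, 10.0 + 0.05 * x)
    return x, y

pairs = [sample_pair() for _ in range(5000)]

# X is a time, so exponential draws are always non-negative.
assert all(x >= 0 for x, _ in pairs)
```

A model for f(Z|X,Y) could be layered on the same way: draw (X, Y) as above, then draw Z from whatever conditional family the Z-vs-X and Z-vs-Y plots suggest.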
Re: Question on CDF
Henry [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... On Fri, 22 Feb 2002 08:55:42 +1100, Glen Barnett [EMAIL PROTECTED] wrote: Bob [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... A straight line CDF would imply the data is uniformly distributed, that is, the probability of one event is the same as the probability of any other event. The slope of the line would be the probability of an event. I doubt that - if the data were distributed uniformly on [0,1/2), say, then the slope of the line would be 2! I suspect he meant probability density. I guess that's actually correct - the slope of the pdf is zero. However, I'm fairly certain that's not what he meant. Glen
Re: Question on CDF
On Sat, 23 Feb 2002 00:27:00 +1100, Glen Barnett [EMAIL PROTECTED] wrote: Henry [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... On Fri, 22 Feb 2002 08:55:42 +1100, Glen Barnett [EMAIL PROTECTED] wrote: Bob [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... A straight line CDF would imply the data is uniformly distributed, that is, the probability of one event is the same as the probability of any other event. The slope of the line would be the probability of an event. I doubt that - if the data were distributed uniformly on [0,1/2), say, then the slope of the line would be 2! I suspect he meant probability density. I guess that's actually correct - the slope of the pdf is zero. However, I'm fairly certain that's not what he meant. I was trying to suggest that he meant the slope of the CDF was the height of the PDF.
Re: Question on CDF
Henry [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... I was trying to suggest that he meant the slope of the CDF was the height of the PDF. Oh, okay. Yes, that would be correct, but it shouldn't be called probability! Glen
Re: Question on CDF
Bob [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... [EMAIL PROTECTED] (Linda) wrote in message news:[EMAIL PROTECTED]... Hi! If I plot the CDF of some sample data and this CDF looks like a straight line crossing through 0, what does this imply? Normally, a CDF will not look like a straight line but something like an S shape, won't it? Linda A straight line CDF would imply the data is uniformly distributed, that is, the probability of one event is the same as the probability of any other event. The slope of the line would be the probability of an event. I doubt that - if the data were distributed uniformly on [0,1/2), say, then the slope of the line would be 2! Glen
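Glen's slope-2 counterexample is easy to check empirically: draw from the uniform distribution on [0, 1/2), form the empirical CDF, and measure its slope. The slope recovers the density height (2 here), which is exactly why a CDF slope cannot in general be read as a probability.

```python
import bisect
import random

random.seed(4)

# Sample uniformly on [0, 0.5); the density there is 1/0.5 = 2.
xs = sorted(random.uniform(0, 0.5) for _ in range(100000))

def ecdf(t):
    # Fraction of the sample at or below t.
    return bisect.bisect_right(xs, t) / len(xs)

# Slope of the empirical CDF between two interior points.
slope = (ecdf(0.4) - ecdf(0.1)) / (0.4 - 0.1)
print(round(slope, 2))
```

With 100,000 draws the estimated slope sits very close to 2, comfortably above any probability.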
Re: Question on CDF
[EMAIL PROTECTED] (Linda) wrote in message news:[EMAIL PROTECTED]... Hi! If I plot the CDF of some sample data and this CDF looks like a straight line crossing through 0, what does this imply? Normally, a CDF will not look like a straight line but something like an S shape, won't it? Linda A straight line CDF would imply the data is uniformly distributed, that is, the probability of one event is the same as the probability of any other event. The slope of the line would be the probability of an event. Bob
Re: Question on CDF
On Fri, 22 Feb 2002 08:55:42 +1100, Glen Barnett [EMAIL PROTECTED] wrote: Bob [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... A straight line CDF would imply the data is uniformly distributed, that is, the probability of one event is the same as the probability of any other event. The slope of the line would be the probability of an event. I doubt that - if the data were distributed uniformly on [0,1/2), say, then the slope of the line would be 2! I suspect he meant probability density.
Re: Question on random number generator
Herman Rubin wrote: ExpVar = -ln(UnifVar); It is not a good method in the tails, and is much too slow. If I recall correctly, transcendental operations on a Pentium require only a couple hundred clock cycles and can usually be optimized to take place during other calculations; so a few million simulations per second ought to be possible on the average domestic machine. -Robert Dawson
Re: Question on random number generator
In article [EMAIL PROTECTED], Robert J. MacG. Dawson [EMAIL PROTECTED] wrote: Linda wrote: I want to generate a series of random variables, X, with an exponential PDF with a given mean, MU. However, I only want X to be within some specified lower and upper limit, say between 0 and 150, i.e. rejecting anything outside this range. Does anyone have any ideas how I should do that? For untruncated exponential RVs the negative-log method of converting a uniform [0,1] RV is about as good as you can get: ExpVar = -ln(UnifVar); It is not a good method in the tails, and is much too slow. It can easily be adjusted to censor to any interval [a,b] by prescaling onto [exp(-b),exp(-a)]: TruncExpVar = -ln(exp(-b) + (exp(-a)-exp(-b))*UnifVar); This is efficient but slow, and has the same inaccuracy in the tails if b >> a. It is also unnecessarily complex; equivalent results are obtained by writing it as TruncExpVar = a - ln(exp(a-b) + (1.0-exp(a-b))*UnifVar); -- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399 [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
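Both one-liners from this exchange can be sketched in Python for a unit-rate exponential truncated to [a, b]; multiply the result by MU for a mean-MU scale (whether MU should be the pre- or post-truncation mean is Alan Miller's question elsewhere in the thread). The final loop checks Herman Rubin's claim that his rearrangement is algebraically equivalent.

```python
import math
import random

def trunc_exp(a, b, u):
    # Dawson's prescaling form: map u in [0,1] onto [exp(-b), exp(-a)],
    # then invert the exponential CDF.
    return -math.log(math.exp(-b) + (math.exp(-a) - math.exp(-b)) * u)

def trunc_exp_rubin(a, b, u):
    # Rubin's rearrangement of the same inverse CDF.
    return a - math.log(math.exp(a - b) + (1.0 - math.exp(a - b)) * u)

random.seed(5)
a, b = 0.0, 150.0  # Linda's range
samples = [trunc_exp(a, b, random.random()) for _ in range(10000)]
assert all(a <= s <= b for s in samples)

# The two forms agree to floating-point precision.
for u in (0.0, 0.25, 0.5, 0.75, 1.0):
    assert abs(trunc_exp(2.0, 5.0, u) - trunc_exp_rubin(2.0, 5.0, u)) < 1e-9
```

Note that u = 0 maps to b and u = 1 maps to a, so every draw lands inside the interval with no rejection step at all.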
Re: Newbie question
AP wrote: Hi all: I would appreciate your help in solving this question. How do I calculate the standard deviation of a sample when the mean and standard deviation of the process are provided? E.g. process mean = 150; standard deviation = 20. What is the SD for a sample of 25? The answer suggested is 4.0. Right answer, wrong question... You were, almost certainly, not asked for the standard deviation of the sample, but for the standard deviation of the MEAN of the sample. The thing you need to note here is that the sample is obtained through a random process, so that most things computed from the sample are likewise randomized through the sampling process. It is often helpful to think of taking a lot of samples all of the same size, computing the mean (or whatever) for each of them, and then analyzing that set of numbers. In particular, you can calculate the standard deviation. Probability theory tells us that in the population of ALL samples of size N from a population with mean mu and standard deviation sigma, the sample means will have mean mu and standard deviation sigma/sqrt(N). Moreover, as N gets larger, the sampling distribution gets closer to a normal distribution, which under some circumstances lets us say more about the distribution based on mu and sigma/sqrt(N). -Robert Dawson
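The sigma/sqrt(N) result can be verified both algebraically and by simulation, using the numbers from the question (process mean 150, sd 20, N = 25). The simulation literally does what the reply suggests: take many samples of the same size, compute each sample's mean, and look at the sd of those means.

```python
import random
import statistics

mu, sigma, n = 150.0, 20.0, 25

# Algebraic answer: sd of the sample mean = sigma / sqrt(N).
analytic_se = sigma / n ** 0.5
assert analytic_se == 4.0

# Simulation check: many samples, one mean per sample, then the sd
# of that collection of means.
random.seed(6)
means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(4000)]
simulated_se = statistics.stdev(means)
print(round(simulated_se, 1))
```

The simulated value hovers near 4, and the assumption of a normal process here only affects the shape of the sampling distribution, not the sigma/sqrt(N) spread itself.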
Re: Question on random number generator
Linda wrote: I want to generate a series of random variables X with an exponential PDF with a given mean, MU. However, I only want X to be in some specified lower and upper limit, say between 0 - 150, i.e. reject anything outside this range. Does anyone have any ideas how I should do that?

For untruncated exponential RV's the negative-log method of converting a uniform [0,1] RV is about as good as you can get: ExpVar = -ln(UnifVar); It can easily be adjusted to censor to any interval [a,b] by prescaling onto [exp(-b),exp(-a)]: TruncExpVar = -ln(exp(-b) + (exp(-a)-exp(-b))*UnifVar); -R. Dawson
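Dawson's prescaling formula is straightforward to code. A sketch in Python; his formula is written for the unit-rate case, so the scaling by MU is my addition, and the function name is just for illustration:

```python
import math
import random

def trunc_exp(mu, a, b, rng=random):
    """Exponential with mean mu, truncated to [a, b], via Dawson's prescaling.

    The uniform variate is mapped onto [exp(-b/mu), exp(-a/mu)] before taking
    the negative log, so no draws are ever rejected.
    """
    lo = math.exp(-b / mu)
    hi = math.exp(-a / mu)
    u = rng.random()
    return -mu * math.log(lo + (hi - lo) * u)

random.seed(7)
draws = [trunc_exp(60.0, 0.0, 150.0) for _ in range(1000)]
```

Every draw lands in [0, 150] by construction, so no rejection loop is needed.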
Re: Question on random number generator
Alan Miller wrote (six times): Linda wrote in message [EMAIL PROTECTED]... I want to generate a series of random variables X with an exponential PDF with a given mean, MU. However, I only want X to be in some specified lower and upper limit, say between 0 - 150, i.e. reject anything outside this range. Does anyone have any ideas how I should do that? Regards, Linda

Is MU the mean before truncation? - or afterwards? The ziggurat algorithm seems to be the fastest for generating exponentially-distributed RV's. You can then simply scale them, by multiplying by the mean BEFORE truncation, and then throw away any which exceed the upper bound.

Alternatively, following Herman Rubin's idea, you can post the same message to EDSTAT-L repeatedly and let X be the delay until somebody points this out. This should be geometrically distributed, which will approximate the desired exponential distribution <grin, duck, run>.

For most purposes I do not share Herman's concern about the tails of the distribution. If we use (say) a 64-bit integer as the basis of the uniform distribution, granularity will only be significant for the last few dozen values, which will turn up once every quintillion or so runs. Moreover, fast hardware logarithms are almost a given today. However, his gimmick of randomizing the mantissa and characteristic separately is a good one and well worth remembering. If I recall correctly, math coprocessors use binary logs too, so a super-fast algorithm for (say) a Pentium would probably tie the two approaches together. -Robert Dawson
Re: Question on random number generator
Thanks everyone for helping me... Regards, Linda

Art Kendall [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... [ snip: quoted SPSS syntax ]
Re: Newbie question
On 15 Feb 2002 14:38:49 -0800, [EMAIL PROTECTED] (AP) wrote: Hi all: I would appreciate your help in solving this question. How do I calculate the standard deviation of a sample where the mean and standard deviation from the process are provided? E.g. process mean = 150; standard deviation = 20. What is the SD for a sample of 25? The suggested answer is 4.0.

Here is a vocabulary distinction. Or error. I don't know if you are repeating the problem wrong, or you are speaking from a tradition that I am not familiar with. As I am familiar with it, statisticians say that "the standard deviation" is the standard deviation of the sample. The standard deviation of the sample *mean* is what is frequently referred to as the standard error, and the SD of the mean [or the SE] equals SD/sqrt(N). That is confusing enough. I hope this makes your sources clear. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
Re: Question on random number generator
In article [EMAIL PROTECTED], Bill Rowe [EMAIL PROTECTED] wrote: In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Linda) wrote: I want to generate a series of random variables X with an exponential PDF with a given mean, MU. However, I only want X to be in some specified lower and upper limit, say between 0 - 150, i.e. reject anything outside this range. Does anyone have any ideas how I should do that?

I am a little unclear on what you want. A random variable with an exponential distribution has no upper bound. Are you looking for a random deviate from a truncated distribution? In any case, X = -ln(U) where U is a uniform random deviate will be exponentially distributed with lambda = 1. For a different lambda simply scale -ln(U) by a suitable constant. To have a different minimum, simply add whatever offset you want. To truncate the distribution, simply throw away values above the desired limit. Note this can be made a bit more computationally efficient by truncating the uniform distribution prior to taking the logarithm.

One can use a much faster algorithm than using a logarithm, unless the logarithm is a fast hardware one. Also, the logarithm routine used gives poor accuracy in the tails, and there are reasons for wanting good accuracy there. If one is going to use a logarithm, I suggest using X = -ln(U) + K*ln(2.), where U is uniform (.5, 1) and K is the number of 0's until a 1 in a random bit stream. High quality is needed in K.

Now as to how to generate the distribution wanted: the random variable X is a linear function of an exponential truncated to be between 0 and M. One could take the remainder of an exponential random variable when divided by M, or modify the generating algorithm never to generate one that large. If M is small, the following is a simple method, not necessarily optimal. Let V be uniform (0, M) and T a test exponential. Replace T by T-V. If this is positive, use V as the truncated exponential, and continue.
If not, we lose both V and T. My faster method of generating exponentials is based on this general idea, but with the range divided. A more detailed preliminary description of the process is available, and a student is working on putting it into a program library. -- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399 [EMAIL PROTECTED] Phone: (765) 494-6054 FAX: (765) 494-0558
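Rubin's -ln(U) + K*ln(2) suggestion can be sketched directly (Python; the function and variable names are mine). K counts the 0's before the first 1 in a random bit stream, which makes it geometric, and U is uniform on (.5, 1); the sum is a standard exponential, with K supplying the coarse "octave" and -ln(U) the fine fractional part, which is what preserves accuracy deep in the tail:

```python
import math
import random

def exp_rubin(rng=random):
    # K = the number of 0's until a 1 in a random bit stream (geometric).
    k = 0
    while rng.getrandbits(1) == 0:
        k += 1
    # U uniform on (.5, 1) supplies the fractional part, -ln(U) in (0, ln 2].
    u = 0.5 + 0.5 * rng.random()
    return -math.log(u) + k * math.log(2.0)

random.seed(2002)
xs = [exp_rubin() for _ in range(50_000)]
mean_x = sum(xs) / len(xs)   # should be close to 1 for a unit exponential
```

Scale by the desired mean MU to get Linda's distribution before any truncation step.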
Re: Question on random number generator
Linda wrote in message [EMAIL PROTECTED]... I want to generate a series of random variables X with an exponential PDF with a given mean, MU. However, I only want X to be in some specified lower and upper limit, say between 0 - 150, i.e. reject anything outside this range. Does anyone have any ideas how I should do that? Regards, Linda

Is MU the mean before truncation? - or afterwards? The ziggurat algorithm seems to be the fastest for generating exponentially-distributed RV's. You can then simply scale them, by multiplying by the mean BEFORE truncation, and then throw away any which exceed the upper bound. -- Alan Miller (Honorary Research Fellow, CSIRO Mathematical Information Sciences) http://www.ozemail.com.au/~milleraj http://users.bigpond.net.au/amiller/
Re: Question on random number generator
try this SPSS syntax.

new file.
* this program generates 200 cases
* trims those outside the desired range
* and takes the first 100 of the remaining.
* change lines flagged with /* .
input program.
loop #i = 1 to 200. /* .
compute mu = .005. /* .
compute x = rv.exp(mu).
end case.
end loop.
end file.
end input program.
formats mu (f6.3).
select if x gt 0 and x le 150. /* .
compute seqnum = $casenum.
execute.
select if seqnum le 100. /* .
execute.

Linda wrote: I want to generate a series of random variables X with an exponential PDF with a given mean, MU. However, I only want X to be in some specified lower and upper limit, say between 0 - 150, i.e. reject anything outside this range. Does anyone have any ideas how I should do that? Regards, Linda
Newbie question
Hi all: I would appreciate your help in solving this question. How do I calculate the standard deviation of a sample where the mean and standard deviation from the process are provided? E.g. process mean = 150; standard deviation = 20. What is the SD for a sample of 25? The suggested answer is 4.0. TIA /anil
Re: Question on random number generator
In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Linda) wrote: I want to generate a series of random variables X with an exponential PDF with a given mean, MU. However, I only want X to be in some specified lower and upper limit, say between 0 - 150, i.e. reject anything outside this range. Does anyone have any ideas how I should do that?

I am a little unclear on what you want. A random variable with an exponential distribution has no upper bound. Are you looking for a random deviate from a truncated distribution? In any case, X = -ln(U) where U is a uniform random deviate will be exponentially distributed with lambda = 1. For a different lambda simply scale -ln(U) by a suitable constant. To have a different minimum, simply add whatever offset you want. To truncate the distribution, simply throw away values above the desired limit. Note this can be made a bit more computationally efficient by truncating the uniform distribution prior to taking the logarithm. -- - PGPKey fingerprint: 6DA1 E71F EDFC 7601 0201 9243 E02A C9FD EF09 EAE5
Re: Question on random number generator
Hi, Define Y = X if X <= T, and Y = 0 otherwise. For your problem, T = 150 (the threshold) and X is an exponential random variable with mean MU. So, first generate X, compare it with T, and assign a value to Y as specified in the above rule. Alternatively, find the CDF (distribution function) of Y from the above rule and then use a uniform random variable in (0, 1) to generate Y itself. hope this helps regards Ramesh

Linda wrote: I want to generate a series of random variables X with an exponential PDF with a given mean, MU. However, I only want X to be in some specified lower and upper limit, say between 0 - 150, i.e. reject anything outside this range. Does anyone have any ideas how I should do that? Regards, Linda
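Note that Ramesh's rule censors rather than truncates: draws above T are kept but recoded to 0, so the result has extra mass at 0 instead of a renormalized density on (0, T]. A sketch of the rule exactly as stated (Python; the function name is mine):

```python
import random

def censored_exp(mu, t, rng=random):
    """Y = X if X <= t, and Y = 0 otherwise, with X exponential of mean mu."""
    x = rng.expovariate(1.0 / mu)   # expovariate takes the rate, i.e. 1/mean
    return x if x <= t else 0.0

random.seed(42)
ys = [censored_exp(60.0, 150.0) for _ in range(1000)]
```

If Linda really wants rejection (truncation), the over-threshold draws should be redrawn rather than set to 0.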
Re: one-way ANOVA question
On 13 Feb 2002 09:48:41 -0800, [EMAIL PROTECTED] (Dennis Roberts) wrote: At 09:21 AM 2/13/02 -0600, Mike Granaas wrote: On Fri, 8 Feb 2002, Thomas Souers wrote: 2) Secondly, are contrasts used primarily as planned comparisons? If so, why? I would second those who've already indicated that planned comparisons are superior in answering theoretical questions and add a couple of comments: another way to think about this issue is: what IF we never had ... nor will in the future ... the overall omnibus F test? would this help us or hurt us in the exploration of the experimental/research questions of primary interest?

- not having it available, even abstractly, would HURT, because we would be without that reminder of 'too many hypotheses'. In practice, I *do* consider the number of tests. Just about always. Now, I am not arguing that the particular form of having an ANOVA omnibus-test is essential. Bonferroni correction can do a lot of the same. It just won't always be as efficient.

i really don't see ANY case that it would hurt us ... and, i can't really think of cases where doing the overall F test helps us ...

But, Dennis, I thought you told us before, you don't appreciate hypothesis testing ... I thought you could not think of cases where doing *any* F-test helps us. [ ... ] -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
Re: one-way ANOVA question
On Fri, 8 Feb 2002, Thomas Souers wrote: 2) Secondly, are contrasts used primarily as planned comparisons? If so, why?

I would second those who've already indicated that planned comparisons are superior in answering theoretical questions and add a couple of comments:

1) an omnibus test followed by pairwise comparisons cannot clearly answer theoretical questions involving more than two groups. Trend analysis is one example where planned comparisons can give a relatively unambiguous answer (is there a linear, quadratic, etc. trend?) where pairwise tests leave the researcher trying to interpret the substantive meaning of a particular pattern of pairwise differences.

2) planned comparisons require that the researcher think through the theoretical implications of their research efforts prior to collecting data. It is too common for folks to gather some data appropriate for an ANOVA, without thinking through the theoretical implications of their possible results, analyze it with an omnibus test (Ho: all the means are the same) and rely on post-hoc pairwise comparisons to understand the theoretical meaning of their findings. In a multi-group design, if you cannot think of at least one meaningful contrast code prior to collecting the data, you haven't really thought through your research.

3) your power is better. It is well known that when you toss multiple potential predictors into a multiple regression equation you run the risk of washing out the effect of a single good predictor by combining it with one or more bad predictors. ANOVA is a special case of multiple regression where each df in the between-subjects line represents a predictor (contrast code). By combining two or more contrast codes into a single omnibus test you reduce your ability to detect meaningful differences amongst the collection of non-differences.

Hope this helps. Michael *** Michael M.
Granaas, Associate Professor, [EMAIL PROTECTED] Department of Psychology, University of South Dakota, Vermillion, SD 57069 Phone: (605) 677-5295 FAX: (605) 677-6604 *** All views expressed are those of the author and do not necessarily reflect those of the University of South Dakota, or the South Dakota Board of Regents.
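Granaas's trend-analysis point can be made concrete with a tiny worked example of a planned linear contrast. The data below are hypothetical, invented for illustration; the weights (-1, 0, 1) are the standard linear-trend contrast for three equally spaced groups:

```python
import math
from statistics import fmean, variance

groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]   # hypothetical data; means 2, 3, 4
weights = [-1, 0, 1]                          # linear-trend contrast weights

means = [fmean(g) for g in groups]
ns = [len(g) for g in groups]

# Pooled error term (MSE) from the within-group variances.
df_error = sum(ns) - len(groups)
mse = sum((n - 1) * variance(g) for g, n in zip(groups, ns)) / df_error

# Contrast estimate and its t statistic on df_error degrees of freedom.
L = sum(w * m for w, m in zip(weights, means))
se = math.sqrt(mse * sum(w * w / n for w, n in zip(weights, ns)))
t = L / se   # here sqrt(6), about 2.449, on 6 df
```

One focused test of the linear trend, instead of an omnibus F followed by three pairwise comparisons.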
Re: one-way ANOVA question
Thomas Souers wrote: Hello, I have two questions regarding multiple comparison tests for a one-way ANOVA (fixed effects model). 1) Consider the Protected LSD test, where we first use the F statistic to test the hypothesis of equality of factor level means. Here we have a type I error rate of alpha. If the global F test is significant, we then perform a series of t-tests (pairwise comparisons of factor level means), each at a type I error rate of alpha. This may seem like a stupid question, but how does this test preserve a type I error for the entire experiment?

As you (nearly) say, [Only i]f the global F test is significant, we then perform a series of t-tests

2) Secondly, are contrasts used primarily as planned comparisons? If so, why?

It depends on the research question.
Re: one-way ANOVA question
At 09:21 AM 2/13/02 -0600, Mike Granaas wrote: On Fri, 8 Feb 2002, Thomas Souers wrote: 2) Secondly, are contrasts used primarily as planned comparisons? If so, why? I would second those who've already indicated that planned comparisons are superior in answering theoretical questions and add a couple of comments:

another way to think about this issue is: what IF we never had ... nor will in the future ... the overall omnibus F test? would this help us or hurt us in the exploration of the experimental/research questions of primary interest? i really don't see ANY case that it would hurt us ... and, i can't really think of cases where doing the overall F test helps us ... i think mike's point about planning comparisons making us THINK about what is important to explore in a given study ... is really important because, we have gotten lazy when it comes to this ... we take the easy way out of testing all possible paired comparisons when, it MIGHT be that NONE of these are really the crucial things to be examined

Dennis Roberts, 208 Cedar Bldg., University Park PA 16802 Emailto: [EMAIL PROTECTED] WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm AC 8148632401
one-way ANOVA question
Hello, I have two questions regarding multiple comparison tests for a one-way ANOVA (fixed effects model). 1) Consider the Protected LSD test, where we first use the F statistic to test the hypothesis of equality of factor level means. Here we have a type I error rate of alpha. If the global F test is significant, we then perform a series of t-tests (pairwise comparisons of factor level means), each at a type I error rate of alpha. This may seem like a stupid question, but how does this test preserve a type I error for the entire experiment? I understand that with a Bonferroni-type procedure, we can test each pairwise comparison at a certain rate, so that the overall type I error rate of the experiment will be at most a certain level. But with the Protected LSD test, I don't quite see how the comparisons are being protected. Could someone please explain to me the logic behind the LSD test? 2) Secondly, are contrasts used primarily as planned comparisons? If so, why? I would very much appreciate it if someone could take the time to explain this to me. Many thanks.
Re: one-way ANOVA question
You have to keep in mind that the LSD is concerned with familywise error rate, which is the probability that you will make at least one Type I error in your set of conclusions. For the familywise error rate, 3 errors are no worse than 1. Suppose that you have three groups. If the omnibus null is true, the probability of erroneously rejecting the null with the overall Anova is equal to alpha, which I'll assume you set at .05. IF you reject the null, you have already made one Type I error, so the chances of making more do not matter to the familywise error rate. Your Type I error rate is .05.

Now suppose that the null is false -- mu(1) = mu(2) /= mu(3). Then it is not possible to make a Type I error in the overall F, because the omnibus null is false. There is one chance of making a Type I error in testing individual means, because you could erroneously declare mu(1) /= mu(2). But since the other nulls are false, you can't make an error there. So again, your familywise probability of a Type I error is .05.

Now assume 4 means. Here you have a problem. It is possible that mu(1) = mu(2) /= mu(3) = mu(4). You can't make a Type I error on the omnibus test, because that null is false. But you will be allowed to test mu(1) = mu(2), and to test mu(3) = mu(4), and each of those is true. So you have 2 opportunities to make a Type I error, giving you a familywise rate of 2*.05 = .10.

So with 2 or 3 means, the max. familywise error rate is .05. With 4 or 5 means it is .10, with 6 or 7 means it is .15, etc. But keep in mind that, at least in psychology, the vast majority of experiments have no more than 5 means, and many have only 3. In that case, the effective max error rate for the LSD is .10 or .05, depending on the number of means. On the other hand, if you have many means, the situation truly gets out of hand. Dave Howell

At 10:37 AM 2/8/2002 -0800, you wrote: Hello, I have two questions regarding multiple comparison tests for a one-way ANOVA (fixed effects model).
[ snip: rest of quoted question ]

** David C. Howell Phone: (802) 656-2670 Dept of Psychology Fax: (802) 656-8783 University of Vermont email: [EMAIL PROTECTED] Burlington, VT 05405 http://www.uvm.edu/~dhowell/StatPages/StatHomePage.html http://www.uvm.edu/~dhowell/gradstat/index.html
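Howell's counting argument follows a simple pattern: in the worst case the k means pair off into floor(k/2) true pairwise nulls while the omnibus null stays false, so the familywise bound is floor(k/2)*alpha. The formula below is my generalization of his worked cases (2-3 means, 4-5 means, 6-7 means), not something from his post:

```python
def lsd_familywise_bound(k, alpha=0.05):
    """Max familywise Type I error rate for Fisher's protected LSD with k means.

    Worst case: the means pair off into floor(k/2) equal pairs, so the (false)
    omnibus null offers no protection for those pairwise tests.
    """
    return (k // 2) * alpha

# Reproduces Howell's examples: .05 for 2-3 means, .10 for 4-5, .15 for 6-7.
bounds = {k: lsd_familywise_bound(k) for k in range(2, 8)}
```

The bound grows linearly in the number of means, which is exactly Howell's "the situation truly gets out of hand" for large k.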
Re: one-way ANOVA question
At 10:37 AM 2/8/02 -0800, Thomas Souers wrote: 2) Secondly, are contrasts used primarily as planned comparisons? If so, why?

well, in the typical rather complex study ... all pairs of possible mean differences (as one example) are NOT equally important to the testing of your theory or notions so, why not set up ahead of time ... THOSE that are (not necessarily restricted to pairs) you then follow ... let the other ones alone no law says that if you had a 3 by 4 by 3 design, that the 3 * 4 * 3 = 36 means all need pairs testing ... in fact, some combinations may not even make a whole lot of sense EVEN if it is easier to work them into your design

Dennis Roberts, 208 Cedar Bldg., University Park PA 16802 Emailto: [EMAIL PROTECTED] WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm AC 8148632401
Re: one-way ANOVA question
Hi On 8 Feb 2002, Thomas Souers wrote: 2) Secondly, are contrasts used primarily as planned comparisons? If so, why?

There are a great many possible contrasts even with a relatively small number of means. If you examine the data and then decide what contrasts to do, then you have in some informal sense performed a much larger set of contrasts than you actually formally test. Specifying the contrasts in advance means that you have only performed the number of statistical tests actually calculated. Another (related) way to think of it is that planned contrasts take advantage of pre-existing theory and data to perform tests that favor certain outcomes. To do this, however, contrasts must be specified independently of the data (i.e., planned). Perhaps this could be thought of as some kind of quasi-Bayesian thinking? That is, given a priori factors favoring certain outcomes, the actual data does not need to be as strong to tilt the results in that direction. Best wishes Jim

James M. Clark (204) 786-9757 Department of Psychology (204) 774-4134 Fax University of Winnipeg, 4L05D Winnipeg, Manitoba R3B 2E9 CANADA [EMAIL PROTECTED] http://www.uwinnipeg.ca/~clark
Question on Poisson -- Multinomial Relationship
Hi all, The conditional distribution of Poisson variates given their sum is multinomial. Does anyone know the density of Poisson variates given their partial sums S1, S2, ..., Sk, with each Si possibly overlapping with one or more of the other sums? Thanks in advance. Bhaskara
(Probably Simple) Chi-Square Intuition Question
I'm looking at forced-response answers to a question where there are several possible choices. I'm trying to test the significance of the difference between the proportion choosing answer A and the proportion choosing answer B. I've got the fairly simple formula for a chi-square-distributed test statistic. I'm puzzled, however, by the effect of changing the number of answer options on the chi-square critical value. Suppose 34% of the sample always chooses A and 21% always answers B (no matter how many choices there are). Because the test statistic reportedly has the number of total choices (minus 1) as its degrees of freedom, this implies that as the number of choices goes up, it's going to be harder and harder for me to show that the A and B proportions are statistically different. I would think that the difference between the A and B proportions (13% in my example here) would be more impressive as you increased the number of total options. Please help, I'm totally stumped! Thanks much in advance! andy leventis [EMAIL PROTECTED]
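One standard way around this puzzle (a sketch, not an answer from the thread itself): to compare two cell proportions from the same multinomial sample, you do not need the overall k-category chi-square at all. A direct test of p_A - p_B uses the multinomial variance of the difference, Var(p̂_A - p̂_B) = (p_A + p_B - (p_A - p_B)^2)/n, which accounts for the negative covariance between the two proportions and does not depend on how many other answer options exist. With the poster's 34% vs 21% and a hypothetical n = 200:

```python
from math import sqrt

def z_diff_multinomial(pA, pB, n):
    # z statistic for H0: pA == pB when both proportions come from the
    # SAME multinomial sample of size n (Cov(pA_hat, pB_hat) = -pA*pB/n)
    se = sqrt((pA + pB - (pA - pB) ** 2) / n)
    return (pA - pB) / se

z = z_diff_multinomial(0.34, 0.21, 200)  # n = 200 is hypothetical
print(round(z, 2))
```

The degrees of freedom here are fixed (one comparison), so adding more answer categories does not dilute the A-versus-B comparison.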
Re: cell-counts question
I'm not clear on your level of understanding, so apologies if I repeat ground you have already plowed twice. 1) The symbol 6.27x10^7 means (is mathematically equal to) 62,700,000. Could be the biostatistician counted one heck of a lot of cells, or had some means to estimate the count from a smaller volume than the standard volume used for reporting. 2) When calculating the average and standard deviation, we can 'adjust' the actual measured numbers by adding (or subtracting) a constant to each measurement, or by multiplying (or dividing) each measurement by a constant. Each of these adjustments changes the average and standard deviation in known ways. Thus, we can divide each measurement by 10,000,000 (10^7), do the average and stdev calculation, and then 'adjust' the result back again at the end. The equations for the relationships are: If U = a*X + b (Eq. 1), then xbar(U) = a*xbar(X) + b (Eq. 2) and stdev(U) = a*stdev(X) (Eq. 3; b does not change the stdev). So in your case, the report measured 6.27x10^7, etc.; they divided each measurement by 10^7 to get 6.27, etc. This is using Eq. 1 above, with a = 1/10^7 and b = 0. Then they calculated the average and standard deviation (which is much easier without all those 0's hanging around :) . Then they can multiply xbar and stdev by 10^7, and report the average and stdev of the original measurements for all to see. This is using Eq. 2 and 3, only first solving for xbar(X) and stdev(X) to get Eq. 4 and 5: xbar(X) = (xbar(U) - b)/a (Eq. 4, from Eq. 2) and stdev(X) = stdev(U)/a (Eq. 5, from Eq. 3). Since 1/a = 1/(1/10^7) = 10^7 in your case, stdev(X) = stdev(U)*10^7. Result: easier calculation, easier visualization of the number crunching, easier display on a graph, for example, BUT no change in result. Requirements: a and b must be constants, and Eq. 1 must be applied to _all_ the data used in the calculations. This kind of thing is often done without noticing, when we change the scale of the measurements.
Some length measurements are written in a log book in 'mils' in the USA, where 1 mil = 0.001 inches. The calculations are done in mils, then reported in inches. Hence, an average of 9.4 mils becomes an average of 0.0094 inches. I believe European locomotive (train engine) plans are documented in mm, from one end to the other, but the overall length is reported to management and the public in meters. Does this help? Jay Wei Wang wrote: Dear Friends, Here is an exam question which I don't know how to do. Can anyone help me? The question is: a biostatistician was asked to analyze some data regarding cell counts, and the values were reported like 6.27x10^7, 72.5x10^7, 3.42x10^7, etc. Rather than using the data exactly as reported, the biostatistician used the values as 6.27, 72.5, 3.42, etc. What effect does this have on estimation of the mean and standard deviation? What effect does this have on hypothesis testing about the mean? Why? Thank you very much for your help. Christine -- Jay Warner Principal Scientist Warner Consulting, Inc. North Green Bay Road Racine, WI 53404-1216 USA Ph: (262) 634-9100 FAX: (262) 681-1133 email: [EMAIL PROTECTED] web: http://www.a2q.com The A2Q Method (tm) -- What do you want to improve today?
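A quick numeric check of the rescaling argument above, using the three reported cell counts as toy data. It also answers the hypothesis-testing half of the exam question: a t statistic for the mean is a ratio of two quantities that scale by the same constant, so it is unchanged.

```python
from statistics import mean, stdev
from math import sqrt

x = [6.27e7, 72.5e7, 3.42e7]        # original counts
u = [v / 1e7 for v in x]            # rescaled: U = a*X with a = 1/10^7, b = 0

# mean and stdev both scale by the same constant a ...
assert abs(mean(x) - mean(u) * 1e7) < 1e-3
assert abs(stdev(x) - stdev(u) * 1e7) < 1e-3

# ... so a one-sample t statistic (here vs 0, just for illustration)
# is identical on either scale
n = len(x)
t_x = mean(x) / (stdev(x) / sqrt(n))
t_u = mean(u) / (stdev(u) / sqrt(n))
assert abs(t_x - t_u) < 1e-9
print("rescaling changes the scale of xbar and s, not the t statistic")
```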
cell-counts question
Dear Friends, Here is an exam question which I don't know how to do. Can anyone help me? The question is: a biostatistician was asked to analyze some data regarding cell counts, and the values were reported like 6.27x10^7, 72.5x10^7, 3.42x10^7, etc. Rather than using the data exactly as reported, the biostatistician used the values as 6.27, 72.5, 3.42, etc. What effect does this have on estimation of the mean and standard deviation? What effect does this have on hypothesis testing about the mean? Why? Thank you very much for your help. Christine
Re: SAT Question Selection
[cc'd to previous poster; please follow up in newsgroup] L.C. [EMAIL PROTECTED] wrote in sci.stat.edu: Back in my day (did we have days back then?) I recall talk of test questions on the SAT. That is, these questions were not counted; they were being tested for (I presume) some sort of statistical validity. Does anyone have any statistical insight into the SAT question selection process. Does anyone have a specific lead? I can find virtually nothing. I remember reading a good book about the inner operation of ETS (administers the SATs), with some bits about the test questions you refer to, but I can't quite remember the title. I've searched the catalog of my old library, and this _may_ be it: Lemann, Nicholas. The big test : the secret history of the American meritocracy New York : Farrar, Straus and Giroux, 1999. -- Stan Brown, Oak Road Systems, Cortland County, New York, USA http://oakroadsystems.com/ What in heaven's name brought you to Casablanca? My health. I came to Casablanca for the waters. The waters? What waters? We're in the desert. I was misinformed.
Re: SAT Question Selection
for the SAT ... which is still paper and pencil ... you will find multiple sections ... math and verbal ... as far as i know ... there usually are 3 of one and 2 of the other ... the one with 3 has a section that is NOT operational ... which does NOT count ... but is used for trialing new items ... revised items ... etc. don't expect them to tell you which one that is however ... in a sense ... they are making YOU pay for THEIR pilot work ... and, of course, if you happen to really get fouled up on the section that does not count ... it could carry over emotionally to another section ... and have some (maybe not much) impact on your motivation to do well on that next section unless it has changed ... At 05:19 PM 1/14/02 -0500, you wrote: [cc'd to previous poster; please follow up in newsgroup] L.C. [EMAIL PROTECTED] wrote in sci.stat.edu: Back in my day (did we have days back then?) I recall talk of test questions on the SAT. That is, these questions were not counted; they were being tested for (I presume) some sort of statistical validity. Does anyone have any statistical insight into the SAT question selection process? Does anyone have a specific lead? I can find virtually nothing. I remember reading a good book about the inner operation of ETS (administers the SATs), with some bits about the test questions you refer to, but I can't quite remember the title. I've searched the catalog of my old library, and this _may_ be it: Lemann, Nicholas. The big test : the secret history of the American meritocracy. New York : Farrar, Straus and Giroux, 1999. -- Stan Brown, Oak Road Systems, Cortland County, New York, USA http://oakroadsystems.com/ What in heaven's name brought you to Casablanca? My health. I came to Casablanca for the waters. The waters? What waters? We're in the desert. I was misinformed.
_ dennis roberts, educational psychology, penn state university 208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED] http://roberts.ed.psu.edu/users/droberts/drober~1.htm
SAT Question Selection
Back in my day (did we have days back then?) I recall talk of test questions on the SAT. That is, these questions were not counted; they were being tested for (I presume) some sort of statistical validity. Does anyone have any statistical insight into the SAT question selection process? Does anyone have a specific lead? I can find virtually nothing. Thanks and Regards, -Larry Curcio
Re: SAT Question Selection
On Sun, 13 Jan 2002 13:04:14 GMT, L.C. [EMAIL PROTECTED] wrote: Back in my day (did we have days back then?) I recall talk of test questions on the SAT. That is, these questions were not counted; they were being tested for (I presume) some sort of statistical validity. Does anyone have any statistical insight into the SAT question selection process. Does anyone have a specific lead? I can find virtually nothing. I believe that they have to change their questions a lot more often than they used to, now that they occasionally reveal some questions and answers. The Educational Testing Service has a web site that looks pretty nice, in my 60-second opinion. http://www.ets.org/research/ They do seem to invite communication -- I suggest you e-mail, if you don't find what you are looking for in their 8 research areas, or elsewhere. It seems to me that I found a statistics journal produced by ETS when I was looking up references for scaling, a year or so ago. But I don't remember that for a fact. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
Re: Sorry for question, but how is the english word for @
at Nathaniel [EMAIL PROTECTED] wrote in message news:9v3d79$2rj$[EMAIL PROTECTED]... Hi, Sorry for question, but how is the english word for @ Pleas forgive me. N.
Re: Question on 2-D joint distribution...
Chia C Chong [EMAIL PROTECTED] wrote in message news:a145qk$qfq$[EMAIL PROTECTED]... Hi! I have a series of observations of 2 random variables (say X and Y) from my measurement data. These 2 RVs are not independent, and hence f(X,Y) ~= f(X)f(Y); so I can't investigate f(X) and f(Y) separately. I tried to plot the 2-D kernel density estimate of these 2 RVs, and from the plot it looks like a Laplacian/Gaussian/Generalised Gaussian shape on one side, while the other side looks like a Gamma/Weibull/Exponential shape. My intention is to find the joint 2-D distribution of these 2 RVs so that I can represent it by an equation (so that I could regenerate this plot by simulation later on). I wonder whether anyone has come across this kind of problem, and what method I should use?
Re: Question on 2-D joint distribution...
In article a145qk$qfq$[EMAIL PROTECTED], Chia C Chong [EMAIL PROTECTED] wrote: Hi! I have a series of observations of 2 random variables (say X and Y) from my measurement data. These 2 RVs are not independent, and hence f(X,Y) ~= f(X)f(Y); so I can't investigate f(X) and f(Y) separately. I tried to plot the 2-D kernel density estimate of these 2 RVs, and from the plot it looks like a Laplacian/Gaussian/Generalised Gaussian shape on one side, while the other side looks like a Gamma/Weibull/Exponential shape. My intention is to find the joint 2-D distribution of these 2 RVs so that I can represent it by an equation (so that I could regenerate this plot by simulation later on). I wonder whether anyone has come across this kind of problem, and what method I should use? There is, in the collection by Johnson and Kotz (and others for some of the volumes), a listing of classical bivariate distributions. It is hard enough to estimate one-dimensional distributions; it gets worse as the dimension increases. -- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399 [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
Question on 2-D joint distribution...
Hi! I have a series of observations of 2 random variables (say X and Y) from my measurement data. These 2 RVs are not independent, and hence f(X,Y) ~= f(X)f(Y); so I can't investigate f(X) and f(Y) separately. I tried to plot the 2-D kernel density estimate of these 2 RVs, and from the plot it looks like a Laplacian/Gaussian/Generalised Gaussian shape on one side, while the other side looks like a Gamma/Weibull/Exponential shape. My intention is to find the joint 2-D distribution of these 2 RVs so that I can represent it by an equation (so that I could regenerate this plot by simulation later on). I wonder whether anyone has come across this kind of problem, and what method I should use? Thanks... Regards, CCC
Re: Question on 2-D joint distribution...
Chia C Chong [EMAIL PROTECTED] wrote in message news:a145qk$qfq$[EMAIL PROTECTED]... Hi! I have a series of observations of 2 random variables (say X and Y) from my measurement data. These 2 RVs are not independent, and hence f(X,Y) ~= f(X)f(Y); so I can't investigate f(X) and f(Y) separately. I tried to plot the 2-D kernel density estimate of these 2 RVs, and from the plot it looks like a Laplacian/Gaussian/Generalised Gaussian shape on one side, while the other side looks like a Gamma/Weibull/Exponential shape. My intention is to find the joint 2-D distribution of these 2 RVs so that I can represent it by an equation (so that I could regenerate this plot by simulation later on). I wonder whether anyone has come across this kind of problem, and what method I should use? Thanks... Regards, CCC In plotting the distributions of these two RVs, were you looking at the MARGINAL distributions? If so, it might be more useful to look at a range of CONDITIONAL distributions for each variable, since it is the conditional distributions that you ultimately need to define in order to arrive at a joint distribution. One variable's conditional distribution could conceivably change substantially over the range of the other variable's values. By looking at how each variable's conditional pdf shape changes at different values of the other variable, you may be able to select a distributional form (Weibull, Gamma, etc.) that is able to represent the varying shape of one variable's pdf by a change of parameter values. Whichever variable has a conditional pdf form that seems best suited to representation by a known distributional form (with varying parameters) is the one you can choose as the dependent variable. For example, let's say that, in looking at the conditional distributions for each variable, you decide that the pdf for one of the variables can be represented pretty well by a Gamma distribution, with parameters b and c.
Let Y be the variable whose pdf can be represented by the Gamma distribution, and call the other variable X. Then f(Y) = Gamma[Y,b,c], where Gamma[Y,b,c] denotes the Gamma probability density as a function of Y, with parameters b and c. By changing b and c, you are able to obtain the different shapes that f(Y) assumes over the range of values of X. Thus, you can fit a different Gamma distribution for Y, AT EVERY VALUE OF X. This will give you a set of b and c parameter values for each X. If you plot the different b and c values as functions of X, you can get some idea of what the functional form of the dependence might be. For the sake of simplicity, let's say that it turns out to be linear for both b and c. Then... Gamma parameter b = P0 + P1*X Gamma parameter c = Q0 + Q1*X You can now do regressions to determine the coefficients. Of course, the functional form will probably NOT be linear. And the functional form may also not be the same for both parameters. With the parameters expressed as a function of X, you can write... f(X,Y) = Gamma[Y,b(X),c(X)]. And this is, in fact, the joint distribution you are looking for! WARNING! You will need a LOT of data. You first need to determine a conditional distribution for Y, at every value of X, which is one set of regressions (but, hopefully, you have software that will do the distribution fits automatically for you). Then you have to do another regression for each distribution parameter. And you will probably need fairly good fits to do a reasonable job of reproducing the overall joint pdf. The difficult part of this will probably be trying to find a single distributional form (Weibull, or Gamma, or whatever) that can represent all of the conditional pdf shapes for one of the variables. Of course, if you can't, then you could define several intervals for one of the variables, and apply a different distributional form for each interval. But things can get very messy very quickly! 
This is probably not the only way to approach the problem, but I hope this helps. -- T. Arthur Wheeler MathCraft Consulting Columbus, OH 43017
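The binned version of the recipe above can be sketched quickly with method-of-moments Gamma fits (for a Gamma with shape b and scale c, b = mean^2/variance and c = variance/mean). Everything here is hypothetical: the simulated data uses shape b(X) = 1 + 0.5*X and scale c = 2, and a real analysis would use proper maximum-likelihood fits rather than moments:

```python
import random
from statistics import mean, variance

random.seed(1)

# hypothetical data: Y | X is Gamma with shape b(X) = 1 + 0.5*X, scale c = 2
xs = [random.uniform(0, 4) for _ in range(5000)]
ys = [random.gammavariate(1 + 0.5 * x, 2.0) for x in xs]

# bin on X and fit a Gamma to each conditional slice by method of moments
for lo in range(4):
    ybin = [y for x, y in zip(xs, ys) if lo <= x < lo + 1]
    m, v = mean(ybin), variance(ybin)
    b_hat, c_hat = m * m / v, v / m
    print(f"X in [{lo},{lo+1}): shape ~ {b_hat:.2f}, scale ~ {c_hat:.2f}")

# plotting b_hat against the bin midpoints would reveal the (here linear)
# dependence b(X) = 1 + 0.5*X, which is what you would then regress on X
```

This is exactly the "fit a distribution at every value of X, then regress the parameters on X" idea, in its crudest binned form.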
Re: Measure of Association Question.
[EMAIL PROTECTED] (Petrus Nel) wrote in message news:000201c18fe2$f73aeee0$ed9e22c4@oemcomputer... I require some advice regarding the following: One set of variables is the grades obtained by students for different high school subjects (i.e. the symbols candidates obtained such as A, B, C, D, etc. for each subject). The other set of variables are the scores obtained for a college level subject (i.e. no symbols, just their percentages ... The grades obtained for their high school subjects were coded on the questionnaire as follows - 1=A, 2=B, 3=C, 4=D, 5=E, 6=F. ... How do I proceed? Simpler answer: First, change the coding to 1=F, 2=E, 3=D, 4=C, 5=B, 6=A. In the US, at least, there is no 'E'; if that applies to your system, the correct coding would be 1=F, 2=D, 3=C, 4=B, 5=A. If the latter coding is used, calculate the Spearman rank correlation between the grade in a given high school course and the college score. If the former coding is used, you can use either the Pearson correlation or the Spearman rank correlation; the Pearson correlation would probably be better. More complex answer: The approach above ignores the fact that within each letter grade there is variation--e.g., all students who get a 'B' are not at the same level. Further, there is censoring at the upper and lower ends of the scale--e.g., no matter how well a person does, the highest grade they can get is an 'A'. The polyserial correlation can account for this. The polyserial correlation estimates what the correlation of grade and score would be if grades were measured on a continuous scale. An assumption is that there is a bivariate normal distribution between (1) the continuous latent variable of which grade is a manifest representation and (2) the percentage score. The polyserial correlation is related to the polychoric correlation. For information about the polychoric correlation, see: http://ourworld.compuserve.com/homepages/jsuebersax/tetra.htm Drasgow F. Polychoric and polyserial correlations.
In Kotz L, Johnson NL (Eds.), Encyclopedia of statistical sciences. Vol. 7 (pp. 69-74). New York: Wiley, 1988. I don't know if SPSS will calculate the polyserial correlation--the last I heard it did not. If not, the polyserial correlation can be calculated with the program PRELIS, which is distributed with LISREL. Many universities have copies of LISREL/PRELIS. If you are interested in comparing to see which high school classes best predict college scores, then, as a practical matter, I would expect you would draw the same conclusions regardless of whether you used the Pearson, the Spearman, or the polyserial correlation coefficients. Good luck! John Uebersax, PhD (805) 384-7688 Thousand Oaks, California (805) 383-1726 (fax) email: [EMAIL PROTECTED] Agreement Stats: http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm Latent Structure: http://ourworld.compuserve.com/homepages/jsuebersax Existential Psych: http://members.aol.com/spiritualpsych Diet Fitness: http://members.aol.com/WeightControl101
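On the recoding point in the simpler answer: reversing the 1=A..6=F coding only flips the sign of the correlation, so either coding works as long as you interpret the sign correctly. A small pure-Python illustration, with made-up grade and score data (Spearman's rho is just the Pearson correlation of the ranks):

```python
def ranks(xs):
    # average ranks, with ties sharing the mean of their rank positions
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    # Spearman rho = Pearson correlation of the ranks
    return pearson(ranks(x), ranks(y))

grades = [1, 2, 2, 3, 4, 5, 6]         # original coding: 1=A ... 6=F
scores = [88, 80, 75, 70, 66, 60, 50]  # hypothetical college percentages

rho = spearman(grades, scores)
flipped = spearman([7 - g for g in grades], scores)  # recoded: 1=F ... 6=A
assert abs(rho + flipped) < 1e-12      # recoding just flips the sign
```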
Measure of Association Question.
Dear members, I require some advice regarding the following: One set of variables is the grades obtained by students for different high school subjects (i.e. the symbols candidates obtained such as A, B, C, D, etc. for each subject). The other set of variables are the scores obtained for a college level subject (i.e. no symbols, just their percentages obtained). I want to determine the correlation between their grades for different high school subjects (A, B, C, D, etc.) and their percentage scores for a college level subject. The grades obtained for their high school subjects were coded on the questionnaire as follows - 1=A, 2=B, 3=C, 4=D, 5=E, 6=F. I've entered the data for the grades as 1, 2, 3, etc. to indicate the grade (category), and the percentages (as the other variable), into SPSS. How do I proceed? Any comments are welcome. Regards, Petrus Nel
Re: Maximum Likelihood Question
To all, Thanks so much for all your ideas and insights thus far. To those who have suggested a Bayesian approach: I am interested, but I am weeks away from understanding it well enough to figure out if I can use it. Also, I think I am close to developing a usable technique along my current line. The only constraint on my parameters is that they remain positive. Occasionally one will approach zero, but not often. I am reposting because I have another focused question stemming from the same problem. MY SITUATION: I am studying a time-dependent stochastic Markov process. The conventional method involves fitting data to exponential decay equations and using the F-test to determine the number of components required. The problem (as I am sure you all see) is that the F-test assumes the data are iid, and conflicting results are often observed. As a first step, I have been attempting to fit similar (simulated) data directly to Markov models using the Q-matrix and maximum likelihood methods. The likelihood function is: L = (1/Sqrt(|CV-Matrix|))*exp((-1/2)*(O-E).(CV-Matrix^-1).(O-E)) where |CV-Matrix| is the determinant of the covariance matrix, (O) is the vector of observed values in time order, and (E) is the vector of the values predicted by the Markov model for the corresponding times. The covariance matrix is generated by the Markov model. My two objectives are to determine the number of free parameters, and to estimate the values of the parameters. Because the data is simulated, I know what the number of parameters and their values are. MY PROBLEM: I have been using the Log(Likelihood) method to compare the results of fitting to the correct model and to a simpler sub-hypothesis (H0). I am getting very small Log(Likelihood ratio)s when I know the more complex model is correct (i.e. H0 should be rejected). When I first observed this I tried increasing the N values, and found a decrease rather than an increase in the Log(Likelihood ratio).
When I look at the likelihood function, the weighted sum of squares factor ( (O-E).CV^-1.(O-E) ) is very different between the two hypotheses (i.e. favoring rejection of H0), but the difference in the determinant portion ( 1/Sqrt(|CV-Matrix|) ) is in the opposite direction. As a result, the Log(Likelihood ratio) is below that needed to reject H0. I asked about just fitting (O-E).CV^-1.(O-E) and was reminded that without the determinant factor, the likelihood would be maximized by simply increasing the variance. This appears to be true in practice. In learning about the quadratic form, I read in several places that, for the distribution to approach a chi-square distribution, the covariance matrix must be idempotent (CV^2 = CV). I am almost certain this is not the case here. I am hoping to get feedback on this idea: THE QUESTION: Following maximization of the full likelihood function ( (1/Sqrt(|CV-Matrix|))*exp((-1/2)*(O-E).(CV-Matrix^-1).(O-E)) ) for both models, can I use the F-test to compare the weighted sums of squares (i.e. (O-E).CV^-1.(O-E) ) of the two models, rather than the likelihood ratio test? In other words, does correcting each (O-E) for its variance and covariance legitimize the F-test? Any insight is greatly appreciated. Thanks for your patience and consideration. James Celentano
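The "increasing the variance" failure mode is already visible in one dimension. A toy sketch (not the poster's Markov model): with residual sum of squares SS and Sigma = s2*I, the full Gaussian log-likelihood is -n/2*log(s2) - SS/(2*s2), maximized at the interior point s2 = SS/n, while the "simplified" likelihood exp(-SS/(2*s2)) just keeps growing as s2 increases:

```python
from math import log

SS, n = 10.0, 5  # toy residual sum of squares and sample size

def full_loglik(s2):
    # Gaussian log-likelihood with Sigma = s2 * I (additive constants dropped)
    return -n / 2 * log(s2) - SS / (2 * s2)

def simplified_loglik(s2):
    # log of exp(-SS / (2*s2)): the determinant term has been discarded
    return -SS / (2 * s2)

sigmas = [0.5, 1.0, 2.0, 5.0, 50.0, 5000.0]

simp = [simplified_loglik(s) for s in sigmas]
assert all(a < b for a, b in zip(simp, simp[1:]))  # rises forever with s2

best = max(sigmas, key=full_loglik)
assert best == 2.0  # interior maximum at exactly SS/n = 2, not at the largest s2
```

The -1/2*log|CV-Matrix| term is what penalizes variance inflation, which is why it cannot be dropped when the covariance differs between models.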
Re: Maximum Likelihood Question
Herman Rubin [EMAIL PROTECTED] wrote in message news:9vqoln$[EMAIL PROTECTED]... Maximum likelihood is ASYMPTOTICALLY optimal in LARGE samples. It may not be good for small samples; it pays to look at how the actual likelihood function behaves. The fit is always going to improve with more parameters. This may be the trouble in the actual problem being attempted, but there are other possibilities, besides the potential for having programmed things incorrectly. One such trouble might be that the parameters are constrained and that the maximum-likelihood estimates given such constraints are falling on the edge of the allowed region ... then the usual asymptotics don't apply. David Jones
Re: Maximum Likelihood Question
In article [EMAIL PROTECTED], Jimc10 [EMAIL PROTECTED] wrote: To all who have helped me on the previous thread, thank you very much. I am reposting this because the question has become more focused. I am studying a stochastic Markov process and using a maximum likelihood technique to fit observed data to theoretical models. As a first step I am using a Monte Carlo technique to generate simulated data from a known model to see if my fitting method is accurate. In particular I want to know if I can use this technique to determine the number of free parameters in the Markov model. I have been using the Log(Likelihood) method, which seems to be widely accepted. I am getting very small Log(Likelihood ratios) in cases when I know the more complex model is correct (i.e. H0 should be rejected). When I first observed this I tried increasing the N values, and found a decrease rather than an increase in the Log(Likelihood ratio). I now think I know why. I am posting in hopes of finding out if my proposed solution is 1) statistical heresy, 2) so obvious that I should have realized it 6 months ago, or 3) a plausible idea in need of validation. The likelihood function I have been using up to now, which I will call the FULL likelihood function, is: L = (1/Sqrt(|CV-Matrix|))*exp((-1/2)*(O-E).(CV-Matrix^-1).(O-E)) where |CV-Matrix| is the determinant of the covariance matrix, (O) is the vector of observed values in time order, and (E) is the vector of the values predicted by the Markov model for the corresponding times. The covariance matrix is generated by the Markov model. IN A NUTSHELL: It appears that the factor (1/Sqrt(|CV-Matrix|)) is the source of the problem. In many MLE descriptions this is a constant and drops out. In my case there is a big difference between the (1/Sqrt(|CV-Matrix|)) for different models (several log units). I believe this may be biasing the fit in some way.
MY PROPOSAL: I have begun fitting my data to the following simplified likelihood formula: L = exp((-1/2)*(O-E).(CV-Matrix^-1).(O-E)). Does this seem reasonable? It is highly unlikely that it would give asymptotically optimal estimators, although there are cases where this does happen. It can happen that it will be consistent and have positive efficiency, for example if the parameter effect on E is such that L would be O(n) for any wrong parameter, and O(1) for the true parameter, all this in probability, and the covariance matrix does not blow up in too bad a manner. If the major problem is with the fit of the covariance matrix, it will not be good, and if E does not involve some of the parameters, but the covariance matrix can go to infinity on those, by doing that, L can go to 0, which would maximize it as it is negative. As you say the covariance matrix varies considerably, I would suggest including it. Maximum likelihood is ASYMPTOTICALLY optimal in LARGE samples. It may not be good for small samples; it pays to look at how the actual likelihood function behaves. The fit is always going to improve with more parameters. I believe your best bet would be robust approximate Bayesian analysis. This is hard to describe in a newsgroup posting, and in any case requires some user input. -- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399 [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
Maximum Likelihood Question
To all who have helped me on the previous thread, thank you very much. I am reposting this because the question has become more focused. I am studying a stochastic Markov process and using a maximum likelihood technique to fit observed data to theoretical models. As a first step I am using a Monte Carlo technique to generate simulated data from a known model to see if my fitting method is accurate. In particular I want to know if I can use this technique to determine the number of free parameters in the Markov model. I have been using the Log(Likelihood) method, which seems to be widely accepted. I am getting very small Log(Likelihood ratios) in cases when I know the more complex model is correct (i.e. H0 should be rejected). When I first observed this I tried increasing the N values, and found a decrease rather than an increase in the Log(Likelihood ratio). I now think I know why. I am posting in hopes of finding out if my proposed solution is 1) statistical heresy, 2) so obvious that I should have realized it 6 months ago, or 3) a plausible idea in need of validation. The likelihood function I have been using up to now, which I will call the FULL likelihood function, is: L = (1/Sqrt(|CV-Matrix|))*exp((-1/2)*(O-E).(CV-Matrix^-1).(O-E)) where |CV-Matrix| is the determinant of the covariance matrix, (O) is the vector of observed values in time order, and (E) is the vector of the values predicted by the Markov model for the corresponding times. The covariance matrix is generated by the Markov model. IN A NUTSHELL: It appears that the factor (1/Sqrt(|CV-Matrix|)) is the source of the problem. In many MLE descriptions this is a constant and drops out. In my case there is a big difference between the (1/Sqrt(|CV-Matrix|)) for different models (several log units). I believe this may be biasing the fit in some way. MY PROPOSAL: I have begun fitting my data to the following simplified likelihood formula: L = exp((-1/2)*(O-E).(CV-Matrix^-1).(O-E)). Does this seem reasonable?
Thanks for any insight, James Celentano = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
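For readers following along: the FULL likelihood above is easiest to compare across models on the log scale, where the determinant factor becomes an additive -(1/2) log|CV| term rather than something that can be silently dropped. A minimal NumPy sketch (the function name and toy inputs are mine, not from the post), up to the -(n/2) log(2*pi) constant that really does cancel in ratios:

```python
import numpy as np

def gauss_loglik(obs, pred, cov):
    """Log of the FULL likelihood from the post, up to the additive
    -(n/2)*log(2*pi) constant (which is model-independent and cancels
    in likelihood ratios -- the log-determinant term does not)."""
    r = np.asarray(obs, dtype=float) - np.asarray(pred, dtype=float)
    sign, logdet = np.linalg.slogdet(cov)   # numerically stable log-determinant
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    quad = r @ np.linalg.solve(cov, r)      # (O-E) . CV^-1 . (O-E)
    return -0.5 * (logdet + quad)
```

With this form, a model whose covariance matrix has a larger determinant is penalized by exactly (1/2) log|CV|, which is the behavior the simplified formula throws away.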
Re: Sorry for question, but how is the english word for @
User Nathaniel [EMAIL PROTECTED] wrote in message news:9v3d79$2rj$[EMAIL PROTECTED]... Hi, Sorry for question, but how is the english word for @ Pleas forgive me. N. Thank everyone for valuable information. Nathaniel
Re: Sorry for question, but how is the english word for @
Nathaniel wrote: Hi, Sorry for question, but how is the english word for @ Pleas forgive me. You're forgiven... grin The New Hacker's Dictionary gives: common: at sign; at; strudel. rare (and often facetious): vortex, whorl, whirlpool, cyclone, snail, ape, cat, rose, cabbage. Official ANSI name: commercial at. -Robert Dawson
Re: Sorry for question, but how is the english word for @
Thank everyone for valuable information. Nathaniel User Art Kendall [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... atusually indicate some kind of rate or unit price 10 pounds @ $1 per pound on the net is is used as a separator between the id of an individual and his/her location [EMAIL PROTECTED] id spoken as john dot smith at harvard dot e d u. until the early-80's or so dot was spoken as point as in filname point ext (extension indicating type). Sometimes addresses were given as john.smith at harvard.edu Nathaniel wrote: Hi, Sorry for question, but how is the english word for @ Pleas forgive me. N.
Re: Sorry for question, but how is the english word for @
Nathaniel: The symbol @ belongs to the category of special characters in English. Although it is often rendered as "commercial at" in a technical context, in the vernacular (and on the net) it is most often rendered as simply "at." I can't help but advise that, since English is clearly your second language, you would do very well to utterly ignore the, er, uh, erudite message from Dr. Kendall dated 12/10/2001. It could do damage to your vocabulary. The kindest thing that can be said about said message is that it must have been very hastily written. ('Twas most certainly very carelessly written.) For example, Dr. Kendall's second line reads: on the net is is used as a separator between the id of an individual and his/her location Apart from the fact that the first word should have been capitalized (a very minor matter), the sentence would have been much better written: "On the net it is used as a separator between the screen name and the domain name in an e-mail address." I realize that you might well need definitions for the technical terms "screen name" and "domain name." They can be found in the Webopedia: Online Computer Dictionary for Internet Terms and Technical Support, @ http://www.webopedia.com/ -- a truly excellent online reference work. By the way, I can't help chuckling a bit at Dr. Kendall's use of "id" as an abbreviation for "identification". If you will check with Merriam-Webster OnLine @ http://www.m-w.com/netdict.htm you will find that the correct abbreviation (acronym or initialism) is "ID." Meanwhile, "id" is a psychoanalytical term that has something to do with the psyche. I could go on, but 'nuff said [enough said] for present purposes. Respectfully: Harley Upchurch
Re: Sorry for question, but how is the english word for @
The name given to the symbol @ in international standard character sets is 'commercial at'. See http://www.quinion.com/words/articles/whereat.htm for a history of the symbol. Richard Wright On Mon, 10 Dec 2001 23:34:19 +0100, Nathaniel [EMAIL PROTECTED] wrote: Hi, Sorry for question, but how is the english word for @ Pleas forgive me. N.
Re: Sorry for question, but how is the english word for @
atusually indicate some kind of rate or unit price 10 pounds @ $1 per pound on the net is is used as a separator between the id of an individual and his/her location [EMAIL PROTECTED] id spoken as john dot smith at harvard dot e d u. until the early-80's or so dot was spoken as point as in filname point ext (extension indicating type). Sometimes addresses were given as john.smith at harvard.edu Nathaniel wrote: Hi, Sorry for question, but how is the english word for @ Pleas forgive me. N.
RE: Question about concatenating probability distributions
RE: The Poisson process and lognormal action time. This kind of problem arises a lot in the actuarial literature (a process for the number of claims and a process for the claim size), and the Poisson and the lognormal have been used in this context - it might be worth your while to look there for results. Glen ... This is a very general and important event process. It is also used to describe the general failure-repair process that occurs at any repair shop. The Poisson is a good approximation of the arrival times of equipment to be repaired, and the log-normal is a good approximation of the time it takes to repair it. From an operations standpoint, downtime is approximated by an exponential distribution (occurrence) and a log-normal repair time, which includes diagnosis, replacement and validation. In the Air Force (1982-1995), where the reliability and maintainability of equipment had to be characterized, the means were determined and used in a form called availability. We never got beyond the use of availability and never got into the distribution and confidence interval aspects. As a general approximation, the log-normal distribution approximates human reaction times to events. DAHeiser
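The availability figure mentioned above is conventionally computed from the two means, MTBF (mean time between failures) and MTTR (mean time to repair); a one-line sketch with illustrative numbers of my own, not from the post:

```python
# steady-state availability from mean time between failures (MTBF)
# and mean time to repair (MTTR); the numbers are illustrative only
mtbf, mttr = 200.0, 8.0                      # hours
availability = mtbf / (mtbf + mttr)          # fraction of time the unit is up
```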
Re: Question about concatenating probability distributions
Jacek Gomoluch [EMAIL PROTECTED] wrote in message news:9uqkmv$954$[EMAIL PROTECTED]... In a stochastic process the number of customers arriving at a server (during a time interval) is described by a Poisson distribution: P(n)=exp(-v) * (v^n)/(n!) Each arriving customer has a task to be carried out, whose size (in units) is described by a lognormal distribution: f(u)= exp(-(ln u)^2 / (2*a^2)) / (u*a*SQRT(2*PI)) Question: What is the total number of units (i.e. the size of all tasks) requested during the time interval? I wonder how these distributions can be combined, and if there is a formula for this. If the count variable and the size variable are independent, calculation of the mean and variance of the total is straightforward. This kind of problem arises a lot in the actuarial literature (a process for the number of claims and a process for the claim size), and the Poisson and the lognormal have been used in this context - it might be worth your while to look there for results. Glen
Question about concatenating probability distributions
In a stochastic process the number of customers arriving at a server (during a time interval) is described by a Poisson distribution: P(n)=exp(-v) * (v^n)/(n!) Each arriving customer has a task to be carried out, whose size (in units) is described by a lognormal distribution: f(u)= exp(-(ln u)^2 / (2*a^2)) / (u*a*SQRT(2*PI)) Question: What is the total number of units (i.e. the size of all tasks) requested during the time interval? I wonder how these distributions can be combined, and if there is a formula for this. Thanks for any help! Jacek Gomoluch
Re: Question about concatenating probability distributions
If the Poisson arrival process and the work process are independent, then have a look at Wald's law in (almost) any probability book. For example, the mean amount of work is then simply the product of the means of each RV, in your case: E(amount of work in a fixed time interval) = v*E(U), where U is your lognormal RV. Jacek Gomoluch wrote: In a stochastic process the number of customers arriving at a server (during a time interval) is described by a Poisson distribution: P(n)=exp(-v) * (v^n)/(n!) Each arriving customer has a task to be carried out, whose size (in units) is described by a lognormal distribution: f(u)= exp(-(ln u)^2 / (2*a^2)) / (u*a*SQRT(2*PI)) Question: What is the total number of units (i.e. the size of all tasks) requested during the time interval? I wonder how these distributions can be combined, and if there is a formula for this. Thanks for any help! Jacek Gomoluch -- Peter Rabinovitch
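A quick Monte Carlo check of Wald's law for the Poisson-lognormal case in this thread (the rate v and lognormal shape a are illustrative values I chose; the post's f(u) has log-mean 0): the mean of the total is v*E(U), and, because the count is Poisson, the variance of the total is v*E(U^2).

```python
import numpy as np

rng = np.random.default_rng(0)
v, a = 3.0, 0.5                       # Poisson rate and lognormal shape (illustrative)

# moments of one lognormal task size U with log-mean 0 and log-sd a
mean_U = np.exp(a**2 / 2)
mean_U2 = np.exp(2 * a**2)

mean_S = v * mean_U                   # Wald's law: E(S) = E(N) * E(U)
var_S = v * mean_U2                   # compound-Poisson variance: Var(S) = v * E(U^2)

# simulate the total work S = U_1 + ... + U_N over many intervals
totals = np.array([rng.lognormal(0.0, a, rng.poisson(v)).sum()
                   for _ in range(200_000)])
```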
Re: Stat question
the reality of this is ... sometimes getting notes from other students is helpful ... sometimes it is not ... there is no generalization one can make about this most students who NEED notes are not likely to ask people other than their friends ... and, in doing so, probably know which of their friends they have the best chance of getting good notes from ... (at least READABLE!) ... even lazy students are not likely to ask for notes from people that even THEY know are not going to be able to do them any good but i don't think we can say anything really systematic about this activity other than, sometimes it helps ... sometimes it does not help At 06:24 PM 12/5/01 -0800, Glen wrote: Jon Miller [EMAIL PROTECTED] wrote in message You can ask the top students to look at their notes, but you should be prepared to find that their notes are highly idiosyncratic. Maybe even unusable. Having seen the notes of some top students on a variety of occasions (as a student and as a lecturer), that certainly does happen sometimes. But just about as likely is to find a set of notes that are actually better than the lecturer would prepare themselves. Glen _ dennis roberts, educational psychology, penn state university 208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED] http://roberts.ed.psu.edu/users/droberts/drober~1.htm
Re: Stat question
Stan Brown wrote: Jon Miller [EMAIL PROTECTED] wrote in sci.stat.edu: Stan Brown wrote: I would respectfully suggest that the OP _first_ carefully study the textbook sections that correspond to the missed lectures, get notes from a classmate This part is of doubtful usefulness. Doubtful? It is of doubtful usefulness to get notes from a classmate and study the covered section of the textbook? Huh? Sorry, bad editing on my part. Getting notes from a classmate is of doubtful usefulness. Plenty of anecdotes on request. If Cathy Cheng is in your class, you can just photocopy her notes and use them as a textbook. But most students? Why would you care what someone who is struggling to pass thinks the prof might have said? You can ask the top students to look at their notes, but you should be prepared to find that their notes are highly idiosyncratic. Maybe even unusable. Jon Miller
Re: simple Splus question - plot regression function
Alexander Sirotkin wrote: Hi. After fitting a linear regression model I need to do an extremely simple thing - plot the regression function along with the original data. Splus has a simple way to plot quite a few complex plots and a very complicated way to do this simple one! Is there a simple way to plot the regression function and the data? abline! e.g. reg1 <- lm(y~x) plot(x,y) abline(reg1) I can do naught more than suggest reading Venables & Ripley's Modern Applied Statistics with S-Plus. And seeing as this is going to an Aussie NG, I suspect by doing this I'll warm the cockles of the heart of at least one of the authors. Bob -- Bob O'Hara Metapopulation Research Group Division of Population Biology Department of Ecology and Systematics PO Box 17 (Arkadiankatu 7) FIN-00014 University of Helsinki Finland NOTE: NEW TELEPHONE NUMBER tel: +358 9 191 28779 fax: +358 9 191 28701 email: [EMAIL PROTECTED] To induce catatonia, visit: http://www.helsinki.fi/science/metapop/ It is being said of a certain poet, that though he tortures the English language, he has still never yet succeeded in forcing it to reveal his meaning - Beachcomber
Re: probability question
Hi, This assertion is true. Franck Matt Dobrin wrote: Does P(A*B|C)=P(A|C)*P(B|A*C)? If not, what does it equal? Thanks in advance. -Matt -- Franck Corset, Projet IS2, Inria Rhone-Alpes, ZIRST, 655, avenue de l'Europe, Montbonnot, 38334 Saint Ismier cedex, FRANCE http://www.inrialpes.fr/is2
Re: probability question
It's true. If you want a proof, it follows from the definition of conditional probability, p(a|b) = p(a,b)/p(b): (1) P(A,B|C) = P(A,B,C)/P(C) (2) P(A,B,C) = P(A,C)*P(B|A,C) (3) P(A,C) = P(C)*P(A|C) With (2) and (3) we get (4) P(A,B,C) = P(C)*P(A|C)*P(B|A,C) Taking (1) and (4) we get P(A*B|C) = P(A|C)*P(B|A*C) Hope this helps. Nathaniel User Matt Dobrin [EMAIL PROTECTED] wrote in message news:9uh8ge$5hv$[EMAIL PROTECTED]... Does P(A*B|C)=P(A|C)*P(B|A*C)? If not, what does it equal? Thanks in advance. -Matt
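The identity can also be checked by brute force on a small sample space; a sketch over three fair coin flips (the events A, B, C are arbitrary choices of mine):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product([0, 1], repeat=3))   # three fair flips, each outcome prob 1/8
p = Fraction(1, 8)

A = {w for w in outcomes if w[0] == 1}       # first flip heads
B = {w for w in outcomes if sum(w) >= 2}     # at least two heads
C = {w for w in outcomes if w[2] == 1}       # third flip heads

def P(event):
    return p * len(event)

lhs = P(A & B & C) / P(C)                            # P(A,B|C)
rhs = (P(A & C) / P(C)) * (P(A & B & C) / P(A & C))  # P(A|C) * P(B|A,C)
assert lhs == rhs                                    # exact, by Fraction arithmetic
```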
simple Splus question - plot regression function
Hi. After fitting a linear regression model I need to do an extremely simple thing - plot the regression function along with the original data. Splus has a simple way to plot quite a few complex plots and a very complicated way to do this simple one! Is there a simple way to plot the regression function and the data?
probability question
Does P(A*B|C)=P(A|C)*P(B|A*C)? If not, what does it equal? Thanks in advance. -Matt
Re: Stat question
Elliot Cramer [EMAIL PROTECTED] wrote in sci.stat.edu: Sima [EMAIL PROTECTED] wrote: : I have missed some lectures on statistics due to heavy illness : and now i got an assignment which i cannot solve. We all feel sorry for you Sima, but perhaps you should talk to your instructor about it. He undoubtedly has office hours. While that's the conventional advice, speaking as an instructor I do get tired of students who miss class for whatever reason, don't crack the textbook, and expect me to give them a private lesson that duplicates what was done in class. I don't know what if anything the OP has done about making up the missed material. I would respectfully suggest that the OP _first_ carefully study the textbook sections that correspond to the missed lectures, get notes from a classmate, and _then_ contact the instructor to fill in any remaining gaps or answer any questions. -- Stan Brown, Oak Road Systems, Cortland County, New York, USA http://oakroadsystems.com My reply address is correct as is. The courtesy of providing a correct reply address is more important to me than time spent deleting spam.
Re: Stat question
Stan Brown wrote: I would respectfully suggest that the OP _first_ carefully study the textbook sections that correspond to the missed lectures, get notes from a classmate This part is of doubtful usefulness. , and _then_ contact the instructor to fill in any remaining gaps or answer any questions. Jon Miller
Re: Stat question
At 06:13 PM 12/1/01 -0500, Stan Brown wrote: Jon Miller [EMAIL PROTECTED] wrote in sci.stat.edu: Stan Brown wrote: I would respectfully suggest that the OP _first_ carefully study the textbook sections that correspond to the missed lectures, get notes from a classmate This part is of doubtful usefulness. Doubtful? It is of doubtful usefulness to get notes from a classmate and study the covered section of the textbook? Huh? perhaps doubtful IF the students OP asked to look at were terrible students who took terrible notes ... and/or ... OP when reading the text could not make anything of it ... but, those are two big ifs usually, students won't ask to see the notes of students whom they know are not too swift ... and, also ... usually students who read the book do get something out of it ... maybe not enough the issue here is ... it appeared (though we have no proof of this) that the original poster did little, if anything, on his/her own ... prior to posting a HELP to the list ... stan seemed to be reacting to that assumption and, i don't blame him -- Stan Brown, Oak Road Systems, Cortland County, New York, USA http://oakroadsystems.com/ My theory was a perfectly good one. The facts were misleading. -- /The Lady Vanishes/ (1938)
Re: Stat question
Sima [EMAIL PROTECTED] wrote: : Dear List Members, : I have missed some lectures on statistics due to heavy illness : and now i got an assignment which i cannot solve. We all feel sorry for you Sima, but perhaps you should talk to your instructor about it. He undoubtedly has office hours.
Optimal filtering question
Hi All, Suppose we have a stochastic process with an unknown parameter (the parameter is used in a general sense; it may be the stochastic mean of the process, in which case its current value is also a parameter). We observe the dynamics of this process and update our estimate of this parameter. It may be the case that our estimate of this parameter will always be imprecise, in the sense that the variance of the estimator is greater than zero and does not converge to zero (as in the case of learning about a stochastic mean). However, it seems that if we start from different priors about this parameter, then the estimates x1(t) and x2(t) obtained with priors x1(0) and x2(0) respectively always converge as time t goes to infinity. Is this always true? If yes, is there a theorem stating this? If not, is there a counterexample? Many thanks Alex
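For the simplest version of the question, learning a fixed Gaussian mean from noisy observations with a conjugate normal prior, the posterior means from two very different priors do converge, since the prior's contribution is washed out at rate 1/t. A small sketch (all numbers are illustrative choices of mine, and this is one special case, not a proof of the general claim):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n = 2.0, 20_000                 # true mean and number of observations
y = theta + rng.standard_normal(n)     # unit observation noise

def posterior_mean(prior_mean, prior_var, obs):
    # conjugate normal-normal update with unit noise variance
    post_prec = 1.0 / prior_var + len(obs)
    return (prior_mean / prior_var + obs.sum()) / post_prec

m1 = posterior_mean(-10.0, 1.0, y)     # pessimistic prior
m2 = posterior_mean(+10.0, 1.0, y)     # optimistic prior
# |m1 - m2| = 20 / (1 + n): the two estimates approach each other like 1/n
```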
Stat question
Dear List Members, I have missed some lectures on statistics due to heavy illness and now i got an assignment which i cannot solve. Please help me. Below is assignment text: Question 3 Manufacturers of Xeno fuel additive claim that their product increases fuel efficiency by over 10%. A consumer representative organisation decides to check this claim. (a) Assuming that the consumer organisation is able to obtain 20 identical cars for use in the experiment, draw a diagram outlining an appropriate design for the experiment. What is this type of design called? (b) Assuming that the consumer organisation is able to borrow 10 cars of type A and 10 cars of type B for use in the experiment, draw a diagram outlining an appropriate design for the experiment. What is this type of design called? === Thank you very much for your help, Sincerely, Sima. 25 Nov, 2001
Question on Gaussian distribution
[ This is a repost of the following article: ] [ From: Law Hiu Chung [EMAIL PROTECTED] ] [ Subject: Question on Gaussian distribution ] [ Newsgroups: sci.stat.math ] [ Message-ID: 9ond41$sk6$[EMAIL PROTECTED] ] We define a function f(x) as a Gaussian process if for any n, and for any x1, ... xn, (f(x1), f(x2), ... f(xn)) follows a Gaussian distribution. Can I interpret this definition intuitively as: Given an f(x) in a set X of functions (satisfying some conditions), the projection of f(x) onto a finite set of basis functions { delta(x1), delta(x2), ... delta(xn) } must be Gaussian irrespective of the number of xi's and their values. Then f(x) follows a Gaussian distribution. (The above is meaningless without defining the inner product, but I would like to know if my intuition is correct or not.) Can I generalize the above to: Given an inner product space X (with possibly infinite dimension), I can define a Gaussian distribution (or other appropriate term) on X such that for x \in X, if we project it onto a finite set of orthonormal vectors (phi_1, phi_2, ..., phi_n) and get the projection (a1, a2, ... an), the tuple follows an n-dimensional Gaussian distribution. This should hold for all values of n and all sets of orthonormal vectors. Is this definition legal? I guess X being an inner product space may not be enough. If that is the case, what other conditions are needed? If this looks like a textbook question to you, can you point me to some good introductory books on this topic? I have tried to read some books on Gaussian measures, but they are too technical for me -- an engineering person without a strong background in measure theory. Thank you for your help. -- Martin Law Computer Science Department Hong Kong University of Science and Technology
Simple Median Question
I have a question about averaging medians. My dataset consists of median values for a variable of interest. To find the average, do I average the medians and get a mean median, or do I find the median of the median values?
Re: Simple Median Question
At 12:01 PM 9/24/01 -0500, you wrote: I have a question about averaging medians. My dataset consists of median values for a variable of interest. To find the average, do I average the medians and get a mean median, or do I find the median of the median values? since we don't know how many of these medians you have ... or anything about the shapes of the distributions on which you have (only) median values ... we don't know if it really makes much of or any difference BUT, to be consistent ... if you have collected medians ... ie, Q2 values ... then, it makes most consistent sense (to me anyway) if you need an average of these ... to take the median of these ... by the way ... why would you have the medians of this variable ... and not the means? was there some important reason?
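A small worked example of the difference (the numbers are invented for illustration): with skewed groups, the mean of the medians and the median of the medians can be far apart, and neither is guaranteed to equal the median of the pooled data.

```python
import statistics

groups = [[0, 0, 1, 3, 9], [2, 4, 6], [1, 5, 50, 80, 700]]
meds = [statistics.median(g) for g in groups]          # [1, 4, 50]

mean_of_medians = statistics.mean(meds)                # pulled up by the 50
median_of_medians = statistics.median(meds)            # resistant to it
pooled_median = statistics.median([x for g in groups for x in g])
```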
non stat question: existance of educational programming list?
Besides teaching statistics, I have been teaching programming recently. I know there exists a Visual Basic list but does anyone know of a list similar to this one but for teaching programming? Mark Eakin Associate Professor Information Systems and Management Sciences Department University of Texas at Arlington [EMAIL PROTECTED] or [EMAIL PROTECTED]
Re: question re: problem
@Home wrote: I had the following to solve: 51% of all domestic cars being shipped have power windows. If a lot contains five such cars: a. what is the probability that only one has power windows? b. what is the probability that at least one has power windows? I solved each of these problems in two ways, one using standard probability theory and one using a binomial distribution. I seemingly had no problem w/part b., but in part a. the probability theory did not seem to produce the correct answer. I have listed these below. What is wrong w/the probability equation listed below? Also, is my answer to part b. correct? a. Randomly Draw Five Samples (Cars); Independent Events; Only 1 w/Power Windows P{Only 1 Power} = P(Power) x P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower) = 0.51 x 0.49 x 0.49 x 0.49 x 0.49 What you've got here is the probability that the first car has Power, but the rest do not. You also need the probability that the second, third, fourth or fifth is the one with the Power. Bob
Re: question re: problem
Thanks a lot - it worked. How would you compose a short formula depicting: P{Only 1} = [P(Power) x P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower)] + [P(NotPower) x P(Power) x P(NotPower) x P(NotPower) x P(NotPower)] + [P(NotPower) x P(NotPower) x P(Power) x P(NotPower) x P(NotPower)] + [P(NotPower) x P(NotPower) x P(NotPower) x P(Power) x P(NotPower)] + [P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower) x P(Power)] Anon. [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... @Home wrote: I had the following to solve: 51% of all domestic cars being shipped have power windows. If a lot contains five such cars: a. what is the probability that only one has power windows? b. what is the probability that at least one has power windows? I solved each of these problems in two ways, one using standard probability theory and one using a binomial distribution. I seemingly had no problem w/part b., but in part a. the probability theory did not seem to produce the correct answer. I have listed these below. What is wrong w/the probability equation listed below? Also, is my answer to part b. correct? a. Randomly Draw Five Samples (Cars); Independent Events; Only 1 w/Power Windows P{Only 1 Power} = P(Power) x P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower) = 0.51 x 0.49 x 0.49 x 0.49 x 0.49 What you've got here is the probability that the first car has Power, but the rest do not. You also need the probability that the second, third, fourth or fifth is the one with the Power. 
Re: question re: problem
Your probability distribution is binomial with p = 0.51, q = 0.49. In five trials, the distribution is (p + q)^5 = p^5 + 5 p^4 q + 10 p^3 q^2 + 10 p^2 q^3 + 5 p q^4 + q^5. So the probability of one power and four not is 5 p q^4, and for at least one it is 1 - q^5. Arto Huttunen Anon. [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... @Home wrote: I had the following to solve: 51% of all domestic cars being shipped have power windows. If a lot contains five such cars: a. what is the probability that only one has power windows? b. what is the probability that at least one has power windows? I solved each of these problems in two ways, one using standard probability theory and one using a binomial distribution. I seemingly had no problem w/part b., but in part a. the probability theory did not seem to produce the correct answer. I have listed these below. What is wrong w/the probability equation listed below? Also, is my answer to part b. correct? a. Randomly Draw Five Samples (Cars); Independent Events; Only 1 w/Power Windows P{Only 1 Power} = P(Power) x P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower) = 0.51 x 0.49 x 0.49 x 0.49 x 0.49 What you've got here is the probability that the first car has Power, but the rest do not. You also need the probability that the second, third, fourth or fifth is the one with the Power. 
Re: question re: problem
@Home wrote:
> Thanks a lot - it worked. How would you compose a short formula depicting:
> P{Only 1} = [P(Power) x P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower)]
>           + [P(NotPower) x P(Power) x P(NotPower) x P(NotPower) x P(NotPower)]
>           + [P(NotPower) x P(NotPower) x P(Power) x P(NotPower) x P(NotPower)]
>           + [P(NotPower) x P(NotPower) x P(NotPower) x P(Power) x P(NotPower)]
>           + [P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower) x P(Power)]

Have a look at Arto's reply, and at some simple material on permutations and combinations (it's the combinations part that's relevant). I assume that this is homework, so your course notes should help. Alternatively, an elementary textbook on probability and statistics will derive the binomial distribution for you. But it looks like you've got the basic idea.

Bob
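The long five-term sum above collapses to a short formula because every ordering of one "Power" car among five positions has the same probability, so you only need to count the orderings. A quick Python sketch (not from the original thread) that checks this:

```python
from itertools import permutations
from math import comb

p, q = 0.51, 0.49  # P(power windows), P(no power windows)

# Enumerate the distinct orderings of one p among four q's
# and sum each ordering's probability explicitly, as in the long formula.
orderings = set(permutations([p, q, q, q, q]))
long_sum = sum(a * b * c * d * e for (a, b, c, d, e) in orderings)

# The compact binomial form counts those orderings with C(5, 1) = 5.
compact = comb(5, 1) * p * q**4

assert len(orderings) == 5
assert abs(long_sum - compact) < 1e-12
```

This is exactly why the binomial coefficient appears: it counts how many arrangements share the same probability p^k * q^(n-k).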
Re: question re: problem
(sending to all - @Home is a non-functioning address) - Jay

@Home wrote:
> I had the following to solve: 51% of all domestic cars being shipped have power windows. If a lot contains five such cars:
> a. What is the probability that only one has power windows?
> b. What is the probability that at least one has power windows?
> I solved each of these problems in two ways, one using standard probability theory and one using a binomial distribution. I seemingly had no problem with part b, but in part a the probability theory did not seem to produce the correct answer. I have listed these below. What is wrong with the probability equation listed below? Also, is my answer to part b correct?
> a. Randomly draw five samples (cars); independent events; only 1 with power windows:
> P{Only 1 Power} = P(Power) x P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower)
>                 = 0.51 x 0.49 x 0.49 x 0.49 x 0.49

Don't forget, you listed only 1 way to get 1 PW (power window) and 4 not. There are 5 ways you could get this result, if you don't count the order (which the question doesn't include). C(5,1) = 5!/(4!*1!) = 5. So:

0.51 * 0.49 * 0.49 * 0.49 * 0.49 * 5 = 0.14700

> Also solve using the BINOMDIST function in Excel:
> n = 5, p = 0.51 (success = PW), x = 1, p(x) = 0.14700
> b. At least 1 with power windows:
> P{At Least 1} = 1 - P{0}
> P{0} = P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower)
>      = 0.49 x 0.49 x 0.49 x 0.49 x 0.49 = 0.02825
> 1 - 0.02825 = 0.97175 (at least 1)

In this one, all the outcomes are alike, so there is no combination effect.

> Also solve using the BINOMDIST function in Excel (~97%):
> n = 5, p = 0.49 (success = no power), x = 0, p(x) = 0.02825
> 1 - 0.028247525 = 0.97175 (~97%)

So you got it! Or nearly so.

Cheers,
Jay
--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
North Green Bay Road
Racine, WI 53404-1216 USA
Ph: (262) 634-9100
FAX: (262) 681-1133
email: [EMAIL PROTECTED]
web: http://www.a2q.com
The A2Q Method (tm) -- What do you want to improve today?
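Jay's Excel figures can be reproduced without Excel. A sketch (not from the original thread) of the point-mass calculation that BINOMDIST(x, n, p, FALSE) performs, in plain Python:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p); the same quantity Excel's
    BINOMDIST(x, n, p, FALSE) returns."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Part a: exactly 1 of 5 cars with power windows (p = 0.51)
print(f"{binom_pmf(1, 5, 0.51):.5f}")      # 0.14700

# Part b: at least 1, via the complement of "none with power windows"
print(f"{1 - binom_pmf(0, 5, 0.51):.5f}")  # 0.97175
```

Note that part b can equivalently be computed with p = 0.49 and "success = no power", as in the Excel setup above: 0.49^5 is the same number either way.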
question re: problem
I had the following to solve: 51% of all domestic cars being shipped have power windows. If a lot contains five such cars:
a. What is the probability that only one has power windows?
b. What is the probability that at least one has power windows?

I solved each of these problems in two ways, one using standard probability theory and one using a binomial distribution. I seemingly had no problem with part b, but in part a the probability theory did not seem to produce the correct answer. I have listed these below. What is wrong with the probability equation listed below? Also, is my answer to part b correct?

a. Randomly draw five samples (cars); independent events; only 1 with power windows:
P{Only 1 Power} = P(Power) x P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower)
                = 0.51 x 0.49 x 0.49 x 0.49 x 0.49

Also solve using the BINOMDIST function in Excel:
n = 5, p = 0.51 (success = PW), x = 1, p(x) = 0.14700

b. At least 1 with power windows:
P{At Least 1} = 1 - P{0}
P{0} = P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower) x P(NotPower)
     = 0.49 x 0.49 x 0.49 x 0.49 x 0.49 = 0.02825
1 - 0.02825 = 0.97175 (at least 1)

Also solve using the BINOMDIST function in Excel (~97%):
n = 5, p = 0.49 (success = no power), x = 0, p(x) = 0.02825
1 - 0.028247525 = 0.97175 (~97%)
Minitab question
Hi, Does anyone know how to run banner points in Minitab? I have a survey, and would like to cross-tabulate it based on responses to certain questions on the survey. Thanks, Erik
Re: canonical correlation question
Gardburyb [EMAIL PROTECTED] wrote:
> Hi all, I'm new to the group. I'm doing my dissertation, and I am doing a canonical
> correlation analysis. My question is, what is the best way to compare canonical

The test of parallelism in MANCOVA is an equivalent test.
Re: canonical correlation question
Elliot Cramer wrote:
> Gardburyb [EMAIL PROTECTED] wrote:
> > Hi all, I'm new to the group. I'm doing my dissertation, and I am doing a canonical
> > correlation analysis. My question is, what is the best way to compare canonical
> The test of parallelism in MANCOVA is an equivalent test.

I'd like to ask a follow-up question, then. MANCOVA uses least squares as its objective function to estimate relationships, while canonical correlation uses a different objective function. They don't seem equivalent to me, so my question is: is there some math that I'm not aware of that shows these two are equivalent? If so, could you provide a reference?

--
Paige Miller
Eastman Kodak Company
[EMAIL PROTECTED]
"It's nothing until I call it!" -- Bill Klem, NL Umpire
"When you get the choice to sit it out or dance, I hope you dance" -- Lee Ann Womack