Re: regressive question
Thanks to everyone who answered my question. The various reservations about such a test were spot on, and helpful. My own reservations arose because, I think, it is not at all clear what the null would be in this case. Are you testing mu = beta_0 (so using the null model with fixed mean) or beta_0 = mu (so using the regression model with potentially variable mean)?

Alan

--
Alan McLean ([EMAIL PROTECTED])
Department of Econometrics and Business Statistics
Monash University, Caulfield Campus, Melbourne
Tel: +61 03 9903 2102  Fax: +61 03 9903 2007

=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=
Re: A regressive question
What if all the right-hand side variables have mean close to zero? The intercept will be close to the sample mean even if the model is significant.

On 15 May 2001, Alan McLean wrote:

> Hi to all,
>
> The usual test for a simple linear regression model is to test whether
> the slope coefficient is zero or not. However, if the slope is very
> close to zero, the intercept will be very close to the dependent
> variable mean, which suggests that a test could be based on the
> difference between the estimated intercept and the sample mean.
>
> Does anybody know of a test of this sort?
>
> Regards,
> Alan
Re: A regressive question
If the mean of the predictor X is zero, the intercept is equal to the mean of the dependent variable Y, however steep or shallow the slope may be. And as Jim pointed out, the standard error of a predicted value depends on its distance from the mean of X (being larger the farther away it is from the mean, the confidence band being described by a hyperbola).

It would seem to follow that a test such as Alan asks about would be unusable if the mean of X is too close to 0, and would be (too?) insensitive if the mean of X is too far from 0. An intermediate region, where a test of intercept vs. mean Y might be useful, might perhaps be defined in terms of the coefficient of variation of X (or perhaps its reciprocal, if the mean of X were in danger of actually BEING zero). One rather suspects that any such test would be less powerful than the usual test of the hypothesis that the true slope is zero, which might be an interesting proposition (for someone else!) to pursue.
-- Don.

On Wed, 16 May 2001, Alan McLean wrote:

> The usual test for a simple linear regression model is to test whether
> the slope coefficient is zero or not. However, if the slope is very
> close to zero, the intercept will be very close to the dependent
> variable mean, which suggests that a test could be based on the
> difference between the estimated intercept and the sample mean.
>
> Does anybody know of a test of this sort?

Donald F. Burrill                              [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College,         [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264                    603-535-2597
184 Nashua Road, Bedford, NH 03110             603-472-3742
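Don's point about the hyperbolic confidence band can be checked numerically. The sketch below (plain Python; the data and numbers are invented for illustration, not from the thread) computes the standard error of the fitted mean response at the mean of X and at X = 0, using the usual formula se(x0) = s * sqrt(1/n + (x0 - xbar)^2 / Sxx):

```python
import math, random

random.seed(1)
n = 30
x = [random.uniform(5, 15) for _ in range(n)]           # mean of X far from 0
y = [2.0 + 0.1 * xi + random.gauss(0, 1) for xi in x]   # toy linear relation

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
a = ybar - b * xbar
s2 = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

def se_fit(x0):
    """Standard error of the estimated mean response at x0."""
    return math.sqrt(s2 * (1 / n + (x0 - xbar) ** 2 / sxx))

# The band is narrowest at xbar and widens hyperbolically toward X = 0,
# so a test of "intercept vs. mean of Y" loses sensitivity as xbar moves
# away from 0 -- the behaviour Don describes.
print(se_fit(xbar), se_fit(0.0))
```

With the mean of X around 10, the standard error at X = 0 is several times the standard error at the mean, which is what makes the proposed intercept test so sensitive to where X happens to be centred.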
Re: A regressive question
Hi

On 15 May 2001, Alan McLean wrote:

> The usual test for a simple linear regression model is to test whether
> the slope coefficient is zero or not. However, if the slope is very
> close to zero, the intercept will be very close to the dependent
> variable mean, which suggests that a test could be based on the
> difference between the estimated intercept and the sample mean.

Would this not depend on the scale being used? If the predictor was some scale on which the normal range of values was quite large (e.g., GRE scores?), then the value at 0 might be some distance from the mean of Y even given a very shallow slope. So the test would somehow have to adjust for this; that is, the standard error of the difference from the mean of Y would have to vary as a function of the distance of 0 from the mean of X. And presumably the test should produce results equivalent to the normal test of the slope.

It would be interesting to see if there is such a test. Could it be related to the equations for the confidence interval for predicted Y given X? There are separate formulas for individual and group predictions, and the widths do vary with distance from the mean of X.

Best wishes
Jim

James M. Clark                    (204) 786-9757
Department of Psychology          (204) 774-4134 Fax
University of Winnipeg            4L05D
Winnipeg, Manitoba R3B 2E9        [EMAIL PROTECTED]
CANADA                            http://www.uwinnipeg.ca/~clark
A regressive question
Hi to all,

The usual test for a simple linear regression model is to test whether the slope coefficient is zero or not. However, if the slope is very close to zero, the intercept will be very close to the dependent variable mean, which suggests that a test could be based on the difference between the estimated intercept and the sample mean.

Does anybody know of a test of this sort?

Regards,
Alan

--
Alan McLean ([EMAIL PROTECTED])
Department of Econometrics and Business Statistics
Monash University, Caulfield Campus, Melbourne
Tel: +61 03 9903 2102  Fax: +61 03 9903 2007
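For what it's worth, the estimated intercept in simple OLS is ybar - b*xbar by construction, so a test of "intercept equals the sample mean of Y" is really a test of b*xbar = 0. A quick check in plain Python (toy numbers, not from the thread):

```python
# OLS by hand for y = a + b*x: the normal equations give
#   b = Sxy / Sxx  and  a = ybar - b * xbar,
# so (a - ybar) = -b * xbar identically.  Testing "intercept == mean of y"
# is therefore testing b * xbar == 0, which collapses to the usual slope
# test whenever xbar != 0 (and is vacuous when xbar == 0).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.0, 2.3, 2.2, 2.5]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b = sxy / sxx
a = ybar - b * xbar

assert abs((a - ybar) + b * xbar) < 1e-12   # identity holds exactly
```

This is why (as the later replies observe) the null hypothesis of such a test is hard to pin down: the intercept-vs-mean difference carries no information beyond the slope and the location of X.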
Re: Question
On 11 May 2001 07:34:38 -0700, [EMAIL PROTECTED] (Magill, Brett) wrote:

> Don and Dennis,
>
> Thanks for your comments, I have some points and further questions on the
> issue below.
>
> For both Dennis and Don: I think the option of aggregating the information
> is a viable one.

I would call it "unavoidable" rather than just "viable." The data that you show is basically aggregated already; there's just one item per person.

> Yet, I cannot help but think there is some way to do this
> taking into account the fact that there is variation within organizations.
> I mean, if I have an organizational salary mean of .70 (70%) with a very tiny
[ snip, rest ]

I agree, you can use the information concerning within-variation. I think it is totally proper to insist on using it, in order to validate the conclusions, to whatever degree is possible. You might be able to turn that 'validation' around to incorporate it into the initial test; but I think the role as "validation" is easier to see by itself, first.

Here's a simple example where the 'variance' is Poisson.

(Ex.) A town experiences some crime at a rate that declines steadily, from 20,000 incidents to 19,900 incidents, over a 5-year period. The linear trend fitted to the several points is "highly significant" by a regression test. Do you believe it?

(Answer) What I would believe is: No, there is no trend, but it is probably true that someone is fudging the numbers. The *observed variation* in the yearly totals is far too small to have arisen by chance. And the most obvious sources of error would work in the opposite direction. [That is, if there were only a few criminals responsible for many crimes each, and the number-of-criminals is what was subject to Poisson variation, THEN the number-of-crimes should be even more variable.]

In your present case, I think you can estimate on the basis of your factory (aggregate) data, and then you figure what you can about how consistent those numbers are with the un-aggregated data, in terms of means or variances.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
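Rich's crime-rate example can be made concrete with a Poisson index-of-dispersion check. The five yearly counts below are invented to match his description (a perfectly linear fall from 20,000 to 19,900); everything else is the standard dispersion statistic:

```python
# Five yearly crime counts declining perfectly linearly (hypothetical).
counts = [20000, 19975, 19950, 19925, 19900]

n = len(counts)
mean = sum(counts) / n
# Index-of-dispersion statistic: sum (x - xbar)^2 / xbar, approximately
# chi-square with n-1 df if the counts really are independent Poisson.
stat = sum((c - mean) ** 2 for c in counts) / mean

# For Poisson counts near 20,000 the year-to-year sd should be roughly
# sqrt(20000) ~ 141, yet these totals wobble by at most 50.  The statistic
# lands near 0.313, far below the 5% *lower* critical value of
# chi-square(4) (~0.711): the series is implausibly smooth.  Too little
# variation is itself evidence against the data -- Rich's point exactly.
print(round(stat, 3))  # → 0.313
```

The same dispersion check, applied within organizations, is one concrete way to carry out the "validation" role Rich describes for the un-aggregated data.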
RE: Question
Don and Dennis,

Thanks for your comments. I have some points and further questions on the issue below.

For both Dennis and Don: I think the option of aggregating the information is a viable one. Yet, I cannot help but think there is some way to do this taking into account the fact that there is variation within organizations. I mean, if I have an organizational salary mean of .70 (70%) with a very tiny s.d., it is different than a mean of .70 with a large s.d. There should be some way to account for this. In addition, the problems with aggregation are well documented, and I believe in general they suggest that aggregated results overestimate relationships.

Don: I suggested that the problem was not a traditional multilevel problem. Perhaps I am wrong, but here is where I thought the difference was. Typically, say in a classroom problem, I want to assess the effect of classroom characteristics (student/teacher ratio, teacher experience, etc.), which are constant within classrooms, on, say, student performance, which varies within classroom across individuals. The difference between this and the problem I presented is that the OUTCOME is a contextual variable. That is, rather than individual-level variation, the outcome varies only at the organizational level. Perhaps this can be modeled with MLMs, but it is certainly different than the typical problem.

With regard to independence, I am talking about the independence of the X2's. That is, X2-1 is not independent of X2-2, and X2-4 is not independent of X2-5. This is because these cases come from the same organization. So, if we simply regressed Y ~ X2, not accounting for X1 in the model, this causes problems for ANOVA and regression, the GLM family more generally. The lack of independence here is exactly the reason for repeated measures and MLM more generally, no?
Perhaps I am making too much of the issue, but the data structure is one that I have not encountered before, and I found it something of an interesting and challenging problem; I am just hoping I might learn something along the way. Would appreciate any comments on my comments above.

Oh, and just so there is no confusion: the data below I constructed. It reflects the structure of the data and the nature of the relationship, but I generated this data set. In addition, the real thing does include variables such as tenure, previous experience, etc. that are also used as covariates at the individual level. Of course, this also means that these would need to be aggregated as well if that approach is taken.

Best

> ID  X1  X2    Y
> 1   1   0.70  0.40
> 2   1   0.80  0.40
> 3   1   0.65  0.40
> 4   2   1.20  0.25
> 5   2   1.10  0.25
> 6   3   0.90  0.30
> 7   4   0.50  0.50
> 8   4   0.60  0.50
> 9   4   0.70  0.50
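One way to keep some of the within-organization information after aggregating is to weight each organization's mean by the number of employees observed in it. The sketch below (plain Python, weighted least squares done by hand on the constructed data from the post) shows the idea; the choice of weights-equal-to-n is my assumption for illustration, not something proposed in the thread:

```python
# Constructed data from the post: (org, pct_of_market_salary, turnover).
rows = [(1, 0.70, 0.40), (1, 0.80, 0.40), (1, 0.65, 0.40),
        (2, 1.20, 0.25), (2, 1.10, 0.25),
        (3, 0.90, 0.30),
        (4, 0.50, 0.50), (4, 0.60, 0.50), (4, 0.70, 0.50)]

# Aggregate to one record per organization: mean salary, turnover, size.
orgs = {}
for org, x2, y in rows:
    orgs.setdefault(org, []).append((x2, y))

xs, ys, ws = [], [], []
for org, vals in sorted(orgs.items()):
    xs.append(sum(v[0] for v in vals) / len(vals))  # org mean of X2
    ys.append(vals[0][1])       # turnover is constant within org
    ws.append(len(vals))        # weight = number of employees sampled

# Weighted least squares slope and intercept.
W = sum(ws)
xw = sum(w * x for w, x in zip(ws, xs)) / W
yw = sum(w * y for w, y in zip(ws, ys)) / W
b = sum(w * (x - xw) * (y - yw) for w, x, y in zip(ws, xs, ys)) \
    / sum(w * (x - xw) ** 2 for w, x in zip(ws, xs))
a = yw - b * xw

# Organizations measured more often count for more; the slope is
# negative, as in Dennis's unweighted aggregate analysis.
print(b)
```

A natural refinement, in the spirit of Brett's point about the s.d., would be weights inversely proportional to each organization's within-org variance of X2 rather than raw counts.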
Re: Question
On Thu, 10 May 2001, Magill, Brett wrote, inter alia:

> How should these data be analyzed? The difficulty is that the data
> are cross level. Not the traditional multi-level model however.

Hi, Brett. I don't understand this statement. Looks to me like an obvious place to apply multilevel (aka "hierarchical") modelling. (Have you read Harvey Goldstein's text on the method?) You have persons within organizations (just as, in educational applications of ML models, one has pupils within schools for a two-level model, and pupils within schools within districts for a three-level model), and apparently want to carry out some estimation or other analysis while taking into account the (possible) covariances between levels.

If you want a simpler method than ML modelling, the method Dennis proposed at least lets you see some aggregate effects. (This does, however, put me in mind of a paper of (I think) Brian Joiner's whose temporary working title was "To aggregate is to aggravate" -- though it was published under another title.) ;-)

Along the lines of Dennis' suggestion, you could plot Y vs X2 (or X2 vs Y) directly, which would give you the visual effect Dennis showed while at the same time showing the scatter in the X2 dimension around the organization average. For larger data sets with more organizations in them (so that perhaps several organizations would have the same (or at any rate indistinguishable, at the resolution of the plotting device used) turnover rate), you could generate a letter-plot (MINITAB command: LPLOT), using the organization ID in X1 as a labelling variable.

Brett's original post presented this data structure:

> A colleague has a data set with a structure like the one below:
>
> ID  X1  X2    Y
> 1   1   0.70  0.40
> 2   1   0.80  0.40
> 3   1   0.65  0.40
> 4   2   1.20  0.25
> 5   2   1.10  0.25
> 6   3   0.90  0.30
> 7   4   0.50  0.50
> 8   4   0.60  0.50
> 9   4   0.70  0.50
>
> Where X1 is the organization. X2 is the percent of market salary an
> employee within the organization is paid -- i.e. ID 1 makes 70% of the
> market salary for their position and the local economy. And Y is the
> annual overall turnover rate in the organization, so it is constant
> across individuals within the organization. There are different
> numbers of employee salaries measured within each organization. The
> goal is to assess the relationship between employee salary (as percent
> of market salary for their position and location) and overall
> organizational turnover rates.
>
> How should these data be analyzed? The difficulty is that the data are
> cross level. Not the traditional multi-level model however. That
> there is no variance across individuals within an organization on the
> outcome is problematic. Of course, so is aggregating the individual
> results. How can this be modeled both preserving the fact that there is
> variance within organizations and between organizations?

As I understand it (as implied above), this is exactly the kind of structure for which multilevel methods were invented.

> I suggested that this was a repeated measures problem, with repeated
> measurements within the organization, my colleague argued it was not.

This strikes me as a possible approach (repeated measures can be treated as a special case of multilevel modelling). But most software that I know of that would handle repeated-measures ANOVA would tend to insist that there be equal numbers of levels of the repeated-measures factor throughout the design, and this appears not to be the case (your sample data, at any rate, have different numbers of individuals in the several organizations).

> Can this be modeled appropriately with traditional regression models at
> the individual level? That is, ignoring X1 and regressing Y ~ X2.

That was, after a fashion, what Dennis illustrated. In a formal regression analysis, I should think it unnecessary to ignore X1; although it would doubtless be necessary to recode it into a series of indicator-variable dichotomies, or something equivalent.

> It seems to me that this violates the assumption of independence.

Not altogether clear. By "this" do you mean regression analysis? Or, perhaps, the particular analysis you suggested, ignoring X1? Or...? And what "assumption of independence" are you referring to? (At any rate, what such assumption would not be violated in other formal analyses, e.g. repeated-measures ANOVA?)

> Certainly, the percent of market salary that an employee is paid is
> correlated between employees within an organization (taking into
> account things like tenure, previous experience, etc.).

Well, would the desired model take such things into account? (If not, why not? If so, where is the problem that I rather vaguely sense lurking between the lines here?)
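The indicator-variable idea is worth checking before committing to it, because it runs into a degeneracy here: Y is constant within each organization, so organization indicators alone reproduce Y exactly, leaving X2 nothing to explain at the individual level. A plain-Python illustration on Brett's constructed data (this reading of the degeneracy is my own, not a claim from the thread):

```python
from collections import defaultdict

# Constructed data from the thread: (org, pct_of_market_salary, turnover).
rows = [(1, 0.70, 0.40), (1, 0.80, 0.40), (1, 0.65, 0.40),
        (2, 1.20, 0.25), (2, 1.10, 0.25),
        (3, 0.90, 0.30),
        (4, 0.50, 0.50), (4, 0.60, 0.50), (4, 0.70, 0.50)]

# "Fit" the org-indicator model by hand: with one dummy per organization,
# the least-squares prediction for every case is its org mean of Y.
ys = defaultdict(list)
for org, _, y in rows:
    ys[org].append(y)
fitted = {org: sum(v) / len(v) for org, v in ys.items()}

# Y never varies within an organization, so the indicators fit perfectly:
# every residual is (numerically) zero, and adding X2 to the model cannot
# improve the fit at all.
residuals = [y - fitted[org] for org, _, y in rows]
print(max(abs(r) for r in residuals))
```

This is exactly why the outcome being "contextual" matters: once X1 is coded in, the individual-level regression has no residual variance left for X2, which pushes the analysis back toward either aggregation or a multilevel formulation.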
Re: Question
this is not unlike having scores for students in a class ... one score for each student and ... the age of the teacher of THOSE students ... for a class ... scores will vary but, age for the teacher remains the same ... but the age might be different in ANother class with a different teacher ... in a sense, the age is like a mean just like your turnover rate ... and you want to know the relationship between student scores and teachers' ages

something has to give

i think you have to reduce the data points on X2 ... find the mean within organization 1 ... on X2 ... then have .4 next to it ... second data pair would be mean on X2 for organization 2 .. with .25 ... etc. so, in this case ... you have 4 values on X2 and 4 values on Y ... so, what is the relationship between those?? look at the following:

Row   C7     C8
1     0.72   0.40
2     1.15   0.25
3     0.90   0.30
4     0.60   0.50

MTB > plot c8 c7

[character-mode scatterplot of C8 (turnover) against C7 (mean % of market salary) omitted; the four points fall on a clear downward trend]

Correlations: C7, C8
Pearson correlation of C7 and C8 = -0.957
P-Value = 0.043

there might be a better way to do it but ... looks like a pretty clear case of the greater the % of market the organization pays ... the less is their turnover rate

At 06:05 PM 5/10/01 -0400, Magill, Brett wrote:

> A colleague has a data set with a structure like the one below:
>
> ID  X1  X2    Y
> 1   1   0.70  0.40
> 2   1   0.80  0.40
> 3   1   0.65  0.40
> 4   2   1.20  0.25
> 5   2   1.10  0.25
> 6   3   0.90  0.30
> 7   4   0.50  0.50
> 8   4   0.60  0.50
> 9   4   0.70  0.50
>
> Where X1 is the organization. X2 is the percent of market salary an
> employee within the organization is paid -- i.e. ID 1 makes 70% of the
> market salary for their position and the local economy. And Y is the
> annual overall turnover rate in the organization, so it is constant
> across individuals within the organization. There are different numbers
> of employee salaries measured within each organization. The goal is to
> assess the relationship between employee salary (as percent of market
> salary for their position and location) and overall organizational
> turnover rates.
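dennis's aggregate correlation is easy to reproduce. The sketch below (plain Python, Pearson r computed by hand) aggregates the raw rows to organization means and lands next to the MINITAB value of -0.957; the tiny difference comes from MINITAB's run using means rounded to two decimals:

```python
import math
from collections import defaultdict

# Raw constructed data: (org, pct_of_market_salary, turnover).
rows = [(1, 0.70, 0.40), (1, 0.80, 0.40), (1, 0.65, 0.40),
        (2, 1.20, 0.25), (2, 1.10, 0.25),
        (3, 0.90, 0.30),
        (4, 0.50, 0.50), (4, 0.60, 0.50), (4, 0.70, 0.50)]

byorg = defaultdict(list)
for org, x2, y in rows:
    byorg[org].append((x2, y))

xs = [sum(v[0] for v in vals) / len(vals) for vals in byorg.values()]
ys = [vals[0][1] for vals in byorg.values()]   # turnover, one per org

def pearson(a, b):
    """Pearson correlation computed from first principles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) *
                    sum((y - mb) ** 2 for y in b))
    return num / den

print(round(pearson(xs, ys), 2))  # → -0.96
```

With only 4 aggregated points, though, the p-value of 0.043 rests on very little data, which is part of why the thread keeps circling back to using the within-organization information as well.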
Question
A colleague has a data set with a structure like the one below:

ID  X1  X2    Y
1   1   0.70  0.40
2   1   0.80  0.40
3   1   0.65  0.40
4   2   1.20  0.25
5   2   1.10  0.25
6   3   0.90  0.30
7   4   0.50  0.50
8   4   0.60  0.50
9   4   0.70  0.50

Where X1 is the organization. X2 is the percent of market salary an employee within the organization is paid -- i.e. ID 1 makes 70% of the market salary for their position and the local economy. And Y is the annual overall turnover rate in the organization, so it is constant across individuals within the organization. There are different numbers of employee salaries measured within each organization. The goal is to assess the relationship between employee salary (as percent of market salary for their position and location) and overall organizational turnover rates.

How should these data be analyzed? The difficulty is that the data are cross level. Not the traditional multi-level model however. That there is no variance across individuals within an organization on the outcome is problematic. Of course, so is aggregating the individual results. How can this be modeled both preserving the fact that there is variance within organizations and between organizations?

I suggested that this was a repeated measures problem, with repeated measurements within the organization; my colleague argued it was not. Can this be modeled appropriately with traditional regression models at the individual level? That is, ignoring X1 and regressing Y ~ X2. It seems to me that this violates the assumption of independence. Certainly, the percent of market salary that an employee is paid is correlated between employees within an organization (taking into account things like tenure, previous experience, etc.).

Thanks
Re: Question: Assumptions for Statistical Clustering (ie. Euclidean distance based)
On Sun, 22 Apr 2001 16:23:46 GMT, Robert Ehrlich <[EMAIL PROTECTED]> wrote:

> Clustering has a lot of associated problems. The first is that of cluster
> validity--most algorithms define the existence of as many clusters as the user
> demands. A very important problem is homogeneity of variance. So a Z
> transformation is not a bad idea whether or not the variables are normal.

Unless you want the 0-1 variable to count as 10% as potent as the variable scored 0-10. The classical default analysis does let you WEIGHT the variables, by using arbitrary scaling. (Years ago, it was typical, shoddy documentation of the standard default, that they didn't warn the tyro. Has it improved? Has the default changed?)

> Quasi-normality is about all you have to assume--the absence of intersample
> polymodality and the approximation of the mean and the mode. However, to my
> knowledge, there is no satisfying "theory" associated with cluster analysis--only
> rules of thumb.

[ snip, original question ]

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
Re: Question: Assumptions for Statistical Clustering (ie. Euclidean
Robert Ehrlich wrote:

> to my knowledge, there is no satisfying "theory"
> associated with cluster analysis--only rules of thumb.

The underlying theory is classification theory; see Jardine & Sibson, Sokal & Sneath, or The Classification Society Bulletin.
Re: Question: Assumptions for Statistical Clustering (ie. Euclidean
Clustering has a lot of associated problems. The first is that of cluster validity--most algorithms define the existence of as many clusters as the user demands. A very important problem is homogeneity of variance. So a Z transformation is not a bad idea whether or not the variables are normal. Quasi-normality is about all you have to assume--the absence of intersample polymodality and the approximation of the mean and the mode. However, to my knowledge, there is no satisfying "theory" associated with cluster analysis--only rules of thumb.

Beng Hai Chea wrote:

> Here is a statistical issue that I have been pondering for a few days now,
> and I am hoping someone can shed some light or even help set me straight.
>
> Would like to know if we need to assume multivariate normality for the data
> whenever we use Euclidean distance based clustering?
>
> Or is it good to have but not necessary?
>
> The argument I used was that since we need to standardize the raw data for
> this type of clustering, we need to assume normality or at least try to
> make sure that the data is normally distributed.
>
> Would like to hear the opinions from this mailing list.
>
> Thanks in advance!
> Beng Hai
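The weighting issue Rich raises is easy to see with Euclidean distances directly. In the sketch below (plain Python, invented two-feature data), one variable ranges roughly 0-1 and the other roughly 0-1000; before Z-scoring, the large-scale variable completely dominates the distance, which is the implicit (and usually unwanted) weighting of the "standard default":

```python
import math, statistics

# Two features on wildly different scales (hypothetical data).
small = [0.1, 0.9, 0.2, 0.8, 0.3]             # range ~0-1
large = [120.0, 130.0, 980.0, 990.0, 500.0]   # range ~0-1000

def zscore(v):
    """Standardize to mean 0, sd 1 (the Z transformation)."""
    m, s = statistics.mean(v), statistics.stdev(v)
    return [(x - m) / s for x in v]

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

raw = list(zip(small, large))
std = list(zip(zscore(small), zscore(large)))

# Point 0 vs point 1: similar on 'large', opposite ends of 'small'.
# Point 0 vs point 2: similar on 'small', opposite ends of 'large'.
d_raw_01, d_raw_02 = dist(raw[0], raw[1]), dist(raw[0], raw[2])
d_std_01, d_std_02 = dist(std[0], std[1]), dist(std[0], std[2])

# Raw: the 0-1000 variable swamps everything, so points 0 and 1 look
# close no matter what 'small' says.  After Z-scoring, both variables
# get a comparable vote in the distance.
assert d_raw_01 < d_raw_02
print(d_std_01, d_std_02)
```

Note that none of this involves normality: the Z transformation equalizes variances regardless of the distributions, which is why it is "not a bad idea whether or not the variables are normal."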
Re: Regression toward the Mean - search question
> A few weeks ago, I believe on this list, a quick discussion of Galton's
> regression to the mean popped up. I downloaded some of Galton's data,
> generated my own, and found some ways to express the effect in ways my
> non-statistician education friends might understand. Still working on
> that part.
>
> In addition, there was a reference to a wonderful article, which I read,
> and which explained the whole thing in excellent terms and clarity for
> me. The author is clearly an expert on the subject of detecting change
> in things. He (I think) even listed people who had fallen into the
> regression toward the mean fallacy, including himself.
>
> Problem: Now of course I really want that article again, and its
> reference. I cannot find it on my hard drive. Maybe I didn't download
> it - it was large. But I can't find the reference to it, either. Bummer!
>
> Can anyone figure out who and what article I'm referring to, and
> re-point me to it?
>
> Very much obliged to you all,
> Jay

Trochim's page has a nice description of the problem but with few historical references:
http://trochim.human.cornell.edu/kb/regrmean.htm

Campbell, D. T. and D. A. Kenny. 1999. A primer on regression artifacts. Guilford Press. [This book is devoted almost entirely to regression to the mean and what to do about it.]

Stigler, S. M. 1999. Statistics on the table. Harvard University Press. [Stigler has several essays on the discovery of RTM under the heading "Galtonian Ideas". He also presents a sobering case study of poor Horace Secrist, whose 1933 magnum opus in econometrics is a classic RTM artifact.]

Eugene Gallagher
ECOS UMASS/Boston
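For readers who, like Jay, want a demonstration that non-statisticians can follow: regression to the mean falls out of a few lines of simulation. Draw correlated test/retest scores, select the top group on the first test, and watch their retest mean fall back, with no real change in anyone's underlying level. All numbers below are invented:

```python
import random

random.seed(42)
rho = 0.7          # assumed test-retest correlation
n = 10_000

# Bivariate-normal test/retest pairs with correlation rho.
pairs = []
for _ in range(n):
    t = random.gauss(0, 1)                          # shared "true score"
    e1, e2 = random.gauss(0, 1), random.gauss(0, 1)
    x1 = rho ** 0.5 * t + (1 - rho) ** 0.5 * e1     # first test
    x2 = rho ** 0.5 * t + (1 - rho) ** 0.5 * e2     # retest
    pairs.append((x1, x2))

# Select the top decile on the first test and compare group means.
pairs.sort(key=lambda p: p[0], reverse=True)
top = pairs[: n // 10]
m1 = sum(p[0] for p in top) / len(top)
m2 = sum(p[1] for p in top) / len(top)

# For bivariate normals E[x2 | x1] = rho * x1, so the selected group's
# retest mean shrinks toward 0 by a factor of about rho -- regression
# toward the mean, purely from selection plus imperfect correlation.
print(round(m1, 2), round(m2, 2))
```

This is exactly the trap in the Secrist case study: firms selected for being extreme look more "mediocre" later even when nothing has changed.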
Regression toward the Mean - search question
Dear Everyone,

I feel singularly stupid. My filing system has collapsed, if it ever was structured.

A few weeks ago, I believe on this list, a quick discussion of Galton's regression to the mean popped up. I downloaded some of Galton's data, generated my own, and found some ways to express the effect in ways my non-statistician education friends might understand. Still working on that part.

In addition, there was a reference to a wonderful article, which I read, and which explained the whole thing in excellent terms and clarity for me. The author is clearly an expert on the subject of detecting change in things. He (I think) even listed people who had fallen into the regression toward the mean fallacy, including himself.

Problem: Now of course I really want that article again, and its reference. I cannot find it on my hard drive. Maybe I didn't download it - it was large. But I can't find the reference to it, either. Bummer!

Can anyone figure out who and what article I'm referring to, and re-point me to it?

Very much obliged to you all,
Jay

--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
North Green Bay Road
Racine, WI 53404-1216
USA

Ph: (262) 634-9100
FAX: (262) 681-1133
email: [EMAIL PROTECTED]
web: http://www.a2q.com

The A2Q Method (tm) -- What do you want to improve today?
Re: Fw: statistics question
In article <003101c0bea9$31b26820$[EMAIL PROTECTED]>, <[EMAIL PROTECTED]> wrote:

> Hi,
> The below question was on my Doctorate Comprehensives in
> Education at the University of North Florida.
> Would one of you learned scholars pop me back with possible appropriate answers.
> Carmen Cummings
>
> An educational researcher was interested in developing a predictive scheme
> to forecast success in an elementary statistics course at a local university.
> He developed an instrument with a range of scores from 0 to 50. He
> administered this to 50 incoming freshmen signed up for the elementary
> statistics course, before the class started. At the end of the semester he
> obtained each of the 50 students' final averages.
>
> Describe an appropriate design to collect data to test the hypothesis.

What design? The data are already collected, assuming that the data match the scores on the prediction instrument and the final result of the student. What hypothesis? The hypotheses and the assumptions come from the user of statistics alone; the learned scholars, as statisticians, should only try to extract these from the user, and to point out which assumptions are important and which are of little importance. For example, normality is usually of secondary importance, and is usually quite false, while the assumptions about the structure are of major importance.

--
This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
Re: Paired t test Question
Andy,

With a sample size of 4 (n = 4), you need to get hold of the StatXact software developed by Cyrus Mehta.

My one cent-
Luke

"Andrew L." <[EMAIL PROTECTED]> wrote in message news:oEYy6.4479$[EMAIL PROTECTED]...

> I am analysing some data and want to administer a paired t test. Although I
> can perform the test, I am not totally familiar with the t-test. Can anyone
> tell me whether the test relies on having a large number of samples, or
> whether I can still obtain an accurate answer from n=4 (n = number of
> participants).
>
> Also, does anyone know what the F stands for - I think it means F-test.
> What is this test designed to show?
>
> I will be grateful for any help
>
> Thanks
>
> Andy
Re: Fw: statistics question
I reformatted this. Quoting a letter from Carmen Cummings to himself, on 6 Apr 2001 08:48:38 -0700, [EMAIL PROTECTED] wrote:

> The below question was on my Doctorate Comprehensives in
> Education at the University of North Florida.
>
> Would one of you learned scholars pop me back with
> possible appropriate answers.

==== the question

An educational researcher was interested in developing a predictive scheme to forecast success in an elementary statistics course at a local university. He developed an instrument with a range of scores from 0 to 50. He administered this to 50 incoming freshmen signed up for the elementary statistics course, before the class started. At the end of the semester he obtained each of the 50 students' final averages. Describe an appropriate design to collect data to test the hypothesis.

==== end of cite

I hope the time of the Comprehensives is past. Anyway, this might be better suited for facetious answers than serious ones.

The "appropriate design" in the strong sense: Consult with a statistician IN ORDER TO "develop an instrument". Who decided only a single dimension should be of interest? (How else does one interpret a score with a "range" from 0 to 50?) Consult with a statistician BEFORE administering something to -- selected? unselected? -- freshmen; and consult (perhaps) in order to develop particular hypotheses worth testing.

I mean, the kids scoring over 700 on Math SATs will ace the course, and the kids under 400 will have trouble. Generalizing, of course. If "final average" (as suggested) is the criterion, instead of "learning." But you don't need a new study to tell you those results.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
Fw: statistics question
Hi,

The below question was on my Doctorate Comprehensives in Education at the University of North Florida. Would one of you learned scholars pop me back with possible appropriate answers.

Carmen Cummings

- Original Message -
From: "Carmen Cummings" <[EMAIL PROTECTED]>
To: "David Cummings" <[EMAIL PROTECTED]>
Sent: Thursday, April 05, 2001 4:38 PM
Subject: statistics question

An educational researcher was interested in developing a predictive scheme to forecast success in an elementary statistics course at a local university. He developed an instrument with a range of scores from 0 to 50. He administered this to 50 incoming freshmen signed up for the elementary statistics course, before the class started. At the end of the semester he obtained each of the 50 students' final averages.

Describe an appropriate design to collect data to test the hypothesis.
Re: Paired t test Question
"Andrew L." wrote:
>
> I am analysing some data and want to administer a paired t test. Although I
> can perform the test, I am not totally familiar with the t-test. Can anyone
> tell me whether the test relies on having a large number of samples, or
> whether I can still get an accurate answer from n=4 (n = number of
> participants).
>
> Also, does anyone know what the F stands for - I think it means F-test.
> What is this test designed to show?

I think you should definitely get a basic introductory book on statistics and brush up on your statistical knowledge. As to your specific questions: the accuracy of your results doesn't really depend on the sample size, but the precision does. Your comparison of the means (You do want to compare means, don't you? You didn't actually say that...) will not be very precise with just 4 samples. F may stand for an F-test, and it may stand for a lot of other things; I don't normally associate an F-test with a paired t-test.

So I would advise, based upon your questions: don't just mechanically crank a paired t-test through whatever software you have. Sit down with someone who knows statistics and explain your entire problem to him or her, and find out whether a paired t-test is the right thing to do, and how a sample size of 4 affects your comparison of the means.

-- 
Paige Miller
Eastman Kodak Company
[EMAIL PROTECTED]

"It's nothing until I call it!" -- Bill Klem, NL Umpire
"Those black-eyed peas tasted all right to me" -- Dixie Chicks
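[To see the precision point concretely, here is a minimal Python sketch of the paired t statistic on invented before/after scores for n = 4 participants; all numbers are hypothetical.]

```python
from statistics import mean, stdev

# Hypothetical before/after scores for n = 4 participants
before = [12.0, 15.0, 11.0, 14.0]
after = [14.0, 16.0, 13.0, 15.0]

diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
df = n - 1                                   # only 3 degrees of freedom
t = mean(diffs) / (stdev(diffs) / n ** 0.5)  # paired t statistic
```

With df = 3 the two-tailed 5% critical value is about 3.18, so the mean difference must be large relative to its standard error before so small a sample can detect anything.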
Paired t test Question
I am analysing some data and want to administer a paired t test. Although I can perform the test, I am not totally familiar with the t-test. Can anyone tell me whether the test relies on having a large number of samples, or whether I can still get an accurate answer from n=4 (n = number of participants).

Also, does anyone know what the F stands for - I think it means F-test. What is this test designed to show?

I will be grateful for any help.

Thanks
Andy
Re: Easy question
Thanks for your comment. The message was accidentally sent from my wife's news account. I didn't take the measurements simultaneously, but that is not my major concern. My concern is: I did a regression of mean(WT) against mean(AT). Is this good enough? Can I get more out of the data?

I've been trying to get QVF (the quasilikelihood estimation model from stat.tamu.edu) and some multivariate delta SAS macro to work. They seem too complicated for such a simple situation. Is there a simpler way?

Thanks again for your help.

Cheers,
Wenjing Dai ([EMAIL PROTECTED])
Department of Computer Science, University of Illinois

"Donald Burrill" <[EMAIL PROTECTED]> wrote in message [EMAIL PROTECTED]">news:[EMAIL PROTECTED]...
> On Fri, 9 Mar 2001, Wei Xiao wrote:
>
> > Suppose I went to 10 lakes. I want to measure the relation between
> > water temperature (WT) and air temperature (AT). So I can do a
> > regression with these 10 points. However, to be sure, I took 3 AT's
> > and 3 WT's at each lake. Now any particular AT is not correlated
> > with WT.
>
> How can that be? Did you not take each AT and WT at the same time and
> in the same place? (Not necessarily at the same time, or in the same
> place, as the other pairs of (WT,AT); in fact, preferably the
> measurements should have been made at different (time, place) if what
> you were trying to do was to get a measure of the variability in WT
> and AT at each lake.)
>
> If you claim they're not correlated because all six values were taken
> more or less simultaneously at the same place, and they were not taken
> in (WT,AT) pairs, then the three WT values are not independent
> observations, nor are the three AT values, but within each of THESE
> triplets the values are correlated in an unknown, and possibly
> unknowable, way. Then all you can do is take the easy way out:
> take the average of the three WT values as the WT for that lake,
> and similarly for the three AT values.
>
> > Instead, they kind of have error in both the X and Y axis.
>
> This remark is not helpful. If you only had one value of (WT,AT) at
> each lake, those values would surely have measurement error in both
> measurements.
>
> > Can somebody show me a better way to analyze this?
> > I prefer talking in SAS or SAS macro.
>
> Sorry, not one of my languages.
>
> > Here is a hypothetical data sheet.
> >
> > Lake,  WT, AT
> > Lake1, 10, 15
> > Lake1, 11, 14
> > Lake1, 12, 13
> > ...
> >
> > Notice there is no relation between the WT and AT readings.
> > I can record this way too:
> >
> > Lake,  WT, AT
> > Lake1, 10, 13
> > Lake1, 11, 14
> > Lake1, 12, 15
> > ...
>
> It is not at all clear why you can legitimately shuffle these values
> around with respect to each other: unless either (a) all 6 values are
> recorded simultaneously in the same place; or (b) you took all 6
> values at 6 different times and places, so that there really is no
> empirical connection between any particular AT and any particular WT.
> Either case would seem to me to represent faulty experimental
> procedure... to put it politely.
>  -- DFB.
>  --
> Donald F. Burrill                                [EMAIL PROTECTED]
> 348 Hyde Hall, Plymouth State College,           [EMAIL PROTECTED]
> MSC #29, Plymouth, NH 03264                      (603) 535-2597
> 184 Nashua Road, Bedford, NH 03110               (603) 471-7128
Re: Easy question
On Fri, 9 Mar 2001, Wei Xiao wrote:

> Suppose I went to 10 lakes. I want to measure the relation between
> water temperature (WT) and air temperature (AT). So I can do a
> regression with these 10 points like this:
>
>  AT |       *
>     |     *
>     |   *
>     |__*______ WT
>
> However, to be sure, I took 3 AT's and 3 WT's at each lake. Now any
> particular AT is not correlated with WT.

How can that be? Did you not take each AT and WT at the same time and in the same place? (Not necessarily at the same time, or in the same place, as the other pairs of (WT,AT); in fact, preferably the measurements should have been made at different (time, place) if what you were trying to do was to get a measure of the variability in WT and AT at each lake.)

If you claim they're not correlated because all six values were taken more or less simultaneously at the same place, and they were not taken in (WT,AT) pairs, then the three WT values are not independent observations, nor are the three AT values, but within each of THESE triplets the values are correlated in an unknown, and possibly unknowable, way. Then all you can do is take the easy way out: take the average of the three WT values as the WT for that lake, and similarly for the three AT values.

> Instead, they kind of have error in both the X and Y axis.

This remark is not helpful. If you only had one value of (WT,AT) at each lake, those values would surely have measurement error in both measurements.

> Can somebody show me a better way to analyze this?
> I prefer talking in SAS or SAS macro.

Sorry, not one of my languages.

> Here is a hypothetical data sheet.
>
> Lake,  WT, AT
> Lake1, 10, 15
> Lake1, 11, 14
> Lake1, 12, 13
> ...
>
> Notice there is no relation between the WT and AT readings.
> I can record this way too:
>
> Lake,  WT, AT
> Lake1, 10, 13
> Lake1, 11, 14
> Lake1, 12, 15
> ...

It is not at all clear why you can legitimately shuffle these values around with respect to each other: unless either (a) all 6 values are recorded simultaneously in the same place; or (b) you took all 6 values at 6 different times and places, so that there really is no empirical connection between any particular AT and any particular WT. Either case would seem to me to represent faulty experimental procedure... to put it politely.
 -- DFB.
 --
Donald F. Burrill                                 [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College,            [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264                       (603) 535-2597
184 Nashua Road, Bedford, NH 03110                (603) 471-7128
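[The "easy way out" described above -- average each lake's triplicates, then regress one point per lake -- takes only a few lines. A minimal Python sketch; the readings are hypothetical.]

```python
from statistics import mean

# Hypothetical triplicate (WT, AT) readings at each of three lakes
readings = {
    "Lake1": ([10, 11, 12], [15, 14, 13]),
    "Lake2": ([14, 15, 16], [18, 17, 19]),
    "Lake3": ([20, 21, 22], [24, 25, 23]),
}

# Collapse to one (mean WT, mean AT) point per lake
x = [mean(wt) for wt, at in readings.values()]
y = [mean(at) for wt, at in readings.values()]

# Ordinary least-squares fit of mean AT on mean WT
xbar, ybar = mean(x), mean(y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sxy / sxx
intercept = ybar - slope * xbar
```

This treats the lake means as the observations, which is exactly the averaging approach recommended when the within-lake triplets are not paired.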
Easy question
Hi folks,

I have this problem at hand: Suppose I went to 10 lakes. I want to measure the relation between water temperature (WT) and air temperature (AT). So I can do a regression with these 10 points like this:

 AT |       *
    |     *
    |   *
    |__*______ WT

However, to be sure, I took 3 AT's and 3 WT's at each lake. Now any particular AT is not correlated with WT. Instead, they kind of have error in both the X and Y axis. Can somebody show me a better way to analyze this? I prefer talking in SAS or SAS macro.

Here is a hypothetical data sheet.

Lake,  WT, AT
Lake1, 10, 15
Lake1, 11, 14
Lake1, 12, 13
...

Notice there is no relation between the WT and AT readings. I can record this way too:

Lake,  WT, AT
Lake1, 10, 13
Lake1, 11, 14
Lake1, 12, 15
...

Thanks in advance.

Best regards,
W
Re: Trend analysis question: follow-up
On 5 Mar 2001 16:41:22 -0800, [EMAIL PROTECTED] (Donald Burrill) wrote:

> On Mon, 5 Mar 2001, Philip Cozzolino wrote in part:
>
> > Yeah, I don't know why I didn't think to compute my eta-squared on the
> > significant trends. As I said, trend analysis is new to me (psych grad
> > student) and I just got startled by the results.
> >
> > The "significant" 4th and 5th order trends only account for 1% of the
> > variance each, so I guess that should tell me something. The linear
> > trend accounts for 44% and the quadratic accounts for 35% more, so 79%
> > of the original 82% omnibus F (this is all practice data).
> >
> > I guess, if I am now interpreting this correctly, the quadratic trend
> > is the best solution.
>
> Well, now, THAT depends in part on what the spectrum of candidate
> solutions is, doesn't it? For all that what you have is "practice
> data", I cannot resist asking: Are the linear & quadratic components
> both positive, and is the overall relationship monotonically
> increasing? Then, would the context have an interesting
> interpretation if the relationship were exponential? Does plotting
[ snip, rest ]

"Interesting interpretation" is important. In this example, the interest (probably) lies mainly with the variance explained by the linear and quadratic trends. It's hard for me to be highly interested in an order-5 polynomial, and sometimes even a quadratic seems unnecessarily awkward. What you want is the convenient, natural explanation.

If the "baseline" is far different from what follows, that will induce a bunch of high-order terms if you insist on modeling all the periods in one repeated measures ANOVA. A sensible interpretation in that case might be to describe the "shock effect" and separately describe what happened later.

Example: the start of psychotropic medications has a huge, immediate, "normalizing" effect on some aspects of the sleep of depressed patients (sleep latency, REM latency, REM time, etc.). Various changes *after* the initial jolt can be described as no change, continued improvement, or return toward the initial baseline. In real life, linear trends worked fine for describing the on-meds followup observation nights (with - not accidentally - increasing intervals between them).

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
Re: Trend analysis question: follow-up
On Mon, 5 Mar 2001, Philip Cozzolino wrote in part:

> Yeah, I don't know why I didn't think to compute my eta-squared on the
> significant trends. As I said, trend analysis is new to me (psych grad
> student) and I just got startled by the results.
>
> The "significant" 4th and 5th order trends only account for 1% of the
> variance each, so I guess that should tell me something. The linear
> trend accounts for 44% and the quadratic accounts for 35% more, so 79%
> of the original 82% omnibus F (this is all practice data).
>
> I guess, if I am now interpreting this correctly, the quadratic trend
> is the best solution.

Well, now, THAT depends in part on what the spectrum of candidate solutions is, doesn't it? For all that what you have is "practice data", I cannot resist asking: Are the linear & quadratic components both positive, and is the overall relationship monotonically increasing? Then, would the context have an interesting interpretation if the relationship were exponential? Does plotting log(Y) against X look approximately linear? If so, especially if your six values of X are points in time, Y can be described as exhibiting exponential growth over the period observed, and there is a constant doubling time (if Y is increasing) or half-life (if Y is decreasing).

The formal equation for exponential growth in Y (with X = time) is Y = a*exp(b*X), and the doubling time is log(2)/b (using the natural logarithm); if b is negative, Y is exhibiting exponential decay and this quantity is its half-life.

In the intermediate course (ANOVA and MLR), I used to use some old data on the mass of chick embryos to illustrate a period of exponential growth: 11 time points, 1 day apart, and a very nice exponential fit. A polynomial fit required a quartic equation.
 -- Don.
 --
Donald F. Burrill                                 [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College,            [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264                       (603) 535-2597
Department of Mathematics, Boston University      [EMAIL PROTECTED]
111 Cummington Street, room 261, Boston, MA 02215 (617) 353-5288
184 Nashua Road, Bedford, NH 03110                (603) 471-7128
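[The log(Y)-against-X check described above can be sketched in a few lines of Python. The series below is manufactured to be exactly exponential (a = 2, b = 0.35 are arbitrary choices), so the fit recovers b and the doubling time log(2)/b.]

```python
import math
from statistics import mean

# Manufactured exponential series: Y = 2 * exp(0.35 * X), X = 0..5
xs = list(range(6))
ys = [2 * math.exp(0.35 * x) for x in xs]

# Least-squares slope of log(Y) on X estimates b in Y = a*exp(b*X)
logs = [math.log(y) for y in ys]
xbar, lbar = mean(xs), mean(logs)
b = sum((x - xbar) * (v - lbar) for x, v in zip(xs, logs)) \
    / sum((x - xbar) ** 2 for x in xs)
doubling_time = math.log(2) / b  # constant doubling time for growth
```

With real data the log(Y)-vs-X plot would only be approximately linear, and the residuals from this fit are the natural diagnostic.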
Re: basic stats question
In article <52jo6.114$[EMAIL PROTECTED]>, Milo Schield <[EMAIL PROTECTED]> wrote:

> But what does this (in)dependence really mean?
> Can it change on conditioning?
> This seems related to Simpson's paradox.
> In any event, it seems that independence can be conditional.
> Is this so? If so, where is this discussed in more detail?

Why does it have to be discussed in more detail? Conditional probability is probability.
-- 
This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED] Phone: (765) 494-6054 FAX: (765) 494-0558
Re: basic stats question
In article <[EMAIL PROTECTED]>, Richard A. Beldin <[EMAIL PROTECTED]> wrote:

> You missed the point, Herman. I don't assert that these are independent
> random variables. I claim that introducing students to the concept of
> independent sample spaces from which we construct a cartesian product
> sample space will make it easier for them to understand independent
> events and random variables when we define them later.

I believe that this will not do what is expected, and might even make it worse. When we introduce sample spaces, we do not, and should not, introduce the probabilities at that time. If we did, we could not have inference; also, I believe that we need to get across the idea that there is no "right" sample space for a problem, but merely adequate representations; the point in a sample space can represent the result of the experiment under consideration, but we might have more. Otherwise, how can we consider the number of successes to be a real-valued random variable?

Sample spaces can be Cartesian products without the coordinates being independent; whenever we have a bivariate classification, we have a Cartesian product, whether or not there is independence. We do not want students to consider race and lactose intolerance to be independent. Presenting oversimplified special cases seems to make it harder for people to understand.

I deliberately postpone all considerations of symmetry or equally likely, as the students (and also those using probability and statistics) have a major tendency to impose this when it is very definitely not the case. The "principle of insufficient reason" contributed to the demise of Bayesian statistics in the 19th century, and I see it going strong now.
-- 
This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED] Phone: (765) 494-6054 FAX: (765) 494-0558
Re: Trend analysis question
"Philip Cozzolino" <[EMAIL PROTECTED]> wrote in message [EMAIL PROTECTED]">news:[EMAIL PROTECTED]...

> However, after the cubic non-significant finding, the 4th and 5th order
> trends are significant.
>
> Intuitively, it seems that if there is no cubic trend of significance,
> there will not be any higher order trend, but this is relatively new to
> me.

Hi Philip. In a trend analysis, each test is orthogonal to (independent of) the other tests, so the results reported are quite reasonable. Admittedly, in my experience at least, it's a little unusual to have 4 out of the 5 trends significant, but such a finding does not indicate any problem with the analysis. Are there equal intervals between the six levels of your factor?

Robert
Re: Trend analysis question - Thanks
Thanks Donald and Karl for your responses...

Yeah, I don't know why I didn't think to compute my eta-squared on the significant trends. As I said, trend analysis is new to me (psych grad student) and I just got startled by the results.

The "significant" 4th and 5th order trends only account for 1% of the variance each, so I guess that should tell me something. The linear trend accounts for 44% and the quadratic accounts for 35% more, so 79% of the original 82% omnibus F (this is all practice data).

I guess, if I am now interpreting this correctly, the quadratic trend is the best solution.

Thanks again for your help,
-Philip
---
"If we knew what we were doing, it wouldn't be called research, would it?" -Albert Einstein

in article [EMAIL PROTECTED], Philip Cozzolino at [EMAIL PROTECTED] wrote on 3/3/01 7:23 PM:

> Hi,
>
> I have a question on how to interpret a specific trend analysis summary
> table. The IV has 6 levels, so I had SPSS run the analysis checking up
> to the 5th order trend.
>
> There is a significant linear and quadratic trend, but not cubic.
>
> However, after the cubic non-significant finding, the 4th and 5th order
> trends are significant.
>
> Intuitively, it seems that if there is no cubic trend of significance,
> there will not be any higher order trend, but this is relatively new to
> me.
>
> Any help is greatly appreciated.
> -Philip
Re: Trend analysis question
Philip has been unfortunate enough to get significance on his 4th and 5th order trends, and is hoping that nonsignificance of the 3rd order trend means the higher order trends are spurious. Sorry, no. Consider a perfect quadratic relationship -- there will be absolutely no linear component.

I wonder if one should even test for trends of an order that one could not interpret. They will always be present in some magnitude, and, given sufficient sample size, will be "significant." It might help to compute eta-squared (divide the trend SS by the total SS) and then use that statistic to decide whether you can dismiss the "significant trend" as trivial in magnitude -- I have generally been able to do so when encountering such higher order trends that defy interpretation but meet our criterion of statistical significance.

++ Karl L. Wuensch, Department of Psychology, East Carolina University, Greenville NC 27858-4353
Voice: 252-328-4102 Fax: 252-328-6283
[EMAIL PROTECTED]
http://core.ecu.edu/psyc/wuenschk/klw.htm
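[The eta-squared computation Karl describes is just each trend's SS divided by the total SS. A minimal Python sketch, with hypothetical sums of squares chosen to mimic the percentages Philip reported (44% linear, 35% quadratic, about 1% for the higher orders):]

```python
# Hypothetical sums of squares from a trend-analysis summary table
ss = {"linear": 440.0, "quadratic": 350.0, "cubic": 8.0,
      "quartic": 10.0, "quintic": 10.0, "error": 182.0}

ss_total = sum(ss.values())
eta_sq = {name: value / ss_total          # trend SS / total SS
          for name, value in ss.items() if name != "error"}
```

A "significant" quartic explaining 1% of the variance can then be dismissed as trivial in magnitude even though it passes the significance test.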
Re: Trend analysis question
On Sun, 4 Mar 2001, Philip Cozzolino wrote in part:

> However, after the cubic non-significant finding, the 4th and 5th
> order trends are significant.
>
> Intuitively, it seems that if there is no cubic trend of significance,
> there will not be any higher order trend, but this is relatively new
> to me.

Your intuition is, in this case, incorrect. The five trends are mutually independent in the sense that any combination of them may be operating. (I am for the moment accepting the implied premise that a power function of the IV is a reasonable function to try to fit to your data. In most instances I know of, this is not "really" the case, and the power function is more usefully thought of as an approximation to whatever the "real" functionality is.) This may be seen by considering the following relationships between Y and X (think of them as DV and IV if you wish):

I. [sketch: a symmetric parabolic pattern of Y against X -- Y varies only through the quadratic component]

II. [sketch: an S-shaped cubic pattern of Y against X -- linear and quadratic components near zero]

In I. above, the linear trend is approximately zero, and the quadratic component of X accounts for nearly all the variation in Y. A "rule" that claimed "If the linear trend is insignificant there can be no significant quadratic trend" is clearly false in this case. In II. above, both the linear and quadratic components of trend are virtually zero -- certainly insignificant -- and the cubic component accounts for nearly all the variation in Y. Similar situations can be imagined, where only the quartic, or only the quintic, or only the linear, quadratic, and quartic, or any other arbitrary combination of the basic trends are significant, and other components are not.

If you are carrying out your trend analysis by using orthogonal polynomials (as you probably should be), try constructing the model derived from your linear + quadratic fit only, and plot those as predicted values against X; then construct the model derived from linear + quadratic + quartic + quintic, and plot those predicted values against X. You may find it illuminating also to plot the residuals in each case against X, especially if you force the same vertical scale on the two sets of residuals.

I note in passing that you haven't stated how much of the variance of Y is accounted for by each of the significant components, nor how much residual variance there is after each component is entered. That also might be illuminating.
 -- DFB.
 --
Donald F. Burrill                                 [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College,            [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264                       (603) 535-2597
Department of Mathematics, Boston University      [EMAIL PROTECTED]
111 Cummington Street, room 261, Boston, MA 02215 (617) 353-5288
184 Nashua Road, Bedford, NH 03110                (603) 471-7128
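[The point of pattern I above -- a pure quadratic contributes nothing to the linear trend -- can be verified numerically with the standard orthogonal polynomial contrast coefficients for six equally spaced levels. The cell means here are hypothetical, chosen to lie exactly on a symmetric parabola.]

```python
# Standard orthogonal polynomial contrast coefficients, k = 6 levels
linear_c = [-5, -3, -1, 1, 3, 5]
quadratic_c = [5, -1, -4, -4, -1, 5]

# Hypothetical cell means on a symmetric (pure quadratic) curve
means = [(x - 2.5) ** 2 for x in range(6)]  # 6.25, 2.25, 0.25, 0.25, 2.25, 6.25

lin = sum(c * m for c, m in zip(linear_c, means))      # exactly zero
quad = sum(c * m for c, m in zip(quadratic_c, means))  # clearly nonzero
```

The linear contrast is zero even though the quadratic contrast is large, so nonsignificance at one order says nothing about the orders above it.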
Re: basic stats question
But what does this (in)dependence really mean? Can it change on conditioning?

Suppose that we take into account a plausible confounder: defective equipment. Suppose blacks are more likely to have defective equipment (a broken light, etc.). Suppose we find that the percentage who are black among those stopped for defective equipment is the same as the percentage who are black among those having defective equipment. Now we have independence at one level and non-independence at another.

This seems related to Simpson's paradox. In any event, it seems that independence can be conditional. Is this so? If so, where is this discussed in more detail?

"Lise DeShea" <[EMAIL PROTECTED]> wrote in message [EMAIL PROTECTED]">news:[EMAIL PROTECTED]...

> Re probability/independence, I've found that the most effective way to
> communicate this concept to my students (College of Education, not
> heavily math-oriented) is the following:
>
> Then you can move to an example of racial profiling. Out of all the
> people in your city who drive, what proportion are African-American?
> [p(African-American).] Now, GIVEN that you look only at drivers who
> are pulled over, what proportion of these people are African-American?
> [p(African-American|pulled over).] If being black and being pulled
> over are independent events, then the probabilities should be equal.
>
> You can illustrate this graphically by drawing a large box to
> represent all the drivers, then mark the proportion representing
> African-American drivers. Then draw a smaller box representing the
> people being pulled over, with a proportion of the box marked to
> represent the African-American drivers who are pulled over. If the
> proportions of each box are equal, then the events are independent.
>
> So now, I would welcome comments from the more
> mathematically/statistically rigorous list members among us!
>
> ~~~
> Lise DeShea, Ph.D.
> Assistant Professor
> Educational and Counseling Psychology Department
> University of Kentucky
> 245 Dickey Hall
> Lexington KY 40506
> Email: [EMAIL PROTECTED]
> Phone: (859) 257-9884
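[Lise's box picture amounts to comparing p(African-American) with p(African-American | pulled over). A minimal Python sketch with invented counts, constructed so that the two proportions happen to agree:]

```python
# Invented counts for illustration only
total_drivers = 10000
black_drivers = 2000
pulled_over = 500
black_pulled_over = 100   # 20% of stops, matching 20% of drivers

p_black = black_drivers / total_drivers               # p(A)
p_black_given_stop = black_pulled_over / pulled_over  # p(A | B)

# Independence of "black" and "pulled over" means the two agree
independent = abs(p_black - p_black_given_stop) < 1e-12
```

With real data the two proportions would be sample estimates, so one would test their difference rather than compare it to exact zero.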
Trend analysis question
Hi,

I have a question on how to interpret a specific trend analysis summary table. The IV has 6 levels, so I had SPSS run the analysis checking up to the 5th order trend.

There is a significant linear and quadratic trend, but not cubic. However, after the cubic non-significant finding, the 4th and 5th order trends are significant.

Intuitively, it seems that if there is no cubic trend of significance, there will not be any higher order trend, but this is relatively new to me.

Any help is greatly appreciated.
-Philip
-- 
"Leave the gun. Take the cannolis." --Peter Clemenza, "The Godfather"
Re: basic stats question
You missed the point, Herman. I don't assert that these are independent random variables. I claim that introducing students to the concept of independent sample spaces from which we construct a cartesian product sample space will make it easier for them to understand independent events and random variables when we define them later.
-- 
Richard A. Beldin, Professional Statistician (retired)
BELDIN Consulting Services
Re: basic stats question
In article <[EMAIL PROTECTED]>, Richard A. Beldin <[EMAIL PROTECTED]> wrote:

> The suits and ranks of cards in a bridge deck certainly can be presented
> as independent sample spaces which we use as components of a cartesian
> product. Whether one does so or not is a matter of choice. I am on
> record as favoring the presentation as the cartesian product. Even the
> sample mean and variance can be seen this way; in fact, every vector
> valued random variable can be cast in the form of a random vector from a
> cartesian product.

This is the case for ONE card. Now suppose that one takes a sample without replacement; it still is the case that the suit of one card and the rank of another are independent, but it is not the case that the number of cards of a given suit and the number of cards of a given rank are independent.

> My point is that if we introduce independence as an attribute of sample
> spaces which we proceed to study as one, we can better motivate the idea
> of independent random variables and independent events.

How about this one, I believe due to Mandel? Take a sample from a trivariate independent normal distribution. Then each pair of correlations is independent, but the three correlations cannot be.

Or this one, which leads to an easy derivation of the Wishart distribution, and generation of Wishart matrices? Let the sum of squares and cross products from a sample of size n from a p-dimensional normal distribution with mean 0 and covariance matrix I be written as AA', with A 0 above the main diagonal. Then if n >= p (the changes for n < p are minor), the elements of A are all independent: the square of the i-th diagonal element is chi-square with n-i+1 degrees of freedom, and the below-diagonal elements are standard normal.
Re: basic stats question
The suits and ranks of cards in a bridge deck certainly can be presented as independent sample spaces which we use as components of a cartesian product. Whether one does so or not is a matter of choice. I am on record as favoring the presentation as the cartesian product. Even the sample mean and variance can be seen this way; in fact, every vector valued random variable can be cast in the form of a random vector from a cartesian product.

My point is that if we introduce independence as an attribute of sample spaces which we proceed to study as one, we can better motivate the idea of independent random variables and independent events.
-- 
Richard A. Beldin, Professional Statistician (retired)
BELDIN Consulting Services
Re: basic stats question
> I think that introducing the word "independent" as a descriptor of
> sample spaces and then carrying it on to the events in the product
> space is much less likely to generate the confusion due to the common
> informal descriptions "Independent events don't have anything to do
> with each other" and "Mutually exclusive events can't happen together."

I like Dick's idea a lot. To me, part of the problem is that textbooks fail to distinguish independence as a mathematical construct from independence as a modeling construct. Too many intro books put their expository effort into the mathematical definition, and then get obfuscatorily circular when it comes to the examples. Mathematicians *assume* independence, statisticians look at the data, and textbooks fail to recognize the difference. Dick's approach gives a nice way, in an elementary setting, to help students recognize situations where an assumption of independence is likely to stand up to empirical scrutiny.

I agree, too, Dick, that this should help with mutually exclusive vs. independent.

George Cobb

George W. Cobb
Mount Holyoke College
South Hadley, MA 01075
413-538-2401
Re: basic stats question
In article <[EMAIL PROTECTED]>, Richard A. Beldin <[EMAIL PROTECTED]> wrote: >I have long thought that the usual textbook discussion of independence >is misleading. In the first place, the most common situation where we >encounter independent random variables is with a cartesian product of >two independent sample spaces. Example: I toss a die and a coin. I have >reasonable assumptions about the distributions of events in either case >and I wish to discuss joint events. I have tried in vain to find natural >examples of independent random variables in a sample space not >constructed as a cartesian product. >I think that introducing the word "independent" as a descriptor of >sample spaces and then carrying it on to the events in the product space >is much less likely to generate the confusion due to the common informal >description "Independent events don't have anything to do with each >other" and "Mutually exclusive events can't happen together." >Comments? The usual definition of "independence" is a computational convenience, but an atrocious definition. A far better way to do it, which conveys the essence, is to use conditional probability. Random variables, or more generally partitions, are independent if, given any information about some of them, the conditional probability of any event formed from the others is the same as the unconditional probability. This is the way it is used. As for a "natural" example not coming from a Cartesian product, consider drawing a hand from an ordinary deck of cards. On another newsgroup, someone asked for a proof that the number of aces and the number of spades was uncorrelated; they are not independent. The proof I posted used that for the i-th and j-th cards dealt, the rank of the i-th card and the suit of the j-th are independent. 
For i=j, this can be looked upon as a product space, but not for i and j different. There are other examples. The independence of the sample mean and sample variance in a sample from a normal distribution is certainly an important example. The independence of the various sample variances in an ANOVA model is another. The independence for each t of X(t) and X'(t) in a stationary differentiable Gaussian process is another. This is thrown together off the cuff. There are lots of others. -- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399 [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
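Rubin's bridge example can be checked exactly with hypergeometric counting. A sketch (my own construction, not from the post): the probability that a 13-card hand contains no aces changes once you condition on the hand containing no spades, so the two counts are dependent even though they are uncorrelated.

```python
from fractions import Fraction
from math import comb

# Unconditionally: a 13-card hand avoids all 4 aces.
p_no_aces = Fraction(comb(48, 13), comb(52, 13))

# Conditioned on "no spades": all 13 cards come from the 39 non-spades,
# which contain only 3 aces.
p_no_aces_no_spades = Fraction(comb(36, 13), comb(39, 13))

print(float(p_no_aces))            # about 0.304
print(float(p_no_aces_no_spades))  # about 0.284
assert p_no_aces != p_no_aces_no_spades   # so the two counts are dependent
```

The mismatch between the conditional and unconditional probabilities is all that is needed to refute independence; showing zero correlation takes a separate (and longer) counting argument.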
Re: basic stats question
Re probability/independence, I've found that the most effective way to communicate this concept to my students (College of Education, not heavily math-oriented) is the following: Consider the student population of your university. Perhaps there is a fairly equal split of males and females in the student body. Now put a condition on the student body: look only at those majoring in, say, psychology. Do you find the same proportion of male students among psych majors as in the entire student body? If gender and psych major are independent, then the probability of a randomly chosen person at the university being male should equal the probability of a randomly chosen psych major being male. That is, p(male) = p(male|psych major), read as "the probability of male, given that you're looking at psych majors." Then you can move to an example of racial profiling. Out of all the people in your city who drive, what proportion are African-American? [p(African-American).] Now, GIVEN that you look only at drivers who are pulled over, what proportion of these people are African-American? [p(African-American|pulled over).] If being black and being pulled over are independent events, then the probabilities should be equal. You can illustrate this graphically by drawing a large box to represent all the drivers, then marking the proportion representing African-American drivers. Then draw a smaller box representing the people pulled over, with a proportion of the box marked to represent the African-American drivers who are pulled over. If the proportions of each box are equal, then the events are independent. So now, I would welcome comments from the more mathematically/statistically rigorous list members among us! ~~~ Lise DeShea, Ph.D. Assistant Professor Educational and Counseling Psychology Department University of Kentucky 245 Dickey Hall Lexington KY 40506 Email: [EMAIL PROTECTED] Phone: (859) 257-9884
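The p(male) = p(male|psych major) check is easy to demonstrate with a toy cross-tabulation. The counts below are invented purely for illustration:

```python
# Invented counts for illustration only: 1000 students cross-classified
# by gender and major.
counts = {
    ("male", "psych"): 30, ("male", "other"): 470,
    ("female", "psych"): 70, ("female", "other"): 430,
}
total = sum(counts.values())
p_male = sum(v for (g, m), v in counts.items() if g == "male") / total
psych_total = sum(v for (g, m), v in counts.items() if m == "psych")
p_male_given_psych = counts[("male", "psych")] / psych_total

# 0.5 overall vs 0.3 among psych majors: gender and major are NOT
# independent in this made-up student body.
print(p_male, p_male_given_psych)
```

With these numbers the marginal and conditional proportions disagree, so the events fail the independence check; equal proportions would pass it.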
Re: Satterthwaite-newbie question
Wow. I'm impressed with this group's thoughtful responses both privately and on the server. Yes, Hayes calls this the Behrens-Fisher problem too. I was always taught to use equal n's, so that the homogeneity-of-variance assumption was not as big an issue (the t-test post alluded to this too). Since I'm working with a clinical sample, I'm stuck. Just to give more info: n1 = 6; n2 = 8. I started computing multiple t-tests just to see how things changed when the n's were kept constant. Of course I knew in advance which was the one I wanted to use. The SD's are quite different for some of the comparisons, since one group is impaired and one is generally normal.

Satterthwaite weighted df (I hope I got this coding correct -- see Hayes p. 328):

  a = SEM1^2
  b = SEM2^2
  c = a^2/(n1+1)
  d = b^2/(n2+1)
  df = [(a+b)^2/(c+d)] - 2

I checked the SPSS algorithms web site you gave, and all the formulas for t-tests and t-statistics used only one term for n (I did find Satterthwaite listed in appendix 2, so I might try redoing this with SPSS), so I used Minitab (someone else suggested this package) after trying the calculations by hand (Excel). Here are the SEM's for means 1 and 2. It looks like the df decreases as the difference (diff) between the SEM's goes up. I also added the SEM's (sum) just to see if there was a relationship to overall variability. It looks like it's working well for me. Thanks everyone! Allyson

Here are my calculations with Minitab:

  SEM1  SEM2   diff   sum   df
    51    73    -22   124   11
    39   114    -75   153    8
    42    23     19    65    8
    17    20     -3    37   11
    21   180   -159   201    7
    52    36     16    88    9

Rich Ulrich wrote in message <[EMAIL PROTECTED]>... >On Wed, 28 Feb 2001 08:26:30 -0500, Christopher Tong ><[EMAIL PROTECTED]> wrote: > >> On Tue, 27 Feb 2001, Allyson Rosen wrote: >> >> > I need to compare two means with unequal n's. 
Hayes (1994) suggests using a >> > formula by Satterthwaite, 1946. I'm about to write up the paper and I can't >> > find the full reference ANYWHERE in the book or in any databases or in my >> > books. Is this an obscure test and should I be using another? >> >> Perhaps it refers to: >> >> F. E. Sattherwaite, 1946: An approximate distribution of estimates of >> variance components. Biometrics Bulletin, 2, 110-114. >> >> According to Casella & Berger (1990, pp. 287-9), "this approximation >> is quite good, and is still widely used today." However, it still may >> not be valid for your specific analysis: I suggest reading the >> discussion in Casella & Berger ("Statistical Inference", Duxbury Press, >> 1990). There are more commonly used methods for comparing means with >> unequal n available, and you should make sure that they can't be used >> in your problem before resorting to Sattherwaite. > >I don't have access to Casella & Berger, but I am curious about what >they recommend or suggest. Compare means with Student's t-test or >logistic regression; or Satterthwaite t if you can't avoid it if both >means and variances are different enough, and you wouldn't rather do >some transformation (for example, to ranks: then test Ranks). And >there's randomization and bootstrap. Anything else? > >Yesterday (so it should still be on your server), there was a post >with comments about the t-tests. > from the header >From: [EMAIL PROTECTED] (Jay Warner) >Newsgroups: sci.stat.edu >Subject: Re: two sample t > > >There are *additional* methods for comparing, but the one that is >*more common* is probably the Student's t, which ignores the >inequality. > >Any intro-stat-book with the t-test is likely to have one or another >version of the Satterthwaite t. The SPSS website includes algorithms >for what that stat-package uses, under t-test, for "unequal >variances." 
I find it almost impossible to find the algorithms by >navigating the site, so here is an address -- >http://www.spss.com/tech/stat/Algorithms.htm > >-- >Rich Ulrich, [EMAIL PROTECTED] >http://www.pitt.edu/~wpilib/index.html
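Allyson's transcription of the Hays df formula is easy to put into code. A sketch (pure Python; note that which SEM goes with which group size is my guess, since the post lists them separately -- pairing SEM1 = 51 with the n = 8 group reproduces the first df of 11 in her table):

```python
def hays_df(sem1, n1, sem2, n2):
    """Approximate df for the unequal-variance t test, coded from the
    formula in the post above (Hays' form: n+1 denominators, then -2)."""
    a, b = sem1 ** 2, sem2 ** 2
    c, d = a ** 2 / (n1 + 1), b ** 2 / (n2 + 1)
    return (a + b) ** 2 / (c + d) - 2

# First row of the table: SEM1 = 51, SEM2 = 73.  Assigning SEM1 to the
# n = 8 group (my assumption) gives df of about 11.08, i.e. the reported 11.
print(round(hays_df(51, 8, 73, 6)))
```

The remaining rows will not necessarily match to the integer, since the SEMs in the table look rounded.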
Re: Satterthwaite-newbie question
First, forgive me for mis-spelling Satterthwaite in my previous post. On Wed, 28 Feb 2001, Rich Ulrich wrote: > I don't have access to Casella & Berger, but I am curious about what > they recommend or suggest. Compare means with Student's t-test or > logistic regression; or Satterthwaite t if you can't avoid it if both > means and variances are different enough, and you wouldn't rather do > some transformation (for example, to ranks: then test Ranks). And > there's randomization and bootstrap. Anything else? Casella & Berger basically say that unknown, unequal variance is a hard problem but Satterthwaite is a good approximation. They call this the Behrens-Fisher problem and give references (e.g., Kendall & Stuart).
Re: Satterthwaite-newbie question
On Wed, 28 Feb 2001 08:26:30 -0500, Christopher Tong <[EMAIL PROTECTED]> wrote: > On Tue, 27 Feb 2001, Allyson Rosen wrote: > > > I need to compare two means with unequal n's. Hayes (1994) suggests using a > > formula by Satterthwaite, 1946. I'm about to write up the paper and I can't > > find the full reference ANYWHERE in the book or in any databases or in my > > books. Is this an obscure test and should I be using another? > > Perhaps it refers to: > > F. E. Sattherwaite, 1946: An approximate distribution of estimates of > variance components. Biometrics Bulletin, 2, 110-114. > > According to Casella & Berger (1990, pp. 287-9), "this approximation > is quite good, and is still widely used today." However, it still may > not be valid for your specific analysis: I suggest reading the > discussion in Casella & Berger ("Statistical Inference", Duxbury Press, > 1990). There are more commonly used methods for comparing means with > unequal n available, and you should make sure that they can't be used > in your problem before resorting to Sattherwaite. I don't have access to Casella & Berger, but I am curious about what they recommend or suggest. Compare means with Student's t-test or logistic regression; or Satterthwaite t if you can't avoid it if both means and variances are different enough, and you wouldn't rather do some transformation (for example, to ranks: then test Ranks). And there's randomization and bootstrap. Anything else? Yesterday (so it should still be on your server), there was a post with comments about the t-tests. from the header From: [EMAIL PROTECTED] (Jay Warner) Newsgroups: sci.stat.edu Subject: Re: two sample t There are *additional* methods for comparing, but the one that is *more common* is probably the Student's t, which ignores the inequality. Any intro-stat-book with the t-test is likely to have one or another version of the Satterthwaite t. 
The SPSS website includes algorithms for what that stat-package uses, under t-test, for "unequal variances." I find it almost impossible to find the algorithms by navigating the site, so here is an address -- http://www.spss.com/tech/stat/Algorithms.htm -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
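For comparison, the form most packages label "unequal variances" uses n-1 denominators in the Satterthwaite df and no -2 correction. A self-contained sketch (the data are made up for illustration):

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Unequal-variance t statistic with the usual Satterthwaite df
    (n - 1 denominators, no -2 correction)."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2   # squared SEMs
    t = (mean(x) - mean(y)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Made-up data with unequal n's and unequal spread.
t, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 2))
```

The df comes out fractional; in practice it is referred to the t distribution directly or rounded down conservatively.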
Re: basic stats question
In article <[EMAIL PROTECTED]>, Richard A. Beldin <[EMAIL PROTECTED]> wrote: >... I have tried in vain to find natural >examples of independent random variables in a sample space not >constructed as a cartesian product. An important example theoretically is the independence of the sample mean and the sample variance of a data set consisting of points drawn independently from a Gaussian distribution. Now, you might be able to view this in terms of a Cartesian product, but it's not obvious that that's a natural view. >I think that introducing the word "independent" as a descriptor of >sample spaces and then carrying it on to the events in the product space >is much less likely to generate the confusion due to the common informal >description "Independent events don't have anything to do with each >other" and "Mutually exclusive events can't happen together." I think this would be a bad idea. Events can be independent without being constructed to be independent in this way. As a definition, "Independent events don't have anything to do with each other" is dangerous because it leads one to think that independence is a property of events as physical phenomena. For instance, one might decide that the event of a person having a harmless variant of gene A is independent of the event of their having a harmless variant of gene B, on the grounds that the mechanisms for the two genes mutating are such that there's no reason for them to mutate together. But if the genes are linked, and the context is a sample of people from some community founded not too long ago by a small number of people, the events of the two variants occurring in a person may not be independent, even though they would be independent if the context were a sample of people from the whole world. Here, independence is not a property of the people, or of the genes, but of what is considered to be the sample space for whatever problem is being tackled. 
Regarding "Mutually exclusive events can't happen together", this is not an adequate definition if some non-null events have zero probability. I think that independence is not something that can be explained in ANY simple way. Multiple explanations and multiple examples are needed. Radford Neal Radford M. Neal [EMAIL PROTECTED] Dept. of Statistics and Dept. of Computer Science [EMAIL PROTECTED] University of Toronto http://www.cs.utoronto.ca/~radford = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Satterthwaite-newbie question
On Tue, 27 Feb 2001, Allyson Rosen wrote: > I need to compare two means with unequal n's. Hayes (1994) suggests using a > formula by Satterthwaite, 1946. I'm about to write up the paper and I can't > find the full reference ANYWHERE in the book or in any databases or in my > books. Is this an obscure test and should I be using another? Perhaps it refers to: F. E. Sattherwaite, 1946: An approximate distribution of estimates of variance components. Biometrics Bulletin, 2, 110-114. According to Casella & Berger (1990, pp. 287-9), "this approximation is quite good, and is still widely used today." However, it still may not be valid for your specific analysis: I suggest reading the discussion in Casella & Berger ("Statistical Inference", Duxbury Press, 1990). There are more commonly used methods for comparing means with unequal n available, and you should make sure that they can't be used in your problem before resorting to Sattherwaite.
Satterthwaite-newbie question
I need to compare two means with unequal n's. Hayes (1994) suggests using a formula by Satterthwaite, 1946. I'm about to write up the paper and I can't find the full reference ANYWHERE in the book or in any databases or in my books. Is this an obscure test and should I be using another? Thanks, Allyson
Re: basic stats question
Richard A. Beldin wrote: > I have long thought that the usual textbook discussion of independence > is misleading. In the first place, the most common situation where we > encounter independent random variables is with a cartesian product of > two independent sample spaces. Example: I toss a die and a coin. I have > reasonable assumptions about the distributions of events in either case > and I wish to discuss joint events. I have tried in vain to find natural > examples of independent random variables in a sample space not > constructed as a cartesian product. > > I think that introducing the word "independent" as a descriptor of > sample spaces and then carrying it on to the events in the product space > is much less likely to generate the confusion due to the common informal > description "Independent events don't have anything to do with each > other" and "Mutually exclusive events can't happen together." > > Comments? 1) It is conceivable that a plant making blue and red 'thingies' on the same production line would discover that the probability that the next thingie is internally flawed (in the cast portion) is independent of the probability that it is blue. BTW - 'Thingies' are so commonly used by everyone that it is not necessary to describe them in detail. :) 2) There are many terms, concepts, and definitions in the 'textbook' that have no exact match in reality. Common expressions include 'There is no such thing as random,' 'There is no such thing as Normal (distribution),' and my own contribution, 'There is no such thing as a dichotomy this side of a theological discussion.' The abstract definitions are just that - theoretical ideals. Down here in the mud of reality, we recognize this, and try to decide if the theory is reasonably close to what is happening. A couple of confirmation trials help, too. 
If the internal casting flaws are generated at an early point, and the paint is added later, depending on the orders received, then I would assert that independence was likely. If the paint is added to castings made on different dies or production machines, as a color code, then I would suspect independence was unlikely. 3) Presenting 'independence' as axes in a cartesian coordinate system is extremely handy, especially for discussing orthogonal arrays and designed experiments, etc. The presentation, however, does not make them independent. One has to check the physical system behavior to assure that. 4) I may have shot far wider than your intended mark, in which case, sorry for the interruption. Jay -- Jay Warner Principal Scientist Warner Consulting, Inc. North Green Bay Road Racine, WI 53404-1216 USA Ph: (262) 634-9100 FAX: (262) 681-1133 email: [EMAIL PROTECTED] web: http://www.a2q.com The A2Q Method (tm) -- What do you want to improve today?
Re: basic stats question
I have long thought that the usual textbook discussion of independence is misleading. In the first place, the most common situation where we encounter independent random variables is with a cartesian product of two independent sample spaces. Example: I toss a die and a coin. I have reasonable assumptions about the distributions of events in either case and I wish to discuss joint events. I have tried in vain to find natural examples of independent random variables in a sample space not constructed as a cartesian product. I think that introducing the word "independent" as a descriptor of sample spaces and then carrying it on to the events in the product space is much less likely to generate the confusion due to the common informal description "Independent events don't have anything to do with each other" and "Mutually exclusive events can't happen together." Comments? -- Richard A. Beldin, Professional Statistician (retired)
Re: Sample size question
On 23 Feb 2001 12:08:45 -0800, [EMAIL PROTECTED] (Scheltema, Karen) wrote: > I tried the site but received errors trying to download it. It couldn't > find the FTP site. Has anyone else been able to access it? As of a few minutes ago, it downloaded fine for me when I clicked on it with Internet Explorer. The .zip file expanded okay. I used right-click (I just learned that last week) in order to download the .pdf version of the help. [ ... ] < Earlier Q and Answer > "Can anyone point me to software for estimating ANCOVA or regression sample sizes based on effect size?" > > Look here: > > http://www.interchg.ubc.ca/steiger/r2.htm Hmm. Placing limits on R^2. I haven't read the accompanying documentation. On the general principle that you can't compute power if you don't know what power you are looking for, I suggest reading the relevant chapters in Jacob Cohen's book (1988+ edition). -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
RE: Sample size question
>I tried the site but received errors trying to download it. It couldn't >find the FTP site. Has anyone else been able to access it? > >Karen Scheltema >Statistician >HealthEast >Research and Education >1700 University Ave W >St. Paul, MN 55104 >(651) 232-5212 fax (651) 641-0683 >[EMAIL PROTECTED] > >> -Original Message- >> From:Chuck Cleland [SMTP:[EMAIL PROTECTED]] >> Sent:Friday, February 23, 2001 11:04 AM >> To: [EMAIL PROTECTED] >> Subject: Re: Sample size question >> >> "Scheltema, Karen" wrote: >> > Can anyone point me to software for estimating ANCOVA or regression >> sample >> > sizes based on effect size? >> >> Look here: >> >> http://www.interchg.ubc.ca/steiger/r2.htm >> >> Chuck > Karen, I just looked, and was able to access the site and download the files. Dan Nordlund
RE: Sample size question
I tried the site but received errors trying to download it. It couldn't find the FTP site. Has anyone else been able to access it? Karen Scheltema Statistician HealthEast Research and Education 1700 University Ave W St. Paul, MN 55104 (651) 232-5212 fax (651) 641-0683 [EMAIL PROTECTED] > -Original Message- > From: Chuck Cleland [SMTP:[EMAIL PROTECTED]] > Sent: Friday, February 23, 2001 11:04 AM > To: [EMAIL PROTECTED] > Subject: Re: Sample size question > > "Scheltema, Karen" wrote: > > Can anyone point me to software for estimating ANCOVA or regression > sample > > sizes based on effect size? > > Look here: > > http://www.interchg.ubc.ca/steiger/r2.htm > > Chuck
Re: Sample size question
"Scheltema, Karen" wrote: > Can anyone point me to software for estimating ANCOVA or regression sample > sizes based on effect size? Look here: http://www.interchg.ubc.ca/steiger/r2.htm Chuck -<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>- Chuck Cleland Institute for the Study of Child Development UMDNJ--Robert Wood Johnson Medical School 97 Paterson Street New Brunswick, NJ 08903 phone: (732) 235-7699 fax: (732) 235-6189 http://www2.umdnj.edu/iscdweb/ http://members.nbci.com/cmcleland/ -<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>- = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Sample size question
You can use Sample Power from SPSS (a.k.a. Power and Precision) or PASS 2000 from NCSS. For more info, please visit: http://www.spss.com http://www.ncss.com http://seamonkey.ed.asu.edu/~alex/teaching/WBI/power_es.html --- "Regression to the mean" is not always true. After 30, my weight never regresses to the mean. Chong-ho (Alex) Yu, Ph.D., MCSE, CNE Academic Research Professional/Manager Educational Data Communication, Assessment, Research and Evaluation Farmer 418 Arizona State University Tempe AZ 85287-0611 Email: [EMAIL PROTECTED] URL: http://seamonkey.ed.asu.edu/~alex/
RE: Sample size question
Thanks! This was exactly what I was looking for! Karen Scheltema Statistician HealthEast Research and Education 1700 University Ave W St. Paul, MN 55104 (651) 232-5212 fax (651) 641-0683 [EMAIL PROTECTED] > -Original Message- > From: Magill, Brett [SMTP:[EMAIL PROTECTED]] > Sent: Friday, February 23, 2001 9:53 AM > To: 'Scheltema, Karen'; [EMAIL PROTECTED] > Subject: RE: Sample size question > > G*Power is a power analysis package that is freely available. You can > download it at: > > http://www.psychologie.uni-trier.de:8000/projects/gpower.html > > You can calculate a sample size for a given effect size, alpha level, and > power value. > > > -Original Message- > From: Scheltema, Karen [mailto:[EMAIL PROTECTED]] > Sent: Friday, February 23, 2001 10:07 AM > To: [EMAIL PROTECTED] > Subject: Sample size question > > > Can anyone point me to software for estimating ANCOVA or regression sample > sizes based on effect size? > > Karen Scheltema > Statistician > HealthEast > Research and Education > 1700 University Ave W > St. Paul, MN 55104 > (651) 232-5212 fax (651) 641-0683 > [EMAIL PROTECTED]
RE: Sample size question
G*Power is a power analysis package that is freely available. You can download it at: http://www.psychologie.uni-trier.de:8000/projects/gpower.html You can calculate a sample size for a given effect size, alpha level, and power value. -Original Message- From: Scheltema, Karen [mailto:[EMAIL PROTECTED]] Sent: Friday, February 23, 2001 10:07 AM To: [EMAIL PROTECTED] Subject: Sample size question Can anyone point me to software for estimating ANCOVA or regression sample sizes based on effect size? Karen Scheltema Statistician HealthEast Research and Education 1700 University Ave W St. Paul, MN 55104 (651) 232-5212 fax (651) 641-0683 [EMAIL PROTECTED]
Sample size question
Can anyone point me to software for estimating ANCOVA or regression sample sizes based on effect size? Karen Scheltema Statistician HealthEast Research and Education 1700 University Ave W St. Paul, MN 55104 (651) 232-5212 fax (651) 641-0683 [EMAIL PROTECTED]
Re: statistics question
In article <95nuk5$8df$[EMAIL PROTECTED]>, [EMAIL PROTECTED] wrote: > Thanks very much for your helpful response. > 1) My factors are continuous. I have multiple responses. Some are > continuous and some are categorical. I need to optimize my responses. > The main region that they are interested in is for A between 35 and 95 > and for B between 900 and 1750. > In addition they want to run a couple of points outside of this region, > as there is reason to believe that it will optimize the response. > These are B=2000 and A is any pt between 65 and 95, say 80. > Also, they want to run the combination A=35 and B=1650. > Also, would like to include A=90 and B=1650. > Also, would like to include A=105 and B close to 1325. > These points are not totally fixed. If I can get close to them, that will > work. > Everything else looks flexible. I'll be able to run the experiment > 21 times. I can include replications. Will replication on some runs and > not on others destroy orthogonality? > > I'm not sure how to set this up. > I appreciate your help very much. > SH. Lee > In article <[EMAIL PROTECTED]>, > [EMAIL PROTECTED] wrote: > > Flash response: > > > > 1)Are the levels fixed by some characteristic of the process? they > > look continuous, and you could do much better if they were, and you > > could select different intermediate levels. > > > > 2)the number of levels can be what you want of it. Some good > > response surface designs use 5 levels. some use more. > > > > 3)Factor B levels are equally spaced, which is good. Factor A > > levels are not evenly spaced. A full factorial will not give you a > > 'clean' design - Without doing the math, I don't believe it will be > > orthogonal, even if you did do all the combinations. > > > > 4)what are you going to do with the results of this experiment? > If > > you wish to build a model of the system behavior, then a full > factorial > > type approach is a waste of your effort, time, and experimental runs. 
> > > > 5)Suggest you look at a Response model, with maybe 3-5 levels in > > both factors, but using a proper RSM type design. If you do it > > properly, you can avoid a single 'corner' point and recover it > > mathematically. > > > > 6)I'd also ask if you have hard reason to believe that a RSM type > > model, which will get you quadratic terms in a model, is in fact > worth > > doing (financial/your time costs) the first time out? If little > prior > > information is available, it would probably be better to do a > simpler, > > 2-level factorial first, if at all possible. Doing this will teach > you > > a great deal [that you probably don't already know]. Your choice > here, > > but remember - most people overestimate their knowledge level :) > > > > 7)You haven't discussed the response yet. Please spend some time > > thinking about that, too. > > > > More later, if this helps at all. Let me know. > > > > Jay > > > > [EMAIL PROTECTED] wrote: > > > > > Hi, > > > > > > I have two factors A and B and I want to run a DOE to study my > response. > > > My factor B is at 3 levels; (900, 1450 and 2000) , my factor A is > at 4 > > > levels 35, 65, 80 and 105. > > > First of all is it right to have one factor at 4 levels. I have > > > encountered situations where the factors are either at 2 levels or 3 > > > levels.? > > > This will require me to have 12 runs for a full factorial, right? > > > Also, I do not want to run only the level 35 of factor A with the > level > > > 900 of factor B. If I remove the combination 35, 1450 and 35, 2000; > > > I'll have only 10 runs and the resulting design space will not be > > > orthogonal. How do I tackle this problem? > > > Is there a different design that you would suggest. > > > Thanks for your help. 
> > > SH Lee > > > > > > > > > Sent via Deja.com > > > http://www.deja.com/ > > > > > > > > > = > > > Instructions for joining and leaving this list and remarks about > > > the problem of INAPPROPRIATE MESSAGES are available at > > > http://jse.stat.ncsu.edu/ > > > = > > > > > > > > > > > > > -- > > Jay Warner > > Principal Scientist > > Warner Consulting, Inc. > > North Green Bay Road > > Racine, WI 53404-1216 > > USA > > > > Ph: (262) 634-9100 > > FAX:(262) 681-1133 > > email: [EMAIL PROTECTED] > > web:http://www.a2q.com > > > > The A2Q Method (tm) -- What do you want to improve today? > > > > = > > Instructions for joining and leaving this list and remarks about > > the problem of INAPPROPRIATE MESSAGES are available at > > http://jse.stat.ncsu.edu/ > > = > > > > Sent via Deja.com > http://www.deja.com/ > Sent via Deja.com http://www.deja.com/ ==
Re: statistics question
Thanks very much for your helpful response.

1) My factors are continuous. I have multiple responses; some are continuous and some are categorical. I need to optimize my responses. The main region they are interested in is A between 35 and 95 and B between 900 and 1750. In addition, they want to run a couple of points outside this region, as there is reason to believe they will optimize the response. These are B=2000 with A at any point between 65 and 95, say 80. They also want to run the combination A=35 and B=1650, and would like to include A=90 with B=1650, and A=105 with B close to 1325. These points are not totally fixed; if I can get close to them, that will work. Everything else is flexible. I will be able to run the experiment 21 times, and I can include replications. Will replicating some runs and not others destroy orthogonality?

I'm not sure how to set this up. I appreciate your help very much.

SH. Lee

In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] wrote:
> Flash response:
>
> 1) Are the levels fixed by some characteristic of the process? They
> look continuous, and you could do much better if they were, and you
> could select different intermediate levels.
>
> 2) The number of levels can be whatever you want. Some good response
> surface designs use 5 levels; some use more.
>
> 3) Factor B levels are equally spaced, which is good. Factor A levels
> are not evenly spaced. A full factorial will not give you a 'clean'
> design - without doing the math, I don't believe it will be
> orthogonal, even if you did do all the combinations.
>
> 4) What are you going to do with the results of this experiment? If
> you wish to build a model of the system behavior, then a full
> factorial type approach is a waste of your effort, time, and
> experimental runs.
>
> 5) Suggest you look at a response surface model, with maybe 3-5 levels
> in both factors, but using a proper RSM type design. If you do it
> properly, you can avoid a single 'corner' point and recover it
> mathematically.
>
> 6) I'd also ask if you have hard reason to believe that an RSM type
> model, which will get you quadratic terms in the model, is in fact
> worth doing (financial/your time costs) the first time out? If little
> prior information is available, it would probably be better to do a
> simpler, 2-level factorial first, if at all possible. Doing this will
> teach you a great deal [that you probably don't already know]. Your
> choice here, but remember - most people overestimate their knowledge
> level :)
>
> 7) You haven't discussed the response yet. Please spend some time
> thinking about that, too.
>
> More later, if this helps at all. Let me know.
>
> Jay
>
> [EMAIL PROTECTED] wrote:
> > Hi,
> >
> > I have two factors A and B and I want to run a DOE to study my
> > response. My factor B is at 3 levels (900, 1450 and 2000); my
> > factor A is at 4 levels: 35, 65, 80 and 105.
> > First of all, is it right to have one factor at 4 levels? I have
> > encountered situations where the factors are either at 2 levels or
> > 3 levels.
> > This will require me to have 12 runs for a full factorial, right?
> > Also, I do not want to run only the level 35 of factor A with the
> > level 900 of factor B. If I remove the combinations 35, 1450 and
> > 35, 2000, I'll have only 10 runs and the resulting design space
> > will not be orthogonal. How do I tackle this problem?
> > Is there a different design that you would suggest?
> > Thanks for your help.
> > SH Lee
>
> --
> Jay Warner
> Principal Scientist
> Warner Consulting, Inc.
> North Green Bay Road
> Racine, WI 53404-1216
> USA
>
> Ph: (262) 634-9100
> FAX: (262) 681-1133
> email: [EMAIL PROTECTED]
> web: http://www.a2q.com
>
> The A2Q Method (tm) -- What do you want to improve today?

=
Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/
=
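[Editor's note: the replication question above can be checked numerically. The sketch below is not from the thread; it uses a small coded 2x2 factorial to show that replicating only *some* runs makes the mean-centered factor columns correlated, while replicating every run equally preserves orthogonality.]

```python
import numpy as np

# Coded 2x2 full factorial: the centered factor columns are orthogonal.
base = np.array([[-1, -1],
                 [-1,  1],
                 [ 1, -1],
                 [ 1,  1]], dtype=float)

def column_cross_product(design):
    """Cross product of the two mean-centered factor columns.

    Zero means the two factor columns are orthogonal in this design.
    """
    centered = design - design.mean(axis=0)
    return float(centered[:, 0] @ centered[:, 1])

balanced = column_cross_product(base)            # 0.0: orthogonal

# Replicate only the (+1, +1) run: the columns are no longer balanced,
# and the cross product becomes nonzero, so A and B are confounded.
lopsided = np.vstack([base, [1.0, 1.0]])
unbalanced = column_cross_product(lopsided)      # nonzero

# Replicating EVERY run the same number of times keeps orthogonality.
evenly_replicated = column_cross_product(np.vstack([base, base]))
print(balanced, unbalanced, evenly_replicated)
```

So the short answer to the question appears to be yes: unequal replication generally destroys orthogonality, whereas replicating the whole design does not.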
Re: statistics question
You've had a good "flash response" from Jay Warner. Other short answers are embedded in the original query below:

On Sat, 3 Feb 2001 [EMAIL PROTECTED] wrote:

> I have two factors A and B and I want to run a DOE to study my
> response. My factor B is at 3 levels (900, 1450 and 2000); my factor
> A is at 4 levels: 35, 65, 80 and 105.
> First of all is it right to have one factor at 4 levels.

"Right" I don't know about. There's nothing _wrong_ with it. If it is logically required by the problem, or useful for design reasons of one sort or another, it is certainly defensible.

> This will require me to have 12 runs for a full factorial, right?

Your arithmetic is correct.

> Also, I want to run the level 35 of factor A only with the level 900
> of factor B. If I remove the combinations 35, 1450 and 35, 2000,
> I'll have only 10 runs and the resulting design space will not be
> orthogonal.

True.

> How do I tackle this problem?
> Is there a different design that you would suggest?

Depends on what you're carrying out this experiment for, and why it makes sense to omit those two design points. But one way to approach the problem is to treat the data as a one-way design with 10 levels, and model the detailed questions you want to ask via assorted contrasts. Of course, the contrasts will probably not be orthogonal; but having found out some (preliminary?) things about the situation in this run, you can then more intelligently design a subsequent run, perhaps with fewer than 10 combinations, or with a design of the sort Jay suggested.
-- DFB.
--
Donald F. Burrill                                  [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College,             [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264                        (603) 535-2597
Department of Mathematics, Boston University       [EMAIL PROTECTED]
111 Cummington Street, room 261, Boston, MA 02215  (617) 353-5288
184 Nashua Road, Bedford, NH 03110                 (603) 471-7128
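[Editor's note: the one-way-with-contrasts approach suggested above can be sketched in a few lines of numpy. The cell means, replicate counts, error variance, and the particular contrast below are all hypothetical, chosen only to show the mechanics: a contrast's coefficients sum to zero, and its standard error pools the within-cell error variance over the cell sample sizes.]

```python
import numpy as np

# Hypothetical cell means and replicate counts for the 10 retained
# (A, B) combinations, listed in some fixed order.
cell_means = np.array([5.2, 6.1, 5.9, 6.4, 7.0, 6.8, 7.3, 7.1, 6.9, 7.5])
cell_ns    = np.array([2, 2, 2, 2, 2, 2, 2, 2, 2, 3])
mse        = 0.40      # pooled within-cell error variance (assumed known here)

# A contrast asks one focused question, e.g. "first cell vs. the
# average of the last three". Its coefficients must sum to zero.
c = np.array([1, 0, 0, 0, 0, 0, 0, -1/3, -1/3, -1/3])
assert np.isclose(c.sum(), 0.0)

estimate = float(c @ cell_means)                 # estimated contrast value
se = float(np.sqrt(mse * np.sum(c**2 / cell_ns)))  # its standard error
t_stat = estimate / se   # compare with t on sum(cell_ns) - 10 error df
print(estimate, se, t_stat)
```

With real data the within-cell MSE would come from the one-way ANOVA rather than being assumed, but the contrast arithmetic is the same.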
Re: statistics question
Flash response:

1) Are the levels fixed by some characteristic of the process? They look continuous, and you could do much better if they were, and you could select different intermediate levels.

2) The number of levels can be whatever you want. Some good response surface designs use 5 levels; some use more.

3) Factor B levels are equally spaced, which is good. Factor A levels are not evenly spaced. A full factorial will not give you a 'clean' design - without doing the math, I don't believe it will be orthogonal, even if you did do all the combinations.

4) What are you going to do with the results of this experiment? If you wish to build a model of the system behavior, then a full factorial type approach is a waste of your effort, time, and experimental runs.

5) Suggest you look at a response surface model, with maybe 3-5 levels in both factors, but using a proper RSM type design. If you do it properly, you can avoid a single 'corner' point and recover it mathematically.

6) I'd also ask if you have hard reason to believe that an RSM type model, which will get you quadratic terms in the model, is in fact worth doing (financial/your time costs) the first time out? If little prior information is available, it would probably be better to do a simpler, 2-level factorial first, if at all possible. Doing this will teach you a great deal [that you probably don't already know]. Your choice here, but remember - most people overestimate their knowledge level :)

7) You haven't discussed the response yet. Please spend some time thinking about that, too.

More later, if this helps at all. Let me know.

Jay

[EMAIL PROTECTED] wrote:

> Hi,
>
> I have two factors A and B and I want to run a DOE to study my
> response. My factor B is at 3 levels (900, 1450 and 2000); my factor
> A is at 4 levels: 35, 65, 80 and 105.
> First of all, is it right to have one factor at 4 levels? I have
> encountered situations where the factors are either at 2 levels or 3
> levels.
> This will require me to have 12 runs for a full factorial, right?
> Also, I do not want to run only the level 35 of factor A with the
> level 900 of factor B. If I remove the combinations 35, 1450 and
> 35, 2000, I'll have only 10 runs and the resulting design space will
> not be orthogonal. How do I tackle this problem?
> Is there a different design that you would suggest?
> Thanks for your help.
> SH Lee

--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
North Green Bay Road
Racine, WI 53404-1216
USA

Ph: (262) 634-9100
FAX: (262) 681-1133
email: [EMAIL PROTECTED]
web: http://www.a2q.com

The A2Q Method (tm) -- What do you want to improve today?
statistics question
Hi,

I have two factors A and B and I want to run a DOE to study my response. My factor B is at 3 levels (900, 1450 and 2000); my factor A is at 4 levels: 35, 65, 80 and 105.
First of all, is it right to have one factor at 4 levels? I have encountered situations where the factors are either at 2 levels or 3 levels.
This will require me to have 12 runs for a full factorial, right?
Also, I do not want to run only the level 35 of factor A with the level 900 of factor B. If I remove the combinations 35, 1450 and 35, 2000, I'll have only 10 runs and the resulting design space will not be orthogonal. How do I tackle this problem?
Is there a different design that you would suggest?
Thanks for your help.

SH Lee
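[Editor's note: the orthogonality claim in the question is easy to verify. The sketch below builds the 12-run full factorial from the stated levels, drops the two combinations the poster wants to avoid, and computes the cross product of the mean-centered A and B columns for each design.]

```python
import numpy as np
from itertools import product

A_levels = [35, 65, 80, 105]
B_levels = [900, 1450, 2000]

full = list(product(A_levels, B_levels))
assert len(full) == 12                    # full factorial: 4 x 3 = 12 runs

# Drop the two combinations the poster wants to avoid.
reduced = [run for run in full if run not in [(35, 1450), (35, 2000)]]
assert len(reduced) == 10

def ab_cross_product(runs):
    """Cross product of the mean-centered A and B columns.

    Zero means the A and B effects can be estimated independently.
    """
    X = np.array(runs, dtype=float)
    Xc = X - X.mean(axis=0)
    return float(Xc[:, 0] @ Xc[:, 1])

print(ab_cross_product(full))     # ~0: the full factorial is orthogonal
print(ab_cross_product(reduced))  # nonzero: the 10-run design is not
```

This confirms the poster's statement: the 10-run design is no longer orthogonal, which is what motivates the alternatives (contrasts, or an RSM-type design) suggested in the replies.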
updated statistics question
Hi,

I have two factors A and B and I want to run a DOE to study my response. My factor B is at 3 levels (900, 1450 and 2000); my factor A is at 4 levels: 35, 65, 80 and 105.
First of all, is it right to have one factor at 4 levels? I have encountered situations where the factors are either at 2 levels or 3 levels.
This will require me to have 12 runs for a full factorial, right?
Also, I want to run only the level 35 of factor A with the level 900 of factor B. If I remove the combinations 35, 1450 and 35, 2000, I'll have only 10 runs and the resulting design space will not be orthogonal. How do I tackle this problem?
Is there a different design that you would suggest?
Thanks for your help.

SH Lee