Re: A regressive question
Hi

On 15 May 2001, Alan McLean wrote:

> The usual test for a simple linear regression model is to test whether
> the slope coefficient is zero or not. However, if the slope is very
> close to zero, the intercept will be very close to the dependent
> variable mean, which suggests that a test could be based on the
> difference between the estimated intercept and the sample mean.

Would this not depend on the scale being used? If the predictor was some scale on which the normal range of values was quite large (e.g., GRE scores?), then the value at 0 might be some distance from the mean of Y even given a very shallow slope. So the test would somehow have to adjust for this; that is, the standard error of the difference from the mean of Y would have to vary as a function of the distance of 0 from the mean of X. And presumably the test should produce results equivalent to the normal test of the slope.

It would be interesting to see if there is such a test. Could it be related to the equations for the confidence interval for predicted Y given X? There are separate formulas for individual and group predictions, and the widths do vary with distance from the mean of X.

Best wishes
Jim

James M. Clark                       (204) 786-9757
Department of Psychology             (204) 774-4134 Fax
University of Winnipeg, 4L05D        [EMAIL PROTECTED]
Winnipeg, Manitoba R3B 2E9           http://www.uwinnipeg.ca/~clark
CANADA

=
Instructions for joining and leaving this list and remarks about the
problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=
Re: A regressive question
If the mean of the predictor X is zero, the intercept is equal to the mean of the dependent variable Y, however steep or shallow the slope may be. And as Jim pointed out, the standard error of a predicted value depends on its distance from the mean of X (being larger the farther away it is from the mean, the confidence band being described by a hyperbola). It would seem to follow that a test such as Alan asks about would be unusable if the mean of X is too close to 0, and would be (too?) insensitive if the mean of X is too far from 0. An intermediate region, where a test of intercept vs. mean Y might be useful, might perhaps be defined in terms of the coefficient of variation of X (or perhaps its reciprocal, if the mean of X were in danger of actually BEING zero). One rather suspects that any such test would be less powerful than the usual test of the hypothesis that the true slope is zero, which might be an interesting proposition (for someone else!) to pursue.
-- Don.

On Wed, 16 May 2001, Alan McLean wrote:

> The usual test for a simple linear regression model is to test whether
> the slope coefficient is zero or not. However, if the slope is very
> close to zero, the intercept will be very close to the dependent
> variable mean, which suggests that a test could be based on the
> difference between the estimated intercept and the sample mean. Does
> anybody know of a test of this sort?

Donald F. Burrill                             [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College         603-535-2597
MSC #29, Plymouth, NH 03264
184 Nashua Road, Bedford, NH 03110            603-472-3742
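Jim's and Don's speculation can be checked numerically. The key algebraic fact is that b0 - ybar = -b1 * xbar exactly, so an "intercept vs. mean of Y" test whose standard error scales with the distance of 0 from the mean of X is just the slope test in disguise. A minimal sketch with simulated, GRE-like data (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
x = rng.normal(500, 100, n)           # GRE-like predictor: mean far from 0
y = 0.002 * x + rng.normal(0, 1, n)   # shallow true slope

xbar, ybar = x.mean(), y.mean()
Sxx = ((x - xbar) ** 2).sum()
b1 = ((x - xbar) * (y - ybar)).sum() / Sxx      # OLS slope
b0 = ybar - b1 * xbar                           # OLS intercept
s2 = ((y - b0 - b1 * x) ** 2).sum() / (n - 2)   # residual variance

# Usual test of the slope
t_slope = b1 / np.sqrt(s2 / Sxx)

# "Intercept vs. mean of Y" test: since b0 - ybar = -b1 * xbar,
# its standard error must scale with |xbar|, exactly as Jim suggests
t_diff = (b0 - ybar) / (abs(xbar) * np.sqrt(s2 / Sxx))

print(abs(t_slope) - abs(t_diff))   # ~0: the two tests coincide
```

The |xbar| factor comes from Var(b0 - ybar) = xbar^2 * Var(b1), which holds because ybar and b1 are uncorrelated under the usual model assumptions; with that standard error the two t statistics agree in absolute value for any data set.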
Re: regressive question
Thanks to everyone who answered my question. The various reservations about such a test were spot on, and helpful. My own reservations arose because, I think, it is not at all clear what the null would be in this case: are you testing mu = beta_0 (so using the null model, with fixed mean), or beta_0 = mu (so using the regression model, with potentially variable mean)?

Alan

--
Alan McLean ([EMAIL PROTECTED])
Department of Econometrics and Business Statistics
Monash University, Caulfield Campus, Melbourne
Tel: +61 03 9903 2102    Fax: +61 03 9903 2007
Re: Question
On 11 May 2001 07:34:38 -0700, [EMAIL PROTECTED] (Magill, Brett) wrote:

> Don and Dennis, Thanks for your comments, I have some points and
> further questions on the issue below. For both Dennis and Don: I
> think the option of aggregating the information is a viable one.

I would call it unavoidable rather than just viable. The data that you show is basically aggregated already; there's just one item per person.

> Yet, I cannot help but think there is some way to do this taking into
> account the fact that there is variation within organizations. I
> mean, if I have an organizational salary mean of .70 (70%) with a very
> tiny
[ snip, rest ]

I agree, you can use the information concerning within-variation. I think it is totally proper to insist on using it, in order to validate the conclusions, to whatever degree is possible. You might be able to turn around that 'validation' to incorporate it into the initial test; but I think the role as validation is easier to see by itself, first.

Here's a simple example where the 'variance' is Poisson.

(Ex.) A town experiences some crime at a rate that declines steadily, from 20,000 incidents to 19,900 incidents, over a 5-year period. The linear trend fitted to the several points is highly significant by a regression test. Do you believe it?

(Answer) What I would believe is: No, there is no trend, but it is probably true that someone is fudging the numbers. The *observed variation* in the means is far too small for the totals to be due to chance. And the most obvious sources of error would work in the opposite direction. [That is, if there were only a few criminals responsible for many crimes each, and the number-of-criminals is what was subject to Poisson variation, THEN the number-of-crimes should be even more variable.]

In your present case, I think you can estimate on the basis of your factory (aggregate) data, and then you figure what you can about how consistent those numbers are with the un-aggregated data, in terms of means or variances.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
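Rich's Poisson example can be made concrete with a dispersion check; a minimal sketch, with invented yearly totals matching his description:

```python
import numpy as np

# Hypothetical yearly totals declining smoothly from 20,000 to 19,900
counts = np.array([20000, 19975, 19950, 19925, 19900])
years = np.arange(len(counts))

# Least-squares linear fit to the totals
slope, intercept = np.polyfit(years, counts, 1)
fitted = intercept + slope * years

# Under Poisson variation the year-to-year noise would be about
# sqrt(20000) ~ 141 incidents -- larger than the whole 100-incident
# decline.  The dispersion statistic sum((obs - fit)^2 / fit) should be
# near its df (n - 2 = 3); here it is essentially 0.
dispersion = ((counts - fitted) ** 2 / fitted).sum()
print(dispersion)   # far too smooth to be chance under Poisson counts
```

A dispersion statistic far below its degrees of freedom is exactly Rich's "too smooth to believe" signal; far above would indicate the extra variability from a few high-rate offenders that he describes.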
RE: Question
Don and Dennis,

Thanks for your comments. I have some points and further questions on the issue below.

For both Dennis and Don: I think the option of aggregating the information is a viable one. Yet, I cannot help but think there is some way to do this taking into account the fact that there is variation within organizations. I mean, if I have an organizational salary mean of .70 (70%) with a very tiny s.d., it is different than a mean of .70 with a large s.d. There should be some way to account for this. In addition, the problems with aggregation are well documented and, I believe, in general suggest that aggregated results overestimate relationships.

Don: I suggested that the problem was not a traditional multilevel problem. Perhaps I am wrong, but here is where I thought the difference was. Typically, say in a classroom problem, I want to assess the effect of classroom characteristics (student/teacher ratio, teacher experience, etc.), which are constant within classrooms, on, say, student performance, which varies within classroom across individuals. The difference between this and the problem I presented is that the OUTCOME is a contextual variable. That is, rather than individual-level variation, the outcome varies only at the organizational level. Perhaps this can be modeled with MLMs, but it is certainly different than the typical problem.

With regard to independence, I am talking about the independence of the X2's. That is, X2-1 is not independent of X2-2, and X2-4 is not independent of X2-5. This is because these cases come from the same organization. So, if we simply regressed Y ~ X2, not accounting for X1 in the model, this causes problems for ANOVA and regression, and the GLM family more generally. The lack of independence here is exactly the reason for repeated measures and MLM more generally, no?

Perhaps I am making too much of the issue, but the data structure is one that I have not encountered before, and I found it something of an interesting and challenging problem; I am just hoping I might learn something along the way. Would appreciate any comments on my comments above.

Oh, and just so there is no confusion: the data below I constructed. It reflects the structure of the data and the nature of the relationship, but I generated this data set. In addition, the real thing does include variables such as tenure, previous experience, etc. that are also used as covariates at the individual level. Of course, this also means that these would need to be aggregated as well if that approach is taken.

Best

ID  X1  X2    Y
1   1   0.70  0.40
2   1   0.80  0.40
3   1   0.65  0.40
4   2   1.20  0.25
5   2   1.10  0.25
6   3   0.90  0.30
7   4   0.50  0.50
8   4   0.60  0.50
9   4   0.70  0.50
Question
A colleague has a data set with a structure like the one below:

ID  X1  X2    Y
1   1   0.70  0.40
2   1   0.80  0.40
3   1   0.65  0.40
4   2   1.20  0.25
5   2   1.10  0.25
6   3   0.90  0.30
7   4   0.50  0.50
8   4   0.60  0.50
9   4   0.70  0.50

Where X1 is the organization. X2 is the percent of market salary an employee within the organization is paid -- i.e., ID 1 makes 70% of the market salary for their position and the local economy. And Y is the annual overall turnover rate in the organization, so it is constant across individuals within the organization. There are different numbers of employee salaries measured within each organization.

The goal is to assess the relationship between employee salary (as percent of market salary for their position and location) and overall organizational turnover rates.

How should these data be analyzed? The difficulty is that the data are cross-level. Not the traditional multi-level model, however. That there is no variance across individuals within an organization on the outcome is problematic. Of course, so is aggregating the individual results. How can this be modeled while preserving the fact that there is variance within organizations and between organizations?

I suggested that this was a repeated measures problem, with repeated measurements within the organization; my colleague argued it was not. Can this be modeled appropriately with traditional regression models at the individual level? That is, ignoring X1 and regressing Y ~ X2. It seems to me that this violates the assumption of independence. Certainly, the percent of market salary that an employee is paid is correlated between employees within an organization (taking into account things like tenure, previous experience, etc.).

Thanks
Re: Question
this is not unlike having scores for students in a class ... one score for each student and ... the age of the teacher of THOSE students ... for a class ... scores will vary but, age for the teacher remains the same ... but the age might be different in ANother class with a different teacher ... in a sense, the age is like a mean just like your turnover rate ... and you want to know the relationship between student scores and teachers ages

something has to give

i think you have to reduce the data points on X2 ... find the mean within organization 1 ... on X2 ... then have .4 next to it ... second data pair would be mean on X2 for organization 2 .. with .25 ... etc. so, in this case ... you have 4 values on X2 and 4 values on Y ... so, what is the relationship between those?? look at the following:

Row   C7     C8
1     0.72   0.40
2     1.15   0.25
3     0.90   0.30
4     0.60   0.50

MTB > plot c8 c7

[Minitab scatterplot of C8 (turnover rate) against C7 (mean % of market
salary) omitted: the four points fall roughly on a descending line]

Correlations: C7, C8
Pearson correlation of C7 and C8 = -0.957
P-Value = 0.043

there might be a better way to do it but ... looks like a pretty clear case of the greater the % of market the organization pays ... the lower its turnover rate

At 06:05 PM 5/10/01 -0400, Magill, Brett wrote:

> A colleague has a data set with a structure like the one below:
>
> ID  X1  X2    Y
> 1   1   0.70  0.40
> 2   1   0.80  0.40
> 3   1   0.65  0.40
> 4   2   1.20  0.25
> 5   2   1.10  0.25
> 6   3   0.90  0.30
> 7   4   0.50  0.50
> 8   4   0.60  0.50
> 9   4   0.70  0.50
>
> Where X1 is the organization. X2 is the percent of market salary an
> employee within the organization is paid -- i.e. ID 1 makes 70% of the
> market salary for their position and the local economy. And Y is the
> annual overall turnover rate in the organization, so it is constant
> across individuals within the organization. There are different
> numbers of employee salaries measured within each organization.
>
> The goal is to assess the relationship between employee salary (as
> percent of market salary for their position and location) and overall
> organizational turnover rates.
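Dennis's aggregate-then-correlate step is easy to reproduce; a minimal sketch in Python using the nine rows from Brett's example:

```python
import numpy as np

# The nine individual records: organization ID, % of market salary, turnover
org = np.array([1, 1, 1, 2, 2, 3, 4, 4, 4])
x2  = np.array([0.70, 0.80, 0.65, 1.20, 1.10, 0.90, 0.50, 0.60, 0.70])
y   = np.array([0.40, 0.40, 0.40, 0.25, 0.25, 0.30, 0.50, 0.50, 0.50])

# Collapse to one row per organization: mean salary ratio, and the
# (constant within organization) turnover rate
ids = np.unique(org)
mean_x2 = np.array([x2[org == i].mean() for i in ids])
org_y   = np.array([y[org == i][0] for i in ids])

r = np.corrcoef(mean_x2, org_y)[0, 1]
print(round(r, 3))   # about -0.956; Minitab's -0.957 above reflects
                     # Dennis rounding the organization-1 mean to 0.72
```

With only four aggregated points the correlation is striking but fragile; as Don notes elsewhere in the thread, aggregation discards the within-organization scatter that could validate (or undercut) this relationship.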
Re: Question
On Thu, 10 May 2001, Magill, Brett wrote, inter alia:

> How should these data be analyzed? The difficulty is that the data are
> cross level. Not the traditional multi-level model however.

Hi, Brett. I don't understand this statement. Looks to me like an obvious place to apply multilevel (aka hierarchical) modelling. (Have you read Harvey Goldstein's text on the method?) You have persons within organizations (just as, in educational applications of ML models, one has pupils within schools for a two-level model, and pupils within schools within districts for a three-level model), and apparently want to carry out some estimation or other analysis while taking into account the (possible) covariances between levels.

If you want a simpler method than ML modelling, the method Dennis proposed at least lets you see some aggregate effects. (This does, however, put me in mind of a paper of (I think) Brian Joiner's whose temporary working title was "To aggregate is to aggravate" -- though it was published under another title.) ;-)

Along the lines of Dennis' suggestion, you could plot Y vs X2 (or X2 vs Y) directly, which would give you the visual effect Dennis showed while at the same time showing the scatter in the X2 dimension around the organization average. For larger data sets with more organizations in them (so that perhaps several organizations would have the same (or at any rate indistinguishable, at the resolution of the plotting device used) turnover rate), you could generate a letter-plot (MINITAB command: LPLOT), using the organization ID in X1 as a labelling variable.

Brett's original post presented this data structure:

> A colleague has a data set with a structure like the one below:
>
> ID  X1  X2    Y
> 1   1   0.70  0.40
> 2   1   0.80  0.40
> 3   1   0.65  0.40
> 4   2   1.20  0.25
> 5   2   1.10  0.25
> 6   3   0.90  0.30
> 7   4   0.50  0.50
> 8   4   0.60  0.50
> 9   4   0.70  0.50
>
> Where X1 is the organization. X2 is the percent of market salary an
> employee within the organization is paid -- i.e. ID 1 makes 70% of the
> market salary for their position and the local economy. And Y is the
> annual overall turnover rate in the organization, so it is constant
> across individuals within the organization. There are different
> numbers of employee salaries measured within each organization. The
> goal is to assess the relationship between employee salary (as percent
> of market salary for their position and location) and overall
> organizational turnover rates. How should these data be analyzed? The
> difficulty is that the data are cross level. Not the traditional
> multi-level model however. That there is no variance across
> individuals within an organization on the outcome is problematic. Of
> course, so is aggregating the individual results. How can this be
> modeled both preserving the fact that there is variance within
> organizations and between organizations?

As I understand it (as implied above), this is exactly the kind of structure for which multilevel methods were invented.

> I suggested that this was a repeated measures problem, with repeated
> measurements within the organization; my colleague argued it was not.

This strikes me as a possible approach (repeated measures can be treated as a special case of multilevel modelling). But most software that I know of that would handle repeated-measures ANOVA would tend to insist that there be equal numbers of levels of the repeated-measures factor throughout the design, and this appears not to be the case (your sample data, at any rate, have different numbers of individuals in the several organizations).

> Can this be modeled appropriately with traditional regression models
> at the individual level? That is, ignoring X1 and regressing Y ~ X2.

That was, after a fashion, what Dennis illustrated. In a formal regression analysis, I should think it unnecessary to ignore X1; although it would doubtless be necessary to recode it into a series of indicator-variable dichotomies, or something equivalent.

> It seems to me that this violates the assumption of independence.

Not altogether clear. By this do you mean regression analysis? Or, perhaps, the particular analysis you suggested, ignoring X1? Or...? And what assumption of independence are you referring to? (At any rate, what such assumption would not be violated in other formal analyses, e.g. repeated-measures ANOVA?)

> Certainly, the percent of market salary that an employee is paid is
> correlated between employees within an organization (taking into
> account things like tenure, previous experience, etc.).

Well, would the desired model take such things into account? (If not, why not? If so, where is the problem that I rather vaguely sense lurking between the lines here?)
-- Don.
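Don's "indicator-variable dichotomies" remark takes only a couple of lines; a sketch assuming the X1 column from Brett's example, dropping the first organization as the reference category:

```python
import numpy as np

x1 = np.array([1, 1, 1, 2, 2, 3, 4, 4, 4])   # organization IDs

# One indicator column per organization except the first, which serves
# as the reference category (avoids collinearity with an intercept)
cats = np.unique(x1)
dummies = (x1[:, None] == cats[1:]).astype(float)
print(dummies.shape)   # (9, 3): 9 employees, 3 indicator columns
```

One caution that echoes the thread: because Y is constant within organization, these dummies alone reproduce Y exactly in a regression, leaving X2 nothing to explain at the individual level -- one more sign that the organization, not the employee, is the natural unit of analysis here.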
Re: Fw: statistics question
In article 003101c0bea9$31b26820$[EMAIL PROTECTED], [EMAIL PROTECTED] wrote:

> Hi, The below question was on my Doctorate Comprehensives in Education
> at the University of North Florida. Would one of you learned scholars
> pop me back with possible appropriate answers. Carmen Cummings
>
> An educational researcher was interested in developing a predictive
> scheme to forecast success in an elementary statistics course at a
> local university. He developed an instrument with a range of scores
> from 0 to 50. He administered this to 50 incoming freshmen signed up
> for the elementary statistics course, before the class started. At the
> end of the semester he obtained each of the 50 students' final
> averages. Describe an appropriate design to collect data to test the
> hypothesis.

What design? The data is already collected, assuming that the data matches the scores on the prediction instrument and the final result of the student. What hypothesis? The hypotheses and the assumptions come from the user of statistics alone; the learned scholars, as statisticians, should only try to extract these from the user, and to point out which assumptions are important and which are of little importance. For example, normality is usually of secondary importance, and is usually quite false, while the assumptions about the structure are of major importance.

--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED]  Phone: (765)494-6054  FAX: (765)494-0558
Fw: statistics question
Hi, The below question was on my Doctorate Comprehensives in Education at the University of North Florida. Would one of you learned scholars pop me back with possible appropriate answers.

Carmen Cummings

----- Original Message -----
From: "Carmen Cummings" [EMAIL PROTECTED]
To: "David Cummings" [EMAIL PROTECTED]
Sent: Thursday, April 05, 2001 4:38 PM
Subject: statistics question

An educational researcher was interested in developing a predictive scheme to forecast success in an elementary statistics course at a local university. He developed an instrument with a range of scores from 0 to 50. He administered this to 50 incoming freshmen signed up for the elementary statistics course, before the class started. At the end of the semester he obtained each of the 50 students' final averages.

Describe an appropriate design to collect data to test the hypothesis.
Re: Fw: statistics question
I reformatted this. Quoting a letter from Carmen Cummings to himself, on 6 Apr 2001 08:48:38 -0700, [EMAIL PROTECTED] wrote:

> The below question was on my Doctorate Comprehensives in Education at
> the University of North Florida. Would one of you learned scholars pop
> me back with possible appropriate answers.

the question:

> An educational researcher was interested in developing a predictive
> scheme to forecast success in an elementary statistics course at a
> local university. He developed an instrument with a range of scores
> from 0 to 50. He administered this to 50 incoming freshmen signed up
> for the elementary statistics course, before the class started. At the
> end of the semester he obtained each of the 50 students' final
> averages. Describe an appropriate design to collect data to test the
> hypothesis.

= end of cite.

I hope the time of the Comprehensives is past. Anyway, this might be better suited for facetious answers than serious ones.

The "appropriate design," in the strong sense: Consult with a statistician IN ORDER TO "develop an instrument." Who decided only a single dimension should be of interest? (How else does one interpret a score with a "range" from 0 to 50?) Consult with a statistician BEFORE administering something to -- selected? unselected? -- freshmen; and consult (perhaps) in order to develop particular hypotheses worth testing.

I mean, the kids scoring over 700 on Math SATs will ace the course, and the kids under 400 will have trouble. Generalizing, of course. If "final average" (as suggested) is the criterion, instead of "learning." But you don't need a new study to tell you those results.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
Re: Paired t test Question
"Andrew L." wrote:

> I am analysing some data and want to administer a paired t test.
> Although I can perform the test, I am not totally familiar with the
> t-test. Can anyone tell me whether the test relies on having a large
> number of samples, or whether I can still relate an accurate answer
> from n=4 (n = number of participants). Also, does anyone know what the
> F stands for - I think it means F-test. What is this test designed to
> show?

I think you should definitely get a basic introductory book on statistics and brush up on your statistical knowledge. In regards to your specific questions: the accuracy of your results doesn't really depend on the sample size, but the precision does. Your comparison of the means (You do want to compare means, don't you? You didn't actually say that...) will not be very precise with just 4 samples. F may stand for an F-test, and it may stand for a lot of other things; I don't normally associate doing an F-test with a paired t-test.

So I would advise, based upon your questions: don't just mechanically crank a paired t-test through whatever software you have ... sit down with someone who knows statistics and explain your entire problem to him or her, and find out if a paired t-test is the right thing to do, and how a sample size of 4 affects your comparison of the means.

--
Paige Miller
Eastman Kodak Company
[EMAIL PROTECTED]
"It's nothing until I call it!" -- Bill Klem, NL Umpire
"Those black-eyed peas tasted all right to me" -- Dixie Chicks
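Paige's point about n = 4 is easy to see in the mechanics of the test; a sketch with invented scores, showing the paired t statistic and its 3 degrees of freedom:

```python
import numpy as np

# Invented before/after scores for 4 participants (illustration only)
before = np.array([10.0, 12.0, 9.0, 11.0])
after  = np.array([12.0, 13.0, 11.0, 12.0])

d = after - before                        # paired differences
n = len(d)                                # 4 participants -> df = n - 1 = 3
t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
print(n - 1, round(t, 2))                 # df = 3, t = 5.2
```

With only 3 degrees of freedom the two-tailed 5% critical value is about 3.18 (versus roughly 2.0 for large samples), so only large, consistent differences -- like the invented ones here -- will reach significance.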
Easy question
Hi folks,

I have this problem at hand: Suppose I went to 10 lakes. I want to measure the relation between water temperature (WT) and air temperature (AT). So I can do a regression with these 10 points, plotting AT against WT. However, to be sure, I took 3 AT's and 3 WT's at each lake. Now any particular AT reading is not paired with any particular WT reading; instead, there is error in both the X and Y directions. Can somebody show me a better way to analyze this? I prefer talking in SAS or SAS macro. Here is a hypothetical data sheet:

Lake,  WT, AT
Lake1, 10, 15
Lake1, 11, 14
Lake1, 12, 13
...

Notice there is no relation between the WT and AT readings. I could record it this way too:

Lake,  WT, AT
Lake1, 10, 13
Lake1, 11, 14
Lake1, 12, 15
...

Thanks in advance. Best regards,
W
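One simple way to respect W's structure is to collapse to per-lake means before regressing, since the individual readings are not paired within a lake. A sketch in Python rather than SAS, with simulated readings (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(1)
n_lakes, n_reps = 10, 3

# Simulate 3 unpaired AT and WT readings per lake around a per-lake truth
true_at = rng.uniform(10, 30, n_lakes)
at = true_at[:, None] + rng.normal(0, 0.5, (n_lakes, n_reps))
wt = (0.8 * true_at + 2)[:, None] + rng.normal(0, 0.5, (n_lakes, n_reps))

# Readings are not paired, so regress lake mean AT on lake mean WT
at_mean = at.mean(axis=1)
wt_mean = wt.mean(axis=1)
slope, intercept = np.polyfit(wt_mean, at_mean, 1)
print(slope)   # should land near 1.25, the reciprocal of the 0.8 above
```

Because the lake means still carry measurement error in both variables, the ordinary least-squares slope is slightly attenuated; the three replicates per lake supply error-variance estimates, so an errors-in-variables fit (e.g. Deming regression) could correct for that if the bias matters.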
Re: Trend analysis question: follow-up
On 5 Mar 2001 16:41:22 -0800, [EMAIL PROTECTED] (Donald Burrill) wrote:

> On Mon, 5 Mar 2001, Philip Cozzolino wrote in part:
>
> > Yeah, I don't know why I didn't think to compute my eta-squared on
> > the significant trends. As I said, trend analysis is new to me
> > (psych grad student) and I just got startled by the results. The
> > "significant" 4th and 5th order trends only account for 1% of the
> > variance each, so I guess that should tell me something. The linear
> > trend accounts for 44% and the quadratic accounts for 35% more, so
> > 79% of the original 82% omnibus F (this is all practice data). I
> > guess, if I am now interpreting this correctly, the quadratic trend
> > is the best solution.
>
> Well, now, THAT depends in part on what the spectrum of candidate
> solutions is, doesn't it? For all that what you have is "practice
> data", I cannot resist asking: Are the linear and quadratic components
> both positive, and is the overall relationship monotonically
> increasing? Then, would the context have an interesting interpretation
> if the relationship were exponential? Does plotting
[ snip, rest ]

"Interesting interpretation" is important. In this example, the interest (probably) lies mainly with the variance explained: in the linear and quadratic. It's hard for me to be highly interested in an order-5 polynomial, and sometimes a quadratic seems unnecessarily awkward. What you want is the convenient, natural explanation.

If "baseline" is far different from what follows, that will induce a bunch of high-order terms if you insist on modeling all the periods in one repeated measures ANOVA. A sensible interpretation in that case might be to describe the "shock effect" and separately describe what happened later.

Example: The start of psychotropic medications has a huge, immediate, "normalizing" effect on some aspects of sleep of depressed patients (sleep latency, REM latency, REM time, etc.). Various changes *after* the initial jolt can be described as no-change; continued improvement; or return toward the initial baseline. In real life, linear trends worked fine for describing the on-meds followup observation nights (with - not accidentally - increasing intervals between them).

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
Re: Trend analysis question
"Philip Cozzolino" [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]...

> However, after the cubic non-significant finding, the 4th and 5th
> order trends are significant. Intuitively, it seems that if there is
> no cubic trend of significance, there will not be any higher order
> trend, but this is relatively new to me.

Hi Philip. In a trend analysis, each test is orthogonal to (independent of) the other tests, so the results reported are quite reasonable. Admittedly, in my experience at least, it's a little unusual to have 4 out of the 5 trends significant, but such a finding does not indicate any problem with the analysis. Are there equal intervals between the six levels of your factor?

Robert
Re: basic stats question
In article [EMAIL PROTECTED], Richard A. Beldin [EMAIL PROTECTED] wrote:

> You missed the point, Herman. I don't assert that these are
> independent random variables. I claim that introducing students to the
> concept of independent sample spaces from which we construct a
> cartesian product sample space will make it easier for them to
> understand independent events and random variables when we define them
> later.

I believe that this will not do what is expected, and might even make it worse. When we introduce sample spaces, we do not, and should not, introduce the probabilities at that time. If we did, we could not have inference; and also I believe that we need to get across the idea that there is no "right" sample space for a problem, but merely adequate representations; the point in a sample space can represent the result of the experiment under consideration, but we might have more. Otherwise, how can we consider the number of successes to be a real-valued random variable?

Sample spaces can be Cartesian products without the coordinates being independent; whenever we have a bivariate classification, we have a Cartesian product, whether or not there is independence. We do not want students to consider race and lactose intolerance to be independent. Presenting oversimplified special cases seems to make it harder for people to understand.

I deliberately postpone all considerations of symmetry or equally likely, as students (and also those using probability and statistics) have a major tendency to impose this when it is very definitely not the case. The "principle of insufficient reason" contributed to the demise of Bayesian statistics in the 19th century, and I see it going strong now.

--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED]  Phone: (765)494-6054  FAX: (765)494-0558
Re: basic stats question
In article 52jo6.114$[EMAIL PROTECTED], Milo Schield [EMAIL PROTECTED] wrote:

> But what does this (in)dependence really mean? Can it change on
> conditioning? . This seems related to Simpson's paradox. In any event,
> it seems that independence can be conditional. Is this so? If so,
> where is this discussed in more detail?

Why does it have to be discussed in more detail? Conditional probability is probability.

--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED]  Phone: (765)494-6054  FAX: (765)494-0558
Re: Trend analysis question
Philip has been unfortunate enough to get significance on his 4th and 5th order trends, and is hoping that nonsignificance of the 3rd order trend means the higher order trends are spurious. Sorry, no. Consider a perfect quadratic relationship -- there will be absolutely no linear component.

I wonder if one should even test for trends of an order that one could not interpret. They will always be present in some magnitude and, given sufficient sample size, will be "significant." It might help to compute eta-squared (divide the trend SS by the total SS) and then use that statistic to decide whether you can dismiss the "significant trend" as trivial in magnitude -- I have generally been able to do so when encountering such higher order trends that defy interpretation but meet our criterion of statistical significance.

++
Karl L. Wuensch, Department of Psychology,
East Carolina University, Greenville NC 27858-4353
Voice: 252-328-4102  Fax: 252-328-6283
[EMAIL PROTECTED]
http://core.ecu.edu/psyc/wuenschk/klw.htm
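Karl's recipe (trend SS divided by total SS) is mechanical once the orthogonal polynomial contrasts are in hand; a sketch for six equally spaced levels with equal cell sizes, using a QR decomposition to build the contrasts (cell means invented, chosen to be exactly quadratic so the higher trends vanish):

```python
import numpy as np

k, n = 6, 10                       # 6 levels, 10 observations per cell

# Orthonormal polynomial contrasts for equally spaced levels:
# QR-decompose the Vandermonde matrix [1, x, x^2, ...] and drop the
# constant column; the rest are linear, quadratic, ..., quintic.
levels = np.arange(k, dtype=float)
Q, _ = np.linalg.qr(np.vander(levels, k, increasing=True))
contrasts = Q[:, 1:]

# Cell means that follow an exact quadratic in the level index
means = 2.0 + 0.5 * levels + 0.3 * levels ** 2

# Sum of squares captured by each trend (equal n per cell; the
# contrasts are unit-length, so SS_k = n * (c_k . means)^2)
ss_trend = n * (contrasts.T @ means) ** 2
ss_between = n * ((means - means.mean()) ** 2).sum()

# With no within-cell error in this toy setup, SS_total = SS_between
eta_sq = ss_trend / ss_between
print(np.round(eta_sq, 3))   # linear + quadratic shares sum to 1;
                             # cubic, quartic, quintic are all 0
```

This also illustrates Karl's "perfect quadratic" remark in reverse: the cubic and higher eta-squared values are exactly zero here, while in real data with sampling error they would be small but nonzero -- and, with enough observations, "significant."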
Re: Trend analysis question - Thanks
Thanks Donald and Karl for your responses... Yeah, I don't know why I didn't think to compute eta-squared on the significant trends. As I said, trend analysis is new to me (psych grad student) and I just got startled by the results. The "significant" 4th- and 5th-order trends only account for 1% of the variance each, so I guess that should tell me something. The linear trend accounts for 44% and the quadratic accounts for 35% more, so 79% of the original 82% omnibus F (this is all practice data). I guess, if I am now interpreting this correctly, the quadratic trend is the best solution. Thanks again for your help, -Philip --- "If we knew what we were doing, it wouldn't be called research, would it?" -Albert Einstein in article [EMAIL PROTECTED], Philip Cozzolino at [EMAIL PROTECTED] wrote on 3/3/01 7:23 PM: Hi, I have a question on how to interpret a specific trend analysis summary table. The IV has 6 levels, so I had SPSS run the analysis checking up to the 5th-order trend. There is a significant linear and quadratic trend, but not cubic. However, after the cubic non-significant finding, the 4th- and 5th-order trends are significant. Intuitively, it seems that if there is no cubic trend of significance, there will not be any higher-order trend, but this is relatively new to me. Any help is greatly appreciated. -Philip
Re: basic stats question
But what does this (in)dependence really mean? Can it change on conditioning? Suppose that we take into account a plausible confounder: defective equipment. Suppose blacks are more likely to have "defective equipment" (broken light, etc.). Suppose we find that the percentage who are black among those stopped for defective equipment is the same as the percentage who are black among those having defective equipment. Now we have independence at one level and non-independence at another. This seems related to Simpson's paradox. In any event, it seems that independence can be conditional. Is this so? If so, where is this discussed in more detail? "Lise DeShea" [EMAIL PROTECTED] wrote in message [EMAIL PROTECTED]">news:[EMAIL PROTECTED]... Re probability/independence, I've found that the most effective way to communicate this concept to my students (College of Education, not heavily math-oriented) is the following: SNIP Then you can move to an example of racial profiling. Out of all the people in your city who drive, what proportion are African-American? [p(African-American).] Now, GIVEN that you look only at drivers who are pulled over, what proportion of these people are African-American? [p(African-American|pulled over).] If being black and being pulled over are independent events, then the probabilities should be equal. You can illustrate this graphically by drawing a large box to represent all the drivers, then mark the proportion representing African-American drivers. Then draw a smaller box representing the people being pulled over, with a proportion of the box marked to represent the African-American drivers who are pulled over. If the proportions of each box are equal, then the events are independent. So now, I would welcome comments from the more mathematically/statistically rigorous list members among us! ~~~ Lise DeShea, Ph.D. 
Assistant Professor Educational and Counseling Psychology Department University of Kentucky 245 Dickey Hall Lexington KY 40506 Email: [EMAIL PROTECTED] Phone: (859) 257-9884
Re: Trend analysis question
On Sun, 4 Mar 2001, Philip Cozzolino wrote in part: However, after the cubic non-significant finding, the 4th and 5th order trends are significant. Intuitively, it seems that if there is no cubic trend of significance, there will not be any higher order trend, but this is relatively new to me. Your intuition is, in this case, incorrect. The five trends are mutually independent in the sense that any combination of them may be operating. (I am for the moment accepting the implied premise that a power function of the IV is a reasonable function to try to fit to your data. In most instances I know of, this is not "really" the case, and the power function is more usefully thought of as an approximation to whatever the "real" functionality is.) This may be seen by considering the following relationships between Y and X (think of them as DV and IV if you wish): [Two ASCII scatterplots, garbled in archiving: plot I showed a U-shaped, purely quadratic relation of Y on X; plot II showed an S-shaped, purely cubic relation.] In I. above, the linear trend is approximately zero, and the quadratic component of X accounts for nearly all the variation in Y. A "rule" that claimed "If the linear trend is insignificant there can be no significant quadratic trend" is clearly false in this case. In II. above, both the linear and quadratic components of trend are virtually zero -- certainly insignificant -- and the cubic component accounts for nearly all the variation in Y. Similar situations can be imagined, where only the quartic, or only the quintic, or only the linear, quadratic, and quartic, or any other arbitrary combination of the basic trends is significant, and other components are not. 
If you are carrying out your trend analysis by using orthogonal polynomials (as you probably should be), try constructing the model derived from your linear + quadratic fit only, and plot those as predicted values against X; then construct the model derived from linear + quadratic + quartic + quintic, and plot those predicted values against X. You may find it illuminating also to plot the residuals in each case against X, especially if you force the same vertical scale on the two sets of residuals. I note in passing that you haven't stated how much of the variance of Y is accounted for by each of the significant components, nor how much residual variance there is after each component is entered. That also might be illuminating. -- DFB. -- Donald F. Burrill [EMAIL PROTECTED] 348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED] MSC #29, Plymouth, NH 03264 (603) 535-2597 Department of Mathematics, Boston University [EMAIL PROTECTED] 111 Cummington Street, room 261, Boston, MA 02215 (617) 353-5288 184 Nashua Road, Bedford, NH 03110 (603) 471-7128
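Karl's eta-squared suggestion and Donald's orthogonal-polynomial advice can be sketched numerically. The sketch below uses invented data (not Philip's), builds orthogonal polynomial contrasts for a 6-level IV via a QR decomposition, and reports the proportion of total variance each trend accounts for; with a purely quadratic truth, only the quadratic eta-squared should be appreciable:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 6                                   # number of IV levels
n = 20                                  # per-cell sample size
levels = np.repeat(np.arange(k), n)
y = 4 * (levels - 2.5)**2 + rng.normal(0, 3, levels.size)   # quadratic truth

# Orthogonal polynomial contrasts via QR of a Vandermonde matrix:
# column j of Q (for j >= 1) is the degree-j trend contrast.
X = np.vander(np.arange(k), k, increasing=True)
Q, _ = np.linalg.qr(X)
contrasts = Q[:, 1:]                    # linear, quadratic, ..., quintic

cell_means = np.array([y[levels == j].mean() for j in range(k)])
ss_total = ((y - y.mean())**2).sum()

# eta-squared per trend: SS_contrast / SS_total (equal cell sizes assumed)
eta2 = [n * (c @ cell_means)**2 / (c @ c) / ss_total for c in contrasts.T]
for order, e in enumerate(eta2, start=1):
    print(f"order {order}: eta^2 = {e:.3f}")
```

The simulated quadratic trend swamps the others, which pick up only noise-level variance, illustrating Karl's point that a "significant" high-order trend can still be trivial in magnitude.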
Re: basic stats question
I think that introducing the word "independent" as a descriptor of sample spaces and then carrying it on to the events in the product space is much less likely to generate the confusion due to the common informal descriptions "Independent events don't have anything to do with each other" and "Mutually exclusive events can't happen together." I like Dick's idea a lot. To me, part of the problem is that textbooks fail to distinguish independence as a mathematical construct from independence as a modeling construct. Too many intro books put their expository effort into the mathematical definition, and then get obfuscatorily circular when it comes to the examples. Mathematicians *assume* independence, statisticians look at the data, and textbooks fail to recognize the difference. Dick's approach gives a nice way, in an elementary setting, to help students recognize situations where an assumption of independence is likely to stand up to empirical scrutiny. I agree, too, Dick, that this should help with mutually exclusive vs. independent. George Cobb George W. Cobb Mount Holyoke College South Hadley, MA 01075 413-538-2401
Re: basic stats question
The suits and ranks of cards in a bridge deck certainly can be presented as independent sample spaces which we use as components of a cartesian product. Whether one does so or not is a matter of choice. I am on record as favoring the presentation as the cartesian product. Even the sample mean and variance can be seen this way; in fact, every vector-valued random variable can be cast in the form of a random vector from a cartesian product. My point is that if we introduce independence as an attribute of sample spaces which we proceed to study as one, we can better motivate the idea of independent random variables and independent events. -- Richard A. Beldin, Professional Statistician (retired), BELDIN Consulting Services
Re: Satterthwaite-newbie question
On Tue, 27 Feb 2001, Allyson Rosen wrote: I need to compare two means with unequal n's. Hayes (1994) suggests using a formula by Satterthwaite, 1946. I'm about to write up the paper and I can't find the full reference ANYWHERE in the book or in any databases or in my books. Is this an obscure test and should I be using another? Perhaps it refers to: F. E. Satterthwaite, 1946: An approximate distribution of estimates of variance components. Biometrics Bulletin, 2, 110-114. According to Casella & Berger (1990, pp. 287-9), "this approximation is quite good, and is still widely used today." However, it still may not be valid for your specific analysis: I suggest reading the discussion in Casella & Berger ("Statistical Inference", Duxbury Press, 1990). There are more commonly used methods for comparing means with unequal n available, and you should make sure that they can't be used in your problem before resorting to Satterthwaite.
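For what it's worth, the Satterthwaite (1946) approximation is what most software applies in the unequal-variance "Welch" t test. A minimal sketch with made-up data (scipy's `ttest_ind` with `equal_var=False` uses this df approximation):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
a = rng.normal(10, 2, 15)    # smaller sample, smaller variance
b = rng.normal(10, 5, 40)    # larger sample, larger variance

# Welch's t test: no equal-variance assumption
t, p = stats.ttest_ind(a, b, equal_var=False)

# Satterthwaite's approximate degrees of freedom, computed by hand
va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
df = (va + vb)**2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
print(f"t = {t:.3f}, p = {p:.3f}, Satterthwaite df = {df:.1f}")
```

The approximate df always falls between the smaller of (n1 - 1, n2 - 1) and n1 + n2 - 2, shrinking toward the smaller sample as the variances diverge.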
Re: basic stats question
Re probability/independence, I've found that the most effective way to communicate this concept to my students (College of Education, not heavily math-oriented) is the following: Consider the student population of your university. Perhaps there is a fairly equal split of males and females in the student body. Now, put a condition upon the student body -- only those majoring in, say, psychology. Do you find the same proportion of students who are male within only psych majors, compared with the proportion of students in the entire student body who are male? If gender and psych major are independent, then the probability of a randomly chosen person at the university being male should equal the probability of a randomly chosen psych major being male. That is, p(male) = p(male|psych major) == (probability of male, given that you're looking at psych majors). Then you can move to an example of racial profiling. Out of all the people in your city who drive, what proportion are African-American? [p(African-American).] Now, GIVEN that you look only at drivers who are pulled over, what proportion of these people are African-American? [p(African-American|pulled over).] If being black and being pulled over are independent events, then the probabilities should be equal. You can illustrate this graphically by drawing a large box to represent all the drivers, then mark the proportion representing African-American drivers. Then draw a smaller box representing the people being pulled over, with a proportion of the box marked to represent the African-American drivers who are pulled over. If the proportions of each box are equal, then the events are independent. So now, I would welcome comments from the more mathematically/statistically rigorous list members among us! ~~~ Lise DeShea, Ph.D. Assistant Professor Educational and Counseling Psychology Department University of Kentucky 245 Dickey Hall Lexington KY 40506 Email: [EMAIL PROTECTED] Phone: (859) 257-9884
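Lise's box-area picture amounts to checking p(A) = p(A|B) in a two-way table. A toy version with invented campus counts (purely illustrative numbers, not real data), using exact fractions so the equality check is not muddied by rounding:

```python
from fractions import Fraction

# Hypothetical counts of (gender, major) -- made-up numbers for illustration
students = {("male", "psych"): 30, ("male", "other"): 470,
            ("female", "psych"): 60, ("female", "other"): 440}

total = sum(students.values())
p_male = Fraction(students[("male", "psych")] + students[("male", "other")], total)

psych_total = students[("male", "psych")] + students[("female", "psych")]
p_male_given_psych = Fraction(students[("male", "psych")], psych_total)

# Independence of "male" and "psych major" means the two probabilities match
independent = (p_male == p_male_given_psych)
print(p_male, p_male_given_psych, independent)
```

Here p(male) = 1/2 but p(male|psych major) = 1/3, so gender and major are dependent in this hypothetical table; equal fractions would have signalled independence, exactly as in the equal-proportion boxes.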
Re: basic stats question
In article [EMAIL PROTECTED], Richard A. Beldin [EMAIL PROTECTED] wrote: I have long thought that the usual textbook discussion of independence is misleading. In the first place, the most common situation where we encounter independent random variables is with a cartesian product of two independent sample spaces. Example: I toss a die and a coin. I have reasonable assumptions about the distributions of events in either case and I wish to discuss joint events. I have tried in vain to find natural examples of independent random variables in a sample space not constructed as a cartesian product. I think that introducing the word "independent" as a descriptor of sample spaces and then carrying it on to the events in the product space is much less likely to generate the confusion due to the common informal descriptions "Independent events don't have anything to do with each other" and "Mutually exclusive events can't happen together." Comments? The usual definition of "independence" is a computational convenience, but an atrocious definition. A far better way to do it, which conveys the essence, is to use conditional probability. Random variables, or more generally partitions, are independent if, given any information about some of them, the conditional probability of any event formed from the others is the same as the unconditional probability. This is the way it is used. As for a "natural" example not coming from a Cartesian product, consider drawing a hand from an ordinary deck of cards. On another newsgroup, someone asked for a proof that the number of aces and the number of spades was uncorrelated; they are not independent. The proof I posted used that for the i-th and j-th cards dealt, the rank of the i-th card and the suit of the j-th are independent. 
For i=j, this can be looked upon as a product space, but not for i and j different. There are other examples. The independence of the sample mean and sample variance in a sample from a normal distribution is certainly an important example. The independence of the various sample variances in an ANOVA model is another. The independence for each t of X(t) and X'(t) in a stationary differentiable Gaussian process is another. This is thrown together off the cuff. There are lots of others. -- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
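Herman's aces-and-spades example can be checked by brute force for two-card hands: enumerating all C(52,2) = 1326 equally likely hands (my enumeration, not the proof Herman posted) shows the counts of aces and of spades have exactly zero covariance, yet fail independence:

```python
import itertools

# Deck as (rank, suit) pairs: rank 0 = ace, suit 0 = spades
deck = [(r, s) for r in range(13) for s in range(4)]
hands = list(itertools.combinations(deck, 2))   # all 2-card hands
n = len(hands)                                  # C(52, 2) = 1326

aces   = [sum(r == 0 for r, _ in hand) for hand in hands]
spades = [sum(s == 0 for _, s in hand) for hand in hands]

# Integer-arithmetic covariance numerator: n*sum(AS) - sum(A)*sum(S).
# Zero here means exactly uncorrelated, no floating-point doubt.
cov_numer = n * sum(a * s for a, s in zip(aces, spades)) - sum(aces) * sum(spades)

# But not independent: two aces that are both spades is impossible,
# while P(two aces) > 0 and P(two spades) > 0.
both_count = sum(a == 2 and s == 2 for a, s in zip(aces, spades))
print(cov_numer, both_count)
```

Since P(A=2, S=2) = 0 while P(A=2) and P(S=2) are both positive, the joint probability cannot factor, so zero correlation here coexists with dependence.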
Satterthwaite-newbie question
I need to compare two means with unequal n's. Hayes (1994) suggests using a formula by Satterthwaite, 1946. I'm about to write up the paper and I can't find the full reference ANYWHERE in the book or in any databases or in my books. Is this an obscure test and should I be using another? Thanks, Allyson
RE: Sample size question
G*Power is a power analysis package that is freely available. You can download it at: http://www.psychologie.uni-trier.de:8000/projects/gpower.html You can calculate a sample size for a given effect size, alpha level, and power value. -Original Message- From: Scheltema, Karen [mailto:[EMAIL PROTECTED]] Sent: Friday, February 23, 2001 10:07 AM To: [EMAIL PROTECTED] Subject: Sample size question Can anyone point me to software for estimating ANCOVA or regression sample sizes based on effect size? Karen Scheltema Statistician HealthEast Research and Education 1700 University Ave W St. Paul, MN 55104 (651) 232-5212 fax (651) 641-0683 [EMAIL PROTECTED]
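For the regression case specifically, the kind of computation such packages perform can be sketched directly from the noncentral F distribution, with Cohen's f^2 as the effect size. This is my own sketch, not G*Power's algorithm: it assumes the overall-model F test and the noncentrality convention lambda = f^2 * N, and the function name is made up:

```python
from scipy import stats

def regression_sample_size(f2, n_predictors, alpha=0.05, power=0.80):
    """Smallest total N whose overall regression F test reaches the
    target power, for effect size f^2 = R^2 / (1 - R^2) (Cohen's f^2)."""
    for n in range(n_predictors + 3, 10_000):
        df1, df2 = n_predictors, n - n_predictors - 1
        nc = f2 * n                              # noncentrality: lambda = f^2 * N
        crit = stats.f.ppf(1 - alpha, df1, df2)  # critical value under H0
        achieved = stats.ncf.sf(crit, df1, df2, nc)  # power under H1
        if achieved >= power:
            return n

n_needed = regression_sample_size(f2=0.15, n_predictors=3)
print(n_needed)   # N for a medium effect, 3 predictors, alpha .05, power .80
```

Larger effects need fewer cases, e.g. `regression_sample_size(f2=0.35, n_predictors=3)` returns a smaller N than the medium-effect call above.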
RE: Sample size question
Thanks! This was exactly what I was looking for! Karen Scheltema Statistician HealthEast Research and Education 1700 University Ave W St. Paul, MN 55104 (651) 232-5212 fax (651) 641-0683 [EMAIL PROTECTED] -Original Message- From: Magill, Brett [SMTP:[EMAIL PROTECTED]] Sent: Friday, February 23, 2001 9:53 AM To: 'Scheltema, Karen'; [EMAIL PROTECTED] Subject: RE: Sample size question G*Power is a power analysis package that is freely available. You can download it at: http://www.psychologie.uni-trier.de:8000/projects/gpower.html You can calculate a sample size for a given effect size, alpha level, and power value. -Original Message- From: Scheltema, Karen [mailto:[EMAIL PROTECTED]] Sent: Friday, February 23, 2001 10:07 AM To: [EMAIL PROTECTED] Subject: Sample size question Can anyone point me to software for estimating ANCOVA or regression sample sizes based on effect size? Karen Scheltema Statistician HealthEast Research and Education 1700 University Ave W St. Paul, MN 55104 (651) 232-5212 fax (651) 641-0683 [EMAIL PROTECTED]
Re: Sample size question
You can use Sample Power from SPSS (a.k.a. Power and Precision) or PASS 2000 from NCSS. For more info, please visit: http://www.spss.com http://www.ncss.com http://seamonkey.ed.asu.edu/~alex/teaching/WBI/power_es.html --- --"Regression to the mean" is not always true. After 30, my weight never regresses to the mean. Chong-ho (Alex) Yu, Ph.D., MCSE, CNE Academic Research Professional/Manager Educational Data Communication, Assessment, Research and Evaluation Farmer 418 Arizona State University Tempe AZ 85287-0611 Email: [EMAIL PROTECTED] URL: http://seamonkey.ed.asu.edu/~alex/
RE: Sample size question
I tried the site but received errors trying to download it. It couldn't find the FTP site. Has anyone else been able to access it? Karen Scheltema Statistician HealthEast Research and Education 1700 University Ave W St. Paul, MN 55104 (651) 232-5212 fax (651) 641-0683 [EMAIL PROTECTED] -Original Message- From: Chuck Cleland [SMTP:[EMAIL PROTECTED]] Sent: Friday, February 23, 2001 11:04 AM To: [EMAIL PROTECTED] Subject: Re: Sample size question "Scheltema, Karen" wrote: Can anyone point me to software for estimating ANCOVA or regression sample sizes based on effect size? Look here: http://www.interchg.ubc.ca/steiger/r2.htm Chuck Karen, I just looked, and was able to access the site and download the files. Dan Nordlund
Re: Sample size question
On 23 Feb 2001 12:08:45 -0800, [EMAIL PROTECTED] (Scheltema, Karen) wrote: I tried the site but received errors trying to download it. It couldn't find the FTP site. Has anyone else been able to access it? As of a few minutes ago, it downloaded fine for me, when I clicked on it with Internet Explorer. The .zip file expanded okay. I used right-click (I just learned that last week) in order to download the .pdf version of the help. [ ... ] Earlier Q and Answer "Can anyone point me to software for estimating ANCOVA or regression sample sizes based on effect size?" Look here: http://www.interchg.ubc.ca/steiger/r2.htm Hmm. Placing limits on R^2. I haven't read the accompanying documentation. On the general principle that you can't compute power if you don't know what power you are looking for, I suggest reading the relevant chapters in Jacob Cohen's book (1988+ edition). -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html