This seems to me a rather serious issue, and one that comes up more frequently than it should. Let's assume the treatments were applied at random to the plots. There are two options with regard to pre-existing conditions. One is to apply the treatments at random and simply remain blind to any existing variation among the plots. This approach relies on randomization to disperse the treatments adequately across the existing variation and so deconfound existing variation and applied treatment. It is statistically valid and quite common, but it does have its risks. The risk is that things can go "wrong" in the randomization process and confounding does occur; you simply don't know it, and you must interpret your results under the assumption that treatment was the only factor that varied systematically. Treatment effects can be washed out, or the observed variation may be driven by the unmeasured variation in pre-existing conditions. You are nonetheless statistically justified in accepting the results as they present themselves. They may simply be wrong. Prior knowledge of variation among plots is important in the decision to go this route.
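To give a feel for how often plain randomization can go "wrong" in this sense, here is a rough sketch in Python. Everything in it is an assumption made for illustration: the 32 health scores are made-up numbers on a 1 to 5 scale, the four treatments get 8 plots each, and the one-point threshold for a "large" gap is arbitrary. It only counts how often a completely random assignment leaves the treatment groups differing in mean pre-treatment score.

```python
import random
from statistics import mean


def complete_randomization(n_plots, treatments, rng):
    """Assign treatments to plots completely at random, equal replication."""
    reps = n_plots // len(treatments)
    labels = [t for t in treatments for _ in range(reps)]
    rng.shuffle(labels)
    return labels


def health_gap(health, labels):
    """Largest difference between treatment means of the pre-treatment score."""
    groups = {}
    for score, treatment in zip(health, labels):
        groups.setdefault(treatment, []).append(score)
    means = [mean(scores) for scores in groups.values()]
    return max(means) - min(means)


rng = random.Random(1)  # fixed seed, only so the sketch is repeatable
treatments = ["A", "B", "C", "D"]

# Hypothetical health scores (1 = least healthy, 5 = most healthy); not real data.
health = [rng.randint(1, 5) for _ in range(32)]

# Re-randomize many times and record the imbalance each time.
gaps = []
for _ in range(2000):
    labels = complete_randomization(len(health), treatments, rng)
    gaps.append(health_gap(health, labels))

# Arbitrary threshold: a gap of one full point on the 1-5 scale.
share_large = sum(g >= 1.0 for g in gaps) / len(gaps)
print(f"median gap between treatment means: {sorted(gaps)[len(gaps) // 2]:.2f}")
print(f"share of randomizations with a gap of at least 1 point: {share_large:.1%}")
```

With real plots one would feed in the actual pre-treatment scores, if they exist, rather than simulated ones.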
The second approach is the one that you took, which is to pre-sample the plots to assess the variation in conditions. The value here is that you have a lot more information you can potentially bring to bear. The point that is most often missed is that it also allows you, and I would suggest requires you, to carefully tune your design to account for the existing variation. The "risk" in taking pre-samples is that if you don't fine-tune your design, you are still "stuck" with the information from the pre-sample. It can't be ignored or simply made to go away. Thus, once you make the decision to pre-sample plots, it is CRITICAL to use that variation in the assignment of treatments to assure adequate dispersion of treatments across the existing variation. I think the proper approach here would have been to block the plots by seedling health and then randomly apply the treatments within blocks. Alternatively, one could use a stratified random assignment of plots, though this limits the ability to extract information. The covariate approach ONLY works if there is no confounding of pre-existing variation and treatment (and no interaction).

I hate to be dour, but I'm not sure I see a way out of this situation. Can you really hope to determine whether it is treatment or initial seedling health that is driving the results? One would have to know more of the details, but either way the robustness of the results that typically derive from an experiment is seriously compromised. Had you blocked by seedling condition, you could look at the effect of seedling health, treatment, and their interaction. I think the most frustrating thing in such situations is that one ends up thwarted by one's own best intentions.

On 11/11/10 3:04 PM, "Jing Luo" <luoj...@gmail.com> wrote:

Dear All,

I have a question about including covariates in the ANOVA analysis.
We grew corn seedlings in about 32 field plots and then applied 4 different treatments to study their responses (the plot is the experimental unit). However, we noticed quite large variation in seedling healthiness from plot to plot BEFORE the treatments were applied. So we scored the healthiness from 1 to 5 (least healthy to most healthy) and planned to include this as a covariate in the model. During data analysis, I noticed that healthiness was confounded with treatment, with some treatments applied mostly to the healthy plots and other treatments applied mostly to the unhealthy plots (we could not control that because the treatment for each plot was pre-determined). As a result, the analysis of some of the variables shows some strange patterns, especially when the healthiness covariate was significant in the model. For one variable, for example, the least-squares mean estimates of the four treatments were A=B=C<D if the covariate was NOT included, but became A=B=D<C if the covariate was included in the model. I acknowledge that covariates serve an important role in controlling for factors that were not imposed by the treatment. However, I am just wondering: when the covariate is confounded with treatment and has a significant effect on the results, can we argue that the covariate could be excluded from the model? Have you ever had to deal with a similar situation before? Any thoughts will be appreciated.

Thanks.
Jing Luo
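
The flip in the least-squares means described in the question can be reproduced with a few lines of arithmetic. The sketch below is only an illustration on made-up, noise-free numbers (three plots per treatment, response built as 2 x health plus a treatment effect); none of it comes from the actual experiment. It assumes the usual one-way ANCOVA with a common slope, where the adjusted mean is the raw mean minus the pooled within-treatment slope times the distance of that treatment's mean covariate from the overall mean covariate. The function name adjusted_means is hypothetical.

```python
from statistics import mean


def adjusted_means(data):
    """One-way ANCOVA with a common slope.

    data maps treatment -> list of (covariate, response) pairs.
    Returns (raw means, covariate-adjusted means, pooled within-group slope).
    """
    grand_x = mean([x for pairs in data.values() for x, _ in pairs])
    sxy = 0.0  # pooled within-treatment cross-products
    sxx = 0.0  # pooled within-treatment sum of squares of the covariate
    centers = {}
    for treatment, pairs in data.items():
        xbar = mean([x for x, _ in pairs])
        ybar = mean([y for _, y in pairs])
        centers[treatment] = (xbar, ybar)
        sxy += sum((x - xbar) * (y - ybar) for x, y in pairs)
        sxx += sum((x - xbar) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    raw = {t: ybar for t, (_, ybar) in centers.items()}
    adj = {t: ybar - slope * (xbar - grand_x) for t, (xbar, ybar) in centers.items()}
    return raw, adj, slope


# Made-up data: (health score, response). C sits on the least healthy plots
# and D on the healthiest ones, so health is confounded with treatment.
data = {
    "A": [(2, 4), (3, 6), (4, 8)],
    "B": [(2, 4), (3, 6), (4, 8)],
    "C": [(1, 4), (2, 6), (3, 8)],
    "D": [(3, 6), (4, 8), (5, 10)],
}

raw, adj, slope = adjusted_means(data)
print("raw means:     ", raw)  # A = B = C = 6 < D = 8
print("adjusted means:", adj)  # A = B = D = 6 < C = 8
```

In this toy case the adjusted ordering happens to recover the effect that was built in, but only because the common slope is exactly right and there is no noise or interaction. With real data the arithmetic cannot tell you which ordering to believe, since the adjustment extrapolates each treatment to a health level at which it was barely observed.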