On 12 Aug 2003, Elliot Coups wrote in part:

> My goal is to determine the percentage (and 95% CI) of individuals who
> are current vs. former vs. never smokers within each of the four time
> since diagnosis groups.

In effect, it sounds as though you wish to estimate:

    <%-current-smokers> as a function of <time-since-diagnosis> and <age>,
    <%-former-smokers>  as a function of <time-since-diagnosis> and <age>,
    <%-never-smokers>   as a function of <time-since-diagnosis> and <age>;

and in your current model <time-since-diagnosis> is represented in four categories (or, as some would say, is polychotomized).

Thinking in these terms, you might wish to change from a two-way frequency-table model to a logistic regression model, carried out on three dependent variables (either separately or as a single multivariate model). In that case, your four <time-since-diagnosis> groups can be treated as the levels of a one-way ANOVA (represented in a regression model by three predictors, which might be organized as orthogonal linear, quadratic, and cubic components of <time-since-diagnosis>), and <age> as a linear predictor. (NB: you may also wish to consider modelling quadratic and cubic components of <age>, or some other (possibly more pertinent) nonlinear function of <age>.)

OTOH, if you also have <time-since-diagnosis> in its original form (not categorized, but as numbers from 1 (not 0?) to, say, 25), then you could use that as a linear predictor (and still include quadratic and cubic orthogonal components if you wish), along with <age>.

If I were conducting analyses of either sort, I'd want to start with three scatterplots of <time-since-diagnosis> vs. <age>, one for each of the three smoker-groups [current, former, never]. These would offer some hints as to the probable usefulness of including nonlinear terms (and what kind of nonlinear terms, if useful) in the model.

However, I think both of the above approaches might produce misleading results unless (at least in preliminary analyses) you also include predictors representing the interaction between <time-since-diagnosis> and <age>.

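For concreteness, here is a rough sketch (in Python, standard library only) of how the predictors described above might be coded for one respondent: the three orthogonal trend components for the four <time-since-diagnosis> groups, a centred <age>, and their products as interaction terms. The names GROUPS, CONTRASTS and design_row are invented for this illustration, the respondent is hypothetical, and the contrast weights are the usual ones for four equally spaced levels, which the groups 1-5, 6-10, 11-20, 21+ are not; treat the weights as an assumption, not a recommendation.

```python
# Orthogonal polynomial contrasts for four ordered groups.
# ASSUMPTION: the groups are treated as equally spaced, which is only
# an approximation for 1-5, 6-10, 11-20, 21+ years.
GROUPS = ["1-5", "6-10", "11-20", "21+"]

CONTRASTS = {
    "linear":    [-3, -1,  1, 3],
    "quadratic": [ 1, -1, -1, 1],
    "cubic":     [-1,  3, -3, 1],
}


def design_row(group, age, mean_age):
    """Return the predictors for one person: intercept, three trend
    components, centred age, and the three trend-by-age products."""
    i = GROUPS.index(group)
    trend = [CONTRASTS[name][i] for name in ("linear", "quadratic", "cubic")]
    age_c = age - mean_age                      # centre age at the sample mean
    interaction = [t * age_c for t in trend]    # <time-since-diagnosis> x <age>
    return [1.0] + trend + [age_c] + interaction


# Hypothetical respondent: diagnosed 11-20 years ago, aged 62,
# in a sample whose mean age is taken to be 58.
row = design_row("11-20", 62, 58.0)
print(row)
```

Rows built this way would then be handed to whatever logistic (or multinomial logistic) routine you have available; the fitted model evaluated at a fixed <age> gives the age-adjusted percentages for each group.
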
There must surely be some such interaction, at least in a population of interest in which age is not artificially restricted, since you cannot have any cases for which, say, <age> = 19 and <time-since-diagnosis> = 21+.

I should have to add that it is rather unclear to me what the utility of estimating these %s might be. (What do you intend to do with your results?) And I should think the three smoking levels to be a rather coarse and insensitive measure; with respect to the relationship between smoking behavior and <cancer-diagnosis>, I'd be interested in (e.g.) when current smokers began smoking and when former smokers stopped smoking, and possibly in more detailed smoking histories. (Of course, since you speak of "analyses on a dataset", more detailed information like this may simply be unavailable. In which case your report should mention this situation as a possibly severe defect in dataset design.)

On 12 Aug 2003, Elliot Coups wrote:

> I'm doing some analyses on a dataset of individuals who have/had
> cancer, looking at the association between smoking status and the
> number of years since the cancer diagnosis. I have three levels of
> smoking status (current, former, never) and four levels of time since
> cancer diagnosis (1-5, 6-10, 11-20, 21+ years ago). My goal is to
> determine the percentage (and 95% CI) of individuals who are current
> vs. former vs. never smokers within each of the four time since
> diagnosis groups. That's simple enough (I can do it by looking at
> frequency crosstabs), but I want to run the analysis while holding age
> constant (since it is related to the time since diagnosis). I don't
> have a large enough sample size to run the analysis stratified by age
> group, so I want to partial out age. What is the best way to do that?
>
> Thank you in advance.
>
> Elliot
>
>
> Elliot Coups, Ph.D.
> Research Fellow
> Department of Psychiatry and Behavioral Sciences
> Memorial Sloan-Kettering Cancer Center
> .
> .
> =================================================================
> Instructions for joining and leaving this list, remarks about the
> problem of INAPPROPRIATE MESSAGES, and archives are available at:
> .                  http://jse.stat.ncsu.edu/                    .
> =================================================================
>

 -----------------------------------------------------------------------
 Donald F. Burrill                                    [EMAIL PROTECTED]
 56 Sebbins Pond Drive, Bedford, NH 03110                (603) 626-0816
