Dear List(s), last week I posted a request for help on how to analyse repeated measurement data with some rather unusual dimensions (N=120 subjects, each giving a time series with T=120 occasions). To my pleasure, there were as many as 12 replies, which are given below, preceded by the original posting. For those of you who don't want to read through all of these, here is a short summary. Roughly, the responses/recommendations fall into 5 categories: a) Make use of all available information and put it into a form suitable for state-space models or vector-ARIMA analyses. Frankly, this seems to go beyond my capabilities, and I am inclined to take the penalty of reducing the data as proposed by other responders. b) Perform time series analyses for each subject and boil the data down to certain parameters. Read these into a secondary data set and merge it with the original one so as to preserve design variables like treat/control, sex and age. Finally, use these data for "standard" analyses to test hypotheses of homogeneity of subjects within groups or of differences across groups (implying some MANOVA-style model). Going to extremes, one could obtain a single parameter like a slope for each subject and perform univariate analyses with regard to higher stratum levels. Another response in this direction suggested fitting a spline model for each subject and using the spline components in subsequent ANOVA-style models. I understand that this could be done using proc transreg in SAS, but I am not sure whether this procedure does in fact account for the time dependency in the individual data giving the spline. c) Reduce the repeated measurement frequency in the first place and then perform (M)ANOVA-style analyses with a time factor of (then) suitable level count.
Test for time effects using standard contrasts like polynomial decomposition or Helmert coefficients (when interest lies with the point in time at which responses cease to change any further). d) A particularly interesting response suggested identifying "change profiles" within the time series and submitting these to further analyses like a permutation test. Still, I am unclear about how to aggregate the data in order to make best use of all subjects' data. e) General remarks and caveats, like paying regard to sample size issues, looking for cyclicity in individual data that generalizes to the stratum, adjusting for cross-correlations in the case of multi-variable outcome measures, and the complexity of the assumptions required when analysing complex factorial designs involving a repeated measures factor. Again, thanks to all who took the time to help. I am committed to participating in this way of mutual assistance. Hans C Waldmann

---------------------------------------------------------------------
Dr. Hans C Waldmann
Methodology & Applied Statistics in Psychology & the Health Sciences
ZFRF / University of Bremen / Grazer Str 6 / 28359 Bremen / Germany
[EMAIL PROTECTED] / http://samson.fire.uni-bremen.de
friend of: AIX PERL ADABAS SAS TEX
---------------------------------------------------------------------

Following: 0) Original posting (request) 1-12) replies

0) ------------------------------------------------------------------------
----- Original Message -----
From: "Dr. Hans-Christian Waldmann" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, October 12, 2000 2:18 PM
Subject: a model for time series (T=120) for N=120 persons ?

Hello everybody, in one of the clinical projects we consult on data analysis, I am facing a problem I have not yet come across and that leaves me with no idea of how to proceed. The problem pertains to the dimensions of the outcome data set.
In a repeated measures design, let N be the number of people treated and T the number of measurement occasions.

I understand that N=1 (or _some_ more) and T=120 would make up a time series, and that I am supposed to fit ARIMA models or transfer functions. I could detect effects by structural breaks around the point in time of the intervention, that is: performing intervention analyses as proposed in McDowell, McCleary, Meidinger and Hay, 1980, Interrupted time series analysis, or other books on how to analyse data from single subject designs. All right.

I understand that N=120 (or any number more) and repeated measures like 2<=T<="the-smaller-the-better" would make up a dataset suitable for an ANOVA approach or mixed models using special covariance structures like SAS's proc mixed. I know how to do that. All right.

I understand that for each of these variants there are some alternatives in statistical modeling (like non-parametric analyses etc.).

Now, what am I supposed to do with data from a design giving a T=120 time series for _each_ of 120 subjects? There has been a controlled study where patients in three independent groups were asked to keep a diary on some outcome variables for ca. 4 months. There are some design variables like treat/control or sex and age that are expected to contribute systematically to variation between outcome measures. But this outcome measure apparently is a time series. I don't think I should perform an ANOVA-style analysis with a 120-level time factor. Pooling the data and performing ARIMA/transfer-function analyses on a single time series of subjects' means for each point in time doesn't make sense either, assuming that subjects differ in both the measurement level and the covariance structure of their individual time series. I admit that I have no idea how to evaluate, say, an effect of treatment on this kind of outcome measure.

Does anybody else have an idea?
I promise to post a summary of responses to the list.

Thanks in advance

Hans-Christian Waldmann

1. -----------------------------------------------------------------------
>From [EMAIL PROTECTED] Thu Oct 12 15:34:49 2000

Hi, it may be easier in the long run to pose the time series in state space form. Then missing values are easy to deal with, for a start, and it can be easier to model what goes on. See papers by Durbin and Koopman, and software from Koopman (Ssfpack): http://www.econ.vu.nl/koopman/ssfpack/ This does all the hard work and leaves you free to concentrate on modelling. I think there may be a book coming out soon from Oxford University Press: http://www.oup.co.uk/ Hope this helps. I intend to have a go with this software when I can. Robert West

2. -----------------------------------------------------------------------
>From [EMAIL PROTECTED] Thu Oct 12 16:25:09 2000

Dear Hans, the obvious way to proceed would be to analyse each of the 120 time series in some appropriate way, and then use the derived parameters in a further analysis of the experiment overall. In a simple example, where the derived parameter is, say, a slope from a linear regression, the slope estimates can then be used as the response in an ANOVA or multiple regression analysis. Your time series can be deconstructed in a suitable way, perhaps using a breakpoint detection method, some ARIMA model parameter, or even a more complex method such as describing the time series curve using principal components. Once you have the derived parameters, these can then be modelled. You should end up with model(s) that determine or predict the effect of each of the experimental factors such as treatment/control, age, sex, group type (in combination) on all the derived parameters. This should, in turn, lead to some standard values of the derived parameters that occur under particular combinations of the factor settings.
Of course the simpler the derived parameters the better, and one must take care of correlations between them when establishing levels of uncertainty around estimates of the 'standard' values. Where the effects of factors on the parameters conflict in some way, joint optimisation methods can be utilised, if this is possible. The modelling techniques will probably involve response surface analysis. So there are three stages:

Time series analysis - to find parameters
Parameter modelling - to find factor effects and predictive equations
Joint optimisation? - to resolve prediction-effect conflicts

Regards, Dave Stewardson

3. -----------------------------------------------------------------------
>From [EMAIL PROTECTED] Thu Oct 12 17:10:35 2000

Hi, this sort of thing occurs quite often in clinical trials when you have diary data. The subject is on treatment for 12 weeks, say, and records their lung function (say) every day just after they get up. N=200 to 500, say, and T=12*7=84. The crucial thing is to get the client (medic) to state what it is that interests him. Then an easy approach is to do a two-stage analysis. Within each individual, analyse the data to produce a single summary value. This could be the mean, slope, maximum, minimum, area under the curve (average), time until the value drops to 50% of baseline, etc. Which summary statistic you choose is determined by what question the client wants to ask of the data. Then analyse these summary values across subjects at the higher stratum level. For regular data many of these analyses are special cases of mixed models (e.g. taking the regression slope and analysing it at the higher level is exactly the same as random-coefficient regression in mixed models when the X values are the same for every subject). But first of all - plot the data! James.

4.
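The two-stage approach James describes can be sketched in a few lines. This is a hypothetical illustration, not code from any of the replies: the group labels, sample sizes, and simulated diary data are all made up, the per-subject summary is an OLS slope, and the second stage is a plain one-way ANOVA F statistic computed by hand.

```python
import random
import statistics

random.seed(1)

def ols_slope(y):
    """Least-squares slope of y against time indices 0..T-1 (closed form)."""
    t_n = len(y)
    t_mean = (t_n - 1) / 2.0
    y_mean = statistics.fmean(y)
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(t_n))
    return sxy / sxx

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA on a list of groups of values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = statistics.fmean([v for g in groups for v in g])
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Synthetic stand-in for the diary data: 3 groups x 40 subjects x T=120 days,
# with the treatment group's outcome drifting upward over time.
T = 120
series = {g: [[0.02 * t * (g == "treat") + random.gauss(0, 1) for t in range(T)]
              for _ in range(40)]
          for g in ("treat", "control_a", "control_b")}

# Stage 1: one slope per subject. Stage 2: compare slopes across groups.
slopes = {g: [ols_slope(y) for y in s] for g, s in series.items()}
f_stat = one_way_anova_f(list(slopes.values()))
```

In practice the second stage would of course be run with the real design variables (e.g. in SAS with proc glm or proc mixed); the point is only that T=120 collapses to one number per subject before any between-subject modelling happens.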
-----------------------------------------------------------------------
From: "Gaj Vidmar" <[EMAIL PROTECTED]>

What I can propose is rather simple, so it may well be completely wrong (especially as no true expert has posted anything on the topic so far), but perhaps it will be of some use: why not pool the data for an individual over time periods - say, months, or, to preserve more information, weeks? (Perhaps not by averaging, but - depending on data characteristics - using the median, geometric mean, or some fancy M-estimator?) This will give you the possibility to conduct an ANOVA-type analysis - a mixed model with some "nonrepeated" factors (three fixed ones, if I get it right, i.e., treat/control, sex and age, plus eventual others) and week (or whatever time period) as "repeated". As emphasised in the introduction, this may be less than two cents. Best regards, Gaj Vidmar, Univ. of Ljubljana, Dept. of Psychology

5. -----------------------------------------------------------------------
From: "Gaj Vidmar" <[EMAIL PROTECTED]>

Dr. Waldmann, there seems to be no word from professional statisticians yet, so here's an addendum. Namely, I have overlooked two important aspects of the study; which, however, doesn't invalidate the basic idea of pooling individual data over appropriate time periods. The first aspect is the three groups of patients. I'm not sure whether they were formed on the basis of the (quote) design variables (in which case there is one factor instead of the three nonrepeated ones), or whether they define another factor (a random one, I guess, as opposed to the three fixed ones), but the pooling approach is independent of this. The same goes for the second aspect, i.e., that there were several measures taken, not just one. Theoretically, MANOVA might thus be feasible instead of several ANOVAs.
But with such a complex model (say, one random plus three fixed factors plus one repeated-measures factor), properly checking all the various assumptions and interpreting all the results is rather ... Not to mention that the analysis must be properly set up in the first place (contrast issues ...), as well as sample size issues ... At least, fully understanding such an analysis is probably beyond the horizon of the majority of the "consumers" in the social/health sciences, to whom you will presumably have to present the findings. So if the outcome variables are not too many and/or they are not too correlated, I believe they can be analysed "one by one". Awaiting judgement from the sci.stat.* community and wishing you all the best with the research, Gaj Vidmar

6. -----------------------------------------------------------------------
From: MJ Ray <[EMAIL PROTECTED]>

My own suggestion (mangled by a bad emailer) was to use vector time series methods, but this could lead to a fairly large computation problem without extra information. I wasn't able to recommend a very good specialist reference off the top of my head, though. MJR

7. -----------------------------------------------------------------------
From: Elliot Cramer <[EMAIL PROTECTED]>

You haven't really given enough information, but here is a suggestion. You have three separate groups. If they are not treatment groups with random assignment, anything else you do will be VERY dubious. You could use sex as a blocking factor and age as a covariate. What is the purpose of the 120 observations? You could construct a SMALL number of relevant variables from these observations and do a MANOVA, for example linear, quadratic and cubic trends if you are simply interested in what happens over time. You might also do a between-groups analysis on the final time or the average over time. It's hard to say without knowing the details. What you REALLY should do is consult a statistician about the specifics.

8.
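Cramer's linear, quadratic and cubic trend variables can be obtained by scoring each subject's series against orthogonal polynomial contrasts. A rough sketch under stated assumptions (the data are synthetic, and the contrasts are built by Gram-Schmidt on centred powers of t rather than taken from a published table):

```python
import random

def orthogonal_poly_contrasts(T, degree=3):
    """Orthonormal polynomial contrasts over t = 0..T-1, built by
    Gram-Schmidt on centred powers (t - (T-1)/2)**d for d = 0..degree;
    the constant term is dropped, leaving linear, quadratic, cubic."""
    centre = (T - 1) / 2.0
    basis = [[(t - centre) ** d for t in range(T)] for d in range(degree + 1)]
    ortho = []
    for vec in basis:
        v = vec[:]
        for u in ortho:
            proj = sum(a * b for a, b in zip(v, u))
            v = [a - proj * b for a, b in zip(v, u)]
        norm = sum(a * a for a in v) ** 0.5
        ortho.append([a / norm for a in v])
    return ortho[1:]  # drop the constant term

def trend_scores(y, contrasts):
    """Project one subject's series onto each contrast, giving one
    linear, one quadratic and one cubic score for that subject."""
    return [sum(c * v for c, v in zip(con, y)) for con in contrasts]

random.seed(2)
T = 120
contrasts = orthogonal_poly_contrasts(T)

# Synthetic subject with a purely linear drift plus day-to-day noise;
# the linear score should dominate, quadratic and cubic stay near zero.
y = [0.05 * t + random.gauss(0, 1) for t in range(T)]
lin, quad, cub = trend_scores(y, contrasts)
```

The three scores per subject could then serve as the SMALL set of dependent variables in a MANOVA with group as the between-subjects factor, sex as a block and age as a covariate, as the reply suggests.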
-----------------------------------------------------------------------
From: [EMAIL PROTECTED] (Magill, Brett)

I don't know enough about time series to really provide much advice. However, I have seen methods by which a slope was calculated across time for each subject, with the first measurement as the intercept (within subjects). Subsequently, the individual slope was regressed on other factors, thus answering the question of what factors (X) influence the rate/direction of change across time in Y.

9. -----------------------------------------------------------------------
From: [EMAIL PROTECTED] (Simon, Steve, PhD)

Even though the researchers collected data on 120 consecutive days, I doubt that they are particularly interested in any one day in isolation. Look at some composite measures, such as the slope of the trend line, or the change score at the end of each month. Or perhaps an average for each month, or the standard deviation for each month. Your researchers should be able to elaborate on why they collected the data, and that elaboration should help you decide which composite measure you should use. Once you reduce it to a small number of composite measures, you can apply the ANOVA types of procedures. An alternative that might be worth exploring is fitting a spline model to each subject's data and then pooling the splines across groups. This is messy and complex, but fun. I hope this helps. Good luck!

10. -----------------------------------------------------------------------
From: Rich Ulrich <[EMAIL PROTECTED]>

Steve tells how to make the best of the data, making the likely assumptions about the 120 days. You don't say where the 120 days lie, so it might be that there are paycheck cycles of 7, 14, 28 days, or a month; or menstrual cycles, or some others. If the subjects have some overlapping '120 days' on the calendar, it might be reasonable to look at calendar dates for cycles, or for extreme events.
That's assuming there is a bit of day-to-day lability that might cover up some information. But if you aren't looking at (say) muggings on the day after Social Security checks appear, then I doubt that cycles are likely. Still, the detail does allow you to examine onset variations, or offset - that is, there might be a definite curve over the first week or so that does not exist later, if the ratings are something that entails learning or adaptation. This could be something interesting if it varies among the three groups, or it could be something to be eradicated because it is an artifact. On the other hand, if the 120 days were known to be a limit, there might be some 'anticipation of the end' - for instance, patients in hospitals may show remarkable recovery during the last week of their insurance coverage. So, you can probably lump the data by weeks or months, but don't forget to take a look at start- and end-effects, if the measures have those hazards. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html

11. -----------------------------------------------------------------------
From: MJR http://stats.mth.uea.ac.uk/

Panic. Seriously, you need to be looking at multivariate or vector time series methods in this case. Unfortunately, without adding more assumptions (hopefully reasonable ones created from expert opinion) to the model, you are looking at a fairly large computation problem (many cross-correlations) at least, I think. I shan't say more for fear of making a fool of myself at this time of the morning, but suggest the weighty "Time Series Analysis" by Hamilton as a possible lead.

12. -----------------------------------------------------------------------
From: David Carr

You are right that pooling the 120 people into one time series is not the correct solution (average behaviour in such a context is close to meaningless - a former colleague of mine, Dr.
Wolfgang Keeser, showed this several years ago with time series of cigarettes smoked by people being treated to give up smoking). I think one of the better analysis strategies would be to model, for each person, an intervention (interrupted) time series model as made popular by Box and Tiao. With the resulting parameters one could classify patients into those with no change, those with only a transient change, and those with a permanent change (either positive or negative). With this classification one could then look into the possible influence of other covariates. As far as the time series modelling goes, I would be glad to help you out, if you happen to have any funding. If, on the other hand, you need to do it yourself, I could send you a bibliography (or part thereof) of mine on time series topics (some 1500 references in all). Looking forward to your reply.

13. -----------------------------------------------------------------------
From: [EMAIL PROTECTED]

Dear Hans (I think it is appropriate to reply in German), a few months ago I dealt with a similar problem, when we tried to quantify the effects of various stimuli on a physiological response variable (RR intervals, pulse amplitudes, etc.). A priori we knew next to nothing, neither how strong the effects of the individual stimuli would be, nor when after the stimulus the effect would be measurable. Therefore, measurements were taken at very short intervals for each patient. We thus had to deal with a large number of fairly long time series. After a lengthy literature search I came across an interesting paper in Biometrics by a Hungarian colleague who has developed a nonparametric procedure for the analysis of RM problems: J. Reiczigel: Analysis of Experimental Data with Repeated Measurements. Biometrics 1999; Vol. 55, No. 4: pp. 1059-1063. In short: the method searches the individual profiles for stretches of k consecutive points that differ clearly from the rest (so-called top periods).
A permutation test is then used to check whether the occurrence of these periods is systematic (an effect) or random. I contacted the author, who kindly made his self-written S-PLUS programs available to me. Furthermore, we analysed the data set together (!). It would be worth contacting the author, since he is very interested in RM problems, above all in how his method behaves on real data. His email address is: [EMAIL PROTECTED] If you have any questions, please mail back. I hope this helps you. Best regards from Graz, Bernd

------------------------------ STOP ----------------------

=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=================================================================
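For what it's worth, the top-period idea described in the last reply might look roughly like this. This is a loose illustration of the general principle only, not Reiczigel's published algorithm or his S-PLUS code: the window statistic (largest excess of a k-point moving average over the series mean), the within-series shuffle null, and the synthetic series are all simplifying assumptions.

```python
import random

def top_period_excess(y, k):
    """Largest excess of a k-point window mean over the series mean."""
    overall = sum(y) / len(y)
    best = max(sum(y[i:i + k]) / k for i in range(len(y) - k + 1))
    return best - overall

def permutation_p_value(y, k, n_perm=500, rng=None):
    """Share of random within-series shuffles whose top-period excess is
    at least as large as the observed one (one-sided p-value)."""
    rng = rng or random.Random(0)
    observed = top_period_excess(y, k)
    count = 0
    for _ in range(n_perm):
        perm = y[:]
        rng.shuffle(perm)
        if top_period_excess(perm, k) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

data_rng = random.Random(3)
T, k = 120, 10
# One synthetic diary series with a genuinely elevated stretch around
# days 60-69, and one pure-noise series for comparison.
signal = [2.0 * (60 <= t < 70) + data_rng.gauss(0, 1) for t in range(T)]
noise = [data_rng.gauss(0, 1) for _ in range(T)]

p_signal = permutation_p_value(signal, k)
p_noise = permutation_p_value(noise, k)
```

A real analysis would still have to aggregate such per-subject tests across the three groups, which is exactly the part the reply recommends discussing with the method's author.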