Pooled cross-sectional time-series regression
Hi all, I am in the process of analyzing data of the following type: we have data on 1800 doctors over 49 months. There are a few dependent variables (a particular drug prescription level) and a few independent variables, some time-related (severity of patients seen in the month, practice volume, ...) and some constant over time: university, sex, and years of practice (which can also be considered time-dependent). The data look like this:

id  month  y    timeind1 ... timedep1 ...
1   1      4    130
1   2      6    136
1   3      6    138
...
1   48     7    135
1   49     6    136
2   1      3.6  062
2   2      3    058
2   3      5    068
...
2   48     5    075
2   49     2    070
3   ...

Basically, a bulletin was introduced at month 37, and we want to assess whether this bulletin had an effect on a particular drug prescription pattern (y). What we plan to do is to model y in terms of the independent variables based on the first 36 months, and then forecast (including a CI) over the last 13 months. We will compare the forecast to the observed values.

I have a report from another team who did much the same type of analysis. They used PROC TSCSREG in SAS. The options were RANTWO (two random factors, time and MD, which is fine) and the Parks option, which allows for a first-order autoregressive term in the model; we need this, as autocorrelation is present. Time-independent variables were handled by stratified analysis rather than introduced directly into the model. Basically, I introduced a few explanatory variables into the model (number of patients, ...), the month (1 to 36) for the historical trend, and dummy variables (January (1/0), February, ... November) for seasonal variation.

I have several problems. The main one is the interpretation of the parameters: I have two axes, time and individual. For example, does a positive parameter mean that MDs with a high level of X tend to have a high level of Y, and is that true at any time? Or does it mean that when X increases over time, Y increases over time as well? I am confused by those two dimensions. I am interested in the evolution of Y over time, taking into account the doctors' characteristics.
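JP's interpretation question (does a positive coefficient describe differences *between* doctors, or changes *within* a doctor over time?) can be made concrete by splitting each predictor into a doctor-level mean and a deviation from that mean. A minimal simulated sketch, not JP's actual data, with the two effects deliberately set to different values (2.0 between, 0.5 within):

```python
import numpy as np

rng = np.random.default_rng(0)
n_doc, T = 200, 36

b = rng.normal(size=(n_doc, 1))   # stable doctor trait (between component of X)
w = rng.normal(size=(n_doc, T))   # month-to-month variation (within component)
x = b + w
# Between and within effects deliberately differ: 2.0 vs 0.5
y = 2.0 * b + 0.5 * w + 0.3 * rng.normal(size=(n_doc, T))

def slope(xv, yv):
    X = np.column_stack([np.ones_like(xv), xv])
    return np.linalg.lstsq(X, yv, rcond=None)[0][1]

pooled = slope(x.ravel(), y.ravel())                      # a blend of the two
within = slope((x - x.mean(1, keepdims=True)).ravel(),
               (y - y.mean(1, keepdims=True)).ravel())    # ~0.5
between = slope(x.mean(1), y.mean(1))                     # ~2.0
print(pooled, within, between)
```

The pooled coefficient is a weighted mix of the within-doctor and between-doctor slopes, which is exactly the ambiguity JP describes; demeaning per doctor isolates the within effect, regressing doctor means isolates the between effect.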
My other problem is more technical: including the AR(1) term (the autocorrelation is often around 0.75 for each doctor), I easily get an R^2 of 0.99 with very few variables in the model (Y is quite stationary over time). Does that mean that Y is mainly explained by the autocorrelation, and that any slightly correlated variable just mops up the leftover variance?

I also have a problem with the estimation: one of the computed matrices is singular, and this matrix is used for the estimates. It happens as the sample size increases, and I cannot see how to deal with it. I am supposed to stratify the analyses by doctor characteristics (sex and university (and age)), so that may be a way around it.

So, does anyone have experience with this type of analysis? Any hints or views to share? Thanks in advance, JP

PS: we plan other analyses, not related to the bulletin impact, but rather on trends over the 49 months: are the trends similar for the different types of MDs? We are more interested in the trends than in the actual level (at any time) for each group of MDs, as the relations between MDs' characteristics and level of prescription are already known.

PS2: does anyone know a good biostat/epidemiology newsgroup?

===
This list is open to everyone. Occasionally, less thoughtful people send inappropriate messages. Please DO NOT COMPLAIN TO THE POSTMASTER about these messages because the postmaster has no way of controlling them, and excessive complaints will result in termination of the list. For information about this list, including information about the problem of inappropriate messages and information about how to unsubscribe, please see the web page at http://jse.stat.ncsu.edu/
===
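JP's R^2 of 0.99 is easy to reproduce in a simulation: when doctors differ widely in their stable prescription level and each series has AR(1) correlation near 0.75, the lagged value alone soaks up both the level differences and the persistence, so R^2 says little about the covariates. A sketch with invented numbers (all parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n_doc, T, phi = 300, 49, 0.75

mu = rng.normal(scale=10, size=(n_doc, 1))  # stable prescription level per doctor
e = np.zeros((n_doc, T))                    # AR(1) deviations around that level
e[:, 0] = rng.normal(scale=1 / np.sqrt(1 - phi**2), size=n_doc)
for t in range(1, T):
    e[:, t] = phi * e[:, t - 1] + rng.normal(size=n_doc)
y = mu + e

def r2_lag(y):
    """Pooled R^2 of regressing y_t on y_{t-1} alone."""
    x, yy = y[:, :-1].ravel(), y[:, 1:].ravel()
    return np.corrcoef(x, yy)[0, 1] ** 2

print(r2_lag(y))                              # ~0.99: level differences + persistence
print(r2_lag(y - y.mean(1, keepdims=True)))   # ~phi**2: persistence only
```

So a near-unity R^2 here is mostly the between-doctor levels plus the AR(1) term, which is consistent with JP's suspicion that the remaining covariates are only mopping up what little variance is left.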
Re: Power for Pilot Studies
well, this is interesting indeed ... for let's say that you did adopt a .1 level for a pilot AND, you just happened to reject the null IN the pilot ... is THAT sufficient justification for committing more time and resources TO a large main study?? the implication from this pilot result is that ... this means that the trend would continue in a more dramatic or noticeable way IN a main study ... and you have no justification for that based on the p value you have adopted in your pilot study ... or, think about it this way ... what IF your p value had been .001 based on your pilot data ... why do any more? well, there are lots of reasons ... the importance of the p value is so minimal even in the best of circumstances ... to use it THIS way is even worse. the main thing you should be doing in a pilot study ... is to first ... iron out the bugs of the methods and procedures ... after all, someone should have already approved of the 'idea' ... and here is a chance to get your ducks in a row with protocols, instrumentation, times for doing things, etc. the second main purpose would be to see if there is ANY evidence at all in the generally predicted direction ... FORget the application of the fancy inferential tests and worrying about alpha ... or power at this stage ... At 01:21 PM 03/15/2000 -0800, Andy Avins wrote: >We proposed a pilot clinical trial that was shot down by a local review >committee. are you saying that it was shot down BECAUSE you proposed to use a pilot alpha of .25??? this is hard to believe ... but, it could be true ... if it were and that were the ONLY problem ... is this not SO simple to fix??? just change it to .1! i would suspect that there are other more important issues that were considered ... but since we are not privy to these ... 
it is difficult to comment here

> Lacking any other guidance, we arbitrarily chose an alpha of 0.25 for
> doing the power calculations (reasoning that we didn't want to set too
> stringent a standard for rejecting the null and not proceeding with a
> more definitive trial). We were criticized for not adopting a more
> conventional standard of alpha=0.10. I've never heard that there was any
> convention for this sort of calculation.

the basic idea is ... in a pilot ... don't make it AS hard to reject the null as in a full blown study ... but, the use of .1 as THE value for a pilot is just as arbitrary as it is to say that we will use .05 or .01 for the MAIN study ... focusing on alpha is only half the deal ... there is a type II error too that could be FAR more critical in a given setting than alpha ... WHEN WILL WE GET OVER OUR FIXATION ON ALPHA!!!

> Does anyone have any thoughts or references for sample size calculations
> for pilot studies?
> Thanks much in advance!
> --Andy
>
> --Andy Avins, MD, MPH
> Assistant Professor
> Department of Medicine
> Department of Epidemiology & Biostatistics
> University of California, San Francisco
> E-mail: [EMAIL PROTECTED]
> Tel: 415-597-9196
Power for Pilot Studies
We proposed a pilot clinical trial that was shot down by a local review committee. Lacking any other guidance, we arbitrarily chose an alpha of 0.25 for doing the power calculations (reasoning that we didn't want to set too stringent a standard for rejecting the null and not proceeding with a more definitive trial). We were criticized for not adopting a more conventional standard of alpha=0.10. I've never heard that there was any convention for this sort of calculation. Does anyone have any thoughts or references for sample size calculations for pilot studies? Thanks much in advance!

--Andy

--Andy Avins, MD, MPH
Assistant Professor
Department of Medicine
Department of Epidemiology & Biostatistics
University of California, San Francisco
E-mail: [EMAIL PROTECTED]
Tel: 415-597-9196
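Whatever alpha one settles on, the practical consequence is the required sample size. A normal-approximation sketch of how the pilot alpha changes n per group for a two-sample comparison of means (the effect size d = 0.5 and power = 0.80 here are arbitrary illustrations, not values from Andy's proposal), using only the Python standard library:

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha, power, two_sided=True):
    """Normal-approximation sample size per group for a two-sample t-type test."""
    z = NormalDist().inv_cdf
    za = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    zb = z(power)
    return math.ceil(2 * ((za + zb) / d) ** 2)

for a in (0.25, 0.10, 0.05):
    print(a, n_per_group(d=0.5, alpha=a, power=0.80))
# roughly 32, 50, and 63 per group: relaxing alpha from .05 to .25 about halves n
```

The exact-t answer is one or two subjects larger per group, but the comparison across alphas is the point: the choice of pilot alpha is a direct trade against pilot cost.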
Re: Off topic
Grandpa, what's a card punch key?

--- You wrote:
William Dudley wrote:
> Please excuse an off topic question.
> I am looking for a citation for a statement about learning statistics.
>
> I believe that Richard Harris wrote in his Primer of Multivariate
> Statistics something to the effect that:
> The ability to do statistics is as much in the fingertips poised above a
> keyboard or calculator as it is in the brain...
>
> My memory of this comes from a graduate school course ten years ago and
> is a bit foggy.
> I checked the 1975 edition and did not find such a statement.
> Any ideas out there?

-- Look again. Page 28: "True understanding of any statistical technique resides at least as much in the fingertips (be they caressing a pencil or poised over desk calculator or card punch keys) as in the cortex."

Daddy, what's a desk calculator?

Neil W. Henry ([EMAIL PROTECTED])
Virginia Commonwealth University
Richmond VA 23284-2014
(804)828-1301 x124 (math sciences, 2037c Oliver)
FAX: 828-8785  http://saturn.vcu.edu/~nhenry
--- end of quote ---
Re: Cluster and outliers
I would start by looking in Seber's text, Multivariate Observations. I am not sure because I don't have it handy right now, but I think the topic is covered. There is an excellent discussion of principal components and outliers for sure in Seber.

On Sun, 12 Mar 2000, Nicolas MEYER wrote:
> Hi everybody !!
>
> I'm desperately looking for books or papers on possible links between
> cluster analysis and outliers, cluster analysis being of course used to
> detect outlier(s).
> Does anybody know anything about this?
> Thanks !!
>
> Nicolas MEYER
> Interne en Santé Publique
> CHU Strasbourg-FRANCE
Re: Off topic
William Dudley wrote:
> Please excuse an off topic question.
> I am looking for a citation for a statement about learning statistics.
>
> I believe that Richard Harris wrote in his Primer of Multivariate
> Statistics something to the effect that:
> The ability to do statistics is as much in the fingertips poised above a
> keyboard or calculator as it is in the brain...
>
> My memory of this comes from a graduate school course ten years ago and
> is a bit foggy.
> I checked the 1975 edition and did not find such a statement.
> Any ideas out there?

-- Look again. Page 28: "True understanding of any statistical technique resides at least as much in the fingertips (be they caressing a pencil or poised over desk calculator or card punch keys) as in the cortex."

Daddy, what's a desk calculator?

Neil W. Henry ([EMAIL PROTECTED])
Virginia Commonwealth University
Richmond VA 23284-2014
(804)828-1301 x124 (math sciences, 2037c Oliver)
FAX: 828-8785  http://saturn.vcu.edu/~nhenry
Re: When *must* use weighted LS?
John-- If you are interested in PREDICTION then the way YOU use your information is up to YOU. By cross-validation, resampling, etc. you can determine which prediction method seems to be "best" for your situation. -- Joe

Joe Ward
Health Careers High School, 4646 Hamilton Wolfe, San Antonio, TX 78229
Phone: 210-617-5400  Fax: 210-617-5423
167 East Arrowhead Dr, San Antonio, TX 78228-2402
Phone: 210-433-6575  Fax: 210-433-2828
[EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

- Original Message -
From: John Hendrickx <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, March 15, 2000 1:22 AM
Subject: Re: When *must* use weighted LS?

| In article <8am7d1$hqj$[EMAIL PROTECTED]>, [EMAIL PROTECTED] says...
| >
| > I think I made the formulation too wordy in the previous post.
| >
| > Let me try this simple question:
| >
| > When one wishes to do a (multi)linear regression on a set of observed
| > data, and one is in the (unusual) position of possessing a set of
| > sample standard deviations (of varying degrees of freedom) at each
| > value of the "explanatory" variable, how does one determine whether
| > one ought or ought not to solve the weighted least squares problem
| > using those sample standard deviations?
| >
| > What is the usual decision test for "heteroscedasticity" *before* one
| > solves the regression system? What do people do in practice?
|
| Most social scientists don't worry very much about the assumptions of
| OLS regression, noting that OLS estimates are fairly robust and can give
| unbiased estimates even if those assumptions aren't fulfilled.
| Exceptions are multilevel models and time series data, for which the
| assumption of uncorrelated error terms is violated. But these require
| special programs, not weighted least squares.
|
| There is also some debate on using weights for stratified sampling
| and/or to correct for sampling bias. Weighting leads to correct
| estimates but incorrect standard errors. One solution is to include the
| design variables in the model instead of weighting. Stata and WesVar are
| two programs that can take weighting into account when calculating
| standard errors of estimates. But a quite common approach is to use
| weights for descriptive statistics, but not in multivariate models.
|
| Weights can also be used for certain dependent variables that violate
| the assumption of homoscedasticity, e.g. a dichotomous dependent
| variable. I recently did a weighted least squares analysis for a
| co-worker to replicate an analysis in another paper. The weight was
| groupn*pct*(1-pct), where groupn was the number of cases per group and
| pct was the proportion with a positive response within each group. But
| this basically amounts to a poor approximation of a logit model.
| Programs like GLIM that use iteratively reweighted least squares use
| pct*(1-pct) as the weight when estimating the model, but there pct is
| the predicted probability from the previous iteration.
|
| As for a test for heteroscedasticity, Stata has "hettest", which
| performs a Cook-Weisberg test and produces a chi-square statistic. Cook
| and Weisberg wrote a book in 1982, "Residuals and Influence in
| Regression". I've never used it though.
|
| Hope this helps,
| John Hendrickx
help with crosstabs question in SPSS
1. Figures given in the UK Department of Education and Science publication Statistics of Education 1980. They classify a sample of 749 students leaving school in England in 1979-80 by sex and by achievement in public examinations in two subjects, Mathematics and French. At that time there were two separate examinations taken at about age 16, the CSE and the O-level. O-levels were considered more challenging and more academic than CSEs, but a grade 1 pass at CSE was considered equivalent to a pass at O-level. The four categories of achievement for each subject are as follows.
(a) Did not attempt CSE or O-level.
(b) Attempted CSE/O-level but did not pass.
(c) Passed CSE at grades 2-5 or O-level at grades D-E.
(d) Passed CSE at grade 1 or O-level at grades A-C.

Is there a relationship between sex and achievement in either of the two subjects? What specific inferences can you make? (e.g. are males more likely to pass than females in mathematics?). I have tested to see if the data are normal, and this seems to be the case. Should I be using crosstabs? Should I recode the data to combine subject with achievement? I have attached the raw data below; can anyone help? Cheers, Julian

 82.00  mathematics  male    (a)
 13.00  mathematics  male    (b)
176.00  mathematics  male    (c)
112.00  mathematics  male    (d)
 74.00  mathematics  female  (a)
 23.00  mathematics  female  (b)
184.00  mathematics  female  (c)
 85.00  mathematics  female  (d)
289.00  french       male    (a)
 10.00  french       male    (b)
 44.00  french       male    (c)
 40.00  french       male    (d)
221.00  french       female  (a)
  8.00  french       female  (b)
 74.00  french       female  (c)
 63.00  french       female  (d)
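Since these are counts, a normality check is beside the point: the standard tool is a chi-square test of independence on the 2x4 sex-by-achievement table for each subject (SPSS Crosstabs will report the same statistic). A sketch with numpy for the Mathematics table, comparing the statistic to the 0.05 critical value for df = 3:

```python
import numpy as np

# Mathematics results from the post: rows = male, female; cols = (a)..(d)
math_tab = np.array([[82, 13, 176, 112],
                     [74, 23, 184,  85]], dtype=float)

def chi2_stat(obs):
    """Pearson chi-square statistic for a two-way contingency table."""
    expected = np.outer(obs.sum(1), obs.sum(0)) / obs.sum()
    return ((obs - expected) ** 2 / expected).sum()

stat = chi2_stat(math_tab)
crit = 7.815  # chi-square 0.05 critical value, df = (2-1)*(4-1) = 3
print(round(stat, 2), stat > crit)  # ~6.68, not significant at the .05 level
```

The same function applied to the French table answers the second half of the question; examining the individual (observed - expected) cells then shows *which* categories drive any association.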
Re: Off topic
Here's what I get from the 1985 edition, p. 39: "True understanding of any statistical technique resides at least as much in the fingertips (be they caressing a pencil or poised over a desk calculator or a CRT keyboard) as in the cortex."

--- You wrote:
Please excuse an off topic question. I am looking for a citation for a statement about learning statistics.

I believe that Richard Harris wrote in his Primer of Multivariate Statistics something to the effect that: The ability to do statistics is as much in the fingertips poised above a keyboard or calculator as it is in the brain...

My memory of this comes from a graduate school course ten years ago and is a bit foggy. I checked the 1975 edition and did not find such a statement. Any ideas out there?

Thanks
Bill
--- end of quote ---
Re: Helmert Transformation in GLM
Dear Martin,

The GLM procedure of SAS with HELMERT in the REPEATED statement may answer your question, because HELMERT compares each level of the factor (the within factor, time in this case) with the mean of the subsequent levels. In the output "Analysis of contrast variables" you can see the significance level (p value); you can consider that the plateau is reached at the time point where the p value stops being significant. But if you want to see that for each treatment, you must add a "By treatment" statement in PROC GLM and replace the statement "model Y1-Y7 = treatment /nouni" by "model Y1-Y7 = /nouni".

NB: I think you should begin by plotting Y versus time, by treatment, to see whether a plateau can be expected from your data.

Good luck,
Hassane

Hassane ABIDI (PhD)
Unite d'Epidemiologie; Centre Hospitalier Lyon-Sud
Pavillon 1.M, 69495 Pierre Benite Cedex, France
Tel: (33) 04 78 86 56 87 ; Fax: (33) 04 78 86 33 31
E-mail: [EMAIL PROTECTED]

[EMAIL PROTECTED] wrote:
>
> I have the following problem ...
>
> I have 5 treatments measured at 7 time intervals.
>
> How can I answer the following question: "At which point does each
> treatment reach a plateau?"
>
> I think this question can be answered by the PROC GLM procedure in SAS
> with HELMERT in the REPEATED statement.
>
> model Y1-Y7 = treatment /nouni
> repeated vis 7 helmert / canon printm printe;
>
> Am I right? If yes, how do I interpret the SAS output? Is there a
> p-value associated with the question "At which point does each
> treatment reach a plateau?"
>
> Thank you,
>
> Martin
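Hassane's point can be seen directly from the contrast matrix itself: each Helmert row (in the SAS REPEATED sense, each level versus the mean of the subsequent levels) is zero exactly where the response has flattened out. A numpy sketch with made-up treatment means (the plateau here is invented for illustration):

```python
import numpy as np

def helmert_contrasts(k):
    """Row i compares level i with the mean of levels i+1..k-1
    (the comparison used by SAS REPEATED ... HELMERT)."""
    C = np.zeros((k - 1, k))
    for i in range(k - 1):
        C[i, i] = 1.0
        C[i, i + 1:] = -1.0 / (k - 1 - i)
    return C

# Hypothetical mean response at 7 times, flat from time 4 onward
means = np.array([1.0, 2.0, 3.0, 4.0, 4.0, 4.0, 4.0])
contrasts = helmert_contrasts(7) @ means
print(contrasts)  # negative while still rising, exactly zero once on the plateau
```

In the SAS output the estimated contrasts carry p values; the plateau begins at the first time whose contrast (and all later ones) is consistent with zero, which is the pattern the last entries show here.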
Re: When *must* use weighted LS?
In article <8am7d1$hqj$[EMAIL PROTECTED]>, [EMAIL PROTECTED] says...
>
> I think I made the formulation too wordy in the previous post.
>
> Let me try this simple question:
>
> When one wishes to do a (multi)linear regression on a set of observed
> data, and one is in the (unusual) position of possessing a set of sample
> standard deviations (of varying degrees of freedom) at each value of the
> "explanatory" variable, how does one determine whether one ought or
> ought not to solve the weighted least squares problem using those sample
> standard deviations?
>
> What is the usual decision test for "heteroscedasticity" *before* one
> solves the regression system? What do people do in practice?

Most social scientists don't worry very much about the assumptions of OLS regression, noting that OLS estimates are fairly robust and can give unbiased estimates even if those assumptions aren't fulfilled. Exceptions are multilevel models and time series data, for which the assumption of uncorrelated error terms is violated. But these require special programs, not weighted least squares.

There is also some debate on using weights for stratified sampling and/or to correct for sampling bias. Weighting leads to correct estimates but incorrect standard errors. One solution is to include the design variables in the model instead of weighting. Stata and WesVar are two programs that can take weighting into account when calculating standard errors of estimates. But a quite common approach is to use weights for descriptive statistics, but not in multivariate models.

Weights can also be used for certain dependent variables that violate the assumption of homoscedasticity, e.g. a dichotomous dependent variable. I recently did a weighted least squares analysis for a co-worker to replicate an analysis in another paper. The weight was groupn*pct*(1-pct), where groupn was the number of cases per group and pct was the proportion with a positive response within each group. But this basically amounts to a poor approximation of a logit model. Programs like GLIM that use iteratively reweighted least squares use pct*(1-pct) as the weight when estimating the model, but there pct is the predicted probability from the previous iteration.

As for a test for heteroscedasticity, Stata has "hettest", which performs a Cook-Weisberg test and produces a chi-square statistic. Cook and Weisberg wrote a book in 1982, "Residuals and Influence in Regression". I've never used it though.

Hope this helps,
John Hendrickx
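The original question (when *must* one weight, given known group standard deviations?) can be answered empirically: both OLS and WLS stay unbiased, but weighting by 1/sigma^2 gives a visibly more stable slope when the error variance really does vary. A self-contained simulation sketch (the variance pattern is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_slope(x, y, w=None):
    """(Weighted) least squares slope via the weighted normal equations."""
    X = np.column_stack([np.ones_like(x), x])
    if w is None:
        w = np.ones_like(x)
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta[1]

# Ten groups of 50 observations with known, x-dependent noise
x = np.repeat(np.arange(1.0, 11.0), 50)
sigma = 0.2 * x                       # noise standard deviation grows with x
slopes_ols, slopes_wls = [], []
for _ in range(500):
    y = 1.0 + 2.0 * x + rng.normal(scale=sigma)
    slopes_ols.append(fit_slope(x, y))
    slopes_wls.append(fit_slope(x, y, w=1.0 / sigma**2))

print(np.std(slopes_ols), np.std(slopes_wls))  # WLS slope varies noticeably less
```

This is the textbook trade-off: if the sample standard deviations are reliable, use them as inverse-variance weights; if they are noisy estimates from small groups, the "weights" add their own noise, which is one reason many practitioners fall back on OLS with a heteroscedasticity check first.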
Elliptical plots with 84% bivariate confidence regions
Hi to all, does anybody have an idea whether the following sentence is correct? "Non-overlapping 84% bivariate confidence regions approximate statistically significant differences with p < 0.05 (two-sided test)." Many thanks, Ralf
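For two independent estimates with equal standard errors, the arithmetic behind the rule is simple: intervals of half-width z*SE fail to overlap exactly when the difference exceeds 2z*SE, while the two-sided .05 test on the difference rejects when it exceeds 1.96*sqrt(2)*SE. Equating the two gives z = 1.96/sqrt(2), i.e. roughly 83-84% intervals, which is where the "84%" rule of thumb comes from. A check using only the standard library:

```python
from statistics import NormalDist

nd = NormalDist()
z_test = nd.inv_cdf(0.975)   # 1.96 for a two-sided .05 test on the difference
z_bar = z_test / 2 ** 0.5    # half-width multiplier whose non-overlap matches it
coverage = 2 * nd.cdf(z_bar) - 1
print(round(z_bar, 3), round(coverage, 3))  # ~1.386, ~0.834
```

So the quoted sentence is approximately right in that setting, but the correspondence degrades when the two standard errors are unequal or the estimates are correlated, and extending it to bivariate (elliptical) regions adds a further approximation.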
Re: Turning point
Hello, there is a piece of software at the INE (Spanish Statistical Institute) for detecting turning points in time series. It is not as user-friendly as you might expect, but the bibliography of the user manual is comprehensive. http://www.ine.es/htdocs/daco/daco42/daco4214/soft1.htm

Yves

Taweewan Sidthidet <[EMAIL PROTECTED]> wrote in message news:...
> Does everyone know about "turning points" in forecasting?
> Can anyone help me with how to calculate a turning point in an
> econometric model?
> Thanking you in anticipation.