[R] lmer() vs. fixed effects regression
Hi,

First, some quick terminology I am using:

  Fixed effects  = model with unit dummy variables
  Random effects = model without unit dummy variables, integrating the unit-level variance out of the likelihood

I am confused about the difference between the multilevel modeling framework of lmer() and a fixed effects model with unit dummy variables. Say I had the following model with an individual-level variable x, estimated with lmer():

  model1 <- lmer(y ~ x + (1 | unit))

How is model1 different from this?

  model2 <- lm(y ~ x + factor(unit))

I was under the impression that lmer() was a random effects estimator as I have defined above. But if you use the command ranef(model1), R returns unit-specific deviations from the intercept. That seems more in line with the fixed effects estimator, which returns an intercept estimate for each unit.

Question 2: Why can I add unit-level predictors in lmer(), which I cannot do in a standard fixed effects model with unit dummies?

Thank you.

--
View this message in context: http://www.nabble.com/lmer%28%29-vs.-%22fixed-effects%22-regression-tp25577800p25577800.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
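[A minimal simulated comparison of the two models in the question. The data-generating code and all variable names are illustrative, not from the original post; it assumes the lme4 package is installed.]

```r
# Sketch: fit both specifications on simulated clustered data.
library(lme4)

set.seed(1)
unit <- rep(1:20, each = 10)        # 20 units, 10 observations each
u    <- rnorm(20)                   # unit-level intercept deviations
x    <- rnorm(200)
y    <- 2 + 0.5 * x + u[unit] + rnorm(200)

model1 <- lmer(y ~ x + (1 | unit))  # random intercepts (partial pooling)
model2 <- lm(y ~ x + factor(unit))  # one dummy per unit (no pooling)

# ranef() gives unit-specific deviations from the common intercept, but
# unlike the dummy coefficients in model2 they are shrunk toward zero.
head(ranef(model1)$unit)
```

The ranef() output looks like per-unit intercepts, but the two estimators differ in how those quantities are computed: the dummies are free parameters, while the random-effect modes are shrunken predictions.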
Re: [R] Hmisc package: deff() command's formula for the design effect
One additional question: it seems that in practice the design effect is often calculated on the response variable or on individual predictor variables. I have some OLS models using observational data that are clustered. Does it make sense to calculate the design effect on the residuals of one of these models, to see how much clustering is left even after including covariates?

Thomas Lumley wrote:

> The formula in Hmisc is correct (if the correlation doesn't vary with the cluster size). If you think of the formula for the variance of a sum, it involves adding up all the variances and covariances. A cluster of size k has k^2 - k covariances between members, so the total number of covariances is sum(k^2 - k) over all the clusters, plus the sum(k) variances.
>
> Another way to think of it is that the larger clusters get too much weight, so in addition to the rho*(B - 1) factor that you would have for equal-sized clusters, there is an additional loss of efficiency due to giving too much weight to the larger clusters.
>
> -thomas

--
View this message in context: http://www.nabble.com/Hmisc-package%3A-deff%28%29-command%27s-formula-for-the-design-effect-tp23415477p23767287.html
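[Lumley's counting argument can be checked numerically: each cluster of size k contributes k variance terms and k^2 - k within-cluster covariance terms, so the total number of (co)variance terms in the variance of the sum is sum(k^2). A small sketch, with made-up cluster sizes:]

```r
# Count the (co)variance terms contributed by clusters of sizes 2, 3, and 5.
k <- c(2, 3, 5)          # cluster sizes
n_cov <- sum(k^2 - k)    # within-cluster covariance terms
n_var <- sum(k)          # variance terms
n_cov + n_var == sum(k^2)   # TRUE: the terms add up to sum(k^2)
```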
Re: [R] Clustered data with Design package--bootcov() vs. robcov()
Another question related to bootcov(): a reviewer is concerned that bootstrapping the standard errors does not give the same answer each time. What is a good way to address this concern? Could I bootstrap, say, 100 times and report the mean standard error of those 100 estimates? I am already doing 1,000 replications in the bootstrap, but of course the answer is still slightly different each time.

Frank E Harrell Jr wrote:

> robcov does not use bootstrapping. It uses the cluster sandwich (Huber-White) variance-covariance estimator, for which there are references in the help file (see especially Lin). Both robcov and bootcov work best when there is a large number of small clusters. If the clusters are somewhat large and vary greatly in size, expect to be in trouble and consider a full modeling approach (generalized least squares, mixed models, etc.). One advantage of robcov is that you get the same result every time, unlike bootstrapping. But even in the case of cluster sizes of one, the sandwich estimator can be inefficient (see the Gould paper) or can result in the right estimates of the wrong quantity (see a paper by Freedman in The American Statistician).
>
> Frank
>
> --
> Frank E Harrell Jr
> Professor and Chair
> Department of Biostatistics
> School of Medicine
> Vanderbilt University

Thank you.

--
View this message in context: http://www.nabble.com/Clustered-data-with-Design-package--bootcov%28%29-vs.-robcov%28%29-tp23016400p23683238.html
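[One standard way to address the reviewer's concern is to fix the random number seed, which makes the bootstrap exactly reproducible. A sketch using a plain bootstrap of a mean for illustration; the same set.seed() idea applies immediately before a call to bootcov().]

```r
# Bootstrap standard error of a mean; fixing the seed makes it reproducible.
boot_se <- function(x, R = 1000) {
  stats <- replicate(R, mean(sample(x, replace = TRUE)))
  sd(stats)
}

x <- rnorm(50)

set.seed(42); se1 <- boot_se(x)
set.seed(42); se2 <- boot_se(x)
identical(se1, se2)   # TRUE: same seed, same standard error every run
```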
[R] Hmisc package: deff() command's formula for the design effect
Hello,

I have been using the Hmisc package's deff() command for some research with clustered data, and I noticed that its formula for the design effect seems a bit different. The formula for the design effect is:

  DE = 1 + rho*(B - 1)

In most resources I have seen, B is simply the average number of observations per cluster: n/k, where n is the total sample size and k is the number of clusters. However, deff() calculates B as:

  B = sum(number of observations in each cluster^2) / n

That is a bit hard to write without the Sigma operator. In English: square the number of observations in each cluster, add all those up, and divide the total by n. Which formula is correct?

Thank you!

--
View this message in context: http://www.nabble.com/Hmisc-package%3A-deff%28%29-command%27s-formula-for-the-design-effect-tp23415477p23415477.html
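[The two candidate formulas for B can be compared directly. A sketch with made-up cluster sizes; the variable names are mine:]

```r
# Compare the textbook B (average cluster size) with the size-weighted B.
cluster_sizes <- c(10, 10, 10, 30)     # unequal clusters
n <- sum(cluster_sizes)
k <- length(cluster_sizes)

B_average  <- n / k                    # textbook: n/k = 15
B_weighted <- sum(cluster_sizes^2) / n # deff()-style: sum(k_i^2)/n = 20

# The two agree when all clusters are the same size; with unequal sizes
# the weighted version is larger, because big clusters count for more.
c(B_average = B_average, B_weighted = B_weighted)
```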
[R] R graph into MS Word: which format to use?
Hello,

The journal I am publishing in requires MS Word files. What is my best option for getting a high-quality image of a graph done in R into Word? JPEG? PostScript?

Thanks.

--
View this message in context: http://www.nabble.com/R-graph-into-MS-Word%3A-which-format-to-use--tp23133745p23133745.html
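[One common approach is a high-resolution PNG, which Word imports cleanly; JPEG tends to blur line art. A sketch with an illustrative file name:]

```r
# Write the plot as a 300-dpi PNG sized in inches for print.
png("myplot.png", width = 6, height = 4, units = "in", res = 300)
plot(1:10, 1:10, main = "Example")
dev.off()

# On Windows, win.metafile("myplot.wmf") produces a vector format that
# Word can rescale without loss of quality.
```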
[R] Clustered data with Design package--bootcov() vs. robcov()
Hi,

I am trying to figure out exactly what the bootcov() function in the Design package is doing in the context of clustered data. From reading the documentation and source code, it appears that using bootcov() with the cluster argument constructs standard errors by resampling whole clusters of observations with replacement, rather than resampling individual observations. Is that right, and is there any more detailed documentation on the math behind this?

Also, what is the difference between these two calls?

  bootcov(my.model, cluster.id)
  robcov(my.model, cluster.id)

Thank you.

--
View this message in context: http://www.nabble.com/Clustered-data-with-Design-package--bootcov%28%29-vs.-robcov%28%29-tp23016400p23016400.html
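[A sketch of one cluster (block) bootstrap resample, the scheme bootcov() appears to use when a cluster argument is supplied: whole clusters are drawn with replacement. Data and names are made up for illustration.]

```r
# Resample clusters, not rows: draw cluster labels with replacement and
# keep every observation belonging to each drawn cluster.
set.seed(1)
dat <- data.frame(id = rep(1:5, each = 3), y = rnorm(15))

ids  <- unique(dat$id)
draw <- sample(ids, replace = TRUE)    # resampled cluster labels
boot <- do.call(rbind, lapply(draw, function(i) dat[dat$id == i, ]))

# Some clusters appear more than once and some not at all, but the
# within-cluster correlation structure is preserved intact.
table(boot$id)
```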
[R] Simulate binary data for a logistic regression Monte Carlo
Hello,

I am trying to simulate binary outcome data for a logistic regression Monte Carlo study. I eventually need to be able to manipulate the structure of the error term to give groups of observations a random effect. Right now I am just doing a very basic setup to make sure I can recover the parameters properly. I am running into trouble with the code below. It works if you take out the object 'epsilon', but I need that object because it is where I plan to add in the random effects eventually. Any suggestions? Thanks.

set.seed(1)
alpha <- numeric(100)  # Vectors to store coefficient estimates
X1 <- numeric(100)
X2 <- numeric(100)
a <- 0                 # True parameters
B1 <- .5
B2 <- .85

for (i in 1:100){
  x1 <- rnorm(1000)
  x2 <- rnorm(1000)
  epsilon <- rnorm(1000)
  LP <- a + B1*x1 + B2*x2 + epsilon         # Linear predictor
  y <- rbinom(length(LP), 1, plogis(LP))    # Binomial link
  model <- glm(y ~ x1 + x2, family = binomial)
  alpha[i] <- coef(model)[1]
  X1[i] <- coef(model)[2]
  X2[i] <- coef(model)[3]
}

mean(X1)     # Shows a bias downward (should be about .5)
mean(X2)     # Shows a bias downward (should be about .85)
mean(alpha)  # Pretty close (should be about 0)

--
View this message in context: http://www.nabble.com/Simulate-binary-data-for-a-logistic-regression-Monte-Carlo-tp22928872p22928872.html
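[A way to isolate where the downward bias comes from: the extra epsilon term is unmodeled variance on the logit scale, which attenuates the fitted coefficients toward zero. Dropping it recovers the true parameter. This diagnostic is my own sketch, not from the original thread.]

```r
# Same Monte Carlo setup as above, but with no epsilon in the true logit:
# the estimates are then unbiased for the true coefficient of 0.5.
set.seed(1)
est <- replicate(200, {
  x1 <- rnorm(1000)
  LP <- 0.5 * x1                      # true linear predictor, no noise term
  y  <- rbinom(1000, 1, plogis(LP))
  coef(glm(y ~ x1, family = binomial))[2]
})
mean(est)   # close to 0.5
```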
[R] Choosing how to vary slopes and intercepts in a hierarchical model
Hi,

Is there any set of rules for deciding exactly how to vary slopes and intercepts in a hierarchical linear model? I have NOMINATE scores from the Senate over three years, so I have multiple observations of senators, nested in states, nested in years. Covariates include level-1 (senator in a specific year) variables and state-level variables. I am having a hard time figuring out which combination of random slopes and intercepts to use.

What is the best way to compare model specifications using the lmer() function? Should it just be a likelihood ratio test? Also, how do I test whether the random effects for a particular coefficient are significant? The model I fit first based on theory fit reasonably well, but I have been able to get specifications with higher likelihoods. How do I justify reporting those fits?

Thanks.

--
View this message in context: http://www.nabble.com/Choosing-how-to-vary-slopes-and-intercepts-in-a-hierarchical-model-tp2221p2221.html
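[A sketch of the likelihood-ratio comparison mentioned in the question, using lme4's built-in sleepstudy data as a stand-in for the NOMINATE data (the formulas are illustrative). Note that fits compared by likelihood should use ML, not REML.]

```r
# Compare a random-intercept fit against a random-intercept-and-slope fit.
library(lme4)

m_int   <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML = FALSE)
m_slope <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = FALSE)

# anova() on nested lmer fits reports the likelihood ratio test; a small
# p-value favors keeping the random slope. (Testing whether a variance
# component is zero is a boundary test, so this p-value is conservative.)
anova(m_int, m_slope)
```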
[R] Difference between GEE and Robust Cluster Standard Errors
Hello,

I know that two possible approaches to dealing with clustered data are GEE and a robust cluster covariance matrix from a standard regression. What are the differences between these two methods, or are they doing the same thing?

Thanks.

--
View this message in context: http://www.nabble.com/Difference-between-GEE-and-Robust-Cluster-Standard-Errors-tp22092897p22092897.html
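[The two approaches can be fit side by side on the same simulated data. A sketch assuming the geepack and sandwich packages are installed; data and names are made up. With an independence working correlation, GEE point estimates coincide with OLS and its robust variance closely corresponds to the cluster sandwich estimator; a non-independence working correlation changes the point estimates too.]

```r
library(geepack)
library(sandwich)

set.seed(1)
dat <- data.frame(id = rep(1:30, each = 5), x = rnorm(150))
dat$y <- 1 + 0.5 * dat$x + rnorm(30)[dat$id] + rnorm(150)

# GEE with an exchangeable working correlation:
gee_fit <- geeglm(y ~ x, id = id, data = dat, corstr = "exchangeable")

# OLS with a cluster-robust (sandwich) covariance matrix:
ols_fit <- lm(y ~ x, data = dat)
vcovCL(ols_fit, cluster = dat$id)
```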
[R] What is the R equivalent of STATA's 'drop' command?
Hello,

I am trying to do some data cleaning in R. I need to drop observations that take on certain values of a variable. In Stata I might type something like:

  drop if varname == 3
  drop if varname == 4

Is there an R equivalent of this? I have tried playing around with the subset() command, but it seems a bit clunky. What would an advanced R user's approach be for something like this?

Thank you!

--
View this message in context: http://www.nabble.com/What-is-the-R-equivalent-of-STATA%27s-%27drop%27-command--tp21925249p21925249.html
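[The idiomatic R equivalent is logical indexing: instead of dropping rows, you keep the rows you want. A sketch with an illustrative data frame and variable name:]

```r
dat <- data.frame(v = c(1, 2, 3, 4, 5))

# Stata: drop if v == 3 / drop if v == 4
dat2 <- dat[!(dat$v %in% c(3, 4)), , drop = FALSE]

# The same thing with subset():
dat3 <- subset(dat, !(v %in% c(3, 4)))

dat2$v   # 1 2 5
```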
[R] Creating clustered data with a random effect for a MC analysis
I am trying to create a data set in R with a random cluster effect. I would like to do it so I can set the intraclass correlation coefficient (ICC) in the data generating process. I have tried a few methods, but cannot seem to get enough control over the ICC. I have been estimating the ICC using the deff() command in the Hmisc package. The main problem is that when I adjust the number of clusters, the ICC changes. Below is an example of one approach I have tried to generate the data. Thanks for any help you can give.

library(mvtnorm)  # for rmvnorm()

set.seed(82184)
n <- 1200    # Sample size
nc <- 50     # Number of clusters
p <- .1      # ICC
sims <- 100  # Number of Monte Carlo simulations
a <- 0       # True model coefficients
B1 <- 0.85
B2 <- 0.5
cluster <- rep(1:nc, each = n/nc)  # Cluster label

for (i in 1:sims){
  # Simulate a clustered data set
  c.sigma <- matrix(c(4, 0, 0, p), ncol = 2)  # Cluster-level random effects
  c.values <- rmvnorm(n = nc, sigma = c.sigma)
  randeff1 <- rep(c.values[ , 1], each = n/nc)
  randeff2 <- rep(c.values[ , 2], each = n/nc)
  i.sigma <- matrix(c(1, 0, 0, 1), ncol = 2)  # Individual-level random effects
  i.values <- rmvnorm(n = n, sigma = i.sigma)
  randeff3 <- i.values[ , 1]
  randeff4 <- i.values[ , 2]
  X1 <- 3 + randeff1 + randeff3  # X1 values unique to individual observations
  X2 <- randeff1                 # X2 values unique to clusters of observations
  epsilon <- randeff2 + randeff4 # Two components of the error term
  Y <- a + B1*X1 + B2*X2 + epsilon  # True model
}

--
View this message in context: http://www.nabble.com/Creating-clustered-data-with-a-random-effect-for-a-MC-analysis-tp21630341p21630341.html
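[A more direct way to pin down the ICC of the error term, offered as my own sketch rather than something from the original post: draw the cluster component with variance rho and the individual component with variance 1 - rho, so the ICC is rho / (rho + (1 - rho)) = rho regardless of the number of clusters.]

```r
set.seed(1)
rho <- 0.1
nc <- 50; m <- 24                        # clusters and cluster size
cl <- rep(1:nc, each = m)

u   <- rnorm(nc, sd = sqrt(rho))         # cluster-level component, var = rho
e   <- rnorm(nc * m, sd = sqrt(1 - rho)) # individual component, var = 1 - rho
eps <- u[cl] + e                         # error term with ICC = rho

# One-way ANOVA estimate of the ICC as a check:
ms  <- anova(lm(eps ~ factor(cl)))[["Mean Sq"]]
icc <- (ms[1] - ms[2]) / (ms[1] + (m - 1) * ms[2])
round(icc, 2)   # close to 0.1
```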
[R] 'equivalent' sign in plot title
Hello,

I would like to add the 'equivalent' sign (i.e., three horizontal lines, almost like an equals sign) to a plot. Does R have anything like the LaTeX command '\equiv' that I could put into a plot title?

Thanks

--
View this message in context: http://www.nabble.com/%27equivalent%27-sign-in-plot-title-tp21360963p21360963.html
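[R's plotmath has a `%==%` operator, documented in ?plotmath as "x is defined as y", which I believe draws the triple-bar glyph and should serve as \equiv in a title:]

```r
# Triple-bar 'equivalent' sign in a plot title via plotmath:
plot(1:10, main = expression(alpha %==% beta))

# bquote() works too if part of the title is computed at run time:
plot(1:10, main = bquote(x %==% .(2 + 2)))
```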
Re: [R] Need help creating spatial correlation for MC simulation
Thank you for the input. Which command in the spatstat package am I looking for? The documentation is unclear to me.

milton ruser wrote:

> Dear J.J. Harden,
>
> I think that in spatstat you will find several ways to simulate spatial patterns (point or line) that may be what you are looking for. If not, please let me know and maybe we can work out a solution.
>
> Best wishes,
> miltinho astronauta
> brazil
>
> On Wed, Sep 17, 2008 at 7:36 PM, jjh21 [EMAIL PROTECTED] wrote:
>
>> I want to create a dataset in R with spatial correlation (i.e. clustering) built in for a linear regression analysis. Any tips on how to do this? Thanks.

--
View this message in context: http://www.nabble.com/Need-help-creating-spatial-correlation-for-MC-simulation-tp19542145p19610885.html
[R] Need help creating spatial correlation for MC simulation
I want to create a dataset in R with spatial correlation (i.e., clustering) built in for a linear regression analysis. Any tips on how to do this?

Thanks.

--
View this message in context: http://www.nabble.com/Need-help-creating-spatial-correlation-for-MC-simulation-tp19542145p19542145.html
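[One way to build spatial correlation in by hand, as a sketch rather than a definitive recipe: draw the regression errors from a multivariate normal whose correlation decays with the distance between points (an exponential covariance). Assumes the mvtnorm package is installed; the range parameter 0.2 is arbitrary.]

```r
library(mvtnorm)

set.seed(1)
n  <- 100
xy <- cbind(runif(n), runif(n))        # random point locations on [0,1]^2
D  <- as.matrix(dist(xy))              # pairwise distances
Sigma <- exp(-D / 0.2)                 # correlation decays with distance

eps <- drop(rmvnorm(1, sigma = Sigma)) # spatially correlated errors
x   <- rnorm(n)
y   <- 1 + 0.5 * x + eps               # data for the regression analysis
```

Nearby observations now share correlated errors, which is exactly the clustering a naive OLS analysis would ignore.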