[R] lmer() vs. fixed effects regression
Hi,

First, some quick terminology I am using:

  Fixed effects  = model with unit dummy variables
  Random effects = model without unit dummy variables, integrating the unit-level variance out of the likelihood

I am confused about the difference between the multilevel modeling framework of lmer() and a fixed effects model with unit dummy variables. Say I had the following model with an individual-level variable x, estimated with lmer():

  model1 <- lmer(y ~ x + (1 | unit))

How is model1 different from this?

  model2 <- lm(y ~ x + factor(unit))

I was under the impression that lmer() was a random effects estimator as I have defined above. But if you use the command ranef(model1), R returns unit-specific deviations from the intercept. That seems more in line with the fixed effects estimator, which returns an intercept estimate for each unit.

Question 2: Why can I add unit-level predictors in lmer(), which I cannot do in a standard fixed effects model with unit dummies?

Thank you.

--
View this message in context: http://www.nabble.com/lmer%28%29-vs.-%22fixed-effects%22-regression-tp25577800p25577800.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
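[A minimal simulated comparison of the two models in the question. The data-generating code and all variable names are illustrative, not from the original post; it assumes the lme4 package is installed.]

```r
# Sketch: fit both specifications on simulated clustered data.
library(lme4)

set.seed(1)
unit <- rep(1:20, each = 10)        # 20 units, 10 observations each
u    <- rnorm(20)                   # unit-level intercept deviations
x    <- rnorm(200)
y    <- 2 + 0.5 * x + u[unit] + rnorm(200)

model1 <- lmer(y ~ x + (1 | unit))  # random intercepts (partial pooling)
model2 <- lm(y ~ x + factor(unit))  # one dummy per unit (no pooling)

# ranef() gives unit-specific deviations from the common intercept, but
# unlike the dummy coefficients in model2 they are shrunk toward zero.
head(ranef(model1)$unit)
```

The ranef() output looks like per-unit intercepts, but the two estimators differ in how those quantities are computed: the dummies are free parameters, while the random-effect modes are shrunken predictions.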
Re: [R] Hmisc package: deff() command's formula for the design effect
One additional question: it seems that in practice the design effect is often calculated on the response variable or on individual predictor variables. I have some OLS models using observational data that are clustered. Does it make sense to calculate the design effect on the residuals of one of these models, to see how much clustering is left even after including covariates?

Thomas Lumley wrote:

> The formula in Hmisc is correct (if the correlation doesn't vary with the cluster size). If you think of the formula for the variance of a sum, it involves adding up all the variances and covariances. A cluster of size k has k^2 - k covariances between members, so the total number of covariances is sum(k^2 - k) over all the clusters, plus the sum(k) variances.
>
> Another way to think of it is that the larger clusters get too much weight, so in addition to the rho*(B - 1) factor that you would have for equal-sized clusters, there is an additional loss of efficiency due to giving too much weight to the larger clusters.
>
> -thomas

--
View this message in context: http://www.nabble.com/Hmisc-package%3A-deff%28%29-command%27s-formula-for-the-design-effect-tp23415477p23767287.html
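[Lumley's counting argument can be checked numerically: each cluster of size k contributes k variance terms and k^2 - k within-cluster covariance terms, so the total number of (co)variance terms in the variance of the sum is sum(k^2). A small sketch, with made-up cluster sizes:]

```r
# Count the (co)variance terms contributed by clusters of sizes 2, 3, and 5.
k <- c(2, 3, 5)          # cluster sizes
n_cov <- sum(k^2 - k)    # within-cluster covariance terms
n_var <- sum(k)          # variance terms
n_cov + n_var == sum(k^2)   # TRUE: the terms add up to sum(k^2)
```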
Re: [R] Clustered data with Design package--bootcov() vs. robcov()
Another question related to bootcov(): a reviewer is concerned that bootstrapping the standard errors does not give the same answer each time. What is a good way to address this concern? Could I bootstrap, say, 100 times and report the mean standard error of those 100 estimates? I am already doing 1,000 replications in the bootstrap, but of course the answer is still slightly different each time.

Frank E Harrell Jr wrote:

> robcov does not use bootstrapping. It uses the cluster sandwich (Huber-White) variance-covariance estimator, for which there are references in the help file (see especially Lin). Both robcov and bootcov work best when there is a large number of small clusters. If the clusters are somewhat large and vary greatly in size, expect to be in trouble and consider a full modeling approach (generalized least squares, mixed models, etc.). One advantage of robcov is that you get the same result every time, unlike bootstrapping. But even in the case of cluster sizes of one, the sandwich estimator can be inefficient (see the Gould paper) or can result in the right estimates of the wrong quantity (see a paper by Freedman in The American Statistician).
>
> Frank
>
> --
> Frank E Harrell Jr
> Professor and Chair
> Department of Biostatistics
> School of Medicine
> Vanderbilt University

Thank you.

--
View this message in context: http://www.nabble.com/Clustered-data-with-Design-package--bootcov%28%29-vs.-robcov%28%29-tp23016400p23683238.html
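[One standard way to address the reviewer's concern is to fix the random number seed, which makes the bootstrap exactly reproducible. A sketch using a plain bootstrap of a mean for illustration; the same set.seed() idea applies immediately before a call to bootcov().]

```r
# Bootstrap standard error of a mean; fixing the seed makes it reproducible.
boot_se <- function(x, R = 1000) {
  stats <- replicate(R, mean(sample(x, replace = TRUE)))
  sd(stats)
}

x <- rnorm(50)

set.seed(42); se1 <- boot_se(x)
set.seed(42); se2 <- boot_se(x)
identical(se1, se2)   # TRUE: same seed, same standard error every run
```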
[R] Hmisc package: deff() command's formula for the design effect
Hello,

I have been using the Hmisc package's deff() command for some research with clustered data, and I noticed that its formula for the design effect seems a bit different. The formula for the design effect is:

  DE = 1 + rho*(B - 1)

In most resources I have seen, B is simply the average number of observations per cluster: n/k, where n is the total sample size and k is the number of clusters. However, deff() calculates B as:

  B = sum(number of observations in each cluster^2) / n

That is a bit hard to write without the Sigma operator. In English: square the number of observations in each cluster, add all those up, and divide the total by n. Which formula is correct?

Thank you!

--
View this message in context: http://www.nabble.com/Hmisc-package%3A-deff%28%29-command%27s-formula-for-the-design-effect-tp23415477p23415477.html
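[The two candidate formulas for B can be compared directly. A sketch with made-up cluster sizes; the variable names are mine:]

```r
# Compare the textbook B (average cluster size) with the size-weighted B.
cluster_sizes <- c(10, 10, 10, 30)     # unequal clusters
n <- sum(cluster_sizes)
k <- length(cluster_sizes)

B_average  <- n / k                    # textbook: n/k = 15
B_weighted <- sum(cluster_sizes^2) / n # deff()-style: sum(k_i^2)/n = 20

# The two agree when all clusters are the same size; with unequal sizes
# the weighted version is larger, because big clusters count for more.
c(B_average = B_average, B_weighted = B_weighted)
```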
[R] R graph into MS Word: which format to use?
Hello,

The journal I am publishing in requires MS Word files. What is my best option for getting a high-quality image of a graph done in R into Word? JPEG? PostScript?

Thanks.

--
View this message in context: http://www.nabble.com/R-graph-into-MS-Word%3A-which-format-to-use--tp23133745p23133745.html
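[One common approach is a high-resolution PNG, which Word imports cleanly; JPEG tends to blur line art. A sketch with an illustrative file name:]

```r
# Write the plot as a 300-dpi PNG sized in inches for print.
png("myplot.png", width = 6, height = 4, units = "in", res = 300)
plot(1:10, 1:10, main = "Example")
dev.off()

# On Windows, win.metafile("myplot.wmf") produces a vector format that
# Word can rescale without loss of quality.
```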
[R] Clustered data with Design package--bootcov() vs. robcov()
Hi,

I am trying to figure out exactly what the bootcov() function in the Design package is doing in the context of clustered data. From reading the documentation and source code, it appears that using bootcov() with the cluster argument constructs standard errors by resampling whole clusters of observations with replacement, rather than resampling individual observations. Is that right, and is there any more detailed documentation on the math behind this?

Also, what is the difference between these two calls?

  bootcov(my.model, cluster.id)
  robcov(my.model, cluster.id)

Thank you.

--
View this message in context: http://www.nabble.com/Clustered-data-with-Design-package--bootcov%28%29-vs.-robcov%28%29-tp23016400p23016400.html
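[A sketch of one cluster (block) bootstrap resample, the scheme bootcov() appears to use when a cluster argument is supplied: whole clusters are drawn with replacement. Data and names are made up for illustration.]

```r
# Resample clusters, not rows: draw cluster labels with replacement and
# keep every observation belonging to each drawn cluster.
set.seed(1)
dat <- data.frame(id = rep(1:5, each = 3), y = rnorm(15))

ids  <- unique(dat$id)
draw <- sample(ids, replace = TRUE)    # resampled cluster labels
boot <- do.call(rbind, lapply(draw, function(i) dat[dat$id == i, ]))

# Some clusters appear more than once and some not at all, but the
# within-cluster correlation structure is preserved intact.
table(boot$id)
```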
[R] Simulate binary data for a logistic regression Monte Carlo
Hello,

I am trying to simulate binary outcome data for a logistic regression Monte Carlo study. I eventually need to be able to manipulate the structure of the error term to give groups of observations a random effect. Right now I am just doing a very basic setup to make sure I can recover the parameters properly. I am running into trouble with the code below. It works if you take out the object 'epsilon', but I need that object because it is where I plan to add in the random effects eventually. Any suggestions? Thanks.

set.seed(1)
alpha <- numeric(100)  # Vectors to store coefficient estimates
X1 <- numeric(100)
X2 <- numeric(100)
a <- 0                 # True parameters
B1 <- .5
B2 <- .85

for (i in 1:100){
  x1 <- rnorm(1000)
  x2 <- rnorm(1000)
  epsilon <- rnorm(1000)
  LP <- a + B1*x1 + B2*x2 + epsilon         # Linear predictor
  y <- rbinom(length(LP), 1, plogis(LP))    # Binomial link
  model <- glm(y ~ x1 + x2, family = binomial)
  alpha[i] <- coef(model)[1]
  X1[i] <- coef(model)[2]
  X2[i] <- coef(model)[3]
}

mean(X1)     # Shows a bias downward (should be about .5)
mean(X2)     # Shows a bias downward (should be about .85)
mean(alpha)  # Pretty close (should be about 0)

--
View this message in context: http://www.nabble.com/Simulate-binary-data-for-a-logistic-regression-Monte-Carlo-tp22928872p22928872.html
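[A way to isolate where the downward bias comes from: the extra epsilon term is unmodeled variance on the logit scale, which attenuates the fitted coefficients toward zero. Dropping it recovers the true parameter. This diagnostic is my own sketch, not from the original thread.]

```r
# Same Monte Carlo setup as above, but with no epsilon in the true logit:
# the estimates are then unbiased for the true coefficient of 0.5.
set.seed(1)
est <- replicate(200, {
  x1 <- rnorm(1000)
  LP <- 0.5 * x1                      # true linear predictor, no noise term
  y  <- rbinom(1000, 1, plogis(LP))
  coef(glm(y ~ x1, family = binomial))[2]
})
mean(est)   # close to 0.5
```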
[R] Choosing how to vary slopes and intercepts in a hierarchical model
Hi,

Is there any set of rules for deciding exactly how to vary slopes and intercepts in a hierarchical linear model? I have NOMINATE scores from the Senate over three years, so I have multiple observations of senators, nested in states, nested in years. Covariates include level-1 (senator in a specific year) variables and state-level variables. I am having a hard time figuring out which combination of random slopes and intercepts to use.

What is the best way to compare model specifications using the lmer() function? Should it just be a likelihood ratio test? Also, how do I test whether the random effects for a particular coefficient are significant? The model I fit first based on theory fit reasonably well, but I have been able to get specifications with higher likelihoods. How do I justify reporting those fits?

Thanks.

--
View this message in context: http://www.nabble.com/Choosing-how-to-vary-slopes-and-intercepts-in-a-hierarchical-model-tp2221p2221.html
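[A sketch of the likelihood-ratio comparison mentioned in the question, using lme4's built-in sleepstudy data as a stand-in for the NOMINATE data (the formulas are illustrative). Note that fits compared by likelihood should use ML, not REML.]

```r
# Compare a random-intercept fit against a random-intercept-and-slope fit.
library(lme4)

m_int   <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML = FALSE)
m_slope <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = FALSE)

# anova() on nested lmer fits reports the likelihood ratio test; a small
# p-value favors keeping the random slope. (Testing whether a variance
# component is zero is a boundary test, so this p-value is conservative.)
anova(m_int, m_slope)
```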
[R] Difference between GEE and Robust Cluster Standard Errors
Hello,

I know that two possible approaches to dealing with clustered data are GEE and a robust cluster covariance matrix from a standard regression. What are the differences between these two methods, or are they doing the same thing?

Thanks.

--
View this message in context: http://www.nabble.com/Difference-between-GEE-and-Robust-Cluster-Standard-Errors-tp22092897p22092897.html
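[The two approaches can be fit side by side on the same simulated data. A sketch assuming the geepack and sandwich packages are installed; data and names are made up. With an independence working correlation, GEE point estimates coincide with OLS and its robust variance closely corresponds to the cluster sandwich estimator; a non-independence working correlation changes the point estimates too.]

```r
library(geepack)
library(sandwich)

set.seed(1)
dat <- data.frame(id = rep(1:30, each = 5), x = rnorm(150))
dat$y <- 1 + 0.5 * dat$x + rnorm(30)[dat$id] + rnorm(150)

# GEE with an exchangeable working correlation:
gee_fit <- geeglm(y ~ x, id = id, data = dat, corstr = "exchangeable")

# OLS with a cluster-robust (sandwich) covariance matrix:
ols_fit <- lm(y ~ x, data = dat)
vcovCL(ols_fit, cluster = dat$id)
```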
[R] What is the R equivalent of STATA's 'drop' command?
Hello,

I am trying to do some data cleaning in R. I need to drop observations that take on certain values of a variable. In Stata I might type something like:

  drop if varname == 3
  drop if varname == 4

Is there an R equivalent of this? I have tried playing around with the subset() command, but it seems a bit clunky. What would an advanced R user's approach be for something like this?

Thank you!

--
View this message in context: http://www.nabble.com/What-is-the-R-equivalent-of-STATA%27s-%27drop%27-command--tp21925249p21925249.html
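[The idiomatic R equivalent is logical indexing: instead of dropping rows, you keep the rows you want. A sketch with an illustrative data frame and variable name:]

```r
dat <- data.frame(v = c(1, 2, 3, 4, 5))

# Stata: drop if v == 3 / drop if v == 4
dat2 <- dat[!(dat$v %in% c(3, 4)), , drop = FALSE]

# The same thing with subset():
dat3 <- subset(dat, !(v %in% c(3, 4)))

dat2$v   # 1 2 5
```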
[R] Creating clustered data with a random effect for a MC analysis
I am trying to create a data set in R with a random cluster effect. I would like to do it so I can set the intraclass correlation coefficient (ICC) in the data generating process. I have tried a few methods, but cannot seem to get enough control over the ICC. I have been estimating the ICC using the deff() command in the Hmisc package. The main problem is that when I adjust the number of clusters, the ICC changes. Below is an example of one approach I have tried to generate the data. Thanks for any help you can give.

library(mvtnorm)  # for rmvnorm()

set.seed(82184)
n <- 1200    # Sample size
nc <- 50     # Number of clusters
p <- .1      # ICC
sims <- 100  # Number of Monte Carlo simulations
a <- 0       # True model coefficients
B1 <- 0.85
B2 <- 0.5
cluster <- rep(1:nc, each = n/nc)  # Cluster label

for (i in 1:sims){
  # Simulate a clustered data set
  c.sigma <- matrix(c(4, 0, 0, p), ncol = 2)  # Cluster-level random effects
  c.values <- rmvnorm(n = nc, sigma = c.sigma)
  randeff1 <- rep(c.values[ , 1], each = n/nc)
  randeff2 <- rep(c.values[ , 2], each = n/nc)
  i.sigma <- matrix(c(1, 0, 0, 1), ncol = 2)  # Individual-level random effects
  i.values <- rmvnorm(n = n, sigma = i.sigma)
  randeff3 <- i.values[ , 1]
  randeff4 <- i.values[ , 2]
  X1 <- 3 + randeff1 + randeff3  # X1 values unique to individual observations
  X2 <- randeff1                 # X2 values unique to clusters of observations
  epsilon <- randeff2 + randeff4 # Two components of the error term
  Y <- a + B1*X1 + B2*X2 + epsilon  # True model
}

--
View this message in context: http://www.nabble.com/Creating-clustered-data-with-a-random-effect-for-a-MC-analysis-tp21630341p21630341.html
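[A more direct way to pin down the ICC of the error term, offered as my own sketch rather than something from the original post: draw the cluster component with variance rho and the individual component with variance 1 - rho, so the ICC is rho / (rho + (1 - rho)) = rho regardless of the number of clusters.]

```r
set.seed(1)
rho <- 0.1
nc <- 50; m <- 24                        # clusters and cluster size
cl <- rep(1:nc, each = m)

u   <- rnorm(nc, sd = sqrt(rho))         # cluster-level component, var = rho
e   <- rnorm(nc * m, sd = sqrt(1 - rho)) # individual component, var = 1 - rho
eps <- u[cl] + e                         # error term with ICC = rho

# One-way ANOVA estimate of the ICC as a check:
ms  <- anova(lm(eps ~ factor(cl)))[["Mean Sq"]]
icc <- (ms[1] - ms[2]) / (ms[1] + (m - 1) * ms[2])
round(icc, 2)   # close to 0.1
```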
[R] 'equivalent' sign in plot title
Hello,

I would like to add the 'equivalent' sign (i.e., three horizontal lines, almost like an equals sign) to a plot. Does R have anything like the LaTeX command '\equiv' that I could put into a plot title?

Thanks

--
View this message in context: http://www.nabble.com/%27equivalent%27-sign-in-plot-title-tp21360963p21360963.html
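[R's plotmath has a `%==%` operator, documented in ?plotmath as "x is defined as y", which I believe draws the triple-bar glyph and should serve as \equiv in a title:]

```r
# Triple-bar 'equivalent' sign in a plot title via plotmath:
plot(1:10, main = expression(alpha %==% beta))

# bquote() works too if part of the title is computed at run time:
plot(1:10, main = bquote(x %==% .(2 + 2)))
```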
Re: [R] Need help creating spatial correlation for MC simulation
Thank you for the input. Which command in the spatstat package am I looking for? The documentation is unclear to me.

milton ruser wrote:

> Dear J.J. Harden,
>
> I think that in spatstat you will find several ways to simulate spatial patterns (point or line) that may be what you are looking for. If not, please let me know and maybe we can work out a solution.
>
> Best wishes,
> miltinho astronauta
> brazil
>
> On Wed, Sep 17, 2008 at 7:36 PM, jjh21 [EMAIL PROTECTED] wrote:
>
>> I want to create a dataset in R with spatial correlation (i.e. clustering) built in for a linear regression analysis. Any tips on how to do this? Thanks.

--
View this message in context: http://www.nabble.com/Need-help-creating-spatial-correlation-for-MC-simulation-tp19542145p19610885.html
[R] Need help creating spatial correlation for MC simulation
I want to create a dataset in R with spatial correlation (i.e., clustering) built in for a linear regression analysis. Any tips on how to do this?

Thanks.

--
View this message in context: http://www.nabble.com/Need-help-creating-spatial-correlation-for-MC-simulation-tp19542145p19542145.html
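[One way to build spatial correlation in by hand, as a sketch rather than a definitive recipe: draw the regression errors from a multivariate normal whose correlation decays with the distance between points (an exponential covariance). Assumes the mvtnorm package is installed; the range parameter 0.2 is arbitrary.]

```r
library(mvtnorm)

set.seed(1)
n  <- 100
xy <- cbind(runif(n), runif(n))        # random point locations on [0,1]^2
D  <- as.matrix(dist(xy))              # pairwise distances
Sigma <- exp(-D / 0.2)                 # correlation decays with distance

eps <- drop(rmvnorm(1, sigma = Sigma)) # spatially correlated errors
x   <- rnorm(n)
y   <- 1 + 0.5 * x + eps               # data for the regression analysis
```

Nearby observations now share correlated errors, which is exactly the clustering a naive OLS analysis would ignore.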