[R] Lavaan: Immediate non-positive definite matrix

2012-08-10 Thread Andrew Miles
Hi,

I recently tried to estimate a linear unconditional latent growth curve on
7 repeated measures using lavaan (most recent version):

modspec='

alpha =~ 1*read_g0 + 1*read_g1 + 1*read_g2 + 1*read_g3 + 1*read_g4 +
1*read_g5 + 1*read_g6

beta =~ 0*read_g0 + 1*read_g1 + 2*read_g2 + 3*read_g3 + 4*read_g4 +
5*read_g5 + 6*read_g6

'

gmod = lavaan(modspec, data=math, meanstructure=T, int.ov.free=F,
int.lv.free=T, auto.var=T, auto.cov.lv.x=T, auto.cov.y=T,
missing="direct", verbose=T)
The model immediately returned the following error message:

Error in chol.default(S) :

  the leading minor of order 5 is not positive definite

Error in lavSampleStatsFromData(Data = lavaanData, rescale =
(lavaanOptions$estimator ==  :

  lavaan ERROR: sample covariance can not be inverted
I ran the same model in Mplus and found that it has low covariance coverage
for the 7 repeated measures, but all coverage is greater than 0.  The model
ran fine in Mplus once I set the covariance coverage limit to .001.  That
provided me starting values to try in lavaan, but again it immediately
returned the same error message.  In fact, nothing I could do allowed me to
fit the model without it immediately returning the same error message
(e.g., I tried changing the optimizer, the type of likelihood being used).

Because the error message pops up immediately (i.e., not after lavaan tries
to estimate the model for a few seconds), it makes me think that my data is
failing some sort of initial data check.

Any ideas what is happening, or what to do about it?  Thanks.
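In case it helps with diagnosis: a quick way to check whether the sample covariance matrix itself is the culprit is to compute it under pairwise deletion and inspect its eigenvalues (a sketch, assuming the 7 measures and the `math` data frame from the model above):

```r
# Pairwise-deletion covariance of the 7 repeated measures
vars <- paste0("read_g", 0:6)
S <- cov(math[, vars], use = "pairwise.complete.obs")

# A positive definite matrix has all eigenvalues > 0;
# chol(S) fails exactly when this condition does not hold
eigen(S, only.values = TRUE)$values
```

Any eigenvalue at or below zero would explain the immediate chol() failure before estimation even starts.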

P.S.  I haven't had success in attaching data files when I submit to the R
help list, so if you would like the data please email me at
andrewami...@hotmail.com.

Andrew Miles


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] FIML using lavaan returns zeroes for coefficients

2012-07-23 Thread Andrew Miles
Thanks for the helpful explanation.  

As to your question, I sometimes use lavaan to fit univariate regressions 
simply because it can handle missing data using FIML rather than listwise 
deletion.  Are there reasons to avoid this?

BTW, thanks for the update in the development version.

Andrew Miles


On Jul 21, 2012, at 12:59 PM, yrosseel wrote:

 On 07/20/2012 10:35 PM, Andrew Miles wrote:
 Hello!
 
 I am trying to reproduce (for a publication) analyses that I ran
 several months ago using lavaan, I'm not sure which version, probably
 0.4-12. A sample model is given below:
 
 pathmod='mh30days.log.w2 ~ mh30days.log + joingroup + leavegroup +
 alwaysgroup + grp.partic.w2 + black + age + bivoc + moved.conf +
 local.noretired + retired + ds + ministrytime + hrswork +
 nomoralescore.c + negint.c + cong.conflict.c + nomoraleXjoin +
 nomoraleXleave + nomoraleXalways + negintXjoin + negintXleave +
 negintXalways + conflictXjoin + conflictXleave + conflictXalways '
 mod1 = sem(pathmod, data=sampledat, missing="fiml", se="robust")
 
 At the time, the model ran fine.  Now, using version 0.4-14, the
 model returns all 0's for coefficients.
 
 What happened is that since 0.4-14, lavaan tries to 'detect' models that are 
 just a univariate regression, and internally calls lm.fit instead of the 
 lavaan estimation engine, at least when the missing="ml" argument is NOT 
 used. (BTW, I fail to understand why you would use lavaan if you just want to 
 fit a univariate regression.)
 
 When missing="ml" is used, lavaan normally checks whether you have fixed 'x' 
 covariates (which you do), and whether fixed.x=TRUE (which is the default). In 
 0.4, lavaan internally switches to fixed.x=FALSE (which implicitly assumes 
 that all your predictors are continuous, but I assume you would not be using 
 missing="ml" otherwise). Unfortunately, for the 'special' case of univariate 
 regression, it fails to do this. This behavior will likely change in 0.5, 
 where, by default, only endogenous/dependent variables will be handled by 
 missing="ml", not exogenous 'x' covariates.
 
 To fix it: simply add the fixed.x=FALSE argument, or revert to 0.4-12 to get 
 the old behavior.
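 Applied to the model from the original post, that fix would look something like this (a sketch using the poster's `pathmod` and `sampledat`):

```r
# treat the covariates as random (fixed.x = FALSE) so FIML can
# handle missingness on them as well as on the outcome
mod1 <- sem(pathmod, data = sampledat,
            missing = "fiml", se = "robust", fixed.x = FALSE)
```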
 
 Hope this helps,
 
 Yves.
 http://lavaan.org
 
 
 



[R] FIML using lavaan returns zeroes for coefficients

2012-07-20 Thread Andrew Miles
Hello!

I am trying to reproduce (for a publication) analyses that I ran several months 
ago using lavaan, I'm not sure which version, probably 0.4-12. A sample model 
is given below:

pathmod='mh30days.log.w2 ~ mh30days.log + joingroup + leavegroup + alwaysgroup 
+ grp.partic.w2 + black + age + bivoc + moved.conf + local.noretired + retired 
+ ds + ministrytime + hrswork + nomoralescore.c + negint.c + cong.conflict.c + 
nomoraleXjoin + nomoraleXleave + nomoraleXalways + negintXjoin + negintXleave + 
negintXalways + conflictXjoin + conflictXleave + conflictXalways
'
mod1 = sem(pathmod, data=sampledat, missing="fiml", se="robust")

At the time, the model ran fine.  Now, using version 0.4-14, the model returns 
all 0's for coefficients.  This does not happen, however, when I run the model 
using listwise deletion for missing data.  Any idea what is happening, or how I 
can fix it?  For those wishing to reproduce the problem, you can download a 
sample code file and data frame from the following two links.

https://fds.duke.edu/db/aas/Sociology/grad/aam34/files/problem%20code.R
https://fds.duke.edu/db/aas/Sociology/grad/aam34/files/sampledat.rdata

Thanks for your help!

Andrew Miles






Re: [R] Seeking pointers to various regression techniques with R?

2012-06-05 Thread Andrew Miles
The R Book by Michael Crawley has some discussion of the type of R syntax you 
are looking for in the chapter on statistical modeling.  As for the formulae you 
gave...

lm(y ~ x*w - 1)  fits an interaction between x and w without an intercept, 
along with the main effects for x and w
lm(y ~ x:w - 1)  fits just the interaction between x and w without an 
intercept, and without the main effects for x and w
lm(y ~ x/w - 1)  I believe this fits the nested factor w inside of x
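The differences are easy to verify by inspecting the design matrices R builds for each formula (a small sketch with made-up numeric variables; with factors the expansions differ):

```r
set.seed(1)
x <- rnorm(6)
w <- rnorm(6)

colnames(model.matrix(~ x * w - 1))  # "x" "w" "x:w" -- main effects plus interaction
colnames(model.matrix(~ x : w - 1))  # "x:w"         -- interaction only
colnames(model.matrix(~ x / w - 1))  # "x" "x:w"     -- w nested within x
```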

Andrew Miles


On Jun 5, 2012, at 4:58 PM, Michael wrote:

 I read your website but still don't know the difference between the three
 formulas...
 
 Thank you!
 
 On Mon, Jun 4, 2012 at 11:14 PM, Joshua Wiley jwiley.ps...@gmail.com wrote:
 
 Hi Michael,
 
 This is far from exhaustive (I wrote it as an introduction some years
 ago) but you may find it useful to start:
 https://joshuawiley.com/R/formulae_in_R.aspx
 
 Cheers,
 
 Josh
 
 On Mon, Jun 4, 2012 at 9:06 PM, Michael comtech@gmail.com wrote:
 Hi all,
 
 Could you please point me to good materials on various
 tricks/intuitions/techniques of regression, and hopefully in R?
 
 For example, what does lm(y ~ x*w - 1) mean vs. lm(y ~ x/w - 1) vs.
 lm(y ~ x:w - 1), etc...
 
 I just found that even simple linear regression is not that simple and
 there are a lot of tricks/techniques in using them...
 
 Hopefully I can find good materials on these!
 
 Thank you!
 
 
 
 
 --
 Joshua Wiley
 Ph.D. Student, Health Psychology
 Programmer Analyst II, Statistical Consulting Group
 University of California, Los Angeles
 https://joshuawiley.com/
 
 



Re: [R] How do I obtain the current active path of a function that's being called?

2012-06-05 Thread Andrew Miles
Can you provide an example of the code file that you use to call the different 
functions?  Without that it might be hard for people to answer your question.

Offhand, I'd say that it is whichever version of function A was loaded last.  If 
you are loading functions into the workspace, they are treated like any other 
object, which is to say that you can only have one function of a given name at 
a time.  Hence whenever you source a file that defines function A, the old 
function A gets overwritten.
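If you genuinely need both versions of A available at once, one workaround is to source each file into its own environment (a sketch; the inline definitions stand in for your source files):

```r
e1 <- new.env()
e2 <- new.env()
# with real files you would use sys.source("file1.R", envir = e1), etc.
assign("A", function() "version 1", envir = e1)
assign("A", function() "version 2", envir = e2)

e1$A()  # the A "loaded" into e1
e2$A()  # the A "loaded" into e2
```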

Andrew Miles


On Jun 5, 2012, at 4:58 PM, Michael wrote:

 Hi all,
 
 How do I obtain the current active path of a function that's being called?
 
 That's to say, I have several source files and they all contain definition
 of function A.
 
 I would like to figure out which function A and from which file is the one
 that's being called and is currently active?
 
 Thanks a lot!
 



[R] Higher log-likelihood in null vs. fitted model

2012-05-31 Thread Andrew Miles
Two related questions.

First, I am fitting a model with a single predictor, and then a null model
with only the intercept.  In theory, the fitted model should have a higher
log-likelihood than the null model, but that does not happen.  See the
output below.  My first question is, how can this happen?

> m

Call:  glm(formula = school ~ sv_conform, family = binomial, data = dat,
weights = weight)

Coefficients:
(Intercept)   sv_conform
-2.5430   0.2122

Degrees of Freedom: 1488 Total (i.e. Null);  1487 Residual
Null Deviance:786.1
Residual Deviance: 781.9 AIC: 764.4
> null

Call:  glm(formula = school ~ 1, family = binomial, data = dat, weights =
weight)

Coefficients:
(Intercept)
 -2.532

Degrees of Freedom: 1488 Total (i.e. Null);  1488 Residual
Null Deviance:786.1
Residual Deviance: 786.1 AIC: 761.9
> logLik(m); logLik(null)
'log Lik.' -380.1908 (df=2)
'log Lik.' -379.9327 (df=1)


My second question grows out of the first.  I ran the same two models on the
same data in Stata and got identical coefficients.  However, the
log-likelihoods were different from the ones I got in R, and followed my
expectations - that is, the null model has a lower log-likelihood than the
fitted model.  See the Stata model comparison below.  So my question is,
why do identical models fit in R and Stata have different log-likelihoods?
-----------------------------------------------------------------------------
       Model |      Obs    ll(null)   ll(model)   df        AIC         BIC
-------------+---------------------------------------------------------------
        mod1 |     1489    -393.064   -390.9304    2   785.8608    796.4725
        null |     1489    -393.064    -393.064    1   788.1279    793.4338

Thanks in advance for any input or references.

Andrew Miles




Re: [R] Higher log-likelihood in null vs. fitted model

2012-05-31 Thread Andrew Miles
Interesting.  If you use the deviances printed out by the fitted model
(including the null deviance) and back-engineer it to log-likelihoods, you
get results identical to Stata.

> m$deviance*(-.5)

[1] -390.9304

> m$null.deviance*(-.5)

[1] -393.064
However, using the deviance to calculate the AIC by hand does not get you
the AIC that the model returns.

> m$deviance + 2*2

[1] 785.8608

> m$aic

[1] 764.3816
The difference seems to be in how the function glm.fit() calculates the AIC
that it returns, and which the function logLik() uses to return the
log-likelihood, as Mark indicated.  The function is:

binomial()$aic

function (y, n, mu, wt, dev)
{
    m <- if (any(n > 1))
        n
    else wt
    -2 * sum(ifelse(m > 0, (wt/m), 0) * dbinom(round(m * y),
        round(m), mu, log = TRUE))
}
On the other hand, glm() calculates the deviance based on the sum of the
deviance residuals.

This gives me a sense for how log-likelihoods calculated using the deviance
and the model AIC could differ (i.e., different ways of calculating them), but
it is still not clear to me *why* the two approaches differ.  And they only
differ when I use weights with the data.

More importantly, it makes me wonder which is more reliable - in my case,
it seems the deviance residuals approach is more reliable in that a) it
gives a sensible result (i.e., a model with a predictor has a higher
log-likelihood than the null model), and b) it matches an independent
estimation performed in Stata.  But is that always the case? And if so, why
use the other approach at all?
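For reference, the two ways of backing out the log-likelihood discussed above can be written side by side (a sketch; `m` is the fitted glm from the original post):

```r
# (1) from the deviance -- matches the Stata output here
ll_dev <- -0.5 * m$deviance

# (2) from the AIC, as logLik.glm() effectively does:
#     AIC = -2*logLik + 2*p  =>  logLik = p - AIC/2
p <- attr(logLik(m), "df")
ll_aic <- p - AIC(m) / 2

c(deviance_based = ll_dev, aic_based = ll_aic)
```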

On Thu, May 31, 2012 at 9:55 AM, Mark Leeds marklee...@gmail.com wrote:

 Hi Duncan: I don't know if the following can help but I checked the code
 and logLik defines the log likelihood as (p -  glmobject$aic/2) where p is
 the glmobject$rank.  So,
 the reason for the likelihood being less is that, in the null, it ends up
 being ( 1 - glmobject$aic/2)  and in the other one it ends up being ( 2 -
 glmobject$aic/2).

 so

  2 - 764.4/2 = -380.2 and

  1 - 761.9/2 = -379.95 ( close enough for govt work )

 So, that's where the #'s are coming from but it really depends on how AIC
 is defined.
  Likelihoods should not involve degrees of freedom (at least not where they
  make the likelihood smaller, as in the above example), so maybe backing the
  likelihood out using AIC is the issue?  (AIC = -2*logLik + 2p, so p - AIC/2 =
  logLik.)  AIC is a function of the likelihood but, as far as I know, the
  likelihood is not a function of the AIC.
  Thanks for any insight.
 Thanks for any insight.






  On Thu, May 31, 2012 at 9:26 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

 On 12-05-31 8:53 AM, Andrew Miles wrote:

 Two related questions.

 First, I am fitting a model with a single predictor, and then a null
 model
 with only the intercept.  In theory, the fitted model should have a
 higher
 log-likelihood than the null model, but that does not happen.  See the
 output below.  My first question is, how can this happen?


 I suspect you'll need to give sample data before anyone can really help
 with this.


  m


 Call:  glm(formula = school ~ sv_conform, family = binomial, data = dat,
 weights = weight)

 Coefficients:
 (Intercept)   sv_conform
 -2.5430   0.2122

 Degrees of Freedom: 1488 Total (i.e. Null);  1487 Residual
 Null Deviance:786.1
 Residual Deviance: 781.9 AIC: 764.4

 null


 Call:  glm(formula = school ~ 1, family = binomial, data = dat, weights =
 weight)

 Coefficients:
 (Intercept)
  -2.532

 Degrees of Freedom: 1488 Total (i.e. Null);  1488 Residual
 Null Deviance:786.1
 Residual Deviance: 786.1 AIC: 761.9

 logLik(m); logLik(null)

 'log Lik.' -380.1908 (df=2)
 'log Lik.' -379.9327 (df=1)



 My second question grows out of the first.  I ran the same two model on
 the
 same data in Stata and got identical coefficients.  However, the
 log-likelihoods were different than the one's I got in R, and followed my
 expectations - that is, the null model has a lower log-likelihood than
 the
 fitted model.  See the Stata model comparison below.  So my question is,
 why do identical models fit in R and Stata have different
 log-likelihoods?


 That's easier:  they use different base measures.  The likelihood is only
 defined up to a multiplicative constant, so the log likelihoods can have an
 arbitrary constant added to them and still be valid.  But I would have
 expected both models to use the same base measure, so the differences in
 log-likelihood should match.

 Duncan Murdoch


  -----------------------------------------------------------------------------
         Model |      Obs    ll(null)   ll(model)   df        AIC         BIC
  -------------+---------------------------------------------------------------
          mod1 |     1489    -393.064   -390.9304    2   785.8608    796.4725
          null |     1489    -393.064    -393.064    1   788.1279    793.4338

 Thanks in advance for any input or references.

 Andrew Miles


Re: [R] Multivariate Multilevel Model: is R the right software for this problem

2012-04-06 Thread Andrew Miles
I recommend looking at chapter 6 of Paul Allison's Fixed Effects Regression 
Models.  This chapter outlines how you can use a structural equation modeling 
framework to estimate a multi-level model (a random effects model).  This 
approach is slower than just using MLM software like lmer() in the lme4 
package, but has the advantage of being able to specify correlations between 
errors across time, the ability to control for time-invariant effects of 
time-invariant variables, and allows you to use the missing data maximum 
likelihood that comes in structural equation modeling packages.
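As a rough illustration of the SEM approach for an outcome y measured at three waves with one time-varying covariate x (hypothetical variable names; see Allison's chapter for the full specification):

```r
library(lavaan)

mod <- '
  # latent intercept = the random effect
  alpha =~ 1*y1 + 1*y2 + 1*y3

  # time-varying covariate, effect constrained equal across waves
  y1 ~ b*x1
  y2 ~ b*x2
  y3 ~ b*x3

  # correlated residuals between adjacent waves
  y1 ~~ y2
  y2 ~~ y3
'
fit <- sem(mod, data = mydata, missing = "fiml")
```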

Andrew Miles
Department of Sociology
Duke University

On Apr 6, 2012, at 9:48 AM, Eiko Fried wrote:

 Hello,
 
 I've been trying to answer a problem I have had for some months now and
 came across multivariate multilevel modeling. I know MPLUS and SPSS quite
 well but these programs could not solve this specific difficulty.
 
 My problem:
 9 correlated dependent variables (medical symptoms; categorical, 0-3), 5
 measurement points, 10 time-varying covariates (life events; dichotomous,
 0-1), N ~ 900. Up to 35% missing values on some variables, especially at
 later measurement points.
 
 My exploratory question is whether there is an interaction effect between
 life events and symptoms - and if so, what the effect is exactly. E.g. life
 event 1 could lead to more symptoms A B D whereas life event 2 could lead
 to more symptoms A C D and less symptoms E.
 
 My question is: would MMM in R be a viable option for this? If so, could
 you recommend literature?
 
 Thank you
 --T
 






[R] FIML in R

2012-03-29 Thread Andrew Miles
Does anyone know if someone is developing full-information maximum likelihood 
(FIML) estimation algorithms for basic regression functions, like glm()?  I 
think that having FIML options for workhorse functions that typically use ML 
would give R an edge over other statistical software, given how well FIML 
performs in missing data situations compared to ML.

While my current level of programming ability isn't up to the task, I bring 
this up to a) see if there is something already underway, and if not b) perhaps 
spark interest in doing so.  I think much of the raw material is already out 
there: there are other packages that use the EM algorithm, and other packages 
that search out missing data patterns (e.g., md.pattern() in the mice package). 
 To facilitate adoption, it might be useful to write a FIML module that other 
package maintainers can incorporate into their code.

I'd be interested in updates on this, and in your thoughts on this more 
generally.  Also, please let me know if there is a forum better suited for this 
kind of discussion.

Andrew Miles



Re: [R] fixed effects linear model in R

2012-02-07 Thread Andrew Miles
Based on Paul Allison's booklet Fixed Effect Regression Models (2009), the FE 
model can be estimated by person-mean centering all of your variables (but not 
the outcome), and then including a random intercept for each person.  The 
centering gives you the FE model estimates, and the random intercept adjusts 
the standard errors for clustering by individuals.  Note that your data must be 
in person-period (or long) format to do this.

In case you are unfamiliar with person-mean centering, that simply means taking 
the mean of each person's values for a given variable for all of the periods in 
your data, and then calculating a deviation from that mean at each time period. 
 For example, a person's average income over four years might be $50,000, but 
in each year their actual income would be slightly higher or lower than this 
(these would be the person-mean deviations).  In symbolic form, your code might 
look something like this:

library(lme4)
variable_pmcentered = variable - person_mean
mod = lmer(outcome ~ variable_pmcentered + person_mean + other predictors + 
(1|personID))

The advantage of this method (which Allison calls a hybrid method) over 
traditional FE models is that you get the benefits of a FE model (subtracting 
out time-invariant omitted variables) along with the benefits of random effect 
models (e.g., estimating coefficients for time-invariant variables, estimating 
interactions with time, letting intercepts and slopes varying randomly, etc.)  
See Allison's booklet for more details on this method.
Allison, Paul D. 2009. Fixed Effects Regression Models. Los Angeles, C.A.: Sage.
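With data in long format, the person means and deviations can be computed directly with ave(); a sketch with hypothetical variable names (income as the time-varying predictor, personID as the cluster):

```r
library(lme4)

dat$income_pm  <- ave(dat$income, dat$personID)  # person mean of income
dat$income_dev <- dat$income - dat$income_pm     # deviation from the person mean

# the deviation term gives the FE-style estimate; the random intercept
# adjusts the standard errors for clustering by person
mod <- lmer(outcome ~ income_dev + income_pm + (1 | personID), data = dat)
```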


Andrew Miles


On Feb 7, 2012, at 5:00 PM, caribou...@gmx.fr wrote:

 Dear R-helpers,
 
 First of all, sorry for those who have (eventually) already received that 
 request.
 The mail has been bumped several times, so I am not sure the list has 
 received it... and I need help (if you have time)! ;-)
 
 I have a very simple question and I really hope that someone could help me
 
 I would like to estimate a simple fixed effect regression model with 
 clustered standard errors by individuals.
 For those using Stata, the counterpart would be xtreg with the fe option, 
 or areg with the absorb option and in both case the clustering is achieved 
 with vce(cluster id)
 
 My question is : how could I do that with R ? 
 An important point is that I have too many individuals, therefore I cannot 
 include dummies and should use the demeaning usual procedure.
 I tried the plm package with the within option, but R quickly tells me 
 that the memory limits are reached (I have over 10 GB of RAM!) while the dataset 
 is only 700 MB (about 50,000 individuals, highly unbalanced).
 I don't understand... plm does indeed demean the data, so the computation should 
 be fast and light enough... and instead it saturates my memory and does not 
 converge...
 
 Do you have an idea?
 Moreover, is it possible to obtain cluster-robust standard errors with plm?
 
 Are there any other solutions for fixed effects linear models (with the 
 demeaning trick, in order not to create as many dummies as individuals)?
 Many thanks in advance! ;)
 John
 
  
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




Re: [R] Help need

2012-02-07 Thread Andrew Miles
You need to store the values of each iteration in a vector, and then display 
the vector after the loop terminates.  I made a few updates to your code, 
and it seems to do what you want now.

And thanks for including the code.  That made it easy to know how to help.

spectrum = c()
for(f in seq(0, 0.5, 0.1)) {
  sigmasquared <- 1
  i <- complex(real = 0, imaginary = 1)
  spectrum <- c(spectrum,
    sigmasquared / (abs(1 - 2.7607*exp(2*pi*i*f) + 3.8106*exp(4*pi*i*f) -
      2.6535*exp(6*pi*i*f) + 0.9258*exp(8*pi*i*f))^2))
}
spectrum
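Since the calculation is already vectorized over f, the loop could also be dropped entirely, computing the same formula on the whole sequence at once:

```r
f <- seq(0, 0.5, 0.1)
i <- complex(real = 0, imaginary = 1)
sigmasquared <- 1

spectrum <- sigmasquared /
  abs(1 - 2.7607*exp(2*pi*i*f) + 3.8106*exp(4*pi*i*f) -
      2.6535*exp(6*pi*i*f) + 0.9258*exp(8*pi*i*f))^2
spectrum
```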

Andrew Miles


On Feb 7, 2012, at 4:08 PM, Jaymin Shah wrote:

 I have made a for loop to try to output values, which I have named spectrum.  
 However, I cannot seem to get the answers to come out as a vector, which is 
 what I need; they come out as separate values which I am then unable to join 
 together. Thank you
 
 for(f in seq(0,0.5,0.1)) {
   sigmasquared <- 1
   i = complex(real = 0, imaginary = 1) 
   spectrum <- 
 (sigmasquared)/(abs(1-2.7607*exp(2*pi*i*f)+3.8106*exp(4*pi*i*f)-2.6535*exp(6*pi*i*f)+0.9258*exp(8*pi*i*f))^2)
 print(spectrum)
 }




Re: [R] Wide to long form conversion

2011-10-06 Thread Andrew Miles
Take a look here.

http://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format
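In base R, a conversion along the lines discussed there could start from reshape(), treating the eight measurement columns as time-varying and then deriving the within-subject factors from the resulting condition label (a sketch; completing the factor-splitting is left to taste):

```r
vars <- c("MeF1", "MeJ1", "SheF1", "SheJ1",
          "MeF2", "MeJ2", "SheF2", "SheJ2")

long <- reshape(myData, varying = vars, v.names = "score",
                timevar = "condition", times = vars,
                idvar = "Subj", direction = "long")

# derive the within-subject factors from the condition label, e.g.:
long$Reference <- ifelse(grepl("^Me", long$condition), "Me", "She")
```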

Andrew Miles
Department of Sociology
Duke University

On Oct 6, 2011, at 4:28 PM, Gang Chen wrote:

 I have some data 'myData' in wide form (attached at the end), and
 would like to convert it to long form. I wish to have five variables
 in the result:

 1) Subj: factor
 2) Group: between-subjects factor (2 levels: s / w)
 3) Reference: within-subject factor (2 levels: Me / She)
 4) F: within-subject factor (2 levels: F1 / F2)
 5) J: within-subject factor (2 levels: J1 / J2)

 As this is the first time I'm learning such a conversion, could
 someone help me out?

 Many thanks,
 Gang

 myData

   Group MeF1 MeJ1 SheF1 SheJ1 MeF2 MeJ2 SheF2 SheJ2 Subj
 1  s45 61036 6 9   S1
 2  s65 5 643 5 6   S2
 3  s74 6 574 5 3   S3
 4  s85 8 771 8 6   S4
 5  s   106 4 796 4 6   S5
 6  s52 4 741 4 2   S6
 7  s   13210 4   112 4 3   S7
 8  s81 31160 310   S8
 9  s69 5 868 5 6   S9
 10 s   145 610   135 510  S10
 11 s   15218 2   14118 2  S11
 12 s69 4 95   11 3 8  S12
 13 s55 01243 0 8  S13
 14 s56 4 946 2 6  S14
 15 s   14512 3   12311 3  S15
 16 s7211 35210 2  S16
 17 s17 4 516 3 5  S17
 18 s62 7 462 7 4  S18
 19 s94 8 5   104 6 3  S19
 20 s82 6 592 6 4  S20
 21 s65 5 766 5 5  S21
 22 s88 3 767 5 3  S22
 23 s   114 6 711 6 4  S23
 24 s63 2 464 2 2  S24
 25 s44 6 623 4 6  S25
 26 w59 4 737 3 5  S26
 27 w76 3 541 0 4  S27
 28 w   10414 28410 2  S28
 29 w97 5 684 5 3  S29
 30 w92 7 562 6 5  S30
 31 w67 6 765 5 8  S31
 32 w7612 76310 7  S32
 33 w   123 8 9   113 4 7  S33
 34 w   12210 592 6 3  S34
 35 w6310 453 5 3  S35
 36 w93 9 963 7 8  S36
 37 w5   11 7 74   11 3 4  S37
 38 w74 4 673 1 5  S38
 39 w65 1 833 0 8  S39
 40 w   10310 273 7 2  S40
 41 w1   11 7 518 4 3  S41
 42 w   105 610   104 3 9  S42
 43 w63 9 242 6 0  S43
 44 w9511 454 7 3  S44
 45 w85 6 384 2 3  S45
 46 w84 8 741 2 6  S46
 47 w   122 6 2   101 5 2  S47
 48 w   106 9 875 7 8  S48
 49 w   13615 1   12414 0  S49
 50 w78 11247 111  S50
 51 w   123 9 491 7 4  S51





[R] mlogit error message Error in X[omitlines, ] <- NA : subscript out of bounds

2011-05-14 Thread Andrew Miles
Yong,


I saw your post from April 29th about the error message in the mlogit
package, copied below.  I had the same problem, but solved it by omitting
all missing data from my data frame before running mlogit.data().


ex. mydata = na.omit(mydata)


I am posting this to the R help list as well so that an answer to this
question will be documented.


Andrew Miles


I am using the mlogit package and ran into a data problem, for which I can't
find any clue in the R archives.

The code below shows my related code all the way to the error:

#---------------------------------------------------------------
mydata <- data.frame(dependent, x, y, z)

mydata$dependent <- as.factor(mydata$dependent)

mldata <- mlogit.data(mydata, varying=NULL, choice="dependent", shape="wide")

summary(mlogit.1 <- mlogit(dependent ~ 1 | x+y+z, data = mldata, reflevel="0"))

Error in X[omitlines, ] <- NA : subscript out of bounds
#---------------------------------------------------------------

Could anybody kindly tip how can I possibly solve this problem?

Thank you

yong

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Incorrect degrees of freedom in SEM model using lavaan

2011-03-17 Thread Andrew Miles
I have been trying to use lavaan (version 0.4-7) for a simple path model,
but the program seems to be computing far fewer degrees of freedom for my
model than it should have.  I have 7 variables, which should give (7)(8)/2 =
28 covariances, and hence 28 DF.  The model seems to think I have only 13
DF.  The code to reproduce the problem is below.  Have I done something
wrong, or is this something I should take to the developer?


library(lavaan)

myCov = matrix(c(24.40, 0, 0, 0, 0, 0, 0, .03, .03, 0, 0, 0, 0, 0, 6.75,
-.08, 519.38, 0, 0, 0, 0, .36, .01, 2.74, .18, 0, 0, 0, .51, .0, -.31, .02,
.2, .0, 0, -.17, .0, -1.6, -.04, .01, .25, 0, -.1, .02, -.03, .0, -.01, .01,
.02), nrow=7, byrow=TRUE, dimnames=list(c("Internet", "Stress3",
"HHincome", "Race", "Age", "Gender", "Stress1"), c("Internet", "Stress3",
"HHincome", "Race", "Age", "Gender", "Stress1")))


model = '

Internet ~ HHincome + Race + Age + Gender + Stress1

Stress3 ~ Internet + HHincome + Race + Age + Gender + Stress1

'


fit=sem(model, sample.nobs=161, sample.cov=myCov, mimic.Mplus=FALSE)


#check the number of parameters being estimated

inspect(fit, what="free")


#Note the DF for the Chi-square is 0, when it should be 28-13 = 15

summary(fit)
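For reference, the DF arithmetic above can be checked directly: a covariance-only saturated model for p observed variables supplies p(p + 1)/2 unique sample moments, and the model DF is that count minus the number of free parameters. A quick sketch (assuming no mean structure is fitted):

```r
# Unique sample moments for p = 7 observed variables (covariances only)
p <- 7
moments <- p * (p + 1) / 2   # 28 unique variances and covariances
free    <- 13                # free parameters reported by inspect(fit, "free")
moments - free               # expected chi-square DF of 15
```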


Andrew Miles

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regression Testing

2011-01-20 Thread Andrew Miles
Perhaps the easiest way to incorporate the heteroskedasticity  
consistent SE's and output them in a familiar and easy to interpret  
format is to use coeftest() in the lmtest package.


coeftest(myModel, vcov=vcovHC(myModel))

Andrew Miles

On Jan 20, 2011, at 4:42 PM, Achim Zeileis wrote:


On Thu, 20 Jan 2011, Mojo wrote:

I'm new to R and some what new to the world of stats.  I got  
frustrated with excel and found R.  Enough of that already.


I'm trying to test and correct for Heteroskedasticity

I have data in a csv file that I load and store in a dataframe.


ds <- read.csv("book2.csv")
df <- data.frame(ds)


I then perform an OLS regression:


lmfit <- lm(df$y ~ df$x)


Just btw: lm(y ~ x, data = df) is somewhat easier to read and also  
easier to write when the formula involves more regressors.



To test for Heteroskedasticity, I run the BPtest:


bptest(lmfit)


  studentized Breusch-Pagan test

data:  lmfit
BP = 11.6768, df = 1, p-value = 0.0006329

From the above, if I'm interpreting this correctly, there is  
Heteroskedasticity present.  To correct for this, I need to  
calculate robust error terms.


That is one option. Another one would be using WLS instead of OLS -  
or maybe FGLS. As the model just has one regressor, this might be  
possible and result in a more efficient estimate than OLS.
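As an illustration of the WLS option, `lm()` takes a `weights` argument; the sketch below assumes the error variance grows with x squared, which is only one common textbook choice (the right weights depend on the actual form of the heteroskedasticity):

```r
# WLS: downweight observations assumed to have larger error variance.
# Here Var(e_i) is assumed proportional to x_i^2 -- a hypothetical choice
# for illustration, not a recommendation for this particular data set.
wls_fit <- lm(y ~ x, data = df, weights = 1 / x^2)
summary(wls_fit)
```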



From my reading on this list, it seems like I need to use vcovHC().


That's another option, yes.


vcovHC(lmfit)

              (Intercept)          df$x
(Intercept)  1.057460e-03 -4.961118e-05
df$x        -4.961118e-05  2.378465e-06

I'm having a little bit of a hard time following the help pages.


Yes, the manual page is somewhat technical, but the first thing the
Details section does is point you to some references that should be
easier to read. I recommend starting with


Zeileis A (2004), Econometric Computing with HC and HAC Covariance
Matrix Estimators. _Journal of Statistical Software_, *11*(10),
1-17. URL: http://www.jstatsoft.org/v11/i10/.

That has also some worked examples.

So is the first column the intercepts and the second column new  
standard errors?


As David pointed out, it's the full covariance matrix estimate.
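To answer the follow-up question directly: the robust standard errors are the square roots of the diagonal entries of that matrix, e.g. (using the poster's `lmfit`):

```r
# Robust SEs = square roots of the diagonal of the HC covariance matrix
robust_se <- sqrt(diag(vcovHC(lmfit)))
robust_se
```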

hth,
Z


Thanks,
mojo

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] OT: Reducing pdf file size

2011-01-05 Thread Andrew Miles
I assume you mean PDFs generated by R.  This topic has been addressed  
here: http://tolstoy.newcastle.edu.au/R/e2/help/07/05/17475.html

I have always just output the graphics then used an external PDF  
program (like Preview on the Mac) to do changes in file type, size  
reductions, etc.
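If Ghostscript is installed, it can also be driven from R to recompress an existing file; a sketch (the file names are hypothetical, and `gs` must be on the PATH):

```r
# Recompress big.pdf into small.pdf with Ghostscript.
# -dPDFSETTINGS=/ebook downsamples images to roughly 150 dpi;
# use /printer or /prepress for higher quality at larger size.
system(paste("gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4",
             "-dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH",
             "-sOutputFile=small.pdf big.pdf"))
```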

Andrew Miles

On Jan 5, 2011, at 3:12 PM, kurt_h...@nps.gov wrote:

 Greetings
 Does anyone have any suggestions for reducing pdf file size,
 particularly pdfs containing photos, without sacrificing quality?   
 Thanks
 for any tips in advance.
 Cheers
 Kurt

 ***
 Kurt Lewis Helf, Ph.D.
 Ecologist
 EEO Counselor
 National Park Service
 Cumberland Piedmont Network
 P.O. Box 8
 Mammoth Cave, KY 42259
 Ph: 270-758-2163
 Lab: 270-758-2151
 Fax: 270-758-2609
 
 Science, in constantly seeking real explanations, reveals the true  
 majesty
 of our world in all its complexity.
 -Richard Dawkins

 The scientific tradition is distinguished from the pre-scientific  
 tradition
 in having two layers.  Like the latter it passes on its theories but  
 it
 also passes on a critical attitude towards them.  The theories are  
 passed
 on not as dogmas but rather with the challenge to discuss them and  
 improve
 upon them.
 -Karl Popper

 ...consider yourself a guest in the home of other creatures as  
 significant
 as yourself.
 -Wayside at Wilderness Threshold in McKittrick Canyon, Guadalupe  
 Mountains
 National Park, TX

 Cumberland Piedmont Network (CUPN) Homepage:
 http://tiny.cc/e7cdx

 CUPN Forest Pest Monitoring Website:
 http://bit.ly/9rhUZQ

 CUPN Cave Cricket Monitoring Website:
 http://tiny.cc/ntcql

 CUPN Cave Aquatic Biota Monitoring Website:
 http://tiny.cc/n2z1o

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] memisc-Tables with robost standard errors

2011-01-05 Thread Andrew Miles
I always use apsrtable in the apsrtable package, which allows you to  
specify a vcov matrix using the se option.  The only trick is that  
you have to append it to your model object, something like this:


fit <- lm(y ~ x)
fit$se <- vcovHC(fit)
apsrtable(fit, se = "robust")

Andrew Miles


On Jan 5, 2011, at 7:57 PM, Jan Henckens wrote:


Hello,

I've got a question concerning the usage of robust standard errors
in regression using lm() and exporting the summaries to LaTeX using
the memisc package's mtable() function:


Is there any possibility to use robust errors which are obtained by  
vcovHC() when generating the LateX-output by mtable()?


I tried to manipulate the lm object by appending the new
covariance matrix, but mtable seems to generate the summary itself,
since it is not possible to call mtable(summary(lm1)).


I'd like to obtain a table with the following structure (using
ordinary standard errors I have already worked out how to achieve it):



Variable  Coeff.     robust S.E.  lower 95% CI  upper 95% CI \\
Var1      x.22^(*)   (x)          [x            x]           \\
.
.
.



Maybe someone has any suggestions how to implement this kind of table?

Best regards,
Jan Henckens


--
jan.henckens | jöllenbecker str. 58 | 33613 bielefeld
tel 0521-5251970

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] robust standard error of an estimator

2011-01-01 Thread Andrew Miles
It depends on what you mean by robust.  Robust to what?

I recommend looking at the sandwich package which gives  
heteroskedasticity and autocorrelation robust variance/covariance  
matrices.  For instance, you could do the following to get your OLS  
estimates with heteroskedasticity consistent SEs

library(sandwich)
library(lmtest)
reg=lm(fsn~lctot)
coeftest(reg, vcov=vcovHC(reg))

Or to get cluster robust SEs, check out this:
people.su.se/~ma/clustering.pdf

Hope that helps.

Andrew Miles


On Jan 1, 2011, at 10:09 AM, Charlène Cosandier wrote:

 Hi,

 I need to obtain the robust standard error of an estimator, but I don't
 know how to do this.
 The code for my regression is the following:
 reg <- lm(fsn ~ lctot)
 But then what do I need to do?

 -- 
 Charlène Lisa Cosandier

   [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] After heteroskedasticity correction, how can I get new confidential interval?

2010-12-20 Thread Andrew Miles
It is hard to say without knowing more about the type of model you are  
running.


A good place to look would be at the coeftest() function in the  
package lmtest.
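As a sketch of what that looks like (assuming an lm fit called `model` and the sandwich package supplying the corrected covariance matrix), coeftest() recomputes the t and p values, and confidence intervals can be rebuilt from the robust SEs:

```r
library(lmtest)
library(sandwich)

# New t and p values under heteroskedasticity-consistent SEs
ct <- coeftest(model, vcov = vcovHC(model))
ct

# Matching 95% confidence intervals from the robust SEs
est <- ct[, "Estimate"]
se  <- ct[, "Std. Error"]
cbind(lower = est - qt(0.975, df.residual(model)) * se,
      upper = est + qt(0.975, df.residual(model)) * se)
```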


Andrew Miles


On Dec 20, 2010, at 10:04 AM, JoonGi wrote:




I just corrected the std. errors of my model (multiple regression).
Now how can I get new t and p-values?
Isn't there an R command which shows the new t and p values?

--
View this message in context: 
http://r.789695.n4.nabble.com/After-heteroskedasticity-correction-how-can-I-get-new-confidential-interval-tp3095643p3095643.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Title for y-axis on right side

2010-12-17 Thread Andrew Miles
Take a look at mtext(), which offers options for writing text in any
margin of the plot.
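For a right-hand axis title specifically, a minimal sketch: side = 4 is the right margin, and the margin usually needs widening to make room.

```r
op <- par(mar = c(5, 4, 4, 4) + 0.1)  # widen the right margin
plot(1:10)
axis(4)                               # optional: tick marks on the right
mtext("Title for y-axis", side = 4, line = 3)
par(op)                               # restore the previous margins
```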


Andrew Miles

On Dec 17, 2010, at 6:41 AM, phils_mu...@arcor.de wrote:


Hi,

I want to have a title for the y-axis on the right side of the plot.
I know how to do it on the left side:


title(ylab=Title for y-axis)


But how can I have the title on the right side?

Greets,
Phil

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] xyplot

2010-12-16 Thread Andrew Miles
Try sample(), which will allow you to randomly select 10 IDs from your
ID variable, which you can then plot.
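A sketch of that approach, assuming the poster's `heart` data frame and its `ID` column; the overall mean profile could still be added with panel.loess() as in the original code:

```r
# Randomly pick 10 subjects and plot only their trajectories
ids <- sample(unique(heart$ID), 10)
xyplot(Y ~ time | status, groups = ID,
       data = subset(heart, ID %in% ids),
       type = "l", lty = 1, layout = c(2, 1))
```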


Andrew Miles


On Dec 16, 2010, at 10:49 AM, Rasanga Ruwanthi wrote:


Hi

I am using the following code to produce an xyplot for some longitudinal
data. There are 2 panels. It produces all the longitudinal trajectories
with a mean profile. But since the dataset is very large, the plot looks
very messy. I want to show, say, 10 randomly selected individual
longitudinal trajectories together with the mean profile for the entire
dataset. Could anyone help me alter the following code to do this,
or is there an alternative way?


Thanks in advance.

xyplot(Y ~ time | status, groups = ID, data = heart,
       type = "l", lty = 1, layout = c(2, 1), main = "",
       panel = function(x, y, subscripts, groups, ...) {
           panel.superpose(x, y, panel.groups = panel.xyplot,
                           subscripts, groups, ...)
           panel.loess(x, y, col = "red", lwd = 2, span = 0.75, ...)
       })




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] draw categorical histogram

2010-12-01 Thread Andrew Miles

Try:

plot(myCatVariable)

Andrew Miles
Department of Sociology
Duke University

On Dec 1, 2010, at 2:51 PM, phoebe kong wrote:


Hi,

Can someone tell me how to draw a histogram for the following summary?

Richard  Minnie  Albert  Helen  Joe  Kingston
     12      33      56     67   15        66

The summary tells that Richard has occurrence 12, Minnie has
occurrence 33, and so on. I would like to view this summary in a histogram.

I want the X-axis to be the person names (Richard, Minnie, ...) and the
Y-axis to be the frequency (12, 33, ...).
How can I make the histogram have 6 bars, each bar named Richard,
Minnie, ..., Kingston?

Thanks,
Phoebe

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] draw categorical histogram

2010-12-01 Thread Andrew Miles

Phoebe,

In addition to the barplot method below, you really can use plot() to  
draw what it sounds like you are looking for IF you have a categorical  
variable.  To illustrate, try running the following code:


x = sample(c("Richard", "Minnie", "Albert", "Helen", "Joe", "Kingston"),
           50, replace = TRUE)

x = as.factor(x)
plot(x)

See also ?plot.factor

Andrew Miles


On Dec 1, 2010, at 4:06 PM, Jorge Ivan Velez wrote:


Hi Phoebe,

Try

x <- c(12, 33, 56, 67, 15, 66)
names(x) <- c('Richard','Minnie','Albert','Helen','Joe','Kingston')
barplot(x, las = 1, space = 0)

HTH,
Jorge


On Wed, Dec 1, 2010 at 2:51 PM, phoebe kong  wrote:


Hi,

Can someone tell me how to draw a histogram for the following  
summary?


Richard  Minnie  Albert  Helen  Joe  Kingston
     12      33      56     67   15        66

The summary tell that Richard has occurrence 12, Minnie has  
occurrence 33,

and so on. I would like to view this summary in a histogram.

I want the X-axis be the person name (Richard, Minnie, ...), Y-axis be the

frequency (12, 33).
How can I make the histogram has 6 bars, each bar was named as  
Richard,

Minnie, ... ,  Kingston?

Thanks,
Phoebe

  [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Counting

2010-11-16 Thread Andrew Miles

You could try something like this:

Loop through your bootstrapped samples and store which ones have the  
outlier you are looking for using code like:


count = c(count, outlier.value %in% boot.sample$outlier.variable)

Then subtract the count variable from the total number of samples to  
get the number of samples without the outlier


N.nooutlier = Total - count
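Put together, the whole count can be done in a couple of lines. This sketch assumes the 1000 bootstrapped data sets are held in a list called `boot.samples` (a hypothetical name), and that the outlier was inserted into the y column; the spacing in the post makes the exact outlier value ambiguous, so set `outlier.y` to whatever was actually used:

```r
# Count bootstrap samples that do NOT contain the outlier y-value
outlier.y <- 1025   # the outlier value from the example (adjust as needed)
contains  <- sapply(boot.samples, function(s) outlier.y %in% s$y)
sum(!contains)      # number of samples without the outlier
```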


Andrew Miles


On Nov 16, 2010, at 4:55 PM, ufuk beyaztas wrote:



Hi dear all,

i have a data (data.frame) which contain y and x coloumn(i.e.

y   x
1   0.58545723  0.15113102
2   0.02769361 -0.02172165
3   1.00927527 -1.80072610
4   0.56504053 -1.12236685
5   0.58332337 -1.24263981
6  -1.70257274  0.46238255
7  -0.88501561  0.89484429
8   1.14466282  0.34193875
9   0.58827457  0.15923694
10 -0.79532232 -1.44193770)

i changed the first data points by an outlier (i.e.

   y  x
1  1025
2   0.02769361 -0.02172165
3   1.00927527 -1.80072610
4   0.56504053 -1.12236685
5   0.58332337 -1.24263981
6  -1.70257274  0.46238255
7  -0.88501561  0.89484429
8   1.14466282  0.34193875
9   0.58827457  0.15923694
10 -0.79532232 -1.44193770  )

then i generate the 1000 bootstrap sample with this data set, some  
of them

not contain these outliers, some of them contain once and some of them
contain many time... Now i want to count how many samples not  
contain these

outliers.
Thank so much any idea!


--
View this message in context: 
http://r.789695.n4.nabble.com/Counting-tp3045756p3045756.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot linear model problem

2010-11-16 Thread Andrew Miles
There may be an easier way to do this, but you could always just do it  
the long way.


Ex.

plot(residuals(test.lm) ~ fitted.values(test.lm))
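Alternatively, plot.lm() has a `which` argument that selects individual diagnostic plots directly (see ?plot.lm for the numbering):

```r
# which = 1 is residuals vs fitted; each plot is drawn (and saveable) alone
plot(test.lm, which = 1)

# or several at once on one device
par(mfrow = c(1, 3))
plot(test.lm, which = c(1, 3, 4))
par(mfrow = c(1, 1))
```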


Andrew Miles


On Nov 16, 2010, at 5:01 PM, casperyc wrote:



Hi all,

Say I fit a linear model, and saved it as 'test.lm'

Then if I use plot(test.lm)

it gives me 4 graphs

How do I ask for a 'subset' of it??

say just want the 1st graph,
the residual vs fitted values,

or the 1,3,4th graph?

I think I can use plot(test.lm[c(1,3,4)]) before,

but now, it's not working...

Every time it runs to the end, the only thing I can do is click 'next';
it is impossible to save the plots individually.

I am using xp sp3, R 2.12.

Thanks!

casper
--
View this message in context: 
http://r.789695.n4.nabble.com/plot-linear-model-problem-tp3045763p3045763.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiple imputation for nominal data

2010-11-02 Thread Andrew Miles
There are a couple of packages that do MI, including MI for nominal
data.  The most recent of these is mi, but I believe mice might do
it as well.  Both are available on CRAN, and both have useful
articles that teach you how to use them.  The citations for these
articles can be found at the bottom of the help page that appears by
typing


?mi
OR for mice
?mice

mi is the newer package and has some useful control features, but it
is still under development.
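A minimal mice sketch for a dichotomous outcome with nominal predictors (the data frame and variable names here are hypothetical): mice chooses logistic/polytomous imputation methods for factor columns automatically, and pool() combines the per-imputation fits with Rubin's rules.

```r
library(mice)

# Impute, fit the analysis model in each completed data set, then pool
imp  <- mice(mydata, m = 5, seed = 1)
fits <- with(imp, glm(outcome ~ pred1 + pred2, family = binomial))
summary(pool(fits))
```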


Andrew Miles


On Nov 2, 2010, at 3:38 PM, John Sorkin wrote:

I am looking for an R function that will run multiple imputation  
(perhaps fully conditional imputation, MICE, or sequential  
generalized regression) for non-MVN data, specifically nominal data.  
My dependent variable is dichotomous, all my predictors are nominal.  
I have a total of 4,500 subjects, 1/2 of whom are missing the main  
independent variables. I would appreciate any suggestions that the  
users of the listserver might have.

John


John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Confidentiality Statement:
This email message, including any attachments, is for th...{{dropped: 
6}}


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem exporting data using write.foreign

2010-10-20 Thread Andrew Miles
My question is about the write.foreign() command in the foreign  
package.  I use a command like the following to try and output data  
and a code file to read my data into SAS.


write.foreign(data.frame.object, datafile = "filepath",
              codefile = "filepath", package = "SAS", dataname = "myData")


With my data set, it gives the following error:

Error in make.SAS.names(names(df), validvarname = validvarname) :
  Cannot uniquely abbreviate the variable names to 32 or fewer  
characters


I tried to write reproducible code but could not.  I'm not sure where  
to go from here.  What are the naming protocols for variables so that  
they can be exported using write.foreign()?


Thanks!

Andrew Miles

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem exporting data using write.foreign

2010-10-20 Thread Andrew Miles
I was able to figure this one out.  It turns out that truncation was  
not the problem, as I had no variables with names longer than 32  
characters (that is quite a long name!).  I document my process here  
so that future users with the same problem can benefit.


First, I examined the code for the function throwing the error:

foreign:::make.SAS.names

This revealed that the error message I was getting was thrown any time  
the transformed/reformatted variable name was too long or duplicated  
another variable name:


if (any(nchar(x) > nmax) || any(duplicated(x)))
    stop("Cannot uniquely abbreviate the variable names to ",
         nmax, " or fewer characters")

I was able to isolate which variables were the problem as follows:

which(duplicated(x))

where x is the vector of transformed variable names which were  
transformed using make.SAS.names code:


x <- sub("^([0-9])", "_\\1", varnames)
x <- gsub("[^a-zA-Z0-9_]", "_", x)
x <- abbreviate(x, minlength = nmax)

This allowed me to discover that I had two variables, retire_sp and
retire.sp, which both became the identical retire_sp after transformation.


Simple, really, but frustrating until you get into the code and see  
what is happening, and how to isolate the offending variables.


Andrew Miles


On Oct 20, 2010, at 1:18 PM, Nordlund, Dan (DSHS/RDA) wrote:


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
project.org] On Behalf Of Andrew Miles
Sent: Wednesday, October 20, 2010 10:10 AM
To: r-help@r-project.org
Subject: [R] Problem exporting data using write.foreign

My question is about the write.foreign() command in the foreign
package.  I use a command like the following to try and output data
and a code file to read my data into SAS.

write.foreign(data.frame.object, datafile = "filepath",
codefile = "filepath", package = "SAS", dataname = "myData")

With my data set, it gives the following error:

Error in make.SAS.names(names(df), validvarname = validvarname) :
  Cannot uniquely abbreviate the variable names to 32 or fewer
characters

I tried to write reproducible code but could not.  I'm not sure where
to go from here.  What are the naming protocols for variables so that
they can be exported using write.foreign()?

Thanks!

Andrew Miles



Well, the error message tells you that the names must be unique when  
truncated to 32 characters.  Apparently, you have at least 2  
variables that have the same name when truncated to 32 characters.


Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Many Thanks

2010-10-15 Thread Andrew Miles
And thank YOU for taking the time to express your gratitude.  I'm sure  
all those who regularly take the time to contribute to the list  
appreciate the appreciation.


Andrew Miles

On Oct 15, 2010, at 9:49 AM, Jumlong Vongprasert wrote:


Dear R-help mailing list and software development team.
After I have used R a few weeks, I was exposed to the best of the  
program.

In addition, the R-help mailing list a great assist new users.
I do my job as I want and get great support from the R-help mailing  
list.

Thanks R-help mailing list.
Thanks software development team.
Jumlong

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R on a ma c

2010-10-14 Thread Andrew Miles
R works great on my Mac.  In fact, the user interface in some ways
seems to be more friendly (e.g., when you type an open parenthesis it
automatically adds the closing one; there is syntax coloring for code
files; etc.).


Andrew Miles


On Oct 14, 2010, at 2:49 PM, David Cross wrote:

I have had no problems with R on my Mac ... just download, install,  
and run ... let me know if you have any problems.


Cheers

David Cross
d.cr...@tcu.edu
www.davidcross.us




On Oct 14, 2010, at 11:25 AM, Tiffany Kinder wrote:


Hello,
Is R very compatible with a Mac?  A colleague of mine indicated that
everyone he knows with a Mac has problems with R.

What can you tell me about using R with a Mac.  What do I need to  
download?

I have downloaded the basic R package.

Thanks,

--
Tiffany Kinder
MS Student
Department of Watershed Science
Utah State University
tiffany.kin...@aggiemail.usu.edu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how can i do anova

2010-10-11 Thread Andrew Miles
Type ?anova on your R command line for the basic function, and links  
to related functions.


Also, try a Google search for something like "doing anova in R" and you
should find multiple tutorials and examples.
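Since the data below are in a wide matrix, one way is to reshape to long format and fit a one-way ANOVA comparing the columns; a sketch, where `mat` is a hypothetical name for the 3 x 6 matrix shown in the question:

```r
# Stack the matrix into long format: one value column, one grouping factor.
# as.vector() goes column by column, so repeat each column name nrow() times.
dat <- data.frame(value  = as.vector(mat),
                  column = factor(rep(colnames(mat), each = nrow(mat))))

# One-way ANOVA testing whether the column means differ
summary(aov(value ~ column, data = dat))
```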


Andrew Miles


On Oct 11, 2010, at 11:33 AM, Mauluda Akhtar wrote:


Hi,

I have a table like the following. I want to do an ANOVA. Could you
please tell me how I can do it?
I want to show whether the elements (3 for each column) of a column are
significantly different or not.
Just to inform you, I'm a new user of R.


     bp_30048741 bp_30049913 bp_30049953 bp_30049969 bp_30049971 bp_30050044
[1,]          69          46          43          54          54          41
[2,]          68          22          39          31          31          22
[3,]          91          54          57          63          63          50



Thank you.

Moon

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using a package function inside another function

2010-10-07 Thread Andrew Miles

Try adding a statement at the beginning of your function:

require(micEcon)

See if that helps.

Andrew Miles
Department of Sociology
Duke University

On Oct 7, 2010, at 11:47 AM, Alison Callahan wrote:


Hello all,

I am trying to use the micEcon 'insertRow' function inside a function
I have written. For example:

insert_row_test <- function(m){

  insertRow(m, nrow(m) + 1, v = 0, rName = "test")

}

However, when I try to call the 'insert_row_test' function (after
loading the micEcon package), it does not insert a row into the matrix
I pass in. When I call the insertRow function exactly as above in the
R console, it works with no problem.

Can anyone tell me why this is, and how to fix this problem?

Thank you!

Alison Callahan
PhD candidate
Department of Biology
Carleton University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sample size estimation for non-inferiority log-rank and Wilcoxon rank-sum tests

2010-09-27 Thread Andrew Miles
I haven't done much with the type of data you're working with, but  
here is a post that lists a few packages for doing sample size  
calculations in R.  Perhaps one of them will be helpful.

https://stat.ethz.ch/pipermail/r-help/2008-February/154223.html

Andrew Miles

On Sep 27, 2010, at 2:09 PM, Paul Miller wrote:

 Hello Everyone,

 I'm trying to conduct a couple of power analyses and was hoping  
 someone might be able to help. I want to estimate the sample size  
 that would be necessary to adequately power a couple of non- 
 inferiority tests. The first would be a log-rank test and the second  
 would be a Wilcoxon rank-sum test. I want to be able to determine  
 the sample size that would be necessary to test for a 3 day  
 difference in median recovery time between 2 groups of cancer  
 patients.

 Both of these tests are infeasible using SAS Proc Power and I  
 haven't been able to find information about how to do them using  
 either SAS or R.

 Does anyone know how to perform either of these calculations? If so,  
 I'd greatly appreciate it if you could share a couple of examples.


 Thanks,

 Paul


   [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] statistic test plot

2010-09-27 Thread Andrew Miles

If you type:

class(z.ex)

you'll see that your z-test produces an object of class "htest".
Then type:


methods(plot)

you'll get a list of all the classes of objects that the plot() function
knows how to plot.  It doesn't look like plot() has a method for
"htest" objects, so it probably doesn't know how to handle the type of
data you are putting into it.


Andrew Miles
Department of Sociology
Duke University

On Sep 27, 2010, at 11:03 PM, Kie Kyon Huang wrote:

Hi, I am a beginner in R and am trying to plot a dot plot from a
t-test result.
I was following the example

## Example from z.test -- requires TeachingDemos package
# library(TeachingDemos)
# z.ex <- z.test(rnorm(25,100,5), 99, 5)
# z.ex
# plot(z.ex)

but encounter this error for the last command

Error in plot.window(...) : need finite 'xlim' values

In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf

the z.ex looks fine though


   One Sample z-test

data:  rnorm(25, 100, 5)
z = 0.7622, n = 25, Std. Dev. = 5, Std. Dev. of the sample mean = 1,
p-value = 0.4459
alternative hypothesis: true mean is not equal to 99
95 percent confidence interval:
 97.80223 101.72216
sample estimates:
mean of rnorm(25, 100, 5)
 99.7622

Can someone help me out here?



Re: [R] bptest

2010-09-25 Thread Andrew Miles

First load the package lmtest.  Then run the bptest.

library(lmtest)
bptest(modelCH)

You don't need to tell the function which variables are explanatory or  
dependent.  Just give it the fitted model object, and it will sort all  
of that out and return the statistic.


Andrew Miles
Department of Sociology
Duke University

On Sep 24, 2010, at 9:53 AM, robm wrote:



Hi

I'm very new to R but have plenty of experience with statistics and  
other

packages like SPSS, SAS etc.

I have a dataset of around 20 columns and 200 rows.   I'm trying to  
fit a
very simple linear model between two variables.   Having done so, I  
want to

test the model for heteroscedasticity using the Breusch-Pagan test.
Apparently this is easy in R by simply doing

bptest(modelCH, data=KP)

I've tried this but I'm told it could not find function "bptest".   It's here
where I'm struggling.   I'm probably wrong but as far as I can see,  
bptest

is part of the lm package which, as far as I know, I have installed.

Irrespective of the fact I'm not sure how to tell bptest which is the
dependent and explanatory variables - there's a more fundamental  
problem if

it can't find the bptest function.

I have searched the documentation - albeit briefly so if anyone  
could help

I'd be very grateful

Rob

QBE Management
--
View this message in context: 
http://r.789695.n4.nabble.com/bptest-tp2553506p2553506.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Multiple graph in one graph window

2010-09-25 Thread Andrew Miles
Another option is to use the par() command before executing a plot  
command, like so:

par(mfrow=c(2,2))
plot(...)

This will put the next four plots all in the same window.
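As a minimal self-contained sketch (my addition, using the example vectors from the question quoted below):

```r
## Four base-graphics plots in one 2 x 2 window
g1 <- c(1, 2, 3); g2 <- c(3, 1, 7); g3 <- c(4, 2, 11); g4 <- c(5, 5, 9)
par(mfrow = c(2, 2))            # next 4 plots fill the window row by row
plot(g1, type = "b", main = "g1")
plot(g2, type = "b", main = "g2")
plot(g3, type = "b", main = "g3")
plot(g4, type = "b", main = "g4")
par(mfrow = c(1, 1))            # reset to one plot per window
```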

Andrew Miles
Department of Sociology
Duke University

On Sep 24, 2010, at 10:28 PM, Dennis Murphy wrote:

 Which one do you want?

 library(lattice)
 d <- data.frame(x = rep(1:3, 4), g = factor(rep(1:4, each = 3)),
 y = c(1, 2, 3, 3, 1, 7, 4, 2, 11, 5, 5, 9))
 xyplot(y ~ x | g, data = d, pch = 16)
 xyplot(y ~ x, data = d, groups = g, type = c('p', 'l'), pch = 16)

 Both provide 'multiple plots' and both are on the 'same page'.

 HTH,
 Dennis


 On Fri, Sep 24, 2010 at 2:37 PM, mrdaw...@abo.fi wrote:

 Dear R users;
 Could you tell me how to let R create multiple graphs in a graph  
 window.
 Please note that I am talking about creating multiple plots in the  
 same page
 of the graph window. For example:
 g1=c(1,2,3)
 g2=c(3,1,7)
 g3=c(4,2,11)
 g4=c(5,5,9)
 ...
 all in one graph window.
 Best Regards
 Reza



[R] formatting data for predict()

2010-09-25 Thread Andrew Miles
I'm trying to get predicted probabilities out of a regression model,  
but am having trouble with the newdata option in the predict()  
function.  Suppose I have a model with two independent variables, like  
this:


y=rbinom(100, 1, .3)
x1=rbinom(100, 1, .5)
x2=rnorm(100, 3, 2)
mod=glm(y ~ x1 + x2, family=binomial)

I can then get the predicted probabilities for the two values of x1,  
holding x2 constant at 0 like this:


p2=predict(mod, type="response", newdata=as.data.frame(cbind(x1, x2=0)))
unique(p2)

However, I am running regressions as part of a function I wrote, which  
feeds in the independent variables to the regression in matrix form,  
like this:


dat=cbind(x1, x2)
mod2=glm(y ~ dat, family=binomial)

The results are the same as in mod.  Yet I cannot figure out how to  
input information into the newdata option of predict() in order to  
generate the same predicted probabilities as above.  The same code as  
above does not work:


p2a=predict(mod2, type="response", newdata=as.data.frame(cbind(x1, x2=0)))

unique(p2a)

Nor does creating a data frame that has the names "datx1" and "datx2",
which is how the variables appear if you run summary() on mod2.
Looking at the model matrix of mod2 shows that the fitted model only  
shows two variables, the dependent variable y and one independent  
variable called dat.  It is as if my two variables x1 and x2 have  
become two levels in a factor variable called dat.


names(mod2$model)

My question is this:  if I have a fitted model like mod2, how do I use  
the newdata option in the predict function so that I can get the  
predicted values I am after?  I.e., how do I recreate a data frame with
one variable called dat that contains two levels which represent my  
(modified) variables x1 and x2?
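One workaround (my suggestion, assuming refitting the model is acceptable) is to avoid the single matrix term altogether: fit on a data frame, so glm() sees two named terms and predict() can match newdata columns by name:

```r
## Fit on a data frame instead of a bare matrix; glm() then records two
## named terms (x1, x2) rather than one matrix term called "dat"
set.seed(1)
y  <- rbinom(100, 1, .3)
x1 <- rbinom(100, 1, .5)
x2 <- rnorm(100, 3, 2)
dat  <- data.frame(x1 = x1, x2 = x2)
mod2 <- glm(y ~ ., data = dat, family = binomial)
## predict() now accepts named columns in newdata
p2a <- predict(mod2, type = "response",
               newdata = data.frame(x1 = c(0, 1), x2 = 0))
p2a   # one predicted probability per value of x1, with x2 held at 0
```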


Thanks in advance!

Andrew Miles
Department of Sociology
Duke University



[R] Standard Error for difference in predicted probabilities

2010-09-24 Thread Andrew Miles
Is there a way to estimate the standard error for the difference in  
predicted probabilities obtained from a logistic regression model?

For example, this code gives the difference for the predicted  
probability of when x2==1 vs. when x2==0, holding x1 constant at its  
mean:

y=rbinom(100,1,.4)
x1=rnorm(100, 3, 2)
x2=rbinom(100, 1, .7)
mod=glm(y ~ x1 + x2, family=binomial)
pred=predict(mod, newdata=data.frame(cbind(x1=rep(mean(x1), 100), x2)), type="response")
diff=unique(pred)[1]-unique(pred)[2]
diff

I know that predict() will output SE's for each predicted value, but  
how do I generate a SE for the difference in those predicted values?
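One common approach (my sketch, not something from the thread) is simulation: draw coefficient vectors from their asymptotic normal distribution and take the SD of the implied differences in predicted probabilities. MASS ships with R, and plogis() is the inverse logit:

```r
## Simulate coefficient vectors from N(beta-hat, vcov(mod)), compute the
## x2 = 1 vs. x2 = 0 difference in predicted probabilities for each draw
## (x1 held at its mean), and summarize across draws.
set.seed(1)
y  <- rbinom(100, 1, .4)
x1 <- rnorm(100, 3, 2)
x2 <- rbinom(100, 1, .7)
mod <- glm(y ~ x1 + x2, family = binomial)
B <- MASS::mvrnorm(5000, coef(mod), vcov(mod))   # 5000 draws of (b0, b1, b2)
d <- plogis(B[, 1] + B[, 2] * mean(x1) + B[, 3]) -
     plogis(B[, 1] + B[, 2] * mean(x1))
c(diff = mean(d), se = sd(d))   # point estimate and simulation-based SE
```

The delta method is the analytic alternative; the simulation version is easier to adapt to other contrasts.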

Thanks in advance!

Andrew Miles




[R] Error using the mi package

2010-07-15 Thread Andrew Miles
I'm trying to impute data using the mi package, but after running  
through almost the entire first round of imputations (which takes  
quite a while), it throws this error (I'll include the whole output  
prior to the error for context).  Does anyone know what is causing it,  
or how I can fix it?


More specifically, how can I tell what is throwing the error so I know  
what to fix?  Is it a problem with a variable?  With the formula I am  
using for imputations?


Beginning Multiple Imputation ( Thu Jul 15 09:10:33 2010 ):
Iteration 1
 Imputation 1 : min.func*  Tenure*  Salary*  Housing*  ord_stat*   
rural_now*  a3*  a7*  a8*  a9*  a10*  a12*  a13*  a14*  a15*  a16*   
a17a*  a17b*  a17c*  a17d*  a17e*  a17f*  a18*  a21*  b1a*  b1b*   
b1c*  b1d*  b1e*  b1f*  b1g*  b1h*  b1i*  b3a*  b3b*  b3c*  b3d*   
b3e*  b3f*  b3g*  b4a*  b4b*  b4c*  b4d*  b4e*  b4f*  b4g*  b4h*   
b4i*  b4j*  b4k*  b4l*  b4m*  b4n*  b4o*  b4p*  b5a*  b5b*  b5c*   
b5d*  b5e*  b5f*  b5g*  b5h*  b5i*  b5j*  b5k*  b5l*  b5m*  b5n*   
b5o*  b6a*  b6b*  b6c*  b6d*  b6e*  b6f*  b6g*  b6h*  b6i*  b6j*   
b6k*  b6l*  b7a*  b7b*  b7c*  b7d*  b7e*  b7f*  b7g*  b7h*  b7i*   
b7j*  b7k*  b9b*  b10a*  b10b*  b13a*  b13b*  b13c*  b14*  c3*  c4*   
c5*  c12a*  c12b*  c12c*  c12d*  c12e*  c12f*  c19*  d1*  d2*  d3*   
d6*  d7a*  d7b*  d7c*  d7d*  d7e*  d7f*  d7g*  d7h*  d7i*  d8a*  d8b*   
d8c*  d8d*  d8e*  d8f*  d8g*  d8h*  d8i*  d9a*  d9b*  d9c*  d9d*   
d9e*  d10a*  d10b*  d10c*  d10d*  d10e*  d10f*  d11*  d13*  d14*  e1*   
e2*  e3*  e5*  e6*  f2*  f3a*  f3b*  f3c*  f3d*  f3e*  f4*  f6*  f7*   
f8*  f9*  f12*  f14*  f15*  f16*  f20*  f21*  f22*  f23*  f25*  g2*   
g3*  g6*  g8*  g9*  g12*  g13*  g15*  g17*  g18*  g22*  g25*  g26*   
g27*  g28*  g31*  g34*  g43*  g55*  g58*  g59*  g60*  g61*  g63*   
g65*  g68a*  g68b*  g69*  g70*  g71*  g91*  g92*  g93*  g94*  g95*

Error in AveVar[s, i, ] <- c(avevar.mean, avevar.sd) :
  number of items to replace is not a multiple of replacement length

And here is what traceback() gives:
 traceback()
3: .local(object, ...)
2: mi(imp.data, info = info2, n.iter = 6, preprocess = FALSE)
1: mi(imp.data, info = info2, n.iter = 6, preprocess = FALSE)

Andrew Miles



[R] Changing model parameters in the mi package

2010-07-14 Thread Andrew Miles
I am trying to use the mi package to impute data, but am running into  
problems with the functions it calls.


For instance, I am trying to impute a categorical variable called
"min.func".  The mi() function calls the mi.categorical() function to
deal with this variable, which in turn calls the nnet.default()  
function, and passes it a fixed parameter MaxNWts=1500.  However, as  
you can see below, the min.func variable needs a higher threshold than  
1500.  Is there a way that I can change the MaxNWts parameter that is  
being sent to nnet.default()?  I've investigated the mi() and  
mi.info() functions and cannot see a way.


Thanks for your help!  Error message and system info below:

Beginning Multiple Imputation ( Wed Jul 14 10:25:06 2010 ):
Iteration 1
 Imputation 1 : min.func*

Error while imputing variable: min.func , model: mi.categorical
 Error in nnet.default(X, Y, w, mask = mask, size = 0, skip = TRUE,  
softmax = TRUE,  :

  too many (2608) weights

System is Mac OS X 10.5.8, R version 2.9.2


Andrew Miles



Re: [R] Changing model parameters in the mi package

2010-07-14 Thread Andrew Miles
I figured this one out.  I'm sending the solution back to the help  
list so that there is a record of how to deal with this problem, since  
it does not seem to be documented elsewhere.


The mi.info object that you pass to the mi() function contains a
value called "params" which allows you to set the parameters used in
each imputation model.  If you access it like so


info=mi.info(impute.data)
info[["min.func"]]$params

you are presented with a matrix showing a value for only two  
parameters that are passed to every model, n.iter and draw.from.beta.


You can add any other parameters that you want like so:

info[["min.func"]]$params$MaxNWts=3000

This will add the parameter MaxNWts to the min.func model.  This seems  
to have solved the problem I outlined below.


Andrew Miles


On Jul 14, 2010, at 10:33 AM, Andrew Miles wrote:

I am trying to use the mi package to impute data, but am running  
into problems with the functions it calls.


For instance, I am trying to impute a categorical variable called  
min.func.  The mi() function calls the mi.categorical() function  
to deal with this variable, which in turn calls the nnet.default()  
function, and passes it a fixed parameter MaxNWts=1500.  However, as  
you can see below, the min.func variable needs a higher threshold  
than 1500.  Is there a way that I can change the MaxNWts parameter  
that is being sent to nnet.default()?  I've investigated the mi()  
and mi.info() functions and cannot see a way.


Thanks for your help!  Error message and system info below:

Beginning Multiple Imputation ( Wed Jul 14 10:25:06 2010 ):
Iteration 1
Imputation 1 : min.func*

Error while imputing variable: min.func , model: mi.categorical
Error in nnet.default(X, Y, w, mask = mask, size = 0, skip = TRUE,  
softmax = TRUE,  :

 too many (2608) weights

System is Mac OS X 10.5.8, R version 2.9.2


Andrew Miles



Re: [R] how to plot two histograms overlapped in the same plane coordinate

2010-07-09 Thread Andrew Miles



I'm not sure what you are trying to do.  Do you want one histogram for  
males and one for females on the same graph? If so, the simplest way  
to put two histograms together is to simply use the add parameter:


age.males=age[which(sex=="M")]
age.females=age[which(sex=="F")]

hist(age.males, col="blue")
hist(age.females, add=T)

The only problem is that the hist() function does not do
semi-transparency.  I am not sure if other packages do.  The code above
will give you a blue histogram for males, and a clear histogram for
females on top of it.  You'll probably have to manually alter the
breaks of the two histograms so they share the same break points (i.e.
where one bar stops and another begins).  See ?hist for more
information about that.


Andrew Miles
Department of Sociology
Duke University

On Jul 9, 2010, at 9:29 AM, Mao Jianfeng wrote:


Dear R-help listers,

I am new. I just want to get help on how to plot two histograms
overlapped in the same coordinate plane. What I did is very ugly.
Could you please help me to improve it? I want to get a plot with a
semi-transparent overlapping region. And I want to know how to specify
the fill colors of the different histograms.

I also prefer other solutions other than ggplot2.

Many thanks to you.


What I have done:

library(ggplot2)

age <- c(rnorm(100, 1.5, 1), rnorm(100, 5, 1))
sex <- c(rep("F", 100), rep("M", 100))
mydata <- cbind(age, sex)
mydata <- as.data.frame(mydata)
head(mydata)


qplot(age, data=mydata, geom="histogram", fill=sex, xlab="age",
ylab="count", alpha=I(0.5))


Best,


Mao J-F



[R] Error message using mi() in mi package

2010-07-06 Thread Andrew Miles

Hello!

I get the following message when I run the mi() function from the mi  
package.


Error while imputing variable: c3 , model: mi.polr
 Error in eval(expr, envir, enclos) : could not find function "c14ordered"


Here's the situation:

I am running R v. 2.9.2 on Mac OSX v. 10.5.8.  I am trying to impute  
missing data in a data set that I've trimmed down to 302 variables.   
I've created an mi.info() object on the data, and I've updated the  
type of variable where necessary so that the mi() imputing function  
uses the most appropriate type of models to estimate imputed values.   
The data have no collinearity.  I then run the mi() function like this:


imp = mi(imp.data, info=info2, n.iter=10)

where imp.data is my data set of 302 variables and info2 is my  
modified mi.info object.  I get the message as posted above.  The  
message only occurs when working on a variable I have labeled as  
ordered-categorical.  But the mi() function processes most variables  
labeled as ordered-categorical just fine.  In fact, if shrink my  
data set (say, to just 5 variables) I can get mi() to process a  
problematic variable just fine as well.


I'm not sure what the function "c14ordered" is that the error message
refers to.  My first thought is maybe it is referring to one of my  
variables in my data?  Variables names in my data follow a basic  
letter-number pattern (i.e. a1, a2, etc.), but there is no c14, rather  
c14a1, c14a2, etc.  So I'm not sure if the variable has anything to do  
with the problem, but I thought I'd mention it just in case someone  
wiser in this matter than I can see something I cannot.


I cannot post code that reproduces the problem due to the nature of  
the code and data involved.


Any help would be appreciated, as I am not sure what is happening, and  
can't see why I can sometimes impute a variable labeled as
"ordered-categorical" and sometimes cannot.


Thanks!

Andrew Miles



Re: [R] Error message using mi() in mi package

2010-07-06 Thread Andrew Miles

On Jul 6, 2010, at 1:30 PM, Peter Ehlers wrote:


On 2010-07-06 10:37, Andrew Miles wrote:

Hello!

I get the following message when I run the mi() function from the mi
package.

Error while imputing variable: c3 , model: mi.polr
Error in eval(expr, envir, enclos) : could not find function  
c14ordered


Here's the situation:

I am running R v. 2.9.2 on Mac OSX v. 10.5.8. I am trying to impute
missing data in a data set that I've trimmed down to 302 variables.  
I've
created an mi.info() object on the data, and I've updated the  
type of

variable where necessary so that the mi() imputing function uses the
most appropriate type of models to estimate imputed values. The data
have no collinearlity. I then run the mi function like this:

imp = mi(imp.data, info=info2, n.iter=10)

where imp.data is my data set of 302 variables and info2 is my  
modified

mi.info object. I get the message as posted above. The message only
occurs when working on a variable I have labeled as
ordered-categorical. But the mi() function processes most variables
labeled as ordered-categorical just fine. In fact, if shrink my  
data
set (say, to just 5 variables) I can get mi() to process a  
problematic

variable just fine as well.

I'm not sure what the function c14ordered is that the error message
refers to. My first thought is maybe it is referring to one of my
variables in my data? Variables names in my data follow a basic
letter-number pattern (i.e. a1, a2, etc.), but there is no c14,  
rather

c14a1, c14a2, etc. So I'm not sure if the variable has anything to do
with the problem, but I thought I'd mention it just in case someone
wiser in this matter than I can see something I cannot.

I cannot post code that reproduces the problem due to the nature of  
the

code and data involved.

Any help would be appreciated, as I am not sure what is happening,  
and

can't see why I can sometimes impute a variable labeled as
ordered-categorical and sometimes cannot.

Thanks!

Andrew Miles



This looks suspiciously like a syntax problem.
I would get my text editor to search for 'c14ordered'
in the code. You might have missed some punctuation.

 -Peter Ehlers


A good thought.  I checked my own code (the stuff coding the data and  
using the mi package) and, for good measure, the code of the mi.polr  
function in the mi package, but the string "c14ordered" never appears.




Re: [R] Error message using mi() in mi package

2010-07-06 Thread Andrew Miles



On Jul 6, 2010, at 2:15 PM, Erik Iverson wrote:






This looks suspiciously like a syntax problem.
I would get my text editor to search for 'c14ordered'
in the code. You might have missed some punctuation.

-Peter Ehlers
A good thought.  I checked my own code (the stuff coding the data  
and using the mi package) and, for good measure, the code of the  
mi.polr function in the mi package, but the string c14ordered  
never appears.


And what does

traceback()

tell you?



Here is the original error message:

Error while imputing variable: a8 , model: mi.polr
 Error in eval(expr, envir, enclos) : could not find function "c14ordered"


I confess I'm not good at reading traceback() output, but here is what  
it says:


18: eval(expr, envir, enclos)
17: eval(predvars, data, env)
16: model.frame.default(formula = form, data = data,  
drop.unused.levels = FALSE)

15: model.frame(formula = form, data = data, drop.unused.levels = FALSE)
14: model.frame(formula = form, data = data, drop.unused.levels = FALSE)
13: eval(expr, envir, enclos)
12: eval(expr, p)
11: eval.parent(m)
10: bayespolr(formula = form, data = data, start = 0, method = c("logistic"),
    drop.unused.levels = FALSE, n.iter = n.iter)
9: mi.polr(formula = a8 ~ rural_now + a3 + a7 . . . [more arguments  
here including the rest of the variables in the data set and a list of  
all the actual data], start = NULL,

   n.iter = 100, draw.from.beta = TRUE)
8: do.call(model.type, args = c(list(formula = info[[CurrentVar]] 
$imp.formula,

   data = dat, start = if (!is.null(start.val[[i]][[jj]])) {
   start.val[[i]][[jj]]
   } else {
   NULL
   }), info[[CurrentVar]]$params))
7: eval(expr, envir, enclos)
6: eval(substitute(expr), data, enclos = parent.frame())
5: with.default(data = dat, do.call(model.type, args = c(list(formula  
= info[[CurrentVar]]$imp.formula,

   data = dat, start = if (!is.null(start.val[[i]][[jj]])) {
   start.val[[i]][[jj]]
   } else {
   NULL
   }), info[[CurrentVar]]$params)))
4: with(data = dat, do.call(model.type, args = c(list(formula =  
info[[CurrentVar]]$imp.formula,

   data = dat, start = if (!is.null(start.val[[i]][[jj]])) {
   start.val[[i]][[jj]]
   } else {
   NULL
   }), info[[CurrentVar]]$params)))
3: .local(object, ...)
2: mi(imp.data, info = info2, n.iter = 10)
1: mi(imp.data, info = info2, n.iter = 10)



Re: [R] how to calculate summary statistics for each factor

2010-07-06 Thread Andrew Miles
You could try Summarize() in the NCStats package, or aggregate() in the
epicalc package.


Andrew Miles
Department of Sociology
Duke University
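Base R can also do this directly (my sketch, using the data from the question quoted below; no extra package needed):

```r
## Per-class summaries with aggregate() from base R
d <- data.frame(subject = c(123, 124, 125, 223, 224, 225, 323, 324, 325, 326),
                class   = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)),
                value   = c(10, 12, 12, 23, 18, 19, 21, 10, 19, 20))
aggregate(value ~ class, data = d, FUN = mean)   # one mean per class
## tapply(d$value, d$class, summary) gives a full summary per class
```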

On Jul 6, 2010, at 11:53 AM, karena wrote:



I have a dataset like the following:

subject  class  value
123      1      10
124      1      12
125      1      12
223      2      23
224      2      18
225      2      19
323      3      21
324      3      10
325      3      19
326      3      20

how to calculate the summary value for each factor?
thanks

karena
--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-calculate-summary-statistics-for-each-factor-tp2279777p2279777.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Interaction terms in logistic regression using glm

2010-04-30 Thread Andrew Miles
Thanks for this, Kjetil.  I see that I was getting interpretation and  
estimation confused.

My apologies on not including the paper title.  For the benefit of  
those who read this post after me, it is:

Ai, Chunrong, and Edward C. Norton. 2003. "Interaction Terms in Logit
and Probit Models." Economics Letters 80:123-129.

You mentioned that you found a number of papers not supporting their  
conclusions.  I did a google search and found one paper amending Ai  
and Norton's results: "The Treatment Effect, the Cross Difference,
and the Interaction Term in Nonlinear 'Difference-in-Differences'
Models" by Patrick A. Puhani.  Did you find others?  If you wouldn't
mind sending along the citations for what you found, that would be  
very helpful.

Many thanks!

Andrew Miles

On Apr 29, 2010, at 10:45 PM, Kjetil Halvorsen wrote:

 see comments below.

 On Wed, Apr 28, 2010 at 4:29 PM, Andrew Miles  
 rstuff.mi...@gmail.com wrote:
 I recently became aware of the article by Ai and Norton (2003)  
 about how
 interaction terms are problematic in nonlinear regression (such as  
 logistic
 regression).  They offer a correct way of estimating interaction  
 effects and
 their standard errors.

 My question is:  Does the glm() function take these corrections  
 into account
 when estimating interaction terms for a logistic regression (i.e.  
 when
 family=binomial)?

 No.

  If not, is there a function somewhere that allows for
 correct estimation?

 The estimation you get from glm is correct. The discussion in the
 paper you referred to is about how to interpret the estimation
 results! A google search on the referred paper (you didn't give the
 title) shows various later papers referring to it, and not supporting
 their conclusions.

 Linear (and non-linear) model books badly need chapters with titles
 such as "post-estimation analysis". glm does the estimation for you.
 It cannot do the analysis for you!

 Probably you are looking for something such as the CRAN package "effects".

 Kjetil




 I've looked the documentation for glm and couldn't find an answer,  
 nor have
 I seen the issue addressed in the forums or in the examples of  
 logistic
 regression in R that I've found online.

 Thanks!

 Andrew Miles



[R] Interaction terms in logistic regression using glm

2010-04-28 Thread Andrew Miles
I recently became aware of the article by Ai and Norton (2003) about  
how interaction terms are problematic in nonlinear regression (such as  
logistic regression).  They offer a correct way of estimating  
interaction effects and their standard errors.


My question is:  Does the glm() function take these corrections into  
account when estimating interaction terms for a logistic regression  
(i.e. when family=binomial)?  If not, is there a function somewhere  
that allows for correct estimation?


I've looked the documentation for glm and couldn't find an answer, nor  
have I seen the issue addressed in the forums or in the examples of  
logistic regression in R that I've found online.


Thanks!

Andrew Miles



Re: [R] How to add a variable to a dataframe whose values are conditional upon the values of an existing variable

2010-02-26 Thread Andrew Miles
You could also try a series of simple ifelse statements.  I just tried  
the following and got it to work, though I am sure there is a faster  
way.


 t=c("cow", "dog", "chick")
 y=c(1,3,4)
mat=cbind(t,y)
mat=as.data.frame(mat)

 mat
  t y
1   cow 1
2   dog 3
3 chick 4

mat$g=ifelse(mat$t=="cow", 1, 6)
mat$g=ifelse(mat$t=="dog", 2, mat$g)
mat$g=ifelse(mat$t=="chick", 3, mat$g)

 mat
  t y g
1   cow 1 1
2   dog 3 2
3 chick 4 3

For days of the week it would only be 7 statements.
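An alternative one-liner (my addition, not part of the original reply) uses match() against a vector of day names, so the mapping lives in one place:

```r
## Map abbreviated day names to 1..7 in a single step
days <- c("SAT", "SUN", "MON", "TUE", "WED", "THU", "FRI")
DOW  <- c("MON", "SAT", "FRI", "SUN")   # hypothetical example column
DOW1 <- match(DOW, days)
DOW1                                    # 3 1 7 2
```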

Andrew Miles
Department of Sociology
Duke University

On Feb 26, 2010, at 2:31 PM, Steve Matco wrote:


Hi everyone,

I am at my wits' end with what I believe would be considered simple
by a more experienced R user. I want to know how to add a variable
to a dataframe whose values are conditional on the values of an  
existing variable. I can't seem to make an ifelse statement work for  
my situation. The existing variable in my dataframe is a character  
variable named DOW which contains abbreviated day names ("SAT", "SUN",
"MON", ..., "FRI"). I want to add a numerical variable named DOW1 to my
dataframe that will take on the value 1 if DOW equals "SAT", 2 if
DOW equals "SUN", 3 if DOW equals "MON", ..., 7 if DOW equals "FRI".
I  know this must be a simple problem but I have searched everywhere  
and tried everything I could think of. Any help would be greatly  
appreciated.


Thank you,

Mike






Re: [R] Weighted descriptives by levels of another variables

2009-11-16 Thread Andrew Miles
Thanks!  Using the plyr package and the approach you outlined seems to  
work well for relatively simple functions (like wtd.mean), but so far  
I haven't had much success in using it with more complex descriptive  
functions like describe {Hmisc}.  I'll take a look later, though, and  
see if I can figure out why.


At any rate, ddply() looks like it will simplify writing a function  
that will allow for weighting data and subdividing it, but still give  
comprehensive summary statistics (i.e. not just the mean or quantiles,  
but all in one).  I'll post it to the list once I have the time to  
write it up.


I also took a stab at using the svyby function in the survey package,  
but received the following error message when I ran:


 svyby(cbind(educ, age), female, svynlsy, svymean)
Error in `[.survey.design2`(design, byfactor %in% byfactor[i], ) :
  (subscript) logical subscript too long
__
In addition to using the survey package (and the svyby function), I've  
found
that many of the 'weighted' functions, such as wtd.mean, work well  
with the

plyr package.  For example,

wtdmean=function(df)wtd.mean(df$obese,df$sampwt);
ddply(mydata, ~cut2(age,c(2,6,12,16)),'wtdmean')

hth, david freedman
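(For readers without plyr or Hmisc installed, the same split-apply idea can be sketched in base R; the data frame and variable names below are made up for illustration and are not from the original posts.)

```r
# Base-R version of the weighted group-mean idea: split() subsets the
# data frame (weights included), then weighted.mean() runs per group.
df <- data.frame(obese  = c(0, 1, 1, 0, 1),
                 sampwt = c(2, 1, 1, 3, 1),
                 agegrp = c("a", "a", "b", "b", "b"))
grp.means <- sapply(split(df, df$agegrp),
                    function(d) weighted.mean(d$obese, d$sampwt))
# grp.means is c(a = 1/3, b = 0.4)
```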




[R] Weighted descriptives by levels of another variables

2009-11-14 Thread Andrew Miles
I've noticed that R has a number of very useful functions for  
obtaining descriptive statistics on groups of variables, including  
summary {stats}, describe {Hmisc}, and describe {psych}, but none that  
I have found is able to provide weighted descriptives for subsets of a  
data set (e.g., descriptives for both males and females on age, where  
accurate results require the use of sampling weights).

Does anybody know of a function that does this?

What I've looked at already:

I have looked at describe.by {psych}, which will give descriptives by  
levels of another variable (e.g., mean ages of males and females), but  
does not accept sampling weights.

I have also looked at describe {Hmisc} which allows for weights, but  
has no functionality for subdivision.

I tried using a by() function with describe{Hmisc}:

by(cbind(my, variables, here), division.variable, describe,  
weights=weight.variable)

but found that this returns an error message stating that the  
variables to be described and the weights variable are not the same  
length:

Error in describe.vector(xx, nam[i], exclude.missing =  
exclude.missing,  :
  length of weights must equal length of x
In addition: Warning message:
In present & !is.na(weights) :
  longer object length is not a multiple of shorter object length

This happens because the by() function passes a subset of the  
variables to be described down to describe(), but not a subset of the  
weights variable.  describe() then searches whatever data set is  
attached in order to find the weights variable, which is still in its  
original (i.e., not subsetted) form.  Here is an example using the  
ChickWeight dataset that comes in the datasets package.

data(ChickWeight)
attach(ChickWeight)
library(Hmisc)
#this gives descriptive data on the variables Time and Chick by  
#levels of Diet
by(cbind(Time, Chick), Diet, describe)
#trying to add weights, however, does not work for reasons described  
above
wgt=rnorm(length(Chick), 12, 1)
by(cbind(Time, Chick), Diet, describe, weights=wgt)
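(One workaround for the subsetting problem described above, sketched here and not from the original thread: split the whole data frame, so each group's weights are subset along with its rows. weighted.mean() stands in for Hmisc::describe() to keep the example dependency-free.)

```r
# Splitting the full data frame keeps rows and weights aligned per group,
# avoiding the length mismatch that by() + describe(weights=) runs into.
data(ChickWeight)
cw <- ChickWeight
set.seed(1)                       # illustrative weights, as in the post
cw$wgt <- rnorm(nrow(cw), 12, 1)
res <- lapply(split(cw, cw$Diet),
              function(d) weighted.mean(d$weight, d$wgt))
```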

Again, my question is: does anybody know of a function that combines  
the ability to provide weighted descriptives with the ability to  
subdivide by the levels of some other variable?


Andrew Miles
Department of Sociology
Duke University




