Re: [R] Difference between cor.test() and formula everyone says to use
Hi Jeremy,

I don't know about references, but this is covered in various places. See for example: http://afni.nimh.nih.gov/sscc/gangc/tr.html

The relevant line in cor.test is:

STATISTIC <- c(t = sqrt(df) * r/sqrt(1 - r^2))

You can convert *t*s to *r*s and vice versa.

Best,
Josh

On Fri, Oct 17, 2014 at 10:32 AM, Jeremy Miles wrote:
> I'm trying to understand how cor.test() is calculating the p-value of
> a correlation. It gives a p-value based on t, but every text I've ever
> seen gives the calculation based on z.
>
> For example:
>
> data(cars)
> with(cars[1:10, ], cor.test(speed, dist))
>
>         Pearson's product-moment correlation
>
> data: speed and dist
> t = 2.3893, df = 8, p-value = 0.04391
> alternative hypothesis: true correlation is not equal to 0
> 95 percent confidence interval:
>  0.02641348 0.90658582
> sample estimates:
>       cor
> 0.6453079
>
> But when I use the regular formula:
>
> r <- cor(cars[1:10, ])[1, 2]
> r.z <- fisherz(r)
> se <- 1/sqrt(10 - 3)
> z <- r.z / se
> (1 - pnorm(z))*2
> [1] 0.04237039
>
> My p-value is different. The help file for cor.test doesn't (seem to)
> have any reference to this, and I can see in the source code that it
> is doing something different. I'm just not sure what.
>
> Thanks,
> Jeremy
>
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joshua F. Wiley
Ph.D. Student, UCLA Department of Psychology
http://joshuawiley.com/
Senior Analyst, Elkhart Group Ltd.
http://elkhartgroup.com
Office: 260.673.5518
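For anyone comparing the two computations side by side, here is a self-contained sketch using the numbers from the example above (the t-based calculation is what cor.test() reports; the Fisher-z calculation is the textbook normal approximation, so the two p-values are close but not identical):

```r
r  <- 0.6453079                         # correlation from the example above
n  <- 10
df <- n - 2

# t-based p-value, as computed inside cor.test()
t.stat <- sqrt(df) * r / sqrt(1 - r^2)  # ~2.3893
p.t    <- 2 * pt(-abs(t.stat), df)      # ~0.04391, matches cor.test()

# Fisher-z normal approximation, as in many textbooks
z   <- atanh(r) * sqrt(n - 3)           # atanh() is the Fisher z-transform
p.z <- 2 * (1 - pnorm(z))               # ~0.04237, close but not identical
```
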
Re: [R] evaluate NA to FALSE instead of NA?
Hi,

Perhaps still not as short as you want, but I normally use which():

p <- c(1:10/100, NA, NaN)
p[which(p <= .05)]
[1] 0.01 0.02 0.03 0.04 0.05

Cheers,
Josh

On Tue, Oct 14, 2014 at 8:51 PM, Rainer M Krug wrote:
> Hi
>
> I want to evaluate NA and NaN to FALSE (for indexing) so I would like to
> have the result as indicated here:
>
> > p <- c(1:10/100, NA, NaN)
> > p
> [1] 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10   NA  NaN
> > p[p<=0.05]
> [1] 0.01 0.02 0.03 0.04 0.05   NA   NA
> > p[sapply(p<=0.05, isTRUE)]
> [1] 0.01 0.02 0.03 0.04 0.05   <<<=== I want this
>
> Is there a way that I can do this more easily than in my example above?
> It works, but it strikes me that there is not a better way of doing
> this - am I missing a command or option?
>
> Thanks,
> Rainer
>
> --
> Rainer M. Krug
> email: Rainerkrugsde
> PGP: 0x0F52F982

--
Joshua F. Wiley
Ph.D. Student, UCLA Department of Psychology
http://joshuawiley.com/
Senior Analyst, Elkhart Group Ltd.
http://elkhartgroup.com
Office: 260.673.5518
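Equivalently, without which(): masking the missings explicitly gives the same result (a minimal sketch; both versions drop NA and NaN, because is.na() is TRUE for both):

```r
p <- c(1:10/100, NA, NaN)

# which() drops NA/NaN because they never test TRUE;
# the explicit version masks them out before indexing
a <- p[which(p <= .05)]
b <- p[!is.na(p) & p <= .05]

identical(a, b)  # TRUE
```
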
Re: [R] ANOVA and permutation tests : beware of traps
Hi Stephane,

This is the well known result of limited floating point precision (e.g., http://www.validlab.com/goldberg/addendum.html). Using a test of approximate rather than exact equality shows R yields the correct answer:

nperm <- 1
Fperm <- replicate(n=nperm, anova(lm(sample(Y) ~ F, dat1))$F[1]) # calculates nperm F values
(prob <- (sum(anova1$F[1] <= (Fperm + .Machine$double.eps ^ 0.5)) + 1)/(nperm + 1)) # risk calculation yields 0.10009

I picked .Machine$double.eps ^ 0.5 somewhat arbitrarily, as the default for equality testing from ?all.equal.

Cheers,
Josh

On Tue, Sep 23, 2014 at 5:14 PM, Stéphane Adamowicz <stephane.adamow...@avignon.inra.fr> wrote:
> Recently, I came across a strange and potentially troublesome behaviour of
> the lm and aov functions that raises questions about calculation accuracy.
> Let us consider the 2 following datasets dat1 & dat2:
>
> > (dat1 <- data.frame(Y=c(1:3, 10+1:3), F=c(rep("A",3), rep("B",3))))
>    Y F
> 1  1 A
> 2  2 A
> 3  3 A
> 4 11 B
> 5 12 B
> 6 13 B
>
> > (dat2 <- data.frame(Y=c(10+1:3, 1:3), F=c(rep("A",3), rep("B",3))))
>    Y F
> 1 11 A
> 2 12 A
> 3 13 A
> 4  1 B
> 5  2 B
> 6  3 B
>
> They only differ in the order of values that were exchanged between
> samples A and B. Thus the sd is 1 for each sample in either data set, and
> the absolute mean difference |A-B| is 10 in both datasets.
> Now, let us perform an anova to compare samples A and B in both datasets
> (of course, in such a simple case, a bilateral T test would do the job, but
> an anova is nevertheless allowed and should give the same probability as
> Student's test):
>
> > (anova1 <- anova(lm(Y~F, dat1)))
> Analysis of Variance Table
>
> Response: Y
>           Df Sum Sq Mean Sq F value    Pr(>F)
> F          1    150     150     150 0.0002552 ***
> Residuals  4      4       1
>
> > (anova2 <- anova(lm(Y~F, dat2)))
> Analysis of Variance Table
>
> Response: Y
>           Df Sum Sq Mean Sq F value    Pr(>F)
> F          1    150     150     150 0.0002552 ***
> Residuals  4      4       1
>
> As expected, both datasets give the same anova table, but this is only
> apparent.
> Indeed:
>
> > anova1$F[1] == anova2$F[1]
> [1] FALSE
> > anova1$F[1] - anova2$F[1]
> [1] 5.684342e-14
>
> In fact the F values differ slightly, and this holds also for the aov
> function. I checked also (not shown) that both the residual and factorial
> sums of squares differ between dat1 and dat2. Thus, for some undocumented
> reason (at least for the end user), the F values depend on the order of
> the data!
>
> While such tiny differences (e-14 in this example) are devoid of
> consequences for the risk evaluation by Fisher's distribution, they may
> have huge consequences for the risk evaluation by the permutation method.
> Indeed, the shift from continuous to discrete distributions is far from
> insignificant.
>
> For instance, the following code in R is at the basis of many permutation
> algorithms found on the internet and in teaching because it seems quite
> straightforward (see for example
> http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Permutation%20Anova/PermTestsAnova.html
> http://www.usna.edu/Users/math/jager/courses/sm439/labs/Lab7.pdf
> http://biostatistics1014.blogspot.fr/2013/04/one-way-anova-permutation-test-and.html
> http://adn.biol.umontreal.ca/~numericalecology/Rcode/):
>
> > nperm <- 1000 # number of permutations
> > Fperm <- replicate(n=nperm, anova(lm(sample(Y) ~ F, dat1))$F[1]) # calculates nperm F values
> > (prob <- (sum(anova1$F[1] <= Fperm) + 1)/(nperm +1)) # risk calculation
> [1] 0.04695305
>
> Of course, because of the sample function, repeating this code gives
> different prob values, but they always remain close to 5% instead of the
> exact probability of 10%. Indeed, there are only choose(6,3) = 20 possible
> permutations, but because they are symmetric, they give only 10 absolute
> mean differences. Thus, the only exact probabilities are 10%, 20% ... 100%.
> In our example where samples do not overlap, 10% is obviously the right
> answer.
> Thus, the use of lm and aov functions in permutation methods does not seem
> a good idea, as it results in biases that underestimate the exact risk. In
> the simple case of one-way anova, it is quite simple to remedy this
> problem. As the total sums of squares and the degrees of freedom do not
> change with permutations, it is easier and much faster to compare the
> residual sums of squares. For instance, the exact probabilities can be
> calculated that way:
>
> > combi <- combn(6, 3, FUN=function(i) {append(dat1$Y[i], dat1$Y[-i])}) # all permutations
> > SCEResid <- apply(combi, 2, FUN=function(x) sum(tapply(x, dat1$F, function(x) sum((x - mean(x))^2)))) # all resid SS
> > (prob <- mean(SCEResid <= SCEResid[1])) # risk calculation
> [1] 0.1
>
> 10% is indeed the exact risk.
>
> Finally, my problem is: how can we know if R libraries that use
> randomization procedures are not biased? In the basic case of one way
> anova, it seem
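The floating-point issue driving this thread can be reproduced without any modelling; a minimal standalone sketch:

```r
x <- 0.1 + 0.2

x == 0.3                                 # FALSE: the binary representations differ
isTRUE(all.equal(x, 0.3))                # TRUE: approximate comparison
abs(x - 0.3) < .Machine$double.eps^0.5   # TRUE: the tolerance used in the fix above
```
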
Re: [R] Simulating from a Weibull distribution
Hi Lucy,

Try the gamlss.dist package, specifically the rWEI2() function.

Cheers,
Josh

On Wed, Sep 3, 2014 at 11:52 AM, Lucy Leigh wrote:
> Hi,
> I wish to simulate some data from a Weibull distribution. The rweibull
> function in R uses the parameterisation
>
> 'with shape parameter a and scale parameter b has density given by
> f(x) = (a/b) (x/b)^(a-1) exp(- (x/b)^a)'.
>
> However, it would be much more useful for me to simulate data using a
> different parameterisation of the Weibull, with shape a1 and scale b1,
> namely f(x) = a1*b1*x^(a1-1) exp(-b1*x^a1).
>
> However I'm not sure how to do this, as I have only ever used the random
> generators that come pre-programmed in R.
>
> Is there a package or example someone can point me to that demonstrates
> how to simulate data from any given distribution?
>
> Cheers,
> Lucy

--
Joshua F. Wiley
Ph.D. Student, UCLA Department of Psychology
http://joshuawiley.com/
Senior Analyst, Elkhart Group Ltd.
http://elkhartgroup.com
Office: 260.673.5518
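As an alternative that avoids an extra package: base R's rweibull() can generate from the requested parameterisation directly, since matching f(x) = a1*b1*x^(a1-1)*exp(-b1*x^a1) against R's (a/b)(x/b)^(a-1)exp(-(x/b)^a) gives shape a = a1 and scale b = b1^(-1/a1). A sketch (the wrapper name rweibull2 is made up here):

```r
# simulate from f(x) = a1*b1*x^(a1-1)*exp(-b1*x^a1) via base R's rweibull()
rweibull2 <- function(n, a1, b1) rweibull(n, shape = a1, scale = b1^(-1/a1))

# sanity check against the closed-form CDF F(x) = 1 - exp(-b1 * x^a1)
a1 <- 2; b1 <- 0.5
pweibull(1, shape = a1, scale = b1^(-1/a1))  # equals 1 - exp(-0.5)

set.seed(1)
x <- rweibull2(1e5, a1, b1)
mean(x <= 1)  # empirical CDF at 1, close to 1 - exp(-0.5)
```
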
Re: [R] A basic statistics question
On Wed, Aug 13, 2014 at 7:41 AM, Rolf Turner wrote:
> On 13/08/14 07:57, Ron Michael wrote:
>> Hi,
>>
>> I would need to get a clarification on a quite fundamental statistics
>> property; hope expeRts here would not mind if I post that here.
>>
>> I learnt that the variance-covariance matrix of the standardized data is
>> equal to the correlation matrix for the unstandardized data. So I used
>> the following data.
>>
>> (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]
>>
>> Point is that I am not getting the exact CORR matrix. Can somebody point
>> me to what I am missing here?
>
> You are using a denominator of "n" in calculating your "covariance" matrix
> for your normalized data. But these data were normalized using the sd()
> function which (correctly) uses a denominator of n-1 so as to obtain an
> unbiased estimator of the population standard deviation.

As a small point, n - 1 does not give _quite_ an unbiased estimator of the population SD; see Cureton (1968), "Unbiased Estimation of the Standard Deviation", The American Statistician, 22(1). To see this in action:

library(parallel)
cl <- makeCluster(detectCores())
res <- unlist(parLapply(cl, 1:1e7, function(i) sd(rnorm(10, mean = 0, sd = 1))))

correction <- function(n) {
  gamma((n-1)/2) * sqrt((n-1)/2) / gamma(n/2)
}

mean(res)                  # 0.972583
mean(res * correction(10)) # ~1.000

The calculation for the sample variance is an unbiased estimate of the population variance, but the square root is a nonlinear function, and the square root of an unbiased estimator is not itself necessarily unbiased.

> If you calculated
>
> (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1)
>
> then you would get the same result as you get from cor(Data) (to within
> about 1e-15).
> cheers,
>
> Rolf Turner
>
> --
> Rolf Turner
> Technical Editor ANZJS

--
Joshua F. Wiley
Ph.D. Student, UCLA Department of Psychology
http://joshuawiley.com/
Senior Analyst, Elkhart Group Ltd.
http://elkhartgroup.com
Office: 260.673.5518
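Rolf's point about the n - 1 denominator is easy to check numerically; a minimal sketch with simulated data:

```r
set.seed(1)
X <- matrix(rnorm(100 * 3), nrow = 100, ncol = 3)

# standardize with scale(), which uses sd() and hence an n - 1 denominator
Z <- scale(X)

# so the matching "covariance" of the standardized data also needs n - 1
C <- crossprod(Z) / (nrow(Z) - 1)

max(abs(C - cor(X)))  # agrees with cor() to within floating point error
```
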
Re: [R] Logical operators and named arguments
On Sat, Aug 9, 2014 at 9:56 AM, Patrick Burns wrote:
> On 07/08/2014 07:21, Joshua Wiley wrote:
>> Hi Ryan,
>>
>> It does work, but the *apply family of functions always pass to the first
>> argument, so you can specify e2 = , but not e1 =. For example:
>>
>> > sapply(1:3, `>`, e2 = 2)
>> [1] FALSE FALSE TRUE
>
> That is not true:

But it is passed as the first argument, not by name but positionally. The reason it works with your gt() is that R is flexible with regular functions:

> f <- function(x, y) x > y
> f(1:3, x = 2)
[1] TRUE FALSE FALSE

but primitives ARE positionally matched:

> `>`(1:3, 2)
[1] FALSE FALSE TRUE
> `>`(1:3, e1 = 2)
[1] FALSE FALSE TRUE

> gt <- function(x, y) x > y
>
> sapply(1:3, gt, y=2)
> [1] FALSE FALSE TRUE
> sapply(1:3, gt, x=2)
> [1] TRUE FALSE FALSE
>
> Specifying the first argument(s) in an apply
> call is a standard way of getting flexibility.
>
> I'd hazard to guess that the reason the original
> version doesn't work is because `>` is Primitive.
> There's speed at the expense of not behaving quite
> the same as typical functions.
>
> Pat
>
>> From ?sapply
>>
>> 'lapply' returns a list of the same length as 'X', each element of
>> which is the result of applying 'FUN' to the corresponding element
>> of 'X'.
>>
>> so `>` is applied to each element of 1:3
>>
>> `>`(1, ...)
>> `>`(2, ...)
>> `>`(3, ...)
>>
>> and if e2 is specified then that is passed
>>
>> `>`(1, 2)
>> `>`(2, 2)
>> `>`(3, 2)
>>
>> Further, see ?Ops
>>
>> If the members of this group are called as functions, any
>> argument names are removed to ensure that positional matching
>> is always used.
>> and you can see this at work:
>>
>> > `>`(e1 = 1, e2 = 2)
>> [1] FALSE
>> > `>`(e2 = 1, e1 = 2)
>> [1] FALSE
>>
>> If you want the flexibility to specify which argument the elements of X
>> should be *applied to, use a wrapper:
>>
>> > sapply(1:3, function(x) `>`(x, 2))
>> [1] FALSE FALSE TRUE
>> > sapply(1:3, function(x) `>`(2, x))
>> [1] TRUE FALSE FALSE
>>
>> HTH,
>>
>> Josh
>>
>> On Thu, Aug 7, 2014 at 2:20 PM, Ryan wrote:
>>
>>> Hi,
>>>
>>> I'm wondering why calling ">" with named arguments doesn't work as
>>> expected:
>>>
>>> > args(">")
>>> function (e1, e2)
>>> NULL
>>>
>>> > sapply(c(1,2,3), `>`, e2=0)
>>> [1] TRUE TRUE TRUE
>>>
>>> > sapply(c(1,2,3), `>`, e1=0)
>>> [1] TRUE TRUE TRUE
>>>
>>> Shouldn't the latter be FALSE?
>>>
>>> Thanks for any help,
>>> Ryan
>>>
>>> The information in this e-mail is intended only for th...{{dropped:23}}

> --
> Patrick Burns
> pbu...@pburns.seanet.com
> twitter: @burnsstat @portfolioprobe
> http://www.portfolioprobe.com/blog
> http://www.burns-stat.com
> (home of:
> 'Impatient R'
> 'The R Inferno'
> 'Tao Te Programming')

--
Joshua F. Wiley
Ph.D. Student, UCLA Department of Psychology
http://joshuawiley.com/
Senior Analyst, Elkhart Group Ltd.
http://elkhartgroup.com
Office: 260.673.5518
Re: [R] Logical operators and named arguments
Hi Ryan,

It does work, but the *apply family of functions always pass to the first argument, so you can specify e2 = , but not e1 =. For example:

> sapply(1:3, `>`, e2 = 2)
[1] FALSE FALSE TRUE

From ?sapply

'lapply' returns a list of the same length as 'X', each element of
which is the result of applying 'FUN' to the corresponding element
of 'X'.

so `>` is applied to each element of 1:3

`>`(1, ...)
`>`(2, ...)
`>`(3, ...)

and if e2 is specified then that is passed

`>`(1, 2)
`>`(2, 2)
`>`(3, 2)

Further, see ?Ops

If the members of this group are called as functions, any
argument names are removed to ensure that positional matching
is always used.

and you can see this at work:

> `>`(e1 = 1, e2 = 2)
[1] FALSE
> `>`(e2 = 1, e1 = 2)
[1] FALSE

If you want the flexibility to specify which argument the elements of X should be *applied to, use a wrapper:

> sapply(1:3, function(x) `>`(x, 2))
[1] FALSE FALSE TRUE
> sapply(1:3, function(x) `>`(2, x))
[1] TRUE FALSE FALSE

HTH,

Josh

On Thu, Aug 7, 2014 at 2:20 PM, Ryan wrote:
> Hi,
>
> I'm wondering why calling ">" with named arguments doesn't work as
> expected:
>
> > args(">")
> function (e1, e2)
> NULL
>
> > sapply(c(1,2,3), `>`, e2=0)
> [1] TRUE TRUE TRUE
>
> > sapply(c(1,2,3), `>`, e1=0)
> [1] TRUE TRUE TRUE
>
> Shouldn't the latter be FALSE?
>
> Thanks for any help,
> Ryan
>
> The information in this e-mail is intended only for th...{{dropped:23}}
Re: [R] a knitr question
On Thu, Jul 31, 2014 at 9:47 AM, Duncan Murdoch wrote:
> On 30/07/2014, 2:20 PM, Yihui Xie wrote:
> > As a reader, I often want to run the code by myself _while_ I'm
> > reading a particular part of an article/report. I find it convenient
> > to be able to copy the code as I'm reading it, instead of minimizing
> > my current window, opening an R script, and running the part that I'm
> > interested in. Of course, this may not work if the code I copy is not
> > self-contained; your purl() approach certainly has an advantage
> > sometimes.
> >
> > I do not see a whole lot of value in maintaining the same appearance
> > of the R code in the R console and a report. You can teach your
> > students what the prompt characters mean, and I think that is enough.
> > Journal of Statistical Software requires "R> " as the prompt character
> > (which is worse), and your students will probably be confused when
> > reading JSS papers if they have been seeing the default prompts all
> > the time. I see the point of keeping prompts (i.e. I do not completely
> > disagree), but I do not think it is an essential or important thing to
> > do. Personally I prefer reading "vanilla" code, and >/+ may confuse my
> > eyes occasionally, e.g.
> >
> > > z > 5
> > > x +
> > + y
> >
> > (More on prompts:
> > http://yihui.name/en/2013/01/code-pollution-with-command-prompts/)
> >
> > Re Rich: yes, I'm aware of approaches for post-processing the prompts,
> > but this problem would not have existed in the first place if we did
> > not include prompts at all. I'm not sure it makes much sense to
> > create some mess and clean it up afterwards.
>
> So your suggestion is that the R console should not prompt for input?
> Do you know of *any* interactive system which doesn't prompt for input?
> How would users be able to tell the difference between R waiting for
> input, and R busy on the last calculation?
I don't think this is about prompts in interactive R, but about whether, when a document is knit, the echoed code in the report should have prompts or not.

> Duncan Murdoch
>
> > Regards,
> > Yihui
> > --
> > Yihui Xie
> > Web: http://yihui.name
> >
> > On Wed, Jul 30, 2014 at 12:50 PM, Greg Snow <538...@gmail.com> wrote:
> >> My preference when teaching is to have the code and results look the
> >> same as it appears in the R console window, so with the prompts and
> >> without the output commented. But then I also `purl` my knitr file to
> >> create a script file to give to the students that they can copy and
> >> paste from easily.

--
Joshua F. Wiley
Ph.D. Student, UCLA Department of Psychology
http://joshuawiley.com/
Senior Analyst, Elkhart Group Ltd.
http://elkhartgroup.com
Office: 260.673.5518
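For readers wanting either style in their own reports, the relevant knobs are sketched below (this assumes the knitr package; the exact prompt strings are illustrative):

```r
# Sketch: chunk options controlling how echoed code looks (assumes knitr).
# With prompt = TRUE, knitr echoes code using the strings from
# options("prompt") and options("continue"), i.e. console-style prompts.
knitr::opts_chunk$set(prompt = TRUE, comment = NA)
options(prompt = "R> ", continue = "+  ")  # e.g. JSS-style prompts

# To also hand students a copy-paste-friendly script, extract just the code:
# knitr::purl("report.Rmd")  # writes report.R alongside the report
```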
Re: [R] Is there a package for EFA with multiple groups?
Hi Elizabeth,

In confirmatory factor analysis with multiple groups, the reason one needs to estimate the models simultaneously is that, typically, one is interested in applying constraints (e.g., forcing all or some of the factor loadings to be equal across groups). In exploratory factor analysis, constraints are uncommon (they are somewhat un-exploratory). I would suggest simply using the psych package and subsetting your data to the particular group, as in:

efa( data = subset(data, Group == "Group1") )
efa( data = subset(data, Group == "Group2") )

etc.

As you noted, lavaan will allow you to test multiple group CFAs, so if/when you are ready to see whether the same configural factor structure, or any other level of invariance, holds across your groups, you can use it.

Sincerely,
Josh

On Mon, Jul 28, 2014 at 2:46 PM, Elizabeth Barrett-Cheetham <ebarrettcheet...@gmail.com> wrote:
> Hello R users,
>
> I'm hoping to run an exploratory and confirmatory factor analysis on a
> psychology survey instrument. The data has been collected from
> multiple groups, and it's likely that the data is hierarchical/has 2nd
> order factors.
>
> It appears that the lavaan package allows me to run a multiple group
> hierarchical confirmatory factor analysis. Yet, I can't locate a
> package that can run the equivalent exploratory analysis.
>
> Could anyone please direct me to an appropriate package?
>
> Many thanks,
>
> Elizabeth

--
Joshua F. Wiley
Ph.D. Student, UCLA Department of Psychology
http://joshuawiley.com/
Senior Analyst, Elkhart Group Ltd.
http://elkhartgroup.com
Office: 260.673.5518
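To make the per-group suggestion concrete, a minimal sketch with the psych package (the data frame name dat, the grouping column Group, nfactors = 2, and the rotation are all assumptions for illustration; note the psych function for exploratory factor analysis is fa()):

```r
library(psych)

# run the same exploratory factor analysis separately within each group
groups <- split(dat[, setdiff(names(dat), "Group")], dat$Group)
fits   <- lapply(groups, fa, nfactors = 2, rotate = "oblimin")

# compare the loading patterns across groups by eye
lapply(fits, function(f) print(f$loadings, cutoff = 0.3))
```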
Re: [R] Box-cox transformation
Dear Ravi,

In my previous example, I used the residuals, so:

sum [ (r_i / scaling)^2 ]

If you want to use the deviance from glm, that gives you:

sum [ r_i^2 ]

and since the scaling factor is just a constant for any given lambda, the modification is simply:

sum [ r_i^2 ] / ( scaling^2 )

as given in the modified code below (posted back to R-help in case anyone else has this question).

Hope this helps,
Josh

##
require(MASS)

myp <- function(y, lambda) (y^lambda-1)/lambda

lambda <- seq(-0.05, 0.45, len = 20)
N <- nrow(quine)
res <- matrix(numeric(0), nrow = length(lambda), 2,
              dimnames = list(NULL, c("Lambda", "LL")))

# scaling constant
C <- exp(mean(log(quine$Days+1)))

for(i in seq_along(lambda)) {
  SS <- deviance(glm(myp(Days + 1, lambda[i]) ~ Eth*Sex*Age*Lrn, data = quine))
  LL <- (- (N/2) * log(SS/((C^lambda[i])^2)))
  res[i, ] <- c(lambda[i], LL)
}

# box cox
boxcox(Days+1 ~ Eth*Sex*Age*Lrn, data = quine, lambda = lambda)
# add our points on top to verify match
points(res[, 1], res[,2], pch = 16)
##

On Mon, Jul 7, 2014 at 11:57 PM, Ravi Varadhan wrote:
> Dear Josh,
> Thank you very much. I knew that the scaling had to be adjusted, but was
> not sure on how to do this.
>
> Can you please show me how to do this scaling with `glm'? In other words,
> how would I scale the deviance from glm?
>
> Thanks,
> Ravi
>
> -----Original Message-----
> From: Joshua Wiley [mailto:jwiley.ps...@gmail.com]
> Sent: Sunday, July 06, 2014 11:34 PM
> To: Ravi Varadhan
> Cc: r-help@r-project.org
> Subject: Re: [R] Box-cox transformation
>
> Hi Ravi,
>
> Deviance is the SS in this case, but you need a normalizing constant
> adjusted by the lambda to put them on the same scale. I modified your
> example below to simplify slightly and use the normalization (see the LL
> line).
> Cheers,
>
> Josh
>
> ##
> require(MASS)
>
> myp <- function(y, lambda) (y^lambda-1)/lambda
>
> lambda <- seq(-0.05, 0.45, len = 20)
> N <- nrow(quine)
> res <- matrix(numeric(0), nrow = length(lambda), 2,
>               dimnames = list(NULL, c("Lambda", "LL")))
>
> # scaling constant
> C <- exp(mean(log(quine$Days+1)))
>
> for(i in seq_along(lambda)) {
>   r <- resid(lm(myp(Days + 1, lambda[i]) ~ Eth*Sex*Age*Lrn, data = quine))
>   LL <- (- (N/2) * log(sum((r/(C^lambda[i]))^2)))
>   res[i, ] <- c(lambda[i], LL)
> }
>
> # box cox
> boxcox(Days+1 ~ Eth*Sex*Age*Lrn, data = quine, lambda = lambda)
> # add our points on top to verify match
> points(res[, 1], res[,2], pch = 16)
>
> ##
>
> On Mon, Jul 7, 2014 at 11:33 AM, Ravi Varadhan wrote:
>> Hi,
>>
>> I am trying to do Box-Cox transformation, but I am not sure how to do it
>> correctly. Here is an example showing what I am trying:
>>
>> # example from MASS
>> require(MASS)
>> boxcox(Days+1 ~ Eth*Sex*Age*Lrn, data = quine,
>>        lambda = seq(-0.05, 0.45, len = 20))
>>
>> # Here is my attempt at getting the profile likelihood for the Box-Cox
>> # parameter
>> lam <- seq(-0.05, 0.45, len = 20)
>> dev <- rep(NA, length=20)
>>
>> for (i in 1:20) {
>>   a <- lam[i]
>>   ans <- glm(((Days+1)^a-1)/a ~ Eth*Sex*Age*Lrn, family=gaussian, data = quine)
>>   dev[i] <- ans$deviance
>> }
>>
>> plot(lam, dev, type="b", xlab="lambda", ylab="deviance")
>>
>> I am trying to create the profile likelihood for the Box-Cox parameter,
>> but obviously I am not getting it right. I am not sure that ans$deviance
>> is the right thing to do.
>>
>> I would appreciate any guidance.
>>
>> Thanks & Best,
>> Ravi
>
> --
> Joshua F. Wiley
> Ph.D.
> Student, UCLA Department of Psychology
> http://joshuawiley.com/
> Senior Analyst, Elkhart Group Ltd.
> http://elkhartgroup.com
> Office: 260.673.5518

--
Joshua F. Wiley
Ph.D. Student, UCLA Department of Psychology
http://joshuawiley.com/
Senior Analyst, Elkhart Group Ltd.
http://elkhartgroup.com
Office: 260.673.5518
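A side note on why dividing by (C^lambda)^2 works (here y = Days + 1 and C is the geometric mean of y): the quantity computed in the code equals the standard Box-Cox profile log-likelihood up to a lambda-free constant, so both are maximized at the same lambda. Sketch of the algebra:

```latex
% Quantity computed in the code (up to additive constants):
\ell(\lambda) = -\frac{n}{2}\log\!\big(\mathrm{RSS}(\lambda)\big) + n\lambda\log C,
\qquad C = \exp\!\Big(\frac{1}{n}\sum_{i=1}^{n}\log y_i\Big).

% Standard Box-Cox profile log-likelihood:
\ell^{*}(\lambda) = -\frac{n}{2}\log\!\big(\mathrm{RSS}(\lambda)/n\big)
  + (\lambda - 1)\sum_{i=1}^{n}\log y_i .

% Since \sum_i \log y_i = n \log C, the difference
% \ell(\lambda) - \ell^{*}(\lambda) = n\log C - \frac{n}{2}\log n
% does not involve \lambda, so both curves have the same shape and maximizer.
```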
Re: [R] Box-cox transformation
Hi Ravi,

Deviance is the SS in this case, but you need a normalizing constant adjusted by the lambda to put them on the same scale. I modified your example below to simplify slightly and use the normalization (see the LL line).

Cheers,
Josh

##
require(MASS)

myp <- function(y, lambda) (y^lambda-1)/lambda

lambda <- seq(-0.05, 0.45, len = 20)
N <- nrow(quine)
res <- matrix(numeric(0), nrow = length(lambda), 2,
              dimnames = list(NULL, c("Lambda", "LL")))

# scaling constant
C <- exp(mean(log(quine$Days+1)))

for(i in seq_along(lambda)) {
  r <- resid(lm(myp(Days + 1, lambda[i]) ~ Eth*Sex*Age*Lrn, data = quine))
  LL <- (- (N/2) * log(sum((r/(C^lambda[i]))^2)))
  res[i, ] <- c(lambda[i], LL)
}

# box cox
boxcox(Days+1 ~ Eth*Sex*Age*Lrn, data = quine, lambda = lambda)
# add our points on top to verify match
points(res[, 1], res[,2], pch = 16)
##

On Mon, Jul 7, 2014 at 11:33 AM, Ravi Varadhan wrote:
> Hi,
>
> I am trying to do Box-Cox transformation, but I am not sure how to do it
> correctly. Here is an example showing what I am trying:
>
> # example from MASS
> require(MASS)
> boxcox(Days+1 ~ Eth*Sex*Age*Lrn, data = quine,
>        lambda = seq(-0.05, 0.45, len = 20))
>
> # Here is my attempt at getting the profile likelihood for the Box-Cox
> # parameter
> lam <- seq(-0.05, 0.45, len = 20)
> dev <- rep(NA, length=20)
>
> for (i in 1:20) {
>   a <- lam[i]
>   ans <- glm(((Days+1)^a-1)/a ~ Eth*Sex*Age*Lrn, family=gaussian, data = quine)
>   dev[i] <- ans$deviance
> }
>
> plot(lam, dev, type="b", xlab="lambda", ylab="deviance")
>
> I am trying to create the profile likelihood for the Box-Cox parameter,
> but obviously I am not getting it right. I am not sure that ans$deviance
> is the right thing to do.
>
> I would appreciate any guidance.
> Thanks & Best,
> Ravi

--
Joshua F. Wiley
Ph.D. Student, UCLA Department of Psychology
http://joshuawiley.com/
Senior Analyst, Elkhart Group Ltd.
http://elkhartgroup.com
Office: 260.673.5518
Re: [R] data.frame(1)*1:4 = 1?
Hi Spencer,

One piece is that a data frame of the same dimensions as went in comes out. The second piece is that the vector is recycled. So in your first example:

data.frame(1) * 1:4

you only end up with the first element:

data.frame(1) * 1

If you try:

data.frame(1) * 4:1

you get a data frame with a value of 4. Now for:

data.frame(1:2, 3:4) * 5:7

recycling kicks in again, and you get:

1 * 5, 2 * 6, 3 * 7, and 4 * 5

When working with vectors, you get recycling and the result expands to the length of the longer vector: 1:3 * 1:6 has length 6. But data frames are sort of a 'higher' class, and the dimensions of the data frame trump the vector.

A slightly different behavior is observed for matrices:

matrix(1:6, ncol=2) * 1:3

gives recycling as expected to the longer of the vectors, but

matrix(1:6, ncol=2) * 1:9

gives an error. The error is _not_ directly in the multiplication, as it were: because matrices are stored as vectors, the result has expanded to the length of the longer vector (here 1:9), and that no longer matches the input dimensions of the matrix. In particular, this is the same as trying to do:

x <- 1:9
attributes(x)$dim <- c(3, 2)
Error in attributes(x)$dim <- c(3, 2) :
  dims [product 6] do not match the length of object [9]

Basically, R gets the result of 1:6 * 1:9, but then cannot format it back as a matrix, because the saved dimensions do not fit the new resulting data. You can verify that R does indeed do the calculations if you go under the hood --- the multiplication is done, and then it tries to apply the dims and it errors out.

Cheers,
Josh

On Wed, Apr 2, 2014 at 11:42 PM, Spencer Graves <spencer.gra...@structuremonitoring.com> wrote:
> Hello, All:
>
> What's the logic behind "data.frame(1)*1:4" producing a scalar 1?
> Or the following:
>
> data.frame(1:2, 3:4)*5:7
>   X1.2 X3.4
> 1    5   21
> 2   12   20
>
> I stumbled over this, because I thought I was multiplying a scalar
> times a vector, and obtaining a scalar rather than the anticipated vector.
> I learned that my "scalar" was in fact a data.frame with one row and one > column. > > > What am I missing? > > > Thanks, > Spencer > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/ > posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com 260.673.5518 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
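The recycling behaviors described in this thread can be reproduced in a few lines. This is an editorial sketch in base R only; it assumes current base-R semantics for `Ops.data.frame` and matrix arithmetic, which match what the thread reports.

```r
## A data frame keeps its dimensions: only the first recycled element is used.
d1 <- data.frame(1) * 1:4
dim(d1)        # still 1 x 1
d1[1, 1]       # 1 * 1 = 1; elements 2:4 of the vector never enter the result

## Column-major recycling across both columns of a 2 x 2 data frame:
data.frame(1:2, 3:4) * 5:7   # elementwise: 1*5, 2*6, 3*7, 4*5

## Matrices recycle the same way when lengths are compatible...
matrix(1:6, ncol = 2) * 1:3

## ...but error when the elementwise result cannot be folded back
## into the matrix's saved dimensions:
try(matrix(1:6, ncol = 2) * 1:9)   # dims [product 6] vs. length 9
```

The `try()` wrapper just lets the script continue past the error so the message can be inspected.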
Re: [R] predicted values
Dear Felipe,

That is normal behavior --- the prediction from that simple model decreases over time, and ends up negative. If the outcome cannot take on negative values, treating it as a continuous gaussian may not be optimal --- perhaps some transformation, like using a log link so that the exponentiated values are always positive, would be better? Alternately, the predictions may be going negative not because the data are negative overall, but because the values drop quickly early in the time series and the decrease slows later on; with an overly simplistic model of time, the fit may just keep decreasing. Using a smoother with a higher basis dimension may help model the function more accurately over the span of time in your dataset, and then not produce negative predicted values. I do not think there is any straightforward way to 'force' the model to be positive only.

Best,

Joshua

On Sat, Feb 1, 2014 at 5:05 PM, Felipe Carrillo wrote: > Consider this dummy dataset. > My real dataset with over 1000 records has > scatter large and small values. > I want to predict for values with NA but I > get negative predictions. Is this a normal > behaviour or I am missing a gam argument > to force the model to predict positive values. > library(mgcv) > test <- data.frame(iddate=seq(as.Date("2014-01-01"), > as.Date("2014-01-12"), by="days"), > value=c(300,29,22,NA,128,24,15,1,3,30,NA,2)) > test > str(test) > mod <- gam(value ~ s(as.numeric(iddate)),data=test) > # Predict for values with NA's > test$pred <- with(test,ifelse(is.na(value),predict(mod,test),value)) > test > [[alternative HTML version deleted]] > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. 
http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
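The log-link suggestion in the reply above can be sketched on the poster's dummy data. This is illustrative only, not a recommendation for the real dataset: the `gaussian(link = "log")` family and the basis dimension `k = 5` are assumptions chosen so the tiny example fits; the point is just that predictions on the response scale are `exp()` of the linear predictor and so cannot go negative.

```r
library(mgcv)  # for gam() and s()

test <- data.frame(iddate = seq(as.Date("2014-01-01"),
                                as.Date("2014-01-12"), by = "days"),
                   value = c(300, 29, 22, NA, 128, 24, 15, 1, 3, 30, NA, 2))

## Gaussian errors but a log link: fitted values are exp(eta) > 0.
mod.log <- gam(value ~ s(as.numeric(iddate), k = 5),
               family = gaussian(link = "log"), data = test)

pred <- predict(mod.log, test, type = "response")
all(pred > 0)   # predictions on the response scale stay positive
```

A Gamma or Tweedie family could serve the same purpose; the common thread is a link function whose inverse maps onto the positive reals.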
[R] useR! 2014 call for tutorials
The R User Conference, useR! 2014 is scheduled for July 1-3, 2014 at the University of California, Los Angeles. Before the official program, half-day tutorials will be offered on Monday, June 30. We invite R users to submit proposals for three hour tutorials on special topics regarding R. The proposals should include: 1) A brief description (abstract) of the tutorial 2) Goals 3) Detailed outline 4) Justification of why the tutorial is important 5) Background knowledge required and potential attendees The deadline for tutorial submission is January 5, 2014. Please email all tutorial proposals to useR-2014_at_R-project.org. A web page offering more information on the useR! conference is available at http://www.R-project.org/useR-2014 We look forward to your submissions, The organizing committee: Yunyun Dai, Phil Ender, Jan de Leeuw, David McArthur, Amelia McNamara, Sanjog Misra, Katharine Mullen, Jeroen Ooms, Szilard Pafka, Tim Triche, Joshua Wiley __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Setting contrasts
1 0.1558 0.6940313 > #Condition 351.52 4 10.9500 2.80e-07 *** > #Age:Condition 190.30 4 5.9279 0.0002793 *** > #Residuals 722.30 90 > > ##Damn! We are still wrong even though the above shows sum contrasts > > ## So we do it another way > options(contrasts = c("contr.sum","contr.poly")) > getOption("contrasts") > #[1] "contr.sum" "contr.poly" > resultsCar4 <- lm(Recall~Age*Condition, data = Eysenck ) > type4 <- Anova(resultsCar4, type = "III") > #print(type4) # Now we're back where we should be > #Anova Table (Type III tests) > # > #Response: Recall > # Sum Sq Df F value Pr(>F) > #(Intercept) 13479.2 1 1679.5361 < 2.2e-16 *** > #Age 240.2 1 29.9356 3.981e-07 *** > #Condition 1514.9 4 47.1911 < 2.2e-16 *** > #Age:Condition 190.3 4 5.9279 0.0002793 *** > #Residuals 722.3 90 > ## Now it works just fine. > ##So what is the difference between setting contrasts individually and > setting > ##them through the options? > > I get similar results if I use drop1, but perhaps that is what Fox did > also. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/ > posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
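The two ways of requesting sum-to-zero contrasts discussed in this thread can be compared directly. The sketch below uses the built-in warpbreaks data because the Eysenck data from the thread is not shipped with R; it checks only that per-factor assignment and the global option build the same model matrix, which is why downstream Type III tests (e.g. car::Anova(type = "III"), not run here) then agree.

```r
## Way 1: attach sum contrasts to each factor individually.
d2 <- warpbreaks
contrasts(d2$wool)    <- contr.sum(nlevels(d2$wool))
contrasts(d2$tension) <- contr.sum(nlevels(d2$tension))
fit.attr <- lm(breaks ~ wool * tension, data = d2)

## Way 2: set the global option; every factor is sum-coded when
## the model matrix is built.
op <- options(contrasts = c("contr.sum", "contr.poly"))
fit.opt <- lm(breaks ~ wool * tension, data = warpbreaks)
options(op)   # restore the previous (default treatment) contrasts

## Numerically identical design matrices, so identical fits:
all(model.matrix(fit.attr) == model.matrix(fit.opt))
```

Trouble typically arises only when some factors in the model carry sum contrasts and others still use the default treatment coding.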
Re: [R] How would i sum the number of NA's in multiple vectors
Or faster (both computational speed and amount of code): colSums(is.na(rbind(c1, c2, c3))) On Thu, Oct 17, 2013 at 4:34 AM, Carl Witthoft wrote: > mattbju2013 wrote > > Hi guys this is my first post, i need help summing the number of NA's in > a > > few vectors > > > > for example.. > > > > c1<-c(1,2,NA,3,4) > > c2<-c(NA,1,2,3,4) > > c3<-c(NA,1,2,3,4) > > > > how would i get a result that only sums the number of NA's in the vector? > > the.result.i.want<-c(2,0,1,0,0) > > See ?is.na . > Now, if I can interpret your question correctly, you're actually looking > for > the number of NA per *position* in the vectors, so let's make them into a > matrix first. > > cmat<-rbind(c1,c2,c3) > then use apply over columns > apply(cmat,2,function(k)sum(is.na(k))) > > > > > > -- > View this message in context: > http://r.789695.n4.nabble.com/How-would-i-sum-the-number-of-NA-s-in-multiple-vectors-tp4678411p4678432.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
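The two approaches from this thread, side by side on the example vectors (an editorial sketch; both count NAs per position):

```r
c1 <- c(1, 2, NA, 3, 4)
c2 <- c(NA, 1, 2, 3, 4)
c3 <- c(NA, 1, 2, 3, 4)

cmat <- rbind(c1, c2, c3)

## apply() over columns, as in the first reply:
res.apply <- apply(cmat, 2, function(k) sum(is.na(k)))

## vectorized colSums(), as in the faster reply:
res.colSums <- colSums(is.na(cmat))

res.colSums   # 2 0 1 0 0
```

`colSums()` avoids the per-column function calls that `apply()` makes, which is where the speed difference comes from.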
Re: [R] trying to compile R in win 7 (with Rtools) ... tcl.h
Hi Cleber,

When you install Rtools, it asks you for the home directory of R, and there it puts the directories src and Tcl. You need to copy those over to wherever you are building R. So, for example, I have: C:\usr\R\R-devel\Tcl where I untar (tar -xf) R-devel into C:\usr\R\ and then copy the src and Tcl dirs from Rtools over into C:\usr\R\R-devel\

Also, after you finish 'make all recommended', don't forget to run 'cd bitmap' and then 'make all', so you can save graphs as PNG, etc.

Cheers,

Josh

On Fri, Oct 4, 2013 at 7:29 PM, Cleber N.Borges wrote: > stop because > *had a stone in the middle of the way* > *in the middle of the way had a stone* > (by vinicius de moraes) > # > > so, one more help? somebody? :-) > > thanks... > cleber > > building package 'tcltk' > making init.d from init.c > making tcltk.d from tcltk.c > making tcltk_win.d from tcltk_win.c > gcc -I"../../../../include" -DNDEBUG -I "../../../../Tcl"/include -DWin32 > -O3 -Wall -std=gnu99 -mtune=core2 -c init.c -o init.o > In file included from init.c:22:0: > tcltk.h:23:17: fatal error: tcl.h: No such file or directory > compilation terminated. > make[4]: *** [init.o] Error 1 > make[3]: *** [mksrc-win2] Error 1 > make[2]: *** [all] Error 2 > make[1]: *** [R] Error 1 > make: *** [all] Error 2 > > > > > On 04/10/2013 22:46, Cleber N.Borges wrote: > >> >> bingo! :-) >> I got one pass to advanced! >> >> my TMP environment variable is: >> %SystemRoot%\TEMP >> >> >> thanks >> cleber >> >> >> On 04/10/2013 22:02, Joshua Wiley wrote: >> >>> Hi Cleber, >>> >>> You need to set TMPDIR to a valid directory, the default /tmp/ does not >>> work on Windows. 
>>> >>> From the cmd shell: >>> >>> set TMPDIR=C:/TMP >>> >>> for example >>> >>> and then run make all recommended >>> >>> Cheers, >>> >>> Josh >>> >>> >>> > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] trying to compile R in win 7 (with Rtools)
not create /tmp/R4428: directory nonexistent > mv: cannot stat `/tmp/R4428': No such file or directory > make[3]: *** [mkR1] Error 1 > make[2]: *** [all] Error 2 > make[1]: *** [R] Error 1 > make: *** [all] Error 2 > > C:\Rsrc\R-3.0.2\src\gnuwin32> > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] (sans objet)
Hi Jesse,

Here is one approach:

score <- function(dat, minimumN) {
  # get the number of columns (variables)
  k <- ncol(dat)
  # take the row means, excluding missing
  mean <- rowMeans(dat, na.rm=TRUE)
  # get the number missing for each row
  nmiss <- rowSums(is.na(dat))
  # if more than (k - minimumN) values are missing, set the mean to missing
  mean[nmiss > (k - minimumN)] <- NA
  # calculate the sum (scale the mean back up by the number of items)
  sum <- mean * k
  # put results in a dataframe and return
  data.frame(Mean = mean, Sum = sum, Nmissing = nmiss)
}

# score 4 variables, requiring at least 3
score(mtcars[, c("mpg", "disp", "hp", "wt")], 3)

# or put back into the dataset, just the mean
mtcars$ScaleMean <- score(mtcars[, c("mpg", "disp", "hp", "wt")], 3)$Mean

It will be somewhat faster than using apply() because all you need are row means, and rowMeans() uses more optimized code.

Cheers,

Josh

On Thu, Oct 3, 2013 at 3:42 PM, Jesse Gervais wrote: > Hello there, > > I try to construct a variable with R, but I have some difficulties. > > Assume that I use a data set named = mydata. I want to create a variable > that is the mean (totmean) or the sum (totsum) of 6 variables (var1, var2, > var3, var4, var5, var6). However, I want only participants who have > responded to at least 4 items to be include. Those who have one or two > missing for var1-var6 should be coded NA for totmean and totsum. > > How I do that? > > Thank you! > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. 
http://elkhartgroup.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
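A small check of the cutoff logic in score() on hand-picked data (the function is repeated here so the snippet is self-contained; `d`, `v1`..`v3` are illustrative names). With minimumN = 2 and 3 items, a row with 2 present values is scored and a row with only 1 is set to NA:

```r
score <- function(dat, minimumN) {
  k <- ncol(dat)                         # number of items
  mean <- rowMeans(dat, na.rm = TRUE)    # row means over observed items
  nmiss <- rowSums(is.na(dat))           # missing count per row
  mean[nmiss > (k - minimumN)] <- NA     # too few items answered -> NA
  sum <- mean * k                        # prorated sum
  data.frame(Mean = mean, Sum = sum, Nmissing = nmiss)
}

d <- data.frame(v1 = c(1, NA, NA),
                v2 = c(2,  4, NA),
                v3 = c(3,  6,  9))
s <- score(d, minimumN = 2)
s
## row 1: Mean 2, Sum 6 (complete); row 2: Mean 5, Sum 15 (2 of 3 answered);
## row 3: NA, NA (only 1 of 3 answered)
```

Note the prorated sum `mean * k` treats a respondent's missing items as if they were at that respondent's own mean, which is the usual convention for scale scores.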
Re: [R] Translating recoding syntax from SPSS to R
Just a slight addition to Arun's answer: within() saves typing and makes life a bit easier

set.seed(429)
dat1 <- data.frame(
  race=sample(1:3,20,replace=TRUE),
  usborn=sample(0:2,20,replace=TRUE))

# within is a nice way to add variables
# and make modifications within a dataset
# saves typing lots of dat1$
dat1 <- within(dat1, {
  confused <- 1 * ((race==1 | race==2) & usborn==0)
})
head(dat1)

# keep only rows where confused == 0
dat1 <- subset(dat1, confused == 0)

On Sat, Sep 21, 2013 at 11:47 PM, arun wrote: > Hi, > Try: > set.seed(429) > dat1<- > data.frame(race=sample(1:3,20,replace=TRUE),usborn=sample(0:2,20,replace=TRUE)) > dat1$confused<- 1*((dat1$race==1|dat1$race==2) & dat1$usborn==0) > head(dat1) > #  race usborn confused > #1    3      2        0 > #2    1      0        1 > #3    2      1        0 > #4    3      2        0 > #5    1      2        0 > #6    1      1        0 > A.K. > > > > - Original Message - > From: Mosi Ifatunji > To: "r-help@r-project.org" > Cc: > Sent: Saturday, September 21, 2013 6:03 PM > Subject: [R] Translating recoding syntax from SPSS to R > > Colleagues, > > I am in the process of learning R. I've been able to import my dataset (from > Stata) and do some simple coding. I have now come to a coding situation that > requires some assistance. This is some code in SPSS that I would like to be > able to execute in R: > > if (race eq 1 and usborn=0) confused=1 . > if (race eq 2 and usborn=0) confused=1 . > if (race eq 1 and usborn=1) confused=0 . > if (race eq 2 and usborn=1) confused=0 . > if (race eq 3 and usborn=1) confused=0 . > if (race eq 3 and cohort=1) confused=0 . > if (race eq 3 and cohort=2) confused=0 . > variable labels confused "R claims to be both an African American and foreign > born" . > value labels confused > 1 "Both AfAm and Foreign" > 2 "Not" . > select if (confused eq 0) . > > Any assistance would be greatly appreciated. 
> > -- Mosi > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > > ______ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
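A fuller translation of the SPSS block, including the cohort conditions and the value labels (rendered as an R factor). This is a sketch: the simulated `mydata`, and sampling usborn from 0:1 and cohort from 1:2, are assumptions standing in for the real dataset, so adjust names and codes to match your data.

```r
set.seed(429)
mydata <- data.frame(race   = sample(1:3, 20, replace = TRUE),
                     usborn = sample(0:1, 20, replace = TRUE),
                     cohort = sample(1:2, 20, replace = TRUE))

mydata <- within(mydata, {
  confused <- NA_real_                                  # unmatched cases stay NA, as in SPSS
  confused[race %in% c(1, 2) & usborn == 0] <- 1        # if (race eq 1/2 and usborn=0)
  confused[race %in% c(1, 2, 3) & usborn == 1] <- 0     # if (race eq 1/2/3 and usborn=1)
  confused[race == 3 & cohort %in% c(1, 2)] <- 0        # if (race eq 3 and cohort=1/2)
  ## 'value labels confused' becomes factor levels:
  confusedF <- factor(confused, levels = c(1, 0),
                      labels = c("Both AfAm and Foreign", "Not"))
})

## 'select if (confused eq 0)':
mydata <- subset(mydata, confused == 0)
```

Statements inside within() run in order, so the later assignments mirror SPSS's sequential if logic.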
Re: [R] Problem with converting F to FALSE
Hi,

You can either manually specify colClasses or use the as.is argument; see ?read.csv for more details. If you just had those two columns, something like:

read.table(header = TRUE, text = "
sex group
F 1
T 2
", colClasses = c("character", "integer"))

or, reading from your file:

read.csv("file.csv", colClasses = c("character", "integer"))

Cheers,

Josh

On Thu, Sep 5, 2013 at 5:44 AM, Venkata Kirankumar wrote: > Hi, > I have a peculiar problem in R-Project that is when my CSV file have one > column with all values as 'F' the R-Project converting this 'F' to FALSE. > Can some one please suggest how to stop this conversion. Because I want to > use 'F' in my calculations and show it in screen. for example my data is > like > > sex group > F 1 > F 2 > F 3 > > but when I use read.csv and load the csv file data is converting it to > > sex group > FALSE 1 > FALSE 2 > FALSE 3 > but i want it as source data like > > sex group > F 1 > F 2 > F 3 > > > Thanks in advance, > D V Kiran Kumar > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
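The coercion and its fix can be demonstrated without a file by using the `text=` argument of read.csv (an editorial sketch; `csv` is an illustrative name mimicking the poster's data):

```r
csv <- "sex,group\nF,1\nF,2\nF,3\n"

## Default: every value in the sex column looks like a logical
## abbreviation, so type.convert() turns the column into logical.
str(read.csv(text = csv)$sex)   # logical FALSE FALSE FALSE

## Fix: pin the column types so "F" stays a string.
d <- read.csv(text = csv, colClasses = c("character", "integer"))
d$sex    # "F" "F" "F"
```

The same `colClasses` vector works identically with a filename in place of `text=`.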
Re: [R] Read a Google Spreadsheet?
Hi Spencer, It really is not very hard, and I have never had issue with it: http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html Just download the x86 and x64 versions for your OS and install. Worst case, you need to add the directory to the PATH variable in Windows. I do this regularly so I can use/test either version of R. Cheers, Josh P.S. Emacs + ESS allows for different versions of R and it is not too difficult to use the 64 or 32 bit version... M-x R-version-architecture On Wed, Sep 4, 2013 at 6:36 PM, Spencer Graves wrote: > On 9/4/2013 6:09 PM, Ista Zahn wrote: >> >> Hi Spencer, >> >> Why don't you want to install 64bit Java? > > > > That may be a reasonable approach. > > > I may have Java confused with something else, but I remember hearing > that it was difficult or unwise to try to install both 32- and 64-bit > versions of something like Java or Java Script on the same Windows operating > system. If I need to uninstall 32-bit Java to install 64-bit, who knows > what else I could break. I'm a statistician, not an information > technologist: If I spend more time playing with Java, I'll have less time > for other things I want to do. > > > Thanks for the reply. > Spencer >> >> >> On Wed, Sep 4, 2013 at 6:12 PM, Spencer Graves >> wrote: >>> >>> Hello, All: >>> >>> >>> What do you recommend for reading a Google Spreadsheet into R? I didn't >>> find >>> anything useful using library(sos); findFn('google spreadsheet'). >>> >>> >>> I can solve the problem by downloading the file either as *.ods or *.xlsx >>> format, then opening it and saving it as *.xls, then using >>> read.xls{gdata}. >>> >>> >>> Alternatives I haven't tried use read.xlsx{xlsx} and >>> readWorksheetFromFile{XLConnect} with 32-bit R. Neither of these work for >>> me >>> with 64-bit R, because they can't find an appropriate rJava on my >>> computer; >>> see below. (I've been using 64-bit R with Emacs, so switching to 32-bit R >>> is >>> not completely trivial.) 
Similarly, read.gnumeric.sheet{gnumeric} >>> requires >>> the "external program, ssconvert", which seems not to be available on my >>> computer or installed for 64-bit R. >>> >>> >>> What do you suggest? Avoid 64-bit R unless I really need it? That seems >>> to >>> be the message I'm getting from this. (The writeFindFn2xls{sos} also >>> works >>> in 32-bit R but fails in 64-bit apparently for the same reason.) >>> >>> >>> Thanks, >>> Spencer >>> >>> >>>> library(xlsx) >>> >>> Loading required package: xlsxjars >>> Loading required package: rJava >>> Error : .onLoad failed in loadNamespace() for 'rJava', details: >>> call: fun(libname, pkgname) >>> error: No CurrentVersion entry in Software/JavaSoft registry! Try >>> re-installing Java and make sure R and Java have matching architectures. >>> Error: package ‘rJava’ could not be loaded >>>> >>>> library(XLConnect) >>> >>> Loading required package: rJava >>> Error : .onLoad failed in loadNamespace() for 'rJava', details: >>> call: fun(libname, pkgname) >>> error: No CurrentVersion entry in Software/JavaSoft registry! Try >>> re-installing Java and make sure R and Java have matching architectures. >>> Error: package ‘rJava’ could not be loaded >>>> >>>> sessionInfo() >>> >>> R version 3.0.1 (2013-05-16) >>> Platform: x86_64-w64-mingw32/x64 (64-bit) >>> >>> locale: >>> [1] LC_COLLATE=English_United States.1252 >>> [2] LC_CTYPE=English_United States.1252 >>> [3] LC_MONETARY=English_United States.1252 >>> [4] LC_NUMERIC=C >>> [5] LC_TIME=English_United States.1252 >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. 
> > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multiple regression (with interactions) by hand
Hi Christoph, ginv() computes the Moore-Penrose generalized inverse by way of a singular value decomposition. Part of the calculation involves taking the reciprocal of the non zero values. In practice, non zero is really "within some precision tolerance of zero". Numerical precision can bite you in scientific computing. There are many examples where the most conceptually straightforward approach is not the best approach because whereas the equation may be easy to write symbolically, it is more vulnerable to rounding or truncation errors that occur in floating point representations. Aside from working through some matrix algebra for understanding, using established code (like lm) for models where the authors will have taken issues like numerical precision and stability into consideration is generally safest. Cheers, Josh On Tue, Sep 3, 2013 at 6:22 AM, Christoph Scherber wrote: > Dear all, > > But why are there such huge differences betwen solve() and ginv()? (see code > below)? > > ## > m1=lm(Ozone~Solar.R*Wind,airquality) > > # remove NA´s: > airquality2=airquality[complete.cases(airquality$Ozone)& > complete.cases(airquality$Solar.R)& > complete.cases(airquality$Wind),] > > # create the model matrix by hand: > X=cbind("(Intercept)"=1,Solar.R=airquality2$Solar.R,Wind=airquality2$Wind,"Solar.R:Wind"=airquality2$Solar.R*airquality2$Wind) > # is the same as: > model.matrix(m1) > # create the response vector by hand: > Y=airquality2$Ozone > # is the same as: > m1$model$Ozone > # Now solve for the parameter estimates: > > solve(crossprod(X)) %*% crossprod(X,Y) #gives the correct answer > > library(MASS) > ginv(t(X)%*%X)%*%t(X)%*%Y #gives a wrong answer > > > > > > Am 03/09/2013 12:29, schrieb Joshua Wiley: >> Hi Christoph, >> >> Use this matrix expression instead: >> >> solve(crossprod(X)) %*% t(X) %*% Y >> >> Note that: >> >> all.equal(crossprod(X), t(X) %*% X) >> >> Cheers, >> >> Joshua >> >> >> >> On Tue, Sep 3, 2013 at 2:51 AM, Christoph Scherber >> wrote: >>> Dear 
all, >>> >>> I´ve played around with the "airquality" dataset, trying to solve the >>> matrix equations of a simple >>> multiple regression by hand; however, my matrix multiplications don´t lead >>> to the estimates returned >>> by coef(). What have I done wrong here? >>> >>> ## >>> m1=lm(Ozone~Solar.R*Wind,airquality) >>> >>> # remove NA´s: >>> airquality2=airquality[complete.cases(airquality$Ozone)& >>> complete.cases(airquality$Solar.R)& >>> complete.cases(airquality$Wind),] >>> >>> # create the model matrix by hand: >>> X=cbind("(Intercept)"=1,Solar.R=airquality2$Solar.R,Wind=airquality2$Wind,"Solar.R:Wind"=airquality2$Solar.R*airquality2$Wind) >>> # is the same as: >>> model.matrix(m1) >>> >>> # create the response vector by hand: >>> Y=airquality2$Ozone >>> # is the same as: >>> m1$model$Ozone >>> >>> # Now solve for the parameter estimates: >>> library(MASS) >>> ginv(t(X)%*%X)%*%t(X)%*%Y >>> >>> # is not the same as: >>> coef(m1) >>> >>> ## >>> Now why is my result (line ginv(...)) not the same as the one returned by >>> coef(m1)? >>> >>> Thanks very much for your help! >>> >>> Best regards, >>> Christoph >>> >>> [using R 3.0.1 on Windows 7 32-Bit] >>> >>> >>> >>> >>> >>> -- >>> PD Dr Christoph Scherber >>> Georg-August University Goettingen >>> Department of Crop Science >>> Agroecology >>> Grisebachstrasse 6 >>> D-37077 Goettingen >>> Germany >>> phone 0049 (0)551 39 8807 >>> fax 0049 (0)551 39 8806 >>> http://www.gwdg.de/~cscherb1 >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> >> > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. 
Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
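The points in this thread can be checked numerically on the same data: the normal-equations solution matches coef() here, and a QR-based solve (essentially what lm() uses internally) avoids squaring the condition number that makes the explicit inverse fragile. A sketch using the airquality data from the thread:

```r
## Complete cases on the variables in the model:
airquality2 <- airquality[complete.cases(airquality[c("Ozone", "Solar.R", "Wind")]), ]

m1 <- lm(Ozone ~ Solar.R * Wind, data = airquality2)
X  <- model.matrix(m1)
Y  <- airquality2$Ozone

## Normal equations: fine here, but forming X'X squares the condition number.
b.normal <- solve(crossprod(X)) %*% crossprod(X, Y)

## QR decomposition: least squares without forming X'X.
b.qr <- qr.solve(X, Y)

all.equal(drop(b.normal), coef(m1), check.attributes = FALSE)
all.equal(b.qr, coef(m1), check.attributes = FALSE)
```

For badly conditioned design matrices the normal-equations route loses roughly twice as many digits as the QR route, which is the practical reason to leave the linear algebra to lm().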
Re: [R] optim evils
Hi Michael, You do not need to create a self-contained example from the mass of code where it is embedded, but given that optim() works in many cases, to file a bug report, you do need to give _an_ example where it is failing. Here is an example where it works great: > optim(1, fn = function(x) x - 5, method = "CG", lower = 3) $par [1] 3 $value [1] -2 $counts function gradient 11 $convergence [1] 0 $message [1] "CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL" Warning message: In optim(1, fn = function(x) x - 5, method = "CG", lower = 3) : bounds can only be used with method L-BFGS-B (or Brent) and it gives a warning at the end regarding L-BFGS-B. On Wed, Sep 4, 2013 at 1:34 AM, Michael Meyer wrote: > It would take some effort to extract selfcontained code from the mass of code > wherein this optimization is embedded. Moreover I would have to obtain > permission from my employer to do so. > > This is not efficient. > However some things are evident from the trace log which I have submitted: > (a) L-BFGS-B does not identify itself even though it was called overriding > the method > parameter in optim. > (b) Optim reports as final converged minimum value a function value that is > much larger than > others computed during the optimization. > > I think we can agree on calling this a bug. > [[alternative HTML version deleted]] > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multiple regression (with interactions) by hand
Hi Christoph, Use this matrix expression instead: solve(crossprod(X)) %*% t(X) %*% Y Note that: all.equal(crossprod(X), t(X) %*% X) Cheers, Joshua On Tue, Sep 3, 2013 at 2:51 AM, Christoph Scherber wrote: > Dear all, > > I´ve played around with the "airquality" dataset, trying to solve the matrix > equations of a simple > multiple regression by hand; however, my matrix multiplications don´t lead to > the estimates returned > by coef(). What have I done wrong here? > > ## > m1=lm(Ozone~Solar.R*Wind,airquality) > > # remove NA´s: > airquality2=airquality[complete.cases(airquality$Ozone)& > complete.cases(airquality$Solar.R)& > complete.cases(airquality$Wind),] > > # create the model matrix by hand: > X=cbind("(Intercept)"=1,Solar.R=airquality2$Solar.R,Wind=airquality2$Wind,"Solar.R:Wind"=airquality2$Solar.R*airquality2$Wind) > # is the same as: > model.matrix(m1) > > # create the response vector by hand: > Y=airquality2$Ozone > # is the same as: > m1$model$Ozone > > # Now solve for the parameter estimates: > library(MASS) > ginv(t(X)%*%X)%*%t(X)%*%Y > > # is not the same as: > coef(m1) > > ## > Now why is my result (line ginv(...)) not the same as the one returned by > coef(m1)? > > Thanks very much for your help! > > Best regards, > Christoph > > [using R 3.0.1 on Windows 7 32-Bit] > > > > > > -- > PD Dr Christoph Scherber > Georg-August University Goettingen > Department of Crop Science > Agroecology > Grisebachstrasse 6 > D-37077 Goettingen > Germany > phone 0049 (0)551 39 8807 > fax 0049 (0)551 39 8806 > http://www.gwdg.de/~cscherb1 > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. 
http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Trouble with Slidify and Latex
Hi Noah,

Looks like MathJax --- \( ... \) is the inline math delimiter for MathJax (http://www.mathjax.org/download/), so seeing a literal \( in the rendered slide suggests the MathJax script is not being loaded in your presentation.

As an example, check out slide 20 (http://elkhartgroup.com/tutorials/psychometrics.html)

Cheers,

Josh

On Sun, Sep 1, 2013 at 1:40 PM, Noah Silverman wrote: > Hi, > > (Re-submitting as the original doesn't look like it made it to the list.) > > Just starting to play around with the awesome Slidify package. > > For some reason, I can't get it to render any Latex in the presentation. > Have reviewed all the docs and think I'm doing things correctly. Is there > something possibly broken with my installation, or am I misunderstanding the > markdown syntax? > > I have, on a single slide: > > Test $A = 1+2$ and some text after > > What I see in the Presentation: > > > Test \(A = 1+2\) and some text after > > > Note: This is literally a "slash" followed by a "parenthesis" in the final > HTML slide. > > > Any ideas on what's wrong here? > > Thanks. > > -- > Noah Silverman, C.Phil > UCLA Department of Statistics > 8117 Math Sciences Building > Los Angeles, CA 90095 > > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] bootstrapping respecting subject level information
Hi, It is not the easiest code to follow, but when I was working at UCLA, I wrote a page demonstrating a multilevel bootstrap, where I use a two stage sampler, (re)sampling at each level. In your case, that could be: first draw subjects, then draw observations within subjects. A strata-only option does not resample all sources of variability, which are: 1) which subjects you get and 2) which observations you get within those subjects. The page is here: http://www.ats.ucla.edu/stat/r/dae/melogit.htm As a side note, it demonstrates a mixed effects model in R, although as I mentioned it is not geared for beginners. Cheers, Josh On Wed, Jul 3, 2013 at 7:19 PM, Sol Lago wrote: > Hi there, > > This is the first time I use this forum, and I want to say from the start I > am not a skilled programmer. So please let me know if the question or code > were unclear! > > I am trying to bootstrap an interaction (that is my test statistic) using the > package "boot". My problem is that for every resample, I would like the > randomization to be done within subjects, so that observations from different > subjects are not mixed. Here is the code to generate a dataframe similar to > mine: > > Subject = rep(c("S1","S2","S3","S4"),4) > Num = rep(c("singular","plural"),8) > Gram= rep(c("gram","gram","ungram","ungram"),4) > RT = c(657,775,678,895,887,235,645,916,930,768,890,1016,590,978,450,920) > data= data.frame(Subject,Num,Gram,RT) > > This is the code I used to get the empirical interaction value: > > summary(lm(RT ~ Num*Gram, data=data)) > > As you can see, the interaction between my two factors is -348. I want to get > a bootstrap confidence interval for this statistic, which I can generate > using the "boot" package: > > #Function to create the statistic to be bootstrapped > boot.huber <- function(data, indices) { > data <- data[indices, ] #select obs. 
in bootstrap sample > mod <- lm(RT ~ Num*Gram, data=data) > coefficients(mod) #return coefficient vector > } > > #Generate bootstrap estimate > data.boot <- boot(data, boot.huber, 1999) > > #Get confidence interval > boot.ci(data.boot, index=4, type=c("norm", "perc", "bca"),conf=0.95) #4 gets > the CI for the interaction > > My problem is that I think the resamples should be generated without mixing > the individual subjects observations: that is, to generate the new resamples, > the observations from subject 1 (S1) should be shuffled within subject 1, not > mixing them with the observations from subjects 2, etc... I don't know how > "boot" is doing the resampling (I read the documentation but don't understand > how the function is doing it) > > Does anyone know how I could make sure that the resampling procedure used by > "boot" respects the subject level information? > > Thanks a lot for your help/advice! > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
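For reference, the within-subject (stratified) resampling the poster asks about can be written directly. This is a minimal sketch using the example data from the thread; the helper name `withinSubjectSample` is illustrative only, not part of the boot package (which also offers a `strata` argument for the same kind of grouped resampling):

```r
## Stratified resampling: keep every subject, resample observations only
## within each subject (the poster's request). A full two-stage sampler
## would additionally draw the subjects themselves with replacement first.
set.seed(1)
Subject <- rep(c("S1", "S2", "S3", "S4"), 4)
Num  <- rep(c("singular", "plural"), 8)
Gram <- rep(c("gram", "gram", "ungram", "ungram"), 4)
RT   <- c(657, 775, 678, 895, 887, 235, 645, 916,
          930, 768, 890, 1016, 590, 978, 450, 920)
dat  <- data.frame(Subject, Num, Gram, RT)

withinSubjectSample <- function(data) {
  ## resample rows with replacement, separately per subject
  do.call(rbind, lapply(split(data, data$Subject), function(obs) {
    obs[sample(nrow(obs), replace = TRUE), , drop = FALSE]
  }))
}

## percentile bootstrap for the interaction coefficient
boots <- replicate(499, {
  m <- lm(RT ~ Num * Gram, data = withinSubjectSample(dat))
  coef(m)[["Numsingular:Gramungram"]]
})
quantile(boots, c(0.025, 0.975))
```

Note that with only four subjects and sixteen observations, any bootstrap interval should be treated as a rough sketch rather than a reliable inference.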
Re: [R] Help
Well, it implies either 2help or help^2, right? In this case, it seems oddly appropriate, as really needing help getting packages is sort of greater in a "need hierarchy" than coding help, as the latter is largely useless without the former. At one point, I seem to recall discussion of whether there was use for a sort of "beginners" R help and a more "advanced" one (part of the discussion was that R-devel is kind of advanced), but perhaps there is implicit sorting, by virtue of subjects, yes, even one as simple as "Help" as that allows seasoned veterans to distinguish a new asker from someone who has been through the ropes here many times. Cheers, Josh On Thu, Jul 4, 2013 at 2:59 PM, Rolf Turner wrote: > > Please (for crying out loud!) use a meaningful subject line!!! This > list is called "r-help"!!! Using a subject line of "Help" is completely > vacuous. > > cheers, > > Rolf Turner > > > On 04/07/13 20:33, Aboagye-sarfo, Patrick wrote: >> >> I am trying to download packages and the message I get is as follows >> > > > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] recode: how to avoid nested ifelse
I still argue for na.rm=FALSE, but that is cute, and also substantially faster f1 <- function(x1, x2, x3) do.call(paste0, list(x1, x2, x3)) f2 <- function(x1, x2, x3) pmax(3*x1, 2*x2, x3, 0, na.rm=FALSE) f3 <- function(x1, x2, x3) Reduce(`+`, list(x1, x2, x3)) f4 <- function(x1, x2, x3) rowSums(cbind(x1, x2, x3)) es <- rep(c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA), 1000) hs <- rep(c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA), 1000) cg <- rep(c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA), 1000) system.time(replicate(1000, f1(cg, hs, es))) system.time(replicate(1000, f2(cg, hs, es))) system.time(replicate(1000, f3(cg, hs, es))) system.time(replicate(1000, f4(cg, hs, es))) > system.time(replicate(1000, f1(cg, hs, es))) user system elapsed 22.73 0.03 22.76 > system.time(replicate(1000, f2(cg, hs, es))) user system elapsed 0.92 0.04 0.95 > system.time(replicate(1000, f3(cg, hs, es))) user system elapsed 0.19 0.02 0.20 > system.time(replicate(1000, f4(cg, hs, es))) user system elapsed 0.95 0.03 0.98 R version 3.0.0 (2013-04-03) Platform: x86_64-w64-mingw32/x64 (64-bit) On Fri, Jun 7, 2013 at 7:25 PM, Neal Fultz wrote: > I would do this to get the highest non-missing level: > > x <- pmax(3*cg, 2*hs, es, 0, na.rm=TRUE) > > rock chalk... > > -nfultz > > On Fri, Jun 07, 2013 at 06:24:50PM -0700, Joshua Wiley wrote: >> Hi Paul, >> >> Unless you have truly offended the data generating oracle*, the >> pattern: NA, 1, NA, should be a data entry error --- graduating HS >> implies graduating ES, no? I would argue fringe cases like that >> should be corrected in the data, not through coding work-arounds. >> Then you can just do: >> >> x <- do.call(paste0, list(es, hs, cg)) >> >> > table(factor(x, levels = c("000", "100", "110", "111"), labels = c("none", >> > "es","hs", "cg"))) >> none es hs cg >> 4 1 1 2 >> >> Cheers, >> >> Josh >> >> *Drawn from comments by Judea Pearl at one lively session. 
>> >> >> On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson wrote: >> > In our Summer Stats Institute, I was asked a question that amounts to >> > reversing the effect of the contrasts function (reconstruct an ordinal >> > predictor from a set of binary columns). The best I could think of was to >> > link together several ifelse functions, and I don't think I want to do this >> > if the example became any more complicated. >> > >> > I'm unable to remember a less error-prone method :). But I expect you >> > might. >> > >> > Here's my working example code >> > >> > ## Paul Johnson >> > ## 2013-06-07 >> > >> > ## We need to create an ordinal factor from these indicators >> > ## completed elementary school >> > es <- c(0, 0, 1, 0, 1, 0, 1, 1) >> > ## completed high school >> > hs <- c(0, 0, 1, 0, 1, 0, 1, 0) >> > ## completed college >> > cg <- c(0, 0, 0, 0, 1, 0, 1, 0) >> > >> > ed <- ifelse(cg == 1, 3, >> > ifelse(hs == 1, 2, >> > ifelse(es == 1, 1, 0))) >> > >> > edf <- factor(ed, levels = 0:3, labels = c("none", "es", "hs", "cg")) >> > data.frame(es, hs, cg, ed, edf) >> > >> > ## Looks OK, but what if there are missings? >> > es <- c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA) >> > hs <- c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA) >> > cg <- c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA) >> > ed <- ifelse(cg == 1, 3, >> > ifelse(hs == 1, 2, >> > ifelse(es == 1, 1, 0))) >> > cbind(es, hs, cg, ed) >> > >> > ## That's bad, ifelse returns NA too frequently. >> > ## Revise (becoming tedious!) >> > >> > ed <- ifelse(!is.na(cg) & cg == 1, 3, >> > ifelse(!is.na(hs) & hs == 1, 2, >> > ifelse(!is.na(es) & es == 1, 1, >> > ifelse(is.na(es), NA, 0)))) >> > cbind(es, hs, cg, ed) >> > >> > >> > ## Does the project director want us to worry about >> > ## logical inconsistencies, such as es = 0 but cg = 1? >> > ## I hope not. >> > >> > Thanks in advance, I hope you are having a nice summer. >> > >> > pj >> > >> > -- >> > Paul E. Johnson >> > Professor, Political Science Assoc. Director >>
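To make the na.rm disagreement in this thread concrete, here is a small sketch of how the two settings treat a case with missing indicators (the three-element vectors are invented for illustration):

```r
es <- c(0, 1, NA)
hs <- c(0, 1, 1)
cg <- c(0, 1, NA)
## na.rm = TRUE codes the third case from whatever is non-missing
pmax(3 * cg, 2 * hs, es, 0, na.rm = TRUE)   ## 0 3 2
## na.rm = FALSE propagates the missingness instead
pmax(3 * cg, 2 * hs, es, 0, na.rm = FALSE)  ## 0 3 NA
```

Which behavior is right depends on whether a partially missing record should still be scored from its observed indicators.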
Re: [R] recode: how to avoid nested ifelse
Hi Paul, Unless you have truly offended the data generating oracle*, the pattern: NA, 1, NA, should be a data entry error --- graduating HS implies graduating ES, no? I would argue fringe cases like that should be corrected in the data, not through coding work-arounds. Then you can just do: x <- do.call(paste0, list(es, hs, cg)) > table(factor(x, levels = c("000", "100", "110", "111"), labels = c("none", > "es","hs", "cg"))) none es hs cg 4 1 1 2 Cheers, Josh *Drawn from comments by Judea Pearl at one lively session. On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson wrote: > In our Summer Stats Institute, I was asked a question that amounts to > reversing the effect of the contrasts function (reconstruct an ordinal > predictor from a set of binary columns). The best I could think of was to > link together several ifelse functions, and I don't think I want to do this > if the example became any more complicated. > > I'm unable to remember a less error-prone method :). But I expect you might. > > Here's my working example code > > ## Paul Johnson > ## 2013-06-07 > > ## We need to create an ordinal factor from these indicators > ## completed elementary school > es <- c(0, 0, 1, 0, 1, 0, 1, 1) > ## completed high school > hs <- c(0, 0, 1, 0, 1, 0, 1, 0) > ## completed college > cg <- c(0, 0, 0, 0, 1, 0, 1, 0) > > ed <- ifelse(cg == 1, 3, > ifelse(hs == 1, 2, > ifelse(es == 1, 1, 0))) > > edf <- factor(ed, levels = 0:3, labels = c("none", "es", "hs", "cg")) > data.frame(es, hs, cg, ed, edf) > > ## Looks OK, but what if there are missings? > es <- c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA) > hs <- c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA) > cg <- c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA) > ed <- ifelse(cg == 1, 3, > ifelse(hs == 1, 2, > ifelse(es == 1, 1, 0))) > cbind(es, hs, cg, ed) > > ## That's bad, ifelse returns NA too frequently. > ## Revise (becoming tedious!) 
> > ed <- ifelse(!is.na(cg) & cg == 1, 3, > ifelse(!is.na(hs) & hs == 1, 2, > ifelse(!is.na(es) & es == 1, 1, > ifelse(is.na(es), NA, 0)))) > cbind(es, hs, cg, ed) > > > ## Does the project director want us to worry about > ## logical inconsistencies, such as es = 0 but cg = 1? > ## I hope not. > > Thanks in advance, I hope you are having a nice summer. > > pj > > -- > Paul E. Johnson > Professor, Political Science Assoc. Director > 1541 Lilac Lane, Room 504 Center for Research Methods > University of Kansas University of Kansas > http://pj.freefaculty.org http://quant.ku.edu > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] generate simple function with pre-defined constants
On Thu, Jun 6, 2013 at 9:05 AM, William Dunlap wrote: > > I said the force was 'required' in the sense that without it > the function will fail to do what you want in some situations. With the Force on your side, functions always do what you want. > > Bill Dunlap > Spotfire, TIBCO Software > wdunlap tibco.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Bootstrapped 1-sided confidence intervals
Hi Janh, I do not believe that a "one sided" or "two sided" bootstrap makes any sense. It is just a resampling procedure that constructs an empirical distribution. If you wish to examine the point where 5% of the distribution falls in one tail instead of the ends of both tails being 5%, you could simply look at the 90% CI, which will have 5% below and 5% above. Cheers, Josh On Tue, May 7, 2013 at 8:21 PM, Janh Anni wrote: > Hello All, > > Does anyone know if there's a function for computing 1-sided confidence > intervals for bootstrapped statistics (mean, median, percentiles, > etc.)? Thanks > in advance > > Janh > > [[alternative HTML version deleted]] > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
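As a hedged sketch of the 90% trick with the boot package (invented normal data; the statistic here is the mean):

```r
library(boot)  # ships with R as a recommended package
set.seed(1)
x <- rnorm(50, mean = 10)
b <- boot(x, function(d, i) mean(d[i]), R = 1999)
## a 90% two-sided percentile interval leaves 5% in each tail
ci <- boot.ci(b, conf = 0.90, type = "perc")
## ci$percent[4] and ci$percent[5] are the endpoints; keeping only the
## lower one gives a 95% one-sided lower confidence bound for the mean
lower95 <- ci$percent[4]
```

The same trick works for the upper endpoint, or with the other boot.ci interval types.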
Re: [R] Mixed Modeling in lme4
Hi Indrajit, In your first SAS code, change to type=un. cs imposes the (somewhat dubious) assumption that the variances of the intercept and slope are equal. If you are using lme4, all random effects in a single block (e.g., in (1 + month | batch), the 1 = random intercept and month = random slope) will have an unstructured (or freely estimated) variance-covariance matrix. Cheers, Josh On Mon, Apr 29, 2013 at 11:26 PM, Indrajit Sengupta < indrajitsg2...@gmail.com> wrote: > Hi All, > > I am trying to shift from running mixed models in SAS using PROC MIXED > to using lme4 package in R. In trying to match the coefficients of R > output to that of SAS output, I came across this problem. > > The dataset I am using is this one: > > > http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_mixed_sect034.htm > > If I run the following code: > > proc mixed data=rc method=ML covtest; > class Batch; > model Y = Month / s; > random Int Month / type=cs sub=Batch s; > run; > > The Fixed effect coefficients match with that of R. But the random > effect does not. Here is the R code: > > rc <- read.table('rc.csv', sep = ',', header=T, na.strings=".") > > m1 <- lmer(formula = Y ~ Month + (Month|Batch), data = rc, REML = F) > > summary(m1) > > fixef(m1) > > ranef(m1) > > But if I change the SAS code as follows: > > proc mixed data=rc method=ML covtest; > class Batch; > model Y = Month / s; > random Int / type=cs sub=Batch s; > run; > > and the R code as follows: > > m2 <- lmer(formula = Y ~ Month + (1|Batch), data = rc, REML = F) > > summary(m2) > > fixef(m2) > > ranef(m2) > > both fixed and random effect coefficients match. I am unable to > understand this discrepancy. Am I wrongly specifying the model in the > first case? > > It would be helpful if someone can throw some light on this. 
> > Regards, > Indrajit > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Polynomial Regression and NA coefficients in R
Hi Lucas, You may find some of these examples useful (towards the end): http://elkhartgroup.com/rmodels.php For example in your case you could be using b splines instead of an 11th order polynomial, or use thin plate regression splines from the mgcv package. I will also humbly suggest that ggplot2 overlaying observed values with predicted lines is a more elegant way to visualize the data and the results. Cheers, Josh On Sat, Apr 27, 2013 at 8:48 AM, Lucas Holland wrote: > Hey all, > > I'm performing polynomial regression. I'm simulating x values using > runif() and y values using a deterministic function of x and rnorm(). > > When I perform polynomial regression like this: > > fit_poly <- lm(y ~ poly(x,11,raw = TRUE)) > > I get some NA coefficients. I think this is due to the high correlation > between say x and x^2 if x is distributed uniformly on the unit interval > (as is the case in my example). However, I'm still able to plot a > polynomial fit like this: > > points(x, predict(fit_poly), type="l", col="green", lwd=2) > > What I'm interested in finding out is, how R handles the NA values I get > for some coefficients (and how that affects the polynomial I see plotted). > > Thanks! > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
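As a hedged sketch of the spline alternative suggested above (simulated data, not the poster's), using the splines package that ships with R:

```r
library(splines)
set.seed(1)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.2)

## raw 11th-degree polynomial: numerically ill-conditioned on [0, 1],
## so lm's pivoting may drop terms and report them as NA coefficients
fit_poly <- lm(y ~ poly(x, 11, raw = TRUE))

## cubic B-spline basis: well conditioned, no dropped terms
fit_bs <- lm(y ~ bs(x, df = 6))
anyNA(coef(fit_bs))  ## FALSE
```

With `poly(x, k)` left at its default (orthogonal polynomials) the conditioning problem also largely disappears; it is the `raw = TRUE` basis that is fragile.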
Re: [R] double exponential regression R
Hi CR, Assuming your data are stored in "d", perhaps something like this: m <- nls(log(proc) ~ k + a*(b^cls), start = list(a = 2, b = .25, k = 0), data = d) summary(m) # gives ## Formula: log(proc) ~ k + a * (b^cls) ## Parameters: ## Estimate Std. Error t value Pr(>|t|) ## a 5.24617 0.50905 10.306 1.90e-09 *** ## b 0.69640 0.07718 9.023 1.73e-08 *** ## k 1.93055 0.42331 4.561 0.00019 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## Residual standard error: 0.6967 on 20 degrees of freedom ## Number of iterations to convergence: 8 ## Achieved convergence tolerance: 4.268e-06 and you can examine the fit: plot(fitted(m), log(d$proc)) which looks not too bad (see attached) Cheers, Joshua On Sun, Apr 21, 2013 at 12:34 PM, catalin roibu wrote: > Hello all! > I have a problem with a double exponential equation. > These are my data: > structure(list(proc = c(1870.52067384719, > 766.789388745793, 358.701545859122, 237.113777545511, 43.2726259059654, > 148.985133316262, 92.6242882655781, 88.4521557193262, 56.6404686159112, > 27.0374477259404, 34.3347291080268, 18.3226992991316, 15.2196612445747, > 5.31600719692165, 16.7015717397302, 16.3923389973684, 24.2702542054496, > 21.247247993673, 18.3070717608672, 2.8811892177331, 3.18018869564679, > 8.74204132937479, 7.11596966047229 ), cls = c(0.25, 0.5, 0.75, 1, 1, 1.5, > 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10)), > .Names = c("proc", "cls"), row.names = c("0.25", "0.5", "0.75", "1", "11", > "1.5", "2", "2.5", "3", "3.5", "4", "4.5", "5", "5.5", "6", "6.5", "7", > "7.5", "8", "8.5", "9", "9.5", "10"), class = "data.frame") > I want to compute a double exponential equation like this: > proc=a*exp(b*cls)+c*exp(d*cls) > > Thank you very much for your help! > > Best regards! > > CR > -- > --- > Catalin-Constantin ROIBU > Lecturer PhD, Forestry engineer > Forestry Faculty of Suceava > Str. Universitatii no. 13, Suceava, 720229, Romania > office phone +4 0230 52 29 78, ext. 
531 > mobile phone +4 0745 53 18 01 > +4 0766 71 76 58 > FAX: +4 0230 52 16 64 > silvic.usv.ro > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Pls help to prevent my post from being indexed on google
ec>0]) > DiffVec<-w5[,p]-w1[,p]; FiveMinusOne[p]<-length(DiffVec[DiffVec>0]) > DiffVec<-w10[,p]-w5[,p]; TenMinusFive[p]<-length(DiffVec[DiffVec>0])} > diff<-cbind(FiveMinusOne,TenMinusOne); diff<-cbind(diff, > TenMinusFive); sn<-seq(1, length(lamdaseq)); f2<-cbind(sn, diff); f2 > ##END > [[alternative HTML version deleted]] > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Full Documentation of R analysis
Hi Will, I think that save.image() is the way to go. Regarding the directory, it's possible to pass arguments to an R script from the command line, or, as long as the R script is in the directory you want the image saved in, just save it to the current working directory. Cheers, Josh On Thu, Apr 4, 2013 at 7:18 AM, wrote: > >Dear all, > >I would like to fully document an analysis in R including script, output > and >workspace data. To do this I currently run under Linux > >$ R CMD BATCH myscript.R > >This command combines and saves only the script commands and the output in >./myscript.Rout. To save the workspace, I have to explicitly specify the >path in the R file and apply save.image(...). > >How can I save the full workspace with the same naming scheme under the > same >path, e.g. ./myscript.RData similar to ./myscript.Rout? > >Note: R CMD BATCH myscript.R --save does not work. > >Best regards and thanks in advance, > >Will > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
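One hedged sketch of the "same naming scheme" idea: derive the .RData name from the running script's own name. This assumes the script is run via Rscript (which exposes the script path as a --file= argument); R CMD BATCH does not expose the script name the same way, which is part of the original problem.

```r
## Name the saved workspace after the running script (Rscript only).
args   <- commandArgs(trailingOnly = FALSE)
script <- sub("^--file=", "", grep("^--file=", args, value = TRUE))
if (length(script) == 1) {
  ## e.g. myscript.R -> myscript.RData, in the current working directory
  save.image(file = sub("\\.R$", ".RData", basename(script)))
}
```

Run interactively or under R CMD BATCH, `script` is empty and nothing is saved, so the snippet is safe to leave at the end of a script.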
Re: [R] GAM model with interactions between continuous variables and factors
Yep that's exactly right! :) On Mon, Mar 25, 2013 at 6:22 PM, Antonio P. Ramos wrote: > Just to clarify: I should include wealth - the categorical variable - as a > fixed effect *and* within the smooth using the argument "by". Is that > correct? thanks a bunch > > > On Mon, Mar 25, 2013 at 6:18 PM, Joshua Wiley > wrote: >> >> Hi Antonio, >> >> If wealth is a factor variable, you should include the main effect in >> the model, as the smooths will be centered. >> >> Cheers, >> >> Josh >> >> >> >> On Mon, Mar 25, 2013 at 6:09 PM, Antonio P. Ramos >> wrote: >> > Hi all, >> > >> > I am not sure how to handle interactions with categorical predictors in >> > the >> > GAM models. For example, what is the difference between the two models >> > below? Tests are indicating that they are different but their >> > predictions >> > are essentially the same. >> > >> > Thanks a bunch, >> > >> >> gam.1 <- gam(mortality.under.2~ maternal_age_c+ I(maternal_age_c^2)+ >> > +s(birth_year,by=wealth) + >> > ++ wealth + sex + >> > +residence+ maternal_educ + birth_order, >> > + ,data=rwanda2,family="binomial") >> >> >> >> gam.2 <- gam(mortality.under.2~ maternal_age_c+ I(maternal_age_c^2)+ >> > +s(birth_year,by=wealth) + >> > + + sex + >> > +residence+ maternal_educ + birth_order, >> > + ,data=rwanda2,family="binomial") >> >> >> >> anova(gam.1,gam.2,test="Chi") >> > Analysis of Deviance Table >> > >> > Model 1: mortality.under.2 ~ maternal_age_c + I(maternal_age_c^2) + >> > s(birth_year, >> > by = wealth) + +wealth + sex + residence + maternal_educ + >> > birth_order >> > Model 2: mortality.under.2 ~ maternal_age_c + I(maternal_age_c^2) + >> > s(birth_year, >> > by = wealth) + +sex + residence + maternal_educ + birth_order >> > Resid. Df Resid. Dev Df Deviance Pr(>Chi) >> > 1 28986 24175 >> > 2 28989 24196 -3.6952 -21.378 0.0001938 *** >> > --- >> > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 >> >> str(rwanda2) >> > 'data.frame': 29027 obs. 
of 18 variables: >> > $ CASEID: Factor w/ 10718 levels "1 5 2",..: 289 >> > 2243 7475 9982 6689 10137 7426 428 8415 10426 ... >> > $ mortality.under.2 : int 0 1 0 0 0 0 0 0 1 0 ... >> > $ maternal_age_disct: Factor w/ 3 levels "-25","+35","25-35": 1 1 1 1 1 >> > 1 >> > 3 1 3 1 ... >> > $ maternal_age : int 18 21 21 23 21 22 26 18 27 21 ... >> > $ time : int 3 3 3 3 3 3 3 3 3 3 ... >> > $ child_mortality : num 0.232 0.232 0.232 0.232 0.232 ... >> > $ democracy : Factor w/ 1 level "dictatorship": 1 1 1 1 1 1 1 1 >> > 1 >> > 1 ... >> > $ wealth: Factor w/ 5 levels "Lowest quintile",..: 2 4 1 4 >> > 5 1 >> > 4 1 4 5 ... >> > $ birth_year: int 1970 1970 1970 1970 1970 1970 1970 1970 1970 >> > 1970 ... >> > $ residence : Factor w/ 2 levels "Rural","Urban": 1 1 1 1 2 1 1 >> > 1 >> > 1 2 ... >> > $ birth_order : int 1 2 2 5 1 1 3 1 2 2 ... >> > $ maternal_educ : Factor w/ 4 levels "Higher","No education",..: 3 >> > 2 2 >> > 3 4 2 3 2 2 2 ... >> > $ sex : Factor w/ 2 levels "Female","Male": 1 1 2 2 1 1 2 >> > 2 >> > 2 2 ... >> > $ quinquennium : Factor w/ 7 levels "00-5's","70-4",..: 2 2 2 2 2 >> > 2 2 >> > 2 2 2 ... >> > $ time.1: int 3 3 3 3 3 3 3 3 3 3 ... >> > $ new_time : int 0 0 0 0 0 0 0 0 0 0 ... >> > $ maternal_age_c: num -6.12 -3.12 -3.12 -1.12 -3.12 ... >> > $ birth_year_c : num -14.8 -14.8 -14.8 -14.8 -14.8 ... >> > >> > [[alternative HTML version deleted]] >> > >> > >> > __ >> > R-help@r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. >> > >> >> >> >> -- >> Joshua Wiley >> Ph.D. Student, Health Psychology >> University of California, Los Angeles >> http://joshuawiley.com/ >> Senior Analyst - Elkhart Group Ltd. >> http://elkhartgroup.com > > -- Joshua Wiley Ph.D. 
Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] GAM model with interactions between continuous variables and factors
Hi Antonio, If wealth is a factor variable, you should include the main effect in the model, as the smooths will be centered. Cheers, Josh On Mon, Mar 25, 2013 at 6:09 PM, Antonio P. Ramos wrote: > Hi all, > > I am not sure how to handle interactions with categorical predictors in the > GAM models. For example, what is the difference between the two models > below? Tests are indicating that they are different but their predictions > are essentially the same. > > Thanks a bunch, > >> gam.1 <- gam(mortality.under.2~ maternal_age_c+ I(maternal_age_c^2)+ > +s(birth_year,by=wealth) + > ++ wealth + sex + > +residence+ maternal_educ + birth_order, > + ,data=rwanda2,family="binomial") >> >> gam.2 <- gam(mortality.under.2~ maternal_age_c+ I(maternal_age_c^2)+ > +s(birth_year,by=wealth) + > + + sex + > +residence+ maternal_educ + birth_order, > + ,data=rwanda2,family="binomial") >> >> anova(gam.1,gam.2,test="Chi") > Analysis of Deviance Table > > Model 1: mortality.under.2 ~ maternal_age_c + I(maternal_age_c^2) + > s(birth_year, > by = wealth) + +wealth + sex + residence + maternal_educ + > birth_order > Model 2: mortality.under.2 ~ maternal_age_c + I(maternal_age_c^2) + > s(birth_year, > by = wealth) + +sex + residence + maternal_educ + birth_order > Resid. Df Resid. Dev Df Deviance Pr(>Chi) > 1 28986 24175 > 2 28989 24196 -3.6952 -21.378 0.0001938 *** > --- > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 >> str(rwanda2) > 'data.frame': 29027 obs. of 18 variables: > $ CASEID: Factor w/ 10718 levels "1 5 2",..: 289 > 2243 7475 9982 6689 10137 7426 428 8415 10426 ... > $ mortality.under.2 : int 0 1 0 0 0 0 0 0 1 0 ... > $ maternal_age_disct: Factor w/ 3 levels "-25","+35","25-35": 1 1 1 1 1 1 > 3 1 3 1 ... > $ maternal_age : int 18 21 21 23 21 22 26 18 27 21 ... > $ time : int 3 3 3 3 3 3 3 3 3 3 ... > $ child_mortality : num 0.232 0.232 0.232 0.232 0.232 ... > $ democracy : Factor w/ 1 level "dictatorship": 1 1 1 1 1 1 1 1 1 > 1 ... 
> $ wealth: Factor w/ 5 levels "Lowest quintile",..: 2 4 1 4 5 1 > 4 1 4 5 ... > $ birth_year: int 1970 1970 1970 1970 1970 1970 1970 1970 1970 > 1970 ... > $ residence : Factor w/ 2 levels "Rural","Urban": 1 1 1 1 2 1 1 1 > 1 2 ... > $ birth_order : int 1 2 2 5 1 1 3 1 2 2 ... > $ maternal_educ : Factor w/ 4 levels "Higher","No education",..: 3 2 2 > 3 4 2 3 2 2 2 ... > $ sex : Factor w/ 2 levels "Female","Male": 1 1 2 2 1 1 2 2 > 2 2 ... > $ quinquennium : Factor w/ 7 levels "00-5's","70-4",..: 2 2 2 2 2 2 2 > 2 2 2 ... > $ time.1: int 3 3 3 3 3 3 3 3 3 3 ... > $ new_time : int 0 0 0 0 0 0 0 0 0 0 ... > $ maternal_age_c: num -6.12 -3.12 -3.12 -1.12 -3.12 ... > $ birth_year_c : num -14.8 -14.8 -14.8 -14.8 -14.8 ... > > [[alternative HTML version deleted]] > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] About name of list elements
Hi Max, This is known as partial matching. When using `$`, if R can uniquely match the element name based on what is typed, it returns that element. Thus, in your example, foo uniquely matches foobar; but if you had both foobar and foobox, $foo would no longer be a unique match and would return NULL. Cheers, Josh On Mon, Mar 25, 2013 at 12:21 PM, Andrew Lin wrote: > Hi folks, > > I am a starter for R. While I tried list as following: > >> l <- list() >> l$foo > NULL >> l$foobar <- 1 >> l$foo > [1] 1 > > Apparently, foo and foobar are different names for elements in list (actually > foo does not exist). But why are they sharing the same value? > > Thanks a lot! > > Max > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
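A short sketch of the behaviour described above, and how to avoid it with exact matching:

```r
l <- list()
l$foobar <- 1
l$foo          ## 1: "foo" uniquely matches the single name "foobar"
l$foobox <- 2
l$foo          ## NULL: "foo" now matches two names, so it is ambiguous
l[["foo"]]     ## NULL: [[ uses exact matching by default
```

Using `l[["foo"]]` (or names spelled out in full) in scripts avoids surprises from partial matching.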
Re: [R] identifying and drawing from T distribution
Hi Ivo, Try something like this: rt(1e5, df = 2.6, ncp = (1 - 0) * sqrt(2.6 + 1)/2) The NCP comes from the mean, N, and SD. See ?rt Cheers, Josh On Fri, Mar 15, 2013 at 6:58 PM, ivo welch wrote: > dear R experts: > > fitdistr suggests that a t with a mean of 1, an sd of 2, and 2.6 > degrees of freedom is a good fit for my data. > > now I want to draw random samples from this distribution. Should I > draw from a uniform distribution and use the distribution function > itself for the transform, or is there a better way to do this? there > is a non-centrality parameter ncp in rt, but one parameter ncp cannot > subsume two (m and s), of course. my first attempt was to draw > rt(..., df=2.63)*s+m, but this was obviously not it. > > advice appreciated. > > /iaw > > > Ivo Welch (ivo.we...@gmail.com) > http://www.ivo-welch.info/ > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
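For what it is worth, the noncentral t produced by ncp is a different object from a shifted and scaled t. If the fitdistr call used densfun = "t" (which, to my knowledge, fits a location-scale t with parameters m, s, and df), then draws matching those estimates come from m + s * rt(n, df); a hedged sketch:

```r
library(MASS)
set.seed(1)
x <- 1 + 2 * rt(1e4, df = 2.6)   # simulate a location-scale t: m = 1, s = 2
fit <- fitdistr(x, "t")          # estimates named m, s, and df
## draws from the fitted location-scale t
draws <- fit$estimate["m"] + fit$estimate["s"] * rt(1e4, df = fit$estimate["df"])
```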
Re: [R] filling an array
Hi Jannetta, Try this: rep(c(0, v), each = 10) See ?rep for details Cheers, Josh On Sat, Feb 23, 2013 at 7:56 PM, Jannetta Steyn wrote: > Hi All > > I'm just wondering if there is a quick way of filling an array with the > following. > > I want to declare array I with a specific length and then alternately > fill it with 10 zeros and 10 specified values: > > v<- 14 > I <- c(0,length(t)) > > But instead of just filling I with 0 I want 10 zeros and then 10 fourteens > (i.e. the value in v) > > Hope that makes sense. > > Regards > Jannetta > > -- > > === > Web site: http://www.jannetta.com > Email: janne...@henning.org > === > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
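Note that rep(c(0, v), each = 10) yields a single 20-element cycle (ten zeros, then ten fourteens). To fill a vector of some target length with that repeating pattern, an outer rep() with length.out is one option; N below is a hypothetical length:

```r
v <- 14
N <- 55  # hypothetical target length for the vector
I <- rep(rep(c(0, v), each = 10), length.out = N)  # pattern recycled, then truncated to N
```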
Re: [R] help with R CMD check --as-cran
Hi Tao, I wonder if you are on Windows? If so, make sure you have qpdf installed, and the location is in your path. Cheers, Josh On Fri, Feb 22, 2013 at 3:32 PM, Tao Wang wrote: > Hi Everyone! > > This is my first time using R-help. I am trying to do R CMD check before > uploading my package to CRAN. > > R CMD check --as-cran "my package folder". > > However, it spits out this warning: > > "qpdf is needed for checks on size reduction of PDFs" > > I searched online but found no clue to solve this problem. Can someone tell > me what could be wrong with my package? > > Thanks a lot! > > Tao > > > > > UT Southwestern Medical Center > The future of medicine, today. > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
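A small sketch of checking this from within R; the install path shown is a hypothetical example, not the real location on any given machine:

```r
Sys.which("qpdf")  # returns "" if qpdf is not visible on the PATH
## prepend a hypothetical qpdf install location for the current R session
Sys.setenv(PATH = paste("C:/qpdf/bin", Sys.getenv("PATH"), sep = ";"))
```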
Re: [R] foreach loop, stata equivalent
; > >> > reg `var' `y' >> > >> > } >> > >> > } >> > >> > >> > >> > It's looping p1-p15, p1-p16, p1-p269, p2-p15, p2-p16,... p2-p269,... >> > variable pairs. >> > >> > >> > >> > How can I write something similar in R? >> > >> > I 'tried' understanding the package.foreach but can't get it to work. >> You do not need package foreach, which is intended at a completely different >> problem. >> >> R does not really have the syntactic equivalent of "varlist", but you can >> easily do something like: >> for(var in paste0("p", 1:14)) { >> for(y in paste0("p", 15:269)) >> lm(yourData[[var]] ~ yourData[[y]]) } >> >> provided that yourData is the data frame in which the p* variables are >> stored. >> >> There are probably more direct ways of doing the same thing and storing the >> resulting lm objects in a list, but you did not state what you intend to do >> with this enormous set of regressions... >> >> >> Regards >> >> > Thanks for any help >> > >> > Nelissa >> > >> > >> > [[alternative HTML version deleted]] >> > >> > __ >> > R-help@r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. >> > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
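Extending the suggestion quoted above, a sketch that also keeps the fitted models in a named list, assuming the p* variables are columns of a data frame yourData as in the thread:

```r
vars <- paste0("p", 1:14)        # predictors p1..p14
outcomes <- paste0("p", 15:269)  # outcomes p15..p269
fits <- list()
for (v in vars) {
  for (y in outcomes) {
    ## reformulate() builds the formula y ~ v from character names
    fits[[paste(y, v, sep = "~")]] <- lm(reformulate(v, response = y), data = yourData)
  }
}
```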
Re: [R] improving/speeding up a very large, slow simulation
n, project.sd.max, > length.out=length.out) # assume verification sd is the same as the project > sd to simplify - seems a reasonable simpification > > number.verification.plots<- seq(number.verification.plots.min, > number.verification.plots.max, length.out=length.out) > > verification.range <- seq(verification.mean.min, verification.mean.max, > length.out=length.out) > > permut.grid<-expand.grid(number.strata.range, project.n.range, > project.acreage.range, project.mean, project.sd.range, > number.verification.plots, verification.range, allowed.deviation) # create > a matrix with all combinations of the supplied vectors > > #assign names to the colums of the grid of combinations > names.permut<-c("number.strata", "project.n.plots", "project.acreage", > "project.mean", "project.sd", "number.verification.plots", > "verification.mean", "allowed.deviation") > > names(permut.grid)<-names.permut # done > > combinations<-length(permut.grid[,1]) > > size <-reps*combinations #need to know the size of the master matrix, which > is the the number of replications * each combination of the supplied factors > > # we want a df from which to read all the data into the simulation, and > record the results > permut.col<-ncol(permut.grid) > col.base<-ncol(permut.grid)+2 > base <- matrix(nrow=size, ncol=col.base) > base <-data.frame(base) > > # supply the names > names.base<-c("number.strata", "project.n.plots", "project.acreage", > "project.mean", "project.sd", "number.verification.plots", > "verification.mean", "allowed.deviation","success.fail", "plots.to.success") > > names(base)<-names.base > > # need to create index vectors for the base, master df > ends <- seq(reps+1, size+1, by=reps) > begins <- ends-reps > index <- cbind(begins, ends-1) > #done > > # next, need to assign the first 6 columns and number of rows = to the > number of reps in the simulation to be the given row in the permut.grid > matrix > > pb <- winProgressBar(title="Create base, big loop 1 of 2", label="0% 
done", > min=0, max=100, initial=0) > > for (i in 1:combinations) { > > base[index[i,1]:index[i,2],1:permut.col] <- permut.grid[i,] > #progress bar > info <- sprintf("%d%% done", round((i/combinations)*100)) > setWinProgressBar(pb, (i/combinations)*100, label=info) > } > > close(pb) > > # now, simply feed the values replicated the number of times we want to run > the simulation into the sequential.unpaired function, and assign the values > to the appropriate columns > > out.index1<-ncol(permut.grid)+1 > out.index2<-ncol(permut.grid)+2 > > #progress bar > pb <- winProgressBar(title="fill values, big loop 2 of 2", label="0% done", > min=0, max=100, initial=0) > > for (i in 1:size){ > > scalar.base <- base[i,] > verification.plots <- rnorm(scalar.base$number.verification.plots, > scalar.base$verification.mean, scalar.base$project.sd) > result<- sequential.unpaired(scalar.base$number.strata, > scalar.base$project.n.plots, scalar.base$project.mean, scalar.base$ > project.sd, verification.plots, scalar.base$allowed.deviation, > scalar.base$project.acreage, min.plots='default', alpha) > > base[i,out.index1] <- result[[6]][1] > base[i,out.index2] <- result[[7]][1] > info <- sprintf("%d%% done", round((i/size)*100)) > setWinProgressBar(pb, (i/size)*100, label=info) > } > > close(pb) > #return the results > return(base) > > } > > # I would like reps to = 1000, but that takes a really long time right now > test5 <- simulation.unpaired(reps=5, project.acreage.range = c(99, > 110,510,5100,11000), project.mean=100, project.n.min=10, project.n.max=100, > project.sd.min=.01, project.sd.max=.2, verification.mean.min=100, > verification.mean.max=130, number.verification.plots.min=10, > number.verification.plots.max=50, length.out = 5) > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, 
self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
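As a side note on speed, the first progress-bar loop in the quoted code (copying each permut.grid row into base reps times) can likely be collapsed into a single indexing operation, which is usually the first win when speeding up simulations like this; a hedged sketch using the names from the thread:

```r
## replicate every row of permut.grid 'reps' times in one call,
## matching the begins/ends grouping of the original loop
base[, seq_len(ncol(permut.grid))] <-
  permut.grid[rep(seq_len(nrow(permut.grid)), each = reps), ]
```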
Re: [R] Migrating R packages from svn/R-Forge to git/Github
Hi Michael, Just to add to what Yihui suggested, I have used Cygwin on all my Windows machines to create a Linux environment for years. It is stable and allows for many features. You can easily get git, svn, and ruby --- I'm not sure about rubygems. Anyway, with a bit of work, I suspect you could use Cygwin to get an environment on Windows that would allow you to run svn2git to handle the conversions. If you're willing to have a bit of a mess or drop some history, git svn will work nicely too. http://www.cygwin.com/ Cheers, Josh On Sun, Feb 10, 2013 at 10:22 AM, Yihui Xie wrote: > In the past Github allows one to import from an existing SVN > repository automatically via its web interface, but I just checked it > and it seems to have gone. What is left is: > https://help.github.com/articles/importing-from-subversion Perhaps it > is best for you to do the conversion under Linux and then work under > Windows. > > Regards, > Yihui > -- > Yihui Xie > Phone: 515-294-2465 Web: http://yihui.name > Department of Statistics, Iowa State University > 2215 Snedecor Hall, Ames, IA > > > On Sun, Feb 10, 2013 at 12:02 PM, Michael Friendly wrote: >> [I'm not sure if this post should go to R-devel, but thought I'd start here >> for wider readership.] >> >> I have a collection of R packages that I maintain using the eclipse/StatET >> IDE >> (mainly on Windows) with SVN on the R-Forge system, >> https://r-forge.r-project.org/users/friendly/ >> >> I'd like to consider moving these to git projects on GitHub, but I'm not >> sure how best to get started. I'm not sure if can can clone/copy >> a project from the R-Forge svn directly to github, or if I have to first >> create a local git repo from the R-Forge svn and then push to github >> from there. >> >> However, there are linux tools for this (git-svn and svn2git), but I haven't >> found equivalents >> for Windows. I suppose I could do this initial part on my linux server >> where I now use git and >> github for other projects. 
>> >> I'm also contemplating moving from eclipse/StatET to RStudio for >> development, but that would be >> down the road a bit. >> >> Has anyone made this transition or have suggestions for how to ease it? >> What are the >> gotchas to watch out for? >> >> -Michael >> >> >> >> -- Michael Friendly >> Email: friendly AT yorku DOT ca >> Professor, Psychology Dept. & Chair, Quantitative Methods York University >> Voice: 416 736-2100 x66249 >> Web: http://www.datavis.ca >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
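For reference, a hedged sketch of the git-svn route mentioned above; PKGNAME and USER are placeholders, and the --stdlayout flag assumes a standard trunk/branches/tags layout (R-Forge projects often keep packages under /pkg instead, in which case point git svn at that subdirectory and drop the flag):

```sh
# one-shot conversion preserving history (under the layout assumption above)
git svn clone --stdlayout svn://svn.r-forge.r-project.org/svnroot/PKGNAME PKGNAME-git
cd PKGNAME-git
git remote add origin git@github.com:USER/PKGNAME.git
git push -u origin master
```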
Re: [R] decimal places in R2HTML
Realized I did not reply to this list. On Sat, Jan 26, 2013 at 7:54 PM, Joshua Wiley wrote: > Hi Erin, > > Most packages creating output for inclusion in pages, reports, books, > etc. do some rounding as R's default level of printed precision tends > to be overkill. R2HTML is no different, but you can control this. To > see what it currently is: > > getOption("R2HTML.format.digits") > > and you can set it via: > > options("R2HTML.format.digits" = 4) > > > after which, I get: > >> HTML(confint(aov(Sepal.Length ~ Species, iris)), file=stdout()) > > > class=captiondataframe> > > >2.5 %97.5 % > > (Intercept) > 4.8621 > 5.1499 > > Speciesversicolor > 0.7265 > 1.1335 > > Speciesvirginica > 1.3785 > 1.7855 > > > > > > > Cheers, > > Josh > > > On Sat, Jan 26, 2013 at 7:36 PM, Erin Hodgess wrote: >> Dear R People: >> >> I have an AOV model that I get confidence intervals from via >> >>> confint(chick1.aov1) >> 2.5 % 97.5 % >> trtA 1.472085 1.607915 >> trtB 1.512085 1.647915 >> trtC 1.328751 1.464582 >>> >> >> I am using R2HTML to produce HTML output. However, the HTML code >> itself just has rounded values, i.e., 1.5 and 1.6. >> >> Has anyone run across this, please? >> Any suggestions would be much appreciated. >> Sincerely, >> Erin >> >> >> -- >> Erin Hodgess >> Associate Professor >> Department of Computer and Mathematical Sciences >> University of Houston - Downtown >> mailto: erinm.hodg...@gmail.com >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > > > -- > Joshua Wiley > Ph.D. Student, Health Psychology > Programmer Analyst II, Statistical Consulting Group > University of California, Los Angeles > https://joshuawiley.com/ -- Joshua Wiley Ph.D. 
Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SEM validation: Cross-Validation vs. Bootstrapping
r life may become quite painful. You will then need to combine the results from multiply imputed datasets as well as from the bootstrap. I know of no canned tools for this so you'll be doing a lot on your own. I would suggest instead that you use maximum likelihood estimation where the likelihood function allows missingness (sometimes called full information maximum likelihood). Both the lavaan and OpenMx packages can handle this. If you have additional variables you think may predict missingness, you should condition on those by adding them into the model as auxiliary variables. This complicates the assessment of fit somewhat, because the auxiliary variables contribute to the fit but you do not care about their contribution, so it has to be parceled out; still, it is a good way to deal with missingness. Craig Enders has a relatively nice book on missing data approaches that includes SEM. If you stay in Mplus, it can natively bootstrap most things; if it cannot, I have written R functions to help it out. I have been writing a mediation tutorial that is available here: http://joshuawiley.com/statistics/medanal.html You can find the source code for that page here: https://github.com/jwiley as well as the source for the semutils package which contains many of the interface functions to link R to Mplus. I should point out that semutils is not on CRAN and will not be, as I have teamed up with Michael Hallquist who wrote the MplusAutomation package and will be folding the additional functionality of semutils into MplusAutomation. However, that may take some time, so in the meantime, source is available from github. Cheers, Josh > > Thanks, > > Paul > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. 
Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
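A minimal lavaan sketch of the FIML approach described in the reply above; the one-factor model and the data frame dat are placeholders, not anything from the thread:

```r
library(lavaan)
## hypothetical one-factor model; dat is a placeholder data frame with x1-x4
model <- ' f =~ x1 + x2 + x3 + x4 '
fit <- cfa(model, data = dat, missing = "fiml")  # ML with missingness allowed (FIML)
summary(fit, fit.measures = TRUE)
```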
Re: [R] change lm log(x) to glm poisson
Hi Elaine, If you want identical models, you need to use the same family and then the formula is the same. Here is an example with a built-in dataset: ## these two are identical > coef(lm(mpg ~ hp + log(wt), data = mtcars)) (Intercept) hp log(wt) 38.86095585 -0.02808968 -13.06001270 > coef(glm(mpg ~ hp + log(wt), data = mtcars, family = gaussian)) (Intercept) hp log(wt) 38.86095585 -0.02808968 -13.06001270 ## not identical > coef(glm(mpg ~ hp + wt, data = mtcars, family = gaussian(link = "log"))) (Intercept) hp wt 3.88335638 -0.00173717 -0.20851238 I show the log link because the poisson family defaults to a log link, but that is equivalent to: log(E(y)) = Xb where X is your design matrix (intercept, A, B, log(C), log(D) for you). In short, the link function operates on the outcome, not the predictors, so even though the poisson family includes a log link, it will not yield the same results as a log transformation of two of your predictors. I do not have any online references off the top of my head, but it seems like you may be well served by reading a bit about generalized linear models and the concept of link functions. Cheers, Josh On Sun, Oct 28, 2012 at 8:01 PM, Elaine Kuo wrote: > > Hello list, > > I am running a regression using > > lm(Y~A+B+log(C)+log(D)) > > > Now, I would like to test if glm can produce similar results. > So the code was revised as > > glm(Y~A+B+C+D, family=poisson) (code 1) > > > However, I found some example using glm for lm. > It suggests that the code should be revised like > glm(Y~A+B+log(C)+log(D), family=poisson) (code 2) > > Please kindly advise which code is correct. > Thank you. > > Elaine > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. 
Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
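To make the distinction concrete with the same built-in data (carb used here purely as a toy count outcome): the log link acts on E(y), while log() on a predictor transforms a column of the design matrix, so they are independent choices, and code 2 in the question is the closer analogue of the original lm:

```r
## log link transforms E(carb); log(wt) transforms a predictor column
coef(glm(carb ~ hp + log(wt), data = mtcars, family = poisson))  # keeps the predictor log
coef(glm(carb ~ hp + wt,      data = mtcars, family = poisson))  # drops it: a different model
```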
Re: [R] Hausman test in R
Hi, I can think of no reason a Hausman test could not be used for OLS---it is a comparison of vectors of coefficients from different models usually assumed to produce similar estimates under certain conditions. Dissimilarity is taken as indicative of a lack of some or all the conditions required for the two models to yield similar parameters. I suggest you look at the plm and systemfit packages. They have many functions for OLS, 2SLS, tests of endogeneity, etc. The plm (and maybe systemfit?) package also has a vignette which is a good thing to read. It has a lot of useful information on the code and examples of comparing different types of models, that you may find instructive. Hope this helps, Josh On Sun, Oct 28, 2012 at 1:33 PM, fxen3k wrote: > Hi there, > > I am really new to statistics in R and statistics itself as well. > My situation: I ran a lot of OLS regressions with different independent > variables. (using the lm() function). > After having done that, I know there is endogeneity due to omitted > variables. (or perhaps due to any other reasons). > And here comes the Hausman test. I know this test is used to identify > endogeneity. > But what I am not sure about is: "Can I use the Hausman test in a simple > OLS > regression or is it only possible in a 2SLS regression model?" "And if it > is > possible to use it, how can I do it?" > > Info about the data: > > data = lots of data :) > > x1 <- data$x1 > x2 <- data$x2 > x3 <- data$x3 > x4 <- data$x4 > y1 <- data$y1 > > reg1 <- summary(lm(y1 ~ x1 + x2 + x3 + x4)) > > Thanks in advance for any support! > > > > -- > View this message in context: > http://r.789695.n4.nabble.com/Hausman-test-in-R-tp4647716.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
> -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
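For illustration, a sketch of the classical panel-data use of the Hausman test with plm's built-in Grunfeld data, comparing fixed- and random-effects coefficient vectors as described above:

```r
library(plm)
data("Grunfeld", package = "plm")
fe <- plm(inv ~ value + capital, data = Grunfeld, model = "within")  # fixed effects
re <- plm(inv ~ value + capital, data = Grunfeld, model = "random")  # random effects
phtest(fe, re)  # Hausman test: dissimilar coefficients suggest RE assumptions fail
```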
Re: [R] Poisson Regression: questions about tests of assumptions
Hi Eiko, This is not a direct response to your question, but I thought you might find these pages helpful: On negative binomial: http://www.ats.ucla.edu/stat/R/dae/nbreg.htm and zero inflated poisson: http://www.ats.ucla.edu/stat/R/dae/zipoisson.htm In general this page lists a variety of different models with examples: http://www.ats.ucla.edu/stat/dae/ I wrote the code for most of those pages. Some of your questions seem a little more specific than would be addressed on those general introductory pages, but if there are particular things you would like to see done, I am open to hearing about it. One thing I do want to mention is that for publication or presentation purposes, graphs of predicted counts can be quite nice for readers---I tried to show how to do that in all of those pages, and you can even use bootstrapping (which I gave examples for at least in the zero inflated models) to get robust confidence intervals. Cheers, Josh On Sun, Oct 14, 2012 at 3:14 PM, Eiko Fried wrote: > Thank you for the detailed answer, that was really helpful. > I did some extensive reading and calculating in the last hours since your > reply, and have a few (hopefully much more informed) follow up questions. > > > 1) In the vignette("countreg", package = "pscl"), LLH, AIC and BIC values > are listed for the models Negative-binomial (NB), Zero-Inflated (ZI), ZI > NB, Hurdle NB, and Poisson (Standard). And although I found a way to > determine LLH and DF for all models, BIC & AIC values are not displayed by > default, neither using the code given in the vignette. How do I calculate > these? (AIC is given as per default only in 2 models, BIC in none). > > > 2) For the zero-inflated models, the first block of count model > coefficients is only in the output in order to compare the changes, > correct? That is, in my results in a paper I would only report the second > block of (zero-inflation) model coefficients? Or do I misunderstand > something? 
I am confused because in their large table to compare > coefficients, they primarily compare the first block of coefficients (p. 18) > R> fm <- list("ML-Pois" = fm_pois, "Quasi-Pois" = fm_qpois, "NB" = fm_nbin, > + "Hurdle-NB" = fm_hurdle, "ZINB" = fm_zinb) > R> sapply(fm, function(x) coef(x)[1:8]) > > > 3) > >> There are various formal tests for this, e.g., dispersiontest() in package >> "AER". >> > > I have to run 9 models - I am testing the influence of several predictors > on different individual symptoms of a mental disorder, as "counted" in the > last week (0=never in the last week, to 3 = on all day within the last > week). > So I'm regressing the same predictors onto 9 different symptoms in 9 > models. > > Dispersiontest() for these 9 models result in 3-4 overdispersed models > (depending if testing one- or two-sided on p=.05 level), 2 underdispersed > models, and 4 non-significant models. The by far largest dispersion value > is 2.1 in a model is not overdispersed according to the test, but that's > the symptom with 80% zeros, maybe that plays a role here. > > Does BN also make sense in underdispersed models? > > However, overdispersion can already matter before this is detected by a >> significance test. Hence, if in doubt, I would simply use an NB model and >> you're on the safe side. And if the NB's estimated theta parameter turns >> out to be extremely large (say beyond 20 or 30), then you can still switch >> back to Poisson if you want. > > > Out of the 9 models, the 3 overdispersed NB models "worked" with theta > estimates between 0.4 and 8.6. The results look just fine, so I guess NB is > appropriate here. > > 4 other models (including the 2 underdispersed ones) gave warnings (theta > iteration / alternation limit reached), and although the other values look > ok (estimates, LLH, AIC), theta estimates are between 1.000 and 11.000. > Could you recommend me what to do here? 
> > Thanks > T > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
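On question 1 in the quoted message (getting LLH, AIC, and BIC): the standard extractor generics should work on these fits --- logLik(), AIC(), and BIC() have methods for glm.nb() objects and, to my knowledge, for pscl's zeroinfl() and hurdle() objects as well. A toy sketch with built-in data:

```r
library(MASS)
## toy negative binomial fit on the built-in quine data
fit <- glm.nb(Days ~ Sex + Age, data = quine)
logLik(fit)  # log-likelihood with df attribute
AIC(fit)
BIC(fit)
```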
Re: [R] practical to loop over 2million rows?
Hi Jay, A few comments. 1) As you know, vectorize when possible. Even if you must have a loop, perhaps you can avoid nested loops or at least speed each iteration. 2) Write your loop in a function and then byte compile it using the cmpfun() function from the compiler package. This can help dramatically (though still not to the extent of vectorization). 3) If you really need to speed up some aspect and are stuck with a loop, check out the R + Rcpp + inline + C++ tool chain, which allows you to write inline C++ code, compile it fairly easily, and move data to and from it. Here is an example of a question I answered on SO where the OP had an algorithm to implement in R and I ran through with the R implementation, the compiled R implementation, and one using Rcpp, and compared timings. It should give you a bit of a sense for what you are dealing with at least. You are correct that some things can help speed in R loops, such as preallocation, and also depending on what you are doing, some classes are faster than others. If you are working with a vector of integers, don't store them as doubles in a data frame (that is a silly extreme, but hopefully you get the point). Good luck, Josh On Wed, Oct 10, 2012 at 1:31 PM, Jay Rice wrote: > New to R and having issues with loops. I am aware that I should use > vectorization whenever possible and use the apply functions, however, > sometimes a loop seems necessary. > > I have a data set of 2 million rows and have tried run a couple of loops of > varying complexity to test efficiency. If I do a very simple loop such as > add every item in a column I get an answer quickly. > > If I use a nested ifelse statement in a loop it takes me 13 minutes to get > an answer on just 50,000 rows. I am aware of a few methods to speed up > loops. 
Preallocating memory space and compute as much outside of the loop > as possible (or use create functions and just loop over the function) but > it seems that even with these speed ups I might have too much data to run > loops. Here is the loop I ran that took 13 minutes. I realize I can > accomplish the same goal using vectorization (and in fact did so). > > y<-numeric(length(x)) > > for(i in 1:length(x)) > > ifelse(!is.na(x[i]), y[i]<-x[i], > > ifelse(strataID[i+1]==strataID[i], y<-x[i+1], y<-x[i-1])) > > Presumably, complicated loops would be more intensive than the nested if > statement above. If I write more efficient loops time will come down but I > wonder if I will ever be able to write efficient enough code to perform a > complicated loop over 2 million rows in a reasonable time. > > Is it useless for me to try to do any complicated loops on 2 million rows, > or if I get much better at programming in R will it be manageable even for > complicated situations? > > > Jay > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
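A minimal sketch of point 2 above (byte-compiling a loop-heavy function); note that recent versions of R JIT-compile functions by default, so the gain may be smaller than it once was:

```r
library(compiler)
slowsum <- function(x) {
  s <- 0
  for (i in seq_along(x)) s <- s + x[i]  # deliberately loop-based
  s
}
fastsum <- cmpfun(slowsum)  # byte-compiled version, identical results
fastsum(1:10)               # 55
```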
Re: [R] Is there any R function for data normalization?
Hi Rui, It doesn't really need one... doit <- function(x) {(x - min(x, na.rm=TRUE))/(max(x,na.rm=TRUE) - min(x, na.rm=TRUE))} # use lapply to apply doit() to every column in a data frame # mtcars is built into R normed <- as.data.frame(lapply(mtcars, doit)) # verify that the range of all is [0, 1] lapply(normed, range) Cheers, Josh On Tue, Oct 2, 2012 at 2:51 AM, Rui Esteves wrote: > Hello, > > I have a matrix with values, with columns c1..cn. > I need the values to be normalized between 0 and 1 by column. > Therefore, the 0 should correspond to the minimum value in the column c1 and > 1 should correspond to the maximum value in the column c1. > The remaining columns should be organized in the same way. > > Does a function in R exist for this purpose? > > Thanks, > Rui > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unexpected behavior with weights in binomial glm()
the same coefficients (maybe >>> this is my incorrect assumption, but I can't see why it would be). >>> Anyways, here's a minimum working example below: >>> >>>> d = data.frame( RESP=c(rep(1,5),rep(0,5)), INDEP=c(1,1,1,1,0,0,0,0,0,0) ) >>>> glm( RESP ~ INDEP, family="binomial", data=d ) >>> >>> Call: glm(formula = RESP ~ INDEP, family = "binomial", data = d) >>> >>> Coefficients: >>> (Intercept)        INDEP >>> -1.609       21.176 >>> >>> Degrees of Freedom: 9 Total (i.e. Null); 8 Residual >>> Null Deviance: 13.86 >>> Residual Deviance: 5.407    AIC: 9.407 >>>> dAgg = aggregate( d$RESP, by=list(d$RESP, d$INDEP), FUN=length ) >>>> colnames(dAgg) = c("RESP","INDEP","WT") >>>> glm( RESP ~ INDEP, family="binomial", data=dAgg, weights=WT ) >>> >>> Call: glm(formula = RESP ~ INDEP, family = "binomial", data = dAgg, >>> weights = WT) >>> >>> Coefficients: >>> (Intercept)        INDEP >>> -1.609       20.975 >>> >>> Degrees of Freedom: 2 Total (i.e. Null); 1 Residual >>> Null Deviance: 13.86 >>> Residual Deviance: 5.407    AIC: 9.407 >> >> Those two results look very similar and it is with a data situation that >> seems somewhat extreme. The concern is for the 1% numerical difference in >> the regression coefficient? Am I reading you correctly? >> >> -- >> David Winsemius, MD >> Alameda, CA, USA -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/
Re: [R] Bonferroni correction for multiple correlation tests
On Wed, Aug 29, 2012 at 6:48 PM, R. Michael Weylandt wrote: > On Wed, Aug 29, 2012 at 6:23 PM, Louise Cowpertwait > wrote: >> Please can someone advise me how I can adjust correlations using >> bonferroni's correction? I am doing many correlation tests as part of an >> investigation of the validity/reliability of a psychometric measure. >> Help would be so appreciated! >> Cheers, >> Louise >> > > The observed correlation is an immutable property of the observed data > and the Bonferroni correction does not change it. Rather, it should be > applied to the p-values of the observed correlations, much as it would > be for any test. Those more statistically savvy than I might jump in, > but I don't see why the p-values of, e.g., cor.test() would be > adjusted in a different way than those of t.test(). I am happy to be corrected, but under specific situations, I can see an alternative correction method being appropriate. For p variables, the p x p correlation matrix has p * (p - 1) / 2 unique correlations; however, once you know some of the correlations, you actually have some information about the others. Imagine the situation where p = 3 and cor(p1, p2) = .9, cor(p2, p3) = 0. Is cor(p1, p3) free to be any possible correlation? The answer of course is no (positive definiteness constrains it to roughly +/- .44 here). I am not sure what the exact rule would be, but this constraint holds, and it tightens as the matrix grows. > Consider a similar case for a set of t-tests: you see some data and do > the tests based on the sample means. It doesn't make any sense to > "adjust the mean" of your data, rather you might wish to adjust your > _interpretation_ of calculated p-values to account for multiple > comparisons. > > Cheers, > Michael
> > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
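The p-value-adjustment route Michael describes can be sketched with p.adjust(); the mtcars columns below are stand-ins for the psychometric items:

```r
# Run cor.test() over every variable pair, then Bonferroni-adjust the p-values.
vars <- mtcars[, c("mpg", "hp", "wt")]     # illustrative variables
prs <- combn(names(vars), 2)               # all p*(p-1)/2 = 3 pairs
pvals <- apply(prs, 2, function(v) cor.test(vars[[v[1]]], vars[[v[2]]])$p.value)

data.frame(var1 = prs[1, ], var2 = prs[2, ],
           raw = pvals,
           bonferroni = p.adjust(pvals, method = "bonferroni"))
```

p.adjust() also offers less conservative methods (e.g. "holm", the default, or "BH" for false discovery rate control), which may suit a large correlation matrix better than plain Bonferroni.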
Re: [R] define subset argument for function lm as variable?
On Wed, Aug 29, 2012 at 3:56 AM, Milan Bouchet-Valat wrote: > On Tuesday, 21 August 2012 at 07:51 -0700, Joshua Wiley wrote: >> Hi Rainer, >> >> You could try: >> >> subs <- expression(dead==FALSE & recTreat==FALSE) >> >> lme(formula, subset = eval(subs)) >> >> Not tested, but something along those lines should work. > Out of curiosity, why isn't "subset" (and "weights", which is very > similar in that regard) evaluated in the "data" environment, just like > the formula? Is this for historical reasons, or are there drawbacks to > such a feature? I am not sure about weights offhand, but subset is evaluated in the data environment; that is why that solution works. The original question was how to set up the expression as an object that was passed to subset. The trick is to avoid having the logical expression evaluated when the object is created, which I avoided by using expression, and then in lme() forcing the evaluation of the object. > > It seems very common to pass a data frame via the "data" argument, and > use variables from it for subsetting and/or weighting. > > > Regards > > >> Josh >> >> On Tue, Aug 21, 2012 at 7:44 AM, Rainer M Krug wrote: >> > Hi >> > >> > I want to do a series of linear models, and would like to define the input >> > arguments for lm() as variables. I managed easily to define the formula >> > arguments in a variable, but I also would like to have the "subset" in a >> > variable. My reasoning is that I have the subset in the results object. >> > >> > So I would like to add a line like: >> > >> > subs <- dead==FALSE & recTreat==FALSE >> > >> > which obviously does not work as the expression is evaluated immediately. >> > Is it possible to do what I want to do here, or do I have to go back to use >> > >> > dat <- subset(dat, dead==FALSE & recTreat==FALSE) >> > >> > ? 
>> > >> > >> > >> > dat <- loadSPECIES(SPECIES) >> > feff <- height~pHarv*year # fixed effect in the model >> > reff <- ~year|plant # random effect in the model, where >> > year is the >> > dat.lme <- lme( >> > fixed = feff, # fixed effect in the >> > model >> > data = dat, >> > random = reff, # random effect in the >> > model >> > correlation = corAR1(form=~year|plant), # >> > subset = dead==FALSE & recTreat==FALSE, # >> > na.action = na.omit >> > ) >> > dat.lm <- lm( >> > formula = feff, # fixed effect in the model >> > data = dat, >> > subset = dead==FALSE & recTreat==FALSE, >> > na.action = na.omit >> > ) >> > >> > Thanks, >> > >> > Rainer >> > >> > -- >> > Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, >> > UCT), Dipl. Phys. (Germany) >> > >> > Centre of Excellence for Invasion Biology >> > Stellenbosch University >> > South Africa >> > >> > Tel : +33 - (0)9 53 10 27 44 >> > Cell: +33 - (0)6 85 62 59 98 >> > Fax : +33 - (0)9 58 10 27 44 >> > >> > Fax (D):+49 - (0)3 21 21 25 22 44 >> > >> > email: rai...@krugs.de >> > >> > Skype: RMkrug >> > >> > __ >> > R-help@r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. >> >> >> > -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] define subset argument for function lm as variable?
data = dat, >>>>> subset = dead==FALSE & recTreat==FALSE, >>>>> na.action = na.omit >>>>> ) >>>>> >>>>> Thanks, >>>>> >>>>> Rainer >>>>> >>>>> -- >>>>> Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation >>>>> Biology, >>>>> UCT), Dipl. Phys. (Germany) >>>>> >>>>> Centre of Excellence for Invasion Biology >>>>> Stellenbosch University >>>>> South Africa >>>>> >>>>> Tel : +33 - (0)9 53 10 27 44 >>>>> Cell: +33 - (0)6 85 62 59 98 >>>>> Fax : +33 - (0)9 58 10 27 44 >>>>> >>>>> Fax (D):+49 - (0)3 21 21 25 22 44 >>>>> >>>>> email: rai...@krugs.de >>>>> >>>>> Skype: RMkrug >>>>> >>>>> __ >>>>> R-help@r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide >>>>> http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>>> >>>> >>>> >>> >>> >> >> > > > -- > Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, > UCT), Dipl. Phys. (Germany) > > Centre of Excellence for Invasion Biology > Stellenbosch University > South Africa > > Tel : +33 - (0)9 53 10 27 44 > Cell: +33 - (0)6 85 62 59 98 > Fax : +33 - (0)9 58 10 27 44 > > Fax (D):+49 - (0)3 21 21 25 22 44 > > email: rai...@krugs.de > > Skype: RMkrug > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] define subset argument for function lm as variable?
Hi Rainer, You could try: subs <- expression(dead==FALSE & recTreat==FALSE) lme(formula, subset = eval(subs)) Not tested, but something along those lines should work. Cheers, Josh On Tue, Aug 21, 2012 at 7:44 AM, Rainer M Krug wrote: > Hi > > I want to do a series of linear models, and would like to define the input > arguments for lm() as variables. I managed easily to define the formula > arguments in a variable, but I also would like to have the "subset" in a > variable. My reasoning is, that I have the subset in the results object. > > So I wiould like to add a line like: > > subs <- dead==FALSE & recTreat==FALSE > > which obviously does not work as the expression is evaluated immediately. Is > is it possible to do what I want to do here, or do I have to go back to use > > dat <- subset(dat, dead==FALSE & recTreat==FALSE) > > ? > > > > dat <- loadSPECIES(SPECIES) > feff <- height~pHarv*year # fixed effect in the model > reff <- ~year|plant # random effect in the model, where > year is the > dat.lme <- lme( > fixed = feff, # fixed effect in the > model > data = dat, > random = reff, # random effect in the > model > correlation = corAR1(form=~year|plant), # > subset = dead==FALSE & recTreat==FALSE, # > na.action = na.omit > ) > dat.lm <- lm( > formula = feff, # fixed effect in the model > data = dat, > subset = dead==FALSE & recTreat==FALSE, > na.action = na.omit > ) > > Thanks, > > Rainer > > -- > Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, > UCT), Dipl. Phys. 
(Germany) > > Centre of Excellence for Invasion Biology > Stellenbosch University > South Africa > > Tel : +33 - (0)9 53 10 27 44 > Cell: +33 - (0)6 85 62 59 98 > Fax : +33 - (0)9 58 10 27 44 > > Fax (D):+49 - (0)3 21 21 25 22 44 > > email: rai...@krugs.de > > Skype: RMkrug > > ______ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
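An alternative sketch that sidesteps the evaluation-environment question entirely: store the unevaluated condition with quote() and splice it into the model call with bquote() before evaluating. Here mtcars and cyl > 4 are illustrative stand-ins for dat and the dead/recTreat condition:

```r
# Store the subset condition unevaluated, then splice it into the call.
subs <- quote(cyl > 4)

# bquote() substitutes .(subs) into the call, producing
# lm(mpg ~ wt, data = mtcars, subset = cyl > 4), which eval() then runs.
fit <- eval(bquote(lm(mpg ~ wt, data = mtcars, subset = .(subs))))

nobs(fit)  # 21: only the rows with cyl > 4 are used
```

Because the spliced call is exactly what you would have typed by hand, this also keeps the stored call in fit$call readable.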
Re: [R] R utilizing 25% of CPU for Dual core i3 370M processor
You probably have hyper threading so even though you only have two physical cores, there are 4 logical cores, hence 25%. Google can lead you to how to adjust this, if possible. Cheers, Josh On Fri, Aug 17, 2012 at 2:41 PM, Shantanu MULLICK wrote: > Hello Everyone > > I have a dual-core Intel i3-370M processor and Windows 64 operating system. > I have a 3GB RAM and my processor can support a maximum of 8GB RAM. > > I have virtual memory enabled on my computer. > > I am running a program in the 64 bit R which implements a MCMC on a large > dataset and involves around 8 iterations. > > The processor estimates that it will take around 1000 minutes to complete > running the program. > > However when I check the CPU usage it shows only 25% usage. > > I have read on threads that R runs on only 1 core - and thus was expecting > a CPU usage of around 50%. > > It would be great if anyone can point out a way by which I can make the > processor use 50% of the CPU which I believe would allow the program to be > run quicker. > > Best > Shantanu > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
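A single MCMC chain is inherently sequential, so it cannot simply be spread over cores; but independent pieces of work (for example, separate chains) can use both physical cores via the base parallel package. A minimal sketch, with a trivial function standing in for one chain:

```r
library(parallel)  # shipped with R since 2.14.0

cl <- makeCluster(2)                        # one worker per physical core
res <- parLapply(cl, 1:4, function(i) i^2)  # i^2 stands in for "run chain i"
stopCluster(cl)

unlist(res)  # 1 4 9 16
```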
Re: [R] named character question
Hi Erin, The first element of the character vector is a single string, not a vector of characters, so add1[1][32:36] asks for elements 32:36 of a length-one vector and returns NAs. You cannot extract individual characters by indexing; try something like ?substr (see also ?nchar), or perhaps better, use regular expressions to extract the digits that follow the state abbreviation (or whatever logical rule accurately gets the zip code). Cheers, Josh On Sun, Aug 12, 2012 at 8:33 PM, Erin Hodgess wrote: > Dear R People: > > Here is a goofy question: > > I want to extract the zip code from an address and here is my work so far: > >> add1 > results.formatted_address > "200 W Rosamond St, Houston, TX 77076, USA" >> add1[1][32:36] > > NA NA NA NA NA >> str(add1) > Named chr "200 W Rosamond St, Houston, TX 77076, USA" > - attr(*, "names")= chr "results.formatted_address" >> > > What am I not seeing, please? > > Thanks, > Erin > > > -- > Erin Hodgess > Associate Professor > Department of Computer and Mathematical Sciences > University of Houston - Downtown > mailto: erinm.hodg...@gmail.com -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/
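The regular-expression route can be sketched with regexpr() and regmatches(), matching the first run of exactly five digits in the address:

```r
# Erin's named character vector from the thread
add1 <- c(results.formatted_address = "200 W Rosamond St, Houston, TX 77076, USA")

# first match of five consecutive digits ("200" is only three, so it is skipped)
zip <- regmatches(add1, regexpr("[0-9]{5}", add1))
zip  # "77076"
```

For a whole column of addresses the same two lines work unchanged, since both functions are vectorized over their first argument.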
Re: [R] Align columns in data frame write.table
I do not know of any option in write.table() that would allow a variable spacer, such as padding with spaces to make columns centered or right aligned. Everything is just separated somehow. You could look at ?format or ?sprintf which have padding/alignment options. Once you had properly padded character data, you could just use writeLines() to push it to a file. Cheers, Josh On Fri, Aug 10, 2012 at 6:39 PM, sharx wrote: > Does anyone know of a way to specify the alignment of individual columns in a > data frame so that after using write.table the columns are aligned in the > file? > > > > -- > View this message in context: > http://r.789695.n4.nabble.com/Align-columns-in-data-frame-write-table-tp4640007.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
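The format-then-writeLines approach can be sketched as follows: pad every column to a fixed width with formatC() (right-aligned by default) and assemble the lines yourself. The data frame and the width of 8 are purely illustrative:

```r
# Right-align all columns to a common width, then write the padded lines.
df <- data.frame(name = c("a", "bb"), value = c(1.5, 12.25))

pad <- function(x, w = 8) formatC(as.character(x), width = w)

lines <- c(paste(pad(names(df)), collapse = ""),   # header row
           do.call(paste0, lapply(df, pad)))       # one padded string per row

writeLines(lines)  # writeLines(lines, "out.txt") to send it to a file
```

Per-column widths (e.g. from max(nchar(...))) or flag = "-" for left alignment are straightforward extensions.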
Re: [R] Lavaan: Immediate non-positive definite matrix
Dear Andrew, Maximum likelihood estimation with missing data typically makes some rather strong assumptions. If I am not mistaken, the default covariance coverage in Mplus is .05; the fact that you need to set it lower suggests you have some combinations of variables with less than 5% jointly present. If that is true, I am not surprised you are getting errors. Estimation is an iterative process using the expectation-maximization algorithm: the E step finds the expected values given the available data, the M step then maximizes the parameters based on them, and that result becomes the basis for another E step, and so on. In the case of no missing data, the expectations do not need to be updated at all (easy); in the case of a lot of missing data, more steps are required. With < 5% (if the assumptions that conclusion was predicated on are indeed true) pairwise-present data for some combinations, estimates could be quite unstable. If you are dealing with a typical longitudinal study, dropout probably increases with each additional time point. The easiest case is where dropout is monotonic, in which case perhaps it still makes sense to try to include all time points. If the missingness is not monotonic and the < 5% involves some combination with the final time point, I would suggest you simply look at an unconditional growth model of, say, the first 5 or 6 time points. Actually, even if you keep all 7, if there is increasing missingness at later time points, I would suggest that you try just 5 and 6 time points as a sort of sensitivity analysis to see how your results are affected. I realize that none of my comments really directly pertain to R, so this email is rather off topic for R-help. Here is my saving bit ;) For data, try sharing it by uploading to some website such as Google Drive, Dropbox, SkyDrive, etc. and then just sharing a link in your emails. Also, while you're at that, perhaps include the R script to read the data in and load lavaan etc. 
From my brief read of the lavaan error message, it suggests that the sample covariance matrix is not positive definite and not invertible (well, I am assuming that S stands for the sample covariance matrix). Can you try fitting the model with listwise deletion and with direct ML? Have you looked at the (listwise-)present sample covariance matrix? Finally, you could try fitting the model in OpenMx, which also runs in R. Cheers, Josh On Fri, Aug 10, 2012 at 12:27 PM, Andrew Miles wrote: > Hi, > > I recently tried to estimate a linear unconditional latent growth curve on > 7 repeated measures using lavaan (most recent version): > > modspec=' > > alpha =~ 1*read_g0 + 1*read_g1 + 1*read_g2 + 1*read_g3 + 1*read_g4 + > 1*read_g5 + 1*read_g6 > > beta =~ 0*read_g0 + 1*read_g1 + 2*read_g2 + 3*read_g3 + 4*read_g4 + > 5*read_g5 + 6*read_g6 > > ' > > gmod=lavaan(modspec, data=math, meanstructure=T, int.ov.free=F, int.lv.free= T, auto.var=T, auto.cov.lv.x=T, auto.cov.y=T, missing="direct", verbose=T) > The model immediately returned the following error message: > > Error in chol.default(S) : > > the leading minor of order 5 is not positive definite > > Error in lavSampleStatsFromData(Data = lavaanData, rescale = > (lavaanOptions$estimator == : > > lavaan ERROR: sample covariance can not be inverted > I ran the same model in Mplus and found that it has low covariance coverage > for the 7 repeated measures, but all coverage is greater than 0. The model > ran fine in Mplus once I set the covariance coverage limit to .001. That > provided me starting values to try in lavaan, but again it immediately > returned the same error message. In fact, nothing I could do allowed me to > fit the model without it immediately returning the same error message > (e.g., I tried changing the optimizer, the type of likelihood being used). 
> > Because the error message pops up immediately (i.e., not after lavaan tries > to estimate the model for a few seconds), it makes me think that my data is > failing some sort of initial data check. > > Any ideas what is happening, or what to do about it? Thanks. > > P.S. I haven't had success in attaching data files when I submit to the R > help list, so if you would like the data please email me at > andrewami...@hotmail.com. > > Andrew Miles > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ _
Re: [R] Simple question about formulae in R!?
On Fri, Aug 10, 2012 at 9:16 AM, S Ellison wrote: >> > R in general tries hard to prohibit this behavior (i.e., including an >> > interaction but not the main effect). When removing a main effect and >> > leaving the interaction, the number of parameters is not reduced by >> > one (as would be expected) but stays the same, at least >> > when using model.matrix: > > Surely this behaviour is less to do with a dislike of interactions without > both main effects (which we will necessarily use if we fit a simple > two-factor nested model) than the need to avoid non-uniqueness of a model > fitted with too many coefficients? > In a simple case, an intercept plus n coefficients for n factor levels gives > us n+1 coefficients to find, and we only have n independent groups to > estimate them from. In model matrix terms we would have one column that is a > linear combination of others. For OLS normal equations that generates a zero > determinant and for the numerical methods R uses the effect is the same; no > useful fit. To avoid that and allow least squares fitting, R sets up the > model matrix with only n-1 coefficients in addition to the intercept. As a > result we end up with fewer model coefficients than we might have expected > (and that annoyingly missing first level that always puzzles newcomers the > first time we look at a linear model summary), but we have exactly the number > of coefficients that we can estimate uniquely from the groups we have > specified. N.B. Off topic. This is an incredibly nice feature of R. SAS overparameterizes the design matrix and employs the sweep algorithm to zero out redundant parameters. One result is that if you want to supply your own data to post-multiply by the coefficient vector, you need to remember that the vector is longer than the number of estimated (non-zeroed) parameters. 
I struggle to imagine this is more computationally efficient than simply creating an appropriately parameterized design matrix, although I suppose in either case you need to check for less than full rank design matrices. > > S > > *** > This email and any attachments are confidential. Any u...{{dropped:17}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
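The full-rank parameterization discussed above is easy to see with model.matrix(): a three-level factor contributes two columns beyond the intercept, so no parameter ever needs to be zeroed out:

```r
# A 3-level factor under R's default treatment coding
f <- factor(c("a", "b", "c", "a"))

colnames(model.matrix(~ f))  # "(Intercept)" "fb" "fc" -- level "a" is absorbed
ncol(model.matrix(~ f))      # 3 columns for 3 groups: full rank by construction
```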
Re: [R] Simple question about formulae in R!?
factor(vs)0:factor(am)0 factor(vs)1:factor(am)0 factor(vs)0:factor(am)1 15.05000 20.74286 19.75000 factor(vs)1:factor(am)1 28.37143 which you can see directly gets all the group expectations. Cheers, Josh > I have asked a question on how to get around this "limitation" on > stackoverflow with helpful answers by Ben Bolker and Joshua Wiley: > http://stackoverflow.com/q/11335923/289572 > (this functionality is now used in function mixed() in my new package afex > for obtaining "type 3" p-values for mixed models) With few exceptions, I'm in the habit of not giving people type 3 p-values. People like them, but generally do not actually understand them. I would also like to point out that I am in psychology, so yes, I know psychology likes them. Further, I am a doctoral student, so it is not like I am so established that people bow to my every whim; it sometimes takes work to explain why I am not giving them, and I have had discussions in various lab meetings and with advisors about this, but my point is that it is possible. I would urge you to do the same and take heart that there are others working for change---you are not the only one even if it feels like it at your university. > > Cheers, > Henrik > > On 10.08.2012 at 15:48, R. Michael Weylandt wrote: > >> On Fri, Aug 10, 2012 at 7:36 AM, ONKELINX, Thierry >> wrote: >>> >>> Dear Johan, >>> >>> Why should it be complicated? You have a very simple model, thus a very >>> simple formula. Isn't that great? >>> >>> Your formula matches the model. Though Trust~Culture + Structure * >>> Speed_of_Integration is another option. The model fit is the same, the only >>> difference is the parameterization of the model. >> >> >> Not quite, I don't think: A*B is shorthand for A + B + A:B where A:B >> is the 2nd order (interaction) term. The OP only had the 2nd order >> term without the main effects. >> >> OP: You almost certainly want A*B -- it's strange (though certainly >> not impossible) to have good models where you only have interactions >> but not main effects. 
>> >> Cheers, >> Michael >> >> >>> >>> Best regards, >>> >>> Thierry >>> >>> ir. Thierry Onkelinx >>> Instituut voor natuur- en bosonderzoek / Research Institute for Nature >>> and Forest >>> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance >>> Kliniekstraat 25 >>> 1070 Anderlecht >>> Belgium >>> + 32 2 525 02 51 >>> + 32 54 43 61 85 >>> thierry.onkel...@inbo.be >>> www.inbo.be >>> >>> To call in the statistician after the experiment is done may be no more >>> than asking him to perform a post-mortem examination: he may be able to say >>> what the experiment died of. >>> ~ Sir Ronald Aylmer Fisher >>> >>> The plural of anecdote is not data. >>> ~ Roger Brinner >>> >>> The combination of some data and an aching desire for an answer does not >>> ensure that a reasonable answer can be extracted from a given body of data. >>> ~ John Tukey >>> >>> >>> -Oorspronkelijk bericht- >>> Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] >>> Namens Johan Haasnoot >>> Verzonden: vrijdag 10 augustus 2012 9:15 >>> Aan: r-help@r-project.org >>> Onderwerp: [R] Simple question about formulae in R!? >>> >>> Good morning reader, >>> >>> >>> >>> I have encountered a, probably, simple issue with respect to the >>> *formulae* of a *regression model* I want to use in my research. I'm >>> researching alliances as part of my study Business Economics (focus >>> Strategy) at the Vrije Universiteit in Amsterdam. In the research model I >>> use a moderating variable, I'm looking for confirmation or help on the >>> formulation of the model. >>> >>> >>> >>> The research model consist of 2 explanatory variables, a moderating >>> variable and 1 response variable. The first explanatory variable is Culture, >>> measured on a nominal scale and the second is Structure of the alliance, >>> also measured on a nominal scale. The moderating variable in the relation >>> towards Trust is Speed of Integration, measured as an interval. 
>>> The response variable is Trust, measured on a nominal scale (highly >>> likely a 5-point Likert scale). Given the research model and the measurement >>> scales, I intent to use a ordered prob
Re: [R] Questionnaire Analysis virtually without continuous Variables
You may be able to get around zero cells using an MCMC approach such as with MCMCglmm. On Aug 4, 2012, at 15:30, Sacha Viquerat wrote: > On 08/04/2012 07:57 PM, Joshua Wiley wrote: >> Hi Sacha, >> >> You're right that this is not an R related question really (would be better >> somewhere like crossvalidated.com). >> >> If basically everyone catches 0/1 birds, then I would consider dichotomizing: >> >> Y <- as.integer(caught >= 1) >> >> then check cross tabs to make sure there are no zero cells between >> predictors and outcome: >> >> xtabs(~Y + dogs + guns, data=yourdata) >> >> then use the glmer() function to model the nested random effects. >> >> m <- glmer(Y ~ dog + gun + (1 | household) + (1 | village) + (1 | district), >> data = yourdata, family=binomial) >> >> summary(m) >> >> Cheers, >> >> Josh >> >> On Aug 4, 2012, at 7:12, Sacha Viquerat wrote: >> >>> Hello! >>> I am doing an analysis on a questionnaire of hunters taken in 4 different >>> districts of some mysterious foreign country. The aim of the study was to >>> gather info on the factors that determine the hunting success of a >>> peculiarly beautiful bird in that area. All variables are factors, i.e. >>> they are variables such as "Use of Guns - yes / no", "Use of Dogs - yes / >>> no" and the likes. The response is supposed to be "number of Birds caught", >>> which was designed to be the only continuous variable. However, in reality >>> the number of caught birds is between 0 and 1, sometimes hunters answered >>> with 2. Unfortunately, it is not the questioner who is burdened with the >>> analysis, but me. I am struggling to find an appropriate approach to the >>> analysis. I don't really consider this as count data, since it would be >>> very vulnerable to overinflation (and a steep decline for counts above 0). >>> I can't really suggest binomial models either, since the lack of >>> explanatory, continuous data renders such an approach quite vague. 
I also struggle with the random design of the survey (households nested within villages nested within districts). Adding to that, hunters don't even target the bird as their prime objective. The bird is essentially a by-catch, most often used for instant consumption on the hunting trip. I therefore doubt that any analysis makes more than a little sense, but I will not yet succumb to failure. Any ideas? >>> >>> Thanks in advance! >>> >>> PS: I just realized that this is not a question related to R but to >>> statistics in general. Apologies for that! >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. > I did exactly what you proposed already (since the binomial model seemed > obvious to me), however, of course there are zero cells. I was thinking > someone more accustomed to doing questionnaire analysis could unveil some > mysterious approach common to sociologists but occluded from the naturalists > eyes (hardened after years of dealing with exact science ;) > I think I will expand the binomial approach and just try to find fancy > graphics that make up for the low value of the actual results (maybe with > colours). :D > Thank you for the reply (do they really give such tasks for homework these > days? These kids must be awesome statisticians!) > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Questionnaire Analysis virtually without continuous Variables
Hi Sacha, You're right that this is not an R related question really (would be better somewhere like crossvalidated.com). If basically everyone catches 0/1 birds, then I would consider dichotomizing: Y <- as.integer(caught >= 1) then check cross tabs to make sure there are no zero cells between predictors and outcome: xtabs(~Y + dogs + guns, data=yourdata) then use the glmer() function to model the nested random effects. m <- glmer(Y ~ dog + gun + (1 | household) + (1 | village) + (1 | district), data = yourdata, family=binomial) summary(m) Cheers, Josh On Aug 4, 2012, at 7:12, Sacha Viquerat wrote: > Hello! > I am doing an analysis on a questionnaire of hunters taken in 4 different > districts of some mysterious foreign country. The aim of the study was to > gather info on the factors that determine the hunting success of a peculiarly > beautiful bird in that area. All variables are factors, i.e. they are > variables such as "Use of Guns - yes / no", "Use of Dogs - yes / no" and the > likes. The response is supposed to be "number of Birds caught", which was > designed to be the only continuous variable. However, in reality the number > of caught birds is between 0 and 1, sometimes hunters answered with 2. > Unfortunately, it is not the questioner who is burdened with the analysis, > but me. I am struggling to find an appropriate approach to the analysis. I > don't really consider this as count data, since it would be very vulnerable > to overinflation (and a steep decline for counts above 0). I can't really > suggest binomial models either, since the lack of explanatory, continuous > data renders such an approach quite vague. I also struggle with the random design of the survey (households nested within villages nested within districts). Adding to that, hunters don't even target the bird as their prime objective. The bird is essentially a by-catch, most often used for instant consumption on the hunting trip. 
I therefore doubt that any analysis makes more than a little sense, but I will not yet succumb to failure. Any ideas? > > Thanks in advance! > > PS: I just realized that this is not a question related to R but to > statistics in general. Apologies for that! > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
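Josh's xtabs() check guards against empty predictor-by-outcome cells, which make logistic regression coefficients inestimable. The same dichotomize-then-cross-tabulate idea can be sketched language-neutrally; this Python version uses hypothetical field names mirroring the R example (dogs, guns, caught), not real survey data:

```python
from collections import Counter
from itertools import product

# Hypothetical survey records: 1 = yes, 0 = no, caught = birds caught
records = [
    {"dogs": 1, "guns": 0, "caught": 1},
    {"dogs": 0, "guns": 1, "caught": 0},
    {"dogs": 1, "guns": 1, "caught": 2},
    {"dogs": 0, "guns": 0, "caught": 0},
]

def zero_cells(records, predictors, outcome):
    """Return predictor/outcome combinations with no observations."""
    # Dichotomize the outcome, like Y <- as.integer(caught >= 1) in R
    counts = Counter(
        tuple(r[p] for p in predictors) + (int(r[outcome] >= 1),)
        for r in records
    )
    # Observed levels of each predictor, then every possible combination
    levels = [sorted({r[p] for r in records}) for p in predictors]
    combos = product(*levels, [0, 1])
    return [c for c in combos if counts[c] == 0]

empty = zero_cells(records, ["dogs", "guns"], "caught")
# With only four records, several of the eight cells are empty
```

Any combination returned here is a cell the binomial model cannot estimate, which is exactly what Sacha ran into.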
Re: [R] Partial Likelihood
In addition to Bert's suggestion of r-sig-mixed-models (which I second), I would encourage you to create a more detailed example and explanation of what you hope to accomplish. Sounds a bit like an autoregressive structure, but more details would be good. Cheers, Josh On Aug 4, 2012, at 9:34, Bert Gunter wrote: > Sounds like generalized linear mixed modeling (glmm) to me. Try > posting to the r-sig-mixed-models list rather than here to increase > the likelihood of a useful response. > > -- Bert > > On Sat, Aug 4, 2012 at 3:55 AM, doctoratza wrote: >> Hello everyone, >> >> i would like to ask if anyone knows how to perform a glm partial likelihood >> estimation in a time series where dependence exists. >> >> lets say that i want to perform a logistic regression for binary data (0, 1) >> with binary responses which are the previous days. >> >> for example: >> >> >> logistic<-glm(dat$Day~dat$Day1+dat$Day2, family=binomial(link="logit")) >> >> where dat$Day (0 or 1) is the current day and dat$Day1 is one day before (0 >> or 1). >> >> is it possible that R performs partial likelihood estimation automatically? >> >> >> thank you in advance >> >> Konstantinos Mammas >> >> >> >> >> -- >> View this message in context: >> http://r.789695.n4.nabble.com/Partial-Likelihood-tp4639159.html >> Sent from the R help mailing list archive at Nabble.com. >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. 
> > > > -- > > Bert Gunter > Genentech Nonclinical Biostatistics > > Internal Contact Info: > Phone: 467-7374 > Website: > http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
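The practical step behind the lagged logistic regression Konstantinos describes (Day ~ Day1 + Day2) is constructing the lagged columns from the raw 0/1 series; each day is paired with its preceding days, and the first few observations are dropped because their history is incomplete. A minimal sketch in Python (made-up data, two lags as in the example):

```python
def lagged_design(series, n_lags=2):
    """Build (response, lag1, ..., lagk) rows from a binary time series.

    Row t pairs day t with the n_lags preceding days, so the first
    n_lags observations are dropped (incomplete history).
    """
    rows = []
    for t in range(n_lags, len(series)):
        rows.append((series[t],) + tuple(series[t - k] for k in range(1, n_lags + 1)))
    return rows

days = [0, 1, 1, 0, 1, 0]          # hypothetical daily 0/1 outcomes
rows = lagged_design(days, n_lags=2)
# rows[0] pairs day index 2 with days 1 and 0: (1, 1, 0)
```

Fitting an ordinary logistic regression to these rows maximizes the likelihood of each day conditional on its recent past, which is the partial (conditional) likelihood factorization for this kind of Markov-style binary model.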
Re: [R] How to link two R packages together
Hi Xuan, I would expect ~/R/ to be the R_HOME directory, though perhaps I am wrong. Also, you only need a depends or an imports, not both. At a simplified level, the difference is that importing PKG2 in PKG1 makes the (exported) functions in PKG2 available to PKG1. Depends also makes the (exported) functions in PKG2 available to PKG1, but it _also_ makes them available to the user. This can be good or bad, with the current recommendation being not to use Depends. The reason it can be good is if you want your users to have access to functions in both packages without explicitly loading both. However, let's say that function foo() is defined in PKG2 and in PKG3. PKG1 does not know about PKG3, and the user had PKG3 loaded prior to loading PKG1. Then when PKG1 loads, PKG2 also loads, and because all the exported functions from PKG2 are included, PKG2::foo() masks PKG3::foo(), which can be annoying for users: your package adds unnecessary functions to the search path, potentially masking the versions they wanted to use. That was a rather convoluted explanation, but hopefully it is somewhat clear. Sincerely, Josh On Thu, Aug 2, 2012 at 7:02 AM, Xuan Zhao wrote: > Hi All, > Thank you so much for all the help you have provided, I really appreciate it! > The problem is solved, I just added PKG2 to the dependency of PKG1 in the > description file, also (I don't know if it's necessary or not), I added > "import(PKG2)" to the NAMESPACE file. In addition to that, I install PKG2 > under a path R can recognize, namely, belong to ".libPaths()". > I have tried other ways besides install it under the path R can recognize, > like adding a file ~/.R/check.Environ, whose contents are: > 'R_LIBS_SITE=${R_LIBS_SITE-'directoryunderwhichPKG2isinstalled'}', but it > still doesn't work. > I looked through the manual, they just say the file 'check.Environ' should be > put under '~/.R', but I am not sure what the "~/" should be. Is that my > home directory or what? Should that be the host? 
I am not the host of the > server, does that matter? > Thank you so much for the help! > Yours, > Xuan > > -Original Message- > From: Bert Gunter [mailto:gunter.ber...@gene.com] > Sent: Thursday, August 02, 2012 9:42 AM > To: Joshua Wiley > Cc: Michael Weylandt; r-help@r-project.org; Xuan Zhao > Subject: Re: [R] How to link two R packages together > > Josh: > > You may be right ... but I still do not think so. Note that the post begins > with: > > One of them (PKG1) needs to use the functions >> of the other package (PKG2). > > This is exactly what imports are for. I believe that, because he/she failed > to RTFM, he/she is not being accurate in his/her use of the relevant terms. > Certainly, to me at least, it is unclear. Moreover, as I understand it (see > the manual) imports are preferred over dependencies, as I indicated in my > reply. > > Anyway, in either case, the central advice is that which both Michael and I > gave: RTFM. Perhaps it's a generation gap (I am old), but I believe there has > been a trend for new R users to post here without making any significant > effort to consult the docs. This strikes me as intellectually lazy and, > frankly, inconsiderate to those like yourself > -- or BDR -- who are willing to give up their time and effort to help those > with legitimate questions. You and others may consider this an overreaction > of course. > > Sorry for the rant, but it seems relevant to your close parsing of the thread. > > -- Cheers, > Bert > > On Wed, Aug 1, 2012 at 10:35 PM, Joshua Wiley wrote: >> On Wed, Aug 1, 2012 at 10:28 PM, Bert Gunter wrote: >>> On Wed, Aug 1, 2012 at 6:45 PM, Michael Weylandt >>> wrote: >>>> Isn't this what package dependencies are for? >>> >>> No. It's what package imports are for (preferably). >>> As always, the OP should RTFM -- in this case the one to which you >>> refer on the next line, especially the NAMESPACES section. 
>> >> But note that the original question included, "when I load one package >> using 'library("PKG1")', PKG2 can be loaded at the same." which >> imports does not exactly do. They become available to the package, but >> not to the user, so if you really need both packages loaded to the >> search path, then you need a dependency, not imports. >> >> Cheers, >> >> Josh >> >>> >>> -- Bert >>> >>>> >>>> See the description of the DESCRIPTION file in Writing R Extensions >>>> >>>> Michael >>>> >>>> On Aug 1, 2012, at 5:27 PM, xuan
Re: [R] How to link two R packages together
Hi Bert, Likewise, you may well be right. In fact, if I were going to assign probabilities of being correct given the original post, I would give you about .7 and me about .3. I commented for completeness of discussion. I definitely see your point, and thank you for a thoughtful reply to my post. I agree that many questions seem to me that a quick read of the docs would solve them. That brings up an interesting point, though. In a population of 100, if 1-2 people have a preventable illness, we might be inclined to blame them; what if 40-50 people have the preventable illness? It seems to me lack of reading the docs and following the posting guide are at epidemic proportions. I do not know if it is a generation gap, true laziness, lack of knowledge that the docs and posting guide _should_ be read first, or ... This is slightly OT, but I remember one time when I was working with someone in person (not in R), and there was a question about how to customize some graphic. I immediately went to open up the help files but the person I was working with said, "I don't want look at the documentation" and instead starting searching through a cookbook of different graphs to try to find the answer. I was shocked because it seemed like an innate visceral response to the official docs. Perhaps our useRs are afflicted with this same condition. Cheers, Josh On Thu, Aug 2, 2012 at 6:42 AM, Bert Gunter wrote: > Josh: > > You may be right ... but I still do not think so. Note that the post > begins with: > > One of them (PKG1) needs to use the functions >> of the other package (PKG2). > > This is exactly what imports are for. I believe that, because he/she > failed to RTFM, he/she is not being accurate in his/her use of the > relevant terms. Certainly, to me at least, it is unclear. Moreover, as > I understand it (see the manual) imports are preferred over > dependencies, as I indicated in my reply. 
> > Anyway, in either case, the central advice is that which both Michael > and I gave: RTFM. Perhaps it's a generation gap (I am old), but I > believe there has been a trend for new R users to post here without > making any significant effort to consult the docs. This strikes me as > intellectually lazy and, frankly, inconsiderate to those like yourself > -- or BDR -- who are willing to give up their time and effort to help > those with legitimate questions. You and others may consider this an > overreaction of course. > > Sorry for the rant, but it seems relevant to your close parsing of the thread. > > -- Cheers, > Bert > > On Wed, Aug 1, 2012 at 10:35 PM, Joshua Wiley wrote: >> On Wed, Aug 1, 2012 at 10:28 PM, Bert Gunter wrote: >>> On Wed, Aug 1, 2012 at 6:45 PM, Michael Weylandt >>> wrote: >>>> Isn't this what package dependencies are for? >>> >>> No. It's what package imports are for (preferably). >>> As always, the OP should RTFM -- in this case the one to which you >>> refer on the next line, especially the NAMESPACES section. >> >> But note that the original question included, "when I load one package >> using 'library("PKG1")', PKG2 can be loaded at the same." which >> imports does not exactly do. They become available to the package, but >> not to the user, so if you really need both packages loaded to the >> search path, then you need a dependency, not imports. >> >> Cheers, >> >> Josh >> >>> >>> -- Bert >>> >>>> >>>> See the description of the DESCRIPTION file in Writing R Extensions >>>> >>>> Michael >>>> >>>> On Aug 1, 2012, at 5:27 PM, xuan zhao wrote: >>>> >>>>> Hi, >>>>> I have built two R packages. One of them (PKG1) needs to use the functions >>>>> of the other package (PKG2). >>>>> So I need to link these two packages together, so that the functions of >>>>> PKG2 >>>>> can be available to PKG1. And when I load one package using >>>>> 'library("PKG1")', PKG2 can be loaded at the same. >>>>> Any ideas welcome. 
>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> View this message in context: >>>>> http://r.789695.n4.nabble.com/How-to-link-two-R-packages-together-tp4638765.html >>>>> Sent from the R help mailing list archive at Nabble.com. >>>>> >>>>> __ >>>>> R-help@r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/r-hel
Re: [R] How to link two R packages together
On Wed, Aug 1, 2012 at 10:28 PM, Bert Gunter wrote: > On Wed, Aug 1, 2012 at 6:45 PM, Michael Weylandt > wrote: >> Isn't this what package dependencies are for? > > No. It's what package imports are for (preferably). > As always, the OP should RTFM -- in this case the one to which you > refer on the next line, especially the NAMESPACES section. But note that the original question included, "when I load one package using 'library("PKG1")', PKG2 can be loaded at the same." which imports does not exactly do. They become available to the package, but not to the user, so if you really need both packages loaded to the search path, then you need a dependency, not imports. Cheers, Josh > > -- Bert > >> >> See the description of the DESCRIPTION file in Writing R Extensions >> >> Michael >> >> On Aug 1, 2012, at 5:27 PM, xuan zhao wrote: >> >>> Hi, >>> I have built two R packages. One of them (PKG1) needs to use the functions >>> of the other package (PKG2). >>> So I need to link these two packages together, so that the functions of PKG2 >>> can be available to PKG1. And when I load one package using >>> 'library("PKG1")', PKG2 can be loaded at the same. >>> Any ideas welcome. >>> >>> >>> >>> >>> -- >>> View this message in context: >>> http://r.789695.n4.nabble.com/How-to-link-two-R-packages-together-tp4638765.html >>> Sent from the R help mailing list archive at Nabble.com. >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. 
> > > > -- > > Bert Gunter > Genentech Nonclinical Biostatistics > > Internal Contact Info: > Phone: 467-7374 > Website: > http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lm without intercept
Hi, R actually uses a different formula for calculating the R square depending on whether the intercept is in the model or not. You may also find this discussion helpful: http://stats.stackexchange.com/questions/7948/when-is-it-ok-to-remove-the-intercept-in-lm/ If you conceptualize R^2 as the squared correlation between the observed and fitted values, it is easy to get: summary(m0 <- lm(mpg ~ 0 + disp, data = mtcars)) summary(m1 <- lm(mpg ~ disp, data = mtcars)) cor(mtcars$mpg, fitted(m0))^2 cor(mtcars$mpg, fitted(m1))^2 but that is not how R calculates R^2. Cheers, Josh On Sat, Jul 28, 2012 at 10:40 AM, citynorman wrote: > I've just picked up R (been using Matlab, Eviews etc) and I'm having the same > issue. Running reg=lm(ticker1~ticker2) gives R^2=50% while running > reg=lm(ticker1~0+ticker2) gives R^2=99%!! The charts suggest the fit is > worse not better and indeed Eviews/Excel/Matlab all say R^2=15% with > intercept=0. How come R calculates a totally different value?! > > Call: > lm(formula = ticker1 ~ ticker2) > > Residuals: > Min 1Q Median 3Q Max > -0.22441 -0.03380 0.01099 0.04891 0.16688 > > Coefficients: > Estimate Std. Error t value Pr(>|t|) > (Intercept) 1.57062 0.08187 19.18 <2e-16 *** > ticker2 0.61722 0.02699 22.87 <2e-16 *** > --- > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > > Residual standard error: 0.07754 on 530 degrees of freedom > Multiple R-squared: 0.4967, Adjusted R-squared: 0.4958 > F-statistic: 523.1 on 1 and 530 DF, p-value: < 2.2e-16 > > Call: > lm(formula = ticker1 ~ 0 + ticker2) > > Residuals: > Min 1Q Median 3Q Max > -0.270785 -0.069280 -0.007945 0.087340 0.268786 > > Coefficients: > Estimate Std. Error t value Pr(>|t|) > ticker2 1.134508 0.001441 787.2 <2e-16 *** > --- > Signif. 
codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > > Residual standard error: 0.1008 on 531 degrees of freedom > Multiple R-squared: 0.9991, Adjusted R-squared: 0.9991 > F-statistic: 6.197e+05 on 1 and 531 DF, p-value: < 2.2e-16 > > > Jan private wrote >> >> Hi, >> >> thanks for your help. I'm beginning to understand things better. >> >>> If you plotted your data, you would realize that whether you fit the >>> 'best' least squares model or one with a zero intercept, the fit is >>> not going to be very good >>> Do the data cluster tightly around the dashed line? >> No, and that is why I asked the question. The plotted fit doesn't look >> any better with or without intercept, so I was surprised that the >> R-value etc. indicated an excellent regression (which I now understood >> is the wrong interpretation). >> >> One of the references you googled suggests that intercepts should never >> be omitted. Is this true even if I know that the physical reality behind >> the numbers suggests an intercept of zero? >> >> Thanks, >> Jan >> >> __ >> R-help@ mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > > -- > View this message in context: > http://r.789695.n4.nabble.com/lm-without-intercept-tp3312429p4638204.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. 
Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
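The discrepancy citynorman sees comes from the definition of the total sum of squares: with an intercept, R^2 compares residuals against deviations from the mean of y (centered), whereas without an intercept the comparison is against raw sums of y_i^2 (uncentered). A sketch of both definitions in plain Python, with illustrative numbers (not the ticker data from the thread):

```python
def r_squared(y, fitted, intercept=True):
    """R^2 with centered (intercept) or uncentered (no intercept) total SS."""
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    if intercept:
        ybar = sum(y) / len(y)
        ss_tot = sum((yi - ybar) ** 2 for yi in y)   # centered total SS
    else:
        ss_tot = sum(yi ** 2 for yi in y)            # uncentered total SS
    return 1 - ss_res / ss_tot

y = [1.0, 2.0, 3.0]
fitted = [1.1, 1.9, 3.0]
r2_centered = r_squared(y, fitted, intercept=True)
r2_uncentered = r_squared(y, fitted, intercept=False)
```

When the y values sit far from zero, sum(y_i^2) dwarfs the centered total SS, so the uncentered R^2 can look near-perfect even for a mediocre fit. That is why dropping the intercept can push a reported R^2 from about 50% to about 99% while the plotted fit looks no better.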
Re: [R] selecting a subset of files to be processed
Hi Erin, It is not difficult to imagine doing it either in R or via the shell. If they are all in the same directory, I would tend towards R, just because you can easily set the seed and keep that information so you can reproduce your random selection. If the wd is in the directory with the files, from R just: set.seed(10) toprocess <- sample(list.files(), size = 45) Cheers, Josh On Sat, Jul 28, 2012 at 10:49 AM, Erin Hodgess wrote: > Dear R People: > > I am using a Linux system in which I have about 3000 files. > > I would like to randomly select about 45 of those files to be processed in R. > > Could I make the selection in R or should I do it in Linux, please? > > This is with R-2.15.1. > > Thanks, > erin > > > -- > Erin Hodgess > Associate Professor > Department of Computer and Mathematical Sciences > University of Houston - Downtown > mailto: erinm.hodg...@gmail.com > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
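Josh's point about setting the seed so the draw can be reproduced carries over outside R as well; a Python sketch of the same idea (the directory argument is a placeholder, and the listing is sorted because directory order is not guaranteed across platforms):

```python
import os
import random

def pick_files(directory, k, seed=10):
    """Reproducibly sample up to k file names from a directory."""
    random.seed(seed)                      # record the seed with your analysis
    files = sorted(os.listdir(directory))  # stable order, so the draw is repeatable
    return random.sample(files, k=min(k, len(files)))

chosen = pick_files(".", 45)
```

Because both the seed and the sorted listing are deterministic, rerunning the selection later (or on another machine with the same files) recovers exactly the same subset.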
Re: [R] zeroinfl problem: cannot get standard errors, hessian has NaN
Hi, This is not nearly enough information. Please follow the posting guide and provide us with a reproducible example, or at the bare minimum, the code for your models. Without more details we can only wildly guess, but here are a few guesses: --- You may have more parameters than your data can support (the Hessian is the matrix of partial second derivatives, and SEs are typically based on the hessian, or rather the information matrix, the negative of the hessian). --- You may have some redundant effects. Try a simpler model and see if you can find some particular parameter that leads to the problem. You could also try tweaking optimization options, but I think that is usually not the source of the problem. We really need more information, with the ideal being putting data that reproduces your problem (or your actual data) online somewhere where we can easily download it, and providing us with all your model code, so on our machines, we can exactly reproduce what is happening for you and try to troubleshoot. Cheers, Josh On Tue, Jul 24, 2012 at 10:50 PM, nikkks wrote: > Hi! > > I have three models. > > > > In the first model, everything is fine. > > > > > However, in the second and third models, I have NA's for standard errors: > > > > > > The hessians also have NaN's (same for m2 and m3). > > > > What should I do about it? Is there a way to obtain the hessian without > transforming my variables? I will greatly appreciate your help! > > > > -- > View this message in context: > http://r.789695.n4.nabble.com/zeroinfl-problem-cannot-get-standard-errors-hessian-has-NaN-tp4637715.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. 
Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
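The hessian-to-SE link Josh mentions can be made concrete: standard errors are the square roots of the diagonal of the inverse of the observed information matrix (the negative hessian at the optimum), so a NaN anywhere in the hessian makes that inverse, and hence the SEs, undefined. A minimal numeric sketch in Python with a made-up 2x2 information matrix (not output from zeroinfl):

```python
import math

# Hypothetical observed information matrix (negative hessian) for two parameters
info = [[25.0, 5.0],
        [5.0, 4.0]]

# Invert the 2x2 matrix by the adjugate formula
det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
inv = [[ info[1][1] / det, -info[0][1] / det],
       [-info[1][0] / det,  info[0][0] / det]]

# SEs are square roots of the diagonal of the inverse information
ses = [math.sqrt(inv[i][i]) for i in range(2)]
```

If `det` were zero (redundant parameters) or any entry were NaN (a hessian the optimizer could not evaluate), the division or square root would fail, which is the numeric face of the NA standard errors in the original post.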
Re: [R] FIML using lavaan returns zeroes for coefficients
Hi Andrew, I do not think there is a reason to avoid it for univariate regression other than: 1) as was stated the predictors must be continuous 2) it will be slower (non issue for a handful of regressions on a few thousand cases but for people doing thousands of regression on millions of observations, a big concern) In the next month or so, I may have a beta version of a package primarily providing helper functions for fitting SEM models, but could also include an lm()ish wrapper to lavaan. It would use the traditional formula interface, but issue a warning if factor variables or variables with insufficient unique values were used (as either predictors or outcomes). If anyone would be interested in beta testing, feel free to email me. Once I have a basic package working, it will go up on github. Cheers, Josh On Mon, Jul 23, 2012 at 6:07 AM, Andrew Miles wrote: > Thanks for the helpful explanation. > > As to your question, I sometimes use lavaan to fit univariate regressions > simply because it can handle missing data using FIML rather than listwise > deletion. Are there reasons to avoid this? > > BTW, thanks for the update in the development version. > > Andrew Miles > > > On Jul 21, 2012, at 12:59 PM, yrosseel wrote: > >> On 07/20/2012 10:35 PM, Andrew Miles wrote: >>> Hello! >>> >>> I am trying to reproduce (for a publication) analyses that I ran >>> several months ago using lavaan, I'm not sure which version, probably >>> 0.4-12. 
A sample model is given below: >>> >>> pathmod='mh30days.log.w2 ~ mh30days.log + joingroup + leavegroup + >>> alwaysgroup + grp.partic.w2 + black + age + bivoc + moved.conf + >>> local.noretired + retired + ds + ministrytime + hrswork + >>> nomoralescore.c + negint.c + cong.conflict.c + nomoraleXjoin + >>> nomoraleXleave + nomoraleXalways + negintXjoin + negintXleave + >>> negintXalways + conflictXjoin + conflictXleave + conflictXalways ' >>> mod1 = sem(pathmod, data=sampledat, missing="fiml", se="robust") >>> >>> At the time, the model ran fine. Now, using version 0.4-14, the >>> model returns all 0's for coefficients. >> >> What happened is that since 0.4-14, lavaan tries to 'detect' models that are >> just univariate regression, and internally calls lm.fit, instead of the >> lavaan estimation engine, at least when the missing="ml" argument is NOT >> used. (BTW, I fail to understand why you would use lavaan if you just want >> to fit a univariate regression). >> >> When missing="ml" is used, lavaan normally checks if you have fixed x >> covariates (which you do), and if fixed.x=TRUE (which is the default). In >> 0.4, lavaan internally switches to fixed.x=FALSE (which implicitly assumes >> that all your predictors are continuous, but I assume you would not using >> missing="ml" otherwise). Unfortunately, for the 'special' case of univariate >> regression, it fails to do this. This behavior will likely change in 0.5, >> where, by default, only endogenous/dependent variables will be handled by >> missing="ml", not exogenous 'x' covariates. >> >> To fix it: simply add the fixed.x=FALSE argument, or revert to 0.4-12 to get >> the old behavior. >> >> Hope this helps, >> >> Yves. >> http://lavaan.org >> >> >> > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. 
Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Last answer
My apologies. I stand corrected. Thanks Michael. Josh On Thu, Jul 19, 2012 at 10:29 PM, R. Michael Weylandt wrote: > See: https://stat.ethz.ch/pipermail/r-help/2012-February/303110.html > > Michael > > On Fri, Jul 20, 2012 at 12:19 AM, Joshua Wiley wrote: >> Hi David, >> >> No, but you can store the results and access that. >> >> ## parentheses to force printing >> (x <- 2 + 3) >> >> x >> >> Cheers, >> >> Josh >> >> >> On Thu, Jul 19, 2012 at 10:12 PM, darnold wrote: >>> Hi, >>> >>> In Matlab, I can access the last computation as follows: >>> >>>>> 2+3 >>> >>> ans = >>> >>> 5 >>> >>>>> ans >>> >>> ans = >>> >>> 5 >>> >>> Anything similar in R? >>> >>> David >>> >>> >>> >>> -- >>> View this message in context: >>> http://r.789695.n4.nabble.com/Last-answer-tp4637151.html >>> Sent from the R help mailing list archive at Nabble.com. >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> >> >> -- >> Joshua Wiley >> Ph.D. Student, Health Psychology >> Programmer Analyst II, Statistical Consulting Group >> University of California, Los Angeles >> https://joshuawiley.com/ >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Last answer
Hi David, No, but you can store the results and access that. ## parentheses to force printing (x <- 2 + 3) x Cheers, Josh On Thu, Jul 19, 2012 at 10:12 PM, darnold wrote: > Hi, > > In Matlab, I can access the last computation as follows: > >>> 2+3 > > ans = > > 5 > >>> ans > > ans = > > 5 > > Anything similar in R? > > David > > > > -- > View this message in context: > http://r.789695.n4.nabble.com/Last-answer-tp4637151.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] finding the values to minimize sum of functions
Hi Linh, Here is an approach: f <- function(v) { v <- v/sum(v) (v[1]^2) + (2 * v[2]^2) + (3*v[3]^2) } (res <- optim(c(.6, .3, .1), f)) res$par/sum(res$par) This is a downright lazy way to implement the constraint. The main idea is to combine all three functions into one function that takes a vector of parameters (v, in this case). Cheers, Josh On Thu, Jul 19, 2012 at 10:24 AM, Linh Tran wrote: > Hi fellow R users, > > I am desperately hoping there is an easy way to do this in R. > > Say I have three functions: > > f(x) = x^2 > f(y) = 2y^2 > f(z) = 3z^2 > > constrained such that x+y+z=c (let c=1 for simplicity). > > I want to find the values of x,y,z that will minimize f(x) + f(y) + f(z). > > I know I can use the optim function when there is only one function, but > don't know how to set it up when there are three. > > I would also like to apply this to higher dimensions (i.e. for more than > three functions) if possible. > > Thank you for all your help! > > -- > Kind regards, > > Linh Tran, MPH > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
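Josh's renormalization trick enforces the constraint lazily; the constraint can also be handled directly by projecting the gradient onto the plane sum(v) = 1 (subtracting the gradient's mean keeps every iterate feasible). A sketch in plain Python for f(x, y, z) = x^2 + 2y^2 + 3z^2; by Lagrange multipliers the optimum is (6/11, 3/11, 2/11), which the iteration should approach:

```python
def minimize_on_plane(weights, steps=5000, lr=0.05):
    """Minimize sum(w_i * v_i^2) subject to sum(v_i) = 1 by projected gradient."""
    n = len(weights)
    v = [1.0 / n] * n                      # feasible starting point on the plane
    for _ in range(steps):
        g = [2.0 * w * x for w, x in zip(weights, v)]
        gbar = sum(g) / n                  # remove the component normal to the plane
        v = [x - lr * (gi - gbar) for x, gi in zip(v, g)]
    return v

v = minimize_on_plane([1.0, 2.0, 3.0])
# v approaches (6/11, 3/11, 2/11), i.e. roughly (0.545, 0.273, 0.182)
```

Because the projected gradient sums to zero, every step stays exactly on the constraint plane; this also generalizes directly to more than three functions, as Linh asked.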
Re: [R] complexity of operations in R
also rep.int()

> system.time(for (i in 1:1000) x <- rep.int(FALSE, 10))
   user  system elapsed
   0.29    0.02    0.29
> system.time(for (i in 1:1000) x <- rep(FALSE, 10))
   user  system elapsed
   1.96    0.08    2.05

On Thu, Jul 19, 2012 at 9:11 AM, Bert Gunter wrote:
> Hadley et. al:
>
> Indeed. And using a loop is a poor way to do it anyway.
>
> v <- as.list(rep(FALSE,dotot))
>
> is way faster.
>
> -- Bert
>
> On Thu, Jul 19, 2012 at 8:50 AM, Hadley Wickham wrote:
>> On Thu, Jul 19, 2012 at 8:02 AM, Jan van der Laan wrote:
>>> Johan,
>>>
>>> Your 'list' and 'array doubling' code can be written much more efficient.
>>>
>>> The following function is faster than your g and easier to read:
>>>
>>> g2 <- function(dotot) {
>>>   v <- list()
>>>   for (i in seq_len(dotot)) {
>>>     v[[i]] <- FALSE
>>>   }
>>> }
>>
>> Except that you don't need to pre-allocate lists...
>>
>> Hadley
>>
>> --
>> Assistant Professor / Dobelman Family Junior Chair
>> Department of Statistics / Rice University
>> http://had.co.nz/
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- Joshua Wiley Ph.D. 
Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] easy way to fit saturated model in sem package?
Dear John,

Thanks very much for the reply. Looking at the optimizers, I had thought that objectiveML did what I wanted. I appreciate the clarification.

I think that multiple imputation is more flexible in some ways because you can easily create different models for every variable. At the same time, if the assumptions hold, FIML is equivalent to multiple imputation, and considerably more convenient. Further, I suspect that in many circumstances, either option is equal to or better than listwise deletion.

In my case, I am working on some tools primarily for data exploration in a SEM context (some characteristics of individual variables and then covariance/correlation matrices, clustering, etc.) and hoped to include listwise/pairwise/FIML as options. I will check out the lavaan package.

Thanks again for your time,

Josh

On Thu, Jul 12, 2012 at 8:20 AM, John Fox wrote:
> Dear Joshua,
>
> If I understand correctly what you want to do, the sem package won't do it.
> That is, the sem() function won't do what often is called FIML estimation
> for models with missing data. I've been thinking about implementing this
> feature, and don't think that it would be too difficult, but I can't promise
> when and if I'll get to it. You might also take a look at the lavaan
> package.
>
> As well, I must admit to some skepticism about the FIML estimator, as
> opposed to approaches such as multiple imputation of missing data. I suspect
> that the former is more sensitive than the latter to the assumption of
> multinormality. 
> > Best, > John > > > John Fox > Senator William McMaster > Professor of Social Statistics > Department of Sociology > McMaster University > Hamilton, Ontario, Canada > http://socserv.mcmaster.ca/jfox > > > > >> -----Original Message- >> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- >> project.org] On Behalf Of Joshua Wiley >> Sent: July-12-12 2:53 AM >> To: r-help@r-project.org >> Cc: John Fox >> Subject: [R] easy way to fit saturated model in sem package? >> >> Hi, >> >> I am wondering if anyone knows of an easy way to fit a saturated model >> using the sem package on raw data? Say the data were: >> >> mtcars[, c("mpg", "hp", "wt")] >> >> The model would estimate the three means (intercepts) of c("mpg", "hp", >> "wt"). The variances of c("mpg", "hp", "wt"). The covariance of mpg >> with hp and wt and the covariance of hp with wt. >> >> I am interested in this because I want to obtain the MLE mean vector >> and covariance matrix when there is missing data (i.e., the sum of the >> case wise likelihoods or so-called full information maximum >> likelihood). Here is exemplary missing data: >> >> dat <- as.matrix(mtcars[, c("mpg", "hp", "wt")]) >> dat[sample(length(dat), length(dat) * .25)] <- NA dat <- >> as.data.frame(dat) >> >> It is not too difficult to write a wrapper that does this in the OpenMx >> package because you can easily define paths using vectors and get all >> pairwise combinations using: >> >> combn(c("mpg", "hp", "wt"), 2) >> >> but I would prefer to use the sem package, because OpenMx does not work >> on 64 bit versions of R for Windows x64 and is not available from CRAN >> presently. Obviously it is not difficult to write out the model, but I >> am hoping to bundle this in a function that for some arbitrary data, >> will return the FIML estimated covariance (and correlation matrix). 
>> Alternately, if there are any functions/packages that just return FIML >> estimates of a covariance matrix from raw data, that would be great >> (but googling and using findFn() from the sos package did not turn up >> good results). >> >> Thanks! >> >> Josh >> >> >> -- >> Joshua Wiley >> Ph.D. Student, Health Psychology >> Programmer Analyst II, Statistical Consulting Group University of >> California, Los Angeles https://joshuawiley.com/ >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting- >> guide.html >> and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
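The listwise and pairwise options mentioned in the reply are available directly from base R's cov(); here is a minimal sketch on simulated missing data, following the example in the original post (FIML itself would require an SEM package such as the lavaan package suggested above):

```r
## toy data with roughly 25% missing, as in the original post
set.seed(1)
dat <- as.matrix(mtcars[, c("mpg", "hp", "wt")])
dat[sample(length(dat), length(dat) * 0.25)] <- NA
dat <- as.data.frame(dat)

## listwise deletion: only rows with no missing values are used
cov_listwise <- cov(dat, use = "complete.obs")

## pairwise deletion: each covariance is computed from the cases
## observed on that particular pair of variables
cov_pairwise <- cov(dat, use = "pairwise.complete.obs")

cov_listwise
cov_pairwise
```
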
[R] easy way to fit saturated model in sem package?
Hi, I am wondering if anyone knows of an easy way to fit a saturated model using the sem package on raw data? Say the data were: mtcars[, c("mpg", "hp", "wt")] The model would estimate the three means (intercepts) of c("mpg", "hp", "wt"). The variances of c("mpg", "hp", "wt"). The covariance of mpg with hp and wt and the covariance of hp with wt. I am interested in this because I want to obtain the MLE mean vector and covariance matrix when there is missing data (i.e., the sum of the case wise likelihoods or so-called full information maximum likelihood). Here is exemplary missing data: dat <- as.matrix(mtcars[, c("mpg", "hp", "wt")]) dat[sample(length(dat), length(dat) * .25)] <- NA dat <- as.data.frame(dat) It is not too difficult to write a wrapper that does this in the OpenMx package because you can easily define paths using vectors and get all pairwise combinations using: combn(c("mpg", "hp", "wt"), 2) but I would prefer to use the sem package, because OpenMx does not work on 64 bit versions of R for Windows x64 and is not available from CRAN presently. Obviously it is not difficult to write out the model, but I am hoping to bundle this in a function that for some arbitrary data, will return the FIML estimated covariance (and correlation matrix). Alternately, if there are any functions/packages that just return FIML estimates of a covariance matrix from raw data, that would be great (but googling and using findFn() from the sos package did not turn up good results). Thanks! Josh -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to compare stacked histograms/datasets
Hi,

Sure, you could do a qqplot for each variable between two datasets. In a 2d graph, it will be hard to reasonably compare more than 2 datasets (you can put many such graphs on a single page, but it would be pairwise sets of comparisons, I think). Perhaps you could plot multiple qqplots on top of each other, varying the points by colour for the different data sets? I have not seen anything like this before, so I suppose it depends what helps you understand your data.

Cheers,

Josh

On Sat, Jul 7, 2012 at 3:25 PM, Atulkakrana wrote:
> Hello Joshua,
>
> Thanks for taking time out to help me with problem. Actually the comparison
> is to be done among two (if possible, more than two) datasets and not within
> the dataset. Each dataset hold 5 variables (i.e Red, Purple, Blue, Grey and
> Yellow) for 21 different positions i.e 1-21n. So, we have 5 values for each
> position (total 21) that make a single dataset or stacked histogram (Plot in
> original post).
>
> Initially I was comparing datasets by plotting stacked histograms for each
> and analyzing them visually. But that doesn't give a statistical idea of how
> similar or different the datasets are. Therefore, I want to evaluate the
> datasets in order to quantify their difference/similarity. So, end result
> would be a plot showing similarity/difference among two or more datasets.
>
> Example datasets: http://pastebin.com/iYj1RNvt
>
> Does the method you explained can be applied to multiple datasets? Can a
> qqplot be obtained in such a case?
>
> Awaiting your reply
>
> Thanks
>
> Atul
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/How-to-compare-stacked-histograms-datasets-tp4635668p4635744.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code. 
-- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
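The pairwise qqplot idea above can be sketched in base R; the two variables here are simulated stand-ins for a column from each of the poster's datasets:

```r
## two hypothetical variables from two datasets
set.seed(42)
x <- rnorm(100, mean = 0)
y <- rnorm(100, mean = 0.5)

## compute the quantile-quantile pairs without drawing to a device;
## points near the line y = x indicate similar distributions, while a
## constant vertical offset indicates a location shift
qq <- qqplot(x, y, plot.it = FALSE)
head(cbind(qq$x, qq$y))
```
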
Re: [R] Notation for previous observation in a data frame
Hi Ernie,

I'll use the built-in mtcars dataset for demonstrative purposes. You have some condition, which can be used to create an index:

## show data frame
mtcars

## create index based on your condition
i <- which(mtcars$carb == 1)

## set those rows of mtcars in your index
## to the index - 1
mtcars[i, ] <- mtcars[i - 1, ]

## show resulting data frame
mtcars

Note that if your condition grabs the first row, this will wreak havoc.

Cheers,

Josh

On Sat, Jul 7, 2012 at 8:20 PM, Ernie Tedeschi wrote:
> I've created a data frame in R, but in order to clean up some of the data,
> I need to set certain variable observations equal to the value of their
> previous observation (it would be conditional, but that part's less
> important right now). In Stata, I would simply set var = var[_n-1] in those
> cases. What is the R equivalent?
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
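A minimal, self-contained sketch of the same idea on a toy vector (the column name and the -99 "condition" are invented for illustration), including the first-row guard mentioned at the end of the reply:

```r
## toy data frame; -99 marks values to replace with the previous one
d <- data.frame(id = 1:6, value = c(10, 20, -99, 40, -99, 60))

## rows where the value should be replaced by the previous row's value
i <- which(d$value == -99)

## guard against the condition matching the first row,
## which has no previous observation
i <- i[i > 1]
d$value[i] <- d$value[i - 1]

d$value
# 10 20 20 40 40 60
```
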
Re: [R] How to compare stacked histograms/datasets
Hi,

Probably easier to work with the raw data, but whatever. If your data is in a data frame, dat,

## create row index
dat$x <- 1:21

## load packages
require(ggplot2)
require(reshape2)

## melt the data frame to be long, long dat, ldat for short
ldat <- melt(dat, id.vars = "x")

## plot the distributions
ggplot(ldat, aes(x, value, colour = variable)) + geom_line()

## they don't really look on the same scale
## we could scale the data first to have equal mean and variance
dat2 <- as.data.frame(scale(dat))

## remake index so it is not scaled
dat2$x <- 1:21

ldat2 <- melt(dat2, id.vars = "x")
ggplot(ldat2, aes(x, value, colour = variable)) + geom_line()

which yields the attached PDF (maybe scrubbed on the official list as most file extensions are, but should go through to you personally via gmail). I'm not sure it's the greatest approach ever, but it gives you a sense if they go up and down together or at different points.

Cheers,

Josh

On Fri, Jul 6, 2012 at 1:55 PM, Atulkakrana wrote:
> Hello All,
>
> I have a couple of stacked histograms which I need to compare/evaluate for
> similarity or difference. 
> http://r.789695.n4.nabble.com/file/n4635668/Selection_011.png > > I believe rather than evaluating histograms is will be east to work with > dataset used to plot these stacked histograms, which is in format: > > RED PURPLE BLUE > GREY YELLOW > 22.0640569395 16.9483985765 0 60.9875444840 > 8.18505338088.85231316730 82.9626334520 > 6.85053380786.89501779360.756227758 85.4982206406 0.5338078292 > 6.76156583635.24911032031.645907473386.3434163701 0.6672597865 > 5.82740213527.384341637 2.135231316784.6530249111.1565836299 > 7.87366548046.628113879 1.556939501883.9412811388 1.2010676157 > 7.16192170828.18505338081.245551601483.4074733096 1.3790035587 > 5.560498220610.2758007117 1.067615658483.0960854093 1.0231316726 > 7.11743772247.60676156580.711743772284.5640569395 0.756227758 > 7.87366548043.95907473310.667259786587.50.3113879004 > 7.65124555167.87366548040.533807829283.9412811388 0.5338078292 > 7.60676156588.98576512461.467971530281.9395017794 0.3558718861 > 8.94128113888.00711743771.379003558781.6725978648 0.5782918149 > 19.0836298932 9.20818505342.135231316769.5729537367 1.3790035587 > 14.9911032028 11.0765124555 3.202846975170.7295373665 1.0676156584 > 15.3914590747 10.8985765125 3.024911032 70.6850533808 1.2900355872 > 17.4822064057 12.5444839858 2.491103202867.4822064057 1.334519573 > 15.8362989324 13.0338078292 2.001779359469.1281138791.334519573 > 17.03736654810.4537366548 2.402135231370.1067615658 1.2010676157 > 20.2846975089 10.0088967972 0 69.7064056941.0676156584 > 28.7366548043 12.6334519573 0 58.6298932384 0 > > Is there any possible way I can compare such dataset from multiple > experiments (n=8) and visually show (plot) that these datasets are in > consensus or differ from each other? > > Awaiting reply, > > Atul > > > -- > View this message in context: > http://r.789695.n4.nabble.com/How-to-compare-stacked-histograms-datasets-tp4635668.html > Sent from the R help mailing list archive at Nabble.com. 
> > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ plots.pdf Description: Adobe PDF document __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] number of decimal places in a number?
Hi Martin,

Ted is spot on about the binary representation. A very different approach from his would be to convert to character and use regular expressions:

## the example numbers in a vector
x <- c(3.14, 3.142, 3.1400, 123456.123456789,
       123456789.123456789, pi, sqrt(2))

nchar(gsub("(.*\\.)|([0]*$)", "", as.character(x)))

which for me returns:

[1] 2 3 2 9 6 14 13

an advantage of this approach is that for numbers like 123456789.123456789, although R cannot represent it properly as a binary number, the character string is totally fine.

nchar(gsub("(.*\\.)|([0]*$)", "", "123456789.123456789"))

returns 9.

Essentially the expression looks for anything (the period) zero or more times (the *) followed by an actual period (the \\.) OR 0 repeated zero or more times at the end of the string, and replaces all of those with nothing (the ""), and then returns the result, the number of characters of which is counted by nchar(). See ?regex for details.

Cheers,

Josh

On Sat, Jul 7, 2012 at 3:04 AM, Ted Harding wrote:
> On 07-Jul-2012 08:52:35 Martin Ivanov wrote:
>> Dear R users,
>>
>> I need a function that gets a number and returns its number of
>> actual decimal places.
>> For example f(3.14) should return 2, f(3.142) should return 3,
>> f(3.1400) should also return 2 and so on. Is such function already
>> available in R? If not, could you give me a hint how to achieve that?
>>
>> Many thanks in advance.
>
> I'm not aware of such a function in R. In any case, it will be
> a tricky question to solve in full generality, since R stores
> numbers internally in a binary representation and the exact
> conversion of this representation to a decimal number may not
> match the exact value of the decimal representation of the
> original number.
>
> In particular, a number entered as a decimal representation
> from the keyboard, or read as such from a text file, may not
> be exactly matched by the internal representation in R. 
> > However, that said, the following function definition seems to > do what you are asking for, for cases such as you list: > > f<-function(x) {min(which( x*10^(0:20)==floor(x*10^(0:20)) )) - 1} > > f(3.14) > # [1] 2 > f(3.142) > # [1] 3 > f(3.1400) > # [1] 2 > > > > Note, however: > > f(123456.123456789) > # [1] 9 > > f(123456789.123456789) > #[1] 7 > > (a consequence of the fact that R does not have enough binary > digits in its binary representation to accommodate the precision > in all the decimal digits of 123456789.123456789 -- not that it > can do that exactly anyway in binary, no matter how many binary > digits it had available). > > Similarly: > > f(pi) > # [1] 15 > f(sqrt(2)) > # [1] 16 > > which is a consequence of the fact that 2 < pi < 4, while > 1 < sqrt(2) < 2, so the binary representation of pi needs > 1 more binary digit for its integer part than sqrt(2) does, > which it therefore has to "steal" from the fractional part. > > Hoping this helps, > Ted. > > - > E-Mail: (Ted Harding) > Date: 07-Jul-2012 Time: 11:04:26 > This message was sent by XFMail > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
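Josh's regular-expression approach can be wrapped into a small function (the name n_decimals is arbitrary); note it inherits as.character()'s default 15-significant-digit conversion, so Ted's caveats about the binary representation still apply to numeric input, while character input sidesteps them:

```r
## count decimal places by stripping everything up to the decimal
## point, plus any trailing zeros, then counting what remains
n_decimals <- function(x) {
  nchar(gsub("(.*\\.)|([0]*$)", "", as.character(x)))
}

n_decimals(3.14)    # 2
n_decimals(3.142)   # 3
n_decimals(3.1400)  # 2
## passing the number as a string avoids the binary-representation issue
n_decimals("123456789.123456789")  # 9
```
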
Re: [R] differences between survival models between STATA and R
Hi J, You have not provided nearly enough information for us to evaluate whether the results should be similar. You are talking about two completely different packages, with no information on your data, only a small amount of information about your model ("the same analysis") but clearly the analysis is *not* the same or you would get the same results, so really you have "I think I am running the same analysis but either the underlying code/model, the data, or my syntax is actually running a different model and what I really want is someone to help me synchronize the results from R and Stata so I feel more confident, but please do this without data, code, or output". Please do not take this the wrong way, I am not trying to be harsh, just point out the difficulty of the question you have asked us. I primarily use R, but I work at a consulting center and have access to Stata and some interest in survival models, so if you sent us your data (or found/created some example data that replicated the discrepancy you note), I would try to check out both your R and Stata code, but I can only do this with some help from you. Otherwise, I have to find my own data, examine the functions in R and Stata to compare what they are doing, and try many tests on my own to hopefully try to replicate what you might be seeing and then see if I can actually get the same output. Perhaps a more saintly version of myself would do that, but unlike my officemate, there is no special place waiting for me in heaven, or perhaps I have yet to find my wings. More detailed help provided when you provide a reproducible example as the posting guide requests. Cheers, Josh On Fri, Jul 6, 2012 at 2:12 PM, JPF wrote: > Dear Community, > > I have been using two types of survival programs to analyse a data set. > > The first one is an R function called aftreg. The second one an STATA > function called streg. > > Both of them include the same analyisis with a weibull distribution. 
Yet, > results are very different. > > Shouldn't the results be the same? > > Kind regards, > J > > -- > View this message in context: > http://r.789695.n4.nabble.com/differences-between-survival-models-between-STATA-and-R-tp4635670.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html it's this bit right here I am referring to: > and provide commented, minimal, self-contained, reproducible code. --> ^ -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] remove loop which compares row i to row i-1
Hi, Great chance to practice debugging. Rather than trying one complicated statement, break it into pieces. The basic structure is: results <- ifelse(condition, value if true, value if false) Each argument needs to be a vector of the same length. In your case, condition itself consists of two vectors: v1 >= v2 so try creating all four vectors and making sure they are what you want and are the appropriate length. Then: results <- ifelse(v1 >= v2, VIT, VIF) will work. Cheers, Josh On Thu, Jul 5, 2012 at 3:52 PM, jcrosbie wrote: > Thank you, > > I tired > > ifelse(tUnitsort[length(tUnitsort$Hour),4]>=tUnitsort[-1,4],(tempMC > =tUnitsort[length(tUnitsort$Hour),7]),tempMC ) > > But this doesn't seem to work. > > Where am I going wrong? > > -- > View this message in context: > http://r.789695.n4.nabble.com/remove-loop-which-compares-row-i-to-row-i-1-tp4635327p4635566.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
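A minimal sketch of the vectorized pattern described above, on invented data (the original poster's tUnitsort data is not available, so the column names here are hypothetical):

```r
## hypothetical data: compare each row's price to the previous row's
d <- data.frame(hour = 1:5, price = c(3, 5, 4, 4, 7))

## previous observation, with NA for the first row
prev <- c(NA, d$price[-nrow(d)])

## vectorized replacement for the loop: the condition and both
## result values are vectors of the same length
flag <- ifelse(d$price >= prev, "up-or-flat", "down")
flag
# NA "up-or-flat" "down" "up-or-flat" "up-or-flat"
```
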
Re: [R] vector entry in matix
Hi Thomas,

This is non-trivial to do, but if you will be working with this sort of data and are inclined to do some programming, you might consider creating a new class. S4 classes and methods are quite flexible, and you can allow them to lean on or inherit from existing classes such as matrices.

For example, your class might consist of three lists and a data frame. The length of each list would be forced to be equal to the number of rows in the data frame. Each element of each list would support a length 2 numeric vector (or integer vector or whatever it is you need). The lists then would hold nodes 1 - 3, and the data frame would hold other information. You can similarly write methods so that for your special class, d[1, 1] would return c(x1, y1). Essentially you create a mix of lists and a matrix with all the necessary constraints, and you develop methods that allow you to operate on this in a way that is sensible for your data.

This is a stab in the dark, but if you happen to be dealing with social network style data (triangles and neighbours make me think of that), you should investigate existing packages for that as they may already have their own classes/have a way they expect data to be structured. Off the top of my head, the package "sna" comes to mind as a starting place to look (though I am not personally very familiar with it).

Cheers,

Josh

On Thu, Jul 5, 2012 at 9:37 AM, Thomas C. wrote:
> well thank you, i think we get closer to my problem. but i meant it in a
> different way. maybe i describe what i intended to do:
>
> i have number of triangles which i'd like to store in a list, matrix,...etc.
> i thought it could look sth. like that:
>
> triangle  node1     node2     node3     .
> 1         c(x1,y1)  c(x2,y3)  c(x3,y3)
> 1         c(x4,y4)  c(x5,y5)  c(x6,y6)
> .
> .
> . 
> > i know that i could just make more rows for the x/y values but the problem > is, that i need to store more than these properties (say the neighbours of > these nodes and their coordinates) and i thought it would be more convenient > to save these as vectors. it was just a thought... is this possible? > > > > -- > View this message in context: > http://r.789695.n4.nabble.com/vector-entry-in-matix-tp4635478p4635509.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
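A minimal S4 sketch of the idea in the reply; the class name, slot names, and validity rule are invented for illustration, and a real implementation would add accessor methods (e.g. a `[` method returning c(x1, y1)):

```r
library(methods)

## one list per node; each list element is a length-2 coordinate
## vector, and each list must have one element per row of info
setClass("TriangleSet",
  representation(node1 = "list", node2 = "list", node3 = "list",
                 info = "data.frame"),
  validity = function(object) {
    n <- nrow(object@info)
    ok <- all(lengths(object@node1) == 2) &&
      length(object@node1) == n &&
      length(object@node2) == n &&
      length(object@node3) == n
    if (ok) TRUE else "each node list needs one length-2 vector per row of info"
  })

tri <- new("TriangleSet",
  node1 = list(c(0, 0), c(1, 1)),
  node2 = list(c(1, 0), c(2, 1)),
  node3 = list(c(0, 1), c(1, 2)),
  info  = data.frame(id = 1:2))

tri@node1[[1]]  # coordinates of node 1 of the first triangle
```
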
Re: [R] GEE with Inverse Probability Weights
Hi Frank, It clusters by twin, that is why in Dr. Lumley's example, the "id" was twin pair, not individual, and the SE is adjusted accordingly. Cheers, Josh On Thu, Jul 5, 2012 at 12:10 PM, RFrank wrote: > Thanks -- extremely helpful. But what is the mechanism by which this > analysis corrects for the fact that my subjects are clustered (twins)? > > -- > View this message in context: > http://r.789695.n4.nabble.com/GEE-with-Inverse-Probability-Weights-tp4633172p4635533.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] loop for regression
Apologies to Bert, I see now that "market" is a variable in the dataset, not a generic name for market indicators. lm() with a matrix as the outcome seems the best approach here. Josh On Wed, Jul 4, 2012 at 12:12 PM, Joshua Wiley wrote: > On Wed, Jul 4, 2012 at 11:48 AM, Bert Gunter wrote: >> Please carefully read ?lm. As I previously told the OP, no looping/apply is >> necessary. The left hand side of the lm formula can be a matrix for which >> separate fits will be done on each column automatically. > > Which is a great option if the design matrix is constant, but suppose > you want to predict each stock from every other stock in a dataset? > This is a one liner with lapply: > > lapply(colnames(mtcars), function(n) lm(substitute(y ~ ., list(y = > as.name(n))), data = mtcars)) > > granted, not the prettiest or most efficient thing on the planet (and > falls apart for some reason with fastLm(), which I am still > investigating work arounds to). The matrix outcome to lm() approach: > > lm(as.matrix(mtcars) ~ ., data = mtcars) > > yeilds perfect explanation by the variable of itself, as expected, > which is not really useful. The OP did not give many details other > than "write a for loop". It is not clear what should be varying. If > it is *just* the outcome, you are absolutely right, giving lm a matrix > seems the most sensible route. 
> > Cheers, > > Josh > >> >> -- Bert >> >> On Wed, Jul 4, 2012 at 9:44 AM, arun wrote: >> >>> >>> >>> Hi, >>> >>> You could also use: >>> dat1 <- read.table(text=" >>> >>> Date Stock1 Stock2 Stock3Market >>> 01/01/20001 2 34 >>> 01/02/20005 6 78 >>> 01/03/20001 2 34 >>> 01/04/20005 6 78 >>> ", header=TRUE, stringsAsFactors=FALSE) >>> >>> Stocks<-dat1[,2:4] >>> apply(Stocks,2,function(x) lm(x~Market,data=dat1)) >>> $Stock1 >>> >>> Call: >>> lm(formula = x ~ Market, data = dat1) >>> >>> Coefficients: >>> (Intercept) Market >>> -31 >>> >>> >>> $Stock2 >>> >>> Call: >>> lm(formula = x ~ Market, data = dat1) >>> >>> Coefficients: >>> (Intercept) Market >>> -21 >>> >>> >>> $Stock3 >>> >>> Call: >>> lm(formula = x ~ Market, data = dat1) >>> >>> Coefficients: >>> (Intercept) Market >>> -11 >>> >>> A.K. >>> >>> >>> >>> >>> - Original Message - >>> From: Akhil dua >>> To: r-help@r-project.org >>> Cc: >>> Sent: Wednesday, July 4, 2012 1:08 AM >>> Subject: [R] loop for regression >>> >>> -- Forwarded message -- >>> From: Akhil dua >>> Date: Wed, Jul 4, 2012 at 10:33 AM >>> Subject: >>> To: r-help@r-project.org >>> >>> >>> Hi everyone I >>> have data on stock prices and market indices >>> >>> and I need to run a seperate regression of every stock on market >>> so I want to write a "for loop" so that I wont have to write codes again >>> and again to run the regression... >>> my data is in the format given below >>> >>> >>> >>> Date Stock1 Stock2 Stock3Market >>> 01/01/2000 1 2 3 4 >>> 01/02/2000 5 6 7 8 >>> 01/03/2000 1 2 3 4 >>> 01/04/2000 5 6 7 8 >>> >>> >>> So can any one help me how to write this loop >>> >>> [[alternative HTML version deleted]] >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. 
>>> >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read
Re: [R] loop for regression
On Wed, Jul 4, 2012 at 11:48 AM, Bert Gunter wrote: > Please carefully read ?lm. As I previously told the OP, no looping/apply is > necessary. The left hand side of the lm formula can be a matrix for which > separate fits will be done on each column automatically. Which is a great option if the design matrix is constant, but suppose you want to predict each stock from every other stock in a dataset? This is a one-liner with lapply: lapply(colnames(mtcars), function(n) lm(substitute(y ~ ., list(y = as.name(n))), data = mtcars)) granted, not the prettiest or most efficient thing on the planet (and falls apart for some reason with fastLm(), for which I am still investigating workarounds). The matrix outcome to lm() approach: lm(as.matrix(mtcars) ~ ., data = mtcars) yields perfect explanation of each variable by itself, as expected, which is not really useful. The OP did not give many details other than "write a for loop". It is not clear what should be varying. If it is *just* the outcome, you are absolutely right, giving lm a matrix seems the most sensible route. Cheers, Josh > > -- Bert > > On Wed, Jul 4, 2012 at 9:44 AM, arun wrote: > >> >> >> Hi, >> >> You could also use: >> dat1 <- read.table(text=" >> >> Date Stock1 Stock2 Stock3 Market >> 01/01/2000 1 2 3 4 >> 01/02/2000 5 6 7 8 >> 01/03/2000 1 2 3 4 >> 01/04/2000 5 6 7 8 >> ", header=TRUE, stringsAsFactors=FALSE) >> >> Stocks<-dat1[,2:4] >> apply(Stocks,2,function(x) lm(x~Market,data=dat1)) >> $Stock1 >> >> Call: >> lm(formula = x ~ Market, data = dat1) >> >> Coefficients: >> (Intercept) Market >> -3 1 >> >> >> $Stock2 >> >> Call: >> lm(formula = x ~ Market, data = dat1) >> >> Coefficients: >> (Intercept) Market >> -2 1 >> >> >> $Stock3 >> >> Call: >> lm(formula = x ~ Market, data = dat1) >> >> Coefficients: >> (Intercept) Market >> -1 1 >> >> A.K. 
>> >> >> >> >> - Original Message - >> From: Akhil dua >> To: r-help@r-project.org >> Cc: >> Sent: Wednesday, July 4, 2012 1:08 AM >> Subject: [R] loop for regression >> >> -- Forwarded message -- >> From: Akhil dua >> Date: Wed, Jul 4, 2012 at 10:33 AM >> Subject: >> To: r-help@r-project.org >> >> >> Hi everyone, I >> have data on stock prices and market indices >> >> and I need to run a separate regression of every stock on the market, >> so I want to write a "for loop" so that I won't have to write the code again >> and again to run the regression... >> my data is in the format given below >> >> >> >> Date Stock1 Stock2 Stock3 Market >> 01/01/2000 1 2 3 4 >> 01/02/2000 5 6 7 8 >> 01/03/2000 1 2 3 4 >> 01/04/2000 5 6 7 8 >> >> >> So can anyone help me write this loop? >> >> [[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > > -- > > Bert Gunter > Genentech Nonclinical Biostatistics > > Internal Contact Info: > Phone: 467-7374 > Website: > http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. 
Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
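The two approaches debated in this thread can be put side by side (a sketch using the built-in mtcars data rather than the OP's stock data):

```r
# Matrix outcome: one lm() call fits a separate model per column,
# all sharing the same right-hand side (see ?lm).
fit_all <- lm(cbind(mpg, hp) ~ cyl + vs, data = mtcars)
coef(fit_all)  # one column of coefficients per outcome

# lapply() + substitute(): the outcome varies, and "." expands to
# every remaining column, so each variable is predicted from all others.
fits <- lapply(colnames(mtcars), function(n)
  lm(substitute(y ~ ., list(y = as.name(n))), data = mtcars))
names(fits) <- colnames(mtcars)
```

The first form is the right tool when only the outcome changes; the second when the design matrix must change per model.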
Re: [R] loop for regression
Hi, A few comments. First, a for loop is probably not optimally efficient. Consider instead (using a built-in example dataset): lm(cbind(mpg, hp) ~ cyl + vs, data = mtcars) which gives: Call: lm(formula = cbind(mpg, hp) ~ cyl + vs, data = mtcars) Coefficients: mpg hp (Intercept) 39.6250 -15.6279 cyl -3.0907 27.5843 vs -0.9391 -19.1148 i.e., the same predictors used on both outcomes. Note that this is substantially faster than running the models separately. See ?lm for details. If you need to run separate models (e.g., the predictors are changing), and you have many models and a lot of data (which would not be surprising when working with stock data), consider using the RcppEigen package. You can get it by: install.packages("RcppEigen") require(RcppEigen) # load package It has a function called fastLm which is orders of magnitude faster than lm() and works almost identically: lapply(mtcars[, c("mpg", "hp")], function(x) fastLm(X = cbind(Int = 1, mtcars[, c("cyl", "vs")]), y = x)) you just give it the design matrix (X) and the response vector (y); see ?fastLm. Cheers, Josh On Tue, Jul 3, 2012 at 10:08 PM, Akhil dua wrote: > -- Forwarded message -- > From: Akhil dua > Date: Wed, Jul 4, 2012 at 10:33 AM > Subject: > To: r-help@r-project.org > > > Hi everyone, I > have data on stock prices and market indices > > and I need to run a separate regression of every stock on the market, > so I want to write a "for loop" so that I won't have to write the code again > and again to run the regression... > my data is in the format given below > > > > Date Stock1 Stock2 Stock3 Market > 01/01/2000 1 2 3 4 > 01/02/2000 5 6 7 8 > 01/03/2000 1 2 3 4 > 01/04/2000 5 6 7 8 > > > So can anyone help me write this loop? > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. 
Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with lmer formula
Hi Camila, In combined (mixed) equation form instead of multilevel form, it would be: Y_it = gamma_00 + gamma_10*X_it + gamma_11*W_it*X_it + (u_0t + u_1t*X_it + e_it) Your code seems reasonable. Note that the random intercept and slope will be correlated in your specification (the unstructured default; it is possible to force them to be uncorrelated, but the default is a sensible starting place): model <- lmer(Y ~ X + X:W + (X | ID), data = data) which gives: residual variance: e_it; variance of the intercept (constant term, gamma_00): u_0t; variance of the slope (gamma_10): u_1t; as well as overall estimates for the intercept, the slope of X, and the interaction of X and W. Bert is correct that R-sig-mixed-models is the more appropriate list, but many people read both and there is no reason it cannot be answered here. Cheers, Josh On Mon, Jul 2, 2012 at 6:47 PM, Camila Mendes wrote: > Hey all - > > I am a newbie on mixed-effects models. I want to estimate the following > model: > > Y_it = alpha_0t + alpha_1t*X_it + e_it > alpha_0t = gamma_00 + u_0t > alpha_1t = gamma_10 + gamma_11*W_it + u_1t > > Where Y is my outcome, X is my level-1 predictor, and W is my level-2 > predictor. > > I am not sure if I am doing it right. Is this the correct specification of > the formula? > > model = lmer(Y ~ X + X:W + ( X | ID), data = data) > > Also, can you help me to write down the combined model formula? I tried > by substituting on the first equation, but I got some weird interactions > with the residual (?) > > Thanks a lot! > > - Camila > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. 
Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
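The mapping from the notation above to lmer's output can be made explicit (a sketch; assumes the lme4 package and the poster's hypothetical data frame `data` with columns Y, X, W, and ID):

```r
library(lme4)

# Random intercept and random slope of X per ID, correlated by default.
model <- lmer(Y ~ X + X:W + (X | ID), data = data)

fixef(model)    # gamma_00 (intercept), gamma_10 (X), gamma_11 (X:W)
VarCorr(model)  # var(u_0), var(u_1), their correlation, and residual var(e)
```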
Re: [R] plot.prcomp() call/eval
On Fri, Jun 29, 2012 at 1:20 AM, Jessica Streicher wrote: > Hm.. i attached a file with the code, but it doesn't show up somehow.. Non-text files are scrubbed, and only certain file extensions are allowed (I forget which, but I know that .R is *not* allowed, although I think that .txt and maybe .log are). __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plot.prcomp() call/eval
On Thu, Jun 28, 2012 at 8:27 AM, Jessica Streicher wrote: > Thanks Josh, > > that soggy sandwich saves me a LOT of code by the way, > I'll keep it for the time being ;) There may be other ways. With no knowledge of your context, obviously I cannot say absolutely that there are better ways to save yourself code, but I can say that there often are. > > greetings > Jessica > > On 28.06.2012, at 17:15, Joshua Wiley wrote: > >> Hi Jessica, >> >> x <- call("plot", quote(pcaI)) >> eval(x) >> >> That said, I suspect you would be better off avoiding this idiom >> altogether. Storing unevaluated calls is akin to putting tomatoes on >> your sandwich before packing it for work---you can do it, but you end >> up with a soggy sandwich by the time you are eating it. Better to >> store the bread and tomato separately and put them together when you >> want a delicious sandwich. >> >> Cheers, >> >> Josh >> >> On Thu, Jun 28, 2012 at 8:03 AM, Jessica Streicher >> wrote: >>> Hi! >>> >>> I am getting a lot of numbers in the background of the PCA screeplots if I >>> use call("plot") and eval(somecall). >>> Till now, creating the calls and plotting later on this way worked fine. >>> Example: >>> >>> pcaI<-prcomp(iris[,1:4]) >>> plot(pcaI) >>> >>> x<-call("plot",pcaI) >>> eval(x) >>> >>> Anyone got an idea how I can avoid that? (also it might take a second or so >>> for the numbers to appear, so wait a bit before you tell me everything's >>> fine ^^) >>> [[alternative HTML version deleted]] >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> >> >> -- >> Joshua Wiley >> Ph.D. Student, Health Psychology >> Programmer Analyst II, Statistical Consulting Group >> University of California, Los Angeles >> https://joshuawiley.com/ > -- Joshua Wiley Ph.D. 
Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
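The sandwich metaphor can be made concrete: keep the function and its arguments apart and combine them with do.call() only at plot time (a sketch using the iris PCA from this thread):

```r
pcaI <- prcomp(iris[, 1:4])

# Store the ingredients, not a pre-assembled call().
f    <- "plot"
args <- list(pcaI)

# Assemble only at the moment of use -- no stray numbers in the plot,
# because the object is passed by value rather than embedded in the call.
do.call(f, args)
```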
Re: [R] plot.prcomp() call/eval
Hi Jessica, x <- call("plot", quote(pcaI)) eval(x) That said, I suspect you would be better off avoiding this idiom altogether. Storing unevaluated calls is akin to putting tomatoes on your sandwich before packing it for work---you can do it, but you end up with a soggy sandwich by the time you are eating it. Better to store the bread and tomato separately and put them together when you want a delicious sandwich. Cheers, Josh On Thu, Jun 28, 2012 at 8:03 AM, Jessica Streicher wrote: > Hi! > > I am getting a lot of numbers in the background of the PCA screeplots if I > use call("plot") and eval(somecall). > Till now, creating the calls and plotting later on this way worked fine. > Example: > > pcaI<-prcomp(iris[,1:4]) > plot(pcaI) > > x<-call("plot",pcaI) > eval(x) > > Anyone got an idea how I can avoid that? (also it might take a second or so > for the numbers to appear, so wait a bit before you tell me everything's fine > ^^) > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Large Test Datasets in R
Hi Ravi, My hunch would be "no" because it seems awfully inefficient. Packages are mirrored all over the world, and it seems rather silly to be mirroring, updating, etc. large datasets. The good news is that if you just want a 10,000 x 100,000 matrix of 0/1s, it is trivial to generate: X <- matrix(sample(0L:1L, 10^9, TRUE), nrow = 10^4) Even stored as integers, this is probably going to be around 4GB. If you want arbitrary values to later cut: X <- matrix(rnorm(10^9), nrow = 10^4) Cheers, Josh On Sun, Jun 24, 2012 at 7:08 AM, vioravis wrote: > I am looking for some large datasets (10,000 rows & 100,000 columns or vice > versa) to create some test sets. I am not concerned about the invidividual > elements since I will be converting them to binary (0/1) by using arbitrary > thresholds. > > Does any R package provide such big datasets? > > Also, what is the biggest text document collection available in R? tm > package seems to provide only 20 records from the Reuters dataset. Is there > any package that has 10,000+ documents?? > > Would appreciate any help on these. > > Thank you. > > Ravi > > -- > View this message in context: > http://r.789695.n4.nabble.com/Large-Test-Datasets-in-R-tp4634330.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
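Before committing to the full 10,000 x 100,000 matrix, the ~4 GB memory estimate can be checked at a smaller scale (a sketch; scale the dimensions back up once the cost looks acceptable):

```r
# 1/1000th of the target: 10^6 integers instead of 10^9.
X <- matrix(sample(0L:1L, 10^6, replace = TRUE), nrow = 10^2)

# Integers are 4 bytes each: ~4 MB here, so ~4 GB at full size.
print(object.size(X), units = "MB")
```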
Re: [R] Defining multiple variables in a loop
untry1$lagtaxVSgdp1, size = >>>> (nrow(country1)), replace = T)) >>>> tax2 <- as.matrix(sample(country2$lagtaxVSgdp1, size = >>>> (nrow(country2)), replace = T)) >>>> tax3 <- as.matrix(sample(country3$lagtaxVSgdp1, size = >>>> (nrow(country3)), replace = T)) >>>> >>>> gdp1 <- as.matrix(sample(country1$yoygdpcapita, size = >>>> (nrow(country1)), replace = T)) >>>> gdp2 <- as.matrix(sample(country2$yoygdpcapita, size = >>>> (nrow(country2)), replace = T)) >>>> gdp3 <- as.matrix(sample(country3$yoygdpcapita, size = >>>> (nrow(country3)), replace = T)) >>>> >>>> unemployment1 <- as.matrix(sample(country1$lagunemployment, size = >>>> (nrow(country1)), replace = T)) >>>> unemployment2 <- as.matrix(sample(country2$lagunemployment, size = >>>> (nrow(country2)), replace = T)) >>>> unemployment3 <- as.matrix(sample(country3$lagunemployment, size = >>>> (nrow(country3)), replace = T)) >>>> >>>> country.year1 <- as.matrix(cbind(country1$Country, country1$Year2)) >>>> country.year2 <- as.matrix(cbind(country2$Country, country2$Year2)) >>>> country.year3 <- as.matrix(cbind(country3$Country, country3$Year2)) >>>> >>>> country1.2 <- as.data.frame(cbind(country.year1, exp1, tax1, gdp1, >>>> unemployment1)) >>>> country2.2 <- as.data.frame(cbind(country.year2, exp2, tax2, gdp2, >>>> unemployment2)) >>>> country3.2 <- as.data.frame(cbind(country.year3, exp3, tax3, gdp3, >>>> unemployment3)) >>>> >>>> data <- as.data.frame(rbind(country1.2, country2.2, country3.2)) >>>> >>>> OECDsamplepanel <- pdata.frame(data, index = NULL, drop = F) >>>> >>>> plm <- plm(V5 ~ lag(V6, 1) + V3 + V4 + V5, data = OECDSamplepanel, >>>> model = "within") >>>> >>>> coefficients <- t(as.matrix(plm$coefficients)) >>>> fixef <- t(as.matrix(fixef(plm))) >>>> >>>> plmcoef[i, 1:4] = coefficients >>>> plmfixef[i, 1:3] = fixef >>>> >>>> } >>>> >>>> __ >>>> R-help@r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> 
http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>> >>> >>> >>> -- >>> >>> Bert Gunter >>> Genentech Nonclinical Biostatistics >>> >>> Internal Contact Info: >>> Phone: 467-7374 >>> Website: >>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm >>> >>> [[alternative HTML version deleted]] >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] eval parse question
Hi Sebastian, Probably, but I suspect the "correct" answer is simply: do not do it that way. Instead of: test1 <- 1:10 test2 <- 11:20 ... test5 <- 41:50 test5[7] <- .435 use test <- list(1:10, 11:20, 21:30, 31:40, 41:50) then it is as easy as test[[5]][7] <- .435 Cheers, Josh On Wed, Jun 20, 2012 at 12:59 AM, Leuzinger Sebastian wrote: > Dear all > > Is there a more efficient way of achieving the following (assigning to an > indexed vector in a loop): > > test5 <- 1:10 > eval(parse(text=paste("test",5,"[",7,"]<- ",0.435,sep=""))) > > this works, but it is probably very slow. > > Thanks > Sebastian Leuzinger > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
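The list idiom scales to any number of vectors: a named list replaces the whole family of numbered variables, and plain indexed assignment needs no eval(parse()) (a sketch extending the example above):

```r
# Build test1 .. test5 as elements of one named list.
test <- lapply(0:4, function(i) (i * 10 + 1):(i * 10 + 10))
names(test) <- paste0("test", 1:5)

# Indexed assignment instead of eval(parse(text = ...)).
test[["test5"]][7] <- 0.435
test$test5[7]  # 0.435
```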
Re: [R] populating a large matrix
Hi Nick, Off the cuff: doit2 <- function(s, d, e) { matrix(c(s, d, d*d, e, s, d*e, e*e, e*d, s), 3, byrow=FALSE) } doit2(s, d, e) seems like it should offer a substantial speed up. Unless the matrix can be populated with arbitrary formulae, I do not see why it should be a text matrix and then evaluated. If you have not already, I would also encourage you to check out OpenMx. It is a rather flexible matrix optimizer, and you may be able to get it to do what you want. http://openmx.psyc.virginia.edu/docs/OpenMx/latest/Likelihood_Matrix.html Cheers, Josh On Tue, Jun 19, 2012 at 8:10 PM, Nick Matzke wrote: > Hi all, > > This question is slightly weird. I am trying to populate a matrix with > equations. The matrix represents transition probabilities between states. > A simple example is: > > state1 state2 state3 > state1 s d d*d > state2 e s d*e > state3 e*e e*d s > > The parameters s, d, and e need to be optimized with an iterative algorithm. > This means I have to modify, say, d, and then recalculate the transition > probabilities for each cell. > > Currently, I do this by making a matrix with the equations in character > format, setting s, e, and d to values, and then running each cell through > parse(eval(text=celltxt)). As follows: > > # > # Test code: > # Make the text matrix > txtmat = matrix(c("s", "d", "d*d", "e", "s", "d*e", "e*e", "e*d", "e*d"), > nrow=3, byrow=TRUE) > > s=0.7 > d=0.2 > e=0.1 > > doit <- function(celltxt) > { > cellval = eval(parse(text=celltxt)) > return(cellval) > } > > # Calculate the matrix with numerical values > matrix_vals = sapply(X=txtmat, FUN=doit) > valmat = matrix(matrix_vals, nrow=3, byrow=TRUE) > valmat > # End test code > # > > > ...however, this seems to get slow for large matrices. Since I have to > optimize all the parameters I need something that updates the matrix quickly > when s, d, or e is changed. Perhaps this is a job for pointers in C++ or > something, but I figure there must be a better way in R. 
> > Can anyone think of something more sophisticated than my current method? > > Thanks in advance for any and all help!! > > Cheers, > Nick > > > > > -- > > Nicholas J. Matzke > Ph.D. Candidate, Graduate Student Researcher > > Huelsenbeck Lab > Center for Theoretical Evolutionary Genomics > 4151 VLSB (Valley Life Sciences Building) > Department of Integrative Biology > University of California, Berkeley > > Graduate Student Instructor, IB200B > Principles of Phylogenetics: Ecology and Evolution > http://ib.berkeley.edu/courses/ib200b/ > http://phylo.wikidot.com/ > > > Lab websites: > http://ib.berkeley.edu/people/lab_detail.php?lab=54 > http://fisher.berkeley.edu/cteg/hlab.html > Dept. personal page: > http://ib.berkeley.edu/people/students/person_detail.php?person=370 > Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html > Lab phone: 510-643-6299 > Dept. fax: 510-643-6264 > > Cell phone: 510-301-0179 > Email: mat...@berkeley.edu > > Mailing address: > Department of Integrative Biology > 1005 Valley Life Sciences Building #3140 > Berkeley, CA 94720-3140 > > - > "[W]hen people thought the earth was flat, they were wrong. When people > thought the earth was spherical, they were wrong. But if you think that > thinking the earth is spherical is just as wrong as thinking the earth is > flat, then your view is wronger than both of them put together." > > Isaac Asimov (1989). "The Relativity of Wrong." The Skeptical Inquirer, > 14(1), 35-44. Fall 1989. > http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. 
Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
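An alternative to storing equations as text: wrap the transition probabilities in an ordinary function of the parameters, so each optimizer iteration just calls it again (a sketch along the lines of the doit2() suggestion above):

```r
# Transition matrix as a function of the parameters -- recomputed
# cheaply whenever the optimizer updates s, d, or e.
trans_mat <- function(s, d, e) {
  matrix(c(s,     d,     d * d,
           e,     s,     d * e,
           e * e, e * d, s),
         nrow = 3, byrow = TRUE)
}

trans_mat(s = 0.7, d = 0.2, e = 0.1)
```

Only if the formulae themselves must be user-supplied at run time is the eval(parse()) machinery needed at all.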
Re: [R] Storing datasets
On Tue, Jun 12, 2012 at 12:24 AM, Rody wrote: > I found a solution myself, but thanks for the answers. I solved it like this: > D <- matrix(1:225*100,nrow=100,ncol=225) > for(i in 1:225) > D[,i] <- rt(100,df=225) > end but as Don said, you can do this in one step (and it is both faster and more elegant). D <- matrix(rt(100 * 225, df = 225), ncol = 225) Cheers, Josh > > > -- > View this message in context: > http://r.789695.n4.nabble.com/Storing-datasets-tp4632874p4633071.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.