Re: [R] Adding predicted values as a new variable in a data frame
Specify na.action = na.exclude, e.g.

x <- y <- 1:10; x[5] <- NA
fitted(lm(y ~ x, na.action = na.exclude))
 1  2  3  4  5  6  7  8  9 10
 1  2  3  4 NA  6  7  8  9 10

On 9/14/06, Robi Ragan [EMAIL PROTECTED] wrote: I am running a regression: ols.reg1 <- lm(y ~ x1 + x2 + x3 + x4) on a data.frame and then generating fitted values: y.hat <- ols.reg1$fitted.values Then I would like to add these fitted values to the data.frame as a new variable. The problem is that when the values are predicted the resulting output has too few rows: for some reason certain observations do not get predicted values. So this shrinks the column down and I then cannot combine the output into the original data.frame. If someone could please help I would appreciate it. Stata automatically adds a new column to the data set when you find the fitted values. So having to fight with R just to do something I used to routinely do has made me think of turning back to the dark side. I hope I have just missed something trivial in all the help files I have been looking through. Thanks, -- Robi Ragan Graduate Student Department of Economics Department of Political Science The University of Georgia __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Rv generation
Hi, can someone inform me how to generate RVs using the CDF below, by the inverse-transform technique? Thanks for your help and time. My CDF is as follows:

\[ F(x) = 0 \ \text{if} \ x < 0 \]
\[ F(x) = \frac{x - x_i}{x_{i+1} - x_i}\,(p_{i+1} - p_i) + p_i \ \forall \ x_i \leq x < x_{i+1} \]
\[ F(x) = 1 \ \text{if} \ x \geq x_{i+1} \]

Regards Murthy
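For a piecewise-linear CDF like this, the inverse is itself piecewise linear, so the inverse-transform method reduces to linear interpolation of the knots (p_i, x_i) at uniform draws. A minimal sketch, not from the thread; the knot vectors xk and pk below are made-up illustrative values:

```r
## Inverse-transform sampling from a piecewise-linear CDF.
## F is linear on [x_i, x_{i+1}], so F^{-1} is linear on [p_i, p_{i+1}]
## and can be evaluated with approx() on the swapped coordinates.
set.seed(1)
xk <- c(0, 1, 3, 6)        # support knots x_1 < ... < x_n (assumed values)
pk <- c(0, 0.2, 0.7, 1)    # CDF values p_i = F(x_i); increasing, from 0 to 1

rpwlin <- function(n, xk, pk) {
  u <- runif(n)                        # U(0,1) draws
  approx(x = pk, y = xk, xout = u)$y   # evaluates F^{-1}(u)
}

samp <- rpwlin(1e4, xk, pk)
## sanity check: the empirical CDF at the knots should be close to pk
round(ecdf(samp)(xk), 2)
```

approx() interpolates linearly by default, which is exactly the inverse required here.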
Re: [R] Adding predicted values as a new variable in a data frame
?na.exclude should help you: my guess is that you asked (by using the default na.action = na.omit) for rows with missing values to be excluded from the residuals. But since you have not mentioned missing values, we have to guess what 'for some reason' was: please note the footer of this message. On Thu, 14 Sep 2006, Robi Ragan wrote: I am running a regression: ols.reg1 <- lm(y ~ x1 + x2 + x3 + x4) on a data.frame Hmm, no data frame is mentioned: you want a data= argument. and then generating fitted values: y.hat <- ols.reg1$fitted.values Please use the accessor functions and do not dive into the internal details, e.g. y.hat <- fitted(ols.reg1) BTW: where did you get the use of ols.reg1$fitted.values from? Then I would like to add these fitted values to the data.frame as a new variable. The problem is that when the values are predicted the resulting output has too few rows: for some reason certain observations do not get predicted values. So this shrinks the column down and I then cannot combine the output into the original data.frame.

fit <- lm(formula, data=data_frame, na.action=na.exclude)
data_frame$fitted <- fitted(fit)

If someone could please help I would appreciate it. Stata automatically adds a new column to the data set when you find the fitted values. So having to fight with R just to do something I used to routinely do has made me think of turning back to the dark side. I hope I have just missed something trivial in all the help files I have been looking through. The above looks trivial to me. It was not in R or S when lm was first introduced (1991 White Book), but was added last century (thanks to the ideas and persistent advocacy of Terry Therneau). -- Brian D.
Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
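The two-line fix quoted above can be run end-to-end; a self-contained sketch with made-up data (the column name `fitted` is arbitrary):

```r
## na.exclude pads fitted()/resid() with NA so they line up with the
## rows of the original data frame.
set.seed(42)
d <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))
d$y[c(3, 7)] <- NA                  # introduce missing responses

fit <- lm(y ~ x1 + x2, data = d, na.action = na.exclude)
d$fitted <- fitted(fit)             # length 10, with NA in rows 3 and 7

stopifnot(length(fitted(fit)) == nrow(d), all(is.na(d$fitted[c(3, 7)])))
```

With the default na.action = na.omit, fitted(fit) would have length 8 and the column assignment would fail with a length mismatch.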
Re: [R] an error message with 't.test' with R under Unix
Tao Shi [EMAIL PROTECTED] writes: Again, I don't see the error in R under Windows (R 2.3.0) or Unix (R 2.3.1). Is this the bug you were talking about? CHANGES IN R VERSION 2.3.0 o Some of the classical tests put unnecessary restrictions on the LHS in the formula interface (e.g., t.test(x+y ~ g) was not allowed). -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
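A quick illustration of the NEWS item quoted above, with made-up data:

```r
## Since R 2.3.0 the classical tests accept expressions on the LHS of
## the formula interface; earlier versions rejected this call.
set.seed(1)
x <- rnorm(20)
y <- rnorm(20)
g <- gl(2, 10)      # two-level grouping factor
t.test(x + y ~ g)   # Welch two-sample t-test on the sum x + y
```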
Re: [R] Conservative ANOVA tables in lmer
- Original Message - From: Manuel Morales [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: Douglas Bates [EMAIL PROTECTED]; Manuel Morales [EMAIL PROTECTED]; r-help@stat.math.ethz.ch Sent: Wednesday, September 13, 2006 1:04 PM Subject: Re: [R] Conservative ANOVA tables in lmer On Wed, 2006-09-13 at 08:04 +1000, Andrew Robinson wrote: On Tue, September 12, 2006 7:34 am, Manuel Morales wrote: On Mon, 2006-09-11 at 11:43 -0500, Douglas Bates wrote: Having made that offer I think I will now withdraw it. Peter's example has convinced me that this is the wrong thing to do. I am encouraged by the fact that the results from mcmcsamp correspond closely to the correct theoretical results in the case that Peter described. I appreciate that some users will find it difficult to work with a MCMC sample (or to convince editors to accept results based on such a sample) but I think that these results indicate that it is better to go after the marginal distribution of the fixed effects estimates (which is what is being approximated by the MCMC sample - up to Bayesian/frequentist philosophical differences) than to use the conditional distribution and somehow try to adjust the reference distribution. Am I right that the MCMC sample can not be used, however, to evaluate the significance of parameter groups. For example, to assess the significance of a three-level factor? Are there better alternatives than simply adjusting the CI for the number of factor levels (1-alpha/levels). I wonder whether the likelihood ratio test would be suitable here? That seems to be supported. It just takes a little longer. 
require(lme4)
data(sleepstudy)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
fm2 <- lmer(Reaction ~ Days + I(Days^2) + (Days|Subject), sleepstudy)
anova(fm1, fm2)

So, a brief overview of the popular inferential needs and solutions would then be: 1) Test the statistical significance of one or more fixed or random effects - fit a model with and a model without the terms, and use the LRT. I believe that the LRT is anti-conservative for fixed effects, as described in Pinheiro and Bates companion book to NLME. You have this effect if you're using REML, for ML I don't think there is any problem to use LRT between nested models with different fixed-effects structure. Best, Dimitris 2) Obtain confidence intervals for one or more fixed or random effects - use mcmcsamp Did I miss anything important? - What else would people like to do? Cheers Andrew Andrew Robinson Senior Lecturer in Statistics Tel: +61-3-8344-9763 Department of Mathematics and Statistics Fax: +61-3-8344-4599 University of Melbourne, VIC 3010 Australia Email: [EMAIL PROTECTED] Website: http://www.ms.unimelb.edu.au
Re: [R] Conservative ANOVA tables in lmer
On 9/13/06, Dimitris Rizopoulos [EMAIL PROTECTED] wrote: - Original Message - From: Manuel Morales [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: Douglas Bates [EMAIL PROTECTED]; Manuel Morales [EMAIL PROTECTED]; r-help@stat.math.ethz.ch Sent: Wednesday, September 13, 2006 1:04 PM Subject: Re: [R] Conservative ANOVA tables in lmer On Wed, 2006-09-13 at 08:04 +1000, Andrew Robinson wrote: On Tue, September 12, 2006 7:34 am, Manuel Morales wrote: On Mon, 2006-09-11 at 11:43 -0500, Douglas Bates wrote: Having made that offer I think I will now withdraw it. Peter's example has convinced me that this is the wrong thing to do. I am encouraged by the fact that the results from mcmcsamp correspond closely to the correct theoretical results in the case that Peter described. I appreciate that some users will find it difficult to work with a MCMC sample (or to convince editors to accept results based on such a sample) but I think that these results indicate that it is better to go after the marginal distribution of the fixed effects estimates (which is what is being approximated by the MCMC sample - up to Bayesian/frequentist philosophical differences) than to use the conditional distribution and somehow try to adjust the reference distribution. Am I right that the MCMC sample can not be used, however, to evaluate the significance of parameter groups. For example, to assess the significance of a three-level factor? Are there better alternatives than simply adjusting the CI for the number of factor levels (1-alpha/levels). I wonder whether the likelihood ratio test would be suitable here? That seems to be supported. It just takes a little longer. 
require(lme4)
data(sleepstudy)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
fm2 <- lmer(Reaction ~ Days + I(Days^2) + (Days|Subject), sleepstudy)
anova(fm1, fm2)

So, a brief overview of the popular inferential needs and solutions would then be: 1) Test the statistical significance of one or more fixed or random effects - fit a model with and a model without the terms, and use the LRT. I believe that the LRT is anti-conservative for fixed effects, as described in Pinheiro and Bates companion book to NLME. You have this effect if you're using REML, for ML I don't think there is any problem to use LRT between nested models with different fixed-effects structure. There are two issues here: how the test statistic is evaluated and what reference distribution is used for the test statistic (i.e. how do you convert a value of the test statistic to a p-value). Manuel is addressing the latter issue. If you compare the difference of -2*logLik for the models to a chi-square with degrees of freedom determined by the difference in the number of parameters the test will be anti-conservative when the number of observations is small. The use of the chi-square as a reference distribution is based on asymptotic properties. The other question is how does one evaluate the likelihood-ratio test statistic and that is the issue that Dimitris is addressing. The REML criterion is a modified likelihood and it is inappropriate to look at differences in the REML criterion when the models being compared have different fixed-effects specifications, or even a different parameterization of the fixed effects. However, the anova method for an lmer object does not use the REML criterion even when the model has been estimated by REML. It uses the profiled log-likelihood evaluated at the REML estimates of the relative variances of the random effects. That's a complicated statement so let me break it down.
The optimization in lmer is done with respect to the elements of the variance-covariance matrix of the random effects relative to $\sigma^2$. Given these values the conditional estimates of the fixed-effects parameters and of $\sigma^2$ can be evaluated directly with some linear algebra. In the summary or show output of an lmer model there are two quantities called the MLdeviance and the REMLdeviance. Those are based on the same relative variances but different conditional estimates of $\sigma^2$ (and hence different estimates of the elements of the variance-covariance of the random effects). It turns out that there is very little difference in the value of the profiled log-likelihood at the ML estimates and at the REML estimates. This is not to say that the log-likelihood is similar at the two (complete) sets of estimates - it is the profiled log-likelihoods that are similar and these are what are used to create the likelihood ratio test statistic, even when the models have been fit by REML. As I said, this is complicated and until I went to reply to this message I hadn't really sorted it out for myself. I knew the empirical result, which I had checked before releasing the code, but I hadn't worked out the details of why
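The likelihood-ratio comparison under discussion can also be run entirely on ML fits, which sidesteps the REML question; a sketch reusing the sleepstudy models from earlier in the thread, written against the current lme4 interface (REML = FALSE requests ML; older lme4 versions used a method argument instead):

```r
## Compare the LRT from ML fits with the anova() output for REML fits.
library(lme4)
data(sleepstudy)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = FALSE)
fm2 <- lmer(Reaction ~ Days + I(Days^2) + (Days | Subject), sleepstudy,
            REML = FALSE)
anova(fm1, fm2)   # chi-square LRT; per the discussion above, this is very
                  # close to the statistic anova() reports for REML fits
```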
[R] ANOVA in R
Despite having used R on a daily basis for the past two years, I'm encountering some difficulty performing an ANOVA on my data. What I'm trying to do is the following. Given data such as:

Day 1  Day 4  Day 8
    2      7      2
    3      2      8
    3      4      7
    6      6      8
    1      3      4

I want to use ANOVA to determine if there is a significant change over the three days. In other stats packages I have used, I can just select this data and run the ANOVA function and get the F and p values. However in R, the anova function seems to only work with a fitted model, e.g. linear regression. This function seems to assume there is a relationship such as day1 ~ day4 + day8, but in my case there isn't - I just want to perform an ANOVA without regression. If anyone could point me in the right direction I'd greatly appreciate it, Thanks
Re: [R] ANOVA in R
Try

test <- data.frame(day.1=c(2,3,3,6,1), day.4=c(7,2,4,6,3), day.8=c(2,8,7,8,4))
test
test.long <- reshape(test, direction="long",
                     varying=c("day.1", "day.4", "day.8"),
                     v.names="response", timevar="day", times=names(test))
test.long$day <- factor(test.long$day)
test.long
aov(response ~ day, data=test.long)

I hope that this helps, Andrew On Thu, Sep 14, 2006 at 09:23:13AM +0100, Russell Compton wrote: Despite having used R on a daily basis for the past two years, I'm encountering some difficulty performing an ANOVA on my data. What I'm trying to do is the following. Given data such as:

Day 1  Day 4  Day 8
    2      7      2
    3      2      8
    3      4      7
    6      6      8
    1      3      4

I want to use ANOVA to determine if there is a significant change over the three days. In other stats packages I have used, I can just select this data and run the ANOVA function and get the F and p values. However in R, the anova function seems to only work with a fitted model, e.g. linear regression. This function seems to assume there is a relationship such as day1 ~ day4 + day8, but in my case there isn't - I just want to perform an ANOVA without regression. If anyone could point me in the right direction I'd greatly appreciate it, Thanks -- Andrew Robinson Department of Mathematics and Statistics Tel: +61-3-8344-9763 University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599 Email: [EMAIL PROTECTED] http://www.ms.unimelb.edu.au
[R] R input output error
Dear R users! I have some problems with some cronjobs containing R programs in batch mode. The CPU load always is quite high, as I plot some weather charts which require extensive interpolation procedures. The crucial point is that I frequently get R input/output errors, if my linux PC operates at full capacity. I am curious, if anyone of you has similar problems. Is there a possibility to prevent those runtime errors? Thanks in advance! Johannes -- Johannes Jenkner Institute for Atmospheric and Climate Science ETH Zuerich Universitätsstrasse 16 ETH Zentrum, CHN M18 CH-8092 Zuerich phone: +41/44/6332773 www: http://www.iac.ethz.ch
Re: [R] ANOVA in R
Andrew Robinson [EMAIL PROTECTED] writes: Try

test <- data.frame(day.1=c(2,3,3,6,1), day.4=c(7,2,4,6,3), day.8=c(2,8,7,8,4))
test
test.long <- reshape(test, direction="long",
                     varying=c("day.1", "day.4", "day.8"),
                     v.names="response", timevar="day", times=names(test))
test.long$day <- factor(test.long$day)
test.long
aov(response ~ day, data=test.long)

Was a one-way ANOVA intended? He never said. On a more elementary level,

y <- with(test, c(day.1, day.4, day.8))
day <- factor(rep(c(1, 4, 8), each=5))   # or gl(3, 5, labels=c(1, 4, 8))
sub <- factor(rep(1:5, 3))               # or gl(5, 1, 15)
print(data.frame(y, day, sub))           # just to show the point
anova(lm(y ~ day))                       # 1-way
anova(lm(y ~ day + sub))                 # 2-way
# This could be better for unbalanced designs:
drop1(lm(y ~ day + sub), test="F")

I hope that this helps, Andrew On Thu, Sep 14, 2006 at 09:23:13AM +0100, Russell Compton wrote: Despite having used R on a daily basis for the past two years, I'm encountering some difficulty performing an ANOVA on my data. What I'm trying to do is the following. Given data such as:

Day 1  Day 4  Day 8
    2      7      2
    3      2      8
    3      4      7
    6      6      8
    1      3      4

I want to use ANOVA to determine if there is a significant change over the three days. In other stats packages I have used, I can just select this data and run the ANOVA function and get the F and p values. However in R, the anova function seems to only work with a fitted model, e.g. linear regression. This function seems to assume there is a relationship such as day1 ~ day4 + day8, but in my case there isn't - I just want to perform an ANOVA without regression. If anyone could point me in the right direction I'd greatly appreciate it, Thanks
-- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
Re: [R] ANOVA in R
Andrew, Peter, Thanks both for the help, that's exactly what I was after. It is for a one-way ANOVA, looking at identifying differentially expressed genes across time in a microarray dataset. Also, one of the datasets I'm working with is unbalanced, so that additional code will be most useful. Thanks again, Russell Compton -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Peter Dalgaard Sent: 14 September 2006 10:26 To: Andrew Robinson Cc: Russell Compton; r-help@stat.math.ethz.ch Subject: Re: [R] ANOVA in R Andrew Robinson [EMAIL PROTECTED] writes: Try

test <- data.frame(day.1=c(2,3,3,6,1), day.4=c(7,2,4,6,3), day.8=c(2,8,7,8,4))
test
test.long <- reshape(test, direction="long",
                     varying=c("day.1", "day.4", "day.8"),
                     v.names="response", timevar="day", times=names(test))
test.long$day <- factor(test.long$day)
test.long
aov(response ~ day, data=test.long)

Was a one-way ANOVA intended? He never said. On a more elementary level,

y <- with(test, c(day.1, day.4, day.8))
day <- factor(rep(c(1, 4, 8), each=5))   # or gl(3, 5, labels=c(1, 4, 8))
sub <- factor(rep(1:5, 3))               # or gl(5, 1, 15)
print(data.frame(y, day, sub))           # just to show the point
anova(lm(y ~ day))                       # 1-way
anova(lm(y ~ day + sub))                 # 2-way
# This could be better for unbalanced designs:
drop1(lm(y ~ day + sub), test="F")

I hope that this helps, Andrew On Thu, Sep 14, 2006 at 09:23:13AM +0100, Russell Compton wrote: Despite having used R on a daily basis for the past two years, I'm encountering some difficulty performing an ANOVA on my data. What I'm trying to do is the following. Given data such as:

Day 1  Day 4  Day 8
    2      7      2
    3      2      8
    3      4      7
    6      6      8
    1      3      4

I want to use ANOVA to determine if there is a significant change over the three days. In other stats packages I have used, I can just select this data and run the ANOVA function and get the F and p values. However in R, the anova function seems to only work with a fitted model, e.g. linear regression.
This function seems to assume there is a relationship such as day1~ day 4 + day 8, but in my case there isn't - I just want to perform an ANOVA without regression. If anyone could point me in the right direction I'd greatly appreciate it, Thanks
[R] Greedy triangulation
Hello, does anyone have code that will generate a greedy triangulation (triangulation that uses shortest non-overlapping edges) for a set of points in Euclidean space? Thanks, Dan Bebber ___ Dr. Daniel P. Bebber Department of Plant Sciences University of Oxford South Parks Road Oxford OX1 3RB UK Tel. 01865 275060
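Nothing in base R seems to provide this directly; a minimal base-R sketch of the usual greedy algorithm (sort all candidate edges by length, keep each edge unless it properly crosses one already kept) might look like the following. All function names here are made up, the loop is roughly O(n^4), and the crossing test ignores degenerate collinear cases:

```r
## Greedy triangulation sketch: accept edges in order of increasing length
## whenever they do not properly cross an already accepted edge.

## 2D cross product of (a - o) and (b - o)
cross2 <- function(o, a, b)
  (a[1] - o[1]) * (b[2] - o[2]) - (a[2] - o[2]) * (b[1] - o[1])

## TRUE if segments p1-p2 and p3-p4 cross at an interior point
segs_cross <- function(p1, p2, p3, p4) {
  d1 <- cross2(p3, p4, p1); d2 <- cross2(p3, p4, p2)
  d3 <- cross2(p1, p2, p3); d4 <- cross2(p1, p2, p4)
  (d1 * d2 < 0) && (d3 * d4 < 0)
}

greedy_triangulation <- function(pts) {   # pts: n x 2 coordinate matrix
  cand <- t(combn(nrow(pts), 2))          # all point pairs as candidate edges
  len  <- sqrt(rowSums((pts[cand[, 1], , drop = FALSE] -
                        pts[cand[, 2], , drop = FALSE])^2))
  cand <- cand[order(len), , drop = FALSE]
  kept <- matrix(integer(0), 0, 2)
  for (k in seq_len(nrow(cand))) {
    i <- cand[k, 1]; j <- cand[k, 2]
    ok <- TRUE
    for (m in seq_len(nrow(kept))) {
      a <- kept[m, 1]; b <- kept[m, 2]
      ## edges sharing an endpoint cannot properly cross
      if (!any(c(i, j) %in% c(a, b)) &&
          segs_cross(pts[i, ], pts[j, ], pts[a, ], pts[b, ])) {
        ok <- FALSE; break
      }
    }
    if (ok) kept <- rbind(kept, c(i, j))
  }
  kept   # two-column matrix of accepted edges (row indices into pts)
}

set.seed(1)
greedy_triangulation(matrix(runif(20), ncol = 2))
```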
Re: [R] Conservative
Douglas Bates bates at stat.wisc.edu writes: On 9/13/06, Dimitris Rizopoulos dimitris.rizopoulos at med.kuleuven.be I believe that the LRT is anti-conservative for fixed effects, as described in Pinheiro and Bates companion book to NLME. You have this effect if you're using REML, for ML I don't think there is any problem to use LRT between nested models with different fixed-effects structure. ... The other question is how does one evaluate the likelihood-ratio test statistic and that is the issue that Dimitris is addressing. The REML criterion is a modified likelihood and it is inappropriate to look at differences in the REML criterion when the models being compared have different fixed-effects specifications, or even a different parameterization of the fixed effects. However, the anova method for an lmer object does not use the REML criterion even when the model has been estimated by REML. It uses the profiled log-likelihood evaluated at the REML estimates of the relative variances of the random effects. That's a complicated statement so let me break it down. ... Is this then the same answer as given by Robinson:1991 (ref at the end) to the question by Robin Thompson on which likelihood (ML or REML) should be used in testing the fixed effects? Robinson answered (page 49, near bottom right) that both likelihoods give the same conclusion about fixed effects. Can anyone comment on this issue? Thanks, Gregor @Article{Robinson:1991, author = {Robinson, G. K.}, title = {That {BLUP} is a good thing: the estimation of random effects}, journal = ss, year = {1991}, volume = {6}, number = {1}, pages = {15--51}, keywords = {BLUP, example, derivations, links, applications}, vnos = {GG} }
Re: [R] kendall's w
Bianca Vieru wrote: Hi, I try to calculate Kendall's W coefficient and I have a bizarre error.

little.app.mat <- matrix(c(1,3,4,2,6,5,2,4,3,1,5,6,3,2,5,1,5,4), nrow=3, byrow=TRUE)
print(kendall.w(little.app.mat[-1,]))
Kendall's W for ordinal data W = 0.7753623
Error in if (is.na(x$p.table)) { : argument is of length zero

big.app.mat <- matrix(c(1,3,4,2,6,5,2,4,3,1,5,6,3,2,5,1,5,42,3,5,3,6,7,9,9,8,7), nrow=3, byrow=TRUE)
print(kendall.w(big.app.mat[-1,]))
Kendall's W for ordinal data W = 0.4568966 p(X2[8]) = 0.5035488

Why is that working for the big matrix and not for the little one? Thanks for finding this and thanks to David Barron for the correct answer - I'll insert a check for out-of-range matrices like this and submit a new version. Jim
Re: [R] Conservative
Gregor Gorjanc [EMAIL PROTECTED] writes: Douglas Bates bates at stat.wisc.edu writes: On 9/13/06, Dimitris Rizopoulos dimitris.rizopoulos at med.kuleuven.be I believe that the LRT is anti-conservative for fixed effects, as described in Pinheiro and Bates companion book to NLME. You have this effect if you're using REML, for ML I don't think there is any problem to use LRT between nested models with different fixed-effects structure. ... The other question is how does one evaluate the likelihood-ratio test statistic and that is the issue that Dimitris is addressing. The REML criterion is a modified likelihood and it is inappropriate to look at differences in the REML criterion when the models being compared have different fixed-effects specifications, or even a different parameterization of the fixed effects. However, the anova method for an lmer object does not use the REML criterion even when the model has been estimated by REML. It uses the profiled log-likelihood evaluated at the REML estimates of the relative variances of the random effects. That's a complicated statement so let me break it down. ... Is this then the same answer as given by Robinson:1991 (ref at the end) to question by Robin Thompson on which likelihood (ML or REML) should be used in testing the fixed effects. Robinson answered (page 49 near bottom right) that both likelihoods give the same conclusion about fixed effects. Can anyone comment on this issues? At the risk of sticking my foot in it due to not reading the paper carefully enough: There appears to be two other likelihoods in play, one traditional one depending on fixed effects and variances and another depending on fixed effects and BLUPs (most likely unobservables). I think Robinson is talking about the equivalence of those two. (and BTW ss=Statistical Science in the ref.) Thanks, Gregor @Article{Robinson:1991, author = {Robinson, G. 
K.}, title ={That {BLUP} is a good thing: the estimation of random effects}, journal = ss, year = {1991}, volume = {6}, number = {1}, pages ={15--51}, keywords = {BLUP, example, derivations, links, applications}, vnos = {GG} } -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Rv generation
On 9/14/2006 2:16 AM, nmi13 wrote: Hi, Can someone inform me how to generate RV's using the below CDF, by inverse technique. Thanks for your help and time. My CDF is as follows

\[ F(x)=0 \ \text{if} \ x < 0 \]
\[ F(x)=\{\frac{x-x_i}{x_{i+1}-x_{i}}*(p_{i+1}-p_{i})\}+p_{i}\ \forall \ x_{i}\leq x < x_{i+1} \]
\[ F(x)=1 \ \text{if} \ x > x_{i+1} \]

This sounds an awful lot like a class assignment. You should ask your instructor for help with it. Duncan Murdoch
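For reference, the inverse-transform step for a piecewise-linear CDF like this reduces to linear interpolation of the (p_i, x_i) pairs. A minimal sketch; the knots x and cumulative probabilities p below are made-up illustration values, not from the original question:

```r
# Inverse-transform sampling from a piecewise-linear CDF.
# x: knots x_i; p: CDF values p_i at the knots (illustrative values).
x <- c(0, 1, 3, 6)
p <- c(0, 0.2, 0.7, 1)
# Between knots the CDF is linear, so its inverse is linear
# interpolation of the (p_i, x_i) pairs:
Finv <- approxfun(p, x)
set.seed(1)
u <- runif(1000)     # U(0,1) draws
samples <- Finv(u)   # draws distributed according to F
```

Since F here is continuous and strictly increasing on its support, Finv(runif(n)) has exactly the CDF F.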
[R] Binomial test using R
Hullo, Can someone suggest whether the binomial test as described in the link http://home.clara.net/sisa/binomial.htm is available in an equivalent form in R? I have downloaded the R package from the CRAN site. Using R will help me do this test rapidly Many Thanks Ramachandran Dr. S. Ramachandran Scientist E I G.N. Ramachandran Knowledge Centre for Genome Informatics Institute of Genomics and Integrative Biology Mall Road, Delhi 110 007 Tel: 091-11-2766-6156 Fax: 091-11-2766-7471 http://www.igib.res.in/Sprofile/cram.html [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Binomial test using R
Srinivasan Ramachandran [EMAIL PROTECTED] writes: Hullo, Can someone suggest whether the binomial test as described in the link http://home.clara.net/sisa/binomial.htm is available in an equivalent form in R? I have downloaded the R package from the CRAN site. Using R will help me do this test rapidly I think you could be expected to do a bit more of your homework before querying the list! Anyways, have a look at binom.test(). -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
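A minimal call, with made-up counts (7 successes in 20 trials against a null of p = 0.5):

```r
# Exact binomial test via stats::binom.test
res <- binom.test(7, 20, p = 0.5)
res$p.value   # two-sided exact p-value
res$estimate  # observed proportion, 0.35
```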
[R] Run-time error 521 in SciViews
Dear list, I use SciViews R Console 0.8.9 with R-2.2.1 under Windows XP SP2, and it works very well most of the time. However, sometimes when commands are executed by pressing F5, the error "Run-time error 521: Can't open clipboard" pops up. After choosing Yes, both the R console and SciViews exit, and everything in the Command window is lost, which is very annoying. Could anyone tell me how to avoid this problem? Thanks. Wuming
[R] plotting all subgroups with augPred
All, I have a question RE plotting the prediction lines of a random effects model via augPred. I'll illustrate via the Pixel dataset within the nlme package:

library(nlme)
attach(Pixel)
fm1Pixel = lme(pixel ~ day + I(day^2), data = Pixel, random = list(Dog = ~ 1))
plot(augPred(fm1Pixel)) ### 10 fitted lines since there are 10 dogs
fm2Pixel = update(fm1Pixel, . ~ . + Side)
plot(augPred(fm2Pixel)) ## also produces 10 fitted lines

For the second plot, shouldn't we have 2 fitted lines per dog, one for each level of the fixed effect Side? 1) How does one ensure that this is plotted accordingly? 2) How does one plot, say, only the first 5 dogs? Thanks! Dave
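On question 2, one workaround (a sketch, not necessarily the only route) is to refit on a subset whose unused factor levels have been dropped, so augPred only produces panels for the dogs kept:

```r
library(nlme)
# Keep only the first five dogs and drop the unused factor levels,
# then refit; augPred on this fit gives five panels.
dogs5 <- subset(Pixel, Dog %in% levels(Pixel$Dog)[1:5])
dogs5$Dog <- factor(dogs5$Dog)
fm5 <- lme(pixel ~ day + I(day^2), data = dogs5, random = ~ 1 | Dog)
plot(augPred(fm5))
```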
[R] working with strptime data
Dear R-forum, I am looking for a good resource/help on working with POSIXct values and controlling the pretty x-axis labels and tick marks for data-vs-time plots. Specifically, I wish to write programming logic which creates different vertical abline calculations based on the range of times I am working on. The default plot results are adequate, but I would love to make explicit calls on how the x-axis annotates the timestamps. Does anyone have example code or know of a good reference for these kinds of R-programming tasks? Here's a simplified example:

I have a large data set that consists of N pairs of values and timestamp strings, like this:

TheData <- c(1.2, 0.9, etc...[to the Nth datapoint])
TheTime <- c("9/1/2006 8:00", "9/1/2006 8:13", etc...[to the Nth timestamp])

I convert the timestamp strings into POSIXct types as:

TheTime <- as.POSIXct(strptime(TheTime, format="%m/%d/%Y %H:%M"))

And create a plot as such:

plot(TheTime, TheData)

And here is a specific question: how do I calculate the number of months spanned between two POSIXct values? (i.e. NumOfMonths <- MonthFunction(range(MyTimeStampData))) Thanks-in-advance, - rich
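On the months question, a sketch using POSIXlt's year/mon fields (MonthFunction is the poster's hypothetical name, not a real function; the two timestamps below are made up):

```r
# Count calendar months spanned between two POSIXct values.
TheTime <- as.POSIXct(strptime(c("9/1/2006 8:00", "1/15/2007 12:30"),
                               format = "%m/%d/%Y %H:%M"))
lt <- as.POSIXlt(range(TheTime))
# months-since-year-0 for each endpoint, then the difference:
NumOfMonths <- (lt$year[2] * 12 + lt$mon[2]) -
               (lt$year[1] * 12 + lt$mon[1])
NumOfMonths  # 4 here: Sep 2006 -> Jan 2007
```

For the axis control, one route is plotting with xaxt = "n" and then calling axis.POSIXct(1, at = seq(start, end, by = "month"), format = "%b %Y") to place the ticks explicitly.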
[R] time varying covariates
Hello, I am trying to model an intensity function with time-varying covariates. Previously, I successfully defined a log-likelihood function for a Power-Law Process (lambda(t) = alpha*beta*t^(beta-1)) with two parameters and no covariates for repairable systems with failure times t. This function was maximized with R's optim. No problem! But now I want to include a covariate indicating a time-varying value at each failure time t. For constant covariates, the procedure is feasible and leads to the following log-likelihood function: here the zi are the covariates, which are constant for each unit i under observation, and tij is the failure time for failure j of unit i. Do you know how to formulate a log-likelihood function for covariates which vary for each tj of each unit i? Thank you very much. Best regards, Martin Wagner, Berlin University of Technology
[R] Problem setting options(error=recover) in my Rprofile.site
I'd like to be able to set options(error=recover) in my Rprofile.site file. When I do this I get the message Error in options(error = recover) : object recover not found I assume this is because the utils package (where recover and dump.frames are defined) has not been loaded at the time I make this call. I suppose I could explicitly do library(utils) before setting the options even though it will be loaded again when the default packages are loaded. Any simpler suggestions of how to get options(error=recover) set automatically every time I start R? More generally, is there a way to have code executed on startup but *after* the default packages are loaded? Thanks. Jeff __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Greedy triangulation
Does the deldir package do what you want? -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dan Bebber Sent: Thursday, September 14, 2006 3:56 AM To: r-help@stat.math.ethz.ch Subject: [R] Greedy triangulation Hello, does anyone have code that will generate a greedy triangulation (triangulation that uses shortest non-overlapping edges) for a set of points in Euclidean space? Thanks, Dan Bebber ___ Dr. Daniel P. Bebber Department of Plant Sciences University of Oxford South Parks Road Oxford OX1 3RB UK Tel. 01865 275060 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Greedy triangulation
Or, perhaps, tripack? url:www.econ.uiuc.edu/~rogerRoger Koenker email[EMAIL PROTECTED]Department of Economics vox: 217-333-4558University of Illinois fax: 217-244-6678Champaign, IL 61820 On Sep 14, 2006, at 10:32 AM, Greg Snow wrote: Does the deldir package do what you want? -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dan Bebber Sent: Thursday, September 14, 2006 3:56 AM To: r-help@stat.math.ethz.ch Subject: [R] Greedy triangulation Hello, does anyone have code that will generate a greedy triangulation (triangulation that uses shortest non-overlapping edges) for a set of points in Euclidean space? Thanks, Dan Bebber ___ Dr. Daniel P. Bebber Department of Plant Sciences University of Oxford South Parks Road Oxford OX1 3RB UK Tel. 01865 275060 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Greedy triangulation
Thanks, but no, the Delaunay is different. I have written the following, which interested persons might want to check for accuracy and streamlining:

#GREEDY TRIANGULATION
#
#Pick all lines that are shortest and don't overlap, starting with shortest.
#
greedy <- function(xy){ #input is a matrix with 2 columns (x,y)
  require(spatstat) #need the crossing.psp function
  w <- owin(range(xy[,1]), range(xy[,2])) #set the window for crossing.psp
  dists <- dist(xy) #calculate Euclidean distances for each line
  ind <- which(lower.tri(dists), arr.ind=TRUE) #matrix with 2 columns (start point, end point)
  x1 <- xy[ind[,1],1] #x of start point
  y1 <- xy[ind[,1],2] #y of start point
  x2 <- xy[ind[,2],1] #x of end point
  y2 <- xy[ind[,2],2] #y of end point
  dat <- data.frame(strt=ind[,1], fin=ind[,2],
    x1=x1, y1=y1, x2=x2, y2=y2, len=as.vector(dists)) #put all data for lines together
  dat <- dat[order(dists),] #order data by line length, shortest first
  inc <- dat[1,] #include the shortest line in the triangulation
  dat <- dat[-1,] #keep remaining lines
  while(nrow(dat) > 0){ #while lines remain to be tested, do the following...
    A <- psp(inc$x1, inc$y1, inc$x2, inc$y2, w) #psp object for the lines already included
    B <- psp(dat$x1[1], dat$y1[1], dat$x2[1], dat$y2[1], w) #psp object for the next line to be tested
    cross <- crossing.psp(A, B) #check for crossing of the test line over the lines already included
    p.match <- sum(cross$x %in% c(inc$x1, inc$x2)) #check if the crossings occur because the same points are shared
    if(cross$n - p.match == 0){ inc[nrow(inc)+1,] <- dat[1,] } #if the only crossings are due to shared points, include the line
    dat <- dat[-1,] #remove the line from the untested lines
  }
  return(inc)} #when all lines have been tested, return all included lines
#
#Plot the greedy triangulation
#
plot.greedy <- function(xy, gr, ...){
  plot(xy, asp=1, xlab="x", ylab="y", ...)
  segments(gr$x1, gr$y1, gr$x2, gr$y2)}
#
#Test it
#
xy <- matrix(runif(40,0,1), nc=2)
gr <- greedy(xy)
plot.greedy(xy, gr, main="Greedy Triangulation")
#END

- Original Message - From: Greg Snow [EMAIL PROTECTED] To: Dan Bebber [EMAIL PROTECTED]; r-help@stat.math.ethz.ch Sent: Thursday, September 14, 2006 4:32 PM Subject: RE: [R] Greedy triangulation

Does the deldir package do what you want? -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111
Re: [R] time varying covariates
Have a look at the survSplit function in the survival package. It looks to me as though you could use survreg with the weibull option to achieve what you want. Otherwise, you'll have to rewrite the likelihood in terms of both start and end times.

On 14/09/06, Martin Wagner [EMAIL PROTECTED] wrote: Hello, I am trying to model an intensity function with time-varying covariates. Before, I have successfully defined a log likelihood function for a Power-Law Process (lambda(t)=alpha*beta*t^(beta-1)) with two parameters and no covariates for repairable systems with failure times (t). This function was maximized with R optim. No problem! But now I want to include a covariate indicating a time-varying value at each failure time t. For constant covariates, the procedure is feasible and leads to the following log likelihood function: here zi are the covariates which are constant for each unit i under observation. tij are the failure times for failure j of unit i. Do you know how to formulate a log likelihood function for covariates which vary for each tj of each unit i? Thank you very much. Best regards, Martin Wagner, Berlin University of Technology

-- = David Barron Said Business School University of Oxford Park End Street Oxford OX1 1HP
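A sketch of the survSplit route mentioned above, using the formula interface from the survival package (the cut points are arbitrary illustration values):

```r
library(survival)
# Split each subject's follow-up in the lung data at 180 and 365 days;
# each row of lung2 then describes one (tstart, time] interval, so a
# time-varying covariate value can be attached per interval.
lung2 <- survSplit(Surv(time, status) ~ ., data = lung,
                   cut = c(180, 365), episode = "period")
head(lung2[, c("tstart", "time", "status", "period")])
```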
Re: [R] Problem setting options(error=recover) in my Rprofile.site
On Thu, 14 Sep 2006, Marcus, Jeffrey wrote: I'd like to be able to set options(error=recover) in my Rprofile.site file. When I do this I get the message Error in options(error = recover) : object recover not found I assume this is because the utils package (where recover and dump.frames are defined) has not been loaded at the time I make this call. I suppose I could explicitly do library(utils) before setting the options even though it will be loaded again when the default packages are loaded. It will not be loaded again. Any simpler suggestions of how to get options(error=recover) set automatically every time I start R? options(error=utils::recover) More generally, is there a way to have code executed on startup but *after* the default packages are loaded? Load the default packages yourself in your startup code, by calling .First.sys() (and resetting it to avoid the slight overhead of subsequent require() calls if you want). -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
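Putting Prof. Ripley's first suggestion into Rprofile.site is a one-liner; the :: form works because it fetches recover from the utils namespace directly, so utils need not be attached at startup time:

```r
# In Rprofile.site: no library(utils) needed before this.
options(error = utils::recover)
```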
[R] Beta stochastic simulation
Hi, I am finding that I get quite different results when I interchange the following equivalent lines for sampling from a beta distribution in my R script. The rbeta line is correct judging by the summary statistics of the simulated values, while the qbeta line consistently leads to a higher mean simulated value.

simulation <- rbeta(1, alpha, beta)
simulation <- qbeta(runif(1), alpha, beta)

Are there any implementation reasons for this? Thanks, Mark Pinkerton, Risk Management Solutions, Peninsular House, 30 Monument Street, London EC3R 8HB, UK, www.RMS.com, Tel: +44 20 7444 7783, Fax: +44 20 7444 7601

This message and any attachments contain information that may be RMS Inc. confidential and/or privileged. If you are not the intended recipient (or authorized to receive for the intended recipient), and have received this message in error, any use, disclosure or distribution is strictly prohibited. If you have received this message in error, please notify the sender immediately by replying to the e-mail and permanently deleting the message from your computer and/or storage system.
[R] polr
Dear R helpers, I'm using polr:

fm <- polr(factor(y) ~ x, method='logistic', data = dat, Hess=T)

and I'm getting a very strange message:

Error in optim(start, fmin, gmin, method = "BFGS", hessian = Hess, ...) :
  initial value in 'vmmin' is not finite

Can you help me? Thank you in advance, Grazia

M. Grazia Pittau, Ph.D. Post-Doctoral Research Fellow Department of Statistics Columbia University 1255 Amsterdam Avenue New York, NY 10027 [EMAIL PROTECTED] Phone: 212.851.2160 Fax: 212.851.2164
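For comparison, a polr call that is known to work, on the housing data shipped with MASS. This does not diagnose the error above; in my experience "initial value in 'vmmin' is not finite" often points at a degenerate response or poor starting values, which can be overridden via the start argument:

```r
library(MASS)
# Proportional-odds logistic regression on the MASS housing data.
fm <- polr(Sat ~ Infl + Type + Cont, weights = Freq,
           data = housing, Hess = TRUE)
coef(fm)  # six fixed-effect coefficients
```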
Re: [R] Beta stochastic simulation
On 9/14/2006 1:42 PM, Mark Pinkerton wrote: Hi, I am finding that I get quite different results when I interchange the following equivalent lines for sampling from a beta distribution in my R script. The rbeta line is correct judging by the summary statistics of the simulated values, while the qbeta line consistently leads to a higher mean simulated value.

simulation <- rbeta(1, alpha, beta)
simulation <- qbeta(runif(1), alpha, beta)

Are there any implementation reasons for this?

You need to be more specific about the R version, the parameter values you're using, and the size of the differences you're seeing. I just tried

x <- rbeta(10, 1, 6)
y <- qbeta(runif(10), 1, 6)
summary(x); summary(y)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
4.076e-07 4.681e-02 1.085e-01 1.423e-01 2.055e-01 8.773e-01
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
2.803e-06 4.583e-02 1.078e-01 1.419e-01 2.047e-01 8.861e-01

and I think those results look reasonable (and the qbeta method has a smaller mean, this run). Duncan Murdoch
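Duncan's point can be checked with a larger sample: qbeta(runif(n), a, b) is exactly inverse-transform sampling, so the two routes agree in distribution (parameter values below are made up):

```r
set.seed(42)
a <- 2; b <- 5
x <- rbeta(1e5, a, b)
y <- qbeta(runif(1e5), a, b)
# both sample means should sit near the true mean a/(a+b) = 2/7
c(mean(x), mean(y), a / (a + b))
```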
[R] Help On EBAM
Dear R Users, I am new to R and learning how to use it. I am a PC user and run R on Windows. I would appreciate it if someone could guide me on a few questions I have:

1) I have 4 cel files (2 replicates for NORM and Disease resp). I have been able to run siggenes on this dataset, where I have 4 labels in the class file groupsnhi.cl <- c(0,0,1,1), and my data has been read into datrmanhi after performing rma. When I run these commands I receive these errors:

plot(samnhi.out, seq(0.1, 0.6, 0.1))
identify(samnhi.out, ll=FALSE)
warning: no point with 0.25 inches
warning: no point with 0.25 inches
warning: no point with 0.25 inches

Why does this happen?

2) Now I am trying to run EBAM, and when I type

find.out <- find.a0(datrmanhi, groupsnhi.cl, rand=123)

I get an error:

Loading required package: affy
Loading required package: affyio
EBAM Analysis for the two class unpaired case.
Warning: There are 1 genes which have variance Zero or no non-missing values. The d-value of these genes is set to NA.
The following object(s) are masked _by_ .GlobalEnv : n
The following object(s) are masked from mat.repeat ( position 5 ) : center log.bino n p success x1 x2 x3 x4 x5
Error in optim(rep(0, 6), neglogLik.repeat, method = "BFGS") : non-finite finite-difference value [1]
In addition: Warning message: collapsing to unique 'x' values in: approx(Z, Z.norm, z, rule = 2)

I have also tried:

find.out <- find.a0(exprs2, c(1,1,1,0,0,0), rand=123)
EBAM Analysis for the two class unpaired case.
Warning: There are 1 genes which have variance Zero or no non-missing values. The d-value of these genes is set to NA.
The following object(s) are masked _by_ .GlobalEnv : n
The following object(s) are masked from mat.repeat ( position 3 ) : center log.bino n p success x1 x2 x3 x4 x5
The following object(s) are masked from mat.repeat ( position 6 ) : center log.bino n p success x1 x2 x3 x4 x5
Error in optim(rep(0, 6), neglogLik.repeat, method = "BFGS") : non-finite finite-difference value [1]
In addition: Warning message: collapsing to unique 'x' values in: approx(Z, Z.norm, z, rule = 2)

I would greatly appreciate any solutions and help to solve this problem. Thank you; I appreciate your time. Regards, Lolita
[R] converting strings to expressions
Hi, consider this:

estr <- c("2^4", "alpha[1]")
eexp <- expression(2^4, alpha[1])

## Is it possible to get 'eexp' starting from 'estr'? The closest I could
## get was:
do.call(expression, lapply(estr, as.name))
## but it is not quite the same; e.g. the following behave differently:
library(lattice)
xyplot(1:10 ~ 1:10, scales = list(x = list(at = c(3, 6), labels = eexp)))
xyplot(1:10 ~ 1:10, scales = list(x = list(at = c(3, 6),
  labels = do.call(expression, lapply(estr, as.name)))))

This happens in both 2.3.1 and pre-2.4.0. Deepayan

sessionInfo()
Version 2.3.1 Patched (2006-08-27 r39012) x86_64-unknown-linux-gnu
attached base packages: [1] methods stats graphics grDevices utils datasets [7] base
other attached packages: lattice 0.13-10

sessionInfo()
R version 2.4.0 Under development (unstable) (2006-08-30 r39022) x86_64-unknown-linux-gnu
locale: C
attached base packages: [1] methods stats graphics grDevices utils datasets [7] base
other attached packages: lattice 0.14-3
Re: [R] converting strings to expressions
Deepayan Sarkar [EMAIL PROTECTED] writes: Hi, consider this:

estr <- c("2^4", "alpha[1]")
eexp <- expression(2^4, alpha[1])

## Is it possible to get 'eexp' starting from 'estr'? The closest I could
## get was:
do.call(expression, lapply(estr, as.name))
## but it is not quite the same; e.g. the following behave differently:

Er, how about

estr <- c("2^4", "alpha[1]")
parse(text=estr)
expression(2^4, alpha[1])

or (brain teaser alert!)

parse(text=deparse(parse(text=estr)))[[1]]
expression(2^4, alpha[1])

which is _not_ quite the same thing. -- Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Re: [R] converting strings to expressions
Deepayan Sarkar said the following on 9/14/2006 2:31 PM: Hi, consider this:

estr <- c("2^4", "alpha[1]")
eexp <- expression(2^4, alpha[1])

## Is it possible to get 'eexp' starting from 'estr'? The closest I could
## get was:
do.call(expression, lapply(estr, as.name))
## but it is not quite the same; e.g. the following behave differently:
library(lattice)
xyplot(1:10 ~ 1:10, scales = list(x = list(at = c(3, 6), labels = eexp)))
xyplot(1:10 ~ 1:10, scales = list(x = list(at = c(3, 6),
  labels = do.call(expression, lapply(estr, as.name)))))

Hi, Deepayan, will this work for you?

estr <- c("2^4", "alpha[1]")
eexp <- expression(2^4, alpha[1])
library(lattice)
xyplot(1:10 ~ 1:10, scales = list(x = list(at = c(3, 6), labels = eexp)))
estr.2 <- sprintf("expression(%s)", paste(estr, collapse = ","))
eexp.2 <- eval(parse(text = estr.2))
xyplot(1:10 ~ 1:10, scales = list(x = list(at = c(3, 6), labels = eexp.2)))

Works in both R-2.3.1 and R-2.4.0dev. Typically, I don't like using eval(parse(text=...)), but I've run into this problem before and could not see another way. Perhaps Gabor will enlighten us with something slicker. HTH, --sundar
Re: [R] converting strings to expressions
On 14 Sep 2006 21:44:01 +0200, Peter Dalgaard [EMAIL PROTECTED] wrote: Deepayan Sarkar [EMAIL PROTECTED] writes: Hi, consider this: -- estr - c(2^4, alpha[1]) eexp - expression(2^4, alpha[1]) ## Is it possible to get 'eexp' starting from 'estr'? The closest I could ## get was: do.call(expression, lapply(estr, as.name)) ## but it is not quite the same; e.g. the following behave differently: Er, how about estr - c(2^4, alpha[1]) parse(text=estr) expression(2^4, alpha[1]) or (brain teaser alert!) parse(text=deparse(parse(text=estr)))[[1]] expression(2^4, alpha[1]) which is _not_ quite the same thing. Ah, I'd forgotten about parse. A link from ?expression might be reasonable. Thanks, Deepayan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] converting strings to expressions
sapply(estr, FUN=function(x) parse(text=x))

and it does print the Greek letter in the xlab:

xyplot(1:10 ~ 1:10, scales = list(x = list(at = c(3, 6),
  labels = sapply(estr, FUN=function(x) parse(text=x)))))

Rich
Re: [R] converting strings to expressions
On Thu, 14 Sep 2006, Deepayan Sarkar wrote: [...] Ah, I'd forgotten about parse. A link from ?expression might be reasonable. But it says Details: 'Expression' here is not being used in its colloquial sense, that of mathematical expressions. Those are calls (see 'call') in R, and an R expression vector is a list of calls, typically as returned by 'parse'. so it already has a link. 'Expression' was often misused in the R documentation, and a couple of months ago I tried to move to a more careful usage. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] converting strings to expressions
Prof Brian Ripley [EMAIL PROTECTED] writes:

> On Thu, 14 Sep 2006, Deepayan Sarkar wrote:
> [...]
> > Ah, I'd forgotten about parse. A link from ?expression might be reasonable.
>
> But it says
>
> Details:
>
>      'Expression' here is not being used in its colloquial sense, that of
>      mathematical expressions. Those are calls (see 'call') in R, and an
>      R expression vector is a list of calls, typically as returned by
>      'parse'.
>
> so it already has a link. 'Expression' was often misused in the R
> documentation, and a couple of months ago I tried to move to a more
> careful usage.

Actually, you need to be even more careful because has commonly been called unevaluated expressions cover symbols and constants too. I.e., quote(1) and quote(x) are not calls. And such non-call objects can be elements of an expression vector.

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                     FAX: (+45) 35327907
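Peter's point can be checked directly (a small sketch, not from the original thread):

```r
# Symbols and constants are not calls...
is.call(quote(x))      # FALSE -- a symbol
is.call(quote(1))      # FALSE -- a constant
is.call(quote(x + 1))  # TRUE  -- a call

# ...yet all three kinds of object can be elements of an expression vector:
e <- expression(x, 1, x + 1)
sapply(e, is.call)     # FALSE FALSE TRUE
```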
Re: [R] group bunch of lines in a data.frame, an additional requirement
Thanks Gabor, that is much faster than using a loop! I've got a last question: can you think of a fast way of keeping track of the number of observations collapsed for each entry? I.e., I'd like to end up with:

A 2.0 400 ID1 3   (3 obs in the first matrix)
B 0.7  35 ID2 2   (2 obs in the first matrix)
C 5.0  70 ID1 1   (1 obs in the first matrix)

Or is it required to use a temporary matrix that is merged later? (As exemplified by Marc in a previous email?)

Thanks a lot for your help,
Emmanuel

On 9/13/06, Gabor Grothendieck [EMAIL PROTECTED] wrote:
> See below.
>
> On 9/13/06, Emmanuel Levy [EMAIL PROTECTED] wrote:
> > Thanks for pointing me to aggregate, that works fine! There is one
> > complication though: I have mixed types (numerical and character), so the
> > matrix is of the form:
> >
> > A 1.0 200 ID1
> > A 3.0 800 ID1
> > A 2.0 200 ID1
> > B 0.5  20 ID2
> > B 0.9  50 ID2
> > C 5.0  70 ID1
> >
> > One letter always has the same ID, but one ID can be shared by many
> > letters (like ID1). I just want to keep track of the ID, and get a
> > matrix like:
> >
> > A 2.0 400 ID1
> > B 0.7  35 ID2
> > C 5.0  70 ID1
> >
> > Any idea on how to do that without a loop?
>
> If V4 is a function of V1 then you can aggregate by it too and it will
> appear but have no effect on the classification:
>
> aggregate(DF[2:3], DF[c(1,4)], mean)
>   V1  V4  V2  V3
> 1  A ID1 2.0 400
> 2  C ID1 5.0  70
> 3  B ID2 0.7  35
>
> > On 9/12/06, Emmanuel Levy wrote:
> > > Hello, I'd like to group the lines of a matrix so that:
> > > A 1.0 200
> > > A 3.0 800
> > > A 2.0 200
> > > B 0.5  20
> > > B 0.9  50
> > > C 5.0  70
> > > would give:
> > > A 2.0 400
> > > B 0.7  35
> > > C 5.0  70
> > > So all lines corresponding to a letter (level) become a single line
> > > where the values of each column are averaged. I've done that with a
> > > loop but it doesn't sound right (it is very slow). I imagine there is
> > > a sort of apply shortcut but I can't figure it out. Please note that
> > > it is not exactly a matrix I'm using; the function typeof() tells me
> > > it's a list, however I access it like it was a matrix.
> > > Could someone help me with the right function to use, a help topic
> > > or a piece of code? Thanks, Emmanuel
Re: [R] converting strings to expressions
Peter Dalgaard [EMAIL PROTECTED] writes:

> Actually, you need to be even more careful because has commonly been

Drats! "...because _what_ has...", of course. Talk about being careful.

> called unevaluated expressions cover symbols and constants too. I.e.,
> quote(1) and quote(x) are not calls. And such non-call objects can be
> elements of an expression vector.
Re: [R] converting strings to expressions
On 9/14/06, Prof Brian Ripley [EMAIL PROTECTED] wrote:
> On Thu, 14 Sep 2006, Deepayan Sarkar wrote:
> [...]
> > Ah, I'd forgotten about parse. A link from ?expression might be reasonable.
>
> But it says
>
> Details:
>
>      'Expression' here is not being used in its colloquial sense, that of
>      mathematical expressions. Those are calls (see 'call') in R, and an
>      R expression vector is a list of calls, typically as returned by
>      'parse'.
>
> so it already has a link.

Right, I missed that (although I remember reading the first part of that paragraph).

Deepayan

> 'Expression' was often misused in the R documentation, and a couple of
> months ago I tried to move to a more careful usage.
[R] hist for two sets
A quick question, please.

x = c(0.0001, 0.0059, 0.0855, 0.9082)
y = c(0.54, 0.813, 0.379, 0.35)

where A = 1st set, B = 2nd set, C = 3rd set, D = 4th set respectively.

How do you make hist plot the values side by side for x and y? I.e., 0.0001 and then to the right 0.54, 0.0059 and then to the right 0.813, etc.

thx much
Re: [R] Beta stochastic simulation
On 9/14/2006 5:26 PM, Mark Pinkerton wrote:
> Hi Duncan, I had also validated the logic with a simple test, which is why
> I was surprised by the differences I was seeing from the more complex
> simulation. I am running R on Windows 2000; I'll have to check which
> version at my desk tomorrow, but it's pretty up to date, maybe 6 months
> old. Attached is a code snippet from my simulation program, which is used
> to estimate multi-event annual losses for US hurricanes. The event set
> being sampled from is quite large (~14000), with each event and account
> combination having a unique beta loss distribution. Simply swapping lines
> 23 and 24 has the effect on results that I mentioned in the previous
> email. The simulation is large enough that the MC error in the estimated
> means is negligible.

The code you sent isn't usable, because it's missing your data. Could you please do the following?

- verify that the behaviour still happens in the current alpha test version
- try to simplify the example code so someone else can run it

It could be that certain values of alpha and beta trigger a bug, but the ones I tried were fine.

Duncan Murdoch
[R] EBAM ERROR
Dear R users, I am new to R and learning how to use it. I am a PC user and run R on Windows. I would appreciate it if someone could guide me on a few questions I have:

1) I have 4 CEL files (2 replicates each for NORM and Disease). I have been able to run siggenes on this dataset, where I have 4 labels in the class file groupsnhi.cl:

op <- c(0, 0, 1, 1)

and my data has been read into datrmanhi after performing rma. When I run these commands I receive these errors:

plot(samnhi.out, seq(0.1, 0.6, 0.1))
identify(samnhi.out, ll = FALSE)
warning: no point within 0.25 inches
warning: no point within 0.25 inches
warning: no point within 0.25 inches

Why does this happen?

2) Now I am trying to run EBAM, and when I type

find.out <- find.a0(datrmanhi, groupsnhi.cl, rand = 123)

I get an error:

Loading required package: affy
Loading required package: affyio
EBAM Analysis for the two class unpaired case.
Warning: There are 1 genes which have variance Zero or no non-missing values. The d-value of these genes is set to NA.
The following object(s) are masked _by_ .GlobalEnv : n
The following object(s) are masked from mat.repeat ( position 5 ) : center log.bino n p success x1 x2 x3 x4 x5
Error in optim(rep(0, 6), neglogLik.repeat, method = "BFGS") : non-finite finite-difference value [1]
In addition: Warning message: collapsing to unique 'x' values in: approx(Z, Z.norm, z, rule = 2)

- I have also tried:

find.out <- find.a0(exprs2, c(1, 1, 1, 0, 0, 0), rand = 123)

EBAM Analysis for the two class unpaired case.
Warning: There are 1 genes which have variance Zero or no non-missing values. The d-value of these genes is set to NA.
The following object(s) are masked _by_ .GlobalEnv : n
The following object(s) are masked from mat.repeat ( position 3 ) : center log.bino n p success x1 x2 x3 x4 x5
The following object(s) are masked from mat.repeat ( position 6 ) : center log.bino n p success x1 x2 x3 x4 x5
Error in optim(rep(0, 6), neglogLik.repeat, method = "BFGS") : non-finite finite-difference value [1]
In addition: Warning message: collapsing to unique 'x' values in: approx(Z, Z.norm, z, rule = 2)

I would greatly appreciate any solutions to this problem. Thank you for your time.

Regards, Lolita
Re: [R] hist for two sets
On Thu, 2006-09-14 at 19:37 -0400, Ethan Johnsons wrote:
> A quick question, please.
>
> x = c(0.0001, 0.0059, 0.0855, 0.9082)
> y = c(0.54, 0.813, 0.379, 0.35)
>
> where A = 1st set, B = 2nd set, C = 3rd set, D = 4th set respectively.
>
> How do you make hist plot the values side by side for x and y? I.e.,
> 0.0001 and then to the right 0.54, 0.0059 and then to the right 0.813,
> etc.

You don't want a histogram, but a barplot:

x <- c(0.0001, 0.0059, 0.0855, 0.9082)
y <- c(0.54, 0.813, 0.379, 0.35)

# create a two row matrix with x and y
height <- rbind(x, y)

# Use height and set 'beside = TRUE' to get pairs
# save the bar midpoints in 'mp'
# Set the bar pair labels to A:D
mp <- barplot(height, beside = TRUE, ylim = c(0, 1),
              names.arg = LETTERS[1:4])

# Draw the bar values above the bars
text(mp, height, labels = format(height, digits = 4), pos = 3, cex = .75)

See ?barplot, ?text and ?format (or ?formatC or ?sprintf).

HTH,
Marc Schwartz
[R] what does Height represent?
Hi, I am new to R and am trying to use R to cluster my binary data. I use hierarchical clustering:

plot(hclust(dist(x, method = "binary"), method = "average"), cex = 0.1)

and I end up with a cluster dendrogram. On the left of my dendrogram there is a scale called "Height", from 0.0 to 1.0. I don't understand what Height represents. If Height represents the distance between two different data points, then it looks like, if I add up the length of each branch, I end up with distances for some pairs greater than 1. But it should not be possible for the distance between any two data points to be greater than 1. Could someone help me out?

Thanks, Zhaoshi
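For what it's worth, the "Height" axis records the dissimilarity at which two clusters are merged (under average linkage, the mean pairwise distance), not a cumulative sum of branch lengths, so individual merge heights stay within [0, 1] for a binary distance. A small sketch with made-up data (the matrix here is purely illustrative):

```r
# Three binary observations; dist(method = "binary") is bounded by 1.
x <- matrix(c(1, 0, 0, 1,
              1, 0, 0, 0,
              0, 1, 1, 0), nrow = 3, byrow = TRUE)

hc <- hclust(dist(x, method = "binary"), method = "average")

# One height per merge: rows 1 and 2 join at 0.5, then row 3 joins at
# the average distance to that cluster, which is 1.
hc$height  # 0.5 1.0
```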
Re: [R] group bunch of lines in a data.frame, an additional requirement
Emmanuel,

I wouldn't be surprised if Gabor comes up with something, but since aggregate() can only return scalars, you can't do it in one step here. There are possibilities using other functions such as split(), tapply() or by(), but each has its own limitations, requiring more than one step or post-consolidation reformatting. You could certainly write a unified wrapper function that would do this in a single call, but unless you plan on doing this sort of operation a lot, it is probably easier with multiple calls. I suspect using Gabor's approach as he had below, combined with my own of using aggregate() (now twice) and then merge(), may be the easiest.

HTH,
Marc

On Thu, 2006-09-14 at 21:35 +0100, Emmanuel Levy wrote:
> Thanks Gabor, that is much faster than using a loop! I've got a last
> question: can you think of a fast way of keeping track of the number of
> observations collapsed for each entry? I.e., I'd like to end up with:
>
> A 2.0 400 ID1 3   (3 obs in the first matrix)
> B 0.7  35 ID2 2   (2 obs in the first matrix)
> C 5.0  70 ID1 1   (1 obs in the first matrix)
>
> Or is it required to use a temporary matrix that is merged later? (As
> exemplified by Marc in a previous email?)
>
> Any idea on how to do that without a loop?
> > If V4 is a function of V1 then you can aggregate by it too and it will
> > appear but have no effect on the classification:
> >
> > aggregate(DF[2:3], DF[c(1,4)], mean)
> >   V1  V4  V2  V3
> > 1  A ID1 2.0 400
> > 2  C ID1 5.0  70
> > 3  B ID2 0.7  35
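Marc's aggregate-twice-plus-merge idea might look like this on the thread's example data (a sketch; the `DF` column names V1..V4 are assumed from Gabor's earlier reply):

```r
# The thread's example data, with the ID column as V4.
DF <- data.frame(V1 = c("A", "A", "A", "B", "B", "C"),
                 V2 = c(1.0, 3.0, 2.0, 0.5, 0.9, 5.0),
                 V3 = c(200, 800, 200, 20, 50, 70),
                 V4 = c("ID1", "ID1", "ID1", "ID2", "ID2", "ID1"))

# First aggregate: group means; second aggregate: group counts.
means  <- aggregate(DF[2:3], DF[c(1, 4)], mean)
counts <- aggregate(DF[2],   DF[c(1, 4)], length)
names(counts)[3] <- "n"

# merge() joins on the common grouping columns V1 and V4.
merge(means, counts)
```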
Re: [R] working with strptime data
Read the "help desk" article in R News 4/1.

On 9/14/06, Richard Evans [EMAIL PROTECTED] wrote:
> Dear R-forum,
>
> I am looking for a good resource on working with POSIXct values and
> controlling the pretty x-axis labels and tick marks for data vs. time
> plots. Specifically, I wish to write programming logic which creates
> different vertical ablines based on the range of times I am working
> with. The default plot results are adequate, but I would love to make
> explicit calls on how the x-axis annotates the timestamps. Does anyone
> have example code or know of a good reference for these kinds of R
> programming tasks?
>
> Here's a simplified example: I have a large data set that consists of N
> pairs of values and timestamp strings, like this:
>
> TheData <- c(1.2, 0.9, etc...[to the Nth datapoint])
> TheTime <- c("9/1/2006 8:00", "9/1/2006 8:13", etc...[to the Nth timestamp])
>
> I convert the timestamp strings into POSIXct values as:
>
> TheTime <- as.POSIXct(strptime(TheTime, format = "%m/%d/%Y %H:%M"))
>
> and create a plot as such:
>
> plot(TheTime, TheData)
>
> And here is a specific question: how do I calculate the number of months
> spanned between two POSIXct values?
> (i.e. NumOfMonths <- MonthFunction(range(MyTimeStampData)))
>
> Thanks in advance,
> - rich
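One possible shape for the poster's hypothetical MonthFunction (the name and the "calendar months touched by the range" interpretation are assumptions, not from the thread): convert to POSIXlt and work with the year and month components.

```r
# Count the calendar months touched by a range of POSIXct values.
MonthFunction <- function(rng) {
  lt <- as.POSIXlt(range(rng))
  (lt$year[2] - lt$year[1]) * 12 + (lt$mon[2] - lt$mon[1]) + 1
}

TheTime <- as.POSIXct(strptime(c("9/1/2006 8:00", "11/15/2006 9:30"),
                               format = "%m/%d/%Y %H:%M"))
MonthFunction(TheTime)  # 3 (September, October, November)
```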
[R] Table manipulation question
I have a table:

          C1
RowName1   3
RowName2   2

and another table:

           C2
RowName1   5.6
RowName1a  4.3
RowName2   NA

I want to join the tables with matching rows:

          C1   C2
RowName1   3  5.6
RowName2   2   NA

I'm thinking of something like:

T1$C2 = T2$C2[index-expression-to-pullout the matching ones]

Any ideas would be appreciated.

Cheers, Geoff Russell
Re: [R] group bunch of lines in a data.frame, an additional requirement
Here are three different ways to do it:

# base R
fb <- function(x) c(V1 = x$V1[1], V4 = x$V4[1], V2.mean = mean(x$V2),
                    V3.mean = mean(x$V3), n = length(x$V1))
do.call(rbind, by(DF, DF[c(1,4)], fb))
     V1 V4 V2.mean V3.mean n
[1,]  1  1     2.0     400 3
[2,]  3  1     5.0      70 1
[3,]  2  2     0.7      35 2

# package doBy
library(doBy)
summaryBy(V2 + V3 ~ V1 + V4, DF, FUN = c(mean, length))[,-5]
  V1  V4 mean.V2 mean.V3 length.V3
1  A ID1     2.0     400         3
2  C ID1     5.0      70         1
3  B ID2     0.7      35         2

# package reshape
library(reshape)
f <- function(x) c(mean = mean(x), n = length(x))
cast(melt(DF, id = c(1,4)), V1 + V4 ~ variable, fun.aggregate = f)[,-6]
  V1  V4 V2_mean V2_n V3_mean
1  A ID1     2.0    3     400
2  B ID2     0.7    2      35
3  C ID1     5.0    1      70

On 9/14/06, Emmanuel Levy [EMAIL PROTECTED] wrote:
> Thanks Gabor, that is much faster than using a loop! I've got a last
> question: can you think of a fast way of keeping track of the number of
> observations collapsed for each entry? I.e., I'd like to end up with:
>
> A 2.0 400 ID1 3   (3 obs in the first matrix)
> B 0.7  35 ID2 2   (2 obs in the first matrix)
> C 5.0  70 ID1 1   (1 obs in the first matrix)
>
> Or is it required to use a temporary matrix that is merged later? (As
> exemplified by Marc in a previous email?)
> Thanks a lot for your help,
> Emmanuel
Re: [R] Table manipulation question
Hi, Russell,

Here is a piece of code; you might need to tweak it a little.

### MERGE 2 DATA FRAMES:            ###
### INNER JOIN, LEFT JOIN,          ###
### RIGHT JOIN, FULL JOIN,          ###
### CARTESIAN PRODUCT               ###

data1 <- data.frame(id1 = 1:10, x = rnorm(length(1:10)))
data2 <- data.frame(id2 = seq(1, 20, by = 2),
                    y = rnorm(length(seq(1, 20, by = 2))))

# INNER JOIN
inner.join <- merge(data1, data2, by.x = "id1", by.y = "id2")

# LEFT JOIN
left.join <- merge(data1, data2, by.x = "id1", by.y = "id2", all.x = TRUE)

# RIGHT JOIN
right.join <- merge(data1, data2, by.x = "id1", by.y = "id2", all.y = TRUE)

# FULL JOIN
full.join <- merge(data1, data2, by.x = "id1", by.y = "id2", all = TRUE)

# CARTESIAN PRODUCT
cartesian <- merge(data1, data2)

On 9/14/06, Geoff Russell [EMAIL PROTECTED] wrote:
> I have a table:
> [...]
> I want to join the tables with matching rows.

--
WenSui Liu (http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children's Hospital Medical Center
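Applied to the original row-name tables, merge() can also match on row names directly via by = "row.names" (a sketch using the poster's values; all.x = TRUE gives the left join he asked for):

```r
# The poster's two tables, with the keys as row names.
T1 <- data.frame(C1 = c(3, 2), row.names = c("RowName1", "RowName2"))
T2 <- data.frame(C2 = c(5.6, 4.3, NA),
                 row.names = c("RowName1", "RowName1a", "RowName2"))

# Match on row names; keep every row of T1 (left join).
res <- merge(T1, T2, by = "row.names", all.x = TRUE)
res
#   Row.names C1  C2
# 1  RowName1  3 5.6
# 2  RowName2  2  NA
```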
Re: [R] hist for two sets
thx so much, Marc.

ej

On 9/14/06, Marc Schwartz [EMAIL PROTECTED] wrote:
> [...]
> You don't want a histogram, but a barplot:
> [...]
[R] plot region too large
Hi! I don't understand this:

layout(matrix(c(1:10), 5, 2), heights = c(1, rep(2, 4)))
plot(1, 1)
Error in plot.new() : plot region too large

Why?

Thanks! Kamila
Re: [R] plot region too large
The figure margins come from what is set in par("mar"), e.g.

layout(matrix(c(1:10), 5, 2), heights = c(1, rep(2, 4)))
par("mar")
[1] 5.1 4.1 4.1 2.1

There is not enough space left to plot anything with those margins. You will need to make them smaller first, e.g.

par(mar = c(1, 1, 1, 1))
plot(1, 1)

in which case things work.

Cheers,
Andreas

_________
Dr Andreas Kiermeier
Senior Statistician
SARDI FOOD SAFETY PROGRAM
33 Flemington Street, Glenside SA 5065
Ph: +61 8 8207 7884  Fax: +61 8 8207 7854  Mob: 0423 028 565
Email: [EMAIL PROTECTED]
Web: http://www.sardi.sa.gov.au/