[R] Sweave trim output
Dear All,

I'd like to trim the output produced in a Sweave code chunk. For instance, in

    fit <- lm(conc ~ . - Plant, data = CO2)
    summary(fit)

I'd like to skip the info after the coefficients' table, and possibly replace it with '...'. I've created this small function to do this, which is based on capture.output():

    trim.output <- function (x, lines, above = FALSE) {
        if (above) cat("\n...\n\n")
        cat(paste(x[lines], collapse = "\n"))
        cat("\n\n...\n")
    }
    out <- capture.output(summary(fit))
    trim.output(out, 1:13)

but I was wondering if there is an *official* way to do this.

Thanks in advance.

Best,
Dimitris

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/

______
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
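A minimal sketch of the same idea that avoids the hard-coded line range 1:13: it looks for the first line that follows the coefficient table and cuts there. The marker pattern is an assumption about how summary.lm() prints its output, not an official Sweave facility, and the names `markers`, `cut` and `trimmed` are illustrative only.

```r
# Sketch: keep summary() output up to the end of the coefficient table.
fit <- lm(conc ~ . - Plant, data = CO2)   # CO2 ships with R (datasets package)
out <- capture.output(summary(fit))

# Assumption: the coefficient table is followed by one of these lines
# (the significance-star legend, or the residual standard error).
markers <- "^(---|Signif\\. codes|Residual standard error)"
cut <- grep(markers, out)[1]

# Everything before the marker, then a '...' placeholder.
trimmed <- c(out[seq_len(cut - 1)], "...")
cat(trimmed, sep = "\n")
```

Inside a Sweave chunk the cat() call would produce the trimmed text as the chunk's verbatim output.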
Re: [R] Variable names AS variable names?
On Feb 25, 2011, at 1:55 AM, Noah Silverman wrote:

> How can I dynamically use a variable as the name for another variable?
> I realize this sounds cryptic, so an example is best:
>
> #Start with an array of codes
> codes <- c("a1", "b24", "q99")

Is there some reason not to use list(a1, b24, q99)? If not then:

    lapply(codes, somefun)

> #Each code has a corresponding matrix (could be vector)
> a1 <- matrix(rnorm(100), nrow=10)
> b24 <- matrix(rnorm(100), nrow=10)
> q99 <- matrix(rnorm(100), nrow=10)
>
> #Now, I want to loop through all the codes and do something with each matrix
> for(code in codes){
>     #here is where I'm stuck. I don't want the value of code, but the variable whose name is the value of code
> }
>
> Any suggestions?
>
> -N

David Winsemius, MD
West Hartford, CT
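The list suggestion above can be sketched as follows. This is only an illustration: colMeans() stands in for the unspecified `somefun`, and the list name `mats` is hypothetical.

```r
# Sketch: keep the matrices in a named list instead of separate variables.
set.seed(1)
mats <- list(
  a1  = matrix(rnorm(100), nrow = 10),
  b24 = matrix(rnorm(100), nrow = 10),
  q99 = matrix(rnorm(100), nrow = 10)
)

# Apply a function to every matrix; colMeans() is a placeholder for 'somefun'.
col_means <- lapply(mats, colMeans)

# The codes are simply the names of the list.
for (code in names(mats)) {
  cat(code, ":", dim(mats[[code]]), "\n")
}
```

With this layout no name lookup is needed: `mats[[code]]` returns the matrix that belongs to a code.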
Re: [R] Creating objects (data.frames) with names stored in character vector
On Feb 24, 2011, at 3:38 PM, Kent Alleman wrote:

> Hello,
>
> I'm fairly new to R. I'm a chemist, not a programmer so please bear with me.
>
> I have a large data.frame that I want to break down (subset) into smaller data.frames for analysis. I would like to give the data.frames descriptive names which I have stored in a character vector. My original thought was that I want the subsets to show up as individual objects, but having them stored in a list is fine (maybe better).
>
> I can create a list of subsetted data.frames like this:
>
> Lst = list(subset1 = (subset (blablabla)), subset2 = (subset(blabla)))
>
> but I have to provide the component names (subset1, subset2) manually.

    lstnames <- paste("subset", 1:2, sep="_")
    names(Lst) <- lstnames

> I would like to pull the component names from an existing character vector, but so far my attempts have failed.
>
> Any advice is appreciated, even if the advice is "don't do that".
>
> Thank you,
> Kent

David Winsemius, MD
West Hartford, CT
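A small sketch of naming list components from an existing character vector, along the lines of the names(Lst) <- lstnames reply. The data frame `dat`, its columns, and the vector `subset_names` are made up for illustration.

```r
# Hypothetical data standing in for the large data.frame.
dat <- data.frame(sample = rep(c("A", "B"), each = 3),
                  conc   = c(1.2, 1.5, 1.1, 3.4, 3.9, 3.1))

# Descriptive names already stored in a character vector (assumed).
subset_names <- c("low_conc", "high_conc")

# Build the list unnamed, then attach the names in one step.
Lst <- list(subset(dat, conc < 2), subset(dat, conc >= 2))
names(Lst) <- subset_names

# If the subsets follow the values of one column, split() names them itself.
by_sample <- split(dat, dat$sample)
```

Components can then be reached by name, for example `Lst$low_conc` or `by_sample[["B"]]`.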
[R] Selecting data based on the range of dates
Hi:

I want to create an index that is 1 for all dates between September and November, and 0 for anything else. It doesn't matter which year it is: as long as the date falls between September and November the index is 1, otherwise it is 0. My data frame looks like this:

    ID   Date
    201  1/1/05 6:07 AM
    201  3/27/09 9:45 AM
    201  9/29/09 8:44 AM
    203  10/16/08 10:01 AM
    203  10/28/08 9:45 AM
    203  10/31/08 11:12 AM
    203  11/7/08 11:32 AM
    203  11/14/08 10:30 AM
    203  11/19/08 10:40 AM
    203  11/25/08 3:25 PM
    203  12/4/08 10:48 AM
    203  1/28/09 11:04 AM
    203  2/12/09 3:15 PM
    203  2/16/09 2:59 PM
    203  2/24/09 2:45 PM
    203  3/4/09 10:14 AM
    203  3/27/09 11:36 AM
    203  4/1/09 10:43 AM
    203  4/16/09 2:28 PM
    203  4/22/09 2:37 PM
    203  4/29/09 10:48 AM
    203  4/1/09 10:45 AM
    203  12/3/09 9:07 AM
    203  12/11/09 8:58 AM
    203  1/7/10 8:53 AM

Thanks

--
View this message in context: http://r.789695.n4.nabble.com/Selecting-data-based-on-the-range-of-dates-tp3323452p3323452.html
Sent from the R help mailing list archive at Nabble.com.
[R] Multivariate integration
Hello,

I came across a package called cubature (http://cran.r-project.org/web/packages/cubature/index.html) to perform multivariate integration. There are a few things I was not able to understand:

What is the need for package flags under src/Makevars?
What is the purpose of fWrapper in the rcubature.c file?

Thank you
Sonal
Re: [R] Selecting data based on the range of dates
I think I got it. I am posting it here; if you have a better way, please let me know.

    index <- rep(0, length(mydata[,1]))
    index[as.Date(mydata3$Date) < as.Date("2006-11-30 23:29:29 PM") & as.Date(mydata3$Date) > as.Date("2006-09-01 00:00:00 AM")] <- 1
    index[as.Date(mydata3$Date) < as.Date("2007-11-30 23:29:29 PM") & as.Date(mydata3$Date) > as.Date("2007-09-01 00:00:00 AM")] <- 1
    index[as.Date(mydata3$Date) < as.Date("2008-11-30 23:29:29 PM") & as.Date(mydata3$Date) > as.Date("2008-09-01 00:00:00 AM")] <- 1
    index[as.Date(mydata3$Date) < as.Date("2009-11-30 23:29:29 PM") & as.Date(mydata3$Date) > as.Date("2009-09-01 00:00:00 AM")] <- 1
    index[as.Date(mydata3$Date) < as.Date("2010-11-30 23:29:29 PM") & as.Date(mydata3$Date) > as.Date("2010-09-01 00:00:00 AM")] <- 1

Thanks

--
View this message in context: http://r.789695.n4.nabble.com/Selecting-data-based-on-the-range-of-dates-tp3323452p3323536.html
Sent from the R help mailing list archive at Nabble.com.
[R] I have a Quick question about biometics
Hello,

I was searching online to find more info about Biometics and I came across your information. Can you tell me, are you still involved with Biometics? If you are, how are things going for you? Please let me know.

Sincerely,
Will Hammack
[R] Compatibility with R for Windows 2.12.2
Hi,

Could someone please let me know whether installing both R for Windows 2.12.2 and MS Office 2010 on the same system will cause them to interfere with each other? In short, are these two tools compatible with each other?

Thanks in advance.

Best Regards,
Vedajit
Re: [R] Variable names AS variable names?
One of my hypotheses of what you want is:

    for(code in codes) {
        get(code)
    }

The other one is:

    for(code in codes) {
        as.name(code)
    }

On 25/02/2011 06:55, Noah Silverman wrote:
> How can I dynamically use a variable as the name for another variable?
> I realize this sounds cryptic, so an example is best:

--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner' and 'The R Inferno')
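A short sketch of the get() approach on small stand-in matrices (the values are made up so that the result is easy to check). mget() is shown as well because it fetches all the objects at once into a named list; passing `envir = environment()` explicitly is a choice made here to keep the lookup unambiguous.

```r
# Small stand-in matrices; the real ones come from rnorm() in the question.
a1  <- matrix(1:4,  nrow = 2)
b24 <- matrix(5:8,  nrow = 2)
q99 <- matrix(9:12, nrow = 2)
codes <- c("a1", "b24", "q99")

# get() turns the character string into the object of that name.
totals <- numeric(0)
for (code in codes) {
  m <- get(code, envir = environment())
  totals[code] <- sum(m)
}

# mget() does the lookup for all names at once and returns a named list.
mats <- mget(codes, envir = environment())
```

After this, `totals` holds one sum per code and `mats[["b24"]]` is the same matrix as `b24`.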
Re: [R] Compatibility with R for Windows 2.12.2
On Fri, 25 Feb 2011, Vedajit Boyd wrote:

> Hi,
>
> Please someone let me know that the installation of both R for Windows 2.12.2 and MS office 2010 on the same system will interfere each other or not. In short, are these two tools compatible to each other?

There is nothing special about R, but you will have to ask Microsoft if their products cause problems for other (well written, standard-conformant) software. Their software works at system level: R does not install anything at system level, not even registry entries (unless selected in an Administrator install).

R is widely used on systems with MS office 2007 installed, but that's no guarantee that some rarely used Office option on some version of Windows does not interfere with R.

NB: 'R for Windows 2.12.2' is future-ware.

> Thanks in advance.
>
> Best Regards,
> Vedajit

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] GLM, how to get an R2 to explain how much of data explained by one variable
Hi Celine,

GLM outputs usually give the null deviance and the residual deviance in the summary() output, so you can work out the % deviance explained for a variable or a model from these.

Hope this helps.

Best wishes,
Clare

Dr Clare B Embling
Visiting Research Fellow
Marine Institute
University of Plymouth
Plymouth, UK.
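A sketch of the calculation, using the small Poisson example from the glm() help page rather than Celine's data. The quantity for a single variable is computed here as the drop in deviance when that variable is added, relative to the null deviance; that definition is an assumption about what is wanted, and the names `d2_model` and `d2_outcome` are illustrative.

```r
# Data from the example in ?glm (Dobson's randomized controlled trial).
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)

fit_full    <- glm(counts ~ outcome + treatment, family = poisson())
fit_reduced <- glm(counts ~ treatment, family = poisson())

# Proportion of the null deviance explained by the whole model.
d2_model <- 1 - fit_full$deviance / fit_full$null.deviance

# Share of the null deviance attributable to 'outcome' (assumed definition:
# deviance drop when 'outcome' is added to the model without it).
d2_outcome <- (fit_reduced$deviance - fit_full$deviance) / fit_full$null.deviance

round(100 * c(model = d2_model, outcome = d2_outcome), 1)
```

Note that, unlike R2 in a linear model, this share can depend on the order in which correlated variables are added.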
Re: [R] Compatibility with R for Windows 2.12.2
I can't think of a reason why they would...

Rob Tirrell

On Thu, Feb 24, 2011 at 23:56, Vedajit Boyd vedajit.b...@gmail.com wrote:
> Please someone let me know that the installation of both R for Windows 2.12.2 and MS office 2010 on the same system will interfere each other or not.
[R] Substituting inside expression
I am having the following problem: I'm constructing a model to calculate the area of a triangle. I know sides a and b and the angle gamma. I wish to calculate the area using Heron's formula:

    S <- sqrt(s*(s-a)*(s-b)*(s-c))

where

    s <- (a+b+c)/2

and c is calculated using the law of cosines:

    c <- sqrt(a^2 + b^2 - 2*a*b*cos(gamma))

Since I am fitting a regression model, I need the derivative of this expression for the area S, something like D(expression.S, c("a","b")).

Writing it all as a single expression is too complicated, so I would like to use some kind of substitution. However, if I try:

    s.e <- substitute(expression((a+b+c)/2), list(c = expression(sqrt(a^2+b^2-2*a*b*cos(gamma)))))

I get

    > s.e
    expression((a + b + expression(sqrt(a^2 + b^2 - 2 * a * b * cos(gamma))))/2)

which is not what I wanted.

Can someone point me in the right direction?

--
View this message in context: http://r.789695.n4.nabble.com/Substituting-inside-expression-tp3324092p3324092.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Selecting data based on the range of dates
Try

    my.date <- strptime("20/2/06 11:16:16.683", "%d/%m/%y %H:%M:%OS")

Then you can examine my.date$mon.

--
Robert Tirrell | r...@stanford.edu | (607) 437-6532
Program in Biomedical Informatics | Butte Lab | Stanford University

On Thu, Feb 24, 2011 at 14:12, Belle ping...@gmail.com wrote:
> I think I got it, I post it here see if you have better way, please let me know.
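Applied to the date format in the original question, the strptime() suggestion could look like the sketch below. It uses a few rows copied from the posted data; the data frame name `mydata` and the choice to parse only the date part (trailing characters such as the time are ignored by strptime()) are assumptions made for illustration. Note that $mon counts months from 0, so September to November are 8 to 10.

```r
# A few rows from the posted data frame.
mydata <- data.frame(
  ID   = c(201, 201, 201, 203, 203),
  Date = c("1/1/05 6:07 AM", "3/27/09 9:45 AM", "9/29/09 8:44 AM",
           "11/25/08 3:25 PM", "12/4/08 10:48 AM"),
  stringsAsFactors = FALSE
)

# Parse month/day/year; the time of day is not needed for the month test.
parsed <- strptime(mydata$Date, format = "%m/%d/%y", tz = "UTC")

# $mon is 0-based: 8 = September, 9 = October, 10 = November.
mydata$index <- as.integer(parsed$mon %in% 8:10)
mydata
```

This works for any year, so no per-year comparison is needed.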
[R] speed up process
Dear users,

I have a double for loop that does exactly what I want, but is quite slow. It is not so much with this simplified example, but IRL it is slow. Can anyone help me improve it?

The data and code for foo_reg() are available at the end of the email; I preferred going directly into the problematic part. Here is the code (I tried to simplify it but I cannot do it too much or else it wouldn't represent my problem). It might also look too complex for what it is intended to do, but my colleagues who are also supposed to use it don't know much about R. So I wrote it so that they don't have to modify the critical parts to run the script for their needs.

    #column indexes for function
    ind.xvar <- 2
    seq.yvar <- 3:4
    #position vector for legend(), stupid positioning but it doesn't matter here
    mypos <- c("topleft", "topright", "bottomleft")
    #run the function for columns 3&4 as y (seq.yvar) with column 2 as x (ind.xvar) for all 3 datasets (mydata_list)
    par(mfrow=c(2,1))
    for (i in seq_along(seq.yvar)){
      k <- seq.yvar[i]
      plot(mydata1[[k]]~mydata1[[ind.xvar]], type="p", xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
      for (j in seq_along(mydata_list)){
        foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j, pos=mypos[j], name.dat=names(mydata_list)[j])
      }
    }

I tried with lapply() or mapply() but couldn't manage to pass the arguments for names() and col= correctly, e.g. for the 2nd loop:

    lapply(mydata_list, FUN=function(x){foo_reg(dat=x, xvar=ind.xvar, yvar=k, col1=1:3, pos=mypos[1:3], name.dat=names(x)[1:3])})
    mapply(FUN=function(x) {foo_reg(dat=x, name.dat=names(x)[1:3])}, mydata_list, col1=1:3, pos=mypos, MoreArgs=list(xvar=ind.xvar, yvar=k))

Thanks in advance for any hints.
Ivan

    #create data (it looks horrible with these datasets but it doesn't matter here)
    mydata1 <- structure(list(species = structure(1:8, .Label = c("alsen", "gogor", "loalb", "mafas", "pacyn", "patro", "poabe", "thgel"), class = "factor"),
      fruit = c(0.52, 0.45, 0.43, 0.82, 0.35, 0.9, 0.68, 0),
      Asfc = c(207.463765, 138.5533755, 70.4391735, 160.9742745, 41.455809, 119.155109, 26.241441, 148.337377),
      Tfv = c(47068.1437773483, 43743.8087431582, 40323.5209129239, 23420.9455581495, 29382.6947428651, 50460.2202192311, 21810.1456510625, 41747.6053810881)),
      .Names = c("species", "fruit", "Asfc", "Tfv"), row.names = c(NA, 8L), class = "data.frame")
    mydata2 <- mydata1[!(mydata1$species %in% c("thgel","alsen")),]
    mydata3 <- mydata1[!(mydata1$species %in% c("thgel","alsen","poabe")),]
    mydata_list <- list(mydata1=mydata1, mydata2=mydata2, mydata3=mydata3)

    #function for regression
    library(WRS)
    foo_reg <- function(dat, xvar, yvar, mycol, pos, name.dat){
      tsts <- tstsreg(dat[[xvar]], dat[[yvar]])
      tsts_inter <- signif(tsts$coef[1], digits=3)
      tsts_slope <- signif(tsts$coef[2], digits=3)
      abline(tsts$coef, lty=1, col=mycol)
      legend(x=pos, legend=c(paste("TSTS ",name.dat,": Y=",tsts_inter,"+",tsts_slope,"X",sep="")), lty=1, col=mycol)
    }

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de
**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
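On the mapply() attempt above: the anonymous function only accepts `x`, so the extra vectors (col1, pos) have nowhere to go, and names(x) inside it returns the column names of one data frame rather than the name of the list element. A minimal sketch of passing everything in parallel follows. `describe_fit` is a hypothetical stand-in for foo_reg() that uses lm() instead of WRS::tstsreg() and returns the legend text instead of drawing, so the sketch runs without the WRS package; the two small data frames are made up.

```r
# Made-up data sets standing in for mydata_list.
mydata_list <- list(
  mydata1 = data.frame(x = 1:6, y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)),
  mydata2 = data.frame(x = 1:5, y = c(1.0, 1.9, 3.2, 3.9, 5.1))
)
mypos <- c("topleft", "topright")

# Hypothetical stand-in for foo_reg(): builds the legend label only.
describe_fit <- function(dat, mycol, pos, name.dat) {
  cf <- signif(coef(lm(y ~ x, data = dat)), digits = 3)
  paste("LM ", name.dat, ": Y=", cf[1], "+", cf[2], "X",
        " (col ", mycol, ", ", pos, ")", sep = "")
}

# Each vectorised argument is matched by name to a formal argument of the
# function; the element names are passed explicitly via names(mydata_list).
labels <- mapply(describe_fit,
                 dat      = mydata_list,
                 mycol    = seq_along(mydata_list),
                 pos      = mypos,
                 name.dat = names(mydata_list))
labels
```

Arguments that stay constant across calls (xvar and yvar in the original) would go into MoreArgs, as in the attempt quoted above.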
Re: [R] Substituting inside expression
Hi,

If I follow you correctly, you could write a function:

    foo <- function(a,b,gamma){
      c <- sqrt(a^2 + b^2 - 2*a*b*cos(gamma))
      s <- (a+b+c)/2
      A <- sqrt(s*(s-a)*(s-b)*(s-c))
      return(A)
    }

I hope I didn't make mistakes, but it can still help you, I guess.

Ivan

On 2/25/2011 10:11, zbynek.jano...@gmail.com wrote:
> To write it all into a single expression, it is too complicated, so i would like to use some kind of substitution.
Re: [R] speed up process
Simply avoiding the for loops by using lapply (I may have missed a bracket here or there because I did this without opening R)... Haven't checked the speed-up, though.

    lapply(seq.yvar, function(k){
      plot(mydata1[[k]]~mydata1[[ind.xvar]], type="p", xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
      lapply(seq_along(mydata_list), function(j){
        foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j, pos=mypos[j], name.dat=names(mydata_list)[j])
        return(NULL)
      })
      invisible(NULL)
    })

HTH,

Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
-- Do Not Disapprove

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ivan Calandra
Sent: Friday, 25 February 2011 11:20
To: r-help
Subject: [R] speed up process

> Dear users, I have a double for loop that does exactly what I want, but is quite slow. It is not so much with this simplified example, but IRL it is slow. Can anyone help me improve it?
Re: [R] Substituting inside expression
On Fri, Feb 25, 2011 at 4:11 AM, zbynek.jano...@gmail.com zbynek.jano...@centrum.cz wrote:
> Can someone point me to the right direction?

Try this:

    e <- substitute((a+b+c)/2, list(c = quote(sqrt(a^2+b^2-2*a*b*cos(gamma)))))
    D(e, "a")

which gives

    (1 + 0.5 * ((2 * a - 2 * b * cos(gamma)) * (a^2 + b^2 - 2 * a * b * cos(gamma))^-0.5))/2

Also

    library(Ryacas) # http://ryacas.googlecode.com
    a <- Sym("a"); b <- Sym("b"); gamma <- Sym("gamma")
    c <- sqrt(a^2+b^2-2*a*b*cos(gamma))
    deriv((a+b+c)/2, a)

which gives

    expression(2 * ((2 * a - 2 * b * cos(gamma))/(2 * root(a^2 + b^2 - 2 * a * b * cos(gamma), 2)) + 1)/4)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
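The substitute()/quote() approach above can be carried one step further to the full area expression. The sketch below is an illustration only: the names `c.e`, `s.e`, `S.e` and the 3-4-5 test triangle are assumptions, chosen because the area and its derivative are known in closed form (S = a*b*sin(gamma)/2, so dS/da = b*sin(gamma)/2).

```r
# Law of cosines for side c, kept as an unevaluated call.
c.e <- quote(sqrt(a^2 + b^2 - 2 * a * b * cos(gamma)))

# Semi-perimeter with c substituted in.
s.e <- substitute((a + b + c) / 2, list(c = c.e))

# Heron's formula with both s and c substituted in.
S.e <- substitute(sqrt(s * (s - a) * (s - b) * (s - c)),
                  list(s = s.e, c = c.e))

# Symbolic partial derivatives with respect to a and b.
dS.da <- D(S.e, "a")
dS.db <- D(S.e, "b")

# Numeric check on a right triangle with legs 3 and 4.
vals <- list(a = 3, b = 4, gamma = pi / 2)
area     <- eval(S.e, vals)     # should agree with 3 * 4 / 2
slope_a  <- eval(dS.da, vals)   # should agree with 4 * sin(pi / 2) / 2
```

D() differentiates with respect to one variable at a time, hence the two separate calls.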
[R] scatter graph of more than one value
Hi,

I have two sets of values, X1, X2 and Y1, Y2, and I want to draw them ((X1,Y1), (X2,Y2)) in a scatter graph. How can I draw both of them in the same graph with different legends? And is there any way to show a different label on each point?

Regards,
Amir
[R] color code in loop for piecharts plotting
Hi,

I am using this loop

    par(mfrow=c(3,3))
    annos <- c(2001:2007,2009)
    for (i in annos) {
      t <- subset(masia,YEAR==i)
      t$FAMILIA <- drop.levels(t$FAMILIA)
      pie(table(t$FAMILIA),main=i)
    }

to make pie charts of species composition among years (my data frame is called masia). So I get one pie chart of the families that we found in our survey each year.

We don't always have the same families every year, so I added t$FAMILIA <- drop.levels(t$FAMILIA) to the loop to avoid having, in the pie, family levels that are absent in some specific years.

The problem is that the color code changes, and I get, for example, different colors for the same families in different years.

If I group the families with few individuals together in a category called "others", and make a new column called familia2 with fewer levels so that every year has all levels of familia2 in the species composition, I don't get the problem and all families have the same color among years.

Does anybody know how to avoid the color code changing for the families in the loop? I know I can do it manually and give each family a color, but I have quite a lot of families, so I'm wondering if there's another way to fix it.

I don't know if I made myself clear...

Thanks!

Lucia

--
View this message in context: http://r.789695.n4.nabble.com/color-code-in-loop-for-piecharts-plotting-tp3324196p3324196.html
Sent from the R help mailing list archive at Nabble.com.
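One way to keep the colors stable, sketched under assumptions: build a single named color vector from all family levels before the loop, then index it by the names present in each year's table. The data frame below is a made-up stand-in for masia, the family names are placeholders, and base R's droplevels() is used in place of drop.levels().

```r
# Made-up stand-in for the survey data.
masia <- data.frame(
  YEAR    = c(2001, 2001, 2001, 2002, 2002, 2003, 2003, 2003),
  FAMILIA = factor(c("Sparidae", "Labridae", "Gobiidae", "Sparidae",
                     "Gobiidae", "Labridae", "Labridae", "Sparidae"))
)

# One color per family, fixed once for all years and named by family.
fam_cols <- setNames(rainbow(nlevels(masia$FAMILIA)), levels(masia$FAMILIA))

par(mfrow = c(1, 3))
for (i in sort(unique(masia$YEAR))) {
  yr  <- subset(masia, YEAR == i)
  tab <- table(droplevels(yr$FAMILIA))
  # Look colors up by family name, so absent families do not shift the rest.
  pie(tab, col = fam_cols[names(tab)], main = i)
}
```

With many families, rainbow() can be swapped for any palette function that returns one color per level.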
Re: [R] scatter graph of more than one value
Hi,

Take a look at ?points, ?legend and ?par (specifically col and pch).

HTH,
Ivan

On 2/25/2011 11:58, amir wrote:
> I have two X1,X2 and Y1,Y2 and I want to draw them ((X1,Y1), (X2,Y2)) in a scatter graph. How can I draw both of them in a same graph with different legends? And is there any way to show different labels on each point?
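A minimal sketch putting those pointers together, with made-up data: plot() draws the first series, points() adds the second, text() labels each point, and legend() distinguishes the two. The colors, symbols and label strings are arbitrary choices.

```r
# Made-up data for the two series.
X1 <- c(1, 2, 3, 4);   Y1 <- c(2.0, 4.1, 5.9, 8.2)
X2 <- c(1.5, 2.5, 3.5); Y2 <- c(7.5, 5.0, 2.4)

# Axis limits must cover both series, since plot() only sees the first.
xlim <- range(c(X1, X2))
ylim <- range(c(Y1, Y2))

plot(X1, Y1, col = "blue", pch = 16, xlim = xlim, ylim = ylim,
     xlab = "X", ylab = "Y")
points(X2, Y2, col = "red", pch = 17)

# A different label on each point, placed above it (pos = 3).
text(X1, Y1, labels = paste("a", seq_along(X1), sep = ""), pos = 3)
text(X2, Y2, labels = paste("b", seq_along(X2), sep = ""), pos = 3)

legend("top", legend = c("(X1, Y1)", "(X2, Y2)"),
       col = c("blue", "red"), pch = c(16, 17))
```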
[R] time series with NA - acf - tsdiag - Ljung-Box
Hi all,

I am modelling a time series with missing data.

Q1) I am not sure whether I should use the following graphics to understand my data:
a) ACF & PACF (original series)
b) ACF & PACF (residuals)

Q2) I am using tsdiag, so I obtain a graphic with 3 plots: standardized residuals vs time; ACF of the residuals; Ljung-Box for the residuals (it is wrong for residuals). I know that using Box.test with type "Ljung-Box", I can specify a correct df for my estimated model (fitdf = p + q). So, I could do this test with different lags, evaluate their significance, and then plot it. However, in Box.test NAs are not handled. But it is possible to do a Ljung-Box test with missing data [Stoffer & Toloi, 1992. A note on the Ljung-Box-Pierce portmanteau statistic with missing data].
a) Do you know any function to do a Ljung-Box test with NA?

Q3) In general, what (other?) tools do you recommend for time series with missing data? I have been using the auto.arima and arima functions. I don't want to do an interpolation.

Thanks in advance,

Cecilia
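A rough sketch of residual diagnostics on a series with gaps, using simulated data. It relies on two documented behaviours: arima() allows missing values (handled exactly with method = "ML"), and acf() accepts na.action = na.pass. The Q statistic below is the plain Ljung-Box formula applied to the available residuals; it is not the adjusted statistic of Stoffer & Toloi, and whether it is adequate for a given amount of missingness is left open. All object names are illustrative.

```r
# Simulated AR(1) series with 10% of the values removed.
set.seed(123)
x <- arima.sim(model = list(ar = 0.6), n = 200)
x[sample(200, 20)] <- NA

# Fit without interpolation; missing values are allowed by arima().
fit <- arima(x, order = c(1, 0, 0), method = "ML")
res <- residuals(fit)

# Residual autocorrelations computed from the available pairs only.
max_lag <- 10
r <- acf(res, lag.max = max_lag, na.action = na.pass, plot = FALSE)$acf[-1]

# Plain Ljung-Box statistic on the non-missing residuals (assumption: this
# simple version, not the missing-data adjustment from the literature).
n    <- sum(!is.na(res))
lags <- seq_len(max_lag)
Q    <- n * (n + 2) * sum(r^2 / (n - lags))

# Degrees of freedom reduced by the number of fitted ARMA terms (p + q = 1).
p_value <- pchisq(Q, df = max_lag - 1, lower.tail = FALSE)
c(Q = Q, p_value = p_value)
```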
Re: [R] speed up process
Thanks Nick for your quick answer. It does work (no missed bracket!) but unfortunately doesn't really speed anything up: with my real data, it takes 82.78 seconds with the double lapply() instead of 83.59 s with the double loop (a gain of about 0.8 s). It looks like my double loop was not that bad. Does anyone know another, faster way to do this?

Thanks again in advance,
Ivan

On 2/25/2011 11:41, Nick Sabbe wrote:

Simply avoiding the for loops by using lapply (I may have missed a bracket here or there because I did this without opening R)... Haven't checked the speed-up, though.

lapply(seq.yvar, function(k){
  plot(mydata1[[k]]~mydata1[[ind.xvar]], type="p", xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
  lapply(seq_along(mydata_list), function(j){
    foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j, pos=mypos[j], name.dat=names(mydata_list)[j])
    return(NULL)
  })
  invisible(NULL)
})

HTH,
Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
-- Do Not Disapprove

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ivan Calandra
Sent: Friday, 25 February 2011 11:20
To: r-help
Subject: [R] speed up process

Dear users,

I have a double for loop that does exactly what I want, but is quite slow. It is not so slow with this simplified example, but in real life it is. Can anyone help me improve it? The data and code for foo_reg() are available at the end of the email; I preferred going directly into the problematic part. Here is the code (I tried to simplify it but I cannot do it too much or else it wouldn't represent my problem). It might also look too complex for what it is intended to do, but my colleagues who are also supposed to use it don't know much about R. So I wrote it so that they don't have to modify the critical parts to run the script for their needs.

#column indexes for function
ind.xvar <- 2
seq.yvar <- 3:4
#position vector for legend(), stupid positioning but it doesn't matter here
mypos <- c("topleft", "topright", "bottomleft")
#run the function for columns 3&4 as y (seq.yvar) with column 2 as x (ind.xvar) for all 3 datasets (mydata_list)
par(mfrow=c(2,1))
for (i in seq_along(seq.yvar)){
  k <- seq.yvar[i]
  plot(mydata1[[k]]~mydata1[[ind.xvar]], type="p", xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
  for (j in seq_along(mydata_list)){
    foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j, pos=mypos[j], name.dat=names(mydata_list)[j])
  }
}

I tried with lapply() or mapply() but couldn't manage to pass the arguments for names() and col= correctly, e.g. for the 2nd loop:

lapply(mydata_list, FUN=function(x){foo_reg(dat=x, xvar=ind.xvar, yvar=k, col1=1:3, pos=mypos[1:3], name.dat=names(x)[1:3])})
mapply(FUN=function(x) {foo_reg(dat=x, name.dat=names(x)[1:3])}, mydata_list, col1=1:3, pos=mypos, MoreArgs=list(xvar=ind.xvar, yvar=k))

Thanks in advance for any hints.
Ivan

#create data (it looks horrible with these datasets but it doesn't matter here)
mydata1 <- structure(list(species = structure(1:8, .Label = c("alsen", "gogor", "loalb", "mafas", "pacyn", "patro", "poabe", "thgel"), class = "factor"), fruit = c(0.52, 0.45, 0.43, 0.82, 0.35, 0.9, 0.68, 0), Asfc = c(207.463765, 138.5533755, 70.4391735, 160.9742745, 41.455809, 119.155109, 26.241441, 148.337377), Tfv = c(47068.1437773483, 43743.8087431582, 40323.5209129239, 23420.9455581495, 29382.6947428651, 50460.2202192311, 21810.1456510625, 41747.6053810881)), .Names = c("species", "fruit", "Asfc", "Tfv"), row.names = c(NA, 8L), class = "data.frame")
mydata2 <- mydata1[!(mydata1$species %in% c("thgel","alsen")),]
mydata3 <- mydata1[!(mydata1$species %in% c("thgel","alsen","poabe")),]
mydata_list <- list(mydata1=mydata1, mydata2=mydata2, mydata3=mydata3)

#function for regression
library(WRS)
foo_reg <- function(dat, xvar, yvar, mycol, pos, name.dat){
  tsts <- tstsreg(dat[[xvar]], dat[[yvar]])
  tsts_inter <- signif(tsts$coef[1], digits=3)
  tsts_slope <- signif(tsts$coef[2], digits=3)
  abline(tsts$coef, lty=1, col=mycol)
  legend(x=pos, legend=c(paste("TSTS ", name.dat, ": Y=", tsts_inter, "+", tsts_slope, "X", sep="")), lty=1, col=mycol)
}

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
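On the mapply() attempt above: one way the per-dataset arguments could be passed is to give mapply() one vector per varying argument (data set, name, colour, legend position) and put the constant ones in MoreArgs. The sketch below is only about argument passing, not speed. foo_demo() is a hypothetical stand-in for foo_reg() that just reports what it received, so that nothing here depends on WRS, and demo_list is made-up data.

```r
# Hypothetical stand-in for foo_reg(): same argument names, no plotting
foo_demo <- function(dat, xvar, yvar, mycol, pos, name.dat) {
  paste0(name.dat, ": n=", nrow(dat), ", col=", mycol, ", pos=", pos,
         ", x=", names(dat)[xvar], ", y=", names(dat)[yvar])
}

# Made-up data with the same shape as mydata_list
d <- data.frame(species = letters[1:8], fruit = (1:8) / 10, Asfc = 8:1)
demo_list <- list(mydata1 = d, mydata2 = d[1:6, ], mydata3 = d[1:5, ])
mypos <- c("topleft", "topright", "bottomleft")

# Each named vector is walked in parallel; MoreArgs holds the constants
res <- mapply(foo_demo,
              dat      = demo_list,
              name.dat = names(demo_list),
              mycol    = seq_along(demo_list),
              pos      = mypos,
              MoreArgs = list(xvar = 2, yvar = 3))
res
```

The point is that names(x) inside the function gives the column names of one data frame, whereas the data set names live in names(mydata_list) and have to be handed in as their own vector.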
Re: [R] speed up process
Use Rprof to find where the time is being spent. It is probably in 'plot', which would imply that the 'for' loop is not the problem and that the cost is therefore beyond your control.

Sent from my iPad

On Feb 25, 2011, at 6:19, Ivan Calandra ivan.calan...@uni-hamburg.de wrote:

Thanks Nick for your quick answer. It does work (no missed bracket!) but unfortunately doesn't really speed up anything: with my real data, it takes 82.78 seconds with the double lapply() instead of 83.59s with the double loop (about 0.8 s). It looks like my double loop was not that bad. Does anyone know another faster way to do this?

Thanks again in advance,
Ivan
Re: [R] color code in loop for piecharts plotting
On 02/25/2011 09:33 PM, Lucia Rueda wrote:

Hi, I am using this loop

par(mfrow=c(3,3))
annos <- c(2001:2007,2009)
for (i in annos) {
  t <- subset(masia,YEAR==i)
  t$FAMILIA <- drop.levels(t$FAMILIA)
  pie(table(t$FAMILIA),main=i)
}

to make pie charts of species composition among years (my data frame is called masia). So I get one pie chart of the families that we found in our survey each year. We don't always have the same families every year, so I added t$FAMILIA <- drop.levels(t$FAMILIA) to the loop to avoid having family levels in the pie that are absent in a specific year.

The problem is that the color code changes, and I get, for example, different colors for the same families in different years. If I group the families with few individuals into a category called "others" and make a new column called familia2 with fewer levels, so that every year contains all levels of familia2, I don't get the problem and all families have the same color among years.

Does anybody know how to avoid the color code change for the families in the loop? I know I can do it manually and give each family a color, but I have quite a lot of families, so I'm wondering if there's some other way to fix that.

Hi Lucia,

FAMILIA is probably a factor, and can therefore be used as an index with as.numeric(). So if you have a vector of colors for all the families in your dataset, you could specify the color for each sector of the pie with:

# this gives you different colors for each family
family_colors <- 1:length(levels(masia$FAMILIA))
for(i in annos) {
  t <- subset(masia,YEAR==i)
  sector_index <- as.numeric(unique(t$FAMILIA))
  pie(table(t$FAMILIA),main=i,col=family_colors[sector_index])
}

Can't try it at the moment, but it should be close.

Jim

[as.numeric(unique(t$FAMILIA[i]))] without dropping the levels (I think).
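A variation on the same idea, as a minimal sketch: build the color vector once from all levels of FAMILIA, name it by level, and look colors up by name for whichever families are present in a given year. The masia data below are invented for illustration, droplevels() (base R) stands in for drop.levels(), and pie_for_year() is a hypothetical helper.

```r
# Made-up survey data with the column names used in the thread
masia <- data.frame(
  YEAR    = c(2001, 2001, 2001, 2002, 2002),
  FAMILIA = factor(c("Labridae", "Sparidae", "Gobiidae", "Labridae", "Gobiidae"))
)

# One color per family, fixed once for the whole data set and named by level
family_colors <- setNames(rainbow(nlevels(masia$FAMILIA)), levels(masia$FAMILIA))

# Hypothetical helper: draw one year's pie and return the colors it used
pie_for_year <- function(yr) {
  counts <- table(droplevels(masia[masia$YEAR == yr, ])$FAMILIA)
  cols <- family_colors[names(counts)]   # lookup by family name, not position
  pie(counts, main = yr, col = cols)
  invisible(cols)
}

par(mfrow = c(1, 2))
cols_2001 <- pie_for_year(2001)
cols_2002 <- pie_for_year(2002)
```

Because the lookup is by name, the order of the sectors and the set of families present in a year no longer affect which color a family gets.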
[R] lm - log(variable) - skip log(0)
I want to do an lm regression in which some of the variables are log-transformed. For just one variable, I would like to leave out the values that would mean taking log(0). I have done the following, but it doesn't work:

lmod1.lm <- lm(log(dat$inaltu)~log(dat$indiam),subset=(!(dat$indiam %in% c(0,1))))

and obtain:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases

lmod1.lm <- lm(log(dat$inaltu)~log(dat$indiam),subset=(!(dat$indiam = 0)), na.action=na.exclude)

and obtain:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : NA/NaN/Inf in foreign function call (arg 1)

Thanks, u...@host.com

-- View this message in context: http://r.789695.n4.nabble.com/lm-log-variable-skip-log-0-tp3324263p3324263.html
Sent from the R help mailing list archive at Nabble.com.
[R] group by in data.frame
Hi all,

I have a little problem, and I think it is really simple to solve, but I don't know exactly how. Here is the challenge: I have a data.frame with n columns; I have to group by 2 of them and calculate the mean value of a third one. So far so good, that was easy; I used the aggregate function to do this:

group <- aggregate(x[,1],list(x[,2],x[,3]),mean)

Now I have to copy the calculated mean value to every row of the data.frame (in a new column), so that each row gets the value of the group it belongs to.

It would be great if someone could help me. Thanks in advance!

-- View this message in context: http://r.789695.n4.nabble.com/group-by-in-data-frame-tp3324240p3324240.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] select element from vector
Hi Jessica, try this:

Q[k:c(k+3)]

-- View this message in context: http://r.789695.n4.nabble.com/select-element-from-vector-tp3323725p3324286.html
Sent from the R help mailing list archive at Nabble.com.
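A small illustration of why the parentheses (or c()) around k+3 matter: the colon operator binds more tightly than +, so k:k+3 means (k:k)+3. The vector Q and the value of k below are made up.

```r
Q <- seq(10, 100, by = 10)   # made-up vector: 10, 20, ..., 100
k <- 2

four  <- Q[k:(k + 3)]   # indices 2, 3, 4, 5
wrong <- Q[k:k + 3]     # (k:k) + 3 is the single index 5

four
wrong
```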
Re: [R] speed up process
Dear Jim,

I've tried to use Rprof() as you advised, but I don't understand how it works. I've done this:

Rprof(for (i in seq_along(seq.yvar)){ all_my_commands })
summaryRprof()

But I got this error:

Error in summaryRprof() : no lines found in ‘Rprof.out’

I couldn't really work out from the help page what I should do. In any case, it is certain that the function tstsreg() is what takes the most computing time. But I wanted to optimize the rest of the code to gain as much speed as possible.

Ivan

On 2/25/2011 12:30, Jim Holtman wrote:

use Rprof to find where time is being spent. probably in 'plot' which might imply it is not the 'for' loop and therefore beyond your control.

Sent from my iPad
Re: [R] lm - log(variable) - skip log(0)
You need to use == instead of = for testing equality. While you're at it, you should check for positive values, not just screen out the 0s. This works for me:

R> mydata = data.frame(x=0:10, y=runif(11))
R> fm = lm(y ~ log(x), mydata, subset=x>0)

Andy
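Applied to the variable names from the original question, the same approach might look like the sketch below. The data are invented; only the names dat, inaltu and indiam come from the thread. Using data = dat lets the formula and the subset refer to the columns directly, and the subset condition keeps only rows where both logged variables are positive.

```r
set.seed(42)
# Made-up data: two rows with indiam == 0, one of them with inaltu == 0 as well
dat <- data.frame(indiam = c(0, 0, 1:18))
dat$inaltu <- c(0, 3, exp(1 + 0.5 * log(1:18) + rnorm(18, sd = 0.1)))

# Rows that would give log(0) are excluded before the fit
fit <- lm(log(inaltu) ~ log(indiam), data = dat,
          subset = indiam > 0 & inaltu > 0)

coef(fit)
length(residuals(fit))   # number of rows actually used
```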
Re: [R] group by in data.frame
Hi,

I think ave() might do what you want:

df <- data.frame(a=rep(c("this","that"),5), b1=rnorm(10), b2=rnorm(10))
ave(df[,2], df[,1], FUN=mean)

For all columns, you could do this:

d <- lapply(df[,2:3], FUN=function(x)ave(x,df[,1],FUN=mean))
df2 <- cbind(df, d)

HTH,
Ivan

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
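For the layout in the original question (value in column 1, grouping variables in columns 2 and 3), ave() also accepts several grouping vectors, so the group mean can be written straight into a new column. A minimal sketch with made-up data; the column names value, g1, g2 and group_mean are assumptions.

```r
# Made-up data: value in column 1, two grouping columns
x <- data.frame(value = c(1, 3, 5, 7, 10, 20),
                g1    = c("a", "a", "b", "b", "a", "a"),
                g2    = c("u", "u", "u", "u", "v", "v"))

# Mean of column 1 within each combination of columns 2 and 3,
# repeated on every row of the corresponding group
x$group_mean <- ave(x[, 1], x[, 2], x[, 3], FUN = mean)
x
```

Note that FUN has to be named in the call, since unnamed arguments after the first are taken as grouping variables.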
[R] count data
hello dear list!

I wonder about the layout of my csv for my study design: i have 11 different sites. each site had been visited 9 times. on each visit, 6 distinctive water parameters had been taken ONCE (as continuous variables). on each visit, the fish abundance was counted using a net at 3 different locations within the site (count data). I know i will have to do an lmer using the nested locations as error term. Question is: how to organize my data, since i have abundances from the same 3 locations per site replicate but only one water parameter measurement per site replicate. to give you an idea, here's the basic look so far of my csv:

site  location  abundance  pH   no3    and so on...
A     1         12         7.1  0.003  ...
A     2         15         7.1  0.003  ...
A     3         18         7.1  0.003  ...
B     1         11         7.4  0.004  ...
B     2         8          7.4  0.004  ...
B     3         17         7.4  0.004  ...
A     1         13         7.2  0.001  ...
A     2         19         7.2  0.001  ...
A     3         21         7.2  0.001  ...
B     1         9          6.9  0.002  ...
B     2         5          6.9  0.002  ...
B     3         2          6.9  0.002  ...

i just made up the table to give an idea of what the data look like. the goal would be to analyze fish abundance ~ water parameters, does anyone have a suggestion?

thanks in advance!
[R] limma function problem
Hi,

I have two data sets of normalized Affymetrix CEL files, wild type vs. control type (each set has three replicates).

> wild.fish
AffyBatch object
size of arrays=712x712 features (10 kb)
cdf=Zebrafish (15617 affyids)
number of samples=3
number of genes=15617
annotation=zebrafish
notes=

> Dicer.fish
AffyBatch object
size of arrays=712x712 features (10 kb)
cdf=Zebrafish (15617 affyids)
number of samples=3
number of genes=15617
annotation=zebrafish
notes=

Now I have to combine these two S4 objects and use the lmFit function of the limma package. I am able to combine the two S4 objects using the merge function.

> merge.fish <- merge(wild.fish,Dicer.fish)
> merge.fish
AffyBatch object
size of arrays=712x712 features (17833 kb)
cdf=Zebrafish (15617 affyids)
number of samples=6
number of genes=15617
annotation=zebrafish
notes=Merge from two AffyBatches with notes: 1) , and 2)

> design
             Wild Mz_Dicer
GSM95623.CEL    1        0
GSM95624.CEL    1        0
GSM95625.CEL    1        0
GSM95617.CEL    0        1
GSM95618.CEL    0        1
GSM95619.CEL    0        1

> fit <- lmFit(merge.fish, design)
Error in as.vector(data) : no method for coercing this S4 class to a vector

> mode(merge.fish)
[1] "S4"

So, how can I troubleshoot this problem?

Regards,
Sukhbir Singh Rattan.
Re: [R] count data
Dear Sacha,

Do you revisit the same locations per site? If so, use (1|site/location) as random effect. Otherwise use just (1|site). You might want to add a crossed random effect (1|date) if you can expect an effect of phenology.

Best regards,
Thierry

PS R-sig-mixed-models is a better list for this kind of question.

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data. ~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
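On the data layout question, one possible arrangement (a sketch, not the only option) is to keep the counts in "long" form, one row per site, visit and location, keep the water parameters in a second table with one row per site and visit, and join the two with merge() so the visit-level measurements are repeated for the three locations. The values are the made-up ones from the question; the visit column and the object names counts, water and long are assumptions.

```r
# One row per site x visit x location (count data)
counts <- data.frame(
  site      = rep(c("A", "B"), each = 3, times = 2),
  visit     = rep(1:2, each = 6),
  location  = rep(1:3, times = 4),
  abundance = c(12, 15, 18, 11, 8, 17, 13, 19, 21, 9, 5, 2)
)

# One row per site x visit (water parameters measured once per visit)
water <- data.frame(
  site  = c("A", "B", "A", "B"),
  visit = rep(1:2, each = 2),
  pH    = c(7.1, 7.4, 7.2, 6.9),
  no3   = c(0.003, 0.004, 0.001, 0.002)
)

# Join: each count row receives the water parameters of its site and visit
long <- merge(counts, water, by = c("site", "visit"))
long

# A model along the lines suggested above might then look like this
# (assumption: lme4 is installed; the Poisson family is a guess for count data):
# library(lme4)
# glmer(abundance ~ pH + no3 + (1 | site/location), family = poisson, data = long)
```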
[R] R 2.12.2 is released
I've rolled up R-2.12.2.tar.gz a short while ago. This is an update release, which fixes a number of mostly minor issues, and one major issue in which complex arithmetic was being messed up on some compiler platforms.

You can get it from

http://cran.r-project.org/src/base/R-2/R-2.12.2.tar.gz

or wait for it to be mirrored at a CRAN site nearer to you. Binaries for various platforms will appear in due course.

For the R Core Team
Peter Dalgaard

These are the md5sums for the freshly created files, in case you wish to check that they are uncorrupted:

MD5 (AUTHORS) = ac9746b4845ae81f51cfc99262f5
MD5 (COPYING) = eb723b61539feef013de476e68b5c50a
MD5 (COPYING.LIB) = a6f89e2100d9b6cdffcea4f398e37343
MD5 (FAQ) = 72deeabefdf6fd14e83bf5703dce9176
MD5 (INSTALL) = 70447ae7f2c35233d3065b004aa4f331
MD5 (NEWS) = 30b55e4f34c155fcb2fafa7ebb55528e
MD5 (ONEWS) = 0c3e10eef74439786e5fceddd06dac71
MD5 (OONEWS) = b0d650eba25fc5664980528c147a20db
MD5 (R-latest.tar.gz) = bc70b51dddab8aa39066710624e55d5e
MD5 (README) = 296871fcf14f49787910c57b92655c76
MD5 (RESOURCES) = 020479f381d5f9038dcb18708997f5da
MD5 (THANKS) = f2ccf22f3e20ebaa86f8ee5cc6b0f655
MD5 (R-2/R-2.12.2.tar.gz) = bc70b51dddab8aa39066710624e55d5e

This is the relevant part of the NEWS file:

R News

CHANGES IN R VERSION 2.12.2:

SIGNIFICANT USER-VISIBLE CHANGES:

• Complex arithmetic (notably z^n for complex z and integer n) gave incorrect results since R 2.10.0 on platforms without C99 complex support. This and some lesser issues in trigonometric functions have been corrected. Such platforms were rare (we know of Cygwin and FreeBSD). However, because of new compiler optimizations in the way complex arguments are handled, the same code was selected on x86_64 Linux with gcc 4.5.x at the default -O2 optimization (but not at -O).

• There is a workaround for crashes seen with several packages on systems using zlib 1.2.5: see the INSTALLATION section.

NEW FEATURES:

• PCRE has been updated to 8.12 (two bug-fix releases since 8.10).

• rep(), seq(), seq.int() and seq_len() report more often when the first element is taken of an argument of incorrect length.

• The Cocoa back-end for the quartz() graphics device on Mac OS X provides a way to disable event loop processing temporarily (useful, e.g., for forked instances of R).

• kernel()'s default for m was not appropriate if coef was a set of coefficients. (Reported by Pierre Chausse.)

• bug.report() has been updated for the current R bug tracker, which does not accept emailed submissions.

• R CMD check now checks for the correct use of $(LAPACK_LIBS) (as well as $(BLAS_LIBS)), since several recent CRAN submissions have ignored ‘Writing R Extensions’.

INSTALLATION:

• The zlib sources in the distribution are now built with all symbols remapped: this is intended to avoid problems seen with packages such as XML and rggobi which link to zlib.so.1 on systems using zlib 1.2.5.

• The default for FFLAGS and FCFLAGS with gfortran on x86_64 Linux has been changed back to -g -O2: however, setting -g -O may still be needed for gfortran 4.3.x.

PACKAGE INSTALLATION:

• A LazyDataCompression field in the DESCRIPTION file will be used to set the value for the --data-compress option of R CMD INSTALL.

• Files R/sysdata.rda of more than 1Mb are now stored in the lazyload database using xz compression: this for example halves the installed size of package Imap.

• R CMD INSTALL now ensures that directories installed from inst have search permission for everyone. It no longer installs files inst/doc/Rplots.ps and inst/doc/Rplots.pdf. These are almost certainly left-overs from Sweave runs, and are often large.

DEPRECATED & DEFUNCT:

• The ‘experimental’ alternative specification of a name space via .Export() etc is now deprecated.

• zip.file.extract() is now deprecated.

• Zip-ing data sets in packages (and hence R CMD INSTALL --use-zip-data and the ZipData: yes field in a DESCRIPTION file) is deprecated: using efficiently compressed .rda images and lazy-loading of data has superseded it.

BUG FIXES:

• identical() could in rare cases generate a warning about non-pairlist attributes on CHARSXPs. As these are used for internal purposes, the attribute check should be skipped. (Reported by Niels Richard Hansen).

• If the filename extension (usually .Rnw) was not included in a call to Sweave(), source references would not work properly and the keep.source option failed. (PR#14459)

• format.data.frame() now keeps zero character column names.

• pretty(x) no longer raises an error when x contains solely non-finite values. (PR#14468)

• The plot.TukeyHSD() function now uses a line width of 0.5 for its
[R] Visualizing Points on a Sphere
Dear All,

I need to plot some points on the surface of a sphere, but I am not sure how to go about this in R (or whether R is suitable for this at all). In any case, I am not looking for really fancy visualizations; for instance, you can consider the images between formulae 5 and 6 at http://bit.ly/hOgK9h

Any suggestion is appreciated.

Cheers
Lorenzo
Re: [R] speed up process
You invoke Rprof, run your code and then terminate it:

Rprof()
... code you want to profile
Rprof(NULL)  # generate output
summaryRprof()

example:

> Rprof()
> for (i in 1:1e6) sin(i) + cos(i) + sqrt(i)
> Rprof(NULL)
> summaryRprof()
$by.self
       self.time self.pct total.time total.pct
"sin"       0.24    30.77       0.24     30.77
"sqrt"      0.22    28.21       0.22     28.21
"cos"       0.16    20.51       0.16     20.51
"+"         0.14    17.95       0.14     17.95
":"         0.02     2.56       0.02      2.56

$by.total
       total.time total.pct self.time self.pct
"sin"        0.24     30.77      0.24    30.77
"sqrt"       0.22     28.21      0.22    28.21
"cos"        0.16     20.51      0.16    20.51
"+"          0.14     17.95      0.14    17.95
":"          0.02      2.56      0.02     2.56

$sample.interval
[1] 0.02

$sampling.time
[1] 0.78

On Fri, Feb 25, 2011 at 6:57 AM, Ivan Calandra ivan.calan...@uni-hamburg.de wrote:

Dear Jim,

I've tried to use Rprof() as you advised me, but I don't understand how it works. I've done this:

Rprof(for (i in seq_along(seq.yvar)){ all_my_commands })
summaryRprof()

But I got this error:

Error in summaryRprof() : no lines found in ‘Rprof.out’

I couldn't really understand from the help page what I should do. In any case, it's sure that the function tstsreg() is what takes the most computing time. But I wanted to optimize the rest of the code to gain as much speed as possible.

Ivan

On 2/25/2011 12:30, Jim Holtman wrote:

use Rprof to find where time is being spent. probably in 'plot' which might imply it is not the 'for' loop and therefore beyond your control.

Sent from my iPad

On Feb 25, 2011, at 6:19, Ivan Calandra ivan.calan...@uni-hamburg.de wrote:

Thanks Nick for your quick answer. It does work (no missed bracket!) but unfortunately doesn't really speed up anything: with my real data, it takes 82.78 seconds with the double lapply() instead of 83.59s with the double loop (about 0.8 s). It looks like my double loop was not that bad. Does anyone know another faster way to do this?
Thanks again in advance, Ivan Le 2/25/2011 11:41, Nick Sabbe a écrit : Simply avoiding the for loops by using lapply (I may have missed a bracket here or there cause I did this without opening R)... Haven't checked the speed up, though. lapply(seq.yvar, function(k){ plot(mydata1[[k]]~mydata1[[ind.xvar]], type=p, xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k]) lapply(seq_along(mydata_list), function(j){ foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j, pos=mypos[j], name.dat=names(mydata_list)[j]) return(NULL) }) invisible(NULL) }) HTH, Nick Sabbe -- ping: nick.sa...@ugent.be link: http://biomath.ugent.be wink: A1.056, Coupure Links 653, 9000 Gent ring: 09/264.59.36 -- Do Not Disapprove -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ivan Calandra Sent: vrijdag 25 februari 2011 11:20 To: r-help Subject: [R] speed up process Dear users, I have a double for loop that does exactly what I want, but is quite slow. It is not so much with this simplified example, but IRL it is slow. Can anyone help me improve it? The data and code for foo_reg() are available at the end of the email; I preferred going directly into the problematic part. Here is the code (I tried to simplify it but I cannot do it too much or else it wouldn't represent my problem). It might also look too complex for what it is intended to do, but my colleagues who are also supposed to use it don't know much about R. So I wrote it so that they don't have to modify the critical parts to run the script for their needs. 
#column indexes for function ind.xvar- 2 seq.yvar- 3:4 #position vector for legend(), stupid positioning but it doesn't matter here mypos- c(topleft, topright,bottomleft) #run the function for columns 34 as y (seq.yvar) with column 2 as x (ind.xvar) for all 3 datasets (mydata_list) par(mfrow=c(2,1)) for (i in seq_along(seq.yvar)){ k- seq.yvar[i] plot(mydata1[[k]]~mydata1[[ind.xvar]], type=p, xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k]) for (j in seq_along(mydata_list)){ foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j, pos=mypos[j], name.dat=names(mydata_list)[j]) } } I tried with lapply() or mapply() but couldn't manage to pass the arguments for names() and col= correctly, e.g. for the 2nd loop: lapply(mydata_list, FUN=function(x){foo_reg(dat=x, xvar=ind.xvar, yvar=k, col1=1:3, pos=mypos[1:3], name.dat=names(x)[1:3])}) mapply(FUN=function(x) {foo_reg(dat=x, name.dat=names(x)[1:3])}, mydata_list, col1=1:3, pos=mypos, MoreArgs=list(xvar=ind.xvar, yvar=k)) Thanks in advance for any hints. Ivan #create data (it looks horrible with these datasets but it doesn't matter here) mydata1- structure(list(species = structure(1:8, .Label = c(alsen, gogor, loalb,
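For reference, the Rprof() workflow described in this thread can be sketched as follows. The function work() and the temporary profile file are made up for illustration; wrapping the loop in a function is just one way to make sure the profiler has a call to attribute time to.

```r
# Hypothetical workload standing in for the code you want to profile
work <- function(n) {
  s <- 0
  for (i in seq_len(n)) s <- s + sin(i) + cos(i) + sqrt(i)
  s
}

prof_file <- tempfile(fileext = ".out")  # keep Rprof.out out of the working directory
Rprof(prof_file)       # start profiling; the argument is a file name, not an expression
total <- work(5e6)     # the code being profiled runs between the two Rprof() calls
Rprof(NULL)            # stop profiling
prof <- summaryRprof(prof_file)
print(prof$by.self)    # time spent in each function itself
```

The point that tripped up the original attempt is visible here: Rprof() does not take the code as an argument the way system.time() does.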
Re: [R] speed up process
Ha... it was way too simple! I thought it would be like system.time()... my bad. Thanks for the tip! As we thought, foo_reg() takes most of the computing time, and I cannot improve that. Any ideas of how to improve the rest? Thanks again for your help Ivan Le 2/25/2011 14:29, jim holtman a écrit : You invoke Rprof, run your code and then terminate it: Rprof() ... code you want to profile Rprof(NULL) # generate output summaryRprof() example: Rprof() for (i in 1:1e6) sin(i) + cos(i) + sqrt(i) Rprof(NULL) summaryRprof() $by.self self.time self.pct total.time total.pct sin 0.2430.77 0.24 30.77 sqrt 0.2228.21 0.22 28.21 cos 0.1620.51 0.16 20.51 + 0.1417.95 0.14 17.95 : 0.02 2.56 0.02 2.56 $by.total total.time total.pct self.time self.pct sin0.24 30.77 0.2430.77 sqrt 0.22 28.21 0.2228.21 cos0.16 20.51 0.1620.51 + 0.14 17.95 0.1417.95 : 0.02 2.56 0.02 2.56 $sample.interval [1] 0.02 $sampling.time [1] 0.78 On Fri, Feb 25, 2011 at 6:57 AM, Ivan Calandra ivan.calan...@uni-hamburg.de wrote: Dear Jim, I've tried to use Rprof() as you advised me, but I don't understand how it works. I've done this: Rprof(for (i in seq_along(seq.yvar)){ all_my_commands }) summaryRprof() But I got this error: Error in summaryRprof() : no lines found in ‘Rprof.out’ I couldn't really understand from the help page what I should do. In any case, it's sure that the function tstsreg(), is what takes the most computing time. But I wanted to optimize the rest of the code to gain as much speed as possible. Ivan Le 2/25/2011 12:30, Jim Holtman a écrit : use Rprof to find where time is being spent. probably in 'plot' which might imply it is not the 'for' loop and therefore beyond your control. Sent from my iPad On Feb 25, 2011, at 6:19, Ivan Calandraivan.calan...@uni-hamburg.de wrote: Thanks Nick for your quick answer. It does work (no missed bracket!) 
but unfortunately doesn't really speed up anything: with my real data, it takes 82.78 seconds with the double lapply() instead of 83.59s with the double loop (about 0.8 s). It looks like my double loop was not that bad. Does anyone know another faster way to do this? Thanks again in advance, Ivan Le 2/25/2011 11:41, Nick Sabbe a écrit : Simply avoiding the for loops by using lapply (I may have missed a bracket here or there cause I did this without opening R)... Haven't checked the speed up, though. lapply(seq.yvar, function(k){ plot(mydata1[[k]]~mydata1[[ind.xvar]], type=p, xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k]) lapply(seq_along(mydata_list), function(j){ foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j, pos=mypos[j], name.dat=names(mydata_list)[j]) return(NULL) }) invisible(NULL) }) HTH, Nick Sabbe -- ping: nick.sa...@ugent.be link: http://biomath.ugent.be wink: A1.056, Coupure Links 653, 9000 Gent ring: 09/264.59.36 -- Do Not Disapprove -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ivan Calandra Sent: vrijdag 25 februari 2011 11:20 To: r-help Subject: [R] speed up process Dear users, I have a double for loop that does exactly what I want, but is quite slow. It is not so much with this simplified example, but IRL it is slow. Can anyone help me improve it? The data and code for foo_reg() are available at the end of the email; I preferred going directly into the problematic part. Here is the code (I tried to simplify it but I cannot do it too much or else it wouldn't represent my problem). It might also look too complex for what it is intended to do, but my colleagues who are also supposed to use it don't know much about R. So I wrote it so that they don't have to modify the critical parts to run the script for their needs. 
#column indexes for function ind.xvar- 2 seq.yvar- 3:4 #position vector for legend(), stupid positioning but it doesn't matter here mypos- c(topleft, topright,bottomleft) #run the function for columns 34 as y (seq.yvar) with column 2 as x (ind.xvar) for all 3 datasets (mydata_list) par(mfrow=c(2,1)) for (i in seq_along(seq.yvar)){ k- seq.yvar[i] plot(mydata1[[k]]~mydata1[[ind.xvar]], type=p, xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k]) for (j in seq_along(mydata_list)){ foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j, pos=mypos[j], name.dat=names(mydata_list)[j]) } } I tried with lapply() or mapply() but couldn't manage to pass the arguments for names() and col= correctly, e.g. for the 2nd loop: lapply(mydata_list, FUN=function(x){foo_reg(dat=x, xvar=ind.xvar, yvar=k, col1=1:3, pos=mypos[1:3], name.dat=names(x)[1:3])}) mapply(FUN=function(x) {foo_reg(dat=x, name.dat=names(x)[1:3])}, mydata_list, col1=1:3, pos=mypos,
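The inner loop passes three things that vary together (the dataset, its colour index and its legend position), which is the situation mapply() is meant for. A minimal sketch follows; foo_reg_stub() is a hypothetical stand-in that only records its arguments, because the real foo_reg() is not shown in full here, and the toy mydata_list is invented. As the timings in this thread suggest, this is a readability change rather than a speed-up.

```r
# Hypothetical stand-in for foo_reg(): returns a label instead of drawing anything
foo_reg_stub <- function(dat, xvar, yvar, mycol, pos, name.dat) {
  sprintf("%s: col=%d, pos=%s, n=%d", name.dat, mycol, pos, nrow(dat))
}

# Toy data in place of the real mydata_list
mydata_list <- list(mydata1 = data.frame(a = 1:3, b = 4:6),
                    mydata2 = data.frame(a = 1:4, b = 5:8),
                    mydata3 = data.frame(a = 1:5, b = 6:10))
mypos <- c("topleft", "topright", "bottomleft")

res <- mapply(foo_reg_stub,
              dat      = mydata_list,               # one dataset per call
              mycol    = seq_along(mydata_list),    # 1, 2, 3
              pos      = mypos,                     # one legend position per call
              name.dat = names(mydata_list),        # names of the list, not of each element
              MoreArgs = list(xvar = 1, yvar = 2),  # constant across calls
              SIMPLIFY = FALSE)
```

Note that names(x) inside the function would return the column names of one dataset; the dataset names have to be passed in alongside, as done with name.dat above.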
Re: [R] accuracy of measurements
On Feb 24, 2011, at 4:50 PM, Denis Kazakiewicz wrote: Dear R people Could you please help with following Trying to compare accuracy of tumor size evaluation by different methods. So data looks like id true metod1 method2 ... 1 2 2 2.5 2 1.52 2 3 2 2 2 etc. Could you please give a hint how to deal with that. Seems like {merror} does not suite to me because I am trying to compare accuracy of measurements with their true known values not just overall agreement of methods. Moreover sample size is ridiculously small (33 patients) so ANOVA is not much of help (or is it?) Any suggestions, hints and even guesses are highly appreciated. I am stuck a bit. Denis, I would suggest that you start here: http://www-users.york.ac.uk/~mb55/meas/meas.htm This covers various resources pertaining to the design and analysis of measurement studies, primarily based upon methods by Bland and Altman. HTH, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] accuracy of measurements
And in that vein, the recently released MethComp package by Bendix Carstensen may be of service. HTH, Dennis On Fri, Feb 25, 2011 at 5:39 AM, Marc Schwartz marc_schwa...@me.com wrote: On Feb 24, 2011, at 4:50 PM, Denis Kazakiewicz wrote: Dear R people Could you please help with following Trying to compare accuracy of tumor size evaluation by different methods. So data looks like id true metod1 method2 ... 1 2 2 2.5 2 1.52 2 3 2 2 2 etc. Could you please give a hint how to deal with that. Seems like {merror} does not suite to me because I am trying to compare accuracy of measurements with their true known values not just overall agreement of methods. Moreover sample size is ridiculously small (33 patients) so ANOVA is not much of help (or is it?) Any suggestions, hints and even guesses are highly appreciated. I am stuck a bit. Denis, I would suggest that you start here: http://www-users.york.ac.uk/~mb55/meas/meas.htm This covers various resources pertaining to the design and analysis of measurement studies, primarily based upon methods by Bland and Altman. HTH, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
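As a starting point in the Bland-Altman spirit, one can look at the differences between each method and the known true value: their mean (the bias) and the 95% limits of agreement (mean +/- 1.96 SD). A minimal sketch on made-up numbers; the column names and values are assumptions, not the poster's data.

```r
# Made-up tumour sizes: true value and two measurement methods
tumor <- data.frame(true    = c(2.0, 1.5, 2.0, 3.1, 2.4),
                    method1 = c(2.0, 2.0, 2.0, 3.0, 2.6),
                    method2 = c(2.5, 2.0, 2.0, 3.4, 2.9))

agreement <- function(measured, truth) {
  d <- measured - truth                 # signed error per patient
  c(bias  = mean(d),
    lower = mean(d) - 1.96 * sd(d),     # lower 95% limit of agreement
    upper = mean(d) + 1.96 * sd(d))     # upper 95% limit of agreement
}

res <- sapply(tumor[c("method1", "method2")], agreement, truth = tumor$true)
print(round(res, 3))
```

With a known true value the bias has a direct interpretation as systematic over- or under-measurement, which is what distinguishes this from a pure method-agreement analysis.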
Re: [R] BFGS versus L-BFGS-B
Brian Tsai btsai00 at gmail.com writes: Hi all, I'm trying to figure out the effective differences between BFGS and L-BFGS-B are, besides the obvious that L-BFGS-B should be using a lot less memory, and the user can provide box constraints. 1) Why would you ever want to use BFGS, if L-BFGS-B does the same thing but use less memory? L-BFGS-B is a bit more finicky: for example, it does not allow non-finite (infinite or NA) return values from the objective function, while BFGS does (although neither does during the initial function evaluation). I don't know offhand of other differences, although speed may differ. 2) If i'm optimizing with respect to a variable x that must be non-negative, a common approach is to do a change of variables x = exp(y), and optimize unconstrained with respect to y. Is optimization using box constraints on x, likely to produce as good a result as unconstrained optimization on y? It depends. If the optimal solution is on the boundary (i.e. x=0) then optimization on the transformed variable (I think you mean y=exp(x) above?) will work very badly. On the other hand, if the solution is in the interior then transforming sometimes works even better -- for example, the goodness-of-fit surface may be closer to quadratic (which sometimes has advantages in terms of inference) with the transformed than the untransformed parameter. Ben Bolker __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
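The two approaches to a non-negative parameter can be compared on a toy problem. This is only a sketch: the objective f() is invented, and its minimum sits in the interior (x = 2), which is the case where both routes are expected to work.

```r
# Toy objective with its minimum at x = 2, inside the region x >= 0
f <- function(x) (x - 2)^2

# (a) box constraint on x directly
fit_box <- optim(par = 1, fn = f, method = "L-BFGS-B", lower = 0)

# (b) unconstrained BFGS on y, with x = exp(y) so that x stays positive
fit_log <- optim(par = 0, fn = function(y) f(exp(y)), method = "BFGS")

print(c(box = fit_box$par, transformed = exp(fit_log$par)))
```

If the minimum were at x = 0 instead, route (b) would have to push y towards -Inf, which is the boundary problem described above.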
Re: [R] Visualizing Points on a Sphere
On 25/02/2011 8:21 AM, Lorenzo Isella wrote: Dear All, I need to plot some points on the surface of a sphere, but I am not sure about how to proceed to achieve this in R (or if it is suitable for this at all). In any case, I am not looking for really fancy visualizations; for instance you can consider the images between formulae 5 and 6 at http://bit.ly/hOgK9h Any suggestion is appreciated. Those plots show simple linear projections of the points, after culling those that are on the far side of the sphere. That's very easy for the points, slightly more work for the grid. I'm not aware of any package that implements all of it, but you could do it yourself fairly easily. If you want something more fancy you could use the rgl package for 3d plots that you can rotate. You'll still have to draw the grid, and you'll probably find it a little painful to implement the hidden surface removal: rgl uses depth checking to remove things, and because of rounding error it's not very good at drawing points and lines on surfaces. (There are new options to control depth checking; see depth_mask and depth_test in ?material3d. You can probably improve the default behaviour using those). Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Visualizing Points on a Sphere
That's interesting. You might also like: http://en.wikipedia.org/wiki/Von_Mises%E2%80%93Fisher_distribution I'm not sure how to plot the wireframe sphere, but you can visualize the points by transforming to Cartesian coordinates like so:

u <- runif(1000,0,1)
v <- runif(1000,0,1)
theta <- 2 * pi * u
phi <- acos(2 * v - 1)
x <- sin(theta) * cos(phi)
y <- sin(theta) * sin(phi)
z <- cos(theta)
library(lattice)
cloud(z ~ x + y)

-Matt

On Fri, 2011-02-25 at 14:21 +0100, Lorenzo Isella wrote: Dear All, I need to plot some points on the surface of a sphere, but I am not sure about how to proceed to achieve this in R (or if it is suitable for this at all). In any case, I am not looking for really fancy visualizations; for instance you can consider the images between formulae 5 and 6 at http://bit.ly/hOgK9h Any suggestion is appreciated. Cheers Lorenzo
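The "project and cull the far side" approach described in this thread can be sketched in base graphics. Two assumptions are made here: the view is taken from the +x direction, and the angles follow the conventional mapping in which theta = 2*pi*u is the azimuth and phi = acos(2*v - 1) is the polar angle, so that z = cos(phi). The snippet quoted above swaps the roles of the two angles, which still places the points on the unit sphere but, as far as I can tell, not uniformly.

```r
set.seed(1)
n <- 1000
theta <- 2 * pi * runif(n)        # azimuth in [0, 2*pi)
phi   <- acos(2 * runif(n) - 1)   # polar angle; makes cos(phi) uniform on [-1, 1]

x <- sin(phi) * cos(theta)
y <- sin(phi) * sin(theta)
z <- cos(phi)

# View from the +x direction: keep the near hemisphere, project onto the (y, z) plane
front <- x > 0
plot(y[front], z[front], asp = 1, pch = 20, xlab = "y", ylab = "z")

# Outline of the sphere
a <- seq(0, 2 * pi, length.out = 200)
lines(cos(a), sin(a))
```

Drawing latitude and longitude lines would follow the same pattern: compute their 3d coordinates, drop the segments with x <= 0, and draw the rest with lines().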
Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph
Here's the way I would explore this, and some of the code is made more tidy. Note that also you could vectorize your simulation. I have used set.seed multiple times to make bootstrap samples the same across runs. -Frank

. . .
if (data[i, 3] == 4) data[i, 5] <- sample(c(0, 1), 1, prob=c(.06, .94))}
d <- data.frame(tumor=factor(data[,1]), ecog=factor(data[,2]), rx=factor(data[,3]), os=data[,4], censor=data[,5])
S <- with(d, Surv(os, censor))
## Check collinearity of rx with other predictors
lrm(rx ~ tumor*ecog, data=d)
## What is the marginal strength of rx (assuming PH)?
cph(S ~ rx, data=d)
## What is partial effect of rx (assuming PH)?
anova(cph(S ~ tumor + ecog + rx, data=d))
## What is combined partial effect of tumor and ecog adjusting for rx?
anova(cph(S ~ tumor + ecog + strat(rx), data=d), tumor, ecog) ## nothing but noise
## What is their effect not adjusting for rx
cph(S ~ tumor + ecog, data=d) ## huge
f <- cph(S ~ tumor + ecog, x=TRUE, y=TRUE, surv=TRUE, data=d)
set.seed(1)
validate(f, B=100, dxy=TRUE)
w <- rep(1, 1000) # only one stratum, doesn't change model
f <- cph(S ~ tumor + ecog + strat(w), x=TRUE, y=TRUE, surv=TRUE, data=d)
set.seed(1)
validate(f, B=100, dxy=TRUE, u=60) ## identical to last validate except for -Dxy
f <- cph(S ~ tumor + ecog + strat(rx), x=TRUE, y=TRUE, surv=TRUE, time.inc=60, data=d)
set.seed(1)
validate(f, B=100) ## no predictive ability
set.seed(1)
validate(f, B=100, dxy=TRUE, u=60)
## Only Dxy indicates some predictive information; larger in abs. value
## than model ignoring rx (0.3842 vs. 0.3177)

- Frank Harrell Department of Biostatistics, Vanderbilt University
Re: [R] lm - log(variable) - skip log(0)
Apologies, I'm really new to R. Can you help me with the syntax? Here is the data.frame in which I put the independent variables:

varind <- data.frame(datpos$hdom2,datpos$NumPies,datpos$InHart,datpos$CV,datpos$CA,datpos$FCC)

varind has dimensions (194, 6), in case that's necessary. Then I type:

loglmp4 <- lm(log(datpos$IncAltuDom)~log(varind), subset=varind>0)
Error in model.frame.default(formula = log(datpos$IncAltuDom) ~ log(varind), : invalid type (list) for variable 'log(varind)'

Thanks again,
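The error arises because the formula is handed a whole data frame (a list) inside log(). One possible way around it, sketched on invented data, is to name each predictor in the formula and to build a single logical vector marking the rows where every predictor is positive. The data frame dat and its column names are assumptions for illustration only.

```r
set.seed(42)
# Toy stand-in for datpos; column names are made up
dat <- data.frame(y  = rlnorm(30),
                  x1 = c(0, 0, runif(28, 1, 10)),   # two zeros that log() cannot handle
                  x2 = runif(30, 1, 5))

preds <- c("x1", "x2")
keep  <- rowSums(dat[preds] <= 0) == 0              # TRUE where every predictor is > 0

# Build log(y) ~ log(x1) + log(x2): each column is named, not the data frame
form <- as.formula(paste("log(y) ~",
                         paste(sprintf("log(%s)", preds), collapse = " + ")))
fit  <- lm(form, data = dat, subset = keep)
print(coef(fit))
```

Here the two rows with x1 = 0 are dropped, so the model is fitted on 28 of the 30 observations.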
[R] nls
Hi, I would like to find the x value (independent variable) that gives a certain value of the dependent variable, using a model fitted with nls. With predict() I can find the y that corresponds to a list of x values; I need the other way around. Can it be done? Thanks, afadda
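One common way to invert a fitted curve numerically is to search for the x at which the prediction equals the target, for instance with uniroot(). A minimal sketch under stated assumptions: the exponential model, the simulated data and the target value are all invented, and the fitted curve is monotone over the search interval so that there is a single root.

```r
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 2 * exp(0.3 * x) + rnorm(50, sd = 0.2)       # made-up data
fit <- nls(y ~ a * exp(b * x), start = list(a = 1.5, b = 0.25))

target <- 10                                       # y value whose x we want
# Difference between the fitted curve and the target; its root is the answer
g <- function(x0) predict(fit, newdata = data.frame(x = x0)) - target
x_hat <- uniroot(g, interval = range(x), tol = 1e-8)$root
print(x_hat)
```

uniroot() needs the function to change sign over the interval, so the target has to lie within the range of fitted values on that interval.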
Re: [R] Group rows by common ID and plot?
Date: Thu, 24 Feb 2011 13:28:18 -0800 From: dannyb...@gmail.com To: r-help@r-project.org Subject: Re: [R] Group rows by common ID and plot? does this do what you want? library(lattice) df-data.frame(x=1:100,y=1.0/(1:100),f=floor((1:100)/10)) str(df) 'data.frame': 100 obs. of 3 variables: $ x: int 1 2 3 4 5 6 7 8 9 10 ... $ y: num 1 0.5 0.333 0.25 0.2 ... $ f: num 0 0 0 0 0 0 0 0 0 1 ... xyplot(y~x|f,data=df) In terms of a reproducible example: ProbeSet.ID.F ProbeSet.ID Feature.ID Gene.Symbol X0030V120810.4 X0143V120110.4 X0258V111710.4 X0283V111710.4 X0430V120710.4 X0472V111610.4 X0520V111610.4 X0546V113010.4 X0578V111810.4 X0624V111810.4 7896741_479302 7896741 479302 OR4F17 20 14 5 4 43 85 12 14 7 5 7896741_226901 7896741 226901 OR4F17 15 73 31 14 32 28 10 42 11 28 7896741_2337 7896741 2337 OR4F17 168 126 111 120 119 84 149 76 347 88 7896741_289201 7896742 289201 OR4F18 54 64 11 6 59 66 10 50 51 27 7896741_240730 7896742 240730 OR4F18 38 158 95 38 59 131 114 100 102 40 7896741_776611 7896743 776611 OR4F19 6 27 7 7 16 105 35 17 19 23 ...becomes three panels of a plot, containing the lines: Plot 1: 7896741_479302 7896741 479302 OR4F17 20 14 5 4 43 85 12 14 7 5 7896741_226901 7896741 226901 OR4F17 15 73 31 14 32 28 10 42 11 28 7896741_2337 7896741 2337 OR4F17 168 126 111 120 119 84 149 76 347 88 Plot2: 7896741_289201 7896742 289201 OR4F18 54 64 11 6 59 66 10 50 51 27 7896741_240730 7896742 240730 OR4F18 38 158 95 38 59 131 114 100 102 40 Plot 3: 7896741_776611 7896743 776611 OR4F19 6 27 7 7 16 105 35 17 19 23 and so on... Any ideas much appreciated. -- View this message in context: http://r.789695.n4.nabble.com/Group-rows-by-common-ID-and-plot-tp3321955p3323465.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] group by in data.frame
Hi Ivan, thanks for your reply! But the problem is that the data frame has 2 rows and ca. 2000 groups, and I don't have a column with the group names, because the groups depend on 2 other columns ... Any other idea, or did I not understand what you posted ... :(
Re: [R] Group rows by common ID and plot?
I imagine you want the ggplot2 package. something like: ggplot(dataframe, aes(x = yourxvar, y = youryvar)) + geom_point() + facet_wrap(~ ProbeSet.ID) Or facet_grid(), either of which makes a different panel for each unique level of ProbeSet.ID see gggplot help here: http://had.co.nz/ggplot2/ On Thursday, February 24, 2011 at 3:28 PM, DB1984 wrote: In terms of a reproducible example: ProbeSet.ID.F ProbeSet.ID Feature.ID Gene.Symbol X0030V120810.4 X0143V120110.4 X0258V111710.4 X0283V111710.4 X0430V120710.4 X0472V111610.4 X0520V111610.4 X0546V113010.4 X0578V111810.4 X0624V111810.4 7896741_479302 7896741 479302 OR4F17 20 14 5 4 43 85 12 14 7 5 7896741_226901 7896741 226901 OR4F17 15 73 31 14 32 28 10 42 11 28 7896741_2337 7896741 2337 OR4F17 168 126 111 120 119 84 149 76 347 88 7896741_289201 7896742 289201 OR4F18 54 64 11 6 59 66 10 50 51 27 7896741_240730 7896742 240730 OR4F18 38 158 95 38 59 131 114 100 102 40 7896741_776611 7896743 776611 OR4F19 6 27 7 7 16 105 35 17 19 23 ...becomes three panels of a plot, containing the lines: Plot 1: 7896741_479302 7896741 479302 OR4F17 20 14 5 4 43 85 12 14 7 5 7896741_226901 7896741 226901 OR4F17 15 73 31 14 32 28 10 42 11 28 7896741_2337 7896741 2337 OR4F17 168 126 111 120 119 84 149 76 347 88 Plot2: 7896741_289201 7896742 289201 OR4F18 54 64 11 6 59 66 10 50 51 27 7896741_240730 7896742 240730 OR4F18 38 158 95 38 59 131 114 100 102 40 Plot 3: 7896741_776611 7896743 776611 OR4F19 6 27 7 7 16 105 35 17 19 23 and so on... Any ideas much appreciated. -- View this message in context: http://r.789695.n4.nabble.com/Group-rows-by-common-ID-and-plot-tp3321955p3323465.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
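Either plotting package needs the data in long format first (one row per probe and sample). A sketch with lattice, using the six rows posted in this thread; the ten sample columns are simply numbered 1 to 10 here, which is an assumption made to keep the example short.

```r
library(lattice)

# The six rows of the posted table: 6 probes x 10 samples
vals <- rbind(c( 20,  14,   5,   4,  43,  85,  12,  14,   7,  5),
              c( 15,  73,  31,  14,  32,  28,  10,  42,  11, 28),
              c(168, 126, 111, 120, 119,  84, 149,  76, 347, 88),
              c( 54,  64,  11,   6,  59,  66,  10,  50,  51, 27),
              c( 38, 158,  95,  38,  59, 131, 114, 100, 102, 40),
              c(  6,  27,   7,   7,  16, 105,  35,  17,  19, 23))
probe_set <- c(7896741, 7896741, 7896741, 7896742, 7896742, 7896743)
feature   <- c(479302, 226901, 2337, 289201, 240730, 776611)

# Long format: one row per (probe, sample) pair; as.vector() runs down the columns
long <- data.frame(
  ProbeSet.ID = factor(rep(probe_set, times = ncol(vals))),
  Feature.ID  = factor(rep(feature,   times = ncol(vals))),
  sample      = rep(seq_len(ncol(vals)), each = nrow(vals)),
  value       = as.vector(vals)
)

# One panel per ProbeSet.ID, one line per feature within the panel
p <- xyplot(value ~ sample | ProbeSet.ID, groups = Feature.ID,
            data = long, type = "l", layout = c(3, 1))
print(p)
```

This gives three panels with three, two and one lines respectively, matching the grouping described in the question.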
Re: [R] group by in data.frame
Thanks, I solved it ... my problem was that I had 2 columns to group by; I just pasted the values together so that in the end I have one column to group by, and then it was easy ... here is the script that I used: http://tolstoy.newcastle.edu.au/R/help/06/07/30184.html Ivan, thanks for the help too :)
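The paste-the-keys approach can be sketched as follows, together with an alternative that groups by both columns directly. The data frame and its column names are invented for illustration.

```r
df <- data.frame(site = c("A", "A", "B", "B", "B"),
                 year = c(2001, 2001, 2001, 2002, 2002),
                 val  = c(1, 3, 5, 7, 9))

# One grouping key built from two columns
df$grp <- paste(df$site, df$year, sep = "_")
by_key <- tapply(df$val, df$grp, mean)

# Same grouping without building the key by hand
by_two <- aggregate(val ~ site + year, data = df, FUN = mean)

print(by_key)
print(by_two)
```

Using a separator in paste() matters: without it, keys such as "1" + "12" and "11" + "2" would collapse into the same group.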
[R] Error
Hi, I am running the following script for a different (much larger) data frame:

DF = data.frame(read.table(textConnection("  A B C D E
1 1 a 1999 1 0
2 1 b 1999 0 1
3 1 c 1999 0 1
4 1 d 1999 1 0
5 2 c 2001 1 0
6 2 d 2001 0 1
7 3 a 2004 0 1
8 3 b 2004 0 1
9 3 d 2004 0 1
10 4 b 2001 1 0
11 4 c 2001 1 0
12 4 d 2001 0 1"),head=TRUE,stringsAsFactors=FALSE))
DF <- DF[order(DF$B,DF$C),] # order by developer_id and year
f <- function(x) { unlist(lapply(x, FUN = function(z) cumsum(z) - z)) }
DF <- cbind(DF[,c(1:3)],ave(DF[, c(4:5)],DF$B, FUN = f))

I get the following error:

Error in `[<-.data.frame`(`*tmp*`, i, , value = integer(0)) : replacement has 0 items, need 37597770
In addition: Warning message: In max(i) : no non-missing arguments to max; returning -Inf

The dimensions of the data frame are (5,108), so the last line of the script becomes:

DF <- cbind(DF[,c(1:3)],ave(DF[, c(4:108)],DF$B, FUN = f))

Any idea how to solve this problem? Thanks!
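Without the large data frame the cause of the error cannot be pinned down here. One variant that may be worth trying, offered only as a sketch, applies ave() to one column at a time so that ave() only ever sees plain vectors. On the small example it reproduces the D and E columns reported in the replies to this thread; whether it avoids the error on the real data is untested.

```r
DF <- read.table(text = "  A B C D E
1 1 a 1999 1 0
2 1 b 1999 0 1
3 1 c 1999 0 1
4 1 d 1999 1 0
5 2 c 2001 1 0
6 2 d 2001 0 1
7 3 a 2004 0 1
8 3 b 2004 0 1
9 3 d 2004 0 1
10 4 b 2001 1 0
11 4 c 2001 1 0
12 4 d 2001 0 1", header = TRUE, stringsAsFactors = FALSE)
DF <- DF[order(DF$B, DF$C), ]            # order by group, then year

# Running total within a group, excluding the current row
prior_sum <- function(z) cumsum(z) - z

cols <- 4:5                              # would be 4:108 on the large data frame
DF[cols] <- lapply(DF[cols], function(col) ave(col, DF$B, FUN = prior_sum))
print(DF)
```

It would also be worth checking the grouping column of the large data frame for NA values or unused factor levels before calling ave(); that is a guess at a possible cause, not a diagnosis.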
[R] kohonen: Argument data should be numeric
Hi, I'm trying to use the kohonen package to build SOMs. However, trying this on my data I get the error:

Argument data should be numeric

when running the som(data.train, grid = somgrid(6, 6, "hexagonal")) function. As you see, there is a problem with the data type of data.train, which is a list. When I try to convert it to numeric I get the error:

(list) object cannot be coerced to type 'double'

What should I do? I can convert data.train if I take only one column of the list, data.train[[1]], but that is naturally not what I want. How did I end up with this data format? What I did:

data1 <- read.csv("data1.txt", sep = ";")
training <- sample(nrow(data1), 1000)
data.train <- data1[training,2:20]

I tried to use scan as the import method (read about this somewhere) and unlist, but I'm not really sure how I should get it to numeric/working. Thanks, Jay
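read.csv() returns a data frame, which R stores as a list of columns; that is why the numeric check fails and why as.numeric() refuses to coerce it. A likely fix, sketched on invented data, is to convert the selected columns to a numeric matrix with as.matrix(). The som() call is left as a comment because it assumes the kohonen package is installed.

```r
# Toy stand-in for data1; read.csv() would give the same kind of object
data1 <- data.frame(id = 1:50,
                    v1 = rnorm(50),
                    v2 = runif(50),
                    v3 = rpois(50, 3))
training   <- sample(nrow(data1), 20)
data.train <- data1[training, 2:4]

# Non-numeric columns (factors, character) would need recoding or dropping first
stopifnot(all(sapply(data.train, is.numeric)))
train.mat <- as.matrix(data.train)       # numeric matrix, one row per observation

# Assuming kohonen is installed, the matrix can then be passed on, e.g.:
# fit <- kohonen::som(train.mat, grid = kohonen::somgrid(6, 6, "hexagonal"))
```

If any column is not numeric, as.matrix() silently produces a character matrix, hence the explicit check before the conversion.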
Re: [R] Error
Sorry for being unclear: the example works fine on my machine too. However, with the much larger dataset (dim(5,108)) I get the reported error. Mathijs On Fri, Feb 25, 2011 at 3:56 PM, Scott Chamberlain scttchamberla...@gmail.com wrote: Works fine on my machine: DF A BC D E 1 1 a 1999 0 0 2 1 b 1999 0 0 3 1 c 1999 0 0 4 1 d 1999 0 0 5 2 c 2001 0 1 6 2 d 2001 1 0 7 3 a 2004 1 0 8 3 b 2004 0 1 9 3 d 2004 1 1 10 4 b 2001 0 2 11 4 c 2001 1 1 12 4 d 2001 1 2 here's my session info: sessionInfo() R version 2.12.1 (2010-12-16) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] phyloch_1.4.48 XML_3.2-0colorspace_1.0-1 phangorn_1.3-1 ape_2.6-2 [6] quadprog_1.5-3 plyr_1.4 loaded via a namespace (and not attached): [1] gee_4.13-16 grid_2.12.1 lattice_0.19-17 nlme_3.1-97 tools_2.12.1 On Friday, February 25, 2011 at 8:31 AM, mathijsdevaan wrote: Hi, I am running the following script for a different (much larger data frame): DF = data.frame(read.table(textConnection( A B C D E 1 1 a 1999 1 0 2 1 b 1999 0 1 3 1 c 1999 0 1 4 1 d 1999 1 0 5 2 c 2001 1 0 6 2 d 2001 0 1 7 3 a 2004 0 1 8 3 b 2004 0 1 9 3 d 2004 0 1 10 4 b 2001 1 0 11 4 c 2001 1 0 12 4 d 2001 0 1),head=TRUE,stringsAsFactors=FALSE)) DF-DF[order(DF$B,DF$C),]#order by developer_id and year f- function(x) { unlist(lapply(x, FUN = function(z) cumsum(z) - z)) } DF-cbind(DF[,c(1:3)],ave(DF[, c(4:5)],DF$B, FUN = f)) I get the following error: Error in `[-.data.frame`(`*tmp*`, i, , value = integer(0)) : replacement has 0 items, need 37597770 In addition: Warning message: In max(i) : no non-missing arguments to max; returning -Inf The dimensions of the data frame are (5,108), so the last line of the script becomes: DF-cbind(DF[,c(1:3)],ave(DF[, c(4:108)],DF$B, FUN = f)) Any idea how to solve this problem? Thanks! 
Re: [R] Error
Hi, I don't get any error... DF - cbind(DF[,c(1:3)],ave(DF[, c(4:5)],DF$B, FUN = f)) DF A BC D E 1 1 a 1999 0 0 7 3 a 2004 1 0 2 1 b 1999 0 0 10 4 b 2001 0 1 8 3 b 2004 1 1 3 1 c 1999 0 0 5 2 c 2001 0 1 11 4 c 2001 1 1 4 1 d 1999 0 0 6 2 d 2001 1 0 12 4 d 2001 1 1 9 3 d 2004 1 2 Ivan Le 2/25/2011 15:31, mathijsdevaan a écrit : Hi, I am running the following script for a different (much larger data frame): DF = data.frame(read.table(textConnection(A B C D E 1 1 a 1999 1 0 2 1 b 1999 0 1 3 1 c 1999 0 1 4 1 d 1999 1 0 5 2 c 2001 1 0 6 2 d 2001 0 1 7 3 a 2004 0 1 8 3 b 2004 0 1 9 3 d 2004 0 1 10 4 b 2001 1 0 11 4 c 2001 1 0 12 4 d 2001 0 1),head=TRUE,stringsAsFactors=FALSE)) DF-DF[order(DF$B,DF$C),]#order by developer_id and year f- function(x) { unlist(lapply(x, FUN = function(z) cumsum(z) - z)) } DF-cbind(DF[,c(1:3)],ave(DF[, c(4:5)],DF$B, FUN = f)) I get the following error: Error in `[-.data.frame`(`*tmp*`, i, , value = integer(0)) : replacement has 0 items, need 37597770 In addition: Warning message: In max(i) : no non-missing arguments to max; returning -Inf The dimensions of the data frame are (5,108), so the last line of the script becomes: DF-cbind(DF[,c(1:3)],ave(DF[, c(4:108)],DF$B, FUN = f)) Any idea how to solve this problem? Thanks! -- Ivan CALANDRA PhD Student University of Hamburg Biozentrum Grindel und Zoologisches Museum Abt. Säugetiere Martin-Luther-King-Platz 3 D-20146 Hamburg, GERMANY +49(0)40 42838 6231 ivan.calan...@uni-hamburg.de ** http://www.for771.uni-bonn.de http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fitting distribution in range
Hello. I am trying to fit my data sample x with different distributions such that the integral from min(x) to max(x) of the fitted distribution will be one. Therefore I have written my own log-likelihood functions and then I am using mle {stats4}. So, for example:

ll_gamma <- function(a,b) {
  integrand <- function(y){dgamma(y, shape=a, rate=b)}
  integ_res <- tryCatch({integrate(integrand,min_x,max_x)$value}, error=function(err){0});
  if (integ_res == 0) { return(NA) }
  C = 1 / integ_res
  res = -(sum(log(C*dgamma(x,shape=a,rate=b))))
  return(res)
}
m <- mean(x)
v <- var(x)
fit <- mle(minuslog=ll_gamma,start=list(a=m^2/v,b=m/v))

Now, for some reason I get very weird results. I have tested it by sampling random numbers from a gamma distribution, for example, and then trying to fit it with the algorithm I wrote. Am I doing something wrong? Do I need to first fit the sample with a regular gamma distribution and then calculate the normalisation factor C? (I think not, i.e. I think that the normalisation factor should be included in the log-likelihood function.) Please note that I don't know what the best fit for my data is, so I am trying to fit it with several distributions and choose the best using AIC. Any comments will be much appreciated. Please let me know if any of you have ever run into a similar problem. Thank you in advance, Saray
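For the gamma case the normalising constant does not need integrate(): it is the difference of two pgamma() values. A minimal sketch of a truncated-gamma fit follows, under several assumptions that are not in the original post: the truncation bounds lo and hi are treated as known, the data are simulated, the parameters are optimised on the log scale to keep them positive, and optim() is used in place of mle().

```r
set.seed(123)
lo <- 1; hi <- 6                        # assumed known truncation bounds
draws <- rgamma(5000, shape = 3, rate = 1)
x <- draws[draws >= lo & draws <= hi]   # sample from a gamma truncated to [lo, hi]

# Negative log-likelihood of the truncated gamma; parameters on the log scale
nll <- function(p) {
  a <- exp(p[1]); b <- exp(p[2])
  mass <- pgamma(hi, shape = a, rate = b) - pgamma(lo, shape = a, rate = b)
  if (!is.finite(mass) || mass <= 0) return(1e10)   # penalise degenerate parameters
  val <- -sum(dgamma(x, shape = a, rate = b, log = TRUE)) + length(x) * log(mass)
  if (is.finite(val)) val else 1e10
}

m <- mean(x); v <- var(x)
fit <- optim(log(c(m^2 / v, m / v)), nll)   # moment-based start, Nelder-Mead
est <- exp(fit$par)                         # back to (shape, rate)
print(est)
```

Working with dgamma(..., log = TRUE) avoids taking the log of densities that underflow to zero, which may be one source of erratic results. Using the sample minimum and maximum themselves as the bounds is a separate modelling choice that this sketch does not address.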
Re: [R] Error
Works fine on my machine:

DF
   A B    C D E
1  1 a 1999 0 0
2  1 b 1999 0 0
3  1 c 1999 0 0
4  1 d 1999 0 0
5  2 c 2001 0 1
6  2 d 2001 1 0
7  3 a 2004 1 0
8  3 b 2004 0 1
9  3 d 2004 1 1
10 4 b 2001 0 2
11 4 c 2001 1 1
12 4 d 2001 1 2

Here's my session info:

sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] phyloch_1.4.48 XML_3.2-0 colorspace_1.0-1 phangorn_1.3-1 ape_2.6-2
[6] quadprog_1.5-3 plyr_1.4

loaded via a namespace (and not attached):
[1] gee_4.13-16 grid_2.12.1 lattice_0.19-17 nlme_3.1-97 tools_2.12.1
[R] linear model lme4
Hi,

I wanted to check the difference in results (using lme4) if I treated a particular variable (beadchip) as a random effect vs. as a fixed effect.

For the first case, my formula is:

lmer.result <- lmer(expression ~ cancerClass + (1|beadchip))

For the second case, I want to do:

lmer.result2 <- lmer(expression ~ cancerClass + beadchip)

However, I get an error in the second case:

Error in lmerFactorList(formula, fr, 0L, 0L): No random effects terms specified in formula

Is there any way that I can get lmer() to accept a formula without a random effect?

Many thanks
Re: [R] group by in data.frame
Yeah, you are right. I wanted to post a short example of what I want to do, and in the meantime I solved the problem. But here it is: I have something like this dataframe:

c1 <- c(1,2,3,2,2,3,1,2,2,2)
c2 <- c(5,6,7,7,5,7,5,7,6,6)
c3 <- rnorm(10)
x <- cbind(c1,c2,c3)
x
      c1 c2          c3
 [1,]  1  5  0.08279036
 [2,]  2  6  0.59135988
 [3,]  3  7  1.45520468
 [4,]  2  7 -1.70094640
 [5,]  2  5  0.13065228
 [6,]  3  7 -1.12080980
 [7,]  1  5  0.42779354
 [8,]  2  7 -1.53111972
 [9,]  2  6  0.29299987
[10,]  2  6 -0.01602095

# with aggregate I receive this:
aggregate(x[,3], list(x[,1],x[,2]), mean)
  Group.1 Group.2          x
1       1       5  0.2552920
2       2       5  0.1306523
3       2       6  0.2894463
4       2       7 -1.6160331
5       3       7  0.1671974

The problem was that I was grouping by 2 columns, so I couldn't copy the result back to x. The solution was to make another column with paste(x[,1],x[,2],sep="_") and then use the solution from this link: http://tolstoy.newcastle.edu.au/R/help/06/07/30184.html

So I solved my problem. Ivan, many thanks for your support and quick responses! :)

--
View this message in context: http://r.789695.n4.nabble.com/group-by-in-data-frame-tp3324240p3324608.html
Re: [R] aggregate with cumsum
Bill,

What will be the fastest way to output not just single lines but small data frames of about 60 rows? I prefer writing to a text file because the final output is large (47k times 60 rows), and since I do not know its size in advance I have to use rbind to build the object, which creates the memory problems described here: http://www.matthewckeller.com/html/memory.html (look at the "swiss cheese" paragraph).

Kind regards,
Stephen

--
View this message in context: http://r.789695.n4.nabble.com/aggregate-with-cumsum-tp2992383p3324610.html
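One possible approach to the question above, sketched with base R only: write each small data frame to the file as soon as it is produced, using write.table() with append = TRUE, so the full result never has to be grown with rbind() in memory. The chunk generator make_chunk() and the column names are hypothetical stand-ins for whatever produces the 60-row pieces.

```r
out_file <- tempfile(fileext = ".txt")

# Hypothetical stand-in for the code that produces one 60-row result.
make_chunk <- function(i) {
  data.frame(id = i, row = 1:60, value = rnorm(60))
}

for (i in 1:5) {
  chunk <- make_chunk(i)
  first <- (i == 1)
  # Write the header only once, then append the remaining chunks.
  write.table(chunk, file = out_file, append = !first, col.names = first,
              row.names = FALSE, sep = "\t", quote = FALSE)
}

result <- read.delim(out_file)
dim(result)
```

If the pieces do fit in memory, collecting them in a list and calling do.call(rbind, pieces) once at the end avoids the repeated copying that growing an object inside the loop causes.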
[R] using scatter plot for different values
Hi everyone,

I have two different results, each determined by (x,y):

x1 <- c(1,5,8)
y1 <- c(8,9,10)
x2 <- c(1,7,9)
y2 <- c(5,7,9)

Let's call One = (x1,y1) and Two = (x2,y2). How can I draw them in R in a scatter plot showing (x,y) with two different legend entries (One, Two)?

Regards,
Amir
[R] neural networks with RSNNS
Hello All!

I am trying to train a NN with the function train() after splitting the data with the function splitForTrainingAndTest(). The split is OK (I checked it), but when I try the training I get this message:

Error in UseMethod("train") : no applicable method for 'train' applied to an object of class "c('double', 'numeric')"

The input data are logarithms of some financial values and their first lags. Can anybody give me a hint on how to make the train() function work correctly?

Thank you and have a good day!
Sara
Re: [R] lm - log(variable) - skip log(0)
On Feb 25, 2011, at 7:25 AM, agent dunham wrote:

> Apologies, I'm really new with R. Can you help me with the syntax?
> Here is my data.frame in which I introduce the independent variables:
>
> varind <- data.frame(datpos$hdom2, datpos$NumPies, datpos$InHart, datpos$CV, datpos$CA, datpos$FCC)
>
> varind has dimensions (194, 6), in case that's necessary. Then I type:
>
> loglmp4 <- lm(log(datpos$IncAltuDom) ~ log(varind), subset = varind > 0)

Because varind is now a dataframe, you need to refer to its columns when offering candidate independent variables to lm. It is not clear which column from varind you wanted to test for positivity and which to use on the RHS.

You should also get in the habit of:
--- including context in followup questions
--- using the data= argument in model construction

Going back to your original question, where the dataframe was named dat and it was clear what variable you wanted on the RHS:

lmod1.lm <- lm( log(inaltu) ~ log(indiam), data = dat, subset = (indiam > 0 & inaltu > 0) )

--
David.

> Error in model.frame.default(formula = log(datpos$IncAltuDom) ~ log(varind), :
>   invalid type (list) for variable 'log(varind)'
>
> Thanks again, u...@host.com
>
> View this message in context: http://r.789695.n4.nabble.com/lm-log-variable-skip-log-0-tp3324263p3324344.html

David Winsemius, MD
West Hartford, CT
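To make the advice in this reply concrete, here is a small self-contained sketch with simulated data. The column names indiam and inaltu are taken from the reply; the simulated values and the single zero row are assumptions added so that the effect of subset= is visible.

```r
set.seed(42)

# Simulated data: the first row holds zeros, which would give log(0) = -Inf.
dat <- data.frame(indiam = c(0, runif(49, min = 1, max = 30)))
dat$inaltu <- c(0, exp(0.5 + 0.8 * log(dat$indiam[-1]) + rnorm(49, sd = 0.1)))

# data= names the data frame once; subset= drops the non-positive rows
# before the model is fitted.
fit <- lm(log(inaltu) ~ log(indiam), data = dat,
          subset = (indiam > 0 & inaltu > 0))

n_used <- nobs(fit)   # number of rows that actually entered the fit
coef(fit)
```

With several predictors, each one would appear by name in the formula (for example log(inaltu) ~ log(indiam) + log(other), where other is a hypothetical column), and each would need its own positivity condition in subset=.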
Re: [R] group by in data.frame
OK, now I think I've understood, but I'm not sure, since I think that my ave() solution does work. I had thought you had several numerical variables and 1 factor; it is the opposite, but it is still possible:

c3_mean <- ave(x[,3], list(x[,1],x[,2]), FUN=mean)  # note that values are different because of rnorm()
cbind(x[,1:2], c3_mean)

Is this what you want?

Ivan

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
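As a runnable illustration of the ave() suggestion in this reply, the sketch below uses the c3 values printed earlier in the thread (instead of a fresh rnorm() draw) so that the group means can be checked against the aggregate() output quoted there. Storing the data in a data frame rather than a matrix is an assumption made for convenience.

```r
c1 <- c(1, 2, 3, 2, 2, 3, 1, 2, 2, 2)
c2 <- c(5, 6, 7, 7, 5, 7, 5, 7, 6, 6)
c3 <- c(0.08279036, 0.59135988, 1.45520468, -1.70094640, 0.13065228,
        -1.12080980, 0.42779354, -1.53111972, 0.29299987, -0.01602095)
x <- data.frame(c1, c2, c3)

# ave() returns one value per row: the mean of c3 within each (c1, c2) group,
# so the result can be attached directly as a new column.
x$c3_mean <- ave(x$c3, x$c1, x$c2, FUN = mean)
x
```

Rows 1 and 7 share the group (1, 5), so both receive the same mean, which matches the 0.2552920 shown for that group in the aggregate() result.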
Re: [R] group by in data.frame
On Feb 25, 2011, at 10:14 AM, zem wrote:

> and the problem was that i was grouping by 2 columns, so i couldn't copy the result to x.
> the solution was i made another column with paste(x[,1],x[,2],sep="_") and then i used the solution from this link:
> http://tolstoy.newcastle.edu.au/R/help/06/07/30184.html
> so i solved my problem

Right. That works and has the virtue that it is reasonably clear what is going on. Another approach, possibly even more clear and even more R-ish, is to use the interaction() function.

aggregate(x[,3], list(interaction(x[,1],x[,2]) ), mean)
  Group.1            x
1     1.5 -0.658932424
2     2.5  0.824756795
3     2.6  0.640471421
4     2.7 -0.008519716
5     3.7 -0.053233855
David Winsemius, MD
West Hartford, CT
Re: [R] using scatter plot for different values
On Feb 25, 2011, at 10:26 AM, amir wrote:

> how can I draw them in R in a scatter plot showing (x,y) with two different legends (One, Two)

?points

David Winsemius, MD
West Hartford, CT
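The pointer to ?points can be expanded into a short base-graphics sketch. The data are the poster's; the colours, plotting symbols and legend position are arbitrary choices made for this example.

```r
x1 <- c(1, 5, 8);  y1 <- c(8, 9, 10)   # series "One"
x2 <- c(1, 7, 9);  y2 <- c(5, 7, 9)    # series "Two"

# Axis limits must cover both series, because plot() only sees the first.
xr <- range(x1, x2)
yr <- range(y1, y2)

plot(x1, y1, xlim = xr, ylim = yr, pch = 16, col = "blue",
     xlab = "x", ylab = "y")
points(x2, y2, pch = 17, col = "red")   # add the second series
legend("topleft", legend = c("One", "Two"),
       pch = c(16, 17), col = c("blue", "red"))
```

Setting xlim and ylim from both series up front is the step that is easiest to forget; without it, points from the second series that fall outside the first series' range are silently clipped.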
[R] BFGS versus L-BFGS-B
There are considerable differences between the algorithms. And BFGS is an unfortunate nomenclature, since there are so many variants that are VERY different. It was called "variable metric" in my book, from which the code was derived, and that code was from Roger Fletcher's Fortran VM code based on his 1970 paper. L-BFGS-B is a later and more complicated algorithm with some pretty nice properties. The code is much larger.

Re: "less memory" -- this will depend on the number of parameters, but to my knowledge there are no good benchmark studies of memory and performance. Perhaps someone wants to propose one for Google Summer of Code (see http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2011 ).

The optimx package can call Rvmmin, which has box constraints (also Rcgmin, which is intended for very low memory), as well as several other methods with box constraints, including L-BFGS-B. Worth a try if you are seeking a method for multiple production runs. Unfortunately, we seem to have some CRAN check errors on Solaris and some old releases -- platforms I do not have -- so it may be a few days or more until we sort out the issues, which seem to be related to alignment of the underlying packages for which optimx is a wrapper.

Use of transformation can be very effective. But again, I don't think there are good studies on whether use of box constraints or transformations is better, and when. Another project, which I have made some tentative beginnings to carry out. Collaborations welcome.

Best,
JN

On Fri, 25 Feb 2011 00:11:59 -0500, Brian Tsai btsa...@gmail.com wrote:
> Subject: [R] BFGS versus L-BFGS-B
>
> Hi all,
>
> I'm trying to figure out what the effective differences between BFGS and L-BFGS-B are, besides the obvious that L-BFGS-B should be using a lot less memory, and that the user can provide box constraints.
>
> 1) Why would you ever want to use BFGS, if L-BFGS-B does the same thing but uses less memory?
>
> 2) If I'm optimizing with respect to a variable x that must be non-negative, a common approach is to do a change of variables x = exp(y), and optimize unconstrained with respect to y. Is optimization using box constraints on x likely to produce as good a result as unconstrained optimization on y?
>
> - Brian.
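The second question in the quoted message (box constraint on x versus the change of variables x = exp(y)) can be tried out directly with optim() from base R. The two objective functions below are toy examples invented for this sketch; they say nothing about which approach is better in general, which is exactly the open question raised in the reply.

```r
# Case 1: the minimum (x = 2) lies inside the feasible region x >= 0.
f_inside <- function(x) (x - 2)^2 + 1

box   <- optim(par = 1, fn = f_inside, method = "L-BFGS-B", lower = 0)
trans <- optim(par = 0, fn = function(y) f_inside(exp(y)), method = "BFGS")
c(box = box$par, transformed = exp(trans$par))   # both should be close to 2

# Case 2: the unconstrained minimum (x = -1) is infeasible, so the
# constrained solution sits on the boundary x = 0.
f_boundary <- function(x) (x + 1)^2

box2   <- optim(par = 1, fn = f_boundary, method = "L-BFGS-B", lower = 0)
trans2 <- optim(par = 0, fn = function(y) f_boundary(exp(y)), method = "BFGS")
c(box = box2$par, transformed = exp(trans2$par))
```

In the second case the box-constrained run can land on the bound itself, while exp(y) can only approach zero as y decreases, so the transformed run stops at some small positive value determined by the convergence tolerance.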
Re: [R] linear model lme4
No, as the error states, you need random effects in lmer. But you don't for lm(), and that is what you're running when there are no random effects. However, some caution is warranted on the comparison.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Brian Smith
Sent: Friday, February 25, 2011 10:06 AM
To: r-help@r-project.org
Subject: [R] linear model lme4
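A minimal sketch of the fixed-effect fit that this reply points to, using lm() on simulated data. The variable names follow the original question; the simulated effect sizes are assumptions. The lme4 call is shown only as a comment so that the block runs with base R alone.

```r
set.seed(1)

# Simulated layout: 6 beadchips, 8 samples each, two cancer classes.
d <- data.frame(
  beadchip    = factor(rep(paste0("chip", 1:6), each = 8)),
  cancerClass = factor(rep(c("A", "B"), times = 24))
)
chip_shift <- rnorm(6, sd = 0.5)   # per-chip offset (assumption)
d$expression <- 5 + 1.0 * (d$cancerClass == "B") +
  chip_shift[as.integer(d$beadchip)] + rnorm(48, sd = 0.3)

# beadchip as a fixed effect: an ordinary linear model, no lmer() needed.
fit_fixed <- lm(expression ~ cancerClass + beadchip, data = d)
coef(fit_fixed)["cancerClassB"]

# Random-effect counterpart from the question (requires the lme4 package):
# fit_random <- lme4::lmer(expression ~ cancerClass + (1 | beadchip), data = d)
```

The caution mentioned in the reply applies here: the two fits estimate the beadchip contribution differently (one coefficient per chip versus a variance component), so their summaries are not directly interchangeable.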
[R] Error: address 0x6951c20, cause 'memory not mapped'
Dear R list,

I get a strange error in R:

 *** caught segfault ***
address 0x6951c20, cause 'memory not mapped'

Traceback:
 1: .C("spline_eval", z$method, nu = as.integer(n), x = as.double(xout), y = double(n), z$n, z$x, z$y, z$b, z$c, z$d, PACKAGE = "stats")
 2: spline(gam.data$x[, col.data], gam.smooths.all$fit[, m], xout = gam.results.global[m, , x.values], ties = mean)
 3: eval.with.vis(expr, envir, enclos)
 4: eval.with.vis(ei, envir)
 5: source(file.path(getwd(), "Skripte", "r", "GAM_hourly", "1_calcs_GAM_all_sites_hourly.R"), echo = TRUE, max.deparse.length = 2e+05)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

It seems as though the error occurs when I try to perform a spline interpolation of a smooth function. Can anybody give me some hints on where to dig for a solution?

Thanks a lot
Jannis

My R version (if this has anything to do with it):

sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=de_DE.UTF-8
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] mgcv_1.6-2

loaded via a namespace (and not attached):
[1] grid_2.10.1 lattice_0.18-3 Matrix_0.999375-43 nlme_3.1-96
[5] tools_2.10.1
Re: [R] group by in data.frame
Hi:

Here's another way:

c1 <- c(1,2,3,2,2,3,1,2,2,2)
c2 <- c(5,6,7,7,5,7,5,7,6,6)
c3 <- rnorm(10)
x <- data.frame(c1 = factor(c1), c2 = factor(c2), c3)
x <- transform(x, mean = ave(c3, c1, c2, FUN = mean))

Yet another, with function ddply() in package plyr:

ddply(x, .(c1, c2), transform, mean = mean(c3))

HTH,
Dennis
Re: [R] limma function problem
On 02/25/2011 04:26 AM, Sukhbir Rattan wrote:

> Hi,
>
> I have two data sets of normalized Affymetrix CEL files, wild type vs. control type (each set has a further three replicates).
>
> wild.fish
> AffyBatch object
> size of arrays=712x712 features (10 kb)
> cdf=Zebrafish (15617 affyids)
> number of samples=3
> number of genes=15617
> annotation=zebrafish
> notes=
>
> Dicer.fish
> AffyBatch object
> size of arrays=712x712 features (10 kb)
> cdf=Zebrafish (15617 affyids)
> number of samples=3
> number of genes=15617
> annotation=zebrafish
> notes=
>
> Now I have to combine these two S4 objects and use the lmFit function of the limma package. I am able to combine the two S4 objects using the merge function.
>
> merge.fish <- merge(wild.fish, Dicer.fish)
> merge.fish
> AffyBatch object
> size of arrays=712x712 features (17833 kb)
> cdf=Zebrafish (15617 affyids)
> number of samples=6
> number of genes=15617
> annotation=zebrafish
> notes=Merge from two AffyBatches with notes: 1) , and 2)
>
> design
>              Wild Mz_Dicer
> GSM95623.CEL    1        0
> GSM95624.CEL    1        0
> GSM95625.CEL    1        0
> GSM95617.CEL    0        1
> GSM95618.CEL    0        1
> GSM95619.CEL    0        1
>
> fit <- lmFit(merge.fish, design)
> Error in as.vector(data) : no method for coercing this S4 class to a vector
>
> mode(merge.fish)
> [1] "S4"
>
> So, how do I troubleshoot this problem?

Hi Sukhbir -- this is a Bioconductor package, so please ask on that list:

http://bioconductor.org/help/mailing-list/

However, you'll want to review basic microarray analysis work flows in R, either on the Bioconductor web site

http://bioconductor.org/help/workflows/oligo-arrays/

or other resources, such as the vignette that comes with affy or limma. What you have is 'raw' data; you want to 'pre-process' it, e.g., by the RMA algorithm, prior to assessing differential expression. A more typical work flow might go directly from your 6 CEL files to an 'ExpressionSet' object using RMA normalization, via the single function call just.rma from the affy package; no need to ReadAffy and merge.

Hope that helps.

Martin

> Regards,
> Sukhbir Singh Rattan.
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
Re: [R] nls
On Fri, Feb 25, 2011 at 6:09 AM, Abeer Fadda a.fa...@zmbh.uni-heidelberg.de wrote:

> hi,
> I would like to find the x value (independent variable) for a certain dependent value using the fitted model with nls. With predict() I can find the y that corresponds to a list of x. I need the other way around. Can it be done?
> thanks,
> afadda

Yes.

-- Bert

... Oh, if you mean HOW can it be done? -- lots of ways. Analytically, if y = f(x) is your fitted model, just backsolve x = g(y). If your algebra isn't up to the task, then just predict y on a suitably fine x grid, and (assuming monotonicity) find the x corresponding to the predicted y closest to the y you wish to back calibrate. This is just a matter of indexing. See ?which.

--
Bert Gunter
Genentech Nonclinical Biostatistics
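The grid-and-index recipe in this reply can be sketched as follows. The exponential model, the simulated data and the target value are assumptions made for the example. The uniroot() call at the end is an additional alternative that is not mentioned in the reply; like the grid search, it relies on the fitted curve being monotone over the search interval.

```r
set.seed(3)
d <- data.frame(x = seq(1, 10, length.out = 40))
d$y <- 2 * exp(0.3 * d$x) + rnorm(40, sd = 0.2)

fit <- nls(y ~ a * exp(b * x), data = d, start = list(a = 1, b = 0.2))

y_target <- 10   # the dependent value to back-calibrate (assumption)

# Grid approach: predict on a fine x grid, then pick the closest prediction.
grid   <- seq(min(d$x), max(d$x), length.out = 10001)
pred   <- predict(fit, newdata = data.frame(x = grid))
x_grid <- grid[which.min(abs(pred - y_target))]

# Root-finding alternative: solve predict(x) - y_target = 0 numerically.
x_root <- uniroot(
  function(x0) predict(fit, newdata = data.frame(x = x0)) - y_target,
  interval = range(d$x)
)$root

c(grid = x_grid, root = x_root)
```

For this particular model the analytic back-solve is also available, x = log(y / a) / b with the fitted a and b, which is the first option the reply mentions.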
Re: [R] Variable names AS variable names?
My actual code is several things with adaptive filtering. This will require accessing data sporadically. The loop was just a quick example for the e-mail.

One application is to work with online (streaming) data. If I get a new data point in for code a1, I'll need to be able to reference the matrix named a1.

On 2/25/11 12:23 AM, David Winsemius wrote:
> On Feb 25, 2011, at 1:55 AM, Noah Silverman wrote:
>> How can I dynamically use a variable as the name for another variable?
>> I realize this sounds cryptic, so an example is best:
>>
>> #Start with an array of codes
>> codes <- c("a1", "b24", "q99")
>
> Is there some reason not to use list(a1, b24, q99)? If not then:
>
> lapply(codes, somefun)
>
>> #Each code has a corresponding matrix (could be vector)
>> a1 <- matrix(rnorm(100), nrow=10)
>> b24 <- matrix(rnorm(100), nrow=10)
>> q99 <- matrix(rnorm(100), nrow=10)
>>
>> #Now, I want to loop through all the codes and do something with each matrix
>> for(code in codes){
>>   #here is where I'm stuck. I don't want the value of code, but the variable whose name is the value of code
>> }
>>
>> Any suggestions?
>> -N
>
> David Winsemius, MD
> West Hartford, CT
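Two base-R ways of going from a name held in a character string to the object itself are sketched below: get() for a one-off lookup, and a named list (built here with mget()) for the streaming use case described in this follow-up. The appended row stands in for a hypothetical new data point arriving for code a1.

```r
set.seed(7)
a1  <- matrix(rnorm(100), nrow = 10)
b24 <- matrix(rnorm(100), nrow = 10)
q99 <- matrix(rnorm(100), nrow = 10)
codes <- c("a1", "b24", "q99")

# get() looks up the object whose name is stored in the string.
for (code in codes) {
  m <- get(code)
  cat(code, "has dimensions", dim(m), "\n")
}

# A named list keeps all matrices together and is indexed by the code itself.
mats <- mget(codes)

# Hypothetical new observation for code "a1": update the list element in place.
new_code <- "a1"
mats[[new_code]] <- rbind(mats[[new_code]], rnorm(10))
dim(mats[["a1"]])
```

The list form tends to be easier to maintain than free-standing variables, because updating an entry is a plain assignment to mats[[code]] rather than a call to assign().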
[R] Accessing sub diagonals / spdiag in R ?
Hello,

I'm attempting to access a specific number of subdiagonals in a matrix and have been accustomed to using spdiags in MATLAB or Octave. I've got a solution pieced together using for loops, and it works, though it isn't vectorized and is liable to run very slowly for large matrices. As an example:

A =
1 2 3 4 5
9 8 7 6 5
4 5 6 7 8
5 4 3 2 1
8 7 6 0 1

The subdiagonals are:

9,5,3,0
4,4,6
5,7
8

I know about lower.tri and can fetch the data in a resulting vector, which in this case would be:

9,4,5,8,5,4,7,3,6,0

though I would have to manipulate this some more to extract the other diagonals (imagine this being done for, say, a 1000 x 1000 matrix). I looked at CRAN and didn't see anything corresponding to spdiags. The closest package appeared to be the one relating to sparse matrices and band symmetry.

Would you have any suggestions about 1) how to emulate spdiags or 2) working with the lower.tri returned data and extracting the remaining diagonals efficiently? I can live with what I have but imagine that there is something more direct.

Thanks
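One vectorized way to pull out subdiagonals in base R, sketched on the 5 x 5 example from this message: row(A) - col(A) gives the offset of every cell from the main diagonal, so the k-th subdiagonal is simply the set of cells where that offset equals k. This is a sketch of the indexing idea only, not an emulation of the full spdiags interface.

```r
A <- matrix(c(1, 2, 3, 4, 5,
              9, 8, 7, 6, 5,
              4, 5, 6, 7, 8,
              5, 4, 3, 2, 1,
              8, 7, 6, 0, 1), nrow = 5, byrow = TRUE)

# Offset of each cell below (+) or above (-) the main diagonal.
offset <- row(A) - col(A)

# A single subdiagonal: all cells one step below the main diagonal.
sub1 <- A[offset == 1]

# All subdiagonals at once: split the lower triangle by offset.
lower    <- lower.tri(A)
subdiags <- split(A[lower], offset[lower])
subdiags
```

Because matrices are stored column by column, the elements within each subdiagonal come out in top-left to bottom-right order, matching the listing in the message (9,5,3,0 then 4,4,6 then 5,7 then 8).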
[R] Forced inclusion of varaibles in validate command as well as step
Hello all,

I am a very new R user; I am used to using STATA.

My problem: I want to build a Cox model and validate it. I have a large number of clinically relevant factors and feel the need to reduce these. Meanwhile, I have some clinical variables I deem sufficiently important to force into the model regardless of AIC or p value.

This is my present log of commands:

library(rms)
library(survival)
library(Hmisc)
data1 <- read.table("optimism.csv", header=T, sep=",")
attach(data1)
coxmodel4 <- coxph(formula=Surv(OS,mors) ~ iAJCC2+iAJCC3+iPS2+iPS3+alder_diag+gender+vol_GTV+iforb2+iforb3+hem_LNL+sero_thromb+LDH_UNL+ALAT_UNL+BASP_UNL+sero_bili+resection_perf+sero_WBC, data=data1, x=TRUE, y=TRUE, method=c("efron"))
coxmodel.streg <- step(coxmodel4)

I would like to lock iAJCC2, iAJCC3, iPS2 and iPS3 into the model regardless, but I cannot seem to get the step function to accept this.

Further, once I have the model I would like to validate it with the validate command. I am presently using this:

fit <- cph(formula=Surv(OS,mors) ~ iAJCC2+iAJCC3+iPS2+iPS3+alder_diag+gender+vol_GTV+iforb2+iforb3+hem_LNL+sero_thromb+LDH_UNL+ALAT_UNL+BASP_UNL+sero_bili+resection_perf+sero_WBC, data=data1, x=TRUE, y=TRUE)
fit
validate(fit, method="boot", B=40, bw=TRUE, rule="p", type="residual", sls=0.15, aics=0, pr=TRUE)

Due to my small data set (153 patients with 130 events) I have chosen to lift the p limit from 5% to 15%, as suggested by Steyerberg.

I would appreciate any help with the lock term (also if it cannot be done). As I mentioned, I am a bit of a rookie, and not too experienced as a programmer (I am an MD after all). However, I am quite impressed with R so far, since I have been trying to get this far in STATA for a few weeks.

Sincerely,
Jon Kroll Bjerregaard, MD
Dept. of Oncology
Odense University Hospital
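For the first half of this question (keeping some variables in the model during stepwise selection), step() accepts a scope argument whose lower component lists terms that may never be dropped. The sketch below uses the lung data set shipped with the survival package rather than the poster's data, so the variable names and the choice of forced terms (age and sex) are assumptions made purely for illustration. It does not cover the validate() part of the question.

```r
library(survival)

# Complete cases only, so every candidate model is fitted on the same rows.
vars <- c("time", "status", "age", "sex", "ph.ecog", "ph.karno", "wt.loss")
d <- na.omit(lung[, vars])

full <- coxph(Surv(time, status) ~ age + sex + ph.ecog + ph.karno + wt.loss,
              data = d)

# Backward selection by AIC; terms in scope$lower are always retained.
sel <- step(full, scope = list(lower = ~ age + sex),
            direction = "backward", trace = 0)

kept <- attr(terms(sel), "term.labels")
kept
```

Applied to the model in the message, the lower formula would presumably be ~ iAJCC2 + iAJCC3 + iPS2 + iPS3, with data1 in place of d.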
[R] help please ..simple question regarding output the p-value inside a function and lm
Dear R community members and R experts,

I am stuck at a point; I tried with my colleagues and could not get it working. Sorry, I need your help.

Here is my data (just created to show the example):

# generating a dataset just to show how my dataset looks; here I have x variables
# x1 ... x1000 plus ind and y
ind <- c(1:100)
y <- rnorm(100, 10, 2)
set.seed(201)
P <- vector()
dataf1 <- as.data.frame(matrix(rep(NA, 10), nrow=100))
dataf <- data.frame(dataf1, ind, y)
names(dataf) <- (c(paste("x", 1:1000, sep=""), "ind", "y"))
for(i in 1:1000) {
  dataf[,i] <- rnorm(100)
}

# my intention was to fit models in the following fashion:
# y ~ x1 + x2, y ~ x3 + x4, y ~ x5 + x6, ..., y ~ x999 + x1000 (to the end of the dataframe)
# please note that I want to avoid fitting y ~ x2 + x3 or y ~ x4 + x5 (meaning that I am selecting two x variables at a time, to the end)
# question: how can I do this and put it inside a user function, as I tried in the following?

# defining function for lm model
mylm <- function (mydata, nvar) {
  y <- NULL
  P1 <- vector(mode="numeric", length = nvar)
  P2 <- vector(mode="numeric", length = nvar)
  for(i in 1:nvar) {
    print(P1[i] <- summary(lm(mydata$y ~ mydata[,i]) + mydata[,i+1]$coefficients[2,4]))
    print(P2[i] <- summary(lm(mydata$y ~ mydata[,i]) + mydata[,i+1]$coefficients[2,5]))
    print(plot(nvar, P1))
    print(plot(nvar, P2))
  }
}

# applying the function to mydata
mylm(dataf, 1000)

It does not work. The following is the error message:

Error in model.frame.default(formula = mydata$y ~ mydata[, i], drop.unused.levels = TRUE) :
  invalid type (NULL) for variable 'mydata$y'

Please help!

Thanks;
Umesh R
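A possible reworking of the function in this question, offered as a sketch rather than a diagnosis of the posted error. It steps through the predictors two at a time, builds each formula with reformulate() so that both predictors sit inside the lm() call, and reads the p-values from column 4 of the coefficient table (rows 2 and 3 are the two predictors). The function name pair_pvalues and the reduced size of 10 predictors are choices made for this example.

```r
set.seed(201)
n    <- 100
nvar <- 10   # small stand-in for the 1000 predictors in the question

dataf <- as.data.frame(matrix(rnorm(n * nvar), nrow = n))
names(dataf) <- paste0("x", 1:nvar)
dataf$y <- rnorm(n, 10, 2)

# Fit y ~ x1 + x2, y ~ x3 + x4, ... and collect both p-values per model.
pair_pvalues <- function(mydata, nvar) {
  starts <- seq(1, nvar - 1, by = 2)          # 1, 3, 5, ...: non-overlapping pairs
  pvals <- t(sapply(starts, function(i) {
    form <- reformulate(names(mydata)[c(i, i + 1)], response = "y")
    summary(lm(form, data = mydata))$coefficients[2:3, 4]
  }))
  data.frame(first    = names(mydata)[starts],
             second   = names(mydata)[starts + 1],
             p_first  = pvals[, 1],
             p_second = pvals[, 2])
}

res <- pair_pvalues(dataf, nvar)
res
```

The returned data frame can then be plotted once, outside the loop, for example with plot(res$p_first), instead of calling plot() in every iteration.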
Re: [R] Error
I simply don't understand why I get this error when using a larger dataset.

Error in `[<-.data.frame`(`*tmp*`, i, , value = integer(0)) : replacement has 0 items, need 37597770
In addition: Warning message: In max(i) : no non-missing arguments to max; returning -Inf

Any ideas on what this error means? Thanks

-- View this message in context: http://r.789695.n4.nabble.com/Error-tp3324531p3324859.html
[R] ANOVA and Pseudoreplication in R
Hi,

As part of my dissertation, I'm going to be doing an ANOVA comparing the dead zone diameters on plates of microbial growth. Little paper disks are loaded with antimicrobial, and a clear zone appears where death occurs, its size depending on the strength and susceptibility. So it's basically 4 different treatments, and I'm comparing the diameters (in mm) of circles. I'm concerned, however, about pseudoreplication and how to deal with it in R (I thought of using the Error() term).

I have four levels of one factor (called Treatment): NE.Dettol, EV.Dettol, NE.Garlic, EV.Garlic. (NE.Dettol is E.coli not evolved to Dettol, exposed to Dettol to get dead zones. The same goes for NE.Garlic, but with garlic, not Dettol. EV.Dettol is E.coli that has been evolved against Dettol, and then tested afterwards against Dettol to get the dead zones. The same applies to EV.Garlic, but with garlic.)

You see from the four levels (or treatments) that there are two chemicals involved. So my first concern is whether they should be analysed using two separate ANOVAs. NE.Dettol and NE.Garlic are both the same organism, a lab stock E.coli, just exposed to two different chemicals. EV.Dettol and EV.Garlic are, in principle, likely to be two different forms of the organism after the many experimental doses of their respective chemical.

For NE.Garlic and NE.Dettol I have 5 of what I've called Lineages each, basically separate bottles of them (10 in total). Then I have 5 bottles (Lineages) of EV.Dettol, and 5 of EV.Garlic. This was done because, whilst I'm expecting them all to respond in a similar manner, there are many evolutionary paths to the same result, and previous research and reading shows that occasionally one or two react differently to the rest through random chance. The point I observed above (NE.Dettol and NE.Garlic are both the same organism...) is also applicable to the 5 bottles: the 5 bottles each of NE.Garlic and NE.Dettol are supposed to be all the same organism, from a stock kept in store in the lab. There is potential, though, for the 5 of EV.Garlic to be different from one another, and potential for the 5 of EV.Dettol to be different from one another. The Lineage (bottle) is then also a factor, with 5 levels (1,2,3,4,5), because they may be different.

To get the measurements of the diameter of the zones, I take out a small amount from a tube and spread it on a plate, then take three paper disks soaked in their respective chemical, either Dettol or garlic, press them on and incubate them. Then, when the zones have appeared after a day or 2, I take 4 diameter measurements from each zone, across the zone at different angles, to account for the fact that the zone may have a weird shape or not be quite circular.

I'm concerned about pseudoreplication, such as the multiple readings from one disk, and the 5 lineages, which might be different from one another in each of the two EV. treatments, but not in the NE. treatments. I read that I can remove pseudoreplication from the multiple readings from each disk by using the 4 readings on each disk to produce a mean for the disk, and analysing those means, exercising caution where there are extreme values. I think the 3 disks for each lineage are themselves not pseudoreplication, because they are genuinely 3 disks on a plate (the Disk Diffusion Test replicated 3 times), but the multiple readings from one disk, I feel, are pseudoreplication. I've also read about including Error() terms in a formula. I'm unsure whether the two NE. treatments coming from the same culture introduces pseudoreplication at the Treatment factor level, given that the two different antimicrobials used have two different effects.

I was hoping for a more expert opinion on whether I have identified pseudoreplication correctly, or whether there is indeed pseudoreplication in the 5 Lineages or anywhere else I haven't seen, and how best this is dealt with in R. At the minute my solution to the multiple readings from one disk is simply to make a new factor with the means and do the ANOVA from that, or even to take the means before I load the dataset into R. I'm wondering if an Error() term would be correct.

Thanks, Ben W.
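The two ideas mentioned in the post (averaging the 4 readings per disk, and an Error() term) can be sketched as below. This is only an illustration on simulated numbers, not a recommendation for the real design: the column names (Treatment, Lineage, Disk, Reading, Diameter, Bottle) are assumptions, the diameters are made up, and whether bottle is the right error stratum is exactly the question being asked.

```r
set.seed(1)
# Simulated layout: 4 treatments x 5 lineages x 3 disks x 4 readings per disk
d <- expand.grid(Reading = 1:4, Disk = 1:3, Lineage = 1:5,
                 Treatment = c("NE.Dettol", "EV.Dettol", "NE.Garlic", "EV.Garlic"))
d$Bottle   <- interaction(d$Treatment, d$Lineage)    # 20 distinct bottles
d$Diameter <- rnorm(nrow(d), mean = 20, sd = 2)      # made-up diameters in mm

# Step 1: collapse the 4 readings on each disk to a single disk mean
disk <- aggregate(Diameter ~ Treatment + Bottle + Disk, data = d, FUN = mean)

# Step 2: compare treatments with bottle declared as an error stratum,
# so the 3 disks from one bottle are not counted as independent replicates
fit <- aov(Diameter ~ Treatment + Error(Bottle), data = disk)
summary(fit)
```

With this layout the treatment effect is tested in the Bottle stratum, against bottle-to-bottle variation rather than disk-to-disk variation.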
Re: [R] accuracy of measurements
Thanks so much to everybody who found time to answer my question. All your messages are of great help. Good luck.

On Fri, 25/02/2011 at 05:46 -0800, Dennis Murphy wrote:
> And in that vein, the recently released MethComp package by Bendix Carstensen may be of service. HTH, Dennis
>
> On Fri, Feb 25, 2011 at 5:39 AM, Marc Schwartz marc_schwa...@me.com wrote:
>> On Feb 24, 2011, at 4:50 PM, Denis Kazakiewicz wrote:
>>> Dear R people, could you please help with the following. I am trying to compare the accuracy of tumor size evaluation by different methods. The data looks like
>>>
>>> id true method1 method2 ...
>>> 1 2 2 2.5
>>> 2 1.52 2
>>> 3 2 2 2
>>> etc.
>>>
>>> Could you please give a hint how to deal with that. It seems that {merror} does not suit me, because I am trying to compare the accuracy of measurements with their true known values, not just the overall agreement of methods. Moreover, the sample size is ridiculously small (33 patients), so ANOVA is not much help (or is it?). Any suggestions, hints and even guesses are highly appreciated. I am a bit stuck.
>>
>> Denis, I would suggest that you start here: http://www-users.york.ac.uk/~mb55/meas/meas.htm This covers various resources pertaining to the design and analysis of measurement studies, primarily based upon methods by Bland and Altman. HTH, Marc Schwartz
Re: [R] Calculate probabilty
Are you clear about the question you are asking? Do you want to know whether there are 6 balls or at least 6 balls? (It sounds like "at least".) Do you want to know whether there are at least 6 balls in the first box, or at least 6 balls in exactly one box, or at least 6 balls in at least one box?

This is the probability that there are exactly 6 balls in the first box:

dbinom(6,142,1/491)
[1] 5.53366e-07

This is the probability that there are MORE THAN 6 balls in the first box (NOT at least 6):

1-pbinom(6,142,1/491)
[1] 2.272026e-08
sum(sapply(7:142, function(i) dbinom(i,142,1/491)))
[1] 2.272026e-08
1-sum(sapply(0:6, function(i) dbinom(i,142,1/491)))
[1] 2.272026e-08

This is the probability that there are at least 6 balls in the first box:

1-pbinom(5,142,1/491)
[1] 5.760862e-07

You can get all this from ?dbinom, but it is pretty confusing that the argument n and the italic n in the details are totally different things: italic n = argument size. (Likewise, italic p = argument prob, not argument p.)

Questions about more than one box are a little harder since the boxes are not independent.

HTH, Rex

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Fabrice Tourre
Sent: Thursday, February 24, 2011 3:51 PM
To: r-help@r-project.org
Subject: [R] Calculate probabilty

> Hi List, I have a question about calculating a probability using R. There are 491 boxes and 142 balls. The balls are put into the boxes at random. How do I calculate the probability that there are six or more in one box? I have tried:
> dbinom(6,142,1/491)
> 1-pbinom(6,142,1/491)
> But I think I am unclear about dbinom and pbinom. Thank you very much in advance.
[R] Missing R.h
Hi, I'm trying to install a module - gputools - and keep getting compile-time errors about a missing R.h. Does anyone know where this file can be found? Thanks!
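As a starting point for locating the header, R itself can report where its C headers are expected to live. The snippet below is a minimal check and makes no claim about how gputools searches for headers; on some Linux distributions the headers are shipped in a separate development package, so a FALSE result may simply mean that package is not installed.

```r
inc <- R.home("include")               # directory where R keeps its C headers
inc
file.exists(file.path(inc, "R.h"))     # FALSE suggests the headers are not installed
```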
Re: [R] Variable names AS variable names?
To access a variable by a character string name, try

for(code in codes) {
  dat <- get(code)
  [stuff]
}

Other options include ?assign if you need to manipulate the original, or ?with to use the subject of codes as an environment.

-- Jonathan P. Daily, Technician - USGS Leetown Science Center, 11649 Leetown Road, Kearneysville WV, 25430, (304) 724-4480
Is the room still a room when its empty? Does the room, the thing itself have purpose? Or do we, what's the word... imbue it. - Jubal Early, Firefly

Noah Silverman wrote on 02/25/2011 12:33:32 PM:
> My actual code is several things with adaptive filtering. This will require accessing data sporadically. The loop was just a quick example for the e-mail. One application is to work with online (streaming) data. If I get a new data point in for code a1, I'll need to be able to reference the matrix named a1.
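The get() approach suggested above can be tried end to end as follows. The three matrices mirror the ones in the original question; the dims list and the extra row appended with assign() are only there to illustrate reading and writing by name.

```r
a1  <- matrix(rnorm(100), nrow = 10)
b24 <- matrix(rnorm(100), nrow = 10)
q99 <- matrix(rnorm(100), nrow = 10)
codes <- c("a1", "b24", "q99")

dims <- list()
for (code in codes) {
  dat <- get(code)              # the object whose name is stored in `code`
  dims[[code]] <- dim(dat)
}

# assign() writes back under a name held in a character string
assign("a1", rbind(get("a1"), rnorm(10)))

# mget() fetches several objects at once into a named list
all_mats <- mget(codes)
```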
Re: [R] Accessing sub diagonals / spdiag in R ?
On Fri, Feb 25, 2011 at 11:26 AM, Mingo catojo...@gmail.com wrote:
> Hello, I'm attempting to access a specific number of sub diagonals in a MATRIX and have been accustomed to using spdiags in MATLAB or Octave. I've got a solution pieced together using for loops; it works, though it isn't vectorized and is liable to run very slowly for large matrices. As an example:
>
> A =
> 1 2 3 4 5
> 9 8 7 6 5
> 4 5 6 7 8
> 5 4 3 2 1
> 8 7 6 0 1
>
> The subdiagonals are:
> 9,5,3,0
> 4,4,6
> 5,7
> and 8
>
> I know about lower.tri and can fetch the data in a resulting vector which, in this case, would be 9,4,5,8,5,4,7,3,6,0, though I would have to manipulate this some more to extract the other diagonals (imagine this being done for, say, a 1000 x 1000 matrix). I looked at CRAN and didn't see anything corresponding to spdiags. The closest package appeared to be the one relating to sparse matrices and band symmetry. Would you have any suggestions about 1) how to emulate spdiags or 2) working with the lower.tri returned data and extracting the remaining diagonals efficiently? I can live with what I have but imagine that there is something more direct. Thanks

A[ col(A) == row(A) - i ] is the ith subdiagonal

-- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
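The one-liner above can be checked against the 5 x 5 example from the question. The wrapper name subdiag is hypothetical, and the split() call is one possible way (an addition, not part of the reply) to pull out all subdiagonals in a single pass.

```r
A <- matrix(c(1, 2, 3, 4, 5,
              9, 8, 7, 6, 5,
              4, 5, 6, 7, 8,
              5, 4, 3, 2, 1,
              8, 7, 6, 0, 1), nrow = 5, byrow = TRUE)

# i-th subdiagonal: entries whose column index is i less than their row index
subdiag <- function(M, i) M[col(M) == row(M) - i]
subdiag(A, 1)    # 9 5 3 0

# All subdiagonals at once: group the lower triangle by (row - column)
d    <- row(A) - col(A)
subs <- split(A[d > 0], d[d > 0])
subs
```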
Re: [R] Variable names AS variable names?
On Feb 25, 2011, at 12:33 PM, Noah Silverman wrote:
> My actual code is several things with adaptive filtering. This will require accessing data sporadically. The loop was just a quick example for the e-mail. One application is to work with online (streaming) data. If I get a new data point in for code a1, I'll need to be able to reference the matrix named a1.

So, how do these things you are calling codes get their names? ("code" is not an R datatype. a1 is a matrix, "a1" is a character value and it would be returned by names(a1).)

a1 <- matrix(rnorm(100), nrow=10)
b24 <- matrix(rnorm(100), nrow=10)
q99 <- matrix(rnorm(100), nrow=10)
codes <- list(a1=a1, b24=b24, q99=q99)
str(codes[['a1']])
... should be a matrix

Assignment also works with [[ or with [.

We really _do_ need examples that represent the problems posed on the list. You have been posting a sufficient number of times to have understood this by now.

-- David Winsemius, MD West Hartford, CT
Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph
P.S. I used the latest version of the rms package to run this. The Design package is no longer supported. Frank

- Frank Harrell, Department of Biostatistics, Vanderbilt University

-- View this message in context: http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Prof-Frank-Harrell-in-Design-validate-cph-tp3316820p3325050.html
Re: [R] Variable names AS variable names?
On Feb 25, 2011, at 1:09 PM, David Winsemius wrote:
> On Feb 25, 2011, at 12:33 PM, Noah Silverman wrote:
>> My actual code is several things with adaptive filtering. This will require accessing data sporadically. The loop was just a quick example for the e-mail. One application is to work with online (streaming) data. If I get a new data point in for code a1, I'll need to be able to reference the matrix named a1.
>
> So, how do these things you are calling codes get their names? ("code" is not an R datatype. a1 is a matrix, "a1" is a character value and it would be returned by names(a1).)

That's not correct. names(a1) would not return "a1", but names(codes)[1] would if codes is defined as a list as below.

> a1 <- matrix(rnorm(100), nrow=10)
> b24 <- matrix(rnorm(100), nrow=10)
> q99 <- matrix(rnorm(100), nrow=10)
> codes <- list(a1=a1, b24=b24, q99=q99)
> str(codes[['a1']])
> ... should be a matrix
>
> Assignment also works with [[ or with [.
>
> We really _do_ need examples that represent the problems posed on the list. You have been posting a sufficient number of times to have understood this by now.

-- David Winsemius, MD West Hartford, CT
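The named-list approach can be sketched for the streaming case described in the thread. The variable new_code and the appended row are illustrative assumptions: they stand in for whatever identifies an incoming data point and its values.

```r
codes <- list(a1  = matrix(rnorm(100), nrow = 10),
              b24 = matrix(rnorm(100), nrow = 10),
              q99 = matrix(rnorm(100), nrow = 10))

names(codes)[1]                  # "a1": the name lives on the list, not on the matrix

# A new observation arrives for code "a1": look the matrix up by name and update it
new_code <- "a1"
codes[[new_code]] <- rbind(codes[[new_code]], rnorm(10))

rows <- sapply(codes, nrow)      # a1 now has 11 rows, the others still 10
rows
```

Keeping the matrices in one list means the character code is used directly as an index, so no get() or assign() is needed.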
Re: [R] BFGS versus L-BFGS-B
Hi John,

Thanks so much for the informative reply! I'm currently trying to optimize ~10,000 parameters simultaneously. For some reason, when I compare the memory usage of L-BFGS-B and BFGS, L-BFGS-B only uses about 1/7 of the memory, with all default input parameters. I'm a bit surprised that it isn't a lot less, but BFGS is definitely converging a lot slower.

My other question is that L-BFGS-B is returning 'non-finite' errors with respect to the gradient function I'm supplying. Again, all the parameters I'm optimizing need to be non-negative (so I'm optimizing the log of the parameters), but the gradient at some point divides by each parameter, so when some of the parameters go to 0, the gradient becomes infinite. Do you (or anyone else) have any suggestions for how to prevent this? Is the only way to force the parameters to be larger than some number close to 0 (e.g. 1e-10), or to modify the gradient function to set the entry of small parameters to 0?

Thanks! Brian.

On Fri, Feb 25, 2011 at 10:51 AM, Prof. John C Nash nas...@uottawa.ca wrote:
> There are considerable differences between the algorithms. And BFGS is an unfortunate nomenclature, since there are so many variants that are VERY different. It was called variable metric in my book from which the code was derived, and that code was from Roger Fletcher's Fortran VM code based on his 1970 paper. L-BFGS-B is a later and more complicated algorithm with some pretty nice properties. The code is much larger.
>
> Re: less memory -- this will depend on the number of parameters, but to my knowledge there are no good benchmark studies of memory and performance. Perhaps someone wants to propose one for Google Summer of Code (see http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2011 ).
>
> The optimx package can call Rvmmin, which has box constraints (also Rcgmin, which is intended for very low memory), and several other methods with box constraints, including L-BFGS-B. Worth a try if you are seeking a method for multiple production runs. Unfortunately, we seem to have some CRAN check errors on Solaris and some old releases -- platforms I do not have -- so it may be a few days or more until we sort out the issues, which seem to be related to alignment of the underlying packages for which optimx is a wrapper.
>
> Use of transformation can be very effective. But again, I don't think there are good studies on whether use of box constraints or transformations is better, and when. That is another project, which I have made some tentative beginnings on. Collaborations welcome.
>
> Best, JN
>
> On 02/25/2011 06:00 AM, r-help-requ...@r-project.org wrote:
>> Date: Fri, 25 Feb 2011 00:11:59 -0500
>> From: Brian Tsai btsa...@gmail.com
>> To: r-help@r-project.org
>> Subject: [R] BFGS versus L-BFGS-B
>>
>> Hi all, I'm trying to figure out what the effective differences between BFGS and L-BFGS-B are, besides the obvious that L-BFGS-B should be using a lot less memory, and that the user can provide box constraints.
>> 1) Why would you ever want to use BFGS, if L-BFGS-B does the same thing but uses less memory?
>> 2) If I'm optimizing with respect to a variable x that must be non-negative, a common approach is to do a change of variables x = exp(y), and optimize unconstrained with respect to y. Is optimization using box constraints on x likely to produce as good a result as unconstrained optimization on y?
>> - Brian.
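The two options discussed in this thread (a small positive lower bound with L-BFGS-B, versus the change of variables x = exp(y)) can be compared on a toy problem. The objective below is an assumption chosen only because its gradient divides by the parameter, like the one described; it says nothing about how either method behaves on the real 10,000-parameter problem.

```r
# Toy objective with a positive minimiser: f(x) = sum(x - log(x)), minimum at x = 1
f  <- function(x) sum(x - log(x))
gr <- function(x) 1 - 1 / x              # divides by x, so x must stay above 0

x0 <- rep(2, 5)

# Option 1: box constraint that keeps every parameter away from zero
fit_box <- optim(x0, f, gr, method = "L-BFGS-B", lower = 1e-10)

# Option 2: optimise y = log(x) without constraints.
# Chain rule: df/dy = (df/dx) * x, which here simplifies to exp(y) - 1,
# so the division by x cancels in the transformed gradient.
f_y  <- function(y) f(exp(y))
gr_y <- function(y) exp(y) - 1
fit_log <- optim(log(x0), f_y, gr_y, method = "BFGS")

fit_box$par
exp(fit_log$par)
```

If the gradient is supplied with respect to y, it may be worth checking whether the multiplication by x from the chain rule removes the division in the real problem as well.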
Re: [R] Calculate probabilty
Hi Rex,

Thanks for your explanation. In fact, my question is: when I observe that there are 6 or more balls in one box, what is the probability of this? The balls are put into the boxes at random. I think it is: 1-pbinom(6,142,1/491) = 2.272026e-08.

When the sample size is large, how should I do this? Using chisq.test? Because the binomial test is not suitable for large sample sizes. For example, there are 6000 balls and 500 boxes; when I observe that there are 60 or more balls in one box, what is the probability of this?
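For the larger example, the same binomial tail can still be computed directly with pbinom(). The sketch below gives the probability for one given box, plus a simple upper bound (Boole's inequality) for "at least one of the boxes", since the boxes are not independent; the variable names are illustrative.

```r
balls <- 6000; boxes <- 500; k <- 60

# P(a given box holds at least k balls), with X ~ Binomial(balls, 1/boxes)
p_one <- pbinom(k - 1, size = balls, prob = 1 / boxes, lower.tail = FALSE)

# Upper bound on P(at least one of the boxes holds at least k balls)
p_any_upper <- min(1, boxes * p_one)

# Same tail for the original 142 balls and 491 boxes: at least 6 in a given box
p_small <- pbinom(5, size = 142, prob = 1 / 491, lower.tail = FALSE)

c(p_one = p_one, p_any_upper = p_any_upper, p_small = p_small)
```

Note that "at least k" corresponds to pbinom(k - 1, ..., lower.tail = FALSE); using k instead of k - 1 gives "more than k".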
Re: [R] ANOVA and Pseudoreplication in R
I can hopefully save bandwidth here by suggesting that this belongs on the R-sig-mixed-models list. -- Bert

As an aside, shouldn't you be figuring this out yourself or seeking local consulting expertise?

On Fri, Feb 25, 2011 at 9:08 AM, Ben Ward benjamin.w...@bathspa.org wrote:
> [snip]

-- Bert Gunter, Genentech Nonclinical Biostatistics
[R] R in different OS
Hi All,

I have two R installations, one on a Windows system and another on a UNIX system. Is there any environment variable or function that tells me which R I am using? The reason I need to know is that the data path could be different on the two systems. I want to do something like

if it is R under Windows, path = "/ABC"
else if it is R under UNIX, path = "/DEF"

Any idea? Thanks.

Best Regards, HXD
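One way to branch on the operating system is the built-in .Platform list; Sys.info() gives more detail where it is available. The two paths below are the placeholders from the question.

```r
# .Platform$OS.type is "windows" on Windows and "unix" on Unix-alikes
if (.Platform$OS.type == "windows") {
  data_path <- "/ABC"
} else {
  data_path <- "/DEF"
}
data_path

Sys.info()[["sysname"]]    # more specific system name, where available
```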
[R] means, SD's and tapply
I'm trying to use tapply to output means and SD or SE for my data but seem to be limited by how many times I can subset it. Here's a snippet of my data:

stems353[1:10,]
   Time   DataSource Plot   Elevation Aspect Slope Type     Species SizeClass Stems
1  Modern Cameron    70F221 1730      ESE    20    Conifer  ABCO    Class1    3
2  Modern Cameron    70F221 1730      ESE    20    Conifer  ABMA    Class1    0
3  Modern Cameron    70F221 1730      ESE    20    Hardwood ACMA    Class1    0
4  Modern Cameron    70F221 1730      ESE    20    Hardwood AECA    Class1    0
5  Modern Cameron    70F221 1730      ESE    20    Hardwood ARME    Class1    0
6  Modern Cameron    70F221 1730      ESE    20    Conifer  CADE    Class1    15
7  Modern Cameron    70F221 1730      ESE    20    Hardwood CELE    Class1    0
8  Modern Cameron    70F221 1730      ESE    20    Hardwood CONU    Class1    0
9  Modern Cameron    70F221 1730      ESE    20    Conifer  JUCA    Class1    0
10 Modern Cameron    70F221 1730      ESE    20    Conifer  JUOC    Class1    0

I'd like to see means/SD of Stems stratified by Species, Time and SizeClass. I can get R to give me this for means by species:

tapply(stems353$Stems, stems353$Species, mean)
        ABCO         ABMA         ACMA         AECA         ARME         CADE         CELE
0.7305240793 0.8569405099 0.0003541076 0.0010623229 0.0017705382 0.4684844193 0.0063739377
        CONU         JUCA         JUOC         LIDE         PIAL         PICO         PIJE
0.0017705382 0.0003541076 0.0959631728 0.0138101983 0.3905807365 1.5651558074 0.2315864023
        PILA         PIMO        PIMO2         PIPO         PISA         POTR         PSME
0.1774079320 0.1880311615 0.0311614731 0.6735127479 0.0237252125 0.0506373938 0.2000708215
        QUCH         QUDO         QUDU         QUKE         QULO         QUWI        Salix
0.0474504249 0.1203966006 0.00         0.2071529745 0.0003541076 0.0548866856 0.0003541076
        SEGI         TSME
0.0021246459 0.5017705382

but I really need to see each species by SizeClass and Time, so that each value would be labeled something like ABCOSizeClass1TimeModern. Adding 2 variables to the function doesn't seem to work:

tapply(stems353$Stems, stems353$Species, stems353$SizeClass, stems353$Time, mean)
Error in match.fun(FUN) : 'stems353$SizeClass' is not a function, character or symbol

I've already created proper subsets for each of these groups, e.g. one subset is called stems353ABCO1, and I can run analyses on this. But trying to extract means straight from those subsets doesn't seem to work:

mean(stems353ABCO1)
[1] NA
Warning message: In mean.default(stems353ABCO1) : argument is not numeric or logical: returning NA

Thanks, Chris Dolanc

-- Christopher R. Dolanc, PhD Candidate, Ecology Graduate Group, University of California, Davis. Lab Phone: (530) 752-2644 (Barbour lab)
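tapply() accepts several grouping factors when they are wrapped in a list(); passing them as separate arguments puts the second one in the FUN position, which matches the error shown above. The data frame below is a made-up stand-in for stems353, using the same column names, and aggregate() is shown as one alternative that returns labeled rows.

```r
# Made-up stand-in for stems353: 2 species x 2 times x 2 size classes x 2 plots
stems <- expand.grid(Rep = 1:2,
                     SizeClass = c("Class1", "Class2"),
                     Time = c("Historic", "Modern"),
                     Species = c("ABCO", "ABMA"))
stems$Stems <- c(3, 1, 0, 2, 4, 6, 1, 1, 0, 0, 2, 4, 5, 3, 0, 2)

# Grouping factors go in a list; the result is a Species x SizeClass x Time array
m <- tapply(stems$Stems,
            list(stems$Species, stems$SizeClass, stems$Time), mean)
m["ABCO", "Class1", "Modern"]

# One row per group, with mean and SD side by side
agg <- aggregate(Stems ~ Species + SizeClass + Time, data = stems,
                 FUN = function(x) c(mean = mean(x), sd = sd(x)))
agg
```

The NA from mean(stems353ABCO1) is consistent with the subset being a whole data frame; taking the mean of a single column, such as stems353ABCO1$Stems, would be the thing to try.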
Re: [R] Group rows by common ID and plot?
Thanks Mike - this doesn't quite do it, but I think that you've hit on the right method. I am just trying to use 'plot' initially - I don't care so much about the arrangement in the file. plot(df$y,group=df$f) outputs the Y column in the appropriate plot. What I would like to do is have 10 Y columns, i.e. Y1, Y2, Y3...Y10, and plot just the values in each row that is grouped by 'f'. Does this make sense? I'm looking at 'group' functions that allow extracting all the values of all rows that match a unique 'f', and then trying to plot them individually, but it is not working yet. Any other suggestions for group functions that might allow the data to be reshaped into appropriate lists?

Scott - I think that the main issue is the upfront grouping of all columns within a row, rather than the faceting... Thanks.

-- View this message in context: http://r.789695.n4.nabble.com/Group-rows-by-common-ID-and-plot-tp3321955p3325121.html
[R] Help with card-sorting experiment
This is the first time that I've posted to this list, so if I'm doing something wrong, please let me know. Also, if there is a searchable forum where I can find help, that would be good too.

I'm doing a card sorting experiment, and I'm having problems inputting my data into R for later analysis. This is what I get from each sorter:

Category Name   Card numbers
Group 1:        1,3,5,7,9
Group 2:        2,4,6,8,10

I would like to use the adjusted Rand index to compare each sorter. As an input the ARI needs a vector like this:

Group 1 Group 2 Group 1 Group 2 Group 1 Group 2 Group 1 Group 2 Group 1 Group 2

or even:

1 2 1 2 1 2 1 2 1 2

(Note: I am using the function adjustedRandIndex from the library mclust.)

I also need to make a similarity matrix so that I can do cluster analysis on the reviewers (which I am able to do). I would like to be able to load in several of these vectors at once so that I can create a big matrix with all of the data in it, and I currently do this using read.table().

PROBLEM 1) When I attempt to do the Adjusted Rand Index calculation I do this:

reviewer1 <- read.table(filename, sep=",")   (they are csv files)

Then I try to make a big matrix by doing this:

rset[1] <- reviewer1
rset[2] <- reviewer2

So when I try to do adjustedRandIndex(rset[1],rset[2]) I get an error message:

Error in FUN(X, Y, ...) : comparison of these types is not implemented

And if I do this:

rset[1]
[[1]]
 [1] set1 set2 set1 set2 set1 set2 set1 set2 set1 set2
Levels: set1 set2

PROBLEM 2) Some of my reviewers have put single cards into two piles (which was allowed). However, this routine for the Adjusted Rand Index doesn't seem to be able to handle that sort of category as an input. Is that a problem for the Adjusted Rand Index in general? Is there a routine that can find the Adjusted Rand Index for a different input?

Thanks,
-Steve Wolf
MSU--Lyman Briggs College
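A possible reading of PROBLEM 1: the printed `[[1]]` shows that rset is a list, so rset[1] is itself a one-element list rather than the label vector, and the comparison inside adjustedRandIndex() likely fails on that; rset[[1]] extracts the vector. The sketch below also shows one way to turn the per-sorter pile listing into a label vector. The objects sorter1, sorter2 and the helper piles_to_labels() are hypothetical, and the helper assumes every card sits in exactly one pile, so it does not address PROBLEM 2.

```r
# Hypothetical input: each sorter's piles as a named list of card numbers.
sorter1 <- list("Group 1" = c(1, 3, 5, 7, 9), "Group 2" = c(2, 4, 6, 8, 10))
sorter2 <- list(low_odd = c(1, 3, 5), rest = c(2, 4, 6, 7, 8, 9, 10))

# Hypothetical helper: one pile label per card, in card order.
# Assumes each card appears in exactly one pile.
piles_to_labels <- function(piles, n_cards = 10) {
  labels <- rep(NA_character_, n_cards)
  for (nm in names(piles)) labels[piles[[nm]]] <- nm
  labels
}

rset <- list(piles_to_labels(sorter1), piles_to_labels(sorter2))

# Note the double brackets: rset[[1]] is the vector, rset[1] is a list.
if (requireNamespace("mclust", quietly = TRUE)) {
  print(mclust::adjustedRandIndex(rset[[1]], rset[[2]]))
}
```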
Re: [R] Interactive/Dynamic plots with R
What types of interaction do you want?

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Abhishek Pratap
Sent: Friday, February 25, 2011 12:37 AM
To: r-help@r-project.org
Subject: [R] Interactive/Dynamic plots with R

Hi Guys

In order to look at a dense plot I would like to be able to plot it dynamically/interactively. Before I try rgobi, which I heard can help me, I would like to get your opinion.

Thanks!
-Abhi
Re: [R] Error
Does it work for FUN=mean? If yes, you need to print out the results of f before you return them to find the anomalous value.

BTW, "Error" is not a very good subject line. I don't see many posts from people reporting how well things are going :)

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of mathijsdevaan
Sent: Friday, February 25, 2011 9:31 AM
To: r-help@r-project.org
Subject: [R] Error

Hi, I am running the following script for a different (much larger) data frame:

DF = data.frame(read.table(textConnection("    A B    C D E
1  1 a 1999 1 0
2  1 b 1999 0 1
3  1 c 1999 0 1
4  1 d 1999 1 0
5  2 c 2001 1 0
6  2 d 2001 0 1
7  3 a 2004 0 1
8  3 b 2004 0 1
9  3 d 2004 0 1
10 4 b 2001 1 0
11 4 c 2001 1 0
12 4 d 2001 0 1"),head=TRUE,stringsAsFactors=FALSE))
DF<-DF[order(DF$B,DF$C),] #order by developer_id and year
f<- function(x) { unlist(lapply(x, FUN = function(z) cumsum(z) - z)) }
DF<-cbind(DF[,c(1:3)],ave(DF[, c(4:5)],DF$B, FUN = f))

I get the following error:

Error in `[<-.data.frame`(`*tmp*`, i, , value = integer(0)) :
  replacement has 0 items, need 37597770
In addition: Warning message:
In max(i) : no non-missing arguments to max; returning -Inf

The dimensions of the data frame are (5,108), so the last line of the script becomes:

DF<-cbind(DF[,c(1:3)],ave(DF[, c(4:108)],DF$B, FUN = f))

Any idea how to solve this problem? Thanks!
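As a hedged aside for readers, one way to make the per-group computation easier to inspect is to apply ave() to one column at a time, so that each call receives a plain vector instead of a data frame. This is an alternative formulation and not a diagnosis of the error reported above; it is shown on the small example data only, and the helper name `prior_total` is made up here.

```r
DF <- read.table(text = "A B C D E
1 1 a 1999 1 0
2 1 b 1999 0 1
3 1 c 1999 0 1
4 1 d 1999 1 0
5 2 c 2001 1 0
6 2 d 2001 0 1
7 3 a 2004 0 1
8 3 b 2004 0 1
9 3 d 2004 0 1
10 4 b 2001 1 0
11 4 c 2001 1 0
12 4 d 2001 0 1", header = TRUE, stringsAsFactors = FALSE)

DF <- DF[order(DF$B, DF$C), ]   # order by developer_id and year

# Running total within each group, excluding the current row.
prior_total <- function(z) cumsum(z) - z

# Column-wise: each ave() call works on a single vector grouped by B.
DF[4:5] <- lapply(DF[4:5], function(col) ave(col, DF$B, FUN = prior_total))
print(DF)
```

For the larger data frame the column index 4:5 would become 4:108, assuming the same layout as in the original script.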
[R] lme in loop help
Dear R users,

I am a new R user. Excuse me for bothering you, but I have worked hard to find a solution:

# data
ID <- c(1:100)
set.seed(21)
y <- rnorm(100, 10,2)
x1 <- rnorm(100, 10,2)
x2 <- rnorm(100, 10,2)
x3 <- rnorm(100, 10,2)
x4 <- rnorm(100, 10,2)
x5 <- rnorm(100, 10,2)
x6 <- rnorm(100, 10,2)
mydf <- data.frame(ID,y, x1,x2, x3, x4, x5, x6)

# just separate analysis
require(nlme)
mod1 <- lme(fixed= y ~ x1 + x2, random = ~ 1 | ID) # I want to put subject / ID as random; is this correct?
m1 <- anova(mod1)
m1[1,4] # I am not getting the exact value below .0001; how do I get it?
m1[2,4]
m1[3,4]

# putting in a loop
for(i in length(mydf)){
mylme <- NULL
print(m1[i+1,] <- lme(fixed= mydf$y ~ mydf$x[,i] + mydf$x[,i+1], random = ~ 1 | ID))}

I could not get this to work. However, this is the output I have in mind, a data frame with the following columns:

model p-intercept variable1 p-value1 variable2 p-value2
1     <.0001      x1        0.9452   x2        0.5455
2     <.0001      x3        0.3301   x4        0.9905
3     <.0001      x5        0.9971   x6        0.0487

I need a solution as I have tons of variables to work on. I am showing these six just as an example.

Thank you in advance;
ram sharma
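A minimal sketch of the looping idea, under stated assumptions: the formula for each pair of predictors is built from the column names, each model is fitted with lme(), and the p-values are read from the "p-value" column of anova(). Because the posted data have one observation per ID, the sketch swaps in a hypothetical grouping factor `grp` (10 groups of 10, with a simulated group effect) so that a random intercept is estimable. The names grp, fit_pair and results are not from the original post.

```r
library(nlme)

set.seed(21)
n   <- 100
grp <- factor(rep(1:10, each = 10))               # hypothetical grouping factor
mydf <- data.frame(grp = grp,
                   y = rnorm(n, 10, 2) + rnorm(10)[as.integer(grp)])
xvars <- paste0("x", 1:6)
for (v in xvars) mydf[[v]] <- rnorm(n, 10, 2)

# Fit y ~ x_i + x_(i+1) with a random intercept and collect the p-values.
fit_pair <- function(i) {
  v1  <- xvars[i]
  v2  <- xvars[i + 1]
  fml <- reformulate(c(v1, v2), response = "y")   # e.g. y ~ x1 + x2
  # bquote() places the formula itself in the call instead of the name fml.
  fit <- eval(bquote(lme(fixed = .(fml), random = ~ 1 | grp, data = mydf)))
  p   <- anova(fit)[["p-value"]]                  # intercept, v1, v2
  data.frame(model = (i + 1) / 2, p_intercept = p[1],
             variable1 = v1, p_value1 = p[2],
             variable2 = v2, p_value2 = p[3])
}

starts  <- seq(1, length(xvars) - 1, by = 2)      # 1, 3, 5
results <- do.call(rbind, lapply(starts, fit_pair))
print(results)
```

The stored p-values are ordinary numbers, so very small ones can be displayed in full with, for example, format(results$p_intercept, scientific = TRUE) instead of the "<.0001" shown by the anova print method.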
Re: [R] R in different OS
Hi Hui,

Maybe sessionInfo() is what you are looking for. See ?sessionInfo as well as ?version for more details. You can run the following in your R session and see what comes up:

sessionInfo()
sessionInfo()$R.version$platform
version$platform

Then, you might use ifelse() to set up the right path.

HTH,
Jorge

On Fri, Feb 25, 2011 at 1:23 PM, Hui Du wrote:

Hi All, I have two Rs, one has been installed in Windows system and another one has been installed under UNIX system. Is there any environmental variable or function to tell me which R I am using? The reason that I need to know it is under different system, the data path could be different. I want to do something like

if it is R under Windows
  path = "/ABC"
else if it is R under UNIX,
  path = "/DEF"

Any idea? Thanks.

Best Regards,
HXD
Re: [R] R in different OS
On Feb 25, 2011, at 12:23 PM, Hui Du wrote:

Hi All, I have two Rs, one has been installed in Windows system and another one has been installed under UNIX system. Is there any environmental variable or function to tell me which R I am using? The reason that I need to know it is under different system, the data path could be different. I want to do something like

if it is R under Windows
  path = "/ABC"
else if it is R under UNIX,
  path = "/DEF"

Any idea? Thanks.

Best Regards,
HXD

See ?.Platform, more specifically, on Unixen (e.g. Linux, OSX):

.Platform$OS.type
[1] "unix"

and on Windows, it will be "windows". If needed, look at the additional functions listed in the See Also section of the help page (e.g. ?Sys.info, etc.).

HTH,
Marc Schwartz
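Put together with the pseudocode in the question, the .Platform suggestion could be sketched as follows. The paths "/ABC" and "/DEF" are the placeholders from the original post, and the variable name `data_path` is made up for this illustration.

```r
# .Platform$OS.type is "windows" on Windows and "unix" on Linux and OS X.
data_path <- switch(.Platform$OS.type,
                    windows = "/ABC",
                    unix    = "/DEF",
                    stop("unrecognised OS type: ", .Platform$OS.type))
print(data_path)
```

Using switch() with a final stop() means an unexpected value fails loudly instead of silently picking one of the two paths.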
Re: [R] R in different OS
Hi, see ?R.version

Something like

if(version$os == "mingw32") { path = "/ABC" } else { path = "/DEF" }

might do it, but I'm not sure exactly what possible values version$os can take or what determines the value exactly.

Best,
Ista

On Fri, Feb 25, 2011 at 1:23 PM, Hui Du hui...@dataventures.com wrote:

Hi All, I have two Rs, one has been installed in Windows system and another one has been installed under UNIX system. Is there any environmental variable or function to tell me which R I am using? The reason that I need to know it is under different system, the data path could be different. I want to do something like

if it is R under Windows
  path = "/ABC"
else if it is R under UNIX,
  path = "/DEF"

Any idea? Thanks.

Best Regards,
HXD

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
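Since the exact values of version$os are left open above, a slightly more tolerant variant is to match on a substring instead of testing for equality. This is only a sketch: it assumes that Windows builds report an os string containing "mingw" (as in the "mingw32" value used above), while Linux and OS X builds report strings such as "linux-gnu" or one starting with "darwin". The name `is_windows` is hypothetical.

```r
# Substring match on the os field; assumes Windows builds contain "mingw".
is_windows <- grepl("mingw", R.version$os, fixed = TRUE)
path <- if (is_windows) "/ABC" else "/DEF"
print(path)
```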
Re: [R] R in different OS
On Feb 25, 2011, at 1:23 PM, Hui Du wrote:

Hi All, I have two Rs, one has been installed in Windows system and another one has been installed under UNIX system. Is there any environmental variable or function to tell me which R I am using? The reason that I need to know it is under different system, the data path could be different. I want to do something like

if it is R under Windows
  path = "/ABC"
else if it is R under UNIX,
  path = "/DEF"

?version

--
David Winsemius, MD
West Hartford, CT