Re: [R] What is the best way to lag a time series?
On Sun, Dec 26, 2010 at 8:49 AM, Christian Schoder schoc...@newschool.edu wrote:
> Dear R-users,
> I've been using R for a while and I am very satisfied! Unfortunately, I still have not figured out an efficient and general way to construct and use lags of time series, especially when I need to work with different packages. Let me give an example. I have two time series x and y and I want to estimate a variety of distributed lag models and run different tests (autocorrelation, etc.). Obviously I need to be able to lag x and y in a flexible way. So far, my temporary solution has been to construct the lags manually (x1,..,xn and y1,..,yn) in a spreadsheet and import them into R, which is not very satisfactory because it does not allow for much flexibility. Is there a straightforward command which allows me to easily construct a lag

Perhaps ?diff.
Liviu

> when required and which allows me to, for example, use the lm() command to fit a dynamic model and the bgtest() command to perform the Breusch-Godfrey test on the same model? Is it advisable to use time series objects which consist of many time series (like a data frame), or is it better to have each object contain only one time series? I would be grateful for any hints and links.
> Thx! Christian

--
Do you know how to read? http://www.alienetworks.com/srtest.cfm http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write? http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail
__
R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] object names from character strings
I realize this is probably pretty basic but I can't figure it out. I'm looping through an array, doing various calculations and producing a resulting data frame in each loop iteration. I need to give each data frame a different name. Although I can easily create a new character string for writing each frame to an output file, I cannot figure out how to convert such strings to corresponding object names within the R workspace itself, so as to give each d.f. a distinct name. The closest I got were various attempts with the as.name function, but couldn't get that to work either. Any help appreciated. Thanks.
--
Jim Bouldin, PhD
Research Ecologist
Re: [R] What is the best way to lag a time series?
First off, there are data manipulation techniques that will beat doing it in a spreadsheet. For example, head(x, -1) is lagged by 1 relative to tail(x, -1). But I think you are really looking for 'Lag' in the 'quantmod' package.

On 26/12/2010 07:49, Christian Schoder wrote:
> Is there a straightforward command which allows me to easily construct a lag when required and which allows me to, for example, use the lm() command to fit a dynamic model and the bgtest() command to perform the Breusch-Godfrey test on the same model?

--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner' and 'The R Inferno')
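The head()/tail() idea above can be sketched in base R. This is only an illustration under stated assumptions: the toy vector, the simulated series xs and ys, and the coefficients 2 and -1 are made up for the example, and embed() from the stats package is used here as one base-R way to build several lags at once (it is not something the thread itself recommends).

```r
# Toy series: pair each value with its predecessor.
x <- c(5, 7, 9, 11, 13)
x_lag1 <- head(x, -1)   # x[t-1] for t = 2..5
x_now  <- tail(x, -1)   # x[t]   for t = 2..5

# embed() builds a matrix of lags: column 1 is x[t], column 2 is x[t-1], ...
lags <- embed(x, 3)
colnames(lags) <- c("x_t", "x_t1", "x_t2")

# A small distributed-lag regression on simulated data (illustrative only).
set.seed(1)
xs <- rnorm(100)
X  <- embed(xs, 2)                       # rows correspond to t = 2..100
d  <- data.frame(x_t = X[, 1], x_t1 = X[, 2])
d$ys <- 1 + 2 * d$x_t - 1 * d$x_t1 + rnorm(nrow(d), sd = 0.1)
fit <- lm(ys ~ x_t + x_t1, data = d)
print(coef(fit))
```

Because the lags end up as ordinary columns of a data frame, the fitted object is a plain lm fit, which is the kind of object that tests operating on lm fits expect.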
Re: [R] How to specify ff object filepaths when reading a CSV file into a ff data frame.
Hi,

I have done another simple test: I tried the two syntaxes against a CSV file with only one column, and both succeed.

  > fdf <- read.csv.ffdf(file="D:/rtemp/fftest2.csv", asffdf_args = list(col_args = list(filename=c("F:/a.f"))))
  > fdf
  ffdf (all open) dim=c(2,1), dimorder=c(1,2) row.names=NULL
  ffdf virtual mapping
       PhysicalName VirtualVmode PhysicalVmode  AsIs VirtualIsMatrix PhysicalIsMatrix PhysicalElementNo PhysicalFirstCol PhysicalLastCol PhysicalIsOpen
  col1         col1      integer       integer FALSE           FALSE            FALSE                 1                1               1           TRUE
  ffdf data
    col1
  1    1
  2    2

  > fdf <- read.csv.ffdf(file="D:/rtemp/fftest2.csv", asffdf_args = list(col_args = c(list(filename="D:/a2.f"))))
  > fdf
  ffdf (all open) dim=c(2,1), dimorder=c(1,2) row.names=NULL
  ffdf virtual mapping
       PhysicalName VirtualVmode PhysicalVmode  AsIs VirtualIsMatrix PhysicalIsMatrix PhysicalElementNo PhysicalFirstCol PhysicalLastCol PhysicalIsOpen
  col1         col1      integer       integer FALSE           FALSE            FALSE                 1                1               1           TRUE
  ffdf data
    col1
  1    1
  2    2

Regards,
Xiaobo Gu

On Fri, Dec 24, 2010 at 11:27 PM, Xiaobo Gu guxiaobo1...@gmail.com wrote:
> Hi,
> The read.csv.ffdf function in package ff creates the physical files for the ff objects in the default directories. I am trying to have the files created in paths that the user specifies, and I think the point is to make use of the asffdf_args parameter. I have a test CSV file named D:\rtemp\fftest.csv with the following content:
>
>   col1,col2,col3
>   1,amber,2.4
>   2,linda,4.5
>
> I tried the following code, hoping ff would create the physical files for col1, col2 and col3 as D:/a.f, D:/b.f and D:/c.f respectively:
>
>   fdf <- read.csv.ffdf(file="D:/rtemp/fftest.csv", asffdf_args = list(col_args = c(list(filename="D:/a.f"), list(filename="D:/b.f"), list(filename="D:/c.f"))))
>
> and the error message is:
>
>   Error in as.ff.default(1:2, vmode = NULL, filename = "D:/a.f", filename = "D:/b.f", : formal argument "filename" matched by multiple actual arguments
>
> I also tried the following:
>
>   fdf <- read.csv.ffdf(file="D:/rtemp/fftest.csv", asffdf_args = list(col_args = list(filename=c("D:/a.f","D:/b.f","D:/c.f"))))
>
>   Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered, : bad argument initdata for existing file; initializing existing file is invalid
>   In addition: Warning messages:
>   1: In if (file.exists(filename)) { : the condition has length > 1 and only the first element will be used
>   2: In if (file.exists(filename)) { : the condition has length > 1 and only the first element will be used
>   3: In if (file.access(filename, 4) == -1) { : the condition has length > 1 and only the first element will be used
>   4: In if (file.access(filename, 2) == -1) { : the condition has length > 1 and only the first element will be used
>   5: In if (is.na(filesize)) stop("unable to open file") : the condition has length > 1 and only the first element will be used
>
> My questions are:
> 1. What is the data type of the col_args parameter of the as.ffdf function?
> 2. If I can get the layout of the asffdf_args parameter right, how can I set the exact filename for each column of the ff data frame?
>
> Regards,
> Xiaobo Gu
[R] R2WinBugs data import error
For a particular purpose, I need to transfer an array of NAs to WinBUGS through R2WinBUGS, but I keep getting the error message: 'type' must be real for this format.

Here is the data I want to transfer:

  x = matrix(data=NA,nrow=3,ncol=3)
  x = as.array(x)
  data <- list(x)

If I add the following line to the setup above, the call through R2WinBUGS succeeds:

  x[1,1] = 0

If I manually enter the NA array in WinBUGS, it runs, so the data set itself is not a problem for WinBUGS.
[R] Fitting mixtures with non-linear parameters constraints
Dear R users,

Does anyone happen to know a function to fit a Gaussian mixture using *non-linear* constraints between the parameters? (An EM implementation that allows this would obviously do the job.)

Thank you in advance
--
Jonathan Rosenblatt
www.john-ros.com
[R] lattice splom: how to adjust space between tick marks and tick labels?
Dear expeRts,

how can I decrease the space between the tick marks and the corresponding labels in an splom? See here:

  library(lattice)
  U <- matrix(runif(4000), ncol = 8)
  splom(U, axis.text.cex = 0.2)
  # => the space between the [small] tick labels and the tick marks is/seems to be too large

I checked ?panel.pairs but could not find an option for that.

Cheers,
Marius
[R] Doing a mixed-ANOVA after accounting for a covariate
Dear r helpers,

I would like to look at the interaction between two two-level factors, one between and one within participants, after accounting for any variance due to practice (31 trials in each of two blocks) in the task. It seems to require treating practice as a covariate. All the examples I noticed for handling covariates (i.e. ANCOVA, including the ones in Faraway's "Practical Regression and Anova using R") use lm(), but this doesn't handle repeated measures.

I thought of a solution in the form of first running a regression on the covariate:

  cov.accnt = lm (myMeasure ~ myCovMeasure, data=dat)

and then running aov() on the residuals:

  m.aov = aov (cov.accnt$residuals ~ withinVar*betweenVar + Error(subj/withinVar), data=dat)

Does this seem to be a valid answer to my problem? Is there an existing function that can do this (perhaps more appropriately)?

Thank you for any help,
dror
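To make the syntax of the two-step idea above concrete, here is a minimal sketch on simulated data. Everything about the data is an assumption made for illustration (8 subjects, 4 trials per condition, the trial number standing in for the practice covariate); only the variable names myMeasure, myCovMeasure, withinVar, betweenVar and subj come from the post. It shows that the calls run once Error() is closed before data=; it says nothing about whether the approach is statistically valid, which is the question being asked.

```r
set.seed(123)
n_subj <- 8

# Hypothetical design: each subject sees both levels of withinVar, 4 trials each.
dat <- expand.grid(trial = 1:4,
                   withinVar = factor(c("a", "b")),
                   subj = factor(1:n_subj))
# First half of the subjects in group g1, second half in g2 (between factor).
dat$betweenVar <- factor(ifelse(as.integer(dat$subj) <= n_subj / 2, "g1", "g2"))
dat$myCovMeasure <- dat$trial                       # stand-in for practice
dat$myMeasure <- 10 + 0.5 * dat$myCovMeasure + rnorm(nrow(dat))

# Step 1: regress the measure on the covariate.
cov.accnt <- lm(myMeasure ~ myCovMeasure, data = dat)
dat$resid <- residuals(cov.accnt)

# Step 2: repeated-measures aov on the residuals.
m.aov <- aov(resid ~ withinVar * betweenVar + Error(subj / withinVar), data = dat)
print(summary(m.aov))
```

Storing the residuals as a column of dat, rather than referring to cov.accnt$residuals inside the formula, keeps all variables in one data frame so that data= applies to every term.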
[R] how to replace my double for loop which is little efficient!
Dear all,

My double for loop is shown below, but it is not very efficient. I hope someone can give me a vectorized program to replace my code. Thanks.

x is a 202*263 matrix, that is, 202 samples and 263 independent variables.

  num.compd <- nrow(x)   # number of compounds
  diss.all <- 0
  for (i in 1:num.compd)
    for (j in 1:num.compd)
      if (i != j) {
        S1 <- sum(x[i,]*x[j,])
        S2 <- sum(x[i,]^2)
        S3 <- sum(x[j,]^2)
        sim2 <- S1/(S2+S3-S1)
        diss2 <- 1-sim2
        diss.all <- diss.all+diss2
      }

This computation takes a long time to finish! I really need faster code to replace mine.

Thanks,
kevin
Re: [R] R2WinBugs data import error
On Dec 26, 2010, at 12:44 AM, unsown wrote:
> For a particular purpose, I need to transfer an array of NAs to WinBUGS through R2WinBUGS, but I keep getting the error message: 'type' must be real for this format. Here is the data I want to transfer:
>
>   x = matrix(data=NA,nrow=3,ncol=3)

  str(x)

It is of mode logical. Try instead:

  x = matrix(vector(mode="numeric", 0), nrow=3, ncol=3)

>   x = as.array(x)
>   data <- list(x)

Why are you making a list with a single element? If you need to pass the matrix you just created in a list, then try the following (and don't use "data" as the name):

  dat <- list(x)

> If I add the following line to the setup above, the call through R2WinBUGS succeeds:
>
>   x[1,1] = 0
>
> If I manually enter the NA array in WinBUGS, it runs, so the data set itself is not a problem for WinBUGS.

--
David Winsemius, MD
West Hartford, CT
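The type difference described above can be checked directly in base R. This is a small sketch, not a tested R2WinBUGS workflow: it only shows that matrix(NA, ...) is logical while NA_real_ gives a numeric matrix of missing values, and that assigning a number into a logical matrix promotes it. The object names are made up for the example.

```r
# A matrix filled with plain NA has type logical.
x_logical <- matrix(NA, nrow = 3, ncol = 3)

# NA_real_ is the numeric (double) missing value.
x_numeric <- matrix(NA_real_, nrow = 3, ncol = 3)

# An alternative: convert an existing logical NA matrix in place.
x_converted <- x_logical
storage.mode(x_converted) <- "double"

# Assigning a number coerces the whole matrix to double,
# which is why adding x[1,1] = 0 made the original call work.
x_assigned <- x_logical
x_assigned[1, 1] <- 0

print(c(typeof(x_logical), typeof(x_numeric),
        typeof(x_converted), typeof(x_assigned)))
```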
Re: [R] object names from character strings
On Dec 26, 2010, at 4:04 AM, Jim Bouldin wrote:
> I realize this is probably pretty basic but I can't figure it out. I'm looping through an array, doing various calculations and producing a resulting data frame in each loop iteration. I need to give each data frame a different name. [...] The closest I got were various attempts with the as.name function, but couldn't get that to work either. Any help appreciated. Thanks.

Here's the first example in the help(assign) page:

  for(i in 1:6) { #-- Create objects 'r.1', 'r.2', ... 'r.6'
    nam <- paste("r", i, sep = ".")
    assign(nam, 1:i)
  }
  ls(pattern = "^r..$")

--
David Winsemius, MD
West Hartford, CT
Re: [R] object names from character strings
Consider storing the data frames in a list, so that you do not have to create unique object names. It will also give you better control by keeping all the data together in one object.

On Sun, Dec 26, 2010 at 4:04 AM, Jim Bouldin bouldi...@gmail.com wrote:
> I'm looping through an array, doing various calculations and producing a resulting data frame in each loop iteration. I need to give each data frame a different name. [...]

--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
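The list approach suggested above might look like the following sketch. The loop body, the names df_1 to df_3 and the columns id and value are placeholders invented for the example; the point is only that character strings become list element names rather than workspace object names.

```r
results <- list()

for (i in 1:3) {
  # Stand-in for the real per-iteration calculation.
  df <- data.frame(id = i, value = i^2)

  # Build the name as a character string and use it as the list element name.
  nm <- paste("df", i, sep = "_")
  results[[nm]] <- df
}

print(names(results))
print(results[["df_2"]])

# Working on all data frames at once is then straightforward.
all_rows <- do.call(rbind, results)
print(all_rows)
```

The same strings can still be used to build output file names, so nothing is lost relative to creating separate objects with assign().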
Re: [R] lattice splom: how to adjust space between tick marks and tick labels?
On Dec 26, 2010, at 5:41 AM, Marius Hofert wrote:
> Dear expeRts,
> how can I decrease the space between the tick marks and the corresponding labels in an splom? See here:
>
>   library(lattice)
>   U <- matrix(runif(4000), ncol = 8)
>   splom(U, axis.text.cex = 0.2)
>   # => the space between the [small] tick labels and the tick marks is/seems to be too large

So you want more tick marks?

> I checked ?panel.pairs but could not find an option for that.

What about the pscales argument? A single number would increase the number of ticks, or a list with at and labels values can be passed. It seems to be just what you asked for.

--
David Winsemius, MD
West Hartford, CT
[R] T2 hoteling
Dear All,

I would be grateful for your guidance. When I run this line, I get the following error:

  stat.obs <- apply(GS, 2, function(z) Hott2(t(DATA[which(z==1),]), cl))
  Error in colSums(w * x) : 'x' must be an array of at least two dimensions

where:

  cl <- as.factor(y)
  GS: gene sets, a 0/1 data matrix with rows = genes and columns = gene sets; GS[i,j]=1 if gene i is in gene set j, GS[i,j]=0 otherwise
  Hott2 <- function(x, y, var.equal=TRUE)   # Hotelling's T2
  Y <- c(1,0,0,0,0,0,1,1,0,0,1,0,1,1,1,1,0,1,0,1)
  Data = transpose(X) = gene expression: rows = 40 genes, columns = 10 samples

The data are in the attached file.

Thanks a lot
Re: [R] Doing a mixed-ANOVA after accounting for a covariate
On Dec 26, 2010, at 7:42 AM, Dror D Lev wrote:
> Dear r helpers,
> I would like to look at the interaction between two two-level factors, one between and one within participants, after accounting for any variance due to practice (31 trials in each of two blocks) in the task. It seems to require treating practice as a covariate. All the examples I noticed for handling covariates (i.e. ANCOVA, including the ones in Faraway's "Practical Regression and Anova using R") use lm(), but this doesn't handle repeated measures.

See if Dalgaard's piece in R-News offers better guidance: http://www.r-project.org/doc/Rnews/Rnews_2007-2.pdf

> I thought of a solution in the form of first running a regression on the covariate [...] and then running aov() on the residuals [...]. Does this seem to be a valid answer to my problem? Is there an existing function that can do this (perhaps more appropriately)?

--
David Winsemius, MD
West Hartford, CT
Re: [R] how to replace my double for loop which is little efficient!
bbslover wrote:
> x is a 202*263 matrix, that is, 202 samples and 263 independent variables.
>
>   num.compd <- nrow(x)   # number of compounds
>   diss.all <- 0
>   for (i in 1:num.compd)
>     for (j in 1:num.compd)
>       if (i != j) {
>         S1 <- sum(x[i,]*x[j,])
>         S2 <- sum(x[i,]^2)
>         S3 <- sum(x[j,]^2)
>         sim2 <- S1/(S2+S3-S1)
>         diss2 <- 1-sim2
>         diss.all <- diss.all+diss2
>       }
>
> This computation takes a long time to finish! I really need faster code to replace mine.

Alternative 1: the j-loop only needs to start at i+1, because the dissimilarity is symmetric in i and j. So:

  diss2.all <- 0
  for (i in 1:num.compd) {
    for (j in seq(from=i+1, to=num.compd, length.out=max(0, num.compd-i))) {
      S1 <- sum(x[i,]*x[j,])
      S2 <- sum(x[i,]^2)
      S3 <- sum(x[j,]^2)
      sim2 <- S1/(S2+S3-S1)
      diss2 <- 1-sim2
      diss2.all <- diss2.all+diss2
    }
  }
  diss2.all <- 2 * diss2.all

On my PC this is about twice as fast as your version (with 202 samples and 263 variables).

Alternative 2: the sum() calls are not necessary. Use some matrix algebra:

  xtx <- x %*% t(x)
  diss3.all <- 0
  for (i in 1:num.compd) {
    for (j in seq(from=i+1, to=num.compd, length.out=max(0, num.compd-i))) {
      S1 <- xtx[i,j]
      S2 <- xtx[i,i]
      S3 <- xtx[j,j]
      sim2 <- S1/(S2+S3-S1)
      diss2 <- 1-sim2
      diss3.all <- diss3.all+diss2
    }
  }
  diss3.all <- 2 * diss3.all

This is about four times as fast as alternative 1. I'm quite sure that more expert R gurus can get some more speed-up.

Note: I generated the x matrix with:

  set.seed(1); x <- matrix(runif(202*263), nrow=202)

(Timings on an iMac 2.16GHz using 64-bit R.)

Berend
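Alternative 2 can be taken one step further by removing both loops. The following is a sketch, not something posted in the thread: the function name total_dissimilarity is invented here, and it relies only on base R's tcrossprod(), diag() and outer(). It computes the same quantity as the original double loop (the sum over all ordered pairs i != j).

```r
# Sum of pairwise dissimilarities 1 - S1/(S2 + S3 - S1) over all i != j,
# where S1 = sum(x[i,]*x[j,]), S2 = sum(x[i,]^2), S3 = sum(x[j,]^2).
total_dissimilarity <- function(x) {
  xtx  <- tcrossprod(x)                    # xtx[i, j] = sum(x[i, ] * x[j, ])
  sq   <- diag(xtx)                        # sq[i] = sum(x[i, ]^2)
  sim  <- xtx / (outer(sq, sq, "+") - xtx) # S1 / (S2 + S3 - S1) for every pair
  diss <- 1 - sim
  diag(diss) <- 0                          # exclude the i == j terms
  sum(diss)
}

# Same test matrix as in the message above.
set.seed(1)
x <- matrix(runif(202 * 263), nrow = 202)
diss.all <- total_dissimilarity(x)
print(diss.all)
```

The trade-off is memory: the full 202 x 202 matrices are held at once, which is negligible here but grows quadratically with the number of samples.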
Re: [R] What is the best way to lag a time series?
The correct answer to "How to lag..?" is almost certainly "Don't." The functionality of numerous time series packages and functions takes care of this automatically for you (using suitable data structures, probably). Rather than trying to reinvent wheels, it might be wiser to consult the Time Series Task View on CRAN to see what's there first.

Incidentally, my limited understanding is that modern time series methods tend to use more appropriately specified covariance structures (e.g. ARIMA models) rather than the lagged models of, e.g., classical econometrics. But on this, I would happily stand corrected.

-- Cheers, Bert

On Sun, Dec 26, 2010 at 12:21 AM, Liviu Andronic landronim...@gmail.com wrote:
> On Sun, Dec 26, 2010 at 8:49 AM, Christian Schoder schoc...@newschool.edu wrote:
>> Is there a straightforward command which allows me to easily construct a lag
>
> Perhaps ?diff.
> Liviu

--
Bert Gunter
Genentech Nonclinical Biostatistics
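As one illustration of a modelling function that handles the lag structure itself, here is a minimal sketch using arima() from base R's stats package. The simulated AR(1) series, its coefficient of 0.6 and the seed are assumptions made for the example; the thread does not prescribe this particular model.

```r
set.seed(2010)

# Simulate an AR(1) process; no lagged columns are constructed by hand.
y <- arima.sim(model = list(ar = 0.6), n = 500)

# arima() builds the dependence on y[t-1] internally from the order argument.
fit <- arima(y, order = c(1, 0, 0))
ar1 <- unname(coef(fit)["ar1"])

print(fit)
```

The order argument c(1, 0, 0) requests one autoregressive lag, no differencing and no moving-average terms, so changing the lag length is a one-number edit rather than a new set of spreadsheet columns.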
[R] Calculation of BIC done by leaps-package
Hi Folks,

I've got a question concerning the calculation of the Schwarz criterion (BIC) done by summary.regsubsets() of the leaps package.

Using regsubsets() to perform subset selection, I receive a regsubsets object that can be summarized by summary.regsubsets(). The resulting summary contains a vector of BIC values representing models of size i=1,...,K. My problem is that I can't reproduce the calculation of these BIC values. I already tried extractAIC(..., k=log(n)), AIC(..., k=log(n)) and a manual calculation using the RSS vector, but none matches the values from the summary function. I checked whether a constant could explain the differences, but found that the values differ by more than an additive constant.

The source code of the leaps package computes the BIC this way:

  bicvec <- c(bicvec, (n1+ll$intercept)*log(vr) + i*log(n1+ll$intercept))

with:

  ## number of observations minus intercept:
  n1 <- ll$nn - ll$intercept
  ## ratio of the sum of squared residuals of model i
  ## to the sum of squared residuals of the null model
  ## (I just can't understand why the vector ll$ress is doubly subscripted)
  vr <- ll$ress[i,j]/ll$nullrss
  ## i: maximum number of variables

This seems to match the calculation done by extractAIC, but it doesn't! Maybe someone can tell me the reason for the differences in the BIC values?

Best regards,
Jan Henckens

### Minimal example:

  require(leaps)
  bridge <- read.table("http://www.stat.tamu.edu/~sheather/book/docs/datasets/bridge.txt", header=TRUE)
  fmla.full <- formula(Time ~ .)
  (lm.model <- summary(regsubsets(fmla.full, data=bridge, weights=NULL, intercept=TRUE, method="forward")))
  lm.model$bic

  ### The first two models constructed via lm():
  extractAIC(lm(Time~Dwgs, data=bridge), k=log(nrow(bridge)))
  extractAIC(lm(Time~Dwgs+Case, data=bridge), k=log(nrow(bridge)))

or see http://www.henckens.de/min_example.R

--
jan.henckens | jöllenbecker str. 58 | 33613 bielefeld | germany
tel 0521-5251970
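For what it is worth, the formula quoted from the leaps source can be compared with extractAIC() by hand, without calling leaps at all. The sketch below rests on several assumptions: it uses the built-in mtcars data rather than the bridge data, it picks three nested models arbitrarily, and it counts p as the number of estimated coefficients including the intercept (whether leaps counts the intercept in i is not checked here; a different convention only shifts every value by the same log(n)). Under those assumptions the two criteria differ by one constant, n*log(RSS_null/n), for every model compared.

```r
n <- nrow(mtcars)
null_rss <- deviance(lm(mpg ~ 1, data = mtcars))   # RSS of the intercept-only model

fits <- list(
  lm(mpg ~ wt, data = mtcars),
  lm(mpg ~ wt + hp, data = mtcars),
  lm(mpg ~ wt + hp + qsec, data = mtcars)
)

# BIC in the form quoted above: n * log(RSS_i / RSS_null) + p * log(n).
bic_ratio <- sapply(fits, function(f) {
  n * log(deviance(f) / null_rss) + length(coef(f)) * log(n)
})

# extractAIC() for lm uses n * log(RSS_i / n) + k * edf; k = log(n) gives BIC.
bic_extract <- sapply(fits, function(f) extractAIC(f, k = log(n))[2])

# The difference is the same for every model.
offset <- bic_extract - bic_ratio
print(rbind(bic_ratio, bic_extract, offset))
```

If the values from summary.regsubsets() and from lm() differ by more than a constant on real data, one thing worth checking is whether the models being compared actually contain the same variables, since forward selection may not pick the variables one expects.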
Re: [R] Performing basic Multiple Sequence Alignment in R?
From: marchy...@hotmail.com
To: tal.gal...@gmail.com; r-help@r-project.org
Date: Tue, 21 Dec 2010 17:03:17 -0500

> From: tal.gal...@gmail.com
> Date: Tue, 21 Dec 2010 20:17:18 +0200
> To: r-help@r-project.org
>
> Dear Mike and Thomas,
> From what I gathered here (thanks to Joris Meys):
> http://stackoverflow.com/questions/4497747/how-to-perform-basic-multiple-sequence-alignments-in-r/4498434#4498434
> there is an R interface to the MUSCLE algorithm in the bio3d package (function seqaln()), but not one for Clustal. I will probably end up using pairwiseAlignment on pairs of alignments with some sort of stopping rules (I'll have to play with it to see how it works).

http://scholar.google.com/scholar?hl=en&q=%22exact+string+matching%22+alignment
http://citeseerx.ist.psu.edu/search?q=exact+string+matching+alignment+dna&submit=Search&sort=rel

Certainly, if you are flexible and can use whatever comes close in R, that is fine. But I seem to recall that exact string matching was a fast and interesting way to go, and maybe some of the authors above, in the interest of promoting their work, would help implement an R version if there is demand.

I seem to recall I did something like building indexes of the strings to be aligned first, then finding substrings that appeared exactly once in each of the sequences to be aligned (this was the most restrictive criterion, but you can imagine how to make it more accommodating). Now that you got me started: up-front tokenizing or compiling of the input sequences (usually no more than indexing them in some way) made many later operations like alignment go faster. This may have ended up being similar to BLAST, but now I can't really recall.

Anyway, my point here is that somewhere in R there may be packages that generate intermediate forms useful across disciplines: mining data from text, linguistics, or macromolecule analysis. In fact, the indexing process helps find things that have migrated a long way from their original place, and there are probably other non-alignment-related things you could get out of the approach. If you pursue this or make some decision, would you please get back to us, or at least to me off list?

I just went back through my old code and followed the search links I posted above. This still seems like quite an interesting area, and the issues do not appear to be confined to biology. Looking at the method names in my code, it looks like I had a way to supply fixed patterns, probably from places like PROSITE or CDD, for use as the string you probably meant to suggest, although I seem to think it would make more sense to discover these from the strings found in the sequences. I seem to recall I could do 2 sequences reasonably well, with some quirks and limitations, but gave up when I tried to do multiple alignments (actually there was no point at the time). Recent literature still seems to talk about sub-quadratic time, although in practice, for large sequences, the real execution time could be dominated by virtual memory rather than algorithm order, LOL.

The indexing also makes it possible to find related but distant strings, something that may be of interest but is not normally thought of as alignment between strings perturbed in limited ways (edit distance being rather restricted to a few operations). If you find a specific paper or approach that seems to work, that may be of interest to many here, and indeed it may already be implemented under some other name.

Thanks.
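The indexing idea described above (substrings that occur exactly once in each sequence) can be sketched in a few lines of base R. This is only an illustration of the idea as described: the function names kmers and unique_anchors, the fixed substring length k and the three toy DNA strings are all invented for the example, and no claim is made that this matches the code recalled in the message.

```r
# All substrings of length k of a single string s, in order of position.
kmers <- function(s, k) {
  n <- nchar(s)
  if (n < k) return(character(0))
  substring(s, 1:(n - k + 1), k:n)
}

# k-mers that occur exactly once in every sequence: candidate alignment anchors.
unique_anchors <- function(seqs, k) {
  once_per_seq <- lapply(seqs, function(s) {
    tab <- table(kmers(s, k))
    names(tab)[tab == 1]
  })
  Reduce(intersect, once_per_seq)
}

seqs <- c("ACGTTGCA", "TTACGTGG", "GGACGTAA")
anchors <- unique_anchors(seqs, 4)
print(anchors)

# Position of each anchor in each sequence (rows: sequences, columns: anchors).
positions <- sapply(anchors, function(a) regexpr(a, seqs, fixed = TRUE))
print(positions)
```

The anchor positions give fixed points around which the remaining stretches could be aligned pairwise; relaxing the "exactly once in every sequence" rule is where the criterion becomes more accommodating.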
Re: [R] Doing a mixed-ANOVA after accounting for a covariate
Thank you, David, for the reference to Dalgaard's paper in Rnews_2007-2. Unfortunately I don't seem to have the mathematical-statistical sophistication required to adapt the example in Dalgaard's paper to my case. I hope someone can suggest a less mathematical direction for a solution.

Thanks again,
dror

On Sun, Dec 26, 2010 at 3:59 PM, David Winsemius dwinsem...@comcast.net wrote:
> See if Dalgaard's piece in R-News offers better guidance: http://www.r-project.org/doc/Rnews/Rnews_2007-2.pdf
>
> --
> David Winsemius, MD
> West Hartford, CT
Re: [R] Doing a mixed-ANOVA after accounting for a covariate
On Dec 26, 2010, at 9:55 AM, Dror D Lev wrote: Thank you David, for the reference to Dalgaard's paper in Rnews_2007-2. Unfortunately I don't seem to have the mathematical-statistical sophistication required to adapt the example in Dalgaard's paper for my case. I hope someone can suggest a less-mathematical direction for solution. Here's what I would suggest if you want to stay more concrete. If you are not prepared to offer a minimal subset of your own data and also provide working or non-working code that uses it, then pick an available dataset that resembles it in structure and autocorrelation. One possibility would be the BodyWeight dataset in either the nlme or the MEMSS packages (although see below for my current level of uncertainty regarding your data). require(nlme) plot(BodyWeight) Thanks again, dror On Sun, Dec 26, 2010 at 3:59 PM, David Winsemius dwinsem...@comcast.net wrote: On Dec 26, 2010, at 7:42 AM, Dror D Lev wrote: Dear r helpers, I would like to look at the interaction between two two-level factors, one between and one within participants, after accounting for any variance due to practice (31 trials in each of two blocks) in the task. It seems to require treating practice as a covariate. I had trouble figuring out exactly what you meant by 31 trials in two blocks. Was that 31 trials by each participant? Or was it two trials by each of 31 participants divided unequally into two groups? -- David. All the examples I noticed for handling covariates (i.e. ANCOVA, including the ones in Faraway's Practical regression and anova using r) use lm(), but this doesn't handle repeated-measures. 
See if Dalgaard's piece in R-News offers better guidance: http://www.r-project.org/doc/Rnews/Rnews_2007-2.pdf

I thought of a solution in the form of first running a regression on the covariate:

    cov.accnt = lm(myMeasure ~ myCovMeasure, data=dat)

and then run the aov() on the residuals:

    m.aov = aov(cov.accnt$residuals ~ withinVar*betweenVar + Error(subj/withinVar), data=dat)

Does it seem to be a valid answer to my problem? Is there an existing function that can do this (perhaps more appropriately)? Thank you for any help, dror

-- David Winsemius, MD West Hartford, CT
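As a point of comparison with the two-step residual approach asked about above, one commonly seen alternative is to put the covariate directly into the aov() formula alongside the Error() term. The following is only a sketch on simulated data: the column names (subj, betweenVar, withinVar, trial, myMeasure), the group sizes, and the effect sizes are all invented for illustration, and whether this model suits the real design is a separate statistical question that the thread does not settle.

```r
# Simulated stand-in for the real data (all names and numbers are assumptions).
set.seed(1)
n_subj  <- 20   # hypothetical number of participants
n_trial <- 31   # trials per within-level, as described in the post

dat <- expand.grid(trial     = seq_len(n_trial),
                   withinVar = factor(c("w1", "w2")),
                   subj      = factor(seq_len(n_subj)))

# First half of the subjects in group g1, second half in g2.
dat$betweenVar <- factor(ifelse(as.integer(dat$subj) <= n_subj / 2, "g1", "g2"))

# Response: practice trend + within effect + random subject offset + noise.
subj_eff <- rnorm(n_subj, sd = 2)
dat$myMeasure <- 10 + 0.05 * dat$trial +
  1.0 * (dat$withinVar == "w2") +
  subj_eff[as.integer(dat$subj)] +
  rnorm(nrow(dat))

# Covariate (trial) entered in the same model as the factors of interest.
fit <- aov(myMeasure ~ trial + withinVar * betweenVar + Error(subj / withinVar),
           data = dat)
summary(fit)
```

Fitting the covariate and the factors in one model keeps the degrees of freedom consistent, which the residual-then-aov route does not do automatically.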
Re: [R] lattice splom: how to adjust space between tick marks and tick labels?
Dear David, thank you for your answer. As I wrote, I am looking for an option to control the *space* between the tick marks and the corresponding labels. I am happy with the *number* of tick marks and their default values. As far as I know, pscales can't control the space, so it is *not* what I am looking for. Cheers, Marius

On 2010-12-26, at 14:36 , David Winsemius wrote:

On Dec 26, 2010, at 5:41 AM, Marius Hofert wrote:

Dear expeRts, how can I decrease the space between the tick marks and the corresponding labels in an splom? See here:

    library(lattice)
    U <- matrix(runif(4000), ncol = 8)
    splom(U, axis.text.cex = 0.2) # => space between the [small] tick labels and tick marks is/seems to be too large

So you want more tick marks?

I checked ?panel.pairs but could not find an option for that.

What about the pscales argument? A single number would increase the number of ticks, or a list with at and labels values can be passed. Seems to be just what you asked for.

-- David Winsemius, MD West Hartford, CT
[R] can't install R with *local* gcc
Hello, we re-distribute R with our open-source platform http://www.ok-sat-library.org/ where we use R mainly for evaluation of computational experiments. Due to the various platforms, we build everything from source, and that works fine. Until now, that is: there are circumstances (for example in computer-science computer labs) where no Fortran-compiler is provided, and the users (students) can't change that. Thus we now try to build gfortran as part of the GCC version 4.2.4 suite, and building R using that local gcc. We already use the local C and C++ compiler of the suite extensively, and that all works. But we don't have any experience with using gfortran. The gcc-build works fine, everything seems alright --- only R (version 2.11.0) won't build with it: We use the configuration F77=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran FC=${F77} CC=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gcc CXX=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/g++ LDFLAGS=-L /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib ./configure --prefix=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/R/2.11.0 (the same problems with lib64 instead of lib, by the way) which yields checking for Fortran 77 libraries of /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran... 
-L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib -L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/gcc/x86_64-unknown-linux-gnu/4.2.4 -L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../.. -lgfortranbegin -lgfortran -lm /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/libgfortran.a which looks alright to me (but I don't know Fortran), but then we get checking for dummy main to link with Fortran 77 libraries... none checking for Fortran 77 name-mangling scheme... lower case, underscore, no extra underscore checking whether /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran appends underscores to external names... yes checking whether /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran appends extra underscores to external names... no checking whether mixed C/Fortran code can be run... configure: WARNING: cannot run mixed C/Fortran code configure: error: Maybe check LDFLAGS for paths to Fortran libraries? make: *** [R_base] Error 1 The R installation-documentation doesn't say much on using local compilers (more or less nothing), and everything we could get from it are the above settings of environment variables. Internet search reveals old stuff on libg2c which appears not to exist anymore, some recommendations not to build from sources (which is not an option for us), an open Sage ticket (apparently without any further work on it), and a request to the R-list with apparently no reply. 
Since we are working in a well-defined setting (gcc is fully under our control), and apparently all the libraries needed are build by gcc (though this is nowhere said or (dream) specified), it should be possible to solve that problem. I very hope to get some hints (we can't get R running (for our system!) otherwise). The error is exactly the same on various systems (all 64-bit machines, Intel and AMD). If we use the system-gcc (4.5.0 or 4.1.2) then the installation of R works without problems; here (for one of the machines) some data version platform x86_64-unknown-linux-gnu arch x86_64 os linux-gnu system x86_64, linux-gnu status major 2 minor 11.0 year 2010 month 04 day22 svn rev51801 language R version.string R version 2.11.0 (2010-04-22) Thanks for you help in any case! Oliver __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] package arules - 'transpose' of the transactions
Hi Kohleth,

Suppose this is my list of transactions:

    set.seed(200)
    tran=random.transactions(100,3)
    inspect(tran)
      items           transactionID
    1 {item80}        trans1
    2 {item8, item20} trans2
    3 {item28}        trans3

I want to get the 'transpose' of the data, i.e.

      transactionID items
    1 {trans2}      item8
    2 {trans2}      item20
    3 {trans3}      item28
    4 {trans1}      item80

This is not the transpose. The data structure you want can be created this way:

    l <- LIST(tran)
    single <- data.frame(ID=rep(names(l), lapply(l, length)), items=unlist(l), row.names=NULL)
    single
          ID  items
    1 trans1 item80
    2 trans2  item8
    3 trans2 item20
    4 trans3 item28

I tried converting tran into a matrix, then transpose it, then convert it back to transactions. But my dataset is actually very very large, so I wonder if there is any faster method?

The method above should be very fast.

-Michael

Thanks

-- Dr. Michael Hahsler, Visiting Assistant Professor Department of Computer Science and Engineering Lyle School of Engineering Southern Methodist University, Dallas, Texas (214) 768-8878 * mhahs...@lyle.smu.edu * http://lyle.smu.edu/~mhahsler
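The reshaping step in the reply above does not depend on arules once the transactions are a named list. The sketch below uses a hand-written list as a stand-in for what LIST(tran) is assumed to return (the list contents are copied from the inspect() output in the post), and it uses lengths() instead of lapply(l, length); the by_item grouping at the end is an extra illustration, not something from the thread.

```r
# Hypothetical stand-in for LIST(tran): one character vector per transaction.
l <- list(trans1 = "item80",
          trans2 = c("item8", "item20"),
          trans3 = "item28")

# One row per (transaction, item) pair.
single <- data.frame(ID    = rep(names(l), lengths(l)),
                     items = unlist(l, use.names = FALSE),
                     stringsAsFactors = FALSE)

# The "transposed" view: for each item, the transactions that contain it.
by_item <- split(single$ID, single$items)

single
by_item
```

Both steps are vectorised over the whole list, so no matrix conversion is involved.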
Re: [R] Lost in POSIX
Dimitri Shvorob wrote:

    df = structure(list(t = structure(c(1033963406.044, 1033974144.847,
    1033988418.836), class = c("POSIXt", "POSIXct"))), .Names = "t", row.names = c(NA,
    3L), class = "data.frame")
    df$min = trunc(df$t,units="mins")

does not work, Jeff; you will see that my original post suggests familiarity with 'trunc' :)

Well, perhaps you should read the error message or the Value section of ?trunc.POSIXt, and convert the result to a compact type...

    df$min <- trunc( df$t, units="mins" )
    Error in `$<-.data.frame`(`*tmp*`, min, value = list(sec = 0, min = c(3L, :
      replacement has 9 rows, data has 3
    df$min <- as.POSIXct( trunc( df$t, units="mins" ) )
    str(df)
    'data.frame': 3 obs. of 2 variables:
     $ t  : POSIXct, format: 2002-10-06 21:03:26 2002-10-07 00:02:24 ...
     $ min: POSIXct, format: 2002-10-06 21:03:00 2002-10-07 00:02:00 ...

-- Jeff Newmiller
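The fix in the reply above can be tried in isolation. This is a minimal sketch using the three timestamps from the post; the explicit tz = "UTC" is an assumption added here so the printed times do not depend on the local time zone, and the variable names are illustrative.

```r
# Timestamps from the post, pinned to UTC for reproducibility (assumption).
times <- as.POSIXct(c(1033963406.044, 1033974144.847, 1033988418.836),
                    origin = "1970-01-01", tz = "UTC")
df <- data.frame(t = times)

# trunc() on a date-time returns a POSIXlt object, which is a list of
# components (sec, min, hour, ...) rather than one number per observation.
truncated <- trunc(df$t, units = "mins")
class(truncated)

# Converting back to POSIXct gives one value per row, which fits a data frame.
df$min <- as.POSIXct(truncated)
str(df)
```

The "replacement has 9 rows" error quoted above is consistent with the data frame seeing the components of that list instead of three time values.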
Re: [R] Lost in POSIX
Dimitri Shvorob wrote:

.. One issue with the solution proposed by Jeff is that the transformed column does not have the original's type:

    x = structure(list(time = structure(c(1020232904.818, 1020232904.818),
    class = c("POSIXt", "POSIXct"), tzone = ""), price = c(321, 323.5),
    minute = c(1020232860, 1020232860)), .Names = c("time", "price", "minute"),
    row.names = 1:2, class = "data.frame")

    minute <- function(t) {
      d <- as.POSIXlt(t, origin = as.Date("1970-01-01"))
      d$sec <- 0
      as.POSIXct(d)
    }

    x$minute = sapply(x$time, minute)
    head(x)
                     time price     minute
    1 2002-05-01 07:01:44 321.0 1020232860
    2 2002-05-01 07:01:44 323.5 1020232860
    class(x.l$minute)
    [1] "numeric"

That is not an issue with the minute function, as you can see if you evaluate

    minute(x$time)
    [1] "2002-04-30 23:01:00 PDT" "2002-04-30 23:01:00 PDT"

or

    str(minute(x$time))
     POSIXct[1:2], format: "2002-04-30 23:01:00" "2002-04-30 23:01:00"

rather, you are seeing a side effect of sapply.

-- Jeff Newmiller
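The sapply side effect described above can be shown with a few lines. In this sketch the minute() helper follows the one in the post, except that the seconds are zeroed with d$sec[] <- 0 to keep the component length unchanged; the tz = "UTC" setting and the variable names are assumptions made for the illustration.

```r
# Helper modelled on the one in the post: drop the seconds from a date-time.
minute <- function(t) {
  d <- as.POSIXlt(t)
  d$sec[] <- 0          # zero the seconds, keeping one entry per element
  as.POSIXct(d)
}

times <- as.POSIXct(c(1020232904.818, 1020232904.818),
                    origin = "1970-01-01", tz = "UTC")

# sapply() simplifies the per-element results to a bare numeric vector,
# so the POSIXct class attribute is lost.
via_sapply <- sapply(times, minute)
class(via_sapply)

# The function is already vectorised, so it can be called directly ...
direct <- minute(times)
class(direct)

# ... or the class can be restored after the fact.
restored <- as.POSIXct(via_sapply, origin = "1970-01-01", tz = "UTC")
```

Calling the vectorised function directly is usually the simpler of the two options.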
Re: [R] Lost in POSIX
On Dec 25, 2010, at 2:25 PM, Dimitri Shvorob wrote:

    df = structure(list(t = structure(c(1033963406.044, 1033974144.847,
    1033988418.836), class = c("POSIXt", "POSIXct"))), .Names = "t", row.names = c(NA,
    3L), class = "data.frame")
    df$min = trunc(df$t,units="mins")

does not work,

??? seems to work on my system. Perhaps you should say what you mean by "not work"

    df
                        t                 min
    1 2002-10-07 00:03:26 2002-10-07 00:03:00
    2 2002-10-07 03:02:24 2002-10-07 03:02:00
    3 2002-10-07 07:00:18 2002-10-07 07:00:00

    sessionInfo()
    R version 2.12.1 (2010-12-16)
    Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
    locale:
    [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
    attached base packages:
    [1] grid splines stats graphics grDevices utils datasets methods base
    other attached packages:
    [1] nlme_3.1-97 lme4_0.999375-37 Matrix_0.999375-46 zoo_1.6-4 ggplot2_0.8.8 proto_0.3-8 reshape_0.8.3 plyr_1.2.1 MASS_7.3-9
    [10] rms_3.1-0 Hmisc_3.8-3 survival_2.36-2 sos_1.3-0 brew_1.0-4 lattice_0.19-13
    loaded via a namespace (and not attached):
    [1] cluster_1.13.2 stats4_2.12.1 tools_2.12.1

Jeff; you will see that my original post suggests familiarity with 'trunc' :)

David Winsemius, MD West Hartford, CT
[R] A question on Statistics
I do not have a pure Statistics background, so please forgive me if this question (which is not R related either) is too trivial. In much of the Statistics literature I find the following statement: "restrictions in different coefficients matrices have to be imposed to ensure uniqueness of the parametrization". Can somebody tell me what "uniqueness of the parametrization" means? Does it mean that two different coefficient matrices may give exactly the same result, and therefore the coefficient matrix is not unique? I find there are many members (perhaps all) in this forum who are really masters of Statistics, so I hope somebody can clarify the intuition behind that for me. Thanks,
Re: [R] Doing a mixed-ANOVA after accounting for a covariate
Dror,

Please look at the demo(MMC.apple) in the HH package

    install.packages("HH") ## if you don't already have it.
    library(HH)
    demo(MMC.apple)

Please reply to the list if there are further queries.

Rich

On Sun, Dec 26, 2010 at 7:42 AM, Dror D Lev dror.te...@gmail.com wrote:

Dear r helpers, I would like to look at the interaction between two two-level factors, one between and one within participants, after accounting for any variance due to practice (31 trials in each of two blocks) in the task. It seems to require treating practice as a covariate. All the examples I noticed for handling covariates (i.e. ANCOVA, including the ones in Faraway's Practical regression and anova using r) use lm(), but this doesn't handle repeated-measures.

I thought of a solution in the form of first running a regression on the covariate:

    cov.accnt = lm(myMeasure ~ myCovMeasure, data=dat)

and then run the aov() on the residuals:

    m.aov = aov(cov.accnt$residuals ~ withinVar*betweenVar + Error(subj/withinVar), data=dat)

Does it seem to be a valid answer to my problem? Is there an existing function that can do this (perhaps more appropriately)? Thank you for any help, dror
Re: [R] can't install R with *local* gcc
On Dec 26, 2010, at 17:50 , Oliver Kullmann wrote: Hello, we re-distribute R with our open-source platform http://www.ok-sat-library.org/ where we use R mainly for evaluation of computational experiments. Due to the various platforms, we build everything from source, and that works fine. Until now, that is: there are circumstances (for example in computer-science computer labs) where no Fortran-compiler is provided, and the users (students) can't change that. Thus we now try to build gfortran as part of the GCC version 4.2.4 suite, and building R using that local gcc. We already use the local C and C++ compiler of the suite extensively, and that all works. But we don't have any experience with using gfortran. The gcc-build works fine, everything seems alright --- only R (version 2.11.0) won't build with it: We use the configuration F77=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran FC=${F77} CC=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gcc CXX=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/g++ LDFLAGS=-L /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib ./configure --prefix=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/R/2.11.0 (the same problems with lib64 instead of lib, by the way) which yields checking for Fortran 77 libraries of /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran... 
-L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib -L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/gcc/x86_64-unknown-linux-gnu/4.2.4 -L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../.. -lgfortranbegin -lgfortran -lm /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/libgfortran.a which looks alright to me (but I don't know Fortran), but then we get checking for dummy main to link with Fortran 77 libraries... none checking for Fortran 77 name-mangling scheme... lower case, underscore, no extra underscore checking whether /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran appends underscores to external names... yes checking whether /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran appends extra underscores to external names... no checking whether mixed C/Fortran code can be run... configure: WARNING: cannot run mixed C/Fortran code configure: error: Maybe check LDFLAGS for paths to Fortran libraries? make: *** [R_base] Error 1 The R installation-documentation doesn't say much on using local compilers (more or less nothing), and everything we could get from it are the above settings of environment variables. Internet search reveals old stuff on libg2c which appears not to exist anymore, some recommendations not to build from sources (which is not an option for us), an open Sage ticket (apparently without any further work on it), and a request to the R-list with apparently no reply. 
Since we are working in a well-defined setting (gcc is fully under our control), and apparently all the libraries needed are build by gcc (though this is nowhere said or (dream) specified), it should be possible to solve that problem. I very hope to get some hints (we can't get R running (for our system!) otherwise). The error is exactly the same on various systems (all 64-bit machines, Intel and AMD). If we use the system-gcc (4.5.0 or 4.1.2) then the installation of R works without problems; here (for one of the machines) some data I suppose r-devel would be a better mailing list for this sort of thing, but since we're here: Hint #1: Expect the process to be somewhat painful... Hint #2: Study the configure script and config.log to the level where you can reproduce the mixed C/Fortran code that it is trying to build and run and with which commands it is trying to build it Hint #3: Figure out what it really should have done to build such code An alternative hint is first to try setting up a very simple Fortran function to, say, double a number, and a C main program that calls it. Then try figuring out the compiler/linker options to make it work. (That is of course what configure was trying to do in the first place, but doing it by hand might be less prone to getting multiple toolchains mixed up.) version platform x86_64-unknown-linux-gnu arch x86_64 os
Re: [R] A question on Statistics
Maithula:

On Sun, Dec 26, 2010 at 11:09 AM, Maithula Chandrashekhar m.chandrashekhar1...@gmail.com wrote:

I am not a pure Statistics background and therefore please forgive me if this question (which is not R related either) is too trivial. In many Statistics literature I find following statement: restrictions in different coefficients matrices have to be imposed to ensure uniqueness of the parametrization. Can somebody tell me what is the meaning of Uniqueness in the parametrization? Does it mean that, two different coefficient matrices may give exactly the same result, and therefore coefficient matrix is not unique?

-- yes. See the section on contrast matrices in Venables and Ripley's Modern Applied Statistics with S (MASS) for a concise but, I think, illuminating explanation. (It's in the chapter on linear models/regression). -- Bert

I find there are many members (perhaps all) in this forum who are really masters in Statistics. Therefore I hope somebody will clarify me with the intuition behind that. Thanks,

-- Bert Gunter Genentech Nonclinical Biostatistics 467-7374 http://devo.gene.com/groups/devo/depts/ncb/home.shtml
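The "yes" above can be made concrete with a tiny one-way layout. This is an illustrative sketch, not taken from the thread or from MASS: the two-group design, the coefficient values, and the variable names are all made up. It shows two different coefficient vectors producing identical fitted values when an intercept and a dummy for every group are used together, and how a contrast restriction removes the ambiguity.

```r
group <- factor(c("a", "a", "b", "b"))

# Over-parametrised design: intercept plus one dummy per group.
X <- cbind(intercept = 1,
           a = as.numeric(group == "a"),
           b = as.numeric(group == "b"))

# Two different coefficient vectors ...
beta1 <- c(0, 1, 3)
beta2 <- c(1, 0, 2)

# ... give exactly the same fitted values (1, 1, 3, 3).
fit1 <- drop(X %*% beta1)
fit2 <- drop(X %*% beta2)

# The design matrix has 3 columns but only rank 2, hence the non-uniqueness.
qr(X)$rank

# Imposing a restriction (R's default treatment contrasts drop the dummy
# for the first level) yields a full-rank design and unique coefficients.
X_tr <- model.matrix(~ group)
qr(X_tr)$rank == ncol(X_tr)
```

Any restriction that removes one redundant direction would do; treatment contrasts are just the default choice in R.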
[R] Question about mars() -function
Hi! I have some questions about the MARS model's coefficient of determination. I use the MARS method in my master's thesis and I have noticed some problems with the MARS model's R^2. You can see in the following example that the MARS model's R^2 is too big when I build the model with the mars() function, while the same MARS model built with a linear regression gives a much smaller R^2. So can you please tell me why the MARS model's R^2 is so big? How can I get the MARS model's correct R^2 in R, in some way other than in the following example or by calculating it myself from the R^2 formula? I hope you can reply soon. Best regards, Tiina Hakanen

    library(ElemStatLearn)
    library(mda)
    data <- ozone
    m <- mars(data[,-1], data[,1], nk=4)
    m$factor[m$s,]
    m$cuts[m$s,]
    m$coef
    marsmodel <- lm(data[,1] ~ m$x - 1)
    summary(marsmodel)

    Call: lm(formula = data[, 1] ~ m$x - 1)
    Residuals:
        Min      1Q  Median      3Q     Max
    -36.264 -15.993  -2.351   9.993 122.793
    Coefficients:
         Estimate Std. Error t value Pr(>|t|)
    m$x1  52.9783     3.8894  13.621  < 2e-16 ***
    m$x2   4.7383     0.9599   4.936 2.92e-06 ***
    m$x3  -1.9428     0.3084  -6.300 6.61e-09 ***
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    Residual standard error: 23.38 on 108 degrees of freedom
    Multiple R-squared: 0.8147, Adjusted R-squared: 0.8095
    F-statistic: 158.2 on 3 and 108 DF, p-value: < 2.2e-16

    knot1 <- function(x, k) ifelse(x > k, x - k, 0)
    knot2 <- function(x, k) ifelse(x < k, k - x, 0)
    reg <- lm(ozone ~ knot1(temperature,85) + knot2(temperature,85), data=data)
    summary(reg)

    Call: lm(formula = ozone ~ knot1(temperature, 85) + knot2(temperature, 85), data = data)
    Residuals:
        Min      1Q  Median      3Q     Max
    -36.264 -15.993  -2.351   9.993 122.793
    Coefficients:
                           Estimate Std. Error t value Pr(>|t|)
    (Intercept)             52.9783     3.8894  13.621  < 2e-16 ***
    knot1(temperature, 85)   4.7383     0.9599   4.936 2.92e-06 ***
    knot2(temperature, 85)  -1.9428     0.3084  -6.300 6.61e-09 ***
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    Residual standard error: 23.38 on 108 degrees of freedom
    Multiple R-squared: 0.5153, Adjusted R-squared: 0.5064
    F-statistic: 57.42 on 2 and 108 DF, p-value: < 2.2e-16
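One likely explanation for the two R^2 values above, offered here as an editorial sketch rather than an answer from the thread: the two fits have identical residuals and coefficients, but the first formula ends in "- 1". For a model without a formula intercept, summary.lm() measures R^2 against the uncentred total sum of squares sum(y^2) instead of sum((y - mean(y))^2), which inflates the value even when a constant column is present in the predictor matrix. The example below reproduces the effect with the built-in cars data, which is a stand-in and not the ozone data from the post.

```r
# Same model fitted two ways on a built-in data set (stand-in for ozone).
fit_int <- lm(dist ~ speed, data = cars)       # formula intercept

X <- cbind(1, cars$speed)                      # constant column supplied by hand
fit_noint <- lm(cars$dist ~ X - 1)             # formula intercept removed

r2_int   <- summary(fit_int)$r.squared
r2_noint <- summary(fit_noint)$r.squared       # larger, although the fit is identical

# Recompute both flavours of R^2 from the residuals of the "- 1" fit.
rss           <- sum(residuals(fit_noint)^2)
r2_centered   <- 1 - rss / sum((cars$dist - mean(cars$dist))^2)
r2_uncentered <- 1 - rss / sum(cars$dist^2)

c(with_intercept = r2_int, without = r2_noint,
  centered = r2_centered, uncentered = r2_uncentered)
```

Under this reading, the smaller of the two values in the post would be the conventional R^2 for the MARS fit.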
[R] read.data? without separator
Hello, I have a problem with read.data. For example I have a file

    # comment
    1?0001010101
    101010??1010

with a comment on the first line and data laid out without a separator. How could I read the data so that each character/sign ends up in its own column? It is probably trivial, but I have no idea how to do it. Thanks, Kacper
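One possible way to do this in base R is to read the file as plain lines and split each line into single characters. The sketch below is an illustration, not a reply from the thread: it uses a textConnection holding the three example lines in place of a real file, and the object names are arbitrary.

```r
# Stand-in for the file on disk; with a real file, pass its path to readLines().
txt <- c("# comment", "1?0001010101", "101010??1010")
con <- textConnection(txt)
lines <- readLines(con)
close(con)

# Drop comment lines, then split every remaining line into single characters.
lines <- lines[!grepl("^#", lines)]
m <- do.call(rbind, strsplit(lines, ""))

# One column per character; "?" is kept as a string here.
dat <- as.data.frame(m, stringsAsFactors = FALSE)
dim(dat)
```

If every line has the same width, read.fwf() with widths = rep(1, 12) and comment.char = "#" is another base R option, though it then needs na.strings or colClasses to deal with the "?" entries.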
Re: [R] T2 hoteling
On 12/27/2010 12:43 AM, leyla khodakarim wrote:

Dear All, it is very kind of you to guide me. When I want to run this line, I see this error:

    stat.obs <- apply(GS, 2, function(z) Hott2(t(DATA[which(z==1),]), cl))
    Error in colSums(w * x) : 'x' must be an array of at least two dimensions

    cl <- as.factor(y)
    GS: a matrix with 0 or 1
    GS: gene sets - a data matrix with rows=genes, columns= gene sets, GS[i,j]=1 if gene i in gene set j, GS[i,j]=0 otherwise
    Hott2 <- function(x, y, var.equal=TRUE) #T2 hoteling
    Y <- c(1,0,0,0,0,0,1,1,0,0,1,0,1,1,1,1,0,1,0,1)
    Data=transpose(X)= gene expression: row=40 gene, column=10 sample
    Data: there is in attachment file

Hi Leyla,

Your attachment didn't make it to the list, but the problem may be that which(z==1) reduces the matrix (array? data frame?) X to a vector. One other thing that looks funny is the capitalization. In R, X and x are different, as are DATA and Data. First thing is to just print out the data you are trying to analyze:

    DATA[which(z==1),]

and see if it really is an array with at least two dimensions.

Jim
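The dimension-dropping behaviour suspected above is easy to reproduce. This sketch uses a small made-up matrix and indicator vector (neither comes from the thread) to show how selecting a single row turns a matrix into a plain vector, and how drop = FALSE prevents that. Whether this is the actual cause of the error in the post cannot be confirmed from the thread.

```r
# Toy expression matrix: 4 genes (rows) by 3 samples (columns).
DATA <- matrix(1:12, nrow = 4,
               dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# A gene set containing only one gene.
z <- c(0, 1, 0, 0)

# Default subsetting drops the row dimension: the result is a plain vector.
one_row <- DATA[which(z == 1), ]
dim(one_row)

# drop = FALSE keeps a 1 x 3 matrix, so matrix-only functions still work.
kept <- DATA[which(z == 1), , drop = FALSE]
dim(kept)
colSums(kept)
```

Adding drop = FALSE inside the apply() call is the usual defensive habit when a subset may contain a single row or column.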
[R] Parsing a Simple Chemical Formula
Hello R Folks... I've been looking around the 'net and I see many complex solutions in various languages to this question, but I have a pretty simple need (and I'm not much good at regex). I want to use a chemical formula as a function argument. The formula would be in Hill order, which is to list C, then H, then all other elements in alphabetical order. My example will have only a limited number of elements, few enough that one can search directly for each element. So some examples would be C5H12, or C5H12O or C5H11BrO (note that for oxygen and bromine, O or Br, there is no following number, meaning a 1 is implied). Let's say

    form <- "C5H11BrO"

I'd like to get the count of each element, so in this case I need to extract C and 5, H and 11, Br and 1, O and 1 (I want to calculate the molecular weight by multiplying). Sounds pretty simple, but my experiments with grep and strsplit don't immediately clue me into an obvious solution. As I said, I don't need a general solution to the problem of calculating molecular weight from an arbitrary formula, that seems quite challenging, just a way to convert form into a list or data frame which I can then do the math on. Here's hoping this is a simple issue for more experienced R users! TIA, Bryan

*** Bryan Hanson Professor of Chemistry & Biochemistry
Re: [R] levelplot blocks size
Thanks for your advice, but my data is not decimals, so I don't need to round the values. Instead, what I really need to do is group the values into larger blocks. My data looks sort of like this: xy z 00687 0164 0271 0355 0452 0551 0638 0738 0854 0949 . . . 987 9881 999 9981 999 9991

But what I need to do is make it so that on the graph, rather than having tiny little dots for each point (as shown in the bigplot diagram), there are bigger points, so say 0<=x<10, 0<=y<10 is one point in the lower left, rather than having 100 points for each x,y value. The same strategy should then be applied to the whole graph. Any ideas how to achieve this? I'm sure this is quite a common thing to want to do with heatmaps? Thanks, Jonathan
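One way to get the larger blocks described above is to aggregate the data before plotting. The following is a sketch under assumptions: the data are simulated on a 100 x 100 integer grid (the real values are not recoverable from the post), the block size of 10 and the choice of the mean as the summary are arbitrary, and the column names x, y, z are taken from the header in the post.

```r
# Simulated stand-in for the real data: one z value per integer (x, y) pair.
set.seed(42)
d <- expand.grid(x = 0:99, y = 0:99)
d$z <- rpois(nrow(d), lambda = 50)

# Map every coordinate to the lower-left corner of its 10 x 10 block.
block <- 10
d$xb <- (d$x %/% block) * block
d$yb <- (d$y %/% block) * block

# One summary value per block (mean here; sum or median work the same way).
blocks <- aggregate(z ~ xb + yb, data = d, FUN = mean)
nrow(blocks)

# The aggregated data can then be plotted as before, for example:
# lattice::levelplot(z ~ xb * yb, data = blocks)
```

Changing block changes the size of the tiles without touching the plotting call.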
[R] GLS with corAR(1) correlation structure residual/standard error calculation
I am using the gls function to fit a two-stage least squares model with first order autoregressive error terms. Since there is no automated adjustment for the use of two-stage least squares in this package, I am trying to manually replicate standard errors of the coefficient estimates in order to adjust for a first stage OLS estimate of endogenous variables. However, thus far I have been unable to replicate the residuals or standard errors produced by this function. My understanding is outlined below, but using this approach does not yield the reported results. Is anyone familiar with the inner workings of this function who can either explain the calculation of the standard errors or provide code that explains the inner workings of this function? Thanks!

Example of the model I am running:

    model1 <- gls(Y ~ X1I + X2 + X3 + X4, data=Dat1, correlation = corAR1(), method = "ML")

My understanding of model errors:

    Y = b_0 + X1 b_1 + ... + Xk b_k + Z
    Z_t = phi Z_{t-1} + e_t

The residuals reported by GLS are the Z's, while the white noise terms are the e's. I cannot replicate the reported residuals using this approach. I also do not know how Z_0 should be calculated, i.e. what does the first step of this recursive procedure look like? From the residuals, I also cannot replicate the reported standard errors. I am using

    se(b_j) = sqrt(sigma^2 / sum((x_i - x_mean)^2)) where sigma = sqrt(SSR/df)

Any help on this or explanation of how GLS works would be much appreciated.
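For orientation, the textbook generalized least squares estimator with an AR(1) correlation matrix can be written out by hand in base R. This is a sketch under stated assumptions, not a description of the internals of nlme::gls: phi is treated as known (gls estimates it), the n - p divisor for sigma^2 is one common convention and may differ from what gls uses under method = "ML", and the function name gls_ar1 and the simulated data are invented here. It does show that the standard errors come from sigma^2 (X' V^-1 X)^-1 rather than from the simple-regression formula quoted in the post.

```r
# Textbook GLS with a known AR(1) correlation parameter phi (assumption).
gls_ar1 <- function(X, y, phi) {
  n <- length(y)
  # AR(1) correlation matrix: corr(Z_i, Z_j) = phi^|i - j|
  V <- phi^abs(outer(seq_len(n), seq_len(n), "-"))
  Vinv <- solve(V)
  XtVinvX <- t(X) %*% Vinv %*% X
  beta <- solve(XtVinvX, t(X) %*% Vinv %*% y)
  z <- drop(y - X %*% beta)                 # the correlated residuals Z_t
  e <- z[-1] - phi * z[-n]                  # implied innovations e_t, t >= 2
  sigma2 <- drop(t(z) %*% Vinv %*% z) / (n - ncol(X))
  se <- sqrt(diag(sigma2 * solve(XtVinvX)))
  list(coef = drop(beta), se = se, resid = z, innov = e, sigma2 = sigma2)
}

# Small simulated example (all values made up for illustration).
set.seed(1)
x <- rnorm(30)
y <- 1 + 2 * x + rnorm(30)
X <- cbind(1, x)

g0 <- gls_ar1(X, y, phi = 0)     # phi = 0 reduces to ordinary least squares
g5 <- gls_ar1(X, y, phi = 0.5)   # same data, assuming correlated errors
ols <- lm(y ~ x)
```

With phi = 0 the correlation matrix is the identity, so the hand-computed coefficients and standard errors agree with lm(), which is a quick sanity check on the algebra.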
Re: [R] Parsing a Simple Chemical Formula
try this:

> f.extract <- function(formula)
+ {
+     # pattern to match the initial chemical
+     # assumes chemical starts with an upper case and optional lower case followed
+     # by zero or more digits.
+     first <- "^([[:upper:]][[:lower:]]?)([0-9]*).*"
+     # inverse of above to remove the initial chemical
+     last <- "^[[:upper:]][[:lower:]]?[0-9]*(.*)"
+     result <- list()
+     extract <- formula
+     # repeat as long as there is data
+     while ((start <- nchar(extract)) > 0){
+         chem <- sub(first, '\\1 \\2', extract)
+         extract <- sub(last, '\\1', extract)
+         # if the number of characters is the same, then there was an error
+         if (nchar(extract) == start){
+             warning("Invalid formula:", formula)
+             return(NULL)
+         }
+         # append to the list
+         result[[length(result) + 1L]] <- strsplit(chem, ' ')[[1]]
+     }
+     result
+ }
> f.extract("C5H11BrO")
[[1]]
[1] "C" "5"

[[2]]
[1] "H"  "11"

[[3]]
[1] "Br"

[[4]]
[1] "O"

> f.extract("H2O")
[[1]]
[1] "H" "2"

[[2]]
[1] "O"

> f.extract("CCC")
[[1]]
[1] "C"

[[2]]
[1] "C"

[[3]]
[1] "C"

> f.extract("Crr")  # bad
NULL
Warning message:
In f.extract("Crr") : Invalid formula:Crr

On Sun, Dec 26, 2010 at 6:29 PM, Bryan Hanson han...@depauw.edu wrote:
> Hello R Folks...
>
> I've been looking around the 'net and I see many complex solutions in various languages to this question, but I have a pretty simple need (and I'm not much good at regex). I want to use a chemical formula as a function argument. The formula would be in Hill order which is to list C, then H, then all other elements in alphabetical order. My example will have only a limited number of elements, few enough that one can search directly for each element. So some examples would be C5H12, or C5H12O or C5H11BrO (note that for oxygen and bromine, O or Br, there is no following number meaning a 1 is implied).
>
> Let's say
>
> form <- "C5H11BrO"
>
> I'd like to get the count of each element, so in this case I need to extract C and 5, H and 11, Br and 1, O and 1 (I want to calculate the molecular weight by multiplying). Sounds pretty simple, but my experiments with grep and strsplit don't immediately clue me into an obvious solution. As I said, I don't need a general solution to the problem of calculating molecular weight from an arbitrary formula, that seems quite challenging, just a way to convert form into a list or data frame which I can then do the math on.
>
> Here's hoping this is a simple issue for more experienced R users! TIA, Bryan
> ***
> Bryan Hanson
> Professor of Chemistry & Biochemistry

-- Jim Holtman Data Munger Guru What is the problem that you are trying to solve?
Re: [R] Parsing a Simple Chemical Formula
There might be something simpler, but this is what I came up with:

form = "C5H11BrO"
ups = c(gregexpr("[[:upper:]]", form)[[1]], nchar(form) + 1)
seperated = sapply(1:(length(ups)-1), function(x) substr(form, ups[x], ups[x+1] - 1))
elements = gsub("[[:digit:]]", "", seperated)
nums = gsub("[[:alpha:]]", "", seperated)
ans = data.frame(element = as.character(elements), num = as.numeric(ifelse(nums == "", 1, nums)), stringsAsFactors = FALSE)

-- View this message in context: http://r.789695.n4.nabble.com/Parsing-a-Simple-Chemical-Formula-tp3164562p3164581.html
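Once the formula is in a data frame like the one built in the message above, the molecular weight the original poster was after is a lookup and a weighted sum. The following is a minimal sketch, assuming a small hand-written table of standard atomic masses; the names `parsed`, `atomic_mass` and `mol_weight` are hypothetical and only a handful of elements are included.

```r
# Parsed formula for C5H11BrO, in the same shape as the data frame above
parsed <- data.frame(element = c("C", "H", "Br", "O"),
                     num = c(5, 11, 1, 1),
                     stringsAsFactors = FALSE)

# Small lookup table of standard atomic masses (g/mol); extend as needed
atomic_mass <- c(H = 1.008, C = 12.011, N = 14.007, O = 15.999, Br = 79.904)

# Fail early on elements that are not in the table
unknown <- setdiff(parsed$element, names(atomic_mass))
if (length(unknown) > 0) {
  stop("No atomic mass for: ", paste(unknown, collapse = ", "))
}

# Molecular weight = sum over elements of (atomic mass * count)
mol_weight <- sum(atomic_mass[parsed$element] * parsed$num)
print(mol_weight)
```

Indexing the named vector by `parsed$element` keeps the masses aligned with the counts regardless of the order of the rows.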
Re: [R] Parsing a Simple Chemical Formula
This can be done by strapply in gsubfn. It matches the regular expression to the target string, passing the back references (the parenthesized portions of the regular expression) through a specified function as successive arguments.

Thus the first arg is form, your input string. The second arg is the regular expression, which matches an upper case letter optionally followed by lower case letters, all of which is optionally followed by digits. The third arg is a function shown in a formula representation. strapply passes the back references (i.e. the portions within parentheses) to the function as the two arguments. Finally, simplify is another function in formula notation which turns the result into a matrix and then a data frame. As a last step we make the second column of the data frame numeric.

library(gsubfn)
DF <- strapply(form, "([A-Z][a-z]*)(\\d*)", ~ c(..1, if (nchar(..2)) ..2 else 1), simplify = ~ as.data.frame(t(matrix(..1, 2)), stringsAsFactors = FALSE))
DF[[2]] <- as.numeric(DF[[2]])

DF looks like this:

> DF
  V1 V2
1  C  5
2  H 11
3 Br  1
4  O  1

-- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Re: [R] read.data? without separator
I have a problem with 'read.data' also, in that I don't see it as a function in 'base'; I assume you meant read.table. Also, you did not indicate whether all the lines were the same length. Here is a solution that returns a list with each character broken out separately.

> x <- readLines(textConnection("# comment
+ 1?0001010101
+ 101010??1010"))
> closeAllConnections()
> # split lines 2-n into a list of separate characters
> result <- lapply(x[-1], function(.line) strsplit(.line, '')[[1]])
> result
[[1]]
[1] "1" "?" "0" "0" "0" "1" "0" "1" "0" "1" "0" "1"

[[2]]
[1] "1" "0" "1" "0" "1" "0" "?" "?" "1" "0" "1" "0"

On Sun, Dec 26, 2010 at 1:04 PM, Fror f...@interia.pl wrote:
> Hello, I have a problem with read.data. For example I have a file
>
> # comment
> 1?0001010101
> 101010??1010
>
> with a comment on the first line and a data layout without a separator. How could I read the data so that each character/sign is in a separate column? It is probably trivial, but I have no idea how to do it.
> Thanks, Kacper
>
> -- View this message in context: http://r.789695.n4.nabble.com/read-data-without-separator-tp3164358p3164358.html

-- Jim Holtman Data Munger Guru What is the problem that you are trying to solve?
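If the goal is one column per character rather than a list, the split characters can be bound into a matrix, provided every data line has the same length. This is a minimal sketch under that assumption; the input is given inline instead of being read from the poster's file, and the names `data_lines`, `chars` and `df` are hypothetical.

```r
# Inline copy of the example file; in practice use readLines("yourfile")
lines <- c("# comment", "1?0001010101", "101010??1010")

# Drop comment lines (assumed to start with '#')
data_lines <- lines[!grepl("^#", lines)]

# rbind() only gives a clean matrix if all lines have the same width
stopifnot(length(unique(nchar(data_lines))) == 1)

# One row per line, one column per character
chars <- do.call(rbind, strsplit(data_lines, ""))
df <- as.data.frame(chars, stringsAsFactors = FALSE)
print(df)
```

The columns stay character because of the "?" entries; they can be recoded to NA and converted to numeric afterwards if that is what "?" means in the data.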
Re: [R] Parsing a Simple Chemical Formula
On Dec 26, 2010, at 6:29 PM, Bryan Hanson wrote:
> [...] Let's say
>
> form <- "C5H11BrO"

Well here's how I see it: the form can be split with a regular expression: a capital letter followed by zero or one lower-case letters, followed by a variable number of digits.

> greg <- gregexpr("[A-Z]{1}[a-z]?[0-9]*", form)

Append a number equal to one more than the length, for reasons that will become clear:

> ugreg <- c(unlist(greg), nchar(form)+1)

Then use the substr function to serially pick from a split point to one minus the next split point (or, in the case of the last element, to the end of the string):

> sapply(1:(length(ugreg)-1), function(z) substr(form, ugreg[z], ugreg[z+1]-1) )
[1] "C5"  "H11" "Br"  "O"

Then you can split these triples (cap, lower, n) and if n is absent assume 1.

> sub("(\\d*)$", "", sapply(1:(length(ugreg)-1),   # blank out the digits
+     function(z) substr(form, ugreg[z], ugreg[z+1]-1) ) )
[1] "C"  "H"  "Br" "O"
> sub("^$", "1", sub("([A-Za-z]*)", "",   # subst 1 for empty strings
+     sapply(1:(length(ugreg)-1), function(z) substr(form, ugreg[z], ugreg[z+1]-1) ) ) )
[1] "5"  "11" "1"  "1"

If you limited the number of elements searched for, it might improve the error trapping, I suppose.

-- David.

David Winsemius, MD West Hartford, CT
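The split-then-clean steps described in this reply can also be written with `regmatches()`, which returns the matched substrings directly so the start positions do not have to be tracked by hand. This is a sketch only, not any poster's code: `parse_formula` is a hypothetical helper name, and the validity check (tokens must cover the whole string) is one simple choice among several.

```r
parse_formula <- function(form) {
  # One token per element: capital, optional lower-case letter, optional digits
  tokens <- regmatches(form, gregexpr("[A-Z][a-z]?[0-9]*", form))[[1]]

  # If the tokens do not cover the whole string, part of it was not matched
  if (sum(nchar(tokens)) != nchar(form)) {
    stop("Invalid formula: ", form)
  }

  element <- sub("[0-9]*$", "", tokens)     # strip trailing digits
  count <- sub("^[A-Za-z]+", "", tokens)    # strip leading letters
  count[count == ""] <- "1"                 # a missing count means 1

  data.frame(element = element, count = as.numeric(count),
             stringsAsFactors = FALSE)
}

res <- parse_formula("C5H11BrO")
print(res)
```

A call such as `parse_formula("Crr")` stops with an error because the trailing "r" is not part of any token.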
Re: [R] Parsing a Simple Chemical Formula
Well let me just say thanks and WOW! Four great ideas, each worthy of study and I'll learn several things from each. Interestingly, these solutions seem more general and more compact than the solutions I found on the 'net using python and perl. More evidence for the power of R! A big thanks to each of you! Bryan
Re: [R] Parsing a Simple Chemical Formula
Have you considered the 'CHNOSZ' package?

> makeup("C5H11BrO")
   count
C      5
H     11
Br     1
O      1

I found this using the 'sos' package as follows:

library(sos)
cf <- ???'chemical formula'
found 21 matches; retrieving 2 pages
cf

The print method for cf opened the results in a web browser, which showed that the CHNOSZ package had 14 of these 21 matches, and the other 7 were in 7 different packages. Moreover, the CHNOSZ package is devoted to Chemical Thermodynamics and Activity Diagrams and provides many more capabilities that might interest you.

Hope this helps.
Spencer

-- Spencer Graves, PE, PhD President and Chief Operating Officer Structure Inspection and Monitoring, Inc. 751 Emerson Ct. San José, CA 95126 ph: 408-655-4567
Re: [R] Parsing a Simple Chemical Formula
p.s. help(pac=CHNOSZ) reveals that this package has 3 vignettes. I have not looked at these vignettes, but most vignettes provide excellent introductions (though rarely with complete coverage) of important capabilities of the package. (The 'sos' package includes a vignette, which exposes more capabilities than the example in my previous message.)

Spencer

-- Spencer Graves, PE, PhD President and Chief Operating Officer Structure Inspection and Monitoring, Inc. 751 Emerson Ct. San José, CA 95126 ph: 408-655-4567
[R] Drop column from a data frame
I am trying to drop a column of a data frame. The code below attempts to drop a numeric column (which does not work but gives no error or warning) and a factor column (which does not work but gives an error). I would appreciate someone telling me why my code does not work, and suggesting code that will work.

Thanks,
John

rm(dfxyz,dfxz,dfxy)
# create the data frame.
dfxyz <- data.frame(x=1:10,y=11:20,z=factor(c(rep(0,5),rep(1,5))))
dfxyz
names(dfxyz)
# try to drop y column
# does not work, does not produce error message
dfxz <- dfxyz[,-(dfxyz$y)]
dfxz
# try to drop z column
# does not work, produces error message:
# In Ops.factor(df$z) : - not meaningful for factors
dfxy <- dfxyz[,-dfxyz$z]
dfxy

John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing)
Re: [R] lattice splom: how to adjust space between tick marks and tick labels?
On 2010-12-26 08:26, Marius Hofert wrote:
> Dear David,
> thank you for your answer. As I wrote, I am looking for an option to control the *space* between the tick marks and the corresponding labels. I am happy with the *number* of tick marks and their default values. As far as I know, pscales can't control the space, so it is *not* what I am looking for.

Marius, I think that you mean something like the following:

U <- matrix(runif(300), ncol = 3)
splom(U, par.settings = list( axis.components = list( left = list(pad1 = 3) ) ) )

which will adjust the left axis; you'll have to add right, top, bottom components to handle those as well. Have a look at what trellis.par.get() produces and check the axis.components section.

Peter Ehlers

> Cheers,
> Marius
>
> On 2010-12-26, at 14:36, David Winsemius wrote:
>> On Dec 26, 2010, at 5:41 AM, Marius Hofert wrote:
>>> Dear expeRts, how can I decrease the space between the tick marks and the corresponding labels in an splom? See here:
>>> library(lattice)
>>> U <- matrix(runif(4000), ncol = 8)
>>> splom(U, axis.text.cex = 0.2) # => space between the [small] tick labels and tick marks is/seems to be too large
>> So you want more tick marks?
>>> I checked ?panel.pairs but could not find an option for that.
>> What about the pscales argument? A single number would increase the number of ticks, or a list with at and labels values can be passed. Seems to be just what you asked for.
>> -- David Winsemius, MD West Hartford, CT
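Extending the suggestion above to all four sides might look like the sketch below. It assumes, as the reply indicates, that `pad1` in `axis.components` controls the gap between tick marks and labels; the value 0.3 is a hypothetical choice meant to tighten the spacing (the reply used 3 to make the effect visible), so treat it as a starting point to tune rather than a recommended setting.

```r
library(lattice)

set.seed(1)
U <- matrix(runif(300), ncol = 3)

# Same padding on every side; 0.3 is an illustrative value only
pad <- list(pad1 = 0.3)

p <- splom(U,
           par.settings = list(
             axis.components = list(left = pad, right = pad,
                                    top = pad, bottom = pad)))
# print(p) draws the plot
```

Passing the setting through `par.settings` keeps the change local to this plot instead of altering the global trellis theme.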
Re: [R] Parsing a Simple Chemical Formula
Thanks Spencer, I'll definitely have a look at this package and its vignettes. I believe I have looked at it before, but didn't catch it on this particular search. Bryan
Re: [R] Drop column from a data frame
assign NULL to the column:

> dfxyz <- data.frame(x=1:10,y=11:20,z=factor(c(rep(0,5),rep(1,5))))
> dfxyz
    x  y z
1   1 11 0
2   2 12 0
3   3 13 0
4   4 14 0
5   5 15 0
6   6 16 1
7   7 17 1
8   8 18 1
9   9 19 1
10 10 20 1
> dfxyz$y <- NULL
> dfxyz
    x z
1   1 0
2   2 0
3   3 0
4   4 0
5   5 0
6   6 1
7   7 1
8   8 1
9   9 1
10 10 1
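As for why the original attempts failed: a negative column index must refer to column positions, but `-(dfxyz$y)` evaluates to -11, -12, ..., -20, the negated *values* of y, and the data frame has only three columns, which matches the poster's observation that nothing was dropped. Negating a factor is not defined at all, hence the error for z. A few alternatives that keep the original data frame intact are sketched below; the names `dfxz`, `dfxy` and `dfx` follow the question, and the choice among the idioms is a matter of taste.

```r
dfxyz <- data.frame(x = 1:10, y = 11:20, z = factor(c(rep(0, 5), rep(1, 5))))

# Drop by name: keep every column except "y"
dfxz <- dfxyz[, setdiff(names(dfxyz), "y")]

# Drop by logical test on the names
dfxy <- dfxyz[, names(dfxyz) != "z"]

# Drop by position: which() turns the name into a column number
dfxz2 <- dfxyz[, -which(names(dfxyz) == "y")]

# subset() accepts negated column names; the result stays a data frame
dfx <- subset(dfxyz, select = -c(y, z))

print(names(dfxz))
print(names(dfxy))
print(names(dfx))
```

When only one column would remain, plain `[` indexing returns a vector unless `drop = FALSE` is added, which is one reason `subset()` is used for the last case.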
Re: [R] Parsing a Simple Chemical Formula
I think the OP had a very limited need, but there is something more sophisticated that may be of larger interest called SMILES, which attempts to capture some structural information about a molecule in a text string. Reducing pictures to tractable text is an important step in many analysis efforts, and I was curious what others may be able to say about R support for things like this. A quick Google search turned up this,

http://cran.r-project.org/web/packages/rpubchem/rpubchem.pdf

but I wasn't sure if there are more packages for manipulating different ball-and-stick collections (the atom and bond descriptions could just as easily represent any other collection of nodes and connections). You can get some idea of what this does by typing your favorite chemical name here,

http://pubchem.ncbi.nlm.nih.gov/

and the entries give something called Canonical SMILES structures. For example,

http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=8030&loc=ec_rcs

IUPAC Name: thiophene
Canonical SMILES: C1=CSC=C1
InChI: InChI=1S/C4H4S/c1-2-4-5-3-1/h1-4H
InChIKey: YTPLMLYBLZKORZ-UHFFFAOYSA-N

From: han...@depauw.edu
To: ggrothendi...@gmail.com
Date: Sun, 26 Dec 2010 20:01:45 -0500
CC: r-h...@stat.math.ethz.ch
Subject: Re: [R] Parsing a Simple Chemical Formula

> Well let me just say thanks and WOW! Four great ideas, each worthy of study and I'll learn several things from each. Interestingly, these solutions seem more general and more compact than the solutions I found on the 'net using python and perl. More evidence for the power of R! A big thanks to each of you!
> Bryan
>
> On Dec 26, 2010, at 7:26 PM, Gabor Grothendieck wrote:
>> On Sun, Dec 26, 2010 at 6:29 PM, Bryan Hanson wrote:
>>> Hello R Folks...
>>> I've been looking around the 'net and I see many complex solutions in various languages to this question, but I have a pretty simple need (and I'm not much good at regex). I want to use a chemical formula as a function argument. The formula would be in Hill order, which is to list C, then H, then all other elements in alphabetical order. My example will have only a limited number of elements, few enough that one can search directly for each element. So some examples would be C5H12, or C5H12O or C5H11BrO (note that for oxygen and bromine, O or Br, there is no following number, meaning a 1 is implied).
>>>
>>> Let's say
>>>
>>> form <- "C5H11BrO"
>>>
>>> I'd like to get the count of each element, so in this case I need to extract C and 5, H and 11, Br and 1, O and 1 (I want to calculate the molecular weight by multiplying). Sounds pretty simple, but my experiments with grep and strsplit don't immediately clue me into an obvious solution. As I said, I don't need a general solution to the problem of calculating molecular weight from an arbitrary formula, that seems quite challenging, just a way to convert form into a list or data frame which I can then do the math on.
>>>
>>> Here's hoping this is a simple issue for more experienced R users! TIA,
>>
>> This can be done by strapply in gsubfn. It matches the regular expression to the target string, passing the back references (the parenthesized portions of the regular expression) through a specified function as successive arguments.
>>
>> Thus the first arg is form, your input string. The second arg is the regular expression, which matches an upper case letter optionally followed by lower case letters, and all that is optionally followed by digits. The third arg is a function shown in a formula representation. strapply passes the back references (i.e. the portions within parentheses) to the function as the two arguments. Finally, simplify is another function in formula notation which turns the result into a matrix and then a data frame. Lastly, we make the second column of the data frame numeric.
>>
>> library(gsubfn)
>> DF <- strapply(form, "([A-Z][a-z]*)(\\d*)",
>>     ~ c(..1, if (nchar(..2)) ..2 else 1),
>>     simplify = ~ as.data.frame(t(matrix(..1, 2)), stringsAsFactors = FALSE))
>> DF[[2]] <- as.numeric(DF[[2]])
>>
>> DF looks like this:
>>
>>   V1 V2
>> 1  C  5
>> 2  H 11
>> 3 Br  1
>> 4  O  1
>>
>> --
>> Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Re: [R] Parsing a Simple Chemical Formula
On Dec 26, 2010, at 8:28 PM, Bryan Hanson wrote:
> Thanks Spencer, I'll definitely have a look at this package and its vignettes. I believe I have looked at it before, but didn't catch it on this particular search.
> Bryan

Using the thermo list that the makeup function accesses to get its valid atomic symbols, one can arrive at the answer you posited would be too difficult in your first posting, the atomic weight from the formula:

str(thermo$element)
'data.frame': 130 obs. of 6 variables:
 $ element: chr "Z" "O" "H" "He" ...
 $ state  : chr "aq" "gas" "gas" "gas" ...
 $ source : chr "CWM89" "CWM89" "CWM89" "CWM89" ...
 $ mass   : num 0 16 1.01 4 20.18 ...
 $ s      : num -15.6 49 31.2 30.2 35 ...
 $ n      : int 1 2 2 1 1 1 1 1 2 2 ...

patts <- paste("^", rownames(makeup(form)), "$", sep="")
makuform <- makeup(form)
makuform$amass <- sapply(patts, function(x) {return( thermo$element[ grep(x, thermo$element[[1]])[1], "mass"])} )
sum(makuform$amass * makuform$count)
# [1] 167.0457

> On Dec 26, 2010, at 8:16 PM, Spencer Graves wrote:
>> p.s. help(pac=CHNOSZ) reveals that this package has 3 vignettes. I have not looked at these vignettes, but most vignettes provide excellent introductions (though rarely with complete coverage) of important capabilities of the package. (The 'sos' package includes a vignette, which exposes more capabilities than the example below.)
>>
>> ##
>> Have you considered the 'CHNOSZ' package?
>>
>> makeup("C5H11BrO")
>>    count
>> C      5
>> H     11
>> Br     1
>> O      1
>>
>> I found this using the 'sos' package as follows:
>>
>> library(sos)
>> cf <- ???'chemical formula'
>> found 21 matches;  retrieving 2 pages
>> cf
>>
>> The print method for cf opened the results in a web browser, which showed that the CHNOSZ package had 14 of these 21 matches, and the other 7 were in 7 different packages. Moreover, the CHNOSZ package is devoted to Chemical Thermodynamics and Activity Diagrams and provides many more capabilities that might interest you.
>>
>> Hope this helps.
>> Spencer
[R] filled.contour colors
Hi,

I am trying to set the color scale in filled.contour based on a specific value instead of a relative position. Specifically, I want the values below 0 to be in a gradient of green, and those above 0 to be red. 0 would be white.

I tried:

posZero = abs(min(z)) / (abs(min(z)) + max(z));
filled.contour(..., col = designer.colors(n=30, col=c("green", "white", "red"), x=c(0, posZero, 1)))

but it does not center the white on the zero.

Thanks for your help,
Rand
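One possible approach, sketched in base R only (it does not use designer.colors, so it is a swapped-in technique rather than a fix of the call above): make the contour levels symmetric around zero and build the palette in two halves that meet at white. The helper name `zero_centred_palette` and the argument `n_each` are made up for this illustration.

```r
# Build contour levels symmetric about 0 and a green-white-red palette whose
# white end sits exactly at the zero level.
zero_centred_palette <- function(z, n_each = 15) {
  lim <- max(abs(range(z, finite = TRUE)))
  levels <- seq(-lim, lim, length.out = 2 * n_each + 1)
  neg <- colorRampPalette(c("green", "white"))(n_each)  # below zero
  pos <- colorRampPalette(c("white", "red"))(n_each)    # above zero
  list(levels = levels, col = c(neg, pos))
}

# Example data: the built-in volcano matrix shifted to straddle zero.
z <- volcano - mean(volcano)
pal <- zero_centred_palette(z)

# filled.contour() expects one colour per interval between levels.
if (interactive()) {
  filled.contour(z, levels = pal$levels, col = pal$col)
}
```

The cost of the symmetric range is that part of the colour scale goes unused when the data are lopsided around zero; splitting `n_each` unevenly between the two halves would be one way around that.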
Re: [R] Parsing a Simple Chemical Formula
Hi David & others...

I did find the function you recommended, plus, it's even easier (but a little hidden in the doc): element(form, "mass"). But this uses the atomic masses from the periodic table, which are weighted averages of the isotopes of each element. What I'm doing actually involves mass spectrometry, so I need the isotope masses, which are integers (think 12C, 13C, 14C, but the periodic table says 12.011, reflecting the relative abundances).

I used Gabor's solution and got my little function humming. Plus, I have several things to read through from the various recommendations.

Thanks again,
Bryan
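For the integer isotope masses Bryan describes, a minimal base R sketch might look like the following. It is not the function he wrote; `iso_mass` and `nominal_mass` are hypothetical names, and the small lookup table (12C, 1H, 79Br, 16O) is only an assumption about which isotopes are wanted.

```r
# Illustrative lookup of integer (nominal) isotope masses; extend or change it
# for whichever isotopes are actually needed.
iso_mass <- c(C = 12, H = 1, Br = 79, O = 16)

nominal_mass <- function(form, masses = iso_mass) {
  # Split "C5H11BrO" into tokens such as "C5", "H11", "Br", "O".
  tokens <- regmatches(form, gregexpr("[A-Z][a-z]*[0-9]*", form))[[1]]
  element <- sub("[0-9]+$", "", tokens)
  digits <- sub("^[A-Za-z]+", "", tokens)
  digits[!nzchar(digits)] <- "1"        # a missing count means 1
  count <- as.numeric(digits)
  unknown <- setdiff(element, names(masses))
  if (length(unknown) > 0) {
    stop("no mass for: ", paste(unknown, collapse = ", "))
  }
  sum(masses[element] * count)
}

nominal_mass("C5H11BrO")  # 5*12 + 11*1 + 79 + 16 = 166
```

Like the solutions in the thread, this handles only flat formulas; parentheses, charges and hydrates are out of scope.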
Re: [R] Parsing a Simple Chemical Formula
On Sun, Dec 26, 2010 at 7:26 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> library(gsubfn)
> DF <- strapply(form, "([A-Z][a-z]*)(\\d*)",
>     ~ c(..1, if (nchar(..2)) ..2 else 1),
>     simplify = ~ as.data.frame(t(matrix(..1, 2)), stringsAsFactors = FALSE))
> DF[[2]] <- as.numeric(DF[[2]])
>
> DF looks like this:
>
>   V1 V2
> 1  C  5
> 2  H 11
> 3 Br  1
> 4  O  1

Here is a variation that is slightly simpler. The function in the third argument has been changed from c to paste so that it outputs strings like "C 5". With this form of output we can use read.table to read it directly, creating a data frame.

strapply(form,
    "([A-Z][a-z]*)(\\d*)",
    ~ paste(..1, if (nchar(..2)) ..2 else 1),
    simplify = ~ read.table(textConnection(..1)))

  V1 V2
1  C  5
2  H 11
3 Br  1
4  O  1

--
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
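The read.table step of this variation can be tried on its own, without gsubfn. The sketch below feeds it the kind of strings the paste call produces; `element_lines` is a made-up name, and the `colClasses` argument is an addition of mine, there because an element column consisting only of "F" or "T" would otherwise be converted to logical.

```r
# Strings of the form "<element> <count>", as produced by the paste() step.
element_lines <- c("C 5", "H 11", "Br 1", "O 1")

# Fix the column types explicitly instead of relying on type conversion.
DF <- read.table(text = element_lines,
                 col.names = c("element", "count"),
                 colClasses = c("character", "numeric"))
DF
```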
Re: [R] Parsing a Simple Chemical Formula
Mike Marchywka's post mentioned a CRAN package, rpubchem, missed by my search for "chemical formula". A further search for "chemical" and "chemistry" still missed it. "compound" found it. Adding "compounds" and combining them with "union" produced a list of 564 links in 219 packages; 7 of the help pages were for rpubchem. The package with the most matches is seacarb (seawater carbonate chemistry with R: 21 matches), followed by CHNOSZ, previously mentioned (19 matches). rpubchem is the 22nd package on this list (5 matches, with a max score of 32, less than the max score of 2 other packages with 5 matches).

Spencer
[R] modifying user agent strings in http requests
Hi all.

How does one change user agent strings in http requests made in R? And how do I figure out what my current user agent string looks like?

Thanks in advance,
Soumendra

--
Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats. --- Howard Aiken
Re: [R] how to replace my double for loop which is little efficient!
Hi:

On Sun, Dec 26, 2010 at 4:18 AM, bbslover dlu...@yeah.net wrote:
> Dear all,
> My double for loop is as follows, but it is not very efficient. I hope all friends can give me a vectorized program to replace my code. thanks
>
> x: is a matrix 202*263, that is 202 samples, and 263 independent variables
>
> num.compd<-nrow(x); # number of compounds
> diss.all<-0
> for( i in 1:num.compd)
>   for (j in 1:num.compd)
>     if (i!=j) {

Isn't this just X'X?

>       S1<-sum(x[i,]*x[j,])

Aren't each of S2 and S3 just diag(X'X)?

>       S2<-sum(x[i,]^2)
>       S3<-sum(x[j,]^2)
>       sim2<-S1/(S2+S3-S1)
>       diss2<-1-sim2
>       diss.all<-diss.all+diss2}

I tried

s1 <- crossprod(x)
s2 <- diag(s1)
s3 <- outer(s2, s2, '+') - s1
s1/s3

This yields a symmetric matrix with 1's along the diagonal and quantities between 0 and 1 in the off-diagonal. Something like it could conceivably be used as a similarity matrix. Is that what you're looking for with sim2?

I agree with Berend: it looks like a problem that could be easily solved with some matrix algebra. R can do matrix algebra quite efficiently, y'know...

(BTW, I tried this on a 1000 x 1000 input matrix:

system.time(myfunc(x))
   user  system elapsed
   0.99    0.02    1.02

I expect it could be improved by an order of magnitude if one actually knew what you were computing... )

HTH,
Dennis

> it will cost a long time to finish this computation! i really need rapid code to replace my code.
> thanks
> kevin
Re: [R] Drop column from a data frame
John -

You can use a syntax similar to what you've tried with the select= argument of the subset function:

subset(dfxyz,select=-y)
  x z
1 1 0
2 2 0
. . .

subset(dfxyz,select=-z)
  x  y
1 1 11
2 2 12
. . .

- Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley spec...@stat.berkeley.edu
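The same idea extends to several columns at once. A small sketch, assuming the dfxyz from the question; `drop_cols`, `kept` and `same` are names chosen here for illustration. `subset()` is convenient interactively, while the `setdiff()` form works with column names held in a variable.

```r
dfxyz <- data.frame(x = 1:10, y = 11:20,
                    z = factor(c(rep(0, 5), rep(1, 5))))

# Names of the columns to remove, held in an ordinary character vector.
drop_cols <- c("y", "z")

# Keep everything except drop_cols; drop = FALSE keeps the result a data frame
# even when a single column remains.
kept <- dfxyz[, setdiff(names(dfxyz), drop_cols), drop = FALSE]

# Equivalent interactive form with subset().
same <- subset(dfxyz, select = -c(y, z))
```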
Re: [R] R2WinBugs data import error
You solved my problem, thank you. As you said, it's the type of the content in the matrix that caused the problem. I needed to put variable x along with other variables into the list; somehow it turned out that x must be used in the form of a character in the statement:

dat <- list("x", otherVariables)

Anyway, my code works well now. Thanks for your help.
[R] package update
I'm running Linux Ubuntu and tried to update my packages using the update.packages() command. It appeared to download the updates ok but I got the following message:

The downloaded packages are in
        ‘/tmp/RtmpFM82Ry/downloaded_packages’
Warning in install.packages(update[instlib == l, "Package"], l, contriburl = contriburl, :
  'lib = "/usr/lib/R/site-library"' is not writable
Error in install.packages(update[instlib == l, "Package"], l, contriburl = contriburl, :
  unable to install packages
Calls: update.packages -> install.packages

What does this mean? And more importantly, how do I address it?
Re: [R] how to replace my double for loop which is little efficient!
Thanks for your help, it is great. In addition: in the beginning, x was a data frame, and when I ran my code it was very slow. After your help, I changed x to a matrix, and it is now very quick. I am very grateful for your kind help, and your code is so good!

kevin
Re: [R] how to replace my double for loop which is little efficient!
Thanks for your help. I am sorry, I do not fully understand your code, so I cannot correctly apply it to my data. Attached is my data, and what I want to compute is the equation in the Word document in the attachment. The code from Berend gives the answer I want.

http://r.789695.n4.nabble.com/file/n3164741/my_data.rar my_data.rar
Re: [R] package update
Either switch the library path to a writable directory or run R as root (via su or sudo) so you have the necessary permissions.

Cheers,
Josh

On Dec 26, 2010, at 20:45, eric ericst...@aol.com wrote:
> I'm running Linux Ubuntu and tried to update my packages using the update.packages() command. It appeared to download the updates ok but I got the following message:
>
> 'lib = "/usr/lib/R/site-library"' is not writable
> Error in install.packages(update[instlib == l, "Package"], l, contriburl = contriburl, : unable to install packages
>
> What does this mean? And more importantly, how do I address it?
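The first option (a writable library) could be set up roughly as follows. This is a sketch under assumptions: `use_personal_library` is a hypothetical helper, and the default location is whatever the R_LIBS_USER environment variable points to on your system.

```r
# Create a personal library directory (if needed) and put it first on the
# library search path so that installs and updates go there.
use_personal_library <- function(path = Sys.getenv("R_LIBS_USER")) {
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  .libPaths(c(path, .libPaths()))
  invisible(.libPaths()[1])
}

# Demonstrated on a throwaway directory; in real use call it with no argument.
demo_lib <- use_personal_library(tempfile("Rlib"))

# file.access() returns 0 when the directory is writable (mode = 2).
writable <- file.access(demo_lib, mode = 2) == 0

# Updated versions are then installed into the writable library.
if (interactive()) {
  update.packages(instlib = .libPaths()[1], ask = FALSE)
}
```

Packages that Ubuntu installed under /usr/lib/R are left untouched by this; the newer copies in the personal library simply take precedence because that library comes first in .libPaths().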
Re: [R] lattice splom: how to adjust space between tick marks and tick labels?
Dear Peter,

thank you very much, *precisely* what I was looking for!

Cheers, Marius

On 2010-12-27, at 02:27 , Peter Ehlers wrote:

On 2010-12-26 08:26, Marius Hofert wrote:

Dear David, thank you for your answer. As I wrote, I am looking for an option to control the *space* between the tick marks and the corresponding labels. I am happy with the *number* of tick marks and their default values. As far as I know, pscales can't control the space, so it is *not* what I am looking for.

Marius, I think that you mean something like the following:

    U <- matrix(runif(300), ncol = 3)
    splom(U, par.settings = list(axis.components = list(left = list(pad1 = 3))))

which will adjust the left axis; you'll have to add right, top, and bottom components to handle those as well. Have a look at what trellis.par.get() produces and check the axis.components section.

Peter Ehlers

Cheers, Marius

On 2010-12-26, at 14:36 , David Winsemius wrote:

On Dec 26, 2010, at 5:41 AM, Marius Hofert wrote:

Dear expeRts, how can I decrease the space between the tick marks and the corresponding labels in an splom? See here:

    library(lattice)
    U <- matrix(runif(4000), ncol = 8)
    splom(U, axis.text.cex = 0.2) # => space between the [small] tick labels and tick marks is/seems to be too large

So you want more tick marks?

I checked ?panel.pairs but could not find an option for that.

What about the pscales argument? A single number would increase the number of ticks, or a list with at and labels values can be passed. Seems to be just what you asked for.

-- David Winsemius, MD West Hartford, CT
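Peter's suggestion only sets the left axis. A sketch that applies the same pad1 setting to all four sides might look like this; the value 0.2 is an arbitrary choice for illustration, and the names `sides`, `axis_pad` and `p` are not from the thread.

```r
library(lattice)

set.seed(1)
U <- matrix(runif(300), ncol = 3)

# One list(pad1 = ...) entry per side; 0.2 is an arbitrary illustrative value.
sides <- c("left", "right", "top", "bottom")
axis_pad <- setNames(lapply(sides, function(s) list(pad1 = 0.2)), sides)

p <- splom(U, par.settings = list(axis.components = axis_pad))
print(p)

# Inspect the current settings to see which components can be adjusted.
str(trellis.par.get("axis.components"))
```

Passing the settings through par.settings keeps the change local to this plot, so other lattice plots in the session keep their defaults.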
Re: [R] how to replace my double for loop which is little efficient!
djmuseR wrote:

On Sun, Dec 26, 2010 at 4:18 AM, bbslover dlu...@yeah.net wrote:

x is a 202 x 263 matrix, that is, 202 samples and 263 independent variables.

    num.compd <- nrow(x)  # number of compounds
    diss.all <- 0
    for (i in 1:num.compd)
      for (j in 1:num.compd)
        if (i != j) {

Isn't this just X'X?

          S1 <- sum(x[i,] * x[j,])

Aren't each of S2 and S3 just diag(X'X)?

          S2 <- sum(x[i,]^2)
          S3 <- sum(x[j,]^2)
          sim2 <- S1 / (S2 + S3 - S1)
          diss2 <- 1 - sim2
          diss.all <- diss.all + diss2
        }

I tried

    s1 <- crossprod(x)
    s2 <- diag(s1)
    s3 <- outer(s2, s2, '+') - s1
    s1/s3

This yields a symmetric matrix with 1's along the diagonal and quantities between 0 and 1 in the off-diagonal. Something like it could conceivably be used as a similarity matrix. Is that what you're looking for with sim2?

I agree with Berend: it looks like a problem that could be easily solved with some matrix algebra. R can do matrix algebra quite efficiently, y'know...

(BTW, I tried this on a 1000 x 1000 input matrix:

    system.time(myfunc(x))
       user  system elapsed
       0.99    0.02    1.02

I expect it could be improved by an order of magnitude if one actually knew what you were computing...)

I did some more work along Dennis' lines:

    xtx <- tcrossprod(x)
    xtd <- diag(xtx)
    xzz <- outer(xtd, xtd, '+')
    zz <- 1 - xtx / (xzz - xtx)
    diss.all <- sum(zz)

This appears to give the desired result, and it's quite a bit faster than my alternative 2. It would indeed be nice to know what is being computed.

Berend

-- View this message in context: http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164755.html Sent from the R help mailing list archive at Nabble.com.
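To check that Berend's tcrossprod() version reproduces the original double loop, one can run both on a small random matrix. The sketch below uses a 20 x 5 stand-in rather than the poster's 202 x 263 data, and the function names `loop_diss` and `matrix_diss` are made up for illustration. The ratio S1/(S2+S3-S1) has the form of the Tanimoto coefficient between rows, although the thread does not confirm that this is what the poster intended.

```r
set.seed(42)
x <- matrix(runif(20 * 5), nrow = 20)  # small stand-in for the 202 x 263 data

# The original double loop over all ordered pairs of distinct rows.
loop_diss <- function(x) {
  n <- nrow(x)
  total <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i != j) {
        s1 <- sum(x[i, ] * x[j, ])
        s2 <- sum(x[i, ]^2)
        s3 <- sum(x[j, ]^2)
        total <- total + (1 - s1 / (s2 + s3 - s1))
      }
    }
  }
  total
}

# The matrix version: tcrossprod(x) is x %*% t(x), the row-by-row inner products.
matrix_diss <- function(x) {
  xtx <- tcrossprod(x)
  xtd <- diag(xtx)                     # squared norm of each row
  denom <- outer(xtd, xtd, "+") - xtx  # S2 + S3 - S1 for every pair
  sum(1 - xtx / denom)                 # diagonal terms are 1 - 1 = 0
}

print(all.equal(loop_diss(x), matrix_diss(x)))
```

The loop works on rows of x, which is why tcrossprod() (rows against rows) is needed here; crossprod() would instead compare the columns. The diagonal of the similarity matrix is 1, so including the i == j terms in the matrix sum adds nothing, up to floating-point error.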