Re: [R] vectorized sub, gsub, grep, etc.
Adam, You reopened an old thread noting its age, but did you begin at its beginning? > Subject: vectorized sub, gsub, grep, etc.> Date: Oct 7, 2008 > R pattern-matching and replacement functions are > vectorized: they can operate on vectors of targets. > However, they can only use one pattern and replacement. > Here is code to apply a different pattern and replacement > for every target. My question: can it be done better? sub2 <- function(pattern, replacement, x) { len <- length(x) if (length(pattern) == 1) pattern <- rep(pattern, len) if (length(replacement) == 1) replacement <- rep(replacement, len) FUN <- function(i, ...) { sub(pattern[i], replacement[i], x[i], fixed = TRUE) } idx <- 1:length(x) sapply(idx, FUN) } #Example X <- c("ab", "cd", "ef") patt <- c("b", "cd", "a") repl <- c("B", "CD", "A") sub2(patt, repl, X) If you run that code, you'll see the correct answer is not "" "CD" "", it is [1] "aB" "CD" "ef" And the same answer is given by the shorter (but slower) code suggested later that day by Christos mapply(function(p, r, x) sub(p, r, x, fixed = TRUE), p=patt, r=repl, x=X) b cd a "aB" "CD" "ef" By talking instead about simple string matching, I'm afraid you've rather hijacked the thread. -John -John Adam wrote > I'm not sure I understand your question. Both functions return "" "CD" "" > because they > perform exact string matching. The first demonstrates how string or character > replacements > can be vectorized, while the second merely demonstrates how Rcpp can > accelerate this type of operation. On Jul 30, 2015, at 21:09, John Thaden wrote: | Can you show what is its solution for the original sample data? Why that discrepancy for you original sub2() function? | From:"Adam Erickson" Date:Thu, Jul 30, 2015 at 6:11 pm Subject:Re: [R] vectorized sub, gsub, grep, etc. Here is a Rcpp version for exact character matching (for example) written in C++ that is substantially faster. Hence, I think this is the way to go where loops may be unavoidable. However, the input vector length has to match the length of the pattern and replacement vectors, as your original code did. That can be changed though. #include using namespace Rcpp; // [[Rcpp::export]]CharacterVector subCPP(CharacterVector pattern, CharacterVector replacement, CharacterVector x) { int len = x.size(); CharacterVector y(len); int patlen = pattern.size(); int replen = replacement.size(); if (patlen != replen) Rcout<<"Error: Pattern and replacement length do not match"; for(int i = 0; i < patlen; ++i) { if (*(char*)x[i] == *(char*)pattern[i]) y[x[i] == pattern[i]] = replacement[i]; } return y;} "" "CD" "" system.time(for(i in 1:5) subCPP(patt, repl, X)) user system elapsed 0.16 0.00 0.16 Cheers, Adam On Wednesday, July 29, 2015 at 2:42:23 PM UTC-7, Adam Erickson wrote: Further refining the vectorized (within a loop) exact string match function, I get times below 0.9 seconds while maintaining error checking. This is accomplished by removing which() and replacing 1:length() with seq_along(). sub2 <- function(pattern, replacement, x) { len <- length(x) y <- character(length=len) patlen <- length(pattern) replen <- length(replacement) if(patlen != replen) stop('Error: Pattern and replacement length do not match') for(i in seq_along(pattern)) { y[x==pattern[i]] <- replacement[i] } return(y) } system.time(for(i in 1:5) sub2(patt, repl, X)) user system elapsed 0.86 0.00 0.86 Since the ordered vectors are perfectly aligned, might as well do an exact string match. Hence, I think this is not off-topic. Cheers, Adam On Wednesday, July 29, 2015 at 8:15:52 AM UTC-7, Bert Gunter wrote: There is confusion here. apply() family functions are **NOT** vectorization -- they ARE loops (at the interpreter level), just done in "functionalized" form. Please read background material (John Chambers's books, MASS, or numerous others) to improve your understanding and avoid posting erroneous comments. Cheers, Bert Bert Gunter "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." -- Clifford Stoll On Tue, Jul 28, 2015 at 3:00 PM, John Thaden wrote: > Adam, The method you propose gives a different result than the prior > methods for these example vectors > X <- c("ab", "cd", "ef") > patt <- c("b", "cd", "a") > repl <- c("B", "CD", "A") > > Old method 1 > > mapply(function(p, r, x) sub(p, r, x, fixed = TRUE), p=patt, r=repl, x=X) > gives > b cd a > "aB" "CD" "ef" > > Old method 2 > > sub2 <- function(pattern, replacement, x) { > len <- length(x) > if (length(pattern) == 1) > pattern <- rep(pattern, len) > if (length(replacement) == 1) > replacement <- rep(replacement, len) > FUN <- function(i, ...) {
[R] Error when compiling R-2.5.1 / *** [d-p-q-r-tests.Rout] Fehler 1
Hi everyone, I am new to Linux and R - but I managed to build R-2.5.1 from source to use it in Genepattern. Genepattern does only support R-2.5.1 which I could not find anywhere for installation via apt-get or in the Ubuntu Software-Centre (I am using Ubuntu 14.04 (Trusty Tahr) 32-bit) But after doing make check I get comparing 'method-dispatch.Rout' to './method-dispatch.Rout.save' ... OK running code in 'd-p-q-r-tests.R' ...make[3]: *** [d-p-q-r-tests.Rout] Fehler 1 make[3]: Verzeichnis »/home/karin/Downloads/R-2.5.1/tests« wird verlassen make[2]: *** [test-Specific] Fehler 2 make[2]: Verzeichnis »/home/karin/Downloads/R-2.5.1/tests« wird verlassen make[1]: *** [test-all-basics] Fehler 1 make[1]: Verzeichnis »/home/karin/Downloads/R-2.5.1/tests« wird verlassen make: *** [check] Fehler 2 but I can make install and use R for simple plots etc. afterwards - still I am worried something is wrong, can you give some advice. A closer look at the error gives > ## PR#7099 : pf() with large df1 or df2: > nu <- 2^seq(25,34, 0.5) > y <- 1e9*(pf(1,1,nu) - 0.68268949) > stopifnot(All.eq(pf(1,1,Inf), 0.68268949213708596), + diff(y) > 0, # i.e. pf(1,1, *) is monotone increasing + All.eq(y [1], -5.07420372386491), + All.eq(y[19], 2.12300110824515)) Error: All.eq(y[1], -5.07420372386491) is not TRUE Execution halted As I understand so far some errors are critical some are not - can you please give some advice on the error above? Can I still use R installed with that error? What do I need to solve the error? Thanks, Joerg [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Clarification on Simulation and Iteration
On Jul 31, 2015, at 8:41 PM, Christopher Kelvin wrote: > Thanks Dave. > > What I actually want is to obtain say 10, different sets of (n=50) data for > every 10,000 iterations I run. You will realise that the current code > produces one set of data (n=50). I want 10 different sets of 50 observations > at one run. I hope this makes sense. I would think that either `replicate(50, ...)` or `for(i in 1:50) {...}` would suffice. Unless of course the phrase "want 10 different sets of 50 observations at one run` means something different than it appears to request. > > Chris Guure > > > > On Saturday, August 1, 2015 3:32 AM, David Winsemius > wrote: > > > On Jul 31, 2015, at 6:36 PM, Christopher Kelvin via R-help wrote: > >> Dear All, >> I am performing some simulations for a new model. I run about 10,000 >> iterations with a sample of 50 datasets and this returns one set of 50 >> simulated data. >> >> Now, what I need to obtain is 10 sets of the 50 simulated data out of the >> 10,000 iterations and not just only 1 set. The model is the Copas selection >> model for publication bias in Mete-analysis. Any one who knows this model >> has any suggestion for the improvement of my code is most welcome. >> >> Below is my code. >> >> >> Kind regards >> >> >> Chris Guure >> University of Ghana >> >> >> >> >> install.packages("msm") >> library(msm) >> >> >> rho1=-0.3; tua=0.020; n=50; d=-0.2; rr=1; a=-1.3; b=0.06 >> si<-rtnorm(n, mean=0, sd=1, lower=0, upper=0.2)# I used this to generate >> standard errors for each study >> set.seed(2) ## I have stored the data and the output in this seed >> >> for( i in 1:rr){ >> >> mu<-rnorm(n,d,tua^2) # prob. of each effect estimate >> rho<-si*rho1/sqrt(tua^2 + si^2) # estimate of the correlation coefficient >> mu0<- a + b/si # mean of the truncated normal model (Copas selection >> model) >> y1<-rnorm(mu,si^2)# observed effects zise >> z<-rnorm(mu0,1) # selection model >> rho2<-cor(y1, z) >> >> select<-pnorm((mu0 + rho*(y1-mu)/sqrt(tua^2 + si^2))/sqrt(1-rho^2)) >> probselect<-ifelse(select> >> probselect >> data<-data.frame(probselect,si)# this contains both include and exclude >> data >> data >> data1<-data[complete.cases(data),] # Contains only the included data for >> analysis >> data1 >> >> >> } >> > > OK. The code runs without error. So what exactly is the problem? (I have > no experience with the Copas selection model if in fact that is what is being > exemplified.) > > -- > > David Winsemius > Alameda, CA, USA David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] vectorized sub, gsub, grep, etc.
I'm not sure I understand your question. Both functions return "" "CD" "" because they perform exact string matching. The first demonstrates how string or character replacements can be vectorized, while the second merely demonstrates how Rcpp can accelerate this type of operation. Cheers, Adam > On Jul 30, 2015, at 21:09, John Thaden wrote: > > > Can you show what is its solution for the original sample data? Why that > discrepancy for you original sub2() function? > > From:"Adam Erickson" > Date:Thu, Jul 30, 2015 at 6:11 pm > Subject:Re: [R] vectorized sub, gsub, grep, etc. > > Here is a Rcpp version for exact character matching (for example) written in > C++ that is substantially faster. Hence, I think this is the way to go where > loops may be unavoidable. However, the input vector length has to match the > length of the pattern and replacement vectors, as your original code did. > That can be changed though. > > #include > using namespace Rcpp; > > // [[Rcpp::export]] > CharacterVector subCPP(CharacterVector pattern, CharacterVector replacement, > CharacterVector x) { > int len = x.size(); > CharacterVector y(len); > int patlen = pattern.size(); > int replen = replacement.size(); > if (patlen != replen) > Rcout<<"Error: Pattern and replacement length do not match"; > for(int i = 0; i < patlen; ++i) { > if (*(char*)x[i] == *(char*)pattern[i]) > y[x[i] == pattern[i]] = replacement[i]; > } > return y; > } > > "" "CD" "" > > system.time(for(i in 1:5) subCPP(patt, repl, X)) >user system elapsed >0.160.000.16 > > Cheers, > > Adam > >> On Wednesday, July 29, 2015 at 2:42:23 PM UTC-7, Adam Erickson wrote: >> Further refining the vectorized (within a loop) exact string match function, >> I get times below 0.9 seconds while maintaining error checking. This is >> accomplished by removing which() and replacing 1:length() with seq_along(). >> >> sub2 <- function(pattern, replacement, x) { >>len<- length(x) >>y <- character(length=len) >>patlen <- length(pattern) >>replen <- length(replacement) >>if(patlen != replen) stop('Error: Pattern and replacement length do not >> match') >>for(i in seq_along(pattern)) { >> y[x==pattern[i]] <- replacement[i] >>} >>return(y) >> } >> >> system.time(for(i in 1:5) sub2(patt, repl, X)) >>user system elapsed >>0.860.000.86 >> >> Since the ordered vectors are perfectly aligned, might as well do an exact >> string match. Hence, I think this is not off-topic. >> >> Cheers, >> >> Adam >> >>> On Wednesday, July 29, 2015 at 8:15:52 AM UTC-7, Bert Gunter wrote: >>> There is confusion here. apply() family functions are **NOT** >>> vectorization -- they ARE loops (at the interpreter level), just done >>> in "functionalized" form. Please read background material (John >>> Chambers's books, MASS, or numerous others) to improve your >>> understanding and avoid posting erroneous comments. >>> >>> Cheers, >>> Bert >>> >>> >>> Bert Gunter >>> >>> "Data is not information. Information is not knowledge. And knowledge >>> is certainly not wisdom." >>>-- Clifford Stoll >>> >>> >>> On Tue, Jul 28, 2015 at 3:00 PM, John Thaden wrote: >>> > Adam,The method you propose gives a different result than the prior >>> > methods for these example vectors >>> > X <- c("ab", "cd", "ef") >>> > patt <- c("b", "cd", "a") >>> > repl <- c("B", "CD", "A") >>> > >>> > Old method 1 >>> > >>> > mapply(function(p, r, x) sub(p, r, x, fixed = TRUE), p=patt, r=repl, x=X) >>> > gives >>> > b cda >>> > "aB" "CD" "ef" >>> > >>> > Old method 2 >>> > >>> > sub2 <- function(pattern, replacement, x) { >>> > len <- length(x) >>> > if (length(pattern) == 1) >>> > pattern <- rep(pattern, len) >>> > if (length(replacement) == 1) >>> > replacement <- rep(replacement, len) >>> > FUN <- function(i, ...) { >>> > sub(pattern[i], replacement[i], x[i], fixed = TRUE) >>> > } >>> > idx <- 1:length(x) >>> > sapply(idx, FUN) >>> > } >>> > sub2(patt, repl, X) >>> > gives >>> > [1] "aB" "CD" "ef" >>> > >>> > Your method (I gave it the unique name "sub3") >>> > sub3 <- function(pattern, replacement, x) { len<- length(x) y >>> > <- character(length=len) patlen <- length(pattern) replen <- >>> > length(replacement) if(patlen != replen) stop('Error: Pattern and >>> > replacement length do not match') for(i in 1:replen) { >>> > y[which(x==pattern[i])] <- replacement[i] } return(y)}sub3(patt, repl, >>> > X) >>> > gives[1] "" "CD" "" >>> > >>> > Granted, whatever it does, it does it faster >>> > #Old method 1 >>> > system.time(for(i in 1:5) >>> > mapply(function(p,r,x) sub(p,r,x, fixed = TRUE),p=patt,r=repl,x=X)) >>> >user system elapsed >>> >2.530.002.52 >>> > >>> > #Old method 2 >>> > system.time(for(i in 1:5)sub2
Re: [R] Clarification on Simulation and Iteration
On Jul 31, 2015, at 6:36 PM, Christopher Kelvin via R-help wrote: > Dear All, > I am performing some simulations for a new model. I run about 10,000 > iterations with a sample of 50 datasets and this returns one set of 50 > simulated data. > > Now, what I need to obtain is 10 sets of the 50 simulated data out of the > 10,000 iterations and not just only 1 set. The model is the Copas selection > model for publication bias in Mete-analysis. Any one who knows this model has > any suggestion for the improvement of my code is most welcome. > > Below is my code. > > > Kind regards > > > Chris Guure > University of Ghana > > > > > install.packages("msm") > library(msm) > > > rho1=-0.3; tua=0.020; n=50; d=-0.2; rr=1; a=-1.3; b=0.06 > si<-rtnorm(n, mean=0, sd=1, lower=0, upper=0.2)# I used this to generate > standard errors for each study > set.seed(2) ## I have stored the data and the output in this seed > > for( i in 1:rr){ > > mu<-rnorm(n,d,tua^2) # prob. of each effect estimate > rho<-si*rho1/sqrt(tua^2 + si^2) # estimate of the correlation coefficient > mu0<- a + b/si # mean of the truncated normal model (Copas selection > model) > y1<-rnorm(mu,si^2)# observed effects zise > z<-rnorm(mu0,1) # selection model > rho2<-cor(y1, z) > > select<-pnorm((mu0 + rho*(y1-mu)/sqrt(tua^2 + si^2))/sqrt(1-rho^2)) > probselect<-ifelse(select > probselect > data<-data.frame(probselect,si)# this contains both include and exclude > data > data > data1<-data[complete.cases(data),] # Contains only the included data for > analysis > data1 > > > } > OK. The code runs without error. So what exactly is the problem? (I have no experience with the Copas selection model if in fact that is what is being exemplified.) -- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Clarification on Simulation and Iteration
Thanks Dave. What I actually want is to obtain say 10, different sets of (n=50) data for every 10,000 iterations I run. You will realise that the current code produces one set of data (n=50). I want 10 different sets of 50 observations at one run. I hope this makes sense. Chris Guure On Saturday, August 1, 2015 3:32 AM, David Winsemius wrote: On Jul 31, 2015, at 6:36 PM, Christopher Kelvin via R-help wrote: > Dear All, > I am performing some simulations for a new model. I run about 10,000 > iterations with a sample of 50 datasets and this returns one set of 50 > simulated data. > > Now, what I need to obtain is 10 sets of the 50 simulated data out of the > 10,000 iterations and not just only 1 set. The model is the Copas selection > model for publication bias in Mete-analysis. Any one who knows this model has > any suggestion for the improvement of my code is most welcome. > > Below is my code. > > > Kind regards > > > Chris Guure > University of Ghana > > > > > install.packages("msm") > library(msm) > > > rho1=-0.3; tua=0.020; n=50; d=-0.2; rr=1; a=-1.3; b=0.06 > si<-rtnorm(n, mean=0, sd=1, lower=0, upper=0.2)# I used this to generate > standard errors for each study > set.seed(2) ## I have stored the data and the output in this seed > > for( i in 1:rr){ > > mu<-rnorm(n,d,tua^2) # prob. of each effect estimate > rho<-si*rho1/sqrt(tua^2 + si^2) # estimate of the correlation coefficient > mu0<- a + b/si # mean of the truncated normal model (Copas selection > model) > y1<-rnorm(mu,si^2)# observed effects zise > z<-rnorm(mu0,1) # selection model > rho2<-cor(y1, z) > > select<-pnorm((mu0 + rho*(y1-mu)/sqrt(tua^2 + si^2))/sqrt(1-rho^2)) > probselect<-ifelse(select > probselect > data<-data.frame(probselect,si)# this contains both include and exclude > data > data > data1<-data[complete.cases(data),] # Contains only the included data for > analysis > data1 > > > } > OK. The code runs without error. So what exactly is the problem? (I have no experience with the Copas selection model if in fact that is what is being exemplified.) -- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Clarification on Simulation and Iteration
Dear All, I am performing some simulations for a new model. I run about 10,000 iterations with a sample of 50 datasets and this returns one set of 50 simulated data. Now, what I need to obtain is 10 sets of the 50 simulated data out of the 10,000 iterations and not just only 1 set. The model is the Copas selection model for publication bias in Mete-analysis. Any one who knows this model has any suggestion for the improvement of my code is most welcome. Below is my code. Kind regards Chris Guure University of Ghana install.packages("msm") library(msm) rho1=-0.3; tua=0.020; n=50; d=-0.2; rr=1; a=-1.3; b=0.06 si<-rtnorm(n, mean=0, sd=1, lower=0, upper=0.2)# I used this to generate standard errors for each study set.seed(2) ## I have stored the data and the output in this seed for( i in 1:rr){ mu<-rnorm(n,d,tua^2) # prob. of each effect estimate rho<-si*rho1/sqrt(tua^2 + si^2) # estimate of the correlation coefficient mu0<- a + b/si # mean of the truncated normal model (Copas selection model) y1<-rnorm(mu,si^2)# observed effects zise z<-rnorm(mu0,1) # selection model rho2<-cor(y1, z) select<-pnorm((mu0 + rho*(y1-mu)/sqrt(tua^2 + si^2))/sqrt(1-rho^2)) probselect<-ifelse(selecthttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] x11() hangs in 3.2.1
marc_schwa...@me.com writes: > First, just an FYI, that this would be better posted to R-SIG-Fedora: > https://stat.ethz.ch/mailman/listinfo/r-sig-fedora Thanks, I'll give it a try. > Can you run: > capabilities() > in a terminal session and see what it shows for X11: Yes, here is the output: jpeg pngtiff tcltk X11aqua TRUETRUETRUE FALSETRUE FALSE http/ftp sockets libxmlfifo cledit iconv TRUETRUETRUETRUETRUETRUE NLS profmem cairo ICU long.double libcurl TRUE FALSETRUE FALSETRUE FALSE > Did you install R via local compilation? Yes, local compilation. > Presuming local compilation, I would check your configure and build > logs for warnings/errors. It is possible that you are missing an > X11 header or lib someplace. The config.log file has lots of errors but nothing obviously related to this problem. Thanks for your help, Steve -- Steven J. BackusComputer Systems Manager University of Utah E-Mail: steven.bac...@utah.edu Genetic EpidemiologyAlternate: bac...@math.utah.edu 391 Chipeta Way -- Suite D Office: 801.587.9308 Salt Lake City, UT 84108-1266 http://www.math.utah.edu/~backus __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] x11() hangs in 3.2.1
On Fri, 31 Jul 2015, Marc Schwartz wrote: Can you run: capabilities() FWIW, on Slackware-14.1 I've had no issues with X11(). capabilities() jpeg pngtiff tcltk X11aqua TRUETRUETRUETRUETRUE FALSE http/ftp sockets libxmlfifo cledit iconv TRUETRUETRUETRUETRUETRUE NLS profmem cairo ICU long.double libcurl TRUE FALSETRUETRUETRUETRUE Built from the SlackBuilds.org script I've been using. Rich __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Exclude 2014 data from mean
This is very basic. Have you made **any** effort to learn R -- e.g. by going through an R tutorial? If not, please do this before posting further. This will save you -- and foks on this list, probably -- a lot of grief in the long (or even short) run. Also, if/when you do post further, post in plain text, not HTML, as requested by the posting guide below. Cheers, Bert Bert Gunter "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." -- Clifford Stoll On Fri, Jul 31, 2015 at 11:49 AM, Adam Jauregui wrote: > Hello R-help, > > I am trying to compute the mean of a quarterback's career fantasy football > stats, but I wish to exclude his 2014 stats from the mean, as that will be > the test data for the model I am trying to build for my academic undergrad > research. > > The code for figuring out the mean of his Yds for every career Game 1 was > simple: > > > *mean(brady.t$Yds[brady.t$G. == 1])* > How can I make an "if-then" statement though so that his 2014 stats are > excluded? Or is there an easier way besides "if-then?" > > Thank you, > > AKJ > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Exclude 2014 data from mean
> On Jul 31, 2015, at 1:49 PM, Adam Jauregui wrote: > > Hello R-help, > > I am trying to compute the mean of a quarterback's career fantasy football > stats, but I wish to exclude his 2014 stats from the mean, as that will be > the test data for the model I am trying to build for my academic undergrad > research. > > The code for figuring out the mean of his Yds for every career Game 1 was > simple: > > > *mean(brady.t$Yds[brady.t$G. == 1])* > How can I make an "if-then" statement though so that his 2014 stats are > excluded? Or is there an easier way besides "if-then?" > > Thank you, > > AKJ It would be helpful to have a sample of data to know the structure, but take a look at: ?subset for examples of more complicated logic for subsetting data frames. You might be able to use something along the lines of: mean(subset(brady.t, (G. == 1) & (Year != 2014), select = Yds)[[1]]) Basically, subset() is returning a data frame where Year does not equal 2014 and G. is equal to 1. The select argument is only returning the Yds column, which would otherwise be a list, so the [[1]] only returns a vector, which is passed to mean(). Regards, Marc Schwartz __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Exclude 2014 data from mean
Hi Adam, Possibly subset() or & would be helpful. Or even aggregate(), depending on your ultimate goal. Without a reproducible example that includes some sample data provided using dput() (fake is fine), the code you used, and some clear idea of what output you expect, it's impossible to figure out how to help you. Here are some suggestions for creating a good reproducible example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example Sarah On Fri, Jul 31, 2015 at 2:49 PM, Adam Jauregui wrote: > Hello R-help, > > I am trying to compute the mean of a quarterback's career fantasy football > stats, but I wish to exclude his 2014 stats from the mean, as that will be > the test data for the model I am trying to build for my academic undergrad > research. > > The code for figuring out the mean of his Yds for every career Game 1 was > simple: > > > *mean(brady.t$Yds[brady.t$G. == 1])* > How can I make an "if-then" statement though so that his 2014 stats are > excluded? Or is there an easier way besides "if-then?" > > Thank you, > > AKJ > > [[alternative HTML version deleted]] > -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] x11() hangs in 3.2.1
> On Jul 31, 2015, at 2:22 PM, Steven Backus wrote: > > I'm on RHEL 6.6, R version 3.2.1 Patched (2015-07-30 r68761) -- > "World-Famous Astronaut". Issuing the x11() command hangs R and > does not complete. A window is partially drawn then freezes. > Does anyone know of a solution? > > Thanks, > Steve First, just an FYI, that this would be better posted to R-SIG-Fedora: https://stat.ethz.ch/mailman/listinfo/r-sig-fedora Can you run: capabilities() in a terminal session and see what it shows for X11: > capabilities() jpeg pngtiff tcltk X11aqua TRUETRUETRUETRUETRUETRUE http/ftp sockets libxmlfifo cledit iconv TRUETRUETRUETRUETRUETRUE NLS profmem cairo ICU long.double libcurl TRUETRUETRUETRUETRUETRUE The above is on a Mac, just for clarity. Did you install R via local compilation or via RPMS from the EPEL? I have not looked to see if RPMS are available for the patched version. Presuming local compilation, I would check your configure and build logs for warnings/errors. It is possible that you are missing an X11 header or lib someplace. Regards, Marc Schwartz __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] x11() hangs in 3.2.1
I'm on RHEL 6.6, R version 3.2.1 Patched (2015-07-30 r68761) -- "World-Famous Astronaut". Issuing the x11() command hangs R and does not complete. A window is partially drawn then freezes. Does anyone know of a solution? Thanks, Steve -- Steven J. BackusComputer Systems Manager University of Utah E-Mail: steven.bac...@utah.edu Genetic EpidemiologyAlternate: bac...@math.utah.edu 391 Chipeta Way -- Suite D Office: 801.587.9308 Salt Lake City, UT 84108-1266 http://www.math.utah.edu/~backus __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Exclude 2014 data from mean
Hello R-help, I am trying to compute the mean of a quarterback's career fantasy football stats, but I wish to exclude his 2014 stats from the mean, as that will be the test data for the model I am trying to build for my academic undergrad research. The code for figuring out the mean of his Yds for every career Game 1 was simple: *mean(brady.t$Yds[brady.t$G. == 1])* How can I make an "if-then" statement though so that his 2014 stats are excluded? Or is there an easier way besides "if-then?" Thank you, AKJ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using latticeExtra as.layer function with different number of plot panels
Thanks David, I was hoping for something a little bit more generic and less case-by-case basis. Sebastien On 7/30/2015 3:51 PM, David Winsemius wrote: On Jul 30, 2015, at 8:37 AM, sbihorel wrote: Hi, When the as.layer function is used to overaly 2 lattice plots, there seems to be an assumption that the data used in both plots will generate the same number of panels (and, I believe, in the same order). In case the data used in the plot within the as.layer call is incomplete , data may be plotted on the "wrong" panel, and data seem to get re-used on the last panel(s). See what happens in the example code below when the records with state.region=="South" are dropped... Is there a trick to overlay panel based upon the conditioning variable value rather than the panel order? require(lattice) require(latticeExtra) state2 <- state <- data.frame(state.x77,state.region) state2$Income <- sample(state2$Income) state3 <- state2[which(state2$state.region!="South"),] foo <- xyplot(Income~Population|state.region,data=state,main='foo') foo bar <- update(foo,main='bar') + as.layer(xyplot(Income~Population|state.region,data=state2,col='red')) bar bar2 <- update(foo,main='bar2') + as.layer(xyplot(Income~Population|state.region,data=state3,col='red')) bar2 I don't know if this works using the `+.lattice` function but it is possible to selectively update panels using `trellis.focus` __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R parallel / foreach - aggregation of results
Try this chance to actually return values: library(doParallel) Simpar3 <- function(n1) { L2distance <- matrix(NA, ncol=n1, nrow=n1) data <- rnorm(n1) diag(L2distance)=0 cl <- makeCluster(4) registerDoParallel(cl) x <- foreach(j=1:n1) %dopar% { library(np) datj <- data[j] for(k in j:n1) { L2distance[j,k] <- k*datj } L2distance # return the value } stopCluster(cl) return(x) } Res <- Simpar3(100) Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. On Fri, Jul 31, 2015 at 8:39 AM, Martin Spindler wrote: > Dear all, > > when I am running the code attached below, it seems that no results are > returned, only the predefined NAs. What mistake do I make? > Any comments and help is highly appreciated. > > Thanks and best, > > Martin > > > Simpar3 <- function(n1) { > L2distance <- matrix(NA, ncol=n1, nrow=n1) > data <- rnorm(n1) > diag(L2distance)=0 > cl <- makeCluster(4) > registerDoParallel(cl) > foreach(j=1:n1) %dopar% { > library(np) > datj <- data[j] > for(k in j:n1) { > L2distance[j,k] <- k*datj > } > } > stopCluster(cl) > return(L2distance) > } > > Res <- Simpar3(100) > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to simulate informative censoring in a Cox PH model?
Daniel, Basically just responding to your last paragraph (the others are interesting, but I think that you are learning as much as anyone and I don't currently have any other suggestions). I am not an expert on copulas, so this is a basic understanding, you should learn more about them if you choose to use them. The main idea of a copula is that it is a bivariate or multivariate distribution where all the variables have uniform marginal distributions but the variables are not independent from each other. How I would suggest using them is to choose a copula and generate random points from a bivariate copula, then put those (uniform) values into the inverse pdf function for the Weibull (or other distribution), one of which is the event time, the other the censoring time. This will give you times that (marginally) come from the distributions of interest, but are not independent (so would be considered informative censoring). Repeat this with different levels of relationship in the copula to see how much difference it makes in your simulations. On Thu, Jul 30, 2015 at 2:02 PM, Daniel Meddings wrote: > Thanks Greg once more for taking the time to reply. I certainly agree that > this is not a simple set-up, although it is realistic I think. In short you > are correct about model mis-specification being the key to producing more > biased estimates under informative than under non-informative censoring. > After looking again at my code and trying various things I realize that the > key factor that leads to the informative and non-informative censoring data > giving rise to the same biased estimates is how I generate my Z_i variable, > and also the magnitude of the Z_i coefficient in both of the event and > informative censoring models. > > In the example I gave I generated Z_i (I think of this as a "poor prognosis" > variable) from a beta distribution so that it ranged from 0-1. The biased > estimates for "beta_t_1" (I think of this as the effect of a treatment on > survival) were approximately 1.56 when the true value was -1. What I forgot > to mention was that estimating a cox model with 1,000,000 subjects to the > full data (i.e. no censoring at all) arguably gives the best treatment > effect estimate possible given that the effects of Z_i and Z_i*Treat_i are > not in the model. This "best possible" estimate turns out to be 1.55 - i.e. > the example I gave just so happens to be such that even with 25-27% > censoring, the estimates obtained are almost the best that can be attained. > > My guess is that the informative censoring does not bias the estimate more > than non-informative censoring because the only variable not accounted for > in the model is Z_i which does not have a large enough effect "beta_t_2", > and/or "beta_c_2", or perhaps because Z_i only has a narrow range which does > not permit the current "beta_t_2" value to do any damage? > > To investigate the "beta_t_2", and/or "beta_c_2" issue I changed "beta_c_2" > from 2 to 7 and "beta_c_0" from 0.2 to -1.2, and "beta_d_0" from -0.7 to > -0.4 to keep the censoring %'s equal at about 30%. This time the "best > possible" estimate of "beta_t_1" was -1.53 which was similar to that > obtained previously. The informative censoring gave an estimate for > "beta_t_1" of -1.49 whereas the non-informative censoring gave -1.53 - this > time the non-informative censoring attains the "best possible" but the > non-informative censoring does not. > > > > I then instead changed "beta_t_2" from 1 to 7 and "beta_c_0" from 0.2 to 2, > and "beta_d_0" from -0.7 to -1.9 again to keep the censoring %'s equal at > about 30%. This time the "best possible" estimate of "beta_t_1" was -0.999 > which is pretty much equal to the true value of -1. The informative > censoring gave an estimate for "beta_t_1" of -1.09 whereas the > non-informative censoring gave -0.87 – surprisingly this time the > informative censoring is slightly closer to the “best possible” than the > non-informative censoring. > > > > To investigate the Z_i issue I generated it from a normal distribution with > mean 1 and variance 1. I changed "beta_c_0 " from 0.2 to -0.5 to keep the > censoring levels equal (this time about 30% for both). This time the "best > possible" estimate was -1.98 which was further from -1 than previous > examples. The informative censoring gave an estimate for "beta_t_1" of -1.81 > whereas the non-informative censoring gave -1.84. So again the informative > censoring gives an estimate closer to the "best possible" when compared with > the informative censoring, but this time it does not attain the "best > possible". > > In conclusion it is clear to me that a stronger Z_i effect in the censoring > model causes the informative censoring to be worse than the non-informative > one (as expected), but a stronger Z_i effect in the event model does not > have this effect and even causes the independent censoring to be worse – > this in general may not hold but I nonetheless see it here.
Re: [R] trojan with R download
> On Jul 31, 2015, at 5:55 AM, tom walk wrote: > > > > > I am working in China for a month and needed to download an earlier version > of R in order to use Deseq2 and its requirements. The download got to the > last few seconds and hung up. A trojan was found. It could be coincidence > that it happened when I was downloading R, or perhaps a man in the middle > added a little something. Anyway, I thought you might be interested. You > might want to check on this source and others from this server. > > > https://mirrors.ustc.edu.cn/CRAN/bin/windows/base/old/3.0.3/R-3.0.3-win.exe These things are typically false positives due to overly aggressive filtering. I downloaded the above file from the same server: $ md5 R-3.0.3-win.exe MD5 (R-3.0.3-win.exe) = 446db51e5c188ed2dccbd44dfa5f4aa9 The official MD5 value from the main CRAN server at: https://cran.r-project.org/bin/windows/base/old/3.0.3/md5sum.txt is: 446db51e5c188ed2dccbd44dfa5f4aa9 *R-3.0.3-win.exe So unless that hash value was compromised centrally...which if that is the case, it has been long enough that mirrors would probably reflect that as well. Presuming you can get to a different server, try it to see what happens. Regards, Marc Schwartz __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] question about implementation of the R
> On Jul 31, 2015, at 8:38 AM, Beka Biashvili via R-help > wrote: > > Dear Sir/MadamPlease provide me with the information how to implement R, what > is a steps of implementation and obligations, difficulties and etc. > thank you in advance Mr. Beka Biashvili > Quality Management System I would start with the R Installation and Administration Manual here: https://cran.r-project.org/manuals.html and then other manuals as may be apropos for your needs. If you need some documentation on R’s SDLC or perhaps any clinical trial related regulatory guidance: https://www.r-project.org/certification.html Regards, Marc Schwartz __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] trojan with R download
I am working in China for a month and needed to download an earlier version of R in order to use Deseq2 and its requirements. The download got to the last few seconds and hung up. A trojan was found. It could be coincidence that it happened when I was downloading R, or perhaps a man in the middle added a little something. Anyway, I thought you might be interested. You might want to check on this source and others from this server. https://mirrors.ustc.edu.cn/CRAN/bin/windows/base/old/3.0.3/R-3.0.3-win.exe __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] question about implementation of the R
Dear Sir/MadamPlease provide me with the information how to implement R, what is a steps of implementation and obligations, difficulties and etc. thank you in advance Mr. Beka Biashvili Quality Management System ISO 9001:2008 Lead Auditor 18 Lortkipanidze street, Tbilisi, Georgia. mob: +995 599 414547 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R parallel / foreach - aggregation of results
Martin, I think the main problem is that you are trying to assign your results to the result matrix inside the foreach loop. Parallel functions in R are generally not good at updating parts of matrices from the different workers in this way. Instead, using e.g. foreach, each loop of the foreach-call has to return a vector which can be cbind-ed to a result matrix. Something like: L2distance = foreach(j=1:n1, .combine = cbind) %dopar% { res = rep(NA, 10) for (k in j:n1) res[k] = k*data[j] res } L2distance I am not sure what the np-library is, but you should consider putting it in a clusterExport-call after creating the cluster. Best wishes, Jon On 7/31/2015 2:39 PM, Martin Spindler wrote: Dear all, when I am running the code attached below, it seems that no results are returned, only the predefined NAs. What mistake do I make? Any comments and help is highly appreciated. Thanks and best, Martin Simpar3 <- function(n1) { L2distance <- matrix(NA, ncol=n1, nrow=n1) data <- rnorm(n1) diag(L2distance)=0 cl <- makeCluster(4) registerDoParallel(cl) foreach(j=1:n1) %dopar% { library(np) datj <- data[j] for(k in j:n1) { L2distance[j,k] <- k*datj } } stopCluster(cl) return(L2distance) } Res <- Simpar3(100) __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jon Olav Skøien Joint Research Centre - European Commission Institute for Environment and Sustainability (IES) Climate Risk Management Unit Via Fermi 2749, TP 100-01, I-21027 Ispra (VA), ITALY jon.sko...@jrc.ec.europa.eu Tel: +39 0332 789205 Disclaimer: Views expressed in this email are those of the individual and do not necessarily represent official views of the European Commission. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R parallel / foreach - aggregation of results
Dear all, when I am running the code attached below, it seems that no results are returned, only the predefined NAs. What mistake do I make? Any comments and help is highly appreciated. Thanks and best, Martin Simpar3 <- function(n1) { L2distance <- matrix(NA, ncol=n1, nrow=n1) data <- rnorm(n1) diag(L2distance)=0 cl <- makeCluster(4) registerDoParallel(cl) foreach(j=1:n1) %dopar% { library(np) datj <- data[j] for(k in j:n1) { L2distance[j,k] <- k*datj } } stopCluster(cl) return(L2distance) } Res <- Simpar3(100) __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Looping help
Hi April, You need nested loops for something like this qs<- c(0,0.25,0.5,1,2,4,8,16,32,64) nrows<-dim(Data)[1] nqs<-length(qs) D.mat<-SE.mat<-matrix(NA,nrow=nrows,ncol=nqs) for(row in 1:nrows) { for(qval in 1:nqs) { # perform your calculation and set D.mat[row,qval] and SE.mat[row,qval] to the return values } } Jim > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of April > Smith > Sent: Friday, July 31, 2015 2:21 AM > To: r-help@r-project.org > Subject: [R] Looping help > > I have never looped before and know I need to. I am unsure how to > proceed: > > >- Action I need done: d(Data[1,2:399], q=0, boot=TRUE, >boot.arg=list(num.iter=1000)) >- I need this to happen to all rows, I need All[1,2:399] to increase > to >All[2:399], etc. >- But I also need the results from q increasing from 0 to 0.25, 0.5, > 1, >2, 4,8,16,32,64 before the loop moves on to the next row. >- For each iteration I will receive two values: D and st.err. I > need >this put into a matrix > > > I feel like this should be pretty simple to learn, but I have never > looped before. > > I am hoping to get more of a tutorial on how to write loop code, then > to just be given the loop code. > > Thanks, > April > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R parallel - slow speed
Thank you very much to you both for your help. I knew that parallelizing has some additional "overhead" costs, but I was surprised be the order of magnitude (it was 10 times slower.) Therefore I thought I made some mistake or that there is a more clever way to do it. Best, Martin Gesendet: Donnerstag, 30. Juli 2015 um 15:28 Uhr Von: "jim holtman" An: "Jeff Newmiller" Cc: "Martin Spindler" , "r-help@r-project.org" Betreff: Re: [R] R parallel - slow speed I ran a test on my Windows box with 4 CPUs. THere were 4 RScript processes started in response to the request for a cluster of 4. Each of these ran for an elapsed time of around 23 seconds, making the median time around 0.2 seconds for 100 iterations as reported by microbenchmark. The 'apply' only takes about 0.003 seconds for a single iteration - again what microbenchmark is reporting. The 4 RScript processes each use about 3 CPU seconds in the 23 seconds of elapsed time, most of that is probably the communication and startup time for the processes and reporting results. So as was pointed out previous there is overhead is running in parallel. You probably have to have at least several seconds of heavy computation for a iteration to make trying to parallelize something. You should also investigate exactly what is happening on your system so that you can account for the time being spent. Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. On Thu, Jul 30, 2015 at 8:56 AM, Jeff Newmiller wrote:Parallelizing comes at a price... and there is no guarantee that you can afford it. Vectorizing your algorithms is often a better approach. Microbenchmarking is usually overkill for evaluating parallelizing. You assume 4 cores... but many CPUs have 2 cores and use hyperthreading to make each core look like two. The operating system can make a difference also... Windows processes are more expensive to start and communicate between than *nix processes are. In particular, Windows seems to require duplicated RAM pages while *nix can share process RAM (at least until they are written to) so you end up needing more memory and disk paging of virtual memory becomes more likely. --- Jeff Newmiller The . . Go Live... DCN: Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/Batteries O.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. On July 30, 2015 8:26:34 AM EDT, Martin Spindler wrote: >Dear all, > >I am trying to parallelize the function npnewpar given below. When I am >comparing an application of "apply" with "parApply" the parallelized >version seems to be much slower (cf output below). Therefore I would >like to ask how the function could be parallelized more efficient. >(With increasing sample size the difference becomes smaller, but I was >wondering about this big differences and how it could be improved.) > >Thank you very much for help in advance! > >Best, > >Martin > > >library(microbenchmark) >library(doParallel) > >n <- 500 >y <- rnorm(n) >Xc <- rnorm(n) >Xd <- sample(c(0,1), replace=TRUE) >Weights <- diag(n) >n1 <- 50 >Xeval <- cbind(rnorm(n1), sample(c(0,1), n1, replace=TRUE)) > > >detectCores() >cl <- makeCluster(4) >registerDoParallel(cl) >microbenchmark(apply(Xeval, 1, npnewpar, y=y, Xc=Xc, Xd = Xd, >Weights=Weights, h=0.5), parApply(cl, Xeval, 1, npnewpar, y=y, Xc=Xc, >Xd = Xd, Weights=Weights, h=0.5), times=100) >stopCluster(cl) > > >Unit: milliseconds > expr min lq mean median >apply(Xeval, 1, npnewpar, y = y, Xc = Xc, Xd = Xd, Weights = Weights, > h = 0.5) 4.674914 4.726463 5.455323 4.771016 >parApply(cl, Xeval, 1, npnewpar, y = y, Xc = Xc, Xd = Xd, Weights = >Weights, h = 0.5) 34.168250 35.434829 56.553296 39.438899 > uq max neval > 4.843324 57.01519 100 > 49.777265 347.77887 100 > > > > > > > > > > > > > > >npnewpar <- function(y, Xc, Xd, Weights, h, xeval) { > xc <- xeval[1] > xd <- xeval[2] > l <- function(x,X) { > w <- Weights[x,X] > return(w) > } > u <- (Xc-xc)/h > #K <- kernel(u) > K <- dnorm(u) > L <- l(xd,Xd) > nom <- sum(y*K*L) > denom <- sum(K*L) > ghat <- nom/denom > return(ghat) >} > >__ >R-help@r-project.org[R-help@r-project.org] mailing list -- To UNSUBSCRIBE and >more, see >https://stat.ethz.ch/mailman/listinfo/r-help[https://stat.ethz.ch/mailman/listinfo/r-help] >PLEASE do read the posting guide >http://www.R-project.org/posting-guide.html[http://www.R-project.org/posting-guide.html] >an
Re: [R] R parallel - slow speed
Thank you very much for your help. I tried it under Unix and then the parallel version was faster than under Windows (but still slower than the non parall version). This is an important point to keep in mind. Thanks for this. Best, Martin Gesendet: Donnerstag, 30. Juli 2015 um 14:56 Uhr Von: "Jeff Newmiller" An: "Martin Spindler" , "r-help@r-project.org" Betreff: Re: [R] R parallel - slow speed Parallelizing comes at a price... and there is no guarantee that you can afford it. Vectorizing your algorithms is often a better approach. Microbenchmarking is usually overkill for evaluating parallelizing. You assume 4 cores... but many CPUs have 2 cores and use hyperthreading to make each core look like two. The operating system can make a difference also... Windows processes are more expensive to start and communicate between than *nix processes are. In particular, Windows seems to require duplicated RAM pages while *nix can share process RAM (at least until they are written to) so you end up needing more memory and disk paging of virtual memory becomes more likely. --- Jeff Newmiller The . . Go Live... DCN: Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/Batteries O.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. On July 30, 2015 8:26:34 AM EDT, Martin Spindler wrote: >Dear all, > >I am trying to parallelize the function npnewpar given below. When I am >comparing an application of "apply" with "parApply" the parallelized >version seems to be much slower (cf output below). Therefore I would >like to ask how the function could be parallelized more efficient. >(With increasing sample size the difference becomes smaller, but I was >wondering about this big differences and how it could be improved.) > >Thank you very much for help in advance! > >Best, > >Martin > > >library(microbenchmark) >library(doParallel) > >n <- 500 >y <- rnorm(n) >Xc <- rnorm(n) >Xd <- sample(c(0,1), replace=TRUE) >Weights <- diag(n) >n1 <- 50 >Xeval <- cbind(rnorm(n1), sample(c(0,1), n1, replace=TRUE)) > > >detectCores() >cl <- makeCluster(4) >registerDoParallel(cl) >microbenchmark(apply(Xeval, 1, npnewpar, y=y, Xc=Xc, Xd = Xd, >Weights=Weights, h=0.5), parApply(cl, Xeval, 1, npnewpar, y=y, Xc=Xc, >Xd = Xd, Weights=Weights, h=0.5), times=100) >stopCluster(cl) > > >Unit: milliseconds > expr min lq mean median >apply(Xeval, 1, npnewpar, y = y, Xc = Xc, Xd = Xd, Weights = Weights, > h = 0.5) 4.674914 4.726463 5.455323 4.771016 >parApply(cl, Xeval, 1, npnewpar, y = y, Xc = Xc, Xd = Xd, Weights = >Weights, h = 0.5) 34.168250 35.434829 56.553296 39.438899 > uq max neval > 4.843324 57.01519 100 > 49.777265 347.77887 100 > > > > > > > > > > > > > > >npnewpar <- function(y, Xc, Xd, Weights, h, xeval) { > xc <- xeval[1] > xd <- xeval[2] > l <- function(x,X) { > w <- Weights[x,X] > return(w) > } > u <- (Xc-xc)/h > #K <- kernel(u) > K <- dnorm(u) > L <- l(xd,Xd) > nom <- sum(y*K*L) > denom <- sum(K*L) > ghat <- nom/denom > return(ghat) >} > >__ >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide >http://www.R-project.org/posting-guide.html[http://www.R-project.org/posting-guide.html] >and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.