Avi and Jeff,

Thank you very much for your answers. I did not expect to get such an interesting answer when I asked my question.
In fact, I recently discovered list comprehensions while reading some Python code, and I was seduced by the compact notation, so I decided to do an exercise on an example. Now I know why using comprehenr is slow (cf. Avi's answer), and I was impressed by Jeff's function, which uses vectorization:

    Unit: milliseconds
                                                         expr      min        lq      mean   median        uq      max neval cld
         S_diff2 <- dloop(N1, M2, ratio_sampling, vec1, vec2) 205.0905 212.86080 226.80683 221.3820 227.57695 297.9974    20  a
         S_diff3 <- vloop(N1, M2, ratio_sampling, vec1, vec2)  49.8971  57.05555  64.25502  58.9455  63.15645 113.4106    20   b

I had not thought of transforming the second loop with a vectorized approach. Hence, the right direction is to think more in terms of vectorization. I will look for some exercises on the web.

On 16/06/2024 at 19:44, avi.e.gr...@gmail.com wrote:
> I fully agree with Jeff that the best way to use ANY language is to evaluate
> the language in terms of not just the capabilities it offers but also the
> philosophy behind what it was created for and how people do things, and just
> grok it and use it mostly in the way intended. I do that with all the
> languages I learn, whether for computers or humans.
>
> Bringing in something you like from another language often gets in the way
> of actually using what you have. But realistically, many languages that were
> designed for one purpose will then evolve to suit many other purposes and
> lose their direction, and often their focus and even efficiency. S was
> designed for statistical computing of some sorts, and that meant a vectorized
> approach could take you far. Python had other design goals, and the original
> designers wanted elements of the generality that a list provides more than a
> vector does. R has lists too, but note that if you want the kind of
> dictionary or set used in Python, which definitely has advantages and
> disadvantages, you can find add-ons in R packages that give you something
> like that too.
> And note that many people, myself included, really appreciate alternate
> ways to do things and heavily use tidyverse packages, which mostly are not
> base R but a sort of grafted-on other language. So what? Purists don't
> necessarily do well in the real world.
>
> On the topic at hand, speed, I went and looked at the comprehenr package,
> and it is no wonder it is slower.
>
> Here is the code Laurent used in calling to_vec:
>
>> to_vec
> function (expr, recursive = TRUE, use.names = FALSE)
> {
>     res = eval.parent(substitute(comprehenr::to_list(expr)))
>     unlist(res, recursive = recursive, use.names = use.names)
> }
>
> It does a few things and then calls to_list() to do the actual work. This
> extra layer may slow it down a tad.
>
> So what does to_list() do?
>
>> to_list
> function (expr)
> {
>     expr = substitute(expr)
>     is_loop(expr) || stop(paste("argument should be expression with 'for', 'while' or 'repeat' but we have: ",
>         deparse(expr, width.cutoff = 500)[1]))
>     expr = expand_loop_variables(expr)
>     expr = add_assignment_to_final_loops(expr)
>     expr = substitute(local({
>         .___res <- list()
>         .___counter <- 0
>         expr
>         .___res
>     }))
>     eval.parent(expr)
> }
>
> I won't follow the entire chain, but it seems to take the supplied code,
> isolate the various parts it needs, in effect build up some other code, and
> evaluate that in the context of the parent.
>
> Obviously, had you written similar code directly (using loops or otherwise),
> it might execute faster.
>
> As I mentioned, this is largely syntactic sugar. A reasonable use of it is
> when you are given Python code and asked to translate it into R code that
> does the same thing. You could spend time thinking and designing and come up
> with the kind of R code an R expert might have written, or skip that and just
> make the slight changes needed for R and for the package being used, and it
> should work, though not necessarily the way a native, polished version works.
> Later, if time and finances permit and you want it faster, rewrite it.
>
> I note that the package, with a vignette here:
> https://cran.r-project.org/web//packages/comprehenr/vignettes/Introduction.html
>
> does make some changes, so translating is not trivial. For example, Python
> syntax such as:
>
>     [f(x) for x in iterable if condition]
>
> cannot be used in quite that order. It loosely translates to:
>
>     to_vec(for (x in iterable) if (condition) f(x))
>
> with the result at the end rather than the beginning. And since R has not
> chosen to return multiple things from a function and unpack them the way
> Python does, they had to come up with interesting workarounds like `x, y`,
> and frankly, quite a few things I can do in Python in this context are
> simply not supported by this code, nor can they be expected to be.
>
> I think that if someone using Python was used to the extended version, by
> loading modules like numpy and pandas and using them heavily, they might
> find it a tad easier to then port the code to R and use vectorized
> functionality better.
>
> So, are packages like comprehenr a crutch, or are they helpful, or even
> evil? My view is not to be a religious fanatic and assume any language was
> designed perfectly. Some ideas and implementations can be a useful way to
> formulate a problem for a programmer who thinks in that way, at least until
> they learn to also think in another. For example, the R way to do sets is
> probably not as useful as the Python way. If I needed heavy-duty usage, I
> might load a package that lets me think about it the way I want, and the
> same for a dictionary.
>
> But if I am writing code for others to maintain and change later, the
> closer I stick to the main language or accepted packages, the better.
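The filtered-comprehension pattern Avi describes can be made concrete in base R. The sketch below uses an arbitrary vector `x <- 1:10` (an assumption chosen for illustration) and computes the squares of its even elements three ways: a hand-written loop that grows a list and then flattens it with unlist(), which is roughly the shape of code that to_list() and to_vec() end up evaluating (an approximation, not the package's literal expansion); the idiomatic vectorized form; and a Filter()/vapply() form. The comprehenr call appears only in a comment, so the block runs without the package.

```r
x <- 1:10

# Python:     [v**2 for v in x if v % 2 == 0]
# comprehenr: to_vec(for (v in x) if (v %% 2 == 0) v^2)   (shown for comparison only)

# Roughly the kind of loop to_list() builds: grow a list one element at a
# time, then flatten it with unlist() as to_vec() does.
res <- list()
counter <- 0
for (v in x) {
  if (v %% 2 == 0) {
    counter <- counter + 1
    res[[counter]] <- v^2
  }
}
loop_version <- unlist(res)

# Vectorized base R: square everything, then keep the even positions.
vec_version <- (x^2)[x %% 2 == 0]

# Functional style, closest to the comprehension's structure.
fun_version <- vapply(Filter(function(v) v %% 2 == 0, x),
                      function(v) v^2, numeric(1))

print(vec_version)
# [1]   4  16  36  64 100
```

All three give the same numbers; the difference is that the first one does its work element by element at the R level, which is exactly the cost that multiplies when such a comprehension is nested inside another one.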
>
> -----Original Message-----
> From: R-help <r-help-boun...@r-project.org> On Behalf Of Jeff Newmiller via R-help
> Sent: Sunday, June 16, 2024 1:13 PM
> To: r-help@r-project.org
> Subject: Re: [R] slowness when I use a list comprehension
>
> I would put this advice more strongly: learn to think in R, rather than
> thinking in Python, when programming in R. R has atomic vectors... Python
> does not (until you import a package that implements them). I find that
> while it is possible to import R thinking into Python, Python programmers
> seem to object for stylistic reasons, even though such thinking speeds up
> Python too.
>
> A key step in that direction is to stop using lists directly for numeric
> calculations... use them to manage numeric vectors. In some cases you can
> switch to matrices or arrays to remove even more list manipulation from the
> script.
>
> library(microbenchmark)
>
> ratio_sampling <- 500
> ## size of the first series
> N1 <- 70000
> ## size of the second series
> N2 <- 100
> ## mock data
> set.seed(123)
> vec1 <- rnorm(N1)
> vec2 <- runif(N2)
>
> ## nested loops, as in the original post
> dloop <- function( N1, N2, ratio_sampling, vec1, vec2 ) {
>   S_diff2 <- numeric(
>     N1-(N2-1)*ratio_sampling
>   )
>   for( j in 1:length(S_diff2) ) {
>     sum_squares <- 0
>     for( i in 1:length(vec2)){
>       sum_squares <- (
>         sum_squares
>         + (
>           vec1[ (i-1)*ratio_sampling+j ]
>           - vec2[i]
>         )**2
>       )
>     }
>     S_diff2[j] <- sum_squares
>   }
>   S_diff2
> }
>
> ## the inner loop over i replaced by one vectorized sum
> vloop <- function( N1, N2, ratio_sampling, vec1, vec2 ) {
>   S_diff3 <- numeric(
>     N1-(N2-1)*ratio_sampling
>   )
>   i <- seq_along( vec2 )
>   k <- (i-1)*ratio_sampling
>   for( j in seq_along( S_diff3 ) ) {
>     S_diff3[j] <- sum(
>       (
>         vec1[ j + k ]
>         - vec2
>       )^2
>     )
>   }
>   S_diff3
> }
>
> microbenchmark(
>     S_diff2 <- dloop( N1, N2, ratio_sampling, vec1, vec2 )
>   , S_diff3 <- vloop( N1, N2, ratio_sampling, vec1, vec2 )
>   , times = 20
> )
>
> all.equal( S_diff2, S_diff3 )
>
>
> On June 16, 2024 9:33:54 AM PDT, avi.e.gr...@gmail.com wrote:
>> Laurent,
>>
>> Thank you
>> for introducing me to a package I did not know existed. I use features
>> like list comprehensions in Python all the time and could see using them
>> in R now that I know the package is available.
>>
>> As to why your example is slow, I see you used a fairly complex, nested
>> expression, and I wonder whether it was the best way to go. As you are
>> dealing with an interpreter doing delayed evaluation, I can imagine reasons
>> it can be slow. But note that the comprehenr package may not be designed to
>> be more efficient than loops, or than the more built-in functional methods,
>> which can be faster. The package is there perhaps more as a compatibility
>> helper that lets you write closer to the Python style, and perhaps it
>> reshapes what you wrote into a set of instructions in more native R.
>>
>> Just for comparison, in Python, things like comprehensions for lists,
>> dictionaries or tuples are often syntactic sugar, and the interpreter may
>> simply rewrite them more like the first program you typed and evaluate
>> that. Comprehensions are designed more for users who think another way and
>> want to write things compactly as one-liners. Depending on the
>> implementation, they may be faster or slower on some examples.
>>
>> I am not saying there is nothing else slowing it down for you. I am
>> suggesting that using the feature as currently implemented may not be an
>> advantage except in your thought process. It could perhaps be improved, for
>> example by moving more of its functionality out of R and into compiled
>> languages, as has been done for many packages.
>>
>> Avi
>>
>> -----Original Message-----
>> From: R-help <r-help-boun...@r-project.org> On Behalf Of Laurent Rhelp
>> Sent: Sunday, June 16, 2024 11:28 AM
>> To: r-help@r-project.org
>> Subject: [R] slowness when I use a list comprehension
>>
>> Dear RHelp-list,
>>
>> I am trying to use the comprehenr package to replace a for loop with a list
>> comprehension.
>>
>> I wrote the code, but I am certainly missing something, because it is much
>> slower than the for loops. Could you please explain to me why the list
>> comprehension is slower in my case?
>>
>> Here is my example. I compute the squared differences between the values of
>> two vectors, vec1 and vec2; the sampling ratio between vec1 and vec2 is
>> ratio_sampling. I have to use only every 500th value of the first series
>> before taking the difference with the values of the second series (vec2).
>>
>> Thank you
>>
>> Best regards
>>
>> Laurent
>>
>> library(tictoc)
>> library(comprehenr)
>>
>> ratio_sampling <- 500
>> ## size of the first series
>> N1 <- 70000
>> ## size of the second series
>> N2 <- 100
>> ## mock data
>> set.seed(123)
>> vec1 <- rnorm(N1)
>> vec2 <- runif(N2)
>>
>> ## 1. with the "for" loops
>>
>> ## the squared differences will be stored in a vector
>> S_diff2 <- numeric((N1-(N2-1)*ratio_sampling))
>> tic()
>> for( j in 1:length(S_diff2)){
>>   sum_squares <- 0
>>   for( i in 1:length(vec2)){
>>     sum_squares = sum_squares + ((vec1[(i-1)*ratio_sampling+j] - vec2[i])**2)
>>   }
>>   S_diff2[j] <- sum_squares
>> }
>> toc()
>> ## 0.22 sec elapsed
>> which.max(S_diff2)
>> ## 7857
>>
>> ## 2. with the list comprehension
>> tic()
>> S_diff2 <- to_vec(for( j in 1:length(S_diff2))
>>   sum(to_vec(for( i in 1:length(vec2))
>>     ((vec1[(i-1)*ratio_sampling+j] - vec2[i])**2))))
>> toc()
>> ## 25.09 sec elapsed
>> which.max(S_diff2)
>> ## 7857
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
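A further rearrangement of Laurent's computation, sketched here under the same mock data and not something proposed in the thread: swap the order of the loops. Jeff's vloop() still makes one R-level iteration per output element (20,500 of them with these sizes, since N1 - (N2 - 1) * ratio_sampling = 70000 - 99 * 500), whereas iterating over the 100 elements of vec2 and vectorizing over j needs only 100 iterations, each doing whole-vector arithmetic. The names dloop_ref() and lagged_loop() are hypothetical, and no timings are claimed.

```r
ratio_sampling <- 500
N1 <- 70000   # size of the first series
N2 <- 100     # size of the second series
set.seed(123)
vec1 <- rnorm(N1)
vec2 <- runif(N2)

# Reference: Laurent's double loop, with sizes derived from the inputs.
dloop_ref <- function(ratio_sampling, vec1, vec2) {
  n <- length(vec1) - (length(vec2) - 1) * ratio_sampling
  S <- numeric(n)
  for (j in seq_len(n)) {
    s <- 0
    for (i in seq_along(vec2)) {
      s <- s + (vec1[(i - 1) * ratio_sampling + j] - vec2[i])^2
    }
    S[j] <- s
  }
  S
}

# Hypothetical swapped-loop version: one pass per element of vec2, each pass
# adding that lag's squared differences for every j at once.
lagged_loop <- function(ratio_sampling, vec1, vec2) {
  n <- length(vec1) - (length(vec2) - 1) * ratio_sampling
  j <- seq_len(n)
  S <- numeric(n)
  for (i in seq_along(vec2)) {
    S <- S + (vec1[(i - 1) * ratio_sampling + j] - vec2[i])^2
  }
  S
}

S_ref <- dloop_ref(ratio_sampling, vec1, vec2)
S_new <- lagged_loop(ratio_sampling, vec1, vec2)
all.equal(S_ref, S_new)
which.max(S_new)
```

Each element is accumulated in the same i order as in the double loop, so the two results should agree (all.equal() checks this); how much faster it is on a given machine is worth measuring with microbenchmark, as Jeff did.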
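Jeff also suggests switching to matrices or arrays to remove even more manipulation from the script. One possible way to do that here, assuming memory is not a constraint: build the whole n x length(vec2) index matrix once with outer(), gather the values with a single subscript, subtract vec2 column by column with sweep(), and let rowSums() do all the accumulation. mloop() is a hypothetical name, and the vloop() below is a trimmed restatement of Jeff's function that derives the sizes from the vectors.

```r
ratio_sampling <- 500
N1 <- 70000
N2 <- 100
set.seed(123)
vec1 <- rnorm(N1)
vec2 <- runif(N2)

# Hypothetical fully vectorized version (no R-level loop at all).
mloop <- function(ratio_sampling, vec1, vec2) {
  n <- length(vec1) - (length(vec2) - 1) * ratio_sampling
  k <- (seq_along(vec2) - 1) * ratio_sampling   # offsets, as in vloop()
  idx <- outer(seq_len(n), k, "+")              # idx[j, i] = j + k[i]
  D <- matrix(vec1[idx], nrow = n)              # D[j, i] = vec1[j + k[i]]
  rowSums(sweep(D, 2, vec2)^2)                  # subtract vec2[i] from column i
}

# Jeff's vloop(), trimmed: one vectorized sum per output element.
vloop <- function(ratio_sampling, vec1, vec2) {
  k <- (seq_along(vec2) - 1) * ratio_sampling
  S <- numeric(length(vec1) - (length(vec2) - 1) * ratio_sampling)
  for (j in seq_along(S)) S[j] <- sum((vec1[j + k] - vec2)^2)
  S
}

S_m <- mloop(ratio_sampling, vec1, vec2)
S_v <- vloop(ratio_sampling, vec1, vec2)
all.equal(S_m, S_v)
```

The price is memory: idx and D each hold n x length(vec2) values (about 2 million doubles, roughly 16 MB each, for these sizes), so for much longer series Jeff's vloop() keeps memory use flat while still avoiding the inner R-level loop.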