Thank you very much, especially Milan and Bert!
I will run some speed tests and adapt the function to my needs.

I think the best way would be a modified function in C...
But I am not familiar enough with C. Perhaps this would be a simple but
useful extension. If someone has a solution, I would appreciate a post
to this mailing list.

Cheers and thanks to all,
Nico

2012/9/19 Bert Gunter <gunter.ber...@gene.com>:
> Well, following up on this observation, which can be put under the
> heading of "Sometimes vectorization can be much slower than explicit
> loops", I offer the following:
>
> firsti <- function(x, k)
> {
>   i <- 1
>   while(x[i] <= k){i <- i + 1}
>   i
> }
>
>> system.time(for(i in 1:100)which(x>.99)[1])
>    user  system elapsed
>    19.1     2.4    22.2
>
>> system.time(for(i in 1:100)which.max(x>.99))
>    user  system elapsed
>   30.45    6.75   37.46
>
>> system.time(for(i in 1:100)firsti(x,.99))
>    user  system elapsed
>    0.03    0.00    0.03
>
> ## About a 500 - 1000 fold speedup !
>
>> firsti(x,.99)
> [1] 122
>
> It doesn't seem to scale too badly, either (whatever THAT means!):
> (Of course, the which() versions are essentially identical in timing,
> and so are omitted)
>
>> system.time(for(i in 1:100)firsti(x,.9999))
>    user  system elapsed
>    2.70    0.00    2.72
>
>> firsti(x,.9999)
> [1] 18200
>
> Of course, at some point, the explicit looping is awful -- with k =
> .999999, the index was about 360000, and the timing test took 54
> seconds.
>
> So I guess the point is -- as always -- that the optimal approach
> depends on the nature of the data. Prudence and robustness clearly
> demand the vectorized which() approaches if you have no information.
> But if you do know something about the data, then you can often write
> much faster tailored solutions. Which is hardly revelatory, of course.
>
> Cheers to all,
> Bert
>
> On Wed, Sep 19, 2012 at 8:55 AM, Milan Bouchet-Valat <nalimi...@club.fr>
> wrote:
>> On Wednesday, 19 September 2012 at 15:23 +0000, William Dunlap wrote:
>>> The original method is faster than which.max for longish numeric vectors
>>> (in R-2.15.1), but you should check time and memory usage on your
>>> own machine:
>>>
>>> > x <- runif(18e6)
>>> > system.time(for(i in 1:100)which(x>0.99)[1])
>>>    user  system elapsed
>>>   11.64    1.05   12.70
>>> > system.time(for(i in 1:100)which.max(x>0.99))
>>>    user  system elapsed
>>>   16.38    2.94   19.35
>> If you know the probability that such an element appears at the beginning
>> of the vector, a custom hack might well be more efficient. The problem with
>> ">", which() and which.max() is that they will go over all the elements
>> of the vector even if it's not needed at all. So you can start with a
>> small subset of the vector, and increase its size in a few steps until
>> you find the value you're looking for.
>>
>> Proof of concept (the values of n obviously need to be adapted):
>> x <- runif(1e7)
>>
>> find <- function(x, lim) {
>>   len <- length(x)
>>
>>   for(n in 2^(14:0)) {
>>     val <- which(x[seq.int(1L, len/n)] > lim)
>>
>>     if(length(val) > 0) return(val[1])
>>   }
>>
>>   return(NULL)
>> }
>>
>>> system.time(for(i in 1:100)which(x>0.999)[1])
>>    user  system elapsed
>>   9.740   5.795  15.890
>>> system.time(for(i in 1:100)which.max(x>0.999))
>>    user  system elapsed
>>  14.288   9.510  24.562
>>> system.time(for(i in 1:100)find(x, .999))
>>    user  system elapsed
>>   0.017   0.002   0.019
>>> find(x, .999)
>> [1] 1376
>>
>> (Looks almost like cheating... ;-)
>>
>>
>>> Bill Dunlap
>>> Spotfire, TIBCO Software
>>> wdunlap tibco.com
>>>
>>>
>>> > -----Original Message-----
>>> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>>> > On Behalf Of Jeff Newmiller
>>> > Sent: Wednesday, September 19, 2012 8:06 AM
>>> > To: Mike Spam; r-help@r-project.org
>>> > Subject: Re: [R] effective way to return only the first argument of
>>> > "which()"
>>> >
>>> > ?which.max
>>> > ---------------------------------------------------------------------------
>>> > Jeff Newmiller                        The     .....       .....  Go Live...
>>> > DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>>> >                                       Live:   OO#.. Dead: OO#..  Playing
>>> > Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>>> > /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
>>> > ---------------------------------------------------------------------------
>>> > Sent from my phone. Please excuse my brevity.
>>> >
>>> > Mike Spam <ichmags...@googlemail.com> wrote:
>>> >
>>> > >Hi,
>>> > >
>>> > >I was looking for a function like "which()" that returns only the
>>> > >first element.
>>> > >Compare:
>>> > >
>>> > >x <- c(1,2,3,4,5,6)
>>> > >y <- 4
>>> > >which(x>y)
>>> > >
>>> > >returns:
>>> > >5,6
>>> > >
>>> > >which(x>y)[1]
>>> > >returns:
>>> > >5
>>> > >
>>> > >which(x>y)[1] is exactly what I need. I did use this, but the dataset
>>> > >is too big (~18 million points).
>>> > >That's why I need a more efficient way to get the first element of a
>>> > >vector which is bigger/smaller than a specific number.
>>> > >
>>> > >I found "match()", but this function only works for equal numbers.
>>> > >
>>> > >Thanks,
>>> > >Nico
>>> > >
>>> > >______________________________________________
>>> > >R-help@r-project.org mailing list
>>> > >https://stat.ethz.ch/mailman/listinfo/r-help
>>> > >PLEASE do read the posting guide
>>> > >http://www.R-project.org/posting-guide.html
>>> > >and provide commented, minimal, self-contained, reproducible code.
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
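
Regarding the C variant asked about at the top of this message, a minimal
sketch might look like the following. It is only an illustration, not a
tested extension: the name first_above is made up, and the signature simply
assumes the all-pointer argument convention of R's .C() interface. It scans
the vector once and stops at the first element greater than the threshold,
which is the early exit that which() and which.max() cannot do.

```c
/*
 * first_above: hypothetical helper, not part of R.
 *
 * Stores in *result the 1-based index of the first element of x that is
 * strictly greater than *k, or 0 if no such element exists.
 * Every argument is a pointer, an assumption made so that the function
 * can be called through R's .C() interface.
 */
void first_above(double *x, int *n, double *k, int *result)
{
    int i;

    *result = 0;                 /* 0 means "no element found" */
    for (i = 0; i < *n; i++) {
        if (x[i] > *k) {         /* comparisons with NaN are false, so NaN is skipped */
            *result = i + 1;     /* convert to R's 1-based indexing */
            return;              /* early exit at the first hit */
        }
    }
}
```

Assuming the code is saved as first_above.c (a made-up file name), it could
presumably be built with "R CMD SHLIB first_above.c" and called roughly like
this:

    dyn.load(paste0("first_above", .Platform$dynlib.ext))
    .C("first_above", as.double(x), as.integer(length(x)),
       as.double(0.99), result = integer(1))$result

As far as I know, .C() copies its arguments, so for a vector of 18 million
points the copy itself costs some time; whether this beats the pure R
versions above would have to be measured on your own data.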