Thanks. I've played around with pure R solutions. The fastest rewrite of diff
(for the lag-1 case) that I can find is this:
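diff2 = function(x) {
  # Pad both ends with NA so the shifted copies line up, then drop the padding.
  # Assumes length(x) >= 2: for a single value, 2:length(x) is 2:1 and two NAs come back.
  y = c(x, NA) - c(NA, x)
  y[2:length(x)]
}
# Compiling via 'cmpfun' doesn't seem to help (or hurt):
library(compiler)
diff2 = cmpfun(diff2)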
But that only gives about a 10% improvement over the default 'diff' on my
machine, which is still too slow for my particular application.
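One way to reproduce this comparison, as a sketch assuming the same 10-million-element input as the original question (the seed is arbitrary, and timings depend on the machine and R version, so none are shown). The check also exposes a caveat: diff2() matches diff() for vectors of length 2 or more, but for a single value 2:length(x) counts down to 2:1, so it returns two NAs where diff() returns an empty vector.

```r
diff2 <- function(x) {
  y <- c(x, NA) - c(NA, x)  # pad both ends so the shifted copies line up
  y[2:length(x)]            # drop the two NA-padded end positions
}

set.seed(1)  # arbitrary seed, for reproducibility only
x <- round(runif(n = 1e7, min = 0, max = 1000))

# Same values as base diff() for a long vector
stopifnot(identical(diff2(x), diff(x)))

# Edge case: a length-1 input gives two NAs instead of numeric(0)
print(diff2(5))
print(diff(5))

# Rough timings; results vary by machine and R version
print(system.time(diff(x)))
print(system.time(diff2(x)))
```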
I'm inclined towards Michael's suggestion of inline+Rcpp (or some other use of
C under the hood).
Could someone show me how to go about doing that?
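A minimal sketch of the compiled route, assuming the Rcpp package is installed along with a working C++ toolchain (Rtools on Windows). It uses Rcpp's own cppFunction() rather than the inline package, and diff_cpp is a hypothetical name. When x is already a double vector, the NumericVector argument refers to the same memory as the R object rather than a copy, so passing the data in costs essentially nothing; the main allocation is the result vector. Compilation happens once, when cppFunction() runs, and should be left out of any timing.

```r
library(Rcpp)

# Compile a small C++ function (hypothetical name diff_cpp).
cppFunction('
NumericVector diff_cpp(NumericVector x) {
  R_xlen_t n = x.size();
  // Same length convention as diff(): n - 1 results, none when n < 2
  if (n < 2) return NumericVector();
  NumericVector out(n - 1);
  for (R_xlen_t i = 1; i < n; ++i) {
    out[i - 1] = x[i] - x[i - 1];  // each value minus the one before it
  }
  return out;
}')

x <- round(runif(n = 1e7, min = 0, max = 1000))

# Should agree exactly with base diff()
stopifnot(identical(diff_cpp(x), diff(x)))
print(system.time(diff_cpp(x)))
```

Whether this beats a vectorized R expression by much is worth measuring, since R's own vector subtraction already runs as compiled C.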
Thanks!
Kevin
On Jan 28, 2012, at 9:14 AM, Peter Langfelder wrote:
> ehm... this doesn't take very many ideas.
>
>
> x = runif(n=10e6, min=0, max=1000)
> x = round(x)
>
> system.time( {
> y = x[-1] - x[-length(x)]
> })
>
> I get about 0.5 seconds on my old laptop.
>
> HTH
>
> Peter
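Peter's one-liner can be wrapped as a drop-in replacement for the lag-1 case. This is a sketch, and lag1_diff is a hypothetical name. Unlike diff2(), it needs no NA padding, and for inputs of length 0 or 1 it returns an empty vector, as diff() does. Judging from the R source of diff.default(), base diff() performs essentially the same subtraction for lag = 1, so any gain over diff() would come from skipping its argument handling rather than from faster arithmetic.

```r
# Sketch: Peter's indexing approach as a reusable function.
# 'lag1_diff' is a hypothetical name, not part of base R.
lag1_diff <- function(x) {
  n <- length(x)
  # Drop the first element, drop the last element, subtract element-wise.
  # For n = 0 or 1 both subsets are empty, so the result is empty, as with diff().
  x[-1L] - x[-n]
}

lag1_diff(c(1, 4, 9, 16))  # returns c(3, 5, 7)

x <- round(runif(n = 1e7, min = 0, max = 1000))
stopifnot(identical(lag1_diff(x), diff(x)))
print(system.time(lag1_diff(x)))
```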
>
>
> On Fri, Jan 27, 2012 at 4:15 PM, Kevin Ummel <[email protected]> wrote:
>> Hi everyone,
>>
>> Speed is the key here.
>>
>> I need to find the difference between a vector and its one-period lag (i.e.
>> the difference between each value and the subsequent one in the vector).
>> Let's say the vector contains 10 million random integers between 0 and
>> 1,000. The solution vector will have 9,999,999 values, since there is no lag
>> for the 1st observation.
>>
>> In R we have:
>>
>> #Set up input vector
>> x = runif(n=10e6, min=0, max=1000)
>> x = round(x)
>>
>> #Find one-period difference
>> y = diff(x)
>>
>> Question is: How can I get the 'diff(x)' part as fast as absolutely
>> possible? I queried some colleagues who work with other languages, and they
>> provided equivalent solutions in Python and Clojure that, on their machines,
>> appear to be potentially much faster (I've put the code below in case anyone
>> is interested). However, they mentioned that the overhead in passing the
>> data between languages could kill any improvements. I don't have much
>> experience integrating other languages, so I'm hoping the community has some
>> ideas about how to approach this particular problem...
>>
>> Many thanks,
>> Kevin
>>
>> In iPython:
>>
>> In [3]: import numpy as np
>> In [4]: arr = np.random.randint(0, 1000, (10000000,1)).astype("int16")
>> In [5]: arr1 = arr[1:].view()
>> In [6]: timeit arr2 = arr1 - arr[:-1]
>> 10 loops, best of 3: 20.1 ms per loop
>>
>> In Clojure:
>>
>> (defn subtract-lag
>>   [n]
>>   (let [v (take n (repeatedly rand))]
>>     (time (dorun (map - v (cons 0 v))))))
>>
>>
>> ______________________________________________
>> [email protected] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.