Hi everyone,
Speed is the key here.
I need to find the difference between a vector and its one-period lag (i.e. the
difference between each value and the one preceding it in the vector). Let's say
the vector contains 10 million random integers between 0 and 1,000. The
solution vector will have 9,999,999 values, since there is no lag for the 1st
observation.
In R we have:
#Set up input vector
x = runif(n=10e6, min=0, max=1000)
x = round(x)
#Find one-period difference
y = diff(x)
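One possible starting point that stays within R: the sketch below computes the same differences by subtracting two shifted copies of the vector, then checks the result against diff(). The names n, y_diff and y_manual are only illustrative, and n is reduced to 1 million so the example runs quickly. Whether the direct subtraction is actually faster than diff() is an assumption to test on your own machine and R version.

```r
# Smaller n than in the post so the sketch runs quickly; scale up as needed.
set.seed(1)
n <- 1e6
x <- round(runif(n = n, min = 0, max = 1000))

# Reference result from diff()
y_diff <- diff(x)

# Same differences by subtracting two shifted copies of x directly:
# x[-1] drops the first element, x[-n] drops the last one.
y_manual <- x[-1] - x[-n]

# Both approaches should give identical results of length n - 1
stopifnot(identical(y_diff, y_manual), length(y_manual) == n - 1)

# Rough timing comparison (results depend on machine and R version)
print(system.time(diff(x)))
print(system.time(x[-1] - x[-n]))
```

A single system.time() call is noisy, so repeating each measurement several times gives a fairer comparison.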
The question is: how can I make the 'diff(x)' step as fast as possible? I
asked some colleagues who work with other languages, and they provided
equivalent solutions in Python and Clojure that, on their machines, appear to
be much faster (I've put the code below in case anyone is interested).
However, they mentioned that the overhead of passing the data between
languages could cancel out any improvement. I don't have much experience
integrating other languages, so I'm hoping the community has some ideas about
how to approach this particular problem...
Many thanks,
Kevin
In IPython:
In [3]: import numpy as np
In [4]: arr = np.random.randint(0, 1000, (10000000,1)).astype("int16")
In [5]: arr1 = arr[1:].view()
In [6]: timeit arr2 = arr1 - arr[:-1]
10 loops, best of 3: 20.1 ms per loop
In Clojure:
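(defn subtract-lag
  [n]
  ;; doall realizes the input up front so that only the subtraction is timed
  (let [v (doall (take n (repeatedly #(rand-int 1001))))]
    ;; pairing (rest v) with v gives the n-1 one-period differences
    (time (dorun (map - (rest v) v)))))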