Hi, I saw you define a function f(::DataFrameRow) inside the timing loop. I wonder whether the Julia JIT re-compiles this local function each time, or whether it caches the compiled version. I don't really know.
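One way to check this empirically might be a small timing comparison along the lines below. It is only a sketch in plain Base Julia, not the DataFrameRow function from the notebook; the names g, with_local and with_global are made up for illustration, and the outcome may well depend on the Julia version.

```julia
# Hypothetical micro-benchmark: is there a per-iteration cost to defining
# a local function inside a loop, compared with a top-level function?
# Uses only Base; g, with_local and with_global are illustrative names.

# Reference function, defined once at top level.
g(a, b) = a * b

function with_local(n)
    total = 0
    for i in 1:n
        f(a, b) = a * b        # local function defined inside the loop body
        total += f(i, i + 1)
    end
    return total
end

function with_global(n)
    total = 0
    for i in 1:n
        total += g(i, i + 1)   # same arithmetic via the top-level function
    end
    return total
end

# Warm up so that compilation time is not included in the measurement.
with_local(10)
with_global(10)

n = 10^6
t_local = @elapsed with_local(n)
t_global = @elapsed with_global(n)
println("local definition: ", t_local, " s, top-level definition: ", t_global, " s")
```

If the two timings come out similar, the local definition is presumably compiled once and reused; a large gap would point to some per-iteration overhead.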
Apparently there is a performance penalty for anonymous functions, as in map(x->x*x, 1:10), but I don't know if this extends to locally defined functions.

Cheers,

---david

On Saturday, February 1, 2014 3:08:18 PM UTC+1, Joosep Pata wrote:
>
> Thanks!
>
> I wasn’t aware of eachrow, this seems quite close to what I had in mind. I
> ran some simplistic timing checks [1], and the eachrow method is 2-3x
> faster. I also tried the type asserts, but they didn’t seem to make a
> difference. I forgot to mention earlier that my data can also be NA, so
> it’s not that easy for the compiler.
>
> [1]
> http://nbviewer.ipython.org/urls/dl.dropbox.com/s/mj8g1s0ewmpd1b6/dataframe_iter_speed.ipynb?create=1
>
> Cheers,
> Joosep
>
> On 01 Feb 2014, at 15:11, David van Leeuwen <[email protected]> wrote:
>
> > Hi,
> >
> > There now is the eachrow iterator which might do what you want more
> > efficiently.
> >
> > df = DataFrame(a=1:2, b=2:3)
> > func(r::DataFrameRow) = r["a"] * r["b"]
> > for r in eachrow(df)
> >     println(func(r))
> > end
> >
> > you can also use integer indices for the dataframerow r, r[1] * r[2]
> >
> > Cheers,
> >
> > ---david
> >
> > On Saturday, February 1, 2014 1:25:04 PM UTC+1, Joosep Pata wrote:
> > I would like to do an explicit loop over a large DataFrame and evaluate
> > a function which depends on a subset of the columns in an arbitrary way.
> > What would be the fastest way to accomplish this? Presently, I’m doing
> > something like
> >
> > ~~~
> > f(df::DataFrame, i::Integer) = df[i, :a] * df[i, :b] + df[i, :c]
> >
> > for i=1:nrow(df)
> >     x = f(df, i)
> > end
> > ~~~
> >
> > which according to Profile creates a major bottleneck.
> >
> > Would it make sense to somehow pre-create an immutable type
> > corresponding to a single row (my data are BitsKind), and run a compiled
> > function on these row-objects with strong typing?
> >
> > Thanks in advance for any advice,
> > Joosep
