Hi, 

I saw you define a function f(::DataFrameRow) inside the timing loop.  I 
wonder whether the Julia JIT re-compiles this local function each time, or 
whether it caches the compiled version.  I don't really know. 

Apparently there is a performance penalty for anonymous functions, as in 
map(x->x*x, 1:10), but I don't know if this extends to locally defined 
functions.  
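
One way to find out would be a small timing test along the following lines. This is only a sketch: the names square, time_toplevel, time_anonymous and time_local are made up for illustration, and the result may well depend on the Julia version, so I won't guess at the numbers.

```julia
# Illustrative micro-benchmark; all names here are hypothetical.
square(x) = x * x                  # top-level named function

function time_toplevel(n)
    @elapsed map(square, 1:n)
end

function time_anonymous(n)
    @elapsed map(x -> x * x, 1:n)
end

function time_local(n)
    local_square(x) = x * x        # function defined inside another function
    @elapsed map(local_square, 1:n)
end

# Call each once first so compilation time is not counted in the measurement.
time_toplevel(10); time_anonymous(10); time_local(10)

n = 10^6
println("top-level: ", time_toplevel(n))
println("anonymous: ", time_anonymous(n))
println("local:     ", time_local(n))
```

If the three timings come out about the same, the local definition inside your timing loop is probably not what is costing you.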

Cheers, 

---david

On Saturday, February 1, 2014 3:08:18 PM UTC+1, Joosep Pata wrote:
>
> Thanks! 
>
> I wasn’t aware of eachrow, this seems quite close to what I had in mind. I 
> ran some simplistic timing checks [1], and the eachrow method is 2-3x 
> faster. I also tried the type asserts, but they didn’t seem to make a 
> difference. I forgot to mention earlier that my data can also be NA, so 
> it’s not that easy for the compiler. 
>
> [1] 
> http://nbviewer.ipython.org/urls/dl.dropbox.com/s/mj8g1s0ewmpd1b6/dataframe_iter_speed.ipynb?create=1
>  
>
> Cheers, 
> Joosep 
>
> On 01 Feb 2014, at 15:11, David van Leeuwen 
> <[email protected]> 
> wrote: 
>
> > Hi, 
> > 
> > There now is the eachrow iterator which might do what you want more 
> efficiently. 
> > 
> > df = DataFrame(a=1:2, b=2:3) 
> > func(r::DataFrameRow) = r["a"] * r["b"] 
> > for r in eachrow(df) 
> >        println(func(r)) 
> > end 
> > You can also use integer indices on the DataFrameRow r, e.g. r[1] * r[2]. 
> > 
> > Cheers, 
> > 
> > ---david 
> > 
> > On Saturday, February 1, 2014 1:25:04 PM UTC+1, Joosep Pata wrote: 
> > I would like to do an explicit loop over a large DataFrame and evaluate 
> a function which depends on a subset of the columns in an arbitrary way. 
> What would be the fastest way to accomplish this? Presently, I’m doing 
> something like 
> > 
> > ~~~ 
> > f(df::DataFrame, i::Integer) = df[i, :a] * df[i, :b] + df[i, :c] 
> > 
> > for i=1:nrow(df) 
> >         x = f(df, i) 
> > end 
> > ~~~ 
> > 
> > which according to Profile creates a major bottleneck. 
> > 
> > Would it make sense to somehow pre-create an immutable type 
> corresponding to a single row (my data are BitsKind), and run a compiled 
> function on these row-objects with strong typing? 
> > 
> > Thanks in advance for any advice, 
> > Joosep 
>
>
