[julia-users] Re: evaluate function on DataFrame row

David van Leeuwen Sat, 01 Feb 2014 05:12:47 -0800

Hi, 

There is now an eachrow iterator, which might do what you want more 
efficiently.


using DataFrames

df = DataFrame(a=1:2, b=2:3)
# multiply columns a and b of a single row
func(r::DataFrameRow) = r[:a] * r[:b]
for r in eachrow(df)
    println(func(r))   # prints 2, then 6
end

You can also use integer indices on the DataFrameRow r, e.g. r[1] * r[2].
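
If the row-by-row loop is still the bottleneck, one thing that may help is to pull the needed columns out as plain vectors and loop over them inside a separate function, so the compiler sees concrete element types. The following is only a rough sketch of that idea, not something from DataFrames itself: the names f and eval_rows are made up for illustration, the formula is the one from the quoted question, and the vectors a, b, c stand in for columns extracted from df (for example df.a in recent DataFrames versions).

```julia
# Hypothetical helper: the per-row formula from the question, on plain scalars.
f(a, b, c) = a * b + c

# Hypothetical function barrier: the columns arrive as concretely typed
# vectors, so the loop body can be compiled for those element types.
function eval_rows(a::AbstractVector, b::AbstractVector, c::AbstractVector)
    T = promote_type(eltype(a), eltype(b), eltype(c))
    out = Vector{T}(undef, length(a))
    for i in eachindex(a, b, c)   # errors if the vectors differ in length
        out[i] = f(a[i], b[i], c[i])
    end
    return out
end

# Stand-ins for three columns of a DataFrame (assumed data, for illustration).
a = [1, 2]
b = [2, 3]
c = [10, 20]
x = eval_rows(a, b, c)
```

Whether this beats eachrow will depend on the data and the DataFrames version, so it is worth profiling both.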

Cheers, 

---david
On Saturday, February 1, 2014 1:25:04 PM UTC+1, Joosep Pata wrote:
>
> I would like to do an explicit loop over a large DataFrame and evaluate a 
> function which depends on a subset of the columns in an arbitrary way. What 
> would be the fastest way to accomplish this? Presently, I’m doing something 
> like 
>
> ~~~ 
> f(df::DataFrame, i::Integer) = df[i, :a] * df[i, :b] + df[i, :c] 
>
> for i=1:nrow(df) 
>         x = f(df, i) 
> end 
> ~~~ 
>
> which according to Profile creates a major bottleneck. 
>
> Would it make sense to somehow pre-create an immutable type corresponding 
> to a single row (my data are BitsKind), and run a compiled function on 
> these row-objects with strong typing? 
>
> Thanks in advance for any advice, 
> Joosep
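
On the immutable-row idea in the quoted message: a minimal sketch of what that could look like, assuming three columns a, b, c with bits-type elements. The type Row, the function g, and the sample values are hypothetical and only for illustration. Current Julia spells an immutable type as struct (at the time of this thread the keyword was immutable).

```julia
# Hypothetical immutable row type with concretely typed fields.
struct Row
    a::Int
    b::Int
    c::Float64
end

# The function is compiled once for Row, with all field types known.
g(r::Row) = r.a * r.b + r.c

# Build the rows from three stand-in columns (assumed sample data).
col_a = [1, 2]
col_b = [2, 3]
col_c = [0.5, 1.5]
rows = [Row(a, b, c) for (a, b, c) in zip(col_a, col_b, col_c)]

results = map(g, rows)
```

Building the vector of rows costs an extra copy of the data, so this is mainly worthwhile when the same rows are processed many times.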
