Hi,
There is now an eachrow iterator, which might do what you want more
efficiently:

using DataFrames
df = DataFrame(a=1:2, b=2:3)
# Multiply columns a and b of a single row
func(r::DataFrameRow) = r[:a] * r[:b]
for r in eachrow(df)
    println(func(r))
end

You can also index the DataFrameRow r by column position instead of by name: r[1] * r[2].
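
To make the two indexing styles concrete, here is a minimal sketch that computes the same per-row product once by column name and once by column position. It assumes a reasonably recent DataFrames.jl in which eachrow can be used inside a comprehension; the variable names by_name and by_position are only illustrative.

```julia
using DataFrames

df = DataFrame(a=1:2, b=2:3)

# Product of columns a and b for each row, looked up by column name
by_name = [r[:a] * r[:b] for r in eachrow(df)]

# The same product, looked up by column position (a is column 1, b is column 2)
by_position = [r[1] * r[2] for r in eachrow(df)]

# Both vectors hold the products 1*2 and 2*3
println(by_name == by_position)
```

Positional indexing depends on the column order, so the name-based form is probably the safer default if columns may be added or reordered later.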
Cheers,
---david
On Saturday, February 1, 2014 1:25:04 PM UTC+1, Joosep Pata wrote:
>
> I would like to do an explicit loop over a large DataFrame and evaluate a
> function which depends on a subset of the columns in an arbitrary way. What
> would be the fastest way to accomplish this? Presently, I’m doing something
> like
>
> ~~~
> f(df::DataFrame, i::Integer) = df[i, :a] * df[i, :b] + df[i, :c]
>
> for i = 1:nrow(df)
>     x = f(df, i)
> end
> ~~~
>
> which according to Profile creates a major bottleneck.
>
> Would it make sense to somehow pre-create an immutable type corresponding
> to a single row (my data are BitsKind), and run a compiled function on
> these row-objects with strong typing?
>
> Thanks in advance for any advice,
> Joosep