Hi David,

Thank you for the detailed answer and for writing a package for general
queries; that is great! I will keep the package in mind if I need something
more complex.

I am currently looking for a lightweight solution within DataFrames itself,
since filtering is a very common operation. Right now, I am considering
converting the DataFrame to an array and looping over the rows. I wonder if
there is any syntactic sugar for this loop.
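
Here is a rough sketch of the loop I have in mind, assuming the DataFrame
has already been converted to a plain numeric matrix. The names A, pred, and
target, as well as the data, are made up for illustration; only Base Julia
is used, so this is not a DataFrames or Query API.

```julia
# Stand-in for the DataFrame after conversion to a matrix
# (hypothetical data: one row per observation, any number of columns).
A = [10.0 1.0;
     35.0 2.0;
     50.0 3.0;
     35.0 2.0]

# Hypothetical row predicate; it receives a whole row, so it does not
# depend on knowing the number of columns in advance.
pred(row) = row[1] > 30

# Explicit loop over the rows, collecting the indices of matching rows.
kept = Int[]
for r in 1:size(A, 1)
    if pred(A[r, :])
        push!(kept, r)
    end
end
filtered = A[kept, :]

# The same selection written as a comprehension (the "sugar" form).
mask = [pred(A[r, :]) for r in 1:size(A, 1)]
filtered_sugar = A[mask, :]

# Count the matching rows, and the occurrences of one specific row.
n_matching = count(mask)
target = [35.0, 2.0]
n_target = count(A[r, :] == target for r in 1:size(A, 1))
```

The comprehension builds a Bool mask that can be reused both for selecting
rows and for counting them, which is why I would prefer it over the explicit
loop if nothing shorter exists.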

-Júlio

2016-10-12 17:48 GMT-07:00 David Anthoff <anth...@berkeley.edu>:

> Hi Julio,
>
>
>
> you can use the Query package for the first part. To filter a DataFrame
> using some arbitrary julia expression, use something like this:
>
>
>
> using DataFrames, Query, NamedTuples
>
>
>
> q = @from i in df begin
>     @where <filter expression>
>     @select i
> end
>
>
>
> You can use any julia code in <filter expression>. Say your DataFrame has
> a column called price, then you could filter like this:
>
>
>
> @where i.price > 30.
>
>
>
> The i will be a NamedTuple type, so you can access the columns either by
> their name, or also by their index, e.g.
>
>
>
> @where i[1] > 30.
>
>
>
> if you want to filter by the first column. You can also just call some
> function that you have defined somewhere else:
>
>
>
> @where foo(i)
>
>
>
> As long as the <filter expression> returns a Bool, you should be good.
>
>
>
> If you run a query like this, q will be a standard julia iterator. Right
> now you can’t just say length(q), although that is something I should
> probably enable at some point (I’m also looking into the VB LINQ syntax
> that supports things like counting in the query expression itself).
>
>
>
> But you could materialize the query as an array and then look at the
> length of that:
>
>
>
> q = @from i in df begin
>     @where <filter expression>
>     @select i
>     @collect
> end
>
> count = length(q)
>
>
>
> The @collect statement means that the query will return an array of a
> NamedTuple type (you can also materialize it into a whole bunch of other
> data structures, take a look at the documentation).
>
>
>
> Let me know if this works, or if you have any other feedback on Query.jl;
> I’m much in need of user feedback for the package at this point. The best
> way to give it is to open issues at
> https://github.com/davidanthoff/Query.jl.
>
>
>
> Best,
>
> David
>
>
>
> *From:* julia-users@googlegroups.com [mailto:julia-users@googlegroups.com]
> *On Behalf Of *Júlio Hoffimann
> *Sent:* Wednesday, October 12, 2016 5:20 PM
> *To:* julia-users <julia-users@googlegroups.com>
> *Subject:* [julia-users] Filtering DataFrame with a function
>
>
>
> Hi,
>
>
>
> I have a DataFrame for which I want to filter rows that match a given
> criterion. I don't know the number of columns beforehand, so I cannot
> explicitly list the criteria with the :symbol syntax or write down a fixed
> number of indices.
>
>
>
> Is there any way to filter with a lambda expression? Or even better, is
> there any efficient way to count the number of occurrences of a specific
> row of observations?
>
>
>
> -Júlio
>
