Hi Alex,

That is closer to what I had in mind originally, but I actually solved the
problem by reorganizing my algorithm to avoid filters.

Thank you,
-Júlio

2016-10-13 8:21 GMT-07:00 Alex Mellnik <a.r.mell...@gmail.com>:

> Hi Júlio,
>
> If you're just interested in using an arbitrary function to filter on rows
> you can do something like:
>
> df = DataFrame(Fish = ["Amir", "Betty", "Clyde"], Mass = [1.2, 3.3, 0.4])
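> # Compare against the Char 'A' (a String "A" never equals a Char), and
> # use a name that does not clash with Base.filter.
> keep(row) = (row[:Fish][1] != 'A') & (row[:Mass] > 1)
> df = df[[keep(r) for r in eachrow(df)], :]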
>
> Is that what you're looking for?  If not, can you give an example of what
> you want to do?
>
> Best,
>
> Alex
>
> On Wednesday, October 12, 2016 at 10:20:52 PM UTC-7, Júlio Hoffimann wrote:
>>
>> Thank you very much, David. The queries you showed are really nice. I
>> meant that ideally I wouldn't need to install another package for a simple
>> filter operation on the rows.
>>
>> -Júlio
>>
>> 2016-10-12 22:14 GMT-07:00 <ant...@berkeley.edu>:
>>
>>> Were you worried about Query being not lightweight enough in terms of
>>> overhead, or in terms of syntax?
>>>
>>> I just added a more lightweight syntax for this scenario to Query. You
>>> can now do the following two things:
>>>
>>> q = @where(df, i->i.price > 30.)
>>>
>>> that will return a filtered iterator. You can materialize that into a
>>> DataFrame with collect(q, DataFrame).
>>>
>>> I also added a counting option. Turns out that is actually a LINQ query
>>> operator, and the goal is to implement all of those in Query. The syntax is
>>> simple:
>>>
>>> @count(df, i->i.price > 30.)
>>>
>>> returns the number of rows for which the filter condition is true.
>>>
>>> Under the hood, both of these new syntax options use the normal Query
>>> machinery; this just provides a simpler syntax relative to the more
>>> elaborate things I've posted earlier. In terms of LINQ, this corresponds to
>>> the method invocation API that LINQ has. I'm still figuring out how to
>>> surface something like @count in the query expression syntax, but for now
>>> one can use it via this macro.
>>>
>>> All of this is on master right now, so you would have to do
>>> Pkg.checkout("Query") to get these macros.
>>>
>>> Best,
>>> David
>>>
>>> On Wednesday, October 12, 2016 at 6:47:15 PM UTC-7, Júlio Hoffimann
>>> wrote:
>>>>
>>>> Hi David,
>>>>
>>>> Thank you for your detailed answer and for writing a package for
>>>> general queries, that is great! I will keep the package in mind if I need
>>>> something more complex.
>>>>
>>>> I am currently looking for a lightweight solution within DataFrames,
>>>> since filtering is a very common operation. Right now, I am considering
>>>> converting the DataFrame to an array and looping over the rows. I wonder if
>>>> there is any syntactic sugar for this loop.
>>>>
>>>> -Júlio
>>>>
>>>> 2016-10-12 17:48 GMT-07:00 David Anthoff <ant...@berkeley.edu>:
>>>>
>>>>> Hi Julio,
>>>>>
>>>>>
>>>>>
>>>>> you can use the Query package for the first part. To filter a
>>>>> DataFrame using some arbitrary julia expression, use something like this:
>>>>>
>>>>>
>>>>>
>>>>> using DataFrames, Query, NamedTuples
>>>>>
>>>>>
>>>>>
>>>>> q = @from i in df begin
>>>>>     @where <filter expression>
>>>>>     @select i
>>>>> end
>>>>>
>>>>>
>>>>>
>>>>> You can use any julia code in <filter expression>. Say your DataFrame
>>>>> has a column called price, then you could filter like this:
>>>>>
>>>>>
>>>>>
>>>>> @where i.price > 30.
>>>>>
>>>>>
>>>>>
>>>>> The i will be a NamedTuple type, so you can access the columns either
>>>>> by their name, or also by their index, e.g.
>>>>>
>>>>>
>>>>>
>>>>> @where i[1] > 30.
>>>>>
>>>>>
>>>>>
>>>>> if you want to filter by the first column. You can also just call some
>>>>> function that you have defined somewhere else:
>>>>>
>>>>>
>>>>>
>>>>> @where foo(i)
>>>>>
>>>>>
>>>>>
>>>>> As long as the <filter expression> returns a Bool, you should be good.
>>>>>
>>>>>
>>>>>
>>>>> If you run a query like this, q will be a standard julia iterator.
>>>>> Right now you can’t just say length(q), although that is something I
>>>>> should probably enable at some point (I’m also looking into the VB LINQ
>>>>> syntax that supports things like counting in the query expression itself).
>>>>>
>>>>>
>>>>>
>>>>> But you could materialize the query as an array and then look at the
>>>>> length of that:
>>>>>
>>>>>
>>>>>
>>>>> q = @from i in df begin
>>>>>     @where <filter expression>
>>>>>     @select i
>>>>>     @collect
>>>>> end
>>>>>
>>>>> count = length(q)
>>>>>
>>>>>
>>>>>
>>>>> The @collect statement means that the query will return an array of a
>>>>> NamedTuple type (you can also materialize it into a whole bunch of other
>>>>> data structures, take a look at the documentation).
>>>>>
>>>>>
>>>>>
>>>>> Let me know if this works, or if you have any other feedback on
>>>>> Query.jl; I would really appreciate user feedback on the package at this
>>>>> point. The best way to give it is to open issues at
>>>>> https://github.com/davidanthoff/Query.jl.
>>>>>
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>>
>>>>> *From:* julia...@googlegroups.com [mailto:julia...@googlegroups.com] *On
>>>>> Behalf Of *Júlio Hoffimann
>>>>> *Sent:* Wednesday, October 12, 2016 5:20 PM
>>>>> *To:* julia-users <julia...@googlegroups.com>
>>>>> *Subject:* [julia-users] Filtering DataFrame with a function
>>>>>
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>>
>>>>> I have a DataFrame for which I want to filter rows that match a given
>>>>> criterion. I don't know the number of columns beforehand, so I cannot
>>>>> explicitly list the criteria with the :symbol syntax or write down a fixed
>>>>> number of indices.
>>>>>
>>>>>
>>>>>
>>>>> Is there any way to filter with a lambda expression? Or even better,
>>>>> is there any efficient way to count the number of occurrences of a
>>>>> specific row of observations?
>>>>>
>>>>>
>>>>>
>>>>> -Júlio
>>>>>
>>>>
>>>>
>>
