Hi Alex,

That is closer to what I had in mind originally, but I actually solved the problem by reorganizing my algorithm to avoid filters.

Thank you,
-Júlio

2016-10-13 8:21 GMT-07:00 Alex Mellnik <a.r.mell...@gmail.com>:

> Hi Júlio,
>
> If you're just interested in using an arbitrary function to filter on rows
> you can do something like:
>
> df = DataFrame(Fish = ["Amir", "Betty", "Clyde"], Mass = [1.2, 3.3, 0.4])
> filter(row) = (row[:Fish][1] != 'A')&(row[:Mass]>1)
> df = df[[filter(r) for r in eachrow(df)],:]
>
> Is that what you're looking for? If not, can you give an example of what
> you want to do?
>
> Best,
>
> Alex
>
> On Wednesday, October 12, 2016 at 10:20:52 PM UTC-7, Júlio Hoffimann wrote:
>>
>> Thank you very much, David, these queries you showed are really nice. I
>> meant that ideally I wouldn't need to install another package for a simple
>> filter operation on the rows.
>>
>> -Júlio
>>
>> 2016-10-12 22:14 GMT-07:00 <ant...@berkeley.edu>:
>>
>>> Were you worried about Query not being lightweight enough in terms of
>>> overhead, or in terms of syntax?
>>>
>>> I just added a more lightweight syntax for this scenario to Query. You
>>> can now do the following two things:
>>>
>>> q = @where(df, i->i.price > 30.)
>>>
>>> that will return a filtered iterator. You can materialize that into a
>>> DataFrame with collect(q, DataFrame).
>>>
>>> I also added a counting option. Turns out that is actually a LINQ query
>>> operator, and the goal is to implement all of those in Query. The syntax is
>>> simple:
>>>
>>> @count(df, i->i.price > 30.)
>>>
>>> returns the number of rows for which the filter condition is true.
>>>
>>> Under the hood both of these new syntax options use the normal Query
>>> machinery; this just provides a simpler syntax relative to the more
>>> elaborate things I've posted earlier. In terms of LINQ, this corresponds to
>>> the method invocation API that LINQ has. I'm still figuring out how to
>>> surface something like @count in the query expression syntax, but for now
>>> one can use it via this macro.
>>>
>>> All of this is on master right now, so you would have to do
>>> Pkg.checkout("Query") to get these macros.
>>>
>>> Best,
>>> David
>>>
>>> On Wednesday, October 12, 2016 at 6:47:15 PM UTC-7, Júlio Hoffimann
>>> wrote:
>>>>
>>>> Hi David,
>>>>
>>>> Thank you for your detailed answer and for writing a package for
>>>> general queries, that is great! I will keep the package in mind if I need
>>>> something more complex.
>>>>
>>>> I am currently looking for a lightweight solution within DataFrames,
>>>> since filtering is a very common operation. Right now, I am considering
>>>> converting the DataFrame to an array and looping over the rows. I wonder if
>>>> there is syntactic sugar for this loop.
>>>>
>>>> -Júlio
>>>>
>>>> 2016-10-12 17:48 GMT-07:00 David Anthoff <ant...@berkeley.edu>:
>>>>
>>>>> Hi Julio,
>>>>>
>>>>> you can use the Query package for the first part. To filter a
>>>>> DataFrame using some arbitrary julia expression, use something like this:
>>>>>
>>>>> using DataFrames, Query, NamedTuples
>>>>>
>>>>> q = @from i in df begin
>>>>>     @where <filter expression>
>>>>>     @select i
>>>>> end
>>>>>
>>>>> You can use any julia code in <filter expression>. Say your DataFrame
>>>>> has a column called price, then you could filter like this:
>>>>>
>>>>> @where i.price > 30.
>>>>>
>>>>> The i will be a NamedTuple type, so you can access the columns either
>>>>> by their name, or also by their index, e.g.
>>>>>
>>>>> @where i[1] > 30.
>>>>>
>>>>> if you want to filter by the first column. You can also just call some
>>>>> function that you have defined somewhere else:
>>>>>
>>>>> @where foo(i)
>>>>>
>>>>> As long as the <filter expression> returns a Bool, you should be good.
>>>>>
>>>>> If you run a query like this, q will be a standard julia iterator.
>>>>> Right now you can’t just say length(q), although that is something I
>>>>> should probably enable at some point (I’m also looking into the VB LINQ
>>>>> syntax that supports things like counting in the query expression itself).
>>>>>
>>>>> But you could materialize the query as an array and then look at the
>>>>> length of that:
>>>>>
>>>>> q = @from i in df begin
>>>>>     @where <filter expression>
>>>>>     @select i
>>>>>     @collect
>>>>> end
>>>>>
>>>>> count = length(q)
>>>>>
>>>>> The @collect statement means that the query will return an array of a
>>>>> NamedTuple type (you can also materialize it into a whole bunch of other
>>>>> data structures, take a look at the documentation).
>>>>>
>>>>> Let me know if this works, or if you have any other feedback on
>>>>> Query.jl; I’m very much in need of some user feedback for the package at
>>>>> this point. The best way to do that is to open issues here:
>>>>> https://github.com/davidanthoff/Query.jl.
>>>>>
>>>>> Best,
>>>>>
>>>>> David
>>>>>
>>>>> *From:* julia...@googlegroups.com [mailto:julia...@googlegroups.com] *On
>>>>> Behalf Of *Júlio Hoffimann
>>>>> *Sent:* Wednesday, October 12, 2016 5:20 PM
>>>>> *To:* julia-users <julia...@googlegroups.com>
>>>>> *Subject:* [julia-users] Filtering DataFrame with a function
>>>>>
>>>>> Hi,
>>>>>
>>>>> I have a DataFrame for which I want to filter rows that match a given
>>>>> criterion. I don't have the number of columns beforehand, so I cannot
>>>>> explicitly list the criteria with the :symbol syntax or write down a fixed
>>>>> number of indices.
>>>>>
>>>>> Is there any way to filter with a lambda expression? Or even better,
>>>>> is there any efficient way to count the number of occurrences of a
>>>>> specific row of observations?
>>>>>
>>>>> -Júlio
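
For anyone reading this thread later: the two things asked for in the original question (filtering with a lambda, and counting occurrences of one specific row) can be sketched in plain Julia without any package. This is only an illustration, not the DataFrames or Query API. It assumes the rows are held as a Vector of NamedTuples, which needs Julia 0.7 or later, and the names rows, heavy, n_heavy, target and n_target are made up for the example.

```julia
# Hypothetical stand-in for a table: one NamedTuple per row.
# The column names reuse Alex's toy data; nothing here depends on DataFrames.
rows = [
    (Fish = "Amir",  Mass = 1.2),
    (Fish = "Betty", Mass = 3.3),
    (Fish = "Clyde", Mass = 0.4),
    (Fish = "Betty", Mass = 3.3),
]

# Filter with an arbitrary predicate (a lambda) applied to each row.
heavy = filter(r -> r.Mass > 1, rows)

# Count matching rows directly, without building the filtered vector first.
n_heavy = count(r -> r.Mass > 1, rows)

# Count occurrences of one specific row of observations. Comparing whole
# rows with == means the number of columns never has to be written down.
target = (Fish = "Betty", Mass = 3.3)
n_target = count(r -> r == target, rows)
```

Passing the predicate to count avoids allocating an intermediate collection, which matters when only the number of matches is needed.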