I'm also a fan of the expression-based interface (mostly because I'm used to similar things in Pandas). I haven't looked at that code, though, so I can't comment on the complexity.
Kevin

On Wed, Jan 22, 2014 at 11:18 AM, Blake Johnson <blakejohnso...@gmail.com> wrote:

> Sure, but the resulting expression is *much* more verbose. I just noticed
> that all expression-based indexing was on the chopping block. What is left
> after all this?
>
> I can see how axing these features would make DataFrames.jl easier to
> maintain, but I found the expression stuff to present a rather nice
> interface.
>
> --Blake
>
> On Tuesday, January 21, 2014 11:51:03 AM UTC-5, John Myles White wrote:
>
>> Can you do something like df["ColA"] = f(df)?
>>
>> -- John
>>
>> On Jan 21, 2014, at 8:48 AM, Blake Johnson <blakejo...@gmail.com> wrote:
>>
>> I use within! pretty frequently. What should I be using instead if that
>> is on the chopping block?
>>
>> --Blake
>>
>> On Tuesday, January 21, 2014 7:42:39 AM UTC-5, tshort wrote:
>>>
>>> I also agree with your approach, John. Based on your criteria, here
>>> are some other things to consider for the chopping block.
>>>
>>> - expression-based indexing
>>> - NamedArray (you already have an issue on this)
>>> - with, within, based_on and variants
>>> - @transform, @DataFrame
>>> - select, filter
>>> - DataStream
>>>
>>> Many of these were attempts to ease syntax via delayed evaluation. We
>>> can either do without or try to implement something like LINQ.
>>>
>>> On Mon, Jan 20, 2014 at 7:02 PM, Kevin Squire <kevin....@gmail.com>
>>> wrote:
>>> > Hi John,
>>> >
>>> > I agree with pretty much everything you have written here, and really
>>> > appreciate that you've taken the lead in cleaning things up and
>>> > getting us on track.
>>> >
>>> > Cheers!
>>> > Kevin
>>> >
>>> > On Mon, Jan 20, 2014 at 1:57 PM, John Myles White
>>> > <johnmyl...@gmail.com> wrote:
>>> >>
>>> >> As I said in another thread recently, I am currently the lead
>>> >> maintainer of more packages than I can keep up with.
>>> >> I think it’s been useful for me to start so many different projects,
>>> >> but I can’t keep maintaining most of my packages given my current
>>> >> work schedule.
>>> >>
>>> >> Without Simon Kornblith, Kevin Squire, Sean Garborg and several
>>> >> others doing amazing work to keep DataArrays and DataFrames going,
>>> >> much of our basic data infrastructure would have already become
>>> >> completely unusable. But even with the great work that’s been done on
>>> >> those packages recently, there’s still a lot of additional design
>>> >> work required. I’d like to free up some of my time to do that work.
>>> >>
>>> >> To keep things moving forward, I’d like to propose a couple of
>>> >> radical New Year’s resolutions for the packages I work on.
>>> >>
>>> >> (1) We need to stop adding functionality and focus entirely on
>>> >> improving the quality and documentation of our existing
>>> >> functionality. We have way too much prototype code in DataFrames that
>>> >> I can’t keep up with. I’m about to make a pull request for DataFrames
>>> >> that will remove everything related to column groupings,
>>> >> database-style indexing and Blocks.jl support. I absolutely want to
>>> >> see us push all of those ideas forward in the future, but they need
>>> >> to happen in unmerged forks or separate packages until we have the
>>> >> resources needed to support them. Right now, they make an
>>> >> overwhelming maintenance challenge even more onerous.
>>> >>
>>> >> (2) We can’t support anything other than the master branch of most
>>> >> JuliaStats packages except possibly for Distributions. I personally
>>> >> don’t have the time to simultaneously keep stuff working with Julia
>>> >> 0.2 and Julia 0.3. Moreover, many of our basic packages aren’t mature
>>> >> enough to justify supporting older versions.
>>> >> We should do a better job of supporting our master releases and not
>>> >> invest precious time trying to support older releases.
>>> >>
>>> >> (3) We need to make more of DataArrays and DataFrames reflect the
>>> >> Julian worldview. Lots of our code uses an interface that is
>>> >> incongruous with the interfaces found in Base. Even worse, a large
>>> >> chunk of code has type-stability problems that make it very slow,
>>> >> when comparable code that uses normal Arrays is 100x faster. We need
>>> >> to develop new idioms and new strategies for making code that
>>> >> interacts with type-destabilizing NA’s faster. More generally, we
>>> >> need to make DataArrays and DataFrames fit in better with Julia when
>>> >> Julia and R disagree. Following R’s lead has often led us astray
>>> >> because R doesn’t share Julia’s strengths or weaknesses.
>>> >>
>>> >> (4) Going forward, there should be exactly one way to do most things.
>>> >> The worst part of our current codebase is that there are multiple
>>> >> ways to express the same computation, but (a) some of them are
>>> >> unusably slow and (b) some of them don’t ever get tested or
>>> >> maintained properly. This is closely linked to the excess
>>> >> proliferation of functionality described in Resolution 1 above. We
>>> >> need to start removing stuff from our packages and making the parts
>>> >> we keep both reliable and fast.
>>> >>
>>> >> I think we can push DataArrays and DataFrames to 1.0 status by the
>>> >> end of this year. But I think we need to adopt a new approach if
>>> >> we’re going to get there. Lots of stuff needs to get deprecated and
>>> >> what remains needs a lot more testing, benchmarking and
>>> >> documentation.
>>> >>
>>> >> -- John