Maybe I misinterpreted the term "expression-based interface".
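
For concreteness, here is a rough sketch of the two styles the quoted thread seems to contrast: eager assignment along the lines of `df["ColA"] = f(df)`, and an expression-based call whose evaluation is delayed until column names can be bound. It is plain Python with a dict of lists standing in for a data frame. The `within` helper and the column names are hypothetical and are not the DataFrames.jl or Pandas API.

```python
# Toy table: a dict of equal-length columns (a stand-in for a data frame).
df = {"a": [1, 2, 3], "b": [10, 20, 30]}


# Eager style, analogous to df["ColA"] = f(df): f runs immediately
# on the columns and returns a finished column.
def f(table):
    return [x + y for x, y in zip(table["a"], table["b"])]


df["ColA"] = f(df)


# Expression-based style: the computation is handed over unevaluated
# (here as a string) and only evaluated later, once per row, with the
# column names bound to that row's values.
def within(table, target, expr):
    """Hypothetical helper: evaluate `expr` per row and store it in `target`."""
    code = compile(expr, "<expr>", "eval")
    n = len(next(iter(table.values())))
    table[target] = [
        eval(code, {"__builtins__": {}}, {name: col[i] for name, col in table.items()})
        for i in range(n)
    ]
    return table


within(df, "ColB", "a + b")

print(df["ColA"])
print(df["ColB"])
```

The second form is shorter at the call site, but it needs extra machinery (parsing, name binding, evaluation) that has to be maintained, which appears to be the trade-off under discussion.
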
On Wed, Jan 22, 2014 at 2:33 PM, John Myles White <johnmyleswh...@gmail.com> wrote:

> My impression is that Pandas didn't support anything like delayed
> evaluation. Is that wrong?
>
> I'm aware that the resulting expressions are a lot more verbose. That
> definitely sucks.
>
> I'd love to see strong proposals for how we're going to do a better job of
> making code shorter going forward. But too much of our current codebase is
> buggy, unable to handle edge cases, slow and undocumented. I think it's
> much more important that we have one way of doing things that actually
> works as advertised for every Julia user than two ways of doing things,
> each of which is slightly broken and performs worse than R and Pandas.
>
> As I've been saying lately, I'm burning out on maintaining so much Julia
> code. If someone else wants to take charge of my projects, I'm ok with
> that. But if I'm going to be doing the work going forward, I need to devote
> my energies to making a small number of things work really well. Once we
> get our core functionality solid, I'll be comfortable getting fancier stuff
> working again.
>
> -- John
>
> On Jan 22, 2014, at 1:06 PM, Kevin Squire <kevin.squ...@gmail.com> wrote:
>
> I'm also a fan of the expression-based interface (mostly because I'm used
> to similar things in Pandas). I haven't looked at that code, though, so I
> can't comment on the complexity.
>
> Kevin
>
>
> On Wed, Jan 22, 2014 at 11:18 AM, Blake Johnson <blakejohnso...@gmail.com> wrote:
>
>> Sure, but the resulting expression is *much* more verbose. I just
>> noticed that all expression-based indexing was on the chopping block. What
>> is left after all this?
>>
>> I can see how axing these features would make DataFrames.jl easier to
>> maintain, but I found the expression stuff to present a rather nice
>> interface.
>>
>> --Blake
>>
>>
>> On Tuesday, January 21, 2014 11:51:03 AM UTC-5, John Myles White wrote:
>>
>>> Can you do something like df["ColA"] = f(df)?
>>>
>>> -- John
>>>
>>>
>>> On Jan 21, 2014, at 8:48 AM, Blake Johnson <blakejo...@gmail.com> wrote:
>>>
>>> I use within! pretty frequently. What should I be using instead if that
>>> is on the chopping block?
>>>
>>> --Blake
>>>
>>> On Tuesday, January 21, 2014 7:42:39 AM UTC-5, tshort wrote:
>>>>
>>>> I also agree with your approach, John. Based on your criteria, here
>>>> are some other things to consider for the chopping block.
>>>>
>>>> - expression-based indexing
>>>> - NamedArray (you already have an issue on this)
>>>> - with, within, based_on and variants
>>>> - @transform, @DataFrame
>>>> - select, filter
>>>> - DataStream
>>>>
>>>> Many of these were attempts to ease syntax via delayed evaluation. We
>>>> can either do without or try to implement something like LINQ.
>>>>
>>>>
>>>>
>>>> On Mon, Jan 20, 2014 at 7:02 PM, Kevin Squire <kevin....@gmail.com> wrote:
>>>> > Hi John,
>>>> >
>>>> > I agree with pretty much everything you have written here, and really
>>>> > appreciate that you've taken the lead in cleaning things up and
>>>> > getting us on track.
>>>> >
>>>> > Cheers!
>>>> > Kevin
>>>> >
>>>> >
>>>> > On Mon, Jan 20, 2014 at 1:57 PM, John Myles White <johnmyl...@gmail.com> wrote:
>>>> >>
>>>> >> As I said in another thread recently, I am currently the lead
>>>> >> maintainer of more packages than I can keep up with. I think it’s
>>>> >> been useful for me to start so many different projects, but I can’t
>>>> >> keep maintaining most of my packages given my current work schedule.
>>>> >>
>>>> >> Without Simon Kornblith, Kevin Squire, Sean Garborg and several
>>>> >> others doing amazing work to keep DataArrays and DataFrames going,
>>>> >> much of our basic data infrastructure would have already become
>>>> >> completely unusable. But even with the great work that’s been done
>>>> >> on those packages recently, there’s still a lot of additional design
>>>> >> work required.
>>>> >> I’d like to free up some of my time to do that work.
>>>> >>
>>>> >> To keep things moving forward, I’d like to propose a couple of
>>>> >> radical New Year’s resolutions for the packages I work on.
>>>> >>
>>>> >> (1) We need to stop adding functionality and focus entirely on
>>>> >> improving the quality and documentation of our existing
>>>> >> functionality. We have way too much prototype code in DataFrames
>>>> >> that I can’t keep up with. I’m about to make a pull request for
>>>> >> DataFrames that will remove everything related to column groupings,
>>>> >> database-style indexing and Blocks.jl support. I absolutely want to
>>>> >> see us push all of those ideas forward in the future, but they need
>>>> >> to happen in unmerged forks or separate packages until we have the
>>>> >> resources needed to support them. Right now, they make an
>>>> >> overwhelming maintenance challenge even more onerous.
>>>> >>
>>>> >> (2) We can’t support anything other than the master branch of most
>>>> >> JuliaStats packages except possibly for Distributions. I personally
>>>> >> don’t have the time to simultaneously keep stuff working with Julia
>>>> >> 0.2 and Julia 0.3. Moreover, many of our basic packages aren’t
>>>> >> mature enough to justify supporting older versions. We should do a
>>>> >> better job of supporting our master releases and not invest precious
>>>> >> time trying to support older releases.
>>>> >>
>>>> >> (3) We need to make more of DataArrays and DataFrames reflect the
>>>> >> Julian worldview. Lots of our code uses an interface that is
>>>> >> incongruous with the interfaces found in Base. Even worse, a large
>>>> >> chunk of code has type-stability problems that make it very slow,
>>>> >> when comparable code that uses normal Arrays is 100x faster.
>>>> >> We need to develop new idioms and new strategies for making code
>>>> >> that interacts with type-destabilizing NA’s faster. More generally,
>>>> >> we need to make DataArrays and DataFrames fit in better with Julia
>>>> >> when Julia and R disagree. Following R’s lead has often led us
>>>> >> astray because R doesn’t share Julia’s strengths or weaknesses.
>>>> >>
>>>> >> (4) Going forward, there should be exactly one way to do most
>>>> >> things. The worst part of our current codebase is that there are
>>>> >> multiple ways to express the same computation, but (a) some of them
>>>> >> are unusably slow and (b) some of them don’t ever get tested or
>>>> >> maintained properly. This is closely linked to the excess
>>>> >> proliferation of functionality described in Resolution 1 above. We
>>>> >> need to start removing stuff from our packages and making the parts
>>>> >> we keep both reliable and fast.
>>>> >>
>>>> >> I think we can push DataArrays and DataFrames to 1.0 status by the
>>>> >> end of this year. But I think we need to adopt a new approach if
>>>> >> we’re going to get there. Lots of stuff needs to get deprecated and
>>>> >> what remains needs a lot more testing, benchmarking and
>>>> >> documentation.
>>>> >>
>>>> >> -- John