Re: [julia-users] New Year's resolutions for DataArrays, DataFrames and other packages

Blake Johnson Tue, 21 Jan 2014 08:57:29 -0800

I use within! pretty frequently. What should I be using instead if that is 
on the chopping block?


--Blake

On Tuesday, January 21, 2014 7:42:39 AM UTC-5, tshort wrote:
>
> I also agree with your approach, John. Based on your criteria, here 
> are some other things to consider for the chopping block. 
>
> - expression-based indexing 
> - NamedArray (you already have an issue on this) 
> - with, within, based_on and variants 
> - @transform, @DataFrame 
> - select, filter 
> - DataStream 
>
> Many of these were attempts to ease syntax via delayed evaluation. We 
> can either do without or try to implement something like LINQ. 
>
>
>
> On Mon, Jan 20, 2014 at 7:02 PM, Kevin Squire <kevin....@gmail.com> wrote:
> > Hi John, 
> > 
> > I agree with pretty much everything you have written here, and really
> > appreciate that you've taken the lead in cleaning things up and getting us
> > on track.
> > 
> > Cheers! 
> >    Kevin 
> > 
> > 
> > On Mon, Jan 20, 2014 at 1:57 PM, John Myles White <johnmyl...@gmail.com> wrote:
> >> 
> >> As I said in another thread recently, I am currently the lead maintainer
> >> of more packages than I can keep up with. I think it’s been useful for me
> >> to start so many different projects, but I can’t keep maintaining most of
> >> my packages given my current work schedule.
> >> 
> >> Without Simon Kornblith, Kevin Squire, Sean Garborg and several others
> >> doing amazing work to keep DataArrays and DataFrames going, much of our
> >> basic data infrastructure would have already become completely unusable.
> >> But even with the great work that’s been done on those packages recently,
> >> there’s still a lot of additional design work required. I’d like to free
> >> up some of my time to do that work.
> >> 
> >> To keep things moving forward, I’d like to propose a couple of radical
> >> New Year’s resolutions for the packages I work on.
> >> 
> >> (1) We need to stop adding functionality and focus entirely on improving
> >> the quality and documentation of our existing functionality. We have way
> >> too much prototype code in DataFrames that I can’t keep up with. I’m about
> >> to make a pull request for DataFrames that will remove everything related
> >> to column groupings, database-style indexing and Blocks.jl support. I
> >> absolutely want to see us push all of those ideas forward in the future,
> >> but they need to happen in unmerged forks or separate packages until we
> >> have the resources needed to support them. Right now, they make an
> >> overwhelming maintenance challenge even more onerous.
> >> 
> >> (2) We can’t support anything other than the master branch of most
> >> JuliaStats packages except possibly for Distributions. I personally don’t
> >> have the time to simultaneously keep stuff working with Julia 0.2 and
> >> Julia 0.3. Moreover, many of our basic packages aren’t mature enough to
> >> justify supporting older versions. We should do a better job of supporting
> >> our master releases and not invest precious time trying to support older
> >> releases.
> >> 
> >> (3) We need to make more of DataArrays and DataFrames reflect the Julian
> >> worldview. Lots of our code uses an interface that is incongruous with the
> >> interfaces found in Base. Even worse, a large chunk of code has
> >> type-stability problems that make it very slow, when comparable code that
> >> uses normal Arrays is 100x faster. We need to develop new idioms and new
> >> strategies for making code that interacts with type-destabilizing NA’s
> >> faster. More generally, we need to make DataArrays and DataFrames fit in
> >> better with Julia when Julia and R disagree. Following R’s lead has often
> >> led us astray because R doesn’t share Julia’s strengths or weaknesses.
> >> 
> >> (4) Going forward, there should be exactly one way to do most things. The
> >> worst part of our current codebase is that there are multiple ways to
> >> express the same computation, but (a) some of them are unusably slow and
> >> (b) some of them don’t ever get tested or maintained properly. This is
> >> closely linked to the excess proliferation of functionality described in
> >> Resolution 1 above. We need to start removing stuff from our packages and
> >> making the parts we keep both reliable and fast.
> >> 
> >> I think we can push DataArrays and DataFrames to 1.0 status by the end of
> >> this year. But I think we need to adopt a new approach if we’re going to
> >> get there. Lots of stuff needs to get deprecated and what remains needs a
> >> lot more testing, benchmarking and documentation.
> >> 
> >>  — John 
> >> 
> > 
>
