Re: [julia-users] New Year's resolutions for DataArrays, DataFrames and other packages

Tom Short Tue, 21 Jan 2014 04:43:07 -0800

I also agree with your approach, John. Based on your criteria, here
are some other things to consider for the chopping block.


- expression-based indexing
- NamedArray (you already have an issue on this)
- with, within, based_on and variants
- @transform, @DataFrame
- select, filter
- DataStream

Many of these were attempts to ease syntax via delayed evaluation. We
can either do without or try to implement something like LINQ.



On Mon, Jan 20, 2014 at 7:02 PM, Kevin Squire <kevin.squ...@gmail.com> wrote:
> Hi John,
>
> I agree with pretty much everything you have written here, and really
> appreciate that you've taken the lead in cleaning things up and getting us
> on track.
>
> Cheers!
>    Kevin
>
>
> On Mon, Jan 20, 2014 at 1:57 PM, John Myles White <johnmyleswh...@gmail.com>
> wrote:
>>
>> As I said in another thread recently, I am currently the lead maintainer
>> of more packages than I can keep up with. I think it’s been useful for me to
>> start so many different projects, but I can’t keep maintaining most of my
>> packages given my current work schedule.
>>
>> Without Simon Kornblith, Kevin Squire, Sean Garborg and several others
>> doing amazing work to keep DataArrays and DataFrames going, much of our
>> basic data infrastructure would have already become completely unusable. But
>> even with the great work that’s been done on those packages recently, there’s
>> still a lot of additional design work required. I’d like to free up some of my
>> time to do that work.
>>
>> To keep things moving forward, I’d like to propose a couple of radical New
>> Year’s resolutions for the packages I work on.
>>
>> (1) We need to stop adding functionality and focus entirely on improving
>> the quality and documentation of our existing functionality. We have way too
>> much prototype code in DataFrames that I can’t keep up with. I’m about to
>> make a pull request for DataFrames that will remove everything related to
>> column groupings, database-style indexing and Blocks.jl support. I
>> absolutely want to see us push all of those ideas forward in the future, but
>> they need to happen in unmerged forks or separate packages until we have the
>> resources needed to support them. Right now, they make an overwhelming
>> maintenance challenge even more onerous.
>>
>> (2) We can’t support anything other than the master branch of most
>> JuliaStats packages except possibly for Distributions. I personally don’t
>> have the time to simultaneously keep stuff working with Julia 0.2 and Julia
>> 0.3. Moreover, many of our basic packages aren’t mature enough to justify
>> supporting older versions. We should do a better job of supporting our
>> master releases and not invest precious time trying to support older
>> releases.
>>
>> (3) We need to make more of DataArrays and DataFrames reflect the Julian
>> worldview. Lots of our code uses an interface that is incongruous with the
>> interfaces found in Base. Even worse, a large chunk of code has
>> type-stability problems that make it very slow, when comparable code that
>> uses normal Arrays is 100x faster. We need to develop new idioms and new
>> strategies for making code that interacts with type-destabilizing NA’s
>> faster. More generally, we need to make DataArrays and DataFrames fit in
>> better with Julia when Julia and R disagree. Following R’s lead has often
>> led us astray because R doesn’t share Julia’s strengths or weaknesses.
>>
>> (4) Going forward, there should be exactly one way to do most things. The
>> worst part of our current codebase is that there are multiple ways to
>> express the same computation, but (a) some of them are unusably slow and (b)
>> some of them don’t ever get tested or maintained properly. This is closely
>> linked to the excess proliferation of functionality described in Resolution
>> 1 above. We need to start removing stuff from our packages and making the
>> parts we keep both reliable and fast.
>>
>> I think we can push DataArrays and DataFrames to 1.0 status by the end of
>> this year. But I think we need to adopt a new approach if we’re going to get
>> there. Lots of stuff needs to get deprecated and what remains needs a lot
>> more testing, benchmarking and documentation.
>>
>>  — John
>>
>
