Re: [julia-users] 0.4 Roadmap for DataFrames, DataArrays, etc...

2014-09-11 Thread John Myles White
Viral,

Can you give specific examples where NA caused troubles for you? Were they 
performance problems or something else?

If we get multi-threading really solid halfway to 0.4, we can probably use it in 
some of the NullableArrays code to speed up operations on vectors.

 -- John

On Sep 11, 2014, at 5:05 AM, Viral Shah vi...@mayin.org wrote:

 The state of NA has always been where I stop using DataFrames - and I think 
 this roadmap is perfect to coincide with the 0.4 release.
 
 What are your thoughts on multi-threading? Should we be able to land that in 
 0.4? Perhaps we can speed up some easily parallelizable operations.
 
 -viral
 
 On Sunday, September 7, 2014 11:47:44 AM UTC+5:30, John Myles White wrote:
 Yeah, that’s a way more ambitious project. That’ll take at least a year to 
 make any progress at all. Before I could even begin, I need to finish DBI and 
 then build up something like SQLAlchemy for Julia.
 
 Thankfully, the 0.4 changes should put DataFrames in a good state that we can 
 depend on for some time into the future.
 
  — John
 
 On Sep 6, 2014, at 11:15 PM, Iain Dunning iaindunn...@gmail.com wrote:
 
 I saw on some list/issue you were thinking of working on a more fresh 
 approach to the whole data storage situation - is that post 0.4?
 
 On Saturday, September 6, 2014 10:30:04 PM UTC-4, John Myles White wrote:
 I am hoping that the 0.4 release of Julia will coincide with a major cleanup 
 of the Data* world. I wrote up a very high level overview of my goals here: 
 https://gist.github.com/johnmyleswhite/ad5305ecaa9de01e317e 
 
 There’s still more work to do to flesh out these ideas, but the basic 
 principles are pretty close to finalized. There’s also a rough draft of much 
 of the core functionality we’ll need to add to support this roadmap. 
 
 I wanted to give everyone a heads up so that people understand where the 
 Data* packages are headed. The big idea is that we’ll be pushing more work 
 out into the type system, which will give substantial performance 
 improvements. 
 
  — John 
 
 



Re: [julia-users] 0.4 Roadmap for DataFrames, DataArrays, etc...

2014-09-11 Thread Viral Shah
I don't have the code handy, as it would usually involve playing
around with data in CSV files. I have run into performance problems for the
most part. However, I have at times also run into cases where I felt things
were not as expressive as I would have liked. These cases are probably
because of my lack of understanding - but I will make sure I collect them
and ask the next time.


-- 
-viral


[julia-users] 0.4 Roadmap for DataFrames, DataArrays, etc...

2014-09-06 Thread John Myles White
I am hoping that the 0.4 release of Julia will coincide with a major cleanup of 
the Data* world. I wrote up a very high level overview of my goals here: 
https://gist.github.com/johnmyleswhite/ad5305ecaa9de01e317e

There’s still more work to do to flesh out these ideas, but the basic 
principles are pretty close to finalized. There’s also a rough draft of much of 
the core functionality we’ll need to add to support this roadmap. 

I wanted to give everyone a heads up so that people understand where the Data* 
packages are headed. The big idea is that we’ll be pushing more work out into 
the type system, which will give substantial performance improvements.

 — John
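
To illustrate the "pushing more work out into the type system" idea from the message above, here is a minimal sketch. It assumes a hypothetical container, `MaskedVector`, that keeps concretely typed values next to a Boolean null mask, so the compiler knows the element type in the inner loop. The names `MaskedVector` and `sum_skipnull` are invented for illustration and are not the DataArrays or NullableArrays API. The code uses current Julia syntax rather than the syntax of the 0.4 era.

```julia
# Hypothetical type for illustration only; not taken from any Data* package.
struct MaskedVector{T}
    values::Vector{T}      # concretely typed storage, including slots that are null
    isnull::Vector{Bool}   # true where the corresponding entry is missing
end

# Sum the non-null entries. Because T is known from the container's type,
# the accumulator has a fixed type and the loop needs no dynamic dispatch.
function sum_skipnull(v::MaskedVector{T}) where {T<:Number}
    total = zero(T)
    for i in eachindex(v.values, v.isnull)
        if !v.isnull[i]
            total += v.values[i]
        end
    end
    return total
end

v = MaskedVector([1.5, 0.0, 2.5], [false, true, false])
println(sum_skipnull(v))
```

The design choice in this sketch is that missingness lives in a separate mask rather than in a sentinel value mixed into the data, so the element type of `values` stays concrete.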