On Tuesday, 17 November 2015 at 13:56:14 UTC, Jay Norwood wrote:
> I looked through the dataframe code and have a couple of comments...
> I had thought perhaps an app could read in the header info and
> type info from HDF5, and generate D struct definitions with the
> column headers as symbol names. That would enable faster
> processing than with the associative arrays, as well as support
> the auto-completion that is helpful when writing expressions.
Yes - I think one will want a choice between this kind of
approach and using associative arrays. For some purposes it's not
convenient to have to compile code every time you open an
unfamiliar file, while on the other hand the lookup cost of an AA
will sometimes matter.
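To make the trade-off concrete, here is a minimal sketch of the two approaches. The names (`Row`, `makeRowStruct`, `rowAsAA`) are illustrative, not from any existing library: an AA row pays a hash lookup per column access, while a struct generated at compile time from the column names gives direct field access and auto-completion.

```d
import std.stdio;

// Associative-array row: flexible at runtime, but each column
// access costs a hash lookup.
double[string] rowAsAA()
{
    return ["open": 1.0, "close": 2.0];
}

// Generate D source for a struct whose fields are the column names.
// Runs at compile time via CTFE when used in a mixin.
string makeRowStruct(string[] names)
{
    string code = "struct Row {";
    foreach (n; names)
        code ~= " double " ~ n ~ ";";
    return code ~ " }";
}

// Compile-time generated struct: field access resolves statically,
// and tooling can auto-complete the column names.
mixin(makeRowStruct(["open", "close"]));

void main()
{
    auto aa = rowAsAA();
    auto r  = Row(1.0, 2.0);
    writeln(aa["open"], " ", r.open);  // same data, different access cost
}
```

In practice the struct variant would be generated from the HDF5 or CSV header by a small code-generation step before compiling the analysis program.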
The situation at the moment is that I have very little time to
work on a correct general solution to this problem myself (yet
it's important for D that we get to one). I also lack the
experience with D to do it very well very quickly. I do have a
couple of seasoned people from the community helping me with
things, but dataframes won't be the first thing they look at, and
it could be a while before we get to that. If we implement
something for our own needs, then I will open-source it, as that
is commercially sensible as well as the right thing to do. But
that could be a year away.
Vlad Levenfeld was also looking at this a bit.
> The CSV type info for columns could be inferred, or else stated
> in the reader call, as Julia offers as an option.
In both cases the column names would have to be valid symbol
names for this to work. I believe Julia also expects this, or
else does some conversion on your column names to make them
valid symbols. I think the D CSV processing would likewise need
to check that the column names are valid D identifiers, and
sanitise them if they are not.
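A sanitiser along these lines could do the job. This is a hypothetical helper, not part of any existing library, and it deliberately ignores the further issue of clashes with D keywords: it replaces any character that cannot appear in a D identifier with an underscore and prepends one if the name starts with a digit.

```d
import std.ascii : isAlphaNum, isDigit;

// Turn an arbitrary CSV column header into a valid D identifier.
// Note: does not check for collisions with D keywords.
string toValidSymbol(string name)
{
    string result;
    foreach (c; name)
        result ~= (isAlphaNum(c) || c == '_') ? c : '_';
    if (result.length == 0 || isDigit(result[0]))
        result = "_" ~ result;
    return result;
}

unittest
{
    assert(toValidSymbol("Adj Close") == "Adj_Close");
    assert(toValidSymbol("2015 high") == "_2015_high");
}
```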
> The Jupyter interactive environment supports Python pandas and
> Julia dataframe column names in its autocompletion, so I think
> the D debugging environment would need to provide a similar
> capability if it is to be considered a fast-recompile
> substitute for interactive dataframe exploration.
Well, we don't need to get there in a single bound - just being
able to do this at all is a big improvement, and I am already
using D with Jupyter to do things.
> It seems to me that your particular examples of stock data
> would eventually need to handle missing data, as supported in
> Julia dataframes and Python pandas. Both provide ways to drop
> or fill missing values. Did you want to support that?
Yes - we should do so eventually, and there's much more that
could be done. But a sensible basic implementation is a start,
and we can refine from there.
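As a sketch of what drop/fill might look like in D, one could mark gaps in a column with `std.typecons.Nullable`. The function names here (`dropMissing`, `fillMissing`) are hypothetical, loosely modelled on pandas' `dropna`/`fillna`, and not from any existing D library.

```d
import std.typecons : Nullable, nullable;
import std.algorithm : filter, map;
import std.array : array;

// Drop missing entries, keeping only the present values.
double[] dropMissing(Nullable!double[] col)
{
    return col.filter!(v => !v.isNull).map!(v => v.get).array;
}

// Replace missing entries with a fill value.
double[] fillMissing(Nullable!double[] col, double fill)
{
    return col.map!(v => v.isNull ? fill : v.get).array;
}

unittest
{
    Nullable!double[] col =
        [nullable(1.0), Nullable!double.init, nullable(3.0)];
    assert(dropMissing(col) == [1.0, 3.0]);
    assert(fillMissing(col, 0.0) == [1.0, 0.0, 3.0]);
}
```

A fuller design would need this per column type, and sentinel encodings (e.g. NaN for floating point) might be cheaper than `Nullable` for large columns.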
I wrote the dataframe in a couple of evenings, so I am sure it
can be improved, and even rearchitected. Pull requests welcome,
and maybe we should set up a Trello board to organise ideas?
Let me know if you are in.