On Tuesday, 17 November 2015 at 13:56:14 UTC, Jay Norwood wrote:
I looked through the dataframe code and a couple of comments...

I had thought perhaps an app could read the header and type info from HDF5 and generate D struct definitions with the column headers as symbol names. That would enable faster processing than with associative arrays, as well as support the auto-completion that would be helpful in writing expressions.

Yes - I think one will want a choice between this kind of approach and using associative arrays: for some purposes it's not convenient to have to compile code every time you open an unfamiliar file, while on the other hand the lookup cost of an AA will sometimes matter.
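As a rough sketch of the code-generation side, here is what a small generator script might look like. The function name, the struct name, and the column/type pairs are all hypothetical - this is just an illustration of turning a CSV header plus per-column type info into a D struct definition, written in Python for brevity:

```python
def generate_d_struct(struct_name, columns):
    """Emit D source for a struct whose fields mirror the CSV columns.

    `columns` is a list of (column_name, d_type) pairs, e.g.
    [("date", "string"), ("close", "double")].  This is an
    illustrative helper, not part of any existing library.
    """
    lines = ["struct %s {" % struct_name]
    for col_name, d_type in columns:
        lines.append("    %s %s;" % (d_type, col_name))
    lines.append("}")
    return "\n".join(lines)

print(generate_d_struct("Record", [("date", "string"),
                                   ("close", "double"),
                                   ("volume", "long")]))
```

The generated source could then be compiled into the analysis program, giving field access (record.close) instead of AA lookups, and letting the IDE auto-complete the column names.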

The situation at the moment is that I have very little time to work on a correct general solution to this problem myself (yet it's important for D that we get to one), and I also lack the experience with D to do it very well very quickly. I do have a couple of seasoned people from the community helping me with things, but dataframes won't be the first thing they look at, and it could be a while before we get to that. If we implement something for our own needs, then I will open-source it, as that is commercially sensible as well as the right thing to do. But that could be a year away.

Vlad Levenfeld was also looking at this a bit.


The csv type info for columns could be inferred, or else stated in the reader call, as done as an option in julia.
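For the inference route, the usual approach is to try progressively wider types per column. A minimal sketch (in Python, with an illustrative function name - the exact rules Julia's CSV readers apply may differ):

```python
def infer_column_type(values):
    """Guess a D element type for a column of CSV string values.

    Tries long, then double, falling back to string.  A rough
    sketch of per-column type inference; the name and the
    promotion rules here are assumptions for illustration.
    """
    for cast, d_type in ((int, "long"), (float, "double")):
        try:
            for v in values:
                cast(v)          # does every value parse as this type?
            return d_type
        except ValueError:
            continue             # no - try the next wider type
    return "string"
```

A column of ["1", "2", "3"] would come back as "long", ["1.5", "2"] as "double", and anything non-numeric as "string".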

In both cases the column names would have to be valid symbol names for this to work. I believe Julia also expects this, or else does some conversion on your column names to make them valid symbols. I think the D csv processing would similarly need to check whether each column name is a valid D identifier, and convert it if it is not.
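That conversion could be as simple as the following sketch - loosely modelled on what Julia does to header symbols, though the exact rules here are an assumption:

```python
import re

def sanitize_column_name(name):
    """Turn an arbitrary CSV header into a valid D identifier.

    Replaces any non-word character with an underscore and
    prefixes an underscore if the name starts with a digit.
    Illustrative only; real rules would also need to dodge
    D keywords.
    """
    ident = re.sub(r"\W", "_", name)
    if ident[:1].isdigit():
        ident = "_" + ident
    return ident
```

So a header like "Adj Close" becomes Adj_Close, and "52wk-high" becomes _52wk_high.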

The Jupyter interactive environment supports Python pandas and Julia dataframe column names in its autocompletion, so I think the D debugging environment would need to provide similar capability if it is to be considered a fast-recompile substitute for interactive dataframe exploration.

Well, we don't need to get there in a single bound - just being able to do this at all is already a big improvement, and I am already using D with Jupyter to do things.

It seems to me that your particular examples of stock data would eventually need to handle missing data, as supported in Julia dataframes and python pandas. They both provide ways to drop or fill missing values. Did you want to support that?
Yes - we should do so eventually, and there's much more that could be done. But maybe a sensible basic implementation is a start and we can refine after that.

I wrote the dataframe code in a couple of evenings, so I am sure it can be improved, or even rearchitected. Pull requests welcomed, and maybe we should set up a Trello board to organise ideas? Let me know if you are in.
