Re: D language manipulation of dataframe type structures

Jared Miller Wed, 25 Sep 2013 11:40:51 -0700

I agree with other posters that a D REPL and an
interactive/visualization data environment would be very cool,
but unfortunately neither exists. Batch computing is more
practical, but REPLs really hook new users. I see statistical
computing as a huge opportunity for D adoption. (R is just
super-ugly and slow, leaving Python + its various native-code
cyborg appendages as the hot new stats environment).


There are tons of ways of accomplishing the same thing in D, but
as far as I know there isn't a "standard" at this point. A
statically typed dataframe is, at minimum, just a range of
structs -- even more minimally, a bare *array* of structs, or
alternatively just a 2-D array in a thin wrapper that provides
access via column labels rather than indexes. You can manipulate
these ranges with functions from std.range and std.algorithm.
Missing or N/A data is a common issue, and can be represented in
a variety of ways, with integers being the most annoying since
there is no built-in NaN value for ints (check out the Nullable
template from std.typecons).
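
To make the range-of-structs idea concrete, here is a minimal sketch. The Row type, its fields, and the helper functions meanAge and tallNames are made up for illustration; only filter, map, sum, array and Nullable come from Phobos.

```d
import std.algorithm : filter, map, sum;
import std.array : array;
import std.stdio : writeln;
import std.typecons : Nullable;

alias NInt = Nullable!int;

// Hypothetical row type: one struct per observation.
struct Row
{
    string name;
    double height; // a floating-point column could use double.nan for N/A
    NInt age;      // ints have no NaN, so missing values use Nullable
}

// Mean of the age column, skipping rows where age is missing.
// Returns NaN when no row has an age.
double meanAge(Row[] rows)
{
    auto ages = rows.filter!(r => !r.age.isNull)
                    .map!(r => r.age.get)
                    .array;
    return ages.length ? cast(double) ages.sum / ages.length : double.nan;
}

// Names of the rows whose height exceeds minHeight.
string[] tallNames(Row[] rows, double minHeight)
{
    return rows.filter!(r => r.height > minHeight)
               .map!(r => r.name)
               .array;
}

void main()
{
    auto df = [
        Row("ann", 1.70, NInt(34)),
        Row("bob", 1.82, NInt.init), // age not recorded
        Row("cy",  1.65, NInt(40)),
    ];
    writeln(meanAge(df));
    writeln(tallNames(df, 1.68));
}
```

The "dataframe" here is nothing more than a Row[], so everything in std.range and std.algorithm applies to it directly.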

Supporting features like having *both* rows and columns
accessible via labels rather than indexes requires a little bit
more wrapping. We have a NamedMatrix class at my workplace for
that purpose. It's easy to overload the index operator [] for
access, * for matrix multiplication, etc.
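
A rough sketch of what such a wrapper might look like follows. This is not the workplace class mentioned above, just an illustration under assumed names: a struct with label-to-position lookup tables, opIndex overloads for label and position access, and opBinary for "*".

```d
import std.exception : enforce;
import std.stdio : writeln;

// Hypothetical sketch of a label-addressable matrix.
struct NamedMatrix
{
    string[] rowNames, colNames;
    double[][] data;         // row-major values
    size_t[string] rowIndex; // row label -> row position
    size_t[string] colIndex; // column label -> column position

    this(string[] rowNames, string[] colNames, double[][] data)
    {
        enforce(data.length == rowNames.length, "row count mismatch");
        foreach (r; data)
            enforce(r.length == colNames.length, "column count mismatch");
        this.rowNames = rowNames;
        this.colNames = colNames;
        this.data = data;
        foreach (i, n; rowNames) rowIndex[n] = i;
        foreach (j, n; colNames) colIndex[n] = j;
    }

    // m["row", "col"]: access by labels
    double opIndex(string row, string col)
    {
        return data[rowIndex[row]][colIndex[col]];
    }

    // m[i, j]: access by positions
    double opIndex(size_t i, size_t j)
    {
        return data[i][j];
    }

    // a * b: plain matrix product; the result keeps a's row labels
    // and b's column labels.
    NamedMatrix opBinary(string op : "*")(NamedMatrix rhs)
    {
        enforce(colNames.length == rhs.rowNames.length, "shape mismatch");
        auto result = new double[][](rowNames.length, rhs.colNames.length);
        foreach (i; 0 .. rowNames.length)
            foreach (j; 0 .. rhs.colNames.length)
            {
                double acc = 0;
                foreach (k; 0 .. colNames.length)
                    acc += data[i][k] * rhs.data[k][j];
                result[i][j] = acc;
            }
        return NamedMatrix(rowNames, rhs.colNames, result);
    }
}

void main()
{
    auto a = NamedMatrix(["r1", "r2"], ["x", "y"], [[1.0, 2.0], [3.0, 4.0]]);
    auto b = NamedMatrix(["x", "y"], ["p", "q"], [[5.0, 6.0], [7.0, 8.0]]);
    auto c = a * b;
    writeln(c["r1", "p"], " ", c[1, 1]);
}
```

This sketch multiplies purely by position; a stricter version might also check that the left operand's column labels match the right operand's row labels.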

CSV loads can be done with std.csv; unfortunately there's no
corresponding support in that module for *writing* CSV (I've
rolled my own). At my workplace we also have a MysqlConnection
class that provides one-liner loading from a SQL query into
minimalist, range-of-structs dataframes.
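
As an illustration of the CSV side, the sketch below reads text into an array of structs with std.csv's csvReader and writes it back out with a small hand-rolled writer. The Sample type and the functions loadSamples, csvField and toCsv are hypothetical names invented for this example (this is not the writer mentioned above); the quoting rule assumed is the usual one of wrapping a field in double quotes when it contains a comma, quote or line break, and doubling any embedded quotes.

```d
import std.algorithm : any;
import std.array : array, join, replace;
import std.conv : to;
import std.csv : csvReader;
import std.stdio : writeln;

// Hypothetical row type; field order matches the header list below.
struct Sample
{
    string name;
    int count;
    double score;
}

// Parse CSV text whose first line is the header "name,count,score".
Sample[] loadSamples(string text)
{
    return csvReader!Sample(text, ["name", "count", "score"]).array;
}

// Quote a field only when it needs it, doubling embedded quotes.
string csvField(string s)
{
    if (s.any!(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
        return `"` ~ s.replace(`"`, `""`) ~ `"`;
    return s;
}

// Serialize rows to CSV text with a header line.
string toCsv(Sample[] rows)
{
    string[] lines = ["name,count,score"];
    foreach (r; rows)
        lines ~= [csvField(r.name), r.count.to!string, r.score.to!string].join(",");
    return lines.join("\n");
}

void main()
{
    auto rows = [Sample("Smith, J", 3, 2.5), Sample("plain", 7, 0.25)];
    auto text = toCsv(rows);
    writeln(text);
    writeln(loadSamples(text));
}
```

Note that to!string on a double does not always round-trip exactly, so a real writer would probably want explicit formatting for floating-point columns.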

Beyond that, it really depends on how you want to manipulate the
dataframes. What specific things do you want to do? If you've got
an idea, I could work up some sample code.

So yes, there are people doing it in The Real World.
Unfortunately my colleagues don't have a nice, tidy,
self-contained DataFrame module to share (yet). But having one
would be a great thing for D. The bigger problem though is
matching the huge 3rd-party stats libraries (like CRAN for R).


On Wednesday, 25 September 2013 at 03:41:36 UTC, Jay Norwood
wrote:

I've been playing with the python pandas app that enables interactive manipulation of tables of data in their dataframe structure, which they say is similar to the structures used in R.
It appears pandas has laid claim to being a faster version of R, but is doing so basically limited to what they can exploit from moving operations back and forth from underlying cython code.
Has anyone written an example app in D that manipulates dataframe type structures?
