On Monday, 2 November 2015 at 13:54:09 UTC, Jay Norwood wrote:
> I was reading about the Julia dataframe implementation
> yesterday, trying to understand their decisions and how D might
> implement one.
> From my notes,
> 1. They are currently using a dictionary of column vectors.
> 2. For NA (not available) they are currently using an array of
> bytes, effectively as a Boolean flag, rather than a bitVector,
> for performance reasons.
> 3. They are not currently implementing hierarchical headers.
> 4. They are transforming header strings that are not valid
> symbols (read from csv, for example) into valid symbols by
> replacing '.' with underscore and prefixing numbers with 'x',
> as examples. This allows their use in expressions.
> 5. Along with 4., they currently have @with for DataVector, to
> allow expressions to use, for example, :symbol_name instead of
> dv[:symbol_name].
> 6. They have operator symbols for per-element operations on
> two vectors; for example, a ./ b applies the division to each
> pair of elements.
> 7. They currently only have row indexes, no row names or
> symbols.
> I saw someone posting that they were working on a DataFrame
> implementation here, but I haven't been able to locate any code
> on GitHub, and was wondering what implementation decisions are
> being made here. Thanks.
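To check that I read your notes correctly, here is a rough Python sketch of points 1, 2, 4 and 6. It is not Julia's actual code, and the names `sanitize_header`, `Column`, `make_frame` and `elementwise` are made up for illustration. The sanitizing rules are only the two you mention; I am assuming nothing else about what Julia does.

```python
import operator
import re


def sanitize_header(name):
    """Turn a raw header into a valid identifier (rules assumed from the notes)."""
    cleaned = re.sub(r"\W", "_", name)   # '.' and other non-word chars become '_'
    if cleaned[:1].isdigit():            # identifiers cannot start with a digit
        cleaned = "x" + cleaned
    return cleaned


class Column:
    """A column vector plus one NA flag byte per element (not a packed bit vector)."""

    def __init__(self, values, fill=0):
        values = list(values)
        # Missing slots still hold a placeholder; the flag says whether to trust it.
        self.data = [fill if v is None else v for v in values]
        self.na = bytearray(v is None for v in values)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        return None if self.na[i] else self.data[i]

    def __iter__(self):
        return (self[i] for i in range(len(self)))


def make_frame(headers, columns):
    """Point 1: the frame is just a dictionary of column vectors."""
    return {sanitize_header(h): Column(c) for h, c in zip(headers, columns)}


def elementwise(op, a, b):
    """Point 6: apply op pairwise; the result is NA wherever either input is NA."""
    return Column(None if x is None or y is None else op(x, y)
                  for x, y in zip(a, b))


frame = make_frame(["unit.price", "2015"],
                   [[1.5, None, 3.0], [10, 20, None]])
ratio = elementwise(operator.truediv, frame["x2015"], frame["unit_price"])
```

Here `frame` ends up with the keys `unit_price` and `x2015`, and `ratio` is NA in the last two rows because one of the operands is missing in each of them.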
What do you think about the use of NaN for missing floats? In
theory I could imagine wanting to distinguish between a NaN in
the source file and a missing value, but in practice I have never
felt the need for this. For integers and bools it is different, of
course, since those types have no NaN-like value to spare.
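To make the trade-off concrete, a small Python sketch (the helper names `mean_skipping_nan` and `mean_skipping_flagged` are mine, purely for illustration):

```python
import math


def mean_skipping_nan(values):
    """Mean of a float column in which NaN marks a missing entry."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present) if present else math.nan


def mean_skipping_flagged(values, na):
    """Mean of an integer column with a separate flag byte per element."""
    present = [v for v, missing in zip(values, na) if not missing]
    return sum(present) / len(present) if present else math.nan


# Floats: NaN doubles as the "missing" marker, so no extra storage is needed.
# The cost is that a NaN that really was in the source file looks the same
# as a value that was never there.
prices = [1.5, math.nan, 3.0]
price_mean = mean_skipping_nan(prices)

# Integers: every bit pattern is a legitimate value, so the flag has to live
# somewhere else. The 0 stored in the second slot is only a placeholder.
counts = [10, 0, 7]
counts_na = bytearray([0, 1, 0])
count_mean = mean_skipping_flagged(counts, counts_na)
```

With NaN as the marker the float column needs no side array at all, which is the main attraction; the integer column cannot avoid one unless a sentinel value is sacrificed.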