Re: dataframe implementations

Laeeth Isharc via Digitalmars-d-learn Wed, 18 Nov 2015 09:21:22 -0800

On Monday, 2 November 2015 at 13:54:09 UTC, Jay Norwood wrote:

I was reading about the Julia dataframe implementation yesterday, trying to understand their decisions and how D might implement.
From my notes,
1. they are currently using a dictionary of column vectors.
2. for NA (not available) they are currently using an array of bytes, effectively as a Boolean flag, rather than a bitVector, for performance reasons.
3. they are not currently implementing hierarchical headers.
4. they are transforming non-valid symbol header strings (read from csv, for example) to valid symbols by replacing '.' with underscore and prefixing numbers with 'x', as examples. This allows use in expressions.
5. Along with 4., they currently have @with for DataVector, to allow expressions to use, for example, :symbol_name instead of dv[:symbol_name].
6. They have operation symbols for per element operations on two vectors, for example a ./ b expresses applying the operation to the vector.
7. They currently only have row indexes, no row names or symbols.
I saw someone posting that they were working on DataFrame implementation here, but haven't been able to locate any code in github, and was wondering what implementation decisions are being made here. Thanks.
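
For illustration only, here is a rough Python sketch of notes 1, 2, 4 and 6 above: a dictionary of columns, a byte-per-element NA mask, header sanitizing, and an element-wise divide. None of this is Julia's or D's actual code. The names `sanitize_header`, `Column` and `divide` are hypothetical, and the sanitizing rule is only a guess at the behaviour described in note 4.

```python
import re


def sanitize_header(name):
    """Turn a raw CSV header into a valid identifier (assumed rule set)."""
    cleaned = re.sub(r"\W", "_", name)  # '.' and other non-word chars become '_'
    if cleaned[:1].isdigit():
        cleaned = "x" + cleaned  # identifiers cannot start with a digit
    return cleaned


class Column:
    """A column vector plus a byte-per-element NA mask (1 means missing)."""

    def __init__(self, values):
        self.na = bytearray(1 if v is None else 0 for v in values)
        # Missing slots hold a placeholder; the mask is what marks them as NA.
        self.data = [0.0 if v is None else float(v) for v in values]


def divide(a, b):
    """Element-wise a ./ b; NA in either input gives NA in the result."""
    out = []
    for x, x_na, y, y_na in zip(a.data, a.na, b.data, b.na):
        out.append(None if (x_na or y_na) else x / y)
    return Column(out)


# The frame itself is just a dictionary of columns keyed by sanitized header.
raw = {"price.usd": [10.0, None, 30.0], "2015": [2.0, 4.0, 5.0]}
frame = {sanitize_header(h): Column(v) for h, v in raw.items()}
ratio = divide(frame["price_usd"], frame["x2015"])
```

A byte per flag costs eight times the memory of a bit vector, but each flag can be read and written without shifting or masking, which is presumably the performance reason mentioned in note 2.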

What do you think about the use of NaN for missing floats? In theory I could imagine wanting to distinguish between an NaN in the source file and a missing value, but in my world I never felt the need for this. For integers and bools, that is different of course.
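
To make the trade-off concrete, here is a small Python sketch (the function names are hypothetical, and treating an empty field as missing is an assumption). Using NaN as the marker keeps a float column as one plain array, but a literal NaN in the file and an empty field become indistinguishable. A separate mask keeps them apart, and an integer column needs the mask in any case because it has no spare value to use as a marker.

```python
import math


def parse_float_column(fields):
    """NaN doubles as the missing marker: empty field -> NaN."""
    return [float(f) if f.strip() else math.nan for f in fields]


def parse_float_column_with_mask(fields):
    """Keep a byte mask so a literal NaN and a missing field stay distinct."""
    values, na = [], bytearray()
    for f in fields:
        missing = not f.strip()
        na.append(1 if missing else 0)
        values.append(math.nan if missing else float(f))
    return values, na


def parse_int_column(fields):
    """Integers have no NaN, so the mask is the only way to record missing."""
    values, na = [], bytearray()
    for f in fields:
        missing = not f.strip()
        na.append(1 if missing else 0)
        values.append(0 if missing else int(f))  # 0 is only a placeholder
    return values, na


fields = ["1.5", "", "NaN"]
plain = parse_float_column(fields)                    # slots 1 and 2 are both NaN
values, na = parse_float_column_with_mask(fields)     # mask flags only slot 1
ints, ints_na = parse_int_column(["3", "", "7"])
```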
