On Wednesday, 18 November 2015 at 17:15:38 UTC, Laeeth Isharc
wrote:
What do you think about the use of NaN for missing floats? In
theory I could imagine wanting to distinguish between an NaN in
the source file and a missing value, but in my world I never
felt the need for this. For integers and bools, that is
different of course.
The Julia discussions mention another dataframe implementation, I
believe it was for R, where NaN was used. There was some mention
of the virtues of their own choice and the problems with NaN. I
think the NA there was one particular encoding of NaN. Other
implementations they mentioned used some reserved value in each
of the numeric data types to represent NA. In the Julia case, I
believe what they use is a separate byte vector for each column
that holds the NA status. They discussed some other possible
enhancements, but I don't know what they implemented. For
example, since a separate byte holds the NA flag, the cell value
itself can hold additional info ... maybe the reason for the NA.
There was also some discussion of having the associated cell hold
a repeat count for the NA status, which I suppose meant that the
NA repeats for that many following cells in the column vector.
I'll try to find the
discussions and post the link.
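
To make the two representations above concrete, here is a rough
sketch in Python. It is only an illustration of the ideas, not what
Julia or R actually implement. The names NA_PAYLOAD, make_na, is_na
and MaskedColumn are made up for this example, and the payload value
is an arbitrary tag. The first part marks NA as one specific NaN bit
pattern, so it can be told apart from an ordinary NaN in the source
data. The second part keeps a separate byte vector of NA flags per
column and reuses the cell value of an NA cell as a reason code.

```python
import math
import struct

# Hypothetical tag stored in the NaN payload (the low mantissa bits).
# Any nonzero payload would do; this value is an assumption of the sketch.
NA_PAYLOAD = 1954
QUIET_NAN_BITS = 0x7FF8000000000000   # exponent all ones, quiet bit set
PAYLOAD_MASK = 0x0007FFFFFFFFFFFF     # mantissa bits below the quiet bit


def make_na() -> float:
    """Build a double that is a NaN carrying the NA tag in its payload."""
    bits = QUIET_NAN_BITS | NA_PAYLOAD
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def is_na(x: float) -> bool:
    """True only for the tagged NaN, not for an ordinary NaN."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return math.isnan(x) and (bits & PAYLOAD_MASK) == NA_PAYLOAD


class MaskedColumn:
    """Column with a separate byte vector holding the NA status."""

    def __init__(self, values, na_flags):
        self.values = list(values)
        # One byte per cell: 0 means present, nonzero means NA.
        self.na = bytearray(na_flags)

    def get(self, i):
        """Return the cell value, or None if the cell is flagged NA."""
        return None if self.na[i] else self.values[i]

    def na_reason(self, i):
        """For an NA cell, the otherwise unused value slot holds a reason code."""
        return self.values[i] if self.na[i] else None

    def mean(self):
        """Mean over the cells that are not flagged NA."""
        present = [v for v, f in zip(self.values, self.na) if not f]
        return sum(present) / len(present)


# The second cell is NA; its value slot (7) is read as a reason code.
col = MaskedColumn([1, 7, 3], [0, 1, 0])
```

One practical difference: the mask approach works the same way for
integers and bools, which have no spare NaN-like value, while the
NaN payload trick only applies to floats and depends on the payload
bits surviving whatever arithmetic and I/O the data goes through.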