'Morning,


Regarding https://arrow.apache.org/docs/memory_layout.html, how should is_valid 
be interpreted for primitive types that have their own notion of validity?



Concretely, how should folks interpret a "valid NaN" (is_valid 1 with float 
NaN) versus an "invalid NaN" (is_valid 0 with float NaN)?  In RFC-ese, MUST 
individual NaNs be valid?  Or MUST all floats be valid, by omitting the 
validity bitset?
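
To make the two cases concrete, here is a minimal sketch in plain Python. It 
is not Arrow library code: is_valid(), classify(), and the sample data are 
hypothetical names of mine, and it assumes my reading of the layout doc, 
namely least-significant-bit numbering where a set bit marks a non-null slot.

```python
import math


def is_valid(bitmap: bytes, i: int) -> bool:
    # Assumed layout: LSB-first bit numbering, set bit means "slot is not null".
    return bool(bitmap[i // 8] & (1 << (i % 8)))


def classify(values, bitmap):
    # Label each slot by combining the validity bit with the float's own NaN state.
    labels = []
    for i, v in enumerate(values):
        valid = is_valid(bitmap, i)
        if math.isnan(v):
            labels.append("valid NaN" if valid else "invalid NaN")
        else:
            labels.append("value" if valid else "null")
    return labels


# Hypothetical four-slot float array; slot 2 is marked null but still holds NaN bits.
values = [1.5, float("nan"), float("nan"), 4.0]
validity = bytes([0b00001011])  # slots 0, 1, and 3 are valid

labels = classify(values, validity)
print(labels)
```

Slots 1 and 2 hold the same float payload and differ only in the validity 
bit, which is exactly the distinction the spec does not currently assign a 
meaning to.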



I ask because otherwise I can see a bunch of different systems interpreting 
this detail in many different ways.  That'd be an interop nightmare.  
Especially since understanding why NaNs sneak into large datasets is already 
quite a hassle.



Anyhow, it seems worth addressing this gap at the written specification level.



(Apologies if this has been discussed previously; I've found no searchable 
mailing list archives under http://mail-archives.apache.org/mod_mbox/arrow-dev/ 
or https://cwiki.apache.org/confluence/display/ARROW.)



Thanks,

Rhys
