+1 on NaNs already being an interop nightmare, especially for those who work 
with multiple programming languages at the same time.

Issues regarding NaNs may be found at 
https://issues.apache.org/jira/browse/ARROW-2806?jql=text%20~%20%22NaN%22. 
The most recent issue I see is from July 2018, involves Python, and was 
marked resolved on 17 July 2018. The description may be helpful.
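
To make the "valid NaN" versus "invalid NaN" distinction from the question 
below concrete, here is a minimal sketch in plain Python. It is not the Arrow 
API: the classify function and the list-based validity flags are made up for 
illustration, and it assumes just one possible reading, in which is_valid 
takes precedence over whatever the float slot holds.

```python
import math

def classify(values, validity):
    """Label each slot; `validity` is a hypothetical list (1 = valid, 0 = null)."""
    labels = []
    for value, is_valid in zip(values, validity):
        if not is_valid:
            # Assumed reading: a null slot is null, whatever its payload is.
            labels.append("null")
        elif math.isnan(value):
            # A valid slot whose float value happens to be NaN.
            labels.append("NaN")
        else:
            labels.append("value")
    return labels

values = [1.5, float("nan"), float("nan"), 4.0]
validity = [1, 1, 0, 0]
print(classify(values, validity))  # ['value', 'NaN', 'null', 'null']
```

A system that instead treated every NaN as null, or ignored the validity 
flags for floats, would label the second and third slots differently, which 
is exactly where the interop trouble comes from.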

Regards,

Donald E. Foss | @DonaldFoss <https://twitter.com/DonaldFoss>
Never Stop Learning!
------ __o
----_`\<,_
---(_)/ (_)

> On Dec 10, 2018, at 10:47 AM, Rhys Ulerich <rhys.uler...@twosigma.com> wrote:
> 
> 'Morning,
> 
> 
> 
> Regarding https://arrow.apache.org/docs/memory_layout.html, how should 
> is_valid be interpreted for primitive types that have their own notions of 
> is_valid?
> 
> 
> 
> Concretely, how should folks interpret a "valid NaN" (is_valid 1 with float 
> NaN) versus an "invalid NaN" (is_valid 0 with float NaN)?  In RFC-ese, MUST 
> individual NaNs be valid?  Or, MUST floats all be valid by omitting the 
> validity bitset?
> 
> 
> 
> I ask because otherwise I can see a bunch of different systems interpreting 
> this detail in many different ways.  That'd be an interop nightmare.  
> Especially since understanding why NaNs sneak into large datasets is already 
> quite a hassle.
> 
> 
> 
> Anyhow, it seems worth addressing this gap at the written specification level.
> 
> 
> 
> (Apologies if this has been discussed previously-- I've found no searchable 
> mailing list archives under 
> http://mail-archives.apache.org/mod_mbox/arrow-dev/ or 
> https://cwiki.apache.org/confluence/display/ARROW.)
> 
> 
> 
> Thanks,
> 
> Rhys
