Chris Angelico <ros...@gmail.com> writes:

> But you also don't know that he hasn't. NaN doesn't mean "unknown", it
> means "Not a Number". You need a more sophisticated system that allows
> for uncertainty in your data.

Regardless of whether this is the right design, it's still an example of
use. As to the design, using NaN to implement NA is a hack with a long
history; see http://www.numpy.org/NA-overview.html for some color.
Using NaN gets us a hardware-accelerated implementation with just about
the right semantics. In a real example, these lists are numpy arrays
with tens of millions of elements, so this isn't a trivial benefit.
(Technically, that's what's in the database; a given analysis may look
at a sample of 100k or so.)

> You have a special business case here (the need to
> record information with a "maybe" state), and you need to cope with
> it, which means dedicated logic and planning and design and code.

Yes, in principle. In practice, everyone is used to the semantics of
R-style missing data, which are reasonably well-matched by NaN. In
principle, (NA == 1.0) should be an NA (missing) truth value, as should
(NA == NA), but in practice having it be False is more useful. As an
example, indexing R vectors by a boolean vector containing NA yields NA
results, which is a feature that I never want.

Cheers,
Johann
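
For concreteness, here is a minimal sketch of the comparison semantics described above, written in plain Python rather than numpy so that it runs without any third-party package. The names NA, values, mask, selected and missing are hypothetical and the data is made up for illustration. Numpy float arrays follow the same IEEE 754 comparison rules element by element.

```python
import math

# Hypothetical stand-in for a missing value, implemented as a float NaN.
NA = float("nan")

# Made-up sample data; NA marks a missing observation.
values = [1.0, NA, 3.0, 1.0, NA]

# Any equality comparison involving NaN is False, including NaN == NaN,
# so the mask is False (not "missing") wherever the data is missing.
mask = [v == 1.0 for v in values]

# Selecting by that mask therefore skips missing entries silently,
# instead of propagating NA into the result as R-style indexing would.
selected = [v for v, keep in zip(values, mask) if keep]

# Missing entries have to be detected explicitly, not with ==.
missing = [math.isnan(v) for v in values]

print(mask)
print(selected)
print(missing)
```

The trade-off is the one made in the message: a comparison can never report "unknown", so any code that does care about missingness has to test for it explicitly with an isnan-style check.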