On Fri, Oct 19, 2018 at 7:00 PM, Charles R Harris <charlesr.har...@gmail.com> wrote:
>
> On Fri, Oct 19, 2018 at 7:50 PM Eric Wieser <wieser.eric+nu...@gmail.com>
> wrote:
>>
>> Subclasses such as MaskedArray and, yes, Quantity, are widely used, and if
>> they cause problems perhaps that should be seen as a sign that ndarray
>> subclassing should be made easier and clearer.
>>
>> Both maskedarray and quantity seem like something that would make more
>> sense at the dtype level if our dtype system was easier to extend. It might
>> be good to compile a list of subclassing applications, and split them into
>> “this ought to be a dtype” and “this ought to be a different type of
>> container”.
>
> Wes Mckinney has been benchmarking masks vs sentinel values for arrow:
> http://wesmckinney.com/blog/bitmaps-vs-sentinel-values/. The (bit) masks are
> faster. I'm not convinced dtypes are the way to go.
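For concreteness, here is a minimal sketch of the two missing-value strategies under discussion, using only pieces that exist in NumPy today: a NaN sentinel stored in the data itself, a separate boolean mask via numpy.ma, and a packed validity bitmap similar in spirit to what Arrow uses. The sample values and variable names are made up for illustration, and this is not the benchmark from the linked post.

```python
import numpy as np

# Hypothetical sample: four readings, the third one is missing.

# Strategy 1: sentinel value. The "missing" marker lives inside the data,
# so every consumer has to know to skip it (here via np.nanmean).
sentinel = np.array([1.0, 2.0, np.nan, 4.0])
sentinel_mean = np.nanmean(sentinel)

# Strategy 2: separate mask. numpy.ma keeps a boolean mask alongside the
# data; True marks an invalid entry, and the stored value is ignored.
masked = np.ma.masked_array([1.0, 2.0, 0.0, 4.0],
                            mask=[False, False, True, False])
masked_mean = masked.mean()

# A bit mask packs eight validity flags into each byte instead of using
# one byte per element. Here 1 means "valid", least-significant bit first.
valid = np.array([True, True, False, True])
bitmap = np.packbits(valid, bitorder="little")

# Unpacking recovers the per-element flags.
recovered = np.unpackbits(bitmap, count=valid.size, bitorder="little").astype(bool)

print(sentinel_mean, masked_mean, bitmap, recovered)
```

Both means skip the missing element. The difference is where the "missing" information lives: in the values (which a dtype could formalize) or next to them (which a mask-carrying container has to thread through indexing and every other operation).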
We need to add better support for both user-defined dtypes and for
user-defined containers in any case. So we're going to support both
missing value strategies regardless, and people will be able to choose
based on engineering trade-offs. A missing value dtype is going to
integrate much more easily into the rest of numpy than a new container
where you have to reimplement indexing etc., but maybe custom
containers can be faster. Okay, cool, they're both on PyPI, pick your
favorite!

Trying to wedge masks into *ndarray* seems like a non-starter, though,
because it would require auditing and updating basically all code
using the numpy C API.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org