On Thu, Mar 8, 2018 at 9:45 PM, Stephan Hoyer <sho...@gmail.com> wrote:
> On Thu, Mar 8, 2018 at 5:54 PM Juan Nunez-Iglesias <jni.s...@gmail.com>
> wrote:
>>
>> On Fri, Mar 9, 2018, at 5:56 AM, Stephan Hoyer wrote:
>>
>> Marten's case 1: works exactly like ndarray, but stores data differently:
>> parallel arrays (e.g., dask.array), sparse arrays (e.g.,
>> https://github.com/pydata/sparse), hypothetical non-strided arrays (e.g.,
>> always C ordered).
>>
>>
>> Two other "hypotheticals" that would fit nicely in this space:
>> - the Open Connectome folks (https://neurodata.io) proposed linearising
>> indices using space-filling curves, which minimizes cache misses (or IO
>> reads) for giant volumes. I believe they implemented this but can't find it
>> currently.
>> - the N5 format for chunked arrays on disk:
>> https://github.com/saalfeldlab/n5
>
>
> I think these fall into another important category of duck arrays.
> "Indexable" arrays that serve as storage, but that don't support computation.
> These sorts of arrays typically support operations like indexing and define
> a handful of array-like properties (e.g., dtype and shape), but not
> arithmetic, reductions or reshaping.
>
> This means you can't quite use them as a drop-in replacement for NumPy
> arrays in all cases, but that's OK. In contrast, both dask.array and sparse
> do aspire to fill out nearly the full numpy.ndarray API.
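
For concreteness, here is a minimal sketch of the kind of storage-only
"indexable" array described above. IndexableStore is a hypothetical name, not
a class from any real library, and the in-memory ndarray is only a stand-in
for an on-disk or chunked backend. The class exposes dtype, shape and
indexing, plus an __array__ method as one assumed way of opting into implicit
conversion to ndarray:

```python
import numpy as np


class IndexableStore:
    """Hypothetical storage-only duck array: indexing, dtype, shape; no math."""

    def __init__(self, data):
        # Assumption: a real backend would read from disk or a chunk store.
        # An in-memory ndarray stands in for that here.
        self._data = np.asarray(data)

    @property
    def dtype(self):
        return self._data.dtype

    @property
    def shape(self):
        return self._data.shape

    def __getitem__(self, key):
        # Indexing hands back the selected region as a plain ndarray.
        return np.asarray(self._data[key])

    def __array__(self, dtype=None, copy=None):
        # Conversion hook used by np.asarray() and by NumPy functions that
        # coerce their arguments. The copy argument is accepted for
        # compatibility with newer NumPy versions but is ignored in this sketch.
        arr = self._data
        if dtype is not None:
            arr = arr.astype(dtype)
        return arr


store = IndexableStore(np.arange(12).reshape(3, 4))
block = store[1:, :2]   # indexing is supported directly
total = np.sum(store)   # works only because store is converted to ndarray
```

Something like store + 1 raises TypeError here, since the class defines no
arithmetic operators; NumPy functions only work by converting the whole
object to an ndarray first.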

I'm not sure if these particular formats fall into that category or not
(isn't the point of the space-filling curves to support cache-efficient
computation?). But I suppose you're also thinking of things like
h5py.Dataset? My impression is that these are mostly handled pretty well
already by defining __array__ and/or providing array operations that
implicitly convert to ndarray -- do you agree?

This does raise an interesting point: maybe we'll eventually want an
__abstract_array__ method that asabstractarray tries calling if defined, so
e.g. if your object isn't itself an array but can be efficiently converted
into a *sparse* array, you have a way to declare that? I think this is
something to file under "worry about later, after we have the basic
infrastructure", but it's not something I'd thought of before so mentioning
here.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion