On Fri, Mar 1, 2013 at 7:56 AM, Sebastian Berg <sebast...@sipsolutions.net> wrote: > On Thu, 2013-02-28 at 23:25 -0800, Robert Bradshaw wrote: >> On Thu, Feb 28, 2013 at 11:12 AM, Nathaniel Smith <n...@pobox.com> wrote: >> > On Thu, Feb 28, 2013 at 5:50 PM, Robert Bradshaw <rober...@gmail.com> >> > wrote: >> >> On Thu, Feb 28, 2013 at 7:13 AM, Sebastian Berg >> >> <sebast...@sipsolutions.net> wrote: >> >>> Hey, >> >>> >> >>> Maybe someone here already saw it (I don't have a track account, or I >> >>> would just create a ticket), but it would be nice if Cython was more >> >>> forgiving about contiguous requirements on strides. In the future this >> >>> would make it easier for numpy to go forward with changing the >> >>> contiguous flags to be more reasonable for its purpose, and second also >> >>> to allow old (and maybe for the moment remaining) corner cases in numpy >> >>> to slip past (as well as possibly the same for other programs...). An >> >>> example is (see also https://github.com/numpy/numpy/issues/2956 and the >> >>> PR linked there for more details): >> >>> >> >>> def add_one(array): >> >>> cdef double[::1] a = array >> >>> a[0] += 1. >> >>> return array >> >>> >> >>> giving: >> >>> >> >>>>>> add_one(np.ascontiguousarray(np.arange(10.)[::100])) >> >>> ValueError: Buffer and memoryview are not contiguous in the same >> >>> dimension. >> >>> >> >>> This could easily be changed if MemoryViews check the strides as "can be >> >>> interpreted as contiguous". That means that if shape[i] == 1, then >> >>> strides[i] are arbitrary (you can just change them if you like). This is >> >>> also the case for 0-sized arrays, which are arguably always contiguous, >> >>> no matter their strides are! >> >> >> >> I was under the impression that the primary value for contiguous is >> >> that it a foo[::1] can be interpreted as a foo*. Letting strides be >> >> arbitrary completely breaks this, right? >> > >> > Nope. The natural definition of "C contiguous" is "the array entries >> > are arranged in memory in the same way they would be if they were a >> > multidimensional C array" (i.e., what you said.) But it turns out that >> > this is *not* the definition that numpy and cython use! >> > >> > The issue is that the above definition is a constraint on the actual >> > locations of items in memory, i.e., given a shape, it tells you that >> > for every index, >> > (a) sum(index * strides) == sum(index * cumprod(shape[::-1])[::-1] * >> > itemsize) >> > Obviously this equality holds if >> > (b) strides == cumprod(shape[::-1])[::-1] * itemsize >> > (Or for F-contiguity, we have >> > (b') strides == cumprod(shape) * itemsize >> > ) >> > >> > (a) is the natural definition of "C contiguous". (b) is the definition >> > of "C contiguous" used by numpy and cython. (b) implies (a). But (a) >> > does not imply (b), i.e., there are arrays that are C-contiguous which >> > numpy and cython think are discontiguous. (Also in numpy there are >> > some weird cases where numpy accidentally uses the correct definition, >> > I think, which is the point of Sebastian's example.) >> > >> > In particular, if shape[i] == 1, then the value of stride[i] really >> > should be irrelevant to judging contiguity, because the only thing you >> > can do with strides[i] is multiply it by index[i], and if shape[i] == >> > 1 then index[i] is always 0. So an array of int8's with shape = (10, >> > 1), strides = (1, 73) is contiguous according to (a), but not >> > according to (b). Also if shape[i] is 0 for any i, then the entire >> > contents of the strides array becomes irrelevant to judging >> > contiguity; all zero-sized arrays are contiguous according to (a), but >> > not (b). >> >> Thanks for clarifying. >> >> Yes, I think it makes a lot of sense to loosen our definition for >> Cython. Internally, I think the only way we use this assumption is in >> not requiring that the first/final index be multiplied by the stride, >> which should be totally fine. But this merits closer inspection as >> there may be something else. > > The only problem I saw was code that used strides[-1] instead of the > itemsize (e.g. using strides[i]/strides[-1] to then index the typed > buffer instead of using strides[i]/itemsize). But that should be easy to > check, numpy had two or so cases of that itself...
I'd be surprised if we do that, but the only way to be sure would be to look at the code. >> > (This is really annoying for numpy because given, say, a column vector >> > with shape (n, 1), it is impossible to be both C- and F-contiguous >> > according to the (b)-style definition. But people expect expect >> > various operations to preserve C versus F contiguity, so there are >> > heuristics in numpy that try to guess whether various result arrays >> > should pretend to be C- or F-contiguous, and we don't even have a >> > consistent idea of what it would mean for this code to be working >> > correctly, never mind test it and keep it working. OTOH if we just fix >> > numpy to use the (a) definition, then it turns out a bunch of >> > third-party code breaks, like, for example, cython.) >> >> Can you give some examples? >> > > Not sure for what :). I meant examples of possible breakage. > Maybe this is an example: > > In [1]: a = np.asmatrix(np.arange(9).reshape(3,3).T) > > In [2]: a.flags.f_contiguous > Out[2]: True > > In [3]: a[:,0].flags > Out[3]: > C_CONTIGUOUS : True > F_CONTIGUOUS : False > ... > > Where that view could just as well be F-contiguous, and the fact that > numpy, when in doubt, prefers C-contiguous might be surprising. And > since it would be less strict to begin with, numpy may safe a copy here > or there (without adding weird stride fixing code). > > Examples for code breakage would be this check as well as scikit-learn > and scipy in 3 or 4 cases making the assumption above of itemsize == > strides[-1] for c-contiguous arrays. Ah. So, assuming Cython itself isn't making such assumptions, what support do you want from Cython? I can see (1) accepting as c/f contiguous arrays that meet this looser definition and (2) setting these flags in memoryviews we produce under this looser definition. Is there anything else? - Robert _______________________________________________ cython-devel mailing list cython-devel@python.org http://mail.python.org/mailman/listinfo/cython-devel