On Fri, Mar 11, 2011 at 11:41 AM, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
> There are a few libraries out there that need to know whether or not
> an array changed since the last time it was used: joblib and pymc come
> to mind. I believe joblib computes a SHA1 or md5 hash of the array
> contents, while pymc simply assumes you never change an array and uses
> the id().
>
> The pymc approach is fragile, while in my case the joblib approach is
> too expensive, since I'll call the function again many times in a row
> with the same large array (yes, I can code around it, but the code gets
> less streamlined).
>
> So, would it be possible to very quickly detect whether a NumPy array
> is guaranteed to not have changed? Here's a revision counter approach:
>
> 1) Introduce a new 64-bit int field "modification_count" in the array
> object struct.
>
> 2) modification_count is incremented any time it is possible that an
> array changes. In particular, PyArray_DATA would increment the counter.
>
> 3) A new PyArray_READONLYDATA is introduced that does not increment
> the counter, which can be used in strategic spots. However, the point
> is simply to rule out *most* sources of having to recompute a checksum
> for the array -- a non-matching modification_count is not a guarantee
> that the array has changed, but a matching modification_count is a
> guarantee of an unchanged array.
>
> 4) The counter can be ignored for readonly (base) arrays.
>
> 5a) A method is introduced Python-side,
> arr.checksum(algorithm="md5"|"sha1"), that uses this machinery to cache
> checksum computation and that can be plugged into joblib.
>
> 5b) Alternatively, the modification count is exposed directly to
> Python-side, and it is up to users to store the modification count
> (e.g. in a WeakKeyDictionary indexed by the array's base array).
>
> Another solution to the problem would be to allow registering event
> handlers. The main reason I'm not proposing that is that I don't want
> to spend the time to implement it (it sounds a lot more difficult), it
> appears to be considerably less backwards-compatible, and so on.
>
> Why not a simple dirty flag? Because you'd need one for every possible
> application of this (e.g., md5 and sha1 would need separate dirty
> flags, and other uses than hashing would need yet more flags, and so
> on).
>
> What about views?

Wouldn't it be easier to write another object wrapping an ndarray?

Chuck
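
For illustration only, here is a minimal pure-Python sketch of what such a wrapper might look like. The class name VersionedArray and its layout are hypothetical and not part of NumPy, joblib, or pymc; only the names modification_count and checksum(algorithm=...) are borrowed from the proposal quoted above. It assumes every write goes through the wrapper, which it tries to enforce by copying the input and keeping the private array read-only between writes.

```python
import hashlib

import numpy as np


class VersionedArray:
    """Hypothetical wrapper: counts writes made through it and caches digests."""

    def __init__(self, data):
        # Copy the input so no outside reference can modify the buffer.
        self._arr = np.array(data)
        # Keep the array read-only except during __setitem__.
        self._arr.flags.writeable = False
        self.modification_count = 0
        self._cache = {}  # algorithm name -> (modification_count, hexdigest)

    def __getitem__(self, key):
        # Basic slicing returns read-only views, so reads never bump the counter.
        return self._arr[key]

    def __setitem__(self, key, value):
        # Over-approximate: bump the counter even if the stored values are identical.
        self.modification_count += 1
        self._arr.flags.writeable = True
        try:
            self._arr[key] = value
        finally:
            self._arr.flags.writeable = False

    def checksum(self, algorithm="sha1"):
        cached = self._cache.get(algorithm)
        if cached is not None and cached[0] == self.modification_count:
            # Matching count: the data is guaranteed unchanged, reuse the digest.
            return cached[1]
        digest = hashlib.new(algorithm, self._arr.tobytes()).hexdigest()
        self._cache[algorithm] = (self.modification_count, digest)
        return digest
```

Calling va = VersionedArray(big_array) followed by repeated va.checksum("sha1") calls hashes the data once; after va[0] = 1.0 the next checksum call recomputes. Views obtained through va[...] are read-only, so they cannot change the data behind the counter's back, but the initial copy may be unacceptable for very large arrays, and code that reaches the buffer at the C level is not covered.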