There are a few libraries out there that need to know whether or not an array has changed since the last time it was used: joblib and pymc come to mind. I believe joblib computes a SHA1 or md5 hash of the array contents, while pymc simply assumes you never change an array and uses the id().
The pymc approach is fragile, while in my case the joblib approach is too expensive, since I'll call the function many times in a row with the same large array (yes, I can code around it, but the code gets less streamlined).

So, would it be possible to very quickly detect whether a NumPy array is guaranteed not to have changed? Here's a revision counter approach:

1) Introduce a new 64-bit int field "modification_count" in the array object struct.

2) modification_count is incremented any time it is possible that the array changes. In particular, PyArray_DATA would increment the counter.

3) A new PyArray_READONLYDATA is introduced that does not increment the counter, which can be used in strategic spots. However, the point is simply to rule out *most* sources of having to recompute a checksum for the array -- a non-matching modification_count is not a guarantee that the array has changed, but a matching modification_count is a guarantee of an unchanged array.

4) The counter can be ignored for readonly (base) arrays.

5a) A method is introduced Python-side, arr.checksum(algorithm="md5"|"sha1"), that uses this machinery to cache checksum computation and that can be plugged into joblib.

5b) Alternatively, the modification count is exposed directly to Python-side, and it is up to users to store the modification count (e.g. in a WeakKeyDictionary indexed by the array's base array).

Another solution to the problem would be to allow registering event handlers. The main reason I'm not proposing that is that I don't want to spend the time to implement it (it sounds a lot more difficult), it appears to be considerably less backwards-compatible, and so on.

Why not a simple dirty flag? Because you'd need one for every possible application of this (e.g., md5 and sha1 would need separate dirty flags, and other uses than hashing would need yet more flags, and so on).
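
To make the counter idea concrete, here is a rough pure-Python sketch of points 1) through 5a). It is only an illustration under stated assumptions: a bytearray stands in for the array buffer, and every name in it (CountedBuffer, data(), readonly_data(), hash_runs) is made up for the example and is not NumPy API. The real proposal would live in the C array struct.

```python
import hashlib


class CountedBuffer:
    """Hypothetical stand-in for an ndarray carrying a modification counter."""

    def __init__(self, nbytes):
        self._buf = bytearray(nbytes)
        self.modification_count = 0  # the proposed new integer field
        self._checksums = {}         # algorithm -> (count when hashed, hexdigest)
        self.hash_runs = 0           # only here to make the caching observable

    def data(self):
        # Analogue of PyArray_DATA: the caller *may* write, so assume it does.
        self.modification_count += 1
        return memoryview(self._buf)

    def readonly_data(self):
        # Analogue of the proposed PyArray_READONLYDATA: no increment.
        return memoryview(self._buf).toreadonly()

    def checksum(self, algorithm="md5"):
        cached = self._checksums.get(algorithm)
        if cached is not None and cached[0] == self.modification_count:
            return cached[1]  # matching count: guaranteed unchanged
        # Non-matching count: contents may or may not differ, so rehash.
        digest = hashlib.new(algorithm, self.readonly_data()).hexdigest()
        self.hash_runs += 1
        self._checksums[algorithm] = (self.modification_count, digest)
        return digest


buf = CountedBuffer(1024)
first = buf.checksum("sha1")
again = buf.checksum("sha1")   # served from the cache, no rehash
buf.data()[0] = 255            # writable access bumps the counter
changed = buf.checksum("sha1") # counter differs, so this rehashes
```

Note that a single counter serves every algorithm here: each cached checksum just remembers the count it was computed at, which is the reason for preferring a counter over one dirty flag per consumer. Calling data() without actually writing would still force a rehash, which is the accepted false positive described in 3).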
Dag Sverre