Re: [Python-Dev] Hashable memoryviews
Antoine Pitrou wrote:
> Only if the original object is itself mutable, otherwise the memoryview
> is read-only.
>
> I would propose the following algorithm:
> 1) try to calculate the original object's hash; if it fails, consider
>    the memoryview unhashable (the buffer is probably mutable)

With slices or the new casts (See: http://bugs.python.org/issue5231,
implemented in http://hg.python.org/features/pep-3118#memoryview ),
it is possible to have different hashes for equal objects:

>>> b1 = bytes([1,2,3,4])
>>> b2 = bytes([4,3,2,1])
>>> m1 = memoryview(b1)
>>> m2 = memoryview(b2)[::-1]
>>> m1 == m2
True
>>> hash(b1)
4154562130492273536
>>> hash(b2)
-1828484551660457336

Or:

>>> a = array.array('L', [0])
>>> b = b'\x00\x00\x00\x00\x00\x00\x00\x00'
>>> m_array = memoryview(a)
>>> m_bytes = memoryview(b)
>>> m_cast = m_array.cast('B')
>>> m_bytes == m_cast
True
>>> hash(b) == hash(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'array.array'

Stefan Krah

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hashable memoryviews
On Sun, 13 Nov 2011 11:39:46 +0100 Stefan Krah wrote:
> Antoine Pitrou wrote:
> > Only if the original object is itself mutable, otherwise the memoryview
> > is read-only.
> >
> > I would propose the following algorithm:
> > 1) try to calculate the original object's hash; if it fails, consider
> >    the memoryview unhashable (the buffer is probably mutable)
>
> With slices or the new casts (See: http://bugs.python.org/issue5231,
> implemented in http://hg.python.org/features/pep-3118#memoryview ),
> it is possible to have different hashes for equal objects:
>
> >>> b1 = bytes([1,2,3,4])
> >>> b2 = bytes([4,3,2,1])
> >>> m1 = memoryview(b1)
> >>> m2 = memoryview(b2)[::-1]

I don't understand this feature. How do you represent a reversed buffer
using the buffer API, and how do you ensure that consumers (especially
those written in C) see the buffer reversed?

Regardless, it's simply a matter of getting the hash algorithm right
(i.e. iterate in logical order rather than memory order).

> >>> a = array.array('L', [0])
> >>> b = b'\x00\x00\x00\x00\x00\x00\x00\x00'
> >>> m_array = memoryview(a)
> >>> m_bytes = memoryview(b)
> >>> m_cast = m_array.cast('B')
> >>> m_bytes == m_cast
> True
> >>> hash(b) == hash(a)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: unhashable type: 'array.array'

In this case, the memoryview wouldn't be hashable either.

Regards

Antoine.
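The "logical order rather than memory order" distinction can be seen from Python: indexing a memoryview walks its elements in logical order, while the exporter's own bytes stay in memory order. A minimal sketch, assuming Python 3.3 or later (for `memoryview.obj`); the helper name `logical_bytes` is hypothetical:

```python
def logical_bytes(view):
    """Return the elements of a 1-D, format 'B' view in logical (index) order."""
    return bytes(view[i] for i in range(len(view)))


b1 = bytes([1, 2, 3, 4])
b2 = bytes([4, 3, 2, 1])
m1 = memoryview(b1)
m2 = memoryview(b2)[::-1]      # reversed view over b2

# Memory order: the exporter's bytes as stored.
print(bytes(m2.obj))           # b'\x04\x03\x02\x01'
# Logical order: what a consumer indexing the view sees.
print(logical_bytes(m2))       # b'\x01\x02\x03\x04'

# Hashing the logical-order bytes gives the two equal views equal hashes.
print(hash(logical_bytes(m1)) == hash(logical_bytes(m2)))  # True
```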
Re: [Python-Dev] Hashable memoryviews
On Sun, Nov 13, 2011 at 8:39 PM, Stefan Krah wrote:
> Antoine Pitrou wrote:
>> Only if the original object is itself mutable, otherwise the memoryview
>> is read-only.
>>
>> I would propose the following algorithm:
>> 1) try to calculate the original object's hash; if it fails, consider
>> the memoryview unhashable (the buffer is probably mutable)
>
> With slices or the new casts (See: http://bugs.python.org/issue5231,
> implemented in http://hg.python.org/features/pep-3118#memoryview ),
> it is possible to have different hashes for equal objects:

Note that Antoine isn't suggesting that the underlying hash be *used*
as the memoryview's hash (that would be calculated according to the
same rules as the equality comparison). Instead, the ability to hash
the underlying object would just gate whether or not you could hash
the memoryview at all.

Cheers,
Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
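The reading described here, where the exporter's hashability only gates the operation and the value comes from the same rules as equality, might be sketched as follows. This is a minimal sketch assuming Python 3.3+ (`memoryview.obj`, `tobytes()`); `gated_hash` and `is_hashable` are hypothetical helpers, and the sketch deliberately accepts only one-dimensional unsigned-byte views so that "equal contents" reduces to "equal bytes":

```python
def gated_hash(view):
    """Hypothetical memoryview hash: the exporter gates, the contents decide."""
    if not view.readonly:
        raise TypeError("cannot hash a writable buffer")
    if view.ndim != 1 or view.format != "B":
        raise TypeError("only 1-D unsigned-byte views are hashable here")
    hash(view.obj)  # gate only: raises TypeError if the exporter is unhashable
    return hash(view.tobytes())  # value: logical contents, like equality


def is_hashable(view):
    try:
        gated_hash(view)
    except TypeError:
        return False
    return True


b1 = bytes([1, 2, 3, 4])
m1 = memoryview(b1)
m2 = memoryview(bytes([4, 3, 2, 1]))[::-1]

print(gated_hash(m1) == gated_hash(m2))        # True: equal views, equal hashes
print(is_hashable(memoryview(bytearray(b1))))  # False: writable exporter
```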
Re: [Python-Dev] Hashable memoryviews
On Sun, Nov 13, 2011 at 8:49 PM, Antoine Pitrou wrote:
> I don't understand this feature. How do you represent a reversed buffer
> using the buffer API, and how do you ensure that consumers (especially
> those written in C) see the buffer reversed?

The values in the strides array are signed, so presumably just by
specifying a "-1" for the relevant dimension (triggering all the usual
failures if you encounter a buffer API consumer that can only handle C
contiguous arrays).

Cheers,
Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
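The signed-stride representation can be checked from Python 3.3 or later, where a memoryview exposes its `strides` and a `c_contiguous` flag. A short sketch; `memoryview.cast()` is used only as one example of a consumer that refuses non-C-contiguous input:

```python
b2 = bytes([4, 3, 2, 1])
m2 = memoryview(b2)[::-1]

print(m2.strides)       # (-1,): each step moves one byte backwards
print(m2.c_contiguous)  # False
print(list(m2))         # [1, 2, 3, 4]

# cast() is one consumer that only accepts C-contiguous views.
try:
    m2.cast("B")
except TypeError as exc:
    print("rejected:", exc)
```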
Re: [Python-Dev] Hashable memoryviews
Antoine Pitrou wrote:
> > > I would propose the following algorithm:
> > > 1) try to calculate the original object's hash; if it fails, consider
> > >    the memoryview unhashable (the buffer is probably mutable)
> >
> > With slices or the new casts (See: http://bugs.python.org/issue5231,
> > implemented in http://hg.python.org/features/pep-3118#memoryview ),
> > it is possible to have different hashes for equal objects:
> >
> > >>> b1 = bytes([1,2,3,4])
> > >>> b2 = bytes([4,3,2,1])
> > >>> m1 = memoryview(b1)
> > >>> m2 = memoryview(b2)[::-1]
>
> I don't understand this feature. How do you represent a reversed buffer
> using the buffer API, and how do you ensure that consumers (especially
> those written in C) see the buffer reversed?

In this case, view->buf points to the last memory location and view->strides
is -1. In general, any PEP-3118 compliant consumer must only access elements
of a buffer either directly via PyBuffer_GetPointer() or in an equivalent
manner.

Basically, this means that you start at view->buf (which may be *any*
location in the memory block) and follow the strides until you reach
the desired element.

Objects/abstract.c:
===

void*
PyBuffer_GetPointer(Py_buffer *view, Py_ssize_t *indices)
{
    char* pointer;
    int i;
    pointer = (char *)view->buf;
    for (i = 0; i < view->ndim; i++) {
        pointer += view->strides[i]*indices[i];
        if ((view->suboffsets != NULL) && (view->suboffsets[i] >= 0)) {
            pointer = *((char**)pointer) + view->suboffsets[i];
        }
    }
    return (void*)pointer;
}

> Regardless, it's simply a matter of getting the hash algorithm right
> (i.e. iterate in logical order rather than memory order).

If you know how the original object computes the hash then this would
work. It's not obvious to me how this would work beyond bytes objects
though.
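The address arithmetic in `PyBuffer_GetPointer()` can be mirrored in Python with byte offsets in place of pointers. A minimal sketch: `element_offset` is a hypothetical helper, suboffsets (indirect arrays) are left out, and the start offset of 3 for the reversed view is an assumption that follows the description above (`view->buf` points at the last byte):

```python
def element_offset(start, strides, indices):
    """Byte offset of an element: start at buf, then follow the strides.

    Mirrors the loop in PyBuffer_GetPointer() without suboffset handling.
    """
    offset = start
    for stride, index in zip(strides, indices):
        offset += stride * index
    return offset


memory = bytes([4, 3, 2, 1])               # the exporter's memory block (b2)
start, strides = len(memory) - 1, (-1,)    # reversed view: buf at the last byte

logical = [memory[element_offset(start, strides, (i,))] for i in range(4)]
print(logical)                             # [1, 2, 3, 4]
print(list(memoryview(memory)[::-1]))      # [1, 2, 3, 4], same order

# A 2x3 C-contiguous array of bytes has strides (3, 1).
grid = bytes(range(6))
print(grid[element_offset(0, (3, 1), (1, 2))])  # 5
```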
> > >>> a = array.array('L', [0])
> > >>> b = b'\x00\x00\x00\x00\x00\x00\x00\x00'
> > >>> m_array = memoryview(a)
> > >>> m_bytes = memoryview(b)
> > >>> m_cast = m_array.cast('B')
> > >>> m_bytes == m_cast
> > True
> > >>> hash(b) == hash(a)
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > TypeError: unhashable type: 'array.array'
>
> In this case, the memoryview wouldn't be hashable either.

Hmm, the point was that one could take the hash of m_bytes but not of
m_cast, even though they are equal.

Perhaps I misunderstood your proposal. I assumed that hash requests would
be redirected to the original exporting object. As above, it would be
possible to write a custom hash function for objects with type 'B'.

Stefan Krah
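The cast example can be restated so that the asymmetry is explicit: the two views compare equal, yet only one of them sits on a hashable exporter. This sketch assumes Python 3.3+ (`cast()`, `.obj`) and uses typecode 'Q' instead of 'L', because the item size of 'L' is platform dependent (8 bytes on typical 64-bit Unix, 4 on Windows); `exporter_hashable` is a hypothetical helper:

```python
import array

a = array.array("Q", [0])
b = bytes(a.itemsize)           # the same number of zero bytes

m_bytes = memoryview(b)
m_cast = memoryview(a).cast("B")

print(m_bytes == m_cast)        # True: same shape, same byte values
print(m_cast.obj is a)          # True: the exporter behind the cast is the array


def exporter_hashable(view):
    """Would the exporter-gating rule allow hashing this view?"""
    try:
        hash(view.obj)
    except TypeError:
        return False
    return True


print(exporter_hashable(m_bytes))  # True
print(exporter_hashable(m_cast))   # False: array.array is unhashable
```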
Re: [Python-Dev] Hashable memoryviews
Nick Coghlan wrote:
> > With slices or the new casts (See: http://bugs.python.org/issue5231,
> > implemented in http://hg.python.org/features/pep-3118#memoryview ),
> > it is possible to have different hashes for equal objects:
>
> Note that Antoine isn't suggesting that the underlying hash be *used*
> as the memoryview's hash (that would be calculated according to the
> same rules as the equality comparison). Instead, the ability to hash
> the underlying object would just gate whether or not you could hash
> the memoryview at all.

I think they necessarily have to use the same hash, since:

  exporter = m1 ==> hash(exporter) = hash(m1)
  m1 = m2       ==> hash(m1) = hash(m2)

Am I missing something?

Stefan Krah
Re: [Python-Dev] Hashable memoryviews
On Sun, 13 Nov 2011 13:05:24 +0100 Stefan Krah wrote:
> Nick Coghlan wrote:
> > > With slices or the new casts (See: http://bugs.python.org/issue5231,
> > > implemented in http://hg.python.org/features/pep-3118#memoryview ),
> > > it is possible to have different hashes for equal objects:
> >
> > Note that Antoine isn't suggesting that the underlying hash be *used*
> > as the memoryview's hash (that would be calculated according to the
> > same rules as the equality comparison). Instead, the ability to hash
> > the underlying object would just gate whether or not you could hash
> > the memoryview at all.
>
> I think they necessarily have to use the same hash, since:
>
>   exporter = m1 ==> hash(exporter) = hash(m1)
>   m1 = m2       ==> hash(m1) = hash(m2)
>
> Am I missing something?

The hash must simply be calculated using the same algorithm (which can
even be shared as a subroutine). It's already the case for more
complicated types:

>>> hash(1) == hash(1.0) == hash(Decimal(1)) == hash(Fraction(1))
True

Also, I think it's reasonable to limit hashability to one-dimensional
memoryviews.

Regards

Antoine.
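The numeric types in that example agree because each reduces a value to the same rational-number hash modulo a prime, `sys.hash_info.modulus`. The sketch below follows the reference algorithm in the Python documentation's "Hashing of numeric types" section (Python 3.2+), restricted to finite rationals; it illustrates a shared subroutine rather than any proposed buffer hash:

```python
import sys
from decimal import Decimal
from fractions import Fraction


def hash_fraction(m, n):
    """Hash of the rational m/n (n > 0), following the documented algorithm."""
    P = sys.hash_info.modulus
    # Remove common factors of P (only matters if n is a multiple of P).
    while m % P == n % P == 0:
        m, n = m // P, n // P
    if n % P == 0:
        hash_value = sys.hash_info.inf
    else:
        # Fermat's little theorem: pow(n, P - 2, P) is the inverse of n mod P.
        hash_value = (abs(m) % P) * pow(n, P - 2, P) % P
    if m < 0:
        hash_value = -hash_value
    if hash_value == -1:
        hash_value = -2
    return hash_value


print(hash_fraction(3, 2) == hash(1.5) == hash(Decimal("1.5")) == hash(Fraction(3, 2)))  # True
print(hash_fraction(1, 1) == hash(1) == hash(1.0))  # True
```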
Re: [Python-Dev] Hashable memoryviews
Stefan Krah, 13.11.2011 13:05:
> Nick Coghlan wrote:
>>> With slices or the new casts (See: http://bugs.python.org/issue5231,
>>> implemented in http://hg.python.org/features/pep-3118#memoryview ),
>>> it is possible to have different hashes for equal objects:
>>
>> Note that Antoine isn't suggesting that the underlying hash be *used*
>> as the memoryview's hash (that would be calculated according to the
>> same rules as the equality comparison). Instead, the ability to hash
>> the underlying object would just gate whether or not you could hash
>> the memoryview at all.
>
> I think they necessarily have to use the same hash, since:
>
>   exporter = m1 ==> hash(exporter) = hash(m1)
>   m1 = m2       ==> hash(m1) = hash(m2)

You can't expect the memoryview() to magically know what the underlying
hash function is. The only guarantee you get is that iff two memoryview
instances are looking at the same (subset of) data from two hashable
objects (or the same object), you will get the same hash value for both.
It may or may not correspond with the hash value that the buffer
exporting objects would give you.

Stefan
Re: [Python-Dev] Hashable memoryviews
Antoine Pitrou wrote:
> Stefan Krah wrote:
> > I think they necessarily have to use the same hash, since:
> >
> >   exporter = m1 ==> hash(exporter) = hash(m1)
> >   m1 = m2       ==> hash(m1) = hash(m2)
> >
> > Am I missing something?
>
> The hash must simply be calculated using the same algorithm (which
> can even be shared as a subroutine). It's already the case for more
> complicated types:
>
> >>> hash(1) == hash(1.0) == hash(Decimal(1)) == hash(Fraction(1))
> True

Yes, but we control those types. I was thinking more about third-party
exporters. Then again, it would be possible to publish the unified hash
function as part of the PEP.

Perhaps we could simply use:

  PyBuffer_Hash = hash(obj.tobytes())

Since tobytes() follows the logical structure, it should work for
non-contiguous and multidimensional arrays as well.

Stefan Krah
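The `hash(obj.tobytes())` idea can be tried at the Python level, since `tobytes()` flattens any view in logical (C) order. A minimal sketch under Python 3.3+; `buffer_hash` is a hypothetical name. One caveat: `tobytes()` drops the format, and memoryviews with different formats can still compare equal by value, so such a hash would only respect the equality/hash invariant if it were limited to byte formats:

```python
from array import array


def buffer_hash(view):
    """Hypothetical unified buffer hash: hash the logical byte contents."""
    return hash(view.tobytes())


data = bytes(range(8))

# Non-contiguous: every second byte, read in logical order.
strided = memoryview(data)[::2]
print(strided.tobytes())                                  # b'\x00\x02\x04\x06'
print(buffer_hash(strided) == hash(bytes([0, 2, 4, 6])))  # True

# Multidimensional: a 2x4 view of the same 8 bytes flattens back in C order.
grid = memoryview(data).cast("B", [2, 4])
print(grid.shape, buffer_hash(grid) == hash(data))        # (2, 4) True

# Caveat: equal by value, different formats, different bytes.
as_float = memoryview(array("f", [1.0]))
as_double = memoryview(array("d", [1.0]))
print(as_float == as_double)                              # True
print(as_float.tobytes() == as_double.tobytes())          # False
```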
Re: [Python-Dev] order of Misc/ACKS
Xavier Morel writes:
> On 2011-11-12, at 10:24 , Georg Brandl wrote:
> > On 12.11.2011 08:03, Stephen J. Turnbull wrote:
> >> The sensible thing is to just sort in Unicode code point order, I
> >> think.
> > The sensible thing is to accept that there is no solution, and to stop
> > worrying.
> The file could use the default collation order, that way it'd be
> incorrectly sorted for everybody.

"What I tell you three times is true."
[Python-Dev] _PyImport_FindExtensionObject() does not set _Py_PackageContext
Hi,

I noticed that _PyImport_FindExtensionObject() in Python/import.c does not
set _Py_PackageContext when it calls the module init function for module
reinitialisation. However, PyModule_Create2() still uses that variable to
figure out the fully qualified module name. Was this intentionally left out
or is it just an oversight?

Stefan
[Python-Dev] how to find the file path to an extension module at init time?
Hi,

In Python modules, the "__file__" attribute is provided by the runtime
before executing the module body. For extension modules, it is set only
after executing the init function. I wonder if there's any way to figure
out where an extension module is currently being loaded from. The
_PyImport_LoadDynamicModule() function obviously knows it, but it does not
pass that information on into the module init function.

I'm asking specifically because I'd like to properly implement __file__ in
Cython modules at module init time. There are cases where it could be
faked (when compiling modules on the fly during import), but in general,
it's not known at compile time where a module will get installed and run
from, so I currently don't see how to do it without dedicated runtime
support.

That's rather unfortunate, because it's not so uncommon for packages to
look up bundled data files relative to their own position using __file__,
and that is pretty much always done in module-level code.

Another problem is that package local imports from __init__.py no longer
work when it's compiled, likely because __path__ is missing on the newborn
module object in sys.modules. Here, it would also help if the path to the
module (and to its package) was known early enough.

Any ideas how this could currently be achieved? Or could this become a new
feature in the future?

Stefan
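For pure-Python packages, the usual pattern is a module-level lookup relative to `__file__`. In later Python versions (3.4+, well after this thread), the location a module will be loaded from is also available from its module spec via `importlib.util.find_spec()`; the spec's `origin` and `submodule_search_locations` are what the import system later turns into `__file__` and `__path__`. A small sketch; `bundled_data_path` is a hypothetical helper, and the standard `json` package is used only as a convenient example:

```python
import importlib.util
import os


def bundled_data_path(module_name, filename):
    """Locate a file next to a module using its spec rather than __file__."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        raise ImportError(f"cannot locate {module_name!r}")
    return os.path.join(os.path.dirname(spec.origin), filename)


spec = importlib.util.find_spec("json")
print(spec.origin)                      # e.g. .../json/__init__.py
print(spec.submodule_search_locations)  # the package directory, i.e. __path__
print(os.path.exists(bundled_data_path("json", "decoder.py")))  # True
```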
Re: [Python-Dev] Hashable memoryviews
> You can't expect the memoryview() to magically know what the underlying
> hash function is.

Hashable objects implementing the buffer interface could be required to
make their hash implementation consistent with bytes hashing. IMO, that
wouldn't be asking too much.

There is already the issue that equality may not be transitive with
respect to buffer objects (e.g. a == memoryview(a) == memoryview(b) == b,
but a != b). As that would be a bug in either a or b, failure to hash
consistently would be a bug as well.

Regards,
Martin
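An exporter whose hash disagrees with bytes hashing can be illustrated with a deliberately broken class. This sketch assumes CPython 3.3 or later, whose documentation defines the hash of a hashable one-dimensional byte memoryview as `hash(m.tobytes())`; the `Tagged` class is hypothetical:

```python
class Tagged(bytes):
    """Hypothetical buffer exporter whose hash ignores the bytes convention."""

    def __hash__(self):
        return hash(("tagged", bytes(self)))


t = Tagged(b"abc")
m = memoryview(t)

print(t == m)                   # True: equal via the buffer comparison
print(hash(m) == hash(b"abc"))  # True: the view hashes its logical contents
print(hash(m) == hash(t))       # almost certainly False: equal objects, unequal hashes
```

The last line is exactly the inconsistency that would count as a bug in the exporter rather than in memoryview.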
Re: [Python-Dev] how to find the file path to an extension module at init time?
> I'm asking specifically because I'd like to properly implement __file__
> in Cython modules at module init time.

Why do you need to implement __file__? Python will set it eventually to
its correct value, no?

> Another problem is that package local imports from __init__.py no longer
> work when it's compiled

Does it actually work to have __init__ be an extension module?

> Any ideas how this could currently be achieved?

Currently, for Cython? I don't think that can work.

> Or could this become a new feature in the future?

Certainly. An approach similar to _Py_PackageContext should be possible.

Regards,
Martin
Re: [Python-Dev] peps: And now for something completely different.
On Sun, 13 Nov 2011 22:33:28 +0100 barry.warsaw wrote:
>
> +And Now For Something Completely Different
> +==

So, is the release manager a man with two noses?

> +Strings and bytes
> +-
> +
> +Python 2's basic original string type are called 8-bit strings, and
> +they play a dual role in Python 2 as both ASCII text and as byte
> +arrays. While Python 2 also has a unicode string type, the
> +fundamental ambiguity of the core string type, coupled with Python 2's
> +default behavior of supporting automatic coercion from 8-bit strings
> +to unicodes when the two are combined, often leads to `UnicodeError`s.
> +Python 3's standard string type is a unicode, and Python 3 adds a
> +bytes type, but critically, no automatic coercion between bytes and
> +unicodes is provided. Thus, the core interpreter, its I/O libraries,
> +module names, etc. are clear in their distinction between unicode
> +strings and bytes. This clarity is often a source of difficulty in
> +transitioning existing code to Python 3, because many third party
> +libraries and applications are themselves ambiguous in this
> +distinction. Once migrated though, most `UnicodeError`s can be
> +eliminated.

First class unicode (*) support also makes Python much friendlier to
non-ASCII natives when it comes to things like filesystem access or
error reporting.

(*) even though Tom Christiansen would disagree, but perhaps we can settle
on first and a half

> +Imports
> +---
> +
> +In Python 3, star imports (e.g. ``from x import *``) are only
> +premitted in module level code.

permitted

Regards

Antoine.
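Both Python 3 behaviours described in the quoted PEP text can be observed directly. A short sketch (Python 3 only); `raises` is a hypothetical helper, and only exception types are checked because the messages vary between versions:

```python
def raises(exc_type, func):
    """Return True if calling func raises exc_type."""
    try:
        func()
    except exc_type:
        return True
    return False


# No implicit coercion between bytes and str.
print(raises(TypeError, lambda: b"spam" + "eggs"))  # True

# Conversions are explicit, with a named encoding.
text = b"caf\xc3\xa9".decode("utf-8")
print(text)                                         # café
print(text.encode("utf-8") == b"caf\xc3\xa9")       # True

# Star imports inside a function are rejected at compile time.
source = "def f():\n    from os import *\n"
print(raises(SyntaxError, lambda: compile(source, "<example>", "exec")))  # True
```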
Re: [Python-Dev] [Python-checkins] cpython (2.7): Normalize the keyword arguments documentation notation in re.rst. Closes issue
On 11/13/2011 5:52 PM, eli.bendersky wrote:
> http://hg.python.org/cpython/rev/87ecfd5cd5d1
> changeset: 73541:87ecfd5cd5d1
> branch: 2.7
> parent: 73529:c3b063c82ae5
> user: Eli Bendersky
> date: Mon Nov 14 01:02:20 2011 +0200
> summary:
> Normalize the keyword arguments documentation notation in re.rst. Closes
> issue #12875

> -.. function:: compile(pattern[, flags=0])
> +.. function:: compile(pattern, flags=0)

...

This issue and the patch are about parameters with *default* arguments,
which makes a corresponding argument in a call *optional*. For Python
functions, both required and optional arguments can be passed by position
(unless disabled) or keyword. Which is to say, for Python functions, any
argument can be a keyword argument. I suspect I am not the only person
somewhat confused when people use 'keyword' to mean 'optional' or
'default'.

tjr
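The difference between *keyword* and *optional* can be shown with a plain Python function shaped like `re.compile()`; `compile_like` is a hypothetical stand-in. A required parameter can still be passed by keyword, and a parameter with a default can still be passed by position:

```python
import re


def compile_like(pattern, flags=0):
    """Hypothetical stand-in with the same signature shape as re.compile()."""
    return re.compile(pattern, flags)


# Required argument passed by keyword, optional one omitted.
p1 = compile_like(pattern="a+")
# Optional argument passed by position.
p2 = compile_like("a+", re.IGNORECASE)
# Both passed by keyword, in any order.
p3 = compile_like(flags=re.IGNORECASE, pattern="a+")

print(p1.match("AAA"))                   # None: case-sensitive
print(p2.match("AAA").group() == "AAA")  # True
print(p2.flags == p3.flags)              # True
```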
Re: [Python-Dev] [Python-checkins] cpython (2.7): Normalize the keyword arguments documentation notation in re.rst. Closes issue
>> http://hg.python.org/cpython/rev/87ecfd5cd5d1
>> changeset: 73541:87ecfd5cd5d1
>> branch: 2.7
>> parent: 73529:c3b063c82ae5
>> user: Eli Bendersky
>> date: Mon Nov 14 01:02:20 2011 +0200
>> summary:
>> Normalize the keyword arguments documentation notation in re.rst. Closes
>> issue #12875
>
>> -.. function:: compile(pattern[, flags=0])
>> +.. function:: compile(pattern, flags=0)
>
> ...
>
> This issue and the patch are about parameters with *default* arguments,
> which makes a corresponding argument in a call *optional*. For Python
> functions, both required and optional arguments can be passed by position
> (unless disabled) or keyword. Which is to say, for Python functions, any
> argument can be a keyword argument. I suspect I am not the only person
> somewhat confused when people use 'keyword' to mean 'optional' or 'default'.
>

You're right, Terry. Sorry for the confusing commit message. By the way, I
think you may be interested in the related http://bugs.python.org/issue13386

Eli