Re: [Python-Dev] Hashable memoryviews

2011-11-13 Thread Stefan Krah
Antoine Pitrou  wrote:
> Only if the original object is itself mutable, otherwise the memoryview
> is read-only.
> 
> I would propose the following algorithm:
> 1) try to calculate the original object's hash; if it fails, consider
>the memoryview unhashable (the buffer is probably mutable)

With slices or the new casts (See: http://bugs.python.org/issue5231,
implemented in http://hg.python.org/features/pep-3118#memoryview ),
it is possible to have different hashes for equal objects:

>>> b1 = bytes([1,2,3,4])
>>> b2 = bytes([4,3,2,1])
>>> m1 = memoryview(b1)
>>> m2 = memoryview(b2)[::-1]
>>> m1 == m2
True
>>> hash(b1)
4154562130492273536
>>> hash(b2)
-1828484551660457336


Or:

>>> a = array.array('L', [0])
>>> b = b'\x00\x00\x00\x00\x00\x00\x00\x00'
>>> m_array = memoryview(a)
>>> m_bytes = memoryview(b)
>>> m_cast = m_array.cast('B')
>>> m_bytes == m_cast
True
>>> hash(b) == hash(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'array.array'


Stefan Krah


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hashable memoryviews

2011-11-13 Thread Antoine Pitrou
On Sun, 13 Nov 2011 11:39:46 +0100
Stefan Krah  wrote:
> Antoine Pitrou  wrote:
> > Only if the original object is itself mutable, otherwise the memoryview
> > is read-only.
> > 
> > I would propose the following algorithm:
> > 1) try to calculate the original object's hash; if it fails, consider
> >the memoryview unhashable (the buffer is probably mutable)
> 
> With slices or the new casts (See: http://bugs.python.org/issue5231,
> implemented in http://hg.python.org/features/pep-3118#memoryview ),
> it is possible to have different hashes for equal objects:
> 
> >>> b1 = bytes([1,2,3,4])
> >>> b2 = bytes([4,3,2,1])
> >>> m1 = memoryview(b1)
> >>> m2 = memoryview(b2)[::-1]

I don't understand this feature. How do you represent a reversed buffer
using the buffer API, and how do you ensure that consumers (especially
those written in C) see the buffer reversed?

Regardless, it's simply a matter of getting the hash algorithm right
(i.e. iterate in logical order rather than memory order).
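
A minimal sketch of what "iterate in logical order" could mean for a one-dimensional byte view. The helper name `logical_hash` is hypothetical and only illustrates the idea; it is not a proposed API.

```python
def logical_hash(view):
    """Hash a memoryview by its logical contents (hypothetical helper)."""
    # tobytes() walks the elements in logical (index) order, so a reversed
    # or strided view yields the bytes a consumer would actually see.
    return hash(view.tobytes())


b1 = bytes([1, 2, 3, 4])
b2 = bytes([4, 3, 2, 1])
m1 = memoryview(b1)
m2 = memoryview(b2)[::-1]

# Equal views hash equally, regardless of the memory layout behind them.
assert m1 == m2
assert logical_hash(m1) == logical_hash(m2) == hash(b1)
```

With this rule the hash depends only on what the view exposes, not on the exporter's own hash of the whole block.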

> >>> a = array.array('L', [0])
> >>> b = b'\x00\x00\x00\x00\x00\x00\x00\x00'
> >>> m_array = memoryview(a)
> >>> m_bytes = memoryview(b)
> >>> m_cast = m_array.cast('B')
> >>> m_bytes == m_cast
> True
> >>> hash(b) == hash(a)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: unhashable type: 'array.array'

In this case, the memoryview wouldn't be hashable either.

Regards

Antoine.




Re: [Python-Dev] Hashable memoryviews

2011-11-13 Thread Nick Coghlan
On Sun, Nov 13, 2011 at 8:39 PM, Stefan Krah  wrote:
> Antoine Pitrou  wrote:
>> Only if the original object is itself mutable, otherwise the memoryview
>> is read-only.
>>
>> I would propose the following algorithm:
>> 1) try to calculate the original object's hash; if it fails, consider
>>    the memoryview unhashable (the buffer is probably mutable)
>
> With slices or the new casts (See: http://bugs.python.org/issue5231,
> implemented in http://hg.python.org/features/pep-3118#memoryview ),
> it is possible to have different hashes for equal objects:

Note that Antoine isn't suggesting that the underlying hash be *used*
as the memoryview's hash (that would be calculated according to the
same rules as the equality comparison). Instead, the ability to hash
the underlying object would just gate whether or not you could hash
the memoryview at all.
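
The gating idea can be sketched roughly as follows. `gated_hash` is a hypothetical helper written for illustration, and hashing `tobytes()` is only one possible choice for the view's own hash.

```python
def gated_hash(exporter, view):
    """Hash `view` only if `exporter` is hashable (hypothetical helper)."""
    # Step 1: the exporter's hash is a gate only; its value is discarded.
    # A mutable exporter such as bytearray raises TypeError here.
    hash(exporter)
    # Step 2: the view's own hash follows the same rule as equality,
    # i.e. it depends only on the logical contents of the view.
    return hash(view.tobytes())


data = bytes([4, 3, 2, 1])
reversed_view = memoryview(data)[::-1]
assert gated_hash(data, reversed_view) == hash(bytes([1, 2, 3, 4]))

buf = bytearray(data)
try:
    gated_hash(buf, memoryview(buf))
    gate_closed = False
except TypeError:
    gate_closed = True  # unhashable exporter, so the view is unhashable too
assert gate_closed
```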

Cheers,
Nick.


-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Hashable memoryviews

2011-11-13 Thread Nick Coghlan
On Sun, Nov 13, 2011 at 8:49 PM, Antoine Pitrou  wrote:
> I don't understand this feature. How do you represent a reversed buffer
> using the buffer API, and how do you ensure that consumers (especially
> those written in C) see the buffer reversed?

The values in the strides array are signed, so presumably just by
specifying a "-1" for the relevant dimension (triggering all the usual
failures if you encounter a buffer API consumer that can only handle C
contiguous arrays).
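
For illustration, the negative stride is visible from Python in interpreters that include the new memoryview implementation (3.3 and later); this is a small sketch, not part of the original exchange.

```python
b = bytes([4, 3, 2, 1])
forward = memoryview(b)
backward = forward[::-1]

print(forward.strides)        # (1,)
print(backward.strides)       # (-1,)
print(backward.c_contiguous)  # False: C-contiguous-only consumers will refuse it
print(list(backward))         # [1, 2, 3, 4]
```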

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Hashable memoryviews

2011-11-13 Thread Stefan Krah
Antoine Pitrou  wrote:
> > > I would propose the following algorithm:
> > > 1) try to calculate the original object's hash; if it fails, consider
> > >the memoryview unhashable (the buffer is probably mutable)
> > 
> > With slices or the new casts (See: http://bugs.python.org/issue5231,
> > implemented in http://hg.python.org/features/pep-3118#memoryview ),
> > it is possible to have different hashes for equal objects:
> > 
> > >>> b1 = bytes([1,2,3,4])
> > >>> b2 = bytes([4,3,2,1])
> > >>> m1 = memoryview(b1)
> > >>> m2 = memoryview(b2)[::-1]
> 
> I don't understand this feature. How do you represent a reversed buffer
> using the buffer API, and how do you ensure that consumers (especially
> those written in C) see the buffer reversed?

In this case, view->buf points to the last memory location and
view->strides[0] is -1. In general, any PEP-3118 compliant consumer must
only access elements of a buffer either directly via PyBuffer_GetPointer()
or in an equivalent manner.

Basically, this means that you start at view->buf (which may be *any*
location in the memory block) and follow the strides until you reach
the desired element.


Objects/abstract.c:
===================

void*
PyBuffer_GetPointer(Py_buffer *view, Py_ssize_t *indices)
{
    char* pointer;
    int i;
    pointer = (char *)view->buf;
    for (i = 0; i < view->ndim; i++) {
        pointer += view->strides[i]*indices[i];
        if ((view->suboffsets != NULL) && (view->suboffsets[i] >= 0)) {
            pointer = *((char**)pointer) + view->suboffsets[i];
        }
    }
    return (void*)pointer;
}
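
The same address arithmetic can be modelled in Python for the reversed example. This is a simplified sketch: `element_offset` is a hypothetical helper, suboffsets are ignored, and an integer index into a bytes object stands in for the C pointer.

```python
def element_offset(strides, indices):
    """Byte offset from view->buf for the given indices (no suboffsets)."""
    offset = 0
    for stride, index in zip(strides, indices):
        offset += stride * index
    return offset


block = bytes([4, 3, 2, 1])  # the exporter's memory block
buf = len(block) - 1         # reversed view: "buf" starts at the last byte
strides = (-1,)              # and each step moves one byte backwards

logical = [block[buf + element_offset(strides, (i,))]
           for i in range(len(block))]
assert logical == [1, 2, 3, 4]
```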


> Regardless, it's simply a matter of getting the hash algorithm right
> (i.e. iterate in logical order rather than memory order).

If you know how the original object computes the hash then this would
work. It's not obvious to me how this would work beyond bytes objects
though.


> > >>> a = array.array('L', [0])
> > >>> b = b'\x00\x00\x00\x00\x00\x00\x00\x00'
> > >>> m_array = memoryview(a)
> > >>> m_bytes = memoryview(b)
> > >>> m_cast = m_array.cast('B')
> > >>> m_bytes == m_cast
> > True
> > >>> hash(b) == hash(a)
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > TypeError: unhashable type: 'array.array'
> 
> In this case, the memoryview wouldn't be hashable either.

Hmm, the point was that one could take the hash of m_bytes but not
of m_cast, even though they are equal. Perhaps I misunderstood your
proposal. I assumed that hash requests would be redirected to the
original exporting object.


As above, it would be possible to write a custom hash function for
objects with type 'B'.


Stefan Krah




Re: [Python-Dev] Hashable memoryviews

2011-11-13 Thread Stefan Krah
Nick Coghlan  wrote:
> > With slices or the new casts (See: http://bugs.python.org/issue5231,
> > implemented in http://hg.python.org/features/pep-3118#memoryview ),
> > it is possible to have different hashes for equal objects:
> 
> Note that Antoine isn't suggesting that the underlying hash be *used*
> as the memoryview's hash (that would be calculated according to the
> same rules as the equality comparison). Instead, the ability to hash
> the underlying object would just gate whether or not you could hash
> the memoryview at all.

I think they necessarily have to use the same hash, since:

exporter = m1 ==> hash(exporter) = hash(m1)
m1 = m2 ==> hash(m1) = hash(m2)


Am I missing something?


Stefan Krah




Re: [Python-Dev] Hashable memoryviews

2011-11-13 Thread Antoine Pitrou
On Sun, 13 Nov 2011 13:05:24 +0100
Stefan Krah  wrote:
> Nick Coghlan  wrote:
> > > With slices or the new casts (See: http://bugs.python.org/issue5231,
> > > implemented in http://hg.python.org/features/pep-3118#memoryview ),
> > > it is possible to have different hashes for equal objects:
> > 
> > Note that Antoine isn't suggesting that the underlying hash be *used*
> > as the memoryview's hash (that would be calculated according to the
> > same rules as the equality comparison). Instead, the ability to hash
> > the underlying object would just gate whether or not you could hash
> > the memoryview at all.
> 
> I think they necessarily have to use the same hash, since:
> 
> exporter = m1 ==> hash(exporter) = hash(m1)
> m1 = m2 ==> hash(m1) = hash(m2)
> 
> Am I missing something?

The hash must simply be calculated using the same algorithm (which
can even be shared as a subroutine). It's already the case for more
complicated types:

>>> hash(1) == hash(1.0) == hash(Decimal(1)) == hash(Fraction(1))
True

Also, I think it's reasonable to limit hashability to one-dimensional
memoryviews.

Regards

Antoine.




Re: [Python-Dev] Hashable memoryviews

2011-11-13 Thread Stefan Behnel

Stefan Krah, 13.11.2011 13:05:
> Nick Coghlan wrote:
> > > With slices or the new casts (See: http://bugs.python.org/issue5231,
> > > implemented in http://hg.python.org/features/pep-3118#memoryview ),
> > > it is possible to have different hashes for equal objects:
> >
> > Note that Antoine isn't suggesting that the underlying hash be *used*
> > as the memoryview's hash (that would be calculated according to the
> > same rules as the equality comparison). Instead, the ability to hash
> > the underlying object would just gate whether or not you could hash
> > the memoryview at all.
>
> I think they necessarily have to use the same hash, since:
>
> exporter = m1 ==> hash(exporter) = hash(m1)
> m1 = m2 ==> hash(m1) = hash(m2)


You can't expect the memoryview() to magically know what the underlying 
hash function is. The only guarantee you get is that iff two memoryview 
instances are looking at the same (subset of) data from two hashable 
objects (or the same object), you will get the same hash value for both. It 
may or may not correspond with the hash value that the buffer exporting 
objects would give you.


Stefan



Re: [Python-Dev] Hashable memoryviews

2011-11-13 Thread Stefan Krah
Antoine Pitrou  wrote:
> Stefan Krah  wrote:
> > I think they necessarily have to use the same hash, since:
> > 
> > exporter = m1 ==> hash(exporter) = hash(m1)
> > m1 = m2 ==> hash(m1) = hash(m2)
> > 
> > Am I missing something?
> 
> The hash must simply be calculated using the same algorithm (which
> can even be shared as a subroutine). It's already the case for more
> complicated types:
> 
> >>> hash(1) == hash(1.0) == hash(Decimal(1)) == hash(Fraction(1))
> True

Yes, but we control those types. I was thinking more about third-party
exporters. Then again, it would be possible to publish the unified
hash function as part of the PEP.

Perhaps we could simply use:

PyBuffer_Hash = hash(obj.tobytes())

Since tobytes() follows the logical structure, it should work for
non-contiguous and multidimensional arrays as well.
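
A rough sketch of that idea in Python, assuming an interpreter with the new memoryview (3.3 or later). `buffer_hash` is a hypothetical stand-in for the suggested PyBuffer_Hash, not an existing function.

```python
def buffer_hash(view):
    """Hypothetical stand-in for PyBuffer_Hash: hash the logical bytes."""
    return hash(view.tobytes())


data = bytes(range(6))

# Non-contiguous: every second byte, walked backwards.
strided = memoryview(data)[::-2]
assert strided.tobytes() == data[::-2]

# Multidimensional: the same six bytes viewed as a 2x3 array.
grid = memoryview(data).cast('B', shape=[2, 3])
assert grid.tobytes() == data

assert buffer_hash(strided) == hash(data[::-2])
assert buffer_hash(grid) == hash(data)
```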


Stefan Krah




Re: [Python-Dev] order of Misc/ACKS

2011-11-13 Thread Stephen J. Turnbull
Xavier Morel writes:
 > On 2011-11-12, at 10:24 , Georg Brandl wrote:
 > > Am 12.11.2011 08:03, schrieb Stephen J. Turnbull:

 > >> The sensible thing is to just sort in Unicode code point order, I
 > >> think.

 > > The sensible thing is to accept that there is no solution, and to stop
 > > worrying.

 > The file could use the default collation order, that way it'd be
 > incorrectly sorted for everybody.

"What I tell you three times is true."



[Python-Dev] _PyImport_FindExtensionObject() does not set _Py_PackageContext

2011-11-13 Thread Stefan Behnel

Hi,

I noticed that _PyImport_FindExtensionObject() in Python/import.c does not 
set _Py_PackageContext when it calls the module init function for module 
reinitialisation. However, PyModule_Create2() still uses that variable to 
figure out the fully qualified module name. Was this intentionally left out 
or is it just an oversight?


Stefan



[Python-Dev] how to find the file path to an extension module at init time?

2011-11-13 Thread Stefan Behnel

Hi,

in Python modules, the "__file__" attribute is provided by the runtime 
before executing the module body. For extension modules, it is set only 
after executing the init function. I wonder if there's any way to figure 
out where an extension module is currently being loaded from. The 
_PyImport_LoadDynamicModule() function obviously knows it, but it does not 
pass that information on into the module init function.


I'm asking specifically because I'd like to properly implement __file__ in 
Cython modules at module init time. There are cases where it could be faked 
(when compiling modules on the fly during import), but in general, it's not 
known at compile time where a module will get installed and run from, so I 
currently don't see how to do it without dedicated runtime support. That's 
rather unfortunate, because it's not so uncommon for packages to look up 
bundled data files relative to their own position using __file__, and that 
is pretty much always done in module level code.


Another problem is that package local imports from __init__.py no longer 
work when it's compiled, likely because __path__ is missing on the new born 
module object in sys.modules. Here, it would also help if the path to the 
module (and to its package) was known early enough.


Any ideas how this could currently be achieved? Or could this become a new 
feature in the future?


Stefan



Re: [Python-Dev] Hashable memoryviews

2011-11-13 Thread Martin v. Löwis
> You can't expect the memoryview() to magically know what the underlying
> hash function is. 

Hashable objects implementing the buffer interface could be required to
make their hash implementation consistent with bytes hashing. IMO, that
wouldn't be asking too much.

There is already the issue that equality may not be transitive wrt.
buffer objects (e.g. a == memoryview(a) == memoryview(b) == b, but a !=
b). As that would be a bug in either a or b, failure to hash
consistently would be a bug as well.
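
One concrete instance of such a chain, sketched with a bytes object and an array.array of unsigned bytes. This assumes the memoryview comparison semantics of Python 3.3 and later; the choice of exporters is only an illustration.

```python
import array

a = b'\x00\x01'
b = array.array('B', [0, 1])

# Every link through the memoryviews compares equal ...
assert a == memoryview(a)
assert memoryview(a) == memoryview(b)
assert memoryview(b) == b
# ... but the two exporters themselves do not.
assert a != b
```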

Regards,
Martin


Re: [Python-Dev] how to find the file path to an extension module at init time?

2011-11-13 Thread Martin v. Löwis
> I'm asking specifically because I'd like to properly implement __file__
> in Cython modules at module init time.

Why do you need to implement __file__? Python will set it eventually to
its correct value, no?

> Another problem is that package local imports from __init__.py no longer
> work when it's compiled

Does it actually work to have __init__ be an extension module?

> Any ideas how this could currently be achieved? 

Currently, for Cython? I don't think that can work.

> Or could this become a new feature in the future?

Certainly. An approach similar to _Py_PackageContext should be possible.

Regards,
Martin


Re: [Python-Dev] peps: And now for something completely different.

2011-11-13 Thread Antoine Pitrou
On Sun, 13 Nov 2011 22:33:28 +0100
barry.warsaw  wrote:
>  
> +And Now For Something Completely Different
> +==========================================

So, is the release manager a man with two noses?

> +Strings and bytes
> +-----------------
> +
> +Python 2's basic original string type are called 8-bit strings, and
> +they play a dual role in Python 2 as both ASCII text and as byte
> +arrays.  While Python 2 also has a unicode string type, the
> +fundamental ambiguity of the core string type, coupled with Python 2's
> +default behavior of supporting automatic coercion from 8-bit strings
> +to unicodes when the two are combined, often leads to `UnicodeError`s.
> +Python 3's standard string type is a unicode, and Python 3 adds a
> +bytes type, but critically, no automatic coercion between bytes and
> +unicodes is provided.  Thus, the core interpreter, its I/O libraries,
> +module names, etc. are clear in their distinction between unicode
> +strings and bytes.  This clarity is often a source of difficulty in
> +transitioning existing code to Python 3, because many third party
> +libraries and applications are themselves ambiguous in this
> +distinction.  Once migrated though, most `UnicodeError`s can be
> +eliminated.

First class unicode (*) support also makes Python much friendlier to
non-ASCII natives when it comes to things like filesystem access or
error reporting.

(*) even though Tom Christiansen would disagree, but perhaps we can
settle on first and a half

> +Imports
> +-------
> +
> +In Python 3, star imports (e.g. ``from x import *``) are only
> +premitted in module level code.

permitted
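
As a quick illustration of the rule in the quoted paragraph, a star import compiles at module level but is rejected inside a function body in Python 3. The source strings and the `<example>` filename are made up for the sketch.

```python
module_level = "from os.path import *\n"
function_level = "def f():\n    from os.path import *\n"

# Module-level star import: compiles without complaint.
compile(module_level, "<example>", "exec")

# The same statement inside a function is a SyntaxError in Python 3.
try:
    compile(function_level, "<example>", "exec")
    rejected = False
except SyntaxError:
    rejected = True
assert rejected
```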

Regards

Antoine.




Re: [Python-Dev] [Python-checkins] cpython (2.7): Normalize the keyword arguments documentation notation in re.rst. Closes issue

2011-11-13 Thread Terry Reedy

On 11/13/2011 5:52 PM, eli.bendersky wrote:
> http://hg.python.org/cpython/rev/87ecfd5cd5d1
> changeset:   73541:87ecfd5cd5d1
> branch:      2.7
> parent:      73529:c3b063c82ae5
> user:        Eli Bendersky
> date:        Mon Nov 14 01:02:20 2011 +0200
> summary:
>   Normalize the keyword arguments documentation notation in re.rst. Closes
> issue #12875
>
> -.. function:: compile(pattern[, flags=0])
> +.. function:: compile(pattern, flags=0)

...

This issue and the patch are about parameters with *default* arguments, 
which makes a corresponding argument in a call *optional*. For Python 
functions, both required and optional arguments can be passed by 
position (unless disabled) or keyword. Which is to say, for Python 
functions, any argument can be a keyword argument. I suspect I am not 
the only person somewhat confused when people use 'keyword' to mean 
'optional' or 'default'.
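
A small sketch of the distinction, using a made-up function `compile_pattern` that only mirrors the shape of the documented signature:

```python
def compile_pattern(pattern, flags=0):
    """Illustrative stand-in shaped like compile(pattern, flags=0)."""
    return (pattern, flags)


# 'flags' is optional because its parameter has a default ...
assert compile_pattern("a+") == ("a+", 0)
# ... while both arguments, required or optional, can be passed
# either by position or by keyword.
assert compile_pattern("a+", 2) == ("a+", 2)
assert compile_pattern(flags=2, pattern="a+") == ("a+", 2)
```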


tjr



Re: [Python-Dev] [Python-checkins] cpython (2.7): Normalize the keyword arguments documentation notation in re.rst. Closes issue

2011-11-13 Thread Eli Bendersky
>> http://hg.python.org/cpython/rev/87ecfd5cd5d1
>> changeset:   73541:87ecfd5cd5d1
>> branch:      2.7
>> parent:      73529:c3b063c82ae5
>> user:        Eli Bendersky
>> date:        Mon Nov 14 01:02:20 2011 +0200
>> summary:
>>   Normalize the keyword arguments documentation notation in re.rst. Closes
>> issue #12875
>
>> -.. function:: compile(pattern[, flags=0])
>> +.. function:: compile(pattern, flags=0)
>
> ...
>
> This issue and the patch are about parameters with *default* arguments,
> which makes a corresponding argument in a call *optional*. For Python
> functions, both required and optional arguments can be passed by position
> (unless disabled) or keyword. Which is to say, for Python functions, any
> argument can be a keyword argument. I suspect I am not the only person
> somewhat confused when people use 'keyword' to mean 'optional' or 'default'.
>

You're right, Terry. Sorry for the confusing commit message. By the
way, I think you may be interested in the related
http://bugs.python.org/issue13386
Eli