[issue15573] Support unknown formats in memoryview comparisons

2012-11-02 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 969069d464bc by Stefan Krah in branch '3.3':
Issue #15814: Use hash function that is compatible with the equality
http://hg.python.org/cpython/rev/969069d464bc

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-29 Thread Stefan Krah

Stefan Krah added the comment:

We overlooked one thing. Since hashing is defined in terms of
tobytes(), the equality-hash invariant is now broken:

 from _testbuffer import ndarray
 x = ndarray([1,2,3], shape=[3], format='f')
 y = ndarray([1,2,3], shape=[3], format='B')
 a = memoryview(x)
 b = memoryview(y)
 a == b
True
 hash(a) == hash(b)
False
 


This is one problem with the new equality definition. It puts
memoryview firmly into array territory. I'm not saying that's
wrong, I even think it was the intention of the PEP authors to
have a zero copy arrayview.

Both array.array and numpy.array sidestep the issue by not being
hashable.


I don't really see a way around all this except doing slow
element-wise hashing.

--
resolution: fixed - 
status: closed - open

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-29 Thread Nick Coghlan

Nick Coghlan added the comment:

Perhaps the hash could just be based on a subset of the items? For example, 
hash the format, shape and the first and last items?

Yes, such an approach means you're more likely to get collisions if you do 
start hashing these, but that seems better than making hashing itself 
excessively slow.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-29 Thread Stefan Krah

Stefan Krah added the comment:

I'm trying to think of an optimization for the native types. It should 
be possible to cast any native type element to unsigned char and use the
truncated value for the bytes hash.

Well, for double probably it's required to go from double - int64_t -
unsigned char, otherwise the first cast is technically undefined for 
negative values, though it works with gcc.


For non-native types and compound types, Nick's suggestion of
hashing the shape and a couple of values seems to be the best
solution.


Should we do anything about this before 3.3.0?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-29 Thread Martin v . Löwis

Martin v. Löwis added the comment:

1. I STRONGLY request to take hashing out of this issue. The only further 
action that this issue can have is complete reversal of the patch, starting 
over. Now that the issue has been resolved, a NEW issue arises, namely that 
hashing is inconsistent with equality. Please re-read the original issue: the 
objective of supporting unknown formats in memoryview comparisons is achieved. 
Remember that by reopening the issue, you are blocking the release (since it's 
still a release blocker).

2. Any significant change to the 3.3.0 code base should, IMO, require an 
additional release candidate (since the release candidate really ought to be 
what is going to be released, hence the name, so NO changes between candidate 
and release). Do you really want to delay the release of Python 3.3 until 
October because of hashing of memoryview objects (which, AFAICT, only affects 
bytes objects in the standard library)?

3. I don't think memoryview objects should be hashable at all. No matter what 
definition of hashing you chose, you likely won't manage to get it consistent 
with ==. Since a memoryview compares equal with its underlying object, it 
should also hash equal.

4. (unrelated to hashing, related to the original issue) I think it will be 
possible to create non-transitive equalities, where 
A==memoryview(A)==memoryview(B)==B, yet not (A==B)

5. I'm not sure whether the hash of a memoryview will surely be constant (as it 
should): isn't it possible to create a read-only buffer for a mutable memory 
block (not PyObject), so that any hashing considering the contents will result 
in changing hashes? Or, since the hash is cached, couldn't that cause the hash 
to deviate from ==?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-29 Thread Stefan Krah

Stefan Krah added the comment:

I agree that this isn't a blocker due to the limited use of
memoryview hashing. Tracking this in #15814.

--
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-29 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com:


--
resolution:  - fixed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-25 Thread STINNER Victor

STINNER Victor added the comment:

Even if I would prefer to see a fully working new implementation of the
buffer interface, I agree with Georg: it's too late for such huge change in
Python 3.3. Can' we wait Python 3.4 to change the equality operator?

It also took me three major releases to fix all Unicode issues (and it's
probably not finished :-)).
Le 24 août 2012 23:05, Georg Brandl rep...@bugs.python.org a écrit :


 Georg Brandl added the comment:

 I'm not very keen to hold up the release for long times again, especially
 for a patch of this size and lots of potential breakage.

 --

 ___
 Python tracker rep...@bugs.python.org
 http://bugs.python.org/issue15573
 ___


--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-25 Thread Stefan Krah

Stefan Krah added the comment:

The effect of the change is pretty minimal though: Previously
not equal was returned for unknown formats, now the struct
module figures it out.

memoryobject.c has 100% coverage except for a couple of lines
that are either impossible to reach or very hard to reach.

The new code is among the most heavily exercised lines, since
in test_buffer.py the verify() function always asserts equality.


You can look at the attached memoryobject.c.gcov: *All* failure paths
are taken since Python API functions are wrapped in macros that
set PyExc_FailAPIError and return failure at predefined points
the test suite.

--
Added file: http://bugs.python.org/file26992/memoryobject.c.gcov

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-25 Thread Nick Coghlan

Nick Coghlan added the comment:

With Stefan's major improvements to the test suite for 3.3, the risk is low and 
I *really* don't want to spend the life of the 3.3 series excusing the current 
comparison behaviour.

By contrast, explaining the shift in definition as moving from raw memory 
comparisons to by value comparisons is relatively easy.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-25 Thread Georg Brandl

Georg Brandl added the comment:

Well, I'm not against you committing it as long as you commit it *right now*.  
I'm about to cut the tag.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-25 Thread Stefan Krah

Stefan Krah added the comment:

 You can look at the attached memoryobject.c.gcov: *All* failure paths
 are taken since Python API functions are wrapped in macros that
 set PyExc_FailAPIError and return failure at predefined points
 the test suite.

That should read: With a patch that wraps Python API functions ...

Obviously the macros aren't in by default.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-25 Thread Stefan Krah

Stefan Krah added the comment:

Great, give me 20 minutes or so.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-25 Thread Nick Coghlan

Nick Coghlan added the comment:

I'm currently on IRC and double checking the patch locally.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-25 Thread Nick Coghlan

Nick Coghlan added the comment:

Stefan, was there something specific you wanted to do before committing? 
Otherwise I can commit it as soon as this full test run finishes.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-25 Thread Roundup Robot

Roundup Robot added the comment:

New changeset afa3dedfee18 by Nick Coghlan in branch 'default':
Close #15573: use value-based memoryview comparisons (patch by Stefan Krah)
http://hg.python.org/cpython/rev/afa3dedfee18

--
resolution:  - fixed
stage: patch review - committed/rejected
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-25 Thread Stefan Krah

Stefan Krah added the comment:

Except for Misc/NEWS the patch can be committed unchanged. Please
go ahead!

[The remaining doc changes are tracked elsewhere.]

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-24 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Is this still blocking the release? If so, it should be resolved within the 
next twelve hours, or else it may block the release until September.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-24 Thread Georg Brandl

Georg Brandl added the comment:

I'm not very keen to hold up the release for long times again, especially for a 
patch of this size and lots of potential breakage.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-19 Thread Georg Brandl

Changes by Georg Brandl ge...@python.org:


--
priority: deferred blocker - release blocker

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-19 Thread Stefan Krah

Stefan Krah added the comment:

I think issue15573-struct-2.diff is ready to go and I'd rather commit
sooner than later. Nick, can I interpret your last review comment
as go ahead? :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-19 Thread Georg Brandl

Georg Brandl added the comment:

Small nit: when you put versionchanged:: 3.3 there should be a hint of *what* 
changed exactly :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-19 Thread Stefan Krah

Stefan Krah added the comment:

Right. I'm tracking all versionchanged issues in #15724.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-17 Thread Stefan Krah

Stefan Krah added the comment:

There is still one corner case involving NaNs: Released memoryviews always
compare equal. I took that over from the 3.2 implementation.

 import array
 a = array.array('d', [float('nan')])
 m = memoryview(a)
 m == m
False
 m.release()
 m == m
True


I guess we have to live with that, since it is of course impossible to access
the values of a released view.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-15 Thread Stefan Krah

Stefan Krah added the comment:

Here is a new patch that hopefully addresses all comments.

--
Added file: http://bugs.python.org/file26831/issue15573-struct-2.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-14 Thread Stefan Krah

Stefan Krah added the comment:

Here is a patch implementing by-value comparisons for all format strings
understood by the struct module. It is slightly longer than promised, since
for larger arrays it is necessary to cache an unpacking object for acceptable
performance. The fast path for identical single element native format strings
is unchanged.

The new comparison rules are stated in the memoryview docs.


For Georg's benefit, here are the memoryobject.c changes and the reasons why
I think the patch can go into 3.3:

  o cmp_structure() is split into cmp_format() and cmp_shape(), with
unchanged semantics.

  o The new section unpack using the struct module is largely identical
to existing parts of _testbuffer.c:

  - struct_get_unpacker()  == see _testbuffer.c:ndarray_as_list()

  - struct_unpack_single() == see base case in _testbuffer.c:unpack_rec()

  o The new code is only called in the previous default case of unpack_cmp().

  o The new code has 100% coverage.



Performance:


Identical format, bytes:


$ ./python -m timeit -n 1000 -s import array; x = array.array('B', [1]*1); 
y = array.array('B', [1]*1); x == y
1000 loops, best of 3: 116 usec per loop

$ ./python -m timeit -n 1000 -s import array; x = array.array('B', [1]*1); 
y = array.array('B', [1]*1); a = memoryview(x); b = memoryview(y) a == b
1000 loops, best of 3: 49.1 usec per loop


Identical format, double:
-

$ ./python -m timeit -n 1000 -s import array; x = array.array('d', 
[1.0]*1); y = array.array('d', [1.0]*1); x == y
1000 loops, best of 3: 319 usec per loop

$ ./python -m timeit -n 1000 -s import array; x = array.array('d', 
[1.0]*1); y = array.array('d', [1.0]*1); a = memoryview(x); b = 
memoryview(y) a == b
1000 loops, best of 3: 65.7 usec per loop


Different format ('B', 'b'):


$ ./python -m timeit -n 100 -s import array; x = array.array('B', [1]*1); 
y = array.array('b', [1]*1); x == y
100 loops, best of 3: 131 usec per loop

$ ./python -m timeit -n 1000 -s import array; x = array.array('B', [1]*1); 
y = array.array('b', [1]*1); a = memoryview(x); b = memoryview(y) a == b
1000 loops, best of 3: 3.42 msec per loop


Different format ('d', 'f'):


$ ./python -m timeit -n 1000 -s import array; x = array.array('d', 
[1.0]*1); y = array.array('f', [1.0]*1); x == y
1000 loops, best of 3: 315 usec per loop

$ ./python -m timeit -n 1000 -s import array; x = array.array('d', 
[1.0]*1); y = array.array('f', [1.0]*1); a = memoryview(x); b = 
memoryview(y) a == b
1000 loops, best of 3: 3.59 msec per loop

--
Added file: http://bugs.python.org/file26800/issue15573-struct.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-14 Thread Stefan Krah

Stefan Krah added the comment:

I didn't get my own comments as mail, so this is just a notification that I've 
answered Nick's comments.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-14 Thread Nick Coghlan

Nick Coghlan added the comment:

Yeah, as soon as I got your email I realised I'd only been searching for usage 
*in the patch* rather than the full file. Oops :(

More comments in the review.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-12 Thread Stefan Krah

Stefan Krah added the comment:

Martin v. L??wis rep...@bugs.python.org wrote:
 Here is a more formal definition of my last proposal,
 
 v and w are equal iff
 v.shape() == w.shape() and
   ((v.format == w.format and v.tobytes() == w.tobytes()) or
v.tolist() == w.tolist())
 if tolist raises an exception for unsupported codes, they are not equal

This wouldn't solve the NaN problem though:

 nd = ndarray([(1,2)], shape=[1], format='dd', flags=ND_WRITABLE)
 nd[0] = (float('nan'), float('nan'))
 m = memoryview(nd)
 m == nd
True

What's the policy on importing modules in the Object/* hierarchy? Can we
import _struct?

The current unpack_cmp() function for identical single native format codes
is very fast. The only thing that would be needed to make it general is
to replace

   default: return -1;

with

   default: return unpack_cmp_struct();

The code is already there in _testbuffer.c. _testbuffer.c itself has 100%
code coverage (including all error conditions with a special patch).

This means that all tests are already present in test_buffer.py.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-12 Thread Stefan Krah

Stefan Krah added the comment:

To be specific, after a quick look the only function from _testbuffer.c
that would be needed is unpack_single(). This really does not look very
complex.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-12 Thread Nick Coghlan

Nick Coghlan added the comment:

We do import modules in the Object/* code if we really need to (Grep for 
PyImport_Import in the Objects dir)

It seems to me like this case is rapidly reaching (or has even gone well past) 
the point where that's the sensible thing to do.

A couple of key issues of concern are:
- the number of modules implicitly imported at startup
- the risk of attempting to import a module before the import system (and the 
interpreter in general) is fully initialised

A simple trick for alleviating both concerns is to call PyImport_Import 
directly in the code path that needs it. It will be pretty quick if it gets a 
hit in the sys.modules cache, and most of the builtin objects aren't *used* 
until well after the interpreter is fully initialised (and, in the case of 
memoryview equality comparisons, often won't be used at all).

The main alternative is to invert the dependency: move the desired 
functionality into a private API function and have the standard library 
extension module access it that way. This can end up being harder to maintain, 
though, and certainly isn't appropriate in this case (since 
_struct.unpack_from needs the full struct machinery to do the job).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-12 Thread Stefan Krah

Stefan Krah added the comment:

Thanks, Nick. I'll work on a struct module patch then. At this point for me
this is the only satisfying solution.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-11 Thread Nick Coghlan

Nick Coghlan added the comment:

Stefan's proposed definition is correct. Shapes define array types, format
strings define entry types, then the actual memory contents define the
value.

It's not Stefan's definition of equivalence, it's what a statically typed
array *means*.

The 3.2 way is completely broken, as it considers arrays containing
completely different data as equal if the memory layout happens to be the
same by coincidence.

3.3 is currently also broken, as it considers arrays that *do* contain the
same values to be different.

Stefan's patch fixes that by checking the shape and format first, and
*then* checking the value. It's exactly the same as doing an instance check
in an __eq__ method.

The requirement for a canonical format is for sanity's sake: the general
equivalence classes are too hard to figure out.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-11 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Nick: I still disagree. Would you agree that array.array constitutes a 
statically typed array? Yet

py array.array('b',b'foo') == array.array('B',b'foo')
True
py array.array('i',[1,2,3]) == array.array('L', [1,2,3])
True

So the array object (rightfully) performs comparison on abstract values, not on 
memory representation. In Python, a statically typed array still conceptually 
contains abstract values, not memory blocks (this is also what Stefan asserts 
for NumPy in msg167862). The static typing only restricts the values you can 
store in the container, and defines the internal representation on the C level 
(plus it typically implies a value storage, instead of a reference storage).

With your and Stefan's proposed semantics, we would get the weird case that for 
two array.arrays a and b, it might happen that

a == b and memoryview(a) != memoryview(b)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-11 Thread Martin v . Löwis

Martin v. Löwis added the comment:

it might *still* happen, I should say, since this problem is exactly what 
Victor says this issue intends to solve (for comparison of two 'u' arrays).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-11 Thread Stefan Krah

Stefan Krah added the comment:

So we have two competing proposals:

1) Py_buffers are strongly typed arrays in the ML sense (e.g. array of float,
   array of int).

This is probably what I'd prefer on the C level, imagine a function like
function like PyBuffer_Compare(v, w).

Backwards compatibility problem for users who were thinking in terms of
value comparisons:

 x = array.array('b', [127])
 y = array.array('B', [127])
 x == y
True
 memoryview(x) == memoryview(y)
False

2) Compare by value, like NumPy arrays do:

 x = numpy.array([1, 2, 3], dtype='i')
 y = numpy.array([1, 2, 3], dtype='f')
 x == y
array([True,  True,  True], dtype=bool)

I concede that this is probably what users want to see on the Python level.

Backwards compatibility problem for users who were using complicated
structs:

 from _testbuffer import *
 x = ndarray([(1,1), (2,2), (3,3)], shape=[3], format='hQ')
 x == memoryview(x)
False

Reason: While _testbuffer.ndarray already implements tolist() in full
generality, memoryview does not:

 x.tolist()
[(1, 1), (2, 2), (3, 3)]

 memoryview(x).tolist()
Traceback (most recent call last):
  File stdin, line 1, in module
NotImplementedError: memoryview: unsupported format hQ

So while I'm beginning to like Martin's proposal, the implementation is
certainly trickier and will always be quite slow for complicated format
strings.

It would be possible to keep a fast path for the primitive C types
and use the code from _testbuffer.tolist() for the slow cases.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-11 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Here is a patch doing the comparison on abstract values if the formats are 
different.

--
Added file: http://bugs.python.org/file26766/abstractcmp.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-11 Thread Nick Coghlan

Nick Coghlan added the comment:

OK, I think I finally understand what Martin is getting at from a semantic 
point of view, and I think I can better explain the background of the issue and 
why Stefan's proposed solution is both necessary and correct.

The ideal definition of equivalence for memory view objects would actually be:

memoryview(x) == memoryview(y)

if (and only if)

memoryview(x).tolist() == memoryview(y).tolist()

Now, in practice, this approach cannot be implemented, because there are too 
many format definitions (whether valid or invalid) that memoryview doesn't 
understand (and perhaps will never understand) and because it would be 
completely infeasible on large arrays with complex format definitions.

Thus, we are forced to accept a *constraint* on memoryview's definition of 
equality: individual values are always compared via raw memory comparison, thus 
values stored using different *sizes* or *layouts* in memory will always 
compare as unequal, even if they would compare as equal in Python

This is an *acceptable* constraint as, in practice, you don't perform mixed 
format arithmetic and it's not a problem if there's no automatic coercion 
between sizes and layouts.

The Python 3.2 memoryview effectively uses memcmp() directly treating 
everything as a 1D array of bytes data, completely ignoring both shape *and* 
format data. Thus:

 ab = array('b', [1, 2, 3])
 ai = array('i', [1, 2, 3])
 aL = array('L', [1, 2, 3])
 ab == ai
True
 ab == ai == aL
True
 memoryview(ab) == memoryview(ai)
False
 memoryview(ab) == memoryview(aL)
False
 memoryview(ai) == memoryview(aL)
False

This approach leads to some major false positives, such as a floating point 
value comparing equal to an integer that happens to share the same binary 
representation:

 af = array('f', [1.1])
 ai = array('i', [1066192077])
 af == ai
False
 memoryview(af) == memoryview(ai)
True

The changes in 3.3 are aimed primarily at *eliminating those false positives* 
by taking into account the shape of the array and the format of the contained 
values. It is *not* about changing the fundamental constraint that memoryview 
operates at the level of raw memory, rather than Python objects, and thus cares 
about memory layout details that are irrelevant after passing through the 
Python abstraction layer.

This contrasts with the more limited scope of the array.array module, which 
*does* take into account the Python level abstractions. Thus, there will always 
be a discrepancy between the two definitions of equality, as memoryview cares 
about memory layout details, where array.array does not.

The problem at the moment is that Python 3.3 currently has *spurious* false 
negatives that aren't caused by that fundamental constraint that comparisons 
must occur based directly on memory contents. Instead, they're being caused by 
memoryview returning False for any equality comparison for a format it doesn't 
understand. That's unacceptable, and is what Stefan's patch is intended to fix.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-11 Thread Nick Coghlan

Nick Coghlan added the comment:

Short version:
3.3 implemented new constraints on memoryview equality comparisons to eliminate 
cases that were incorrectly returning True when they should have returned False.

However, the format constraints are currently too restrictive, so some 
memoryview comparisons that correctly returned True in 3.2 are now also 
returning False. This patch fixes *those* comparisons to once again return True.

Moving back to deferred blocked status, since the spurious false negatives are 
a regression from 3.2. It's fine for beta2 to ship with this problem, but it 
needs to be fixed for rc1.

--
priority: high - deferred blocker

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-11 Thread Nick Coghlan

Nick Coghlan added the comment:

I can see value in adopting the abstraction approach, but it's not part of this 
issue. That problem already existed when using memoryview in 3.2, we can wait 
until 3.4 to fix it.

There's an actual, concrete regression in 3.3 that may break code, and 
comparing with memcmp() when the format strings match is enough to fix it.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-11 Thread Nick Coghlan

Nick Coghlan added the comment:

And, to be honest, I'd be quite happy for congratulations, you have reached 
the threshold where you need NumPy rather than memoryview to be the long term 
answer to getting by value comparison semantics.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-11 Thread Martin v . Löwis

Martin v. Löwis added the comment:

 However, the format constraints are currently too restrictive, so
 some memoryview comparisons that correctly returned True in 3.2 are
 now also returning False. This patch fixes *those* comparisons to
 once again return True.

No, it doesn't - at least not in all cases. If you compare 'b'
and 'B' arrays with values in range(128), then 3.2 rightfully
returned True, but will still return False under the patch.

In any case, whatever the solution is (or even whatever the problem
is), it needs to be clearly spelled out in the documentation, since
it will be far from obvious.

  Instead, they're being caused by memoryview returning False for any
  equality comparison for a format it doesn't understand. That's
  unacceptable, and is what Stefan's patch is intended to fix.

I fail to see why this is unacceptable. If you don't understand
the format code, you cannot know how to compare the values - you
cannot even know how to compare the memory representation of the
values. If the memory compares unequal, it may still be plausible
that the values are actually equal (e.g. if this is a
numerator-denominator representation of rationals). There are even
cases where the memory comparison can compare as equal, yet the
values should compare non-equal (e.g. IEEE NaNs, or relative
pointers).

I also think this is a *really* obscure problem: not many people
use memoryview objects, very few desire to compare them for equality
(most rather use them for efficient IO, and poking into structures),
and of those, nearly nobody uses them with unsupported format codes -
in particular not if u and w get implemented (since
then standard Python will not have ways to create memoryview objects
reasonably with unsupported codes). What specific example did you
think of that regresses from 3.2?

So I still fail to see why this problem can manage to block the
release of 3.3 - but that's for the release manager to decide.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-11 Thread Martin v . Löwis

Martin v. Löwis added the comment:

 And, to be honest, I'd be quite happy for congratulations, you have
 reached the threshold where you need NumPy rather than memoryview to
 be the long term answer to getting by value comparison semantics.

IMO, this threshold is already reached when you start comparing 
memoryview objects. PEP 3118 apparently introduced it to replace
the buffer obejct, and specified that it should have __getitem__
and __setitem__ as magic methods; the PEP doesn't talk about comparison
at all. So I wonder where the desire to support comparison for
equality comes from (but I can accept that we have to deal with it
as it is there now). IMO, it would be reasonable to declare to memory
buffers as unequal if they denote different memory blocks in main
memory (i.e. at different addresses).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-11 Thread Nick Coghlan

Nick Coghlan added the comment:

Hmm, you're right. OK, here's a simpler proposal for 3.3:

1. Always look at shape first. If they're different, then they're not equal

2. If both formats are known, compare by unpacked value

3. If either format is unknown, compare by memory contents (just like 3.2)

The fallback case should then behave exactly like 3.2 (since 3.2 really 
couldn't handle anything other than 1D data and always ignored the format info).

I'd be happier if the compare-by-value didn't make complete copies of the 
entire array though.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-11 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Le samedi 11 août 2012 à 19:52 +, Nick Coghlan a écrit :
 I'd be happier if the compare-by-value didn't make complete copies of
 the entire array though.

Ditto. If a and b are bytes objects, comparing memoryview(a) and
memoryview(b) should be as cheap as comparing a and b.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-11 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Am 11.08.12 21:59, schrieb Antoine Pitrou:
 Le samedi 11 août 2012 à 19:52 +, Nick Coghlan a écrit :
 I'd be happier if the compare-by-value didn't make complete copies of
 the entire array though.

 Ditto. If a and b are bytes objects, comparing memoryview(a) and
 memoryview(b) should be as cheap as comparing a and b.

I agree with Antoine's requirement, generalizing it to the simple
cases should be efficient. I wonder why the procedure couldn't instead
be

1. compare shapes
2. if the format strings are string-equal, compare the
memory representation
3. else unpack values

Then, comparing two 1D 'B' memoryviews would be a memcmp
(i.e. it wouldn't do compare-by-value in this case).

For unpacking, I don't see any way to have it efficient
and still maintainable, since mixed-type comparisons
are quite tedious to write in C, and really best done with
type normalization and multiple dispatch in the way that
unpacking will do it.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-11 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Here is a more formal definition of my last proposal,

v and w are equal iff
v.shape() == w.shape() and
  ((v.format == w.format and v.tobytes() == w.tobytes()) or
   v.tolist() == w.tolist())
if tolist raises an exception for unsupported codes, they are not equal

As usual, the implementation can deviate from the spec as long as it behaves 
as-if it would use this exact algorithm (assuming unlimited memory).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-10 Thread Martin v . Löwis

Martin v. Löwis added the comment:

haypo: thanks for stating the issue.

ISTM that this classifies as an obscure bug: you have to use memoryviews, and 
you need to compare them for equality, and the comparison needs to be 
non-trivial, where trivial is defined 
by both are 1D byte arrays.

While this is a bug, I think it still can be fixed in a bug fix 
release of 3.3, so un-blocking.

I also think that as a first step, a specification needs to be
drafted defining when exactly a memory view should compare equal
with some other object. I can easily provide a specification that
makes the current implementation correct.

--
priority: release blocker - high

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-10 Thread Stefan Krah

Stefan Krah added the comment:

 I can easily provide a specification that makes the current implementation 
 correct

Yes, the current specification is: memoryview only attempts to
compare arrays with known (single character native) formats and
returns not equal otherwise.

The problem is that for backwards compatibility memoryview accepts
arrays with arbitrary format strings. In operations like tolist()
it's possible to raise NotImplemented, but for equality comparisons
that's not possible.


Note that in 3.2 memoryview would return equal for arrays that
simply aren't equal, if those arrays happen to have the same bit
pattern.


One way to deal with this is to demand a strict canonical form
of format strings for PEP-3118, see msg167687.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-10 Thread Martin v . Löwis

Martin v. Löwis added the comment:

 Note that in 3.2 memoryview would return equal for arrays that
 simply aren't equal, if those arrays happen to have the same bit
 pattern.

This is exactly my point. If a memoryview has the same memory
as another, why are they then not rightfully considered equal?
IOW, the 3.2 behavior looks fine to me.

You apparently have a vision that equality should mean something
different for memoryviews - please explicitly state what that
view is. A mathematical definition (two memoryviews A and B
are equal iff ...) would be appreciated.

 One way to deal with this is to demand a strict canonical form
 of format strings for PEP-3118, see msg167687.

You are talking about the solution already - I still don't know
what the problem is exactly (not that I *need* to understand
the problem, but at a minimum, the documentation should state
what the intended behavior is - better if people would also
agree that the proposed behavior is reasonable).

For 3.3, I see two approaches: either move backwards to the
3.2 behavior and defer this change to 3.4 - this would make
it release-critical indeed. Or move forward to a yet-to-be
specified equality operation which as of now may not be
implemented correctly, treating any improvement to it
as a bug fix.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-10 Thread Stefan Krah

Stefan Krah added the comment:

PEP-3118 specifies strongly typed multi-dimensional arrays. The existing
code in the 3.2 memoryview as well as numerous comments by Travis Oliphant
make it clear that these capabilities were intended from the start for
memoryview as well.

Perhaps the name memoryview is a misnomer, since the boundaries between
memoryview and NumPy's ndarray become blurry. In fact, the small
implementation of an ndarray in Modules/_testbuffer.c is *also* a memoryview
in some sense, since it can grab a buffer from an exporter and expose it in
the same manner as memoryview.

So what I implemented is certainly not only *my* vision. The PEP essentially
describes NumPy arrays with an interchange format to convert between NumPy
and PIL arrays.

It is perhaps unfortunate that the protocol was named buffer protocol,
since it is actually an advanced array protocol.

NumPy arrays don't care about the raw memory. It is always the logical array
that matters. For example, Fortran and C arrays have different bit patterns
in memory but compare equal, a fact that the 3.2 implementation completely
misses.

Arrays v and w are equal iff format and shape are equal and for all valid
indices allowed by shape

  memcmp((char *)PyBuffer_GetPointer(v, indices),
 (char *)PyBuffer_GetPointer(w, indices),
 itemsize) == 0.

Equal formats of course imply equal itemsize.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-10 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Can you please elaborate on the specification:

1. what does it mean that the formats of v and w are equal?

2. Victor's clarification about this issue isn't about comparing two arrays, 
but an array with a string object. So: when is an array equal to some other 
(non-array) object?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-10 Thread Stefan Krah

Stefan Krah added the comment:

 1. what does it mean that the formats of v and w are equal?

I'm using array and Py_buffer interchangeably since a Py_buffer struct
actually describes a multi-dimensional array. v and w are Py_buffer
structs.

So v.format must equal w.format, where format is a format string in
struct module syntax. The topic of this issue is to determine under
what circumstances two strings in struct module syntax are considered
equal.

 2. Victor's clarification about this issue isn't about comparing
two arrays, but an array with a string object. So: when is an
array equal to some other (non-array) object?

 a=array.array('u', 'abc')
 v=memoryview(a)
 a == v
False

memoryview can compare against any object with a getbufferproc, in this
case array.array. memoryview_richcompare() calls PyObject_GetBuffer(other)
and proceeds to compare its own internal Py_buffer v against the obtained
Py_buffer w.

In the case of v.format == w.format the fix for unknown formats is trivial:
Just allow the comparison using v.itemsize == w.itemsize.

However, the struct module format string syntax has multiple representations
for the exact same formats, which makes a general fmtcmp() function tricky
to write.

Hence my proposal to demand a strict canonical form for PEP-3118 format
strings, which would be a proper subset of struct module format strings.

Example: 1Q 1h 1h 0c must be written as Q2h

The attached patch should largely implement this proposal. A canonical form
is perhaps not such a severe restriction, since format strings should usually
come from the exporting object. E.g. NumPy must translate its own internal
format to struct module syntax anyway.

Another option is to commit the patch that misses 1Q 1h 1h 0c == Q2h
now and aim for a completely general fmtcmp() function later.

IMO any general fmtcmp() function should also be reasonably fast.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-10 Thread Stefan Krah

Changes by Stefan Krah stefan-use...@bytereef.org:


--
stage: needs patch - patch review

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-10 Thread Martin v . Löwis

Martin v. Löwis added the comment:

 So v.format must equal w.format, where format is a format string in
 struct module syntax. The topic of this issue is to determine under
 what circumstances two strings in struct module syntax are considered
 equal.

And that is exactly my question: We don't need a patch implementing
it (yet), but a specification of what is to be implemented first.

I know when two strings are equal (regardless of their syntax):
if they have the same length, and contain the same characters in
the same order. Apparently, you have a different notion of equal
for strings in mind, please be explicitly what that notion is.

 memoryview can compare against any object with a getbufferproc, in this
 case array.array. memoryview_richcompare() calls PyObject_GetBuffer(other)
 and proceeds to compare its own internal Py_buffer v against the obtained
 Py_buffer w.

Can this be expressed on Python level as well? I.e. is it correct
to say: an array/buffer/memoryview A is equal to an object O iff
A is equal to memoryview(O)? Or could it be that these two equivalences
might reasonably differ?

 Hence my proposal to demand a strict canonical form for PEP-3118 format
 strings, which would be a proper subset of struct module format strings.

Can you kindly phrase this as a specification? Who is demanding what
from whom?

Proposal: two format strings are equal if their canonical forms
are equal strings. The canonical form C of a string S is created by ???

However, it appears that you may have something different in mind
where things are rejected/fail to work if the canonical form isn't
originally provided by somebody (whom?)

So another Proposal: two format strings are equal iff they are
in both in canonical form and are equal strings.

This would imply that a format string which is not in canonical
form is not equal to any other strings, not even to itself, so
this may still not be what you want. But I can't guess what it
is that you want.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-10 Thread STINNER Victor

STINNER Victor added the comment:

Can't we start with something simple (for ptyhon 3.3?), and elaborate
later? In my specific example, both object have the same format string and
the same content. So i expect that they are equal.
Le 10 août 2012 19:47, Martin v. Löwis rep...@bugs.python.org a écrit :


 Martin v. Löwis added the comment:

  So v.format must equal w.format, where format is a format string in
  struct module syntax. The topic of this issue is to determine under
  what circumstances two strings in struct module syntax are considered
  equal.

 And that is exactly my question: We don't need a patch implementing
 it (yet), but a specification of what is to be implemented first.

 I know when two strings are equal (regardless of their syntax):
 if they have the same length, and contain the same characters in
 the same order. Apparently, you have a different notion of equal
 for strings in mind, please be explicitly what that notion is.

  memoryview can compare against any object with a getbufferproc, in this
  case array.array. memoryview_richcompare() calls
 PyObject_GetBuffer(other)
  and proceeds to compare its own internal Py_buffer v against the obtained
  Py_buffer w.

 Can this be expressed on Python level as well? I.e. is it correct
 to say: an array/buffer/memoryview A is equal to an object O iff
 A is equal to memoryview(O)? Or could it be that these two equivalences
 might reasonably differ?

  Hence my proposal to demand a strict canonical form for PEP-3118 format
  strings, which would be a proper subset of struct module format strings.

 Can you kindly phrase this as a specification? Who is demanding what
 from whom?

 Proposal: two format strings are equal if their canonical forms
 are equal strings. The canonical form C of a string S is created by ???

 However, it appears that you may have something different in mind
 where things are rejected/fail to work if the canonical form isn't
 originally provided by somebody (whom?)

 So another Proposal: two format strings are equal iff they are
 in both in canonical form and are equal strings.

 This would imply that a format string which is not in canonical
 form is not equal to any other strings, not even to itself, so
 this may still not be what you want. But I can't guess what it
 is that you want.

 --

 ___
 Python tracker rep...@bugs.python.org
 http://bugs.python.org/issue15573
 ___


--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-10 Thread Martin v . Löwis

Martin v. Löwis added the comment:

 Can't we start with something simple (for ptyhon 3.3?), and elaborate
 later?

Sure: someone would have to make a proposal what exactly that is.
IMO, the 3.2 definition *was* simple, but apparently it was considered
too simple. So either Stefan gets to define his view of equality, or
somebody else needs to make a (specific) counter-proposal.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-10 Thread Stefan Krah

Stefan Krah added the comment:

The ideal specification is:

1) Arrays v and w are equal iff format and shape are equal and for all valid
   indices allowed by shape

 memcmp((char *)PyBuffer_GetPointer(v, indices),
(char *)PyBuffer_GetPointer(w, indices),
itemsize) == 0.

2) Two format strings s and t are equal if canonical(s) == canonical(t).

End ideal specification.

Purely to *facilitate* the implementation of a format comparison function,
I suggested:

3) An exporter must initialize the format field of a Py_buffer structure
   with canonical(s).

If *all* exporters obey 3), a format comparison function can simply
call strcmp(s, t) (after sorting out the byte order specifier).

Specifically, if x and y are equal, then:

  a) x == memoryview(x) == memoryview(y) == y

If x and y are equal and exporter x does *not* obey 3), but exporter y does,
then:

  b) x == memoryview(x) != memoryview(y) == y

Under rule 3) this would be the fault of exporter x.

For Python 3.3 it is also possible to state only 1) and 2), with a caveat in
the documentation that case b) might occur until the format comparison function
in memoryview implements the reductions to canonical forms.

The problem is that reductions to canonical forms might get very complicated
if #3132 is implemented.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-10 Thread Stefan Krah

Stefan Krah added the comment:

Martin v. Loewis rep...@bugs.python.org wrote:
 Sure: someone would have to make a proposal what exactly that is.
 IMO, the 3.2 definition *was* simple, but apparently it was considered
 too simple.

It was simply broken in multiple ways. Example:

 from numpy import *
 x = array([1,2,3,4,5], dtype='B')
 y = array([5,4,3,2,1], dtype='B')
 z = y[::-1]
 
 x == z
array([ True,  True,  True,  True,  True], dtype=bool)
 memoryview(x) == memoryview(z)
False
Segmentation fault


I'm not even talking about the segfault here. Note that x == z, but
memoryview(x) != memoryview(z), because the logical structure is
not taken into account.

Likewise, one could construct cases where one array contains a float
NaN and the other an integer that happens to have the same bit pattern.

The arrays would not be equal, but their memoryviews would be equal.



 So either Stefan gets to define his view of equality, or
 somebody else needs to make a (specific) counter-proposal.

The view is defined by the PEP that clearly models NumPy. I'm curious what
counter-proposal will work with NumPy and PIL.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-10 Thread Martin v . Löwis

Martin v. Löwis added the comment:

I find Stefan's proposed equality confusing. Why is it based on memcmp? Either 
it compares memory (i.e. internal representations), or it compares abstract 
values. If it compares abstract values, then it shouldn't be a requirement that 
the format strings are equal in any sense. Instead, the resulting values should 
be equal. So I propose this definition:

v == w iff v.shape() == w.shape() and v.tolist() == w.tolist()
if either operation fails with an exception, the objects are not equal

Of course, the implementation doesn't need to literally call tolist; instead, 
behaving as-if it had been called is fine. However, as time
is running out, I would actually propose this to be the implementation
in 3.3.

In addition, I would propose to support the 'u' and 'w' codes in tolist, to 
resolve what Victor says the actual issue is.

I'm -1 on a definition that involves equivalence of format strings.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-09 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Can someone please explain why this is a release blocker? What is the specific 
regression from 3.2 that this deals with?

--
nosy: +loewis

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-09 Thread STINNER Victor

STINNER Victor added the comment:

 What is the specific regression from 3.2 that this deals with?

I don't know if it must be called a regression, but at least the behaviour is 
different in Python 3.2 and 3.3. For example, an Unicode array is no more equal 
to its memoryview:

Python 3.3.0b1 (default:aaa68dce117e, Aug  9 2012, 22:45:00)
   
[GCC 4.6.3 20120306 (Red Hat 4.6.3-2)] on linux 
   
Type help, copyright, credits or license for more information.  
   
 import array
 

 a=array.array('u', 'abc')
 v=memoryview(a) 
 

 a == v  
 

False   
   

ned$ python3
Python 3.2.3 (default, Jun  8 2012, 05:40:07)   
   
[GCC 4.6.3 20120306 (Red Hat 4.6.3-2)] on linux2
   
Type help, copyright, credits or license for more information.  
   
 import array
 

 a=array.array('u', 'abc')
 v=memoryview(a) 
 

 a == v  
 

True

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-08 Thread Stefan Krah

Stefan Krah added the comment:

Right, byte order specifiers are always at the beginning of the string.
That is at least something. I wonder if we should tighten PEP-3118 to
demand a canonical form of format strings, such as (probably incomplete):

  - Whitespace is disallowed.

  - Except for 's', no zero count may be given.

  - A count of 1 (redundant) is disallowed.

  - Repeats must be specified in terms of count + single char.


That still leaves the '=I' != '=L' problem. Why are there two
specifiers describing uint32_t?


Anyway, as Meador says, this can get tricky and I don't think this
can be resolved before beta-2. I'm attaching a patch that should
behave well for the restricted canonical form at least.

--
keywords: +patch
Added file: http://bugs.python.org/file26725/issue15573.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-07 Thread Stefan Krah

New submission from Stefan Krah:

Continuing the discussion from #13072. I hit a snag here:

Determining in full generality whether two format strings describe
identical items is pretty complicated, see also #3132.


I'm attaching a best effort fmtcmp() function that should do the
following:

  - recognize byte order specifiers at the start of the string.

  - recognize if an explicitly specified byte order happens to
match the native byte order.

It won't catch:

  - byte order specifiers anywhere in the string.

  - C types that happen to be identical ('I', 'L' on a 32-bit
platform). I'm also not sure if that is desirable in the
first place.

  - ???


So fmtcmp() will return false negatives (not equal), but should be
correct for *most* format strings that are actually in use.


Mark, Meador: You did a lot of work on the struct module and of
course on issue #3132. Does this look like a reasonable compromise?
Did I miss obvious cases (see attachment)?

--
assignee: skrah
components: Interpreter Core
files: format.c
messages: 167618
nosy: Arfrever, georg.brandl, haypo, mark.dickinson, meador.inge, ncoghlan, 
pitrou, python-dev, skrah
priority: release blocker
severity: normal
stage: needs patch
status: open
title: Support unknown formats in memoryview comparisons
type: behavior
versions: Python 3.3
Added file: http://bugs.python.org/file26720/format.c

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-07 Thread Nick Coghlan

Nick Coghlan added the comment:

I confess I was thinking of an even simpler format strings must be identical 
fallback, but agree your way is better, since it reproduces the 3.2 behaviour 
in many more cases where ignoring the format string actually did the right 
thing.

The struct docs for the byte order specifier specifically say the first 
character of the format string can be used to indicate the byte order, size and 
alignment of the packed data, so treating format strings that include byte 
order markers elsewhere in the string as differing from each other if those 
markers are in different locations sounds fine to me.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-07 Thread Christian Heimes

Changes by Christian Heimes li...@cheimes.de:


--
nosy: +christian.heimes

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15573] Support unknown formats in memoryview comparisons

2012-08-07 Thread Meador Inge

Meador Inge added the comment:

I agree that the general case is complicated.  It will get even more 
complicated if the full of PEP 3118 gets implemented since it turns into a tree 
comparison.  In general, I think you will probably have to compute some 
canonical form and then compare the canonical forms.

Here are a few more cases that don't work out in the attached algorithm:

  1. Repeat characters - '2c' == 'cc'
  2. Whitespace - 'h   h' == 'hh' 

Also, currently the byte order specifiers are always at the beginning of the 
string.  We discussed in issue3132 scoping them per the nested structures, but 
decided to drop that unless somebody barks about it since it is fairly 
complicated without a clear benefit.  So, I wouldn't worry about them being 
scattered through the string.

This seems like sort of a slippery slope.  I need to think about it more, but 
my first impression is that coming up with some way to compare format strings 
is going to be nasty.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com