[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL

2015-02-01 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


Added file: http://bugs.python.org/file37954/doc_c-api_buffer.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23352
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL

2015-02-01 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


Removed file: http://bugs.python.org/file37952/doc_c-api_buffer.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23352
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL

2015-02-01 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


Added file: http://bugs.python.org/file37953/doc_c-api_buffer.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23352
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL

2015-02-01 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


Removed file: http://bugs.python.org/file37953/doc_c-api_buffer.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23352
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL

2015-02-01 Thread Richard Hansen

Richard Hansen added the comment:

 When I compile and run the above (latest Cython from Git master), I
 get:
 
 ()
 ()
 
 With Cython version 0.20.1post0 I get:
 
  foo.foo()
 (-1,)
 (-1,)
 
 If you get the correct output from the latest Cython, it looks like
 this issue has been fixed.

Oops, I was running a locally modified version of Cython that contained a patch 
meant to work around issue #23349.

When I run the *actual* upstream master I get the same behavior that you do.

But either way, I don't see why it's a problem that it prints (-1,) for the 
PyBUF_ND case.  The consumer didn't request suboffsets information so I would 
expect it to be OK for the producer to put whatever it wants in the suboffsets 
field, including a junk pointer that would segfault upon dereference.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23352
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL

2015-02-01 Thread Richard Hansen

Richard Hansen added the comment:

 How might an application break with this change?
 
assert(view-suboffsets == NULL);

Fair point.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23352
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL

2015-02-01 Thread Richard Hansen

Richard Hansen added the comment:

 This leaves me +-0 for the change, with the caveat that applications
 might break.

How might an application break with this change?  Compared to the current 
PyBuffer_IsContiguous(), the patched version is the same except it returns true 
for a wider range of actually-contiguous buffers.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23352
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL

2015-02-01 Thread Richard Hansen

Richard Hansen added the comment:

My preference is to apply the patch, of course.  There is a legitimate concern 
that it will break existing code, but I think there are more points in favor of 
applying the patch:
  * there exists code that the current behavior is known to break
  * it's easier to remove assert(view-suboffsets == NULL) than to
change producers to not provide an all-negative array (though I
admit there may be other more subtle ways the patch would break
existing code)
  * I may be the only one who has spoken up about this, but I doubt
I'm the only one who has experienced it

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23352
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL

2015-01-31 Thread Richard Hansen

Richard Hansen added the comment:

(The following message is mostly off-topic but I think it is relevant to those 
interested in this issue.  This message is about the clarity of the 
documentation regarding flag semantics, and what I think the flags should mean.)

 Cython doesn't follow the spec though (use Python 3):
 
 from _testbuffer import *
 cpdef foo():
 cdef unsigned char[:] v = bytearray(btesting)
 nd = ndarray(v, getbuf=PyBUF_ND)
 print(nd.suboffsets)
 nd = ndarray(v, getbuf=PyBUF_FULL)
 print(nd.suboffsets)

When I compile and run the above (latest Cython from Git master), I get:

()
()

Looking at the Python source code (Modules/_testbuffer.c 
ndarray_get_suboffsets() and ssize_array_as_tuple()) the above output is 
printed if the suboffsets member is NULL.

 Cython hands out suboffsets even for a PyBUF_ND request, which is
 definitely wrong.  For a PyBUF_FULL request they should only be
 provided *if needed*.

suboffsets appears to be NULL in both cases in your example, which seems 
acceptable to me given:
  * the user didn't request suboffsets information in the PyBUF_ND
case, and
  * there are no indirections in the PyBUF_FULL case.

Even if suboffsets wasn't NULL, I believe the behavior would still be correct 
for the PyBUF_ND case.  My reading of 
https://docs.python.org/3/c-api/buffer.html is that flags=PyBUF_ND implies 
that shape member MUST NOT be NULL, but strides and suboffsets can be NULL or 
not (it doesn't matter).  This interpretation of mine is due to the sentence, 
Note that each flag contains all bits of the flags below it.  If 
flags=PyBUF_ND meant that strides and suboffsets MUST be NULL, then PyBUF_ND 
and PyBUF_STRIDES would necessarily be mutually exclusive and 
flags=(PyBUF_ND|PyBUF_STRIDES) would not make sense (the strides member would 
have to be both NULL and non-NULL).

If (flags  PyBUF_INDIRECT) is false, then the consumer is not interested in 
the suboffsets member so it shouldn't matter what it points to.  (And the 
consumer should not dereference the pointer in case it points to a junk 
address.)

IMHO, if the buffer is C-style contiguous then the producer should be allowed 
to populate the shape, strides, and suboffsets members regardless of whether or 
not any of the PyBUF_ND, PyBUF_STRIDES, or PyBUF_INDIRECT flags are set.  In 
other words, for C-style contiguous buffers, the producer should be allowed to 
act as if PyBUF_INDIRECT was always provided because the consumer will always 
get an appropriate Py_buffer struct regardless of the actual state of the 
PyBUF_ND, PyBUF_STRIDES, and PyBUF_INDIRECT flags.

It *is* a bug, however, to dereference the strides or suboffsets members with 
ndarray(v, getbuf=PyBUF_ND) because the consumer didn't ask for strides or 
suboffsets information and the pointers might be bogus.

Stepping back a bit, I believe that the flags should be thought of as imposing 
requirements on the producer.  I think those requirements should be (assuming 
ndim  0):

  * PyBUF_ND:  If (flags  PyBUF_ND) is true:
  - If (flags  PyBUF_STRIDES) is false *and* the producer is
unable to produce a block of memory at [buf,buf+len)
containing all (len/itemsize) entries in a C-style contiguous
chunk, then the producer must raise an exception.
  - Otherwise, the producer must provide the shape of buffer.
If (flags  PyBUF_ND) is false:
  - If the producer is unable to produce a contiguous chunk of
(len/itemsize) entries (of any shape) in the block of memory
at [buf,buf+len), then the producer must raise an exception.
  - Otherwise, the producer is permitted to do any of the
following:
  + don't touch the shape member (don't set it to NULL or any
other value; just leave it as-is)
  + set the shape member to NULL
  + set the shape member to point to an array describing the
shape
  + set the shape member to point to any other location, even
if dereferencing the pointer would result in a segfault
  * PyBUF_STRIDES:  If (flags  PyBUF_STRIDES) is true:
  - The producer must provide the appropriate strides information.
If (flags  PyBUF_STRIDES) is false:
  - If the producer is unable to produce a block of memory at
[buf,buf+len) containing all (len/itemsize) entries, the
producer must raise an exception.
  - Otherwise, the producer is permitted to do any of the
following;
  + don't touch the strides member (don't set it to NULL or
any other value; just leave it as-is)
  + set the strides member to NULL
  + set the strides member to point to an array describing the
strides
  + set the strides member to point to any other location,
even if dereferencing the pointer would result in a
segfault
  * PyBUF_INDIRECT:  If (flags  PyBUF_INDIRECT) is true

[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL

2015-01-31 Thread Richard Hansen

Richard Hansen added the comment:

Attached is a documentation patch that adds what I said in msg235141.  I doubt 
everyone will agree with the changes, but maybe it will be a useful starting 
point.

(Despite not having an asterisk next to my username, I have signed the 
contributor agreement.  It just hasn't been processed yet.)

--
Added file: http://bugs.python.org/file37952/doc_c-api_buffer.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23352
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL

2015-01-30 Thread Richard Hansen

Richard Hansen added the comment:

 People might rely on the fact that contiguous implies suboffsets==NULL.

Cython (currently) relies on all-negatives being acceptable and equivalent to 
suboffsets==NULL.  See:
https://github.com/cython/cython/pull/367
http://thread.gmane.org/gmane.comp.python.cython.devel/15569

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23352
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL

2015-01-29 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


--
type:  - behavior

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23352
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL

2015-01-29 Thread Richard Hansen

Richard Hansen added the comment:

 The patch has an obvious syntax error :-)

Doh!

 Other than that, adding a comment would be nice.

Agreed, will do.

 Bonus points if you can write a test (3.x has infrastructure for that, see 
 Lib/test/test_buffer.py).

I'll take a look.  Thanks!

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23352
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL

2015-01-29 Thread Richard Hansen

New submission from Richard Hansen:

According to 
https://docs.python.org/2/c-api/buffer.html#the-new-style-py-buffer-struct if 
the suboffsets member of Py_buffer is non-NULL and all members of the array are 
negative, the buffer may be contiguous.

PyBuffer_IsContiguous() does not behave this way.  It assumes that if the 
suboffsets member is non-NULL then at least one member of the array is 
non-negative, and thus assumes that the buffer is non-contiguous.  This is not 
always correct.

One consequence of this bug is PyBuffer_ToContiguous() (and thus 
memoryview.tobytes()) falls back to slow (and currently buggy, see issue 
#23349) memory copying code.

Attached is a patch that fixes this bug.  The patch is against 2.7 but can be 
trivially modified to apply to 3.x.

--
components: Interpreter Core
files: PyBuffer_IsContiguous.patch
keywords: patch
messages: 235014
nosy: rhansen
priority: normal
severity: normal
status: open
title: PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != 
NULL
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6
Added file: http://bugs.python.org/file37914/PyBuffer_IsContiguous.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23352
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23349] memoryview.to_bytes() and PyBuffer_ToContiguous() off-by-one error for non-contiguous buffers

2015-01-29 Thread Richard Hansen

New submission from Richard Hansen:

PyBuffer_ToContiguous() has an off-by-one error when copying a buffer it thinks 
is non-contiguous.

To reproduce, put the following in foo.pyx and compile with Cython v0.21.2:

cpdef foo():
cdef unsigned char[:] v = bytearray(testing)
print repr(memoryview(v).tobytes())

 import foo
 foo.foo()
'estingt'

(This issue was fixed for Python 3.x in issue #12834 but it was not fixed for 
Python 2.7.)

--
components: Interpreter Core
files: PyBuffer_ToContiguous.patch
keywords: patch
messages: 235011
nosy: rhansen
priority: normal
severity: normal
status: open
title: memoryview.to_bytes() and PyBuffer_ToContiguous() off-by-one error for 
non-contiguous buffers
type: behavior
versions: Python 2.7
Added file: http://bugs.python.org/file37912/PyBuffer_ToContiguous.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23349
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23349] PyBuffer_ToContiguous() off-by-one error for non-contiguous buffers

2015-01-29 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


--
title: memoryview.to_bytes() and PyBuffer_ToContiguous() off-by-one error for 
non-contiguous buffers - PyBuffer_ToContiguous() off-by-one error for 
non-contiguous buffers

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23349
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL

2015-01-29 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


--
versions:  -Python 3.2, Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23352
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL

2015-01-29 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


--
versions: +Python 3.2, Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23352
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL

2015-01-29 Thread Richard Hansen

Richard Hansen added the comment:

I've attached a new version of the patch.  Suggestions for simplifying the test 
code would be appreciated.

--
Added file: http://bugs.python.org/file37916/PyBuffer_IsContiguous_v2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23352
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-24 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Any comments on the patches?  I'd love to see at least patches 1-3 make it into 
Python 2.7.  :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-09 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


Removed file: 
http://bugs.python.org/file15729/unicode_escape_single_and_double_quotes.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-09 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


Removed file: http://bugs.python.org/file15771/unicode_escape_reorg.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-09 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


Removed file: http://bugs.python.org/file15774/unicode_escape_tests.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-09 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


Removed file: http://bugs.python.org/file15775/raw_unicode_escape_tests.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-09 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Attaching updated unit tests.  The tests now check to make sure that single and 
double quotes are escaped.

--
Added file: http://bugs.python.org/file15809/unicode_escape_tests.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-09 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Attaching a minimal patch:
  * unicode_escape now backslash-escapes single and double quotes
  * raw_unicode_escape now unicode-escapes single and double quotes
  * raw_unicode_escape now unicode-escapes backslashes
  * removes pickle's escaping workarounds so that it continues to work
  * updates documentation

--
Added file: http://bugs.python.org/file15810/unicode_escape_1_minimal.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-09 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Attaching a patch for an issue discovered while looking at the code:
  * The UTF-16 decode logic in the Unicode escape encoders no longer reads past 
the end of the provided Py_UNICODE buffer if the last character's value is 
between 0xD800 and 0xDC00.

This patch is meant to be applied after unicode_escape_1_minimal.patch.

--
Added file: 
http://bugs.python.org/file15811/unicode_escape_2_utf-16_invalid_read.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-09 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Attaching a patch for another issue discovered while looking at the code:
  * The Unicode escape encoders now check to make sure that the provided size 
is nonnegative.
  * C API documentation updated to make it clear that size must be nonnegative.

This patch is meant to be applied after 
unicode_escape_2_utf-16_invalid_read.patch.

--
Added file: 
http://bugs.python.org/file15812/unicode_escape_3_check_size_nonnegative.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-09 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Attaching a patch that eliminates duplicate code:
  * Merge unicodeescape_string(), PyUnicode_EncodeRawUnicodeEscape(), and 
modified_EncodeRawUnicodeEscape() into one function called 
_PyUnicode_EncodeCustomUnicodeEscape().

This patch is meant to be applied after 
unicode_escape_3_check_size_nonnegative.patch.

--
Added file: 
http://bugs.python.org/file15813/unicode_escape_4_eliminate_duplicate_code.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


Removed file: http://bugs.python.org/file15748/unicode_escape_reorg.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Attaching updated unicode_escape_reorg.patch.  This addresses two additional 
issues:
  * removes pickle's workaround of raw-unicode-escape's broken escaping
  * eliminates duplicated code (the raw escape encode function was copied with 
only a slight modification in cPickle.c)

With this, all regression tests pass.

--
Added file: http://bugs.python.org/file15771/unicode_escape_reorg.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

I believe this issue is ready to be bumped to the patch review stage.  
Thoughts?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

 Does the last patch obsolete the first two?  If so please delete the 
 obsolete ones.

Yes and no -- it depends on what the core Python developers want and are 
comfortable with:
  * unicode_escape_single_quotes.patch:  Only escapes single quotes, simple 
patch.
  * unicode_escape_single_and_double_quotes:  Superset of the above (also 
escapes double quotes), but probably unnecessary.  Still a relatively simple 
patch.
  * unicode_escape_reorg.patch:  Superset of unicode_escape_single_quotes.patch 
that also fixes raw_unicode_escape and other small issues.  It's a bigger patch 
with a greater potential for backwards-compatibility issues.  (Pickle is an 
example:  It implemented its own workaround to address raw_unicode_escape's 
broken escaping, so fixing raw_unicode_escape ended up breaking pickle.  The 
reorg patch removes pickle's workaround, but there will probably be similar 
workarounds in other existing code.)

My preference is to have unicode_escape_reorg.patch committed, but I'm not sure 
how conservative the core developers are.  The release of Python 2.7 is 
approaching, and they may not want to take on the risk right now.  If that's 
the case, I'd be happy with applying unicode_escape_single_quotes.patch for now 
and moving unicode_escape_reorg.patch to a new issue report.

 I imagine there are might be small doc updates required, as well.

Certainly Misc/NEWS will need to be patched.  I'm unfamiliar with what else the 
devs might want for documentation, so I'd love to get some additional guidance. 
 I would also appreciate additional feedback on the technical merits of the 
reorg patch before investing too much time on updating documentation.

 I haven't looked at the patch itself, but concerning your test 
 patch:  your try/except style is unnecessary, I think.  Better to 
 just let the syntax error bubble up on its own.

OK, I'll make that change.  I added the try/except as an attempt to convert a 
known ERROR to a FAIL in case that was important for some reason.

Thanks for the feedback!

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


Removed file: http://bugs.python.org/file15746/unicode_escape_tests.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Attaching updated unit tests for the unicode_escape codec.  I removed the 
raw_unicode_escape tests and will attach a separate patch for those.

--
Added file: http://bugs.python.org/file15774/unicode_escape_tests.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Attaching updated unit tests for the raw_unicode_escape codec.

--
Added file: http://bugs.python.org/file15775/raw_unicode_escape_tests.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

 We'll need a patch that implements single and double quote escaping 
 for unicode_escape and a \u style escaping of quotes for the 
 raw_unicode_escape encoder.

OK, I'll remove unicode_escape_single_quotes.patch and update 
unicode_escape_reorg.patch.

 Other changes are not necessary.

Would you please clarify?  There are a few other (minor) bugs that were 
discovered while writing unicode_escape_reorg.patch that I think should be 
fixed:
  * the UTF-16 surrogate pair decoding logic could read past the end of the 
provided Py_UNICODE character array if the last character is between 0xD800 and 
0xDC00
  * _PyString_Resize() will be called on an empty string if the size argument 
of unicodeescape_string() is 0.  This will raise a SystemError because 
_PyString_Resize() can only be called if the object's ref count is 1 (even if 
no resizing is to take place) yet PyString_FromStringAndSize() returns a shared 
empty string instance if size is 0.
  * it is unclear what unicodeescape_string() should do if size  0

Beyond those issues, I'm worried about manageability stemming from the amount 
of code duplication.  If a bug is found in one of those encoding functions, the 
other two will likely need updating.

 The pickle copy of the codec can be left untouched (both cPickle.c 
 and pickle.py) - it doesn't matter whether quotes are escaped or not 
 in the pickle data stream.

Unfortunately, pickle.py must be modified because it does its own backslash 
escaping before encoding with the raw_unicode_escape codec.  This means that 
backslashes would become double escaped and the decoded value would differ 
(confirmed by running the pickle unit tests).

The (minor) bugs in PyUnicode_EncodeRawUnicodeEscape() are also present in 
cPickle.c, so they should probably be fixed as well.

 The codecs' encode direction is not defined anywhere in the 
 documentation, AFAIK, and basically an implementation detail.

I read the escape codec documentation (see the original post) as implying that 
the encoders can generate eval-able string literals.  I'll add some clarifying 
statements.

Thanks for the feedback!

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


Removed file: 
http://bugs.python.org/file15716/unicode_escape_single_quotes.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-05 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Attached is a patch to the unicode unit tests.  It adds tests for the following:
  * unicode_escape escapes single quotes
  * raw_unicode_escape escapes single quotes
  * raw_unicode_escape escapes backslashes

--
Added file: http://bugs.python.org/file15746/unicode_escape_tests.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-05 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


Removed file: http://bugs.python.org/file15742/unicode_escape_reorg.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-05 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Attaching updated unicode_escape_reorg.patch.  This fixes two additional issues:
  * don't call _PyString_Resize() on an empty string because there is only one 
empty string instance, and that instance is returned when creating an empty 
string
  * make sure the size parameter is nonnegative

--
Added file: http://bugs.python.org/file15748/unicode_escape_reorg.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-04 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

I thought about raw_unicode_escape more, and there's a way to escape quotes:  
use unicode escape sequences (e.g., ur'\u0027').  I've attached a new patch 
that does the following:

 * backslash-escapes single quotes when encoding with the unicode_escape codec 
(the original subject of this bug)
 * replaces single quotes with \u0027 when encoding with the raw_unicode_escape 
codec (a separate bug not related to the original report, but brought up in 
comments)
 * replaces backslashes with \u005c when encoding with the raw_unicode_escape 
codec (a separate bug not related to the original report)
 * fixes a corner-case bug where the UTF-16 surrogate pair decoding logic could 
read past the end of the provided Py_UNICODE character array (a separate bug 
not related to the original report)
 * eliminates redundant code in PyUnicode_EncodeRawUnicodeEscape() and 
unicodeescape_string()
 * general cleanup in unicodeescape_string()

The changes in the patch are non-trivial and have only been lightly tested.

--
Added file: http://bugs.python.org/file15742/unicode_escape_reorg.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-03 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

 If we change this, the encoder should quote both single and double 
 quotes - simply because it is not known whether the literal 
 will use single or double quotes.

Or document that single quotes are always escaped so that the user knows he/she 
can safely use u''.  I'm not sure if there is a use case where both would 
*need* to be escaped, and escaping both has a size penalty.

I've attached an untested patch that escapes both.

If both are escaped, then the behavior of the string_escape codec should also 
be changed for consistency (it only escapes single quotes).

 The raw_unicode_escape codec would have to be fixed as well.

I'm not sure there's anything to fix.  Adding backslashes to quotes in raw 
strings changes the value of the string -- the backslashes prevent the quotes 
from ending the string literal, but they are not removed when the raw literal 
is evaluated.

Perhaps raw_unicode_escape should be fixed by raising an exception when it 
contains any quotes.

--
Added file: 
http://bugs.python.org/file15729/unicode_escape_single_and_double_quotes.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2009-12-31 Thread Richard Hansen

New submission from Richard Hansen rhan...@bbn.com:

The description of the unicode_escape codec says that it produces a
string that is suitable as Unicode literal in Python source code. [1] 
Unfortunately, this is not true as it does not escape quotes.  For example:

  print u'a\'bc\'\'\'de'.encode('unicode_escape')

outputs:

  a'bc'''de

I have attached a patch that fixes this issue by escaping single quotes.
 With the patch applied, the output is:

  a\'bc\'\'\'de

I chose to only escape single quotes because:
  1.  it simplifies the patch, and
  2.  it matches string_escape's behavior.


[1] http://docs.python.org/library/codecs.html#standard-encodings

--
components: Unicode
files: unicode_escape_single_quotes.patch
keywords: patch
messages: 97112
nosy: rhansen
severity: normal
status: open
title: unicode_escape codec does not escape quotes
type: behavior
versions: Python 2.6, Python 2.7
Added file: http://bugs.python.org/file15716/unicode_escape_single_quotes.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com