[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL
Changes by Richard Hansen rhan...@bbn.com: Added file: http://bugs.python.org/file37954/doc_c-api_buffer.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23352 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL
Changes by Richard Hansen rhan...@bbn.com: Removed file: http://bugs.python.org/file37952/doc_c-api_buffer.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23352 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL
Changes by Richard Hansen rhan...@bbn.com: Added file: http://bugs.python.org/file37953/doc_c-api_buffer.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23352 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL
Changes by Richard Hansen rhan...@bbn.com: Removed file: http://bugs.python.org/file37953/doc_c-api_buffer.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23352 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL
Richard Hansen added the comment: When I compile and run the above (latest Cython from Git master), I get: () () With Cython version 0.20.1post0 I get: foo.foo() (-1,) (-1,) If you get the correct output from the latest Cython, it looks like this issue has been fixed. Oops, I was running a locally modified version of Cython that contained a patch meant to work around issue #23349. When I run the *actual* upstream master I get the same behavior that you do. But either way, I don't see why it's a problem that it prints (-1,) for the PyBUF_ND case. The consumer didn't request suboffsets information so I would expect it to be OK for the producer to put whatever it wants in the suboffsets field, including a junk pointer that would segfault upon dereference. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23352 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL
Richard Hansen added the comment: How might an application break with this change? assert(view-suboffsets == NULL); Fair point. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23352 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL
Richard Hansen added the comment: This leaves me +-0 for the change, with the caveat that applications might break. How might an application break with this change? Compared to the current PyBuffer_IsContiguous(), the patched version is the same except it returns true for a wider range of actually-contiguous buffers. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23352 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL
Richard Hansen added the comment: My preference is to apply the patch, of course. There is a legitimate concern that it will break existing code, but I think there are more points in favor of applying the patch: * there exists code that the current behavior is known to break * it's easier to remove assert(view-suboffsets == NULL) than to change producers to not provide an all-negative array (though I admit there may be other more subtle ways the patch would break existing code) * I may be the only one who has spoken up about this, but I doubt I'm the only one who has experienced it -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23352 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL
Richard Hansen added the comment: (The following message is mostly off-topic but I think it is relevant to those interested in this issue. This message is about the clarity of the documentation regarding flag semantics, and what I think the flags should mean.) Cython doesn't follow the spec though (use Python 3): from _testbuffer import * cpdef foo(): cdef unsigned char[:] v = bytearray(btesting) nd = ndarray(v, getbuf=PyBUF_ND) print(nd.suboffsets) nd = ndarray(v, getbuf=PyBUF_FULL) print(nd.suboffsets) When I compile and run the above (latest Cython from Git master), I get: () () Looking at the Python source code (Modules/_testbuffer.c ndarray_get_suboffsets() and ssize_array_as_tuple()) the above output is printed if the suboffsets member is NULL. Cython hands out suboffsets even for a PyBUF_ND request, which is definitely wrong. For a PyBUF_FULL request they should only be provided *if needed*. suboffsets appears to be NULL in both cases in your example, which seems acceptable to me given: * the user didn't request suboffsets information in the PyBUF_ND case, and * there are no indirections in the PyBUF_FULL case. Even if suboffsets wasn't NULL, I believe the behavior would still be correct for the PyBUF_ND case. My reading of https://docs.python.org/3/c-api/buffer.html is that flags=PyBUF_ND implies that shape member MUST NOT be NULL, but strides and suboffsets can be NULL or not (it doesn't matter). This interpretation of mine is due to the sentence, Note that each flag contains all bits of the flags below it. If flags=PyBUF_ND meant that strides and suboffsets MUST be NULL, then PyBUF_ND and PyBUF_STRIDES would necessarily be mutually exclusive and flags=(PyBUF_ND|PyBUF_STRIDES) would not make sense (the strides member would have to be both NULL and non-NULL). If (flags PyBUF_INDIRECT) is false, then the consumer is not interested in the suboffsets member so it shouldn't matter what it points to. (And the consumer should not dereference the pointer in case it points to a junk address.) IMHO, if the buffer is C-style contiguous then the producer should be allowed to populate the shape, strides, and suboffsets members regardless of whether or not any of the PyBUF_ND, PyBUF_STRIDES, or PyBUF_INDIRECT flags are set. In other words, for C-style contiguous buffers, the producer should be allowed to act as if PyBUF_INDIRECT was always provided because the consumer will always get an appropriate Py_buffer struct regardless of the actual state of the PyBUF_ND, PyBUF_STRIDES, and PyBUF_INDIRECT flags. It *is* a bug, however, to dereference the strides or suboffsets members with ndarray(v, getbuf=PyBUF_ND) because the consumer didn't ask for strides or suboffsets information and the pointers might be bogus. Stepping back a bit, I believe that the flags should be thought of as imposing requirements on the producer. I think those requirements should be (assuming ndim 0): * PyBUF_ND: If (flags PyBUF_ND) is true: - If (flags PyBUF_STRIDES) is false *and* the producer is unable to produce a block of memory at [buf,buf+len) containing all (len/itemsize) entries in a C-style contiguous chunk, then the producer must raise an exception. - Otherwise, the producer must provide the shape of buffer. If (flags PyBUF_ND) is false: - If the producer is unable to produce a contiguous chunk of (len/itemsize) entries (of any shape) in the block of memory at [buf,buf+len), then the producer must raise an exception. - Otherwise, the producer is permitted to do any of the following: + don't touch the shape member (don't set it to NULL or any other value; just leave it as-is) + set the shape member to NULL + set the shape member to point to an array describing the shape + set the shape member to point to any other location, even if dereferencing the pointer would result in a segfault * PyBUF_STRIDES: If (flags PyBUF_STRIDES) is true: - The producer must provide the appropriate strides information. If (flags PyBUF_STRIDES) is false: - If the producer is unable to produce a block of memory at [buf,buf+len) containing all (len/itemsize) entries, the producer must raise an exception. - Otherwise, the producer is permitted to do any of the following; + don't touch the strides member (don't set it to NULL or any other value; just leave it as-is) + set the strides member to NULL + set the strides member to point to an array describing the strides + set the strides member to point to any other location, even if dereferencing the pointer would result in a segfault * PyBUF_INDIRECT: If (flags PyBUF_INDIRECT) is true
[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL
Richard Hansen added the comment: Attached is a documentation patch that adds what I said in msg235141. I doubt everyone will agree with the changes, but maybe it will be a useful starting point. (Despite not having an asterisk next to my username, I have signed the contributor agreement. It just hasn't been processed yet.) -- Added file: http://bugs.python.org/file37952/doc_c-api_buffer.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23352 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL
Richard Hansen added the comment: People might rely on the fact that contiguous implies suboffsets==NULL. Cython (currently) relies on all-negatives being acceptable and equivalent to suboffsets==NULL. See: https://github.com/cython/cython/pull/367 http://thread.gmane.org/gmane.comp.python.cython.devel/15569 -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23352 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL
Changes by Richard Hansen rhan...@bbn.com: -- type: - behavior ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23352 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL
Richard Hansen added the comment: The patch has an obvious syntax error :-) Doh! Other than that, adding a comment would be nice. Agreed, will do. Bonus points if you can write a test (3.x has infrastructure for that, see Lib/test/test_buffer.py). I'll take a look. Thanks! -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23352 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL
New submission from Richard Hansen: According to https://docs.python.org/2/c-api/buffer.html#the-new-style-py-buffer-struct if the suboffsets member of Py_buffer is non-NULL and all members of the array are negative, the buffer may be contiguous. PyBuffer_IsContiguous() does not behave this way. It assumes that if the suboffsets member is non-NULL then at least one member of the array is non-negative, and thus assumes that the buffer is non-contiguous. This is not always correct. One consequence of this bug is PyBuffer_ToContiguous() (and thus memoryview.tobytes()) falls back to slow (and currently buggy, see issue #23349) memory copying code. Attached is a patch that fixes this bug. The patch is against 2.7 but can be trivially modified to apply to 3.x. -- components: Interpreter Core files: PyBuffer_IsContiguous.patch keywords: patch messages: 235014 nosy: rhansen priority: normal severity: normal status: open title: PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6 Added file: http://bugs.python.org/file37914/PyBuffer_IsContiguous.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23352 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23349] memoryview.to_bytes() and PyBuffer_ToContiguous() off-by-one error for non-contiguous buffers
New submission from Richard Hansen: PyBuffer_ToContiguous() has an off-by-one error when copying a buffer it thinks is non-contiguous. To reproduce, put the following in foo.pyx and compile with Cython v0.21.2: cpdef foo(): cdef unsigned char[:] v = bytearray(testing) print repr(memoryview(v).tobytes()) import foo foo.foo() 'estingt' (This issue was fixed for Python 3.x in issue #12834 but it was not fixed for Python 2.7.) -- components: Interpreter Core files: PyBuffer_ToContiguous.patch keywords: patch messages: 235011 nosy: rhansen priority: normal severity: normal status: open title: memoryview.to_bytes() and PyBuffer_ToContiguous() off-by-one error for non-contiguous buffers type: behavior versions: Python 2.7 Added file: http://bugs.python.org/file37912/PyBuffer_ToContiguous.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23349 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23349] PyBuffer_ToContiguous() off-by-one error for non-contiguous buffers
Changes by Richard Hansen rhan...@bbn.com: -- title: memoryview.to_bytes() and PyBuffer_ToContiguous() off-by-one error for non-contiguous buffers - PyBuffer_ToContiguous() off-by-one error for non-contiguous buffers ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23349 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL
Changes by Richard Hansen rhan...@bbn.com: -- versions: -Python 3.2, Python 3.3 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23352 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL
Changes by Richard Hansen rhan...@bbn.com: -- versions: +Python 3.2, Python 3.3 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23352 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23352] PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL
Richard Hansen added the comment: I've attached a new version of the patch. Suggestions for simplifying the test code would be appreciated. -- Added file: http://bugs.python.org/file37916/PyBuffer_IsContiguous_v2.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23352 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Richard Hansen rhan...@bbn.com added the comment: Any comments on the patches? I'd love to see at least patches 1-3 make it into Python 2.7. :) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Changes by Richard Hansen rhan...@bbn.com: Removed file: http://bugs.python.org/file15729/unicode_escape_single_and_double_quotes.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Changes by Richard Hansen rhan...@bbn.com: Removed file: http://bugs.python.org/file15771/unicode_escape_reorg.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Changes by Richard Hansen rhan...@bbn.com: Removed file: http://bugs.python.org/file15774/unicode_escape_tests.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Changes by Richard Hansen rhan...@bbn.com: Removed file: http://bugs.python.org/file15775/raw_unicode_escape_tests.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Richard Hansen rhan...@bbn.com added the comment: Attaching updated unit tests. The tests now check to make sure that single and double quotes are escaped. -- Added file: http://bugs.python.org/file15809/unicode_escape_tests.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Richard Hansen rhan...@bbn.com added the comment: Attaching a minimal patch: * unicode_escape now backslash-escapes single and double quotes * raw_unicode_escape now unicode-escapes single and double quotes * raw_unicode_escape now unicode-escapes backslashes * removes pickle's escaping workarounds so that it continues to work * updates documentation -- Added file: http://bugs.python.org/file15810/unicode_escape_1_minimal.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Richard Hansen rhan...@bbn.com added the comment: Attaching a patch for an issue discovered while looking at the code: * The UTF-16 decode logic in the Unicode escape encoders no longer reads past the end of the provided Py_UNICODE buffer if the last character's value is between 0xD800 and 0xDC00. This patch is meant to be applied after unicode_escape_1_minimal.patch. -- Added file: http://bugs.python.org/file15811/unicode_escape_2_utf-16_invalid_read.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Richard Hansen rhan...@bbn.com added the comment: Attaching a patch for another issue discovered while looking at the code: * The Unicode escape encoders now check to make sure that the provided size is nonnegative. * C API documentation updated to make it clear that size must be nonnegative. This patch is meant to be applied after unicode_escape_2_utf-16_invalid_read.patch. -- Added file: http://bugs.python.org/file15812/unicode_escape_3_check_size_nonnegative.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Richard Hansen rhan...@bbn.com added the comment: Attaching a patch that eliminates duplicate code: * Merge unicodeescape_string(), PyUnicode_EncodeRawUnicodeEscape(), and modified_EncodeRawUnicodeEscape() into one function called _PyUnicode_EncodeCustomUnicodeEscape(). This patch is meant to be applied after unicode_escape_3_check_size_nonnegative.patch. -- Added file: http://bugs.python.org/file15813/unicode_escape_4_eliminate_duplicate_code.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Changes by Richard Hansen rhan...@bbn.com: Removed file: http://bugs.python.org/file15748/unicode_escape_reorg.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Richard Hansen rhan...@bbn.com added the comment: Attaching updated unicode_escape_reorg.patch. This addresses two additional issues: * removes pickle's workaround of raw-unicode-escape's broken escaping * eliminates duplicated code (the raw escape encode function was copied with only a slight modification in cPickle.c) With this, all regression tests pass. -- Added file: http://bugs.python.org/file15771/unicode_escape_reorg.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Richard Hansen rhan...@bbn.com added the comment: I believe this issue is ready to be bumped to the patch review stage. Thoughts? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Richard Hansen rhan...@bbn.com added the comment: Does the last patch obsolete the first two? If so please delete the obsolete ones. Yes and no -- it depends on what the core Python developers want and are comfortable with: * unicode_escape_single_quotes.patch: Only escapes single quotes, simple patch. * unicode_escape_single_and_double_quotes: Superset of the above (also escapes double quotes), but probably unnecessary. Still a relatively simple patch. * unicode_escape_reorg.patch: Superset of unicode_escape_single_quotes.patch that also fixes raw_unicode_escape and other small issues. It's a bigger patch with a greater potential for backwards-compatibility issues. (Pickle is an example: It implemented its own workaround to address raw_unicode_escape's broken escaping, so fixing raw_unicode_escape ended up breaking pickle. The reorg patch removes pickle's workaround, but there will probably be similar workarounds in other existing code.) My preference is to have unicode_escape_reorg.patch committed, but I'm not sure how conservative the core developers are. The release of Python 2.7 is approaching, and they may not want to take on the risk right now. If that's the case, I'd be happy with applying unicode_escape_single_quotes.patch for now and moving unicode_escape_reorg.patch to a new issue report. I imagine there are might be small doc updates required, as well. Certainly Misc/NEWS will need to be patched. I'm unfamiliar with what else the devs might want for documentation, so I'd love to get some additional guidance. I would also appreciate additional feedback on the technical merits of the reorg patch before investing too much time on updating documentation. I haven't looked at the patch itself, but concerning your test patch: your try/except style is unnecessary, I think. Better to just let the syntax error bubble up on its own. OK, I'll make that change. I added the try/except as an attempt to convert a known ERROR to a FAIL in case that was important for some reason. Thanks for the feedback! -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Changes by Richard Hansen rhan...@bbn.com: Removed file: http://bugs.python.org/file15746/unicode_escape_tests.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Richard Hansen rhan...@bbn.com added the comment: Attaching updated unit tests for the unicode_escape codec. I removed the raw_unicode_escape tests and will attach a separate patch for those. -- Added file: http://bugs.python.org/file15774/unicode_escape_tests.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Richard Hansen rhan...@bbn.com added the comment: Attaching updated unit tests for the raw_unicode_escape codec. -- Added file: http://bugs.python.org/file15775/raw_unicode_escape_tests.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Richard Hansen rhan...@bbn.com added the comment: We'll need a patch that implements single and double quote escaping for unicode_escape and a \u style escaping of quotes for the raw_unicode_escape encoder. OK, I'll remove unicode_escape_single_quotes.patch and update unicode_escape_reorg.patch. Other changes are not necessary. Would you please clarify? There are a few other (minor) bugs that were discovered while writing unicode_escape_reorg.patch that I think should be fixed: * the UTF-16 surrogate pair decoding logic could read past the end of the provided Py_UNICODE character array if the last character is between 0xD800 and 0xDC00 * _PyString_Resize() will be called on an empty string if the size argument of unicodeescape_string() is 0. This will raise a SystemError because _PyString_Resize() can only be called if the object's ref count is 1 (even if no resizing is to take place) yet PyString_FromStringAndSize() returns a shared empty string instance if size is 0. * it is unclear what unicodeescape_string() should do if size 0 Beyond those issues, I'm worried about manageability stemming from the amount of code duplication. If a bug is found in one of those encoding functions, the other two will likely need updating. The pickle copy of the codec can be left untouched (both cPickle.c and pickle.py) - it doesn't matter whether quotes are escaped or not in the pickle data stream. Unfortunately, pickle.py must be modified because it does its own backslash escaping before encoding with the raw_unicode_escape codec. This means that backslashes would become double escaped and the decoded value would differ (confirmed by running the pickle unit tests). The (minor) bugs in PyUnicode_EncodeRawUnicodeEscape() are also present in cPickle.c, so they should probably be fixed as well. The codecs' encode direction is not defined anywhere in the documentation, AFAIK, and basically an implementation detail. I read the escape codec documentation (see the original post) as implying that the encoders can generate eval-able string literals. I'll add some clarifying statements. Thanks for the feedback! -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Changes by Richard Hansen rhan...@bbn.com: Removed file: http://bugs.python.org/file15716/unicode_escape_single_quotes.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Richard Hansen rhan...@bbn.com added the comment: Attached is a patch to the unicode unit tests. It adds tests for the following: * unicode_escape escapes single quotes * raw_unicode_escape escapes single quotes * raw_unicode_escape escapes backslashes -- Added file: http://bugs.python.org/file15746/unicode_escape_tests.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Changes by Richard Hansen rhan...@bbn.com: Removed file: http://bugs.python.org/file15742/unicode_escape_reorg.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Richard Hansen rhan...@bbn.com added the comment: Attaching updated unicode_escape_reorg.patch. This fixes two additional issues: * don't call _PyString_Resize() on an empty string because there is only one empty string instance, and that instance is returned when creating an empty string * make sure the size parameter is nonnegative -- Added file: http://bugs.python.org/file15748/unicode_escape_reorg.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Richard Hansen rhan...@bbn.com added the comment: I thought about raw_unicode_escape more, and there's a way to escape quotes: use unicode escape sequences (e.g., ur'\u0027'). I've attached a new patch that does the following: * backslash-escapes single quotes when encoding with the unicode_escape codec (the original subject of this bug) * replaces single quotes with \u0027 when encoding with the raw_unicode_escape codec (a separate bug not related to the original report, but brought up in comments) * replaces backslashes with \u005c when encoding with the raw_unicode_escape codec (a separate bug not related to the original report) * fixes a corner-case bug where the UTF-16 surrogate pair decoding logic could read past the end of the provided Py_UNICODE character array (a separate bug not related to the original report) * eliminates redundant code in PyUnicode_EncodeRawUnicodeEscape() and unicodeescape_string() * general cleanup in unicodeescape_string() The changes in the patch are non-trivial and have only been lightly tested. -- Added file: http://bugs.python.org/file15742/unicode_escape_reorg.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
Richard Hansen rhan...@bbn.com added the comment: If we change this, the encoder should quote both single and double quotes - simply because it is not known whether the literal will use single or double quotes. Or document that single quotes are always escaped so that the user knows he/she can safely use u''. I'm not sure if there is a use case where both would *need* to be escaped, and escaping both has a size penalty. I've attached an untested patch that escapes both. If both are escaped, then the behavior of the string_escape codec should also be changed for consistency (it only escapes single quotes). The raw_unicode_escape codec would have to be fixed as well. I'm not sure there's anything to fix. Adding backslashes to quotes in raw strings changes the value of the string -- the backslashes prevent the quotes from ending the string literal, but they are not removed when the raw literal is evaluated. Perhaps raw_unicode_escape should be fixed by raising an exception when it contains any quotes. -- Added file: http://bugs.python.org/file15729/unicode_escape_single_and_double_quotes.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7615] unicode_escape codec does not escape quotes
New submission from Richard Hansen rhan...@bbn.com: The description of the unicode_escape codec says that it produces a string that is suitable as Unicode literal in Python source code. [1] Unfortunately, this is not true as it does not escape quotes. For example: print u'a\'bc\'\'\'de'.encode('unicode_escape') outputs: a'bc'''de I have attached a patch that fixes this issue by escaping single quotes. With the patch applied, the output is: a\'bc\'\'\'de I chose to only escape single quotes because: 1. it simplifies the patch, and 2. it matches string_escape's behavior. [1] http://docs.python.org/library/codecs.html#standard-encodings -- components: Unicode files: unicode_escape_single_quotes.patch keywords: patch messages: 97112 nosy: rhansen severity: normal status: open title: unicode_escape codec does not escape quotes type: behavior versions: Python 2.6, Python 2.7 Added file: http://bugs.python.org/file15716/unicode_escape_single_quotes.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7615 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com