New submission from Serhiy Storchaka:

C code often uses %.<number><format> in PyUnicode_FromFormat(). %.200s 
guards against unbounded output when a broken pointer points to random 
data that is not null-terminated. %.200R is used to limit the size of 
human-readable messages.

In all these cases the formatted string can look well-formed with short data, 
but malformed with long data (an unclosed quote, a truncated backslash escape, 
or a � decoded from a truncated UTF-8 sequence).
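
To make the UTF-8 case concrete, here is a minimal sketch. It uses plain 
snprintf() instead of PyUnicode_FromFormat() so that it compiles without 
Python.h, on the assumption that a byte-counting precision behaves the same 
way for this purpose. The helper names format_truncated() and 
ends_with_partial_utf8() are hypothetical and exist only for this 
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy at most `precision` bytes of s into out,
   the way a "%.<N>s" conversion does for a C string. */
int
format_truncated(char *out, size_t outsize, const char *s, int precision)
{
    assert(out != NULL && s != NULL && precision >= 0);
    return snprintf(out, outsize, "%.*s", precision, s);
}

/* Hypothetical helper: return 1 if the first len bytes of s end in the
   middle of a multi-byte UTF-8 sequence, 0 otherwise.
   Assumption: the bytes before the cut are valid UTF-8. */
int
ends_with_partial_utf8(const char *s, size_t len)
{
    size_t i = len;
    size_t trailing = 0;   /* continuation bytes (10xxxxxx) at the end */
    unsigned char lead;
    size_t need;

    assert(s != NULL);
    while (i > 0 && trailing < 3 &&
           ((unsigned char)s[i - 1] & 0xC0) == 0x80) {
        i--;
        trailing++;
    }
    if (i == 0)
        return trailing > 0;

    lead = (unsigned char)s[i - 1];
    if (lead < 0x80)
        need = 1;                      /* ASCII */
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return 1;                      /* not a valid lead byte */

    /* Incomplete if fewer bytes are present than the lead byte announces. */
    return trailing + 1 < need;
}
```

With the UTF-8 string "caf\xC3\xA9" and a precision of 4, the copy ends with 
the lone lead byte 0xC3, which a decoder using the "replace" error handler 
would turn into �.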

I propose making truncation in PyUnicode_FromFormat() smarter.

1. Truncated %R output should keep at least the closing character (the quote 
or ">").
2. Truncated output should include "..." or "[...]" as a truncation marker.
3. \c, \OOO, \xXX, \uXXXX, and \UXXXXXXXX should not be truncated. It is better 
to omit such a sequence entirely (cut the string before it) than to output it 
truncated.
4. For %s, do not truncate in the middle of a UTF-8 sequence.
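
Points 2 and 4 could be sketched roughly as follows. This is only an 
illustration under stated assumptions, not a patch against unicodeobject.c: 
the helpers utf8_safe_cut() and truncate_utf8() are hypothetical names, the 
input is assumed to be valid null-terminated UTF-8, and the choice to count 
the "..." marker against the byte limit is mine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: largest length <= limit at which the first len
   bytes of s can be cut without splitting a UTF-8 sequence.
   Assumption: s is valid UTF-8. */
size_t
utf8_safe_cut(const char *s, size_t len, size_t limit)
{
    size_t cut;

    assert(s != NULL);
    if (len <= limit)
        return len;
    cut = limit;
    /* Continuation bytes look like 10xxxxxx. If s[cut] is one, cutting
       here would split a character, so back up to its lead byte. */
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
        cut--;
    return cut;
}

/* Hypothetical helper: copy s into out using at most `limit` bytes
   (not counting the terminating null). If s does not fit, cut it at a
   character boundary and append "..." so the truncation is visible.
   Returns the number of bytes written, excluding the null. */
size_t
truncate_utf8(char *out, size_t outsize, const char *s, size_t limit)
{
    static const char marker[] = "...";
    const size_t mlen = sizeof(marker) - 1;
    size_t len, cut;

    assert(out != NULL && s != NULL);
    if (outsize == 0)
        return 0;
    if (limit > outsize - 1)
        limit = outsize - 1;

    len = strlen(s);
    if (len <= limit) {
        memcpy(out, s, len + 1);       /* fits: copy unchanged */
        return len;
    }
    if (limit < mlen) {                /* no room even for the marker */
        out[0] = '\0';
        return 0;
    }
    cut = utf8_safe_cut(s, len, limit - mlen);
    memcpy(out, s, cut);
    memcpy(out + cut, marker, mlen + 1);
    return cut + mlen;
}
```

For example, truncating "caf\xC3\xA9 au lait" to a limit of 7 bytes leaves 4 
bytes for text; byte 4 is the second half of "é", so the cut moves back to 3 
and the result is "caf...". Point 3 would need a similar backward step over 
backslash escapes, which this sketch does not attempt.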

----------
components: Interpreter Core
messages: 258092
nosy: gvanrossum, haypo, serhiy.storchaka
priority: normal
severity: normal
status: open
title: More correct string truncating in PyUnicode_FromFormat()
type: enhancement
versions: Python 3.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26090>
_______________________________________
