New submission from Serhiy Storchaka <storchaka+cpyt...@gmail.com>:

There is a flaw in PyUnicode_DecodeCodePageStateful() (exposed as 
_codecs.code_page_decode() at the Python level). Since MultiByteToWideChar() 
takes the size of the input as a C int, it cannot be used for decoding more 
than 2 GiB at once. Larger input is split into chunks of 2 GiB, which are 
decoded separately. The problem arises when the input is split in the middle 
of a multibyte character. In this case decoding the chunks will always either 
fail or replace the incomplete parts of the multibyte character at both ends 
with what the error handler returns.
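
The effect can be illustrated at a small scale without Windows or gigabytes 
of data. The sketch below is only an analogy, not the CPython implementation: 
it assumes UTF-8 in place of a Windows code page and a 2-byte chunk in place 
of 2 GiB, and the helper names decode_in_chunks() and 
decode_in_chunks_stateful() are hypothetical. The second helper shows one 
possible way to avoid the problem, by carrying an incomplete trailing 
sequence over to the next chunk with an incremental decoder.

```python
import codecs


def decode_in_chunks(data, encoding, chunk_size, errors="replace"):
    """Decode each fixed-size chunk independently (the flawed approach)."""
    return "".join(
        data[i:i + chunk_size].decode(encoding, errors)
        for i in range(0, len(data), chunk_size)
    )


def decode_in_chunks_stateful(data, encoding, chunk_size, errors="replace"):
    """Decode chunk by chunk, keeping incomplete trailing bytes between chunks."""
    decoder = codecs.getincrementaldecoder(encoding)(errors)
    parts = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        # Only the last chunk is final; earlier ones may end mid-character.
        final = i + chunk_size >= len(data)
        parts.append(decoder.decode(chunk, final))
    return "".join(parts)


text = "a\u00e9a\u00e9"          # "é" takes two bytes in UTF-8
data = text.encode("utf-8")      # 6 bytes: 61 c3 a9 61 c3 a9

# A chunk size of 2 cuts the first "é" in half.
naive = decode_in_chunks(data, "utf-8", 2)
stateful = decode_in_chunks_stateful(data, "utf-8", 2)

print(ascii(naive))     # both halves of the split character are replaced
print(ascii(stateful))  # round-trips to the original text
```

With the "strict" error handler the naive helper raises UnicodeDecodeError 
instead of inserting replacement characters, which corresponds to the "fail" 
case described above.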

It is hard to reproduce this bug, because you need to decode more than 2 GiB, 
and you will need at least 14 GiB of RAM for this (maybe more).

----------
components: Interpreter Core, Windows
messages: 338061
nosy: doerwalter, lemburg, paul.moore, serhiy.storchaka, steve.dower, 
tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: Flaw in Windows code page decoder for large input
type: behavior
versions: Python 2.7, Python 3.7, Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36311>
_______________________________________
_______________________________________________
Python-bugs-list mailing list