Edit report at https://bugs.php.net/bug.php?id=62462&edit=1
ID: 62462
User updated by: c2h5oh at poczta dot fm
Reported by: c2h5oh at poczta dot fm
Summary: quoted_printable_encode splits line in the middle of UTF8 character
-Status: Feedback
+Status: Open
Type: Bug
Package: *Mail Related
Operating System: Linux
PHP Version: 5.3.14
Block user comment: N
Private report: N
New Comment:
From RFC 2045:
"Because quoted-printable data is generally assumed to be line-
oriented, it is to be expected that the representation of the breaks
between the lines of quoted-printable data may be altered in
transport, in the same manner that plain text mail has always been
altered in Internet mail when passing between systems with differing
newline conventions."
We have no guarantee that a split character can be merged back together during
decoding.
This has caused problems before:
https://bugzilla.mozilla.org/show_bug.cgi?id=684508
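As a rough illustration of the concern, here is a minimal sketch that decodes the first two lines of the "Actual result" output from this report, once as a whole body and once line by line. Treating each line independently is only an assumption about how an intermediate system or a naive decoder might behave; the variable names are made up for the example.

```php
<?php
// First two lines of the "Actual result" output ("\xC4\x85" is the letter a-ogonek, U+0105).
$line1 = str_repeat('=C4=85', 12) . '=C4=';
$line2 = '=85' . str_repeat('=C4=85', 12) . '=';

// Decoding the whole body at once removes the soft breaks and restores every character.
$whole = quoted_printable_decode($line1 . "\r\n" . $line2 . "\r\n");
var_dump(preg_match('//u', $whole) === 1);   // bool(true): valid UTF-8

// Decoding line 1 on its own leaves a dangling lead byte 0xC4 at the end.
$first = quoted_printable_decode($line1);
var_dump(substr($first, -1) === "\xC4");     // bool(true)
var_dump(preg_match('//u', $first) === 1);   // bool(false): not valid UTF-8
```

The whole-body decode is fine; the damage only shows up when the two halves of the character are handled separately.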
The problem is widespread enough that SwiftMailer, for example, does not use
quoted_printable_encode and instead ships its own PHP implementation, which is
more than an order of magnitude slower
(https://github.com/swiftmailer/swiftmailer/issues/220).
76 characters per line is only the upper limit: inserting the soft line break
earlier, before the start of a multi-byte character, still produces perfectly
valid encoding and avoids such problems.
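To make the proposal concrete, here is a minimal sketch of an encoder that only breaks between complete characters. qp_encode_utf8_safe is a hypothetical helper written for this comment, not an existing PHP function and not SwiftMailer's code. It is deliberately simplified: every byte outside printable ASCII (including spaces and line breaks) is escaped, which is valid quoted-printable but longer than necessary.

```php
<?php
/**
 * Hypothetical helper: quoted-printable encode a UTF-8 string, placing
 * soft line breaks only between complete characters.
 */
function qp_encode_utf8_safe(string $utf8, int $maxLen = 76): string
{
    // Split into whole UTF-8 characters; preg_split returns false on invalid input.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $out = '';
    $lineLen = 0;
    foreach ($chars as $char) {
        // Encode all bytes of one character into a single unbreakable token.
        $token = '';
        foreach (str_split($char) as $byte) {
            $ord = ord($byte);
            $token .= ($ord >= 33 && $ord <= 126 && $byte !== '=')
                ? $byte
                : sprintf('=%02X', $ord);
        }
        // Keep one column free for the "=" that marks the soft break.
        if ($lineLen + strlen($token) > $maxLen - 1) {
            $out .= "=\r\n";
            $lineLen = 0;
        }
        $out .= $token;
        $lineLen += strlen($token);
    }
    return $out;
}

// Same input as the test script in this report.
echo qp_encode_utf8_safe(str_repeat("\xC4\x85", 53));
```

With the 53-character input from the test script this yields four lines of twelve =C4=85 groups plus a trailing "=", then a final line of five groups, the same layout as the "Expected result" in the original description.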
Previous Comments:
------------------------------------------------------------------------
[2012-07-15 02:31:30] [email protected]
Could you explain why soft line breaks are a problem? The software decoding the
QP string should ignore the line breaks and reassemble the string in its
original form. Quoting from https://en.wikipedia.org/wiki/Quoted-printable:
A soft line break consists of an "=" at the end of an encoded line, and does
not appear as a line break in the decoded text.
So where does the corruption of the UTF-8 come from?
------------------------------------------------------------------------
[2012-07-02 11:42:51] c2h5oh at poczta dot fm
Description:
------------
quoted_printable_encode adds, among other things, soft line breaks if the line
length is greater than 76 characters.
If the 76th character happens to fall in the middle of an encoded UTF-8
character, that character is split across two lines, corrupting the encoded
string.
Test script:
---------------
<?php
// 53 x "ą" (U+0105), each encoded in UTF-8 as the two bytes C4 85
echo quoted_printable_encode(str_repeat('ą', 53));
Expected result:
----------------
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85
Actual result:
--------------
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=
=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=
=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85
(compare ends of each line)
------------------------------------------------------------------------