Edit report at https://bugs.php.net/bug.php?id=62462&edit=1
ID: 62462
User updated by: c2h5oh at poczta dot fm
Reported by: c2h5oh at poczta dot fm
Summary: quoted_printable_encode splits line in the middle of UTF8 character
-Status: Feedback
+Status: Open
Type: Bug
Package: *Mail Related
Operating System: Linux
PHP Version: 5.3.14
Block user comment: N
Private report: N
New Comment:
The important part is "quoted-printable data is generally assumed to be
line-oriented". I don't think we can or should assume that nothing happens to
line breaks in transport.
As for "if some clients/agents break the email text, we can't really be
responsible for them and if they do that": the encoding produced after the
patch is still valid, and at the same time it prevents the email text from
being "broken".
I have asked one of the people who decided to use their own implementation of
quoted_printable_encode in SwiftMailer, instead of the built-in function, to
update this ticket with the reasoning behind that decision.
Previous Comments:
------------------------------------------------------------------------
[2012-07-16 06:41:33] [email protected]
The quoted bug is a Thunderbird issue that produces a real line break. Note the
absence of = at the end of the line here:
=CE=B1=CE=B1=CE=B1=CE=B1=CE=B1=CE=B1=CE <------- ERROR
instead of a soft break. Clearly, this is a bug, since the encoder should not
insert hard line breaks. PHP produces soft breaks, which is in full compliance
with the standard.
I'm also not sure how the RFC quote is relevant: if some clients/agents break the
email text, we can't really be responsible for them and if they do that, they
break QP encoding and something else, like base64, should be used. However, I
don't see why that makes PHP's way of encoding wrong.
Could you describe a scenario where the way PHP does things leads to something
breaking that is not due to a bug in some other product? I also read the
SwiftMailer report and couldn't find a description of any scenario where PHP's
encoding is wrong.
------------------------------------------------------------------------
[2012-07-15 12:02:06] c2h5oh at poczta dot fm
From RFC 2045:
"Because quoted-printable data is generally assumed to be line-
oriented, it is to be expected that the representation of the breaks
between the lines of quoted-printable data may be altered in
transport, in the same manner that plain text mail has always been
altered in Internet mail when passing between systems with differing
newline conventions."
We have no guarantee that it will be possible to merge a split character during
decoding.
This has caused problems before:
https://bugzilla.mozilla.org/show_bug.cgi?id=684508
The problem is widespread enough that SwiftMailer, for example, doesn't use
quoted_printable_encode but its own PHP implementation, which is more than an
order of magnitude slower
( https://github.com/swiftmailer/swiftmailer/issues/220 ).
76 characters per line is only the upper limit: adding a soft line break
earlier produces perfectly valid encoding that doesn't cause such problems.
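To make the "break earlier" idea concrete, here is a minimal sketch of an
encoder that only inserts soft line breaks between complete UTF-8 characters.
It is not the proposed patch and not SwiftMailer's code: the function name
qp_encode_utf8_safe is hypothetical, the input is assumed to be valid UTF-8
without line breaks of its own, and spaces are encoded as =20 purely to keep
the example short.

```php
<?php
// Hypothetical helper, not part of PHP: quoted-printable encoding that places
// soft line breaks only between complete UTF-8 characters.
// Assumes $input is valid UTF-8 and contains no line breaks.
function qp_encode_utf8_safe(string $input, int $maxLen = 76): string
{
    $lines = [];
    $line = '';
    // The /u modifier makes the split character-aware instead of byte-aware.
    foreach (preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $token = '';
        foreach (str_split($char) as $byte) {
            $ord = ord($byte);
            // Printable ASCII except "=" passes through; any other byte
            // (including space and tab, for simplicity) becomes =XX.
            $token .= ($ord >= 33 && $ord <= 126 && $byte !== '=')
                ? $byte
                : sprintf('=%02X', $ord);
        }
        // Keep one column free for the "=" that marks a soft line break.
        if (strlen($line) + strlen($token) > $maxLen - 1) {
            $lines[] = $line . '=';
            $line = '';
        }
        $line .= $token;
    }
    $lines[] = $line;
    return implode("\r\n", $lines);
}

// Same input as the test script in the original report: 53 x U+0105.
echo qp_encode_utf8_safe(str_repeat("\xC4\x85", 53)), "\n";
```

Because a whole character is moved to the next line whenever it would not fit,
no line exceeds 76 characters and every line decodes to valid UTF-8 on its own.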
------------------------------------------------------------------------
[2012-07-15 02:31:30] [email protected]
Could you explain why soft line breaks are a problem? The software decoding the
QP string should ignore the line breaks and reassemble the string in its
original form. Quoting from https://en.wikipedia.org/wiki/Quoted-printable:
A soft line break consists of an "=" at the end of an encoded line, and does
not appear as a line break in the decoded text.
So where does the corruption of the UTF-8 come from?
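A minimal sketch of that point, assuming the soft breaks reach the decoder
untouched: the first two lines of the reported output are joined with CRLF and
passed to quoted_printable_decode. The variable names are only for
illustration.

```php
<?php
// First two lines of the "Actual result" in the original report. The first
// line ends with "=C4=" (half of a character plus the soft break marker).
$encoded = str_repeat('=C4=85', 12) . "=C4=\r\n"
         . '=85' . str_repeat('=C4=85', 12);

$decoded = quoted_printable_decode($encoded);

// "=" followed by CRLF is dropped by the decoder, so the two halves of the
// split character end up next to each other again.
var_dump($decoded === str_repeat("\xC4\x85", 25));

// preg_match() with the /u modifier only succeeds on well-formed UTF-8.
var_dump(preg_match('//u', $decoded) === 1);
```

This only holds as long as nothing between encoder and decoder alters or
reinterprets the line breaks.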
------------------------------------------------------------------------
[2012-07-02 11:42:51] c2h5oh at poczta dot fm
Description:
------------
quoted_printable_encode adds, among other things, soft line breaks if the line
length is greater than 76 characters.
If the 76th character happens to fall in the middle of an encoded UTF-8
character, that character will be split across two lines, corrupting the
encoded string.
Test script:
---------------
<?php
// Input: 53 x "ą" (U+0105, UTF-8 bytes C4 85)
echo quoted_printable_encode(str_repeat('ą', 53));
Expected result:
----------------
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85
Actual result:
--------------
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=
=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=
=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85
(compare ends of each line)
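The comparison can also be automated. The following is a rough sketch under
the assumption that every encoded line should decode to valid UTF-8 on its own;
qp_lines_splitting_utf8 is a hypothetical helper, not a PHP built-in.

```php
<?php
// Hypothetical checker: returns the 1-based numbers of encoded lines that do
// not decode to valid UTF-8 when taken in isolation.
function qp_lines_splitting_utf8(string $encoded): array
{
    $bad = [];
    foreach (preg_split('/\r?\n/', $encoded) as $i => $line) {
        // Drop the trailing "=" of a soft break, then decode this line alone.
        $bytes = quoted_printable_decode(rtrim($line, '='));
        // preg_match() with the /u modifier fails on malformed UTF-8.
        if (preg_match('//u', $bytes) !== 1) {
            $bad[] = $i + 1;
        }
    }
    return $bad;
}

$pair = '=C4=85';

// The two results from the report, rebuilt line by line.
$expected = implode("\r\n", [
    str_repeat($pair, 12) . '=',
    str_repeat($pair, 12) . '=',
    str_repeat($pair, 12) . '=',
    str_repeat($pair, 12) . '=',
    str_repeat($pair, 5),
]);
$actual = implode("\r\n", [
    str_repeat($pair, 12) . '=C4=',
    '=85' . str_repeat($pair, 12) . '=',
    str_repeat($pair, 12) . '=C4=',
    '=85' . str_repeat($pair, 12) . '=',
    str_repeat($pair, 3),
]);

print_r(qp_lines_splitting_utf8($expected));
print_r(qp_lines_splitting_utf8($actual));
```

For the expected result no line is flagged; for the actual result the first
four lines are, since each of them begins or ends inside a two-byte character.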
------------------------------------------------------------------------