[PHP-BUG] Bug #53596 [NEW]: Function iconv_mime_decode failed to decode utf-8 header

2010-12-23 Thread anton dot a dot minin at gmail dot com
From:             anton dot a dot minin at gmail dot com
Operating system: CentOS release 5.5
PHP version:      5.3.4
Package:          ICONV related
Bug Type:         Bug
Bug description:  Function iconv_mime_decode failed to decode utf-8 header

Description:

---

From manual page: http://www.php.net/function.iconv-mime-decode

---



iconv_mime_decode() cannot decode a header that contains non-ASCII
characters if the charset differs from ISO-8859-1.

For example, iconv_mime_decode() fails on this string:

Subject: =?utf-8?Q?=D0=9F=D1=80=D0=B8=D0=B2=D0=B5=D1=82,=20=D0=9C=D0=B5?=
 =?utf-8?Q?=D0=B4=D0=B2=D0=B5=D0=B4!=20(Hello,=20Bear!)?=

Test script:
---
<?php

$plan = array(
    // Encoding non-ASCII text as ISO-8859-1 is wrong, but in this case
    // encode and decode are inverse functions,
    // i.e. $a == decode(encode($a))
    array(
        'description' => "Non-ASCII characters, ISO-8859-1 to ISO-8859-1 conversion",
        'subject'     => "Привет, Медведь! (Hello, Bear!)",
        'prefs'       => array(
            'input-charset'  => 'iso-8859-1',
            'output-charset' => 'iso-8859-1',
        )
    ),
    // unfortunately fails
    array(
        'description' => "Non-ASCII characters and UTF-8",
        'subject'     => "Привет, Медвед! (Hello, Bear!)",
        'prefs'       => array(
            'input-charset'  => 'utf-8',
            'output-charset' => 'utf-8',
        )
    ),
    array(
        'description' => "Only ASCII characters and UTF-8",
        'subject'     => "Hello, Bear!",
        'prefs'       => array(
            'input-charset'  => 'utf-8',
            'output-charset' => 'utf-8',
        )
    ),
    array(
        'description' => "Only ASCII characters and Windows-1251 charset",
        'subject'     => "Hello, Bear!",
        'prefs'       => array(
            'input-charset'  => 'utf-8',
            'output-charset' => 'windows-1251',
        )
    ),
    array(
        'description' => "Non-ASCII characters and Windows-1251 charset",
        'subject'     => "Привет, Медведь! (Hello, Bear!)",
        'prefs'       => array(
            'input-charset'  => 'utf-8',
            'output-charset' => 'windows-1251',
        )
    )
);

foreach ($plan as $case) {

    printf("\n\nStart: %s\n%s\n", $case['description'], str_repeat("=", 80));

    $prefs = $case['prefs'];
    $prefs['scheme'] = 'Q';

    $subject_encoded = iconv_mime_encode('Subject', $case['subject'], $prefs);

    printf("Encoded subject: %s\n", var_export($subject_encoded, 1));

    if (!$subject_encoded) {
        $status = 'FAILED due to iconv_mime_encode';
    } else {
        // Only checks that decoding does not return false
        $status = false === iconv_mime_decode($subject_encoded) ? 'FAILED' : 'PASSED';
    }

    printf("[%s] %s\n", $status, $case['description']);
}

echo "\n";



Expected result:

All tests should pass.

Actual result:
--
Start: Non-ASCII characters, ISO-8859-1 to ISO-8859-1 conversion

Encoded subject: 'Subject: =?iso-8859-1?Q?=D0=9F=D1=80=D0=B8=D0=B2=D0=B5=D1=82,?=
 =?iso-8859-1?Q?=20=D0=9C=D0=B5=D0=B4=D0=B2=D0=B5=D0=B4=D1=8C!=20(Hello,?=
 =?iso-8859-1?Q?=20Bear!)?='
[PASSED] Non-ASCII characters, ISO-8859-1 to ISO-8859-1 conversion


Start: Non-ASCII characters and UTF-8

Encoded subject: 'Subject: =?utf-8?Q?=D0=9F=D1=80=D0=B8=D0=B2=D0=B5=D1=82,=20=D0=9C=D0=B5?=
 =?utf-8?Q?=D0=B4=D0=B2=D0=B5=D0=B4!=20(Hello,=20Bear!)?='
[FAILED] Non-ASCII characters and UTF-8


Start: Only ASCII characters and UTF-8

Encoded subject: 'Subject: =?utf-8?Q?Hello,=20Bear!?='
[PASSED] Only ASCII characters and UTF-8


Start: Only ASCII characters and Windows-1251 charset

Encoded subject: 'Subject: =?windows-1251?Q?Hello,=20Bear!?='
[PASSED] Only ASCII characters and Windows-1251 charset


Start: Non-ASCII characters and Windows-1251 charset

Encoded subject: 'Subject: =?windows-1251?Q?=CF=F0=E8=E2=E5=F2,=20=CC=E5=E4=E2=E5=E4=FC!=20(?=
 =?windows-1251?Q?Hello,=20Bear!)?='
[FAILED] Non-ASCII characters and Windows-1251 charset


Bug #53596 [Opn-Csd]: Function iconv_mime_decode failed to decode utf-8 header

2010-12-23 Thread anton dot a dot minin at gmail dot com

 ID:                 53596
 User updated by:    anton dot a dot minin at gmail dot com
 Reported by:        anton dot a dot minin at gmail dot com
 Summary:            Function iconv_mime_decode failed to decode utf-8 header
-Status:             Open
+Status:             Closed
 Type:               Bug
 Package:            ICONV related
 Operating System:   CentOS release 5.5
 PHP Version:        5.3.4
 Block user comment: N
 Private report:     N

 New Comment:

With the php.ini option iconv.internal_encoding=utf-8 it works properly.
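
A per-call alternative to changing php.ini, sketched here as an assumption rather than something tested in this report: iconv_mime_decode() accepts a target charset as its third argument, and when that argument is omitted it falls back to the configured internal encoding. Passing 'UTF-8' explicitly should therefore have the same effect as the iconv.internal_encoding=utf-8 setting. The header below is the one from the report; the variable names are illustrative only.

```php
<?php
// The failing header from the report, folded with CRLF + space.
$header = "Subject: =?utf-8?Q?=D0=9F=D1=80=D0=B8=D0=B2=D0=B5=D1=82,=20=D0=9C=D0=B5?=\r\n"
        . " =?utf-8?Q?=D0=B4=D0=B2=D0=B5=D0=B4!=20(Hello,=20Bear!)?=";

// Second argument: mode (0 = default, strict decoding).
// Third argument: charset of the decoded result. Given explicitly,
// the call no longer depends on iconv.internal_encoding.
$decoded = iconv_mime_decode($header, 0, 'UTF-8');

if ($decoded === false) {
    echo "decode failed\n";
} else {
    echo $decoded, "\n";
}
```

This keeps the behaviour local to the call, which may be preferable when php.ini cannot be changed or when other code relies on a different internal encoding.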


Previous Comments:

[2010-12-23 11:02:27] anton dot a dot minin at gmail dot com
