Edit report at http://bugs.php.net/bug.php?id=52923&edit=1

 ID:                 52923
 Updated by:         paj...@php.net
 Reported by:        masteram at gmail dot com
 Summary:            parse_url corrupts some UTF-8 strings
 Status:             Open
 Type:               Feature/Change Request
 Package:            *URL Functions
 Operating System:   MS Windows XP
 PHP Version:        5.3.3
 Block user comment: N

 New Comment:

What about a parse_url_utf8 function, like what we have for IDN? It
could be easy to implement using either native OS APIs (when available)
or external libraries (there are a couple of good ones out there).
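
For illustration only, here is a rough userland sketch of what such a
function might do. It is not an existing PHP function and not the
proposed implementation: the name parse_url_utf8 is taken from the
suggestion above, and the approach (percent-encode every non-ASCII byte
before calling parse_url, then decode the components) is an assumption
swapped in for the OS APIs or libraries mentioned.

```php
<?php
// Hypothetical helper, not part of PHP. Sketch of a UTF-8-tolerant
// wrapper around parse_url().
function parse_url_utf8($url)
{
    // Percent-encode every byte outside the ASCII range, so that
    // parse_url() only ever sees plain ASCII input.
    $encoded = preg_replace_callback(
        '/[\x80-\xff]/',
        function ($m) {
            return rawurlencode($m[0]);
        },
        $url
    );

    $parts = parse_url($encoded);
    if ($parts === false) {
        return false;
    }

    // Decode each string component back to its original bytes.
    // The port, if present, is an integer and is left alone.
    foreach ($parts as $key => $value) {
        if (is_string($value)) {
            $parts[$key] = rawurldecode($value);
        }
    }
    return $parts;
}
```

Note that this sketch also decodes percent-escapes that were already
present in the input (for example %2F in a path), so it is only
suitable where that is acceptable.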


Previous Comments:
------------------------------------------------------------------------
[2010-09-25 11:42:29] ras...@php.net

Reclassifying as a feature request.  parse_url has never been
multibyte-aware.

------------------------------------------------------------------------
[2010-09-25 11:09:39] masteram at gmail dot com

Description:
------------
I have tested this with PHP 5.2.9 and 5.3.3.

Some UTF-8 strings are not processed correctly by parse_url.

In the given example, strings that contain the chars 'ם' or 'א' come
out corrupt, whereas the string 'מישהו' (which does not contain either
of those chars) is processed correctly.

The affected characters (in UTF-8) consist of the following bytes:

ם - d7|9d

א - d7|90



Those are converted to a char which contains the following bytes:
d7|5f.
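
The byte values above can be checked with a short sketch. 0x5f is the
ASCII underscore, and a d7 lead byte must be followed by a continuation
byte in the range 80-bf, so d7|5f is not well-formed UTF-8. The helper
name is_valid_utf8 is made up for this example; it relies on the fact
that preg_match with the /u modifier fails on invalid UTF-8 subjects.

```php
<?php
// Hypothetical helper: true if $s is well-formed UTF-8.
// preg_match() with /u returns false when the subject is invalid UTF-8.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

$intact  = "\xd7\x9d"; // the original bytes of one affected char
$corrupt = "\xd7\x5f"; // the bytes reported after parse_url

var_dump(bin2hex($intact), is_valid_utf8($intact));
var_dump(bin2hex($corrupt), is_valid_utf8($corrupt));
var_dump(chr(0x5f)); // the replacement byte is an underscore
```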



In addition to ruining the URL, this char is not safe to pass to
preg_replace.

Therefore, if we merge the result of parse_url back into a string and
then attempt to replace, say, spaces with underscores using
preg_replace, we will get an empty string.
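
A minimal sketch of how the failure can be detected, assuming a path
that already contains the corrupt d7|5f bytes (simulated here rather
than produced by parse_url). With the /u modifier, preg_replace returns
NULL for a subject that is not valid UTF-8, which prints as an empty
string, and preg_last_error reports the cause. The function name
replace_spaces is made up for this example.

```php
<?php
// Hypothetical helper: replace whitespace runs with underscores and
// report invalid UTF-8 instead of silently returning NULL.
function replace_spaces($path)
{
    $result = preg_replace('/\s+/u', '_', $path);
    if ($result === null && preg_last_error() === PREG_BAD_UTF8_ERROR) {
        // The subject is not valid UTF-8; let the caller decide what to do.
        return false;
    }
    return $result;
}

$corruptPath = "/he/\xd7\x5f/By Year.html"; // simulated corrupt path
var_dump(preg_replace('/\s+/u', '_', $corruptPath)); // NULL
var_dump(replace_spaces($corruptPath));              // bool(false)
var_dump(replace_spaces('/he/By Year.html'));        // string "/he/By_Year.html"
```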



I believe that this is similar to bug #26391.

Test script:
---------------
<?php
$url = 'http://www.mysite.org/he/פרויקטים/ByYear.html';

$url = parse_url($url); // $url['path'] is now corrupt

// With the /u modifier, preg_replace rejects the invalid UTF-8
$url = preg_replace('/\s+/u', '_', $url['path']); // $url is now NULL

Expected result:
----------------
The correct portion of the url.

Actual result:
--------------
Corrupt string (or blank after using preg_replace).


------------------------------------------------------------------------


