Edit report at http://bugs.php.net/bug.php?id=54369&edit=1

 ID:                 54369
 Comment by:         tomas dot brastavicius at quantum dot lt
 Reported by:        tomas dot brastavicius at quantum dot lt
 Summary:            parse_url() incorrectly determines the start of query
                     and fragment parts of URL
 Status:             Open
 Type:               Bug
 Package:            URL related
 PHP Version:        Irrelevant
 Block user comment: N
 Private report:     N

 New Comment:

Another comment about this issue:
http://marc.info/?l=php-internals&m=130183032307080&w=2

@Peter

Yes, according to RFC 1738 the test URLs are not valid. But:

1. Nowhere is it defined that parse_url() parses URLs according to RFC 1738.
2. parse_url() "is not meant to validate given URL". See
   http://php.net/manual/en/function.parse-url.php
3. Why is it better to return an invalid hostname ("#" and "/" are invalid
   characters; current parse_url() version) than an invalid query or
   fragment (patched parse_url() version)?

@tokul at users dot sourceforge dot net

Checked.

My arguments for accepting the patch are as follows:

1. The "Return Values" section of the parse_url() documentation clearly
   states that the query and fragment components start after the "?" and
   "#" characters respectively.
2. I don't know of any specification that allows "#" and "?" in hostnames
   (does anyone?), but I know of at least RFC 3986 (which, unfortunately,
   I am working with) that allows the "/" character in both the query and
   fragment parts. See
   http://tools.ietf.org/html/rfc3986.html#section-3.4 and
   http://tools.ietf.org/html/rfc3986.html#section-3.5
3. It has already been stated (although in a different context) that
   parse_url() parses URLs according to RFC 3986. See
   http://bugs.php.net/bug.php?id=50484. Maybe Adam Harvey knows more?


Previous Comments:
------------------------------------------------------------------------
[2011-04-03 14:10:58] tokul at users dot sourceforge dot net

Check the URL encoding documentation first.
http://en.wikipedia.org/wiki/Percent-encoding

Then fix your $url value. You are using a reserved character for another
purpose.

------------------------------------------------------------------------
[2011-03-24 15:46:33] tomas dot brastavicius at quantum dot lt

Description:
------------
Attached patch fixes the issue.

Test script:
---------------
<?php
$url = 'http://www.example.com#fra/gment';
echo $url . "\n";
var_dump(parse_url($url));

$url = 'http://www.example.com?p=1/param';
echo $url . "\n";
var_dump(parse_url($url));

// No host, should return false
$url = 'http://#fra/gment';
echo $url . "\n";
var_dump(parse_url($url));

// No host, should return false
$url = 'http://?p=1/param';
echo $url . "\n";
var_dump(parse_url($url));

Expected result:
----------------
http://www.example.com#fra/gment
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(15) "www.example.com"
  ["fragment"]=>
  string(9) "fra/gment"
}
http://www.example.com?p=1/param
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(15) "www.example.com"
  ["query"]=>
  string(9) "p=1/param"
}
http://#fra/gment
bool(false)
http://?p=1/param
bool(false)

Actual result:
--------------
http://www.example.com#fra/gment
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(19) "www.example.com#fra"
  ["path"]=>
  string(6) "/gment"
}
http://www.example.com?p=1/param
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(19) "www.example.com?p=1"
  ["path"]=>
  string(6) "/param"
}
http://#fra/gment
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(4) "#fra"
  ["path"]=>
  string(6) "/gment"
}
http://?p=1/param
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(4) "?p=1"
  ["path"]=>
  string(6) "/param"
}

------------------------------------------------------------------------
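
For comparison, the delimiter rule argued for in this report (the authority ends at the first "/", "?" or "#"; the query runs from "?" up to "#"; the fragment is everything after "#") can be sketched with the regular expression given in RFC 3986, Appendix B. This is only an illustration under stated assumptions: split_uri_rfc3986() is a hypothetical helper name, it is neither the attached patch nor the way parse_url() is implemented, it does not split the authority into user, host and port, it performs no validation, and it assumes PHP 7 or later.

```php
<?php
// Illustrative sketch only. split_uri_rfc3986() is a hypothetical helper,
// not part of PHP and not the patch attached to this report.
// The pattern is the one given in RFC 3986, Appendix B.
function split_uri_rfc3986(string $uri): array
{
    $pattern = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';
    preg_match($pattern, $uri, $m);

    $parts = [];
    // Group 1 is "scheme:", group 2 is the scheme itself.
    if (($m[1] ?? '') !== '') {
        $parts['scheme'] = $m[2];
    }
    // Group 3 is "//authority"; the authority (group 4) may be empty.
    if (($m[3] ?? '') !== '') {
        $parts['authority'] = $m[4] ?? '';
    }
    // Group 5 is the path; it stops at the first "?" or "#".
    if (($m[5] ?? '') !== '') {
        $parts['path'] = $m[5];
    }
    // Group 6 is "?query"; the query (group 7) may contain "/".
    if (($m[6] ?? '') !== '') {
        $parts['query'] = $m[7] ?? '';
    }
    // Group 8 is "#fragment"; the fragment (group 9) may contain "/".
    if (($m[8] ?? '') !== '') {
        $parts['fragment'] = $m[9] ?? '';
    }
    return $parts;
}
```

Under these assumptions, 'http://www.example.com#fra/gment' splits into scheme "http", authority "www.example.com" and fragment "fra/gment", and 'http://www.example.com?p=1/param' into scheme "http", authority "www.example.com" and query "p=1/param". For 'http://#fra/gment' the authority comes back as an empty string, which a caller could treat as the "no host" failure case.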