Bug #54369 [Com]: parse_url() incorrectly determines the start of query and fragment parts of URL
ID: 54369
Comment by: tokul at users dot sourceforge dot net
Reported by: tomas dot brastavicius at quantum dot lt
Summary: parse_url() incorrectly determines the start of query and fragment parts of URL
Status: Open
Type: Bug
Package: URL related
PHP Version: Irrelevant
Block user comment: N
Private report: N

New Comment:
[2011-04-03 14:10:58] tokul at users dot sourceforge dot net

Check the URL encoding documentation first:
http://en.wikipedia.org/wiki/Percent-encoding
Then fix your $url value. You are using a reserved character for another purpose.

Previous Comments:
------------------------------------------------------------------------
[2011-03-24 15:46:33] tomas dot brastavicius at quantum dot lt

Description:
------------
Attached patch fixes the issue.

Test script:
---------------
$url = 'http://www.example.com#fra/gment';
echo $url . "\n";
var_dump(parse_url($url));

$url = 'http://www.example.com?p=1/param';
echo $url . "\n";
var_dump(parse_url($url));

// No host, should return false
$url = 'http://#fra/gment';
echo $url . "\n";
var_dump(parse_url($url));

// No host, should return false
$url = 'http://?p=1/param';
echo $url . "\n";
var_dump(parse_url($url));

Expected result:
----------------
http://www.example.com#fra/gment
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(15) "www.example.com"
  ["fragment"]=>
  string(9) "fra/gment"
}
http://www.example.com?p=1/param
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(15) "www.example.com"
  ["query"]=>
  string(9) "p=1/param"
}
http://#fra/gment
bool(false)
http://?p=1/param
bool(false)

Actual result:
--------------
http://www.example.com#fra/gment
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(19) "www.example.com#fra"
  ["path"]=>
  string(6) "/gment"
}
http://www.example.com?p=1/param
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(19) "www.example.com?p=1"
  ["path"]=>
  string(6) "/param"
}
http://#fra/gment
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(4) "#fra"
  ["path"]=>
  string(6) "/gment"
}
http://?p=1/param
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(4) "?p=1"
  ["path"]=>
  string(6) "/param"
}
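One way to read the percent-encoding advice is to encode the "/" inside the fragment and the query value so that no slash follows the host. The following is only a minimal sketch of that reading, using PHP's rawurlencode() and rawurldecode(). The variable names are made up for illustration, and whether the affected parse_url() then splits these URLs the way the reporter expects is not shown anywhere in the report, so it is not asserted here.

```php
<?php
// Values taken from the test script in the report.
$fragment   = 'fra/gment';
$queryValue = '1/param';

// rawurlencode() percent-encodes every character except
// letters, digits and "-", "_", ".", "~", so "/" becomes "%2F".
$encodedFragment = rawurlencode($fragment);
$encodedValue    = rawurlencode($queryValue);

// Illustrative URLs with the slash encoded (hypothetical rewrite of the test URLs).
$url1 = 'http://www.example.com#' . $encodedFragment;
$url2 = 'http://www.example.com?p=' . $encodedValue;

echo $url1 . "\n"; // http://www.example.com#fra%2Fgment
echo $url2 . "\n"; // http://www.example.com?p=1%2Fparam

// The receiving side has to decode the value again to get the original back.
$decodedFragment = rawurldecode($encodedFragment);
```

Note that this changes the data being sent: the consumer of the fragment or query has to decode it, which is a different situation from the unencoded "/" that the reporter is asking about.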
Bug #54369 [Com]: parse_url() incorrectly determines the start of query and fragment parts of URL
ID: 54369
Comment by: tomas dot brastavicius at quantum dot lt
Reported by: tomas dot brastavicius at quantum dot lt
Summary: parse_url() incorrectly determines the start of query and fragment parts of URL
Status: Open
Type: Bug
Package: URL related
PHP Version: Irrelevant
Block user comment: N
Private report: N

New Comment:
[2011-04-03 18:09:08] tomas dot brastavicius at quantum dot lt

Another comment about this issue:
http://marc.info/?l=php-internals&m=130183032307080&w=2

@Peter
Yes, according to RFC 1738 the test URLs are not valid. But:
1. Nothing defines that parse_url() parses URLs according to RFC 1738.
2. parse_url() is not meant to validate the given URL. See http://php.net/manual/en/function.parse-url.php
3. Why is it better to return an invalid hostname (# and / are invalid characters; current parse_url() version) than an invalid query or fragment (patched parse_url() version)?

@tokul at users dot sourceforge dot net
Checked. My arguments for accepting the patch are as follows:
1. The Return Values section of the parse_url() documentation clearly states that the query and fragment components start after the ? and # characters respectively.
2. I don't know of any specification that allows # and ? in hostnames (does anyone?), but I do know that RFC 3986 (which, unfortunately, I am working with) allows the / character in both the query and fragment parts. See http://tools.ietf.org/html/rfc3986.html#section-3.4 and http://tools.ietf.org/html/rfc3986.html#section-3.5
3. It has already been stated (although in a different context) that parse_url() parses URLs according to RFC 3986. See http://bugs.php.net/bug.php?id=50484. Maybe Adam Harvey knows more?
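To make the RFC 3986 argument concrete, here is a rough sketch that splits a URL with the regular expression given in RFC 3986, Appendix B, in which the authority ends at the first "/", "?" or "#". The function name split_uri_rfc3986() is hypothetical and is not the attached patch or PHP's implementation. Two simplifications are assumptions of this sketch: the whole authority is reported under the "host" key (userinfo and port are not separated), and an empty authority returns false only because that is what the report lists as the expected result.

```php
<?php
// Hypothetical helper for illustration; not part of PHP and not the attached patch.
function split_uri_rfc3986(string $uri)
{
    // Regular expression from RFC 3986, Appendix B ("~" is used as the delimiter
    // because the pattern itself contains "/" and "#").
    $pattern = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~s';
    preg_match($pattern, $uri, $m); // every group is optional, so this always matches

    $parts = [];
    if (($m[1] ?? '') !== '') {
        $parts['scheme'] = $m[2];
    }
    if (($m[3] ?? '') !== '') {
        if ($m[4] === '') {
            // Assumption taken from the report's "Expected result": no host means false.
            return false;
        }
        $parts['host'] = $m[4]; // simplification: the full authority component
    }
    if (($m[5] ?? '') !== '') {
        $parts['path'] = $m[5];
    }
    if (($m[6] ?? '') !== '') {
        $parts['query'] = $m[7];
    }
    if (($m[8] ?? '') !== '') {
        $parts['fragment'] = $m[9];
    }
    return $parts;
}

var_dump(split_uri_rfc3986('http://www.example.com#fra/gment'));
var_dump(split_uri_rfc3986('http://www.example.com?p=1/param'));
var_dump(split_uri_rfc3986('http://#fra/gment'));
```

With this splitting rule the first two URLs yield the host www.example.com plus a fragment or query that contains the slash, and the third yields false, which matches the "Expected result" in the original report.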
Bug #54369 [Com]: parse_url() incorrectly determines the start of query and fragment parts of URL
ID: 54369
Comment by: tomas dot brastavicius at quantum dot lt
Reported by: tomas dot brastavicius at quantum dot lt
Summary: parse_url() incorrectly determines the start of query and fragment parts of URL
Status: Open
Type: Bug
Package: URL related
PHP Version: Irrelevant
Block user comment: N
Private report: N

New Comment:
[2011-04-03 18:36:42] tomas dot brastavicius at quantum dot lt

One more comment about this issue:
http://marc.info/?l=php-internals&m=130183094107548&w=2
Bug #54369 [Com]: parse_url() incorrectly determines the start of query and fragment parts of URL
ID: 54369
Comment by: tokul at users dot sourceforge dot net
Reported by: tomas dot brastavicius at quantum dot lt
Summary: parse_url() incorrectly determines the start of query and fragment parts of URL
Status: Open
Type: Bug
Package: URL related
PHP Version: Irrelevant
Block user comment: N
Private report: N

New Comment:

You can't argue that a function is broken and needs fixing if you feed it broken data and expect good output. Use valid URLs in your tests if you want to show that the function is broken.