Edit report at http://bugs.php.net/bug.php?id=54369&edit=1

 ID:                 54369
 Comment by:         tomas dot brastavicius at quantum dot lt
 Reported by:        tomas dot brastavicius at quantum dot lt
 Summary:            parse_url() incorrectly determines the start of
                     query and fragment parts of URL
 Status:             Open
 Type:               Bug
 Package:            URL related
 PHP Version:        Irrelevant
 Block user comment: N
 Private report:     N

 New Comment:

Another comment about this issue:
http://marc.info/?l=php-internals&m=130183032307080&w=2

@Peter

Yes, according to RFC 1738 the test URLs are not valid. But:

1. Nothing states that parse_url() parses URLs according to RFC 1738.

2. parse_url() "is not meant to validate given URL". See
http://php.net/manual/en/function.parse-url.php

3. Why is it better to return an invalid hostname ("#" and "/" are invalid
characters; this is what the current parse_url() does) than an invalid
query or fragment (what the patched parse_url() does)?

@tokul at users dot sourceforge dot net

Checked.

My arguments for accepting the patch are as follows:

1. The "Return Values" section of the parse_url() documentation clearly
states that the query and fragment components start after the "?" and "#"
characters respectively.

2. I don't know of any specification that allows "#" and "?" in hostnames
(does anyone?), but I know of at least one, RFC 3986 (the one I
unfortunately have to work with), that allows the "/" character in both
the query and fragment parts. See
http://tools.ietf.org/html/rfc3986.html#section-3.4 and
http://tools.ietf.org/html/rfc3986.html#section-3.5

3. It has already been stated (although in a different context) that
parse_url() parses URLs according to RFC 3986. See
http://bugs.php.net/bug.php?id=50484. Maybe Adam Harvey knows more?
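
To illustrate the splitting rule from RFC 3986 that the arguments above
rely on, here is a minimal sketch. It is neither the attached patch nor the
way parse_url() is implemented; it only applies the regular expression
given in Appendix B of RFC 3986 with preg_match(). The function name
split_uri_rfc3986() is hypothetical, and the sketch performs no validation.

```php
<?php
// Hypothetical helper, for illustration only: splits a URI reference with
// the regular expression from RFC 3986, Appendix B. The authority stops at
// the first "/", "?" or "#"; the query runs up to the first "#"; the
// fragment is everything after that "#". Nothing is validated.
function split_uri_rfc3986($uri)
{
    $pattern = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';
    preg_match($pattern, $uri, $m);

    // preg_match() omits trailing groups that did not match, so pad the
    // result with empty strings for indexes 0 to 9.
    $m += array_fill(0, 10, '');

    return array(
        'scheme'    => $m[2],
        'authority' => $m[4],
        'path'      => $m[5],
        'query'     => $m[7],
        'fragment'  => $m[9],
    );
}

var_dump(split_uri_rfc3986('http://www.example.com#fra/gment'));
var_dump(split_uri_rfc3986('http://www.example.com?p=1/param'));
```

With this rule the "/" ends up inside the fragment ("fra/gment") or the
query ("p=1/param"), and the authority is "www.example.com" in both cases.
For 'http://#fra/gment' the authority comes out empty, which a caller could
treat as "no host".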


Previous Comments:
------------------------------------------------------------------------
[2011-04-03 14:10:58] tokul at users dot sourceforge dot net

Check the URL encoding documentation first:
http://en.wikipedia.org/wiki/Percent-encoding

Then fix your $url value. You are using a reserved character for another
purpose.

------------------------------------------------------------------------
[2011-03-24 15:46:33] tomas dot brastavicius at quantum dot lt

Description:
------------
Attached patch fixes the issue.

Test script:
---------------
<?php
$url = 'http://www.example.com#fra/gment';
echo $url . "\n";
var_dump(parse_url($url));

$url = 'http://www.example.com?p=1/param';
echo $url . "\n";
var_dump(parse_url($url));

// No host, should return false
$url = 'http://#fra/gment';
echo $url . "\n";
var_dump(parse_url($url));

// No host, should return false
$url = 'http://?p=1/param';
echo $url . "\n";
var_dump(parse_url($url));

Expected result:
----------------
http://www.example.com#fra/gment
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(15) "www.example.com"
  ["fragment"]=>
  string(9) "fra/gment"
}
http://www.example.com?p=1/param
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(15) "www.example.com"
  ["query"]=>
  string(9) "p=1/param"
}
http://#fra/gment
bool(false)
http://?p=1/param
bool(false)

Actual result:
--------------
http://www.example.com#fra/gment
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(19) "www.example.com#fra"
  ["path"]=>
  string(6) "/gment"
}
http://www.example.com?p=1/param
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(19) "www.example.com?p=1"
  ["path"]=>
  string(6) "/param"
}
http://#fra/gment
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(4) "#fra"
  ["path"]=>
  string(6) "/gment"
}
http://?p=1/param
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(4) "?p=1"
  ["path"]=>
  string(6) "/param"
}


------------------------------------------------------------------------


