Edit report at https://bugs.php.net/bug.php?id=54369&edit=1
ID: 54369
Comment by: woody dot gilk at gmail dot com
Reported by: tomas dot brastavicius at quantum dot lt
Summary: [PATCH] parse_url() incorrectly determines the start
of query and fragment parts
Status: Open
Type: Bug
Package: URL related
PHP Version: Irrelevant
Block user comment: N
Private report: N
New Comment:
According to RFC 3986 (section 3.3), the URL http://www.example.com?foo=bar is a
completely valid URL. To quote:
> For example, the URI <mailto:fred@example.com> has a path of
> "fred@example.com", whereas the URI <foo://info.example.com?fred>
> has an empty path.
There is nothing in the RFC spec that says a path must be included in the URL.
Please fix this bug.
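A minimal sketch of the point above, written in Python rather than PHP and not
taken from the patch. It applies the component-splitting regular expression
given in RFC 3986, Appendix B, to the two URLs mentioned in this comment. The
names URI_RE and split_uri are made up for this illustration.

```python
import re

# Regular expression from RFC 3986, Appendix B.
# Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri):
    """Split a URI reference into its five RFC 3986 components.

    Components that are absent come back as None; the path is never
    None, but it may be the empty string.
    """
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


for uri in ("http://www.example.com?foo=bar", "foo://info.example.com?fred"):
    print(uri, "->", split_uri(uri))
```

With this expression the authority stops at the first "/", "?" or "#", so both
URLs yield an empty path and a non-empty query.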
Previous Comments:
------------------------------------------------------------------------
[2011-06-29 21:37:39] lenzai2004-dev at yahoo dot com
The point is not whether the patch is relevant or not.
For this bug and other cases, parse_url() returns a corrupt result.
It could be fixed in two ways:
- patch it
- or detect the invalid URL and return an error.
I've been trying to use this function, and after a significant volume of URLs I
always find cases where it returns incorrect data.
I had to rewrite everything in PHP, and it's quite slow.
------------------------------------------------------------------------
[2011-05-17 20:12:50] tomas dot brastavicius at quantum dot lt
Changed report name as described in the bug report spec.
------------------------------------------------------------------------
[2011-04-03 19:36:33] tokul at users dot sourceforge dot net
You can't argue that the function is broken and needs fixing if you feed it
broken data and expect good output. Use valid URLs in your tests if you want to
show that the function is broken.
------------------------------------------------------------------------
[2011-04-03 18:36:42] tomas dot brastavicius at quantum dot lt
One more comment about this issue:
http://marc.info/?l=php-internals&m=130183094107548&w=2
------------------------------------------------------------------------
[2011-04-03 18:09:08] tomas dot brastavicius at quantum dot lt
Another comment about this issue:
http://marc.info/?l=php-internals&m=130183032307080&w=2
@Peter
Yes, according to RFC 1738 the test URLs are not valid. But:
1. It is not defined that parse_url() parses URLs according to RFC 1738.
2. parse_url() "is not meant to validate given URL". See
http://php.net/manual/en/function.parse-url.php
3. Why is it better to return an invalid hostname ("#" and "/" are invalid
characters there; current parse_url() version) than an invalid query or fragment
(patched parse_url() version)?
@tokul at users dot sourceforge dot net
Checked
My arguments for accepting the patch are as follows:
1. The "Return Values" section of the parse_url() documentation clearly states
that the query and fragment components start after the "?" and "#" characters,
respectively.
2. I don't know of any specification that allows "#" and "?" in hostnames
(does anyone?), but I do know that RFC 3986 (which, unfortunately, I am working
with) allows the "/" character in both the query and fragment parts. See
http://tools.ietf.org/html/rfc3986.html#section-3.4 and
http://tools.ietf.org/html/rfc3986.html#section-3.5
3. It has already been stated (although in a different context) that parse_url()
parses URLs according to RFC 3986. See http://bugs.php.net/bug.php?id=50484.
Maybe Adam Harvey knows more?
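As a rough comparison point for the arguments above, here is a sketch using
Python's standard urllib.parse.urlsplit. It says nothing about what PHP's
parse_url() returns, patched or unpatched; it only shows one existing parser
that ends the host at the first "/", "?" or "#". The sample URLs and the helper
host_has_delimiter are hypothetical and are not the test cases from this report.

```python
from urllib.parse import urlsplit

# Hypothetical URLs: no path, and a "/" inside the query or the fragment.
SAMPLES = [
    "http://example.com?a=b/c",
    "http://example.com#frag/ment",
    "http://example.com?a=b/c#d/e",
]


def host_has_delimiter(host):
    """Return True if the host contains "/", "?" or "#"."""
    return any(ch in host for ch in "/?#")


for url in SAMPLES:
    parts = urlsplit(url)
    # A "/" after "?" or "#" stays in the query or fragment,
    # and the host is cut off before the first delimiter.
    print(url, "->", parts.hostname, repr(parts.path), parts.query, parts.fragment)
    assert not host_has_delimiter(parts.hostname)
```

A check like host_has_delimiter could also serve as a cheap sanity test on any
parser's output, since a host containing one of these characters indicates
that the split point was chosen incorrectly.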
------------------------------------------------------------------------
The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
https://bugs.php.net/bug.php?id=54369