Bug #54369 [Com]: parse_url() incorrectly determines the start of query and fragment parts of URL

2011-04-03 Thread tokul at users dot sourceforge dot net

 ID:                 54369
 Comment by:         tokul at users dot sourceforge dot net
 Reported by:        tomas dot brastavicius at quantum dot lt
 Summary:            parse_url() incorrectly determines the start of
                     query and fragment parts of URL
 Status:             Open
 Type:               Bug
 Package:            URL related
 PHP Version:        Irrelevant
 Block user comment: N
 Private report:     N

 New Comment:

Check the URL encoding documentation first:

http://en.wikipedia.org/wiki/Percent-encoding

Then fix your $url value. You are using a reserved character for another purpose.
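
As a hedged illustration of the percent-encoding advice above (it is not part of the original comment): PHP's rawurlencode() encodes reserved characters such as / so that they travel as data inside a URL component. The sample value and the way the URL is assembled here are assumptions made only for this sketch.

```php
<?php
// Sketch only: percent-encode a value before placing it in the fragment.
// rawurlencode() leaves letters, digits and "-_.~" alone and encodes the rest.
$fragment = rawurlencode('fra/gment');            // "/" becomes "%2F"
$url = 'http://www.example.com#' . $fragment;

echo $url . "\n";                                 // http://www.example.com#fra%2Fgment
```

Whether the / has to be encoded at all is exactly what the rest of this thread argues about; the sketch only shows what encoding it would look like.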


Previous Comments:

[2011-03-24 15:46:33] tomas dot brastavicius at quantum dot lt

Description:

Attached patch fixes the issue.

Test script:
---
<?php
$url = 'http://www.example.com#fra/gment';
echo $url . "\n";
var_dump(parse_url($url));

$url = 'http://www.example.com?p=1/param';
echo $url . "\n";
var_dump(parse_url($url));

// No host, should return false
$url = 'http://#fra/gment';
echo $url . "\n";
var_dump(parse_url($url));

// No host, should return false
$url = 'http://?p=1/param';
echo $url . "\n";
var_dump(parse_url($url));

Expected result:

http://www.example.com#fra/gment
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(15) "www.example.com"
  ["fragment"]=>
  string(9) "fra/gment"
}
http://www.example.com?p=1/param
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(15) "www.example.com"
  ["query"]=>
  string(9) "p=1/param"
}
http://#fra/gment
bool(false)
http://?p=1/param
bool(false)

Actual result:
--
http://www.example.com#fra/gment
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(19) "www.example.com#fra"
  ["path"]=>
  string(6) "/gment"
}
http://www.example.com?p=1/param
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(19) "www.example.com?p=1"
  ["path"]=>
  string(6) "/param"
}
http://#fra/gment
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(4) "#fra"
  ["path"]=>
  string(6) "/gment"
}
http://?p=1/param
array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(4) "?p=1"
  ["path"]=>
  string(6) "/param"
}


Bug #54369 [Com]: parse_url() incorrectly determines the start of query and fragment parts of URL

2011-04-03 Thread tomas dot brastavicius at quantum dot lt

 New Comment:

Another comment about this issue:
http://marc.info/?l=php-internals&m=130183032307080&w=2

@Peter

Yes, according to RFC 1738 the test URLs are not valid. But:

1. It is not defined anywhere that parse_url() parses URLs according to
RFC 1738.

2. parse_url() is not meant to validate the given URL. See
http://php.net/manual/en/function.parse-url.php

3. Why is it better to return an invalid hostname (# and / are invalid
characters there; this is what the current parse_url() does) than an
invalid query or fragment (what the patched parse_url() does)?

@tokul at users dot sourceforge dot net

Checked.

My arguments for accepting the patch are as follows:

1. The Return Values section of the parse_url() documentation clearly
states that the query and fragment components start after the ? and #
characters, respectively.

2. I don't know of any specification that allows # and ? in hostnames
(does anyone?), but I know of at least one, RFC 3986 (which,
unfortunately, I am working with), that allows the / character in both
the query and fragment parts. See
http://tools.ietf.org/html/rfc3986.html#section-3.4 and
http://tools.ietf.org/html/rfc3986.html#section-3.5

3. It has already been stated (although in a different context) that
parse_url() parses URLs according to RFC 3986. See
http://bugs.php.net/bug.php?id=50484. Maybe Adam Harvey knows more?
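
To make the RFC 3986 argument concrete, here is a minimal sketch that splits a URI with the regular expression given in Appendix B of that RFC. It is not the attached patch and not how parse_url() is implemented. The function name split_uri_rfc3986() is hypothetical, and the code assumes PHP 7 or later.

```php
<?php
// Hypothetical helper, for illustration only. It applies the regular
// expression from RFC 3986, Appendix B, in which the authority ends at the
// first "/", "?" or "#", the query starts after "?" and the fragment
// starts after "#".
function split_uri_rfc3986(string $uri): array
{
    $pattern = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~s';
    preg_match($pattern, $uri, $m);

    $parts = [];
    // A component is reported only when its delimiter was actually present.
    if (($m[1] ?? '') !== '') { $parts['scheme'] = $m[2]; }
    // "authority" may still contain userinfo and a port; it is not split here.
    if (($m[3] ?? '') !== '') { $parts['authority'] = $m[4]; }
    if (($m[5] ?? '') !== '') { $parts['path'] = $m[5]; }
    if (($m[6] ?? '') !== '') { $parts['query'] = $m[7]; }
    if (($m[8] ?? '') !== '') { $parts['fragment'] = $m[9]; }

    return $parts;
}

var_dump(split_uri_rfc3986('http://www.example.com#fra/gment'));
var_dump(split_uri_rfc3986('http://?p=1/param'));
```

For the first test URL this gives the scheme http, the authority www.example.com and the fragment fra/gment. For 'http://?p=1/param' the authority is an empty string, so a caller that wants false for a missing host would have to add that check itself.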




Bug #54369 [Com]: parse_url() incorrectly determines the start of query and fragment parts of URL

2011-04-03 Thread tomas dot brastavicius at quantum dot lt

 New Comment:

One more comment about this issue:
http://marc.info/?l=php-internals&m=130183094107548&w=2




Bug #54369 [Com]: parse_url() incorrectly determines the start of query and fragment parts of URL

2011-04-03 Thread tokul at users dot sourceforge dot net

 New Comment:

You can't argue that a function is broken and needs fixing if you feed
it broken data and expect good output. Use valid URLs in your tests if
you want to show that the function is broken.

