#21226 [Bgs-Opn]: function parse_url() fails

2002-12-30 Thread jmcastagnetto
 ID:   21226
 Updated by:   [EMAIL PROTECTED]
 Reported By:  [EMAIL PROTECTED]
-Status:   Bogus
+Status:   Open
 Bug Type: *URL Functions
 Operating System: w2000
 PHP Version:  4.3.0
 Assigned To:  iliaa
 New Comment:

Reopening this bug. A closer look at RFC 2396 indicates that:

...  This generic URI syntax consists of a sequence of four main
components:

<scheme>://<authority><path>?<query> ...

...

absoluteURI   = scheme ":" ( hier_part | opaque_part )

URI that are hierarchical in nature use the slash "/" character for
separating hierarchical components. ...

...

hier_part = ( net_path | abs_path ) [ "?" query ]
net_path  = "//" authority [ abs_path ]
abs_path  = "/"  path_segments

URI that do not make use of the slash "/" character for separating
hierarchical components are considered opaque by the generic URI
parser.

opaque_part   = uric_no_slash *uric
uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
                "&" | "=" | "+" | "$" | ","

...

Later, in section 3.3 of that RFC, the syntax of the path component is
clarified. A similar clarification is made in section 3.2 on what is
considered a correct authority component.

Bottom line: the $url given by the bug reporter mostly conforms to the
syntax of a hierarchical URI, although it is not the usual case.
Section 3.2, which deals w/ the authority component, states that:

... The authority component is preceded by a double slash "//" and is
terminated by the next slash "/", question-mark "?", or by the end of
the URI.  Within the authority component, the characters ";", ":",
"@", "?", and "/" are reserved. ...

And that is reinforced in the BNF syntax later in the RFC. I am not sure
whether all web servers will correctly interpret a URL w/o a path but w/
a query part immediately after the authority part, given that the '/'
in the path is usually mapped internally by the server to wherever the
physical files are in the filesystem.
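
To illustrate the point about the authority being terminated by "?", here is a minimal sketch that splits a URL using the regular expression given in Appendix B of RFC 2396. The helper name split_uri_rfc2396, the constant name, and the sample URL are assumptions made for this illustration (the URL is only modelled on the kind discussed in this report), and the code assumes PHP 7 or later. It is not how parse_url() is implemented.

```php
<?php
// Regular expression from RFC 2396, Appendix B, which splits a URI
// reference into scheme, authority, path, query and fragment.
const RFC2396_URI_REGEX =
    '!^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?!';

/**
 * Hypothetical helper (not part of PHP): split a URI with the RFC regex.
 * Components that are absent come back as empty strings.
 */
function split_uri_rfc2396(string $uri): array
{
    preg_match(RFC2396_URI_REGEX, $uri, $m);

    return [
        'scheme'    => $m[2] ?? '',
        'authority' => $m[4] ?? '',
        'path'      => $m[5] ?? '',
        'query'     => $m[7] ?? '',
        'fragment'  => $m[9] ?? '',
    ];
}

// Assumed sample URL: no path, query part directly after the port.
$url = 'http://user:passwd@www.example.com:8080?bar=1&boom=0';

$parts = split_uri_rfc2396($url);
print_r($parts);

// For comparison; what parse_url() returns here depends on the PHP version.
var_dump(parse_url($url));
```

With this regex the authority group stops at the first "/", "?" or "#", so the path comes back empty and the query is still recognised, which matches the reading of section 3.2 quoted above.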





The following code works as expected:

$url = "http://user:[EMAIL PROTECTED]:8080/foo.php?bar=1&boom=0";
print_r(parse_url($url));

Giving as output:

Array
(
    [scheme] => http
    [host] => www.example.com
    [port] => 8080
    [user] => user
    [pass] => passwd
    [path] => /foo.php
    [query] => bar=1&boom=0
)

Tested w/ current CVS head on a RH Linux 6.1 machine:

$ php_cvs -v
PHP 4.4.0-dev (cli) (built: Dec 27 2002 14:00:56)
Copyright (c) 1997-2002 The PHP Group
Zend Engine v1.4.0, Copyright (c) 1998-2002 Zend Technologies

as well as 4.3.0 (on the same OS)

$ php -v
PHP 4.3.0 (cli) (built: Dec 29 2002 23:59:53)
Copyright (c) 1997-2002 The PHP Group
Zend Engine v1.3.0, Copyright (c) 1998-2002 Zend Technologies

Previous Comments:


[2002-12-28 09:10:25] [EMAIL PROTECTED]

Thank you for your work on my report, but I'm surprised you marked this
report as bogus, since:
- 'port' was not a number in my example, but that was only to make it
easier to understand. The warning is the same with a numeric port.
- The same example worked without a warning in 4.2 and earlier.
- The parse_url function manual doesn't mention a required slash before
the path or query part of a URL (I triple-checked ;-)
- RFC 2396 (Uniform Resource Identifiers (URI): Generic Syntax) doesn't
specify that you *MUST* have a / after the port part.

So if you consider these points, you should either modify the parse_url
function or modify the parse_url documentation. But please do not just
mark the report as bogus!!!
At worst, mark it as 'closed'.

Thanks for your help.



[2002-12-28 00:39:43] [EMAIL PROTECTED]

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

The port can only consist of the digits 0-9 (in reality the valid port
range is 1-65535), so a non-numeric port is clearly not valid, which
invalidates the passed URL.
The 2nd example is also wrong: without the '/' between the port and the
rest of the request, the code MUST assume that the following data is
part of the port, hence the URL is once again not valid.
This is NOT a bug.



[2002-12-27 19:21:59] [EMAIL PROTECTED]

Adding a / after the end of the port part is a good solution. Thanks.
Do you consider this a bug, or is parse_url RFC compliant?



[2002-12-27 19:04:18] [EMAIL PROTECTED]

Seems to come from 'port' part of url.

If we consider this :

#21226 [Bgs-Opn]: function parse_url() fails

2002-12-30 Thread pollita
 ID:   21226
 Updated by:   [EMAIL PROTECTED]
 Reported By:  [EMAIL PROTECTED]
-Status:   Bogus
+Status:   Open
 Bug Type: *URL Functions
 Operating System: w2000
 PHP Version:  4.3.0
 Assigned To:  iliaa


Previous Comments:


[2002-12-30 02:31:30] [EMAIL PROTECTED]

Sorry, but your problem does not imply a bug in PHP itself.  For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions. 

Thank you for your interest in PHP.

There are two things wrong with your $url value:

1) portnumber is not a valid port number; this must be a number.

2) There must be a / between the port number and ANYTHING following.
This also includes the ?foo query string specified in your case,
where the document requested is the directory index.



[2002-12-30 02:03:43] [EMAIL PROTECTED]

Reopening this bug. A closer look at RFC 2396 indicates that:

...  This generic URI syntax consists of a sequence of four main
components:

<scheme>://<authority><path>?<query> ...

...

absoluteURI   = scheme ":" ( hier_part | opaque_part )

URI that are hierarchical in nature use the slash "/" character for
separating hierarchical components. ...

...

hier_part = ( net_path | abs_path ) [ "?" query ]
net_path  = "//" authority [ abs_path ]
abs_path  = "/"  path_segments

URI that do not make use of the slash "/" character for separating
hierarchical components are considered opaque by the generic URI
parser.

opaque_part   = uric_no_slash *uric
uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
                "&" | "=" | "+" | "$" | ","

...

Later, in section 3.3 of that RFC, the syntax of the path component is
clarified. A similar clarification is made in section 3.2 on what is
considered a correct authority component.

Bottom line: the $url given by the bug reporter mostly conforms to the
syntax of a hierarchical URI, although it is not the usual case.
Section 3.2, which deals w/ the authority component, states that:

... The authority component is preceded by a double slash "//" and is
terminated by the next slash "/", question-mark "?", or by the end of
the URI.  Within the authority component, the characters ";", ":",
"@", "?", and "/" are reserved. ...

And that is reinforced in the BNF syntax later in the RFC. I am not sure
whether all web servers will correctly interpret a URL w/o a path but w/
a query part immediately after the authority part, given that the '/'
in the path is usually mapped internally by the server to wherever the
physical files are in the filesystem.

The following code works as expected:

$url = "http://user:[EMAIL PROTECTED]:8080/foo.php?bar=1&boom=0";
print_r(parse_url($url));

Giving as output:

Array
(
    [scheme] => http
    [host] => www.example.com
    [port] => 8080
    [user] => user
    [pass] => passwd
    [path] => /foo.php
    [query] => bar=1&boom=0
)

Tested w/ current CVS head on a RH Linux 6.1 machine:

$ php_cvs -v
PHP 4.4.0-dev (cli) (built: Dec 27 2002 14:00:56)
Copyright (c) 1997-2002 The PHP Group
Zend Engine v1.4.0, Copyright (c) 1998-2002 Zend Technologies

as well as 4.3.0 (on the same OS)

$ php -v
PHP 4.3.0 (cli) (built: Dec 29 2002 23:59:53)
Copyright (c) 1997-2002 The PHP Group
Zend Engine v1.3.0, Copyright (c) 1998-2002 Zend Technologies



[2002-12-28 09:10:25] [EMAIL PROTECTED]

Thank you for your work on my report, but I'm surprised you marked this
report as bogus, since:
- 'port' was not a number in my example, but that was only to make it
easier to understand. The warning is the same with a numeric port.
- The same example worked without a warning in 4.2 and earlier.
- The parse_url function manual doesn't mention a required slash before
the path or query part of a URL (I triple-checked ;-)
- RFC 2396 (Uniform Resource Identifiers (URI): Generic Syntax) doesn't
specify that you *MUST* have a / after the port part.

So if you consider these points, you should either modify the parse_url
function or modify the parse_url documentation. But please do not just
mark the report as bogus!!!
At worst, mark it as 'closed'.

Thanks for your help.



[2002-12-28 00:39:43] [EMAIL PROTECTED]

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

Port can only be a numeric