Req #39078 [Com]: Plus sign in URL arg received as space

techlivezheng at gmail dot com Wed, 22 Feb 2012 21:45:59 -0800

Edit report at https://bugs.php.net/bug.php?id=39078&edit=1


 ID:                 39078
 Comment by:         techlivezheng at gmail dot com
 Reported by:        main at springtimesoftware dot com
 Summary:            Plus sign in URL arg received as space
 Status:             Not a bug
 Type:               Feature/Change Request
 Package:            *General Issues
 Operating System:   Windows XP
 PHP Version:        5.1.6
 Block user comment: N
 Private report:     N

 New Comment:

My fault, this is actually not a bug. There is no need to use rawurlencode; 
otherwise, " + " would become "+++".

A value containing "+" in $_GET or $_POST must already have been decoded once 
before being passed to PHP. PHP then URL-decodes it again, which turns "+" 
into " ".

Apache may be doing that; one possible cause is the mod_rewrite module. 
mod_rewrite has to decode everything before it can work on it, and it does not 
encode the result again afterwards. 

This is exactly what happened:

" + " --------> "+%2B+" --------> " + " --------> "   "

       apache          mod_rewrite         php

Using the "B" flag for mod_rewrite can fix this; see 
http://httpd.apache.org/docs/2.2/rewrite/flags.html#flag_b
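
The chain above can be reproduced outside PHP. The following is a minimal 
Python 3 sketch, assuming that both the intermediate layer and the application 
apply form-style decoding ("+" means space). It follows the diagram as written 
in this comment; the variable names are illustrative, and the real mod_rewrite 
internals may differ in detail.

```python
from urllib.parse import quote_plus, unquote_plus

original = " + "

# Step 1: the value is form-encoded for the query string.
encoded = quote_plus(original)             # "+%2B+"

# Step 2: an intermediate layer decodes once and does not re-encode
# (assumption: this stands in for the rewrite step in the diagram).
after_rewrite = unquote_plus(encoded)      # " + "

# Step 3: the application decodes again, so the literal "+" is lost.
seen_by_app = unquote_plus(after_rewrite)  # "   "

print(repr(encoded), repr(after_rewrite), repr(seen_by_app))
```

The second decode is the harmful one: after the first decode the "+" is real 
data, but a form-style decoder cannot tell it apart from an encoded space.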


Previous Comments:
------------------------------------------------------------------------
[2012-02-23 02:33:27] techlivezheng at gmail dot com

Please use rawurldecode instead of urldecode to process $_GET value.

------------------------------------------------------------------------
[2010-10-27 17:28:36] [email protected]

Not a bug; see http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1

------------------------------------------------------------------------
[2009-10-15 02:06:05] yolcoyama at gmail dot com

Since I encountered the same problem in PHP,
I wondered whether PHP is really the cause of the bug.
I chose another scripting language (Python) to check:
in Python (CGI), the index.py script below with the query "q=c++" outputs:
{'q': ['c  ']}.
This shows that the plus sign is replaced with a blank space independently of 
the language (at least, not only in PHP).

I found a workaround (not a fundamental fix) to receive arithmetic characters
in the query as a raw string: rawurldecode(urlencode($whatever_qs))

It behaves as if the blank space were restored to a plus sign (or another 
arithmetic sign).
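
A rough Python 3 analogue of that workaround is sketched here, assuming 
quote_plus behaves roughly like PHP's urlencode (space becomes "+") and unquote 
roughly like rawurldecode (leaves "+" alone). The helper name restore_plus is 
hypothetical.

```python
from urllib.parse import quote_plus, unquote

def restore_plus(decoded_value):
    """Re-encode form-style, then decode without the '+' -> ' ' rule."""
    return unquote(quote_plus(decoded_value))

print(repr(restore_plus("c  ")))  # spaces come back as plus signs
print(repr(restore_plus("a b")))  # a genuine space is also turned into "+"
```

The second call shows why this is not a fundamental fix: every space in the 
value becomes a plus sign, whether or not the sender typed one.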

* index.py
#!/usr/bin/python
import cgi,os
print "Content-Type: text/plain; charset=utf-8"
print
print cgi.parse_qs(os.environ['QUERY_STRING'])
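
The script above is Python 2, and cgi.parse_qs is not available in current 
Python 3. A minimal Python 3 sketch of the same check, using 
urllib.parse.parse_qs on a literal query string instead of the CGI 
environment (the wrapper name parse_query is illustrative):

```python
from urllib.parse import parse_qs

def parse_query(query_string):
    """Parse a form-encoded query string; '+' is decoded as a space."""
    return parse_qs(query_string)

print(parse_query("q=c++"))  # {'q': ['c  ']}
```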

Shinobu Y.

------------------------------------------------------------------------
[2009-10-06 17:05:38] toby dot walsh at fxhome dot com

I believe derick probably meant to link to RFC 2396

http://www.ietf.org/rfc/rfc2396.txt

It says...

----
Many URI include components consisting of or delimited by, certain
   special characters.  These characters are called "reserved", since
   their usage within the URI component is limited to their reserved
   purpose.  If the data for a URI component would conflict with the
   reserved purpose, then the conflicting data must be escaped before
   forming the URI.

      reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                    "$" | ","
----

Notice that the "+" symbol is now in the reserved list.

This issue is confusing because the old RFC did indeed say that the "+" symbol 
did not need to be encoded. The new RFC 2396 actually draws attention to this 
change.

----
G.2. Modifications from both RFC 1738 and RFC 1808

Changed to URI syntax instead of just URL.

Confusion regarding the terms "character encoding", the URI
"character set", and the escaping of characters with %<hex><hex>
equivalents has (hopefully) been reduced.  Many of the BNF rule names
regarding the character sets have been changed to more accurately
describe their purpose and to encompass all "characters" rather than
just US-ASCII octets.  Unless otherwise noted here, these
modifications do not affect the URI syntax.

Both RFC 1738 and RFC 1808 refer to the "reserved" set of characters
as if URI-interpreting software were limited to a single set of
characters with a reserved purpose (i.e., as meaning something other
than the data to which the characters correspond), and that this set
was fixed by the URI scheme.  However, this has not been true in
practice; any character that is interpreted differently when it is
escaped is, in effect, reserved.  Furthermore, the interpreting
engine on a HTTP server is often dependent on the resource, not just
the URI scheme.  The description of reserved characters has been
changed accordingly.

The plus "+", dollar "$", and comma "," characters have been added to
those in the "reserved" set, since they are treated as reserved
within the query component.
----

So I believe PHP is correct to decode the "+" as a " ".

You should be using the JavaScript function encodeURIComponent() to escape 
your strings. encodeURIComponent will encode "+" characters properly. Here's a 
good page which shows the difference between JavaScript's encoding functions.

http://xkr.us/articles/javascript/encode-compare/
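
The effect of encoding the "+" on the sending side can be sketched in 
Python 3. This assumes urllib.parse.quote with safe="" as a stand-in for 
encodeURIComponent; the two are similar but not identical in which characters 
they leave unescaped, and the helper name encode_component is hypothetical.

```python
from urllib.parse import quote, parse_qs

def encode_component(value):
    """Percent-encode a single query value, including '+' and '/'."""
    return quote(value, safe="")

query = "q=" + encode_component("C++")  # "q=C%2B%2B"
parsed = parse_qs(query)                # {'q': ['C++']}

print(query, parsed)
```

Because the plus signs travel as %2B, a form-style decoder on the receiving 
side returns them intact instead of turning them into spaces.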

------------------------------------------------------------------------
[2009-08-10 15:02:31] boriss at web dot de

I'd like to see an option to change runtime behavior of PHP, too. Even if the 
JavaScript function escape() worked, a user could still enter a URL with a 
query string himself. Imagine you have a search engine and someone enters a 
URL with ?query=C++. If you use $_GET['query'] you just don't know if someone 
searches for "C++" or "C  ".
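
One way to sidestep that ambiguity is to parse the raw query string yourself 
and treat "+" as a literal character. The following Python 3 sketch only 
illustrates the idea; the function name parse_query_literal_plus is 
hypothetical, and it is not an existing PHP option.

```python
from urllib.parse import unquote

def parse_query_literal_plus(query_string):
    """Split a raw query string and percent-decode it, keeping '+' as '+'."""
    result = {}
    for pair in query_string.split("&"):
        if not pair:
            continue  # skip empty segments such as a trailing "&"
        name, _, value = pair.partition("=")
        result.setdefault(unquote(name), []).append(unquote(value))
    return result

print(parse_query_literal_plus("query=C++"))      # plus signs are kept
print(parse_query_literal_plus("query=C%20%20"))  # %20 still decodes to spaces
```

The trade-off is that spaces submitted by an HTML form as "+" would then also 
arrive as plus signs, so this only suits URLs that are typed or built by hand.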

------------------------------------------------------------------------


The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

    https://bugs.php.net/bug.php?id=39078

