 ID:                 39078
 Comment by:         techlivezheng at gmail dot com
 Reported by:        main at springtimesoftware dot com
 Summary:            Plus sign in URL arg received as space
 Status:             Not a bug
 Type:               Feature/Change Request
 Package:            *General Issues
 Operating System:   Windows XP
 PHP Version:        5.1.6
 Block user comment: N
 Private report:     N

 New Comment:

Please use rawurldecode instead of urldecode to process $_GET value.

Previous Comments:
------------------------------------------------------------------------
[2010-10-27 17:28:36] cataphr...@php.net

Not a bug; see http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1

------------------------------------------------------------------------
[2009-10-15 02:06:05] yolcoyama at gmail dot com

Since I ran into the same problem in php, I wondered whether php is really the cause of the bug. To check, I tried another scripting language (python, via cgi): with the query "q=c++", the following code outputs {'q': ['c ']}. This shows that the plus sign is replaced with a blank space regardless of language (at least, not only in php).

I found a workaround (not a fundamental fix) for receiving arithmetic characters in the query as a raw string:

rawurldecode(urlencode($whatever_qs))

It behaved as if the blank space were restored to a plus sign (or other arithmetic sign).

* index.py

#!/usr/bin/python
import cgi,os
print "Content-Type: text/plain; charset=utf-8"
print
print cgi.parse_qs(os.environ['QUERY_STRING'])

Shinobu Y.

------------------------------------------------------------------------
[2009-10-06 17:05:38] toby dot walsh at fxhome dot com

I believe derick probably meant to link to rfc 2396

http://www.ietf.org/rfc/rfc2396.txt

It says...

----
Many URI include components consisting of, or delimited by, certain special characters. These characters are called "reserved", since their usage within the URI component is limited to their reserved purpose.
If the data for a URI component would conflict with the reserved purpose, then the conflicting data must be escaped before forming the URI.

   reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                 "$" | ","
----

notice the "+" symbol is now in the reserved list.

This issue is confusing because the old rfc did indeed say that the "+" symbol did not need to be encoded. The new rfc 2396 actually draws attention to this change.

----
G.2. Modifications from both RFC 1738 and RFC 1808

Changed to URI syntax instead of just URL.

Confusion regarding the terms "character encoding", the URI "character set", and the escaping of characters with %<hex><hex> equivalents has (hopefully) been reduced. Many of the BNF rule names regarding the character sets have been changed to more accurately describe their purpose and to encompass all "characters" rather than just US-ASCII octets. Unless otherwise noted here, these modifications do not affect the URI syntax.

Both RFC 1738 and RFC 1808 refer to the "reserved" set of characters as if URI-interpreting software were limited to a single set of characters with a reserved purpose (i.e., as meaning something other than the data to which the characters correspond), and that this set was fixed by the URI scheme. However, this has not been true in practice; any character that is interpreted differently when it is escaped is, in effect, reserved. Furthermore, the interpreting engine on a HTTP server is often dependent on the resource, not just the URI scheme. The description of reserved characters has been changed accordingly.

The plus "+", dollar "$", and comma "," characters have been added to those in the "reserved" set, since they are treated as reserved within the query component.
----

So I believe PHP is correct to decode the "+" as a " ".

You should be using the javascript function encodeURIComponent() to escape your strings. encodeURIComponent will encode "+" chars properly.
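
To illustrate the same point outside the browser, here is a minimal Python 3 sketch using only the standard library. It assumes that urllib.parse.quote with safe="" is a rough stand-in for how encodeURIComponent treats "+", and that parse_qs decodes form-style (turning "+" into a space) in the same spirit as what PHP does for $_GET. The helper name build_query is made up for this example.

```python
from urllib.parse import parse_qs, quote


def build_query(name, value):
    """Hypothetical helper: percent-encode every reserved character, including '+'."""
    return f"{quote(name, safe='')}={quote(value, safe='')}"


# A query typed by hand, with literal plus signs left unescaped.
raw_query = "q=c++"

# The same value escaped before it is placed in the URL.
encoded_query = build_query("q", "c++")

# Form-style decoding: '+' becomes a space, '%2B' becomes '+'.
decoded_raw = parse_qs(raw_query)
decoded_encoded = parse_qs(encoded_query)

print(encoded_query)
print(decoded_raw)
print(decoded_encoded)
```

With the unescaped query both plus signs come back as spaces, while the %2B form round-trips to "c++".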
Here's a good page which shows the difference between JavaScript's encoding functions.

http://xkr.us/articles/javascript/encode-compare/

------------------------------------------------------------------------
[2009-08-10 15:02:31] boriss at web dot de

I'd like to see an option to change the runtime behavior of PHP, too. Even if the JavaScript function escape() worked, a user could still enter a URL with a query string himself.

Imagine you have a search engine and someone enters a URL with ?query=C++. If you use $_GET['query'] you just don't know whether someone is searching for "C++" or "C ".

------------------------------------------------------------------------
[2008-07-16 20:18:49] edA-qa at disemia dot com

I would also like to add that decoding '+' to a space is just plain wrong. I got burnt again by this when using base64_encode, which should produce URL-safe strings, but for PHP it doesn't, since the output may include '+'.

A global option to use the proper rawurldecode would be great. Otherwise I'm stuck, like many developers, reparsing the query string/URL manually and unable to use _POST and _GET.

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

    https://bugs.php.net/bug.php?id=39078
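
On the base64 point raised in the [2008-07-16] comment: the standard base64 alphabet includes "+" and "/", so its output is not URL-safe by itself. The following is a minimal Python 3 sketch, not PHP, assuming the token travels in a query string that is decoded form-style. The parameter name t and the sample bytes are made up for illustration. It compares the standard alphabet with the URL-safe variant defined in RFC 4648.

```python
import base64
from urllib.parse import parse_qs

# Sample bytes chosen (for illustration) so that standard base64 emits '+'.
payload = b"\xfb\xef\xbe"

standard = base64.b64encode(payload).decode("ascii")
url_safe = base64.urlsafe_b64encode(payload).decode("ascii")

# Put each token into a query string without any further escaping,
# then decode it the way a form-style parser would.
received_standard = parse_qs("t=" + standard)["t"][0]
received_url_safe = parse_qs("t=" + url_safe)["t"][0]

print(repr(standard), "->", repr(received_standard))
print(repr(url_safe), "->", repr(received_url_safe))

# Only the URL-safe token survives the trip and decodes to the original bytes.
restored = base64.urlsafe_b64decode(received_url_safe)
```

The standard token arrives with its plus signs turned into spaces, whereas the URL-safe token is unchanged. Percent-encoding the standard token before building the URL would be the other way to keep it intact.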