OK, in which case it must be some relatively recent change, since an
unescaped & in the QUERY_STRING used to be a valid separator. A pointer to the relevant RFC would be nice, so we can add it to the page at the URL that started this thread.



Here?

http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2


To summarise: & as a separator in a QUERY_STRING is valid; however, when you represent a URI within HTML, the & has to be escaped as &amp;, the same as any other occurrence of &.


So you type (or the browser still generates)

 my.pl?foo=1&bar=2

in the address bar of your browser, and that is exactly what comes through in the request. This is still perfectly valid.

In the source of an HTML document you've got to write it as

 <a href="my.pl?foo=1&amp;bar=2">Click &amp; Go</a>
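The same point as a short Python sketch (standard library only, nothing mod_perl-specific): the escaping is applied when the URL is written into HTML and undone by the browser's HTML parser before the request is made, so the query string the server sees is unchanged.

```python
import html
from urllib.parse import parse_qsl

# The URL as typed in the address bar, i.e. what the request carries.
raw_url = "my.pl?foo=1&bar=2"

# Escape it for use inside an HTML attribute: '&' becomes '&amp;'.
href = html.escape(raw_url, quote=True)
anchor = f'<a href="{href}">Click &amp; Go</a>'
print(anchor)  # <a href="my.pl?foo=1&amp;bar=2">Click &amp; Go</a>

# The browser decodes the attribute back to the raw URL before requesting it,
# so the server still receives foo=1&bar=2.
decoded = html.unescape(href)
query = decoded.split("?", 1)[1]
print(parse_qsl(query))  # [('foo', '1'), ('bar', '2')]
```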


I've not tested it, but I think typing my.pl?foo=1&amp;bar=2 into the address bar actually yields foo=1, amp=undef and bar=2...
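One way to check that guess, assuming the server splits the query string on both & and ; as CGI.pm does. `split_query` is a hypothetical helper written for illustration, not a CGI.pm function, and `None` stands in for Perl's undef.

```python
import re
from urllib.parse import unquote_plus


def split_query(qs):
    """Hypothetical helper: split a query string on both '&' and ';'
    (CGI.pm-style). A key with no '=' gets None, like Perl's undef."""
    params = []
    for pair in re.split(r"[&;]", qs):
        if not pair:
            continue  # skip empty segments, e.g. from 'a=1&&b=2'
        key, sep, value = pair.partition("=")
        params.append((unquote_plus(key), unquote_plus(value) if sep else None))
    return params


# What arrives if the HTML-escaped form is typed into the address bar.
typed = "foo=1&amp;bar=2"
print(split_query(typed))  # [('foo', '1'), ('amp', None), ('bar', '2')]
```

With a parser that only splits on &, the same input would instead produce a key named "amp;bar", so the exact result depends on the server-side parser.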



So should we commit something like the following? Or should we just nuke the whole section altogether?

Index: src/docs/tutorials/client/browserbugs/browserbugs.pod
===================================================================
--- src/docs/tutorials/client/browserbugs/browserbugs.pod (revision 164401)
+++ src/docs/tutorials/client/browserbugs/browserbugs.pod (working copy)
@@ -37,6 +37,12 @@


=head1 Preventing QUERY_STRING from getting corrupted because of &entity key names

+This entry is now irrelevant since you must not use C<&> to separate
+fields in the C<QUERY_STRING> as explained here:
+http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2
+If for some reason, you still want to do that, then make sure to read
+the rest of this section.
+
 In a URL which contains a query string, if the string has multiple
 parts separated by ampersands and it contains a key named "reg", for
 example C<http://example.com/foo.pl?foo=bar&reg=foobar>, then some

I'd suggest rewording the "answer" to something like:

----------------

In a URL which contains a query string, if the string has multiple parts separated by ampersands and it contains a key named "reg", for example http://example.com/foo.pl?foo=bar&reg=foobar, then some browsers will interpret &reg as an SGML entity and encode it as &reg;. This will result in a corrupted QUERY_STRING.

The behaviour is actually correct, and the problem is that you have not correctly encoded your ampersands into entities in your HTML. What you should have in the source of your HTML is http://example.com/foo.pl?foo=bar&amp;reg=foobar.

A much better, and recommended, solution is to separate parameter pairs with ; instead of &. CGI.pm, Apache::Request and $r->args() support a semicolon instead of an ampersand as a separator. So your URI should look like this: http://example.com/foo.pl?foo=bar;reg=foobar.

Note that this is only an issue within HTML documents when you are building your own URLs with query strings. It is not a problem when the URL is the result of submitting a form, because browsers have to get that right. It is also not a problem when typing URLs directly into the address bar of the browser.


Reference: http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2

-------------------
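The three cases in the suggested wording can be reproduced with a short Python sketch (standard library only). Python's `html.unescape()` serves as a stand-in for a lenient, older browser: it decodes legacy entities such as `&reg` even without a trailing semicolon. HTML5 parsers have an exception for attribute values, where a semicolon-less reference followed by `=` is left alone, so a current browser may not mangle this exact URL; the escaped and semicolon forms are correct either way. `build_query` is a hypothetical helper, not part of CGI.pm or Apache::Request.

```python
import html
from urllib.parse import quote_plus, parse_qsl

# 1. Unescaped '&' in the HTML source: a lenient decoder turns "&reg" into "®".
broken = "http://example.com/foo.pl?foo=bar&reg=foobar"
corrupted = html.unescape(broken)
print(corrupted == "http://example.com/foo.pl?foo=bar\u00ae=foobar")  # True

# 2. Escaped as '&amp;' in the HTML source, the query string survives.
fixed = "http://example.com/foo.pl?foo=bar&amp;reg=foobar"
print(html.unescape(fixed))  # http://example.com/foo.pl?foo=bar&reg=foobar


# 3. Recommended: use ';' as the separator, so there is no '&' to escape.
def build_query(pairs, sep=";"):
    """Hypothetical helper: URL-encode each key/value pair and join with sep."""
    return sep.join(f"{quote_plus(k)}={quote_plus(v)}" for k, v in pairs)


qs = build_query([("foo", "bar"), ("reg", "foobar")])
url = "http://example.com/foo.pl?" + qs
print(url)  # http://example.com/foo.pl?foo=bar;reg=foobar
print(html.unescape(url) == url)  # True: nothing for an HTML parser to mangle

# parse_qsl() accepts a custom separator in Python 3.9.2 and later.
print(parse_qsl(qs, separator=";"))  # [('foo', 'bar'), ('reg', 'foobar')]
```

Because `quote_plus()` encodes a literal & inside a key or value as %26, the semicolon-joined string never contains a bare & that would need escaping.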



Carl