HTML entities in URLs and urlencoding
We recently received the following bug report for the python-markdown implementation:

> The & characters are escaped in URLs. An example:
>
> [Link](http://www.site.com/?param1=value1&param2=value1)
>
> Should output:
>
> <a href="http://www.site.com/?param1=value1&param2=value1">Link</a>
>
> Currently outputs:
>
> <a href="http://www.site.com/?param1=value1&amp;param2=value1">Link</a>
>
> So the & must not be escaped!

A fix is easy, but it occurred to me that perhaps links should be urlencoded, or at least some characters in them should be: specifically the unsafe characters listed in RFC 1738 [1]. The reserved characters probably should be too when they are not used in their approved manner (e.g. a colon should only be allowed after the scheme (http://) or in the location (usr:[EMAIL PROTECTED]:port) and should be encoded anywhere else). Of course, that involves extra work.

So I went to check what other implementations do [2] and discovered that every one of them escapes the & with an HTML entity. Is there something I'm missing, or is this a bug? As far as I can tell, the &amp; breaks the query string.

[1]: http://www.rfc-editor.org/rfc/rfc1738.txt
[2]: http://babelmark.bobtfish.net/?markdown=%5BLink%5D%28http%3A%2F%2Fwww.site.com%2F%3Fparam1%3Dvalue1%26param2%3Dvalue1%29&normalize=on&src=1&dest=2

--
Waylan Limberg
[EMAIL PROTECTED]
___
Markdown-Discuss mailing list
Markdown-Discuss@six.pairlist.net
http://six.pairlist.net/mailman/listinfo/markdown-discuss
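The idea of urlencoding the unsafe characters can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions, not python-markdown code. The helper name `encode_unsafe` is hypothetical. It only handles the "unsafe" set from RFC 1738 section 2.2, plus control and non-ASCII characters. It leaves `%` and `#` alone because they already carry meaning in a link an author writes, and it does not attempt the harder job of encoding reserved characters that appear outside their approved positions. Note that it passes `&` through untouched: percent-encoding a URL is a separate step from HTML-escaping it for an attribute.

```python
# Characters RFC 1738 (section 2.2) calls "unsafe". '%' and '#' are also
# on that list but are deliberately left alone here: in a link an author
# writes, '%' usually starts an existing escape and '#' marks a fragment.
# (RFC 3986 later reclassified '~' as unreserved; this sketch follows 1738.)
UNSAFE = frozenset(' <>"{}|\\^~[]`')


def encode_unsafe(url: str) -> str:
    """Percent-encode unsafe, control and non-ASCII characters in url.

    Reserved characters (: / ? ; @ = &) pass through unchanged; deciding
    whether each one sits in an "approved" position would need a real
    URL parser and is out of scope for this sketch.
    """
    out = []
    for ch in url:
        if ch in UNSAFE or not (0x21 <= ord(ch) <= 0x7E):
            # Encode the character's UTF-8 bytes as %XX triplets.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


print(encode_unsafe("http://www.site.com/a b?x={1}&y=café"))
# http://www.site.com/a%20b?x=%7B1%7D&y=caf%C3%A9
```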
Re: HTML entities in URLs and urlencoding
On Tuesday, 1 April 2008, Waylan Limberg wrote:
> We recently received the following bug report for the python-markdown implementation:
>
> The & characters are escaped in URLs. An example:
>
> [Link](http://www.site.com/?param1=value1&param2=value1)
>
> Should output:
>
> <a href="http://www.site.com/?param1=value1&param2=value1">Link</a>

No, it shouldn't. This is invalid (x)HTML.

> Currently outputs:
>
> <a href="http://www.site.com/?param1=value1&amp;param2=value1">Link</a>

Which is valid (x)HTML!

> So the & must not be escaped!

It must! See also http://htmlhelp.com/tools/validator/problems.html#amp

--
Milian Wolff
http://milianw.de
OpenPGP key: CD1D1393
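Milian's point can be checked with Python's standard library. In this sketch the helpers `make_link` and `HrefCollector` are our own, not part of python-markdown. The URL is escaped with `html.escape` when it is written into the `href` attribute. `html.parser.HTMLParser` decodes character references in attribute values much as a browser does, so it recovers the original `&`. The `&amp;` exists only in the markup; the URL that actually gets followed is unchanged.

```python
import html
from html.parser import HTMLParser


def make_link(url: str, text: str) -> str:
    """Build an <a> element, escaping the URL for use inside a quoted attribute."""
    # quote=True also escapes '"' and "'", so the URL cannot break out of href="...".
    return '<a href="{}">{}</a>'.format(html.escape(url, quote=True),
                                        html.escape(text, quote=False))


class HrefCollector(HTMLParser):
    """Collect href values; HTMLParser unescapes &amp; and friends in attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")


url = "http://www.site.com/?param1=value1&param2=value1"
markup = make_link(url, "Link")
print(markup)
# <a href="http://www.site.com/?param1=value1&amp;param2=value1">Link</a>

parser = HrefCollector()
parser.feed(markup)
print(parser.hrefs[0] == url)  # True: the decoded URL keeps its plain '&'
```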
Re: HTML entities in URLs and urlencoding
On Mon, Mar 31, 2008 at 9:53 PM, Milian Wolff [EMAIL PROTECTED] wrote:
[snip]
> > So the & must not be escaped!
>
> It must! See also http://htmlhelp.com/tools/validator/problems.html#amp

D'oh! I knew that. Of course, I had just typed a URL with an &amp; into my address bar and it didn't work. Now I feel like an idiot. Thanks for bringing me back.

--
Waylan Limberg
[EMAIL PROTECTED]
On ampersands in query strings (was: HTML entities in URLs and urlencoding)
* Waylan Limberg [EMAIL PROTECTED] [2008-04-01 03:50]:
> As far as I can tell, the &amp; breaks the query string.

No, it doesn't, as you found out.

However, on a tangential note: if you write web apps, *please* make sure that you support the semicolon as a query parameter separator as well as the ampersand:
http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.2

More importantly, please **please** make sure that the URIs your code generates use semicolons rather than ampersands. Semicolons need not be escaped in HTML and XML, so users who copy and paste those URIs are much less likely to produce invalid markup, regardless of the context they're working in.

Even though this W3C recommendation is over a decade old, the use of ampersands in query strings persists. (In fact, PHP not only does not emit URIs with semicolon-separated query strings; by default it cannot even parse them! You need to set an unbreak-me config option to make it recognise the semicolon.)

Regards,
--
Aristotle Pagaltzis // http://plasmasturm.org/
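Aristotle's two requests (accept both separators when parsing, emit semicolons when generating) might look like this in Python. This is only a sketch. `parse_query` and `build_query` are hypothetical helper names, and the parser is hand-rolled because recent versions of the standard library's `urllib.parse.parse_qsl` split only on `&` by default. The last check shows why the semicolon form is convenient: escaping it for HTML leaves it unchanged.

```python
import html
import re
from urllib.parse import quote_plus, unquote_plus


def parse_query(qs: str) -> list:
    """Split a query string on either '&' or ';' and decode each name/value pair."""
    pairs = []
    for part in re.split(r"[&;]", qs):
        if not part:
            continue  # tolerate empty segments, e.g. a trailing separator
        name, _, value = part.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


def build_query(pairs, sep: str = ";") -> str:
    """Join pairs with ';' by default; quote_plus encodes literal ';' or '&' in values."""
    return sep.join(quote_plus(k) + "=" + quote_plus(v) for k, v in pairs)


amp_style = "param1=value1&param2=value1"
semi_style = "param1=value1;param2=value1"
print(parse_query(amp_style) == parse_query(semi_style))  # True

qs = build_query([("param1", "value1"), ("param2", "a;b")])
print(qs)  # param1=value1;param2=a%3Bb

url = "http://www.site.com/?" + build_query([("param1", "value1"), ("param2", "value1")])
print(html.escape(url) == url)  # True: nothing to escape in HTML or XML
```

Because a literal `;` inside a value is percent-encoded by `quote_plus`, splitting on both separators stays unambiguous for URIs produced this way.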