HTML entities in URLs and urlencoding

2008-03-31 Thread Waylan Limberg
We recently received the following bug report for the python-markdown
implementation:

 The & are escaped in URLs.

 An example:
 [Link](http://www.site.com/?param1=value1&param2=value1)

 Should output:
 <a href="http://www.site.com/?param1=value1&param2=value1">Link</a>

 Currently outputs:
 <a href="http://www.site.com/?param1=value1&amp;param2=value1">Link</a>

 So the & must not be escaped!

A fix is easy, but it occurred to me that perhaps links should be
urlencoded -- at least some chars should be. Specifically the unsafe
chars listed in RFC 1738 [1]. The reserved chars probably should too
when not used in their approved manner (i.e.: A colon should only be
allowed after the scheme (http://) or in the location
(usr:[EMAIL PROTECTED]:port) but should be encoded anywhere else). Of course,
that involves extra work. So I went to check what other
implementations do [2] and discovered that every one escapes with html
entities. Is there something I'm missing or is this a bug? As far as I
can tell, the &amp; breaks the query string.

[1]: http://www.rfc-editor.org/rfc/rfc1738.txt
[2]: http://babelmark.bobtfish.net/?markdown=%5BLink%5D%28http%3A%2F%2Fwww.site.com%2F%3Fparam1%3Dvalue1%26param2%3Dvalue1%29&normalize=on&src=1&dest=2
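
As a rough illustration of the urlencoding idea above (this is not what
python-markdown does), the sketch below percent-encodes only characters that
are neither unreserved nor reserved, which covers most of the "unsafe" list in
RFC 1738. The names `SAFE` and `encode_unsafe` are hypothetical, and the
sketch does not attempt the harder part of deciding whether a reserved
character is being used in its approved position.

```python
from urllib.parse import quote

# Hypothetical helper, not part of python-markdown.
# Reserved characters and "%" are treated as safe, so a URL that is
# already valid (or already percent-encoded) passes through unchanged.
SAFE = ":/?#@!$&'()*+,;=%"


def encode_unsafe(url: str) -> str:
    """Percent-encode characters such as space, <, >, and double quotes."""
    return quote(url, safe=SAFE)


print(encode_unsafe("http://www.site.com/a b?x=<1>&y=2"))
print(encode_unsafe("http://www.site.com/?param1=value1&param2=value1"))
```

Note that the ampersand is left alone here: percent-encoding is a separate
step from the HTML escaping that happens when the URL is written into an
href attribute.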
-- 

Waylan Limberg
[EMAIL PROTECTED]
___
Markdown-Discuss mailing list
Markdown-Discuss@six.pairlist.net
http://six.pairlist.net/mailman/listinfo/markdown-discuss


Re: HTML entities in URLs and urlencoding

2008-03-31 Thread Milian Wolff
On Tuesday, 1 April 2008, Waylan Limberg wrote:
> We recently received the following bug report for the python-markdown
> implementation:
>
> > The & are escaped in URLs.
> >
> > An example:
> > [Link](http://www.site.com/?param1=value1&param2=value1)
> >
> > Should output:
> > <a href="http://www.site.com/?param1=value1&param2=value1">Link</a>

No it shouldn't. This is invalid (x)HTML.

> > Currently outputs:
> > <a href="http://www.site.com/?param1=value1&amp;param2=value1">Link</a>

Which is valid (x)HTML!

> > So the & must not be escaped!

It must! See also http://htmlhelp.com/tools/validator/problems.html#amp
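
A minimal sketch of why the escaped form is harmless, using only the Python
standard library: the ampersand is written as &amp; in the attribute, and an
HTML parser turns it back into a plain & before the URL is ever used. The
names `render_link` and `HrefCollector` are made up for this example and are
not python-markdown internals.

```python
from html import escape
from html.parser import HTMLParser


def render_link(url: str, text: str) -> str:
    """Build an anchor tag, escaping the URL for use in an attribute."""
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(text))


class HrefCollector(HTMLParser):
    """Collect href values the way a browser would see them."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Attribute values arrive with entities already decoded.
            self.hrefs.extend(value for name, value in attrs if name == "href")


url = "http://www.site.com/?param1=value1&param2=value1"
markup = render_link(url, "Link")

parser = HrefCollector()
parser.feed(markup)
parser.close()

print(markup)        # the markup contains &amp;
print(parser.hrefs)  # the parsed href contains a plain &
```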



-- 
Milian Wolff
http://milianw.de
OpenPGP key: CD1D1393




Re: HTML entities in URLs and urlencoding

2008-03-31 Thread Waylan Limberg
On Mon, Mar 31, 2008 at 9:53 PM, Milian Wolff [EMAIL PROTECTED] wrote:
[snip]

> > So the & must not be escaped!
>
> It must! See also http://htmlhelp.com/tools/validator/problems.html#amp


D'oh! I knew that. Of course, I just typed a URL with an &amp; in my
address bar and it didn't work. Now I feel like an idiot. Thanks for
bringing me back.


-- 

Waylan Limberg
[EMAIL PROTECTED]


On ampersands in query strings (was: HTML entities in URLs and urlencoding)

2008-03-31 Thread Aristotle Pagaltzis
* Waylan Limberg [EMAIL PROTECTED] [2008-04-01 03:50]:
> As far as I can tell, the &amp; breaks the query string.

No, it doesn’t, as you found out.

However, on a tangential note: if you write web apps, *please*
make sure that you support the semicolon as a query parameter
separator as well as the ampersand:

http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.2

More importantly, please **please** make sure that the URIs your
code generates use semicolons rather than ampersands. Semicolons
need not be escaped in HTML and XML, which makes copy-pasting
users much less likely to produce invalid markup regardless of
the context they’re working in.

Even though this W3C recommendation is over a decade old, use of
ampersands in query strings persists. (In fact, PHP not only does
not emit URIs with semicolon-separated query strings, by default
it cannot even parse them! You need to set an unbreak-me config
option to make it recognise the semicolon.)
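
For what it's worth, accepting both separators on the server side and
emitting semicolons takes only a few lines. The following is a sketch in
Python rather than any particular framework's API; `parse_query` and
`build_query` are hypothetical names, and the splitting is done by hand so
the example does not depend on the default separator of any library.

```python
import re
from urllib.parse import quote_plus, unquote_plus


def parse_query(query: str) -> list:
    """Split a query string on either '&' or ';' and decode each pair."""
    pairs = []
    for field in re.split(r"[&;]", query):
        if not field:
            continue  # skip empty fields such as a trailing separator
        name, _, value = field.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


def build_query(pairs) -> str:
    """Join pairs with ';' so the result needs no escaping in HTML or XML."""
    return ";".join(
        "{}={}".format(quote_plus(name), quote_plus(value))
        for name, value in pairs
    )


print(parse_query("param1=value1;param2=value1"))
print(parse_query("param1=value1&param2=value1"))
print(build_query([("param1", "value1"), ("param2", "value1")]))
```

Because `quote_plus` percent-encodes any literal ';' or '&' inside a name or
value, the separators in the generated string stay unambiguous.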

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/