Re: Markdown doesn't always generate XHTML
Rad Geek wrote:
> Ulf Ochsenfahrt wrote:
> > And worse, people are not made aware of this fact.
>
> Made aware of what? John Gruber's documentation is certainly quite explicit that Markdown allows for raw HTML; that's part of the point of Markdown, as opposed to other plaintext syntaxes that try to replace HTML entirely. If you expect it to be something it's not (e.g. a validating producer or a sanitizer) then you'll no doubt be disappointed, but I don't think it's fair to claim that Markdown implementers are the ones leading you to expect some other kind of behavior than what you get.

Apparently, I attribute a different meaning to the word 'explicit'.

First of all, the main page on Daring Fireball says:

> Markdown is a text-to-HTML conversion tool for web writers. Markdown
> allows you to write using an easy-to-read, easy-to-write plain text
> format, then convert it to structurally valid *XHTML* (or HTML).

(Emphasis mine.) That appears to be quite a strong statement if you ask me.

On the Syntax page, it says that it allows inline HTML. But it does not say that this is potentially dangerous. True, it doesn't say that the inlined HTML is sanity checked, either. However, it does list HTML tag names, and it even goes so far as saying that markdown does process text inside span-level tags, so it must be aware of them to some extent at least.

I guess I should have researched more thoroughly before I started using markdown for that forum. But I politely disagree with you when you say that 'John Gruber's documentation is quite explicit' that markdown is dangerous (the word you chose was 'inappropriate') when used in this context.

Cheers,

-- Ulf

___
Markdown-Discuss mailing list
Markdown-Discuss@six.pairlist.net
http://six.pairlist.net/mailman/listinfo/markdown-discuss
Re: Markdown doesn't always generate XHTML
Ulf Ochsenfahrt wrote:
> Yes, there are situations where all document authors are trusted (authentication isn't trust though), but the fact remains that this makes markdown completely unusable for anything else.

Ulf,

No, it doesn't. All it does is make Markdown *alone* inappropriate for content generated by untrusted users. But that shouldn't be surprising. Markdown is designed to work as a preprocessor, not as an alternative to HTML or as a sanitizer. If you need an HTML sanitizer, there are lots of them available, and there should be nothing stopping you from using Markdown in order to generate the HTML and then an appropriate second tool to sanitize it:

    $body = Markdown($source);
    $body = WhitelistBasedFilter($body);

In fact that's precisely what a lot of Markdown consumers (e.g. WordPress with PHP Markdown turned on for comments) do.

> And worse, people are not made aware of this fact.

Made aware of what? John Gruber's documentation is certainly quite explicit that Markdown allows for raw HTML; that's part of the point of Markdown, as opposed to other plaintext syntaxes that try to replace HTML entirely. If you expect it to be something it's not (e.g. a validating producer or a sanitizer) then you'll no doubt be disappointed, but I don't think it's fair to claim that Markdown implementers are the ones leading you to expect some other kind of behavior than what you get.

-C
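The second step of the pipeline in the message above (filter the generated HTML against a whitelist) could be sketched in Python roughly as follows. This is only an illustration, not the filter WordPress or any Markdown implementation actually uses: the names `ALLOWED_TAGS`, `WhitelistFilter` and `whitelist_based_filter` are made up here, the tag list is an arbitrary assumption, and the sketch takes the simplest possible policy of dropping every attribute (so links would lose their `href`). It assumes the Markdown conversion has already produced the HTML string.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration; a real site would pick its own.
ALLOWED_TAGS = {"p", "em", "strong", "code", "pre", "blockquote", "ul", "ol", "li"}


class WhitelistFilter(HTMLParser):
    """Re-emit only whitelisted tags (without attributes) and escaped text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Attributes are dropped entirely, so onclick= and friends never survive.
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always escaped again; text inside a dropped tag stays as text.
        self.out.append(escape(data))


def whitelist_based_filter(html_text):
    """Return html_text with every non-whitelisted tag and all attributes removed."""
    parser = WhitelistFilter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    generated = '<p>Hi <script>alert(1)</script><em onclick="evil()">there</em></p>'
    print(whitelist_based_filter(generated))
```

The point of the design is that the filter rebuilds the output from what it recognises instead of trying to delete what it considers dangerous, so anything it does not know about is dropped by default.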
Re: Markdown doesn't always generate XHTML
On Sat, Mar 15, 2008 at 3:49 AM, Ulf Ochsenfahrt <[EMAIL PROTECTED]> wrote:
> Waylan Limberg wrote:
> > Regarding the security issues, I understand your concerns, but there
> > are some situations where all document authors are trusted
> > (authenticated) users and have a legitimate need for that feature. We
> > can't cut them off for everyone else. However, I know that
> > Python-Markdown has an option to not allow any html in a document
> > (this "safe_mode" can be set to either replace with a customizable
> > message, remove completely, or escape the html). Of course, to stay in
> > line with the Markdown standard, it is off by default, but very easy
> > to turn on in your code. Other implementations may offer a similar
> > option.
>
> Yes, there are situations where all document authors are trusted
> (authentication isn't trust though), but the fact remains that this
> makes markdown completely unusable for anything else. And worse, people
> are not made aware of this fact. I only encountered this by coincidence,
> because one of my users entered what looked like html tags into the forum.
>
> In summary:
> Markdown wasn't designed to handle this situation. Some implementations
> provide a 'safe mode' which aims to filter the code either before or
> after markdown conversion.
>
> Markdownj (Java, which I've been using) doesn't provide such an option.
>
> Markdown.pl doesn't provide such an option.
>
> Nanoki tries to, and fails (see related mail by Michel Fortin) on:
>
> alert("Hello world!")
>
> PHP Markdown has something like this, and it has to be enabled in the
> source (?). It fails when no_markup=true and no_entities=false on:
>

You see, Maruku is used inside Jacques Distler's math-enabled branch of Instiki [1] which outputs well-formed XHTML + MathML + SVG. You can't really leave anything to chance. If there is only one error somewhere, the document does not validate and therefore it does not render.

Maruku's treatment of raw XML is that it requires it to be well-formed XML, with some user-friendly exceptions inspired by HTML (user doesn't have to close ,, etc.). If it isn't well formed, it triggers an error (nicely displayed on stderr, or intercepted by API). Parsing goes on, but for convenience it outputs the error in the document (see above; can be hidden by CSS).

Jacques also did some work on sanitizing the XHTML document, but this logically happens after Maruku.

[1]: http://golem.ph.utexas.edu/instiki/show/HomePage

-- 
Andrea Censi
PhD student, Control & Dynamical Systems, Caltech
http://www.cds.caltech.edu/~andrea/
"Life is too important to be taken seriously" (Oscar Wilde)
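The raw-XML policy described above (pass well-formed fragments through, report anything else on stderr and show the error in the document) might look something like this in Python. It is a loose sketch and not Maruku's code, which is written in Ruby: the function name `check_raw_xml` and the CSS class `markdown-html-error` are invented for the example, and it does not implement the HTML-inspired exceptions for unclosed tags.

```python
import sys
import xml.etree.ElementTree as ET
from html import escape


def check_raw_xml(fragment):
    """Return the fragment unchanged if it is well-formed XML, else an error box."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError as err:
        # Report on stderr, then keep going by emitting a visible error block.
        print(f"raw XML is not well-formed: {err}", file=sys.stderr)
        return (
            '<pre class="markdown-html-error">'
            + escape(f"XML error: {err}\n{fragment}")
            + "</pre>"
        )
    return fragment


if __name__ == "__main__":
    print(check_raw_xml("<em>fine</em>"))
    print(check_raw_xml("<em>broken"))
```

Because the offending fragment is escaped inside the error block, the surrounding document stays well-formed even when the input is not.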
Re: Javascript in URLs (was: Markdown doesn't always generate XHTML)
On 2008-03-15 at 0:39, Waylan Limberg wrote:

> On Fri, Mar 14, 2008 at 11:22 PM, Michel Fortin wrote:
> > PHP Markdown also has a no-markup mode which would filter script tags and any other HTML tags. But this doesn't prevent anyone from inserting their own script on the page. Do you know you can inject a script in a URL? Guess what this does:
> >
> >     [link](javascript:alert%28'Hello%20world!'%29)
>
> This is a good point, and something I hadn't thought about myself. I would think that markdown should *not* allow that regardless of any safe/no-markup/whatever-you-call-it mode. If someone legitimately wants javascript in their links/images/etc then they should be writing raw html. What do you think?

Well if you want your "safe" mode to be really safe, then sure you should not allow `javascript:` URIs indeed. But in general I believe Markdown should work with any URI. Markdown is a means of writing web documents of all kinds, not only content from external untrusted sources, and there are many legitimate reasons one would want to write a `javascript:` URI. Why would you want a "non-safe" Markdown to disallow such URIs in its link syntax if we're going to be able to add them using HTML tags anyway?

> Of course, then how do we do that? Some possibilities I came up with without much thought:
>
> 1. Truncate a url at "javascript:"
> 2. Completely remove the entire url (perhaps replace with blank string or "#")
> 3. Leave the markup for the entire link as plain text (in other words - it's not considered a match)
> 4. Do some kind of escaping (not sure what at this point) and leave it in the url

Whatever you do, you first have to detect script URIs, all of them; this is no trivial matter.
Most of these will run a script in IE or some other browser (based on the [XSS cheat sheet][1]):

    [link](vbscript:msgbox%28%22Hello%20world!%22%29)
    [link](livescript:alert%28'Hello%20world!'%29)
    [link](mocha:[code])
    [link](jAvAsCrIpT:alert%28'Hello%20world!'%29)
    [link](ja vas cr ipt:alert%28'Hello%20world!'%29)
    [link](ja vas cr ipt:alert%28'Hello%20world!'%29)
    [link](ja vas cr ipt:alert%28'Hello%20world!'%29)
    [link](ja%09 %0Avas cr ipt:alert%28'Hello %20world!'%29)
    [link](ja%20vas%20cr%20ipt:alert%28'Hello%20world!'%29)
    [link](live%20script:alert%28'Hello%20world!'%29)

I can't claim this is an exhaustive list, nor that they're all going to work, but it should give an idea of the problem at hand.

I think blacklisting known dangerous schemes is always going to leave holes. A better approach is to have a white list of known "safe" URI schemes and disallow any scheme not in that list. But that would be utterly restrictive for any "non-safe" Markdown. Security filters already exist to do that (like kses); I'd say it's much simpler *and* safer to use such a specialized filter on Markdown's output than trying to come up with our own integrated within Markdown.

[1]: http://ha.ckers.org/xss.html

Michel Fortin
[EMAIL PROTECTED]
http://michelf.com/
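To make the whitelist idea above concrete, here is a minimal Python sketch of a scheme check. It is not what kses or any Markdown implementation does: the function name `is_safe_url` and the contents of `SAFE_SCHEMES` are assumptions chosen for the example. It normalises the URL fairly aggressively (percent-decoding, HTML entity decoding, removal of whitespace and control characters, lowercasing) before looking at the scheme, on the theory that rejecting too much is the safer failure for untrusted input.

```python
import html
import re
from urllib.parse import unquote

# Hypothetical whitelist; which schemes count as "safe" is a per-site decision.
SAFE_SCHEMES = {"http", "https", "ftp", "mailto"}


def is_safe_url(url):
    """Return True if url is a relative reference or uses a whitelisted scheme."""
    # Undo percent-escapes and HTML entities that could hide a scheme name.
    decoded = html.unescape(unquote(url))
    # Drop whitespace and control characters, then compare case-insensitively.
    cleaned = re.sub(r"[\x00-\x20\x7f]", "", decoded).lower()
    # Only the part before the first "/", "?" or "#" can contain a scheme.
    head = re.split(r"[/?#]", cleaned, maxsplit=1)[0]
    if ":" not in head:
        return True  # no scheme at all, so a relative reference
    scheme = head.split(":", 1)[0]
    return scheme in SAFE_SCHEMES


if __name__ == "__main__":
    for candidate in [
        "http://example.com/",
        "javascript:alert%28'Hello%20world!'%29",
        "jAvAsCrIpT:alert%28'Hello%20world!'%29",
        "ja%09 %0Avas cr ipt:alert%28'Hello %20world!'%29",
        "live%20script:alert%28'Hello%20world!'%29",
    ]:
        print(candidate, is_safe_url(candidate))
```

The check only answers yes or no; what to do with a rejected link (drop the URL, leave the markup as plain text, and so on) is the separate policy question raised in the numbered list earlier in this message. A browser may normalise URLs differently from this function, so it should be read as an illustration of the approach and not as a complete defence.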
Re: Markdown doesn't always generate XHTML
Waylan Limberg wrote:
> Regarding the security issues, I understand your concerns, but there are some situations where all document authors are trusted (authenticated) users and have a legitimate need for that feature. We can't cut them off for everyone else. However, I know that Python-Markdown has an option to not allow any html in a document (this "safe_mode" can be set to either replace with a customizable message, remove completely, or escape the html). Of course, to stay in line with the Markdown standard, it is off by default, but very easy to turn on in your code. Other implementations may offer a similar option.

Yes, there are situations where all document authors are trusted (authentication isn't trust though), but the fact remains that this makes markdown completely unusable for anything else. And worse, people are not made aware of this fact. I only encountered this by coincidence, because one of my users entered what looked like html tags into the forum.

In summary:
Markdown wasn't designed to handle this situation. Some implementations provide a 'safe mode' which aims to filter the code either before or after markdown conversion.

Markdownj (Java, which I've been using) doesn't provide such an option.

Markdown.pl doesn't provide such an option.

Nanoki tries to, and fails (see related mail by Michel Fortin) on:

PHP Markdown has something like this, and it has to be enabled in the source (?). It fails when no_markup=true and no_entities=false on: