Re: Markdown doesn't always generate XHTML

2008-03-17 Thread Ulf Ochsenfahrt

Rad Geek wrote:
> Ulf Ochsenfahrt wrote:
> > And worse, people are not made aware of this fact.
>
> Made aware of what? John Gruber's documentation is certainly quite
> explicit that Markdown allows for raw HTML; that's part of the point of
> Markdown, as opposed to other plaintext syntaxes that try to replace
> HTML entirely. If you expect it to be something it's not (e.g. a
> validating producer or a sanitizer) then you'll no doubt be
> disappointed, but I don't think it's fair to claim that Markdown
> implementers are the ones leading you to expect some other kind of
> behavior than what you get.


Apparently, I attribute a different meaning to the word 'explicit'.
First of all, the main page on Daring Fireball says:


> Markdown is a text-to-HTML conversion tool for web writers. Markdown
> allows you to write using an easy-to-read, easy-to-write plain text
> format, then convert it to structurally valid *XHTML* (or HTML).

(Emphasis mine.) That appears to be quite a strong statement if you ask me.

On the Syntax page, it says that it allows inline HTML. But it does not
say that this is potentially dangerous. True, it doesn't say that the
inlined HTML is sanity checked, either.


However, it does list HTML tag names, and it even goes so far as saying
that markdown does process text inside span-level tags, so it must be
aware of them to some extent at least.



I guess I should have researched more thoroughly before I started using 
markdown for that forum. But I politely disagree with you when you say 
that 'John Gruber's documentation is quite explicit' that markdown is 
dangerous (the word you chose was 'inappropriate') when used in this 
context.


Cheers,

-- Ulf
___
Markdown-Discuss mailing list
Markdown-Discuss@six.pairlist.net
http://six.pairlist.net/mailman/listinfo/markdown-discuss


Re: Markdown doesn't always generate XHTML

2008-03-15 Thread Rad Geek

Ulf Ochsenfahrt wrote:
> Yes, there are situations where all document authors are trusted
> (authentication isn't trust though), but the fact remains that this
> makes markdown completely unusable for anything else.


Ulf,

No, it doesn't. All it does is make Markdown *alone* inappropriate for 
content generated by untrusted users. But that shouldn't be surprising. 
Markdown is designed to work as a preprocessor, not as an alternative to 
HTML or as a sanitizer. If you need an HTML sanitizer, there are lots of 
them available, and there should be nothing stopping you from using 
Markdown in order to generate the HTML and then an appropriate second 
tool to sanitize it:


$body = Markdown($source);
$body = WhitelistBasedFilter($body);

In fact that's precisely what a lot of Markdown consumers (e.g. 
WordPress with PHP Markdown turned on for comments) do.
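
The two-step pipeline above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not a vetted sanitizer: the `ALLOWED_TAGS` set and the `whitelist_based_filter` name are hypothetical, the input is assumed to be HTML already produced by some Markdown converter, and all attributes are dropped to keep the sketch small.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist; a real deployment would choose its own.
ALLOWED_TAGS = {"p", "em", "strong", "code", "pre", "blockquote", "ul", "ol", "li"}


class WhitelistFilter(HTMLParser):
    """Re-emit allowed tags without attributes, drop every other tag,
    and drop the contents of <script> and <style> entirely."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # attributes are deliberately discarded

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Text is re-escaped so it can never be read back as markup.
            self.out.append(escape(data, quote=False))


def whitelist_based_filter(html_text):
    """Second stage of the pipeline: filter HTML that Markdown produced."""
    parser = WhitelistFilter()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

Because attributes are discarded, this sketch also strips `href` from links; an established filter would instead keep selected attributes and validate their values.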


> And worse, people are not made aware of this fact.

Made aware of what? John Gruber's documentation is certainly quite 
explicit that Markdown allows for raw HTML; that's part of the point of 
Markdown, as opposed to other plaintext syntaxes that try to replace 
HTML entirely. If you expect it to be something it's not (e.g. a 
validating producer or a sanitizer) then you'll no doubt be 
disappointed, but I don't think it's fair to claim that Markdown 
implementers are the ones leading you to expect some other kind of 
behavior than what you get.


-C




Re: Markdown doesn't always generate XHTML

2008-03-15 Thread Andrea Censi
On Sat, Mar 15, 2008 at 3:49 AM, Ulf Ochsenfahrt <[EMAIL PROTECTED]> wrote:
> Waylan Limberg wrote:
>  > Regarding the security issues, I understand your concerns, but there
>  > are some situations where all document authors are trusted
>  > (authenticated) users and have a legitimate need for that feature. We
>  > can't cut them off for everyone else. However, I know that
>  > Python-Markdown has an option to not allow any html in a document
>  > (this "safe_mode" can be set to either replace with a customizable
>  > message, remove completely, or escape the html). Of course, to stay in
>  > line with the Markdown standard, it is off by default, but very easy
>  > to turn on in your code. Other implementations may offer a similar
>  > option.
>
>  Yes, there are situations where all document authors are trusted
>  (authentication isn't trust though), but the fact remains that this
>  makes markdown completely unusable for anything else. And worse, people
>  are not made aware of this fact. I only encountered this by coincidence,
>  because one of my users entered what looked like html tags into the forum.
>
>  In summary:
>  Markdown wasn't designed to handle this situation. Some implementations
>  provide a 'safe mode' which aims to filter the code either before or
>  after markdown conversion.
>
>
>  Markdownj (Java, which I've been using) doesn't provide such an option.
>
>  Markdown.pl doesn't provide such an option.
>
>  Nanoki tries to, and fails (see related mail by Michel Fortin) on:
>
>   alert("Hello world!")
>  
>
>  PHP Markdown has something like this, and it has to be enabled in the
>  source (?). It fails when no_markup=true and no_entities=false on:
>  

You see, Maruku is used inside Jacques Distler's math-enabled branch
of Instiki [1] which outputs well-formed XHTML + MathML + SVG. You
can't really leave anything to chance. If there is only one error
somewhere, the document does not validate and therefore it does not
render.

Maruku's treatment of raw XML is that it requires it to be well-formed
XML, with some user-friendly exceptions inspired by HTML (user doesn't
have to close ,, etc.).
If it isn't well formed,  it triggers an error (nicely displayed on
stderr, or intercepted by API). Parsing goes on, but for convenience
it outputs the error in the document (see above; can be hidden by
CSS).
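
The behaviour described above (require well-formed XML, report an error, keep going, and show the error in the output) can be sketched roughly as follows. This is not Maruku's code: the `check_raw_xml` name and the `markdown-error` class are hypothetical, and the HTML-inspired exceptions for unclosed elements are not modelled.

```python
from html import escape
import xml.etree.ElementTree as ET


def check_raw_xml(fragment):
    """Return (output, error).

    If the fragment parses as XML it is passed through unchanged and
    error is None. Otherwise the output is a visible, escaped error
    block (which a stylesheet could hide) and error is the message.
    """
    try:
        ET.fromstring(fragment)
    except ET.ParseError as exc:
        message = f"Malformed XML: {exc}"
        block = (
            '<pre class="markdown-error">'
            f"{escape(message)}\n{escape(fragment)}"
            "</pre>"
        )
        return block, message
    return fragment, None
```

A caller would log the returned message (for example to stderr) and continue with the next block, so that one bad fragment does not abort the whole conversion.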

Jacques also did some work on sanitizing the XHTML document, but this
logically happens after Maruku.

[1]: http://golem.ph.utexas.edu/instiki/show/HomePage

-- 
Andrea Censi
PhD student, Control & Dynamical Systems, Caltech
http://www.cds.caltech.edu/~andrea/
 "Life is too important to be taken seriously" (Oscar Wilde)


Re: Javascript in URLs (was: Markdown doesn't always generate XHTML)

2008-03-15 Thread Michel Fortin

On 2008-03-15 at 0:39, Waylan Limberg wrote:

> On Fri, Mar 14, 2008 at 11:22 PM, Michel Fortin wrote:
>
> > PHP Markdown also has a no-markup mode which would filter script tags
> > and any other HTML tags. But this doesn't prevent anyone from
> > inserting their own script on the page. Do you know you can inject a
> > script in a URL? Guess what this does:
> >
> > [link](javascript:alert%28'Hello%20world!'%29)
>
> This is a good point, and something I hadn't thought about myself. I
> would think that markdown should *not* allow that regardless of any
> safe/no-markup/whatever-you-call-it mode. If someone legitimately
> wants javascript in their links/images/etc then they should be writing
> raw html. What do you think?


Well, if you want your "safe" mode to be really safe, then sure, you
should not allow `javascript:` URIs.


But in general I believe Markdown should work with any URI. Markdown
is a means of writing web documents of all kinds, not only content from
external untrusted sources, and there are many legitimate reasons one
would want to write a `javascript:` URI.


Why would you want a "non-safe" Markdown to disallow such URIs in its  
link syntax if we're going to be able to add them using HTML tags  
anyway?




> Of course, then how do we do that? Some possibilities I came up with
> without much thought:
>
> 1. Truncate a URL at "javascript:"
> 2. Completely remove the entire URL (perhaps replace with blank
>    string or "#")
> 3. Leave the markup for the entire link as plain text (in other words,
>    it's not considered a match)
> 4. Do some kind of escaping (not sure what at this point) and leave it
>    in the URL


Whatever you do, you first have to detect script URIs, all of them;
this is no trivial matter. Most of these will run a script in IE or
some other browser (based on the [XSS cheat sheet][1]):


[link](vbscript:msgbox%28%22Hello%20world!%22%29)
[link](livescript:alert%28'Hello%20world!'%29)
[link](mocha:[code])
[link](jAvAsCrIpT:alert%28'Hello%20world!'%29)
[link](ja vas cr ipt:alert%28'Hello%20world!'%29)
[link](ja vas cr ipt:alert%28'Hello%20world!'%29)
[link](ja vas cr ipt:alert%28'Hello%20world!'%29)
[link](ja%09 %0Avas cr
ipt:alert%28'Hello 
%20world!'%29)

[link](ja%20vas%20cr%20ipt:alert%28'Hello%20world!'%29)
[link](live%20script:alert%28'Hello%20world!'%29)

I can't claim this is an exhaustive list, nor that they're all going  
to work, but it should give an idea of the problem at hand.
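
To make the detection problem concrete, here is a small sketch of a deliberately naive check that only knows the literal prefix `javascript:`. The function name and the sample list are illustrative (the samples are taken from the list above), and whether each variant actually runs in a given browser is not verified here.

```python
def naive_is_script_uri(uri):
    """Blacklist check that only recognises the exact prefix 'javascript:'."""
    return uri.startswith("javascript:")


samples = [
    "javascript:alert%28'Hello%20world!'%29",
    "jAvAsCrIpT:alert%28'Hello%20world!'%29",
    "vbscript:msgbox%28%22Hello%20world!%22%29",
    "ja%20vas%20cr%20ipt:alert%28'Hello%20world!'%29",
]

# Every sample the naive check fails to flag.
missed = [uri for uri in samples if not naive_is_script_uri(uri)]
```

Only the first sample is caught; the mixed-case spelling, the other scheme, and the percent-encoded spelling all end up in `missed`.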


I think blacklisting known dangerous schemes is always going to leave
holes. A better approach is to have a white list of known "safe" URI
schemes and disallow any scheme not in that list. But that would be utterly
restrictive for any "non-safe" Markdown.


Security filters already exist to do that (like kses); I'd say it's
much simpler *and* safer to use such a specialized filter on
Markdown's output than trying to come up with our own integrated within
Markdown.
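
Purely to illustrate the white-list idea, a minimal Python sketch might look like the following. The `SAFE_SCHEMES` set and the `is_allowed_uri` name are assumptions made for this example, and the normalization is intentionally stricter than what a browser does. It is not a replacement for an established filter applied to Markdown's output.

```python
import re
from html import unescape
from urllib.parse import unquote

SAFE_SCHEMES = {"http", "https", "mailto", "ftp"}  # hypothetical allowlist

# Everything before the first ':' counts as a scheme, provided no
# '/', '?' or '#' comes first (otherwise it is a relative reference).
_SCHEME_RE = re.compile(r"^([^:/?#]*):")


def is_allowed_uri(uri):
    """Return True for relative references and for allowlisted schemes."""
    # Decode character references and percent-escapes, then drop
    # whitespace and control characters, so that obfuscated spellings
    # are all compared in one normal form.
    decoded = unquote(unescape(uri))
    cleaned = re.sub(r"[\x00-\x20\x7f]", "", decoded)
    match = _SCHEME_RE.match(cleaned)
    if match is None:
        return True  # no scheme, e.g. "/about" or "#top"
    return match.group(1).lower() in SAFE_SCHEMES
```

With this check, `http://example.com/` and `/about` are accepted, while the mixed-case, percent-encoded, and `vbscript:` variants are rejected, because the decision depends on what is allowed and not on a list of known-bad spellings.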


 [1]: http://ha.ckers.org/xss.html


Michel Fortin
[EMAIL PROTECTED]
http://michelf.com/




Re: Markdown doesn't always generate XHTML

2008-03-15 Thread Ulf Ochsenfahrt

Waylan Limberg wrote:

> Regarding the security issues, I understand your concerns, but there
> are some situations where all document authors are trusted
> (authenticated) users and have a legitimate need for that feature. We
> can't cut them off for everyone else. However, I know that
> Python-Markdown has an option to not allow any html in a document
> (this "safe_mode" can be set to either replace with a customizable
> message, remove completely, or escape the html). Of course, to stay in
> line with the Markdown standard, it is off by default, but very easy
> to turn on in your code. Other implementations may offer a similar
> option.


Yes, there are situations where all document authors are trusted 
(authentication isn't trust though), but the fact remains that this 
makes markdown completely unusable for anything else. And worse, people 
are not made aware of this fact. I only encountered this by coincidence, 
because one of my users entered what looked like html tags into the forum.


In summary:
Markdown wasn't designed to handle this situation. Some implementations 
provide a 'safe mode' which aims to filter the code either before or 
after markdown conversion.



Markdownj (Java, which I've been using) doesn't provide such an option.

Markdown.pl doesn't provide such an option.

Nanoki tries to, and fails (see related mail by Michel Fortin) on:


PHP Markdown has something like this, and it has to be enabled in the source (?). It fails when no_markup=true and no_entities=false on: