Re: [Haskell-cafe] Re: the Network.URI parser

2008-05-28 Thread Peter Gammie

On 28/05/2008, at 12:28 PM, Miguel Mitrofanov wrote:

I am taking comments on a web forum from arbitrary people. The  
interpretation of the HTML occurs at the user's browser. A lot of  
people will be using outdated browsers (IE 5.5 / 6), ergo security  
(at the source) becomes my problem. I cannot force them to upgrade  
their browsers.


I think this is very wrong for two reasons. First of all, the more  
web sites cater to old browsers, the later people will upgrade them,  
which holds back progress on the Web (though IE 5.5 is not THAT  
old and bad, so this argument is not so strong). In Russia we  
sometimes say that a user with an outdated browser is an EPTH (Evil  
Pinocchio To Himself, don't ask me about the source of this term).


I am not encouraging people to stick with IE 5.5; I am trying to  
prevent them from being exploited when visiting my site. It is a  
good-faith best effort, not something I will formally prove.


Secondly, I don't think that filtering HTML coming from an arbitrary  
user is a good idea. HTML is not very human-readable and too complex  
to achieve real safety without lots of work. My suggestion is to use  
some home-grown wiki-like syntax - it's easier to enter (*bold*  
instead of <b>bold</b>), easier to read (and your users would  
sometimes read their comments before posting - to check  
correctness), and easier to process, since it can't have security  
holes you're not aware of.


Did you read my post? I validate the XHTML against a restricted  
variant of the XHTML 1.0 Strict DTD. I want to ensure that if it  
validates, it is "safe", as I explained before. I think the "style"  
attribute is unsafe, so I can remove it from the DTD. (We can simulate  
the effect of "style" by providing pre-made CSS classes and vetting  
the "class" attribute.) I am sure you can generalise from here.
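
A minimal sketch of what vetting the "class" attribute could look like, assuming a small fixed set of pre-made CSS classes. The class names and the function name below are hypothetical, chosen only for illustration; they do not come from the validator discussed in this thread.

```haskell
-- Hypothetical set of pre-made CSS classes that comments may use.
allowedClasses :: [String]
allowedClasses = ["quote", "emphasis", "small-print"]

-- A class attribute holds whitespace-separated class names.
-- Accept the attribute only if every name is on the allowlist.
-- Note: an empty attribute value is accepted, since it names no classes.
isAllowedClassAttr :: String -> Bool
isAllowedClassAttr value = all (`elem` allowedClasses) (words value)
```

With this in place, a value such as "quote emphasis" would pass, while any value naming a class outside the list would be rejected as a whole.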


As for some other kind of markup: if my users were sophisticated  
enough to use something else, then I would use it. The target audience  
is not very literate, let alone computer literate.



But you're right, we are off topic.


Sorry to reply to your post then, I couldn't resist. :-/

cheers
peter
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: the Network.URI parser

2008-05-27 Thread Miguel Mitrofanov
I am taking comments on a web forum from arbitrary people. The  
interpretation of the HTML occurs at the user's browser. A lot of  
people will be using outdated browsers (IE 5.5 / 6), ergo security  
(at the source) becomes my problem. I cannot force them to upgrade  
their browsers.


I think this is very wrong for two reasons. First of all, the more web  
sites cater to old browsers, the later people will upgrade them,  
which holds back progress on the Web (though IE 5.5 is not THAT  
old and bad, so this argument is not so strong). In Russia we  
sometimes say that a user with an outdated browser is an EPTH (Evil  
Pinocchio To Himself, don't ask me about the source of this term).


Secondly, I don't think that filtering HTML coming from an arbitrary  
user is a good idea. HTML is not very human-readable and too complex  
to achieve real safety without lots of work. My suggestion is to use  
some home-grown wiki-like syntax - it's easier to enter (*bold*  
instead of <b>bold</b>), easier to read (and your users would  
sometimes read their comments before posting - to check correctness),  
and easier to process, since it can't have security holes you're not  
aware of.


But you're right, we are off topic.


[Haskell-cafe] Re: the Network.URI parser

2008-05-27 Thread Peter Gammie

On 27/05/2008, at 6:08 PM, Neil Mitchell wrote:


It most certainly is a security flaw.


In the src of an img, yes, probably. In the href of a link, it's a
completely valid thing to do - and one that I've done loads of times.
The URI is fine, it's just the particular location that is dodgy.


Sure, but for other reasons (potential inaccessibility) I am quite  
happy to ban JavaScript from URIs. (Not all URIs, just the ones coming  
from untrusted users.)
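
A minimal sketch of such a restriction on untrusted URIs, using parseURI and uriScheme from Network.URI (shipped today in the network-uri package). The allowlist and the name isAllowedURI are assumptions made for illustration, not part of the library or of the code discussed in this thread. The idea is to accept only schemes that are explicitly allowed, rather than trying to list every dangerous one.

```haskell
import Data.Char (toLower)
import Network.URI (parseURI, uriScheme)

-- Hypothetical allowlist. uriScheme keeps the trailing colon,
-- so the entries are written as "http:" rather than "http".
allowedSchemes :: [String]
allowedSchemes = ["http:", "https:"]

-- Accept only absolute URIs whose scheme is on the allowlist.
-- Schemes are case-insensitive, so compare in lower case.
-- parseURI requires a scheme, so relative references are rejected too.
isAllowedURI :: String -> Bool
isAllowedURI s = case parseURI s of
  Nothing  -> False
  Just uri -> map toLower (uriScheme uri) `elem` allowedSchemes
```

Under these assumptions "javascript:alert(1)" is refused even though it is a valid URI according to the RFC, which is exactly the extra restriction described above.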


whole pile of dodgy URIs. Most get culled (in my case) by the HaXml
parser and/or XHTML 1.0 Strict validation, and now I hope to eliminate
the rest by carefully handling the URIs.


I don't think that's possible. A URI can validly have javascript, and
can validly be a lot of things which are unsafe.


Sure, I now realise my notion of allowable URI goes beyond (is an  
additional restriction of) the RFC.


One way to show the URI is valid is to fetch what it is pointing to,  
and ensure it is an image or whatever.


On that topic, does anyone have any good advice for handling these  
things?


My advice is that you are targeting security at the wrong level. You
shouldn't be cleaning the HTML to get a secure page, you should be
having the level that interprets the HTML be secure regardless of the
input.


I am taking comments on a web forum from arbitrary people. The  
interpretation of the HTML occurs at the user's browser. A lot of  
people will be using outdated browsers (IE 5.5 / 6), ergo security (at  
the source) becomes my problem. I cannot force them to upgrade their  
browsers.


If anyone knows of the state-of-the-art in this area, I'd appreciate a
pointer.

http://htmlpurifier.org/live/smoketests/printDefinition.php

doesn't seem to think the style attribute is unsafe. Have they not been
following the MySpace fiascos?


Safety is a property of the HTML viewer, not of the HTML or CSS.


Well, yes and no. I am heavily restricting the XHTML I accept (e.g. no  
scripts, no style attribute, ...), in an attempt to keep things  
visually accessible and avoid phishing attacks. I was alluding to the  
use of absolute positioning in CSS. If I had a CSS parser I might  
allow the style attribute.
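
As a rough illustration of this kind of restriction (no scripts, no style attribute), here is a sketch of an allowlist check over a parsed document. The Node type, the element table and the name isAcceptable are hypothetical stand-ins: the approach described in this thread validates against a restricted DTD with HaXml, which this sketch does not attempt to reproduce.

```haskell
-- Hypothetical, simplified document tree: text, or an element with
-- a name, a list of (attribute, value) pairs, and child nodes.
data Node
  = Text String
  | Element String [(String, String)] [Node]

-- Hypothetical allowlist: each permitted element with the attributes
-- it may carry. "script" and the "style" attribute are simply absent.
allowedElements :: [(String, [String])]
allowedElements =
  [ ("p",      ["class"])
  , ("em",     ["class"])
  , ("strong", ["class"])
  , ("a",      ["href", "class"])
  ]

-- A node is acceptable if its element is allowlisted, every attribute
-- is permitted for that element, and all children are acceptable.
isAcceptable :: Node -> Bool
isAcceptable (Text _) = True
isAcceptable (Element name attrs children) =
  case lookup name allowedElements of
    Nothing      -> False
    Just okAttrs ->
      all ((`elem` okAttrs) . fst) attrs && all isAcceptable children
```

Anything not named in the table is rejected by default, so a new or obscure element cannot slip through just because nobody thought to ban it. Attribute values (an href, a class) would still need their own checks.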


Safety for me involves making sure that what is displayed is  
trustworthy and easily identifiable as such. This is not something the  
HTML viewer can always help with.


I think we're off-topic enough for me to stop here. Thanks for your  
comments.


cheers
peter