[google-appengine] Re: sanitizing user submitted HTML

2009-01-28 Thread Savraj

Super -- I used Dave's technique above and it totally works -- just
copy the html5lib directory into your source tree.

-s
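You may prefer to keep copied packages like html5lib in a subfolder rather than the app root. One common pattern is to prepend that folder to sys.path before importing. The "lib" folder name and the add_vendor_dir helper are hypothetical. Copying html5lib straight into the root, as described above, needs no path changes.

```python
import os
import sys


def add_vendor_dir(base_dir, name="lib"):
    """Prepend base_dir/name to sys.path once and return that path.

    Hypothetical helper: "lib" is an assumed folder name for copied packages.
    """
    path = os.path.join(base_dir, name)
    if path not in sys.path:
        sys.path.insert(0, path)  # searched before site-packages
    return path
```

For example, call add_vendor_dir(os.path.dirname(os.path.abspath(__file__))) near the top of your main module, before import html5lib.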

On Jan 22, 12:21 pm, Chris Tan  wrote:
> Hi Dave,
>
> Html5lib looks like a well-maintained and active project.
> The stack overflow clone white-lists a subset of the default safe
> elements (e.g. no button elements) which looks alright to me.  Of
> course, I'm no expert at this, so don't quote me on that :)
>
> Thanks for sharing,
>
> Chris
>
> On Jan 21, 4:36 pm, Dave  wrote:
>
> > Thanks Chris and Alexander,
>
> > I took a look at both... from the links I also
> > found http://code.google.com/p/soclone/source/browse/trunk/soclone/utils/ht...
> > which uses html5lib. It puts a wrapper on html5lib and helped me
> > figure out how to make it work.
>
> > What is wicked cool is that what appeared to be a nightmare seems to
> > work just great. For others attempting the same thing, do this:
> > 1- get & install html5lib. Note: python manage.py install failed for
> > me so I just copied it to my project folder.
> > 2- get the code from the link above and save it as a file in your project (e.g.
> > htmlsanitize.py)
> > 3- I run the code as a clean method in my forms (e.g. def clean_comment) such
> > as below:
>
> >     def clean_comment(self):
> >         import htmlsanitize
> >         data = htmlsanitize.sanitize_html(self.cleaned_data['comment'])
> >         return data
>
> > So far so good for me.
>
> > Would love to hear 'thumbs up' or 'thumbs down' if this is a good
> > approach.
>
> > thx again
>
> > Dave
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~--~~~~--~~--~--~---



[google-appengine] Re: sanitizing user submitted HTML

2009-01-22 Thread Chris Tan

Hi Dave,

Html5lib looks like a well maintained and active project.
The stack overflow clone white-lists a subset of the default safe
elements (e.g. no button elements) which looks alright to me.  Of
course, I'm no expert at this, so don't quote me on that :)

Thanks for sharing,

Chris


On Jan 21, 4:36 pm, Dave  wrote:
> Thanks Chris and Alexander,
>
> I took a look at both... from the links I also
> found http://code.google.com/p/soclone/source/browse/trunk/soclone/utils/ht...
> which uses html5lib. It puts a wrapper on html5lib and helped me
> figure out how to make it work.
>
> What is wicked cool is that what appeared to be a nightmare seems to
> work just great. For others attempting the same thing, do this:
> 1- get & install html5lib. Note: python manage.py install failed for
> me so I just copied it to my project folder.
> 2- get the code from the link above and save it as a file in your project (e.g.
> htmlsanitize.py)
> 3- I run the code as a clean method in my forms (e.g. def clean_comment) such
> as below:
>
>     def clean_comment(self):
>         import htmlsanitize
>         data = htmlsanitize.sanitize_html(self.cleaned_data['comment'])
>         return data
>
> So far so good for me.
>
> Would love to hear 'thumbs up' or 'thumbs down' if this is a good
> approach.
>
> thx again
>
> Dave



[google-appengine] Re: sanitizing user submitted HTML

2009-01-21 Thread Dave

Thanks Chris and Alexander,

I took a look at both... from the links I also found
http://code.google.com/p/soclone/source/browse/trunk/soclone/utils/html.py
which uses html5lib. It puts a wrapper on html5lib and helped me
figure out how to make it work.

What is wicked cool is that what appeared to be a nightmare seems to
work just great. For others attempting the same thing, do this:
1- get & install html5lib. Note: python manage.py install failed for
me so I just copied it to my project folder.
2- get the code from the link above and save it as a file in your project (e.g.
htmlsanitize.py)
3- I run the code as a clean method in my forms (e.g. def clean_comment) such
as below:

    def clean_comment(self):
        # htmlsanitize.py is the soclone wrapper saved in step 2
        import htmlsanitize
        data = htmlsanitize.sanitize_html(self.cleaned_data['comment'])
        return data

So far so good for me.

Would love to hear 'thumbs up' or 'thumbs down' if this is a good
approach.

thx again

Dave
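The following sketch shows what a whitelist sanitizer does under the hood on current Python 3. It is built on the standard library's html.parser instead of html5lib, so it is a swapped-in technique and not the soclone wrapper or html5lib's own sanitizer. The tag and attribute whitelists, the allowed URL schemes, and the name whitelist_sanitize are assumptions for illustration. Allowed tags keep only whitelisted attributes, script/style elements are dropped along with their content, other tags are stripped while their text is kept, and all text is re-escaped.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative whitelists (assumptions, not html5lib's defaults).
ALLOWED_TAGS = {"a", "b", "blockquote", "br", "code", "em", "i", "img",
                "li", "ol", "p", "pre", "span", "strong", "ul"}
ALLOWED_ATTRS = {"a": {"href", "title"}, "img": {"src", "alt", "title"}}
VOID_TAGS = {"br", "img"}           # never emit a closing tag for these
DROP_CONTENT = {"script", "style"}  # drop the element and everything inside
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class WhitelistSanitizer(HTMLParser):
    """Rebuilds the input, keeping only whitelisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # strip the tag; its text is still kept by handle_data
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onclick, style, etc.
            url = value.strip().lower()
            if name in ("href", "src") and not url.startswith(SAFE_SCHEMES):
                continue  # rejects javascript:, data:, relative URLs, ...
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            # Entities were decoded by the parser; re-escape the plain text.
            self.out.append(escape(data, quote=False))

    # Comments, doctypes and processing instructions fall through to the
    # base class handlers, which do nothing, so they are dropped.


def whitelist_sanitize(fragment):
    parser = WhitelistSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

It does not balance unclosed tags, and it rejects relative URLs, so treat it as a starting point rather than a vetted sanitizer. It could be called from a clean_comment method in the same way as htmlsanitize.sanitize_html.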

Chris Tan wrote:
> Check out:
> http://feedparser.org/docs/html-sanitization.html



[google-appengine] Re: sanitizing user submitted HTML

2009-01-21 Thread Chris Tan

Check out:
http://feedparser.org/docs/html-sanitization.html





[google-appengine] Re: sanitizing user submitted HTML

2009-01-21 Thread Alexander Kojevnikov

You can port this code [1] from C# to Python, shouldn't take long. The
code is used on the StackOverflow [2] website for exactly the same
purposes as yours.

[1] http://refactormycode.com/codes/333-sanitize-html
[2] http://stackoverflow.com/
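Here is a rough Python sketch of the regex-whitelist idea. It finds every tag-like run with a regex and keeps the tag only if it fully matches one of a few whitelist patterns. The patterns and the name regex_sanitize are hypothetical and deliberately simplified. This is not a line-for-line port of the linked C# code.

```python
import re

# Any tag-like run, including one left unclosed at the end of the input.
TAG_RE = re.compile(r"<[^>]*(?:>|$)")

# Hypothetical, simplified whitelist: each pattern must match a whole tag.
WHITELIST = [
    # Formatting tags with no attributes, opening or closing.
    re.compile(r"</?(?:b|blockquote|code|em|i|li|ol|p|pre|strong|ul)>"),
    # Line breaks: <br>, <br/>, <br />.
    re.compile(r"<br\s?/?>"),
    # Links: only a quoted http(s) href built from a conservative character set.
    re.compile(r'<a href="https?://[-a-z0-9+&@#/%?=~_.:,;]+">|</a>'),
]


def regex_sanitize(html):
    """Remove every tag that does not fully match a whitelist pattern."""
    out = []
    last = 0
    for m in TAG_RE.finditer(html):
        out.append(html[last:m.start()])  # text before this tag
        tag = m.group(0).lower()          # compare case-insensitively
        if any(p.fullmatch(tag) for p in WHITELIST):
            out.append(m.group(0))        # keep the original tag
        last = m.end()
    out.append(html[last:])               # text after the last tag
    return "".join(out)
```

This approach has several weaknesses. Text inside removed tags (e.g. a script body) survives as plain text, closing tags are not balanced against opening ones, and a stray < can swallow text up to the next > or to the end of the input. A parser-based sanitizer is usually the safer route.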

On Jan 22, 9:47 am, Dave  wrote:
> There must be an easy answer for this problem and I almost feel dumb
> for asking BUT I can't figure it out and have spent too much time
> trying. The scenario is a comment/blog situation. I am using tinyMCE
> which is creating 'trustable' html. I can display this with django by
> using {{field|safe}}... all is good.
>
> The problem is some bozo will have their way with the textarea by
> turning off their javascript. So I'm trying to figure out the best way to
> sanitize the data. The normal escaping of data won't work because it
> clobbers the 'good' html from tinyMCE. Anyway, it would be good to
> sanitize even the tinyMCE-generated html.
>
> I've been looking at using the html5lib parser but can't seem to get it
> to work. I've even gone through creating a replace method to escape
> everything and then put back the 'good' tags. However, that seems like
> a roundabout way to go and gets really nasty when considering the img,
> span, etc. tags tinyMCE creates so nicely. Surely many have come
> across this and there is an easy answer.
>
> All suggestions and recommendations are greatly appreciated.
>
> thx,
>
> Dave
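The small sketch below makes the clobbering concrete. It uses Python 3's html.escape as a stand-in for the escaping Django applies when |safe is left off. The example input is made up.

```python
from html import escape

# A hypothetical comment as tinyMCE might submit it.
submitted = "<p>Hello <b>world</b></p>"

# Escaping makes it safe to display, but the reader now sees literal tags
# instead of a paragraph with a bold word: the "good" markup is clobbered too.
escaped = escape(submitted)
print(escaped)  # &lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;
```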