[google-appengine] Re: sanitizing user submitted HTML
Super. I used Dave's technique and it totally works: just copy the
html5lib directory into your source tree.

-s

On Jan 22, 12:21 pm, Chris Tan wrote:
> Hi Dave,
>
> Html5lib looks like a well-maintained and active project.
> The Stack Overflow clone whitelists a subset of the default safe
> elements (e.g. no button elements), which looks alright to me. Of
> course, I'm no expert at this, so don't quote me on that :)
>
> Thanks for sharing,
>
> Chris

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com
To unsubscribe from this group, send email to google-appengine+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en
-~--~~~~--~~--~--~---
[google-appengine] Re: sanitizing user submitted HTML
Hi Dave,

Html5lib looks like a well-maintained and active project.
The Stack Overflow clone whitelists a subset of the default safe
elements (e.g. no button elements), which looks alright to me. Of
course, I'm no expert at this, so don't quote me on that :)

Thanks for sharing,

Chris

On Jan 21, 4:36 pm, Dave wrote:
> Thanks Chris and Alexander,
>
> I took a look at both. From the links I also found
> http://code.google.com/p/soclone/source/browse/trunk/soclone/utils/ht...
> which uses html5lib. It puts a wrapper on html5lib and helped me
> figure out how to make it work.
>
> What is wicked cool is that what appeared to be a nightmare seems to
> work just great. For others attempting the same thing, do this:
>
> 1- Get and install html5lib. Note: python manage.py install failed for
>    me, so I just copied it to my project folder.
> 2- Get the code from the link above and save it as a file in your
>    project (e.g. htmlsanitize.py).
> 3- I run the code as a clean method in my forms (e.g. def clean_comment),
>    such as below:
>
>     def clean_comment(self):
>         import htmlsanitize
>         data = htmlsanitize.sanitize_html(self.cleaned_data['comment'])
>         return data
>
> So far so good for me.
>
> Would love to hear 'thumbs up' or 'thumbs down' on whether this is a
> good approach.
>
> thx again
>
> Dave
[google-appengine] Re: sanitizing user submitted HTML
Thanks Chris and Alexander,

I took a look at both. From the links I also found
http://code.google.com/p/soclone/source/browse/trunk/soclone/utils/html.py
which uses html5lib. It puts a wrapper on html5lib and helped me
figure out how to make it work.

What is wicked cool is that what appeared to be a nightmare seems to
work just great. For others attempting the same thing, do this:

1- Get and install html5lib. Note: python manage.py install failed for
   me, so I just copied it to my project folder.
2- Get the code from the link above and save it as a file in your
   project (e.g. htmlsanitize.py).
3- I run the code as a clean method in my forms (e.g. def clean_comment),
   such as below:

    def clean_comment(self):
        # htmlsanitize.py is the soclone wrapper saved in step 2
        import htmlsanitize
        # Sanitize the submitted comment before the form accepts it
        data = htmlsanitize.sanitize_html(self.cleaned_data['comment'])
        return data

So far so good for me.

Would love to hear 'thumbs up' or 'thumbs down' on whether this is a
good approach.

thx again

Dave

Chris Tan wrote:
> Check out:
> http://feedparser.org/docs/html-sanitization.html
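For anyone who wants to see the allowlist idea without pulling in html5lib, here is a minimal sketch. It is not the soclone wrapper and not html5lib's sanitizer: it swaps in the standard library's html.parser, and the tag and attribute lists and the class name are made up for illustration. Unlike html5lib it does not rebalance unclosed or stray tags, so treat it as a way to understand the technique rather than a drop-in replacement.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist for illustration; tune it to what your editor emits.
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li",
                "a", "img", "span"}
ALLOWED_ATTRS = {"a": {"href", "title"}, "img": {"src", "alt"}}
VOID_TAGS = {"br", "img"}            # tags that never get a closing tag
DROP_CONTENT_TAGS = {"script", "style"}  # drop the tag and everything inside
SAFE_URL_PREFIXES = ("http://", "https://", "/")


class AllowlistSanitizer(HTMLParser):
    """Rebuilds the markup, keeping only allowlisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onclick, style, and anything unlisted
            if name in ("href", "src"):
                url = value.strip().lower()
                if not url.startswith(SAFE_URL_PREFIXES):
                    continue  # drops javascript: and other schemes
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif (not self.skip_depth and tag in ALLOWED_TAGS
              and tag not in VOID_TAGS):
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            # Text arrives decoded, so re-escape it on the way out.
            self.out.append(escape(data, quote=False))


def sanitize_html(text):
    parser = AllowlistSanitizer()
    parser.feed(text)
    parser.close()  # flushes any text still buffered by the parser
    return "".join(parser.out)
```

The function is named sanitize_html only so that it lines up with the call in clean_comment above; the real wrapper's behaviour may differ.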
[google-appengine] Re: sanitizing user submitted HTML
Check out:
http://feedparser.org/docs/html-sanitization.html

On Jan 21, 2:47 pm, Dave wrote:
> There must be an easy answer for this problem and I almost feel dumb
> for asking, BUT I can't figure it out and have spent too much time
> trying. The scenario is a comment/blog situation. I am using tinyMCE,
> which is creating 'trustable' html. I can display this with django by
> using {{field|safe}}... all is good.
>
> The problem is some bozo will have their way with the textarea by
> turning off their javascript. So I'm trying to figure out the best way
> to sanitize the data. The normal escaping of data won't work because it
> clobbers the 'good' html from tinyMCE. Anyway, it would be good to
> sanitize even the tinyMCE-generated html.
>
> I've been looking at using the html5lib parser but can't seem to get it
> to work. I've even gone through creating a replace method to escape
> everything and then put back the 'good' tags. However, that seems like
> a round-about way to go and gets really nasty when considering the img,
> span, etc. tags tinyMCE creates so nicely. Surely many have come
> across this and there is an easy answer.
>
> All suggestions and recommendations are greatly appreciated.
>
> thx,
>
> Dave
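To make the two dead ends in Dave's question concrete, here is a small sketch using only the standard library. It assumes nothing about his actual replace method: GOOD_TAGS and escape_then_restore are hypothetical names chosen for illustration.

```python
import re
from html import escape

# Hypothetical list of attribute-free tags to put back after escaping.
GOOD_TAGS = ("p", "b", "i", "em", "strong")
_RESTORE = re.compile(r"&lt;(/?)(%s)&gt;" % "|".join(GOOD_TAGS))


def escape_then_restore(text):
    """Escape everything, then un-escape only the bare 'good' tags."""
    return _RESTORE.sub(r"<\1\2>", escape(text))


submitted = '<p>Nice <strong>post</strong>!</p>'

# Plain escaping neutralises attacks but also clobbers the editor's markup.
print(escape(submitted))

# Restoring bare tags works for simple markup...
print(escape_then_restore(submitted))

# ...but any tag carrying attributes (img, span, a) stays escaped, which is
# where this approach starts to need one special case per tag.
print(escape_then_restore('<img src="a.png">'))
```

The first print shows the whole comment turned into visible angle-bracket text, the second shows the markup surviving, and the third shows the img tag still escaped. That last case is why parsing the input and filtering against an allowlist tends to be the easier route.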
[google-appengine] Re: sanitizing user submitted HTML
You can port this code [1] from C# to Python; it shouldn't take long. The
code is used on the StackOverflow [2] website for exactly the same
purpose as yours.

[1] http://refactormycode.com/codes/333-sanitize-html
[2] http://stackoverflow.com/

On Jan 22, 9:47 am, Dave wrote:
> There must be an easy answer for this problem and I almost feel dumb
> for asking, BUT I can't figure it out and have spent too much time
> trying. The scenario is a comment/blog situation. I am using tinyMCE,
> which is creating 'trustable' html. I can display this with django by
> using {{field|safe}}... all is good.
>
> The problem is some bozo will have their way with the textarea by
> turning off their javascript. So I'm trying to figure out the best way
> to sanitize the data. The normal escaping of data won't work because it
> clobbers the 'good' html from tinyMCE. Anyway, it would be good to
> sanitize even the tinyMCE-generated html.
>
> I've been looking at using the html5lib parser but can't seem to get it
> to work. I've even gone through creating a replace method to escape
> everything and then put back the 'good' tags. However, that seems like
> a round-about way to go and gets really nasty when considering the img,
> span, etc. tags tinyMCE creates so nicely. Surely many have come
> across this and there is an easy answer.
>
> All suggestions and recommendations are greatly appreciated.
>
> thx,
>
> Dave