i80and wrote:
> I'm working on a program to remove tags from a HTML document, leaving
> just the content, but I want to do it simply. I've finished a system
> to remove simple tags, but I want all CSS and JS to be removed. What
> re pattern could I use to do that?
>
> I've tried
> '<script[\S\s]*/script>'
> but that didn't work properly. I'm fairly basic in my knowledge of
> Python, so I'm still trying to learn re.
> What pattern would work?

I use re.compile("<script.*?</script>", re.DOTALL) for scripts. I strip
these out first, since my tag-stripping re would strip out the script
tags as well.

Hope this was of help.

-- 
http://mail.python.org/mailman/listinfo/python-list
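
For what it's worth, here is a minimal sketch of how the pattern from the reply might be used. The likely reason the original attempt misbehaved is that `[\S\s]*` is greedy, so it runs from the first `<script` to the last `/script>` and swallows everything in between. The `<style>` pattern, the `re.IGNORECASE` flag, the `TAG_RE` stripper, and the `strip_html` name are my own assumptions for illustration, not something from the posts above, and regexes like these will still trip over unusual markup.

```python
import re

# Non-greedy ".*?" stops at the first closing tag; DOTALL lets "." match newlines.
# IGNORECASE (an assumption, not in the reply) also catches <SCRIPT> and <Style>.
SCRIPT_RE = re.compile(r"<script.*?</script>", re.DOTALL | re.IGNORECASE)
STYLE_RE = re.compile(r"<style.*?</style>", re.DOTALL | re.IGNORECASE)

# Hypothetical stand-in for the poster's "simple tag" stripper.
TAG_RE = re.compile(r"<[^>]+>")


def strip_html(html):
    """Remove script and style blocks first, then any remaining tags."""
    text = SCRIPT_RE.sub("", html)
    text = STYLE_RE.sub("", text)
    return TAG_RE.sub("", text)


sample = (
    "<p>one</p><script>\nvar a = 1;\n</script>"
    "<p>two</p><script>b()</script><p>three</p>"
)

# The greedy pattern from the question removes the paragraph between the scripts.
greedy = re.sub(r"<script[\S\s]*/script>", "", sample)

print(strip_html(sample))
print(greedy)
```

Running the script blocks through their own pattern before the generic tag stripper matters: if the tags went first, the JavaScript source between them would be left behind as plain text.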