On 06.09.2012 20:41, jamieorc wrote:
> Hey all, I'm looking for a lightweight way to strip html from a long
> String of text and leave just the text. I've come across JSoup, but at
> over 300kb for the lib, not quite lightweight.
>
> Suggestions?

I've found Jericho HTML Parser to be fast, robust, and well documented:
http://jericho.htmlparser.net/docs/index.html

Its TextExtractor class seems to do exactly what you need:
http://jericho.htmlparser.net/docs/javadoc/net/htmlparser/jericho/TextExtractor.html
http://jericho.htmlparser.net/samples/console/src/ExtractText.java

--
Timo
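
For the Clojure side, here is a rough interop sketch of the TextExtractor approach mentioned above. It assumes the Jericho jar is already on your classpath (I believe it is published as net.htmlparser.jericho/jericho-html, but please check the coordinates and version yourself). The namespace and the strip-html function name are made up for illustration and are not part of Jericho.

```clojure
(ns example.strip-html
  (:import (net.htmlparser.jericho Source)))

(defn strip-html
  "Returns the plain text of an HTML string via Jericho's TextExtractor.
  Hypothetical helper; only Source and getTextExtractor come from Jericho."
  [^String html]
  (-> (Source. html)        ; parse the string as an HTML source
      (.getTextExtractor)   ; TextExtractor covering the whole document
      (.toString)))         ; perform the extraction and return the text

;; Example call at the REPL:
;; (strip-html "<p>Hello <b>world</b></p>")
```

This uses the extractor's default settings. TextExtractor also has setters for tuning what gets included, so the javadoc linked above is the place to look if the defaults do not suit your text.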