On 06.09.2012 20:41, jamieorc wrote:
> Hey all, I'm looking for a lightweight way to strip HTML from a long
> String of text and leave just the text. I've come across JSoup, but at
> over 300 KB for the lib, it's not quite lightweight.
> 
> Suggestions?

I've found Jericho HTML Parser to be fast, robust, and well documented:
http://jericho.htmlparser.net/docs/index.html

Its TextExtractor class seems to do exactly what you need:
http://jericho.htmlparser.net/docs/javadoc/net/htmlparser/jericho/TextExtractor.html
http://jericho.htmlparser.net/samples/console/src/ExtractText.java
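
For the simple case of an HTML string already in memory, something along
these lines should be enough. Treat it as a rough sketch rather than a
tested recipe: it assumes the Jericho jar is on the classpath, and the
StripHtml class and stripHtml method are names I made up for illustration.
Only Source, getTextExtractor() and TextExtractor come from Jericho.

```java
import net.htmlparser.jericho.Source;
import net.htmlparser.jericho.TextExtractor;

public class StripHtml {

    // Hypothetical helper (not part of Jericho): returns the text content
    // of an HTML string with the markup removed.
    public static String stripHtml(String html) {
        // Parse the in-memory markup; Source also accepts a Reader, InputStream or URL.
        Source source = new Source(html);
        // TextExtractor drops the tags, decodes character references
        // and collapses runs of whitespace.
        TextExtractor extractor = source.getTextExtractor();
        return extractor.toString();
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>world</b>!</p><p>Second paragraph</p>";
        System.out.println(stripHtml(html));
    }
}
```

From Clojure the same calls should be reachable through plain Java interop,
e.g. (str (.getTextExtractor (Source. html))) after importing
net.htmlparser.jericho.Source. The linked ExtractText.java sample shows
further options, such as including attribute values in the output.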

-- 
Timo
