2012/9/6 jamieorc <jamie...@gmail.com>

> Hey all, I'm looking for a lightweight way to strip html from a long
> String of text and leave just the text. I've come across JSoup, but at over
> 300kb for the lib, not quite lightweight.
>
> Suggestions?
>

JSoup is a good way to do it. If you need to identify the "main" part of a
Web page, Boilerpipe
is a great library. Because Boilerpipe is such a pain to get started with
(both dependency- and documentation-wise), I highly suggest that you use
Crawlista for this:

http://github.com/michaelklishin/crawlista

300 KB does not sound like a lot. The JVM only loads the classes that are
actually used.
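
For plain tag stripping, a minimal sketch with JSoup through Java interop
could look like the following. It assumes the jsoup jar is already on the
classpath; the namespace and the strip-html function name are made up for
illustration and are not part of JSoup or Crawlista.

```clojure
(ns example.strip-html
  (:import [org.jsoup Jsoup]))

(defn strip-html
  "Parses the given HTML string with JSoup and returns only its text content.
  strip-html is a hypothetical helper name, not a library function."
  [^String html]
  ;; Jsoup/parse builds a Document; .text returns the text with markup removed
  (.text (Jsoup/parse html)))

(strip-html "<p>Hello <b>world</b></p>")
;; => "Hello world"
```

This returns all of the text in the document. Picking out only the "main"
part of a page is a separate problem, which is where the libraries above
come in.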
-- 
MK
