2012/9/6 jamieorc <jamie...@gmail.com>

> Hey all, I'm looking for a lightweight way to strip HTML from a long
> String of text and leave just the text. I've come across JSoup, but at over
> 300kb for the lib, it's not quite lightweight.
>
> Suggestions?
JSoup is a good way to do it. If you need to identify the "main" part of a web page, Boilerpipe is a great library. Because Boilerpipe is such a pain to get started with (dependency- and documentation-wise), I highly suggest that you use Crawlista for this:

http://github.com/michaelklishin/crawlista

300kb does not sound like a lot: the JVM only loads the classes that are actually used.

--
MK
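For reference, with JSoup the usual one-liner is `Jsoup.parse(html).text()`. If even that dependency is unwanted, one JDK-only alternative (a swapped-in technique, not what is recommended above) is the Swing HTML parser that ships with the JDK. The sketch below assumes reasonably ordinary HTML; `HtmlStripper` and `strip` are hypothetical names, and this old HTML 3.2-era parser is much less forgiving than JSoup on messy real-world pages.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/**
 * Hypothetical helper: HTML-to-text using only the JDK's built-in
 * (javax.swing) HTML parser. Assumes fairly well-formed input; the
 * parser uses an HTML 3.2 DTD and is far less robust than JSoup.
 */
public final class HtmlStripper {

    private HtmlStripper() {}

    /** Returns the text content of html, with whitespace collapsed to single spaces. */
    public static String strip(String html) throws IOException {
        StringBuilder out = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called for each run of character data between tags;
                // character references such as &amp; are decoded by the parser.
                // Separate runs with a space so adjacent block elements don't fuse.
                out.append(data).append(' ');
            }
        };

        try (Reader reader = new StringReader(html)) {
            // true = ignore any charset declared in a <meta> tag;
            // we are reading from an in-memory String anyway.
            new ParserDelegator().parse(reader, callback, true);
        }
        // Collapse the extra spaces introduced between runs.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><h1>Title</h1>"
                + "<p>Some <b>bold</b> text &amp; more.</p></body></html>";
        System.out.println(strip(html));
    }
}
```

From Clojure this could be called as `(HtmlStripper/strip s)` once the class is on the classpath. The whitespace collapsing is a deliberate simplification: it loses paragraph breaks, which is usually fine when you only want searchable text.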