Hi,

I have worked on a similar project before and found the following link useful:

http://blog.prashanthellina.com/2009/07/27/extracting-relevant-text-from-html-pages/

Best regards
~ Mukul Joshi

Director & CEO,
SpotOn Software Pvt. Ltd.
_SpotOn : One stop spot for your mobile development_

On 6/6/2011 6:31 AM, Base wrote:
Hi all,

I am working on an app that will parse web pages to do some NLP and
statistics. I am able to parse the HTML using several different tools
(Enlive, HTML parser, etc.). However, I would like to discard all the
rest of the junk in the web page that is not pertinent (e.g. ads).
Does anyone have any experience doing this? Any tips on how to do
this, or even better, tools that you can recommend? I have been
digging around on this for a while now and am stuck!

Thanks!

Base


