require 'nokogiri'
doc = Nokogiri::HTML(html)  # html is the page source as a String
(doc/'body').text

I believe this will do what you want.

Please correct me if I'm wrong.

-Cam

On 22/04/2010, at 10:28 AM, Korny Sietsma wrote:

> Hi folks - I'm looking for something that will load a web page and extract 
> just visible text elements from it.
> I could probably write something using nokogiri (is nokogiri still the best 
> option for html parsing?) but I was wondering if someone had already done
> something similar.
> 
> I don't need initially to crawl links, though this might be a later 
> requirement - maybe there's a web crawler that could do the job...
> 
> - Korny
> 
> -- 
> Kornelis Sietsma  korny at my surname dot com
> kornys on twitter/fb/gtalk/gwave www.sietsma.com/korny
> "Every jumbled pile of person has a thinking part
> that wonders what the part that isn't thinking
> isn't thinking of"
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Ruby or Rails Oceania" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/rails-oceania?hl=en.