hi all,

does anyone know a way of cleaning up text that has been crawled from
the web? for example, most web pages have a lot of noise, i.e. text from
menus, footers, adverts, etc. i am looking for a way to strip this out
and end up with clean text, say continuous paragraphs that actually have
some information in them. that's all i want to index.
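
to make the goal concrete, here is a rough sketch of the kind of filtering i mean. it is only an illustration under my own assumptions, not a known-good solution: it uses python's standard html.parser, and the names (BlockCollector, extract_paragraphs), the tag lists, and the thresholds (min_words, max_link_ratio) are all made up for the example. the idea is to split the page into text blocks, throw away anything inside obviously non-content tags, and keep only blocks that are long enough and not mostly link text.

```python
from html.parser import HTMLParser

# Tags whose contents are assumed never to be body text (hypothetical list).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}
# Tags assumed to start or end a block of text (hypothetical list).
BLOCK_TAGS = {"p", "div", "li", "ul", "ol", "td", "tr", "table", "br",
              "h1", "h2", "h3", "h4", "h5", "h6", "section", "article", "body"}


class BlockCollector(HTMLParser):
    """Collects (text, number_of_words_inside_links) for each text block."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []
        self._words = []
        self._link_words = 0
        self._skip_depth = 0   # > 0 while inside a SKIP_TAGS element
        self._link_depth = 0   # > 0 while inside an <a> element

    def _flush(self):
        # Close the current block, if it has any words, and start a new one.
        if self._words:
            self.blocks.append((" ".join(self._words), self._link_words))
        self._words = []
        self._link_words = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            self._link_depth += 1
        if tag in BLOCK_TAGS or tag in SKIP_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "a":
            self._link_depth = max(0, self._link_depth - 1)
        if tag in BLOCK_TAGS or tag in SKIP_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore text inside scripts, navs, footers, ...
        words = data.split()
        self._words.extend(words)
        if self._link_depth:
            self._link_words += len(words)

    def close(self):
        super().close()
        self._flush()


def extract_paragraphs(html, min_words=10, max_link_ratio=0.3):
    """Keep blocks with at least min_words words and few words inside links."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    kept = []
    for text, link_words in parser.blocks:
        n = len(text.split())
        if n >= min_words and link_words / n <= max_link_ratio:
            kept.append(text)
    return kept
```

calling extract_paragraphs(page_html) on a crawled page would return a list of paragraph strings to index. the thresholds are guesses and would need tuning per site; short but real paragraphs get dropped by this heuristic, which may or may not be acceptable.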

thanks.

fadzi

