Could you use something like Python's Beautiful Soup library?
http://www.crummy.com/software/BeautifulSoup/

It'll scrape the page and you can drill down to isolate the main content
block.
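As a rough sketch of what I mean by drilling down (not my actual script):
this assumes the current bs4 package, and the sample markup, the
"story-body" class name and the extract_story helper are all made up for
illustration. A real page will need its own selector, which you can find by
viewing the page source.

```python
from bs4 import BeautifulSoup

# Stand-in for a downloaded news page. The markup and the "story-body"
# class are invented for this example.
SAMPLE_HTML = """
<html><body>
  <div id="nav"><a href="/">Home</a></div>
  <div class="story-body">
    <h1>Example headline</h1>
    <p>First paragraph of the story.</p>
    <p>Second paragraph of the story.</p>
  </div>
  <div id="footer">Copyright notice</div>
</body></html>
"""


def extract_story(html, css_class="story-body"):
    """Return the paragraph text found inside the main content block."""
    soup = BeautifulSoup(html, "html.parser")
    # Drill down to the one <div> that wraps the story itself.
    block = soup.find("div", class_=css_class)
    if block is None:
        return ""  # the page does not use the layout we expected
    paragraphs = [p.get_text(strip=True) for p in block.find_all("p")]
    return "\n".join(paragraphs)


story = extract_story(SAMPLE_HTML)
print(story)
```

The navigation and footer text are left out because only the paragraphs
inside the chosen block are collected.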

It's what I used to make a script that parses RSS feeds, scrapes the full
stories from the sites, then clusters the stories based on the content of
the story itself rather than the content of the RSS feed.

You can see the pre-alpha result at www.codemeup.com. It's very much a work
in progress (especially the UI).

Whilst this example only displays the RSS title, the script has gone to the
actual page and pulled out the content for word frequency analysis.
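To give a flavour of the word frequency side, here is a minimal sketch using
only the standard library. The function names and sample stories are
hypothetical, and cosine similarity is just one common way of comparing
word counts; I'm not claiming it is exactly what the script does.

```python
import math
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each lower-cased word appears in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def cosine_similarity(a, b):
    """Compare two word-count tables: 0.0 means no words in common."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text had no words at all
    return dot / (norm_a * norm_b)


# Made-up story texts standing in for scraped page content.
story_a = "Flood warnings issued as rivers rise across the north"
story_b = "Rivers rise and flood warnings spread across the north"
story_c = "Football club appoints new manager"

freq_a, freq_b, freq_c = (
    word_frequencies(s) for s in (story_a, story_b, story_c)
)
print(cosine_similarity(freq_a, freq_b))  # the two flood stories score high
print(cosine_similarity(freq_a, freq_c))  # unrelated stories score 0.0
```

Stories whose scores are above some threshold can then be grouped together.
In practice you would probably also want to drop very common words such as
"the" and "and" before comparing.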

Hope that's useful.

S.



On Feb 17, 2008 9:26 AM, Richard Askew <[EMAIL PROTECTED]> wrote:

> Hello everyone, first-time poster here!
>
> I wondered if you could help me. I am currently in my final year of
> University and I am currently drawing up ideas for my dissertation. I am
> looking to do some work with the BBC news feeds. At the moment I can receive
> the feeds and get the headline and brief description of the story. Is there
> a way in which I could go on and retrieve the whole news story for that
> particular feed so it can be presented in an application?
>
> Thank you for your time and I look forward to hearing from you.
>
> Richard Askew
