[backstage] BBC Breaking News RSS feeds query

2008-07-31 Thread robl

Hi,

I'm doing some work around breaking news and had a few queries about the 
available feeds.  Looking on the backstage site there appear to be a 
few sources of breaking news feeds :


http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/breaking_news/rss.xml
http://newsrss.bbc.co.uk/rss/newsonline_world_edition/breaking_news/rss.xml

These feeds don't contain links to a story on the BBC news site; they 
only link through to news.bbc.co.uk.  I had a couple of questions :


1. Are these alerts in any way linked to a news story (i.e. is there a 
story published at the same time as the alert goes out via RSS ?)  If 
so, could the RSS feed be altered to contain the link to the story ?


2. Does the guid in the feed (e.g. 
<guid isPermaLink="false">urn:news_bbc_co_uk:breaking_news:33400</guid>) 
have any relationship to a story (so in this example does 33400 map 
to a story on the site ?)
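
A minimal sketch of pulling the trailing number out of such a guid, assuming 
the URN always ends in a numeric part as in the example.  Whether that number 
maps to a story is exactly the open question; the function name and the sample 
item (apart from the guid value) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample item; only the guid value comes from the feed example.
SAMPLE_ITEM = """<item>
  <title>Example alert</title>
  <link>http://news.bbc.co.uk</link>
  <guid isPermaLink="false">urn:news_bbc_co_uk:breaking_news:33400</guid>
</item>"""


def breaking_news_id(item_xml):
    """Return the trailing numeric part of the item's guid, or None."""
    guid = ET.fromstring(item_xml).findtext("guid", default="")
    # Take whatever follows the last colon in the URN.
    tail = guid.strip().rsplit(":", 1)[-1]
    return int(tail) if tail.isdigit() else None


print(breaking_news_id(SAMPLE_ITEM))
```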


Looking at the RSS entry titles and then the corresponding story on the 
BBC news site, they seem to be very similar or identical in many cases 
(I suspect the different ones are where the story has been subsequently 
updated) so I'm guessing in the worst case scenario I could match the 
website story title to the alert title to identify a story as 'breaking'.
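
The worst-case title matching could be sketched roughly as follows.  This 
assumes the alert titles and story titles have already been fetched as plain 
strings; the 0.8 similarity threshold and the function names are arbitrary 
assumptions that would need tuning against real feed data.

```python
from difflib import SequenceMatcher


def normalise(title):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(title.lower().split())


def find_breaking_story(alert_title, story_titles, threshold=0.8):
    """Return the story title most similar to the alert title, or None.

    threshold is an assumed cut-off, not a value from the feeds.
    """
    best_title, best_score = None, 0.0
    for story in story_titles:
        score = SequenceMatcher(
            None, normalise(alert_title), normalise(story)
        ).ratio()
        if score > best_score:
            best_title, best_score = story, score
    return best_title if best_score >= threshold else None
```

A fuzzy ratio rather than strict equality is used here because titles that 
have been updated after the alert went out would otherwise never match.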


Is there an easy way to identify a particular story as 'breaking news' 
at the moment ?


Thanks,

Rob
-
Sent via the backstage.bbc.co.uk discussion group.  To unsubscribe, please 
visit http://backstage.bbc.co.uk/archives/2005/01/mailing_list.html.  
Unofficial list archive: http://www.mail-archive.com/backstage@lists.bbc.co.uk/


Re: [backstage] Muddy Boots on Backstage

2007-11-27 Thread robl

Fearghas McKay wrote:

Noah

On 27 Nov 2007, at 10:57, Noah Slater wrote:


To which I have two suggestions:

 1) Leave the /discussion/ list you're on.
 2) Move to the next message, trash the message and move on.
 3) Filter all email with freedom in the body into /dev/null and be
done with it.




My fourth suggestion would be that perhaps the discussion you want to 
have is not on topic for a list. As such continuing the discussion you 
want to have may be off topic for most list members.


As to whether this list is an advocacy list for freedom I will leave 
as the list owners' call.



Or just change the post title and start a new post :

Free Software Nonsense was (Re: [backstage] Muddy Boots on Backstage)

That way this thread about MuddyBoots is actually useful to anyone who 
wants to find out about it, and anybody who wants to talk about Free 
Software Nonsense can do so.






Re: [backstage] Muddy Boots on Backstage

2007-11-26 Thread robl




Hi, Rob - this is neat, though not entirely sure that it's working 
entirely as you might want...


http://muddyboots.rattleresearch.com/cgi-bin/mb.cgi?action=pageid=701 


...a page about The Sun (and the News of the World) has lots of 
links off to the NASA website - presumably because of the use of the 
word Sun...


Nice, though - and something to think about.


Hi James,

Thanks for this. It highlights one of the challenges we face when trying 
to find the correct contextual meaning where ambiguity exists; we haven't 
got it right in all cases yet :)


I thought I'd work it through and highlight areas that could be 
improved.  The initial story has been categorised as being related to 
the following tags (via the Yahoo term extraction service) :


(http://muddyboots.rattleresearch.com/cgi-bin/mb.cgi?action=viewid=701)

   * media ownership
   * editorial control
   * ownership laws
   * communications committee
   * independent board
   * evening newspapers
   * evidence http://en.wikipedia.org/wiki/Evidence
   * news corporation http://en.wikipedia.org/wiki/News_Corporation
   * chairman http://en.wikipedia.org/wiki/Chair_%28official%29
   * mr http://en.wikipedia.org/wiki/MR
   * house of lords http://en.wikipedia.org/wiki/House_of_Lords
   * news of the world http://en.wikipedia.org/wiki/News_of_the_World
   * mr murdoch
   * parliamentary committee http://en.wikipedia.org/wiki/Committee
   * murdoch http://en.wikipedia.org/wiki/Murdoch
   * fox news http://en.wikipedia.org/wiki/Fox_News_Channel
   * sky news http://en.wikipedia.org/wiki/Sky_News
   * sun http://en.wikipedia.org/wiki/Sun_%28disambiguation%29
   * news station http://en.wikipedia.org/wiki/News_station
   * rupert murdoch http://en.wikipedia.org/wiki/Rupert_Murdoch

The obvious problem with this is the 'sun' tag: it is an ambiguous term 
that has many meanings, as evidenced at :


http://en.wikipedia.org/wiki/Sun_(disambiguation)

Currently we only follow the links off these disambiguation pages to 
gather external links. However, if we were to improve our usage of the 
disambiguation pages we could cut down on these false positives (in fact 
that's top of the list of the things we'd like to experiment with).


The other problem here is that we display links if they have any matches 
in del.icio.us with the story tags listed above.  We should probably put 
some metrics around the minimum number of story tags a link must match to 
be recommended. In this case, a minimum match of 2 tags would have meant 
we wouldn't have recommended the 'planetary' sun links.
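
A minimal sketch of that minimum-match rule, assuming each candidate link 
comes with the set of del.icio.us tags it was bookmarked under.  The function 
name, the example URLs and their tags are made up for illustration; this is 
not the MuddyBoots code.

```python
def recommend_links(story_tags, link_tags, min_matches=2):
    """Keep links whose tags share at least min_matches tags with the story.

    link_tags maps a URL to the set of tags it was bookmarked under.
    """
    story = {t.lower() for t in story_tags}
    recommended = {}
    for url, tags in link_tags.items():
        shared = story & {t.lower() for t in tags}
        if len(shared) >= min_matches:
            recommended[url] = sorted(shared)
    return recommended


# Made-up data: the planetary link matches only on "sun" and is dropped.
story_tags = ["sun", "news of the world", "rupert murdoch", "sky news"]
link_tags = {
    "http://example.org/solar-physics": {"sun", "nasa", "astronomy"},
    "http://example.org/murdoch-profile": {"rupert murdoch", "sky news", "media"},
}
print(recommend_links(story_tags, link_tags))
```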


Thanks for the feedback !




Re: [backstage] Muddy Boots on Backstage

2007-11-26 Thread robl

Tom Loosemore wrote:

Thanks for the feedback !



Muddy boots is cool...

  

Thanks :)

TheyWorkForYou.com adds links to Hansard by matching Proper Names with
Wikipedia entries.
http://www.theyworkforyou.com/debates/?id=2007-11-21a.1190.1

The number of false positives is acceptable and the wikipedia links are
miles better than the user-generated glossary with which the site was
launched. But it's still limited since it only parses for Capitalised
Phrases or ACRONYMS.

Shifting to term extraction seemed an obvious route, but as I think
Muddy Boots shows, term extraction tends to throw up an unacceptably
large number of 'false positive' terms - these result in crappy random
links and are user experience poison.

However, you can minimise false positive terms by running the copy
through several different flavours of term extractor, and only using
terms thrown up by x or more of them (where x depends on your appetite
for false positives vs false negatives).

  
I like this idea, as obviously the context for the story (i.e. the tags 
we use to define it) impacts the final link recommendations. It's one of 
the two weak points in the system at the moment (the other being the 
previously mentioned disambiguation issues). However, it's nice to have a 
platform on which we can start to test these kinds of ideas ...
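
The 'x or more extractors' idea could be sketched as a simple vote count.  
This assumes each extractor's output is already available as a list of 
terms; the function name and the sample outputs are hypothetical and no real 
extraction service is called.

```python
from collections import Counter


def agreed_terms(extractor_outputs, min_votes=2):
    """Return terms proposed by at least min_votes of the extractors.

    extractor_outputs is a list with one iterable of terms per extractor.
    """
    votes = Counter()
    for terms in extractor_outputs:
        # A set ensures each extractor votes at most once per term.
        votes.update({t.lower().strip() for t in terms})
    return sorted(term for term, n in votes.items() if n >= min_votes)


# Made-up outputs from three hypothetical extractors.
outputs = [
    ["rupert murdoch", "sun", "mr"],
    ["Rupert Murdoch", "news corporation", "sun"],
    ["rupert murdoch", "news corporation", "evidence"],
]
print(agreed_terms(outputs, min_votes=2))
```

Raising min_votes trades false positives for false negatives: with 
min_votes=3 only the term all three extractors agree on survives.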

So, why not throw the copy through several more term extractors then
only use the overlapping terms?

- The BBC has at least one *excellent* term extractor in house which
adds extra metadata like 'this term is a person/place/topic'... would
be a lovely API to offer, hint hint...
  
Seconded !  Anybody else have any other recommendations for term 
extraction services ?



Re: [backstage] Muddy Boots on Backstage

2007-11-26 Thread robl

Brian Butterworth wrote:
How about using a two-frame page as the link with a 'rate this link' 
option shown as a one-line toolbar at the top of the page?  Users 
could then rate the appropriateness of the link from 'wrong' to 
'fantastic', which would allow automatic removal of incorrect links 
and a simple administration list of links considered poor.


That was another idea we had, both from the perspective of feeding 
meta-data back to Wikipedia and of getting end-users to moderate 
links. In our use-case, though, we had the system helping journalists 
find relevant external link material: the ones they chose from the 
complete list were marked as known 'good' meta-data for the story and 
fed back into the system (and if they had the time they could mark 'bad' 
suggestions as well).



So for example if you choose a MuddyBoots 'red' report [1] (i.e. 
requires moderation) you'll see there are far more links that *could* be 
relevant to the article and the journalists could choose from these and 
add them to a news story, thus creating a feedback mechanism into the 
system.


[1] 
http://muddyboots.rattleresearch.com/cgi-bin/mb.cgi?action=pageid=714report_type=red



[backstage] Muddy Boots on Backstage

2007-11-21 Thread robl

Hi Everyone,

Just thought I'd accompany the latest post to the backstage blog 
(http://backstage.bbc.co.uk/news/archives/2007/11/from_last_years_1.html) 
with some examples of muddyboots in action.  For those of you who aren't 
aware of the project it's probably best to look at 
http://muddyboots.rattleresearch.com/cgi-bin/mb.cgi?action=more.  
Essentially we're attempting to use Wikipedia and other commons-authored 
data sources to augment the meta-data around BBC news stories. This 
ultimately took the form of automated, contextually relevant link 
recommendations based on data within Wikipedia and del.icio.us 
(although we have some other ideas about how this data could be used ...)


It's still a prototype so it's not production ready by any means. There 
are still stories where we are unable to recommend links, and there are 
others where ambiguity becomes a problem and identifying what context a 
story has can be difficult (although we have some ideas around using the 
disambiguation data within Wikipedia to improve this).


Here are a few links to stories where I thought muddyboots added some 
interest and hopefully a little of that Wikipedia 'browse experience' :


http://muddyboots.rattleresearch.com/cgi-bin/mb.cgi?action=pageid=646
http://muddyboots.rattleresearch.com/cgi-bin/mb.cgi?action=pageid=630
http://muddyboots.rattleresearch.com/cgi-bin/mb.cgi?action=pageid=622
http://muddyboots.rattleresearch.com/cgi-bin/mb.cgi?action=pageid=643

If you'd like to see how those recommendations were arrived at then each 
story has a 'View' action which can be used to get a breakdown of each 
stage of the muddyboots process, for example :


http://muddyboots.rattleresearch.com/cgi-bin/mb.cgi?action=viewid=622

It's worth noting we only keep the last 50 story submissions in the 
system, so these links will eventually 'age' out.
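
For anyone wondering how that 'ageing out' behaves, here is a rough sketch 
using a bounded queue.  It only illustrates the last-50 policy; how the real 
system stores submissions isn't described here, and the names and the 60 
numeric story ids are made up.

```python
from collections import deque

MAX_STORIES = 50  # the figure mentioned above

# A deque with maxlen silently discards the oldest entry when full.
recent_stories = deque(maxlen=MAX_STORIES)


def submit(story_id):
    """Record a submission; the oldest is dropped once the store is full."""
    recent_stories.append(story_id)


for story_id in range(1, 61):  # 60 hypothetical submissions
    submit(story_id)

print(len(recent_stories), recent_stories[0])
```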


(Disclaimer : I worked on the project)

Thanks,

Rob


Re: [backstage] More iPlayer protesting

2007-08-01 Thread robl



Not that I'm condoning the choice, personally I'll always prefer an agnostic
system, but, well, maybe the BBC were just realists when it came to the
practicalities of development cost versus ROI from creating versions for
(EXTREMELY) minority OSes? I mean, come on, hands up who here on the list
uses Linux as their primary OS. 


Me