On Tuesday 24 April 2007 20:47, Jonathan Powell wrote:
> This whole XML pipelining thing has captured my interest more than 
> anything else technological in the past year. Your talk looks 
> fascinating - I wish I could be there 
>
> Do you use much of this technology in the BBC? 

It's a natural fit for the BBC really - any production chain tends to be
a pipeline of hardware boxes. Having pipelines of software boxes is
natural & logical. It's something that makes sense to grow. Getting
information out from R&D that there's something they can just use can
be tricky...

(he says sneakily using a public mailing list for that purpose ;)

> I haven't yet seen a good 
> example of it being used for anything particularly substantial - does it
> have excessive overhead?

In the spirit of Backstage being "use our stuff to build your stuff" it
makes sense for me to mention Kamaelia here again.

What do you mean by substantial? I'm not talking about XML pipelining or
mashups below but the more general aspect of pipelining (which Kamaelia
uses as its default system creation approach). Sure we can pipe XML around
and you can probably make a mashup using it (I've never bothered, despite
webserving and RSS capabilities :-), but they're relatively trivial
applications in my mind.

eg XML parsing is used by our simple presentation tool (Kamaelia: Show):
    * http://svn.sourceforge.net/viewvc/kamaelia/trunk/Code/Python/Kamaelia/Tools/Show.py?revision=2301&view=markup

There was a box made ~18 months ago by Radio & Music Interactive (using
Kamaelia for prototyping since Kamaelia wasn't optimised then) which used
a pipelining approach for grabbing all BBC radio output, transcoding it
for a record of transmission and then the data was used for generating
podcasts. As I understand it this led directly to the podcast trials.

The code to do that as a pipeline is relatively trivial really. *NOT* doing
it as a pipeline can make your life a lot harder.

We (in R&D) then heard that Kamaelia had been used for this and decided
to optimise it (At that time it did have excessive overheads). We've
since built a similar system for TV:
   * http://kamaelia.sourceforge.net/KamaeliaMacro

Which has been up and running for around 12 months now. You can see
the front end here, but unfortunately we can't give anyone a login for
the data (obvious reasons!). It is useful as a record of transmission
however:
   * http://bbc.kamaelia.org/cgi-bin/blog/blog.cgi

Due to space concerns etc it purges data. (We missed that off initially
and running out of diskspace twice has caused the only 2 crashes for
that system :-) This was intended to cover all channels, but availability
of hardware has been problematic.

You CAN build one for yourself of course, since it's just timeshifting
(so long as you don't redistribute). Pretty simple too, just need a
Linux box and a DVB-T stick and however much storage you want.

If anyone's curious about this, following up here on the Kamaelia
mailing lists is welcome. The code for it is here:

   * http://svn.sourceforge.net/viewvc/kamaelia/trunk/Code/Python/Kamaelia/Examples/DVB_Systems/Macro.py?revision=2257&view=markup

In terms of code, rather than a simple pipeline something like this needs
to be a graphline. At some point later today I'll probably add in the
diagrams from my talk on Kamaelia Macro's internals at EuroOSCON. The
structure is outlined OK on the page noted above.
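
To give a rough feel for why a graphline rather than a pipeline, here's
a plain Python sketch (standard library only, NOT Kamaelia code). In a
pipeline each box has exactly one downstream neighbour. In a graph the
boxes are named and the wiring is listed explicitly, so one box can feed
several others. The run_graph helper, the box names and the toy
"video"/"audio" packets are all made up for illustration, and the
linkages dict only loosely mimics the idea of linking a named outbox to
another box.

```python
from collections import deque

def run_graph(nodes, linkages, start, items):
    """Deliver items to `start`, then follow linkages until nothing is pending.

    nodes:    name -> function(item) returning a list of (outbox, value) pairs
    linkages: (source name, outbox) -> destination name
    Returns a dict of the values sent to outboxes that have no linkage.
    """
    pending = deque((start, item) for item in items)
    unlinked = {}
    while pending:
        name, item = pending.popleft()
        for outbox, value in nodes[name](item):
            dest = linkages.get((name, outbox))
            if dest is None:
                unlinked.setdefault((name, outbox), []).append(value)
            else:
                pending.append((dest, value))
    return unlinked

def demux(packet):
    # Route each packet to an outbox named after its kind.
    kind, payload = packet
    return [(kind, payload)]

def shout(payload):
    return [("outbox", payload.upper())]

def count(payload):
    return [("outbox", len(payload))]

# Hypothetical three-box system: one demultiplexer feeding two handlers.
nodes = {"DEMUX": demux, "VIDEO": shout, "AUDIO": count}
linkages = {
    ("DEMUX", "video"): "VIDEO",
    ("DEMUX", "audio"): "AUDIO",
}

result = run_graph(
    nodes, linkages, "DEMUX",
    [("video", "frame1"), ("audio", "beep"), ("video", "frame2")],
)
```

A straight pipeline is just the special case where every box has a
single linkage to the next one.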

A pipelining system is only as good as the components available inside
it, and for a flavour of the sorts of components our reference is here:
   * http://kamaelia.sourceforge.net/Components

In terms of overhead, the entire system has been optimised now, meaning
we can do realtime transcoding tasks, video playback, collaborative
whiteboarding, sketching over running video, use OpenGL, distribution
via BitTorrent (since that's been integrated), and simple game based
interfaces. There's extensive DVB support due to the work on Kamaelia
Macro.

Pipelining can be a much simpler way of writing code. For example writing
a splitting server is as simple as writing something like this:

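   # Module paths as I'd expect them in the Kamaelia tree; adjust to match
   # your installed version.
   from Kamaelia.Util.Backplane import Backplane, publishTo, subscribeTo
   from Kamaelia.Chassis.Pipeline import Pipeline
   from Kamaelia.Chassis.ConnectedServer import SimpleServer
   from Kamaelia.Internet.TCPClient import TCPClient

   # someserver and someport are placeholders for the source stream's address
   Backplane("DataToServe").activate()
   Pipeline(
       TCPClient( someserver, someport ), # grab source stream to split
       publishTo("DataToServe"),
   ).activate()

   def ServeTalk(): # Protocol handler created to handle each connected client
        return subscribeTo("DataToServe")

   SimpleServer( ServeTalk, 1602).run()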

That's a simple scalable splitting server. It's also the core of a basic
P2P streaming system with no QoS, since it is also lightweight enough to
run on a client's system. The only thing missing from it is mesh setup &
resiliency.
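
If the backplane idea is unfamiliar, here's a toy stand-in in plain
Python (standard library only). It is NOT how Kamaelia implements
Backplane, just a sketch of the publish/subscribe fan-out that the
splitting server relies on: one source publishes, every connected
client gets its own copy from the moment it subscribes. The FanOut
class and the drain helper are hypothetical names for illustration.

```python
import queue

class FanOut:
    """Toy stand-in for a named backplane: one publisher, many subscribers."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self):
        # Each subscriber gets its own queue, like each client connection
        # getting its own protocol handler.
        q = queue.Queue()
        self._subscribers.append(q)
        return q

    def unsubscribe(self, q):
        self._subscribers.remove(q)

    def publish(self, chunk):
        # Copy the chunk to every current subscriber.
        for q in list(self._subscribers):
            q.put(chunk)

def drain(q):
    """Collect everything currently waiting in a subscriber's queue."""
    items = []
    while not q.empty():
        items.append(q.get())
    return items

hub = FanOut()
early = hub.subscribe()
hub.publish(b"chunk-1")
late = hub.subscribe()      # joins mid-stream
hub.publish(b"chunk-2")

early_data = drain(early)   # saw both chunks
late_data = drain(late)     # only saw what was published after joining
```

The source never needs to know how many clients there are, which is
what keeps the server code so short.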

IMO, XML pipelining and mashups are simple subsets of the more general
principle that Kamaelia works on. We do have a webserver component as
well (written last year by a Google Summer of Code student as an unexpected
extra, being fleshed out this year by another), which means we can bridge
into the web world that way. As well as that there's also a webclient,
which when combined with feedparser, bridges the other way:

   * http://kamaelia.sourceforge.net/Cookbook/HTTPClient shows how to bridge
     with feedparser.
   * http://kamaelia.sourceforge.net/Cookbook/HTTPServer shows how to use the
     webserver at present.

One of the problems we've seen with this approach is that, due to the
dataflow & naturally concurrent nature of pipelines, some people can find it
harder to integrate with traditional style code. As a result, this year our
interaction with Google's Summer Of Code[1] is with a number of projects
which are aimed at making it simpler for people to integrate Kamaelia
facilities with non-Kamaelia systems.

An overview of GSOC 2007 projects:

   * Web Server Consolidation - This task will extend the web-server component
     in Kamaelia to make it useful as a general purpose web-server component.
     (Why? Being able to have a scalable targeted lightweight webserver anywhere
     that is Kamaelia based opens up huge options. Eg Desktop Django/Pylons,
     adding a web interface to your PVR trivially, local rather than remote
     mashups, offline web applications, etc. Simpler mashing of web and non-web
     apps.)

   * Compose: Shard Extensions - This task extends our graphical builder tool
     to allow creation of linear components graphically. Compose is currently
     a line in the sand to say "it should be simple for expert users, not just
     programmers, to build systems". Being able to originate new components
     rather than just use existing ones (where we are now) is an aim for this
     project.

   * Filehandle Like API - This task allows us to treat components like you
     would a filehandle. File reading and writing & flushing etc normally
     happens concurrently to your normal code, but you don't think of it that
     way, so that's the idea here. In practice this would make it possible to
     embed Kamaelia systems in "normal" code.
 
   * AIM/IRC Bridge - This may seem bizarre, but having textual input and
     output to programs is useful for testing. AIM & IRC components extend
     this IO to more interesting user level areas. For example combining AIM
     clients with Kamaelia Macro would mean that you could (for example) - if
     you wrote a parser - be able to say to your PVR:

         recordForMe BBCONE "Neighbours" /data/neighbours.ts
         recordForMe CBBC "Class TV" /data/schoolsprogrammes.ts

     The latter would record every educational programme broadcast by CBBC on
     weekday mornings. The former would record every episode of Neighbours.
     (Parsing this is trivial, the interesting bit is the AIM & IRC
     components :-)

     Or you could have it watching the DVB-EIT stream to look for programmes
     you may want to watch and have it (literally) tell you (eg hook up the
     output from the AIM component to a text to speech component; creating
     that on Mac OS is as simple as UnixProcess("/usr/bin/say")).

     Building your own twitter style server would be relatively trivial once
     these components are written. You could also however share the server
     as something lightweight and have a distributed distribution network
     relatively easily.

     The recordForMe component can be found here BTW:
        * http://svn.sourceforge.net/viewvc/kamaelia/trunk/Code/Python/Kamaelia/Examples/DVB_Systems/PersonalVideoRecorder.py?revision=2715&view=markup

     How that works:
        * http://kamaelia.sourceforge.net/Cookbook/DVB/PersonalVideoRecorder

   * Test framework - Whilst we use unit tests, _testing_ data flow or
     pipelining _systems_ is a harder problem for which there's no easy
     solution at present. Obviously we have a number of test harnesses,
     and the aim of this project is to integrate them and make them
     reusable as a starting point.
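
On the "parsing this is trivial" point in the AIM/IRC item above, a
minimal sketch in plain Python might look like this. The command
grammar (command name, channel, quoted programme name, output file) is
my guess from the two recordForMe examples only, and
parse_record_command and RecordRequest are hypothetical names, not
anything in the Kamaelia tree.

```python
import shlex
from typing import NamedTuple

class RecordRequest(NamedTuple):
    channel: str
    programme: str
    filename: str

def parse_record_command(line):
    """Parse: recordForMe <channel> "<programme name>" <output file>"""
    # shlex.split keeps a quoted programme name together as one word and
    # raises ValueError if a quote is left unclosed.
    words = shlex.split(line)
    if len(words) != 4 or words[0] != "recordForMe":
        raise ValueError("expected: recordForMe CHANNEL \"PROGRAMME\" FILE")
    _, channel, programme, filename = words
    return RecordRequest(channel, programme, filename)

request = parse_record_command(
    'recordForMe CBBC "Class TV" /data/schoolsprogrammes.ts'
)
```

The resulting request could then be handed as a message to whatever
component does the actual recording.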

More information via Google's SOC pages here:
   * http://code.google.com/soc/bbc/about.html
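
To make the Filehandle Like API idea from the list above a bit more
concrete, here's a rough sketch in plain Python (threads and queues
from the standard library, NOT Kamaelia). The point is only the shape
of the interface: the work runs concurrently, but the calling code
just sees write(), read() and close(). ComponentHandle is a
hypothetical name and says nothing about how the GSOC project will
actually do it.

```python
import queue
import threading

class ComponentHandle:
    """File-handle-like wrapper around a worker running in its own thread."""

    _CLOSE = object()  # sentinel telling the worker to stop

    def __init__(self, transform):
        self._inbox = queue.Queue()
        self._outbox = queue.Queue()
        self._thread = threading.Thread(
            target=self._run, args=(transform,), daemon=True
        )
        self._thread.start()

    def _run(self, transform):
        # Worker loop: take items from the inbox, put results in the outbox.
        while True:
            item = self._inbox.get()
            if item is self._CLOSE:
                break
            self._outbox.put(transform(item))

    def write(self, item):
        self._inbox.put(item)

    def read(self, timeout=None):
        # Blocks until a result is ready; raises queue.Empty on timeout.
        return self._outbox.get(timeout=timeout)

    def close(self):
        self._inbox.put(self._CLOSE)
        self._thread.join()

handle = ComponentHandle(str.upper)
handle.write("hello")
handle.write("world")
first = handle.read(timeout=5)
second = handle.read(timeout=5)
handle.close()
```

From the caller's side this looks like ordinary sequential code, which
is the whole idea.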

Things already done though:
   * Kamaelia Macro - http://kamaelia.sourceforge.net/Developers/Projects/KamaeliaMacro
   * Mobile Reframer - http://kamaelia.sourceforge.net/Developers/Projects/MobileReframer
   * Multicast RTP MPEG Remultiplexer - http://kamaelia.sourceforge.net/Developers/Projects/MulticastRtpMpegRemultiplexer
   * Multicast Proxy Tools - http://kamaelia.sourceforge.net/Developers/Projects/MulticastProxyTools
   * Whiteboard - http://kamaelia.sourceforge.net/Developers/Projects/Whiteboard
     - This is a simple looking app, but has audio support, multiple pages,
       ability for remote control and can build ad-hoc P2P style meshes for
       sharing whiteboard sessions and audio. Also has recording & playback
       capabilities.
   * Compose - http://kamaelia.sourceforge.net/Developers/Projects/Compose
   * Video Cut Detector - http://kamaelia.sourceforge.net/Developers/Projects/VideoCutDetector
   * Last year's GSOC overview: http://kamaelia.sourceforge.net/Developers/Projects/GoogleSummerOfCode2006

If you want to know more, these are the places to start:

Introductions:
   * http://kamaelia.sourceforge.net/Introduction - short and simple overview.

   * http://kamaelia.sourceforge.net/t/TN-LinuxFormat-Kamaelia.pdf
     - An article I wrote for Linux Format, which describes how to install
     and use the Kamaelia whiteboarding application (works best using a
     tablet PC or some form of tablet interface)

   * http://kamaelia.sourceforge.net/t/TN-LightTechnicalIntroToKamaelia.pdf
     - An article I wrote for Linux Magazin (German version of Linux Magazine,
     text above is English :). In terms of installing from a developer's
     perspective, personally I think my instructions here are simpler/better
     than above. (Though that's targeted at getting a specific app running :)

Next steps:
   * http://kamaelia.sourceforge.net/MiniAxon (internals tutorial)
   * http://tinyurl.com/2p9zku - overview of how to go from a stub program
     to reusable components.
   * http://kamaelia.sourceforge.net/Cookbook - Cookbook - contains *lots*
     of examples. Some trivial, some not.

We tend to use IRC for collaboration (hence the desire above for IRC
integration) so if people do get started, popping by the IRC channel is
probably not a bad idea :)

Oh and the point of Kamaelia? To make concurrency simple, safe and natural
to work with. <rhetorical>You *do* want to be able to use your multicore
systems' capabilities as they grow?</rhetorical> :-)

Anyone not using pipelining in some form in 5 years' time (unix pipelines,
IPC, mashups, XML pipeline, Kamaelia style, Erlang style) will probably
only be using trivial systems IMO, and certainly not scalable ones (esp
if they're CPU intensive)

Regards,


Michael. 
--
Michael Sparks, Senior Research Engineer, BBC Research, Technology Group
[EMAIL PROTECTED], Kamaelia Project Lead, http://kamaelia.sf.net/

-
Sent via the backstage.bbc.co.uk discussion group.  To unsubscribe, please 
visit http://backstage.bbc.co.uk/archives/2005/01/mailing_list.html.  
Unofficial list archive: http://www.mail-archive.com/[email protected]/
