Thanks - have added Mnesia to my list of things to check. And nginx
does sound so much better than pound or haproxy - both of which I've
tried to use (with little success when under load) in the past.

I would love to be able to use something like Google App Engine.
I think the stage we're all currently at, with manually configured
virtual machines, is temporary - in the future our apps won't have a
clue what they're running on; they'll just use an API like Google App
Engine.

However, I don't think we can use it for Mapumental. We use
GDAL (http://gdal.org/) as a C library for rendering the tiles,
and our own C++ code for public transport route finding (see
https://secure.mysociety.org/cvstrac/rlog?f=mysociety/iso/bin/fastplan-coopt.cpp)

Neither can be run on Google App Engine.

Francis

On Mon, Aug 17, 2009 at 09:44:43AM +0100, Seb Bacon wrote:
> Hi Francis,
> 
> I was talking with someone at work about Mnesia, which sounds like
> it's worth considering. It is distributed among N nodes, so it's good
> for problems that require good cache locality, i.e. do a lot with the
> data (because all data is on every node and replicates everywhere
> quickly). For some types of data set that breaks down quite soon, of
> course (you pretty much want a dataset no larger than RAM, e.g. up to
> 64 GB). Mnesia takes care of replicating changes everywhere, and of
> failed nodes, netsplits, syncing back from them, etc.
> 
> I don't know much about MongoDB or CouchDB. Maybe you have to manage
> syncing yourself on the application layer, but they probably scale
> much further (depending on what you do in your application). But you
> could also have smaller clusters of Mnesia nodes, with application
> code replicating between them and keeping extra copies of frequently
> requested buckets across the clusters, or something like that, plus
> another, global Mnesia to hold routing information (which bucket is
> where).
> 
> So a combination might also make sense, Mnesia for the routing
> information on broker nodes and CouchDB or Memcached or MongoDB on the
> storage nodes with the large blobs of tile and other precomputed data.
> So your application servers would pick a broker node at random, ask it
> where some blob is and pass the blob through from the storage node to
> the client. The brokers could also increment per-object access
> counters and run some async jobs to have frequently accessed objects
> copied to more storage nodes etc.
> 
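
A minimal in-memory sketch of the broker idea above, to make the flow
concrete. Everything here is an assumption for illustration: the class
and function names are hypothetical, plain Python dicts stand in for
both the replicated routing table (Mnesia in the proposal) and the blob
stores (CouchDB, MongoDB or Memcached), and the "async job" is just a
function called inline.

```python
import random
from collections import Counter


class StorageNode:
    """Hypothetical stand-in for a storage node holding large blobs."""

    def __init__(self, name):
        self.name = name
        self.blobs = {}

    def put(self, key, data):
        self.blobs[key] = data

    def get(self, key):
        return self.blobs[key]


class Broker:
    """Hypothetical broker node. `routes` and `hits` are shared between
    brokers here, standing in for a replicated routing table."""

    def __init__(self, routes, hits):
        self.routes = routes  # blob key -> list of StorageNode holding it
        self.hits = hits      # blob key -> number of lookups so far

    def locate(self, key):
        # Count the access, then pick any node that holds the blob.
        self.hits[key] += 1
        return random.choice(self.routes[key])


def fetch(brokers, key):
    """What an application server does: ask a random broker where the
    blob is, then pass the blob through from that storage node."""
    node = random.choice(brokers).locate(key)
    return node.get(key)


def replicate_hot(routes, hits, all_nodes, threshold):
    """Copy each blob looked up at least `threshold` times to one more
    storage node, if any node does not hold it yet."""
    for key, count in hits.items():
        if count < threshold:
            continue
        spare = [n for n in all_nodes if n not in routes[key]]
        if spare:
            target = spare[0]
            target.put(key, routes[key][0].get(key))
            routes[key].append(target)
```

To try it, create a few StorageNode objects, put a tile on one of
them, record it in `routes`, build two Broker objects sharing the same
`routes` and a `Counter()` for `hits`, call fetch() a few times and
then replicate_hot(). A real version would need the routing table to
survive node failures, which is the part Mnesia is meant to provide.
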
> Instead of NFS for distributing tiles, you could consider a web
> service running off an httpd server like nginx.
> 
> Another possibility for the entire infrastructure is Google App
> Engine, which utilises BigTable for fast, distributed data indexing
> and querying, and serves apps from a Python or Java runtime. There is
> a queue API, a memcached API, a simple image manipulation API, and a
> very good pricing model, which works out considerably cheaper than AWS
> for all models I've considered; for example, CPU time is theoretically
> billed at the same rate in AWS and GAE, but in GAE you just pay for
> real CPU time, compared with AWS where you pay for instance uptime.
> Of course, the price you pay for the cheapness and free scaling in GAE
> is lack of control, lack of customer service, and no choice of
> where the data is stored (but I don't think Mapumental has data
> privacy concerns...?). The flip side of the lack of control is that
> the complexity is constrained.  Personally I'm impressed by GAE and
> will be continuing to use it on new projects where I can, but I've not
> used it on a massively resource-intensive job yet. The only part of a
> GAE app that isn't easily portable to a new architecture is the
> datastore access, which can be abstracted away easily enough, so you
> could always choose to migrate from GAE to AWS at a later date.
> 
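
To make the uptime versus CPU-time billing distinction above concrete,
here is a small arithmetic sketch. The rate, the number of hours and the
utilisation figure are made-up illustrative numbers, not real AWS or
GAE prices, and the function names are hypothetical.

```python
def uptime_cost(rate_per_hour, hours_up):
    """Instance-uptime billing: pay for every hour the instance runs,
    whether or not the CPU is busy."""
    return rate_per_hour * hours_up


def cpu_time_cost(rate_per_hour, hours_up, utilisation):
    """CPU-time billing: pay only for the fraction of wall-clock time
    in which the CPU is actually doing work (0.0 to 1.0)."""
    return rate_per_hour * hours_up * utilisation


RATE = 0.10         # assumed price per CPU-hour, identical on both sides
HOURS = 24 * 30     # one month of wall-clock time
UTILISATION = 0.25  # assumed: CPU busy a quarter of the time

print("uptime billing:  ", uptime_cost(RATE, HOURS))
print("CPU-time billing:", cpu_time_cost(RATE, HOURS, UTILISATION))
```

At the same hourly rate the two models differ by exactly the
utilisation factor, so the saving shrinks as the machines get busier.
For a CPU-heavy job the gap may be much smaller than in this example.
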
> Seb
> 
> 2009/8/14 Francis Irving <[email protected]>:
> > Mapumental is a website which shows contour maps of public transport
> > travel times, house prices and other data. It's in closed beta.
> >
> > http://mapumental.channel4.com/
> >
> > It uses lots of CPU running the transport route finding for each
> > postcode, and rendering the tiles as they are served.
> >
> > Before we can openly release it, we need to make it scale easily
> > (say, on Amazon Web Services).
> >
> > Currently it is using
> > * A PostgreSQL database to store the points behind the static datasets
> > such as scenicness and house prices.
> > * Binary files on NFS to store the generated datasets of travel times.
> > PostgreSQL was too slow, and used too much memory, to load in the
> > large number of rows that would be required (300,000 for each
> > user-entered postcode).
> > * A rendered tile cache, containing PNG files on the NFS filesystem.
> > * PostgreSQL for queueing the jobs for the transport route finder.
> >
> > We now want to:
> > * make the site scale easily (on Amazon Web Services),
> > * make it easy to add more data sets.
> > We had problems with NFS, so I need something to replace the binary
> > files in NFS and the tile cache. It might also be prudent to use
> > something easier to scale than a PostgreSQL database, although I
> > suspect the load on it would be low so perhaps it isn't a problem.
> >
> > So the new version of Mapumental that I'm currently planning has to
> > store:
> >    a) a cache of rendered tiles (some generated fairly rarely
> >    and accessed frequently, e.g. house prices; some not accessed
> >    often compared to generation times, e.g. public transport route)
> >    b) coordinates and values of arbitrary point datasets (e.g.
> >    school quality, asthma air quality, wind speed, route by
> >    car to a particular postcode etc. etc.)
> >
> > I'm looking for good, open source, alternatives to NFS and PostgreSQL
> > to do this. Distributed data stores and queueing systems.
> >
> > What should I look at? What can I trust?
> >
> > I've already surveyed the field, and have my own ideas about what to
> > do, but would be interested if anyone here has some experience or
> > views on any of the obvious technologies.
> >
> > I'd like it to be stable and mature, and realistically it would
> > already be in a Debian package.
> >
> > Francis
> >
> > _______________________________________________
> > Mailing list [email protected]
> > Archive, settings, or unsubscribe:
> > https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public
> >
> 
> 
> 
> -- 
> skype: seb.bacon
> mobile: 07790 939224
> 
