This may be a bit new for your needs, but to the untrained eye it looks interesting
http://fluidinfo.com/fluiddb

2009/8/17 Seb Bacon <[email protected]>
> 2009/8/17 Francis Irving <[email protected]>:
> > However, I don't think we can use it for Mapumental. We use
> > GDAL (http://gdal.org/) as a C library for rendering the tiles,
> > and our own C++ code for public transport route finding (see
> > https://secure.mysociety.org/cvstrac/rlog?f=mysociety/iso/bin/fastplan-coopt.cpp
> > )
> >
> > Neither can be run on Google App Engine.
>
> I suppose it wouldn't make sense to expose them as a web service in a
> different infrastructure..?
>
> Seb
>
> > On Mon, Aug 17, 2009 at 09:44:43AM +0100, Seb Bacon wrote:
> >> Hi Francis,
> >>
> >> I was talking with someone at work about Mnesia, which sounds like
> >> it's worth considering. It is distributed among N nodes, so it's good
> >> for problems that require good cache locality, i.e. that do a lot with
> >> the data (because all data is on every node and replicates everywhere
> >> quickly). For some types of data sets that breaks down quite soon, of
> >> course (you pretty much want the whole dataset to fit in RAM, e.g. up
> >> to 64 GB). Mnesia takes care of replicating changes everywhere, and of
> >> handling failed nodes, netsplits and resyncing after them, etc.
> >>
> >> I don't know much about MongoDB or CouchDB. Maybe you have to manage
> >> syncing yourself on the application layer, but they probably scale
> >> much further (depending on what you do in your application). But you
> >> could also have smaller clusters of Mnesia nodes, with application
> >> code replicating between them and multiplying the presence of
> >> often-requested buckets across the clusters, or something like that.
> >> Another global Mnesia could hold the routing information (which
> >> bucket is where).
> >>
> >> So a combination might also make sense: Mnesia for the routing
> >> information on broker nodes, and CouchDB or Memcached or MongoDB on the
> >> storage nodes with the large blobs of tile and other precomputed data.
> >> So your application servers would pick a broker node at random, ask
> >> it where some blob is and pass the blob through from the storage node
> >> to the client. The brokers could also increment per-object access
> >> counters and run some async jobs to have frequently accessed objects
> >> copied to more storage nodes, etc.
> >>
> >> Instead of NFS for distributing tiles, you could consider a web
> >> service running off an httpd server like nginx.
> >>
> >> Another possibility for the entire infrastructure is Google App
> >> Engine, which utilises BigTable for fast, distributed data indexing
> >> and querying, and serves apps from a Python or Java runtime. There is
> >> a queue API, a memcached API, a simple image manipulation API, and a
> >> very good pricing model, which works out considerably cheaper than AWS
> >> for all models I've considered; for example, CPU time is theoretically
> >> billed at the same rate in AWS and GAE, but in GAE you just pay for
> >> real CPU time, compared with AWS where you pay for instance uptime.
> >> Of course, the price you pay for the cheapness and free scaling in GAE
> >> is lack of control, lack of customer service, and no choice of
> >> where the data is stored (but I don't think Mapumental has data
> >> privacy concerns...?). The flip side to the lack of control is that
> >> the complexity is constrained. Personally I'm impressed by GAE and
> >> will be continuing to use it on new projects where I can, but I've not
> >> used it on a massively resource-intensive job yet. The only part of a
> >> GAE app that isn't easily portable to a new architecture is the
> >> datastore access, which can be abstracted away easily enough, so you
> >> could always choose to migrate from GAE to AWS at a later date.
> >>
> >> Seb
> >>
> >> 2009/8/14 Francis Irving <[email protected]>:
> >> > Mapumental is a website which shows contour maps of public transport
> >> > travel times, house prices and other data. It's in closed beta.
> >> >
> >> > http://mapumental.channel4.com/
> >> >
> >> > It uses lots of CPU running the transport route finding for each
> >> > postcode, and rendering the tiles as they are served.
> >> >
> >> > Before we can openly release it, we need to make it scale easily
> >> > (say, on Amazon Web Services).
> >> >
> >> > Currently it is using
> >> > * A PostgreSQL database to store the points behind the static datasets
> >> > such as scenicness and house prices.
> >> > * Binary files on NFS to store the generated datasets of travel times.
> >> > PostgreSQL was too slow, and used too much memory, to load in the
> >> > large number of rows that would be required (300,000 for each
> >> > user-entered postcode).
> >> > * A rendered tile cache, containing PNG files on the NFS filesystem.
> >> > * PostgreSQL for queueing the jobs for the transport route finder.
> >> >
> >> > We now want to:
> >> > * make the site scale easily (on Amazon Web Services),
> >> > * make it easy to add more data sets.
> >> > We had problems with NFS, so I need something to replace the binary
> >> > files in NFS and the tile cache. It might also be prudent to use
> >> > something easier to scale than a PostgreSQL database, although I
> >> > suspect the load on it would be low so perhaps it isn't a problem.
> >> >
> >> > So the new version of Mapumental that I'm currently planning has to
> >> > store:
> >> > a) a cache of rendered tiles (some generated fairly rarely and
> >> > accessed frequently, e.g. house prices; some not accessed often
> >> > compared to how often they are generated, e.g. public transport
> >> > routes)
> >> > b) coordinates and values of arbitrary point datasets (e.g.
> >> > school quality, asthma air quality, wind speed, route by
> >> > car to a particular postcode etc. etc.)
> >> >
> >> > I'm looking for good, open source alternatives to NFS and PostgreSQL
> >> > to do this: distributed data stores and queueing systems.
> >> >
> >> > What should I look at? What can I trust?
> >> >
> >> > I've already surveyed the field, and have my own ideas about what to
> >> > do, but would be interested if anyone here has some experience or
> >> > views on any of the obvious technologies.
> >> >
> >> > I'd like it to be stable and mature, and realistically it would
> >> > already be in a Debian package.
> >> >
> >> > Francis
> >> >
> >> > _______________________________________________
> >> > Mailing list [email protected]
> >> > Archive, settings, or unsubscribe:
> >> > https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public
> >> >
> >>
> >> --
> >> skype: seb.bacon
> >> mobile: 07790 939224
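Seb's suggested split has two parts. Broker nodes hold the routing information (which blob is on which storage node) and keep per-object access counters. An async job then copies frequently accessed objects to more storage nodes. A minimal in-memory Python sketch of that idea follows. It is an illustration under assumptions, not anyone's actual design. The class `Broker`, its methods, the `hot_threshold` and `max_replicas` parameters, the `pick_broker` helper and the node names are all hypothetical. A plain dict stands in for the routing table that the thread proposes keeping in Mnesia.

```python
import random
from collections import Counter


class Broker:
    """Hypothetical broker node: routes blob keys to storage nodes.

    A plain dict stands in for the routing table, which in the thread's
    proposal would live in a replicated Mnesia database.
    """

    def __init__(self, storage_nodes, hot_threshold=100, max_replicas=3):
        self.storage_nodes = list(storage_nodes)
        self.routes = {}          # blob key -> storage nodes holding a copy
        self.hits = Counter()     # per-object access counters
        self.hot_threshold = hot_threshold
        self.max_replicas = max_replicas
        self.pending_copies = []  # (key, source, target) jobs for an async worker

    def register(self, key, node):
        """Record that `node` now holds a copy of blob `key`."""
        nodes = self.routes.setdefault(key, [])
        if node not in nodes:
            nodes.append(node)

    def locate(self, key):
        """Count one access to `key` and return a storage node that holds it."""
        replicas = self.routes[key]  # raises KeyError for an unknown blob
        self.hits[key] += 1
        # Every `hot_threshold`-th access asks for one more copy of the blob.
        if self.hits[key] % self.hot_threshold == 0:
            self._schedule_copy(key)
        return random.choice(replicas)

    def _schedule_copy(self, key):
        """Queue a copy of `key` onto a node that lacks it, up to max_replicas."""
        replicas = self.routes[key]
        if len(replicas) >= self.max_replicas:
            return
        candidates = [n for n in self.storage_nodes if n not in replicas]
        if candidates:
            self.pending_copies.append((key, replicas[0], candidates[0]))

    def run_copy_jobs(self):
        """Stand-in for the async job: assumes each copy succeeds, then reroutes."""
        for key, _source, target in self.pending_copies:
            self.register(key, target)
        self.pending_copies.clear()


def pick_broker(brokers):
    """An application server picks a broker at random, as the thread suggests."""
    return random.choice(brokers)


if __name__ == "__main__":
    broker = pick_broker([Broker(["store-a", "store-b", "store-c"], hot_threshold=2)])
    broker.register("blob-1", "store-a")
    broker.locate("blob-1")
    broker.locate("blob-1")   # second access crosses the threshold
    broker.run_copy_jobs()
    print(broker.routes)      # blob-1 is now on store-a and store-b
```

In a real deployment the counters and routing table would need to be shared between brokers. The copy jobs would also actually move data between storage nodes rather than just updating the routes.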
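Francis explains that travel-time datasets are kept as binary files because PostgreSQL was too slow and memory-hungry for 300,000 rows per user-entered postcode. The Python sketch below shows why a flat, fixed-size record layout is compact and easy to index. It assumes a hypothetical record of easting and northing (32-bit integers) plus travel time in minutes (32-bit float). The thread does not describe Mapumental's real file format, and the function names are illustrative only. At 12 bytes per record, 300,000 points come to 3.6 MB per postcode, and any record can be read by byte offset without parsing the rest. A blob that size would also exceed memcached's default 1 MB item limit, which matters if memcached is one of the candidate stores.

```python
import struct

# Hypothetical record layout (not Mapumental's actual format): easting and
# northing as little-endian 32-bit ints, travel time in minutes as a 32-bit
# float. "<" disables padding, so every record is exactly 12 bytes.
RECORD = struct.Struct("<iif")


def pack_travel_times(points):
    """Pack (easting, northing, minutes) tuples into a single bytes blob."""
    return b"".join(RECORD.pack(e, n, t) for e, n, t in points)


def unpack_travel_times(blob):
    """Return the list of (easting, northing, minutes) tuples stored in `blob`."""
    if len(blob) % RECORD.size:
        raise ValueError("blob length is not a whole number of records")
    return list(RECORD.iter_unpack(blob))


def read_record(blob, index):
    """Read the index-th record directly by byte offset, without a full unpack."""
    return RECORD.unpack_from(blob, index * RECORD.size)


if __name__ == "__main__":
    blob = pack_travel_times([(530000, 180000, 42.5), (531000, 181000, 17.0)])
    print(len(blob))              # 24
    print(read_record(blob, 1))   # (531000, 181000, 17.0)
    print(RECORD.size * 300_000)  # 3600000
```

Each packed blob is a single opaque value. It could be stored under a per-postcode key in whichever distributed store replaces NFS.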
