Re: [Talk-GB] Efficient processing of map data for rendering (BOINC).
On Sat, Jan 24, 2009 at 10:38 AM, Steve Hill st...@nexusuk.org wrote:
> On Fri, 23 Jan 2009, Matt Amos wrote:
>> this might be helpful:
>> http://svn.openstreetmap.org/applications/utils/export/tile_expiry/
>
> Yes, I had a look at that script, but it only expires tiles with nodes on
> them, which I think is rather too simplistic. The readme says that it is
> unusual for the gap between nodes to be larger than a tile, but in my
> experience this just isn't true at all.

in my experience it is unusual for the gap between nodes to be larger than a meta-tile, which is all we really care about, as we re-render meta-tiles, not single tiles. it does happen, of course, but at high zoom levels in areas of low detail (i.e. not very interesting places). i find that browser caching issues tend to produce far more problems, but i don't know enough about the arcana of HTTP caching to try and fix the issues ;-)

> So my idea was to work on the postgis objects themselves during import.
> This should have some advantages:
>
> 1. We don't need to duplicate any work in translating OSM objects into
> the objects that are actually rendered - osm2pgsql already does this and
> we don't have to know or care how.

to be fair, this is a pretty simple transformation - a single call to proj4. of course, re-using osm2pgsql has an advantage if you're rendering in several different projections.

> 2. We don't need to duplicate work parsing the OSM XML file - this should
> give some speed improvements.

the script in SVN has no trouble expiring tiles in less than a minute, which is all that is required. measuring the load on the server shows that the overhead is negligible.

> 3. There should be a reduced number of database lookups because the only
> extra things we need to look up in the database are the postgis objects
> that are being deleted.

the only database lookups that are done are to fetch the old positions of nodes and ways when the object is modified.
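(for anyone following along: the node-position-to-tile conversion the expiry script relies on is the standard slippy-map arithmetic. a minimal python sketch - the 8x8 meta-tile size is an assumption matching mod_tile's default, not something stated in this thread:)

```python
import math

METATILE = 8  # assumed meta-tile size; mod_tile renders 8x8 tile blocks

def tile_for(lon, lat, zoom):
    """Convert a WGS84 coordinate to slippy-map tile x/y at the given zoom."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def metatile_for(lon, lat, zoom):
    """Meta-tile containing the coordinate: tile coords snapped to the meta-tile grid."""
    x, y = tile_for(lon, lat, zoom)
    return x // METATILE, y // METATILE
```

expiring the whole meta-tile containing each changed node, rather than the single tile, is what makes the "gap between nodes larger than a tile" case mostly harmless in practice.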
due to the non-local algorithms mapnik uses to place text, i couldn't see a way to optimise this to a smaller part of the way. :-(

> The plan is to have osm2pgsql insert a list of dirty tiles for the
> maximum zoom level into a postgres table. I wrote a script that goes
> through each zoom level, starting at the maximum and working back to 0.
> Each zoom level has a minimum age associated with it and when the tile
> has been dirty for that long it is deleted and the coordinates for the
> tile at zoom-1 are inserted into the table. The idea being that low-zoom
> tiles change more frequently than high-zoom tiles, but are less
> interesting and more effort to render, so shouldn't be re-rendered
> immediately.

this is tied into the mapnik style. for example - changes to residential roads do not need to be propagated above the level at which residential roads are rendered. it would be interesting to extract this information automatically from the style file. even more interesting to try and diff styles and expire tiles based on those...

>> this is one case where one big raid array is much better than many
>> distributed disks.
>
> I was wondering if anyone had done any tests on the speed of a database
> that is distributed over a cluster of servers. I would imagine that there
> would be speed improvements, but I'm not sure what the overhead is like
> for actually working out which server contains the data you're after.

it would be interesting to try this with a geographical distribution of both databases and rendering requests. i agree that the front-end (i.e. load-balancing) server would add quite a lot of complexity, especially if the rendering+diff load is highly geographically localised. it might be possible to get a similar speed-up with lower complexity by partitioning the tables, especially if suitable partition boundaries could be found which are crossed by very few ways.
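(the dirty-tile cascade described above - per-zoom minimum ages, with expired tiles propagating to zoom-1 - can be sketched in a few lines of python. an in-memory dict stands in for the postgres table here, and the per-zoom ages are made-up values for illustration:)

```python
import time

# Hypothetical per-zoom minimum ages (seconds): higher zooms expire sooner.
MIN_AGE = {z: (18 - z) * 60 for z in range(19)}

# Stand-in for the postgres table: (zoom, x, y) -> time first marked dirty.
dirty = {}

def mark_dirty(zoom, x, y, now=None):
    dirty.setdefault((zoom, x, y), now if now is not None else time.time())

def sweep(now=None):
    """Expire tiles whose minimum age has passed, pushing each up to zoom-1."""
    now = now if now is not None else time.time()
    expired = []
    for zoom in range(18, -1, -1):  # max zoom first, working back to 0
        for (z, x, y), since in list(dirty.items()):
            if z == zoom and now - since >= MIN_AGE[z]:
                del dirty[(z, x, y)]
                expired.append((z, x, y))  # delete / re-render this tile
                if z > 0:
                    mark_dirty(z - 1, x // 2, y // 2, now)  # parent tile
    return expired
```

the `x // 2, y // 2` step is the usual quadtree relationship between a tile and its parent at zoom-1.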
> Another possible solution is to have a number of completely independent
> rendering machines with their own copy of the database and just
> round-robin the rendering requests between them. This is obviously not
> something that could be done with BOINC or similar - not many people
> would want to dedicate 60GB of their hard drive to the OSM postgis
> database. :) But it could be done with a cluster of dedicated servers.

i think this could be implemented quite quickly using existing load-balancing software. the only problem would be trying to cluster tile requests which come from the same meta-tile, to avoid having all the servers in the cluster pointlessly rendering the same meta-tile at the same time.

> However, I would be really interested to see just how much load there
> would be on the rendering servers if tiles were rendered on-demand only
> if they hadn't been rendered before or if they have really become dirty
> since the last render. It just may be that there is no need to chuck
> lots of hardware at the problem if tile expiry is done well.

i totally agree. i've had a server re-rendering *all* the minutely updated meta-tiles (8-core
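(one hypothetical way to do the meta-tile clustering mentioned above: hash the meta-tile address and pick a backend from the digest, so every tile request belonging to the same meta-tile lands on the same render server. the server names and the 8x8 meta-tile size are assumptions, not anything from the thread:)

```python
import hashlib

METATILE = 8  # assumed 8x8 meta-tiles, as in mod_tile
SERVERS = ["render1", "render2", "render3"]  # hypothetical backends

def backend_for(zoom, x, y):
    """Route all tile requests from the same meta-tile to one backend."""
    key = f"{zoom}/{x // METATILE}/{y // METATILE}".encode()
    digest = hashlib.sha1(key).digest()
    return SERVERS[int.from_bytes(digest[:4], "big") % len(SERVERS)]
```

since the mapping is a pure function of the meta-tile address, any stateless front-end can compute it, and no two backends ever render the same meta-tile concurrently.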
Re: [Talk-GB] Efficient processing of map data for rendering (BOINC).
On Thu, Jan 22, 2009 at 10:26 AM, Steve Hill st...@nexusuk.org wrote:
> On Wed, 21 Jan 2009, Chris Andrew wrote:
>> I notice that people often mention the delay in map edits being applied
>> and made _live_.
>
> On a related note... For OpenPisteMap, I apply the diffs to the PostGIS
> DB every minute, so it only lags behind the live data by a few minutes.
> However, it doesn't currently automatically expire any tiles from the
> cache, so it won't re-render a tile after the data has been changed.

this might be helpful:
http://svn.openstreetmap.org/applications/utils/export/tile_expiry/

> I'm currently working on modifying osm2pgsql to create a list of tiles
> that have been changed as it applies the diffs, so that they can be
> removed from the cache (and thus re-rendered on the fly when someone
> requests them).

the above script expires meta-tiles with minutely updates. see the blog post at http://blog.cloudmade.com/2009/01/23/nearly-live-tiles/ and the tile server itself at http://matt.sandbox.cloudmade.com/

>> With the OSM community growing by the day, this problem can only get
>> bigger. Does anyone know whether anyone has considered using a
>> distributed client [1] such as BOINC [2] to do the _number crunching_?
>
> From my experience, the number crunching doesn't really seem to be the
> limiting factor - database I/O is the biggest overhead for OpenPisteMap
> (although that may be partly down to the massive amount of SRTM contours
> data it has to handle while rendering each tile).

+1. this is one case where one big raid array is much better than many distributed disks.

cheers,

matt

___
Talk-GB mailing list
Talk-GB@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk-gb
Re: [Talk-GB] Efficient processing of map data for rendering (BOINC).
On Wed, 21 Jan 2009, Chris Andrew wrote:
> I notice that people often mention the delay in map edits being applied
> and made _live_.

On a related note... For OpenPisteMap, I apply the diffs to the PostGIS DB every minute, so it only lags behind the live data by a few minutes. However, it doesn't currently automatically expire any tiles from the cache, so it won't re-render a tile after the data has been changed.

I'm currently working on modifying osm2pgsql to create a list of tiles that have been changed as it applies the diffs, so that they can be removed from the cache (and thus re-rendered on the fly when someone requests them). My initial, very simplistic attempt was rather unsuccessful, though - I made osm2pgsql calculate bounding boxes around every object being deleted or created. However, for some objects the bounding box can be extremely large (especially relations), so it expires a very large number of tiles.

I think my next attempt will involve calculating which tiles a LINESTRING intersects. However, I'm not sure what to do about POLYGONs - technically, every tile within the polygon should be expired, but that could be a potentially huge area. Maybe the answer is simply to put in some sanity checks that ignore polygons covering massive areas.

> With the OSM community growing by the day, this problem can only get
> bigger. Does anyone know whether anyone has considered using a
> distributed client [1] such as BOINC [2] to do the _number crunching_?

From my experience, the number crunching doesn't really seem to be the limiting factor - database I/O is the biggest overhead for OpenPisteMap (although that may be partly down to the massive amount of SRTM contours data it has to handle while rendering each tile).

- Steve

xmpp:st...@nexusuk.org  sip:st...@nexusuk.org  http://www.nexusuk.org/
Servatis a periculum, servatis a maleficum - Whisper, Evanescence
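(a rough sketch of the LINESTRING idea, assuming consecutive nodes are close together relative to tile size: for each segment, expire every tile in the bounding box of the two endpoint tiles. this over-expires slightly on diagonal segments, but never misses a tile the segment touches, since a straight segment stays inside its endpoints' bounding box:)

```python
import math

def tile(lon, lat, zoom):
    """Slippy-map tile x/y for a WGS84 coordinate at the given zoom."""
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def tiles_for_linestring(coords, zoom):
    """Tiles touched by a linestring, approximated per segment by the
    bounding box of the segment's endpoint tiles."""
    touched = set()
    for (lon1, lat1), (lon2, lat2) in zip(coords, coords[1:]):
        x1, y1 = tile(lon1, lat1, zoom)
        x2, y2 = tile(lon2, lat2, zoom)
        for x in range(min(x1, x2), max(x1, x2) + 1):
            for y in range(min(y1, y2), max(y1, y2) + 1):
                touched.add((x, y))
    return touched
```

for polygons one would additionally need to fill the interior (or apply the area sanity check suggested above); this sketch only covers the boundary.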
Re: [Talk-GB] Efficient processing of map data for rendering (BOINC).
This is basically what ti...@home tries to do. It is much better to use more frequent updates with mapnik, but that requires a lot of bandwidth and processing power.

Shaun

On 21 Jan 2009, at 22:38, Chris Andrew wrote:
> Hi, everybody.
>
> I notice that people often mention the delay in map edits being applied
> and made _live_. With the OSM community growing by the day, this problem
> can only get bigger.
>
> Does anyone know whether anyone has considered using a distributed
> client [1] such as BOINC [2] to do the _number crunching_? For those not
> familiar, it means that anyone's computer can use spare processing power
> to do calculations, without disturbing the normal work of the computer.
>
> [1] http://en.wikipedia.org/wiki/Grid_computing
> [2] http://en.wikipedia.org/wiki/Boinc
>
> Just an idea.
>
> Cheers,
> Chris.
>
> --
> Reasons why you may want to try GNU/Linux: http://www.getgnulinux.org/
Re: [Talk-GB] Efficient processing of map data for rendering (BOINC).
2009/1/21 Chris Andrew cjhand...@gmail.com:
> With the OSM community growing by the day, this problem can only get
> bigger. Does anyone know whether anyone has considered using a
> distributed client [1] such as BOINC [2] to do the _number crunching_?

The osmarender layers are done with a similar idea:
http://wiki.openstreetmap.org/wiki/Tiles%40home

Not quite a neat little client like BOINC, distributed.net etc., but it works, and turns around within hours. Doing the same thing for the Mapnik layer would be a whole 'nother project!

cheers,
LT
Re: [Talk-GB] Efficient processing of map data for rendering (BOINC).
I just looked at the Tiles@home installation instructions, and they look like a nightmare. It also seems strange that people struggle to install it when easy-to-install open source products exist. I looked at the instructions for Debian and Ubuntu and, having used GNU/Linux for 10 years, was surprised at how off-putting they are.

I've got no idea of the history of Tiles@home, but are we trying to prove something by making it difficult to use (by the documentation's own admission)? This isn't meant to start flaming; it's just an OSM newbie's slant on things.

BTW, loving the whole project, and am aiming to slowly map the whole planet, then maybe beyond ;-)

Cheers,
Chris.

2009/1/21 LeedsTracker leedstrac...@gmail.com:
> The osmarender layers are done with a similar idea:
> http://wiki.openstreetmap.org/wiki/Tiles%40home
>
> Not quite a neat little client like BOINC, distributed.net etc., but it
> works, and turns around within hours. Doing the same thing for the
> Mapnik layer would be a whole 'nother project!
>
> cheers,
> LT