Re: [Talk-GB] Efficient processing of map data for rendering (BOINC).

2009-01-24 Thread Matt Amos
On Sat, Jan 24, 2009 at 10:38 AM, Steve Hill st...@nexusuk.org wrote:
 On Fri, 23 Jan 2009, Matt Amos wrote:

 this might be helpful
 http://svn.openstreetmap.org/applications/utils/export/tile_expiry/

 Yes, I had a look at that script, but it only expires tiles with nodes on
 them, which I think is rather too simplistic.  The readme says that it is
 unusual for the gap between nodes to be larger than a tile, but in my
 experience this just isn't true at all.

in my experience it is unusual for the gap between nodes to be larger
than a meta-tile, which is all we really care about as we re-render
meta-tiles, not single tiles. it does happen, of course, but at high
zoom levels in areas of low detail (i.e: not very interesting places).
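
For illustration, a minimal sketch of how a node position maps to a tile and then to a meta-tile. It uses the standard slippy-map tile formula; the 8x8 meta-tile size and the function names are assumptions for this example, not something stated in the thread or taken from the expiry script.

```python
import math

METATILE = 8  # assumed meta-tile size in tiles per side; not stated in this thread


def lonlat_to_tile(lon, lat, zoom):
    """Standard slippy-map tile indices (x, y) for a WGS84 lon/lat."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that lon == 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def metatile_of(x, y, size=METATILE):
    """Top-left tile of the meta-tile that contains tile (x, y)."""
    return x - x % size, y - y % size


def same_metatile(a, b, zoom):
    """True if two (lon, lat) points fall in the same meta-tile at this zoom."""
    return metatile_of(*lonlat_to_tile(*a, zoom)) == metatile_of(*lonlat_to_tile(*b, zoom))
```

Two consecutive nodes of a way only risk a missed expiry when a whole meta-tile lies between them, which is why the gap matters less for meta-tiles than for single tiles.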

i find that browser caching issues tend to produce far more problems,
but i don't know enough about the arcana of HTTP caching to try and
fix the issues ;-)

 So my idea was to work on the postgis objects themselves during import. This
 should have some advantages:
 1. We don't need to duplicate any work in translating OSM objects into the
 objects that are actually rendered - osm2pgsql already does this and we
 don't have to know or care how.

to be fair, this is a pretty simple transformation - a single call to
proj4. of course, re-using osm2pgsql has an advantage if you're
rendering in several different projections.
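
As a rough idea of what that transformation amounts to, here is the spherical Mercator forward formula written out by hand. This is not the proj4 call itself, and it assumes the rendering projection is spherical Mercator, which the thread does not say explicitly.

```python
import math

R = 6378137.0  # sphere radius in metres used by spherical Mercator


def to_spherical_mercator(lon, lat):
    """Project WGS84 lon/lat in degrees to spherical Mercator metres."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4.0 + math.radians(lat) / 2.0))
    return x, y
```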

 2. We don't need to duplicate work parsing the OSM XML file - this should
 give some speed improvements.

the script in SVN has no trouble expiring tiles in less than a minute,
which is all that is required. measuring the load on the server shows
that the overhead is negligible.

 3. There should be a reduced number of database lookups because the only
 extra things we need to look up in the database are the postgis objects that
 are being deleted.

the only database lookups that are done are to fetch the old positions
of nodes and ways when the object is modified. due to the non-local
algorithms mapnik uses to place text, i couldn't see a way to optimise
this to a smaller part of the way. :-(

 The plan is to have osm2pgsql insert a list of dirty tiles for the maximum
 zoom level into a postgres table.  I wrote a script that goes through each
 zoom level, starting at the maximum and working back to 0.  Each zoom level
 has a minimum age associated with it and when the tile has been dirty for
 that long it is deleted and the coordinates for the tile at zoom-1 are
 inserted into the table.  The idea being that low-zoom tiles change more
 frequently than high-zoom tiles, but are less interesting and more effort to
 render so shouldn't be re-rendered immediately.
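
A minimal sketch of the cascade described above, with the postgres table replaced by an in-memory dict so that it runs on its own. The function name, the dict layout and any age values are hypothetical, not taken from the actual script.

```python
def process_dirty_tiles(dirty, now, max_zoom, min_age):
    """Expire dirty tiles that are old enough and mark their parents dirty.

    dirty:   dict mapping (zoom, x, y) -> time the tile became dirty
    min_age: dict mapping zoom -> seconds a tile must stay dirty before expiry
    Returns the list of tiles to delete from the cache; mutates `dirty`.
    """
    expired = []
    for zoom in range(max_zoom, -1, -1):
        ready = [key for key, since in dirty.items()
                 if key[0] == zoom and now - since >= min_age.get(zoom, 0)]
        for key in ready:
            _, x, y = key
            del dirty[key]
            expired.append(key)
            if zoom > 0:
                # The parent tile at zoom-1 covers a 2x2 block of tiles at this zoom.
                # setdefault keeps the earlier timestamp if the parent is already dirty.
                dirty.setdefault((zoom - 1, x // 2, y // 2), now)
    return expired
```

Calling this periodically with the current time makes a change ripple towards zoom 0 one level at a time, each level waiting out its own minimum age.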

this is tied into the mapnik style. for example - changes to
residential roads do not need to be propagated above the level at
which residential roads are rendered. it would be interesting to
extract this information automatically from the style file. even more
interesting to try and diff styles and expire tiles based on those...
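
A small sketch of that idea, reading "above" as "zoomed out beyond". The table of minimum zoom levels is entirely hypothetical; in practice the values would have to be extracted from the mapnik style file.

```python
# Hypothetical minimum zoom at which each feature type is drawn.
# Real values would come from the mapnik style, not from this table.
MIN_RENDER_ZOOM = {"residential": 13, "primary": 7, "motorway": 5}


def zooms_to_expire(feature_type, max_zoom=18, default_min=0):
    """Zoom levels at which a change to this feature type can affect tiles."""
    lowest = MIN_RENDER_ZOOM.get(feature_type, default_min)
    return range(lowest, max_zoom + 1)
```

Unknown feature types fall back to expiring at every zoom level, which is the safe default.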

 this is one case where one big raid array is much better than many
 distributed disks.

 I was wondering if anyone had done any tests on the speed of a database that
 is distributed over a cluster of servers.  I would imagine that there would
 be speed improvements, but I'm not sure what the overhead is like for
 actually working out which server contains the data you're after.

it would be interesting to try this with a geographical distribution
of both databases and rendering requests. i agree that the front-end
(i.e: load-balancing) server would add quite a lot of complexity,
especially if the rendering+diff load is highly geographically
localised.

it might be possible to get a similar speed-up with lower complexity
by partitioning the tables, especially if suitable partition
boundaries could be found which are crossed by very few ways.

 Another possible solution is to have a number of completely independent
 rendering machines with their own copy of the database and just round-robin
 the rendering requests between them.  This is obviously not something that
 could be done with BOINC or similar - not many people would want to dedicate
 60GB of their hard drive to the OSM postgis database. :) But it could be
 done with a cluster of dedicated servers.

i think this could be implemented quite quickly using existing
load-balancing software. the only problem would be trying to cluster
tile requests which come from the same meta-tile to avoid having all
the servers in the cluster pointlessly simultaneously rendering the
same meta-tile.
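
One way to do that clustering, sketched under assumptions: route each request by a stable hash of its meta-tile rather than round-robin, so every tile of a meta-tile lands on the same server. The server names, the 8x8 meta-tile size and the function name are hypothetical.

```python
import zlib


def pick_server(zoom, x, y, servers, metatile=8):
    """Choose a render server so that all tiles of one meta-tile map to the same one."""
    mx, my = x // metatile, y // metatile
    key = f"{zoom}/{mx}/{my}".encode()
    # crc32 is stable across processes; Python's built-in hash() of strings is not.
    return servers[zlib.crc32(key) % len(servers)]
```

For example, `pick_server(12, 2040, 1360, servers)` and `pick_server(12, 2047, 1367, servers)` return the same entry because both tiles belong to one meta-tile. The trade-off is that a plain modulo reshuffles most meta-tiles whenever the server list changes.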

 However, I would be really interested to see just how much load there would
 be on the rendering servers if tiles were rendered on-demand only if they
 hadn't been rendered before or if they have really become dirty since the
 last render.  It just may be that there is no need to chuck lots of hardware
 at the problem if tile expiry is done well.

i totally agree. i've had a server re-rendering *all* the minutely
updated meta-tiles (8-core 

Re: [Talk-GB] Efficient processing of map data for rendering (BOINC).

2009-01-23 Thread Matt Amos
On Thu, Jan 22, 2009 at 10:26 AM, Steve Hill st...@nexusuk.org wrote:
 On Wed, 21 Jan 2009, Chris Andrew wrote:

 I notice that people often mention the delay in map edits being
 applied and made _live_.

 On a related note...

 For OpenPisteMap, I apply the diffs to the PostGIS DB every minute, so it
 only lags behind the live data by a few minutes.  However, it doesn't
 currently automatically expire any tiles from the cache, so it won't
 re-render a tile after the data has been changed.

this might be helpful
http://svn.openstreetmap.org/applications/utils/export/tile_expiry/

 I'm currently working on modifying osm2pgsql to create a list of tiles
 that have been changed as it applies the diffs so that they can be removed
 from the cache (and thus re-rendered on the fly when someone requests
 them).

the above script expires meta-tiles with minutely updates. see the
blog post here http://blog.cloudmade.com/2009/01/23/nearly-live-tiles/
and the tile server itself http://matt.sandbox.cloudmade.com/

 With the OSM community growing by the day, this problem can only get
 bigger.  Does anyone know whether anyone has considered using a
 distributed client [1] such as BOINC [2] to do the _number crunching_?

 From my experience, the number crunching doesn't really seem to be the
 limiting factor - database I/O is the biggest overhead for OpenPisteMap
 (although that may be partly down to the massive amount of SRTM contours
 data it has to handle while rendering each tile).

+1

this is one case where one big raid array is much better than many
distributed disks.

cheers,

matt

___
Talk-GB mailing list
Talk-GB@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk-gb


Re: [Talk-GB] Efficient processing of map data for rendering (BOINC).

2009-01-22 Thread Steve Hill
On Wed, 21 Jan 2009, Chris Andrew wrote:

 I notice that people often mention the delay in map edits being
 applied and made _live_.

On a related note...

For OpenPisteMap, I apply the diffs to the PostGIS DB every minute, so it 
only lags behind the live data by a few minutes.  However, it doesn't 
currently automatically expire any tiles from the cache, so it won't 
re-render a tile after the data has been changed.

I'm currently working on modifying osm2pgsql to create a list of tiles 
that have been changed as it applies the diffs so that they can be removed 
from the cache (and thus re-rendered on the fly when someone requests 
them).

My initial, very simplistic attempt was rather unsuccessful though - I 
made osm2pgsql calculate bounding boxes around every object being deleted 
or created.  However, for some objects the bounding box can be extremely 
large (especially relations) so it expires a very large number of tiles.
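
To give a feel for how quickly this blows up, here is a small sketch that counts the tiles covered by a bounding box at a given zoom, using the standard slippy-map tile formula. The function names are hypothetical and this is not the osm2pgsql code.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Standard slippy-map tile indices (x, y) for a WGS84 lon/lat."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def bbox_tile_count(min_lon, min_lat, max_lon, max_lat, zoom):
    """Number of tiles touched by a lon/lat bounding box at this zoom."""
    x0, y1 = lonlat_to_tile(min_lon, min_lat, zoom)  # south-west: smallest x, largest y
    x1, y0 = lonlat_to_tile(max_lon, max_lat, zoom)  # north-east: largest x, smallest y
    return (x1 - x0 + 1) * (y1 - y0 + 1)
```

The count grows roughly fourfold with each extra zoom level, so a relation spanning a country expires an enormous number of tiles at the maximum zoom.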

I think my next attempt will involve calculating which tiles a LINESTRING
intersects.  However, I'm not sure what to do about POLYGONs - 
technically, every tile within the polygon should be expired, but that 
could be a potentially huge area.  Maybe the answer is simply to put in 
some sanity checks that ignore polygons that cover massive areas.
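
One possible way to find the tiles a LINESTRING segment intersects, sketched under assumptions: the endpoints are already converted to fractional tile units at the target zoom (tile (3, 5) covers x in [3, 4) and y in [5, 6)), and `tiles_on_segment` is a hypothetical helper, not part of osm2pgsql. It splits the segment wherever it crosses a tile boundary and looks up the tile under the midpoint of each piece.

```python
import math


def tiles_on_segment(x0, y0, x1, y1):
    """Set of (tx, ty) tiles crossed by a segment given in fractional tile coordinates."""
    # Parameter values t in [0, 1] at which the segment crosses a grid line.
    cuts = {0.0, 1.0}
    for a0, a1 in ((x0, x1), (y0, y1)):
        lo, hi = min(a0, a1), max(a0, a1)
        for k in range(math.floor(lo) + 1, math.ceil(hi)):
            cuts.add((k - a0) / (a1 - a0))
    ordered = sorted(cuts)
    tiles = set()
    # Between two consecutive cuts the segment stays inside one tile.
    for ta, tb in zip(ordered, ordered[1:]):
        tm = (ta + tb) / 2.0
        tiles.add((math.floor(x0 + tm * (x1 - x0)), math.floor(y0 + tm * (y1 - y0))))
    return tiles
```

A whole LINESTRING is then the union of this set over its consecutive node pairs. A segment that passes exactly through a tile corner reports only the two tiles it actually enters.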

 With the OSM community growing by the day, this problem can only get
 bigger.  Does anyone know whether anyone has considered using a
 distributed client [1] such as BOINC [2] to do the _number crunching_?

From my experience, the number crunching doesn't really seem to be the 
limiting factor - database I/O is the biggest overhead for OpenPisteMap 
(although that may be partly down to the massive amount of SRTM contours 
data it has to handle while rendering each tile).

  - Steve
xmpp:st...@nexusuk.org   sip:st...@nexusuk.org   http://www.nexusuk.org/

  Servatis a periculum, servatis a maleficum - Whisper, Evanescence




Re: [Talk-GB] Efficient processing of map data for rendering (BOINC).

2009-01-21 Thread Shaun McDonald
This is basically what tiles@home tries to do. It is much better to
use more frequent updates with mapnik, which requires a lot of
bandwidth and processing power.

Shaun

On 21 Jan 2009, at 22:38, Chris Andrew wrote:

 Hi, everybody.

 I notice that people often mention the delay in map edits being
 applied and made _live_.

 With the OSM community growing by the day, this problem can only get
 bigger.  Does anyone know whether anyone has considered using a
 distributed client [1] such as BOINC [2] to do the _number crunching_?

 For those not familiar, it means that anyone's computer can use spare
 processing power to do calculations, without disturbing the normal
 work of the computer.

 [1]  http://en.wikipedia.org/wiki/Grid_computing

 [2]  http://en.wikipedia.org/wiki/Boinc

 Just an idea.

 Cheers,

 Chris.

 -- 
 Reasons why you may want to try GNU/Linux:

 http://www.getgnulinux.org/



Re: [Talk-GB] Efficient processing of map data for rendering (BOINC).

2009-01-21 Thread LeedsTracker
2009/1/21 Chris Andrew cjhand...@gmail.com:
 With the OSM community growing by the day, this problem can only get
 bigger.  Does anyone know whether anyone has considered using a
 distributed client [1] such as BOINC [2] to do the _number crunching_?

The osmarender layers are done with a similar idea:
http://wiki.openstreetmap.org/wiki/Tiles%40home

Not quite a neat little client like BOINC, distributed.net etc. but it
works, and turns around within hours.

Doing the same thing for the Mapnik layer would be a whole 'nother project!

cheers,
LT



Re: [Talk-GB] Efficient processing of map data for rendering (BOINC).

2009-01-21 Thread Chris Andrew
I just looked at the Tiles installation instructions, and they look
like a nightmare.  It also seems strange that people struggle to
install it when easy-to-install open source products exist.

I looked at the install for Debian and Ubuntu, and having used
GNU/Linux for 10 years, was surprised at how off-putting it is.

I've got no idea of the history of Tiles, but are we trying to prove
something by making it difficult to use (by the documentation's own
admission)?

This isn't meant to start flaming, it's just an OSM newbie's slant on things.

BTW, loving the whole project and am aiming to slowly map the whole
planet, then maybe beyond ;-)

Cheers,

Chris.

2009/1/21 LeedsTracker leedstrac...@gmail.com:
 2009/1/21 Chris Andrew cjhand...@gmail.com:
 With the OSM community growing by the day, this problem can only get
 bigger.  Does anyone know whether anyone has considered using a
 distributed client [1] such as BOINC [2] to do the _number crunching_?

 The osmarender layers are done with a similar idea:
 http://wiki.openstreetmap.org/wiki/Tiles%40home

 Not quite a neat little client like BOINC, distributed.net etc. but it
 works, and turns around within hours.

 Doing the same thing for the Mapnik layer would be a whole 'nother project!

 cheers,
 LT




-- 
Reasons why you may want to try GNU/Linux:

http://www.getgnulinux.org/
