Re: [OSM-talk] osm2pgsql planet: frustrations, cutoffs, and idempotence

2008-10-27 Thread Jochen Topf
On Sun, Oct 26, 2008 at 06:11:04PM -0700, Michal Migurski wrote:
 What is the difference between osmosis and osm2pgsql, with regards to  
 postGIS?

osm2pgsql creates the structure needed for Mapnik. Osmosis creates a
structure more similar to the one in the OSM central database.

 If I've been maintaining a dataset based on osm2pgsql with the  
 provided default.style, would a dataset based on osmosis result in a  
 substantially different table structure?

Yes.

Jochen
-- 
Jochen Topf  [EMAIL PROTECTED]  http://www.remote.org/jochen/  +49-721-388298


___
talk mailing list
talk@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk


Re: [OSM-talk] osm2pgsql planet: frustrations, cutoffs, and idempotence

2008-10-27 Thread Tom Hughes
Shaun McDonald wrote:
 On 27 Oct 2008, at 00:50, Michal Migurski wrote:
 
 Planet dumps are not snapshots - they do not represent a consistent
 view at any particular point in time because they take a number of
 hours to generate, during which time new changes are constantly
 being made to the contents of the database.
 
 Shouldn't it be possible to ignore any changes that happen after the
 cutoff, though?
 
 At the moment we don't look at the time stamps when dumping the planet  
 file.

It's not as simple as that - you also have to switch to reading the
history tables rather than the current tables, or you won't be able to
see what the state of the object used to be if it has changed since the
snapshot time.

Which means you're reading much more data, and you either have to track
the state of each object (in order to find the most recent valid change)
or do an index scan so that you're seeing things in timestamp order.
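
A minimal sketch of the "track the state of each object" approach, to make the cost concrete. The row layout (object id, version, timestamp, visible flag) and the sample data are assumptions for illustration, not the real OSM history schema:

```python
from datetime import datetime

# Hypothetical history rows: (object_id, version, timestamp, visible).
# The real history tables carry more columns; this layout is an assumption.
HISTORY = [
    (1, 1, datetime(2008, 10, 21, 9, 0), True),
    (1, 2, datetime(2008, 10, 22, 15, 0), True),   # changed after the cutoff
    (2, 1, datetime(2008, 10, 20, 12, 0), True),
    (2, 2, datetime(2008, 10, 22, 1, 0), False),   # deleted before the cutoff
    (3, 1, datetime(2008, 10, 22, 18, 0), True),   # created after the cutoff
]


def snapshot_at(history, cutoff):
    """Return {object_id: version} for objects visible at the cutoff."""
    latest = {}
    for obj_id, version, stamp, visible in history:
        if stamp > cutoff:
            continue  # ignore anything that happened after the snapshot time
        best = latest.get(obj_id)
        if best is None or version > best[0]:
            latest[obj_id] = (version, visible)
    # Drop objects whose most recent valid version is a deletion.
    return {obj_id: v for obj_id, (v, visible) in latest.items() if visible}


snapshot = snapshot_at(HISTORY, datetime(2008, 10, 22, 14, 0))
print(snapshot)
```

Every history row has to be visited and one entry per object kept in memory, which is the extra work described above compared with simply dumping the current tables.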

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/



Re: [OSM-talk] osm2pgsql planet: frustrations, cutoffs, and idempotence

2008-10-27 Thread Jochen Topf
On Mon, Oct 27, 2008 at 08:22:32AM +, Tom Hughes wrote:
 Shaun McDonald wrote:
  On 27 Oct 2008, at 00:50, Michal Migurski wrote:
  
  Planet dumps are not snapshots - they do not represent a consistent
  view at any particular point in time because they take a number of
  hours to generate, during which time new changes are constantly
  being made to the contents of the database.
  
  Shouldn't it be possible to ignore any changes that happen after the
  cutoff, though?
  
  At the moment we don't look at the time stamps when dumping the planet  
  file.
 
 It's not as simple as that - you also have to switch to reading the 
 history tables rather than the current tables or you won't be able to 
 see what the state of the object used to be if it has changed since the 
 snapshot time.
 
 Which means you're reading much more data, and either having to track 
 the state of each object (in order to find the most recent valid change) 
 or you have to index scan so that you're seeing things in timestamp order.

If the planet dump plus the diff from the same day is what everybody
wants anyway, why not do this on the server side and hold the planet
back after the first diff is available, run this over the planet and
then publish that as the planet?

Jochen
-- 
Jochen Topf  [EMAIL PROTECTED]  http://www.remote.org/jochen/  +49-721-388298




Re: [OSM-talk] osm2pgsql planet: frustrations, cutoffs, and idempotence

2008-10-27 Thread Brett Henderson
Others have already commented on most of your points but I'll add my
thoughts in case there are some gaps.

Michal Migurski wrote:
 Hi,

 I've been trying to keep up to date with the dumps and diffs from
 http://planet.openstreetmap.org/, and I'm running into a number of
 bugs related to cutoff dates.

 In keeping my Bay Area tiles
 (http://mike.teczno.com/notes/cascadenik-openstreetmap.html) up to
 date, I've been grabbing complete planet.osm dumps about once
 per month, and filling in the intervening time with daily diffs. I've  
 noticed some misalignments between the data in the dumps and the  
 osm2pgsql importer that leads to unavoidable holes in the data.

 It seems that they could be fixed in either osm2pgsql, the planet  
 files, or both.

 The final event in each weekly planet dump does not fall on an even  
 day boundary. In the case of the most recent Oct. 22nd planet.osm, it  
 was necessary to experiment with hourly diffs from that day to find  
 that the boundary was approx. 2:00pm. Hourlies up to and including  
 2008102213-2008102214.osc.gz failed, hourlies after that succeeded. I  
 could go more granular here, checking the minute diffs as well for a  
 more precise breakpoint, but it seems odd that the planet dump does  
 not break cleanly on a midnight boundary so that it's possible to pick  
 up the differences moving forward.
   
Yep, as others have commented there are two table types in the OSM
database: current tables and history tables.  The planet dumper just
reads the current tables, which is the fastest approach.  Unfortunately
the current tables change constantly during the planet generation
process, resulting in inconsistencies.  It is possible to produce a
consistent snapshot by reading the history tables, and osmosis has the
ability to do just that, but it is significantly slower.  It is also
possible to produce a consistent snapshot by taking an inconsistent
planet and applying changesets from a point in time prior to the planet
dump beginning through to a point after completion.  This effectively
produces the same result at much reduced load on the main database.
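
As a rough illustration of choosing which changesets to apply, the sketch below lists the daily diff files that span a dump window. It assumes the `YYYYMMDD-YYYYMMDD.osc.gz` naming seen in this thread, UTC timestamps, and that the start and end times of the dump are known; the function name and the sample times are hypothetical:

```python
from datetime import datetime, time, timedelta


def daily_diffs_covering(dump_start, dump_end):
    """Names of the daily diffs that together span the dump window (UTC)."""
    names = []
    day = dump_start.date()
    # Keep adding days until the diff list reaches past the end of the dump.
    while datetime.combine(day, time.min) < dump_end:
        next_day = day + timedelta(days=1)
        names.append(f"{day:%Y%m%d}-{next_day:%Y%m%d}.osc.gz")
        day = next_day
    return names


# Hypothetical dump window; real start and end times would have to be recorded.
needed = daily_diffs_covering(datetime(2008, 10, 22, 1, 0),
                              datetime(2008, 10, 22, 14, 0))
print(needed)
```

A dump that runs across midnight yields two file names, matching the case where a changeset on each side of the generation window is needed.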

 osm2pgsql itself notifies the user of inconsistencies by failing. I
 can see that effort has been put into making it more resilient (e.g.
 http://trac.openstreetmap.org/changeset/10464). Does osm2pgsql have
 something like a `--force` switch? I haven't
 been able to find one. In looking at the diff files, it seems that it  
 should be possible to ignore possible conflicts by simply overwriting  
 whatever's in the DB with whatever's in the .osc file.
   
Yes, that's true.  I can't comment on osm2pgsql but when osmosis 
processes changeset files it does exactly that.

 Finally, the boundaries between the hourlies and dailies seem  
 misaligned.
   
This shouldn't be the case.

 After running the remaining hourlies for the 22nd, I attempted to pick  
 up on the 23rd with a daily. The final hourly I used was  
 2008102223-2008102300.osc.gz. It's my expectation that I should be  
 able to immediately follow that with 20081023-20081024.osc.gz, but  
 this led to duplicate key violation suggesting that there's an overlap  
 between the two files. Continuing with hourlies *works*, but is  
 tedious and I suspect slower than the dailies.
   
You should have been able to do what you've suggested.  If you are 
finding problems, please provide me with some example data which is 
misaligned between the two types of changesets.  I've gone to a fair bit 
of trouble to ensure that timestamp management is correct.  For example, 
all changesets and file names are using UTC even though the database 
itself is using BST.  If I've made a mistake somewhere I'd like to know 
about it.  Given that daily, hourly and minute changesets are using 
*identical* code, I find it hard to believe they're inconsistent with 
each other.
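
One quick way to rule out a naming-level misalignment is to parse the time range out of each file name and check that one file ends where the next begins. This is only a sketch: it assumes the `START-END.osc.gz` naming used in this thread, with eight-digit day stamps and ten-digit day-plus-hour stamps, and both helper names are made up for the example:

```python
from datetime import datetime

SUFFIX = ".osc.gz"


def diff_range(name):
    """Parse 'START-END.osc.gz' into a (start, end) pair of datetimes."""
    if not name.endswith(SUFFIX):
        raise ValueError(f"unexpected file name: {name}")
    start, end = name[: -len(SUFFIX)].split("-")
    # Eight digits is a day stamp, ten digits a day-plus-hour stamp.
    fmt = {8: "%Y%m%d", 10: "%Y%m%d%H"}[len(start)]
    return datetime.strptime(start, fmt), datetime.strptime(end, fmt)


def contiguous(first, second):
    """True when the second diff starts exactly where the first one ends."""
    return diff_range(first)[1] == diff_range(second)[0]


hourly = "2008102223-2008102300.osc.gz"
daily = "20081023-20081024.osc.gz"
print(contiguous(hourly, daily))
```

A check like this only looks at the file names; any overlap in the actual contents would need to be found by comparing the elements inside the two files.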

 My sense from reading other people's experiences has been that it's a  
 common pattern to rely solely on the weekly planet dumps, incurring  
 the substantial overhead of parsing and importing the full 5GB dump  
 once every week, and then re-rendering the complete set of tiles.
   
For a long time weekly planet dumps were the only bulk data available.  
Osmosis changesets have been on the scene for some time now though and 
are gradually being utilised by more and more clients.  As the planet 
grows, this will become more critical.  Who knows, if the kinks 
gradually get ironed out of the osm2pgsql program we may even begin to 
see the main mapnik tile generator move to using changesets.

 My hope has been to proceed in a more incremental fashion, since this  
 makes it possible to track what specific tiles need to be re-rendered  
 on a near-constant schedule, based on actual content or activity, vs.  
 simple cache expiration. Right now I'm doing this daily, I'd like to  
 do it as often as hourly.
   
Yep, that was one of my original aims.

 I can see a few possible solutions.

 The cutoff times for 

Re: [OSM-talk] osm2pgsql planet: frustrations, cutoffs, and idempotence

2008-10-27 Thread Brett Henderson
Jochen Topf wrote:
 If the planet dump plus the diff from the same day is what everybody
 wants anyway, why not do this on the server side and hold the planet
 back after the first diff is available, run this over the planet and
 then publish that as the planet?
   
It would add delay to the planet creation process.  I don't know how 
much of an issue that would be.

How many people still download the full planet on a regular basis?  I 
would hope that people would begin to use changesets even if they only 
require a complete xml file.  For bandwidth reasons alone the gains are 
well worthwhile, plus you can get far more regular updates than weekly.  
The script below automates keeping a snapshot file in sync:
http://svn.openstreetmap.org/applications/utils/osmosis/script/contrib/replicate_osm_file.sh




Re: [OSM-talk] osm2pgsql planet: frustrations, cutoffs, and idempotence

2008-10-27 Thread Frederik Ramm
Hi,

Brett Henderson wrote:
 Brett Henderson has offered to look into creating the dailies from 
 history as well, but I don't know about the status of that.
   
 Are you referring to the daily changesets? 
[...]
 Or did you mean planets instead of dailies? 

Mix-up on my part, sorry, yes I meant the planets.

Bye
Frederik

-- 
Frederik Ramm  ##  eMail [EMAIL PROTECTED]  ##  N49°00'09 E008°23'33



Re: [OSM-talk] osm2pgsql planet: frustrations, cutoffs, and idempotence

2008-10-27 Thread Michal Migurski
 Yep, as others have commented there are two tables types in the osm  
 database; current tables, and history tables.  The planet dumper  
 just reads current tables which is the fastest approach.   
 Unfortunately the current tables change constantly during the  
 planet generation process resulting in inconsistencies.  It is  
 possible to produce a consistent snapshot reading history tables  
 and osmosis has the ability to do just that but it is significantly  
 slower.  It is also possible to produce a consistent snapshot by  
 taking an inconsistent planet and applying changesets from a point  
 in time prior to the planet dump beginning through to a point after  
 completion, this effectively produces the same result at much  
 reduced load on the main database.


I'm liking Jochen Topf's suggestion here:

If the planet dump plus the diff from the same day is what everybody  
wants anyway, why not do this on the server side and hold the planet  
back after the first diff is available, run this over the planet and  
then publish that as the planet?


 Finally, the boundaries between the hourlies and dailies seem   
 misaligned.


 This shouldn't be the case.
 After running the remaining hourlies for the 22nd, I attempted to  
 pick  up on the 23rd with a daily. The final hourly I used was   
 2008102223-2008102300.osc.gz. It's my expectation that I should be   
 able to immediately follow that with 20081023-20081024.osc.gz, but   
 this led to duplicate key violation suggesting that there's an  
 overlap  between the two files. Continuing with hourlies *works*,  
 but is  tedious and I suspect slower than the dailies.


 You should have been able to do what you've suggested.  If you are  
 finding problems, please provide me with some example data which is  
 misaligned between the two types of changesets.

Try the two files mentioned above - that's where I saw this behavior,  
they're quite recent.

2008102223-2008102300.osc.gz
20081023-20081024.osc.gz


 My sense from reading other people's experiences has been that it's  
 a  common pattern to rely solely on the weekly planet dumps,  
 incurring  the substantial overhead of parsing and importing the  
 full 5GB dump  once every week, and then re-rendering the complete  
 set of tiles.


 For a long time weekly planet dumps were the only bulk data  
 available.  Osmosis changesets have been on the scene for some time  
 now though and are gradually being utilised by more and more  
 clients.  As the planet grows, this will become more critical.  Who  
 knows, if the kinks gradually get ironed out of the osm2pgsql  
 program we may even begin to see the main mapnik tile generator move  
 to using changesets.

I would love to rely on these exclusively, it's much more efficient.  
But, I was seeing a fair bit of information fall through the cracks so  
that's why I'm re-synching to planet every four weeks.



 I can see a few possible solutions.

 The cutoff times for files on planet.openstreetmap.org could  
 behave  more consistently. A weekly dump should end at 11:59pm so  
 that dailies  can immediately pick up user activity. Hourly and  
 daily dumps should  be synchronized. This seems more difficult.


 You only need a single consistent snapshot to get started.  You can  
 download a planet, then download the two daily changesets either  
 side of the planet generation window, then use osmosis to patch the  
 planet.  This will give you a consistent snapshot.  Once you've  
 imported that into your target database you can then start using  
 daily changesets to keep up to date (or hourly or minute as  
 appropriate).

 While it would be nice to have planet dumps already in consistent  
 form, it does add a significant overhead to the whole process.  It's  
 not terribly hard to fix on the client side.

Probably what I need to do is get a fresh update of osm2pgsql. I can  
see now that the revision I'm using is older than #10464, where some  
inconsistency resilience was added.


-mike.




michal migurski- [EMAIL PROTECTED]
  415.558.1610






Re: [OSM-talk] osm2pgsql planet: frustrations, cutoffs, and idempotence

2008-10-27 Thread Martijn van Oosterhout
On Mon, Oct 27, 2008 at 9:39 PM, Michal Migurski [EMAIL PROTECTED] wrote:
 I'm liking Jochen Topf's suggestion here:

If the planet dump plus the diff from the same day is what everybody
 wants anyway, why not do this on the server side and hold the planet
 back after the first diff is available, run this over the planet and
 then publish that as the planet?

1. Because there are plenty of uses for the planet dump that don't
need consistent snapshots.

2. Because such consistent snapshots have been available elsewhere for
quite a while now and people who need them can get them. There's no
particular reason why it has to be on the same site as the normal
planet dumps.

 Probably what I need to do is get a fresh update of osm2pgsql. I can
 see now that the revision I'm using is older than #10464, where some
 inconsistency resilience was added.

Umm, yeah. I was of course assuming you were running the latest
version, otherwise anything is possible. The creates-as-modifies fix
was done two months ago.

Have a nice day,
-- 
Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/



Re: [OSM-talk] osm2pgsql planet: frustrations, cutoffs, and idempotence

2008-10-27 Thread Martijn van Oosterhout
On Mon, Oct 27, 2008 at 9:40 PM, Michal Migurski [EMAIL PROTECTED] wrote:
 Now that I think about it though, I think what I did was take one of
 the planet dumps from http://hypercube.telascience.org/planet/ (which
 *are* consistent snapshots), and run the dailies from there.

 Is there any reason to not use those? They seem to be more frequent
 than the planet.openstreetmap.org ones - is there some disadvantage?
 How are they created?

Umm, they are created by taking the planet dumps and applying the
daily diffs every day. They are used to produce consistent snapshots
of, for example, NL and by the coastline checker (which really likes
having consistent snapshots to work with).

Have a nice day,
-- 
Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/



Re: [OSM-talk] osm2pgsql planet: frustrations, cutoffs, and idempotence

2008-10-27 Thread Brett Henderson
On Tue, Oct 28, 2008 at 7:39 AM, Michal Migurski [EMAIL PROTECTED] wrote:

  Finally, the boundaries between the hourlies and dailies seem
  misaligned.
 
 
  This shouldn't be the case.
  After running the remaining hourlies for the 22nd, I attempted to
  pick  up on the 23rd with a daily. The final hourly I used was
  2008102223-2008102300.osc.gz. It's my expectation that I should be
  able to immediately follow that with 20081023-20081024.osc.gz, but
  this led to duplicate key violation suggesting that there's an
  overlap  between the two files. Continuing with hourlies *works*,
  but is  tedious and I suspect slower than the dailies.
 
 
  You should have been able to do what you've suggested.  If you are
  finding problems, please provide me with some example data which is
  misaligned between the two types of changesets.

 Try the two files mentioned above - that's where I saw this behavior,
 they're quite recent.

2008102223-2008102300.osc.gz
20081023-20081024.osc.gz


I need you to provide some specific examples of broken data.  If you can say
that way 27123456 is created in both of the above files even though they
are for different time periods then I can take a look at why this may have
occurred.  Just saying that there is misalignment between those two files
doesn't help me at all.  Presumably you ran into a specific problem and
received a specific error message; this is the kind of information I need.
I only do this project in my spare time and can't go looking for problems
that I'm not sure even exist; I have enough known problems to look into
already :-)
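
A minimal sketch of how such a specific example could be found: collect every element that appears inside a `<create>` block in each diff and intersect the two sets. It assumes the osmChange layout with `<create>`, `<modify>` and `<delete>` blocks; the helper names and the tiny in-memory sample documents are hypothetical stand-ins for the real files:

```python
import gzip
import io
import xml.etree.ElementTree as ET


def created_ids(osc_source):
    """Collect (element type, id) for every element inside a <create> block."""
    root = ET.parse(osc_source).getroot()
    return {
        (elem.tag, elem.get("id"))
        for block in root.iter("create")
        for elem in block
    }


def created_in_both(path_a, path_b):
    """Elements created in both gzip-compressed diff files."""
    with gzip.open(path_a) as a, gzip.open(path_b) as b:
        return created_ids(a) & created_ids(b)


# Tiny in-memory stand-ins for two diff files (hypothetical content).
first = io.BytesIO(
    b'<osmChange><create><way id="27123456"/></create></osmChange>'
)
second = io.BytesIO(
    b'<osmChange><create><way id="27123456"/><node id="5"/></create></osmChange>'
)
overlap = created_ids(first) & created_ids(second)
print(overlap)
```

On the real files this would be called as `created_in_both("2008102223-2008102300.osc.gz", "20081023-20081024.osc.gz")`. `ET.parse` loads a whole file into memory, so a streaming parser may be a better fit for large daily diffs.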





  My sense from reading other people's experiences has been that it's
  a  common pattern to rely solely on the weekly planet dumps,
  incurring  the substantial overhead of parsing and importing the
  full 5GB dump  once every week, and then re-rendering the complete
  set of tiles.
 
 
  For a long time weekly planet dumps were the only bulk data
  available.  Osmosis changesets have been on the scene for some time
  now though and are gradually being utilised by more and more
  clients.  As the planet grows, this will become more critical.  Who
  knows, if the kinks gradually get ironed out of the osm2pgsql
  program we may even begin to see the main mapnik tile generator move
  to using changesets.

 I would love to rely on these exclusively, it's much more efficient.
 But, I was seeing a fair bit of information fall through the cracks so
 that's why I'm re-synching to planet every four weeks.


Again, please provide some specific examples.  If data is being missed I'd
like to know about it.  Osmosis provides some tools that may be useful
here.  You can download a planet, apply changesets for a week, then compare
against the next planet and see what the differences are.  Obviously both
planets would need appropriate changesets applied to make them consistent
before performing a comparison to eliminate noise.

I probably should do some of these comparisons myself, but again just
haven't found time yet and nobody else has complained about missing data.
The minute changesets run 5 minutes behind the API so could potentially miss
data if a lock is held for several minutes.  The daily and hourly changesets
run at least 20 minutes behind the API (I forget the exact figure off the top
of my head) and should be extremely unlikely to miss data.

Brett


Re: [OSM-talk] osm2pgsql planet: frustrations, cutoffs, and idempotence

2008-10-27 Thread Michal Migurski
 On Mon, Oct 27, 2008 at 9:39 PM, Michal Migurski [EMAIL PROTECTED]  
 wrote:
 I'm liking Jochen Topf's suggestion here:

   If the planet dump plus the diff from the same day is what  
 everybody
 wants anyway, why not do this on the server side and hold the planet
 back after the first diff is available, run this over the planet and
 then publish that as the planet?

 1. Because there are plenty of uses for the planet dump that don't
 need consistent snapshots.

Those uses would not be impacted by consistent snapshots.


 2. Because such consistent snapshots have been available elsewhere for
 quite a while now and people who need them can get them. There's no
 particular reason why it has to be on the same site as the normal
 planet dumps.

Yet there is no link to these places from planet.openstreetmap.org
that indicates that the files available there differ in some important
or useful way. The telascience.org source you suggested is described
as extracts of NL, Scandinavia and Taiwan at
http://wiki.openstreetmap.org/index.php/Planet.osm, rather than a
complete dump of Planet with different datetime boundaries.

I'm happy to keep bellying up to the trial & error bar here, but as I
mentioned in a previous mail, the volume of data involved means that
individual attempts at the data (successful or not) have multiple-day
costs associated with them.


 Umm, yeah. I was of course assuming you were running the latest
 version, otherwise anything is possible. The creates-as-modifies fix
 was done two months ago.


I'll recompile and replace the two-month-old version of osm2pgsql I've  
been using.

-mike.


michal migurski- [EMAIL PROTECTED]
  415.558.1610






Re: [OSM-talk] osm2pgsql planet: frustrations, cutoffs, and idempotence

2008-10-26 Thread Tom Hughes
Michal Migurski wrote:

 The final event in each weekly planet dump does not fall on an even  
 day boundary. In the case of the most recent Oct. 22nd planet.osm, it  
 was necessary to experiment with hourly diffs from that day to find  
 that the boundary was approx. 2:00pm. Hourlies up to and including  
 2008102213-2008102214.osc.gz failed, hourlies after that succeeded. I  
 could go more granular here, checking the minute diffs as well for a  
 more precise breakpoint, but it seems odd that the planet dump does  
 not break cleanly on a midnight boundary so that it's possible to pick  
 up the differences moving forward.

Planet dumps are not snapshots - they do not represent a consistent view 
at any particular point in time because they take a number of hours to 
generate, during which time new changes are constantly being made to the 
contents of the database.

I believe that it is supposed to be safe to apply diffs which overlap 
with the planet dump in order to bring it to a consistent state however.

 The cutoff times for files on planet.openstreetmap.org could behave  
 more consistently. A weekly dump should end at 11:59pm so that dailies  
 can immediately pick up user activity. Hourly and daily dumps should  
 be synchronized. This seems more difficult.

As explained above, there is no cutoff time as such, and it isn't 
possible to implement one as things stand. It may be possible once we 
have working transactions, though it's not clear that a transaction that 
lasts many hours would be sensible or workable.

BTW I'm not sure why you CCed the OSMF board on this... I don't think it 
needs their input at all.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/



Re: [OSM-talk] osm2pgsql planet: frustrations, cutoffs, and idempotence

2008-10-26 Thread Frederik Ramm
Hi,

Michal Migurski wrote:
 I've noticed some misalignments between the data in the dumps and the  
 osm2pgsql importer that leads to unavoidable holes in the data.

As TomH has already said, this is not a bug, it stems from the fact that 
the full planet export reads the current tables and as such is subject 
to changes that occur during the export process. (There may even be 
inconsistencies when something like this happens: Exporter dumps nodes, 
exporter starts dumping ways, user adds new node into way, new way 
version is dumped referring to new node that is not in the dump.)
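
The kind of inconsistency described in that example can be detected on the client side with a simple reference check. A minimal sketch, assuming the node ids and each way's node references have already been read out of the dump; the data structures and sample ids are hypothetical:

```python
def dangling_refs(node_ids, ways):
    """Map each way id to the referenced node ids that are missing."""
    known = set(node_ids)
    missing = {}
    for way_id, refs in ways.items():
        absent = [ref for ref in refs if ref not in known]
        if absent:
            missing[way_id] = absent
    return missing


# Hypothetical extract: node 4 was added to way 11 after the nodes were dumped.
nodes = [1, 2, 3]
ways = {10: [1, 2, 3], 11: [2, 3, 4]}
broken = dangling_refs(nodes, ways)
print(broken)
```

Any way reported here refers to a node that the dump does not contain, which is what applying the overlapping diffs is meant to repair.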

The daily, hourly, and minutely diffs have a clean cutoff date because 
they are taken from the history tables.

Brett Henderson has offered to look into creating the dailies from 
history as well, but I don't know about the status of that.

If you use osmosis, it is safe (and in fact recommended), after
loading the database with a planet file initially, to load that
same day's diff file as the first diff, creating a clean cutoff point.
It is possible that the same does not work with osm2pgsql; I have no
experience there.

Bye
Frederik

-- 
Frederik Ramm  ##  eMail [EMAIL PROTECTED]  ##  N49°00'09 E008°23'33



Re: [OSM-talk] osm2pgsql planet: frustrations, cutoffs, and idempotence

2008-10-26 Thread Michal Migurski
 The final event in each weekly planet dump does not fall on an  
 even  day boundary. In the case of the most recent Oct. 22nd  
 planet.osm, it  was necessary to experiment with hourly diffs from  
 that day to find  that the boundary was approx. 2:00pm. Hourlies up  
 to and including  2008102213-2008102214.osc.gz failed, hourlies  
 after that succeeded. I  could go more granular here, checking the  
 minute diffs as well for a  more precise breakpoint, but it seems  
 odd that the planet dump does  not break cleanly on a midnight  
 boundary so that it's possible to pick  up the differences moving  
 forward.

 Planet dumps are not snapshots - they do not represent a consistent  
 view at any particular point in time because they take a number of  
 hours to generate, during which time new changes are constantly  
 being made to the contents of the database.

Shouldn't it be possible to ignore any changes that happen after the  
cutoff, though? I may not understand the structure of the OSM  
database, but it seems like if it supports rollbacks, then in theory  
it ought to be possible to only include things before a given  
timestamp when creating the dump file. That, or make it clear what the  
actual cutoff time is in the dumpfile.

I understand that in practice, practice is different from theory. =)


 I believe that it is supposed to be safe to apply diffs which  
 overlap with the planet dump in order to bring it to a consistent  
 state however.

This is what I would have hoped, however osm2pgsql does not appear to  
allow it. It feels like the easiest solution would be to give  
osm2pgsql a --force option, and add some explanation of timing and  
cutoffs to http://planet.openstreetmap.org/README.


 BTW I'm not sure why you CCed the OSMF board on this... I don't  
 think it needs their input at all.

Mikel Maron suggested that I cc: team@, when I spoke to him about this  
a few days ago, because it's connected to a *.openstreetmap.org service.

Thanks for your reply!



-mike.


michal migurski- [EMAIL PROTECTED]
  415.558.1610






Re: [OSM-talk] osm2pgsql planet: frustrations, cutoffs, and idempotence

2008-10-26 Thread Michal Migurski
On Oct 26, 2008, at 5:50 PM, Frederik Ramm wrote:

 Brett Henderson has offered to look into creating the dailies from  
 history as well, but I don't know about the status of that.

 If you use osmosis, it is safe (and in fact recommended) that, after  
 loading the database with a planet file initially, you should load  
 that same day's diff file as the first diff, creating a clean cutoff  
 point. It is possible that the same is not working with osm2pgsql, I  
 have no experience there.


What is the difference between osmosis and osm2pgsql, with regards to  
postGIS?

If I've been maintaining a dataset based on osm2pgsql with the  
provided default.style, would a dataset based on osmosis result in a  
substantially different table structure?

-mike.


michal migurski- [EMAIL PROTECTED]
  415.558.1610






Re: [OSM-talk] osm2pgsql planet: frustrations, cutoffs, and idempotence

2008-10-26 Thread Shaun McDonald

On 27 Oct 2008, at 00:50, Michal Migurski wrote:

 The final event in each weekly planet dump does not fall on an
 even  day boundary. In the case of the most recent Oct. 22nd
 planet.osm, it  was necessary to experiment with hourly diffs from
 that day to find  that the boundary was approx. 2:00pm. Hourlies up
 to and including  2008102213-2008102214.osc.gz failed, hourlies
 after that succeeded. I  could go more granular here, checking the
 minute diffs as well for a  more precise breakpoint, but it seems
 odd that the planet dump does  not break cleanly on a midnight
 boundary so that it's possible to pick  up the differences moving
 forward.

 Planet dumps are not snapshots - they do not represent a consistent
 view at any particular point in time because they take a number of
 hours to generate, during which time new changes are constantly
 being made to the contents of the database.

 Shouldn't it be possible to ignore any changes that happen after the
 cutoff, though?

At the moment we don't look at the time stamps when dumping the planet  
file.

 I may not understand the structure of the OSM
 database, but it seems like if it supports rollbacks, then in theory
 it ought to be possible to only include things before a given
 timestamp when creating the dump file. That, or make it clear what the
 actual cutoff time is in the dumpfile.

We currently don't support rollbacks. It would require a rewrite of  
the dump script, and more time and processing to be able to produce a  
consistent planet dump.



 I understand that in practice, practice is different from theory. =)

Have you got the rails port running?




 I believe that it is supposed to be safe to apply diffs which
 overlap with the planet dump in order to bring it to a consistent
 state however.

 This is what I would have hoped, however osm2pgsql does not appear to
 allow it. It feels like the easiest solution would be to give
 osm2pgsql a --force option, and add some explanation of timing and
 cutoffs to http://planet.openstreetmap.org/README.


The initial import that you do with osm2pgsql must use a special
mode to allow diff imports. Could it be that you need to update to the
latest version of osm2pgsql? You should be able to happily apply the
diffs to an inconsistent planet dump to get a consistent planet dump.

This will become easier when the version numbers are exposed in the  
0.6 API. The diff mechanism would then be able to look at the version  
numbers of the nodes/ways/relations and be able to deal with them  
appropriately.
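
A minimal sketch of what version-aware diff handling could look like, assuming each change carries an object id and a version number. This is not how osm2pgsql or osmosis is implemented; the in-memory store, the tuple layout and the function name are assumptions for illustration:

```python
def apply_change(store, action, obj_id, version, data=None):
    """Apply one change to store ({id: (version, data)}) only if it is newer."""
    current = store.get(obj_id)
    if current is not None and current[0] >= version:
        return False  # this version or a later one is already present; skip
    # A delete is kept as a tombstone (data None) so stale changes stay ignored.
    store[obj_id] = (version, None if action == "delete" else data)
    return True


# Hypothetical state loaded from a planet that already contains version 2
# of object 7, followed by a diff that overlaps with it.
store = {7: (2, "v2")}
changes = [
    ("create", 7, 1, "v1"),
    ("modify", 7, 2, "v2"),
    ("create", 8, 1, "x"),
    ("delete", 8, 2, None),
]
first_pass = [apply_change(store, *change) for change in changes]
second_pass = [apply_change(store, *change) for change in changes]
print(store, first_pass, second_pass)
```

Applying the same diff twice leaves the store unchanged, which is the idempotence that makes overlapping diffs safe to replay.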

Shaun

