Hi All,

I thought I should give an update on recent Osmosis replication changes.

The existing "minute" diffs are irreparably broken.  The timestamp approach
for extracting changes only works if a long delay is used.  1 hour is
probably the minimum safe delay, which is unacceptable for what are supposed
to be minute diffs.
The existing "minute-slow" diffs were created as a means of auditing the
minute diffs.  The number of differences between using a 5 minute delay and
30 minute delay is quite scary.
Both "minute" and "minute-slow" will get disabled soon.
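
A toy Python sketch of why the timestamp approach loses data.  This is not
Osmosis code; the Row layout, the function name and the times are invented
for illustration.  It assumes a row's timestamp is set when the row is
written, while the row only becomes visible to the extract once its
transaction commits:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    entity_id: int
    timestamp: int    # seconds; set when the row is written
    commit_time: int  # seconds; when the transaction commits (row becomes visible)


def extract_by_timestamp(rows, start, end, run_at):
    """Return ids of rows with start <= timestamp < end that are visible at run_at."""
    return [
        r.entity_id
        for r in rows
        if start <= r.timestamp < end and r.commit_time <= run_at
    ]


rows = [
    Row(entity_id=1, timestamp=10, commit_time=11),
    Row(entity_id=2, timestamp=50, commit_time=400),  # long-running upload
]

# Extract the interval [0, 60) with a 5 minute delay and a 30 minute delay.
short_delay = extract_by_timestamp(rows, 0, 60, run_at=60 + 5 * 60)
long_delay = extract_by_timestamp(rows, 0, 60, run_at=60 + 30 * 60)

# The next interval never sees entity 2 because its timestamp is too old.
next_interval = extract_by_timestamp(rows, 60, 120, run_at=10_000)
```

With the short delay the second row is skipped, and because its timestamp
falls inside an interval that has already been extracted, no later run picks
it up either.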

The "minute-replicate" diffs are a new approach to replication that uses
PostgreSQL transaction ids to identify changed data in the database.  Bugs
aside, they should never miss data.  I have recently fixed an off-by-one
error which was causing the last transaction id in each interval to be
missed in some but not all cases.  I believe this mechanism is now reliable
but only time will tell for sure.  Please post to the dev list if any missed
data is detected.
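
The boundary problem can be shown with a small Python sketch.  It is a
simplified model and not the actual Osmosis query: it assumes each interval
is identified by a pair of transaction ids, ignores transactions still in
flight and transaction id wraparound, and the names select_changes and
replicate are made up:

```python
def select_changes(rows, after_txid, up_to_txid, include_upper=True):
    """Select rows whose txid lies in (after_txid, up_to_txid].

    With include_upper=False the upper bound becomes exclusive, which
    reproduces the kind of off-by-one error described above.
    """
    selected = []
    for txid, entity in rows:
        below_upper = txid <= up_to_txid if include_upper else txid < up_to_txid
        if txid > after_txid and below_upper:
            selected.append((txid, entity))
    return selected


def replicate(rows, intervals, include_upper=True):
    """Run every interval in order and collect everything that was selected."""
    seen = []
    for after_txid, up_to_txid in intervals:
        seen.extend(select_changes(rows, after_txid, up_to_txid, include_upper))
    return seen


# (txid, entity) pairs; the values are invented for illustration.
rows = [(100, "node 1"), (101, "way 7"), (102, "node 2"), (103, "node 3")]

# Each interval starts where the previous one ended.
intervals = [(99, 101), (101, 103)]

complete = replicate(rows, intervals, include_upper=True)
buggy = replicate(rows, intervals, include_upper=False)
missed = [row for row in rows if row not in buggy]
```

In this model the buggy variant drops exactly the last transaction id of
each interval (101 and 103), since neither neighbouring interval claims it.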

The very new "hour-replicate" diffs come from an experimental (and totally
untested) process that takes "minute-replicate" diffs and rolls them up into
hourly changes.  This is preferable to the existing "hourly" diffs due to the
elimination of a delay, and the elimination of the need to query the main
database.  Eventually I will disable the existing "hourly" diffs in favour
of the new "hour-replicate" diffs but this is a bit further off.  Note that
unlike existing hourly diffs, the new hourly diffs are not exactly hour
aligned.  One file will get produced per hour, but the changes they contain
won't necessarily be hour aligned.  This is largely invisible to clients
though; the key thing is still to apply them in order, starting from a
point prior to your initial data import timestamp.
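
As a rough sketch of picking a starting point, assuming each diff comes with
a sequence number and the timestamp its changes run up to (the data layout,
the function name diffs_to_apply and all the times below are hypothetical,
not the real file format):

```python
from datetime import datetime


def diffs_to_apply(states, import_time):
    """Return the sequence numbers to apply, in order.

    states is a list of (sequence, end_time) pairs.  Diffs that end at or
    before import_time are skipped; the first diff applied therefore begins
    at a point prior to the import timestamp.
    """
    ordered = sorted(states)  # sort by sequence number
    start_index = 0
    for index, (_, end_time) in enumerate(ordered):
        if end_time <= import_time:
            start_index = index + 1
    return [sequence for sequence, _ in ordered[start_index:]]


# Note the end times are roughly hourly but not hour aligned.
states = [
    (3, datetime(2009, 7, 1, 12, 4)),
    (1, datetime(2009, 7, 1, 10, 2)),
    (2, datetime(2009, 7, 1, 11, 3)),
    (4, datetime(2009, 7, 1, 13, 1)),
]

to_apply = diffs_to_apply(states, import_time=datetime(2009, 7, 1, 11, 30))
from_scratch = diffs_to_apply(states, import_time=datetime(2009, 7, 1, 9, 0))
```

Here the import was taken at 11:30, so diff 3 (covering roughly 11:03 to
12:04) is the first one applied, followed by diff 4.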

The "history" diffs are in the process of being generated and are well
through 2008 as we speak.  These are effectively daily diffs but aren't
getting deleted on a rolling window basis.  This amounts to creating a
full history dump of the database.  This has been in the wings for a while,
but only possible now that there is some more disk space available.  These
are still timestamp based extracts due to transaction id queries being
useless for historical queries.  As a result of the use of timestamps, these
will be run with a large delay to avoid missing data.  I'll probably set
this delay to 1 day to be safe, but perhaps a couple of hours would be
enough.
The existing "daily" diffs will probably get disabled at some point in the
near future as a result of these new diffs which serve much the same
purpose.

One big point to note is that all new extracts are *full* history diffs
which means that they may contain multiple changes for a single entity.
There is a new (as yet undocumented) task in Osmosis called
--simplify-change which allows a full history diff to be collapsed into an
older style delta diff.  This new task may be needed to make some existing
Osmosis tasks work correctly.  This is all fairly new so there will be a few
bugs in Osmosis until this is worked through.
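
To show the idea of collapsing a full history diff, here is a minimal Python
sketch.  It is not the --simplify-change implementation: the dictionary
layout and the function name simplify_changes are invented, and it simply
keeps the highest version of each entity as-is, which is an assumption about
how the actions should be combined:

```python
def simplify_changes(changes):
    """Keep only the highest-version change for each (type, id) pair."""
    latest = {}
    for change in changes:
        key = (change["type"], change["id"])
        if key not in latest or change["version"] > latest[key]["version"]:
            latest[key] = change
    # Return a stable order: by entity type, then by id.
    return sorted(latest.values(), key=lambda c: (c["type"], c["id"]))


# A full history diff may mention the same entity several times.
full_history = [
    {"action": "create", "type": "node", "id": 1, "version": 1},
    {"action": "modify", "type": "node", "id": 1, "version": 2},
    {"action": "modify", "type": "way", "id": 5, "version": 3},
    {"action": "delete", "type": "node", "id": 1, "version": 3},
]

delta = simplify_changes(full_history)
```

The result has at most one change per entity, which is what the older style
delta diffs looked like.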

Once this all settles down, all change files on the planet server will be
full history diffs.  Full history for the life of the db will be available.
Minute diffs will be available that allow an offline database to be kept
within 1-2 minutes of the main API.

That summarises the changes currently happening.  Some of it is a bit
experimental, but that doesn't mean it shouldn't work.  Please yell if you
see any problems; it's the only way they'll get investigated and fixed.

On a related note, the good thing about the new transaction id based
mechanism is that it allows zero delay to be used which means at least
theoretically that we could get much lower than a 1-2 minute delay.  But
achieving less than 1-2 minutes can't be done easily with the existing
file-based distribution approach.  Moving away from a file-based
distribution approach has serious implications for reliability in the face
of server and network outages, cacheability, bandwidth consumption, and
server resource usage.  As a result, the existing approach is likely to
represent the state of the art in the near to medium future.  We need to
stabilise the existing features before attempting new ones :-)

Osmosis itself is in a fairly complete state, but the implications of full
history changes need to be examined.  Once this is sorted out I'll create an
overdue release; in the meantime the nightly builds will have to suffice.

I think that was all; feel free to ask any questions.  Please be patient for
responses from me because my connectivity is limited at the moment: I'm not
online every day and it's difficult for me to investigate problems.

Brett
_______________________________________________
osmosis-dev mailing list
osmosis-dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/osmosis-dev
