Hi All,

I thought I should give an update on recent Osmosis replication changes.

The existing "minute" diffs are irreparably broken. Extracting changes by timestamp only works if a long delay is used; 1 hour is probably the minimum, which is unacceptable for supposedly minute diffs.

The existing "minute-slow" diffs were created as a means of auditing the minute diffs. The number of differences between using a 5 minute delay and a 30 minute delay is quite scary. Both "minute" and "minute-slow" will be disabled soon.

The new "minute-replicate" diffs use a new approach to replication that relies on PostgreSQL transaction ids to identify changed data in the database. Bugs aside, they should never miss data. I have recently fixed an off-by-one error which was causing the last transaction id in each interval to be missed in some, but not all, cases. I believe this mechanism is now reliable, but only time will tell for sure. Please post to the dev list if any missed data is detected.

The very new "hour-replicate" diffs are produced by a new experimental (totally untested) process that takes the "minute-replicate" diffs and rolls them up into hourly changes. This is preferable to the existing "hourly" diffs because it eliminates both the delay and the need to query the main database. Eventually I will disable the existing "hourly" diffs in favour of the new "hour-replicate" diffs, but this is a bit further off. Note that, unlike the existing hourly diffs, the new hourly diffs are not exactly hour aligned: one file will get produced per hour, but the changes it contains won't necessarily be hour aligned. This is largely invisible to clients though; the key thing is still to apply them in order, starting from a point prior to your initial data import timestamp.

The "history" diffs are in the process of being generated and are well through 2008 as we speak. These are effectively daily diffs, but they aren't getting deleted on a rolling window basis, so together they form a full history dump of the database.
This has been in the wings for a while, but it is only possible now that there is some more disk space available. These are still timestamp-based extracts because transaction id queries are useless for historical queries. Because they use timestamps, they will be run with a large delay to avoid missing data. I'll probably set this delay to 1 day to be safe, but perhaps a couple of hours would be enough. The existing "daily" diffs will probably be disabled at some point in the near future, since these new diffs serve much the same purpose.

One big point to note is that all new extracts are *full* history diffs, which means they may contain multiple changes for a single entity. There is a new (as yet undocumented) task in Osmosis called --simplify-change which allows a full history diff to be collapsed into an older-style delta diff. This new task may be needed to make some existing Osmosis tasks work correctly. This is all fairly new, so there will be a few bugs in Osmosis until it is worked through.

Once this all settles down, all changesets on the planet server will be full history diffs. Full history for the life of the db will be available, and minute diffs will allow an offline database to be kept within 1-2 minutes of the main API.

That summarises the changes currently happening. Some of it is a bit experimental, but that doesn't mean it shouldn't work. Please yell if you see any problems; it's the only way they'll get investigated and fixed.

On a related note, the good thing about the new transaction id based mechanism is that it allows zero delay to be used, which means that, at least in theory, we could get much lower than a 1-2 minute delay. However, getting below 1-2 minutes can't be done easily with the existing file-based distribution approach.
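
To make the full history point above more concrete, here is a minimal Python sketch of what rolling minute diffs up into a larger diff, and then collapsing that full history diff into a delta diff, means. It is not Osmosis code: the Change record and the roll_up and simplify functions are hypothetical names, real diffs are osmChange XML files rather than in-memory lists, and the sketch simply keeps the highest-version change for each entity. How Osmosis's --simplify-change itself treats an entity that is created and then modified or deleted within the same window is not covered here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


# Hypothetical in-memory stand-in for one entry of an osmChange file.
@dataclass(frozen=True)
class Change:
    action: str       # "create", "modify" or "delete"
    entity_type: str  # "node", "way" or "relation"
    entity_id: int
    version: int


def roll_up(diffs: Iterable[List[Change]]) -> List[Change]:
    """Concatenate consecutive diffs (which must be supplied in sequence order)
    into one larger full history diff, as an hourly rollup of minute diffs would."""
    merged: List[Change] = []
    for diff in diffs:
        merged.extend(diff)
    return merged


def simplify(full_history: Iterable[Change]) -> List[Change]:
    """Collapse a full history diff into a delta diff by keeping only the
    highest-version change for each entity (assumption: no special handling
    of an entity created and then deleted within the same window)."""
    latest: Dict[Tuple[str, int], Change] = {}
    for change in full_history:
        key = (change.entity_type, change.entity_id)
        current = latest.get(key)
        if current is None or change.version > current.version:
            latest[key] = change
    # Dicts keep first-insertion order, so entities appear in the order first seen.
    return list(latest.values())


# Usage: two "minute" diffs in which node 1 is changed twice.
minute_1 = [Change("create", "node", 1, 1), Change("modify", "way", 7, 3)]
minute_2 = [Change("modify", "node", 1, 2)]

hourly = roll_up([minute_1, minute_2])  # full history: node 1 appears twice
delta = simplify(hourly)                # node 1 collapsed to its version 2 change
print(delta)
```

The rollup alone leaves both node 1 changes in the hourly result, which is the "multiple changes for a single entity" case; only the simplification step reduces it to one change per entity. This is also why diffs have to be applied in order: rolling them up out of sequence would mix versions from different points in time.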
Moving away from a file-based distribution approach has serious implications for reliability in the face of server and network outages, for cacheability, for bandwidth consumption, and for server resource usage. As a result, the existing approach is likely to remain the state of the art for the near to medium term. We need to stabilise the existing features before attempting new ones :-)

Osmosis itself is in a fairly complete state, but the implications of full history changes need to be examined. Once this is sorted out I'll create an overdue release; in the meantime the nightly builds will have to suffice.

I think that was all, so feel free to ask any questions. But please be patient waiting for responses from me: my connectivity is limited at the moment, I'm not online every day, and it's difficult for me to investigate problems.

Brett

_______________________________________________
osmosis-dev mailing list
osmosis-dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/osmosis-dev