Hi Rahul,

On Tue, Apr 14, 2020 at 07:22:46PM +0000, Rahul Reddy wrote:
> I am working on the issue
> #1683<https://github.com/osm-search/Nominatim/issues/1683> (Update script for
> running updates on a database with multiple countries).
> This<https://gist.github.com/krahulreddy/8d08a8b2a77581810effa88d1641a571> is
> a modified script I used. This worked for me. But there are a few issues.
>
> 1) The sequenceNumber in import_status might not be the same for all the
> countries. Unless this is fixed, there might be data loss during updates. This
> could be fixed by changing the structure of the import_status table to allow
> country-specific entries. (Is it a good idea?)
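To illustrate what country-specific update state could look like, here is a minimal Python sketch of a separate status table with one row per country, holding both the last applied sequence number and the replication URL it refers to. The table name `import_status_country`, its columns and the helper functions are hypothetical and not part of Nominatim. sqlite3 stands in for Nominatim's PostgreSQL database only so the sketch runs on its own; with PostgreSQL you would use `%s` placeholders and `INSERT ... ON CONFLICT` instead of `INSERT OR REPLACE`.

```python
import sqlite3

# Hypothetical per-country replacement for the single-row import_status table.
# sqlite3 stands in for Nominatim's PostgreSQL database so this runs standalone.
SCHEMA = """
CREATE TABLE IF NOT EXISTS import_status_country (
    country         TEXT PRIMARY KEY,  -- extract name, e.g. 'monaco'
    replication_url TEXT NOT NULL,     -- diff source for this extract
    sequence_id     INTEGER NOT NULL   -- last diff sequence applied
)
"""


def init_country(conn, country, url, sequence_id):
    """Register a country, or reset it to a new starting sequence number."""
    conn.execute(
        "INSERT OR REPLACE INTO import_status_country"
        " (country, replication_url, sequence_id) VALUES (?, ?, ?)",
        (country, url, sequence_id),
    )


def advance(conn, country, new_sequence_id):
    """Record that diffs up to new_sequence_id were applied for one country.

    Refuses to move a country's state backwards; returns True on success.
    """
    cur = conn.execute(
        "UPDATE import_status_country SET sequence_id = ?"
        " WHERE country = ? AND sequence_id <= ?",
        (new_sequence_id, country, new_sequence_id),
    )
    return cur.rowcount == 1


def all_states(conn):
    """Return (country, replication_url, sequence_id) rows, sorted by country."""
    return conn.execute(
        "SELECT country, replication_url, sequence_id"
        " FROM import_status_country ORDER BY country"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Illustrative values only; real numbers come from the replication server.
    init_country(conn, "monaco", "https://example.org/monaco-updates", 10)
    init_country(conn, "andorra", "https://example.org/andorra-updates", 7)
    advance(conn, "monaco", 11)
    for row in all_states(conn):
        print(row)
```

Keeping the URL next to the number matters because a sequence number is only meaningful relative to the replication server it came from.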

The previous script would just keep the sequence numbers outside the database
in a file per country. You can do that with pyosmium-get-changes, too. Have a
look at the '-f' parameter.

But I don't see an issue if you keep the numbers in a table in the database.
Just create a separate table with all the info you need, i.e. next to the
sequence number, you'd also need the replication URL.

> 2) In Setup.php, init-updates option gets the latest date from the lib
> function getDatabaseDate(), which returns date corresponding to the object
> that has the highest osm_id. This would be wrong if the latest changes
> include deletions. I think comparing lastimportdate in import_status with the
> previous approach could be a good thing. This will help avoid repeated
> updates on deleted nodes.

It's correct that the function looks for the highest node ID. The little
trick here is that it then looks up the date for version 1 of that object.
The OSM database assigns node IDs sequentially when new objects are created,
so it is fair to assume that the node with the highest ID in any OSM file was
one of the last ones created. Version 1 is always the 'creation' version,
which gives us a good estimate of the date of the file. There might be some
additional deletions or modifications after that date that still made it into
the file, but that is okay because Nominatim is "replay-safe". That means you
can reapply changes to the database as long as they are still applied in
order, so starting the updates slightly too early does no harm.

That all said, when using multiple files, I would not recommend using
Nominatim's getDatabaseDate() function because the files might be from
different dates. You should instead determine the initial sequence ID from
the input files directly. pyosmium-get-changes can do that for you.
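Determining the starting point per input file could be scripted roughly as follows. This is a minimal Python sketch that only builds argument lists, so it can be checked without network access. It assumes the pyosmium-get-changes options `--server` (replication base URL), `-O` (start from the newest object in an OSM file), `-f` (sequence file) and `-o` (output file); please verify them against the pyosmium documentation for your version. The extract names, file names and Geofabrik-style URLs are hypothetical examples.

```python
from pathlib import Path

# Hypothetical extracts: (name, initial import file, replication base URL).
# The URLs follow Geofabrik's naming pattern but are examples only.
EXTRACTS = [
    ("monaco", "monaco-latest.osm.pbf",
     "https://download.geofabrik.de/europe/monaco-updates"),
    ("andorra", "andorra-latest.osm.pbf",
     "https://download.geofabrik.de/europe/andorra-updates"),
]


def state_file(name, state_dir):
    """One sequence file per extract, e.g. state/monaco.state."""
    return str(Path(state_dir) / f"{name}.state")


def init_state_cmd(name, pbf, url, state_dir):
    """Derive the starting sequence from the newest object in the input file."""
    return ["pyosmium-get-changes",
            "--server", url,
            "-O", pbf,                          # assumed: start from file data
            "-f", state_file(name, state_dir),  # per-country sequence file
            "-v"]


def update_cmd(name, url, state_dir, outfile):
    """Download diffs after the stored sequence; -f is read and rewritten."""
    return ["pyosmium-get-changes",
            "--server", url,
            "-f", state_file(name, state_dir),
            "-o", outfile]


if __name__ == "__main__":
    for name, pbf, url in EXTRACTS:
        print(" ".join(init_state_cmd(name, pbf, url, "state")))
        print(" ".join(update_cmd(name, url, "state", f"{name}-change.osc.gz")))
```

Each list can be handed to `subprocess.run(cmd, check=True)`; the state directory has to exist beforehand.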
Have a look at:
https://docs.osmcode.org/pyosmium/latest/updating_osm_data.html#preparing-the-state-file

> I also wrote a shell script to set up a db with multiple countries, which can be
> found
> here<https://gist.github.com/krahulreddy/948679bae414b5bfbdbe5fe489126eea>.

So you combine the files first. That's nice. We really should get all this
into the documentation eventually. My suggestion would be to add an 'Advanced
Installations' section in the 'Administration guide' and have a chapter about
importing multiple countries there. The scripts can go in the `utils/`
directory.

> An alternate approach for setting up updates for multiple countries would be
> to modify the Replication URL constant. This could be done by editing the
> existing utils/update.php, or by maintaining a separate copy of
> utils/update.php with necessary modifications.

Interesting thought. You could actually borrow a hidden feature from testing
to make that work. There is the possibility to inject your own settings before
all the standard settings are configured. Just set the NOMINATIM_SETTINGS
environment variable to point to a custom PHP settings file that contains only
the replication URL. [1]

You'd still have to modify utils/update.php to make it use a configurable
table for the update status. But that sounds okay. Feel free to give this a
try.

[1] https://github.com/osm-search/Nominatim/blob/master/settings/defaults.php#L4

Kind regards

Sarah

_______________________________________________
Geocoding mailing list
https://lists.openstreetmap.org/listinfo/geocoding

