On 4 July 2015 at 10:47, Simon Nuttall <[email protected]> wrote:
> On 3 July 2015 at 19:20, Sarah Hoffmann <[email protected]> wrote:
>> On Fri, Jul 03, 2015 at 07:23:54AM +0100, Simon Nuttall wrote:
>>> Now it is showing these again:
>>>
>>> Done 274 in 136 @ 2.014706 per second - Rank 26 ETA (seconds): 2467.854004
>>>
>>> Presumably this means it is now playing catchup relative to the
>>> original download data?
>>
>> I would suppose so.
>>
>>> How can I tell what date it has caught up to? (And thus get an idea of
>>> when it is likely to finish?)
>>
>> Have a look at the import_osmosis_log table. It gives you a good idea
>> how long the batches take.
>
> Ah yes - pretty slow :-(
>
> nominatim=# select * from import_osmosis_log order by endtime desc limit 12;
>       batchend       | batchsize |      starttime      |       endtime       |   event
> ---------------------+-----------+---------------------+---------------------+-----------
>  2015-06-09 12:54:02 |  40037028 | 2015-07-04 09:30:16 | 2015-07-04 09:30:29 | osmosis
>  2015-06-09 11:55:01 |  36866133 | 2015-07-04 08:57:52 | 2015-07-04 09:30:16 | index
>  2015-06-09 11:55:01 |  36866133 | 2015-07-04 08:34:17 | 2015-07-04 08:57:52 | osm2pgsql
>  2015-06-09 11:55:01 |  36866133 | 2015-07-04 08:34:06 | 2015-07-04 08:34:17 | osmosis
>  2015-06-09 10:55:02 |  42220289 | 2015-07-04 08:06:14 | 2015-07-04 08:34:06 | index
>  2015-06-09 10:55:02 |  42220289 | 2015-07-04 07:41:23 | 2015-07-04 08:06:14 | osm2pgsql
>  2015-06-09 10:55:02 |  42220289 | 2015-07-04 07:41:11 | 2015-07-04 07:41:23 | osmosis
>  2015-06-09 09:55:02 |  34076756 | 2015-07-04 07:14:30 | 2015-07-04 07:41:11 | index
>  2015-06-09 09:55:02 |  34076756 | 2015-07-04 06:53:59 | 2015-07-04 07:14:30 | osm2pgsql
>  2015-06-09 09:55:02 |  34076756 | 2015-07-04 06:53:49 | 2015-07-04 06:53:59 | osmosis
>  2015-06-09 08:56:01 |  26087298 | 2015-07-04 06:20:20 | 2015-07-04 06:53:49 | index
>  2015-06-09 08:56:01 |  26087298 | 2015-07-04 06:07:22 | 2015-07-04 06:20:20 | osm2pgsql
>
>>
>>> Is it catching up by downloading minutely diffs or using larger
>>> intervals, then switching to minutely diffs when it is almost fully up
>>> to date?
>>
>> That depends how you have configured it. If it is set to the URL
>> of the minutelies it will use minutely diffs but accumulate them
>> to batches of the size you have configured. When it has caught up
>> it will just accumulate the latest minutelies, so batches become
>> smaller.
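From the batch times above, one way to see whether it is actually gaining on the
backlog should be to compare how long each batch takes against the stretch of OSM
data it covers - something along these lines, assuming batchend/starttime/endtime
are plain timestamps as the output above suggests:

-- wall-clock time per batch vs. the span of data the batch covers;
-- if elapsed is regularly longer than covered, we are falling further behind
SELECT batchend,
       max(endtime) - min(starttime)                     AS elapsed,
       batchend - lag(batchend) OVER (ORDER BY batchend) AS covered
  FROM import_osmosis_log
 GROUP BY batchend
 ORDER BY batchend DESC
 LIMIT 24;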
> Ah yes, I see the configuration.txt has:

(oops - last email was sent prematurely)

# The URL of the directory containing change files.
baseUrl=http://planet.openstreetmap.org/replication/minute

# Defines the maximum time interval in seconds to download in a single invocation.
# Setting to 0 disables this feature.
maxInterval = 3600

>>
>>> This phase still seems very disk intensive, will that settle down and
>>> become much less demanding when it has eventually got up to date?
>>
>> It will become less but there still is IO going on. Given that your
>> initial import took about 10 times as long as the best time I've seen,
>> it will probably take a long time to catch up. You should consider
>> running with --index-instances 2 while catching up and you should
>> really investigate where the bottleneck in the system is.

I notice that our postgresql.conf has work_mem = 512MB, which seems a bit small?
But this seems healthy: maintenance_work_mem = 10GB

>>
>>> Can the whole installed running Nominatim be copied to another
>>> machine? And set running?
>>>
>>> Presumably this is a database dump and copy - but how practical is that?
>>
>> Yes, dump and restore is possible. You should be aware that indexes
>> are not dumped, so it still takes a day or two to restore the complete
>> database.
>>
>>> Are there alternative ideas such as replication or backup?
>>
>> For backup you can do partial dumps that contain only the tables needed
>> for querying the database. These dumps can be restored faster but
>> they are not updateable, so they are more of an interim solution
>> to install on a spare emergency server while the main DB is reimported.
>> The dump/backup script used for the osm.org servers can be found here:
>>
>> https://github.com/openstreetmap/chef/blob/master/cookbooks/nominatim/templates/default/backup-nominatim.erb
>>
>> If you go down that road, I recommend actually trying the restore
>> at least once, so you get an idea about the time and space requirements.
>>
>> Replication is possible as well. In fact, the two osm.org servers have
>> been running as master and slave with streaming replication for about
>> two weeks now. You should disable writing logs to the database.
>> Otherwise the setup is fairly standard, largely following this
>> guide: https://wiki.postgresql.org/wiki/Streaming_Replication

You've put off trying this - for now at least.
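For the partial-dump option, I assume the mechanics are roughly as below - this is
just my reading of it, and the table list is illustrative rather than complete; the
backup-nominatim.erb script linked above is the authoritative list of tables and
options:

# dump only the query-side tables into a custom-format archive
pg_dump -Fc -t placex -t search_name -t word -t place_addressline \
    -f /backup/nominatim-query.dump nominatim

# restore on the spare machine; index data is not in the dump, so the
# indexes get rebuilt here, which is where the restore time goes
pg_restore -d nominatim -j 4 /backup/nominatim-query.dump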
>>
>>> > string(123) "INSERT INTO import_osmosis_log values
>>> > ('2015-06-08T07:58:02Z',25816916,'2015-07-03 06:07:34','2015-07-03
>>> > 06:44:10','index')"
>>> > 2015-07-03 06:44:10 Completed index step for 2015-06-08T07:58:02Z in
>>> > 36.6 minutes
>>> > 2015-07-03 06:44:10 Completed all for 2015-06-08T07:58:02Z in 58.05
>>> > minutes
>>> > 2015-07-03 06:44:10 Sleeping 0 seconds
>>> > /usr/local/bin/osmosis --read-replication-interval
>>> > workingDirectory=/home/nominatim/Nominatim/settings --simplify-change
>>> > --write-xml-change /home/nominatim/Nominatim/data/osmosischange.osc
>>> >
>>> > Which presumably means it is updating June 8th? (What else can I read
>>> > from this?)
>>
>> See above, check out the import_osmosis_log. The important thing to take
>> away is how long it takes to update which interval. If on average the
>> import takes longer than real time you are in trouble.
>>
>>> > Also, at what point is it safe to expose the Nominatim as a live service?
>>
>> As soon as the import is finished. Search queries might interfere with
>> the updates when your server gets swarmed with lots of parallel queries
>> but I doubt that you have enough traffic for that.

Yeah - shouldn't be too many - at this stage.

>> Just make sure to keep
>> the number of requests that can hit the database in parallel at a moderate
>> level. Use php-fpm with limited pools for that and experiment with the
>> limits until you get the maximum performance.
>>
>> Sarah
>
> _______________________________________________
> Geocoding mailing list
> [email protected]
> https://lists.openstreetmap.org/listinfo/geocoding
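On the php-fpm point above: I assume in practice that comes down to capping the
pool size, something like the sketch below - the pool name, socket path and numbers
are only placeholders to experiment with, per Sarah's advice:

[nominatim]
listen = /var/run/php5-fpm-nominatim.sock
pm = static
pm.max_children = 8        ; hard cap on requests hitting Postgres in parallel
pm.max_requests = 500      ; recycle workers periodically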

