On 4 July 2015 at 10:52, Simon Nuttall <[email protected]> wrote:
> On 4 July 2015 at 10:47, Simon Nuttall <[email protected]> wrote:
>> On 3 July 2015 at 19:20, Sarah Hoffmann <[email protected]> wrote:
>>> On Fri, Jul 03, 2015 at 07:23:54AM +0100, Simon Nuttall wrote:
>>>> Now it is showing these again:
>>>>
>>>> Done 274 in 136 @ 2.014706 per second - Rank 26 ETA (seconds): 2467.854004
>>>>
>>>> Presumably this means it is now playing catch-up relative to the
>>>> original download data?
>>>
>>> I would suppose so.
>>>
>>>> How can I tell what date it has caught up to? (And thus get an idea of
>>>> when it is likely to finish?)
>>>
>>> Have a look at the import_osmosis_log table. It gives you a good idea of
>>> how long the batches take.
>>
>> Ah yes - pretty slow :-(
>>
>> nominatim=# select * from import_osmosis_log order by endtime desc limit 12;
>>       batchend       | batchsize |      starttime      |       endtime       |   event
>> ---------------------+-----------+---------------------+---------------------+-----------
>>  2015-06-09 12:54:02 |  40037028 | 2015-07-04 09:30:16 | 2015-07-04 09:30:29 | osmosis
>>  2015-06-09 11:55:01 |  36866133 | 2015-07-04 08:57:52 | 2015-07-04 09:30:16 | index
>>  2015-06-09 11:55:01 |  36866133 | 2015-07-04 08:34:17 | 2015-07-04 08:57:52 | osm2pgsql
>>  2015-06-09 11:55:01 |  36866133 | 2015-07-04 08:34:06 | 2015-07-04 08:34:17 | osmosis
>>  2015-06-09 10:55:02 |  42220289 | 2015-07-04 08:06:14 | 2015-07-04 08:34:06 | index
>>  2015-06-09 10:55:02 |  42220289 | 2015-07-04 07:41:23 | 2015-07-04 08:06:14 | osm2pgsql
>>  2015-06-09 10:55:02 |  42220289 | 2015-07-04 07:41:11 | 2015-07-04 07:41:23 | osmosis
>>  2015-06-09 09:55:02 |  34076756 | 2015-07-04 07:14:30 | 2015-07-04 07:41:11 | index
>>  2015-06-09 09:55:02 |  34076756 | 2015-07-04 06:53:59 | 2015-07-04 07:14:30 | osm2pgsql
>>  2015-06-09 09:55:02 |  34076756 | 2015-07-04 06:53:49 | 2015-07-04 06:53:59 | osmosis
>>  2015-06-09 08:56:01 |  26087298 | 2015-07-04 06:20:20 | 2015-07-04 06:53:49 | index
>>  2015-06-09 08:56:01 |  26087298 | 2015-07-04 06:07:22 | 2015-07-04 06:20:20 | osm2pgsql
>>
>>>
>>>> Is it catching up by downloading minutely diffs or using larger
>>>> intervals, then switching to minutely diffs when it is almost fully up
>>>> to date?
>>>
>>> That depends on how you have configured it. If it is set to the URL
>>> of the minutelies it will use minutely diffs but accumulate them
>>> into batches of the size you have configured. When it has caught up
>>> it will just accumulate the latest minutelies, so the batches become
>>> smaller.
>>
>> Ah yes, I see the configuration.txt has:
>
> (oops - last email was sent prematurely)
>
> # The URL of the directory containing change files.
> baseUrl=http://planet.openstreetmap.org/replication/minute
>
> # Defines the maximum time interval in seconds to download in a single invocation.
> # Setting to 0 disables this feature.
> maxInterval = 3600
>
>>>
>>>> This phase still seems very disk intensive; will that settle down and
>>>> become much less demanding when it has eventually got up to date?
>>>
>>> It will become less, but there is still IO going on. Given that your
>>> initial import took about 10 times as long as the best time I've seen,
>>> it will probably take a long time to catch up. You should consider
>>> running with --index-instances 2 while catching up, and you should
>>> really investigate where the bottleneck in the system is.
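To see whether each batch is actually gaining on real time I've been
running a rough query along these lines against the table above. The
grouping assumes every batchend covers one osmosis/osm2pgsql/index
triple, as in the output, and the arithmetic is my own rather than
anything from the Nominatim docs:

    -- rough catch-up check (my own query, not from Nominatim):
    -- data_covered = span of OSM data in a batch, wall_clock = time we took
    SELECT batchend,
           batchend - lag(batchend) OVER (ORDER BY batchend) AS data_covered,
           max(endtime) - min(starttime) AS wall_clock
      FROM import_osmosis_log
     GROUP BY batchend
     ORDER BY batchend DESC
     LIMIT 10;

If wall_clock stays shorter than data_covered we are catching up; by
that measure (e.g. ~56 minutes of work for the ~60 minutes of data
ending 2015-06-09 11:55:01) we are only just ahead.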
> I notice that our postgresql.conf has
>
> work_mem = 512MB
>
> which seems a bit small?
>
> But this seems healthy:
>
> maintenance_work_mem = 10GB
>
>>>
>>>> Can the whole installed running Nominatim be copied to another
>>>> machine? And set running?
>>>>
>>>> Presumably this is a database dump and copy - but how practical is that?
>>>
>>> Yes, dump and restore is possible. You should be aware that indexes
>>> are not dumped, so it still takes a day or two to restore the complete
>>> database.
>>>
>>>> Are there alternative ideas such as replication or backup?
>>>
>>> For backup you can do partial dumps that contain only the tables needed
>>> for querying the database. These dumps can be restored faster, but
>>> they are not updateable, so they are more of an interim solution
>>> to install on a spare emergency server while the main DB is reimported.
>>> The dump/backup script used for the osm.org servers can be found here:
>>>
>>> https://github.com/openstreetmap/chef/blob/master/cookbooks/nominatim/templates/default/backup-nominatim.erb
>>>
>>> If you go down that road, I recommend actually trying the restore
>>> at least once, so you get an idea of the time and space requirements.
>>>
>>> Replication is possible as well. In fact, the two osm.org servers have
>>> been running as master and slave with streaming replication for about
>>> two weeks now. You should disable writing logs to the database.
>>> Otherwise the setup is fairly standard, following largely this
>>> guide: https://wiki.postgresql.org/wiki/Streaming_Replication
>
> We've put off trying this - for now at least.
>
>>>
>>>> > string(123) "INSERT INTO import_osmosis_log values
>>>> > ('2015-06-08T07:58:02Z',25816916,'2015-07-03 06:07:34','2015-07-03 06:44:10','index')"
>>>> > 2015-07-03 06:44:10 Completed index step for 2015-06-08T07:58:02Z in 36.6 minutes
>>>> > 2015-07-03 06:44:10 Completed all for 2015-06-08T07:58:02Z in 58.05 minutes
>>>> > 2015-07-03 06:44:10 Sleeping 0 seconds
>>>> > /usr/local/bin/osmosis --read-replication-interval
>>>> > workingDirectory=/home/nominatim/Nominatim/settings --simplify-change
>>>> > --write-xml-change /home/nominatim/Nominatim/data/osmosischange.osc
>>>> >
>>>> > Which presumably means it is updating June 8th? (What else can I read
>>>> > from this?)
>>>
>>> See above - check out the import_osmosis_log. The important thing to take
>>> away is how long it takes to update which interval. If on average the
>>> import takes longer than real time you are in trouble.
>>>
>>>> > Also, at what point is it safe to expose Nominatim as a live service?
>>>
>>> As soon as the import is finished. Search queries might interfere with
>>> the updates when your server gets swarmed with lots of parallel queries,
>>> but I doubt that you have enough traffic for that.
>
> Yeah - shouldn't be too many - at this stage.
>
>>> Just make sure to keep
>>> the number of requests that can hit the database in parallel at a moderate
>>> level. Use php-fpm with limited pools for that and experiment with the
>>> limits until you get the maximum performance.
>>>
>>> Sarah
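Understood on limiting parallelism. I assume that translates to a pool
definition along these lines - the pool name, socket path, and numbers
here are my guesses to be tuned, not anything prescribed by Nominatim:

    ; /etc/php5/fpm/pool.d/nominatim.conf (hypothetical pool; tune the numbers)
    [nominatim]
    listen = /var/run/php5-fpm-nominatim.sock
    pm = static
    pm.max_children = 8    ; hard cap on PHP workers, and hence on parallel DB queries

so that however many web requests arrive, at most pm.max_children
queries can hit Postgres at once.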
Just a few more questions...

I'll need to restart postgres to have some config changes take effect,
so this means I'll need to interrupt the updates.

During the

./utils/update.php --import-osmosis-all --no-npi --osm2pgsql-cache 24000 --index-instances 2

phase, can I stop it at any time with Ctrl+C? Or only when I am seeing
lines like:

Done 929 in 147 @ 6.319728 per second - Rank 26 ETA (seconds): 801.616821

To resume, do I just use the same command or do I have to do anything
else first?
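In other words, is this the right sequence? (My assumption from the
discussion above - please correct me if any step is unsafe:)

    # my assumed sequence, not from any documentation:
    # 1. Ctrl+C the running update.php
    # 2. restart postgres so the new config is picked up
    sudo service postgresql restart
    # 3. resume with exactly the same command; I'm assuming it carries
    #    on from the last batch recorded in import_osmosis_log
    cd /home/nominatim/Nominatim
    ./utils/update.php --import-osmosis-all --no-npi \
        --osm2pgsql-cache 24000 --index-instances 2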
Thanks again for your patient help with this - we use MySQL in
CycleStreets and Postgres is rather unfamiliar territory for me.

--
Simon Nuttall
Route Master, CycleStreets.net

_______________________________________________
Geocoding mailing list
[email protected]
https://lists.openstreetmap.org/listinfo/geocoding
