Re: registry.db-journal file location
Rainer Müller wrote: >> Indeed, I'm not. I'm seeing the vacuming effects when I do a `port work` or >> similar AFTER a port uninstall. In the example I gave, I have to remember to >> do the port uninstall after starting the fetch,checksum,extract,patch steps >> if I do not want to waste time waiting to get control back. > > Hm, then it is probably not actually the VACUUM operation that causes > the delay...? Why would you think that? To be clear, I'm talking about doing `port uninstall` in one terminal and the other command(s) in another terminal (or after `port uninstall foo &`, in thebackground). The description of what VACUUM does makes it a very likely candidate for an IO intensive operation that takes time AND blocks both read and write access. R
Re: registry.db-journal file location
On 2018-2-6 04:23 , Rainer Müller wrote: > We could maybe keep a timestamp of the last vacuum run, so we only do > this operation once per day. Of course that only helps if we are sure > that it is the cause of the delay... Ideally we would only vacuum when above some threshold of wasted space in the database. - Josh
Re: registry.db-journal file location
On 2018-02-05 16:47, René J.V. Bertin wrote: > On Monday February 05 2018 15:34:52 Rainer Müller wrote: >> in your setup. To try this, you should be able to convert your registry >> database to this mode without making any changes to base itself. > > I presume that I can always keep a backup copy (= this affects only the > registry.db file)? Exactly, it is only a setting in the SQLite file. You should be able to switch between the journal modes of WAL or DELETE with the sqlite command-line client without losing any data. > Can you remember how large the test registry was (or how many ports > installed, if that's more determinant)> On the main machine, I'm at approx. > 260Mb for the registry; on the test rig that's only 95Mb (24Mb compressed). > If the VACUUM does indeed do defragmentation by moving data to fill empty > space left after uninstalling this certainly explains what I'm seeing. That's > not "some disk activity", that's quite a bit of disk activity. For safe > defragging you first create a contiguous copy of a file/record, then compare > the copies, remove the old one, copy back the file to its new location and do > a 2nd verification before removing the temp. copy. It was only based on experience from developers. My main setup currently has a registry.db of about 180 MB, but this is still on Sierra with HFS+. While I have also seen same delays on exit, it never bothered me too much. > I think I've already seen discussions whether it's reasonably safe to disable > the sqlite journal on a filesystem like ZFS which already provides its own > comparable protection, and that the answer was probably no despite the > potentially costly redundancy. The SQLite journal is necessary to implement transactions (a sequence of SQL statements that allow a rollback in case of an error). So even when the backing storage of the database is guaranteed not to fail, it will always be necessary. >> However, you are not supposed to see VACUUM after read-only actions such >> as 'port work' or similar as it is guarded by a flag whether changes >> have been made. > > Indeed, I'm not. I'm seeing the vacuming effects when I do a `port work` or > similar AFTER a port uninstall. In the example I gave, I have to remember to > do the port uninstall after starting the fetch,checksum,extract,patch steps > if I do not want to waste time waiting to get control back. Hm, then it is probably not actually the VACUUM operation that causes the delay...? > It's not a huge deal, but having to wait a minute here, a minute there does > grow old, esp. if you just want to launch a lengthy upgrade operation and go > do something else in the meantime (and really esp. if that something else is > for some*one* else, like feeding the cats or doing the dishes ;) ) We could maybe keep a timestamp of the last vacuum run, so we only do this operation once per day. Of course that only helps if we are sure that it is the cause of the delay... Rainer
Re: registry.db-journal file location
Rainer Müller wrote: > Maybe switching to write-ahead log would increase the SQLite performance > in your setup. To try this, you should be able to convert your registry > database to this mode without making any changes to base itself. First impression is that it doesn't make a lot of difference. I used sqlitebrowser, which shows at least 1 other pragma that probably has to be changed, in the code this time: synchronous. According to http://www.sqlite.org/pragma.html#pragma_synchronous this should be "normal" in WAL mode (and might possibly even be OFF on COW filesystems like ZFS or APFS). R.
Re: registry.db-journal file location
On Monday February 05 2018 15:34:52 Rainer Müller wrote: Hi, >journal ensures consistency of the data, therefore the journal has to be >kept right next to the database file itself. If it were on volatile >storage like tmpfs, it would be lost on interruption/crash/etc. I'd guess that the main point of using a journal is that in case of a catastrophe you only use the journal, so the new information. Most of the time that should be possible to overcome the effects of that for a registry like the one in MacPorts. FWIW, ZFS can be configured to put its journal on a dedicated (and fast) log device, and so can HFS, AFAIK. But evidently I wasn't proposing to put the journal on volatile storage, just on faster storage (or if you want, local storage, if you put $prefix on a NAS. (Wait, I was ... /tmp and $TMPFS are cleared on reboot. Hmmm...) >in your setup. To try this, you should be able to convert your registry >database to this mode without making any changes to base itself. I presume that I can always keep a backup copy (= this affects only the registry.db file)? >database when detaching. This is meant to reduce fragmentation and save >disk space (by default the file size of a SQLite database never >shrinks), but of course it causes some disk activity. That certainly explains why the effect is so much more noticeable after uninstalling or a reinstall (upgrade --force). You're reducing fragmentation in the DB with that operation, but could well be increasing on-disk fragmentation. >As far as I remember, when we tested it, it did not take significant >time, but depending on your setup, this might be the expensive operation. Can you remember how large the test registry was (or how many ports installed, if that's more determinant)? On the main machine, I'm at approx. 260Mb for the registry; on the test rig that's only 95Mb (24Mb compressed). If the VACUUM does indeed do defragmentation by moving data to fill empty space left after uninstalling this certainly explains what I'm seeing. That's not "some disk activity", that's quite a bit of disk activity. For safe defragging you first create a contiguous copy of a file/record, then compare the copies, remove the old one, copy back the file to its new location and do a 2nd verification before removing the temp. copy. I think I've already seen discussions whether it's reasonably safe to disable the sqlite journal on a filesystem like ZFS which already provides its own comparable protection, and that the answer was probably no despite the potentially costly redundancy. >However, you are not supposed to see VACUUM after read-only actions such >as 'port work' or similar as it is guarded by a flag whether changes >have been made. Indeed, I'm not. I'm seeing the vacuming effects when I do a `port work` or similar AFTER a port uninstall. In the example I gave, I have to remember to do the port uninstall after starting the fetch,checksum,extract,patch steps if I do not want to waste time waiting to get control back. It's not a huge deal, but having to wait a minute here, a minute there does grow old, esp. if you just want to launch a lengthy upgrade operation and go do something else in the meantime (and really esp. if that something else is for some*one* else, like feeding the cats or doing the dishes ;) ) R.
Re: registry.db-journal file location
On 2018-02-05 11:08, René J.V. Bertin wrote: > I have a test system on which certain operations, esp. uninstalling, block > for up to over a minute after they appear to be done (= after the last debug > info was printed). This doesn't just block the port command in question, it > also blocks (almost?) any other port command I may execute, even those that > probably don't need the registry like `port work`, `port clean` etc. I think > this was a new "feature" that appeared somewhere in 2.4.99 . > > There is disk activity during that period, I presume related to registry > reorganisation, and yesterday I noticed a registry.db-journal file in the > registry directory which was there and updated until the ongoing uninstall > command finally exitted. > > This particular system has MacPorts installed on a ZFS pool with lz4 > compression and copies=2. I think this expensive operation could be sped up > by using /tmp or $TMPDIR for the temporary registry file(s) - a change that > should be transparent for regular users but potentially give a significant > gain for more advanced set-ups where the temp directories are on much faster > storage than that which holds the MacPorts prefix (or just a different drive). The whole point of using SQLite is to guarantee a consistent state. The journal ensures consistency of the data, therefore the journal has to be kept right next to the database file itself. If it were on volatile storage like tmpfs, it would be lost on interruption/crash/etc. > I did a quick search for "journal" but the word doesn't appear to occur in > the entire MacPorts src tree. Does that mean the use of temporary db/journal > files is entirely internal to libsqlite3? https://sqlite.org/tempfiles.html#rollback_journals Maybe switching to write-ahead log would increase the SQLite performance in your setup. To try this, you should be able to convert your registry database to this mode without making any changes to base itself. https://sqlite.org/wal.html#activating_and_configuring_wal_mode Another thing causing this could be the VACUUM when MacPorts exits. If changes were made to the registry, base always run a VACUUM on the database when detaching. This is meant to reduce fragmentation and save disk space (by default the file size of a SQLite database never shrinks), but of course it causes some disk activity. As far as I remember, when we tested it, it did not take significant time, but depending on your setup, this might be the expensive operation. https://github.com/macports/macports-base/blob/9606edb0b27814d5eb31525806e3253aad62fffe/src/registry2.0/registry.c#L129 However, you are not supposed to see VACUUM after read-only actions such as 'port work' or similar as it is guarded by a flag whether changes have been made. Rainer
registry.db-journal file location
Hi, I have a test system on which certain operations, esp. uninstalling, block for up to over a minute after they appear to be done (= after the last debug info was printed). This doesn't just block the port command in question, it also blocks (almost?) any other port command I may execute, even those that probably don't need the registry like `port work`, `port clean` etc. I think this was a new "feature" that appeared somewhere in 2.4.99 . There is disk activity during that period, I presume related to registry reorganisation, and yesterday I noticed a registry.db-journal file in the registry directory which was there and updated until the ongoing uninstall command finally exitted. This particular system has MacPorts installed on a ZFS pool with lz4 compression and copies=2. I think this expensive operation could be sped up by using /tmp or $TMPDIR for the temporary registry file(s) - a change that should be transparent for regular users but potentially give a significant gain for more advanced set-ups where the temp directories are on much faster storage than that which holds the MacPorts prefix (or just a different drive). I did a quick search for "journal" but the word doesn't appear to occur in the entire MacPorts src tree. Does that mean the use of temporary db/journal files is entirely internal to libsqlite3? I've given some thought how I'd approach things if that were indeed the case. I know almost nothing about DB "manhandling", so mostly out of educational curiosity: would the following kind of approach work (make sense) for operations that change the registry? 1. set a write lock on the registry file (but still allow read operations) 2. clone the entire file to $reg_temp_dir/registry.new (using lowlevel block IO if possible) 3. apply the intended changes to the clone in $reg_temp_dir 4. move to registry.new in the registry dir 5. remove (unlink) the current registry, rename registry.new to registry.db 6. remote the write lock I have a feeling that this could even speed up things a bit on my main system, where $prefix sits on a partition that has accumulated significant free-space fragmentation. I think this algorithm should also work when $reg_temp_dir is identical to the registry dir (skipping step 4 of course). I don't know what the penalty would be, but simple read operations should still be possible. It should (again?) become possible to 1) uninstall the old versions of a bunch of ports-to-be-upgraded and 2) fetch the sources for those ports and check the patches without having to wait for the uninstall. Thanks, R.