Re: registry.db-journal file location

2018-02-05 Thread René J . V . Bertin
Rainer Müller wrote:

>> Indeed, I'm not. I'm seeing the vacuming effects when I do a `port work` or
>> similar AFTER a port uninstall. In the example I gave, I have to remember to
>> do the port uninstall after starting the fetch,checksum,extract,patch steps
>> if I do not want to waste time waiting to get control back.
> 
> Hm, then it is probably not actually the VACUUM operation that causes
> the delay...?

Why would you think that? To be clear, I'm talking about doing `port uninstall` 
in one terminal and the other command(s) in another terminal (or after `port 
uninstall foo &`, in thebackground). The description of what VACUUM does makes 
it a very likely candidate for an IO intensive operation that takes time AND 
blocks both read and write access.

R



Re: registry.db-journal file location

2018-02-05 Thread Joshua Root
On 2018-2-6 04:23 , Rainer Müller wrote:
> We could maybe keep a timestamp of the last vacuum run, so we only do
> this operation once per day. Of course that only helps if we are sure
> that it is the cause of the delay...

Ideally we would only vacuum when above some threshold of wasted space
in the database.

- Josh


Re: registry.db-journal file location

2018-02-05 Thread Rainer Müller
On 2018-02-05 16:47, René J.V. Bertin wrote:
> On Monday February 05 2018 15:34:52 Rainer Müller wrote:
>> in your setup. To try this, you should be able to convert your registry
>> database to this mode without making any changes to base itself.
> 
> I presume that I can always keep a backup copy (= this affects only the 
> registry.db file)?

Exactly, it is only a setting in the SQLite file. You should be able to
switch between the journal modes of WAL or DELETE with the sqlite
command-line client without losing any data.

> Can you remember how large the test registry was (or how many ports 
> installed, if that's more determinant)> On the main machine, I'm at approx. 
> 260Mb for the registry; on the
test rig that's only 95Mb (24Mb compressed).
> If the VACUUM does indeed do defragmentation by moving data to fill empty 
> space left after uninstalling this certainly explains what I'm seeing. That's 
> not "some disk activity", that's quite a bit of disk activity. For safe 
> defragging you first create a contiguous copy of a file/record, then compare 
> the copies, remove the old one, copy back the file to its new location and do 
> a 2nd verification before removing the temp. copy.

It was only based on experience from developers. My main setup currently
has a registry.db of about 180 MB, but this is still on Sierra with
HFS+. While I have also seen same delays on exit, it never bothered me
too much.

> I think I've already seen discussions whether it's reasonably safe to disable 
> the sqlite journal on a filesystem like ZFS which already provides its own 
> comparable protection, and that the answer was probably no despite the 
> potentially costly redundancy.

The SQLite journal is necessary to implement transactions (a sequence of
SQL statements that allow a rollback in case of an error). So even when
the backing storage of the database is guaranteed not to fail, it will
always be necessary.

>> However, you are not supposed to see VACUUM after read-only actions such
>> as 'port work' or similar as it is guarded by a flag whether changes
>> have been made.
> 
> Indeed, I'm not. I'm seeing the vacuming effects when I do a `port work` or 
> similar AFTER a port uninstall. In the example I gave, I have to remember to 
> do the port uninstall after starting the fetch,checksum,extract,patch steps 
> if I do not want to waste time waiting to get control back.

Hm, then it is probably not actually the VACUUM operation that causes
the delay...?

> It's not a huge deal, but having to wait a minute here, a minute there does 
> grow old, esp. if you just want to launch a lengthy upgrade operation and go 
> do something else in the meantime (and really esp. if that something else is 
> for some*one* else, like feeding the cats or doing the dishes ;) )

We could maybe keep a timestamp of the last vacuum run, so we only do
this operation once per day. Of course that only helps if we are sure
that it is the cause of the delay...

Rainer


Re: registry.db-journal file location

2018-02-05 Thread René J . V . Bertin
Rainer Müller wrote:


> Maybe switching to write-ahead log would increase the SQLite performance
> in your setup. To try this, you should be able to convert your registry
> database to this mode without making any changes to base itself.

First impression is that it doesn't make a lot of difference. I used 
sqlitebrowser, which shows at least 1 other pragma that probably has to be 
changed, in the code this time: synchronous. According to 
http://www.sqlite.org/pragma.html#pragma_synchronous this should be "normal" in 
WAL mode (and might possibly even be OFF on COW filesystems like ZFS or APFS).

R.



Re: registry.db-journal file location

2018-02-05 Thread René J . V . Bertin
On Monday February 05 2018 15:34:52 Rainer Müller wrote:

Hi,

>journal ensures consistency of the data, therefore the journal has to be
>kept right next to the database file itself. If it were on volatile
>storage like tmpfs, it would be lost on interruption/crash/etc.

I'd guess that the main point of using a journal is that in case of a 
catastrophe you only use the journal, so the new information. Most of the time 
that should be possible to overcome the effects of that for a registry like the 
one in MacPorts.
FWIW, ZFS can be configured to put its journal on a dedicated (and fast) log 
device, and so can HFS, AFAIK.
But evidently I wasn't proposing to put the journal on volatile storage, just 
on faster storage (or if you want, local storage, if you put $prefix on a NAS.
(Wait, I was ... /tmp and $TMPFS are cleared on reboot. Hmmm...)

>in your setup. To try this, you should be able to convert your registry
>database to this mode without making any changes to base itself.

I presume that I can always keep a backup copy (= this affects only the 
registry.db file)?

>database when detaching. This is meant to reduce fragmentation and save
>disk space (by default the file size of a SQLite database never
>shrinks), but of course it causes some disk activity.

That certainly explains why the effect is so much more noticeable after 
uninstalling or a reinstall (upgrade --force).
You're reducing fragmentation in the DB with that operation, but could well be 
increasing on-disk fragmentation.

>As far as I remember, when we tested it, it did not take significant
>time, but depending on your setup, this might be the expensive operation.

Can you remember how large the test registry was (or how many ports installed, 
if that's more determinant)? 
On the main machine, I'm at approx. 260Mb for the registry; on the test rig 
that's only 95Mb (24Mb compressed). 
If the VACUUM does indeed do defragmentation by moving data to fill empty space 
left after uninstalling this certainly explains what I'm seeing. That's not 
"some disk activity", that's quite a bit of disk activity. For safe defragging 
you first create a contiguous copy of a file/record, then compare the copies, 
remove the old one, copy back the file to its new location and do a 2nd 
verification before removing the temp. copy.

I think I've already seen discussions whether it's reasonably safe to disable 
the sqlite journal on a filesystem like ZFS which already provides its own 
comparable protection, and that the answer was probably no despite the 
potentially costly redundancy.

>However, you are not supposed to see VACUUM after read-only actions such
>as 'port work' or similar as it is guarded by a flag whether changes
>have been made.

Indeed, I'm not. I'm seeing the vacuming effects when I do a `port work` or 
similar AFTER a port uninstall. In the example I gave, I have to remember to do 
the port uninstall after starting the fetch,checksum,extract,patch steps if I 
do not want to waste time waiting to get control back.

It's not a huge deal, but having to wait a minute here, a minute there does 
grow old, esp. if you just want to launch a lengthy upgrade operation and go do 
something else in the meantime (and really esp. if that something else is for 
some*one* else, like feeding the cats or doing the dishes ;) )

R.


Re: registry.db-journal file location

2018-02-05 Thread Rainer Müller
On 2018-02-05 11:08, René J.V. Bertin wrote:
> I have a test system on which certain operations, esp. uninstalling, block 
> for up to over a minute after they appear to be done (= after the last debug 
> info was printed). This doesn't just block the port command in question, it 
> also blocks (almost?) any other port command I may execute, even those that 
> probably don't need the registry like `port work`, `port clean` etc. I think 
> this was a new "feature" that appeared somewhere in 2.4.99 .
> 
> There is disk activity during that period, I presume related to registry 
> reorganisation, and yesterday I noticed a registry.db-journal file in the 
> registry directory which was there and updated until the ongoing uninstall 
> command finally exitted.
> 
> This particular system has MacPorts installed on a ZFS pool with lz4 
> compression and copies=2. I think this expensive operation could be sped up 
> by using /tmp or $TMPDIR for the temporary registry file(s) - a change that 
> should be transparent for regular users but potentially give a significant 
> gain for more advanced set-ups where the temp directories are on much faster 
> storage than that which holds the MacPorts prefix (or just a different drive).

The whole point of using SQLite is to guarantee a consistent state. The
journal ensures consistency of the data, therefore the journal has to be
kept right next to the database file itself. If it were on volatile
storage like tmpfs, it would be lost on interruption/crash/etc.

> I did a quick search for "journal" but the word doesn't appear to occur in 
> the entire MacPorts src tree. Does that mean the use of temporary db/journal 
> files is entirely internal to libsqlite3?

https://sqlite.org/tempfiles.html#rollback_journals

Maybe switching to write-ahead log would increase the SQLite performance
in your setup. To try this, you should be able to convert your registry
database to this mode without making any changes to base itself.

https://sqlite.org/wal.html#activating_and_configuring_wal_mode


Another thing causing this could be the VACUUM when MacPorts exits. If
changes were made to the registry, base always run a VACUUM on the
database when detaching. This is meant to reduce fragmentation and save
disk space (by default the file size of a SQLite database never
shrinks), but of course it causes some disk activity.

As far as I remember, when we tested it, it did not take significant
time, but depending on your setup, this might be the expensive operation.

https://github.com/macports/macports-base/blob/9606edb0b27814d5eb31525806e3253aad62fffe/src/registry2.0/registry.c#L129

However, you are not supposed to see VACUUM after read-only actions such
as 'port work' or similar as it is guarded by a flag whether changes
have been made.

Rainer