https://bugs.kde.org/show_bug.cgi?id=476479

--- Comment #5 from tagwer...@innerjoin.org ---
> My system uses OpenRC. 
Don't know whether OpenRC gives you a way of limiting the memory use (with
cgroups?). I only know the systemd unit files. Putting some sort of cap on the
memory use is sensible. 

> ... told that after each KF5/6 upgrade touching Baloo it will need 
> re-indexing ...
Probably more complicated. Previously it could be that when you mount disks on
a reboot, they get a different device number each time. This was a clear issue
with BTRFS if you have multiple subvolumes: there was a race and disks came up
with different minor device numbers. OK, "previously" applies to Baloo. Baloo
used to rely on the device number (device number and inode) to build an
internal DocID for each file it indexed. If the device number changed on a
reboot then Baloo thought it had a whole set of new files and indexed them all
again. Bad.
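
To illustrate why a changing device number hurts, here is a minimal Python
sketch. The make_doc_id function and its bit layout are hypothetical, made up
for this example; this is not Baloo's actual code.

```python
def make_doc_id(device: int, inode: int) -> int:
    """Pack a device number and an inode into one 64-bit identifier.

    Hypothetical layout for illustration only: inode in the upper
    32 bits, device number in the lower 32 bits.
    """
    return ((inode & 0xFFFFFFFF) << 32) | (device & 0xFFFFFFFF)


# Same file (same inode), but the device number differs between two boots.
inode = 123456
id_first_boot = make_doc_id(device=0x2A, inode=inode)
id_second_boot = make_doc_id(device=0x2B, inode=inode)

# The identifiers differ, so an index keyed on them sees a "new" file.
print(id_first_boot != id_second_boot)
```

Any identifier that includes the device number behaves this way, whatever the
exact packing is.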

This may also be happening with your Ext4/LUKS2 setup. I'm afraid I don't know
how this presents itself to the system.

With Frameworks 5.111 there's been a patch to use an "invariant" File system ID
(rather than the minor device number). This means there will be "one more"
reindexing and then the index should be stable. It shouldn't be every KF5/KF6
change, it should be more stable after this one...

    https://invent.kde.org/frameworks/baloo/-/merge_requests/131
    https://discuss.kde.org/t/baloo-and-frameworks-5-111/6348

You can keep watch on the device number / inode on disk with "stat filename",
see how Baloo has indexed it with "balooshow -x filename" and also check for
"multiple hits" for the same file if you do a "baloosearch -i filename".
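
If you want to log the device number and inode over several reboots, a small
Python script can read the same fields that "stat" prints. This is only a
sketch; device_and_inode is a name I made up, and it uses nothing beyond the
standard os module.

```python
import os
import sys


def device_and_inode(path: str) -> tuple[int, int, int]:
    """Return (major, minor, inode) for the file at path."""
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev), st.st_ino


if __name__ == "__main__":
    # Pass one or more file names on the command line.
    for name in sys.argv[1:]:
        major, minor, inode = device_and_inode(name)
        print(f"{name}: device {major}:{minor} inode {inode}")
```

Run it on the same file before and after a reboot; if the major:minor pair
changes while the inode stays the same, the device number is not stable.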

There's also a possible "gotcha" that happens if you are worried about how the
indexing is going and watch with "balooctl status". This counts the files
waiting to be indexed - and holds the index "read only" when it's doing it. If
baloo_file/baloo_file_extractor wants to write at that moment, the write is an
append. Suddenly the index is bigger (Bug 437754).

> ... It should be programed the way that it will make needed internal changes 
> of existing index file after each incompatible upgrade of Baloo internals ...
Not sure there's a watertight way of doing this - beyond keeping a hash of the
files and comparing.
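
The "keep a hash and compare" idea could look roughly like this in Python. A
sketch only: file_digest and changed_files are hypothetical names, and nothing
here reflects how Baloo actually stores its index.

```python
import hashlib


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(paths: list[str], known: dict[str, str]) -> list[str]:
    """Return the paths whose current digest differs from the stored one."""
    return [p for p in paths if known.get(p) != file_digest(p)]
```

The cost is obvious: every file has to be read in full to know whether it
changed, which is why this is not a cheap fix.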

> ... I am having plenty of literature in pdf and epub formats ...
These can sometimes be slow to index; each file needs to be read as a stream of
text. PDFs can be compressed and things like graphs can take a *load* of CPU
to render....

Not sure whether this all helps.

Probably the thing to do is to check what "stat" says for your files; change
the indexing "includes" so you can see what happens with a small set of folders;
pkill baloo_file and purge the index. Sorry.
