[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

bugzilla_noreply Sat, 04 Sep 2021 01:44:36 -0700

https://bugs.kde.org/show_bug.cgi?id=400704


--- Comment #31 from tagwer...@innerjoin.org ---
It could be that there are several different issues being "bundled together".

1...

    There are, for example, problems on openSUSE, which runs BTRFS
    with multiple subvolumes. To check, find one of the indexed files
    and try the following...

        stat testfile
        balooshow -x testfile 

    and

        baloosearch -i filename:testfile 

    The "stat" would give you the device and inode number of the file.
    You should see the same numbers listed in the "balooshow -x"
    results. See:

        https://bugs.kde.org/show_bug.cgi?id=402154#c12

    If the device/inode numbers change for a file, baloo will think it
    is a different file and index it again. You can see evidence of this
    in the "baloosearch -i" results: you could get multiple results
    (different IDs; same file).
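
    As a rough illustration of why a changed device number matters,
    here is a minimal Python sketch. It assumes an index keyed on the
    (device, inode) pair that "stat" reports; file_key and
    looks_like_new_file are hypothetical helpers, not baloo functions,
    and the changed device number is simulated by hand.

```python
import os
import tempfile


def file_key(path):
    """Return the (device, inode) pair that stat reports for path."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def looks_like_new_file(recorded_key, current_key):
    """True if an index keyed on (device, inode) would see a different file."""
    return recorded_key != current_key


with tempfile.NamedTemporaryFile() as tmp:
    first = file_key(tmp.name)
    second = file_key(tmp.name)
    # Assumption: simulate the device number changing (for example after
    # a reboot) while the inode number and file content stay the same.
    after_change = (first[0] + 1, first[1])

stable = not looks_like_new_file(first, second)
reindexed = looks_like_new_file(first, after_change)
```

    Under that assumption the same file yields a second key, which
    matches the "multiple results" symptom described above.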

2...

    Repeated spike loads at logon. In cases where there are *very* *many*
    new files, even if content indexing is disabled, the initial scan by
    baloo_file takes too many resources.

    My reading of the behaviour is that baloo_file does not "batch up"
    updates to the index as it discovers new/changed/deleted files.
    There's therefore no hint (looking at "balooctl status") that
    any progress is being made. It may be that the indexing is "Idle",
    as just an initial scan is being done (and not content indexing),
    and the RAM used by baloo_file can grow steadily (potentially
    extending to swap space).

    As per Bug 394750:

        https://bugs.kde.org/show_bug.cgi?id=394750#c13

    If the updates from an "initial scan" are done as a single transaction
    there are no checkpoints. Killing the process and starting again,
    rebooting or logging out and back in again will start "from scratch".

    Bug 428416 is also interesting in terms of what baloo_file is doing
    when it deals with a large indexing run.
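
    The "no checkpoints" point can be sketched in a few lines of
    Python. This is only an illustration under stated assumptions:
    sqlite3 stands in for the real index database, scan is a
    hypothetical function, and the "kill" is simulated with an
    exception followed by a rollback.

```python
import sqlite3


def scan(paths, batch_size, fail_after):
    """Insert paths, committing every batch_size rows (0 = one transaction).

    The scan is interrupted after fail_after rows; the return value is
    the number of rows that survived the interruption.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (path TEXT)")
    conn.commit()
    try:
        for n, path in enumerate(paths, start=1):
            if n > fail_after:
                raise RuntimeError("simulated kill")
            conn.execute("INSERT INTO files VALUES (?)", (path,))
            if batch_size and n % batch_size == 0:
                conn.commit()  # checkpoint: work so far is kept
        conn.commit()
    except RuntimeError:
        conn.rollback()  # uncommitted work is lost
    count = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
    conn.close()
    return count


paths = [f"/home/user/file{i}" for i in range(1000)]
kept_single = scan(paths, batch_size=0, fail_after=950)     # 0 rows kept
kept_batched = scan(paths, batch_size=100, fail_after=950)  # 900 rows kept
```

    With a single transaction the interrupted run keeps nothing; with
    a commit every 100 files it keeps everything up to the last
    checkpoint.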

3...

    It seems likely that with baloo reindexing files as they reappear
    with different IDs (as per '1' above) the index size balloons,
    both on disc and in terms of pages pulled into memory. This will
    compound issue '2'.

4...

    On a positive note, the impact (as seen by the user) of a sync of
    the dirty pages to disc could be manageable if the index is on
    an SSD.

    Comment 19 argues against increasing the batch size (that the data
    will have to be written at some time). This would hammer HDD users
    but may have less impact on SSD users.

    With an SSD, there's the counter argument that you want to avoid
    frequent rewrites to prolong the life of the disc. Gut feeling is
    that with a larger batch size, the data written to disc is less
    in total.
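
    That gut feeling can be tested against a toy model. The sketch
    below assumes that each commit writes every page touched in its
    batch exactly once, so repeated updates to the same page within a
    batch coalesce. pages_written and the update pattern are made up
    for illustration; real write volumes depend on the database and
    the workload.

```python
def pages_written(page_updates, batch_size):
    """Count page writes if each commit writes each dirty page once."""
    total = 0
    for start in range(0, len(page_updates), batch_size):
        batch = page_updates[start:start + batch_size]
        total += len(set(batch))  # duplicates within a batch coalesce
    return total


# Assumption: 1000 updates cycling over 20 index pages.
updates = [i % 20 for i in range(1000)]

writes_batch_1 = pages_written(updates, 1)      # 1000 page writes
writes_batch_100 = pages_written(updates, 100)  # 200 page writes
```

    In this model a larger batch never writes more pages than a
    smaller one, and writes far fewer when the same pages are touched
    repeatedly.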

Wishlist/Proposals/Suggestions

    I think baloo needs to "batch up" its transactions in its initial scan.
    If I were to suggest "how often", I'd pick a time interval, maybe
    every 15 or 30 seconds.
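
    A time-based trigger of that kind might look like the following
    Python sketch. TimedBatcher is a hypothetical class, not anything
    in baloo; the commit callback and the injectable clock are
    assumptions made so the behaviour can be shown without waiting.

```python
import time


class TimedBatcher:
    """Collect items and hand them to commit() once an interval has passed."""

    def __init__(self, commit, interval_seconds=15.0, clock=time.monotonic):
        self._commit = commit
        self._interval = interval_seconds
        self._clock = clock
        self._pending = []
        self._last_flush = clock()

    def add(self, item):
        self._pending.append(item)
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self):
        """Commit whatever is pending and restart the interval."""
        if self._pending:
            self._commit(list(self._pending))
            self._pending.clear()
        self._last_flush = self._clock()


# Usage with a fake clock so the example runs instantly.
now = [0.0]
commits = []
batcher = TimedBatcher(commits.append, interval_seconds=15.0,
                       clock=lambda: now[0])
batcher.add("a")            # t = 0s, held back
now[0] = 10.0
batcher.add("b")            # t = 10s, held back
now[0] = 16.0
batcher.add("c")            # t = 16s, interval elapsed: commit a, b, c
now[0] = 20.0
batcher.add("d")            # held back until the next interval
batcher.flush()             # final flush at the end of the scan
```

    An interval rather than a fixed count keeps the amount of work
    lost on an interruption bounded in time, whatever the scan rate.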

    It would be nice to have a "balooctl" option (or a setting within
    baloofilerc) to tune the batch size used for baloo_file_extractor.
    That would make it possible to do indexing comparisons "in the
    real world".

Consider this as a "Where are we?" summary; an attempt to collect together
different threads and weave in new evidence.

-- 
You are receiving this mail because:
You are watching all bug changes.
