https://bugs.kde.org/show_bug.cgi?id=400704
--- Comment #31 from tagwer...@innerjoin.org ---
It could be that there are several different issues being "bundled together".

1... There are, for example, problems with openSUSE running BTRFS with multiple subvols. To check, find one of the indexed files and try the following:

    stat testfile
    balooshow -x testfile

and

    baloosearch -i filename:testfile

The "stat" gives you the device and inode number of the file. You should see the same numbers listed in the "balooshow -x" results. See:

    https://bugs.kde.org/show_bug.cgi?id=402154#c12

If the device/inode numbers change for a file, baloo will think it is a different file and index it again. You can see this in the "baloosearch -i" results: you may get multiple results (different IDs, same file).

2... Repeated spike loads at logon. In cases where there are *very* *many* new files, even if content indexing is disabled, the initial scan by baloo_file takes too many resources.

My reading of the behaviour is that baloo_file does not "batch up" updates to the index as it discovers new/changed/deleted files. There's therefore no hint (looking at "balooctl status") that any progress is being made. The indexer may be reported as "Idle", as only an initial scan is being done (and not content indexing), and the RAM used by baloo_file can grow steadily (potentially extending to swap space). As per Bug 394750:

    https://bugs.kde.org/show_bug.cgi?id=394750#c13

If the updates from an "initial scan" are done as a single transaction, there are no checkpoints. Killing the process and starting again, rebooting, or logging out and back in again will start "from scratch".

Bug 428416 is also interesting in terms of what baloo_file is doing when it deals with a large indexing run.

3... It seems likely that, with baloo reindexing files as they reappear with different IDs (as per '1' above), the index size balloons, both on disc and in terms of pages pulled into memory. This will compound issue '2'.

4... On a positive note, the impact (as seen by the user) of a sync of the dirty pages to disc could be manageable if the index is on an SSD.

Comment 19 argues against increasing the batch size (that the data will have to be written at some time). This would hammer HDD users but maybe have less impact on SSD users. With an SSD, there's the counter-argument that you want to avoid frequent rewrites to prolong the life of the disc. Gut feeling is that with a larger batch size, less data is written to disc in total.

Wishlist/Proposals/Suggestions

I think baloo needs to "batch up" its transactions in its initial scan. If I were to suggest "how often", I'd pick a time interval, maybe every 15 or 30 seconds.

It would be nice to have a "balooctl" option (or a setting within baloofilerc) to tune the batch size used for baloo_file_extractor. That would make it possible to do indexing comparisons "in the real world".

Consider this as a "Where are we?" summary; an attempt to collect together different threads and weave in new evidence.
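
To make the "commit every 15 or 30 seconds" idea from the wishlist a bit more concrete, here is a rough sketch in Python. It is not baloo code (baloo_file is C++), and every name in it (TimedBatcher, commit, interval, clock) is made up for illustration. It only shows the shape of the idea: updates found during a scan are queued, and the queue is committed as one transaction once the chosen interval has passed, so an interrupted scan would lose at most one interval's worth of work.

```python
import time


class TimedBatcher:
    """Hypothetical helper: queue index updates and commit them in batches.

    `commit` stands in for "write one transaction to the index"; it is an
    assumption for this sketch, not a baloo API.
    """

    def __init__(self, commit, interval=15.0, clock=time.monotonic):
        self._commit = commit        # callable taking a list of paths
        self._interval = interval    # seconds between commits
        self._clock = clock          # injectable so the sketch can be tested
        self._pending = []
        self._last_commit = clock()

    def add(self, path):
        """Queue one discovered file; commit if the interval has elapsed."""
        self._pending.append(path)
        if self._clock() - self._last_commit >= self._interval:
            self.flush()

    def flush(self):
        """Commit whatever is queued and restart the interval timer."""
        if self._pending:
            self._commit(list(self._pending))
            self._pending.clear()
        self._last_commit = self._clock()


def demo():
    """Simulate a scan that finds one file every 10 seconds (fake clock)."""
    now = [0.0]
    commits = []
    batcher = TimedBatcher(commits.append, interval=15.0, clock=lambda: now[0])
    for i in range(6):
        now[0] = i * 10.0
        batcher.add(f"file{i}")
    batcher.flush()  # final commit at the end of the scan
    return commits


if __name__ == "__main__":
    for batch in demo():
        print(len(batch), batch)
```

A time-based trigger (rather than a fixed count of files) keeps the amount of uncommitted work bounded on both slow and fast discs, which is why the interval is the tunable here. A real implementation would also have to decide how to resume a scan after the last checkpoint, which this sketch does not attempt.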