https://bugs.kde.org/show_bug.cgi?id=404057
Kai Krakow <k...@kaishome.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |k...@kaishome.de

--- Comment #3 from Kai Krakow <k...@kaishome.de> ---
@Martin Thanks for pointing me here.

I can confirm the observations: RSS can easily grow above 3-4 GB. baloo_file_extractor generates a lot of IO with high throughput (sometimes 100 MB/s), mostly while scraping PDF files (i.e. my Calibre library), up to the point that the whole desktop becomes unresponsive and laggy. It's mostly read accesses, with writes arriving in bursts once in a while. btrfs especially has its problems with these access patterns. The DB is already created nocow.

The index file seems to be growing and growing. Last time I purged it when it reached 19 GB. This is about the point when the system becomes unusable due to IO stalls.

"balooctl" cannot really do anything: Run "balooctl stop" and it won't stop (or it restarts instantly). Run "balooctl disable" and it will be back on next reboot. Run "balooctl start" and it says that another instance is already running even when there isn't. I'm not sure if baloo is currently even able to monitor and know its own status.

VSS of at least two baloo processes is 256 GB. While I know that this is only allocated, not used, it still seems to have an effect on kernel memory allocation performance. The system feels snappier when I "killall baloo", even when baloo was idle and only used minor amounts of memory. It should probably just not do that. I'm not sure if this is caused by using mmap. But if it is, it may explain a lot of the overwhelming IO patterns.

Eventually baloo finishes if I let it run long enough. But the whole process repeats from scratch when rebooting the machine. The counter for indexed files grows by a huge amount after each reboot, as if it doesn't properly detect duplicates or clean up old entries.
It looks like it detects all files as new/modified (which is not true) and adds them to the index again.

CPU usage was moderate and nothing I care about too much because it runs at low CPU priority.

System specs:

Linux 4.20.6-gentoo with CK patchset, i7-3770K, 16 GB RAM
BFQ-MQ IO scheduler
4-disk RAID-1 btrfs running through bcache on a 400G SSD caching partition
systemd with dbus-user-session
Baloo database directory is made nocow (otherwise I get very rhythmic IO noise from the hard disks, as it seems to rewrite data over and over again, resulting in a lot of fragmentation and cow relocations)

Wishlist entry: It should be possible to easily move baloo into a cgroup (maybe it could create one itself, or we could configure it to optionally join a systemd slice) so I could limit its memory usage. Modern kernels will limit cache usage that way, too. Currently, when baloo is running, it dominates the disk cache for its own purposes. OTOH, maybe it's just missing proper cache hinting via fadvise().

Limiting memory usage via cgroups is already pretty effective for browsers, see here: https://github.com/kakra/gentoo-cgw

I already considered doing something similar for baloo, but I think it would be preferable if it managed its own resource usage better by itself.

Baloo could also monitor loadavg and reduce its impact on system performance automatically. Here's an example which has been very successful: https://github.com/Zygo/bees/commit/e66086516fdb9f9cc2d703fb8101f6116ce169e9

It inverts the loadavg function to calculate the current point-in-time load and adjusts its resource impact based on this, targeting a user-defined loadavg. This commit did wonders for system responsiveness while the daemon is running and working.
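
To illustrate the loadavg idea from the comment above, here is a minimal Python sketch. It is not the bees code and not anything baloo does today. It assumes the Linux 1-minute loadavg is an exponentially damped average updated roughly every 5 seconds, so that new = old * e + n * (1 - e) with e = exp(-5/60), and solves that for the point-in-time load n. The names instantaneous_load and next_delay, the target load of 2.0, and the delay step are all hypothetical choices made for this example.

```python
import math
import os
import time

# Assumptions (not taken from the bug report): the kernel refreshes the
# 1-minute loadavg about every 5 seconds with a 60-second time constant.
SAMPLE_INTERVAL = 5.0
TIME_CONSTANT = 60.0
DECAY = math.exp(-SAMPLE_INTERVAL / TIME_CONSTANT)


def instantaneous_load(prev_avg, curr_avg, decay=DECAY):
    """Invert curr = prev * decay + n * (1 - decay) to recover n.

    n approximates the load during the last sample interval.
    Rounding in the reported averages can push it below zero, so clamp.
    """
    n = (curr_avg - prev_avg * decay) / (1.0 - decay)
    return max(n, 0.0)


def next_delay(delay, load_now, target_load,
               min_delay=0.0, max_delay=5.0, step=0.05):
    """Hypothetical throttle: pause longer between work items while the
    point-in-time load is above the target, shorter while it is below."""
    if load_now > target_load:
        return min(max_delay, delay + step)
    return max(min_delay, delay - step)


if __name__ == "__main__":
    prev = os.getloadavg()[0]
    delay = 0.0
    for _ in range(3):
        time.sleep(SAMPLE_INTERVAL)
        curr = os.getloadavg()[0]
        load_now = instantaneous_load(prev, curr)
        delay = next_delay(delay, load_now, target_load=2.0)
        print(f"loadavg={curr:.2f} point-in-time={load_now:.2f} delay={delay:.2f}s")
        prev = curr
```

An indexer could sleep for the computed delay between files. Because the estimate reacts within one sample interval instead of lagging by a minute, the throttle can back off before the desktop becomes noticeably sluggish.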