https://bugs.kde.org/show_bug.cgi?id=404057

            Bug ID: 404057
           Summary: Uses an insane amount of memory (RSS/PSS) and writes a
                    *ton* of data  while indexing
           Product: frameworks-baloo
           Version: 5.54.0
          Platform: Debian unstable
                OS: Linux
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: Baloo File Daemon
          Assignee: baloo-bugs-n...@kde.org
          Reporter: mar...@lichtvoll.de
  Target Milestone: ---

SUMMARY
I see that baloo_file_extractor easily uses 5 GiB or more of RSS (resident
memory). The PSS (Proportional Set Size), which attributes shared memory
proportionally to all of the processes that share it, is almost as high. So it
appears that nearly all of that memory is used by the process itself rather
than shared with others.
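
For anyone who wants to reproduce the RSS/PSS/USS comparison without smemstat,
here is a minimal sketch. It assumes a Linux kernel that provides
/proc/<pid>/smaps_rollup and approximates USS as Private_Clean plus
Private_Dirty. The function names are made up for illustration and this is not
how smemstat itself is implemented.

```python
from pathlib import Path


def parse_smaps_rollup(text):
    """Return RSS, PSS and an approximate USS (all in KiB) from smaps_rollup text."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        # Data lines look like "Rss:    1234 kB"; the first line of the file
        # is an address-range header and is skipped by this check.
        if len(parts) == 3 and parts[0].endswith(":") and parts[2] == "kB":
            fields[parts[0][:-1]] = int(parts[1])
    # Assumption: USS is approximated as the private (unshared) pages only.
    uss = fields.get("Private_Clean", 0) + fields.get("Private_Dirty", 0)
    return {"rss": fields.get("Rss", 0), "pss": fields.get("Pss", 0), "uss": uss}


def memory_of(pid):
    """Read and parse /proc/<pid>/smaps_rollup for a running process."""
    return parse_smaps_rollup(Path(f"/proc/{pid}/smaps_rollup").read_text())
```

Calling memory_of(4791) while the extractor is running should give numbers
comparable to the smemstat samples below. If USS is close to RSS, the memory is
private to the process and not shared.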



STEPS TO REPRODUCE
1. Have it index a lot of files
2. Watch memory usage
3. If you want to push it well beyond anything reasonable:
   - have it index the result of
     git clone https://github.com/danielmiessler/SecLists.git
   - here it eats the resources of a fairly powerful laptop with 16 GiB of RAM
     as if there were no tomorrow.

OBSERVED RESULT
Sample of smemstat -T:
   PID      Swap       USS       PSS       RSS User       Command
  4791     0,0 B  6136,7 M  6142,8 M  6169,7 M martin     /usr/bin/baloo_file_extractor

   PID      Swap       USS       PSS       RSS User       Command
  4791     0,0 B  4595,1 M  4598,2 M  4617,6 M martin     /usr/bin/baloo_file_extractor

Yes, there are times when Baloo even frees some memory again, just to use even
more later on.

Granted, this laptop has 16 GiB of RAM, but this still seems off to me. I also
see the machine actually swapping out.

Also the disk I/O it generates is beyond anything that I would even consider to
be remotely sane for a laptop or any desktop machine:

pidstat -p 4791 -d 1
Linux 5.0.0-rc4-tp520 (merkaba)         07.02.2019      _x86_64_        (4 CPU)

12:32:21      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
12:32:22     1000      4791  75736,00      0,00      0,00       4  baloo_file_extr
12:32:23     1000      4791  33348,00 111232,00      0,00       3  baloo_file_extr
12:32:24     1000      4791  54288,00      0,00      0,00       4  baloo_file_extr
12:32:25     1000      4791  20516,00 119616,00      0,00       2  baloo_file_extr
12:32:26     1000      4791  24296,00      0,00      0,00       2  baloo_file_extr
12:32:27     1000      4791  35532,00      0,00      0,00       3  baloo_file_extr
12:32:28     1000      4791  32548,00 113112,00      0,00       3  baloo_file_extr
12:32:29     1000      4791  26720,00      0,00      0,00       1  baloo_file_extr
12:32:30     1000      4791  24048,00 103496,00      0,00       6  baloo_file_extr
12:32:31     1000      4791   7636,00      0,00      0,00      71  baloo_file_extr
12:32:32     1000      4791  16208,00      0,00      0,00      36  baloo_file_extr
12:32:33     1000      4791  18048,00      0,00      0,00      67  baloo_file_extr
12:32:34     1000      4791  23236,00      0,00      0,00      63  baloo_file_extr
12:32:35     1000      4791  16700,00      0,00      0,00      61  baloo_file_extr
12:32:36     1000      4791  20736,00 122392,00      0,00      23  baloo_file_extr
12:32:37     1000      4791  26752,00      0,00      0,00      36  baloo_file_extr
12:32:38     1000      4791  42456,00      0,00      0,00       4  baloo_file_extr
12:32:39     1000      4791  25156,00 118104,00      0,00       2  baloo_file_extr
12:32:40     1000      4791  12828,00      0,00      0,00       1  baloo_file_extr
12:32:41     1000      4791  14512,00      0,00      0,00       3  baloo_file_extr
12:32:42     1000      4791   7384,00      0,00      0,00       0  baloo_file_extr
12:32:43     1000      4791   2316,00 420664,00      0,00       1  baloo_file_extr
12:32:44     1000      4791      0,00  56520,00      0,00       0  baloo_file_extr
12:32:45     1000      4791      0,00  75188,00      0,00       0  baloo_file_extr
12:32:46     1000      4791      0,00  55376,00      0,00       0  baloo_file_extr
12:32:47     1000      4791      0,00  64496,00      0,00      33  baloo_file_extr
12:32:48     1000      4791      0,00      0,00      0,00      85  baloo_file_extr
12:32:49     1000      4791      0,00      0,00      0,00      89  baloo_file_extr
12:32:50     1000      4791      0,00      0,00      0,00      86  baloo_file_extr
12:32:51     1000      4791     16,00      0,00      0,00      83  baloo_file_extr
12:32:52     1000      4791   2772,00    220,00      0,00      58  baloo_file_extr
12:32:53     1000      4791  28056,00      4,00      0,00       3  baloo_file_extr
12:32:54     1000      4791  81328,00      0,00      0,00       8  baloo_file_extr
12:32:55     1000      4791  71740,00      0,00      0,00       8  baloo_file_extr
12:32:56     1000      4791  46088,00      0,00      0,00       6  baloo_file_extr
12:32:57     1000      4791  44320,00      0,00      0,00       5  baloo_file_extr
12:32:58     1000      4791  29576,00      0,00      0,00       4  baloo_file_extr
12:32:59     1000      4791  41568,00      0,00      0,00       5  baloo_file_extr
12:33:00     1000      4791  31244,00      0,00      0,00       5  baloo_file_extr

12:33:00      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
12:33:01     1000      4791  23764,00      0,00      0,00       4  baloo_file_extr
12:33:02     1000      4791  24272,00      0,00      0,00       5  baloo_file_extr
12:33:03     1000      4791  19840,00      0,00      0,00       5  baloo_file_extr
12:33:04     1000      4791  22096,00      0,00      0,00       5  baloo_file_extr
12:33:05     1000      4791  14696,00      0,00      0,00       4  baloo_file_extr
12:33:06     1000      4791  14204,00      0,00      0,00       4  baloo_file_extr
12:33:07     1000      4791  12336,00      0,00      0,00       3  baloo_file_extr
12:33:08     1000      4791  23796,00      0,00      0,00       3  baloo_file_extr
12:33:09     1000      4791  21076,00      0,00      0,00       3  baloo_file_extr
12:33:10     1000      4791   8280,00 194116,00      0,00       2  baloo_file_extr
12:33:11     1000      4791    744,00 777584,00      0,00       4  baloo_file_extr

Yep, that is right: that is roughly 760 MiB written in a single second!

12:33:12     1000      4791    160,00      0,00      0,00      39  baloo_file_extr
12:33:13     1000      4791     16,00      0,00      0,00      90  baloo_file_extr
12:33:14     1000      4791      0,00      0,00      0,00      53  baloo_file_extr
12:33:15     1000      4791      0,00      0,00      0,00     139  baloo_file_extr
12:33:16     1000      4791      0,00      0,00      0,00     103  baloo_file_extr
12:33:17     1000      4791      0,00  29072,00      0,00      88  baloo_file_extr
12:33:18     1000      4791      0,00  70980,00      0,00      68  baloo_file_extr
^C
Average:     1000      4791  19701,54  42669,68      0,00      26  baloo_file_extr

Yes, that is about 42 MiB/s on average! On the other hand, the index does not
grow at anywhere near that rate. So what is it actually writing there? The
index is currently at 9.48 GiB.
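
To make the arithmetic behind these figures explicit, here is a small sketch
that converts the pidstat values to MiB/s. It assumes that pidstat's kB means
1024 bytes and that the decimal commas come from the German locale of this
system. The helper names are just for illustration.

```python
KIB_PER_MIB = 1024


def parse_rate(field):
    """Parse a pidstat number printed with a decimal comma, e.g. '42669,68'."""
    return float(field.replace(",", "."))


def kib_to_mib(kib):
    """Convert KiB (or KiB/s) to MiB (or MiB/s)."""
    return kib / KIB_PER_MIB


average_write = parse_rate("42669,68")  # kB_wr/s from the average line
peak_write = parse_rate("777584,00")    # kB_wr/s at 12:33:11

print(f"average write rate: {kib_to_mib(average_write):.1f} MiB/s")  # 41.7 MiB/s
print(f"peak write rate:    {kib_to_mib(peak_write):.1f} MiB/s")     # 759.4 MiB/s
```

At roughly 42 MiB/s, the process would write about 2.4 GiB per minute, which is
why the comparatively slow growth of the index looks suspicious.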

Now I have a gem here:

   PID      Swap       USS       PSS       RSS User       Command
  4791     0,0 B  8615,9 M  8617,1 M  8630,8 M martin     /usr/bin/baloo_file_extractor

According to balooctl status during that time it indexed:
[…]SecLists/Passwords/Common-Credentials/10-million-password-list-top-100000.txt: OK
[…]SecLists/Passwords/Common-Credentials/10-million-password-list-top-1000000.txt

Seriously there are two things wrong with that:
- That file is *only* 8.2 MiB big
- There is never ever an excuse to use 8 GiB of RSS for file indexing.

I think there should be a size limit on what it tries to index. Baloo certainly
should not try to index files which are several GiB big.
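
As a rough illustration of what such a limit could look like, here is a sketch
of a pre-check that skips content extraction for files above a threshold. This
is not Baloo's code or configuration. The function name and the 10 MiB default
are hypothetical.

```python
import os

# Hypothetical default limit; this is not an existing Baloo setting.
MAX_INDEXABLE_BYTES = 10 * 1024 * 1024


def should_extract_content(path, max_bytes=MAX_INDEXABLE_BYTES):
    """Return True if the file exists and is small enough to be content-indexed."""
    try:
        size = os.stat(path).st_size
    except OSError:
        # Missing or unreadable files are skipped rather than treated as errors.
        return False
    return size <= max_bytes
```

Note that the 8.2 MiB password list mentioned above would still pass a limit of
this size, so a size cap alone would probably not be enough to prevent the
memory usage seen here.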

And yes, I can tell it to exclude those, but then it is something else. In my
opinion it is Baloo's responsibility to keep its resource usage in check.

So in short: recent Baloo basically manages to hog a ThinkPad T520 with a
Sandy Bridge dual-core CPU, 16 GiB of RAM, and a dual-SSD BTRFS RAID 1. I did
not see this prior to KDE Frameworks 5.54, at least not to this extent.

For now I am letting it run, in the hope that it eventually completes and stays
quiet without me having to kill its processes, as it does not appear to respond
to balooctl stop in a reasonable time either.


EXPECTED RESULT
A more reasonable memory and I/O usage while indexing. Basically Baloo should
stay in the background. IMHO there is never ever an excuse for
baloo_file_extractor to use 8 GiB or more of RSS. Never… ever…


SOFTWARE/OS VERSIONS
Linux: Debian Unstable
KDE Plasma Version: 5.14.5
KDE Frameworks Version: 5.54
Qt Version: 5.11.3
