https://bugs.kde.org/show_bug.cgi?id=400704

Adam Fontenot <adam.m.fontenot+...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |adam.m.fontenot+kde@gmail.c
                   |                            |om

--- Comment #43 from Adam Fontenot <adam.m.fontenot+...@gmail.com> ---
(In reply to tagwerk19 from comment #38)
> It would make sense to have time/memory limits for such actions (and flag
> the file as "failed" if the extraction exceeds them).

Was thinking about this and similar IO problems, and decided to have a look at
how GNOME's Tracker is handling things these days. Going to document my
findings here in the hope it's useful as inspiration for how we might handle
similar problems. I think it's an important point of comparison for Baloo.

I have mostly positive things to say, although Tracker also has some flaws (it
didn't pick up my XDG Documents folder by default, it didn't index the contents
of files with a text/plain MIME type that lack file extensions, and it uses
a large amount of CPU while searching in Nautilus).

 * I enabled Tracker to index my home folder (with content indexing) and it
uses 474 MB on my $HOME. I've completely disabled content indexing for Baloo,
but it's somehow using 1.4 GB. Suffice it to say that Baloo is weirdly
inefficient. (ContentIndexingDB is empty, so it's not old content indexes.)
More research needed here, any suggestions appreciated.

 * Unlike Baloo, Tracker does not hang when given pathological files. (See the
link in tagwerk19's comment for an example.) I get a very sensible "Crash/hang
handling file" message in the log for this file and it's otherwise ignored.
Among other checks, they appear to kill the process if the content indexer
takes more than 30 seconds on a file, which seems quite reasonable:
https://gitlab.gnome.org/GNOME/tracker-miners/-/blob/master/src/tracker-extract/tracker-extract.c

 * They have some cool features around full text search including unaccenting
and case folding, and use SPARQL for queries:
https://wiki.gnome.org/Projects/Tracker/Features I haven't seen enough
documentation from Baloo to know how we stack up there.

 * Tracker and Baloo both blacklist source code files by default, among several
other types. Baloo doesn't expose this to the user in the UI, which I think
might surprise some users who expect more configurability from KDE.

 * Tracker seems not to be very configurable. There's a bit of under the hood
adjustment possible, but mostly the focus seems to be on having good heuristics
out of the box. I don't think we could trivially swap Tracker for Baloo and
have everything we need work. We'll need to keep improving Baloo. :-)
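
To make the timeout idea from the second point concrete, here is a rough sketch of the general technique: run the extractor in a child process, kill it if it runs past a limit, and record the file as failed so it can be skipped later. This is not Tracker's or Baloo's actual code. The names run_extractor, index_files and make_command are made up for illustration, the 30 second default just mirrors the figure I mentioned, and memory limits are not covered at all.

```python
import subprocess
import sys

# Assumed limit for illustration; it mirrors the ~30 second figure above.
EXTRACT_TIMEOUT_SECONDS = 30


def run_extractor(command, timeout=EXTRACT_TIMEOUT_SECONDS):
    """Run one extractor command in a child process.

    Returns ("ok", stdout) on success, or ("failed", reason) if the child
    exits with a non-zero status or exceeds the timeout.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising, so a hung
        # extractor does not linger after this point.
        return ("failed", "timeout")
    if result.returncode != 0:
        return ("failed", f"exit code {result.returncode}")
    return ("ok", result.stdout)


def index_files(paths, make_command, timeout=EXTRACT_TIMEOUT_SECONDS):
    """Extract each path in isolation and collect the ones that failed.

    make_command is a hypothetical callback that turns a path into the
    argument list for whatever extractor binary is in use.
    """
    indexed = {}
    failed = set()
    for path in paths:
        status, payload = run_extractor(make_command(path), timeout)
        if status == "ok":
            indexed[path] = payload
        else:
            # Remember the failure so the file is not retried endlessly.
            failed.add(path)
    return indexed, failed


if __name__ == "__main__":
    # Stand-in "extractor": a Python one-liner that hangs on one input.
    script = (
        "import sys, time; "
        "time.sleep(10) if sys.argv[1] == 'bad' else print(sys.argv[1])"
    )
    done, bad = index_files(
        ["good", "bad"], lambda p: [sys.executable, "-c", script, p], timeout=2
    )
    print("indexed:", done)
    print("failed:", bad)
```

The point of doing the work in a separate process is that a pathological file can only take down the child, and the indexer itself stays responsive and can move on to the next file.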

This comment might be better off on the Wiki somewhere, but it seems pretty
underutilized and I'm not sure where I'd put it or if anyone would even read it
there.
