https://bugs.kde.org/show_bug.cgi?id=419819

            Bug ID: 419819
           Summary: Baloo ranks relevance of results undesirably
           Product: frameworks-baloo
           Version: 5.68.0
          Platform: Other
                OS: Linux
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: general
          Assignee: stefan.bru...@rwth-aachen.de
          Reporter: bughat...@protonmail.com
  Target Milestone: ---

SUMMARY
Baloo ranks search results poorly. For example, searching for the complete,
exact title of a PDF book can return many seemingly irrelevant results but not
the intended file (at least not among the few top-ranked results shown in the
start menu).

STEPS TO REPRODUCE & OBSERVED RESULTS
I have PDFs of many textbooks on my filesystem. When I search for words from a
title, e.g. "Advanced Engineering Electromagnetics", about 10 results appear,
none of which is the intended textbook with the file name "C A Balanis -
Advanced Engineering Electromagnetics (Second Edition)John Wiley and Sons
(2012).pdf". The first result is actually "Advanced Window Behavior", which
should not even be listed given that the words "Engineering" and
"Electromagnetics" are also part of the query.

When a textbook I search for does appear, it is often the last result, which
suggests that Baloo considers it the least relevant. The textbooks that Baloo
ranks higher may well contain the words "engineering", "electromagnetics", or
"advanced" somewhere in their text. Even so, in my opinion most of these
results should be treated as irrelevant to the search. In any case, the file
with all of the searched words in its filename (not to mention in the same
order and with the same casing) should appear as the first result.

This is just an example; many file searches behave like this. Including almost
every word of the textbook title and the authors' names does not necessarily
make the desired file appear at all (let alone at the top), for instance when
those names also occur in other files, even far from the other query words or
referring to different people with the same name.

GNOME Tracker demonstrates the desired behavior for these kinds of searches.
That said, Baloo's ability to search inside files is impressive. Baloo also
appears to search the PDF metadata, whereas GNOME Tracker does not.

EXPECTED RESULT
I propose that files whose names or PDF metadata match several query words be
given higher relevance than files that merely contain all of the query words
somewhere in their contents.
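
The proposal above could be sketched roughly as follows. This is only an
illustration in Python, not Baloo's actual ranking code: the field names, the
weights, the helper functions and the second sample file are all hypothetical
and chosen just to show the idea of weighting filename and metadata matches
above content matches.

```python
import re

# Hypothetical weights, for illustration only; they are not taken from Baloo.
FIELD_WEIGHTS = {"filename": 10.0, "metadata": 5.0, "content": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, doc):
    """Sum over fields: weight * number of distinct query words in that field."""
    terms = set(tokenize(query))
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = set(tokenize(doc.get(field, "")))
        total += weight * len(terms & words)
    return total


def rank(query, docs):
    """Return the documents ordered from most to least relevant."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)


query = "Advanced Engineering Electromagnetics"
docs = [
    # Hypothetical file that only mentions the query words in its contents.
    {"filename": "lecture-notes.pdf",
     "content": "advanced engineering electromagnetics homework"},
    # The file name from this report; its contents are left empty here.
    {"filename": "C A Balanis - Advanced Engineering Electromagnetics "
                 "(Second Edition)John Wiley and Sons (2012).pdf",
     "content": ""},
]
ranked = rank(query, docs)
```

With these assumed weights, the file whose name contains all three query words
is ranked ahead of the file that merely contains them in its text.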

Secondarily, I propose that successive word matches be given priority over
dispersed word matches within a file (I do not know whether this is already
the case or how it would be implemented). Successive matches within a file's
contents could approach the relevance of dispersed matches within the file
name or PDF metadata. The distinction between exact (successive) matches and
dispersed matches may not matter as much when both are within the file name or
PDF metadata.
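
One possible way to favor successive matches is sketched below in Python. It
is an assumption about how such a bonus might work, not a description of
Baloo: the function names, the sample sentences and the rule of one extra
point per additional word in the longest in-order run are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def longest_run(terms, words):
    """Length of the longest stretch of consecutive query terms, in query
    order, that also appears consecutively in the document words."""
    best = 0
    for i in range(len(words)):
        for j in range(len(terms)):
            k = 0
            while (i + k < len(words) and j + k < len(terms)
                   and words[i + k] == terms[j + k]):
                k += 1
            best = max(best, k)
    return best


def proximity_score(query, text):
    """Distinct matched words, plus a hypothetical bonus of one point for
    each word beyond the first in the longest successive run."""
    terms = tokenize(query)
    words = tokenize(text)
    matched = len(set(terms) & set(words))
    bonus = max(longest_run(terms, words) - 1, 0)
    return matched + bonus


query = "Advanced Engineering Electromagnetics"
successive = "A course in advanced engineering electromagnetics for students"
dispersed = ("Advanced topics in civil engineering, "
             "with a chapter on electromagnetics")
```

Here both texts contain all three query words, but `proximity_score` gives the
text with the words in succession a higher score than the one where they are
scattered. How such a bonus should be balanced against filename and metadata
matches is left open.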

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Arch Linux
KDE Plasma Version: 5.18.4
KDE Frameworks Version: 5.68.0
Qt Version: 5.14.2
