https://bugs.kde.org/show_bug.cgi?id=464226

            Bug ID: 464226
           Summary: Baloo and Nulls
    Classification: Frameworks and Libraries
           Product: frameworks-baloo
           Version: unspecified
          Platform: Other
                OS: Linux
            Status: REPORTED
          Severity: major
          Priority: NOR
         Component: Baloo File Daemon
          Assignee: baloo-bugs-n...@kde.org
          Reporter: tagwer...@innerjoin.org
  Target Milestone: ---

Created attachment 155254
  --> https://bugs.kde.org/attachment.cgi?id=155254&action=edit
Text file containing a \000

SUMMARY:
    Baloo seems to stumble when it meets a "null" (\000) byte in a text file:
    balooshow reports "malformed term" errors for the indexed file.

    This looks like a parallel, or more general, case of:

        https://invent.kde.org/frameworks/baloo/-/merge_requests/87

STEPS TO REPRODUCE:
    Download the test file into an indexed folder. The file contains:

        -> ^@ <-

    where the ^@ is a "null" byte. Ask baloo what it has as the indexed data:

        $ balooshow -x file-with-a-000.txt
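
    If the attachment is not to hand, a similar file can probably be recreated
    locally. This is only a sketch: it assumes the file is nothing more than
    "-> ", one null byte, " <-" and a trailing newline, and the attachment's
    exact bytes may differ. Run it inside an indexed folder:

```shell
# Write "-> ", a NUL byte (\000), " <-" and a newline.
# Assumption: the trailing newline matches the attachment; it may not.
printf '%s\000%s\n' '-> ' ' <-' > file-with-a-000.txt

# Dump the bytes to confirm the NUL is really there (od shows it as \0).
od -c file-with-a-000.txt
```

    Once Baloo has indexed the new file, the balooshow command above can be
    used on it in the same way.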

OBSERVED RESULTS:
    You get:

        1625990000fc01 64513 1451417 file-with-a-000.txt [/home/test/Documents/file-with-a-000.txt]
                Mtime: 1673373876 2023-01-10T18:04:36
                Ctime: 1673373876 2023-01-10T18:04:36
                Cached properties:
                        Line Count: 1

        Internal Info
        Terms:   < > Mplain Mtext T5 T8 X20-1
        File Name Terms: F000 Fa Ffile Ftxt Fwith
        XAttr Terms:
        Internal Error - malformed term (short): ''
        Internal Error - malformed term (short): ''
        lineCount: 1

EXPECTED RESULTS:
    The same output without the two "Internal Error" lines:

        Internal Info
        Terms:   < > Mplain Mtext T5 T8 X20-1
        File Name Terms: F000 Fa Ffile Ftxt Fwith
        XAttr Terms:
        lineCount: 1

ADDITIONAL INFORMATION:
    Igor Poboiko's "baloo-checkdb.py" script:

        https://invent.kde.org/frameworks/baloo/uploads/bdc9f5f17fc96490b7bd4a22ac664843/baloo-checkdb.py

    gives a couple of errors:

        ...
        Checking whether posting[docterms[docid]] contains docid (can take some time)...
        ERROR: 6236232384314369 (/home/test/Documents/file-with-a-000.txt) has term  which wasn't found in PostingDB
        ERROR: 6236232384314369 (/home/test/Documents/file-with-a-000.txt) has term  which wasn't found in PostingDB
        ...

    and the merge request mentions

        ... TermGenerator then generates proper (yet meaningless) terms out of
        those characters, and they end up in database ...

    In this case it's happening for a "null" in a text file rather than a
    problematic PDF. I think it should *not* be possible for a file to corrupt
    the database. A worry is that a "specially crafted" file could perform
    mischief, so I am flagging this as "major".
