https://bugs.kde.org/show_bug.cgi?id=410680

skierpage <skierp...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
         Resolution|FIXED                       |---
             Status|RESOLVED                    |REOPENED

--- Comment #2 from skierpage <skierp...@gmail.com> ---
(In reply to tagwerk19 from comment #1)
> Looks like it's fixed along the way...

It works for your nifty test files, but my steps to reproduce still fail.

I wrote:
> There doesn't seem to be any way to run baloo_file_indexer yourself to
> find out what it gets from a file. Nor could I figure out what the 
> baloo_filemetadata_temp_extractor does, or how to get 
> useful logging of text extraction. This all makes debugging painful.

You can use `balooshow -x path/to/file` to see what terms baloo_file indexed.
For stadyn_largpagewithimages.html, it indexed very few words. It misses even
words like "Design" and "Principles", which are in the first 2500 bytes!

I ran strace on baloo_file and its children. One of them opens and reads
changed files, but it only read 16,384 bytes from this test file. However, that
incomplete read should have included those words.
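
To double-check that claim independently of baloo, here is a minimal sketch
that reads only the first 16,384 bytes of a file and reports which words occur
in that prefix. The function name `words_in_prefix` and its parameters are
hypothetical, made up for illustration, and it does a plain case-insensitive
substring match, which is not how baloo tokenizes text.

```python
def words_in_prefix(path, words, limit=16384):
    """Report which words occur in the first `limit` bytes of a file.

    Hypothetical helper: it mimics the size of the partial read seen in
    strace, not baloo's actual extraction or tokenization.
    """
    # Read at most `limit` bytes, like the truncated read seen in strace.
    with open(path, "rb") as f:
        prefix = f.read(limit)

    # Decode leniently so a multi-byte character cut at the boundary
    # does not raise an error.
    text = prefix.decode("utf-8", errors="replace").lower()

    # Plain case-insensitive substring test, no word-boundary handling.
    return {word: word.lower() in text for word in words}
```

Calling `words_in_prefix("stadyn_largpagewithimages.html", ["Design",
"Principles"])` should return True for both words if they really are within
the part of the file that was read.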

I guess I'll have to look at the file extractors' source code or somehow step
through it in gdb. Do you know how to run the binaries by hand?

-- 
You are receiving this mail because:
You are watching all bug changes.