https://bugs.kde.org/show_bug.cgi?id=380456

--- Comment #20 from Adam Fontenot <adam.m.fontenot+...@gmail.com> ---
(In reply to tagwerk19 from comment #19)
> I would still suspect memory use rather than CPU as the underlying reason.
It's quite possible that you're right about that. I do know the game is
sensitive to available memory, possibly because it runs on the internal Intel
graphics chip.

> > ... baloo_file_extractor is calling out to an external library
> > (poppler), and that library is consuming an endlessly growing amount of
> > memory (from 1-3 GB before I've killed it). It's probably safe to say that
> > this memory usage is in the form of anonymous mappings which can't be
> > reclaimed. Baloo *must* take that into account and kill the extractor
> > process if it begins affecting system resources.
> That's a *lot* of memory for a "pdf to text" conversion 8-]
Yes, especially for a random 20 MB PDF I didn't even remember existed.

> You see the baloo_file_extractor RAM usage go up during the extraction and
> not come down when it is finished?
I have never been able to leave it for long enough to finish extracting from
the file. It's possible I'd get an out-of-memory hang before then. The
Poppler devs estimate that at least 7 GB of RAM would be needed to extract text
from this file. I also tested their pdftotext command on a system with plenty
of RAM, and there the issue is that it simply takes too long. I've left it
running for over an hour on this one file and never seen it complete.

Moreover, they insist that it's not a bug on their end. The file, in their
view, is pathological and the only reasonable solution is not to try to extract
text from it. I think I understand that perspective: it's not every day that
you come across a PDF with millions of "words" on a single page. So it's on
Baloo to bail out if the process takes too long or consumes too much RAM.
Here's the bug report I filed with them if you want to follow that
conversation: https://gitlab.freedesktop.org/poppler/poppler/-/issues/1173
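
As a rough illustration of the kind of bail-out meant here, the following is a minimal Python sketch, not Baloo's actual code (Baloo is written in C++). It runs an extractor command in a child process, caps the child's address space, and kills it if it exceeds a wall-clock timeout. The function name `run_with_limits` and its parameters are hypothetical, and the memory cap relies on `RLIMIT_AS`, which is assumed to be enforced (true on Linux).

```python
import resource
import subprocess


def run_with_limits(cmd, timeout_s, max_bytes):
    """Run cmd with a time and memory budget.

    Returns (status, stdout), where status is "ok", "timeout", or "failed".
    This is an illustrative helper, not part of Baloo or Poppler.
    """

    def limit_memory():
        # Runs in the child just before exec: cap its virtual address space
        # so a runaway allocation fails instead of exhausting system RAM.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed when this expires
            preexec_fn=limit_memory,
        )
    except subprocess.TimeoutExpired:
        return ("timeout", b"")

    if proc.returncode != 0:
        # Covers crashes and allocation failures caused by the memory cap.
        return ("failed", b"")
    return ("ok", proc.stdout)
```

With something like this, an indexer could call `run_with_limits(["pdftotext", "file.pdf", "-"], 60, 1 << 30)` and simply skip the file on any status other than "ok". The 60 second and 1 GiB budgets are placeholder values, not recommendations.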

> Could you see the culprit file in "System Settings > Search" (recent
> releases of baloo show the progress of the indexing there) or when running
> "balooctl monitor"?
Unfortunately, I don't remember. I do remember using lsof and friends to check
that it was the only file Baloo had open. I may not have realized at the time
that that feature had been added to the Baloo KCM.
