[chromium-dev] Re: Spellchecker and memory-mapped dicts

Brett Wilson Thu, 22 Oct 2009 14:23:20 -0700

On Thu, Oct 22, 2009 at 1:39 PM, Evan Stade <est...@chromium.org> wrote:
>
> Hi all,
>
> At its last meeting the jank task force discussed improving
> responsiveness of the spellchecker but we didn't come to a solid
> conclusion so I thought I'd bring it up here to see if anyone else has
> opinions. The main concern is that we don't block the IO thread on
> file access. To this end, I recently moved initialization of the
> spellchecker from the IO thread to the file thread. However, instead
> of reading in the spellchecker dictionary in one solid chunk, we
> memory-map it. Then later we check individual words on the IO thread,
> which will be slow since the dictionary starts off effectively
> completely paged out. The proposal is that we read in the dictionary
> at spellchecker initialization instead of memory mapping it.
>
> Memory mapping pros:
> - possibly uses less overall memory, depending on the structure of the
> dictionary and the usage pattern of the user.
> - <strike>loading the dictionary doesn't block for a long
> time</strike> this one no longer occurs either way due to my recent
> refactoring
>
> Reading it all at once pros:
> - costly disk accesses are kept to the file thread (excepting future
> memory paging)
> - overall disk access time is probably lower (since we can read in the
> dict in one chunk)
>
> For reference, the English dictionary is about 500K, and most
> dictionaries are under 2 megs, some (such as Hungarian) are much
> higher, but no dictionary is over 10 megs.
>
> Opinions?


I've thought about this some (I wrote the memory-mapping code that's there now).

History of the spellchecker:
v1 : Per-process Hunspell storage (lots of memory duplicated in each
renderer, expensive to load).
v2 : Browser-process Hunspell storage (lots of memory, expensive to
load, only occurs once)
v3 : Browser-process memmap (less memory, cheap to load, only occurs once).

I would like to consider moving hunspell back to the renderer so we
can avoid sync IPCs and blocking the I/O thread on spellchecking.
Spellchecking isn't fast (especially suggestions) even when everything
is in memory, so it always sucks to have it block the I/O thread. Now
that it can be memmapped, each renderer can memmap its own image of
the data.
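
To make the two loading strategies concrete, here is a minimal POSIX sketch of a read-only mapping next to a read-it-all-at-once load. It is an illustration only, not the code in spellchecker.cc: MappedDictionary and ReadWholeDictionary are hypothetical names, and it assumes a POSIX system with mmap().

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not a Chromium class): owns a read-only mapping of a
// dictionary file. Pages are faulted in from disk lazily, on first access.
class MappedDictionary {
 public:
  MappedDictionary() = default;
  MappedDictionary(const MappedDictionary&) = delete;
  MappedDictionary& operator=(const MappedDictionary&) = delete;
  ~MappedDictionary() {
    if (data_ != nullptr) munmap(data_, size_);
  }

  // Maps the file at |path|. Returns false if already open, if the file
  // cannot be opened, if it is empty, or if mmap() fails.
  bool Open(const std::string& path) {
    if (data_ != nullptr) return false;
    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) return false;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
      close(fd);
      return false;
    }
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    // Read-only, file-backed pages can be shared by the kernel between
    // processes that map the same file, so each process that maps the
    // dictionary does not need a private copy of the data.
    void* p = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // The mapping stays valid after the descriptor is closed.
    if (p == MAP_FAILED) return false;
    data_ = p;
    size_ = size;
    return true;
  }

  const unsigned char* data() const {
    return static_cast<const unsigned char*>(data_);
  }
  std::size_t size() const { return size_; }

 private:
  void* data_ = nullptr;
  std::size_t size_ = 0;
};

// The alternative from the proposal: read the whole file up front, so the
// disk access happens at load time (e.g. on the file thread).
std::vector<unsigned char> ReadWholeDictionary(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                    std::istreambuf_iterator<char>());
}
```

With the mapping, the first lookups pay for page faults on whichever thread touches the data; with the up-front read, that cost is paid once at load time but the whole file stays resident in that process's heap.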

This doesn't help on Mac where we want to use the system spellchecker.
There would also be some amount of duplication since there are certain
tables that are initialized once at the beginning (I don't think it's
that big, though).

I would suggest first making the current histograms in the
spellchecker.cc file UMA (currently they're debug-only local ones) so
we can see how much blocking we're getting from Hunspell in the field.
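
As a rough illustration of the kind of measurement meant here, the sketch below times a scope and records the elapsed time in a bucketed counter. It is a generic stand-in, not Chromium's histogram or UMA API: LatencyHistogram, ScopedTimer, and the bucket bounds are all hypothetical.

```cpp
#include <array>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical bucket upper bounds in milliseconds (an assumption for this
// sketch). One extra overflow bucket catches anything slower.
constexpr std::array<std::int64_t, 5> kUpperBoundsMs = {{1, 5, 20, 100, 500}};

// Hypothetical stand-in for a timing histogram.
class LatencyHistogram {
 public:
  // Increments the first bucket whose upper bound is >= |elapsed|, or the
  // overflow bucket if |elapsed| exceeds every bound.
  void Add(std::chrono::milliseconds elapsed) {
    std::size_t i = 0;
    while (i < kUpperBoundsMs.size() && elapsed.count() > kUpperBoundsMs[i]) {
      ++i;
    }
    ++counts_[i];
  }

  std::uint64_t count(std::size_t bucket) const { return counts_[bucket]; }

  std::uint64_t total() const {
    std::uint64_t sum = 0;
    for (std::uint64_t c : counts_) sum += c;
    return sum;
  }

 private:
  std::array<std::uint64_t, kUpperBoundsMs.size() + 1> counts_{};
};

// Records how long the enclosing scope took when it is destroyed.
class ScopedTimer {
 public:
  explicit ScopedTimer(LatencyHistogram* histogram)
      : histogram_(histogram), start_(std::chrono::steady_clock::now()) {}
  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;
  ~ScopedTimer() {
    histogram_->Add(std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start_));
  }

 private:
  LatencyHistogram* histogram_;
  std::chrono::steady_clock::time_point start_;
};
```

Placing a ScopedTimer at the top of the word-check and suggestion paths would show how much time lands in the slow buckets; reporting those counts from the field is the part that UMA would provide.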

Brett

--~--~---------~--~----~------------~-------~--~----~
Chromium Developers mailing list: chromium-dev@googlegroups.com 
View archives, change email options, or unsubscribe: 
    http://groups.google.com/group/chromium-dev
-~----------~----~----~----~------~----~------~--~---
