Hi Kevin,
Thanks for your input. There is a count of the number of entries on the
top line of the Hebrew dictionary, so that's not a problem.
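For readers who have not looked inside a .dic file: the first line holds the entry count, and each following line holds one word, optionally followed by a slash and affix flags. The sketch below shows how a loader could use that count to size its table once, up front. It is an illustration only: it uses std::unordered_set rather than Hunspell's own hash manager, and load_words is a hypothetical name.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>

// Hypothetical helper, not part of Hunspell: reads a .dic-style stream whose
// first line is the entry count and whose remaining lines are "word[/flags]".
std::unordered_set<std::string> load_words(std::istream& in) {
    std::unordered_set<std::string> words;
    std::string line;

    // First line: the declared number of entries.
    if (!std::getline(in, line)) return words;
    std::size_t declared = 0;
    try {
        declared = std::stoul(line);
    } catch (...) {
        declared = 0;  // missing or malformed count: let the table grow on demand
    }

    // Reserve buckets once so that inserts do not trigger repeated rehashing.
    words.reserve(declared);

    while (std::getline(in, line)) {
        if (line.empty()) continue;
        // Keep only the word; affix flags follow the first '/'.
        words.insert(line.substr(0, line.find('/')));
    }
    return words;
}
```

Without the count, a growing table has to be rehashed several times as entries arrive, which is the cost the top-line count is meant to avoid.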
On the machine I'm working on now, the OOo installation doesn't have
"check all languages" marked.
There's plenty of memory, as the following output of "free" shows:
             total       used       free     shared    buffers     cached
Mem:       8109956     981068    7128888          0      88780     710764
-/+ buffers/cache:     181524    7928432
Swap:      5815488          0    5815488
The installed dictionaries are: English US, Hebrew. If I type in
English, there is no noticeable delay, and misspelled words are marked
in red. If I then start typing in Hebrew, there is a 5-second delay during
which OOo seems "stuck" while building the hash table.
Thanks,
Alan
Kevin B. Hendricks wrote:
Hi Alan,
If you did place the count as the top line (to create a properly
sized hash table) then perhaps the only potential speedup is to
change hunspell to mmap a file that is the previously created
hashtable similar to what ispell uses.
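As a rough illustration of the mmap idea, the sketch below maps a file of 32-bit values read-only, so the data is paged in by the operating system instead of being rebuilt at startup. This is POSIX-only, and MappedTable, map_table, and unmap_table are hypothetical names; it does not reflect how hunspell or ispell actually store their tables.

```cpp
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical read-only view over a file of 32-bit values that was written
// earlier by the same program on the same machine.
struct MappedTable {
    const std::uint32_t* data = nullptr;
    std::size_t count = 0;  // number of 32-bit entries
    std::size_t bytes = 0;  // length of the mapping in bytes
};

// Maps the file at `path`; returns false if it cannot be opened or mapped.
bool map_table(const char* path, MappedTable& out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return false;
    }

    void* p = mmap(nullptr, static_cast<std::size_t>(st.st_size),
                   PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return false;

    out.data = static_cast<const std::uint32_t*>(p);
    out.bytes = static_cast<std::size_t>(st.st_size);
    out.count = out.bytes / sizeof(std::uint32_t);
    return true;
}

// Releases the mapping and resets the view.
void unmap_table(MappedTable& t) {
    if (t.data) munmap(const_cast<std::uint32_t*>(t.data), t.bytes);
    t = MappedTable{};
}
```

The values are read back exactly as they were written, byte for byte, which is why a file produced on one architecture may not be usable on another.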
The only real problem is that all binary formats like that
have endian issues across architectures that make things quite
difficult. That is why I decided with myspell to go with building
the hash table on-the-fly so to speak. There are no binary
compatibility issues that way.
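To make the endian issue concrete, here is a small sketch of one common way a binary file format can detect that it was written on a machine with the opposite byte order: store a known 32-bit magic value at the start and compare it on load. The magic value and the function names are assumptions made up for this example, not anything hunspell or myspell defines.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical magic value stored at the start of a precomputed table file.
constexpr std::uint32_t kMagic = 0x48554E53u;

// Reverses the byte order of a 32-bit value.
std::uint32_t byte_swap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

enum class ByteOrder { Native, Swapped, Unknown };

// Interprets the first four bytes of a file as the magic value and reports
// whether the file was written with this machine's byte order.
ByteOrder detect_order(const unsigned char* header) {
    std::uint32_t v;
    std::memcpy(&v, header, sizeof v);  // avoids unaligned or aliasing reads
    if (v == kMagic) return ByteOrder::Native;
    if (byte_swap32(v) == kMagic) return ByteOrder::Swapped;
    return ByteOrder::Unknown;
}
```

A loader that sees ByteOrder::Swapped would have to swap every multi-byte field before use, which rules out using a mapped file in place. Building the table from the text .dic file avoids the question entirely.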
Another source of delay when starting up the spell-checker is when
the user has checked the "check word in all languages" option but
doesn't realize that they have a large number of dictionaries that
have to be loaded when the first misspelt word is checked.
Obviously, for creating hash tables from large .dic files, available
memory is an issue. How much memory do you have available on your
machine?
Kevin
On May 1, 2007, at 1:08 PM, Alan Yaniger wrote:
Eleonora,
Yes, I used a different dictionary than yours. The hu_HU.dic I used
has 96,461 lines. Apparently the Hungarian dictionary available
through DicOO isn't the latest.
Perhaps your hardware is faster than mine. On my slower(?) hardware,
I see a significant difference between building the hash table for
large dictionaries and for smaller ones. Many users have complained
about OOo "getting stuck" while the dictionaries load. So I think
that it would be useful if Hunspell developers could improve
performance here.
Alan
ge wrote:
Alan,
The size of the second Hungarian dictionary is:
   lines    words  characters
   22068   124931      622546  hu_HU.aff
  873355   873348    26481165  hu_HU.dic
  895423   998279    27103711  total
The .dic contains 873378 words; it is 8 times larger than the Hebrew one.
The .aff is roughly twice as big as the Hebrew one.
I assume you used the first Hungarian one, with the small word
count, for your test.
I use the second all the time, and it loads in less than 1 second
for me. Therefore I do not understand the effect you describe.
-eleonora
Hi Marcin, Janis, Eleonora,
I did some debugging in the hunspell code, and found that the size of
the Hebrew dictionaries was the cause of the delay, similar to Janis's
problem in Latvian. The files are read line by line, and he_IL.dic has
329,326 entries, which is far more than the other dictionaries I tried.
The main bottleneck was not in reading the files from the disk, but in
building the hash tables in hashmgr.cxx in add_word(). When I shortened
he_IL.dic to the size of the Hungarian dictionary, it took the same
amount of time to load Hebrew and Hungarian. Same with Hebrew and
English US.
To Hunspell developers out there: is there any way to make the building
of the hash tables more efficient?
Alan
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: dev- [EMAIL PROTECTED]