Daniel Naber <list2...@danielnaber.de> wrote:
> On Monday, 4 June 2012, Dominique Pellé wrote:
>
> > My script to measure startup time is available here:
> >
> > http://dominique.pelle.free.fr/startup-time-lt.sh
>
> Can you try again? The hunspell rule now loads lazily. Would you like to
> commit that script? It seems useful. src/scripts could be a good place.
>
> Regards
> Daniel

Trying again. I realized later that my previous numbers were
measured with an empty input text. That is why the timing was
about the same with or without -d HUNSPELL_RULE: the Hunspell
dictionary was never probed. When passing an input sentence
such as "foo bar", I do measure a slowdown with Hunspell, but
that is expected of course.

What was not expected was the slowdown when using
-d HUNSPELL_RULE to disable the Hunspell rule, compared to
commenting out the Hunspell rule in the code. That appears
to be fixed now.
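
To make the lazy-loading idea concrete, here is a minimal sketch in
Python. It is not the LanguageTool code (which is Java and which I am
not quoting here); the class LazySpellRule, the loader and the tiny
word set are all made up for illustration. The point is only that the
expensive dictionary load is deferred until the first real lookup, so
an empty input or a disabled rule never pays for it.

```python
import time


class LazySpellRule:
    """Hypothetical rule that defers loading its dictionary until first use."""

    def __init__(self, loader):
        self._loader = loader      # callable that performs the expensive load
        self._dictionary = None    # nothing is loaded at construction time
        self.load_count = 0        # how many times the loader actually ran

    def _ensure_loaded(self):
        # Load on first use only; later calls reuse the cached dictionary.
        if self._dictionary is None:
            self._dictionary = self._loader()
            self.load_count += 1
        return self._dictionary

    def check(self, text):
        """Return the words of `text` that are not in the dictionary."""
        words = text.split()
        if not words:
            # Empty input: no lookup is needed, so nothing gets loaded.
            return []
        dictionary = self._ensure_loaded()
        return [w for w in words if w.lower() not in dictionary]


def slow_loader():
    """Stand-in for reading a large dictionary from disk (assumption)."""
    time.sleep(0.05)
    return {"foo", "hello", "world"}
```

With this shape, creating the rule costs nothing, checking "" still
costs nothing, and only the first non-empty check triggers the load.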

Here are the numbers: startup time in seconds (3 samples per
configuration) when checking the 2-word sentence "foo bar".

    +------------------------+--------------------------------------------+
    | svn r7963 prior        | latest svn r7237                           |
    | to hunspell            |       +------------------+-----------------+
    | checkins               |       | svn r7247        | svn r7247       |
    |                        |       | hunspell disabled| hunspell enabled|
lang|#rules|                 |#rules | -d HUNSPELL_RULE |                 |
----+------+-----------------+-------+------------------+-----------------+
ast |   61 | 0.30 0.27 0.27  |    61 | 0.26 0.27 0.27   | 0.67 0.61 0.61  |
br  |  437 | 0.51 0.49 0.49  |   460 | 0.51 0.51 0.50   | 1.65 1.64 1.65  |
ca  |  434 | 0.64 0.64 0.64  |   397 | 0.88 0.70 0.70   | 1.04 1.06 1.08  |
cs  |    1 | 0.11 0.10 0.10  |     1 | 0.11 0.11 0.10   | 0.11 0.11 0.11  |
zh  |  328 | 2.33 2.27 2.27  |   328 | 2.40 2.27 2.27   | 2.29 2.36 2.29  |
da  |   22 | 0.51 0.51 0.51  |    22 | 0.52 0.51 0.51   | 1.09 1.08 1.07  |
nl  |  336 | 0.61 0.60 0.60  |   336 | 0.62 0.61 0.60   | 1.16 1.14 1.16  |
en  |  787 | 0.80 0.70 0.69  |   797 | 0.73 0.71 0.69   | 0.69 0.70 0.69  |
eo  |  269 | 0.57 0.58 0.59  |   274 | 0.57 0.57 0.57   | 0.57 0.57 0.57  |
fr  | 2040 | 0.54 0.54 0.56  |  2052 | 0.53 0.52 0.52   | 0.98 0.98 0.98  |
gl  |  157 | 0.68 0.68 0.67  |   157 | 0.67 0.69 0.74   | 0.94 0.94 0.95  |
be  |    7 | 0.48 0.48 0.50  |     7 | 0.49 0.49 0.56   | 0.85 0.80 0.80  |
de  | 1374 | 2.15 2.06 2.07  |  1390 | 2.33 2.04 2.04   | 2.48 2.50 2.46  |
is  |   39 | 0.54 0.51 0.51  |    39 | 0.50 0.65 0.58   | 1.06 1.07 1.06  |
it  |  116 | 0.39 0.29 0.28  |   116 | 0.33 0.29 0.29   | 0.66 0.66 0.67  |
km  |   24 | 0.56 0.56 0.56  |    24 | 0.56 0.57 0.56   | 0.56 0.56 0.56  |
lt  |    6 | 0.21 0.21 0.22  |     6 | 0.21 0.22 0.21   | 0.64 0.60 0.61  |
ml  |   23 | 0.50 0.51 0.50  |    23 | 0.51 0.50 0.50   | 0.97 0.96 0.97  |
pl  | 1029 | 0.82 0.85 0.82  |  1029 | 0.83 0.81 0.81   | 1.68 1.69 1.67  |
ro  |  459 | 0.67 0.68 0.67  |   459 | 0.67 0.66 0.68   | 1.18 1.17 1.19  |
ru  |  153 | 0.63 0.62 0.64  |   153 | 0.62 0.61 0.62   | 1.75 1.76 1.77  |
sk  |   58 | 0.68 0.66 0.65  |    58 | 0.64 0.64 0.65   | 1.50 1.45 1.47  |
sl  |   86 | 0.52 0.51 0.50  |    86 | 0.51 0.52 0.51   | 1.09 1.08 1.08  |
es  |   70 | 0.55 0.56 0.55  |    70 | 0.55 0.54 0.54   | 0.90 0.91 0.90  |
sv  |   26 | 0.11 0.10 0.11  |    26 | 0.11 0.10 0.11   | 0.11 0.11 0.11  |
tl  |   44 | 0.28 0.26 0.25  |    44 | 0.26 0.25 0.25   | 0.42 0.42 0.45  |
uk  |   25 | 1.27 1.21 1.27  |    25 | 1.24 1.24 1.22   | 1.71 1.75 1.73  |
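
For anyone who wants to collect this kind of three-sample timing
without my shell script, a rough Python sketch follows. It is not the
script linked above. The helper name time_command is made up, and
feeding the text on stdin is an assumption: adapt the command and the
way the input is passed to however you invoke the checker.

```python
import subprocess
import sys
import time


def time_command(cmd, input_text, samples=3):
    """Run `cmd` `samples` times with `input_text` on stdin.

    Returns the wall-clock duration of each run in seconds, rounded
    to two decimals. Raises CalledProcessError if a run fails.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            input=input_text,
            text=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,
        )
        timings.append(round(time.perf_counter() - start, 2))
    return timings


if __name__ == "__main__":
    # Placeholder command that just reads stdin; replace it with the
    # real checker invocation (assumption: the checker accepts stdin).
    placeholder = [sys.executable, "-c", "import sys; sys.stdin.read()"]
    print(time_command(placeholder, "foo bar\n"))
```

Note that the input must be non-empty, otherwise a lazily loaded
dictionary is never touched and the numbers look too good.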

Remarks looking at those numbers:

* Using -d HUNSPELL_RULE is now about as fast as it used
  to be before the Hunspell checkins (good!). Your lazy
  initialization paid off. Thanks!
* Using Hunspell has a noticeable overhead of course, but
  that's expected.
* The only language that is slightly slower between svn r7963
  and svn r7247 is Catalan. But I noticed that Jaume is very
  active these days with the Catalan checker, so that may be
  expected. Strangely though, it is slower and yet there are
  fewer rules now (434 -> 397).
* The number of rules is the number of XML rules.
  The script does not count the Java rules.
* For languages that use the SRX tokenizer, the speed-up
  work done by Jarek a few weeks ago is clearly measurable if
  I compare the startup times with those in an old email
  available here:
  http://sourceforge.net/mailarchive/message.php?msg_id=28672649
  For example, startup time for the Esperanto checker, which
  uses SRX, used to be 0.84 sec. It is now 0.57 sec (when
  Hunspell is disabled of course).
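
The comparisons in these remarks can be reproduced from the table
with a few lines of Python. This is only a sketch: using the median
of the 3 samples is my choice here (it damps outliers such as the 0.88
for Catalan), and only three languages are copied in as examples.

```python
from statistics import median

# Samples copied from the table above, per language:
# (before Hunspell checkins, Hunspell disabled, Hunspell enabled)
SAMPLES = {
    "ca": ([0.64, 0.64, 0.64], [0.88, 0.70, 0.70], [1.04, 1.06, 1.08]),
    "fr": ([0.54, 0.54, 0.56], [0.53, 0.52, 0.52], [0.98, 0.98, 0.98]),
    "ru": ([0.63, 0.62, 0.64], [0.62, 0.61, 0.62], [1.75, 1.76, 1.77]),
}


def relative_change(before, after):
    """Relative change between the medians of two sample lists."""
    b, a = median(before), median(after)
    return (a - b) / b


for lang, (before, disabled, enabled) in SAMPLES.items():
    print(
        f"{lang}: disabled vs before {relative_change(before, disabled):+.0%}, "
        f"enabled vs disabled {relative_change(disabled, enabled):+.0%}"
    )
```

A positive first figure means the language got slower even with
Hunspell disabled; the second figure is the cost of enabling Hunspell.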

I'll check in the script to measure speed soon.

Regards
-- Dominique
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel