Re: [Languagetool] How to enable spellchecking?

Dominique Pellé Tue, 05 Jun 2012 13:32:36 -0700

Daniel Naber <list2...@danielnaber.de> wrote:

> On Montag, 4. Juni 2012, Dominique Pellé wrote:
>
> > My script to measure startup time is available here:
> >
> > http://dominique.pelle.free.fr/startup-time-lt.sh
>
> Can you try again? The hunspell rule now loads lazily. Would you like to
> commit that script? it seems useful. src/scripts could be a good place.
>
> Regards
>  Daniel



Trying again.  I realized later that my previous numbers
were measured with an empty input text. That is why the
timing was about the same with or without -d HUNSPELL_RULE
(the Hunspell dictionary was never probed).
When passing an input sentence such as "foo bar", I
do measure a slowdown with Hunspell (but that's expected,
of course).

What was not expected was the slowdown when using
-d HUNSPELL_RULE to disable the Hunspell rule, compared to
commenting out the Hunspell rule in the code.  That appears
to be fixed now.

Here are the numbers:

              Startup time in seconds (3 samples) when checking
              the 2-word sentence "foo bar".
    +------------------------+--------------------------------------------+
    |  svn r7963, prior to   |             latest svn r7247               |
    |  hunspell checkins     |       +------------------+-----------------+
    |                        |       | svn r7247        | svn r7247       |
    |                        |       | hunspell disabled| hunspell enabled|
lang|#rules|                 |#rules | -d HUNSPELL_RULE |                 |
----+------+-----------------+-------+------------------+-----------------+
ast |   61 |  0.30 0.27 0.27 |    61 |  0.26 0.27 0.27  | 0.67 0.61 0.61  |
 br |  437 |  0.51 0.49 0.49 |   460 |  0.51 0.51 0.50  | 1.65 1.64 1.65  |
 ca |  434 |  0.64 0.64 0.64 |   397 |  0.88 0.70 0.70  | 1.04 1.06 1.08  |
 cs |    1 |  0.11 0.10 0.10 |     1 |  0.11 0.11 0.10  | 0.11 0.11 0.11  |
 zh |  328 |  2.33 2.27 2.27 |   328 |  2.40 2.27 2.27  | 2.29 2.36 2.29  |
 da |   22 |  0.51 0.51 0.51 |    22 |  0.52 0.51 0.51  | 1.09 1.08 1.07  |
 nl |  336 |  0.61 0.60 0.60 |   336 |  0.62 0.61 0.60  | 1.16 1.14 1.16  |
 en |  787 |  0.80 0.70 0.69 |   797 |  0.73 0.71 0.69  | 0.69 0.70 0.69  |
 eo |  269 |  0.57 0.58 0.59 |   274 |  0.57 0.57 0.57  | 0.57 0.57 0.57  |
 fr | 2040 |  0.54 0.54 0.56 |  2052 |  0.53 0.52 0.52  | 0.98 0.98 0.98  |
 gl |  157 |  0.68 0.68 0.67 |   157 |  0.67 0.69 0.74  | 0.94 0.94 0.95  |
 be |    7 |  0.48 0.48 0.50 |     7 |  0.49 0.49 0.56  | 0.85 0.80 0.80  |
 de | 1374 |  2.15 2.06 2.07 |  1390 |  2.33 2.04 2.04  | 2.48 2.50 2.46  |
 is |   39 |  0.54 0.51 0.51 |    39 |  0.50 0.65 0.58  | 1.06 1.07 1.06  |
 it |  116 |  0.39 0.29 0.28 |   116 |  0.33 0.29 0.29  | 0.66 0.66 0.67  |
 km |   24 |  0.56 0.56 0.56 |    24 |  0.56 0.57 0.56  | 0.56 0.56 0.56  |
 lt |    6 |  0.21 0.21 0.22 |     6 |  0.21 0.22 0.21  | 0.64 0.60 0.61  |
 ml |   23 |  0.50 0.51 0.50 |    23 |  0.51 0.50 0.50  | 0.97 0.96 0.97  |
 pl | 1029 |  0.82 0.85 0.82 |  1029 |  0.83 0.81 0.81  | 1.68 1.69 1.67  |
 ro |  459 |  0.67 0.68 0.67 |   459 |  0.67 0.66 0.68  | 1.18 1.17 1.19  |
 ru |  153 |  0.63 0.62 0.64 |   153 |  0.62 0.61 0.62  | 1.75 1.76 1.77  |
 sk |   58 |  0.68 0.66 0.65 |    58 |  0.64 0.64 0.65  | 1.50 1.45 1.47  |
 sl |   86 |  0.52 0.51 0.50 |    86 |  0.51 0.52 0.51  | 1.09 1.08 1.08  |
 es |   70 |  0.55 0.56 0.55 |    70 |  0.55 0.54 0.54  | 0.90 0.91 0.90  |
 sv |   26 |  0.11 0.10 0.11 |    26 |  0.11 0.10 0.11  | 0.11 0.11 0.11  |
 tl |   44 |  0.28 0.26 0.25 |    44 |  0.26 0.25 0.25  | 0.42 0.42 0.45  |
 uk |   25 |  1.27 1.21 1.27 |    25 |  1.24 1.24 1.22  | 1.71 1.75 1.73  |
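
For illustration only, here is a minimal Python sketch of how three
wall-clock samples per configuration could be collected. It is not the
startup-time-lt.sh script linked above, whose contents are not shown in
this message. The function name time_command is hypothetical, and the
command being timed is a harmless stand-in rather than a real
LanguageTool invocation.

```python
import subprocess
import sys
import time


def time_command(cmd, input_text, samples=3):
    """Run cmd `samples` times with input_text on stdin.

    Returns the wall-clock time of each run in seconds, rounded to
    two decimals like the figures in the table.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            input=input_text.encode("utf-8"),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        timings.append(round(time.perf_counter() - start, 2))
    return timings


if __name__ == "__main__":
    # Stand-in command (assumption): a Python child that just reads stdin.
    # To time the checker, replace it with the actual LanguageTool command
    # line, adding "-d", "HUNSPELL_RULE" for the "hunspell disabled" column.
    cmd = [sys.executable, "-c", "import sys; sys.stdin.read()"]
    print(time_command(cmd, "foo bar\n"))
```

Passing a non-empty sentence matters here: as noted above, with an empty
input the Hunspell dictionary is never probed, so the timings would not
show its cost.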

Remarks looking at those numbers:

* Using -d HUNSPELL_RULE is now about as fast as it used
  to be before the Hunspell checkins (good!). Your lazy initialization
  paid off. Thanks!

* Using Hunspell has a noticeable overhead, of course, but that's
  expected.

* The only language that is slightly slower between svn r7963
  and svn r7247 is Catalan.  But I noticed that Jaume is very
  active these days with the Catalan checker, so that may be
  expected. Strangely though, it is slower and yet there are fewer
  rules now (434 -> 397).

* The number of rules is the number of XML rules.
  The script does not count the Java rules.

* For languages that use the SRX tokenizer, the speed-up
  work done by Jarek a few weeks ago is clearly measurable when
  I compare the startup times with those in an old email available here:

  http://sourceforge.net/mailarchive/message.php?msg_id=28672649

  For example, startup time for the Esperanto checker, which uses SRX,
  used to be 0.84 sec.  It is now 0.57 sec (when Hunspell is disabled,
  of course).
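
As a rough illustration of the "#rules" column, the sketch below counts
XML rules with Python's standard library. It assumes each rule is a
<rule> element in a grammar file without XML namespaces or undefined
entities. How the actual script counts rules is not stated in this
message, and the function name count_xml_rules and the sample document
are made up for the example.

```python
import xml.etree.ElementTree as ET


def count_xml_rules(xml_text):
    """Count every <rule> element, including those nested in rule groups."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter("rule"))


# Made-up sample, not taken from any real grammar file.
SAMPLE = """<rules lang="xx">
  <category name="demo">
    <rule id="A" name="a"><pattern><token>foo</token></pattern></rule>
    <rulegroup id="G" name="g">
      <rule><pattern><token>bar</token></pattern></rule>
      <rule><pattern><token>baz</token></pattern></rule>
    </rulegroup>
  </category>
</rules>"""

if __name__ == "__main__":
    print(count_xml_rules(SAMPLE))  # prints 3
```

Rules implemented in Java have no entry in such a file, which is why a
count like this leaves them out.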

I'll check in the script to measure speed soon.

Regards
-- Dominique


_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel