To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=35725





------- Additional comments from [EMAIL PROTECTED] Sat Aug 27 13:35:20 -0700 2005 -------
I have fixed the problem in Hunspell 1.0.9.
(http://sourceforge.net/projects/hunspell)

(Changelog)

* src/hunspell/{suggestmgr,affixmgr,hunspell}.cxx: improve ngram suggestion.
          Fix http://qa.openoffice.org/issues/show_bug.cgi?id=35725. See release
          notes for examples. This problem was reported by beccablain at
          OpenOffice.org.
        - ngram suggestions are now case-insensitive (see the `Permenant' bug
          in Issuezilla)
        - weight ngram suggestions (with the longest common subsequence
          algorithm, also considering the lengths of the bad word and the
          suggestion, identical first letters and almost completely identical
          character positions)
        - set strict affix congruency in expand_rootword(). Now ngram
          suggestions are good for languages with rich morphology and also
          better for English.
          Rationale: affixed forms of the first ngram suggestion very often
          suppress the second and subsequent root word suggestions. But
          faults in affixes are less common, and can be fixed without
          suggestions. We must prefer the more informative second and
          subsequent root word suggestions over suggestions for bad affixes.
        - a better suggestion may not be a substring of a worse suggestion
          Rationale: suggesting affixed forms of a root word is unnecessary
          when the root word has a better weighted ngram value. (Checking
          substrings is a good approximation for this refinement.)
        - fewer ngram suggestions (maximum 3 by default instead of 10)
          Rationale: checking a lot of bad ngram suggestions costs users a big
          extra effort, unnecessarily nine times out of ten. It is very
          distracting, because ngram suggestions can be very different.
          With the old suggestion algorithms, Myspell and Hunspell usually
          give one or two suggestions (the maximum is 15), while the ngram
          algorithm often gives the maximum number of suggestions. With
          strict affix congruency and the other refinements, the good
          suggestion is usually among the first three.
        - new affix parameter: MAXNGRAMSUG (maximum number of ngram suggestions)
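The case-insensitive matching and the suggestion cap described above can be sketched in Python as follows. This is a simplified illustration, not Hunspell's suggestmgr.cxx code. The function names `ngram_score` and `ngram_suggest` are hypothetical. The score simply counts the substrings of length 1 to n that the two words share, and `max_ngram_sug` only stands in for the MAXNGRAMSUG affix parameter.

```python
# Hypothetical sketch: a simplified n-gram overlap score, not Hunspell's code.


def ngram_score(bad: str, cand: str, n: int = 3) -> int:
    """Count the substrings of `bad` (lengths 1..n) that also occur in `cand`.

    Both words are lowercased first, so the comparison is case-insensitive.
    """
    a, b = bad.lower(), cand.lower()
    score = 0
    for size in range(1, n + 1):
        for i in range(len(a) - size + 1):
            if a[i:i + size] in b:
                score += 1
    return score


def ngram_suggest(bad: str, words: list[str], max_ngram_sug: int = 3) -> list[str]:
    """Return at most `max_ngram_sug` words, highest score first.

    `max_ngram_sug` is an assumed stand-in for the MAXNGRAMSUG affix
    parameter (default 3 in this release, previously 10).
    """
    ranked = sorted(words, key=lambda w: ngram_score(bad, w), reverse=True)
    return ranked[:max_ngram_sug]
```

Because both words are lowercased first, `Permenant` and `permenant` get the same score against `permanent`.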

(Release notes)

------ examples for ngram improvement (O = old, N = new ngram suggestions) --

1. Permenant (instead of Permanent)

O: Endangerment, Ferment, Fermented, Deferment's, Empowerment,
        Ferment's, Ferments, Fermenting, Countermen, Weathermen

N: Permanent, Supermen, Preferment

Note: the old ngram suggestions were case-sensitive.

2. permenant (instead of permanent) 

O: supermen, newspapermen, empowerment, endangerment, preferments,
        preferment, permanent, preferment's, permanently, impermanent

N: permanent, supermen, preferment

Note: new suggestions are also weighted by the longest common subsequence,
the first letter and common character positions.
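The weighting factors named in this note can be combined as in the following Python sketch. The function `weight` is hypothetical. Its coefficients are illustrative assumptions, not Hunspell's actual formula: 2 points per LCS character, minus the length difference, 1 point for a matching first letter, and 1 point per identical position. It therefore does not reproduce the N lists exactly.

```python
# Hypothetical weighting sketch; the coefficients are illustrative assumptions.


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def weight(bad: str, cand: str) -> int:
    """Score a candidate suggestion (higher is better), case-insensitively."""
    a, b = bad.lower(), cand.lower()
    score = 2 * lcs_length(a, b)          # letters shared in the same order
    score -= abs(len(a) - len(b))         # penalize differing word lengths
    if a and b and a[0] == b[0]:
        score += 1                        # bonus for an identical first letter
    score += sum(x == y for x, y in zip(a, b))  # letters at identical positions
    return score


candidates = ["supermen", "preferment", "permanent", "endangerment"]
ranked = sorted(candidates, key=lambda w: weight("permenant", w), reverse=True)
```

With these assumed weights, `permanent` comes first for `permenant`. Unlike the N list above, `preferment` ranks above `supermen`.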

3. pernemant (instead of permanent) 

O: pimpernel's, pimpernel, pimpernels, permanently, permanents, permanent,
        supernatant, impermanent, semipermanent, impermanently

N: permanent, supernatant, pimpernel

Note: the new method also prefers the root word over irrelevant affixes
('s, s and ly).
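The changelog's substring rule can be sketched in Python as a filter over an already ranked list. That rule states that a better suggestion may not be a substring of a worse one. The function name `drop_affixed_forms` is hypothetical, and the input ranking is assumed to come from some earlier scoring step.

```python
# Hypothetical sketch of the substring rule; input is assumed ranked best-first.


def drop_affixed_forms(ranked: list[str]) -> list[str]:
    """Drop any suggestion that contains a better-ranked kept suggestion.

    Checking only kept words is enough: if a dropped word contains an even
    better one, any word containing the dropped word contains that better
    word too.
    """
    kept: list[str] = []
    for cand in ranked:
        if any(better.lower() in cand.lower() for better in kept):
            continue  # e.g. "permanently" is skipped after "permanent"
        kept.append(cand)
    return kept
```

The filter can be applied to the old list for `pernemant`, hand-ordered by assumption so that root words come before their affixed forms. It then leaves `permanent`, `supernatant` and `pimpernel`.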


4. pernament (instead of permanent)

O: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented,
        ornament, ornamentals, ornamental, ornamentally

N: ornamental, ornament, tournament

Note: both ngram methods miss here.


5. obvus (instead of obvious):

O: obvious, Corvus, obverse, obviously, Jacobus, obtuser, obtuse,
        obviates, obviate, Travus

N: obvious, obtuse, obverse

Note: the new method also prefers common first letters.


6. unambigus (instead of unambiguous) 

O: unambiguous, unambiguity, unambiguously, ambiguously, ambiguous,
        unambitious, ambiguities, ambiguousness

N: unambiguous, unambiguity, unambitious



7. consecvence (instead of consequence)

O: consecutive, consecutively, consecutiveness, nonconsecutive, consequence,
        consecutiveness's, convenience's, consistences, consistence

N: consequence, consecutive, consecrates


An example in a language with rich morphology:

8. Misisipiben (instead of Mississippiben [`in Mississippi' in Hungarian]):

O: Misikédéiben, Pisisedéiben, Misikéiéiben, Pisisekéiben, Misikéiben,
        Misikéidéiben, Misikékéiben, Misikéikéiben, Misikéiméiben,
        Mississippiiben

N: Mississippiben, Mississippiiben, Misiiben

Note: suggesting irrelevant affixes was the biggest fault of ngram
   suggestion for languages with a lot of affixes.



---------------------------------------------------------------------
Please do not reply to this automatically generated notification from
Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

