Bug#487874: Problem is in hunspell and not myspell-en-gb

2008-07-07 Thread John Winters
I've now diagnosed this problem and (unless there are some stringent
rules on the contents of .aff files which I've been unable to find) the
problem lies in hunspell and not in myspell-en-gb.  It's just that the
extra complexity of myspell-en-gb tickles the bug in hunspell.

An example of a rule which hunspell fails to process correctly is:

SFX D 0 ed [aeio][aeiou][bcdfgkmnprstvz]

And a suitable word for demonstrating the problem is entertained.

The actual processing happens in the SfxEntry::test_condition method in
the affentry.cxx compilation unit.  It tries to work backwards through
the proposed stem (entertain) and at the same time backwards through
the above comparison rule.

When it finds the 'n' (the last letter of entertain) in the third
group it sets a flag (called ingroup) and decrements its pointer into
the target word so that pointer is now pointing at the 'i' of
entertain.  However, it then carries on working its way through the
remaining characters of the third group.  Fortunately there is no 'i'
there to find, so at this stage the bug does no harm.

The code then moves on to the second group of characters, and works
backwards through it until it finds the 'i', which matches the letter
currently being pointed at in the target word.  It sets the flag again
and again decrements the pointer into the target word so now it points
at the 'a' in entertain.  This time the bug bites.  Because the code
goes on processing the remaining characters in the second group, it then
finds the 'a' there, which causes the pointer into the target word to be
decremented again.  Now it points to the second 't' in entertain, so
when the code comes to process the first group of characters it can't
get a match (it wants to find 'a', but the pointer has already moved
past it).

As a quick demonstration that this explanation is correct, you can edit
the en_GB.aff file and reverse the middle group of the rule so that it
reads:

SFX D 0 ed [aeio][uoiea][bcdfgkmnprstvz]

Hunspell then successfully recognises entertained as being a word.
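
The behaviour described above can be reproduced with a small stand-alone
model.  This is only a sketch and not the code from affentry.cxx: the
function name matches_condition, its stop_after_match parameter and the
self_check helper are invented for illustration, and the model handles
only literal characters and plain [...] groups (no negated groups).  With
stop_after_match set to false it keeps scanning a group after a hit, as
the analysis says hunspell does; with true it shows one possible way of
avoiding the double decrement in this model.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified model of a backwards suffix-condition test.
// NOT hunspell's SfxEntry::test_condition; it only mirrors the behaviour
// described in the analysis above.
bool matches_condition(const std::string& stem, const std::string& cond,
                       bool stop_after_match) {
    int w = static_cast<int>(stem.size()) - 1;  // position in the stem
    bool in_group = false;       // currently inside a [...] group
    bool group_matched = false;  // current group has matched a character

    // Walk the condition from its last character to its first.
    for (int c = static_cast<int>(cond.size()) - 1; c >= 0; --c) {
        const char ch = cond[c];
        if (ch == ']') {                 // entering a group from the right
            in_group = true;
            group_matched = false;
        } else if (ch == '[') {          // leaving the group on the left
            if (!group_matched) return false;
            in_group = false;
        } else if (in_group) {
            // Assumed fix: ignore the rest of a group once it has matched.
            if (group_matched && stop_after_match) continue;
            if (w >= 0 && stem[w] == ch) {
                group_matched = true;
                --w;  // without the guard above this can run twice per group
            }
        } else {                         // literal outside any group
            if (w < 0 || stem[w] != ch) return false;
            --w;
        }
    }
    return true;
}

// Checks the three cases discussed in the text.
void self_check() {
    const std::string rule = "[aeio][aeiou][bcdfgkmnprstvz]";
    const std::string reversed = "[aeio][uoiea][bcdfgkmnprstvz]";
    assert(!matches_condition("entertain", rule, false));     // bug bites
    assert(matches_condition("entertain", reversed, false));  // workaround
    assert(matches_condition("entertain", rule, true));       // guarded scan
}
```

Calling self_check() exercises the original rule, the rule with the
middle group reversed, and the guarded scan.  In the unguarded mode the
'a' at the start of [aeiou] is examined after the 'i' has already
matched, which is what consumes the extra character of the stem.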

Can this bug be reassigned to the hunspell source package?



-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#487874: Problem is in hunspell and not myspell-en-gb

2008-07-07 Thread Rene Engelhard
tag 487864 - help
tag 487864 - moreinfo
reassign 487864 libhunspell-1.2-0
forwarded 487864 http://sourceforge.net/tracker/index.php?func=detail&aid=2012753&group_id=143754&atid=756395
thanks

Hi,

John Winters wrote:
> I've now diagnosed this problem and (unless there are some stringent
> rules on the contents of .aff files which I've been unable to find) the
> problem lies in hunspell and not in myspell-en-gb.  It's just that the
> extra complexity of myspell-en-gb tickles the bug in hunspell.
...

Thanks for your analysis.

> Can this bug be reassigned to the hunspell source package?

No. (I'll reassign it to the buggy library in question -
libhunspell-1.2-0)

Thanks for filing the bug upstream, too. Marking as forwarded.

Regards,

Rene


