Re: [Aspell-user] small bug: two following non alpha characters

Gary Setter Wed, 26 Oct 2005 09:09:54 -0700

----- Original Message ----- 
From: "Kevin Atkinson" <[EMAIL PROTECTED]>
To: "Pablo Saratxaga" <[EMAIL PROTECTED]>
Cc: <[email protected]>
Sent: Tuesday, October 25, 2005 9:42 PM
Subject: Re: [Aspell-user] small bug: two following non alpha characters



> On Tue, 25 Oct 2005, Pablo Saratxaga wrote:
>
> > I found a bug in the latest versions of aspell that wasn't there
> > previously. It probably doesn't show in the majority of languages,
> > but it shows in Walloon, as you can have words like
> > "hait-l'-ovraedje"; note the apostrophe followed by a hyphen.
>
> This is not a bug but a limitation of Aspell.  Aspell cannot handle
> the case of two "middle" characters in a row.  Aspell 0.50 accepted
> the word when creating a dictionary, but it would never be able to
> check the word, since it will always be split into something like
> "hait-l" and "ovraedje".  Aspell 0.60 checks that words are valid
> before accepting them.
>
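The limitation described above can be illustrated with a minimal sketch. It assumes that letters are word characters and that a "middle" character (here only ' and -) belongs to a word only when a letter comes immediately before and after it. MIDDLE_CHARS and split_words are hypothetical names chosen for illustration; this is not Aspell's actual tokenizer code.

```python
# Hypothetical sketch of the "one middle character at a time" rule.
# Assumptions: letters are detected with str.isalpha(), and the only
# middle characters are the apostrophe and the hyphen.
MIDDLE_CHARS = {"'", "-"}


def split_words(text):
    words = []
    i, n = 0, len(text)
    while i < n:
        # Skip anything that cannot start a word.
        if not text[i].isalpha():
            i += 1
            continue
        start = i
        while i < n:
            if text[i].isalpha():
                i += 1
            elif text[i] in MIDDLE_CHARS and i + 1 < n and text[i + 1].isalpha():
                # A single middle character followed by a letter stays in the word.
                i += 1
            else:
                # Two middle characters in a row (or a trailing one) end the word.
                break
        words.append(text[start:i])
    return words


print(split_words("hait-l'-ovraedje"))  # ['hait-l', 'ovraedje']
```

Under this rule "l'ovraedje" stays one word, but the apostrophe-hyphen pair in "hait-l'-ovraedje" ends the word after "l", which produces exactly the split Kevin describes.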
Hi,
Back in August I was trying to make my program work with
Unicode and the koi8-r character set. One of the problems was
tokenizing the text into words. It seemed aspell was treating all
character sets as ASCII. The speller object does have a language
member, and the language member does know the characteristics of
each character in the character set. What are the characteristics
of the apostrophe and hyphen in your character set? Might aspell
make use of those character-set-specific characteristics to
tokenize "hait-l'-ovraedje" as one word?

In my port, I added functions to find the offset to the next
word and to find the length of the next word. Currently, they
tokenize based only on the alpha characteristic. I'm not sure
that is proper in all cases, and I'm open to other ideas on how
this should be done. I can submit a patch to aspell if that
would be helpful.
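The two helper functions described above might look roughly like the following sketch. The names next_word_offset, next_word_length and words are hypothetical, not the actual functions from the port. The sketch assumes the input bytes are first decoded with their real character set (koi8-r here), so that the alpha test classifies characters rather than treating the bytes as ASCII.

```python
# Hypothetical sketch: tokenizing on the alpha characteristic only.
# Assumption: text is decoded from its real character set before
# classification, so str.isalpha() sees characters, not raw bytes.


def next_word_offset(text, pos):
    """Index of the first alphabetic character at or after pos (len(text) if none)."""
    while pos < len(text) and not text[pos].isalpha():
        pos += 1
    return pos


def next_word_length(text, pos):
    """Number of consecutive alphabetic characters starting at pos."""
    end = pos
    while end < len(text) and text[end].isalpha():
        end += 1
    return end - pos


def words(text):
    result = []
    pos = 0
    while True:
        pos = next_word_offset(text, pos)
        if pos >= len(text):
            return result
        length = next_word_length(text, pos)
        result.append(text[pos:pos + length])
        pos += length


raw = "Привет, мир!".encode("koi8_r")  # bytes as they might arrive from a koi8-r file
print(words(raw.decode("koi8_r")))     # ['Привет', 'мир']
print(words("hait-l'-ovraedje"))       # ['hait', 'l', 'ovraedje']
```

The second call shows why alpha-only tokenization may not be proper in all cases: it splits the Walloon word into three pieces, so any apostrophe or hyphen handling would have to come from the character-set-specific classification.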

Best regards,
Gary Setter



_______________________________________________
Aspell-user mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/aspell-user
