I agree, that would be the ideal solution. The problem I'm facing with that, however, is working out the parsing algorithm aspell actually uses. So far I've just been trying to reverse engineer it by trial and error. I thought I had it pretty much figured out until this popped up.
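One possible way around matching the tokenizer exactly: in ispell-style pipe output, a `&` line has the form `& word count offset: suggestions` and a `#` line has the form `# word offset`, so the `10` in the `& geniral 5 10:` line quoted below should be the position of the word in the input line. The following is only a rough sketch under assumptions I have not verified against this aspell version: that the offset is 0-based and counted in bytes (which would be consistent with the long dash taking three UTF-8 bytes), and that the input line was `scope`, a long dash, then `geniral usage`. The names `parse_misses` and `byte_to_char_index` are made up for illustration.

```python
def parse_misses(pipe_output):
    """Collect (word, offset, suggestions) from ispell-style pipe output.

    Assumes the conventional result lines:
      & <word> <count> <offset>: <suggestion>, <suggestion>, ...
      # <word> <offset>
    Every other line (banner, '*', blank) is skipped.
    """
    misses = []
    for line in pipe_output.splitlines():
        if line.startswith("& "):
            head, _, tail = line.partition(": ")
            _, word, _count, offset = head.split(" ")
            misses.append((word, int(offset), tail.split(", ")))
        elif line.startswith("# "):
            _, word, offset = line.split(" ")
            misses.append((word, int(offset), []))
    return misses


def byte_to_char_index(text, byte_offset, encoding="utf-8"):
    """Convert a byte offset into a character index.

    Assumption: the checker reports 0-based byte offsets into the line.
    """
    prefix = text.encode(encoding)[:byte_offset]
    return len(prefix.decode(encoding, errors="ignore"))


# Output copied from the run quoted in this thread (banner line omitted).
sample = "*\n*\n& geniral 5 10: general, genital, genial, generally, generals\n*\n"
# Assumed input line; \u2014 stands in for the long dash from the wiki text.
line = "scope \u2014 geniral usage"

for word, offset, _suggestions in parse_misses(sample):
    start = byte_to_char_index(line, offset)
    print(word, start, start + len(word))
```

If the offsets do hold up, the start and end indexes come straight from aspell and my own word count no longer has to agree with its idea of a word.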
Is there any information out there that gives the exact algorithm (the name of the source file would do), or even better, a regular expression to use? I tried looking through the docs but didn't find anything except for some info on 8 bit chars. It said that it will convert UTF-8 chars to 8 bit chars. Maybe this is why it was counted as a word?

Well I will keep plugging away at it. Any suggestions will be greatly appreciated.

Thanks!
...aaron

On 8/14/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I see in your sample url following:
> scope â€" geniral usage
>
> Therefore for me aspell works (almost) perfectly
> considering
> scope
> â€"
> usage
> as correct
> and geniral as an error
>
> In my opinion your algorithm should consider
> â€" as a word, and that would fix the problem.
>
> -eleonora
>
> > Hello,
> >
> > I am playing around with aspell as a server side spell checker for a
> > flash application. It works beautifully (and fast as hell too!), but I
> > did notice one little oddity that I haven't been able to find an
> > explanation for in the docs.
> >
> > The problem happens when there is a special character in the text. I
> > am not sure all of the special characters that cause my word counting
> > algorithm to fail, but here is an example of the one that caused
> > breakage (one of those long dashes that was in some text copied from a
> > wiki).
> >
> > http://labs.splashlabs.com/spellcheck/1186978249
> >
> > When I pipe the above file through aspell (en_US), i get back the result:
> >
> > aspell -a < 1186978249
> > @(#) International Ispell Version 3.1.20 (but really Aspell 0.50.5)
> > *
> > *
> > & geniral 5 10: general, genital, genial, generally, generals
> > *
> >
> > So it appears to count the lone character as a word. In my own
> > program, I have to count words to find the start and end char indexes
> > of the incorrect word. Since my algorithm does not count it as a word,
> > my word count becomes off.
> >
> > Are there any options I can pass to prevent it from being counted? Or
> > is there a way to figure out what all is counted as a word so I can
> > match my own regex to it?
> >
> > Thanks for any advice!
> > ...aaron
>
> _______________________________________________
> Aspell-user mailing list
> [email protected]
> http://lists.gnu.org/mailman/listinfo/aspell-user

--
Aaron Miller
Chief Technology Officer
Splash Labs, LLC.
[EMAIL PROTECTED] | 206-328-5485
http://www.splashlabs.com
