I think my last email went to the wrong place. If not, I apologize. Here it is...
On 8/15/07, Aaron Miller <[EMAIL PROTECTED]> wrote:
> I agree, that would be the ideal solution. The problem I'm facing with
> that however, is determining the correct parsing algorithm used by
> aspell. So far I've just been trying to "reverse engineer" it by trial
> and error. I thought I had it pretty much figured out until this
> popped up.
>
> Is there any information out there that gives the exact algorithm (the
> name of the source file would do), or even better, a regular
> expression to use? I tried looking through the docs but didn't find
> anything except for some info on 8 bit chars. It said that it will
> convert UTF-8 chars to 8 bit chars. Maybe this is why it was counted
> as a word?
>
> Well I will keep plugging away at it. Any suggestions will be greatly
> appreciated.
>
> Thanks!
> ...aaron
>
> On 8/14/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > I see in your sample url following:
> > scope â€" geniral usage
> >
> > Therefore for me aspell works (almost) perfectly
> > considering
> > scope
> > â€"
> > usage
> > as correct
> > and geniral as an error
> >
> > In my opinion your algorithm should consider
> > â€" as a word, and that would fix the problem.
> >
> > -eleonora
> >
> >
> > > Hello,
> > >
> > > I am playing around with aspell as a server side spell checker for a
> > > flash application. It works beautifully (and fast as hell too!), but I
> > > did notice one little oddity that I haven't been able to find an
> > > explanation for in the docs.
> > >
> > > The problem happens when there is a special character in the text. I
> > > am not sure all of the special characters that cause my word counting
> > > algorithm to fail, but here is an example of the one that caused
> > > breakage (one of those long dashes that was in some text copied from a
> > > wiki).
> > >
> > > http://labs.splashlabs.com/spellcheck/1186978249
> > >
> > > When I pipe the above file through aspell (en_US), i get back the result:
> > >
> > > aspell -a < 1186978249
> > > @(#) International Ispell Version 3.1.20 (but really Aspell 0.50.5)
> > > *
> > > *
> > > & geniral 5 10: general, genital, genial, generally, generals
> > > *
> > >
> > > So it appears to count the lone character as a word. In my own
> > > program, I have to count words to find the start and end char indexes
> > > of the incorrect word. Since my algorithm does not count it as a word,
> > > my word count becomes off.
> > >
> > > Are there any options I can pass to prevent it from being counted? Or
> > > is there a way to figure out what all is counted as a word so I can
> > > match my own regex to it?
> > >
> > > Thanks for any advice!
> > > ...aaron
> > >
> >
> > --
> > GMX FreeMail: 1 GB Postfach, 5 E-Mail-Adressen, 10 Free SMS.
> > Alle Infos und kostenlose Anmeldung: http://www.gmx.net/de/go/freemail
> >
> >
> >
> > _______________________________________________
> > Aspell-user mailing list
> > [email protected]
> > http://lists.gnu.org/mailman/listinfo/aspell-user
> >
> >
> --
> Aaron Miller
> Chief Technology Officer
> Splash Labs, LLC.
> [EMAIL PROTECTED] | 206-328-5485
> http://www.splashlabs.com
>

--
Aaron Miller
Chief Technology Officer
Splash Labs, LLC.
[EMAIL PROTECTED] | 206-328-5485
http://www.splashlabs.com

_______________________________________________
Aspell-user mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/aspell-user

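One possible way to avoid counting words at all, sketched in Python: the `& geniral 5 10: ...` line in the output quoted above already carries two numbers. In the ispell-compatible pipe format, the first is the number of suggestions and the second is the offset of the word within the input line. The function name `parse_pipe_line` and the dictionary keys below are made up for this illustration, and treating the offset as a byte offset into a UTF-8 line is an assumption that should be checked against the aspell version in use.

```python
import re

# "& word count offset: sug1, sug2, ..." -> misspelled, with suggestions
MISS_RE = re.compile(r"^& (\S+) (\d+) (\d+): (.*)$")
# "# word offset" -> misspelled, no suggestions available
NONE_RE = re.compile(r"^# (\S+) (\d+)$")


def parse_pipe_line(line):
    """Return a dict for a misspelling line, or None for anything else."""
    line = line.rstrip("\n")
    m = MISS_RE.match(line)
    if m:
        return {
            "word": m.group(1),
            "offset": int(m.group(3)),
            "suggestions": [s.strip() for s in m.group(4).split(",")],
        }
    m = NONE_RE.match(line)
    if m:
        return {"word": m.group(1), "offset": int(m.group(2)), "suggestions": []}
    return None  # "*", the "@(#)" banner, blank lines, and so on


if __name__ == "__main__":
    result = parse_pipe_line(
        "& geniral 5 10: general, genital, genial, generally, generals"
    )
    # Assumption: the sample line is UTF-8 and the long dash is U+2014,
    # which encodes to three bytes.
    raw = "scope \u2014 geniral usage".encode("utf-8")
    start = result["offset"]
    end = start + len(result["word"].encode("utf-8"))
    print(raw[start:end])
```

If the offset does turn out to be a byte offset, the start and end indexes come straight from the aspell output and the lone dash no longer matters. For character indexes, the bytes before `start` would still need to be decoded and counted.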