---------- Forwarded message ----------
Date: Mon, 30 Sep 2002 12:35:20 +0530
From: Dr Abhijit Das <[EMAIL PROTECTED]>
To: Kaushik Ghose <[EMAIL PROTECTED]>
Subject: Re: [ilug-cal] dictionary project


> So Barda (avijit das, <[EMAIL PROTECTED]>) has a list of words with his
> Bengali writer distribution, for his spell checker. There are 112,943
> words in that list, encoded in ISCII, Barda's Bengali writer format and one
> other format which I can't figure out right now. I don't know how many of
> those words are "duplicates", i.e. noun/verb forms etc.

No words are duplicated. The parts of speech of a word are recorded
on that word's line as codes: 0 (noun), 1 (adjective and adverb),
2 (pronoun), 3 (abyay, i.e. indeclinable), 4 (verb). Different verb
forms (kori, korun, koruk, korchhilo etc.) are, however, listed
separately. Without these verb forms the dictionary size is about
50,000 words.
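
To make the part-of-speech coding concrete, here is a minimal sketch of
reading such a list. It is not bwedit's own code. The line layout
(a word, whitespace, then one digit per part of speech), the function
names parse_entry and count_by_pos, and the sample entries are all
assumptions made for illustration; the real file layout is the one
described in the bwedit documentation.

```python
from collections import Counter

# Part-of-speech codes as listed in the message above.
POS_NAMES = {
    "0": "noun",
    "1": "adjective/adverb",
    "2": "pronoun",
    "3": "abyay",
    "4": "verb",
}


def parse_entry(line):
    """Split one line into (word, list of part-of-speech names).

    ASSUMPTION: the 'word<whitespace>digits' layout is hypothetical,
    not the documented bwedit format.
    """
    word, codes = line.split()
    return word, [POS_NAMES[c] for c in codes]


def count_by_pos(lines):
    """Count how many entries carry each part-of-speech code."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _, tags = parse_entry(line)
        counts.update(tags)
    return counts


# Made-up sample entries in romanised form, for illustration only.
sample = ["kori 4", "bhalo 01", "ami 2"]
print(count_by_pos(sample))
```

A tally like this would be one way to see how much of the list consists
of separately listed verb forms.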

But the greatest danger in using this dictionary is that it has
not been spell-checked yet! Since there are (apparently) no other
databases to check it against, this has to be done manually! A pain.
I would sincerely appreciate volunteers' efforts in this regard.

> I'm going to try to convert the ISCII part to unicode (utf-8) (there are
> code snippets out there which do this) and that should be a good spring
> board. (My plan right now is to tweak barda's list and his spell check
> algorithm to run on Lekho)

The latest version (3.0) of bwedit has a doc directory. One file in
that directory describes the ISCII encoding of the Bengali alphabet
in explicit detail. A second document describes the spell-checking
algorithm. If you read these, there should be no problem in porting
the database/spell-checker to any other system (like Lekho).

But the essential problem remains: spell-checking the spell-checker!!!

   Dr Abhijit Das
   Monday September 30 2002
   12:26 PM (IST)

   +-----------------------------------------------------------+
   | Dr Abhijit Das                                            |
   | Visiting Faculty, Department of Mathematics               |
   | Indian Institute of Technology, Kanpur 208 016, UP, India |
   | Phone: +91-512-597753 (off), +91-512-598334 (res)         |
   | E-mail: [EMAIL PROTECTED], [EMAIL PROTECTED]              |
   | URL: http://home.iitk.ac.in/~abhijit/                     |
   +-----------------------------------------------------------+


____________________________________________________________
To unsubscribe from this list [Bangla Penguin] send a mail
to [EMAIL PROTECTED] with 'unsubscribe banglapenguin'
in the subjectline and body.
Archive of this mailing list is available at
http://www.mail-archive.com/[email protected]
