Re: [lingu-dev] Assistance on Enconding different

Sunday Bolaji Fri, 06 Feb 2009 04:47:20 -0800

Hi,

Thanks for your response.

I am having a problem understanding why Hunspell does not give me suggestions such as ẹ̀kọ́ and ẹ̀kọ (which are present in my dictionary file) for eko, which is a likely misspelling of both words.

In general, what changes can I make to the TRY, REP, MAP and PHONE tables and to KEY so that Hunspell will include the words in brackets among its suggestions, as shown below? (I am including my TRY, REP, MAP, PHONE and KEY settings.)


My boss, Dr. Adegbola, said he has written you a letter inviting you to a spell checker meeting for African languages, to be held here in Nigeria. I hope you will be able to make it so that we can meet you.

Hunspell 1.2.7 output (words in parentheses are the suggestions we expected):

& eko 7 3: èkó, ko, oko, epo, ìko, eku, e ko (ẹ̀kọ́, ẹ̀kọ)

& èko 8 0: èkó, ko, èwo, oko, ìko, èso, èké, è ko (ẹ̀kọ́, ẹ̀kọ)

& ekó 6 0: èkó, kó, okó, ìkó, eku, e kó (ẹ̀kọ́, ẹ̀kọ)

& ẹ̀kọ̀ 7 0: ẹ̀kọ́, ẹ̀kọ, ẹ̀rọ̀, ẹ̀dọ̀, ẹ̀yọ̀, ẹ̀ kọ̀, ẹ̀-kọ̀

& ẹkọ 9 0: èkó, kọ, ẹ̀kọ, ẹbọ, akọ, ọkọ, ẹyọ, ẹkẹ, ẹ kọ (ẹ̀kọ́)

& ẹkọ̀ 5 0: kọ̀, ẹ̀kọ, àkọ̀, ọkọ̀, ẹ kọ̀ (ẹ̀kọ́)

& ẹkọ́ 7 0: ẹjọ́, kọ́, ẹ̀kọ́, ẹmọ́, ọkọ́, ìkọ́, ẹ kọ́ (ẹ̀kọ)

& ẹ́kọ́ 4 0: ẹ̀kọ́, ńkọ́, ẹ́ kọ́, ẹ́-kọ́ (ẹ̀kọ́, ẹ̀kọ)

& ekọ 6 0: èkó, kọ, akọ, ọkọ, eku, e kọ (ẹ̀kọ́, ẹ̀kọ)

& ẹko 6 0: èkó, ko, oko, ìko, ẹkẹ, ẹ ko (ẹ̀kọ́, ẹ̀kọ)

& èkọ 6 0: èkó, kọ, akọ, ọkọ, èké, è kọ (ẹ̀kọ́, ẹ̀kọ)

& ẹkó 6 0: èkó, kó, okó, ìkó, ẹkẹ, ẹ kó (ẹ̀kọ́, ẹ̀kọ)

& ọrọ 12 0: òro, orò, oró, rọ, tọrọ, ọrọ̀, ọmọ, ọkọ, ọlọ, àrọ, arọ, ọrẹ (ọ̀rọ̀)

& oro 9 0: orò, òro, oró, ro, oko, orí, ore, orù, o ro (ọ̀rọ̀, ọrọ̀)

& ọro 5 0: òro, orò, oró, ro, ọrẹ (ọ̀rọ̀, ọrọ̀)

& orọ 10 0: òro, orò, oró, rọ, àrọ, arọ, orí, ore, orù, o rọ (ọ̀rọ̀, ọrọ̀)

& ọ̀ro 2 0: òòró, ọ̀rá (ọ̀rọ̀, ọrọ̀)

& ọrò 6 0: òro, orò, oró, rò, ọrẹ, èrò (ọ̀rọ̀, ọrọ̀)

& ọ́rọ 3 0: òòró, ọ́ rọ, ọ́-rọ (ọ̀rọ̀, ọrọ̀)

& ọ̀rọ 6 0: òòró, ọ̀rọ̀, ọrọ̀, ọ̀bọ, ọ̀rá, ẹ̀rọ

*

& ọrọ́ 5 0: ọrọ̀, rọ́, ọkọ́, ọwọ́, ọrẹ́ (ọrọ̀)

*

& ọ́rọ́ 7 0: òórọ̀, ọ̀rọ̀, tọ́rọ́, pọ́rọ́, rọ́rọ́, ọ́ rọ́, ọ́-rọ́

& ọ́rọ̀ 7 0: òórọ̀, ọ̀rọ̀, ọrọ̀, tọ́rọ̀, lọ́rọ̀, ọ́ rọ̀, ọ́-rọ̀

& ọ̀rọ́ 7 0: òórọ̀, ọ̀rọ̀, ọ̀wọ́, ọ̀dọ́, ọ̀yọ́, ọ̀ṣọ́, ọ̀rẹ́ (ọrọ̀)

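
One way to see why the bracketed words are hard to reach is to count code points. The sketch below is our own illustration, not Hunspell code: it computes the Levenshtein edit distance between a typed word and an expected suggestion, counting each code point (including a combining tone mark) as one character. The words are written with escapes so that the precomposed dotted letters (U+1EB9 ẹ, U+1ECD ọ) and the combining marks are unambiguous.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, counted in Unicode code points."""
    # prev[j] is the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


pairs = [
    ("eko", "\u00e8k\u00f3"),                        # èkó (offered by Hunspell)
    ("eko", "\u1eb9\u0300k\u1ecd"),                  # ẹ̀kọ (expected, missing)
    ("eko", "\u1eb9\u0300k\u1ecd\u0301"),            # ẹ̀kọ́ (expected, missing)
    ("\u1ecdr\u1ecd", "\u1ecd\u0300r\u1ecd\u0300"),  # ọrọ -> ọ̀rọ̀
]
for typed, wanted in pairs:
    print(typed, "->", wanted, ":", edit_distance(typed, wanted), "edits")
```

Running it gives distances of 2, 3, 4 and 2: èkó, which Hunspell did offer for eko, needs two substitutions, while ẹ̀kọ́ needs four edits, two of them insertions of bare tone marks.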
NOTE: ọ̀, ọ́, ẹ̀ and ẹ́ are each a combination of two characters: ọ or ẹ followed by a combining tone mark (grave, U+0300, or acute, U+0301).

A sample of our affix file is also shown below:



SET  UTF-8 

KEY  ọwertyuiop|asdfghjkl|ṣẹbnm  

TRY    tmnkwlbàaáóoòprọ̀ọ́ọíìfdyṣsgẹ̀ẹ́ẹéèeùúuTMNṢLBÀÁAÒÓOPRÒÓỌÌÍIGÈÉẸÈÉE

REP  94 

REP  a  à 

REP  à  á 

REP  a  á 

REP  á  à 

REP  a  àà 

REP  à  àà 

REP  a  àá 

REP  à  àá 

REP  á  àá 

REP  a  áà 

REP  à  áà 

REP  á  áà 

REP  a  aa 

REP  a  aá 

REP  ai  àì 

REP  ai  a 

REP  ài  à

REP  ái  á 

REP  e  è 

REP  è  é 

REP  e  é 

REP  é  è 

REP  e  ẹ̀ 

REP  e  ẹ́ 

REP  ẹ  ẹ̀ 

REP  ẹ̀  ẹ́  

REP  ẹ  ẹ́            

REP  ẹ́  ẹ̀ 

REP  e  ẹ 

REP  è  ẹ̀ 

REP  é  ẹ́ 

REP  e  èè 

REP  è  èè  

REP  e  éè 

REP  e  éé 

REP  é  éé 

REP  e  èé 

REP  e  eé 

REP  e  ee 

REP  ẹ́  ẹ́ẹ̀ 

REP  e  ẹ́ẹ̀ 

REP  ẹ  ẹ́ẹ̀ 

REP  e  ẹ̀ẹ̀ 

REP  ẹ  ẹ̀ẹ̀ 

REP  ẹ  ẹ̀ẹ́ 

REP  e  ẹ̀ẹ́ 

REP  e  ẹẹ 

REP  ẹ  ẹẹ 

REP  i  ì 

REP  ì  í 

REP  i  í 

REP  í  ì 

REP  i  íì 

REP  i  in 

REP  n  ǹ 

REP  n  ń 

REP  o  ọ̀ 

REP  o  ọ́  

REP  o  ò 

REP  ò  ó 

REP  o  ó 

REP  ó  ò 

REP  ọ  ọ̀ 

REP  ọ̀  ọ́  

REP  ọ  ọ́  

REP  ọ́  ọ̀

REP  o  ọ  

REP  ò  ọ̀

REP  ó  ọ́ 

REP  o  òò 

REP  ò  òò 

REP  o  oo 

REP  o  oó 

REP  o  òó 

REP  o  ọ̀ọ̀ 

REP  ọ  ọ̀ọ̀ 

REP  ọ̀  ọ̀ọ̀ 

REP  ọ̀  ọ̀ọ́

REP  ọ  ọ̀ọ́ 

REP  o  ọ̀ọ́ 

REP  ọ́  ọ̀ọ́

REP  s  ṣ 

REP  ṣ  s 

REP  u  ù 

REP  u  ú 

REP  u  ùú 

REP  ù  ùú 

REP  ú  ùú 

REP  ù  ùù 

REP  u  ùù 

REP  h y 

REP  E Ẹ 

REP  S Ṣ 

REP  O Ọ  

   

MAP 12 

MAP àaá 

MAP ọ̀ọọ́óoò   

MAP ìíi 

MAP ṣs 

MAP ẹ̀ẹ́ẹèée 

MAP ǹńn 

MAP ùúu 

MAP SṢ 

MAP ÀÁA 

MAP Ọ̀Ọ́ỌÒÓO 

MAP ÌÍI 

MAP ẸÈÉE 

   

PHONE 37 

PHONE à a 

PHONE á a 

PHONE aa a 

PHONE ó o 

PHONE ò o 

PHONE ọ̀ o 

PHONE ọ o 

PHONE ọ́ o 

PHONE ọ̀ ọ 

PHONE oo o 

PHONE í i 

PHONE ì i 

PHONE ṣ s 

PHONE ẹ̀ e 

PHONE ẹ́ e 

PHONE ẹ e 

PHONE è e 

PHONE é e 

PHONE ee e 

PHONE ǹ n 

PHONE ń n 

PHONE ù u 

PHONE ú u 

PHONE uu u 

PHONE Ṣ S 

PHONE À A 

PHONE Á A 

PHONE Ò O 

PHONE Ọ̀ O 

PHONE Ó O 

PHONE Ọ́ O 

PHONE Ọ O 

PHONE Ì I 

PHONE Í I 

PHONE È E 

PHONE É E 

PHONE E Ẹ  

   

ICONV 7 

ICONV ọ  ọ 

ICONV ọ̀  ọ̀   

ICONV ọ́  ọ́ 

ICONV ṣ  ṣ 

ICONV ẹ̀  ẹ̀ 

ICONV ẹ́  ẹ́ 

ICONV ẹ  ẹ
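
The ICONV lines above normalise input that can arrive in more than one encoding. The archive does not preserve which side of each ICONV pair is decomposed, so this sketch assumes the left side is the base letter plus U+0323 COMBINING DOT BELOW and the right side is the precomposed letter. It compares such a literal table (the names `ICONV_TABLE` and `iconv` are hypothetical) with Python's standard `unicodedata` NFC normalisation.

```python
import unicodedata

# Assumed ICONV pairs: decomposed sequence -> precomposed letter (hypothetical).
ICONV_TABLE = {
    "o\u0323": "\u1ecd",  # o + dot below -> ọ
    "e\u0323": "\u1eb9",  # e + dot below -> ẹ
    "s\u0323": "\u1e63",  # s + dot below -> ṣ
}


def iconv(word: str, table: dict[str, str] = ICONV_TABLE) -> str:
    """Left-to-right, longest-match replacement using a literal table."""
    keys = sorted(table, key=len, reverse=True)
    out, i = [], 0
    while i < len(word):
        for k in keys:
            if word.startswith(k, i):
                out.append(table[k])
                i += len(k)
                break
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


decomposed = "o\u0323\u0300"  # o + dot below + grave
reordered = "\u00f2\u0323"    # ò + dot below (marks typed in the other order)

print(iconv(decomposed) == unicodedata.normalize("NFC", decomposed))
print(len(unicodedata.normalize("NFC", decomposed)))  # ọ̀ stays two code points
print(iconv(reordered) == unicodedata.normalize("NFC", reordered))
```

A literal table only catches the exact sequences it lists, whereas NFC also handles the case where the grave was typed before the dot below. Neither turns ọ̀ into a single code point, because Unicode has no precomposed character for it.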

Best regards,
Jeje

--- On Wed, 2/4/09, Németh László <[email protected]> wrote:
From: Németh László <[email protected]>
Subject: Re: [lingu-dev] Assistance on Enconding different
To: [email protected]
Cc: [email protected]
Date: Wednesday, February 4, 2009, 1:22 AM

Hi,

The second method could be better for suggestions. When several dictionaries are installed for the same locale, the spell checker component of OpenOffice.org 3.x lists suggestions in the following format:

suggestion_from_the_first_dictionary1
suggestion_from_the_first_dictionary2
suggestion_from_the_first_dictionary3
suggestion_from_the_second_dictionary1
suggestion_from_the_second_dictionary2
suggestion_from_the_second_dictionary3
etc.

So suggestions with different encodings appear in separate blocks. This is the preferred method if you want suggestions in multiple encodings.
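
The block ordering described above can be modelled in a few lines. This is a hypothetical sketch of the ordering only: the function name and the per-dictionary suggestion lists are made up, and it is not the OpenOffice.org API.

```python
def suggestion_blocks(per_dictionary: list[list[str]]) -> list[str]:
    """Concatenate suggestions dictionary by dictionary, keeping each block intact."""
    merged: list[str] = []
    for suggestions in per_dictionary:
        merged.extend(suggestions)
    return merged


# Hypothetical: dictionary 1 stores precomposed ọ (U+1ECD),
# dictionary 2 stores the decomposed form o + U+0323.
first = ["\u1ecd\u0300r\u1ecd\u0300", "\u1ecdr\u1ecd\u0300"]
second = ["o\u0323\u0300ro\u0323\u0300", "o\u0323ro\u0323\u0300"]
print(suggestion_blocks([first, second]))
```

On screen the two blocks look the same; only the underlying encodings differ, and the block ordering keeps them apart.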

Best regards,
László




2009/2/2 Sunday Bolaji <[email protected]>


Hi,
For the redundant dictionary, should we put all the words with different encodings in one dictionary file, or create a separate dictionary file for the words in each encoding?

Best regards
Jeje


--- On Mon, 2/2/09, Németh László <[email protected]> wrote:

From: Németh László <[email protected]>
Subject: Re: [lingu-dev] Assistance on Enconding different
To: [email protected], [email protected]

Date: Monday, February 2, 2009, 4:41 AM

Hi,

2009/2/2 Sunday Bolaji <[email protected]>


Hi,

I have tried your suggestion for a temporary solution to Unicode normalisation and it worked, but one thing is not clear to me: are we going to have a separate dictionary for all the words with different encodings, or are we putting them in our dictionary file?

Another thing I observed with Hunspell is that if the correct word in the dictionary file has more characters than the mistyped word, Hunspell suggests different words of the same length as the mistyped word. Examples are given below:

(1) "jókòó" is the correct word in the dictionary, but Hunspell will not suggest it if I type "joke", even though the REP table specifies replacing "o" with "òó". It only suggests "jókòó" if the mistyped word is "jokoo".

(2) "ọ̀rọ̀" is the correct word in the dictionary, but Hunspell will not suggest it if "ọrọ" is typed, even though the REP table specifies replacing "ọ" with "ọ̀". This is because "ọ" is a single precomposed character, while "ọ̀" is a combination of "ọ" and a tone mark. These similar characters are all listed in our REP table. Is there anything I can do to solve this problem?

REP and MAP suggestions are not combined with similarity algorithms, unlike the 
PHONE and ph: phonetic suggestions.
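
A simplified model makes the REP behaviour in example (2) concrete. It assumes, as we understand Hunspell's REP handling, that each candidate applies one REP replacement at one position and is then looked up in the dictionary. It is a sketch, not Hunspell's code; the small dictionary and the two REP pairs are taken from this thread.

```python
DICTIONARY = {
    "j\u00f3k\u00f2\u00f3",        # jókòó
    "\u1ecd\u0300r\u1ecd\u0300",   # ọ̀rọ̀
    "\u1ecdr\u1ecd\u0300",         # ọrọ̀
}
REP = [
    ("o", "\u00f2\u00f3"),         # REP o òó
    ("\u1ecd", "\u1ecd\u0300"),    # REP ọ ọ̀
]


def rep_suggest(word: str) -> list[str]:
    """Try every REP pair at every position, one replacement per candidate."""
    found: list[str] = []
    for pattern, replacement in REP:
        start = word.find(pattern)
        while start != -1:
            candidate = word[:start] + replacement + word[start + len(pattern):]
            if candidate in DICTIONARY and candidate not in found:
                found.append(candidate)
            start = word.find(pattern, start + 1)
    return found


print(rep_suggest("\u1ecdr\u1ecd"))  # only ọrọ̀: ọ̀rọ̀ would need two replacements
print(rep_suggest("joke"))           # []: jókòó differs in more than one place
```

This agrees with the Hunspell output for ọrọ at the top of the thread, where ọrọ̀ is offered but ọ̀rọ̀ is not.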



Check the following suggestion parameters:

-- affix file ---
PHONE 4
PHONE ó o
PHONE ò o
PHONE ọ̀ o
PHONE ọ o

Hunspell will convert "jókòó" to "jokoo" before comparing with the input word 
"joke".
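
A simplified sketch of the PHONE idea, using the four rules above: both dictionary words and the typed word are rewritten to a "phonetic" form and then compared. It treats the PHONE table as plain left-to-right, longest-match string rewriting (the real PHONE syntax, inherited from Aspell, supports more), and it uses `difflib`'s similarity ratio with a 0.6 cut-off as a stand-in for Hunspell's own scoring; both choices are assumptions.

```python
from difflib import SequenceMatcher

PHONE_RULES = {
    "\u00f3": "o",        # PHONE ó o
    "\u00f2": "o",        # PHONE ò o
    "\u1ecd\u0300": "o",  # PHONE ọ̀ o (ọ + combining grave)
    "\u1ecd": "o",        # PHONE ọ o
}


def phonetic(word: str) -> str:
    """Rewrite word with PHONE_RULES, preferring the longest matching rule."""
    patterns = sorted(PHONE_RULES, key=len, reverse=True)
    out, i = [], 0
    while i < len(word):
        for p in patterns:
            if word.startswith(p, i):
                out.append(PHONE_RULES[p])
                i += len(p)
                break
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


def phone_suggest(typed: str, dictionary: list[str], cutoff: float = 0.6) -> list[str]:
    """Return dictionary words whose phonetic form is close to the typed word's."""
    key = phonetic(typed)
    return [w for w in dictionary
            if SequenceMatcher(None, key, phonetic(w)).ratio() >= cutoff]


words = ["j\u00f3k\u00f2\u00f3", "\u1ecd\u0300r\u1ecd\u0300"]  # jókòó, ọ̀rọ̀
print(phonetic(words[0]))                      # jokoo
print(phone_suggest("joke", words))            # ['jókòó']
print(phone_suggest("\u1ecdr\u1ecd", words))   # ['ọ̀rọ̀']
```

Because ọrọ and ọ̀rọ̀ both become "oro", they match exactly after the rewrite, however many tone marks differ.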


You can use PHONE for normalization, too. Unfortunately, there was a potential problem with PHONE and diacritics under Windows, so it is better to use ph: fields (separated by tab characters) for OpenOffice.org 3.0. ph: fields can also work better for bigger word differences.



--- dic file ----
jókòó	ph:joko
ọ̀rọ̀	ph:oro
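
The dic entries above attach a ph: field to a word, separated by a tab. This hedged sketch shows how such entries could be read and used: it parses each line into the word and its ph: values, then offers the word when the typed string is close to any of its ph: values. The parser and the 0.7 similarity cut-off (`difflib` again standing in for Hunspell's scoring) are assumptions; Hunspell's real handling of ph: is not reproduced here.

```python
from difflib import SequenceMatcher

DIC_LINES = [
    "j\u00f3k\u00f2\u00f3\tph:joko",       # jókòó<TAB>ph:joko
    "\u1ecd\u0300r\u1ecd\u0300\tph:oro",   # ọ̀rọ̀<TAB>ph:oro
]


def parse_dic_line(line: str) -> tuple[str, list[str]]:
    """Split 'word<TAB>field<TAB>...' and collect the ph: values."""
    word, *fields = line.split("\t")
    ph_values = [f[len("ph:"):] for f in fields if f.startswith("ph:")]
    return word, ph_values


def ph_suggest(typed: str, lines: list[str], cutoff: float = 0.7) -> list[str]:
    """Suggest words whose ph: value is similar to the typed string."""
    result = []
    for line in lines:
        word, ph_values = parse_dic_line(line)
        if any(SequenceMatcher(None, typed, ph).ratio() >= cutoff for ph in ph_values):
            result.append(word)
    return result


print(parse_dic_line(DIC_LINES[0]))   # ('jókòó', ['joko'])
print(ph_suggest("joke", DIC_LINES))  # ['jókòó']
print(ph_suggest("oro", DIC_LINES))   # ['ọ̀rọ̀']
```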



Regards,
László



 







Best regards,
Jeje

---------------------------------------------------------------------

To unsubscribe, e-mail: [email protected]

For additional commands, e-mail: [email protected]
