Re: [lingu-dev] Assistance on Enconding different

Sunday Bolaji Mon, 02 Feb 2009 07:31:11 -0800

Hi,
    For the redundant dictionary, are we putting all the words with different
encodings in one dictionary file, or creating a separate dictionary file for
the words of each encoding?


Best regards
Jeje

--- On Mon, 2/2/09, Németh László <[email protected]> wrote:
From: Németh László <[email protected]>
Subject: Re: [lingu-dev] Assistance on Enconding different
To: [email protected], [email protected]
Date: Monday, February 2, 2009, 4:41 AM

Hi,

2009/2/2 Sunday Bolaji <[email protected]>

Hi,

I have tried your suggestion for a temporary solution to Unicode normalisation
and it worked, but one thing is not clear to me: are we going to have a
separate dictionary for all the words with different encodings, or are we
putting them into our dictionary file?

Another thing I observed with Hunspell is that if the correct word in the
dictionary file has more characters than the mistyped word, Hunspell will
suggest a different word of the same length as the mistyped word. Examples
are given below:

(1) "jókòó" is the correct word in the dictionary, but Hunspell will not
suggest it if I type "joke", even though the REP table specifies replacing
"o" with "òó". It only suggests "jókòó" if the mistyped word is "jokoo".

(2) "ọ̀rọ̀" is the correct word in the dictionary, but Hunspell will not
suggest it if "ọrọ" is typed, even though the REP table specifies replacing
"ọ" with "ọ̀". This is because "ọ" is a single precomposed character, while
"ọ̀" is a combination of "ọ" and a tone mark. The REP table for similar
characters is included below. Is there anything I can do to solve this
problem?
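The precomposed-versus-combining difference can be checked with Python's standard unicodedata module. This sketch (code_points is a hypothetical helper) prints the code points of each form under NFC and NFD normalisation. It shows that "ò" composes to one code point, while "ọ̀" stays two code points (ọ plus a combining grave accent) even in NFC, because Unicode has no precomposed "o with dot below and grave".

```python
import unicodedata

def code_points(text: str) -> list[str]:
    """Return the code points of `text` as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]

# Labels are plain ASCII; the strings use escapes so the encoding is unambiguous.
samples = {
    "o + grave": "o\u0300",
    "o dot below": "\u1ecd",
    "o dot below + grave": "\u1ecd\u0300",
}

for label, text in samples.items():
    for form in ("NFC", "NFD"):
        print(f"{label:20} {form}: {code_points(unicodedata.normalize(form, text))}")
```

Expected results: "o + grave" is ['U+00F2'] in NFC and ['U+006F', 'U+0300'] in NFD; "o dot below + grave" is ['U+1ECD', 'U+0300'] in NFC and ['U+006F', 'U+0323', 'U+0300'] in NFD.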
REP and MAP suggestions are not combined with similarity algorithms, unlike the 
PHONE and ph: phonetic suggestions.
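A simplified model may help explain why the two REP entries ("o òó" and "ọ ọ̀") did not produce these suggestions. The sketch assumes that each REP candidate applies one replacement at one position and is accepted only if it exactly matches a dictionary word, with no similarity scoring. This is an assumption about Hunspell's behaviour, not its actual code, and rep_candidates is a hypothetical helper.

```python
import unicodedata

def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)

def rep_candidates(word, rep_table, dictionary):
    """Hypothetical simplified model of REP-based suggestion.

    Assumption: each candidate applies ONE REP pair at ONE position and is
    kept only if it exactly matches a dictionary word (no similarity scoring).
    """
    word = nfc(word)
    words = {nfc(w) for w in dictionary}
    found = []
    for src, dst in rep_table:
        src, dst = nfc(src), nfc(dst)
        start = word.find(src)
        while start != -1:
            candidate = nfc(word[:start] + dst + word[start + len(src):])
            if candidate in words and candidate not in found:
                found.append(candidate)
            start = word.find(src, start + 1)
    return found

DICTIONARY = ["j\u00f3k\u00f2\u00f3", "\u1ecd\u0300r\u1ecd\u0300"]  # jókòó, ọ̀rọ̀
REP = [("o", "\u00f2\u00f3"), ("\u1ecd", "\u1ecd\u0300")]          # o -> òó, ọ -> ọ̀

print(rep_candidates("joke", REP, DICTIONARY))                 # only "jòóke" is tried
print(rep_candidates("\u1ecdr\u1ecd", REP, DICTIONARY))        # two tone marks missing
print(rep_candidates("\u1ecd\u0300r\u1ecd", REP, DICTIONARY))  # one tone mark missing
```

Under this model the first two calls return empty lists, since "jòóke" is not a word and a single replacement can restore only one of the two missing tone marks, while the third call finds "ọ̀rọ̀".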


Check the following suggestion parameters:

--- affix file ---
PHONE 4
PHONE ó o
PHONE ò o
PHONE ọ̀ o
PHONE ọ o

Hunspell will convert "jókòó" to "jokoo" before comparing it with the input
word "joke".

You can use PHONE for normalization, too. Unfortunately, there was a potential
problem with PHONE and diacritics under Windows, so it is better to use ph:
fields (separated by tabs) for OpenOffice.org 3.0. The ph: fields can also
work better for bigger differences between words.
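The PHONE idea can be sketched as a key function: apply the four rules from the affix file above in table order, then compare keys instead of raw words. The helper names and the use of plain Levenshtein distance are assumptions for illustration only; Hunspell's own phonetic matching and suggestion scoring differ.

```python
import unicodedata

# Hypothetical Python mirror of the PHONE rules above, kept in table order.
# "ọ̀" (U+1ECD + U+0300) must be handled before "ọ" (U+1ECD); otherwise the
# grave accent would be left behind as a combining mark on "o".
PHONE_RULES = [
    ("\u00f3", "o"),        # ó
    ("\u00f2", "o"),        # ò
    ("\u1ecd\u0300", "o"),  # ọ̀
    ("\u1ecd", "o"),        # ọ
]

def phonetic_key(word: str) -> str:
    # NFC first, so decomposed input matches the precomposed rule strings.
    key = unicodedata.normalize("NFC", word)
    for src, dst in PHONE_RULES:
        key = key.replace(src, dst)
    return key

def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute), one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

word, typed = "j\u00f3k\u00f2\u00f3", "joke"     # jókòó, joke
print(phonetic_key(word))                       # jokoo
print(levenshtein(word, typed))                 # 3 on the raw words
print(levenshtein(phonetic_key(word), typed))   # 2 on the keys
```

Comparing keys brings "jókòó" one edit closer to "joke", which is the kind of gain the PHONE table is meant to provide.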


--- dic file ---
jókòó	ph:joko
ọ̀rọ̀	ph:oro



Regards,
László



 







REP 94
REP a à
REP à á
REP a á
REP á à
REP a àà
REP à àà
REP a àá
REP à àá
REP á àá
REP a áà
REP à áà
REP á áà
REP a aa
REP a aá
REP ai àì
REP ai a
REP ài à
REP ái á
REP e è
REP è é
REP e é
REP é è
REP e ẹ̀
REP e ẹ́
REP ẹ ẹ̀
REP ẹ̀ ẹ́
REP ẹ ẹ́
REP ẹ́ ẹ̀
REP e ẹ
REP è ẹ̀
REP é ẹ́
REP e èè
REP è èè
REP e éè
REP e éé
REP é éé
REP e èé
REP e eé
REP e ee
REP ẹ́ ẹ́ẹ̀
REP e ẹ́ẹ̀
REP ẹ ẹ́ẹ̀
REP e ẹ̀ẹ̀
REP ẹ ẹ̀ẹ̀
REP ẹ ẹ̀ẹ́
REP e ẹ̀ẹ́
REP e ẹẹ
REP ẹ ẹẹ
REP i ì
REP ì í
REP i í
REP í ì
REP i íì
REP i in
REP n ǹ
REP n ń
REP o ọ̀
REP o ọ́
REP o ò
REP ò ó
REP o ó
REP ó ò
REP ọ ọ̀
REP ọ̀ ọ́
REP ọ ọ́
REP ọ́ ọ̀
REP o ọ
REP ò ọ̀
REP ó ọ́
REP o òò
REP ò òò
REP o oo
REP o oó
REP o òó
REP o ọ̀ọ̀
REP ọ ọ̀ọ̀
REP ọ̀ ọ̀ọ̀
REP ọ̀ ọ̀ọ́
REP ọ ọ̀ọ́
REP o ọ̀ọ́
REP ọ́ ọ̀ọ́
REP s ṣ
REP ṣ s
REP u ù
REP u ú
REP u ùú
REP ù ùú
REP ú ùú
REP ù ùù
REP u ùù
REP h y
REP E Ẹ
REP S Ṣ
REP O Ọ



Best regards,
Jeje