Hi,

2009/2/6 Sunday Bolaji <[email protected]>:
> Hi,
> Thanks for your response.
> I am having a problem understanding why Hunspell does not give me suggestions
> such as ẹ̀kọ́ and ẹ̀kọ (which are present in my dictionary file) for eko,
> which is a likely misspelling of the two words.

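To make the difference between these words concrete, here is a minimal Python sketch (an illustration only, not Hunspell code; `code_points` and `combining_marks` are hypothetical helper names). It shows that ẹ̀kọ́ still contains two combining tone marks after NFC normalization, because Unicode has no precomposed character for ẹ̀ or ọ́, whereas èkó is fully precomposed.

```python
import unicodedata


def code_points(word):
    """List the code points of the NFC-normalized word as 'U+XXXX' strings."""
    nfc = unicodedata.normalize("NFC", word)
    return [f"U+{ord(ch):04X}" for ch in nfc]


def combining_marks(word):
    """Count the combining marks that remain after NFC normalization."""
    nfc = unicodedata.normalize("NFC", word)
    return sum(1 for ch in nfc if unicodedata.combining(ch) != 0)


# Two words from the report, written with escapes so that the encoding
# of the example itself is unambiguous.
WORD_A = "\u1eb9\u0300k\u1ecd\u0301"  # ẹ̀kọ́: dotted vowels plus tone marks
WORD_B = "\u00e8k\u00f3"              # èkó: precomposed vowels only

if __name__ == "__main__":
    for word in (WORD_A, WORD_B):
        print(word, code_points(word), combining_marks(word))
```

Any algorithm that treats one code point as one letter sees WORD_A as five units and WORD_B as three.
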
The problem is that Hunspell's TRY and MAP (and also the affix condition and
n-gram similarity algorithms) lack support for Unicode combining diacritical
marks.

> In general, what changes can I make in the TRY, REP table, MAP table,
> PHONE table and KEY so that Hunspell will suggest the words in brackets as part
> of the suggestions, as shown below? (I am including my TRY, REP, MAP,
> PHONE and KEY files.)

Thanks for the detailed bug report.

> My boss Dr. Adegbola said he has written you a letter inviting you to a
> spell checker meeting for African languages to be held here in Nigeria. I hope
> you will be able to make it so that we can meet you.

Thanks for the kind invitation. I'm afraid I cannot attend, but I would like
to implement combining diacritical mark support in the near future. Your
data will be a big help in checking the result.

Best regards,
László

> Hunspell 1.2.7
> & eko 7 3: èkó, ko, oko, epo, ìko, eku, e ko ( ẹ̀kọ́, ẹ̀kọ )
> & èko 8 0: èkó, ko, èwo, oko, ìko, èso, èké, è ko (ẹ̀kọ́, ẹ̀kọ )
> & ekó 6 0: èkó, kó, okó, ìkó, eku, e kó (ẹ̀kọ́, ẹ̀kọ )
> & ẹ̀kọ̀ 7 0: ẹ̀kọ́, ẹ̀kọ, ẹ̀rọ̀, ẹ̀dọ̀, ẹ̀yọ̀, ẹ̀ kọ̀, ẹ̀-kọ̀
> & ẹkọ 9 0: èkó, kọ, ẹ̀kọ, ẹbọ, akọ, ọkọ, ẹyọ, ẹkẹ, ẹ kọ (ẹ̀kọ́ )
> & ẹkọ̀ 5 0: kọ̀, ẹ̀kọ, àkọ̀, ọkọ̀, ẹ kọ̀ ( ẹ̀kọ́ )
> & ẹkọ́ 7 0: ẹjọ́, kọ́, ẹ̀kọ́, ẹmọ́, ọkọ́, ìkọ́, ẹ kọ́ ( ẹ̀kọ )
> & ẹ́kọ́ 4 0: ẹ̀kọ́, ńkọ́, ẹ́ kọ́, ẹ́-kọ́ ( ẹ̀kọ́, ẹ̀kọ )
> & ekọ 6 0: èkó, kọ, akọ, ọkọ, eku, e kọ ( ẹ̀kọ́, ẹ̀kọ )
> & ẹko 6 0: èkó, ko, oko, ìko, ẹkẹ, ẹ ko ( ẹ̀kọ́, ẹ̀kọ )
> & èkọ 6 0: èkó, kọ, akọ, ọkọ, èké, è kọ ( ẹ̀kọ́, ẹ̀kọ )
> & ẹkó 6 0: èkó, kó, okó, ìkó, ẹkẹ, ẹ kó ( ẹ̀kọ́, ẹ̀kọ )
> & ọrọ 12 0: òro, orò, oró, rọ, tọrọ, ọrọ̀, ọmọ, ọkọ, ọlọ, àrọ, arọ, ọrẹ ( ọ̀rọ̀ )
> & oro 9 0: orò, òro, oró, ro, oko, orí, ore, orù, o ro (ọ̀rọ̀, ọrọ̀ )
> & ọro 5 0: òro, orò, oró, ro, ọrẹ (ọ̀rọ̀, ọrọ̀ )
> & orọ 10 0: òro, orò, oró, rọ, àrọ, arọ, orí, ore, orù, o rọ (ọ̀rọ̀, ọrọ̀ )
> & ọ̀ro 2 0: òòró, ọ̀rá ( ọ̀rọ̀, ọrọ̀ )
> & ọrò 6 0: òro, orò, oró, rò, ọrẹ, èrò ( ọ̀rọ̀, ọrọ̀ )
> & ọ́rọ 3 0: òòró, ọ́ rọ, ọ́-rọ ( ọ̀rọ̀, ọrọ̀ )
> & ọ̀rọ 6 0: òòró, ọ̀rọ̀, ọrọ̀, ọ̀bọ, ọ̀rá, ẹ̀rọ
> *
> & ọrọ́ 5 0: ọrọ̀, rọ́, ọkọ́, ọwọ́, ọrẹ́ ( ọrọ̀ )
> *
> & ọ́rọ́ 7 0: òórọ̀, ọ̀rọ̀, tọ́rọ́, pọ́rọ́, rọ́rọ́, ọ́ rọ́, ọ́-rọ́
> & ọ́rọ̀ 7 0: òórọ̀, ọ̀rọ̀, ọrọ̀, tọ́rọ̀, lọ́rọ̀, ọ́ rọ̀, ọ́-rọ̀
> & ọ̀rọ́ 7 0: òórọ̀, ọ̀rọ̀, ọ̀wọ́, ọ̀dọ́, ọ̀yọ́, ọ̀ṣọ́, ọ̀rẹ́ ( ọrọ̀ ).
>
> NOTE: The " ọ̀, ọ́, ẹ̀, ẹ́ " are each a combination of two characters,
> " ọ or ẹ " and a tone mark:
>
>  ̀ ́
>
> The sample of our affix file is also shown below:
>
> SET UTF-8
> KEY ọwertyuiop|asdfghjkl|ṣẹbnm
> TRY tmnkwlbàaáóoòprọ̀ọ́ọíìfdyṣsgẹ̀ẹ́ẹéèeùúuTMNṢLBÀÁAÒÓOPRÒÓỌÌÍIGÈ
> ÉẸÈÉE
>
> REP 94
> REP a à
> REP à á
> REP a á
> REP á à
> REP a àà
> REP à àà
> REP a àá
> REP à àá
> REP á àá
> REP a áà
> REP à áà
> REP á áà
> REP a aa
> REP a aá
> REP ai àì
> REP ai a
> REP ài à
> REP ái á
> REP e è
> REP è é
> REP e é
> REP é è
> REP e ẹ̀
> REP e ẹ́
> REP ẹ ẹ̀
> REP ẹ̀ ẹ́
> REP ẹ ẹ́
> REP ẹ́ ẹ̀
> REP e ẹ
> REP è ẹ̀
> REP é ẹ́
> REP e èè
> REP è èè
> REP e éè
> REP e éé
> REP é éé
> REP e èé
> REP e eé
> REP e ee
> REP ẹ́ ẹ́ẹ̀
> REP e ẹ́ẹ̀
> REP ẹ ẹ́ẹ̀
> REP e ẹ̀ẹ̀
> REP ẹ ẹ̀ẹ̀
> REP ẹ ẹ̀ẹ́
> REP e ẹ̀ẹ́
> REP e ẹẹ
> REP ẹ ẹẹ
> REP i ì
> REP ì í
> REP i í
> REP í ì
> REP i íì
> REP i in
> REP n ǹ
> REP n ń
> REP o ọ̀
> REP o ọ́
> REP o ò
> REP ò ó
> REP o ó
> REP ó ò
> REP ọ ọ̀
> REP ọ̀ ọ́
> REP ọ ọ́
> REP ọ́ ọ̀
> REP o ọ
> REP ò ọ̀
> REP ó ọ́
> REP o òò
> REP ò òò
> REP o oo
> REP o oó
> REP o òó
> REP o ọ̀ọ̀
> REP ọ ọ̀ọ̀
> REP ọ̀ ọ̀ọ̀
> REP ọ̀ ọ̀ọ́
> REP ọ ọ̀ọ́
> REP o ọ̀ọ́
> REP ọ́ ọ̀ọ́
> REP s ṣ
> REP ṣ s
> REP u ù
> REP u ú
> REP u ùú
> REP ù ùú
> REP ú ùú
> REP ù ùù
> REP u ùù
> REP h y
> REP E Ẹ
> REP S Ṣ
> REP O Ọ
>
> MAP 12
> MAP àaá
> MAP ọ̀ọọ́óoò
> MAP ìíi
> MAP ṣs
> MAP ẹ̀ẹ́ẹèée
> MAP ǹńn
> MAP ùúu
> MAP SṢ
> MAP ÀÁA
> MAP Ọ̀Ọ́ỌÒÓO
> MAP ÌÍI
> MAP ẸÈÉE
>
> PHONE 37
> PHONE à a
> PHONE á a
> PHONE aa a
> PHONE ó o
> PHONE ò o
> PHONE ọ̀ o
> PHONE ọ o
> PHONE ọ́ o
> PHONE ọ̀ ọ
> PHONE oo o
> PHONE í i
> PHONE ì i
> PHONE ṣ s
> PHONE ẹ̀ e
> PHONE ẹ́ e
> PHONE ẹ e
> PHONE è e
> PHONE é e
> PHONE ee e
> PHONE ǹ n
> PHONE ń n
> PHONE ù u
> PHONE ú u
> PHONE uu u
> PHONE Ṣ S
> PHONE À A
> PHONE Á A
> PHONE Ò O
> PHONE Ọ̀ O
> PHONE Ó O
> PHONE Ọ́ O
> PHONE Ọ O
> PHONE Ì I
> PHONE Í I
> PHONE È E
> PHONE É E
> PHONE E Ẹ
>
> ICONV 7
> ICONV ọ ọ
> ICONV ọ̀ ọ̀
> ICONV ọ́ ọ́
> ICONV ṣ ṣ
> ICONV ẹ̀ ẹ̀
> ICONV ẹ́ ẹ́
> ICONV ẹ ẹ
>
> Best regards,
> Jeje
>
> --- On Wed, 2/4/09, Németh László <[email protected]> wrote:
>
> From: Németh László <[email protected]>
> Subject: Re: [lingu-dev] Assistance on Enconding different
> To: [email protected]
> Cc: [email protected]
> Date: Wednesday, February 4, 2009, 1:22 AM
>
> Hi,
>
> The second method could be better for suggestions. When multiple
> dictionaries are used for the same locale, the spell checker component of
> OpenOffice.org 3.x lists suggestions in the following format:
>
> suggestion_from_the_first_dictionary1
> suggestion_from_the_first_dictionary2
> suggestion_from_the_first_dictionary3
> suggestion_from_the_second_dictionary1
> suggestion_from_the_second_dictionary2
> suggestion_from_the_second_dictionary3
> etc.
>
> So the suggestions with different encodings are in different blocks. This is
> the preferred method if you want suggestions with multiple encodings.
>
> Best regards,
> László
>
> 2009/2/2 Sunday Bolaji <[email protected]>
>>
>> Hi,
>> For the redundant dictionary, are we putting all the words with
>> different encodings in one dictionary file, or creating one dictionary file
>> for each set of words with the same encoding?
>>
>> Best regards
>> Jeje
>>
>> --- On Mon, 2/2/09, Németh László <[email protected]> wrote:
>>
>> From: Németh László <[email protected]>
>> Subject: Re: [lingu-dev] Assistance on Enconding different
>> To: [email protected], [email protected]
>> Date: Monday, February 2, 2009, 4:41 AM
>>
>> Hi,
>>
>> 2009/2/2 Sunday Bolaji <[email protected]>
>>>
>>> Hi,
>>> I have tried your suggestion for a temporary solution to Unicode
>>> normalisation and it worked, but one thing is not clear to me: are we going
>>> to have a separate dictionary for all the words with a different encoding,
>>> or are we putting them in our dictionary file?
>>> Another thing I observed with Hunspell is that if the correct word in the
>>> dictionary file has more characters than the wrongly typed word,
>>> Hunspell will suggest a different word of the same length as the wrong word.
>>> Examples are given below:
>>> (1)
>>> "jókòó" is the correct word in the dictionary, but Hunspell will not suggest
>>> it if I type "joke", despite the REP table specifying that " o" be replaced
>>> with " òó ". It will only suggest " jókòó" if the wrongly typed word is
>>> " jokoo ".
>>>
>>> (2) " ọ̀rọ̀ " is the correct word in the dictionary, but Hunspell will not
>>> suggest it if " ọrọ " is typed, despite the REP table specifying that
>>> " ọ " be replaced with " ọ̀ ". This is because " ọ " is a precomposed single
>>> character and " ọ̀ " is a combination of " ọ " and a tone mark. The REP table
>>> is shown for similar characters. Please, is there anything I can do to solve
>>> this problem?
>>
>> REP and MAP suggestions are not combined with similarity algorithms,
>> unlike the PHONE and ph: phonetic suggestions.
>>
>> Check the following suggestion parameters:
>>
>> -- affix file ---
>> PHONE 4
>> PHONE ó o
>> PHONE ò o
>> PHONE ọ̀ o
>> PHONE ọ o
>>
>> Hunspell will convert "jókòó" to "jokoo" before comparing with the input
>> word "joke".
>> You can use PHONE for normalization, too.
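
As a rough illustration of the normalization idea behind the quoted PHONE rules, here is a minimal Python sketch. It is not Hunspell's implementation, and the helper names `fold` and `fold_match` are hypothetical. It assumes that stripping every combining mark after NFD decomposition is an acceptable stand-in for the PHONE table; note that this folds the dot below as well as the tone marks.

```python
import unicodedata


def fold(word):
    """Reduce a word to its base letters.

    NFD splits precomposed letters into a base letter plus combining
    marks; all combining marks (tone marks and dot below) are dropped.
    """
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold_match(typed, dictionary):
    """Return the dictionary words whose folded form equals the folded input."""
    key = fold(typed)
    return [entry for entry in dictionary if fold(entry) == key]


if __name__ == "__main__":
    # Escapes keep the encoding of the sample words unambiguous.
    words = [
        "j\u00f3k\u00f2\u00f3",            # jókòó
        "\u1ecd\u0300r\u1ecd\u0300",       # ọ̀rọ̀
        "\u1eb9\u0300k\u1ecd\u0301",       # ẹ̀kọ́
        "\u1eb9\u0300k\u1ecd",             # ẹ̀kọ
    ]
    print([fold(w) for w in words])
    print(fold_match("eko", words))
```

With this folding, "jókòó" becomes "jokoo" and " ọ̀rọ̀ " becomes "oro", whether the dictionary stores precomposed or combining characters. An exact match on the folded form is stricter than the similarity comparison described above, so "joke" would still need a similarity measure on top of it.
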
>> Unfortunately, there was a
>> potential problem with PHONE and diacritics under Windows, so it is better to
>> use ph: fields (separated by tabulators) for OpenOffice.org 3.0. Also, ph:
>> can work better for bigger word differences.
>>
>> --- dic file ----
>> jókòó ph:joko
>>
>> ọ̀rọ̀ ph:oro
>>
>>
>> Regards,
>> László
>>
>>> REP 94
>>> REP a à
>>> REP à á
>>> REP a á
>>> REP á à
>>> REP a àà
>>> REP à àà
>>> REP a àá
>>> REP à àá
>>> REP á àá
>>> REP a áà
>>> REP à áà
>>> REP á áà
>>> REP a aa
>>> REP a aá
>>> REP ai àì
>>> REP ai a
>>> REP ài à
>>> REP ái á
>>> REP e è
>>> REP è é
>>> REP e é
>>> REP é è
>>> REP e ẹ̀
>>> REP e ẹ́
>>> REP ẹ ẹ̀
>>> REP ẹ̀ ẹ́
>>> REP ẹ ẹ́
>>> REP ẹ́ ẹ̀
>>> REP e ẹ
>>> REP è ẹ̀
>>> REP é ẹ́
>>> REP e èè
>>> REP è èè
>>> REP e éè
>>> REP e éé
>>> REP é éé
>>> REP e èé
>>> REP e eé
>>> REP e ee
>>> REP ẹ́ ẹ́ẹ̀
>>> REP e ẹ́ẹ̀
>>> REP ẹ ẹ́ẹ̀
>>> REP e ẹ̀ẹ̀
>>> REP ẹ ẹ̀ẹ̀
>>> REP ẹ ẹ̀ẹ́
>>> REP e ẹ̀ẹ́
>>> REP e ẹẹ
>>> REP ẹ ẹẹ
>>> REP i ì
>>> REP ì í
>>> REP i í
>>> REP í ì
>>> REP i íì
>>> REP i in
>>> REP n ǹ
>>> REP n ń
>>> REP o ọ̀
>>> REP o ọ́
>>> REP o ò
>>> REP ò ó
>>> REP o ó
>>> REP ó ò
>>> REP ọ ọ̀
>>> REP ọ̀ ọ́
>>> REP ọ ọ́
>>> REP ọ́ ọ̀
>>> REP o ọ
>>> REP ò ọ̀
>>> REP ó ọ́
>>> REP o òò
>>> REP ò òò
>>> REP o oo
>>> REP o oó
>>> REP o òó
>>> REP o ọ̀ọ̀
>>> REP ọ ọ̀ọ̀
>>> REP ọ̀ ọ̀ọ̀
>>> REP ọ̀ ọ̀ọ́
>>> REP ọ ọ̀ọ́
>>> REP o ọ̀ọ́
>>> REP ọ́ ọ̀ọ́
>>> REP s ṣ
>>> REP ṣ s
>>> REP u ù
>>> REP u ú
>>> REP u ùú
>>> REP ù ùú
>>> REP ú ùú
>>> REP ù ùù
>>> REP u ùù
>>> REP h y
>>> REP E Ẹ
>>> REP S Ṣ
>>> REP O Ọ
>>>
>>> Best
>>> regards, Jeje
