Re: Illume dictionary for Dutch (Nederlands)
Is it possible to put comments in the .dic file? If so, in what format? For example, only in the first couple of lines, each starting with a #.

___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community
Re: Illume dictionary for Dutch (Nederlands)
On Fri, 28 Nov 2008 00:20:38 +0100 Pander [EMAIL PROTECTED] babbled:

> Is it possible to put comments in the .dic file? If so, in what format?
> E.g. only the first couple of lines which start with a #.

no. it doesn't support comments.

--
- Codito, ergo sum
Re: Illume dictionary for Dutch (Nederlands)
On Thu, 2008-11-20 at 10:14 +1100, Carsten Haitzler wrote:

> (japanese)
> sakana -> さかな | 魚 | 肴 | 坂な | 茶菓な | 阪な | 差かな | 左かな | 差かな | 査かな | 鎖かな | サカナ | sakana

Hi raster,

I am curious how we can pass Unicode characters like those to applications via X. I thought the keyboard could only send keycodes. How does it work with the Illume keyboard?

-charlie
Re: Illume dictionary for Dutch (Nederlands)
Hi all,

I intend to generate the following:

- a: full list utf-8 (for 8 bit SMS and regular use, default)
- b: full list utf-8 GSM 03.38[1] (for 7 bit SMS)
- c: truncated list utf-8 (for 8 bit SMS and regular use)
- d: truncated list utf-8 GSM 03.38[1] (for 7 bit SMS, default)

[1] The utf-8 characters in this list are within the 7-bit range of GSM 03.38, see http://en.wikipedia.org/wiki/Short_message_service#GSM
Note that more characters

a and b will both have 250,000 words.
b will be a conversion, remapping and normalisation of a.
c and d are truncations and normalisations of a and b respectively.

For utf-16, a simple conversion of the utf-8 files can be used, but I'll leave this for now. This could result in two extra files.

Note that neither extended nor non-extended ASCII is available. Is this desirable? This can result in four extra files.

So, I can come up with 10 different files. Which are, according to you, the most useful?

Regards,

Pander
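To make the difference between the plain utf-8 lists and the GSM 03.38 lists above concrete, here is a minimal sketch that checks whether a word can be sent as 7-bit SMS text without any remapping. The character table is transcribed from the commonly published GSM 03.38 default alphabet plus its printable extension characters and should be verified against the specification before use; the function name and the sample words are made up for illustration and are not part of any Illume or OpenTaal tool.

```python
import unicodedata

# GSM 03.38 default alphabet (128 code points), as commonly published.
# Assumption: transcribed by hand, verify against the spec before relying on it.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Printable characters reachable through the escape (extension) table.
GSM_EXTENSION = "^{}\\[~]|€"
GSM_CHARS = frozenset(GSM_BASIC + GSM_EXTENSION)


def fits_gsm_7bit(word: str) -> bool:
    """Return True if every character of the word is in the GSM 03.38 set."""
    # Compose accents first so that 'e' + combining diaeresis becomes one character.
    composed = unicodedata.normalize("NFC", word)
    return all(ch in GSM_CHARS for ch in composed)


# Hypothetical sample words: é and ö are in the table, ë and ï are not.
words = ["café", "coördinatie", "reëel", "geïnd"]
gsm_only = [w for w in words if fits_gsm_7bit(w)]
print(gsm_only)
```

Words that fail the check are the ones that would need the remapping or normalisation step mentioned for lists b and d; the same check could equally run in the SMS application just before sending.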
Re: Illume dictionary for Dutch (Nederlands)
I have no idea... I might only make a new version with utf-8 encoded characters. :)

--
You are what you see.
Today is Prickle-Prickle, the 32nd day of The Aftermath in the YOLD 3174
+ No matter how much you do, you never do enough -- unknown
+ Whatever you do will be insignificant,
| but it is very important that you do it -- Gandhi
+ So let's do it...?
Re: Illume dictionary for Dutch (Nederlands)
Small correction to my text:

"Note that more characters"

must be

"Note that certain special characters are in GSM 03.38 which are not in extended ASCII"

Nevertheless, one complete utf-8 dictionary could be used by most applications, including SMS. The conversion I do for GSM 03.38 could also be done later, just before sending the SMS.
Re: Illume dictionary for Dutch (Nederlands)
On Thu, 20 Nov 2008 10:55:02 +0100 (CET) Pander [EMAIL PROTECTED] babbled:

> Nevertheless, one complete utf-8 dictionary could be used by most
> applications, also SMS. The conversion I do for GSM 03.38 could also be
> done later just before sending the SMS.

any dictionary should not care about gsm encodings. it should be just a utf8 dictionary file. it is the job of the sms app to convert normal utf8 unicode to whatever encoding used by the network, and back. :)

--
- Codito, ergo sum - I code, therefore I am --
The Rasterman (Carsten Haitzler) [EMAIL PROTECTED]
Illume dictionary for Dutch (Nederlands)
Hi all,

Together with http://opentaal.org, I'm working on a special Illume dictionary for Dutch word completion. It will be available in the near future.

Of course this particular word list is very long: it contains about 250,000 words and has a typical long tail. Many words or compounds occur only seldom in everyday use.

What would be a good cut-off point in number of words, also in terms of performance? The Portuguese list contains 56,609 words. Is this workable? How many does the English one contain?

Thanks,

Pander
Re: Illume dictionary for Dutch (Nederlands)
On Wed, Nov 19, 2008 at 11:25:22PM +0100, Pander wrote:

> What would be a good cut off point in number of words, also in terms of
> performance? The Portuguese list contains 56,609 words. Is this workable?
> How many does the English contain?

[EMAIL PROTECTED]:/usr/lib/enlightenment/modules/illume/dicts# wc -w *.dic
 196684 English_(US).dic
  10002 English_(US)_Small.dic
 113218 Portuguese (ASCII).dic
 319904 total

So it's a little under double the number of words.

Rui

--
Frink!
Today is Pungenday, the 31st day of The Aftermath in the YOLD 3174
+ No matter how much you do, you never do enough -- unknown
+ Whatever you do will be insignificant,
| but it is very important that you do it -- Gandhi
+ So let's do it...?
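A note on reading the wc output above: wc -w counts whitespace-separated tokens, not dictionary entries. Assuming these files use one word plus one frequency number per line (an assumption, though it matches the 56,609 Portuguese words quoted in the question), each entry contributes two tokens. The small sketch below redoes that arithmetic; the function name is made up for illustration.

```python
# Token counts exactly as reported by `wc -w` in the message above.
wc_tokens = {
    "English_(US).dic": 196684,
    "English_(US)_Small.dic": 10002,
    "Portuguese (ASCII).dic": 113218,
}


def entry_count(tokens: int, tokens_per_entry: int = 2) -> int:
    """Convert a wc -w token count into entries, assuming 'word freq' lines."""
    return tokens // tokens_per_entry


for name, tokens in wc_tokens.items():
    print(f"{name}: {entry_count(tokens)} entries")

# Ratio of the large English list to the Portuguese list.
ratio = wc_tokens["English_(US).dic"] / wc_tokens["Portuguese (ASCII).dic"]
print(f"English / Portuguese = {ratio:.2f}")
```

Under that assumption the Portuguese file holds 56,609 entries and the large English file 98,342, which is where "a little under double" comes from.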
Re: Illume dictionary for Dutch (Nederlands)
On Wed, 19 Nov 2008 23:25:22 +0100 Pander [EMAIL PROTECTED] babbled:

> What would be a good cut off point in number of words, also in terms of
> performance? The Portuguese list contains 56,609 words. Is this workable?
> How many does the English contain?

english is about 98,000, but remember english has very few changes in words for conjugation. i need to change the dict format to account for this and compress better i think. i do need to make a different entered text -> visible word mapping tho. this covers blind qwerty entry for accented words. i.e.:

(german)
fass -> Faß
brotchen -> Brötchen

(french)
cafe -> café
etage -> étage
francais -> Français

(japanese)
sakana -> さかな | 魚 | 肴 | 坂な | 茶菓な | 阪な | 差かな | 左かな | 差かな | 査かな | 鎖かな | サカナ | sakana

note that some languages can have 1 romanised input match multiple (different) displays of that word (japanese is king at this. chinese likely if using pinyin could be similar). right now the dict format doesn't allow for this and sure - i can extend with a list of displayed words. so currently the non-freq format is:

cafe
etage

with freq:

cafe 126
etage 98

i can add a display list:

cafe 126 cafe café
etage 98 étage

but the file will get bigger and bigger and get harder to auto-generate from input data. right now i am unsure of the exact strategy to take... but i'd like to cover as many languages as i can with 1 format and have minimal dict size overhead etc.

--
- Codito, ergo sum - I code, therefore I am --
The Rasterman (Carsten Haitzler) [EMAIL PROTECTED]
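The three line layouts described in the message above (word only, word with frequency, word with frequency and a display list) can be read by one small parser. This is only a sketch of how such a line could be interpreted, not Illume's actual loader: the names DictEntry and parse_line are hypothetical, fields are assumed to be whitespace-separated, and a missing display list is assumed to mean "display the typed word as-is".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DictEntry:
    typed: str                 # what the user enters on the plain qwerty keyboard
    freq: int = 0              # usage frequency, 0 when the line has none
    displays: List[str] = field(default_factory=list)  # words shown to the user


def parse_line(line: str) -> DictEntry:
    """Parse 'word', 'word freq' or 'word freq display...' into a DictEntry."""
    parts = line.split()
    if not parts:
        raise ValueError("empty dictionary line")
    typed = parts[0]
    freq = int(parts[1]) if len(parts) > 1 else 0
    # Assumption: with no display list, the typed word is also the displayed word.
    displays = parts[2:] or [typed]
    return DictEntry(typed, freq, displays)


for sample in ["cafe", "cafe 126", "cafe 126 cafe café", "etage 98 étage"]:
    print(parse_line(sample))
```

Keeping the display list optional means the existing files would still parse unchanged, and only entries that need accented or multiple display forms pay the size overhead raised in the message.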
Re: Illume dictionary for Dutch (Nederlands)
Pander wrote:

> Of course this particular word list is very long and contains about
> 250,000 words and has a typical long tail. Many words or compounds occur
> seldom in everyday use.
>
> What would be a good cut off point in number of words, also in terms of
> performance? The Portuguese list contains 56,609 words. Is this workable?
> How many does the English contain?

The Italian one can count as many as 500,000 words (to be short), but I can get a well-working dictionary only by using a smaller one (with about 150,000 words that I've selected by counting their Google popularity).

Btw, I've written more complete posts about this on the list...

--
Treviño's World - Life and Linux
http://www.3v1n0.net/
Re: Illume dictionary for Dutch (Nederlands)
On Thu, Nov 20, 2008 at 03:02:41AM +0100, Marco Trevisan (Treviño) wrote:

> The Italian one can count also 500'000 words (to be short), but I can get
> a well working dictionary only using a smaller one (with about 150'000
> words that I've taken counting its google popularity).

Well, since my list was based on a million words taken from the most printed daily newspaper in Portugal (I didn't count, but I still removed a lot of non-words like numbers, etc.), already with frequency data, my job was so much easier... :)

As for writing SMS/text messages... I haven't yet found a word that wasn't there (in fact my problem is that it so often is the first of several matches so I have to use the menu on the left), but I must confess to not being one of those whose primary use of the phone is SMS/text!

Rui

--
Frink!
Today is Prickle-Prickle, the 32nd day of The Aftermath in the YOLD 3174
+ No matter how much you do, you never do enough -- unknown
+ Whatever you do will be insignificant,
| but it is very important that you do it -- Gandhi
+ So let's do it...?
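For the cut-off question that runs through this thread, frequency data like the newspaper counts mentioned above makes the trade-off measurable: keep the N most frequent words and see what share of all occurrences they cover. The following is a minimal sketch of that idea only; the function name is hypothetical and the sample counts are invented, not taken from any of the word lists discussed here.

```python
from typing import Dict, List, Tuple


def truncate_by_frequency(
    freqs: Dict[str, int], max_words: int
) -> Tuple[List[Tuple[str, int]], float]:
    """Keep the max_words most frequent words and report corpus coverage."""
    # Sort by descending count; ties are broken alphabetically for stable output.
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = ranked[:max_words]
    total = sum(freqs.values())
    coverage = sum(count for _, count in kept) / total if total else 0.0
    return kept, coverage


# Invented sample counts, for illustration only.
sample = {"de": 500, "het": 300, "een": 150, "fiets": 40, "zeldzaamheid": 10}
kept, coverage = truncate_by_frequency(sample, 3)
print(kept)
print(f"coverage: {coverage:.0%}")
```

With a long-tailed list, coverage typically rises quickly for the first words and flattens afterwards, so a cut-off can be picked where adding more words stops improving coverage noticeably.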