Hi all,

I intend to generate the following:
 - a  full list, UTF-8 (for 8-bit SMS and regular use, default)
 - b  full list, UTF-8, GSM 03.38 [1] (for 7-bit SMS)
 - c  truncated list, UTF-8 (for 8-bit SMS and regular use)
 - d  truncated list, UTF-8, GSM 03.38 [1] (for 7-bit SMS, default)
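
A minimal Python sketch of how the 7-bit lists (b and d) and the truncated lists (c and d) could be derived. It is an illustration only, not the actual tooling: the names to_gsm and truncate are hypothetical, stripping diacritics is just one possible remapping, and it assumes that word frequencies are available as a word-to-count mapping.

```python
import unicodedata

# GSM 03.38 default alphabet (basic table, 127 characters without the escape code).
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿"
    "abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each of these costs two septets in a message.
GSM_EXTENSION = set("^{}\\[~]|€\f")


def to_gsm(word):
    """Normalise a word to NFC and remap it into GSM 03.38.

    Returns the remapped word, or None if a character cannot be remapped.
    Assumption: unsupported accented letters fall back to their base letter.
    """
    word = unicodedata.normalize("NFC", word)
    out = []
    for ch in word:
        if ch in GSM_BASIC or ch in GSM_EXTENSION:
            out.append(ch)
            continue
        # Decompose and keep only the base character (drops the diacritic).
        base = unicodedata.normalize("NFD", ch)[0]
        if base in GSM_BASIC:
            out.append(base)
        else:
            return None
    return "".join(out)


def truncate(freq, limit):
    """Return the `limit` most frequent words; `freq` maps word -> count."""
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:limit]]
```

For example, to_gsm("café") stays "café" because é is in the basic table, to_gsm("reëel") becomes "reeel" because ë is not, and truncate({"de": 10, "het": 8, "reëel": 1}, 2) gives ["de", "het"]. Whether dropping diacritics is acceptable for a given language is a separate question.
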

[1] The UTF-8 characters in this list are within the 7-bit range of GSM
03.38, see http://en.wikipedia.org/wiki/Short_message_service#GSM
Note that more characters

a and b will both have 250,000 words.
b will be a conversion, remapping and normalisation of a.
c and d are truncations and normalisations of a and b respectively.

For UTF-16, a simple conversion of the UTF-8 files can be used, but I'll
leave this for now. This could result in two extra files.

Note that neither extended nor non-extended ASCII is available. Is this
desirable? This can result in four extra files.

So, I can come up with 10 different files. Which do you think are the
most useful?

Regards,

Pander

On Thu, November 20, 2008 08:58, Rui Miguel Silva Seabra wrote:
> On Thu, Nov 20, 2008 at 03:02:41AM +0100, "Marco Trevisan (Treviño)"
> wrote:
>> Pander wrote:
>> > Of course this particular word list is very long and contains about
>> > 250,000 words and has a typical loooong tail. Many words or compositions
>> > or occur seldom in average day use.
>> >
>> > What would be a good cut off point in number of words, also in terms of
>> > performance?
>> >
>> > The Portuguese list contains 56,609 words. Is this workable? How many
>> > does the English contain?
>>
>> The Italian one can count also 500'000 words (to be short), but I can
>> get a well working dictionary only using a smaller one (with about
>> 150'000 words that I've taken counting its google popularity).
>>
>> Btw I've written more complete posts about this on the list...
>
> Well, since my basis was based on a million words taken from the most
> printed daily newspaper in Portugal (I didn't count but still I removed
> a lot of non words like numbers, etc...) already with frequency data, my
> job was so much easier... :)
>
> As for writing SMS/text messages... I haven't found yet a word that
> wasn't there (in fact my problem is that it so often is the first of
> several matches so I have to use the menu on the left) but I must
> confess to not be one of those whose primary use of the phone is
> SMS/text!
>
> Rui
>
> --
> Frink!
> Today is Prickle-Prickle, the 32nd day of The Aftermath in the YOLD 3174
> + No matter how much you do, you never do enough -- unknown
> + Whatever you do will be insignificant,
> | but it is very important that you do it -- Gandhi
> + So let's do it...?
>
> _______________________________________________
> Openmoko community mailing list
> community@lists.openmoko.org
> http://lists.openmoko.org/mailman/listinfo/community