I have no idea... I might only make a new version with utf-8 encoded characters. :)

On Thu, Nov 20, 2008 at 10:40:46AM +0100, Pander wrote:
> Hi all,
>
> I intend to generate the following:
> - a full list utf-8 (for 8 bit SMS and regular use, default)
> - b full list utf-8 GSM 03.38[1] (for 7 bit SMS)
> - c truncated list utf-8 (for 8 bit SMS and regular use)
> - d truncated list utf-8 GSM 03.38[1] (for 7 bit SMS, default)
>
> [1] These utf-8 characters in this list are within the 7-bit range of GSM
> 03.38, see http://en.wikipedia.org/wiki/Short_message_service#GSM Note
> that more characters
>
> a and b will both have 250,000 words
> b will be conversion, remapping and normalisation of a
> c and d are truncations and normalisation of respectively a and b
>
> For utf-16, a simple conversion of the utf-8 files can be used, but I'll
> leave this for now. This could result in two extra files.
>
> Note that neither extended nor non-extended ASCII is available. Is this
> desirable? This can result in four extra files.
>
> So, I can come up with 10 different files. Which do you think are the
> most useful?
>
> Regards,
>
> Pander
>
> On Thu, November 20, 2008 08:58, Rui Miguel Silva Seabra wrote:
> > On Thu, Nov 20, 2008 at 03:02:41AM +0100, "Marco Trevisan (Treviño)"
> > wrote:
> >> Pander wrote:
> >> > Of course this particular word list is very long and contains about
> >> > 250,000 words and has a typical loooong tail. Many words or
> >> > compositions occur seldom in everyday use.
> >> >
> >> > What would be a good cut off point in number of words, also in terms
> >> > of performance?
> >> >
> >> > The Portuguese list contains 56,609 words. Is this workable? How many
> >> > does the English contain?
> >>
> >> The Italian one can also count 500'000 words (to be short), but I can
> >> get a well-working dictionary only by using a smaller one (with about
> >> 150'000 words that I've selected by their Google popularity).
> >>
> >> Btw I've written more complete posts about this on the list...
> >
> > Well, since my basis was a million words taken from the most
> > printed daily newspaper in Portugal (I didn't count but still I removed
> > a lot of non-words like numbers, etc...) already with frequency data, my
> > job was so much easier... :)
> >
> > As for writing SMS/text messages... I haven't yet found a word that
> > wasn't there (in fact my problem is that so often it isn't the first of
> > several matches, so I have to use the menu on the left) but I must
> > confess to not being one of those whose primary use of the phone is
> > SMS/text!
> >
> > Rui
> >
> > --
> > Frink!
> > Today is Prickle-Prickle, the 32nd day of The Aftermath in the YOLD 3174
> > + No matter how much you do, you never do enough -- unknown
> > + Whatever you do will be insignificant,
> > | but it is very important that you do it -- Gandhi
> > + So let's do it...?
> >
> > _______________________________________________
> > Openmoko community mailing list
> > community@lists.openmoko.org
> > http://lists.openmoko.org/mailman/listinfo/community

--
You are what you see.
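
For what it's worth, the "remapping and normalisation" for variants b and d and the truncation for c and d quoted above could be sketched roughly as below. This is only an illustration, not Pander's actual tooling: the names `GSM_BASIC`, `is_gsm_7bit`, `remap_to_gsm` and `truncate` are made up, the character table is a hand transcription of the GSM 03.38 basic alphabet (without the extension table) that should be checked against the spec, and stripping accents is just one assumed remapping policy.

```python
import string
import unicodedata

# Printable characters of the GSM 03.38 basic alphabet, transcribed by hand
# (assumption: verify against the specification before relying on it).
# The extension table (^ { } \ [ ~ ] | and the euro sign) is left out on purpose.
GSM_BASIC = set(
    "@£$¥èéùìòÇØøÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./:;<=>?¡ÄÖÑܧ¿äöñüà"
    + string.ascii_letters
    + string.digits
)


def is_gsm_7bit(word):
    """True if every character of the NFC-normalised word is in GSM_BASIC."""
    return all(ch in GSM_BASIC for ch in unicodedata.normalize("NFC", word))


def remap_to_gsm(word):
    """Normalise to NFC, then strip accents from characters outside GSM_BASIC.

    Characters with no decomposition are passed through unchanged, so the
    result should still be checked with is_gsm_7bit().
    """
    out = []
    for ch in unicodedata.normalize("NFC", word):
        if ch in GSM_BASIC:
            out.append(ch)
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


def truncate(freqs, limit):
    """Keep the `limit` most frequent words; ties are broken alphabetically."""
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]
```

With hypothetical input, `remap_to_gsm("reëel")` drops the diaeresis because "ë" is not in the basic alphabet, while "é" is kept as it is. Truncation needs frequency data per word, which is what made the Portuguese list easy to cut.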