Small correction to my text: "Note that more characters" should read "Note that GSM 03.38 contains certain special characters that are not in extended ASCII."
Nevertheless, a single complete UTF-8 dictionary could be used by most applications, including SMS. The conversion I do for GSM 03.38 could also be done later, just before sending the SMS.

On Thu, November 20, 2008 10:44, Rui Miguel Silva Seabra wrote:
> I have no idea... I might only make a new version with utf-8 encoded
> characters. :)
>
>
> On Thu, Nov 20, 2008 at 10:40:46AM +0100, Pander wrote:
>> Hi all,
>>
>> I intent to generate the following:
>> - a full list utf-8 (for 8 bit SMS and regular use, default)
>> - b full list utf-8 GSM 03.38[1] (for 7 bit SMS)
>> - c truncated list utf-8 (for 8 bit SMS and regular use)
>> - d truncated list utf-8 GSM 03.38[1] (for 7 bit SMS, default)
>>
>> [1] These utf-8 characters in this list are within the 7-bit range of
>> GSM 03.38, see http://en.wikipedia.org/wiki/Short_message_service#GSM
>> Note that more characters
>>
>> a and b will both have 250,000 words
>> b will be conversion, remapping and normalisation of a
>> c and d are truncations and normalisation of respectively a and b
>>
>> For utf-16, a simple conversion of the utf-8 files can be used, but I'll
>> leave this for now. This could result in two extra files.
>>
>> Note that nor extended nor non-extended ASCII is available. Is this
>> desirable? This can result in four extra files.
>>
>> So, I can come up with 10 different files. Which are according to you
>> the most useful?
>>
>> Regards,
>>
>> Pander
>>
>> On Thu, November 20, 2008 08:58, Rui Miguel Silva Seabra wrote:
>> > On Thu, Nov 20, 2008 at 03:02:41AM +0100, "Marco Trevisan (Treviño)"
>> > wrote:
>> >> Pander wrote:
>> >> > Of course this particular word list is very long and contains about
>> >> > 250,000 words and has a typical loooong tail. Many words or
>> >> > compositions or occur seldom in average day use.
>> >> >
>> >> > What would be a good cut off point in number of words, also in
>> >> > terms of performance?
>> >> >
>> >> > The Portuguese list contains 56,609 words. Is this workable? How
>> >> > many does the English contain?
>> >>
>> >> The Italian one can count also 500'000 words (to be short), but I can
>> >> get a well working dictionary only using a smaller one (with about
>> >> 150'000 words that I've taken counting its google popularity).
>> >>
>> >> Btw I've written more complete posts about this on the list...
>> >
>> > Well, since my basis was based on a million words taken from the most
>> > printed daily newspaper in Portugal (I didn't count but still I
>> > removed a lot of non words like numbers, etc...) already with
>> > frequency data, my job was so much easier... :)
>> >
>> > As for writing SMS/text messages... I haven't found yet a word that
>> > wasn't there (in fact my problem is that it so often is the first of
>> > several matches so I have to use the menu on the left) but I must
>> > confess to not be one of those whose primary use of the phone is
>> > SMS/text!
>> >
>> > Rui
>> >
>> > --
>> > Frink!
>> > Today is Prickle-Prickle, the 32nd day of The Aftermath in the YOLD 3174
>> > + No matter how much you do, you never do enough -- unknown
>> > + Whatever you do will be insignificant,
>> > | but it is very important that you do it -- Gandhi
>> > + So let's do it...?
>> >
>> > _______________________________________________
>> > Openmoko community mailing list
>> > community@lists.openmoko.org
>> > http://lists.openmoko.org/mailman/listinfo/community
>> >
>
> --
> You are what you see.
> Today is Prickle-Prickle, the 32nd day of The Aftermath in the YOLD 3174
> + No matter how much you do, you never do enough -- unknown
> + Whatever you do will be insignificant,
> | but it is very important that you do it -- Gandhi
> + So let's do it...?
_______________________________________________
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community