Carsten Haitzler (The Rasterman) wrote:
> On Thu, 02 Oct 2008 20:52:00 +0200 "Marco Trevisan (Treviño)" <[EMAIL PROTECTED]>
> babbled:
>
>> So this is a little utility I wrote [1] to check the frequency of each
>> word and writing back a new dictionary with frequency data.
>>
>> To run it you need php-cli (I guess v5 or above), set the given options,
>> do "php words-popularity.php" and wait the work to be finished! :P
>>
>> It could be a long work, but it should give good results.
>
> yes. it would. who wants to run it? :)
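
For readers who have not seen the script: the idea quoted above (look up how often each word occurs, then write the dictionary back with that number attached) can be sketched roughly as follows. This is not words-popularity.php. It is a Python illustration that counts words in a local text sample, and the function names and the "word count" output layout are assumptions, not the real dictionary format.

```python
import re
from collections import Counter


def count_words(text):
    """Count how often each lowercase word occurs in a text sample."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))


def annotate(words, counts):
    """Return 'word count' lines, most frequent first.

    Words never seen in the sample get a count of 0.
    The 'word count' layout is hypothetical.
    """
    ranked = sorted(words, key=lambda w: (-counts[w.lower()], w))
    return [f"{w} {counts[w.lower()]}" for w in ranked]


sample = "the cat and the dog and the bird"
lines = annotate(["bird", "cat", "the", "and", "zebra"], count_words(sample))
print("\n".join(lines))
```

With a real corpus the sample string would be replaced by the contents of whatever text source is chosen.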

I've done it for about 420,000 words. Dividing the work across 5 shells went
quite well and took a few hours, but now Google has blocked it. I didn't know
that I wasn't allowed to do it :/. I figure we should change our source :P.

> nb. i checked illume's kbd code - it does have issues with utf8 keysequences in
> sorted dicts. if you have any it'll fail to keep looking for more words so you
> need to remove anything utf8 from your dict :( yes - i know. bad. i need to
> address this. and the change in dict format i am sure 1. makes this now simple,
> 2. compresses the dict, 3. speeds it up, 4. solves this problem. :) but i just
> need to do it - no time right now :(

Yes, I agree with this. Using a better compressed format would improve
performance and allow more words to be added. I think the Qtopia DAWG format
is a good example of this. I just hope you'll find some time for it soon :P.

-- 
Treviño's World - Life and Linux
http://www.3v1n0.net/

_______________________________________________
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community