Hey raster,

How's it going? I promised you some frequency data a while back.

http://ucrel.lancs.ac.uk/bncfreq/flists.html
http://ucrel.lancs.ac.uk/bncfreq/lists/1_2_all_freq.txt
There are others as well.

Carsten Haitzler (The Rasterman) wrote:
> On Wed, 1 Oct 2008 21:05:53 -0600 "Ori Pessach" <[EMAIL PROTECTED]> babbled:
>
>> I understand what it's doing. It's not doing it well. I tried it for shell
>
> i disagree. it works like a charm for me - as per my previous mail - i can
> use it while walking down the street. more than i can say for pretty much
> any other virtual keyboard i have available to me.
>
>> input, and it was an unusable mess. I tried it for text messaging, and it
>
> why someone would use a language dictionary-based corrective keyboard for
> shell input beats me! in this case i call "silly user - using a motorcycle
> to deliver elephants" line :) use the terminal keyboard. use a stylus. thats
> what it was meant for. :)
>
>> was an unusable mess. It has no model of the likelihood of erroneous input
>
> it does. it absolutely does. maybe your fingers are incredibly off-center?
> here is the algorithm (and if u don't believe me - code is there to be
> read):
>
> it stores a press POINT (x,y). it looks for all keys whose center point is
> WITHIN f distance of x,y (f being the fuzz value - the .kbd file for the
> qwerty Default keyboard is 135 units wide, with fuzz radius of 20, so that's
> about 1/3rd of the keyboard that it searches through for a likely match).
> a likelihood factor (distance) per key found is allocated based on distance
> (0 == most likely, > 0 less likely the greater the value). each press is
> done this way EXCEPT if u hold for 0.25 sec then drag to select a key
> explicitly in zoom mode - then the ONLY key available for that word slot is
> that letter selected, given a distance of 0. as you type all permutations of
> letters are searched and put into a list - with each permutation given a
> distance metric based on the letters used (simply addition of the
> distances).
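
(A rough sketch of the lookup step described so far, as I read it. The key coordinates are made up and not taken from any real .kbd file, and I'm assuming plain Euclidean distance with an inclusive radius; only the fuzz value of 20 comes from the mail.)

```python
import math

# Hypothetical key centres (x, y) in layout units; NOT from a real .kbd file.
KEY_CENTERS = {
    "w": (20, 10), "e": (33, 10), "r": (46, 10),
    "s": (27, 23), "d": (40, 23), "f": (53, 23),
}
FUZZ = 20  # fuzz radius quoted in the mail


def candidates(x, y, keys=KEY_CENTERS, fuzz=FUZZ):
    """Return (key, distance) pairs for keys whose centre lies within fuzz of the press."""
    found = []
    for key, (kx, ky) in keys.items():
        dist = math.hypot(kx - x, ky - y)  # assumption: Euclidean distance
        if dist <= fuzz:                   # assumption: radius is inclusive
            found.append((key, dist))
    # Closest key first: distance 0 means "most likely".
    return sorted(found, key=lambda kd: kd[1])


if __name__ == "__main__":
    for key, dist in candidates(33, 10):
        print(key, round(dist, 1))
```

A press dead on the centre of "e" gives "e" a distance of 0 and pulls in its neighbours with larger distances, while keys outside the radius never show up as candidates.
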
> now this is combined with the dictionary's frequency metric (multiplied by
> an inverse) so the more likely the word is to be used the lower its distance
> becomes. words are sorted from most to least likely based on this metric
> then listed with most likely in the middle of the list, least likely to the
> left/right ends - which you may not see. the vertical list lists all matches
> from most to least likely (top to bottom) with 1 exception - EXACTLY what u
> typed is at the top. it absolutely has a fairly good idea of likelihood of
> error and likelihood of usage of a word etc. etc.
>
> eg:
>
> Press | Guess+dist
> e     | e+0 w+1 r+2 d+2 s+1
> r     | r+0 t+1 e+2 f+1 g+2 d+3
> k     | k+0 l+1 o+2 i+3 j+3
> d     | d+0 f+1 s+1 e+1 c+1 r+2 w+2
>
> so "erkd" has distance 0 - but its not a word in the dictionary at all, so
> thrown out. "wrkd" has distance 1, but not a word, "srkd" same, "etkd",
> "efkd", "erld", etc. etc.
>
> in the end it produces a list where most likely "world" ends up the word
> with other options too - and this is a much simplified list. mostly the list
> for candidate letters per input letter is about 10-12 letters. so u have
> 12*12*12*12 permutations for a 4 letter word - of which a fraction of that
> space is legitimate words. each permutation has a likelihood value based on
> press distance and on frequency of usage of that word in language in general
> in the dictionary.
>
> mind you - i AM talking about illume's keyboard, its algorithms as is in the
> image i built. if you use something else i cannot comment as it's something
> else.
>
>> (relatively low) and instead appears to look for the word with the closest
>> minimum edit distance to the user's input. This is nuts. I have never -
>
> it's not - as the edit distance is the likelihood of error. you likely press
> the key you want - or near it. thus keys near where you pressed are more
> likely than those further away. to limit the search, only keys up to a
> certain distance are searched.
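
To check my understanding of the scoring, here is a small Python sketch. The candidate table is the one quoted above; the dictionary and its counts are invented, and the weighting `(distance + 1) / frequency` is only my guess at what "multiplied by an inverse" means, not what the illume code actually does.

```python
from itertools import product

# Candidate letters and distances per press, copied from the table in the mail.
PRESSES = [
    {"e": 0, "w": 1, "r": 2, "d": 2, "s": 1},
    {"r": 0, "t": 1, "e": 2, "f": 1, "g": 2, "d": 3},
    {"k": 0, "l": 1, "o": 2, "i": 3, "j": 3},
    {"d": 0, "f": 1, "s": 1, "e": 1, "c": 1, "r": 2, "w": 2},
]

# Hypothetical dictionary: word -> usage count (made-up numbers).
DICTIONARY = {"weld": 5, "self": 50, "eels": 2}


def rank(presses, dictionary):
    """Return (word, distance, score) tuples, most likely word first."""
    results = []
    for letters in product(*presses):  # every permutation of candidate letters
        word = "".join(letters)
        freq = dictionary.get(word)
        if freq is None:
            continue  # not in the dictionary, so thrown out (e.g. "erkd")
        # Distance of a permutation is simply the sum of its letter distances.
        distance = sum(p[ch] for p, ch in zip(presses, letters))
        # Assumed weighting: frequent words get a lower (better) score.
        score = (distance + 1) / freq
        results.append((score, distance, word))
    return [(word, distance, score) for score, distance, word in sorted(results)]


if __name__ == "__main__":
    for word, distance, score in rank(PRESSES, DICTIONARY):
        print(word, distance, round(score, 2))
```

With these made-up counts, "self" outranks "weld" even though its summed distance is larger (5 against 4), which is the effect real frequency data should have.
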
> chances are that you do this:
>
> fingerprint:
>   ___
>  /~~~\
>  |~~~|
>  |~~~|
>   \x/
>    "
>
> where "x" is the pressure point reported on the touchscreen. the only info
> the touchscreen reports is the pressure point - nothing else. you think u
> press somewhere else, but don't. you know what u pressed by what key "pops
> up" - that lets u know pretty well how good your pressing of the screen is.
> this is just a hardware limit of a resistive touchscreen. the point of
> greatest pressure is used - not the middle point of the area in which skin
> contacts the screen. get the gpe-sketchbook and try to press with the flat
> of your finger and see just how far off your press point is. it may surprise
> you.
>
> as i said - it does have all the model and code and even data to do proper
> correction based on many factors. i do NOT have a dictionary with frequency
> info for all of english - there is a "small" english dict (5000 words) with
> some frequency info in it i managed to gather, but its very small.
>
> if you don't believe me - read the code, or do better. patches accepted, but
> i think the problem is just that the dictionary has no frequency info by
> default (a matter of simple lack of data) or how you press the screen. i
> suggest you pay close attention to how you type and see. yes the "black
> word" (in the black box) may not be always the word u want - but its most
> often that word or a word right next to it - as you use it it will learn. if
> you are using it for non-english stuff then you need a different dictionary.
>
>> literally - gotten the word I typed in. In the common use case, of a user
>> who enters a correct word, it invariably gets it wrong.
>>
>> Understanding what it's doing doesn't make it less of a nuisance.
>
> it does have a concept of frequency of words. i just dont have any DATA for
> that. the dict format it handles is:
> word1
> word2
> word3
>
> OR
> word1 20
> word2 434
> word3 1
>
> etc.
> look at the personal dict file.
> ~/.e/e/dicts-dynamic/personal.dic
>
> it saves usage frequency. this affects lookup likelihood. btw - for me it
> gets the word most of the time or the word is not the most likely but at
> least listed as one of the most likely. use it for a bit and it learns and
> gets better. if you wish to generate a dictionary with frequency info -
> please do so! i made it really easy.
>
_______________________________________________
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community