Re: [SHR] illume predictive keyboard is too slow
Carsten Haitzler (The Rasterman) wrote:
> On Thu, 05 Feb 2009 15:57:41 +0100 Helge Hafting helge.haft...@hist.no said:
>>>> Surely, when there is a keyboard anyway, a couple of extra keys won't
>>>> cost much. Not if they are on all phones, instead of only adapted ones.
>>>> The americans can use the extras as application hotkeys.
>>> oh its not the extra keys - its the variations in production.
>> I know. Which is why I suggest one single keyboard for all, with the
>> maximum number of keys instead of the minimum. That way, every language
>> (at least every latin-based language) can have a normal keyboard. No
>> problem for the english - it will work fine. Their extra keys can be
>> blank, or used as hotkeys. Users with other languages can add whatever
>> they need - and in the correct location too.
> that's not practical. have you SEEN all the accented characters
> available? its more than going to double the # of chars in a kbd.
> otherwise you then need a compose mode where multiple keystrokes give
> you æ or ø or ü or ñ etc. - and it's a combo you need to learn. you
> still need to offer all the accents then on such a kbd. like ~^'`,*
> (ãâáàäąå) which will drastically cramp the keyboard or make it yet
> another row bigger for everyone. (in addition to some form of compose
> key and specific compose logic).

Have you seen the various european layouts? None of the latin-based
keyboards have more than a handful of keys more than the english
keyboard. (Those with bucketloads of accents use a dead-key approach:
press ¨ then o to get ö, and so on.) So no need for a seriously cramped
keyboard.

Of course different languages will mostly re-use the same keys, so you
don't need a key for every possible letter. Only one key for each
non-ascii character people expect to find on a keyboard adapted to their
language. Look at the various keyboard layouts, pick the one with the
most extras, and you know how many keys are needed. Perhaps a few more
keys than that, as some add extra keys in different places. But not many
more. European pc keyboards tend to have 2 keys more than american ones;
the rest is done by shift states and/or dead keys. (Things like []/?
aren't directly accessible on a Norwegian keyboard, unlike american
keyboards.) One mechanical layout works for all of europe, you just have
different keycaps. And of course the american layout works too - they
get two do-nothing keys, that's all.

So, a keyboard with slightly more keys than what is needed for ascii
will be enough for all languages that extend the latin alphabet. Some
differently painted keytops will be needed, but that can be left to the
various national importers (for a mass-produced device) or to the
customers for a phone made in small series.

Helge Hafting
___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community
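The dead-key approach described here (press an accent key, then a base letter, to get one accented character) can be sketched as a tiny state machine. This is a minimal illustrative sketch, not any real layout: the two dead keys and their combination tables are assumptions for the example.

```python
# Minimal dead-key composer: an accent key sets pending state; the next
# base letter is combined with it, or emitted unchanged if no combination
# exists. A trailing dead key with no base letter is simply dropped.
DEAD_KEYS = {
    '"': {"o": "ö", "u": "ü", "a": "ä"},   # diaeresis dead key
    "'": {"e": "é", "a": "á", "o": "ó"},   # acute dead key
}

def compose(keystrokes):
    out, pending = [], None
    for key in keystrokes:
        if pending is not None:
            out.append(DEAD_KEYS[pending].get(key, key))
            pending = None
        elif key in DEAD_KEYS:
            pending = key            # wait for the base letter
        else:
            out.append(key)
    return "".join(out)
```

For example, `compose('"o')` yields "ö" while `compose("abc")` passes plain text through untouched, which is why dead keys add letters without adding a keyboard row.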
Re: [SHR] illume predictive keyboard is too slow
On Fri, 06 Feb 2009 13:20:53 +0100 Helge Hafting helge.haft...@hist.no said:
> [...]
> Perhaps a few more keys than that, as some add extra keys in different
> places. But not many more. European pc keyboards tend to have 2 keys
> more than american ones; the rest is done by shift states and/or dead
> keys. (Things like []/? aren't directly accessible on a Norwegian
> keyboard, unlike american keyboards.) One mechanical layout works for
> all of europe, you just have different keycaps. And of course the
> american layout works too - they get two do-nothing keys, that's all.

thats because they use composition, as i said above. and as i said, if 1
keyboard were to cover ALL of them it'd be BIG (in key count). as such
each european kbd covers just the language it intends to cover - thus
limiting extras.

> So, a keyboard with slightly more keys than what is needed for ascii
> will be enough for all languages that extend the latin alphabet. Some
> differently painted keytops will be needed, but that can be left to the
> various national importers (for a mass-produced device) or to the
> customers for a phone made in small series.
>
> Helge Hafting

--
- Codito, ergo sum - I code, therefore I am --
The Rasterman (Carsten Haitzler) ras...@rasterman.com
Re: [SHR] illume predictive keyboard is too slow
Carsten Haitzler (The Rasterman) wrote:
>> Surely, when there is a keyboard anyway, a couple of extra keys won't
>> cost much. Not if they are on all phones, instead of only adapted ones.
>> The americans can use the extras as application hotkeys.
> oh its not the extra keys - its the variations in production.

I know. Which is why I suggest one single keyboard for all, with the
maximum number of keys instead of the minimum. That way, every language
(at least every latin-based language) can have a normal keyboard. No
problem for the english - it will work fine. Their extra keys can be
blank, or used as hotkeys. Users with other languages can add whatever
they need - and in the correct location too.

> just a change in printing whats on the keys is not free. software
> keyboards are [...]

A project like openmoko has the option of leaving that to the users.
Supply a sheet with small letter stickers for all languages, and a
printed sheet showing where each letter normally goes for the
software-supported languages.

> [...] it sucks. but english is the lowest common denominator and thus
> most things tend to be built to support it - as it tends to keep more
> people happier than some other setup. if there was enough volume to
> make enough units for a particular language/locale/country - it'd be
> different. :)

I understand that there is little interest in making a phone
specifically for Norway, when the volumes are low. That doesn't mean the
lowest common denominator is the best way. A keyboard with several extra
blank keys, and the english qwerty printed on the keytops, will work
fine for the large group of english-language users. Norwegians like me
will simply put the ø and æ stickers on the 2 keys to the right of l,
and the å sticker on the key to the right of p. Those who want a
radically different layout, such as dvorak, take the lid off and
carefully rearrange the keytops. Same for azerty layouts. Programmable
is better, but if someone wants real keys that depress with a click,
then this is possible too.

Helge Hafting
Re: [SHR] illume predictive keyboard is too slow
On Thu, Feb 5, 2009 at 16:48, Laszlo KREKACS laszlo.krekacs.l...@gmail.com wrote:
> I simply confirmed the same problem exists for other languages too.

In polish, we are often communicating on IMs, SMSes, IRC, chats etc.
without polish accents (ą-a; ę-e; ó [which is pronounced as u]-o; ś-s;
ł-l; ż-z; ź-z; ć-c; ń-n). In SMS, to fit more chars in one message; in
IRC/IMs, to type faster, or to ask how to set the polish keyboard layout
in Linux ;D And some words mean something different after dropping the
accents (for example laska vs. łaska), but we don't have problems with
communicating in that way.
Re: [SHR] illume predictive keyboard is too slow
2009/2/5 Johny Tenfinger seba.d...@gmail.com:
> but we don't have problems with communicating in that way.

Unless you want to write a (semi)official document (like an email to
your boss, etc).

Best regards,
Laszlo
Re: [SHR] illume predictive keyboard is too slow
On Thu, Feb 5, 2009 at 17:30, Laszlo KREKACS laszlo.krekacs.l...@gmail.com wrote:
> Unless you want to write a (semi)official document (like an email to
> your boss, etc).

Then simply switch to the terminal-based keyboard without a dictionary
and with accents on the right alt key (like in the PC keyboard layout).
Re: [SHR] illume predictive keyboard is too slow
On Thu, 5 Feb 2009 16:59:45 +0100 Johny Tenfinger seba.d...@gmail.com said:
> In polish, we are often communicating on IMs, SMSes, IRC, chats etc.
> without polish accents (ą-a; ę-e; ó [which is pronounced as u]-o; ś-s;
> ł-l; ż-z; ź-z; ć-c; ń-n). In SMS, to fit more chars in one message; in
> IRC/IMs, to type faster, or to ask how to set the polish keyboard
> layout in Linux ;D And some words mean something different after
> dropping the accents (for example laska vs. łaska), but we don't have
> problems with communicating in that way.

cool. so you can survive. if an engine let you lazy-type without
choosing accents and put them in for you (or you could type a whole word
then just select the correctly accented one), then this may save you
time.
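The lazy-typing idea above (type unaccented, let the engine restore accents) can be sketched as a fold-and-lookup: index the dictionary by its accent-stripped form, then offer all accented candidates for what was typed. A minimal sketch; the four-word Polish list is purely illustrative.

```python
import unicodedata

def strip_accents(word):
    # Decompose to NFD and drop combining marks; map ł separately, since
    # ł does not decompose into l plus a combining mark.
    word = word.replace("ł", "l").replace("Ł", "L")
    nfd = unicodedata.normalize("NFD", word)
    return "".join(c for c in nfd if not unicodedata.combining(c))

# Dictionary keyed by accent-stripped form -> accented candidates.
WORDS = ["łaska", "laska", "żółw", "już"]
CANDIDATES = {}
for w in WORDS:
    CANDIDATES.setdefault(strip_accents(w), []).append(w)

def suggest(typed):
    # Fall back to the typed form itself when nothing is known.
    return CANDIDATES.get(strip_accents(typed), [typed])
```

Here `suggest("laska")` returns both "laska" and "łaska", which is exactly the laska/łaska ambiguity mentioned in the thread: the engine can only rank, not decide, without context.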
Re: [SHR] illume predictive keyboard is too slow
On Thu, 05 Feb 2009 15:57:41 +0100 Helge Hafting helge.haft...@hist.no said:
> Carsten Haitzler (The Rasterman) wrote:
>>> Surely, when there is a keyboard anyway, a couple of extra keys won't
>>> cost much. Not if they are on all phones, instead of only adapted
>>> ones. The americans can use the extras as application hotkeys.
>> oh its not the extra keys - its the variations in production.
> I know. Which is why I suggest one single keyboard for all, with the
> maximum number of keys instead of the minimum. That way, every language
> (at least every latin-based language) can have a normal keyboard. No
> problem for the english - it will work fine. Their extra keys can be
> blank, or used as hotkeys. Users with other languages can add whatever
> they need - and in the correct location too.

that's not practical. have you SEEN all the accented characters
available? its more than going to double the # of chars in a kbd.
otherwise you then need a compose mode where multiple keystrokes give
you æ or ø or ü or ñ etc. - and it's a combo you need to learn. you
still need to offer all the accents then on such a kbd. like ~^'`,*
(ãâáàäąå) which will drastically cramp the keyboard or make it yet
another row bigger for everyone. (in addition to some form of compose
key and specific compose logic). i am not saying to do it - but to me
that seems the job of specialised keyboards per language, not a
universal one.

>> just a change in printing whats on the keys is not free. software
>> keyboards are [...]
> A project like openmoko has the option of leaving that to the users.
> Supply a sheet with small letter stickers for all languages, and a
> printed sheet showing where each letter normally goes for the
> software-supported languages.
>> [...] it sucks. but english is the lowest common denominator and thus
>> most things tend to be built to support it - as it tends to keep more
>> people happier than some other setup. if there was enough volume to
>> make enough units for a particular language/locale/country - it'd be
>> different. :)
> I understand that there is little interest in making a phone
> specifically for Norway, when the volumes are low. That doesn't mean
> the lowest common denominator is the best way. A keyboard with several
> extra blank keys, and the english qwerty printed on the keytops, will
> work fine for the large group of english-language users. Norwegians
> like me will simply put the ø and æ stickers on the 2 keys to the right
> of l, and the å sticker on the key to the right of p. Those who want a
> radically different layout, such as dvorak, take the lid off and
> carefully rearrange the keytops. Same for azerty layouts. Programmable
> is better, but if someone wants real keys that depress with a click,
> then this is possible too.

this would be the best solution - if its a hardware keyboard. a software
keyboard is always the most flexible... but its currently also probably
the least usable, and not just because of software: a resistive screen
only accepts one touch at a time, and typing commonly means 2 touches at
a time (as you press the new key before you release your finger on the
old one). :(
Re: [SHR] illume predictive keyboard is too slow
On Thu, 5 Feb 2009 16:48:36 +0100 Laszlo KREKACS laszlo.krekacs.l...@gmail.com said:
> 2009/2/5 The Rasterman Carsten Haitzler ras...@rasterman.com:
>>> But there are other cases, where it is not that clear:
>>> ólt - pound (accusative)
>>> ölt - he killed
>>> olt - to graft
>> sure.. maybe being an english speaker.. this doesn't bother me so much
>> as english is full of such words... 1 word can have 2 or 3 or even
>> more very different meanings, written the same way. only context lets
>> you figure it out. so to me i go so.. what's the problem? :)
> Sure, many words can have different meanings. But you missed the point.
> When english has multiple meanings of a word, you pronounce it the same
> way; it is the same word. But with accents, you pronounce them very
> differently, because it is not the same word!

actually... no. there are cases where 1 word, written 1 way, can have
multiple meanings and be pronounced multiple ways... some examples: row,
wind, lead. use: i had a row on the lake! - ambiguous meaning when
written. could mean you rowed a boat on the lake, or had an argument on
the lake. pronunciation is different in the 2 row's, but when written,
it's the same.

> The correct analogy for english would be: Lets assume the character 'v'
> is just an accented version of the character 'n'. Now when you want to
> write vice president, you always end up with nice president. See the
> difference? Better example: merge the character e with a. I think you
> get the idea... (( Battar axampla: marga tha charactar a with a. I
> think you gat tha idaa... Can you decrypt it? Sure. By computer? Maybe.
> Was it nice to read? I highly doubt it. ))
>> i don't have the bandwidth to go solving every language on the
>> planet's input problems.
> I didnt ask you to do so. I said, you cant just ignore the accents,
> because, most of the time, it is not a modifier of a char but a whole
> other character. It is the same case Helge described at the beginning
> for the norwegian language (for/fôr, tå/ta). I simply confirmed the
> same problem exists for other languages too.

well hungarian created a more complex case with compound words that go
well beyond what german does. thats the problem :(
Re: [SHR] illume predictive keyboard is too slow
Hi!

> ok - so if a young person typed:
> Öt szép szűz
> it'd be:
> Ot szep szuz

((btw, the meaning of Öt szép szűz lány őrült írót nyúz is Five virgins
tire a crazy writer. It is the hungarian equivalent of The quick brown
fox jumps over the lazy dog))

Yes, and in that specific case it works (because none of the above words
(Ot, szep, szuz) has a meaning in the hungarian language, so you can
understand that example without accents). But there are other cases,
where it is not that clear:

ólt - pound (accusative)
ölt - he killed
olt - to graft

So when you see olt in the text you cant be sure it is olt, ólt or ölt
without analysing the whole sentence. The german example is a two-way
conversion: ü - ue, ß - ss. You can switch back and forth without losing
information. A simple word-based dictionary is limited anyway for the
hungarian language, where you can create a word as long as this:
elkelkáposztástalaníthatatlanságoskodásaitokért.

> ugh. so its like german. compound words get created a lot by just
> stringing multiple words together without a space. that's ok - as long
> as there arent a massive set of them... :)

But there are. Because this language is agglutinative. I'll explain the
difficulty a bit. In german you can create the following word:

wood [en] - Holz [de] - fa [hu]
house [en] - Haus [de] - ház [hu]
wooden house [en] - Holzhaus [de] - faház [hu]

So you glued together house and wood in one word. (this is your example:
stringing together without a space)

In german you can even create words from a verb plus a modifier, like:

to work [en], arbeiten [de], dolgoz [hu]
to ply [en], bearbeiten (be+arbeiten) [de], megdolgoz (meg+dolgoz) [hu]

It is the same process ;) There are many examples of this:

to link together [en], anschliessen (an+schliessen) [de] - összekapcsol
(össze+kapcsol) [hu]
to buy up [en], aufkaufen (auf+kaufen) [de] - felvásárol (fel+vásárol) [hu]

But in the hungarian language, we glue together everything. Some examples:

in house [en], im Haus [de], házban (ház+ban) [hu]
car [en], Wagen [de], kocsi [hu]
our car [en], unseren Wagen (unser+en Wagen) [de], kocsinkat
(kocsi+(u/ü)nk+(a/á/e/é)t) [hu]

So the possibilities are nearly infinite. Without analysing the sentence
and the word, you cant find the root word with the correct accent. And
finding the root word requires a spell checker (the best available for
the hungarian language is hunspell).

Summary:
- Losing the accents (in hungarian) most of the time results in
contradiction.
- Need a spell checker for suggesting the right accented word. (see:
http://hunspell.sourceforge.net/)

So creating an architecture for a spell checker is not a bad idea (for
future extensibility). It could be handy for english too. But for other
languages (ex: hungarian) it is maybe essential.

Sorry for being so tiresome.

Best regards,
Khiraly
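The gluing described here is why a plain word list falls short: lookup has to split compounds first. Below is a naive dictionary-based decompounding sketch; the three-word vocabulary is purely illustrative, and real Hungarian also needs suffix analysis (házban = ház+ban), which is the part hunspell's affix rules handle.

```python
# Naive decompounding by dictionary lookup: try every split point and
# accept a split whose head is a known word and whose tail decompounds
# recursively. Vocabulary is illustrative only.
VOCAB = {"fa", "ház", "kocsi"}

def decompound(word):
    if word in VOCAB:
        return [word]
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in VOCAB:
            rest = decompound(tail)
            if rest:
                return [head] + rest
    return None  # no split into known words found
```

So `decompound("faház")` finds ["fa", "ház"], but `decompound("házban")` fails with this vocabulary, illustrating why compound splitting alone is not enough for an agglutinative language.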
Re: OT: [SHR] illume predictive keyboard is too slow
I zapped into this thread because it was the only new mail in the
om-community folder and clicking it was the simplest way to mark it as
read. Somehow I got curious what that strange (hungarian) sentence has
to do with om, and found a nice pack of information about your (?)
language... Very interesting mail; that's what I love the free software
world for. :)

--
Marcel

On Wednesday 04 February 2009 16:37:56, Laszlo KREKACS wrote:
> [...]
Re: [SHR] illume predictive keyboard is too slow
On Wed, 4 Feb 2009 16:37:56 +0100 Laszlo KREKACS laszlo.krekacs.l...@gmail.com said:
> Hi!
>> ok - so if a young person typed:
>> Öt szép szűz
>> it'd be:
>> Ot szep szuz
> ((btw, the meaning of Öt szép szűz lány őrült írót nyúz is Five virgins
> tire a crazy writer. It is the hungarian equivalent of The quick brown
> fox jumps over the lazy dog))
> Yes, and in that specific case it works (because none of the above
> words (Ot, szep, szuz) has a meaning in the hungarian language, so you
> can understand that example without accents). But there are other
> cases, where it is not that clear:
> ólt - pound (accusative)
> ölt - he killed
> olt - to graft

sure.. maybe being an english speaker.. this doesn't bother me so much
as english is full of such words... 1 word can have 2 or 3 or even more
very different meanings, written the same way. only context lets you
figure it out. so to me i go so.. what's the problem? :)

> So when you see olt in the text you cant be sure it is olt, ólt or ölt
> without analysing the whole sentence. The german example is a two-way
> conversion: ü - ue, ß - ss. You can switch back and forth without
> losing information.

yup. as i speak german i have been using it as an example :)

> A simple word-based dictionary is limited anyway for the hungarian
> language, where you can create a word as long as this:
> elkelkáposztástalaníthatatlanságoskodásaitokért.

ugh. so its like german. compound words get created a lot by just
stringing multiple words together without a space. that's ok - as long
as there arent a massive set of them... :)

> But there are. Because this language is agglutinative. I'll explain the
> difficulty a bit.
> [...]
> But in the hungarian language, we glue together everything. Some
> examples:
> in house [en], im Haus [de], házban (ház+ban) [hu]
> car [en], Wagen [de], kocsi [hu]
> our car [en], unseren Wagen (unser+en Wagen) [de], kocsinkat
> (kocsi+(u/ü)nk+(a/á/e/é)t) [hu]
> So the possibilities are nearly infinite. Without analysing the
> sentence and the word, you cant find the root word with the correct
> accent.

oh dear. so you basically take the idea and run with it. nuts! like
asian langs... they dont even know what space is! :) (by asian i mean
korean, chinese, japanese).

> And finding the root word requires a spell checker (the best available
> for the hungarian language is hunspell).
> Summary:
> - Losing the accents (in hungarian) most of the time results in
> contradiction.
> - Need a spell checker for suggesting the right accented word. (see:
> http://hunspell.sourceforge.net/)
> So creating an architecture for a spell checker is not a bad idea (for
> future extensibility). It could be handy for english too. But for other
> languages (ex: hungarian) it is maybe essential.

originally i wanted to actually use aspell to do this... for the vkbd...
but its api just didnt cut it. i was wanting to re-use as much as
possible, but submitting the totally misspelt word from the kbd just
doesnt get you results in a spellchecker. (i hand-created some and fed
them to aspell to see what it did, and it was just useless). they are
used to 1 or 2 errors of certain kinds - maybe 3. but when every letter
is totally wrong you need an exhaustive search through permutations. :(
when kocsinkat is the word you wanted... but you actually typed
opdsomlsr ... try getting a speller to fix that! interestingly enough,
at least for the english equivalents: i wanted foolhardy and actually
typed gioljsefu... illume can and will correct it to foolhardy...
probably as the top or one of the top suggestions... which is a far cry
better than what aspell can dream of doing. it DOES have a limit that
the match needs exactly the same number of chars as the desired word -
but for now, lets assume you hit the kbd the right number of times and
its really just screen/finger accuracy fixing. i can't begin to imagine
the permutation searches needed for hungarian, as either you put all
permutations of all words in the dictionary (for german it's doable -
seemingly not for hungarian), or you need to start trying all sorts of
permutations of multiple words strung together for matches... man, thats
going to be nastiness. to be honest, i really can't see it being
possible to solve this without a lot of work. i don't have the bandwidth
to go solving every language
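The same-length matching illume is described as doing can be sketched as a key-adjacency filter: a dictionary word of the same length matches when each typed letter sits on or next to the intended key, which is why "gioljsefu" can still recover "foolhardy". This is a hypothetical reconstruction of the idea, not illume's actual code, and the layout grid ignores row stagger.

```python
# Same-length correction by key adjacency on a simplified QWERTY grid:
# a candidate matches if every typed letter is the target letter or one
# of its physical neighbours. Dictionary and layout are illustrative.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
POS = {c: (r, i) for r, row in enumerate(ROWS) for i, c in enumerate(row)}

def near(typed, target):
    # Neighbouring keys differ by at most one row and one column.
    (r1, c1), (r2, c2) = POS[typed], POS[target]
    return abs(r1 - r2) <= 1 and abs(c1 - c2) <= 1

def candidates(typed, words):
    return [w for w in words
            if len(w) == len(typed)
            and all(near(t, c) for t, c in zip(typed, w))]
```

For instance, for the sloppy input "cst" against the list ["cat", "car", "dog"], both "cat" and "car" survive the filter while "dog" is rejected; a frequency model would then rank the survivors.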
Re: [SHR] illume predictive keyboard is too slow
> 1. norwegian does allow for conversion to roman-only text. there are
> rules much like german.
> 2. this conversion isn't used much and is a last resort thing.
> 3. only a few special letters are needed for common use cases in
> addition to latin

Hi!

I'm just giving you some perspective ;) In Hungary the situation is much
like the norwegian one. We have two special accented characters (ő, ű)
which are not used in any other language; all the other accents are
present in the latin-1 char set (we use the latin-2 charset). In the
early computer era ő was matched to õ and ű was matched to û, so even
the early microsoft word didnt care about those special characters (and
used the latin-1 charset instead). But that is history now thanks to
utf-8 (accented filenames are still a nightmare though, especially when
restoring broken harddrives ;)

There is no romanization here, but young people/computer addicts tend to
type without accents. You cant decrypt it word by word; you need to
understand the whole sentence, so simple word correction does not work.
It is not like in Germany where you can write Tschüß as Tschuess.

So a standard was developed (which is not used anymore, as there is no
problem with accents nowadays), where all the accented characters are
written using a char plus a punctuation mark. I give you an example:

Öt szép szűz lány őrült írót nyúz
O:t sze'p szuz la'ny oru:lt i'ro't nyu'z.

Maybe you can use this idea. Or ignore utf-8 and use the corresponding
iso8859-1,2 etc charset, where one character is one byte. A simple
word-based dictionary is limited anyway for the hungarian language,
where you can create a word as long as this:
elkelkáposztástalaníthatatlanságoskodásaitokért.

Hope it helps something.

Best regards,
Laszlo
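The char-plus-punctuation convention above can be decoded with a simple two-character scan. The mapping below is a partial reconstruction from the sample sentence only (a colon for the umlaut-like accents, an apostrophe for the acute ones); the real standard surely covered more characters, so treat this as illustrative.

```python
# Decode the "letter + punctuation" accent convention: a letter followed
# by ':' or "'" becomes the corresponding accented letter; everything
# else passes through unchanged. Mapping is a reconstructed subset.
MARKS = {
    ":": {"o": "ö", "u": "ü", "O": "Ö", "U": "Ü"},
    "'": {"a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú"},
}

def decode(text):
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if nxt in MARKS and ch in MARKS[nxt]:
            out.append(MARKS[nxt][ch])
            i += 2  # consume letter plus mark
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

The appeal of the scheme is that it is unambiguous and reversible, unlike plain accent dropping: `decode("O:t sze'p")` gives back "Öt szép" with no sentence-level analysis needed.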
Re: [SHR] illume predictive keyboard is too slow
Carsten Haitzler (The Rasterman) wrote:
> [...] yeah. this is one reason i want to understand how it works
> without ø, æ etc. - one day there will be a phone with a kbd.. and it
> wont have a version per language, because the # of users in norway is
> too small to warrant a special production run for them - same for
> germany, france etc. etc. - until you have the sales numbers to justify
> that.. you need a way to either work around it by ignoring them - or
> have software correct it. so software that works eventually with a hw
> kbd and inserts the right ø, æ etc. based off normal a-z typing...
> would be useful.

If we someday get an open phone with a keyboard, then I hope they are
smart enough to make enough keys. (In my case, both the q row and the a
row need 11 keys.) No problem if the keytops are painted with an english
layout - I can paint. As long as they don't let the top row end in p...

Surely, when there is a keyboard anyway, a couple of extra keys won't
cost much. Not if they are on all phones, instead of only adapted ones.
The americans can use the extras as application hotkeys.

Another approach - let the keyboard be an extra touchscreen that is wide
- in the shape of a keyboard. Then we can program the keyboard like we
can today. Of course this keyboard-screen can be cheaper - monochrome,
low resolution, maybe no backlight.

> i just want to understand the constraints of the languages i don't know
> - and how they are used. it gives me insight into how to solve the
> problem on a wider picture. thanks for the info.

Glad to be of help.

Helge Hafting
Re: [SHR] illume predictive keyboard is too slow
On Tue, 03 Feb 2009 18:28:49 +0100 Helge Hafting helge.haft...@hist.no said: Carsten Haitzler (The Rasterman) wrote: [...] yeah. this is one reason i want toi understand how it works without ø, æ etc. - one day there will be a phone with a kbd.. and it wont have a version per language because the # of users in norway are too small to warrant a special production run for them - same for germany, france etc. etc. - until you have the sales numbers to justify that.. you need a way to either work around it by ignoring them - or have software correct it. so software that works eventually with a hw kbd and inserts the right ø, æ etc. based off normal a-z typing... would be useful. If we someday get an open phone with a keyboard, then I hope they are smart enough to make enough keys. (In my case, both the q row and the a row needs 11 keys) No problem if the keytops are painted with an english layout - I can paint. As long as they don't let the top row end in p... Surely, when there is a kayboard anyway, a couple of extra keys won't cost much. Not if they are on all phones, instead of only adapted ones. The americans can use the extras as application hotkeys. oh its not the extra keys - its the variations in production. the moment you have a variation (with different # of keys, different layout of them) you have a change in plastic mould - thats costly (if doing things via molds a new mold costs upward of $US 60,000-100,000 or more). so if all you have is 500 customers in that country - that'd be an up-front cost of maybe 100k to just supply that market, and then for 500 people - IF you sell that many, it'd be $12-$20 extra per unit in costs. to cover the risk of not selling all your production you may have to raise retail prices by $50-$100 more over the mass produced item. now imagine its only 100 customers in that region, or 50. just a change in printing whats on the keys is not free. 
software keyboards are by far the cheaper option :) but if a hardware keyboard is there - chances of lots of variations per locale being around, unless you sell the kind of volume nokia do, are slim to none. :( Another approach - let the keyboard be an extra touchscreen that is wide - in the shape of a keyboard. Then we can program the keyboard like we can today. Of course this keyboard-screen can be cheaper - monochrome, low resolution, maybe no backlight. of course! i've actually mulled this idea with a clear plastic overlay that contains the mechanical contacts (done in a way that they dont obscure the middle of the key) and just have a normal lcd under it... have an extra lcd or just a bigger single lcd shared with the main one... :) thus a soft-hard-keyboard happens. as long as the # of buttons are ok (you can cover most use cases with the buttons there) then software can vary the painting and layout runtime. this might be the best middleground solution for a hardware keyboard for when low volume productions limit the ability to have custom molds/paint runs due to the small customer bases per locale. it sucks. but english is the lowest common denominator and thus most things tend to be built to support it - as it tends to keep more people happier than some other setup. if there was enough volume to make enough units for a particular language/locale/country - it'd be different. :) i just want to understand the constraints of the languages i don't know - and how they are used. it gives me insight into how to solve the problem on a wider picture. thanks for the info. Glad to be of help. Helge Hafting -- - Codito, ergo sum - I code, therefore I am -- The Rasterman (Carsten Haitzler) ras...@rasterman.com
Re: [SHR] illume predictive keyboard is too slow
On Tue, 3 Feb 2009 17:36:26 +0100 Laszlo KREKACS laszlo.krekacs.l...@gmail.com said: 1. norwegian does allow for conversion to roman-only text. there are rules much like german. 2. this conversion isn't used much and is a last resort thing. 3. only a few special letters are needed for common use cases in addition to latin Hi! I'm just giving you some perspective ;) In Hungary the situation is much like the Norwegian one. We have two special accented characters (ő,ű) which are not used in any other language; all the other accents are present in the latin-1 char set (we use the latin-2 charset). In the early computer era ő was matched to õ and ű was matched to û, so even the early microsoft word didn't care about those special characters (and used the latin-1 charset instead). But that is history now thanks to utf-8 (accented filenames are still a nightmare though, especially when restoring broken harddrives ;) There is no romanization here, but young people/computer addicts tend to type without accents. You can't decipher it word by word though, you need to understand the whole sentence. So simple word correction does not work. It is not like in Germany where you can write Tschüß as Tschuess. ok - so if a young person typed: Öt szép szűz it'd be: Ot szep szuz right? So a standard was developed (which is not used anymore, as there is no problem with accents nowadays), where all the accented characters are written using a char plus a punctuation mark. I'll give you an example: Öt szép szűz lány őrült írót nyúz O:t sze'p szuz la'ny oru:lt i'ro't nyu'z. Maybe you can use this idea. Or ignore utf-8 and use the corresponding iso8859-1,2 etc charset, where one character is one byte. nah. this heavily precludes expansion. 1 byte is 256 chars. try cramming russian, greek, thai, hindi.. etc. into that space. not going to work. so you keep flipping charsets and have special code per charset...
no thanks :) A simple word based dictionary is limited anyway for the hungarian language, where you can create a word as long as this: elkelkáposztástalaníthatatlanságoskodásaitokért. ugh. so its like german. compound words get created a lot by just stringing multiple words together without a space. that's ok - as long as there aren't a massive set of them... :) Hope it helps something. Best regards, Laszlo -- - Codito, ergo sum - I code, therefore I am -- The Rasterman (Carsten Haitzler) ras...@rasterman.com
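Laszlo's point about lossy accent stripping is easy to demonstrate. Below is a minimal Python sketch (my illustration, not code from the thread) using Unicode NFD decomposition; it shows why accent-stripped Hungarian can't be recovered letter by letter: distinct letters such as ő and õ collapse to the same base character.

```python
import unicodedata

def strip_accents(text):
    # Decompose to NFD and drop combining marks. Note that ő, õ and ö
    # all collapse to plain o, which is exactly why accent-stripped
    # Hungarian text is ambiguous without sentence context.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("Öt szép szűz"))  # Ot szep szuz
print(strip_accents("ő") == strip_accents("õ"))  # True: the distinction is lost
```

This is a one-way mapping: a predictive keyboard can use it to match ascii typing against accented dictionary words, but going the other way needs the dictionary itself.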
Re: [SHR] illume predictive keyboard is too slow - Usability features
On Tue, 03 Feb 2009 20:15:53 +0100 Marco Trevisan (Treviño) m...@3v1n0.net said: Carsten Haitzler (The Rasterman) wrote: On Mon, 02 Feb 2009 21:53:26 +0100 Marco Trevisan (Treviño) m...@3v1n0.net said: However in the past days I sent you privately also a mail about some issues of the keyboard in latest e17 svn [1], but I got no answer. Maybe the mail wasn't sent correctly?! got it - i just tend to ignore some of my mailboxes for a while and cycle around to them... got a lot of email here :) i'll get back to you on it. it just is that kbd isnt a focus at the moment so it tends to take a back-burner position. Ah, ok... It's understandable... They seem unrelated, but why not workarounding them by allowing these actions only after a small timeout (i.e. waiting few ms from the latest char pressure)? so lets say 0.4 sec after the last keyboard key press it will allow for swipes and match hits etc. that could be done. again - tuning a timing value. will people then complain that :i often try and swipe or hit a match and it doesnt respond. i need to do it again?. h. Maybe 0.4 seconds is too much. I think that we could use a lower value too. And maybe configurable directly from the keyboard (also if I don't think that this is needed at all). Generally you never confirm a word or switch keyboard as fast as you type over a char (since typing can be un-precise thanks to the keyboard correction, switching a keyboard or selecting a word must be precise)... correct. it's a fine line to walk tho - as above :) And... What about making the horizontal word list (the one over the keys) scrollable [right-left] as the configuration toolbar is? Would it require more computation? I figure that that could improve the usability. no - it'd be not much of a problem - i just didnt do it. :) Ok, so please put it in your/illume TODO :P :) nb - i can see why you often hit a match word. your kbd layout doesnt have padding ABOVE the qwerty line like the default does... :) Yes. That's true. 
But people could also have keyboards with more keys than mine (see Norwegians :P), and make the words-list and the keys closer. The fact is that even using the default qwerty keyboard, which has more padding, it could happen that you hit a word if you're writing while walking/driving[ehm... :P]/talking (or simply writing quickly)... Don't you agree? oh indeed it can happen.. but its much less likely - it never has happened to me, thus why i think its probably ok. but with that padding removed on your kbd, i can see how it becomes much more probable you hit these things at the top. you should just add more padding :) the problem is the kbd wont resize per layout atm, so the default determines the size - so if yours is the default then just make sure it has padding and the problem should go away. -- - Codito, ergo sum - I code, therefore I am -- The Rasterman (Carsten Haitzler) ras...@rasterman.com
Re: [SHR] illume predictive keyboard is too slow
On Sun 01 Feb 2009 00:31:09 Carsten Haitzler wrote: On Fri, 30 Jan 2009 21:16:57 +0100 Olof Sjobergh olo...@gmail.com said: On Fri, Jan 30, 2009 at 8:12 PM, The Rasterman Carsten Haitzler ras...@rasterman.com wrote: On Fri, 30 Jan 2009 08:31:43 +0100 Olof Sjobergh olo...@gmail.com said: But I think a dictionary format in plain utf8 that includes the normalised words as well as any candidates to display would be the best way. Then the dictionary itself could choose which characters to normalise and which to leave as is. So for Swedish, you can leave å, ä and ö as they are but normalise é, à etc. Searching would be as simple as in your original implementation (no need to convert from multibyte format). the problem is - the dict in utf8 means searching is slow as you do it in utf8 space. the dict is mmaped() to save ram - if it wasnt it'd need to be allocated in non-swappable ram (its a phone - it has no swap) and thus a few mb of your ram goes into the kbd dict at all times. by using mmap you leave it to the kernels paging system to figure it out. so as such a dict change will mean a non-ascii format in future for this reason. but there will then need to be a tool to generate such a file. Searching in utf8 doesn't mean it has to be slow. Simple strcmp works fine on multibyte utf8 strings as well, and should be as fast as the dictionary was before adding multibyte to widechars conversions. But if you have some other idea in mind, please don't let me disturb. =) the problem is - it ISNT a simple key-value lookup. it's a possible-match tree built on-the-fly. that means you jump about examining 1 character at a time. the problem here is that 1 char may or may not be 1 byte or more and that makes it really nasty. if it were a simple key lookup for a given simple string - life would be easy. this is possible - but then u'd have to generate ALL permutations first then look ALL of them up.
if you weed out permutations AS you look them up you can weed out something like 90% of the permutations as you KNOW there are no words starting with qz... so as you go through qa... qs qx... qz... you can easily stop all the combinations with qs, qz and qx as no words begin with that. if you have an 8 letter word with 8 possible letters per character in the word, thats 8^6 lookups you avoided in the case above - ie all permutations of the other 6 letters. thats 262144 lookups avoided... just there. for... 1 of the above impossible permutation trees. now add it up over all of them. Do you consider this paper relevant? http://citeseer.ist.psu.edu/schulz02fast.html Fast String Correction with Levenshtein-Automata, (2002), Klaus Schulz, Stoyan Mihov It actually uses tries to avoid generating and comparing exhaustively all permutations of the input word (typed keys), but instead traverses *only* known words and accumulates permutations unless a max-errors limit gets exceeded, in which case this path dies. It describes a mathematical model for correcting typos, but since i have already implemented it (in java) i now think it can be retrofitted to perform what you describe in: http://wiki.openmoko.org/wiki/Illume_keyboard Keep up the good work. Kostis
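The pruning Carsten describes - following only prefixes that actually exist in the word tree, instead of enumerating every permutation of candidate letters - can be sketched in a few lines. This is a toy Python illustration (not the illume C code; `fuzzy_lookup` and the candidate lists are my invention):

```python
def build_trie(words):
    # nested dicts; "$" marks end-of-word
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root

def fuzzy_lookup(trie, candidates):
    """candidates[i] = plausible letters for the i-th typed key.
    Only trie paths are walked, so an impossible prefix like 'qz'
    kills its whole permutation subtree immediately."""
    results = []
    def walk(node, depth, prefix):
        if depth == len(candidates):
            if "$" in node:
                results.append(prefix)
            return
        for ch in candidates[depth]:
            if ch in node:  # prune: no known word continues this way
                walk(node[ch], depth + 1, prefix + ch)
    walk(trie, 0, "")
    return results

trie = build_trie(["quit", "quiz", "quay"])
# the user hit keys near q,u,i,t; each key contributes a few neighbours
print(fuzzy_lookup(trie, [["q", "w"], ["u", "i"], ["i", "o"], ["t", "y", "z"]]))
```

With 4 keys and 2-3 candidates each there are up to 24 permutations, but only the branches surviving in the trie are ever visited.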
Re: [SHR] illume predictive keyboard is too slow - Usability features
Carsten Haitzler (The Rasterman) wrote: On Wed, 28 Jan 2009 18:59:32 +0100 Marco Trevisan (Treviño) m...@3v1n0.net said: Maybe using something like a trie [1] to archive the words could help (both for words matching and for compressing the dictionary). Too hard? [1] http://en.wikipedia.org/wiki/Trie so back to the trie... the trie would only be useful for the ascii matching - i need something more complex. it just combines the data with the match tree (letters are inline). i need a match tree + lookup table to other matches to display - and possibly several match entries (all the matches to display also need to be in the tree pointing to a smaller match list). Ok, thanks... I got it. However I hope we could make something that is based on that idea (the trie) but that can be applied to non-ascii chars too. However in the past days I sent you privately also a mail about some issues of the keyboard in latest e17 svn [1], but I got no answer. Maybe the mail wasn't sent correctly?! However I've written there also some features that I'd suggest to implement in the Illume keyboard. I'll write them here too to make the community aware: I use the illume keyboard every day and I'm very happy with it as I've said many times in this ML, but sometimes it happens that it performs some unwanted actions like: - I involuntarily click on a suggested word while I'm still typing my word (cause I'm not too precise and I tap over a word, instead of a top char). - It happens that I get my keyboard switched while typing (yes, I know that this is mainly a hardware-related issue, due to the touchscreen jitters). They seem unrelated, but why not work around them by allowing these actions only after a small timeout (i.e. waiting a few ms from the latest char pressure)? Generally you never confirm a word or switch keyboard as fast as you type over a char (since typing can be imprecise thanks to the keyboard correction, switching a keyboard or selecting a word must be precise)... And...
What about making the horizontal word list (the one over the keys) scrollable [right-left] as the configuration toolbar is? Would it require more computation? I figure that that could improve the usability. Bye. [1] http://i43.tinypic.com/i4il2d.png -- Treviño's World - Life and Linux http://www.3v1n0.net/
Re: [SHR] illume predictive keyboard is too slow
Carsten Haitzler (The Rasterman) wrote: On Fri, 30 Jan 2009 14:43:39 +0100 Helge Hafting helge.haft...@hist.no said: Carsten Haitzler (The Rasterman) wrote: On Thu, 29 Jan 2009 14:32:48 +0100 Helge Hafting helge.haft...@hist.no said: I hope things like this will be possible, if a new dictionary format is realized. It is ok if typing for suggests fôr as an alternative, but før should not come up unless the user types f ø r. In which case o must not be suggested... ok - how do you romanise norwegian then? example. in german ö - oe, ü - ue, ß - ss, etc. - there is a set of romanisation rules that can convert any such char to 1 or more roman letters. i was hoping to be even more lenient with ö - o being valid too for the lazy :) japanese has romanisation rules - so does chinese... norwegian must (eg æ - ae for example). Usually, one doesn't romanize Norwegian. There are some rules: æ-ae, ø-oe, å-aa. They are next to useless, because ae and oe occur naturally in many words where æ or ø does not belong, and these double vowels are pronounced differently as well. A Norwegian seeing oe in a word may be able to figure out if this means ø or if it really is supposed to be oe, but this may need a context of several words. And it looks funny/wrong - similar to how it looks silly transcribing x as ks and write ksylophone. oh thats not bad! then it's just like english! (you get used to the vague insanity of it all sooner or later!) :) but seriously - if your name is nønæn, and you move to japan, and have to fill out a form for your bank account name - they will see the ø and æ and go ummm. we can't do that - can you please use normal roman text? Sure, in that case, it is ø-oe, æ-ae and å-aa. (Or some will go ø-o and å-a because their name looks less mangled that way.) While this may be ok for opening a bank account in japan, it is not something ordinary people will want to consider for typing text messages on a phone. Simple phones have had æøå in the T9 system for ages. 
(with æ and å on the same key as a, and ø on the same key as o) [...] just like my example above - but i guess i was being stricter. the stodgy old banking system isn't going to go adapt like modern sports data systems. its go roman - or go home. :) Sure. I just hope the freerunner doesn't evolve into a stodgy old thing as far as keyboards are concerned. Looks like it doesn't, so I'll be fine. :-) [...] hmm. how interesting. i have always been baffled why there is a UK qwerty layout vs US - the UK is the only place that uses it... all other english speaking countries i know use US qwerty (and if UK qwerty was nicely killed off.. it wouldn't need to be US qwerty - just qwerty) :) Surely this is because of the £-sign? (And € too, in later standards.) I don't think they are ready to give up the pound. ok - but there is a way to do this. when stuck on your friends pc when visiting them in california, and they dont have compose-modes enabled... how do you type æ and ø etc. that was basically the q - there must be some accepted mechanism for decimation/conversion. seemingly it's the obvious: æ - ae, ø - o etc. My preferred way is to open a webpage and paste the special characters I need. These days, any pc seems to support æøå even if the keyboard itself doesn't. In a situation where æøå cannot be entered (such as the sms app in SHR which erroneously filters out non-ascii), I write my sentences very carefully avoiding these letters. For I don't want to spell wrong deliberately, not even transcriptions. Those that care a lot less about spelling use more transcriptions - and might even use transcriptions on a phone that has æøå, because their phone is badly adapted to Norwegian and has æøå in weird places. (Because the manufacturers aren't really into adding a couple of extra _hardware_ keys.) Software keyboards are great! Excellent!
So if I have a wordlist and make a keyboard, then a dictionary can be synthesized so there will be no unnecessary confusion between o and ø, because both letters exist as keys? correct. as long as the dict matching doesnt drop extra info - ie normalize o - ø. currently it does. but the rest of the code doesn't. it's just the dict matching engine - which as we have been discussing... needs work. :) The dictionary file probably needs to have some metadata anyway - such as what language it is for. It could also have a list of what non-ascii letters to use as-is. And assume standard romanization rules for the rest. Helge Hafting
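Helge's metadata idea - a per-language list of letters to keep as-is, with romanization for everything else - could look like the following sketch (hypothetical Python; the keep-set and function names are mine, not anything from illume). One wrinkle worth noting: æ and ø have no Unicode decomposition, so NFD stripping alone can't handle them and an explicit mapping table is needed on top.

```python
import unicodedata

# Letters with no canonical decomposition need explicit rules.
EXPLICIT = {"æ": "ae", "ø": "o", "Æ": "AE", "Ø": "O"}

def normalize(word, keep=frozenset()):
    """Strip accents except for letters the layout actually has keys for."""
    out = []
    for ch in word:
        if ch in keep:
            out.append(ch)
        elif ch in EXPLICIT:
            out.append(EXPLICIT[ch])
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed
                               if not unicodedata.combining(c)))
    return "".join(out)

swedish_keys = frozenset("åäöÅÄÖ")
print(normalize("idé", keep=swedish_keys))  # ide: é has no key, normalised
print(normalize("för", keep=swedish_keys))  # för: ö has its own key, kept
print(normalize("brød"))                    # brod: explicit rule for ø
```

The dictionary generator would run this over the wordlist once, storing both the normalized form (for matching) and the original (for display).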
Re: [SHR] illume predictive keyboard is too slow
On Mon, 2 Feb 2009 19:39:52 +0200 Kostis Anagnostopoulos ankos...@gmail.com said: On Sun 01 Feb 2009 00:31:09 Carsten Haitzler wrote: On Fri, 30 Jan 2009 21:16:57 +0100 Olof Sjobergh olo...@gmail.com said: On Fri, Jan 30, 2009 at 8:12 PM, The Rasterman Carsten Haitzler ras...@rasterman.com wrote: On Fri, 30 Jan 2009 08:31:43 +0100 Olof Sjobergh olo...@gmail.com said: But I think a dictionary format in plain utf8 that includes the normalised words as well as any candidates to display would be the best way. Then the dictionary itself could choose which characters to normalise and which to leave as is. So for Swedish, you can leave å, ä and ö as they are but normalise é, à etc. Searching would be as simple as in your original implementation (no need to convert from multibyte format). the problem is - the dict in utf8 means searching is slow as you do it in utf8 space. the dict is mmaped() to save ram - if it wasnt it'd need to be allocated in non-swappable ram (its a phone - it has no swap) and thus a few mb of your ram goes into the kbd dict at all times. by using mmap you leave it to the kernels paging system to figure it out. so as such a dict change will mean a non-ascii format in future for this reason. but there will then need to be a tool to generate such a file. Searching in utf8 doesn't mean it has to be slow. Simple strcmp works fine on multibyte utf8 strings as well, and should be as fast as the dictionary was before adding multibyte to widechars conversions. But if you have some other idea in mind, please don't let me disturb. =) the problem is - it INSt a simple keyvalue lookup. it's a possible-match tree build on-the-fly. that means you jump about examining 1 character at a time. the problem here is that 1 char may or may not be 1 byte or more and that makes it really nasty. if it were a simple key lookup for a given simple string - life would be easy. this is possible - but then u'd have to generate ALL permutations first then look ALL of them up. 
if you weed out permutations AS you look them up you can weed out something like 90% of the permutations as you KNOW there are no words starting with qz... so as you go through qa... qs qx... qz... you can easily stop all the combinations with qs, qz and qx as no words begin with that. if you have an 8 letter word with 8 possible letters per character in the word, thats 8^6 lookups you avoided in the case above - ie all permutations of the other 6 letters. thats 262144 lookups avoided... just there. for... 1 of the above impossible permutation trees. now add it up over all of them. Do you consider this paper relevant? http://citeseer.ist.psu.edu/schulz02fast.html Fast String Correction with Levenshtein-Automata, (2002), Klaus Schulz, Stoyan Mihov It actually uses tries to avoid generating and comparing exhaustively all permutations of the input word (typed keys), but instead traverses *only* known words and accumulates permutations unless a max-errors limit gets exceeded, in which case this path dies. not sure thats that good.. that will drop possible matches - the current scheme walks the tree of known words using the permutation list to pick paths - it wont follow paths that dont exist, so thats already done. i was just saying that you need the permutation list per letter + walking of the data to be inherently combined. as without that you need to generate every permutation and throw it at a 1 key - value lookup hash. it still uses a trie (which is a binary tree with the letters inlined as part of the tree struct). :) just reading the abstract tho.. document is 67 pages i have to dig through... It describes a mathematical model for correcting typos, but since i have already implemented it (in java) i now think it can be retrofitted to perform what you describe in: http://wiki.openmoko.org/wiki/Illume_keyboard sure - can it be implemented so all data is mmaped from files? thats the biggest problem.
the first dict for illume (before the current) used a 27-way per node tree - lookups were hyper-fast. but it ate ram. i went to the opposite end where i just mmaped the text file and built a very small 2-level char offset lookup table to avoid ram usage. this isnt that fast - but was ok. i know i could improve the parsing with having it all ucs2 to avoid slower utf8 decomposing and with line jump-tables built into the file it'd avoid scanning a whole line to jump to the next entry when a match fails. as such it's more a matter of just having a fast dict format that can be mmaped and walked easily while spooling off the permutations of chars per letter (and thus being able to spot a match and calculate its relative distance). Keep up the good work. Kostis
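The mmap-plus-small-offset-table layout described above can be approximated in a few lines. This is a hypothetical Python sketch (the real illume dictionary is C and its exact on-disk format differs): a sorted word file is mmap()ed so the kernel pages it in and out, and a tiny in-memory table of first-two-letter byte offsets limits each lookup to scanning a single bucket.

```python
import mmap
import os
import tempfile

# Build a toy dictionary file: sorted words, one per line.
words = sorted(["apple", "apply", "banana", "band", "bandit"])
path = os.path.join(tempfile.mkdtemp(), "dict.txt")
with open(path, "wb") as f:
    f.write("\n".join(words).encode("utf-8") + b"\n")

# 2-level index: first two letters -> byte offset of that bucket's start.
index = {}
offset = 0
for w in words:
    index.setdefault(w[:2], offset)
    offset += len(w.encode("utf-8")) + 1

def lookup_prefix(prefix):
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        start = index.get(prefix[:2])
        if start is None:
            return []
        out = []
        mm.seek(start)
        for line in iter(mm.readline, b""):
            w = line.rstrip(b"\n").decode("utf-8")
            if not w.startswith(prefix[:2]):
                break  # left the bucket; the sorted file guarantees no more hits
            if w.startswith(prefix):
                out.append(w)
        return out

print(lookup_prefix("ban"))  # ['banana', 'band', 'bandit']
```

The trade-off matches the one in the mail: almost no resident RAM (only the small index), at the cost of scanning lines within a bucket rather than jumping node-to-node in a pointer tree.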
Re: [SHR] illume predictive keyboard is too slow
On Mon, 02 Feb 2009 15:26:50 +0100 Helge Hafting helge.haft...@hist.no said: Carsten Haitzler (The Rasterman) wrote: On Fri, 30 Jan 2009 14:43:39 +0100 Helge Hafting helge.haft...@hist.no said: Carsten Haitzler (The Rasterman) wrote: On Thu, 29 Jan 2009 14:32:48 +0100 Helge Hafting helge.haft...@hist.no said: I hope things like this will be possible, if a new dictionary format is realized. It is ok if typing for suggests fôr as an alternative, but før should not come up unless the user types f ø r. In which case o must not be suggested... ok - how do you romanise norwegian then? example. in german ö - oe, ü - ue, ß - ss, etc. - there is a set of romanisation rules that can convert any such char to 1 or more roman letters. i was hoping to be even more lenient with ö - o being valid too for the lazy :) japanese has romanisation rules - so does chinese... norwegian must (eg æ - ae for example). Usually, one doesn't romanize Norwegian. There are some rules: æ-ae, ø-oe, å-aa. They are next to useless, because ae and oe occur naturally in many words where æ or ø does not belong, and these double vowels are pronounced differently as well. A Norwegian seeing oe in a word may be able to figure out if this means ø or if it really is supposed to be oe, but this may need a context of several words. And it looks funny/wrong - similar to how it looks silly transcribing x as ks and write ksylophone. oh thats not bad! then it's just like english! (you get used to the vague insanity of it all sooner or later!) :) but seriously - if your name is nønæn, and you move to japan, and have to fill out a form for your bank account name - they will see the ø and æ and go ummm. we can't do that - can you please use normal roman text? Sure, in that case, it is ø-oe, æ-ae and å-aa. (Or some will go ø-o and å-a because their name looks less mangled that way.) 
While this may be ok for opening a bank account in japan, it is not something ordinary people will want to consider for typing text messages on a phone. Simple phones have had æøå in the T9 system for ages. (with æ and å on the same key as a, and ø on the same key as o) [...] sure! yes. thats why i allowed for keys to be 'ø' and 'æ' etc. etc. - already done. i was hoping to have a way of also doing it just with plain qwerty. so there is a way of reducing it :) just like my example above - but i guess i was being stricter. the stodgy old banking system isn't going to go adapt like modern sports data systems. its go roman - or go home. :) Sure. I just hope the freerunner doesn't evolve into a stodgy old thing as far as keyboards are concerned. Looks like it doesn't, so I'll be fine. :-) [...] unlike the banking system. the users CAN have a say in fixing it... if they just do some code :) if they just sit and wait for people to do it for them for free - they may have to wait a while until it becomes a priority for those doing the code. :) hmm. how interesting. i have always been baffled why there is a UK qwerty layout vs US - the UK is the only place that uses it... all other english speaking countries i know use US qwerty (and if UK qwerty was nicely killed off.. it wouldn't need to be US qwerty - just qwerty) :) Surely this is because of the £-sign? (And € too, in later standards.) I don't think they are ready to give up the pound. hmm no - they moved the a-z letters around. symbols i can understand. but why play with the a-z layout... beats me... ok - but there is a way to do this. when stuck on your friends pc when visiting them in california, and they dont have compose-modes enabled... how do you type æ and ø etc. that was basically the q - there must be some accepted mechanism for decimation/conversion. seemingly it's the obvious: æ - ae, ø - o etc. My preferred way is to open a webpage and paste the special characters I need.
These days, any pc seems to support æøå even if the keyboard itself doesn't. In a situation where æøå cannot be entered (such as the sms app in SHR which erroneously filters out non-ascii), I write my sentences very carefully avoiding these letters. For I don't want to spell wrong deliberately, not even transcriptions. Those that care a lot less about spelling use more transcriptions - and might even use transcriptions on a phone that has æøå, because their phone is badly adapted to Norwegian and has æøå in weird places. (Because the manufacturers aren't really into adding a couple of extra _hardware_ keys.) Software keyboards are great! yeah. this is one reason i want to understand how it works without ø, æ etc. - one day there will be a phone with a kbd.. and it wont have a version per language because the # of users in norway are too small to warrant a special production run for them - same for germany, france etc. etc. - until you have the sales numbers to justify that.. you need a way to
Re: [SHR] illume predictive keyboard is too slow - Usability features
On Mon, 02 Feb 2009 21:53:26 +0100 Marco Trevisan (Treviño) m...@3v1n0.net said: Carsten Haitzler (The Rasterman) wrote: On Wed, 28 Jan 2009 18:59:32 +0100 Marco Trevisan (Treviño) m...@3v1n0.net said: Maybe using something like a trie [1] to archive the words could help (both for words matching and for compressing the dictionary). Too hard? [1] http://en.wikipedia.org/wiki/Trie so back to the trie... the trie would only be useful for the ascii matching - i need something more complex. it just combines the data with the match tree (letters are inline). i need a match tree + lookup table to other matches to display - and possibly several match entries (all the matches to display also need to be in the tree pointing to a smaller match list). Ok, thanks... I got it. However I hope we could make something that is based on that idea (the trie) but that can be applied to non-ascii chars too. However in the past days I sent you privately also a mail about some issues of the keyboard in latest e17 svn [1], but I got no answer. Maybe the mail wasn't sent correctly?! got it - i just tend to ignore some of my mailboxes for a while and cycle around to them... got a lot of email here :) i'll get back to you on it. it just is that kbd isnt a focus at the moment so it tends to take a back-burner position. However I've written there also some features that I'd suggest to implement in the Illume keyboard. I'll write them here too to make the community aware: I use the illume keyboard every day and I'm very happy with it as I've said many times in this ML, but sometimes it happens that it performs some unwanted actions like: - I involuntarily click on a suggested word while I'm still typing my word (cause I'm not too precise and I tap over a word, instead of a top char). thats a problem. mostly of spacing. its actually hard to figure that out. i really dont know what to do there - if u reduce the hit area for matches - it gets harder to select them.
if i add more spacing you lose more screen to the kbd. somewhere someone loses. it's a matter of fine adjustments i guess in the spacing to add more space. - It happens that I get my keyboard switched while typing (yes, I know that this is mainly a hardware-related issue, due to the touchscreen jitters). hmm thats hard to do. either u make swipes less sensitive and thus make it harder to change layout and solve your problem, or you live with the occasional swipe... or we have another way to change layout thats easy. They seem unrelated, but why not work around them by allowing these actions only after a small timeout (i.e. waiting a few ms from the latest char pressure)? so lets say 0.4 sec after the last keyboard key press it will allow for swipes and match hits etc. that could be done. again - tuning a timing value. will people then complain that "i often try and swipe or hit a match and it doesnt respond. i need to do it again"? hmm. Generally you never confirm a word or switch keyboard as fast as you type over a char (since typing can be imprecise thanks to the keyboard correction, switching a keyboard or selecting a word must be precise)... correct. it's a fine line to walk tho - as above :) And... What about making the horizontal word list (the one over the keys) scrollable [right-left] as the configuration toolbar is? Would it require more computation? I figure that that could improve the usability. no - it'd be not much of a problem - i just didnt do it. :) nb - i can see why you often hit a match word. your kbd layout doesnt have padding ABOVE the qwerty line like the default does... :) Bye.
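The debounce discussed above is easy to prototype. A hypothetical Python sketch (class name and injectable clock are mine; the 0.4 s default matches the number floated in the thread, which would itself need tuning): match/swipe events are simply refused until enough time has passed since the last key press.

```python
import time

class MatchGate:
    """Ignore word-match / swipe events that arrive too soon after a
    key press, so sloppy fast typing can't trigger them by accident.
    The clock is injectable so the logic can be tested deterministically."""
    def __init__(self, delay=0.4, clock=time.monotonic):
        self.delay = delay
        self.clock = clock
        self.last_key = clock()

    def key_pressed(self):
        self.last_key = self.clock()

    def allow_match(self):
        return self.clock() - self.last_key >= self.delay

# demo with a fake clock
now = [0.0]
gate = MatchGate(delay=0.4, clock=lambda: now[0])
gate.key_pressed()
now[0] = 0.2
print(gate.allow_match())  # False: too soon after the key press
now[0] = 0.5
print(gate.allow_match())  # True: past the threshold
```

The trade-off is exactly the one raised in the mail: too long a delay and deliberate swipes feel unresponsive, too short and it stops filtering anything.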
[1] http://i43.tinypic.com/i4il2d.png -- Treviño's World - Life and Linux http://www.3v1n0.net/ -- - Codito, ergo sum - I code, therefore I am -- The Rasterman (Carsten Haitzler) ras...@rasterman.com
Re: [SHR] illume predictive keyboard is too slow
On Fri, 30 Jan 2009 21:16:57 +0100 Olof Sjobergh olo...@gmail.com said: On Fri, Jan 30, 2009 at 8:12 PM, The Rasterman Carsten Haitzler ras...@rasterman.com wrote: On Fri, 30 Jan 2009 08:31:43 +0100 Olof Sjobergh olo...@gmail.com said: But I think a dictionary format in plain utf8 that includes the normalised words as well as any candidates to display would be the best way. Then the dictionary itself could choose which characters to normalise and which to leave as is. So for Swedish, you can leave å, ä and ö as they are but normalise é, à etc. Searching would be as simple as in your original implementation (no need to convert from multibyte format). the problem is - the dict in utf8 means searching is slow as you do it in utf8 space. the dict is mmaped() to save ram - if it wasnt it'd need to be allocated in non-swappable ram (its a phone - it has no swap) and thus a few mb of your ram goes into the kbd dict at all times. by using mmap you leave it to the kernels paging system to figure it out. so as such a dict change will mean a non-ascii format in future for this reason. but there will then need to be a tool to generate such a file. Searching in utf8 doesn't mean it has to be slow. Simple strcmp works fine on multibyte utf8 strings as well, and should be as fast as the dictionary was before adding multibyte to widechars conversions. But if you have some other idea in mind, please don't let me disturb. =) the problem is - it INSt a simple keyvalue lookup. it's a possible-match tree build on-the-fly. that means you jump about examining 1 character at a time. the problem here is that 1 char may or may not be 1 byte or more and that makes it really nasty. if it were a simple key lookup for a given simple string - life would be easy. this is possible - but then u'd have to generate ALL permutations first then look ALL of them up. 
if you weed out permutations AS you look them up you can weed out something like 90% of the permutations as you KNOW there are no words starting with qz... so as you go through qa... qs qx... qz... you can easily stop all the combinations with qs, qz and qx as no words begin with those (if you have an 8 letter word with 8 possible letters per character in the word that's 8^6 lookups you avoided (in the case above - ie all permutations of the other 6 letters). that's 262144 lookups avoided... just there. for... 1 of the above impossible permutation trees. now add it up over all of them.
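The pruning arithmetic raster describes can be sketched as follows. This is an editorial illustration in Python, not the keyboard's actual C code, and all names in it are invented: it expands the per-key letter alternatives depth-first and abandons any branch whose prefix matches no dictionary word, so the 8^6 sub-permutations behind a dead prefix are never generated.

```python
def candidates(key_options, words):
    """Depth-first expansion of per-position letter alternatives,
    pruning any branch whose prefix starts no word in the dictionary."""
    # Precompute every prefix of every word for O(1) prune checks.
    prefixes = set()
    for w in words:
        for i in range(1, len(w) + 1):
            prefixes.add(w[:i])

    results = []

    def expand(prefix, rest):
        if prefix and prefix not in prefixes:
            return  # no word starts like this: skip the whole subtree
        if not rest:
            if prefix in words:
                results.append(prefix)
            return
        for ch in rest[0]:
            expand(prefix + ch, rest[1:])

    expand("", key_options)
    return results

# each tuple = the plausible letters for one key press (neighbour keys)
words = {"fish", "fist", "dish"}
print(candidates([("f", "d", "g"), ("i", "o"), ("s",), ("h", "t", "y")], words))
# -> ['fish', 'fist', 'dish']
```

Of the 3 * 2 * 1 * 3 = 18 full permutations, only the branches under "f" and "d" are ever explored; "g", "fo" and the rest die at the first impossible prefix, which is exactly the saving described above.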
Re: [SHR] illume predictive keyboard is too slow
Carsten Haitzler (The Rasterman) wrote: On Thu, 29 Jan 2009 14:32:48 +0100 Helge Hafting helge.haft...@hist.no said: I hope things like this will be possible, if a new dictionary format is realized. It is ok if typing for suggests fôr as an alternative, but før should not come up unless the user types f ø r. In which case o must not be suggested... ok - how do you romanise norwegian then? example. in german ö - oe, ü - ue, ß - ss, etc. - there is a set of romanisation rules that can convert any such char to 1 or more roman letters. i was hoping to be even more lenient with ö - o being valid too for the lazy :) japanese has romanisation rules - so does chinese... norwegian must (eg æ - ae for example). Usually, one doesn't romanize Norwegian. There are some rules: æ-ae, ø-oe, å-aa. They are next to useless, because ae and oe occur naturally in many words where æ or ø does not belong, and these double vowels are pronounced differently as well. A Norwegian seeing oe in a word may be able to figure out if this means ø or if it really is supposed to be oe, but this may need a context of several words. And it looks funny/wrong - similar to how it looks silly transcribing x as ks and writing ksylophone. You might want to transcribe x that way in an emergency, if your x key breaks, until you get a new keyboard. You probably don't want to throw away the x to save space on a keyboard though. And norwegian transcriptions aren't used for the same reasons. I have only seen two cases of such transcription: 1. Names of norwegian athletes in international sports events. Which looks real silly. And completely unnecessary. Sport computer systems these days handle more than a-z, the names are spelled correctly in national events after all. And it is not as if foreigners get big problems with an ø. If they don't know what the slash is for, they can read it as o, and so on. Similar to how I read french - I have no idea what the difference between à and á is. Both are a to me. 2.
Expert computer users sometimes use the transcriptions, because they often use the latest equipment before keyboards get fixed and before ascii-only limitations are sorted out. Some of them are tired of fighting and give up. And they have actually heard about the concept of transcription! But mainstream users get equipment with proper keyboards, anything less is an unfinished product. You won't find an ascii keyboard in a norwegian shop. if something can be romanised - it can have a romanised match in a dictionary and thus suggest the appropriate matches. of course now the dictionary determines these rules implicitly by content, not by code specifically enforcing such rules. :) but yes - selecting dictionary is needed so selecting a keyboard for that language as well as dictionary is useful. it still adds a few keys - thus squashing the keyboard some more :( i was hoping to avoid that. English can work with 10 keys in a row, norwegian needs 11. :-) The solution then is different keyboards, those who don't need more should not need to suffer the slightly smaller keys. note - the keyboard is by no means limited to ascii at all - it's perfectly able to have accented/other keys added to layouts - so i'm considering this problem solved as its simply a matter of everyone agreeing to make a .kbd for their language - should they need one other than the default qwerty (ascii) one. so from this point of view - that's solved. what isn't done yet is: Excellent! So if I have a wordlist and make a keyboard, then a dictionary can be synthesized so there will be no unnecessary confusion between o and ø, because both letters exist as keys? 1. a kbd being able to hint at wanting a specific dictionary language (or vice-versa). For packaging, put the wordlist and keyboard layout in the same package. And switch both when switching keyboards. I guess several languages will have the same layout. This can be solved elegantly with hard links.
Or a mechanism where keyboards either use standard ascii, or a language-specific layout. 2. dictionary itself being able to hint to have a specific kbd layout. 3. applications not being able to hint for a specific language for input (and thus dictionary and/or kbd). I believe we use the same apps, regardless of language? So an app should simply ask for numeric/alphabetic/terminal, and then the system provides the system default alpha keyboard. This could be english, norwegian, german, ... depending on a system setting. Multilingual persons can have one default keyboard and explicitly select another when needed. It'd be nice if one could have the option of setting a terminal keyboard as the default alphabetic keyboard too - some people don't like guesswork because the wordlist is never truly complete - or maybe there is no list for their language yet. Of course they then have to struggle with stylus and
Re: [SHR] illume predictive keyboard is too slow
On Fri, 30 Jan 2009 08:31:43 +0100 Olof Sjobergh olo...@gmail.com said: On Fri, Jan 30, 2009 at 4:25 AM, The Rasterman Carsten Haitzler ras...@rasterman.com wrote: On Thu, 29 Jan 2009 08:30:44 +0100 Olof Sjobergh olo...@gmail.com said: On Wed, Jan 28, 2009 at 11:16 PM, The Rasterman Carsten Haitzler ras...@rasterman.com wrote: On Wed, 28 Jan 2009 18:59:32 +0100 Marco Trevisan (Treviño) m...@3v1n0.net said: Olof Sjobergh wrote: Unless I missed something big (which I hope I didn't, but I wouldn't be surprised if I did), this is not fixable with the current dictionary lookup design. Raster talked about redesigning the dictionary format, so I guess we have to wait until he gets around to it (or someone else does it). I think that too. Maybe using something like a trie [1] to archive the words could help (both for words matching and for compressing the dictionary). Too hard? [1] http://en.wikipedia.org/wiki/Trie the problem here comes with having multiple displays for a single match. let me take japanese as an example (i hope you have the fonts to see this at least - though there is no need to understand beyond knowing that there are a lot of matches that are visibly different): sakana - さかな 茶菓な 肴 魚 サカナ 坂な 差かな 左かな 査かな 鎖かな 鎖かな unlike simple decimation of é - e and ë - e and è - e etc. you need 1 ascii input string matching one of MANY very different matches. the european case of vogel - Vogel Vögel is a simplified version of the above. the reason i wanted decimation to match a simple roman text (ascii) string is - that this is a pretty universal thing. that's how japanese, chinese and even some korean input methods work. it also works for european languages too. europeans are NOT used to the idea of a dictionary guessing/selecting system when they type - but the asians are. they are always typing and selecting.
the smarts come with the dictionary system selecting the right one more often than not by default or the right selection you want being only 1 or 2 keystrokes away. i was hoping to be able to keep a SIMPLE ascii qwerty keyboard for as much as possible - so you can just type and it will work and offer the selections as it's trying to guess anyway - it can present the multiple accented versions too. this limits the need for special keyboards - doesn't obviate it, but allows more functionality out of the box. in the event users explicitly select an accented char - ie a non-ascii character, it should not decimate. it should try match exactly that char. so if you add those keys and use them or flip to another key layout to select them - you get what you expect. but if i am to redo the dict - the api is very generic - just the internals and format need changing to be able to do the above. the cool bit is.. if i manage the above... it has almost solved asian languages too - and input methods... *IF* the vkbd is also able to talk to a complex input method (XIM/SCIM/UIM etc.) as keystroke faking wont let you type chinese characters... :) but in principle the dictionary and lookup scheme will work - its then just mechanics of sending the data to the app in a way it can use it. so back to the trie... the trie would only be useful for the ascii matching - i need something more complex. it just combines the data with the match tree (letters are inline). i need a match tree + lookup table to other matches to display - and possibly several match entries (all the matches to display also need to be in the tree pointing to a smaller match list). -- - Codito, ergo sum - I code, therefore I am -- The Rasterman (Carsten Haitzler)ras...@rasterman.com I think most problems could be solved by using a dictionary format similar to what you describe above, i.e. 
something like: match : candidate1 candidate2; frequency for example: vogel : Vogel Vögel; 123 That would mean you can search on the normalised word where simple strcmp works fine and will be fast enough. To not make it too large for example the following syntax could also be accepted: eat; 512 // No candidates, just show the match as is har här hår; 1234 // Also show the match itself as a candidate If you think this would be good enough, I could try to implement it. Another problem with languages like Swedish, and also Japanese, is the heavy use of conjugation. For example, in Japanese the verbs 食べる and 考える can both be conjugated in the same way like this: 食べる 食べました 食べた 食べている 食べていた 食べています 食べていました 考える 考えました 考えた 考えている 考えていた 考えています 考えていました Another example, the Swedish nouns: bil bilen bilar bilarna bilens bilarnas But including all these forms in a dictionary makes it very large, which is impractical. So some way to indicate possible conjugations would be
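Olof's proposed file format is simple enough to parse line by line. A sketch in Python, with the separator rules assumed (they are not pinned down in the thread): a line is either "match : candidates; frequency", or, without a colon, the match doubles as its own display candidate.

```python
def parse_entry(line):
    """Parse one line of the proposed dictionary format (hypothetical
    reading of the syntax sketched in the thread):
      'vogel : Vogel Vögel; 123' -> normalised match, candidates, frequency
      'eat; 512'                 -> the match is its own only candidate
      'har här hår; 1234'        -> first token is the match, all tokens shown"""
    entry, _, freq = line.rpartition(";")
    if ":" in entry:
        match, _, cands = entry.partition(":")
        candidates = cands.split()
    else:
        tokens = entry.split()
        match, candidates = tokens[0], tokens
    return match.strip(), candidates, int(freq)

print(parse_entry("vogel : Vogel Vögel; 123"))
# -> ('vogel', ['Vogel', 'Vögel'], 123)
print(parse_entry("eat; 512"))
# -> ('eat', ['eat'], 512)
```

Searching then runs on the normalised match column only, which is the point of the proposal: plain strcmp-style comparison, with the display forms carried alongside.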
Re: [SHR] illume predictive keyboard is too slow
On Fri, 30 Jan 2009 14:43:39 +0100 Helge Hafting helge.haft...@hist.no said: Carsten Haitzler (The Rasterman) wrote: On Thu, 29 Jan 2009 14:32:48 +0100 Helge Hafting helge.haft...@hist.no said: I hope things like this will be possible, if a new dictionary format is realized. It is ok if typing for suggests fôr as an alternative, but før should not come up unless the user types f ø r. In which case o must not be suggested... ok - how do you romanise norwegian then? example. in german ö - oe, ü - ue, ß - ss, etc. - there is a set of romanisation rules that can convert any such char to 1 or more roman letters. i was hoping to be even more lenient with ö - o being valid too for the lazy :) japanese has romanisation rules - so does chinese... norwegian must (eg æ - ae for example). Usually, one doesn't romanize Norwegian. There are some rules: æ-ae, ø-oe, å-aa. They are next to useless, because ae and oe occur naturally in many words where æ or ø does not belong, and these double vowels are pronounced differently as well. A Norwegian seeing oe in a word may be able to figure out if this means ø or if it really is supposed to be oe, but this may need a context of several words. And it looks funny/wrong - similar to how it looks silly transcribing x as ks and write ksylophone. oh thats not bad! then it's just like english! (you get used to the vague insanity of it all sooner or later!) :) but seriously - if your name is nønæn, and you move to japan, and have to fill out a form for your bank account name - they will see the ø and æ and go ummm. we can't do that - can you please use normal roman text? because they will either accept roman (a-z) OR japanese (hiragana/katakana/kanji). strange accented european chars aren't going to work. 
:) so i guess i'm asking because sooner or later when filling out an immigration form or something in another country - you will need to drop such chars into roman text somehow (that ugly nasty lowest common denominator thing - i know), and so i was curious... how you solve that - as that then presents a set of solutions/rules that can be applied. :) again - not saying to get rid of the ø's of this world. already supported. but just wondering, how we can work when they are not there/used. :) You might want to transcribe x that way in an emergency, if your x key breaks, until you get a new keyboard. You probably don't want to throw away the x to save space on a keyboard though. And norwegian transcriptions aren't used for the same reasons. I have only seen two cases of such transcription: 1. Names of norwegian athletes in international sports events. Which looks real silly. And completely unnecessary. Sport computer systems these days handle more than a-z, the names are spelled correctly in national events after all. And it is not as if foreigners get big problems with an ø. If they don't know what the slash is for, they can read it as o, and so on. Similar to how I read french - I have no idea what the difference between à and á is. Both are a to me. just like my example above - but i guess i was being stricter. the stodgy old banking system isn't going to go adapt like modern sports data systems. it's go roman - or go home. :) 2. Expert computer users sometimes use the transcriptions, because they often use the latest equipment before keyboards get fixed and before ascii-only limitations are sorted out. Some of them are tired of fighting and give up. And they have actually heard about the concept of transcription! But mainstream users get equipment with proper keyboards, anything less is an unfinished product. You won't find an ascii keyboard in a norwegian shop. hmm. how interesting.
i have always been baffled why there is a UK qwerty layout vs US - the UK is the only place that uses it... all other english speaking countries i know use US qwerty (and if UK qwerty was nicely killed off.. it wouldn't need to be US qwerty - just qwerty) :) ok - but there is a way to do this. when stuck on your friend's pc when visiting them in california, and they don't have compose-modes enabled... how do you type æ and ø etc. that was basically the q - there must be some accepted mechanism for decimation/conversion. seemingly it's the obvious: æ - ae, ø - o etc. :) if something can be romanised - it can have a romanised match in a dictionary and thus suggest the appropriate matches. of course now the dictionary determines these rules implicitly by content, not by code specifically enforcing such rules. :) but yes - selecting dictionary is needed so selecting a keyboard for that language as well as dictionary is useful. it still adds a few keys - thus squashing the keyboard some more :( i was hoping to avoid that. English can work with 10 keys in a row, norwegian needs 11. :-) The solution then is different keyboards, those who don't need
Re: [SHR] illume predictive keyboard is too slow
On Fri, Jan 30, 2009 at 8:12 PM, The Rasterman Carsten Haitzler ras...@rasterman.com wrote: On Fri, 30 Jan 2009 08:31:43 +0100 Olof Sjobergh olo...@gmail.com said: But I think a dictionary format in plain utf8 that includes the normalised words as well as any candidates to display would be the best way. Then the dictionary itself could choose which characters to normalise and which to leave as is. So for Swedish, you can leave å, ä and ö as they are but normalise é, à etc. Searching would be as simple as in your original implementation (no need to convert from multibyte format). the problem is - the dict in utf8 means searching is slow as you do it in utf8 space. the dict is mmap()ed to save ram - if it wasn't it'd need to be allocated in non-swappable ram (it's a phone - it has no swap) and thus a few mb of your ram goes into the kbd dict at all times. by using mmap you leave it to the kernel's paging system to figure it out. so as such a dict change will mean a non-ascii format in future for this reason. but there will then need to be a tool to generate such a file. Searching in utf8 doesn't mean it has to be slow. Simple strcmp works fine on multibyte utf8 strings as well, and should be as fast as the dictionary was before adding multibyte to widechars conversions. But if you have some other idea in mind, please don't let me disturb. =) Best regards, Olof Sjöbergh
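Olof's claim that plain strcmp is safe on multibyte UTF-8 rests on a real property of the encoding: comparing UTF-8 strings byte by byte orders them exactly as comparing their Unicode code points would. A quick editorial check (illustrative only, not code from the thread):

```python
# Larger code points encode to lexicographically larger byte sequences in
# UTF-8, so what C's strcmp sees agrees with code-point order.
words = ["bil", "fôr", "før", "zebra", "å"]
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
by_code_points = sorted(words)  # Python compares strings code point by code point
assert by_bytes == by_code_points
print(by_bytes)
```

So equality tests and ordered searches over a UTF-8 dictionary work unchanged; what stays awkward, as raster notes, is stepping through the match tree one *character* at a time, since a character is a variable number of bytes.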
Re: [SHR] illume predictive keyboard is too slow
2009/1/29 Olof Sjobergh olo...@gmail.com I think most problems could be solved by using a dictionary format similar to what you describe above, i.e. something like: match : candidate1 candidate2; frequency for example: vogel : Vogel Vögel; 123 That would mean you can search on the normalised word where simple strcmp works fine and will be fast enough. This dictionary would have hundreds of millions of rows even if you take only reasonable user inputs. But what to do if the user inputs something that's not in the dictionary? Of course I'm assuming you want to correct typos, as it's doing now. vogel: Vogel, Vögel vigel: Vogel, Vögel vpgel: Vogel, Vögel wogel: Vogel, Vögel wigel: Vogel, Vögel vigem: Vogel, Vögel vigwl: Vogel, Vögel ... ...
Re: [SHR] illume predictive keyboard is too slow
This dictionary would have hundreds of millions of rows even if you take only reasonable user inputs. why would that be? colloquial language (and that's what is to be considered) contains only several thousand words, still a lot but far away from millions. But what to do if the user inputs something that's not in the dictionary? but that's a problem with every dictionary -- you never can contain every possible word. i don't use the keyboard and i do not follow the discussion closely, but what always struck me odd was the use of a text file. why not use a db? it would enable learning, too.
Re: [SHR] illume predictive keyboard is too slow
On Thu, Jan 29, 2009 at 10:18 AM, Michal Brzozowski ruso...@poczta.fm wrote: 2009/1/29 Olof Sjobergh olo...@gmail.com I think most problems could be solved by using a dictionary format similar to what you describe above, i.e. something like: match : candidate1 candidate2; frequency for example: vogel : Vogel Vögel; 123 That would mean you can search on the normalised word where simple strcmp works fine and will be fast enough. This dictionary would have hundreds of millions of rows even if you take only reasonable user inputs. But what to do if the user inputs something that's not in the dictionary? Of course I'm assuming you want to correct typos, as it's doing now. vogel: Vogel, Vögel vigel: Vogel, Vögel vpgel: Vogel, Vögel wogel: Vogel, Vögel wigel: Vogel, Vögel vigem: Vogel, Vögel vigwl: Vogel, Vögel ... ... I did not mean all possible misspellings should be included, only the normalisation which removes accented chars etc. So for normal English, there would be almost no extra size compared to now. The current way of correcting typos by checking all combinations from neighbouring keys would work just like today. Best regards, Olof Sjöbergh
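The normalisation Olof describes, strip the accents a language does not treat as distinct letters while keeping the ones it does (å, ä, ö for Swedish), can be sketched with Unicode decomposition. An editorial illustration; the function name and the keep parameter are invented:

```python
import unicodedata

def normalise(word, keep=frozenset()):
    """Decompose each character (NFD) and drop its combining accents,
    except for letters the language treats as distinct (the keep set)."""
    out = []
    for ch in word:
        if ch in keep:
            out.append(ch)  # e.g. Swedish å/ä/ö stay as they are
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed
                               if not unicodedata.combining(c)))
    return "".join(out)

swedish = frozenset("åäöÅÄÖ")
print(normalise("Vögel"))               # -> Vogel
print(normalise("idé", keep=swedish))   # -> ide (é normalised, å/ä/ö would not be)
```

Note this only covers combining-accent letters; characters like ø and æ have no decomposition, which is exactly the Norwegian complaint elsewhere in the thread that they are letters of their own, not accented o/a.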
Re: [SHR] illume predictive keyboard is too slow
2009/1/29 Olof Sjobergh olo...@gmail.com I did not mean all possible misspellings should be included, only the normalisation which removes accented chars etc. So for normal English, there would be almost no extra size compared to now. The current way of correcting typos by checking all combinations from neighbouring keys would work just like today. Ok, now I understand. This is a very good idea then. Is there any explanation available on how the keyboard does typo correcting? I mean the algorithm it uses.
Re: [SHR] illume predictive keyboard is too slow
Carsten Haitzler (The Rasterman) wrote: i was hoping to be able to keep a SIMPLE ascii qwerty keyboard for as much as possible - so you can just type and it will work and offer the selections as it's trying to guess anyway - it can present the multiple accented versions too. this limits the need for special keyboards - doesn't obviate it, but allows more functionality out of the box. in the event users explicitly select an accented char - ie a non-ascii character, it should not decimate. it should try match exactly that char. We will still need to select the correct dictionary for the language somewhere. It is no more work if this also selects a keyboard layout adapted to that language. I can see why you want a simple keyboard with fewer keys - the keys can be bigger and so there will be fewer finger-misses. I don't see any reason why it should be limited to ascii though - that division does not seem natural to me. An example from the Norwegian language: The letter ô is rarely used, and everybody thinks about it as an o with a hat on it. So this one fits your scheme - type o and ô will be suggested in the few cases where it is appropriate. But the three vowels æøå are different. They are letters of their own, they are not seen as modifications of a/o, even if that may be historically correct. These three have their own names and their own places in the alphabet (after z). An å is not merely an a with ring, no more than the E is an F with an extra line attached. The ø is not merely an o with a slash either. Many people don't know that æ originated as an ae ligature. æ and ae can both occur in words, but the pronunciation is different and they are not interchangeable. So when Norwegians type, they expect to see the 29 letters of their alphabet: abcdefghijklmnopqrstuvwxyzæøå. ô and é are sometimes useful too, but these are just o and e with modifications. æøå however, are parts of the base alphabet. Just like abc.
A keyboard without æøå is assumed not to support Norwegian. I hope things like this will be possible, if a new dictionary format is realized. It is ok if typing for suggests fôr as an alternative, but før should not come up unless the user types f ø r. In which case o must not be suggested... Helge Hafting
Re: [SHR] illume predictive keyboard is too slow
On Thursday 29 January 2009, Michal Brzozowski wrote: 2009/1/29 Olof Sjobergh olo...@gmail.com I did not mean all possible misspellings should be included, only the normalisation which removes accented chars etc. So for normal English, there would be almost no extra size compared to now. The current way of correcting typos by checking all combinations from neighbouring keys would work just like today. Ok, now I understand. This is a very good idea then. Is there any explanation available on how the keyboard does typo correcting? I mean the algorithm it uses. The wiki page links to a thread where Raster explains the process in great detail. http://wiki.openmoko.org/wiki/Illume_keyboard http://lists.openmoko.org/nabble.html#nabble-td2115715
Re: [SHR] illume predictive keyboard is too slow
On Thu, 29 Jan 2009 12:19:38 +0100 arne anka openm...@ginguppin.de said: This dictionary would have hundreds of millions of rows even if you take only reasonable user inputs. why would that be? colloquial language (and that's what is to be considered) contains only several thousand words, still a lot but far away from millions. But what to do if the user inputs something that's not in the dictionary? but that's a problem with every dictionary -- you never can contain every possible word. i don't use the keyboard and i do not follow the discussion closely, but what always struck me odd was the use of a text file. why not use a db? it would enable learning, too. sheer simplicity and dependencies. a db would mean selecting one. gdbm is gpl. libdb is fine - but they love to break db format every few releases and that'd royally suck. also these lean toward key/value pairs - and that means you need to GENERATE all possible permutations (which is prohibitively expensive) so the dict also affects the lookup as you simply avoid generating permutations you know will never have any matches (ie nothing starts with qz... so never worry about all the qz* permutations). the best suggestion is a trie - but i need a format i can access really quickly - and a library that isn't license or otherwise restricted, easy to use, doesn't eat much ram at all, and is fast. invariably you never get that - it either eats ram or is slow, or something else. so what i did is just use a simple format easy to generate with a small one-liner shell command and index it on the fly for quick lookups in a tiny 2-level index. it of course is not incredibly fast - but it uses a tiny amount of precious ram. making it a text file opens the gate to easy generation of new dicts - and i wanted to keep that as easy as possible.
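The mmap-plus-tiny-index approach raster describes can be sketched like this. An editorial illustration, not the actual illume code; a first-occurrence offset per two-byte prefix stands in for his on-the-fly 2-level index:

```python
import mmap, os, tempfile

def build_prefix_index(path):
    """One pass over a sorted word-per-line file, recording the byte
    offset where each two-byte prefix first appears. The file stays
    mmap()ed, so the kernel pages it in and out as needed instead of
    the dictionary pinning a few MB of non-swappable RAM."""
    index = {}
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        pos = 0
        while pos < len(mm):
            end = mm.find(b"\n", pos)
            if end == -1:
                end = len(mm)
            index.setdefault(mm[pos:pos + 2], pos)  # first offset for prefix
            pos = end + 1
        mm.close()
    return index

# tiny demo dictionary
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"aardvark\nable\nbat\nbee\n")
index = build_prefix_index(tmp.name)
print(index)  # {b'aa': 0, b'ab': 9, b'ba': 14, b'be': 18}
os.unlink(tmp.name)
```

A lookup seeks to index[word[:2]] and scans forward only while the prefix still matches, so almost none of the file needs to be resident at once, which is the RAM/speed trade-off being defended here.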
Re: [SHR] illume predictive keyboard is too slow
On Thu, 29 Jan 2009 14:32:48 +0100 Helge Hafting helge.haft...@hist.no said: Carsten Haitzler (The Rasterman) wrote: i was hoping to be able to keep a SIMPLE ascii qwerty keyboard for as much as possible - so you can just type and it will work and offer the selections as it's trying to guess anyway - it can present the multiple accented versions too. this limits the need for special keyboards - doesn't obviate it, but allows more functionality out of the box. in the event users explicitly select an accented char - ie a non-ascii character, it should not decimate. it should try match exactly that char. We will still need to select the correct dictionary for the language somewhere. It is no more work if this also selects a keyboard layout adapted to that language. I can see why you want a simple keyboard with fewer keys - the keys can be bigger and so there will be fewer finger-misses. I don't see any reason why it should be limited to ascii though - that division does not seem natural to me. An example from the Norwegian language: The letter ô is rarely used, and everybody thinks about it as an o with a hat on it. So this one fits your scheme - type o and ô will be suggested in the few cases where it is appropriate. But the three vowels æøå are different. They are letters of their own, they are not seen as modifications of a/o, even if that may be historically correct. These three have their own names and their own places in the alphabet (after z). An å is not merely an a with ring, no more than the E is an F with an extra line attached. The ø is not merely an o with a slash either. Many people don't know that æ originated as an ae ligature. æ and ae can both occur in words, but the pronunciation is different and they are not interchangeable. So when Norwegians type, they expect to see the 29 letters of their alphabet: abcdefghijklmnopqrstuvwxyzæøå. ô and é are sometimes useful too, but these are just o and e with modifications.
æøå however, are parts of the base alphabet. Just like abc. A keyboard without æøå is assumed not to support Norwegian. I hope things like this will be possible, if a new dictionary format is realized. It is ok if typing for suggests fôr as an alternative, but før should not come up unless the user types f ø r. In which case o must not be suggested... ok - how do you romanise norwegian then? example. in german ö - oe, ü - ue, ß - ss, etc. - there is a set of romanisation rules that can convert any such char to 1 or more roman letters. i was hoping to be even more lenient with ö - o being valid too for the lazy :) japanese has romanisation rules - so does chinese... norwegian must (eg æ - ae for example). if something can be romanised - it can have a romanised match in a dictionary and thus suggest the appropriate matches. of course now the dictionary determines these rules implicitly by content, not by code specifically enforcing such rules. :) but yes - selecting dictionary is needed so selecting a keyboard for that language as well as dictionary is useful. it still adds a few keys - thus squashing the keyboard some more :( i was hoping to avoid that. note - the keyboard is by no means limited to ascii at all - it's perfectly able to have accented/other keys added to layouts - so i'm considering this problem solved as its simply a matter of everyone agreeing to make a .kbd for their language - should they need one other than the default qwerty (ascii) one. so from this point of view - that's solved. what isn't done yet is: 1. a kbd being able to hint at wanting a specific dictionary language (or vice-versa). 2. dictionary itself being able to hint to have a specific kbd layout. 3. applications not being able to hint for a specific language for input (and thus dictionary and/or kbd). so there needs to be a tie-in between language, dict and kbd - which one drives what... is the question. it needs to not BREAK things like terminal kbd etc. 
- ie i can stay with norwegian as my language but if i select the terminal kbd - it will stay there and not suddenly flip back to the simple kbd layout. number/symbol entry similarly. this bit of things is currently undefined and unimplemented. the other is improved dictionary format. the problem is - if we go make the dict smarter... how on earth do you GENERATE such a dictionary. i sure as hell am not hand-writing a whole dictionary... and i doubt anyone here will - it could be a large community effort to build a full one for each language - but that will take time. you need to enter all words, all matches, conjugations, and then frequency info too. the simple dict english can use is much easier - it can be auto-generated from input text. just throw a (text version) of a book - or newspaper or documentation - it can just index every word it finds and even count frequency usage. that's easy to automate the production of such a dict (and that is why the
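The auto-generation raster describes for the simple dictionary, feed it any plain-text book and index every word it finds with a usage count, is only a few lines. An editorial sketch; the tokenising regexp and function name are assumptions:

```python
import collections, re

def build_wordlist(text):
    """Index every word found in a plain-text corpus and count how
    often each occurs, yielding a word list plus frequency info."""
    words = re.findall(r"[a-zåäöæøé']+", text.lower())
    return collections.Counter(words)

freq = build_wordlist("The cat sat on the mat. The dog sat too.")
print(freq.most_common(2))  # [('the', 3), ('sat', 2)]
```

Dumping the counter out one word per line gives exactly the kind of text file the current keyboard consumes, which is why a book or newspaper is enough input; the richer match/candidate/conjugation format discussed above has no such easy generator, which is the objection being raised.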
Re: [SHR] illume predictive keyboard is too slow
On Thu, 29 Jan 2009 08:30:44 +0100 Olof Sjobergh olo...@gmail.com said: On Wed, Jan 28, 2009 at 11:16 PM, The Rasterman Carsten Haitzler ras...@rasterman.com wrote: On Wed, 28 Jan 2009 18:59:32 +0100 Marco Trevisan (Treviño) m...@3v1n0.net said: Olof Sjobergh wrote: Unless I missed something big (which I hope I didn't, but I wouldn't be surprised if I did), this is not fixable with the current dictionary lookup design. Raster talked about redesigning the dictionary format, so I guess we have to wait until he gets around to it (or someone else does it). I think that too. Maybe using something like a trie [1] to archive the words could help (both for words matching and for compressing the dictionary). Too hard? [1] http://en.wikipedia.org/wiki/Trie the problem here comes with having multiple displays for a single match. let me take japanese as an example (i hope you have the fonts to see this at least - though there is no need to understand beyond knowing that there are a lot of matches that are visibly different): sakana - さかな 茶菓な 肴 魚 サカナ 坂な 差かな 左かな 査かな 鎖かな 鎖かな unlike simple decimation of é - e and ë - e and è - e etc. you need 1 ascii input string matching one of MANY very different matches. the european case of vogel - Vogel Vögel is a simplified version of the above. the reason i wanted decimation to match a simple roman text (ascii) string is - that this is a pretty universal thing. thats how japanese, chinese and even some korean input methods work. it also works for european languages too. europeans are NOT used to the idea of a dictionary guessing/selecting system when they type - but the asians are. they are always typing and selecting. the smarts come with the dictionary system selecting the right one more often than not by default or the right selection you want being only 1 or 2 keystrokes away. 
i was hoping to be able to keep a SIMPLE ascii qwerty keyboard for as much as possible - so you can just type and it will work and offer the selections as it's trying to guess anyway - it can present the multiple accented versions too. this limits the need for special keyboards - it doesn't obviate them, but allows more functionality out of the box. in the event users explicitly select an accented char - ie a non-ascii character - it should not decimate. it should try to match exactly that char. so if you add those keys and use them, or flip to another key layout to select them - you get what you expect.

but if i am to redo the dict - the api is very generic - just the internals and format need changing to be able to do the above. the cool bit is.. if i manage the above... it has almost solved asian languages too - and input methods... *IF* the vkbd is also able to talk to a complex input method (XIM/SCIM/UIM etc.) as keystroke faking won't let you type chinese characters... :) but in principle the dictionary and lookup scheme will work - it's then just mechanics of sending the data to the app in a way it can use it.

so back to the trie... the trie would only be useful for the ascii matching - i need something more complex. a trie just combines the data with the match tree (letters are inline). i need a match tree + lookup table to other matches to display - and possibly several match entries (all the matches to display also need to be in the tree, pointing to a smaller match list).

-- - Codito, ergo sum - I code, therefore I am -- The Rasterman (Carsten Haitzler) ras...@rasterman.com

I think most problems could be solved by using a dictionary format similar to what you describe above, i.e. something like:

match : candidate1 candidate2; frequency

for example:

vogel : Vogel Vögel; 123

That would mean you can search on the normalised word, where a simple strcmp works fine and will be fast enough.
To not make it too large, for example the following syntax could also be accepted:

eat; 512 // No candidates, just show the match as is
har här hår; 1234 // Also show the match itself as a candidate

If you think this would be good enough, I could try to implement it.

Another problem with languages like Swedish, and also Japanese, is the heavy use of conjugation. For example, in Japanese the verbs 食べる and 考える can both be conjugated in the same way like this:

食べる 食べました 食べた 食べている 食べていた 食べています 食べていました
考える 考えました 考えた 考えている 考えていた 考えています 考えていました

Another example, the Swedish nouns: bil bilen bilar bilarna bilens bilarnas

But including all these forms in a dictionary makes it very large, which is impractical. So some way to indicate possible conjugations would be good, but it would make the dictionary format a lot more complex.

the real problem is... how on EARTH will such a dictionary get written? who will write all of that? the advantage to the simple "just list lots of words and ALL their forms" approach is that it's easy - it can be generated by
[SHR] illume predictive keyboard is too slow
I tried to write something with the illume keyboard within SHR unstable and it is too slow to be usable! Is there a way to fix it? Within the previous SHR testing it was working quite well! thanks

-- Be Yourself @ mail.com! Choose From 200+ Email Addresses Get a Free Account at www.mail.com

___ Openmoko community mailing list community@lists.openmoko.org http://lists.openmoko.org/mailman/listinfo/community
Re: [SHR] illume predictive keyboard is too slow
On Wednesday 28 January 2009, Giorgio Marciano wrote:

I tried to write something with the illume keyboard within SHR unstable and it is too slow to be usable! Is there a way to fix it? Within the previous SHR testing it was working quite well!

That's my UTF8 fix [1] that's causing the slowness, I'm afraid. Unfortunately I'm very very busy ATM and therefore I'm unable to work on it. It could either be the latin to UTF16 code which is slow, or another bug I introduced (causing excessive lookups, for example).

Cheers, Florian

[1] http://trac.enlightenment.org/e/changeset/38274

-- DI Florian Hackenberger flor...@hackenberger.at www.hackenberger.at
Re: [SHR] illume predictive keyboard is too slow
On Wed, Jan 28, 2009 at 11:53 AM, Florian Hackenberger f.hackenber...@chello.at wrote:

That's my UTF8 fix [1] that's causing the slowness, I'm afraid. Unfortunately I'm very very busy ATM and therefore I'm unable to work on it. It could either be the latin to UTF16 code which is slow, or another bug I introduced (causing excessive lookups, for example).

I looked into this issue when my Swedish keyboard didn't work correctly. I found some issues and some parts that could be improved, and sent a patch with these fixes to the enlightenment devel list. However, even after fixing everything I could find, it's still a bit slow. The problem seems to be the conversion to utf16 for each and every strcmp when doing the lookup. Unless I missed something big (which I hope I didn't, but I wouldn't be surprised if I did), this is not fixable with the current dictionary lookup design. Raster talked about redesigning the dictionary format, so I guess we have to wait until he gets around to it (or someone else does it).

Best regards, Olof Sjobergh
Re: [SHR] illume predictive keyboard is too slow
Olof Sjobergh wrote:

On Wed, Jan 28, 2009 at 11:53 AM, Florian Hackenberger f.hackenber...@chello.at wrote:

That's my UTF8 fix [1] that's causing the slowness, I'm afraid. Unfortunately I'm very very busy ATM and therefore I'm unable to work on it. It could either be the latin to UTF16 code which is slow, or another bug I introduced (causing excessive lookups, for example).

I looked into this issue when my Swedish keyboard didn't work correctly. I found some issues and some parts that could be improved, and sent a patch with these fixes to the enlightenment devel list. However, even after fixing everything I could find, it's still a bit slow. The problem seems to be the conversion to utf16 for each and every strcmp when doing the lookup. Unless I missed something big (which I hope I didn't, but I wouldn't be surprised if I did), this is not fixable with the current dictionary lookup design. Raster talked about redesigning the dictionary format, so I guess we have to wait until he gets around to it (or someone else does it).

The obvious fix is to store the dictionary in such a format that conversions won't be necessary. Not sure why utf16 is being used; utf8 is more compact and works so well for everything else in linux.

Helge Hafting
Re: [SHR] illume predictive keyboard is too slow
On Wed, Jan 28, 2009 at 2:05 PM, Helge Hafting helge.haft...@hist.no wrote:

The obvious fix is to store the dictionary in such a format that conversions won't be necessary. Not sure why utf16 is being used; utf8 is more compact and works so well for everything else in linux.

Yes, the obvious fix is to change the dictionary format. However, it's not as simple as you might think. The dictionary today is stored in utf8, not utf16. But the dictionary lookup tries to match words that are not exactly the same as the input word; for example e should also match é, è and ë. To do this, every character in the input string, and every character of each word, has to be normalised to ascii. Since in utf8 a single character can take up multiple bytes, to normalise a word it's first converted to utf16, where all characters are the same size, and then a simple lookup table can be used for each character. But converting from multibyte format each time a string is compared to another adds overhead.

With a different dictionary format where all words are stored already normalised, there would be no need for all the conversions. But then you also have to store all possible conversions for each word, so the format would be more complicated.

Best regards, Olof Sjobergh
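For illustration, here is a rough Python sketch of the normalisation ("decimation") step the thread keeps referring to. It uses Unicode NFD decomposition to strip accents instead of illume's UTF-16 lookup table, so it is an approximation of the idea, not a port of the real code:

```python
import unicodedata

def decimate(word):
    # Decompose characters (é -> e + combining accent), then drop the
    # combining marks, leaving only the plain base letters.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# If the dictionary stores words pre-decimated, this runs once per
# keystroke on the input - not once per strcmp during the lookup.
print(decimate("Vögel"))  # -> Vogel
print(decimate("héllo"))  # -> hello
```

Note that letters without a canonical decomposition (such as Norwegian ø) survive this pass unchanged, which is one reason a real implementation needs an explicit per-character mapping table rather than decomposition alone.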
Re: [SHR] illume predictive keyboard is too slow
Olof Sjobergh wrote:

On Wed, Jan 28, 2009 at 2:05 PM, Helge Hafting helge.haft...@hist.no wrote:

The obvious fix is to store the dictionary in such a format that conversions won't be necessary. Not sure why utf16 is being used; utf8 is more compact and works so well for everything else in linux.

Yes, the obvious fix is to change the dictionary format. However, it's not as simple as you might think. The dictionary today is stored in utf8, not utf16. But the dictionary lookup tries to match words that are not exactly the same as the input word; for example e should also match é, è and ë. To do this, every

I see. This is done to avoid needing a few extra keys for accents and umlauts? Won't that create problems for languages where two words differ only in accents? In Norwegian, there are many such pairs. Examples: for/fôr, tå/ta, dør/dor, ...

Helge Hafting
Re: [SHR] illume predictive keyboard is too slow
Olof Sjobergh wrote:

Unless I missed something big (which I hope I didn't, but I wouldn't be surprised if I did), this is not fixable with the current dictionary lookup design. Raster talked about redesigning the dictionary format, so I guess we have to wait until he gets around to it (or someone else does it).

I think that too. Maybe using something like a trie [1] to archive the words could help (both for word matching and for compressing the dictionary). Too hard?

[1] http://en.wikipedia.org/wiki/Trie

-- Treviño's World - Life and Linux http://www.3v1n0.net/
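Marco's suggestion is easy to sketch. The following is a minimal, hypothetical Python trie (not tied to the illume code in any way) that stores ascii-normalised keys with per-word frequencies and returns completions for a typed prefix, best match first:

```python
class Trie:
    def __init__(self):
        self.children = {}  # char -> Trie
        self.freq = 0       # > 0 when a complete word ends at this node

    def insert(self, word, freq):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.freq = freq

    def complete(self, prefix):
        # Walk down to the prefix node, then collect every word below it.
        node = self
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        out = []
        def collect(n, acc):
            if n.freq:
                out.append((acc, n.freq))
            for ch, child in sorted(n.children.items()):
                collect(child, acc + ch)
        collect(node, prefix)
        # Highest-frequency candidates first.
        return sorted(out, key=lambda t: -t[1])

t = Trie()
for w, f in [("har", 1234), ("hard", 300), ("hat", 50)]:
    t.insert(w, f)
print(t.complete("ha"))  # -> [('har', 1234), ('hard', 300), ('hat', 50)]
```

Because common prefixes are shared, this also gives the compression Marco mentions; what it does not give by itself is the one-key-to-many-displays mapping raster raises next.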
Re: [SHR] illume predictive keyboard is too slow
On Wed, Jan 28, 2009 at 5:50 PM, Helge Hafting helge.haft...@hist.no wrote:

I see. This is done to avoid needing a few extra keys for accents and umlauts? Won't that create problems for languages where two words differ only in accents? In Norwegian, there are many such pairs. Examples: for/fôr, tå/ta, dør/dor, ...

Yes, that's a problem I ran into with Swedish as well. We have for example har/här/hår etc. But with a good dictionary it actually works ok, if not optimally. For these words you have to select the one you want from the matches, which is a little annoying but not a total show-stopper. To fix it, you would need either different normalisation tables for each language, or a new dictionary format. Raster said in an earlier mail on the list that he'd fix it someday but has a lot of other stuff to look at now. So I guess we have to be patient for now.

Best regards, Olof Sjobergh
Re: [SHR] illume predictive keyboard is too slow
On Wed, 28 Jan 2009 18:59:32 +0100 Marco Trevisan (Treviño) m...@3v1n0.net said:

Olof Sjobergh wrote:

Unless I missed something big (which I hope I didn't, but I wouldn't be surprised if I did), this is not fixable with the current dictionary lookup design. Raster talked about redesigning the dictionary format, so I guess we have to wait until he gets around to it (or someone else does it).

I think that too. Maybe using something like a trie [1] to archive the words could help (both for word matching and for compressing the dictionary). Too hard?

[1] http://en.wikipedia.org/wiki/Trie

the problem here comes with having multiple displays for a single match. let me take japanese as an example (i hope you have the fonts to see this at least - though there is no need to understand beyond knowing that there are a lot of matches that are visibly different):

sakana - さかな 茶菓な 肴 魚 サカナ 坂な 差かな 左かな 査かな 鎖かな 鎖かな

unlike simple decimation of é - e and ë - e and è - e etc. you need 1 ascii input string matching one of MANY very different matches. the european case of vogel - Vogel Vögel is a simplified version of the above. the reason i wanted decimation to match a simple roman text (ascii) string is that this is a pretty universal thing. that's how japanese, chinese and even some korean input methods work. it also works for european languages too. europeans are NOT used to the idea of a dictionary guessing/selecting system when they type - but the asians are. they are always typing and selecting. the smarts come with the dictionary system selecting the right one more often than not by default, or the right selection you want being only 1 or 2 keystrokes away.

i was hoping to be able to keep a SIMPLE ascii qwerty keyboard for as much as possible - so you can just type and it will work and offer the selections as it's trying to guess anyway - it can present the multiple accented versions too.
this limits the need for special keyboards - it doesn't obviate them, but allows more functionality out of the box. in the event users explicitly select an accented char - ie a non-ascii character - it should not decimate. it should try to match exactly that char. so if you add those keys and use them, or flip to another key layout to select them - you get what you expect.

but if i am to redo the dict - the api is very generic - just the internals and format need changing to be able to do the above. the cool bit is.. if i manage the above... it has almost solved asian languages too - and input methods... *IF* the vkbd is also able to talk to a complex input method (XIM/SCIM/UIM etc.) as keystroke faking won't let you type chinese characters... :) but in principle the dictionary and lookup scheme will work - it's then just mechanics of sending the data to the app in a way it can use it.

so back to the trie... the trie would only be useful for the ascii matching - i need something more complex. a trie just combines the data with the match tree (letters are inline). i need a match tree + lookup table to other matches to display - and possibly several match entries (all the matches to display also need to be in the tree, pointing to a smaller match list).

-- - Codito, ergo sum - I code, therefore I am -- The Rasterman (Carsten Haitzler) ras...@rasterman.com
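The "match tree + lookup table" idea can be sketched speculatively like this (plain Python, purely illustrative - the names and structures are invented for the example, not taken from illume): the tree is keyed on decimated ascii, and each terminal entry points into a separate table of display candidates, so one ascii key like "vogel" or "sakana" can fan out to many visibly different matches:

```python
# Candidate table: id -> list of (display form, frequency).
candidates = {
    0: [("Vogel", 123), ("Vögel", 45)],
    1: [("さかな", 900), ("魚", 800), ("肴", 120), ("サカナ", 60)],
}

match_tree = {}  # nested dicts acting as the ascii match tree

def insert(key, cand_id):
    node = match_tree
    for ch in key:
        node = node.setdefault(ch, {})
    node["$"] = cand_id  # terminal marker pointing into the table

def lookup(key):
    node = match_tree
    for ch in key:
        if ch not in node:
            return []
        node = node[ch]
    cid = node.get("$")
    if cid is None:
        return []
    # Offer display candidates sorted by stored frequency, best first.
    return [w for w, f in sorted(candidates[cid], key=lambda t: -t[1])]

insert("vogel", 0)
insert("sakana", 1)
print(lookup("sakana")[:2])  # -> ['さかな', '魚']
```

Keeping the display forms out of the tree itself is what lets several keys share one candidate list, which is the "smaller match list" point in the mail.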
Re: [SHR] illume predictive keyboard is too slow
On Wed, Jan 28, 2009 at 11:16 PM, The Rasterman Carsten Haitzler ras...@rasterman.com wrote:

On Wed, 28 Jan 2009 18:59:32 +0100 Marco Trevisan (Treviño) m...@3v1n0.net said:

Olof Sjobergh wrote:

Unless I missed something big (which I hope I didn't, but I wouldn't be surprised if I did), this is not fixable with the current dictionary lookup design. Raster talked about redesigning the dictionary format, so I guess we have to wait until he gets around to it (or someone else does it).

I think that too. Maybe using something like a trie [1] to archive the words could help (both for words matching and for compressing the dictionary). Too hard?

[1] http://en.wikipedia.org/wiki/Trie

the problem here comes with having multiple displays for a single match. let me take japanese as an example (i hope you have the fonts to see this at least - though there is no need to understand beyond knowing that there are a lot of matches that are visibly different):

sakana - さかな 茶菓な 肴 魚 サカナ 坂な 差かな 左かな 査かな 鎖かな 鎖かな

unlike simple decimation of é - e and ë - e and è - e etc. you need 1 ascii input string matching one of MANY very different matches. the european case of vogel - Vogel Vögel is a simplified version of the above. the reason i wanted decimation to match a simple roman text (ascii) string is - that this is a pretty universal thing. thats how japanese, chinese and even some korean input methods work. it also works for european languages too. europeans are NOT used to the idea of a dictionary guessing/selecting system when they type - but the asians are. they are always typing and selecting. the smarts come with the dictionary system selecting the right one more often than not by default or the right selection you want being only 1 or 2 keystrokes away.
i was hoping to be able to keep a SIMPLE ascii qwerty keyboard for as much as possible - so you can just type and it will work and offer the selections as it's trying to guess anyway - it can present the multiple accented versions too. this limits the need for special keyboards - it doesn't obviate them, but allows more functionality out of the box. in the event users explicitly select an accented char - ie a non-ascii character - it should not decimate. it should try to match exactly that char. so if you add those keys and use them, or flip to another key layout to select them - you get what you expect.

but if i am to redo the dict - the api is very generic - just the internals and format need changing to be able to do the above. the cool bit is.. if i manage the above... it has almost solved asian languages too - and input methods... *IF* the vkbd is also able to talk to a complex input method (XIM/SCIM/UIM etc.) as keystroke faking won't let you type chinese characters... :) but in principle the dictionary and lookup scheme will work - it's then just mechanics of sending the data to the app in a way it can use it.

so back to the trie... the trie would only be useful for the ascii matching - i need something more complex. a trie just combines the data with the match tree (letters are inline). i need a match tree + lookup table to other matches to display - and possibly several match entries (all the matches to display also need to be in the tree, pointing to a smaller match list).

-- - Codito, ergo sum - I code, therefore I am -- The Rasterman (Carsten Haitzler) ras...@rasterman.com

I think most problems could be solved by using a dictionary format similar to what you describe above, i.e. something like:

match : candidate1 candidate2; frequency

for example:

vogel : Vogel Vögel; 123

That would mean you can search on the normalised word, where a simple strcmp works fine and will be fast enough.
To not make it too large, for example the following syntax could also be accepted:

eat; 512 // No candidates, just show the match as is
har här hår; 1234 // Also show the match itself as a candidate

If you think this would be good enough, I could try to implement it.

Another problem with languages like Swedish, and also Japanese, is the heavy use of conjugation. For example, in Japanese the verbs 食べる and 考える can both be conjugated in the same way like this:

食べる 食べました 食べた 食べている 食べていた 食べています 食べていました
考える 考えました 考えた 考えている 考えていた 考えています 考えていました

Another example, the Swedish nouns: bil bilen bilar bilarna bilens bilarnas

But including all these forms in a dictionary makes it very large, which is impractical. So some way to indicate possible conjugations would be good, but it would make the dictionary format a lot more complex.

Best regards, Olof Sjöbergh
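To make the proposed format concrete, here is a small, hypothetical parser for it in Python (the syntax is exactly as sketched in the mail; the function name and tuple layout are made up for the example):

```python
def parse_line(line):
    """Parse one line of the proposed 'match : candidates; frequency' format."""
    line = line.split("//")[0].strip()  # drop trailing // comments
    if not line:
        return None
    entry, freq_str = line.rsplit(";", 1)
    freq = int(freq_str)
    if ":" in entry:
        # Explicit candidate list: "vogel : Vogel Vögel; 123"
        match, cands = entry.split(":", 1)
        candidates = cands.split()
    else:
        # Shorthand forms: "eat; 512" and "har här hår; 1234" -
        # the first word is the match and is itself shown as a candidate.
        words = entry.split()
        match, candidates = words[0], words
    return match.strip(), candidates, freq

print(parse_line("vogel : Vogel Vögel; 123"))
print(parse_line("eat; 512 // No candidates, just show the match as is"))
```

Since every lookup key in this format is already normalised, the per-comparison UTF-16 conversion discussed earlier in the thread disappears entirely; the cost is moved to dictionary-build time.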