Re: [SHR] illume predictive keyboard is too slow

2009-02-06 Thread Helge Hafting
Carsten Haitzler (The Rasterman) wrote:
 On Thu, 05 Feb 2009 15:57:41 +0100 Helge Hafting helge.haft...@hist.no said:
 
 Carsten Haitzler (The Rasterman) wrote:

 Surely, when there is a keyboard anyway, a couple of extra keys won't
 cost much. Not if they are on all phones, instead of only adapted
 ones. The Americans can use the extras as application hotkeys.
 oh it's not the extra keys - it's the variations in production.
 I know. Which is why I suggest one single keyboard for all, with
 the maximum number of keys instead of the minimum. That way, every
 language (at least every Latin-based language) can have a normal keyboard.

 No problem for the English - it will work fine. Their extra keys can be
 blank, or used as hotkeys. Users with other languages can add whatever
 they need - and in the correct location too.
 
 that's not practical. have you SEEN all the accented characters available? it's
 more than going to double the # of chars in a kbd. otherwise you then need a
 compose mode where multiple keystrokes give you æ or ø or ü or ñ etc. and it's
 a combo you need to learn. you still need to offer all the accents then on such
 a kbd. like ~^'`,* (ãâáàäąå) which will drastically cramp the keyboard or make
 it yet another row bigger for everyone. (in addition to some form of compose
 key and specific compose logic).

Have you seen the various European layouts? None of the Latin-based
keyboards have more than a handful of keys more than the English
keyboard. (Those with bucketloads of accents use a dead-key approach:
press the accent's dead key, then o, to get ö and so on.)
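
A toy sketch of such a dead-key lookup in C (the pair table and all names
here are illustrative, not taken from any real layout):

/* Dead-key composition sketch: a pending accent key combines with the
 * next base letter. The rule table is illustrative only. */
#include <stdio.h>
#include <string.h>

typedef struct { const char *accent, *base, *result; } DeadKeyRule;

static const DeadKeyRule rules[] = {
    { "\"", "o", "ö" }, { "\"", "a", "ä" }, { "\"", "u", "ü" },
    { "'",  "e", "é" }, { "~",  "n", "ñ" }, { "^",  "a", "â" },
};

/* Returns the composed UTF-8 letter, or NULL if no rule matches. */
static const char *compose(const char *accent, const char *base)
{
    for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
        if (!strcmp(rules[i].accent, accent) && !strcmp(rules[i].base, base))
            return rules[i].result;
    return NULL;
}

int main(void)
{
    const char *out = compose("\"", "o");            /* dead diaeresis, then o */
    printf("%s\n", out ? out : "(no composition)");  /* prints ö */
    return 0;
}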

So no need for a seriously cramped keyboard. Of course different
languages will mostly re-use the same keys, so you don't need a key for
every possible letter. Only one key for each non-ascii character people
expect to find on a keyboard adapted to their language. Look at the
various keyboard layouts, pick the one with the most extras and you know
how many keys are needed. Perhaps a few more keys than that, as some add
extra keys in different places. But not many more. European pc keyboards
tend to have 2 keys more than American ones; the rest is done by shift
states and/or dead keys. (Things like []/? aren't directly accessible on
a Norwegian keyboard, unlike American keyboards.) One mechanical layout
works for all of Europe, you just have different keycaps. And of course
the American layout works too - they get two do-nothing keys, that's all.

So, a keyboard with slightly more keys than what is needed for ascii 
will be enough for all languages that extend the latin alphabet.

Some differently painted keytops will be needed, but that can be left to 
the various national importers (for a mass-produced device) or to the 
customers for a phone made in small series.

Helge Hafting






Re: [SHR] illume predictive keyboard is too slow

2009-02-06 Thread The Rasterman
On Fri, 06 Feb 2009 13:20:53 +0100 Helge Hafting helge.haft...@hist.no said:

 Carsten Haitzler (The Rasterman) wrote:
  On Thu, 05 Feb 2009 15:57:41 +0100 Helge Hafting helge.haft...@hist.no
  said:
  
  Carsten Haitzler (The Rasterman) wrote:
 
  Surely, when there is a keyboard anyway, a couple of extra keys won't
  cost much. Not if they are on all phones, instead of only adapted
  ones. The Americans can use the extras as application hotkeys.
  oh it's not the extra keys - it's the variations in production.
  I know. Which is why I suggest one single keyboard for all, with
  the maximum number of keys instead of the minimum. That way, every
  language (at least every Latin-based language) can have a normal keyboard.
 
  No problem for the English - it will work fine. Their extra keys can be
  blank, or used as hotkeys. Users with other languages can add whatever
  they need - and in the correct location too.
  
  that's not practical. have you SEEN all the accented characters available?
  it's more than going to double the # of chars in a kbd. otherwise you then
  need a compose mode where multiple keystrokes give you æ or ø or ü or ñ
  etc. and it's a combo you need to learn. you still need to offer all the
  accents then on such a kbd. like ~^'`,* (ãâáàäąå) which will drastically
  cramp the keyboard or make it yet another row bigger for everyone. (in
  addition to some form of compose key and specific compose logic).
 
 Have you seen the various European layouts? None of the Latin-based
 keyboards have more than a handful of keys more than the English
 keyboard. (Those with bucketloads of accents use a dead-key approach:
 press the accent's dead key, then o, to get ö and so on.)

 So no need for a seriously cramped keyboard. Of course different
 languages will mostly re-use the same keys, so you don't need a key for
 every possible letter. Only one key for each non-ascii character people
 expect to find on a keyboard adapted to their language. Look at the
 various keyboard layouts, pick the one with the most extras and you know
 how many keys are needed. Perhaps a few more keys than that, as some add
 extra keys in different places. But not many more. European pc keyboards
 tend to have 2 keys more than American ones; the rest is done by shift
 states and/or dead keys. (Things like []/? aren't directly accessible on
 a Norwegian keyboard, unlike American keyboards.) One mechanical layout
 works for all of Europe, you just have different keycaps. And of course
 the American layout works too - they get two do-nothing keys, that's all.

that's because they use composition, as i said above. and as i said, if 1
keyboard were to cover ALL of them it'd be BIG (in key count). as such each
European kbd covers just the language it intends to cover - thus limiting
extras.

 So, a keyboard with slightly more keys than what is needed for ascii 
 will be enough for all languages that extend the latin alphabet.
 
 Some differently painted keytops will be needed, but that can be left to 
 the various national importers (for a mass-produced device) or to the 
 customers for a phone made in small series.
 
 Helge Hafting
 
 
 






Re: [SHR] illume predictive keyboard is too slow

2009-02-05 Thread Helge Hafting
Carsten Haitzler (The Rasterman) wrote:

 Surely, when there is a keyboard anyway, a couple of extra keys won't
 cost much. Not if they are on all phones, instead of only adapted
 ones. The Americans can use the extras as application hotkeys.

 oh it's not the extra keys - it's the variations in production.
I know. Which is why I suggest one single keyboard for all, with
the maximum number of keys instead of the minimum. That way, every
language (at least every Latin-based language) can have a normal keyboard.

No problem for the English - it will work fine. Their extra keys can be
blank, or used as hotkeys. Users with other languages can add whatever
they need - and in the correct location too.


 just a change in printing what's on the keys is not free. software keyboards
 are
A project like openmoko has the option of leaving that to the users:
supply a sheet with small letter stickers for all languages, and a
printed sheet showing where each letter normally goes for the
software-supported languages.
[...]

 it sucks. but english is the lowest common denominator and thus most things
 tend to be built to support it - as it tends to keep more people happier
 than some other setup. if there was enough volume to make enough units for a
 particular language/locale/country - it'd be different. :)

I understand that there is little interest in making a phone
specifically for Norway, when the volumes are low. That doesn't mean the
lowest common denominator is the best way. A keyboard with several extra
blank keys, and the English qwerty printed on the keytops, will work fine
for the large group of English-language users. Norwegians like me will
simply put the ø and æ stickers on the 2 keys to the right of l, and the
å sticker on the key to the right of p.

Those who want a radically different layout, such as dvorak, take the
lid off and carefully rearrange the keytops. Same for azerty layouts.

Programmable is better, but if someone wants real keys that depress with 
a click, then this is possible too.

Helge Hafting



Re: [SHR] illume predictive keyboard is too slow

2009-02-05 Thread Johny Tenfinger
On Thu, Feb 5, 2009 at 16:48, Laszlo KREKACS
laszlo.krekacs.l...@gmail.com wrote:
 I simply confirmed the same problem exists for other language too.

In Polish, we often communicate on IMs, SMSes, IRC, chats etc.
without Polish accents (ą→a; ę→e; ó [which is pronounced as u]→o;
ś→s; ł→l; ż→z; ź→z; ć→c; ń→n). In SMS to fit more chars in one
message; on IRC/IMs to type faster, or to ask how to set the Polish
keyboard layout in Linux ;D Some words mean something different
after dropping accents (for example laska vs. łaska),
but we don't have problems communicating that way.
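
A sketch in C of the accent folding this implies (the pair table is just
the list above; it only covers the lowercase forms, so it is illustrative
rather than complete):

/* Fold accented Polish letters to their base forms in a UTF-8 string.
 * Table covers only the lowercase pairs listed above. */
#include <stdio.h>
#include <string.h>

static const char *folds[][2] = {
    { "ą", "a" }, { "ę", "e" }, { "ó", "o" }, { "ś", "s" },
    { "ł", "l" }, { "ż", "z" }, { "ź", "z" }, { "ć", "c" }, { "ń", "n" },
};

/* Copy src to dst, replacing each accented letter with its base form.
 * dst must be at least as large as src. */
static void fold_accents(char *dst, const char *src)
{
    while (*src) {
        size_t i, n = sizeof(folds) / sizeof(folds[0]);
        for (i = 0; i < n; i++) {
            size_t len = strlen(folds[i][0]);
            if (!strncmp(src, folds[i][0], len)) {
                *dst++ = folds[i][1][0];
                src += len;
                break;
            }
        }
        if (i == n) *dst++ = *src++;   /* not accented: copy as-is */
    }
    *dst = '\0';
}

int main(void)
{
    char out[64];
    fold_accents(out, "łaska");
    printf("%s\n", out);   /* prints "laska" */
    return 0;
}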



Re: [SHR] illume predictive keyboard is too slow

2009-02-05 Thread Laszlo KREKACS
2009/2/5 Johny Tenfinger seba.d...@gmail.com:
 but we don't have problems with communicating in that way.

Unless you want to write a (semi)official document
(like an email to your boss, etc).

Best regards,
Laszlo



Re: [SHR] illume predictive keyboard is too slow

2009-02-05 Thread Johny Tenfinger
On Thu, Feb 5, 2009 at 17:30, Laszlo KREKACS
laszlo.krekacs.l...@gmail.com wrote:
 Unless you want to write (semi)official document.
 (like writing email to your boss, etc)

Then simply switch to a terminal-based keyboard without a dictionary and
with accents on the right alt key (like in the PC keyboard layout).



Re: [SHR] illume predictive keyboard is too slow

2009-02-05 Thread The Rasterman
On Thu, 5 Feb 2009 16:59:45 +0100 Johny Tenfinger seba.d...@gmail.com said:

 On Thu, Feb 5, 2009 at 16:48, Laszlo KREKACS
 laszlo.krekacs.l...@gmail.com wrote:
  I simply confirmed the same problem exists for other language too.
 
 In Polish, we often communicate on IMs, SMSes, IRC, chats etc.
 without Polish accents (ą→a; ę→e; ó [which is pronounced as u]→o;
 ś→s; ł→l; ż→z; ź→z; ć→c; ń→n). In SMS to fit more chars in one
 message; on IRC/IMs to type faster, or to ask how to set the Polish
 keyboard layout in Linux ;D Some words mean something different
 after dropping accents (for example laska vs. łaska),
 but we don't have problems communicating that way.

cool. so you can survive. if an engine lets you lazy-type without choosing
accents and puts them in for you (or you could type a whole word then just
select the correctly accented one) then this may save you time.





Re: [SHR] illume predictive keyboard is too slow

2009-02-05 Thread The Rasterman
On Thu, 05 Feb 2009 15:57:41 +0100 Helge Hafting helge.haft...@hist.no said:

 Carsten Haitzler (The Rasterman) wrote:
 
  Surely, when there is a keyboard anyway, a couple of extra keys won't
  cost much. Not if they are on all phones, instead of only adapted
  ones. The Americans can use the extras as application hotkeys.

  oh it's not the extra keys - it's the variations in production.
 I know. Which is why I suggest one single keyboard for all, with
 the maximum number of keys instead of the minimum. That way, every
 language (at least every Latin-based language) can have a normal keyboard.

 No problem for the English - it will work fine. Their extra keys can be
 blank, or used as hotkeys. Users with other languages can add whatever
 they need - and in the correct location too.

that's not practical. have you SEEN all the accented characters available? it's
more than going to double the # of chars in a kbd. otherwise you then need a
compose mode where multiple keystrokes give you æ or ø or ü or ñ etc. and it's
a combo you need to learn. you still need to offer all the accents then on such
a kbd. like ~^'`,* (ãâáàäąå) which will drastically cramp the keyboard or make
it yet another row bigger for everyone. (in addition to some form of compose
key and specific compose logic).

i am not saying to do it - but to me that seems the job of specialised keyboards
per language, not a universal one.

  just a change in printing what's on the keys is not free. software keyboards
  are
 A project like openmoko has the option of leaving that to the users:
 supply a sheet with small letter stickers for all languages, and a
 printed sheet showing where each letter normally goes for the
 software-supported languages.
 [...]
 
  it sucks. but english is the lowest common denominator and thus most
  things tend to be built to support it - as it tends to keep more people
  happier than some other setup. if there was enough volume to make enough
  units for a particular language/locale/country - it'd be different. :)
 
 I understand that there is little interest in making a phone
 specifically for Norway, when the volumes are low. That doesn't mean the
 lowest common denominator is the best way. A keyboard with several extra
 blank keys, and the English qwerty printed on the keytops, will work fine
 for the large group of English-language users. Norwegians like me will
 simply put the ø and æ stickers on the 2 keys to the right of l, and the
 å sticker on the key to the right of p.
 
 Those who want a radically different layout, such as dvorak, take the
 lid off and carefully rearrange the keytops. Same for azerty layouts.

 Programmable is better, but if someone wants real keys that depress with
 a click, then this is possible too.

this would be the best solution - if it's a hardware keyboard. a software
keyboard is always the most flexible... but it's currently also probably the
least usable, and not just because of software: a resistive screen only accepts
1 touch at a time and typing will commonly mean 2 touches at a time
(as you press the new key before you release your finger on the old one). :(






Re: [SHR] illume predictive keyboard is too slow

2009-02-05 Thread The Rasterman
On Thu, 5 Feb 2009 16:48:36 +0100 Laszlo KREKACS
laszlo.krekacs.l...@gmail.com said:

 2009/2/5 The Rasterman Carsten Haitzler ras...@rasterman.com:
  But there are other cases, where it is not that clear:
  ólt - pound (accusative)
  ölt - he killed ...
  olt - to graft
 
  sure.. maybe being an english speaker.. this doesn't bother me so much as
  english is full of such words... 1 word can have 2 or 3 or even more very
  different meanings. written the same way. only context lets you figure it
  out. so to me i go so.. what's the problem? :)
 
 Sure, many words can have different meanings. But you missed the point.
 
 When English has multiple meanings of a word, you pronounce it the same
 way; it is the same word.
 But with accents, you pronounce them very differently, because it is not
 the same word!

actually... no. there are cases where 1 word, written 1 way, can have multiple
meanings and be pronounced multiple ways... some examples:

row, wind, lead

use:

i had a row on the lake! - ambiguous meaning when written. could mean you
rowed a boat on the lake, or had an argument on the lake. pronunciation is
different in the 2 row's, but when written, it's the same.

 The correct analogy for english would be:
 Let's assume the character 'v' is just an accented version of character 'n'.
 Now when you want to write vice president, you always end up with
 nice president.
 See the difference?
 
 Better example: merge the character e with a. I think you get the idea...
 ((
 Battar axampla: marga tha charactar a with a. I think you gat tha idaa...
 Can you decrypt? Sure. By computer? Maybe. Was nice to read? I highly doubt
 it. ))
 
  i don't have the bandwidth to go solving every language on the planet's
  input problems.
 
 I didn't ask you to do so.
 I said, you can't just ignore the accents, because, most of the time,
 it is not a modifier of a char but a whole other character.

 It is the same case as what Helge said at the beginning for the Norwegian
 language (for/fôr, tå/ta).
 
 I simply confirmed the same problem exists for other language too.

well, hungarian creates a more complex case with compound words that go well
beyond what german does. that's the problem :(





Re: [SHR] illume predictive keyboard is too slow

2009-02-04 Thread Laszlo KREKACS
Hi!

 ok - so if a young person typed:
 Öt szép szűz
 it'd be:
 Ot szep szuz

((btw, the meaning of Öt szép szűz lány őrült írót nyúz is
Five virgins tire a crazy writer.
It is the Hungarian equivalent of The quick brown fox jumps over the lazy dog))


Yes, and in that specific case it works
(because none of the above words (Ot, szep, szuz) has a meaning in
the Hungarian language, so you can understand that example without
accents).

But there are other cases, where it is not that clear:
ólt - pound (accusative)
ölt - he killed ...
olt - to graft

So when you see olt in the text you can't be sure it is olt, ólt
or ölt without analysing the whole sentence.

The German example is a two-way conversion: ü ↔ ue, ß ↔ ss. You can
switch back and forth
without losing information.

 A simple word based dictionary is limited anyway for the hungarian
 language, where you can create a word as long as this:
 elkelkáposztástalaníthatatlanságoskodásaitokért.

 ugh. so it's like german. compound words get created a lot by just stringing
 multiple words together without a space. that's ok - as long as there aren't a
 massive set of them... :)


But there are. Because this language is agglutinative.
Let me explain the difficulty a bit.

In German you can create the following word:
wood [en] - Holz [de] - fa [hu]
house [en] - Haus [de] - ház [hu]

wood house [en] - Holzhaus [de] - faház [hu]

So you glued together house and wood in one word.
(this is your example: stringing together without space)

In German you can even create words from one verb plus a modifier, like:
to work [en], arbeiten [de], dolgoz [hu]
to ply [en], bearbeiten (be+arbeiten) [de], megdolgoz (meg+dolgoz) [hu]

It is the same process ;) There are many examples of this:
to link together [en], anschliessen (an+schliessen) [de] - összekapcsol
(össze+kapcsol) [hu],
to buy up [en], aufkaufen (auf+kaufen) [de] - felvásárol (fel+vásárol) [hu]

But in Hungarian we glue everything together; some examples:
in house [en], im Haus [de], házban (ház+ban) [hu]
car [en], Wagen [de], kocsi [hu]
our car [en], unseren Wagen (unser+en Wagen) [de], kocsinkat
(kocsi+(u/ü)nk+(a/á/e/é)t) [hu]

So the possibilities are nearly infinite.
Without analysing the sentence and the word, you can't find the root
word with the correct accent.

And finding the root word requires a spell checker (the best available
for the Hungarian language is hunspell).

Summary:
- Losing the accents (in Hungarian) most of the time results in ambiguity.
- A spell checker is needed to suggest the right accented word
(see: http://hunspell.sourceforge.net/).

So creating an architecture for a spell checker is not a bad idea (for
future extensibility).
It could be handy for English too. But for other languages (e.g.
Hungarian) it may be essential.
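
For illustration, a minimal query against hunspell's C API (the dictionary
paths below are an assumption and vary per system; the link flag's version
suffix does too):

/* Ask hunspell for suggestions for an accent-less input word.
 * Dictionary paths are an assumption; adjust for your system.
 * Build with something like: gcc demo.c -lhunspell-1.7 */
#include <stdio.h>
#include <hunspell/hunspell.h>

int main(void)
{
    Hunhandle *h = Hunspell_create("/usr/share/hunspell/hu_HU.aff",
                                   "/usr/share/hunspell/hu_HU.dic");
    if (!h) return 1;

    char **suggestions;
    /* Ask for accented candidates for the accent-less input "szuz". */
    int n = Hunspell_suggest(h, &suggestions, "szuz");
    for (int i = 0; i < n; i++)
        printf("%s\n", suggestions[i]);   /* e.g. "szűz" among others */

    Hunspell_free_list(h, &suggestions, n);
    Hunspell_destroy(h);
    return 0;
}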

Sorry for being so tiresome.

Best regards,
 Khiraly



Re: OT: [SHR] illume predictive keyboard is too slow

2009-02-04 Thread Marcel
I zapped into this thread because it was the only new mail in the om-community
folder and clicking it was the simplest way to mark it as read. Somehow I got
curious what that strange (Hungarian) sentence has to do with om, and found a
nice pack of information about your (?) language... Very interesting mail;
that's what I love the free software world for. :)

--
Marcel






Re: [SHR] illume predictive keyboard is too slow

2009-02-04 Thread The Rasterman
On Wed, 4 Feb 2009 16:37:56 +0100 Laszlo KREKACS
laszlo.krekacs.l...@gmail.com said:

 Hi!
 
  ok - so if a young person typed:
  Öt szép szűz
  it'd be:
  Ot szep szuz
 
 ((btw, the meaning of Öt szép szűz lány őrült írót nyúz is
 Five virgins tire a crazy writer.
 It is the Hungarian equivalent of The quick brown fox jumps over the lazy dog))


 Yes, and in that specific case it works
 (because none of the above words (Ot, szep, szuz) has a meaning in
 the Hungarian language, so you can understand that example without
 accents).

 But there are other cases, where it is not that clear:
 ólt - pound (accusative)
 ölt - he killed ...
 olt - to graft

sure.. maybe being an english speaker.. this doesn't bother me so much as
english is full of such words... 1 word can have 2 or 3 or even more very
different meanings. written the same way. only context lets you figure it out.
so to me i go so.. what's the problem? :)

 So when you see olt in the text you can't be sure it is olt, ólt
 or ölt without analysing the whole sentence.

 The German example is a two-way conversion: ü ↔ ue, ß ↔ ss. You can
 switch back and forth
 without losing information.

yup. as i speak german i have been using it as an example :)

  A simple word based dictionary is limited anyway for the hungarian
  language, where you can create a word as long as this:
  elkelkáposztástalaníthatatlanságoskodásaitokért.
 
  ugh. so it's like german. compound words get created a lot by just stringing
  multiple words together without a space. that's ok - as long as there aren't a
  massive set of them... :)
 
 
 But there are. Because this language is agglutinative.
 Let me explain the difficulty a bit.
 
 In German you can create the following word:
 wood [en] - Holz [de] - fa [hu]
 house [en] - Haus [de] - ház [hu]
 
 wood house [en] - Holzhaus [de] - faház [hu]
 
 So you glued together house and wood in one word.
 (this is your example: stringing together without space)
 
 In German you can even create words from one verb plus a modifier, like:
 to work [en], arbeiten [de], dolgoz [hu]
 to ply [en], bearbeiten (be+arbeiten) [de], megdolgoz (meg+dolgoz) [hu]

 It is the same process ;) There are many examples of this:
 to link together [en], anschliessen (an+schliessen) [de] - összekapcsol
 (össze+kapcsol) [hu],
 to buy up [en], aufkaufen (auf+kaufen) [de] - felvásárol (fel+vásárol) [hu]
 
 But in Hungarian we glue everything together; some examples:
 in house [en], im Haus [de], házban (ház+ban) [hu]
 car [en], Wagen [de], kocsi [hu]
 our car [en], unseren Wagen (unser+en Wagen) [de], kocsinkat
 (kocsi+(u/ü)nk+(a/á/e/é)t) [hu]

 So the possibilities are nearly infinite.
 Without analysing the sentence and the word, you can't find the root
 word with the correct accent.

oh dear. so you basically take the idea and run with it. nuts! like asian
langs... they don't even know what space is! :) (by asian i mean korean,
chinese, japanese).

 And finding the root word requires a spell checker (the best available
 for the Hungarian language is hunspell).

 Summary:
 - Losing the accents (in Hungarian) most of the time results in ambiguity.
 - A spell checker is needed to suggest the right accented word
 (see: http://hunspell.sourceforge.net/).

 So creating an architecture for a spell checker is not a bad idea (for
 future extensibility).
 It could be handy for English too. But for other languages (e.g.
 Hungarian) it may be essential.

originally i wanted to actually use aspell to do this... for the vkbd... but its
api just didn't cut it. i was wanting to re-use as much as possible, but
submitting the totally misspelt word on the kbd just doesn't get you results in
a spellchecker. (i hand-created some and fed them to aspell to see what it did
and it was just useless). they are used to 1 or 2 errors of certain kinds
- maybe 3. but when every letter is totally wrong you need an exhaustive search
through permutations. :(

when kocsinkat is the word you wanted... but you actually typed
opdsomlsr ... try getting a speller to fix that! interestingly enough, with
the english equivalents: i wanted foolhardy and actually typed
gioljsefu... illume can and will correct it to foolhardy... probably as the
top or one of the top suggestions... which is a far cry better than what aspell
can dream of doing. it DOES have a limit that matches need exactly the same
number of chars as the desired word - but for now, let's assume you hit the
kbd the right number of times and it's really just screen/finger accuracy
fixing.
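
A rough sketch in C of the fixed-length matching described here: each typed
key contributes a set of plausible neighbour letters, and a dictionary word
survives only if it has the same length and every letter falls in the
corresponding set. The neighbour table and word list are illustrative
stand-ins, not illume's actual data:

/* Fixed-length fuzzy match: a dictionary word of the same length matches
 * if each of its letters is a plausible neighbour of the typed key.
 * Neighbour table (partial qwerty) and word list are stand-ins. */
#include <stdio.h>
#include <string.h>

static const char *neighbours(char c)
{
    switch (c) {
    case 'g': return "gfhtyvb";
    case 'i': return "iujko";
    case 'o': return "oikp";
    case 'l': return "lkop";
    case 'j': return "jhkuinm";
    case 's': return "sadwexz";
    case 'e': return "ewrsd";
    case 'f': return "fdgrtcv";
    case 'u': return "uyihj";
    default:  return NULL;
    }
}

static int matches(const char *typed, const char *word)
{
    if (strlen(typed) != strlen(word)) return 0;
    for (; *typed; typed++, word++) {
        const char *set = neighbours(*typed);
        if (!set || !strchr(set, *word)) return 0;
    }
    return 1;
}

int main(void)
{
    const char *dict[] = { "foolhardy", "laudatory", "gibberish" };
    const char *typed = "gioljsefu";  /* every key one-off from foolhardy */

    for (size_t i = 0; i < sizeof(dict) / sizeof(dict[0]); i++)
        if (matches(typed, dict[i]))
            printf("candidate: %s\n", dict[i]);   /* prints foolhardy */
    return 0;
}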

i can't begin to imagine the permutation searches needed for hungarian, as
either you put all permutations of all words in the dictionary (for german
it's doable - seemingly not for hungarian), or you need to start trying all
sorts of permutations of multiple words strung together for matches... man,
that's going to be nastiness. to be honest, i really can't see it being
possible to solve this without a lot of work. i don't have the bandwidth to go
solving every language on the planet's input problems.

Re: [SHR] illume predictive keyboard is too slow

2009-02-03 Thread Laszlo KREKACS

 1. norwegian does allow for conversion to roman-only text. there are rules
 much like german.
 2. this conversion isn't used much and is a last resort thing.
 3. only a few special letters are needed for common use cases in addition to
 latin.


Hi!

I'm just giving you some perspective ;)

In Hungary the situation is much like the Norwegian one.
We have two special accented characters (ő, ű) which are not used in any
other language; all the other accents are present in the Latin-1 charset
(we use the Latin-2 charset).

In the early computer era ő was matched to õ and ű was matched to û, so
even early Microsoft Word didn't care about those special
characters (and used the Latin-1 charset instead).
But that is history now thanks to UTF-8 (accented filenames are still a
nightmare though, especially when restoring broken hard drives ;)

There is no romanization here, but young people/computer addicts
tend to type without accents. You can't decode it word by word; you
need to understand the whole sentence. So simple word correction does
not work.

It is not like in Germany, where you can write Tschüß as Tschuess.

So a standard was developed (which is not used anymore, as there is no
problem with accents nowadays) where all the accented
characters are written using a base char plus a punctuation mark.
I give you an example:
Öt szép szűz lány őrült írót nyúz
O:t sze'p szuz la'ny oru:lt i'ro't nyu'z.

Maybe you can use this idea. Or ignore UTF-8 and use the corresponding
ISO-8859-1/-2 etc. charset, where one character is one byte.
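
A sketch in C of decoding that char-plus-punctuation notation back into
accented UTF-8 (the pair table is illustrative and only covers the example
above):

/* Decode "base char + punctuation" notation into accented UTF-8.
 * Pair table is illustrative and only covers the example above. */
#include <stdio.h>
#include <string.h>

static const struct { const char *in, *out; } pairs[] = {
    { "O:", "Ö" }, { "o:", "ö" }, { "u:", "ü" },
    { "e'", "é" }, { "a'", "á" }, { "i'", "í" }, { "o'", "ó" }, { "u'", "ú" },
};

static void decode(char *dst, const char *src)
{
    while (*src) {
        size_t i, n = sizeof(pairs) / sizeof(pairs[0]);
        for (i = 0; i < n; i++)
            if (!strncmp(src, pairs[i].in, 2)) {
                dst += sprintf(dst, "%s", pairs[i].out);
                src += 2;
                break;
            }
        if (i == n) *dst++ = *src++;   /* no digraph: copy as-is */
    }
    *dst = '\0';
}

int main(void)
{
    char out[128];
    decode(out, "O:t sze'p la'ny");
    printf("%s\n", out);   /* prints "Öt szép lány" */
    return 0;
}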

A simple word-based dictionary is limited anyway for the Hungarian
language, where you can create a word as long as this:
elkelkáposztástalaníthatatlanságoskodásaitokért.

Hope it helps something.

Best regards,
 Laszlo



Re: [SHR] illume predictive keyboard is too slow

2009-02-03 Thread Helge Hafting
Carsten Haitzler (The Rasterman) wrote:
[...]
 yeah. this is one reason i want to understand how it works without ø, æ etc. -
 one day there will be a phone with a kbd.. and it won't have a version per
 language because the # of users in norway are too small to warrant a special
 production run for them - same for germany, france etc. etc. - until you have
 the sales numbers to justify that.. you need a way to either work around it by
 ignoring them - or have software correct it. so software that works eventually
 with a hw kbd and inserts the right ø, æ etc. based off normal a-z typing...
 would be useful.

If we someday get an open phone with a keyboard, then I hope they are
smart enough to make enough keys. (In my case, both the q row and the a
row need 11 keys.) No problem if the keytops are painted with an English
layout - I can paint. As long as they don't let the top row end in p...

Surely, when there is a keyboard anyway, a couple of extra keys won't
cost much. Not if they are on all phones, instead of only adapted
ones. The Americans can use the extras as application hotkeys.

Another approach - let the keyboard be an extra touchscreen that is
wide - in the shape of a keyboard. Then we can program the keyboard like
we can today. Of course this keyboard-screen can be cheaper -
monochrome, low resolution, maybe no backlight.

 i just want to understand the constraints of the languages i don't know - and
 how they are used. it gives me insight into how to solve the problem on a
 wider picture. thanks for the info.
 
Glad to be of help.

Helge Hafting




Re: [SHR] illume predictive keyboard is too slow

2009-02-03 Thread The Rasterman
On Tue, 03 Feb 2009 18:28:49 +0100 Helge Hafting helge.haft...@hist.no said:

 Carsten Haitzler (The Rasterman) wrote:
 [...]
  yeah. this is one reason i want to understand how it works without ø, æ
  etc. - one day there will be a phone with a kbd.. and it won't have a
  version per language because the # of users in norway are too small to
  warrant a special production run for them - same for germany, france etc.
  etc. - until you have the sales numbers to justify that.. you need a way to
  either work around it by ignoring them - or have software correct it. so
  software that works eventually with a hw kbd and inserts the right ø, æ
  etc. based off normal a-z typing... would be useful.
 
 If we someday get an open phone with a keyboard, then I hope they are
 smart enough to make enough keys. (In my case, both the q row and the a
 row need 11 keys.) No problem if the keytops are painted with an English
 layout - I can paint. As long as they don't let the top row end in p...

 Surely, when there is a keyboard anyway, a couple of extra keys won't
 cost much. Not if they are on all phones, instead of only adapted
 ones. The Americans can use the extras as application hotkeys.

oh it's not the extra keys - it's the variations in production. the moment you
have a variation (with a different # of keys, or a different layout of them)
you have a change in plastic mold - that's costly (if doing things via molds,
a new mold costs upward of US$60,000-100,000 or more). so if all you have is
500 customers in that country - that'd be an up-front cost of maybe 100k to
just supply that market, and then for 500 people - IF you sell that many, it'd
be $12-$20 extra per unit in costs. to cover the risk of not selling all your
production you may have to raise retail prices by $50-$100 more over the mass
produced item. now imagine it's only 100 customers in that region, or 50.

just a change in printing what's on the keys is not free. software keyboards
are by far the cheaper option :) but if a hardware keyboard is there - the
chances of lots of variations per locale being around, unless you sell the
kind of volume nokia does, are slim to none. :(

 Another approach - let the keyboard be an extra touchscreen that is
 wide - in the shape of a keyboard. Then we can program the keyboard like
 we can today. Of course this keyboard-screen can be cheaper -
 monochrome, low resolution, maybe no backlight.

of course! i've actually mulled this idea with a clear plastic overlay that
contains the mechanical contacts (done in a way that they don't obscure the
middle of the key) and just have a normal lcd under it... have an extra lcd or
just a bigger single lcd shared with the main one... :) thus a
soft-hard-keyboard happens. as long as the # of buttons is ok (you can cover
most use cases with the buttons there) then software can vary the painting
and layout at runtime. this might be the best middle-ground solution for a
hardware keyboard when low volume production limits the ability to have
custom molds/paint runs due to the small customer bases per locale.

it sucks. but english is the lowest common denominator and thus most things
tend to be built to support it - as it tends to keep more people happier
than some other setup. if there was enough volume to make enough units for a
particular language/locale/country - it'd be different. :)

  i just want to understand the constraints of the languages i don't know -
  and how they are used. it gives me insight into how to solve the problem on
  a wider picture. thanks for the info.
  
 Glad to be of help.
 
 Helge Hafting
 






Re: [SHR] illume predictive keyboard is too slow

2009-02-03 Thread The Rasterman
On Tue, 3 Feb 2009 17:36:26 +0100 Laszlo KREKACS
laszlo.krekacs.l...@gmail.com said:

 
  1. norwegian does allow for conversion to roman-only text. there are rules
  much like german.
  2. this conversion isn't used much and is a last resort thing.
  3. only a few special letters are needed for common use cases in addition
  to latin
 
 
 Hi!
 
 I'm just giving you some perspective ;)

 In Hungary the situation is much like the Norwegian one.
 We have two special accented characters (ő, ű) which are not used in any
 other language; all the other accents are present in the Latin-1 charset
 (we use the Latin-2 charset).

 In the early computer era ő was matched to õ and ű was matched to û, so
 even early Microsoft Word didn't care about those special
 characters (and used the Latin-1 charset instead).
 But that is history now thanks to UTF-8 (accented filenames are still a
 nightmare though, especially when restoring broken hard drives ;)

 There is no romanization here, but young people/computer addicts
 tend to type without accents. You can't decode it word by word; you
 need to understand the whole sentence. So simple word correction does
 not work.

 It is not like in Germany, where you can write Tschüß as Tschuess.

ok - so if a young person typed:
Öt szép szűz
it'd be:
Ot szep szuz

right?

 So a standard was developed (which is not used anymore, as there is no
 problem with accents nowadays) where all the accented
 characters are written using a base char plus a punctuation mark.
 I give you an example:
 Öt szép szűz lány őrült írót nyúz
 O:t sze'p szuz la'ny oru:lt i'ro't nyu'z.

 Maybe you can use this idea. Or ignore UTF-8 and use the corresponding
 ISO-8859-1/-2 etc. charset, where one character is one byte.

nah. this heavily precludes expansion. 1 byte is 256 chars. try cramming
russian, greek, thai, hindi.. etc. into that space. not going to work. so you
keep flipping charsets and have special code per charset... no thanks :)

 A simple word based dictionary is limited anyway for the hungarian
 language, where you can create a word as long as this:
 elkelkáposztástalaníthatatlanságoskodásaitokért.

ugh. so it's like german. compound words get created a lot by just stringing
multiple words together without a space. that's ok - as long as there aren't a
massive set of them... :)

 Hope it helps something.
 
 Best regards,
  Laszlo
 






Re: [SHR] illume predictive keyboard is too slow - Usability features

2009-02-03 Thread The Rasterman
On Tue, 03 Feb 2009 20:15:53 +0100 Marco Trevisan (Treviño) m...@3v1n0.net
said:

 Carsten Haitzler (The Rasterman) wrote:
  On Mon, 02 Feb 2009 21:53:26 +0100 Marco Trevisan (Treviño)
  m...@3v1n0.net said:
  However in the past days I sent you privately also a mail about some
  issues of the keyboard in latest e17 svn [1], but I got no answer.
  Maybe the mail wasn't sent correctly?!
  
   got it - i just tend to ignore some of my mailboxes for a while and cycle
   around to them... got a lot of email here :) i'll get back to you on it. it
   just is that kbd isn't a focus at the moment so it tends to take a
   back-burner position.
 
 Ah, ok... It's understandable...
 
   They seem unrelated, but why not work around them by allowing these
   actions only after a small timeout (i.e. waiting a few ms after the latest
   key press)?
  
   so let's say 0.4 sec after the last keyboard key press it will allow for
   swipes and match hits etc. that could be done. again - tuning a timing
   value. will people then complain that i often try and swipe or hit a
   match and it doesn't respond, i need to do it again? hmm.
 
  Maybe 0.4 seconds is too much. I think we could use a lower value
  too. And maybe make it configurable directly from the keyboard (even if I
  don't think that is really needed).
 
   Generally you never confirm a word or switch keyboard as fast as you
   type over a char (since typing can be imprecise thanks to the keyboard
   correction, while switching a keyboard or selecting a word must be
   precise)...
  
  correct. it's a fine line to walk tho - as above :)
  
   And... What about making the horizontal word list (the one over the
   keys) scrollable [right-left] as the configuration toolbar is? Would it
   require more computation? I figure it could improve the usability.
  
   no - it'd be not much of a problem - i just didn't do it. :)
 
 Ok, so please put it in your/illume TODO :P

:)

   nb - i can see why you often hit a match word. your kbd layout doesn't have
   padding ABOVE the qwerty line like the default does... :)
 
  Yes. That's true. But people could also have keyboards with more keys
  than mine (see Norwegians :P), making the word list and the keys
  closer.
  The fact is that even using the default qwerty keyboard, which has more
  padding, it can happen that you hit a word if you're writing while
  walking/driving[ehm... :P]/talking (or simply writing quickly)...
  Don't you agree?

oh indeed it can happen.. but it's much less likely - it has never happened to
me, thus why i think it's probably ok. but with that padding removed on your
kbd, i can see how it becomes much more probable that you hit these things at
the top. you should just add more padding :) the problem is the kbd won't
resize per layout atm, so the default determines the size; if yours is the
default then just make sure it has padding and the problem should go away.





Re: [SHR] illume predictive keyboard is too slow

2009-02-02 Thread Kostis Anagnostopoulos
On Sun 01 Feb 2009 00:31:09 Carsten Haitzler wrote:
 On Fri, 30 Jan 2009 21:16:57 +0100 Olof Sjobergh olo...@gmail.com said:
  On Fri, Jan 30, 2009 at 8:12 PM, The Rasterman Carsten Haitzler
 
  ras...@rasterman.com wrote:
    On Fri, 30 Jan 2009 08:31:43 +0100 Olof Sjobergh olo...@gmail.com said:
   But I think a dictionary format in plain utf8 that includes the
   normalised words as well as any candidates to display would be the
   best way. Then the dictionary itself could choose which characters to
   normalise and which to leave as is. So for Swedish, you can leave å, ä
   and ö as they are but normalise é, à etc. Searching would be as simple
   as in your original implementation (no need to convert from multibyte
   format).
  
    the problem is - the dict in utf8 means searching is slow as you do it
    in utf8 space. the dict is mmaped() to save ram - if it wasn't it'd need
    to be allocated in non-swappable ram (it's a phone - it has no swap) and
    thus a few mb of your ram goes into the kbd dict at all times. by using
    mmap you leave it to the kernel's paging system to figure it out.

    so as such a dict change will mean a non-ascii format in future for
    this reason. but there will then need to be a tool to generate such a
    file.
 
  Searching in utf8 doesn't mean it has to be slow. Simple strcmp works
  fine on multibyte utf8 strings as well, and should be as fast as the
  dictionary was before adding multibyte to widechars conversions. But
  if you have some other idea in mind, please don't let me disturb. =)

  the problem is - it ISN'T a simple key-value lookup. it's a possible-match
  tree built on-the-fly. that means you jump about examining 1 character at a
  time. the problem here is that 1 char may or may not be 1 byte or more and
  that makes it really nasty. if it were a simple key lookup for a given
  simple string - life would be easy. this is possible - but then u'd have to
  generate ALL permutations first then look ALL of them up. if you weed out
  permutations AS you look them up you can weed out something like 90% of the
  permutations as you KNOW there are no words starting with qz... so as you
  go through qa... qs, qx... qz... you can easily stop all the
  combinations with qs, qz and qx as no words begin with that (if you have an
  8 letter word with 8 possible letters per character in the word that's 8^6
  lookups you avoided (in the case above - ie all permutations of the other 6
  letters). that's 262144 lookups avoided... just there. for... 1 of the above
  impossible permutation trees. now add it up over all of them.

Do you consider this paper relevant?
http://citeseer.ist.psu.edu/schulz02fast.html
Fast String Correction with Levenshtein-Automata (2002), Klaus Schulz,
Stoyan Mihov

It actually uses tries to avoid generating and exhaustively comparing all
permutations of the input word (typed keys);
instead it traverses *only* known words and accumulates permutations unless
a max-errors limit gets exceeded, in which case the path dies.

It describes a mathematical model for correcting typos,
but since i have already implemented it (in java)
i now think it can be retrofitted to perform what you describe in:
http://wiki.openmoko.org/wiki/Illume_keyboard
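
A sketch in C of that bounded-error traversal - the dynamic-programming
variant of the same pruning idea, not the paper's automata construction;
the tiny word list and MAX_ERR value are illustrative:

/* Bounded-error search over a trie: extend one Levenshtein DP row per
 * trie node and abandon a path once the row minimum exceeds MAX_ERR. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ERR 2

typedef struct Node {
    struct Node *child[26];
    int is_word;
} Node;

static Node *node_new(void) { return calloc(1, sizeof(Node)); }

static void trie_add(Node *root, const char *w)
{
    for (; *w; w++) {
        int i = *w - 'a';
        if (!root->child[i]) root->child[i] = node_new();
        root = root->child[i];
    }
    root->is_word = 1;
}

static void search(Node *n, char letter, const int *prev, const char *typed,
                   char *path, int depth)
{
    int len = strlen(typed), row[64], min = 0;

    path[depth - 1] = letter;
    row[0] = depth;                                 /* all-deletions cost */
    for (int j = 1; j <= len; j++) {
        int cost = (typed[j - 1] == letter) ? 0 : 1;
        int v = prev[j - 1] + cost;                 /* substitute/match */
        if (prev[j] + 1 < v) v = prev[j] + 1;       /* delete */
        if (row[j - 1] + 1 < v) v = row[j - 1] + 1; /* insert */
        row[j] = v;
    }
    for (int j = 0; j <= len; j++) if (row[j] < row[min]) min = j;
    if (row[min] > MAX_ERR) return;                 /* prune this path */

    if (n->is_word && row[len] <= MAX_ERR)
        printf("match: %.*s (distance %d)\n", depth, path, row[len]);
    for (int i = 0; i < 26; i++)
        if (n->child[i]) search(n->child[i], 'a' + i, row, typed, path, depth + 1);
}

int main(void)
{
    Node *root = node_new();
    const char *words[] = { "house", "horse", "mouse", "window" };
    for (size_t i = 0; i < 4; i++) trie_add(root, words[i]);

    const char *typed = "hosue";
    int row0[64];
    char path[64];
    for (int j = 0; j <= (int)strlen(typed); j++) row0[j] = j;
    for (int i = 0; i < 26; i++)
        if (root->child[i]) search(root->child[i], 'a' + i, row0, typed, path, 1);
    return 0;   /* prints house and horse, both within distance 2 */
}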


Keep up the good work.
  Kostis



Re: [SHR] illume predictive keyboard is too slow - Usability features

2009-02-02 Thread Marco Trevisan (Treviño)
Carsten Haitzler (The Rasterman) wrote:
 On Wed, 28 Jan 2009 18:59:32 +0100 Marco Trevisan (Treviño) m...@3v1n0.net
 said:
  Maybe using something like a trie [1] to store the
  words could help (both for word matching and for compressing the
  dictionary).
  Too hard?

 [1] http://en.wikipedia.org/wiki/Trie
 
  so back to the trie... the trie would only be useful for the ascii matching -
  i need something more complex. it just combines the data with the match tree
  (letters are inline). i need a match tree + lookup table to other matches to
  display - and possibly several match entries (all the matches to display also
  need to be in the tree pointing to a smaller match list).

Ok, thanks... I got it. However I hope we can make something that
is based on that idea (the trie) but that can be applied to non-ascii
chars too.

In the past days I also sent you privately a mail about some
issues of the keyboard in the latest e17 svn [1], but I got no answer.
Maybe the mail wasn't sent correctly?!

I also wrote there some features that I'd suggest
implementing in the Illume keyboard. I'll write them here too to make the
community aware:

I use the illume keyboard every day and I'm very happy with it, as I've
said many times in this ML, but sometimes it performs
some unwanted actions, like:
 - I involuntarily click on a suggested word while I'm still typing my
   word (because I'm not precise enough, I tap over a word instead of a top
   char).
 - My keyboard sometimes gets switched while typing (yes, I know
   that this is mainly a hardware-related issue, due to the touchscreen
   jitters).
They seem unrelated, but why not work around them by allowing these
actions only after a small timeout (i.e. waiting a few ms after the latest
key press)?
Generally you never confirm a word or switch keyboard as fast as you
type over a char (since typing can be imprecise thanks to the keyboard
correction, while switching a keyboard or selecting a word must be precise)...

And... What about making the horizontal word list (the one over the
keys) scrollable [right-left] as the configuration toolbar is? Would it
require more computation? I figure it could improve the usability.

Bye.


[1] http://i43.tinypic.com/i4il2d.png

-- 
Treviño's World - Life and Linux
http://www.3v1n0.net/




Re: [SHR] illume predictive keyboard is too slow

2009-02-02 Thread Helge Hafting
Carsten Haitzler (The Rasterman) wrote:
 On Fri, 30 Jan 2009 14:43:39 +0100 Helge Hafting helge.haft...@hist.no said:
 
 Carsten Haitzler (The Rasterman) wrote:
 On Thu, 29 Jan 2009 14:32:48 +0100 Helge Hafting helge.haft...@hist.no
 said:
  I hope things like this will be possible, if a new dictionary format is
  realized. It is ok if typing for suggests fôr as an alternative, but
  før should not come up unless the user types f ø r. In which
  case o must not be suggested...
  ok - how do you romanise norwegian then? example. in german ö → oe,
  ü → ue, ß → ss, etc. - there is a set of romanisation rules that can convert
  any such char to 1 or more roman letters. i was hoping to be even more
  lenient with ö → o being valid too for the lazy :) japanese has
  romanisation rules - so does chinese... norwegian must (eg æ → ae for
  example).

  Usually, one doesn't romanize Norwegian. There are some rules: æ → ae,
  ø → oe, å → aa. They are next to useless, because ae and oe occur
  naturally in many words where æ or ø does not belong, and these double
  vowels are pronounced differently as well. A Norwegian seeing oe in a
  word may be able to figure out if this means ø or if it really is
  supposed to be oe, but this may need a context of several words. And
  it looks funny/wrong - similar to how silly it looks transcribing x as
  ks and writing ksylophone.
 
  oh that's not bad! then it's just like english! (you get used to the vague
  insanity of it all sooner or later!) :)
  but seriously - if your name is nønæn, and you move to japan, and have to fill
  out a form for your bank account name - they will see the ø and æ and go
  ummm. we can't do that - can you please use normal roman text?

Sure, in that case, it is ø → oe, æ → ae and å → aa. (Or some will go ø → o
and å → a because their name looks less mangled that way.) While this may
be ok for opening a bank account in Japan, it is not something ordinary
people will want to consider for typing text messages on a phone.
Simple phones have had æøå in the T9 system for ages (with æ and å
on the same key as a, and ø on the same key as o).
[...]

  just like my example above - but i guess i was being stricter. the stodgy old
  banking system isn't going to adapt like modern sports data systems. it's
  go roman - or go home. :)

Sure. I just hope the freerunner doesn't evolve into a stodgy old
thing as far as keyboards are concerned. Looks like it doesn't, so
I'll be fine. :-)
[...]
  hmm. how interesting. i have always been baffled why there is a UK qwerty
  layout vs US - the UK is the only place that uses it... all other english
  speaking countries i know use US qwerty (and if UK qwerty was nicely killed
  off.. it wouldn't need to be US qwerty - just qwerty) :)

Surely this is because of the £-sign? (And € too, in later standards.)
I don't think they are ready to give up the pound.

  ok - but there is a way to do this. when stuck on your friend's pc when
  visiting them in california, and they don't have compose modes enabled...
  how do you type æ and ø etc.? that was basically the question - there must
  be some accepted mechanism for decimation/conversion. seemingly it's the
  obvious: æ → ae, ø → o etc.

My preferred way is to open a webpage and paste the special characters I
need. These days, any pc seems to support æøå even if the keyboard
itself doesn't. In a situation where æøå cannot be entered (such as the
sms app in SHR, which erroneously filters out non-ascii), I write my
sentences very carefully, avoiding these letters. For I don't want to
spell wrong deliberately, not even in transcriptions. Those that care a
lot less about spelling use more transcriptions - and might even use
transcriptions on a phone that has æøå, because their phone is badly
adapted to Norwegian and has æøå in weird places. (Because the
manufacturers aren't really into adding a couple of extra _hardware_
keys.) Software keyboards are great!

  Excellent!
  So if I have a wordlist and make a keyboard, then a dictionary can be
  synthesized so there will be no unnecessary confusion between o and ø,
  because both letters exist as keys?

  correct. as long as the dict matching doesn't drop extra info - ie normalise
  ø → o. currently it does. but the rest of the code doesn't. it's just the dict
  matching engine - which, as we have been discussing... needs work. :)
 
The dictionary file probably needs some metadata anyway - such
as what language it is for. It could also have a list of which non-ascii
letters to use as-is, and assume standard romanization rules for the rest.

Helge Hafting



Re: [SHR] illume predictive keyboard is too slow

2009-02-02 Thread The Rasterman
On Mon, 2 Feb 2009 19:39:52 +0200 Kostis Anagnostopoulos ankos...@gmail.com
said:

 On Sun 01 Feb 2009 00:31:09 Carsten Haitzler wrote:
  On Fri, 30 Jan 2009 21:16:57 +0100 Olof Sjobergh olo...@gmail.com said:
   On Fri, Jan 30, 2009 at 8:12 PM, The Rasterman Carsten Haitzler
  
   ras...@rasterman.com wrote:
 On Fri, 30 Jan 2009 08:31:43 +0100 Olof Sjobergh olo...@gmail.com said:
But I think a dictionary format in plain utf8 that includes the
normalised words as well as any candidates to display would be the
best way. Then the dictionary itself could choose which characters to
normalise and which to leave as is. So for Swedish, you can leave å, ä
and ö as they are but normalise é, à etc. Searching would be as simple
as in your original implementation (no need to convert from multibyte
format).
   
 the problem is - the dict in utf8 means searching is slow as you do it
 in utf8 space. the dict is mmaped() to save ram - if it wasn't it'd need
 to be allocated in non-swappable ram (it's a phone - it has no swap) and
 thus a few mb of your ram goes into the kbd dict at all times. by using
 mmap you leave it to the kernel's paging system to figure it out.
   
so as such a dict change will mean a non-ascii format in future for
this reason. but there will then need to be a tool to generate such a
file.
  
   Searching in utf8 doesn't mean it has to be slow. Simple strcmp works
   fine on multibyte utf8 strings as well, and should be as fast as the
   dictionary was before adding multibyte to widechars conversions. But
   if you have some other idea in mind, please don't let me disturb. =)
 
  the problem is - it ISN'T a simple key-value lookup. it's a possible-match
  tree built on-the-fly. that means you jump about examining 1 character at a
  time. the problem here is that 1 char may or may not be 1 byte or more and
  that makes it really nasty. if it were a simple key lookup for a given
  simple string - life would be easy. this is possible - but then u'd have to
  generate ALL permutations first then look ALL of them up. if you weed out
  permutations AS you look them up you can weed out something like 90% of the
  permutations as you KNOW there are no words starting with qz... so as you
  go through qa... qs, qx... qz... you can easily stop all the
  combinations with qs, qz and qx as no words begin with that (if you have an
  8 letter word with 8 possible letters per character in the word that's 8^6
  lookups you avoided (in the case above - ie all permutations of the other 6
  letters). that's 262144 lookups avoided... just there. for... 1 of the above
  impossible permutation trees. now add it up over all of them.
 
 Do you consider this paper relevant?
 http://citeseer.ist.psu.edu/schulz02fast.html
 Fast String Correction with Levenshtein-Automata, (2002),  Klaus Schulz, 
 Stoyan Mihov
 
 It actually uses tries to avoid generating and comparing exhaustively all 
 permutations of the input word (typed keys),
 but instead traverses *only* known words and accumulates permutations unless 
 a max-errors limit gets exceeded, in which case this path dies.

not sure thats that good.. that will drop possible matches - the current scheme
walks the tree of known words using the permutation list to pick paths - it
won't follow paths that don't exist, so thats already done. i was just saying
that you need the permutation list per letter + walking of the data to be
inherently combined, as without that you need to generate every permutation and
throw it at a 1 key -> value lookup hash. it still uses a trie (which is a
binary tree with the letters inlined as part of the tree struct). :)
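
For anyone trying to picture that: a first-child/next-sibling node layout is
exactly such a binary tree with the letters inlined. A rough sketch, with
invented field names:

    struct trie_node {
        char letter;               /* the letter inlined in the node        */
        char is_word;              /* nonzero: a complete word ends here    */
        struct trie_node *child;   /* first child: next letter of the word  */
        struct trie_node *next;    /* next sibling: alternative letter here */
    };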

just reading the abstract tho.. document is 67 pages i have to dig through...

 It describes a mathematical model for correcting typos,
 but since i have already implemented it (in java) 
 i now think it can be retrofitted to perform what you describe in:
 http://wiki.openmoko.org/wiki/Illume_keyboard

sure - can it be implemented so all data is mmaped from files? thats the
biggest problem. the first dict for illume (before the current) used a 27-way
per-node tree - lookups were hyper-fast. but it ate ram. i went to the opposite
end where i just mmaped the text file and built a very small 2-level char
offset lookup table to avoid ram usage. this isnt that fast - but was ok. i
know i could improve the parsing by having it all ucs2 to avoid slower utf8
decoding, and with line jump-tables built into the file it'd avoid scanning a
whole line to jump to the next entry when a match fails. as such it's more a
matter of just having a fast dict format that can be mmaped and walked easily
while spooling off the permutations of chars per letter (and thus being able to
spot a match and calculate its relative distance).
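
A rough, purely illustrative sketch of that mmap-plus-tiny-index approach -
the file layout (one lowercase word per line), the function name and the
exact index shape are guesses, not the actual illume code:

    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static const char *dict;   /* the whole file, mmap()ed read-only */
    static size_t dict_len;
    static long idx[26][26];   /* offset of first word starting "ab" */

    int dict_open(const char *path)
    {
        struct stat st;
        int fd = open(path, O_RDONLY);
        if (fd < 0 || fstat(fd, &st) < 0) return -1;
        dict_len = st.st_size;
        /* the kernel pages the dict in and out on demand, so it never
           pins a few mb of precious phone ram */
        dict = mmap(NULL, dict_len, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);
        if (dict == MAP_FAILED) return -1;
        memset(idx, -1, sizeof(idx));   /* all-ones bytes == -1 */
        for (size_t i = 0; i < dict_len; ) {
            if (i + 1 < dict_len &&
                dict[i] >= 'a' && dict[i] <= 'z' &&
                dict[i + 1] >= 'a' && dict[i + 1] <= 'z') {
                int a = dict[i] - 'a', b = dict[i + 1] - 'a';
                if (idx[a][b] < 0) idx[a][b] = (long)i;
            }
            while (i < dict_len && dict[i] != '\n') i++;  /* skip line */
            i++;
        }
        return 0;
    }

A lookup for words starting with "qu" would then begin scanning lines at
idx['q' - 'a']['u' - 'a'] instead of at the top of the file; the line
jump-tables mentioned above would get rid of the per-line scanning as well.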

 Keep up the good work.
   Kostis
 

Re: [SHR] illume predictive keyboard is too slow

2009-02-02 Thread The Rasterman
On Mon, 02 Feb 2009 15:26:50 +0100 Helge Hafting helge.haft...@hist.no said:

 Carsten Haitzler (The Rasterman) wrote:
  On Fri, 30 Jan 2009 14:43:39 +0100 Helge Hafting helge.haft...@hist.no
  said:
  
  Carsten Haitzler (The Rasterman) wrote:
  On Thu, 29 Jan 2009 14:32:48 +0100 Helge Hafting helge.haft...@hist.no
  said:
  I hope things like this will be possible, if a new dictionary format is 
  realized. It is ok if typing "for" suggests "fôr" as an alternative, but 
  "før" should not come up unless the user types f ø r. In which 
  case "o" must not be suggested...
  ok - how do you romanise norwegian then? example. in german ö -> oe, ü ->
  ue, ß -> ss, etc. - there is a set of romanisation rules that can convert
  any such char to 1 or more roman letters. i was hoping to be even more
  lenient with ö -> o being valid too for the lazy :) japanese has
  romanisation rules - so does chinese... norwegian must too (eg æ -> ae for
  example).
 
  Usually, one doesn't romanize Norwegian. There are some rules: æ->ae, 
  ø->oe, å->aa.  They are next to useless, because ae and oe occur 
  naturally in many words where æ or ø does not belong, and these double 
  vowels are pronounced differently as well. A Norwegian seeing oe in a 
  word may be able to figure out if this means ø or if it really is 
  supposed to be oe, but this may need a context of several words. And 
  it looks funny/wrong - similar to how it looks silly transcribing x as 
  ks and writing ksylophone.
  
  oh thats not bad! then it's just like english! (you get used to the vague
  insanity of it all sooner or later!) :)
  but seriously - if your name is nønæn, and you move to japan, and have to
  fill out a form for your bank account name - they will see the ø and æ and
  go ummm. we can't do that - can you please use normal roman text?
 
 Sure, in that case, it is ø->oe, æ->ae and å->aa. (Or some will go ø->o 
 and å->a because their name looks less mangled that way.) While this may 
 be ok for opening a bank account in japan, it is not something ordinary 
 people will want to consider for typing text messages on a phone. 
 Simple phones have had æøå in the T9 system for ages. (with æ and å 
 on the same key as a, and ø on the same key as o)
 [...]

sure! yes. thats why i allowed for keys to be 'ø' and 'æ' etc. etc. - already
done. i was hoping to have a way of also doing it just with plain qwerty. so
there is a way of reducing it :)
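
The T9 arrangement Helge describes is easy to picture as data; a toy sketch
(the æ/å/ø placements follow his description, the rest is the usual phone
keypad):

    /* keys 2-9 on a T9-style norwegian keypad */
    static const char *t9_no[8] = {
        "abcæå", "def",  "ghi", "jkl",
        "mnoø",  "pqrs", "tuv", "wxyz",
    };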

  just like my example above - but i guess i was being stricter. the stodgy
  old banking system isn't going to go adapt like modern sports data
  systems. its go roman - or go home. :)
 
 Sure. I just hope the freerunner doesn't evolve into a stodgy old 
 thing as far as keyboards are concerned. Looks like it doesn't, so 
 I'll be fine. :-)
 [...]

unlike the banking system. the users CAN have a say in fixing it... if they
just do some code :) if they just sit and wait for people to do it for them for
free - they may have to wait a while until it becomes a priority for those
doing the code. :)

  hmm. how interesting. i have always been baffled why there is a UK qwerty
  layout vs US - the UK is the only place that uses it... all other english
  speaking countries i know use US qwerty (and if UK qwerty was nicely killed
  off.. it wouldn't need to be US qwerty - just qwerty) :)
 
 Surely this is because of the £-sign? (And € too, in later standards.)
 I don't think they are ready to give up the pound.

hmm no - they moved the a-z letters around. symbols i can understand. but why
play with the a-z layout... beats me...

  ok - but there is a way to do this. when stuck on your friends pc when
  visiting them in california, and they dont have compose-modes enabled...
  how do you type æ and ø etc. that was basically the question - there must
  be some accepted mechanism for decimation/conversion. seemingly it's the
  obvious: æ -> ae, ø -> o etc.
 
 My preferred way is to open a webpage and paste the special characters I 
 need. These days, any pc seems to support æøå even if the keyboard 
 itself doesn't. In a situation where æøå cannot be entered (such as the 
 sms app in SHR which erroneously filters out non-ascii), I write my 
 sentences very carefully, avoiding these letters. For I don't want to 
 deliberately misspell, not even in transcriptions. Those that care a lot 
 less about spelling use more transcriptions - and might even use 
 transcriptions on a phone that has æøå, because their phone is badly 
 adapted to Norwegian and has æøå in weird places. (Because the 
 manufacturers aren't really into adding a couple of extra _hardware_ 
 keys.) Software keyboards are great!

yeah. this is one reason i want to understand how it works without ø, æ etc. -
one day there will be a phone with a kbd.. and it wont have a version per
language because the # of users in norway is too small to warrant a special
production run for them - same for germany, france etc. etc. - until you have
the sales numbers to justify that.. you need a way to 

Re: [SHR] illume predictive keyboard is too slow - Usability features

2009-02-02 Thread The Rasterman
On Mon, 02 Feb 2009 21:53:26 +0100 Marco Trevisan (Treviño) m...@3v1n0.net
said:

 Carsten Haitzler (The Rasterman) wrote:
  On Wed, 28 Jan 2009 18:59:32 +0100 Marco Trevisan (Treviño)
  m...@3v1n0.net said:
  Maybe using something like a trie [1] to archive the
  words could help (both for words matching and for compressing the
  dictionary).
  Too hard?
 
  [1] http://en.wikipedia.org/wiki/Trie
  
  so back to the trie... the trie would only be useful for the ascii matching
  - i need something more complex. it just combines the data with the match
  tree (letters are inline). i need a match tree + lookup table to other
  matches to display - and possibly several match entries (all the matches to
  display also need to be in the tree pointing to a smaller match list).
 
 Ok, thanks... I got it. However I hope we can make something based on
 that idea (the trie) that can be applied to non-ascii chars too.
 
 However, in the past days I also sent you a private mail about some
 issues of the keyboard in the latest e17 svn [1], but I got no answer.
 Maybe the mail wasn't sent correctly?!

got it - i just tend to ignore some of my mailboxes for a while and cycle
around to them... got a lot of email here :) i'll get back to you on it. it's
just that the kbd isn't a focus at the moment so it tends to take a back-burner
position.

 However, I also wrote there about some features that I'd suggest
 implementing in the Illume keyboard. I'll write them here too to make the
 community aware:
 
 I use the illume keyboard every day and I'm very happy with it as I've
 said many times in this ML, but sometimes it happens that it performs
 some unwanted actions like:
  - I involuntarily click on a suggested word while I'm still typing my
    word (because I'm not too precise, I tap over a word instead of a top
    char).

thats a problem - mostly of spacing. its actually hard to figure that out. i
really dont know what to do there - if u reduce the hit area for matches, it
gets harder to select them. if i add more spacing, you lose more screen to the
kbd. somewhere someone loses. it's a matter of fine adjustments in the spacing,
i guess.

  - It happens that my keyboard gets switched while typing (yes, I know
    that this is mainly a hardware-related issue, due to the touchscreen
    jitters).

hmm thats hard to do. either u make swipes less sensitive and thus make it
harder to change layout and solve your problem, or you live with the occasional
swipe... or we have another way to change layout thats easy.

 They seem unrelated, but why not work around them by allowing these
 actions only after a small timeout (i.e. waiting a few ms after the latest
 key press)?

so lets say 0.4 sec after the last keyboard key press it will allow for
swipes and match hits etc. that could be done. again - tuning a timing value.
will people then complain that "i often try to swipe or hit a match and it
doesnt respond. i need to do it again"? hmmm.
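
A tiny sketch of that debounce idea - the 0.4s figure comes from the message
above, everything else (names, clock choice) is invented:

    #include <time.h>

    static double last_key;

    static double now(void)
    {
        struct timespec t;
        clock_gettime(CLOCK_MONOTONIC, &t);
        return t.tv_sec + t.tv_nsec / 1e9;
    }

    /* call from the key-press handler */
    void kbd_key_pressed(void) { last_key = now(); }

    /* gate swipes and match-taps on the elapsed time */
    int kbd_swipe_allowed(void) { return now() - last_key > 0.4; }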

 Generally you never confirm a word or switch keyboard as fast as you
 type a char (since typing can be imprecise thanks to the keyboard
 correction, while switching a keyboard or selecting a word must be precise)...

correct. it's a fine line to walk tho - as above :)

 And... What about making the horizontal word list (the one over the
 keys) scrollable [right-left] as the configuration toolbar is? Would it
 require more computation? I figure that could improve the usability.

no - it'd not be much of a problem - i just didnt do it. :)

nb - i can see why you often hit a match word. your kbd layout doesnt have
padding ABOVE the qwerty line like the default does... :)

 Bye.
 
 
 [1] http://i43.tinypic.com/i4il2d.png
 
 -- 
 Treviño's World - Life and Linux
 http://www.3v1n0.net/
 
 


-- 
- Codito, ergo sum - I code, therefore I am --
The Rasterman (Carsten Haitzler)ras...@rasterman.com




Re: [SHR] illume predictive keyboard is too slow

2009-01-31 Thread The Rasterman
On Fri, 30 Jan 2009 21:16:57 +0100 Olof Sjobergh olo...@gmail.com said:

 On Fri, Jan 30, 2009 at 8:12 PM, The Rasterman Carsten Haitzler
 ras...@rasterman.com wrote:
  On Fri, 30 Jan 2009 08:31:43 +0100 Olof Sjobergh olo...@gmail.com said:
  But I think a dictionary format in plain utf8 that includes the
  normalised words as well as any candidates to display would be the
  best way. Then the dictionary itself could choose which characters to
  normalise and which to leave as is. So for Swedish, you can leave å, ä
  and ö as they are but normalise é, à etc. Searching would be as simple
  as in your original implementation (no need to convert from multibyte
  format).
 
  the problem is - the dict in utf8 means searching is slow as you do it in
  utf8 space. the dict is mmaped() to save ram - if it wasnt it'd need to be
  allocated in non-swappable ram (its a phone - it has no swap) and thus a
  few mb of your ram goes into the kbd dict at all times. by using mmap you
  leave it to the kernels paging system to figure it out.
 
  so as such a dict change will mean a non-ascii format in future for this
  reason. but there will then need to be a tool to generate such a file.
 
 Searching in utf8 doesn't mean it has to be slow. Simple strcmp works
 fine on multibyte utf8 strings as well, and should be as fast as the
 dictionary was before adding multibyte to widechars conversions. But
 if you have some other idea in mind, please don't let me disturb. =)

the problem is - it isn't a simple key-value lookup. it's a possible-match tree
built on-the-fly. that means you jump about examining 1 character at a time.
the problem here is that 1 char may or may not be 1 byte or more and that makes
it really nasty. if it were a simple key lookup for a given simple string -
life would be easy. this is possible - but then u'd have to generate ALL
permutations first then look ALL of them up. if you weed out permutations AS
you look them up you can weed out something like 90% of the permutations as you
KNOW there are no words starting with qz... so as you go through qa... qs
qx... qz... you can easily stop all the combinations with qs, qz and qx as no
words begin with those (if you have an 8 letter word with 8 possible letters per
character in the word thats 8^6 lookups you avoided (in the case above - ie all
permutations of the other 6 letters). thats 262144 lookups avoided... just
there. for... 1 of the above impossible permutation trees. now add it up over
all of them.
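
A minimal, self-contained toy of that pruning - the dictionary, the
per-position candidate sets and all names are invented for illustration (the
real code walks an mmap()ed file, not a string array):

    #include <stdio.h>
    #include <string.h>

    /* toy stand-in for the real dictionary */
    static const char *dict[] = { "queen", "quest", "quick", NULL };

    static int has_prefix(const char *p)
    {
        for (int i = 0; dict[i]; i++)
            if (!strncmp(dict[i], p, strlen(p))) return 1;
        return 0;
    }

    static int has_word(const char *p)
    {
        for (int i = 0; dict[i]; i++)
            if (!strcmp(dict[i], p)) return 1;
        return 0;
    }

    /* cands[i] = plausible letters for position i
       (the pressed key plus its neighbours) */
    static void walk(char *buf, int pos, int len, const char **cands)
    {
        if (pos == len) {
            if (has_word(buf)) printf("match: %s\n", buf);
            return;
        }
        for (const char *c = cands[pos]; *c; c++) {
            buf[pos] = *c;
            buf[pos + 1] = '\0';
            /* the pruning: if no word starts with this prefix, the
               whole subtree of permutations below it is skipped */
            if (has_prefix(buf)) walk(buf, pos + 1, len, cands);
        }
    }

    int main(void)
    {
        const char *cands[] = { "qa", "wu", "ei", "ec", "nk" };
        char buf[6];
        walk(buf, 0, 5, cands);   /* prints "queen" and "quick" */
        return 0;
    }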


-- 
- Codito, ergo sum - I code, therefore I am --
The Rasterman (Carsten Haitzler)ras...@rasterman.com




Re: [SHR] illume predictive keyboard is too slow

2009-01-30 Thread Helge Hafting
Carsten Haitzler (The Rasterman) wrote:
 On Thu, 29 Jan 2009 14:32:48 +0100 Helge Hafting helge.haft...@hist.no said:

 I hope things like this will be possible, if a new dictionary format is 
 realized. It is ok if typing "for" suggests "fôr" as an alternative, but 
 "før" should not come up unless the user types f ø r. In which 
 case "o" must not be suggested...
 
 ok - how do you romanise norwegian then? example. in german ö -> oe, ü -> ue,
 ß -> ss, etc. - there is a set of romanisation rules that can convert any such
 char to 1 or more roman letters. i was hoping to be even more lenient with
 ö -> o being valid too for the lazy :) japanese has romanisation rules - so
 does chinese... norwegian must too (eg æ -> ae for example).
 
Usually, one doesn't romanize Norwegian. There are some rules: æ->ae, 
ø->oe, å->aa.  They are next to useless, because ae and oe occur 
naturally in many words where æ or ø does not belong, and these double 
vowels are pronounced differently as well. A Norwegian seeing oe in a 
word may be able to figure out if this means ø or if it really is 
supposed to be oe, but this may need a context of several words. And 
it looks funny/wrong - similar to how it looks silly transcribing x as 
ks and writing ksylophone.

You might want to transcribe x that way in an emergency, if your x 
key breaks, until you get a new keyboard. You probably don't want to 
throw away the x to save space on a keyboard though. And norwegian 
transcriptions aren't used for the same reasons. I have only seen two 
cases of such transcription:
1. Names of norwegian athletes in international sports events.
Which looks real silly. And completely unnecessary. Sport computer
systems these days handle more than a-z, the names are spelled
correctly in national events after all.
And it is not as if foreigners get big
problems with an ø. If they don't know what the slash is for,
they can read it as o, and so on. Similar to how I read
french - I have no idea what the difference between à and á is.
Both are just a to me.
2. Expert computer users sometimes use the transcriptions, because
they often use the latest equipment before keyboards get fixed
and before ascii-only limitations are sorted out. Some of them
are tired of fighting and give up. And they have actually heard
about the concept of transcription! But mainstream users get
equipment with proper keyboards, anything less is an unfinished
product. You won't find an ascii keyboard in a norwegian shop.

 if something can be romanised - it can have a romanised match in a dictionary
 and thus suggest the appropriate matches. of course now the dictionary
 determines these rules implicitly by content, not by code specifically
 enforcing such rules. :)
 
 but yes - selecting dictionary is needed so selecting a keyboard for that
 language as well as dictionary is useful. it still adds a few keys - thus
 squashing the keyboard some more :( i was hoping to avoid that.

English can work with 10 keys in a row, norwegian needs 11. :-)
The solution then is different keyboards, those who don't need more 
should not need to suffer the slightly smaller keys.

 note - the keyboard is by no means limited to ascii at all - it's perfectly
 able to have accented/other keys added to layouts - so i'm considering this
 problem solved as its simply a matter of everyone agreeing to make a .kbd 
 for
 their language - should they need one other than the default qwerty (ascii)
 one. so from this point of view - that's solved. what isn't done yet is:

Excellent!
So if I have a wordlist and make a keyboard, then a dictionary can be 
synthesized so there will be no unnecessary confusion between o and ø, 
because both letters exists as keys?

 1. a kbd being able to hint at wanting a specific dictionary language (or
 vice-versa).
For packaging, put the wordlist and keyboard layout in the same package, 
and switch both when switching keyboards. I guess several languages will 
have the same layout. This can be solved elegantly with hard links. Or a 
mechanism where keyboards either use standard ascii, or a 
language-specific layout.

 2. dictionary itself being able to hint to have a specific kbd layout.
 3. applications not being able to hint for a specific language for input (and
 thus dictionary and/or kbd).

I believe we use the same apps, regardless of language? So an app should 
simply ask for numeric/alphabetic/terminal, and then the system provides 
the system default alpha keyboard. This could be english, norwegian, 
german, ... depending on a system setting. Multilingual persons can 
have one default keyboard and explicitly select another when needed.

It'd be nice if one could have the option of setting a terminal keyboard 
as the default alphabetic keyboard too - some people don't like 
guesswork because the wordlist is never truly complete - or maybe there 
is no list for their language yet. Of course they then have to struggle 
with stylus and 

Re: [SHR] illume predictive keyboard is too slow

2009-01-30 Thread The Rasterman
On Fri, 30 Jan 2009 08:31:43 +0100 Olof Sjobergh olo...@gmail.com said:

 On Fri, Jan 30, 2009 at 4:25 AM, The Rasterman Carsten Haitzler
 ras...@rasterman.com wrote:
  On Thu, 29 Jan 2009 08:30:44 +0100 Olof Sjobergh olo...@gmail.com said:
 
  On Wed, Jan 28, 2009 at 11:16 PM, The Rasterman Carsten Haitzler
  ras...@rasterman.com wrote:
   On Wed, 28 Jan 2009 18:59:32 +0100 Marco Trevisan (Treviño)
   m...@3v1n0.net said:
  
   Olof Sjobergh wrote:
Unless I missed something big (which I hope I didn't, but I wouldn't
be surprised if I did), this is not fixable with the current
dictionary lookup design. Raster talked about redesigning the
dictionary format, so I guess we have to wait until he gets around to
it (or someone else does it).
  
   I think that too. Maybe using something like a trie [1] to archive the
   words could help (both for words matching and for compressing the
   dictionary).
   Too hard?
  
   [1] http://en.wikipedia.org/wiki/Trie
  
   the problem here comes with having multiple displays for a single match.
   let me take japanese as an example (i hope you have the fonts to see this
   at least - though there is no need to understand beyond knowing that
   there are a lot of matches that are visibly different):
  
    sakana ->
     さかな 茶菓な 肴 魚 サカナ 坂な 差かな 左かな 査かな 鎖かな 鎖かな
   
    unlike simple decimation of é -> e and ë -> e and è -> e etc. you need 1
    ascii input string matching one of MANY very different matches. the
    european case of
   
    vogel -> Vogel Vögel
  
   is a simplified version of the above. the reason i wanted decimation to
   match a simple roman text (ascii) string is - that this is a pretty
   universal thing. thats how japanese, chinese and even some korean input
   methods work. it also works for european languages too. europeans are NOT
   used to the idea of a dictionary guessing/selecting system when they
   type - but the asians are. they are always typing and selecting. the
   smarts come with the dictionary system selecting the right one more
   often than not by default or the right selection you want being only 1
   or 2 keystrokes away.
  
   i was hoping to be able to keep a SIMPLE ascii qwerty keyboard for as
   much as possible - so you can just type and it will work and offer the
   selections as it's trying to guess anyway - it can present the multiple
   accented versions too. this limits the need for special keyboards -
   doesn't obviate it, but allows more functionality out of the box. in the
   event users explicitly select an accented char - ie a non-ascii
   character, it should not decimate. it should try match exactly that
   char.
  
   so if you add those keys and use them or flip to another key layout to
   select them - you get what you expect. but if i am to redo the dict - the
   api is very generic - just the internals and format need changing to be
   able to do the above. the cool bit is.. if i manage the above... it has
   almost solved asian languages too - and input methods... *IF* the vkbd is
   also able to talk to a complex input method (XIM/SCIM/UIM etc.) as
   keystroke faking wont let you type chinese characters... :) but in
   principle the dictionary and lookup scheme will work - its then just
   mechanics of sending the data to the app in a way it can use it.
  
   so back to the trie... the trie would only be useful for the ascii
   matching
   - i need something more complex. it just combines the data with the match
   tree (letters are inline). i need a match tree + lookup table to other
   matches to display - and possibly several match entries (all the matches
   to display also need to be in the tree pointing to a smaller match list).
  
   --
   - Codito, ergo sum - I code, therefore I am --
   The Rasterman (Carsten Haitzler)ras...@rasterman.com
 
  I think most problems could be solved by using a dictionary format
  similar to what you describe above, i.e. something like:
 
  match : candidate1 candidate2; frequency
  for example:
  vogel : Vogel Vögel; 123
 
  That would mean you can search on the normalised word where simple
  strcmp works fine and will be fast enough. To not make it too large
  for example the following syntax could also be accepted:
  eat; 512 // No candidates, just show the match as is
  har här hår; 1234// Also show the match itself as a candidate
 
  If you think this would be good enough, I could try to implement it.
 
  Another problem with languages like Swedish, and also Japanese, is the
  heavy use of conjugation. For example, in Japanese the verbs 食べる and
  考える can both be conjugated in the same way like this:
  食べる 食べました 食べた 食べている 食べていた 食べています 食べていました
  考える 考えました 考えた 考えている 考えていた 考えています 考えていました
 
  Another example, the Swedish nouns:
  bil bilen bilar bilarna bilens bilarnas
 
  But including all these forms in a dictionary makes it very large,
  which is impractical. So some way to indicate possible conjugations
  would be 

Re: [SHR] illume predictive keyboard is too slow

2009-01-30 Thread The Rasterman
On Fri, 30 Jan 2009 14:43:39 +0100 Helge Hafting helge.haft...@hist.no said:

 Carsten Haitzler (The Rasterman) wrote:
  On Thu, 29 Jan 2009 14:32:48 +0100 Helge Hafting helge.haft...@hist.no
  said:
 
  I hope things like this will be possible, if a new dictionary format is 
  realized. It is ok if typing "for" suggests "fôr" as an alternative, but 
  "før" should not come up unless the user types f ø r. In which 
  case "o" must not be suggested...
  
  ok - how do you romanise norwegian then? example. in german ö -> oe, ü ->
  ue, ß -> ss, etc. - there is a set of romanisation rules that can convert
  any such char to 1 or more roman letters. i was hoping to be even more
  lenient with ö -> o being valid too for the lazy :) japanese has
  romanisation rules - so does chinese... norwegian must too (eg æ -> ae for
  example).
  
 Usually, one doesn't romanize Norwegian. There are some rules: æ->ae, 
 ø->oe, å->aa.  They are next to useless, because ae and oe occur 
 naturally in many words where æ or ø does not belong, and these double 
 vowels are pronounced differently as well. A Norwegian seeing oe in a 
 word may be able to figure out if this means ø or if it really is 
 supposed to be oe, but this may need a context of several words. And 
 it looks funny/wrong - similar to how it looks silly transcribing x as 
 ks and writing ksylophone.

oh thats not bad! then it's just like english! (you get used to the vague
insanity of it all sooner or later!) :)
but seriously - if your name is nønæn, and you move to japan, and have to fill
out a form for your bank account name - they will see the ø and æ and go "ummm.
we can't do that - can you please use normal roman text?" because they will
either accept roman (a-z) OR japanese (hiragana/katakana/kanji). strange
accented european chars aren't going to work. :) so i guess i'm asking because
sooner or later when filling out an immigration form or something in another
country - you will need to drop such chars into roman text somehow (that ugly
nasty lowest common denominator thing - i know), and so i was curious... how
you solve that - as that then presents a set of solutions/rules that can be
applied. :) again - not saying to get rid of the ø's of this world - already
supported. but just wondering how we can work when they are not there/used. :)

 You might want to transcribe x that way in an emergency, if your x 
 key breaks, until you get a new keyboard. You probably don't want to 
 throw away the x to save space on a keyboard though. And norwegian 
 transcriptions aren't used for the same reasons. I have only seen two 
 cases of such transcription:
 1. Names of norwegian athletes in international sports events.
 Which looks real silly. And completely unnecessary. Sport computer
 systems these days handle more than a-z, the names are spelled
 correctly in national events after all.
 And it is not as if foreigners get big
 problems with an ø. If they don't know what the slash is for,
 they can read it as o, and so on. Similar to how I read
 french - I have no idea what the difference between à and á is.
 Both are just a to me.

just like my example above - but i guess i was being stricter. the stodgy old
banking system isn't going to go adapt like modern sports data systems. its
go roman - or go home. :)

 2. Expert computer users sometimes use the transcriptions, because
 they often use the latest equipment before keyboards get fixed
 and before ascii-only limitations are sorted out. Some of them
 are tired of fighting and give up. And they have actually heard
 about the concept of transcription! But mainstream users get
 equipment with proper keyboards, anything less is an unfinished
 product. You won't find an ascii keyboard in a norwegian shop.

hmm. how interesting. i have always been baffled why there is a UK qwerty
layout vs US - the UK is the only place that uses it... all other english
speaking countries i know use US qwerty (and if UK qwerty was nicely killed
off.. it wouldn't need to be US qwerty - just qwerty) :)

ok - but there is a way to do this. when stuck on your friends pc when visiting
them in california, and they dont have compose-modes enabled... how do you type
æ and ø etc. that was basically the question - there must be some accepted
mechanism for decimation/conversion. seemingly it's the obvious: æ -> ae,
ø -> o etc. :)
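
Sketched as data, such a decimation table might look like the following - the
entries are just the examples from this thread, with both the strict and the
lenient spellings noted; nothing here is an agreed standard:

    struct roman_rule { const char *from; const char *to; };

    static const struct roman_rule decimate[] = {
        { "æ", "ae" }, { "ø", "o" /* strict: "oe" */ }, { "å", "aa" },
        { "ö", "oe" /* lenient: "o" */ }, { "ü", "ue" }, { "ß", "ss" },
    };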

  if something can be romanised - it can have a romanised match in a
  dictionary and thus suggest the appropriate matches. of course now the
  dictionary determines these rules implicitly by content, not by code
  specifically enforcing such rules. :)
  
  but yes - selecting dictionary is needed so selecting a keyboard for that
  language as well as dictionary is useful. it still adds a few keys - thus
  squashing the keyboard some more :( i was hoping to avoid that.
 
 English can work with 10 keys in a row, norwegian needs 11. :-)
 The solution then is different keyboards, those who don't need 

Re: [SHR] illume predictive keyboard is too slow

2009-01-30 Thread Olof Sjobergh
On Fri, Jan 30, 2009 at 8:12 PM, The Rasterman Carsten Haitzler
ras...@rasterman.com wrote:
 On Fri, 30 Jan 2009 08:31:43 +0100 Olof Sjobergh olo...@gmail.com said:
 But I think a dictionary format in plain utf8 that includes the
 normalised words as well as any candidates to display would be the
 best way. Then the dictionary itself could choose which characters to
 normalise and which to leave as is. So for Swedish, you can leave å, ä
 and ö as they are but normalise é, à etc. Searching would be as simple
 as in your original implementation (no need to convert from multibyte
 format).

 the problem is - the dict in utf8 means searching is slow as you do it in utf8
 space. the dict is mmaped() to save ram - if it wasnt it'd need to be 
 allocated
 in non-swappable ram (its a phone - it has no swap) and thus a few mb of your
 ram goes into the kbd dict at all times. by using mmap you leave it to the
 kernels paging system to figure it out.

 so as such a dict change will mean a non-ascii format in future for this
 reason. but there will then need to be a tool to generate such a file.

Searching in utf8 doesn't mean it has to be slow. Simple strcmp works
fine on multibyte utf8 strings as well, and should be as fast as the
dictionary was before adding multibyte to widechars conversions. But
if you have some other idea in mind, please don't let me disturb. =)
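
For what it's worth, the point that plain strcmp already works byte-wise on
multibyte utf8 can be seen in a tiny test (assuming the source file itself is
saved as utf8):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* identical utf8 byte sequences compare equal, different ones
           don't - no widechar conversion involved */
        printf("%d\n", strcmp("vögel", "vögel"));  /* prints 0: equal     */
        printf("%d\n", !strcmp("vögel", "vogel")); /* prints 0: not equal */
        return 0;
    }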

Best regards,

Olof Sjöbergh



Re: [SHR] illume predictive keyboard is too slow

2009-01-29 Thread Michal Brzozowski
2009/1/29 Olof Sjobergh olo...@gmail.com


 I think most problems could be solved by using a dictionary format
 similar to what you describe above, i.e. something like:

 match : candidate1 candidate2; frequency
 for example:
 vogel : Vogel Vögel; 123

 That would mean you can search on the normalised word where simple
 strcmp works fine and will be fast enough.


This dictionary would have hundreds of millions of rows even if you take
only reasonable user inputs. But what to do if the user inputs something
that's not in the dictionary? Of course I'm assuming you want to correct
typos, as it does now.

vogel: Vogel, Vögel
vigel: Vogel, Vögel
vpgel: Vogel, Vögel
wogel: Vogel, Vögel
wigel: Vogel, Vögel
vigem: Vogel, Vögel
vigwl: Vogel, Vögel
...
...


Re: [SHR] illume predictive keyboard is too slow

2009-01-29 Thread arne anka
 This dictionary would have hundreds of millions of rows even if you take
 only reasonable user inputs.

why would that be? colloquial language (and that's what is to be  
considered) contains only several thousand words - still a lot, but far  
away from millions.

 But what to do if the users inputs something
 that's not in the dictionary?

but that's a problem with every dictionary -- you can never contain every  
possible word.

i don't use the keyboard and i do not follow the discussion closely, but  
what always struck me as odd was the use of a text file.
why not use a db? it would enable learning, too.



Re: [SHR] illume predictive keyboard is too slow

2009-01-29 Thread Olof Sjobergh
On Thu, Jan 29, 2009 at 10:18 AM, Michal Brzozowski ruso...@poczta.fm wrote:
 2009/1/29 Olof Sjobergh olo...@gmail.com

 I think most problems could be solved by using a dictionary format
 similar to what you describe above, i.e. something like:

 match : candidate1 candidate2; frequency
 for example:
 vogel : Vogel Vögel; 123

 That would mean you can search on the normalised word where simple
 strcmp works fine and will be fast enough.

 This dictionary would have hundreds of millions of rows even if you take
 only reasonable user inputs. But what to do if the user inputs something
 that's not in the dictionary? Of course I'm assuming you want to correct
 typos, as it does now.

 vogel: Vogel, Vögel
 vigel: Vogel, Vögel
 vpgel: Vogel, Vögel
 wogel: Vogel, Vögel
 wigel: Vogel, Vögel
 vigem: Vogel, Vögel
 vigwl: Vogel, Vögel
 ...
 ...

I did not mean that all possible misspellings should be included, only the
normalisation which removes accented chars etc. So for normal English,
there would be almost no extra size compared to now. The current way
of correcting typos by checking all combinations from neighbouring
keys would work just like today.
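
For concreteness, a toy parser for the proposed "match : candidates;
frequency" lines - the format is the proposal from upthread, the function
itself is invented here:

    #include <stdlib.h>
    #include <string.h>

    /* split "vogel : Vogel Vögel; 123" into its three parts;
       "eat; 512" means the word is its own candidate.
       whitespace trimming is left out for brevity. */
    static int parse_entry(char *line, char **match, char **cands, int *freq)
    {
        char *semi  = strchr(line, ';');
        char *colon = strchr(line, ':');
        if (!semi) return -1;
        *semi = '\0';
        *freq = atoi(semi + 1);
        *match = line;
        *cands = colon ? (*colon = '\0', colon + 1) : line;
        return 0;
    }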

Best regards,

Olof Sjöbergh



Re: [SHR] illume predictive keyboard is too slow

2009-01-29 Thread Michal Brzozowski
2009/1/29 Olof Sjobergh olo...@gmail.com

 I did not mean that all possible misspellings should be included, only the
 normalisation which removes accented chars etc. So for normal English,
 there would be almost no extra size compared to now. The current way
 of correcting typos by checking all combinations from neighbouring
 keys would work just like today.


Ok, now I understand. This is a very good idea then. Is there any
explanation available on how the keyboard does typo correcting? I mean the
algorithm it uses.


Re: [SHR] illume predictive keyboard is too slow

2009-01-29 Thread Helge Hafting
Carsten Haitzler (The Rasterman) wrote:

 i was hoping to be able to keep a SIMPLE ascii qwerty keyboard for as 
 much as
 possible - so you can just type and it will work and offer the selections as
 it's trying to guess anyway - it can present the multiple accented versions
 too. this limits the need for special keyboards - doesn't obviate it, but
 allows more functionality out of the box. in the event users explicitly 
 select
 an accented char - ie a non-ascii character, it should not decimate. it
 should try match exactly that char.
 
We will still need to select the correct dictionary for the language 
somewhere. It is no more work if this also selects a keyboard layout 
adapted to that language.

I can see why you want a simple keyboard with fewer keys - the keys can 
be bigger and so there will be fewer finger-misses. I don't see any 
reason why it should be limited to ascii though - that division does not 
seem natural to me.

An example from the Norwegian language: The letter ô is rarely used, and 
everybody thinks about it as an o with a hat on it. So this one 
fits your scheme - type o and ô will be suggested in the few cases 
where it is appropriate. But the three vowels æøå are different. They 
are letters of their own, they are not seen as modifications of a/o, 
even if that may be historically correct. These three have their own 
names and their own places in the alphabet (after z). An å is not 
merely an a with ring, no more than the E is an F with an extra 
line attached. The ø is not merely an o with a slash either. Many 
people don't know that æ originated as an ae ligature. æ and ae 
can both occur in words, but the pronunciation is different and they are 
not interchangeable.

So when Norwegians type, they expect to see the 29 letters of their 
alphabet: abcdefghijklmnopqrstuvwxyzæøå. ô and é are sometimes 
useful too, but these are just o and e with modifications. æøå 
however, are parts of the base alphabet. Just like abc. A keyboard 
without æøå is assumed not to support Norwegian.

I hope things like this will be possible, if a new dictionary format is 
realized. It is ok if typing "for" suggests "fôr" as an alternative, but 
"før" should not come up unless the user types f ø r. In which 
case "o" must not be suggested...

Helge Hafting






Re: [SHR] illume predictive keyboard is too slow

2009-01-29 Thread Al Johnson
On Thursday 29 January 2009, Michal Brzozowski wrote:
 2009/1/29 Olof Sjobergh olo...@gmail.com

  It did not mean all possible misspellings should be included, only the
  normalisation which removes accented chars etc. So for normal English,
  there would be almost no extra size compared to now. The current way
  of correcting typos by checking all combinations from neighbouring
  keys would work just like today.

 Ok, now I understand. This is a very good idea then. Is there any
 explanation available on how the keyboard does typo correcting? I mean the
 algorithm it uses.

The wiki page links to a thread where Raster explains the process in great 
detail.

http://wiki.openmoko.org/wiki/Illume_keyboard
http://lists.openmoko.org/nabble.html#nabble-td2115715




Re: [SHR] illume predictive keyboard is too slow

2009-01-29 Thread The Rasterman
On Thu, 29 Jan 2009 12:19:38 +0100 arne anka openm...@ginguppin.de said:

  This dictionary would have hundreds of millions of rows even if you take
  only reasonable user inputs.
 
 why would that be? colloquial language (and that's what is to be  
 considered) contains only several thousand words - still a lot, but far  
 away from millions.
 
  But what to do if the users inputs something
  that's not in the dictionary?
 
 but that's a problem with every dictionary -- you never can contain every  
 possible word.
 
 i don't use the keyboard and i do not follow the discussion close, but  
 what always struck me odd was the use of a text file.
 why not use a db? it would enable learning, too.

sheer simplicity and dependencies. a db would mean selecting one. gdbm is gpl.
libdb is fine - but they love to break the db format every few releases and
that'd royally suck. also these lean towards key/value pairs - and that means
u need to GENERATE all possible permutations (which is prohibitively
expensive), so the dict format also affects the lookup, as you simply avoid
generating permutations u know will never have any matches (ie nothing starts
with qz... so never worry about all the qz* permutations). the best suggestion
is a trie - but i need a format i can access really quickly - and a library
that isnt license or otherwise restricted, easy to use, doesnt eat much ram at
all, and is fast.

invariably you never get all that - it either eats ram or is slow, or something
else. so what i did is just use a simple format easy to generate with a small
1-liner shell command, and index it on the fly for quick lookups via a tiny
2-level index. it of course is not incredibly fast - but it uses a tiny amount
of precious ram.

making it a text file opens the gate to easy generation of new dicts - and i
wanted to keep that as easy as possible.

-- 
- Codito, ergo sum - I code, therefore I am --
The Rasterman (Carsten Haitzler)ras...@rasterman.com




Re: [SHR] illume predictive keyboard is too slow

2009-01-29 Thread The Rasterman
On Thu, 29 Jan 2009 14:32:48 +0100 Helge Hafting helge.haft...@hist.no said:

 Carsten Haitzler (The Rasterman) wrote:
 
  i was hoping to be able to keep a SIMPLE ascii qwerty keyboard for as 
  much as
  possible - so you can just type and it will work and offer the selections as
  it's trying to guess anyway - it can present the multiple accented versions
  too. this limits the need for special keyboards - doesn't obviate it, but
  allows more functionality out of the box. in the event users explicitly 
  select
  an accented char - ie a non-ascii character, it should not decimate. it
  should try match exactly that char.
  
 We will still need to select the correct dictionary for the language 
 somewhere. It is no more work if this also selects a keyboard layout 
 adapted to that language.
 
 I can see why you want a simple keyboard with fewer keys - the keys can 
 be bigger and so there will be fewer finger-misses. I don't see any 
 reason why it should be limited to ascii though - that division does not 
 seem natural to me.
 
 An example from the Norwegian language: The letter ô is rarely used, and 
 everybody thinks about it as an o with a hat on it. So this one 
 fits your scheme - type o and ô will be suggested in the few cases 
 where it is appropriate. But the three vowels æøå are different. They 
 are letters of their own, they are not seen as modifications of a/o, 
 even if that may be historically correct. These three have their own 
 names and their own places in the alphabet (after z). An å is not 
 merely an a with ring, no more than the E is an F with an extra 
 line attached. The ø is not merely an o with a slash either. Many 
 people don't know that æ originated as an ae ligature. æ and ae 
 can both occur in words, but the pronunciation is different and they are 
 not interchangeable.
 
 So when Norwegians type, they expect to see the 29 letters of their 
 alphabet: abcdefghijklmnopqrstuvwxyzæøå. ô and é are sometimes 
 useful too, but these are just o and e with modifications. æøå 
 however, are parts of the base alphabet. Just like abc. A keyboard 
 without æøå is assumed not to support Norwegian.
 
 I hope things like this will be possible, if a new dictionary format is 
 realized. It is ok if typing "for" suggests "fôr" as an alternative, but 
 "før" should not come up unless the user types f ø r. In which 
 case "o" must not be suggested...

ok - how do you romanise norwegian then? example. in german ö -> oe, ü -> ue,
ß -> ss, etc. - there is a set of romanisation rules that can convert any such
char to 1 or more roman letters. i was hoping to be even more lenient with
ö -> o being valid too for the lazy :) japanese has romanisation rules - so
does chinese... norwegian must too (eg æ -> ae for example).

if something can be romanised - it can have a romanised match in a dictionary
and thus suggest the appropriate matches. of course now the dictionary
determines these rules implicitly by content, not by code specifically
enforcing such rules. :)

but yes - selecting dictionary is needed so selecting a keyboard for that
language as well as dictionary is useful. it still adds a few keys - thus
squashing the keyboard some more :( i was hoping to avoid that.

note - the keyboard is by no means limited to ascii at all - it's perfectly
able to have accented/other keys added to layouts - so i'm considering this
problem solved as its simply a matter of everyone agreeing to make a .kbd for
their language - should they need one other than the default qwerty (ascii)
one. so from this point of view - that's solved. what isn't done yet is:

1. a kbd being able to hint at wanting a specific dictionary language (or
vice-versa).
2. dictionary itself being able to hint to have a specific kbd layout.
3. applications not being able to hint for a specific language for input (and
thus dictionary and/or kbd).

so there needs to be a tie-in between language, dict and kbd - which one drives
what... is the question. it needs to not BREAK things like the terminal kbd -
ie i can stay with norwegian as my language, but if i select the terminal kbd
it will stay there and not suddenly flip back to the simple kbd layout.
number/symbol entry similarly. this bit of things is currently undefined and
unimplemented.
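
To make hints (1) and (2) concrete, they could look something like this in
the files themselves - purely hypothetical syntax, since as said above this
is undefined and unimplemented:

    # in a .kbd layout file (hypothetical directive):
    #   dict norwegian.dic
    #
    # in a dictionary file (hypothetical directive):
    #   kbd norwegian.kbd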

the other is improved dictionary format. the problem is - if we go make the
dict smarter... how on earth do you GENERATE such a dictionary. i sure as hell
am not hand-writing a whole dictionary... and i doubt anyone here will - it
could be a large community effort to build a full one for each language - but
that will take time. you need to enter all words, all matches, conjugations,
and then frequency info too. the simple dict english can use is much easier -
it can be auto-generated from input text. just throw a (text version of a) book
- or newspaper or documentation - at it; it can just index every word it finds
and even count frequency usage. thats easy to automate the production of such a
dict (and that is why the 

Re: [SHR] illume predictive keyboard is too slow

2009-01-29 Thread The Rasterman
On Thu, 29 Jan 2009 08:30:44 +0100 Olof Sjobergh olo...@gmail.com said:

 On Wed, Jan 28, 2009 at 11:16 PM, The Rasterman Carsten Haitzler
 ras...@rasterman.com wrote:
  On Wed, 28 Jan 2009 18:59:32 +0100 Marco Trevisan (Treviño)
  m...@3v1n0.net said:
 
  Olof Sjobergh wrote:
   Unless I missed something big (which I hope I didn't, but I wouldn't
   be surprised if I did), this is not fixable with the current
   dictionary lookup design. Raster talked about redesigning the
   dictionary format, so I guess we have to wait until he gets around to
   it (or someone else does it).
 
  I think that too. Maybe using something like a trie [1] to archive the
  words could help (both for words matching and for compressing the
  dictionary).
  Too hard?
 
  [1] http://en.wikipedia.org/wiki/Trie
 
  the problem here comes with having multiple displays for a single match.
  let me take japanese as an example (i hope you have the fonts to see this
  at least - though there is no need to understand beyond knowing that there
  are a lot of matches that are visibly different):
 
  sakana ->
   さかな 茶菓な 肴 魚 サカナ 坂な 差かな 左かな 査かな 鎖かな 鎖かな
 
  unlike simple decimation of é -> e and ë -> e and è -> e etc. you need 1
  ascii input string matching one of MANY very different matches. the
  european case of
 
  vogel -> Vogel Vögel
 
  is a simplified version of the above. the reason i wanted decimation to
  match a simple roman text (ascii) string is - that this is a pretty
  universal thing. thats how japanese, chinese and even some korean input
  methods work. it also works for european languages too. europeans are NOT
  used to the idea of a dictionary guessing/selecting system when they type -
  but the asians are. they are always typing and selecting. the smarts come
  with the dictionary system selecting the right one more often than not by
  default or the right selection you want being only 1 or 2 keystrokes away.
 
  i was hoping to be able to keep a SIMPLE ascii qwerty keyboard for as much
  as possible - so you can just type and it will work and offer the
  selections as it's trying to guess anyway - it can present the multiple
  accented versions too. this limits the need for special keyboards - doesn't
  obviate it, but allows more functionality out of the box. in the event
  users explicitly select an accented char - ie a non-ascii character, it
  should not decimate. it should try match exactly that char.
 
  so if you add those keys and use them or flip to another key layout to
  select them - you get what you expect. but if i am to redo the dict - the
  api is very generic - just the internals and format need changing to be
  able to do the above. the cool bit is.. if i manage the above... it has
  almost solved asian languages too - and input methods... *IF* the vkbd is
  also able to talk to a complex input method (XIM/SCIM/UIM etc.) as
  keystroke faking wont let you type chinese characters... :) but in
  principle the dictionary and lookup scheme will work - its then just
  mechanics of sending the data to the app in a way it can use it.
 
  so back to the trie... the trie would only be useful for the ascii matching
  - i need something more complex. it just combines the data with the match
  tree (letters are inline). i need a match tree + lookup table to other
  matches to display - and possibly several match entries (all the matches to
  display also need to be in the tree pointing to a smaller match list).
 
  --
  - Codito, ergo sum - I code, therefore I am --
  The Rasterman (Carsten Haitzler)ras...@rasterman.com
 
 I think most problems could be solved by using a dictionary format
 similar to what you describe above, i.e. something like:
 
 match : candidate1 candidate2; frequency
 for example:
 vogel : Vogel Vögel; 123
 
 That would mean you can search on the normalised word where simple
 strcmp works fine and will be fast enough. To not make it too large
 for example the following syntax could also be accepted:
 eat; 512 // No candidates, just show the match as is
 har här hår; 1234// Also show the match itself as a candidate
 
 If you think this would be good enough, I could try to implement it.
 
 Another problem with languages like Swedish, and also Japanese, is the
 heavy use of conjugation. For example, in Japanese the verbs 食べる and
 考える can both be conjugated in the same way like this:
 食べる 食べました 食べた 食べている 食べていた 食べています 食べていました
 考える 考えました 考えた 考えている 考えていた 考えています 考えていました
 
 Another example, the Swedish nouns:
 bil bilen bilar bilarna bilens bilarnas
 
 But including all these forms in a dictionary makes it very large,
 which is impractical. So some way to indicate possible conjugations
 would be good, but it would make the dictionary format a lot more
 complex.

the real problem is... how on EARTH will such a dictionary get written? who
will write all of that? the advantage of the simple "just list lots of words
and ALL their forms" approach is that it's easy - it can be generated by 

Re: [SHR] illume predictive keyboard is too slow

2009-01-29 Thread Olof Sjobergh
On Fri, Jan 30, 2009 at 4:25 AM, The Rasterman Carsten Haitzler
ras...@rasterman.com wrote:
 On Thu, 29 Jan 2009 08:30:44 +0100 Olof Sjobergh olo...@gmail.com said:

 On Wed, Jan 28, 2009 at 11:16 PM, The Rasterman Carsten Haitzler
 ras...@rasterman.com wrote:
  On Wed, 28 Jan 2009 18:59:32 +0100 Marco Trevisan (Treviño)
  m...@3v1n0.net said:
 
  Olof Sjobergh wrote:
   Unless I missed something big (which I hope I didn't, but I wouldn't
   be surprised if I did), this is not fixable with the current
   dictionary lookup design. Raster talked about redesigning the
   dictionary format, so I guess we have to wait until he gets around to
   it (or someone else does it).
 
  I think that too. Maybe using something like a trie [1] to archive the
  words could help (both for words matching and for compressing the
  dictionary).
  Too hard?
 
  [1] http://en.wikipedia.org/wiki/Trie
 
  the problem here comes with having multiple displays for a single match.
  let me take japanese as an example (i hope you have the fonts to see this
  at least - though there is no need to understand beyond knowing that there
  are a lot of matches that are visibly different):
 
   sakana ->
    さかな 茶菓な 肴 魚 サカナ 坂な 差かな 左かな 査かな 鎖かな 鎖かな
  
   unlike simple decimation of é -> e and ë -> e and è -> e etc. you need 1
   ascii input string matching one of MANY very different matches. the
   european case of
  
   vogel -> Vogel Vögel
 
  is a simplified version of the above. the reason i wanted decimation to
  match a simple roman text (ascii) string is - that this is a pretty
  universal thing. thats how japanese, chinese and even some korean input
  methods work. it also works for european languages too. europeans are NOT
  used to the idea of a dictionary guessing/selecting system when they type -
  but the asians are. they are always typing and selecting. the smarts come
  with the dictionary system selecting the right one more often than not by
  default or the right selection you want being only 1 or 2 keystrokes away.
 
  i was hoping to be able to keep a SIMPLE ascii qwerty keyboard for as much
  as possible - so you can just type and it will work and offer the
  selections as it's trying to guess anyway - it can present the multiple
  accented versions too. this limits the need for special keyboards - doesn't
  obviate it, but allows more functionality out of the box. in the event
  users explicitly select an accented char - ie a non-ascii character, it
  should not decimate. it should try match exactly that char.
 
  so if you add those keys and use them or flip to another key layout to
  select them - you get what you expect. but if i am to redo the dict - the
  api is very generic - just the internals and format need changing to be
  able to do the above. the cool bit is.. if i manage the above... it has
  almost solved asian languages too - and input methods... *IF* the vkbd is
  also able to talk to a complex input method (XIM/SCIM/UIM etc.) as
  keystroke faking wont let you type chinese characters... :) but in
  principle the dictionary and lookup scheme will work - its then just
  mechanics of sending the data to the app in a way it can use it.
 
  so back to the trie... the trie would only be useful for the ascii matching
  - i need something more complex. it just combines the data with the match
  tree (letters are inline). i need a match tree + lookup table to other
  matches to display - and possibly several match entries (all the matches to
  display also need to be in the tree pointing to a smaller match list).
 
  --
  - Codito, ergo sum - I code, therefore I am --
  The Rasterman (Carsten Haitzler)ras...@rasterman.com

 I think most problems could be solved by using a dictionary format
 similar to what you describe above, i.e. something like:

 match : candidate1 candidate2; frequency
 for example:
 vogel : Vogel Vögel; 123

 That would mean you can search on the normalised word where simple
 strcmp works fine and will be fast enough. To not make it too large
 for example the following syntax could also be accepted:
 eat; 512 // No candidates, just show the match as is
 har här hår; 1234// Also show the match itself as a candidate

 If you think this would be good enough, I could try to implement it.

 Another problem with languages like Swedish, and also Japanese, is the
 heavy use of conjugation. For example, in Japanese the verbs 食べる and
 考える can both be conjugated in the same way like this:
 食べる 食べました 食べた 食べている 食べていた 食べています 食べていました
 考える 考えました 考えた 考えている 考えていた 考えています 考えていました

 Another example, the Swedish nouns:
 bil bilen bilar bilarna bilens bilarnas

 But including all these forms in a dictionary makes it very large,
 which is impractical. So some way to indicate possible conjugations
 would be good, but it would make the dictionary format a lot more
 complex.

 the real problem is... how on EARTH will such a dictionary get written? who
 will write all of that? the advantage to the 

[SHR] illume predictive keyboard is too slow

2009-01-28 Thread Giorgio Marciano
I tried to write something with the illume keyboard within SHR
unstable and it is too slow to be usable!

Is there a way to fix it? Within the previous SHR testing it was working
quite well!

thanks



Re: [SHR] illume predictive keyboard is too slow

2009-01-28 Thread Florian Hackenberger
On Wednesday 28 January 2009, Giorgio Marciano wrote:
 I tried to write something with the illume keyboard within the SHR
 unstable and it is too slow to be usable!
 There is a way to fix it? withing the previous SHR testing it was
 working quite good!
That's my UTF8 fix [1] that's causing the slowness, I'm afraid. 
Unfortunately I'm very very busy ATM and therefore I'm unable to work 
on it. It could either be the latin -> UTF16 code which is slow or 
another bug I introduced (causing excessive lookups for example).


Cheers,
Florian

[1] http://trac.enlightenment.org/e/changeset/38274

-- 
DI Florian Hackenberger
flor...@hackenberger.at
www.hackenberger.at



Re: [SHR] illume predictive keyboard is too slow

2009-01-28 Thread Olof Sjobergh
On Wed, Jan 28, 2009 at 11:53 AM, Florian Hackenberger
f.hackenber...@chello.at wrote:
 That's my UTF8 fix [1] that's causing the slowness, I'm afraid.
 Unfortunately I'm very very busy ATM and therefore I'm unable to work
 on it. It could either be the latin -> UTF16 code which is slow or
 another bug I introduced (causing excessive lookups for example).

I looked into this issue when my Swedish keyboard didn't work
correctly. I found some issues and some parts that could be improved
and sent a patch with these fixes to the enlightenment devel list.
However, even fixing everything I could find, it's still a bit slow.
The problem seems to be the conversion to utf16 for each and every
strcmp when doing the lookup.

Unless I missed something big (which I hope I didn't, but I wouldn't
be surprised if I did), this is not fixable with the current
dictionary lookup design. Raster talked about redesigning the
dictionary format, so I guess we have to wait until he gets around to
it (or someone else does it).

Best regards,

Olof Sjobergh

___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community


Re: [SHR] illume predictive keyboard is too slow

2009-01-28 Thread Helge Hafting
Olof Sjobergh wrote:
 On Wed, Jan 28, 2009 at 11:53 AM, Florian Hackenberger
 f.hackenber...@chello.at wrote:
 That's my UTF8 fix [1] that's causing the slowness, I'm afraid.
 Unfortunately I'm very very busy ATM and therefore I'm unable to work
 on it. It could either be the latin -> UTF16 code which is slow or
 another bug I introduced (causing excessive lookups for example).
 
 I looked into this issue when my Swedish keyboard didn't work
 correctly. I found some issues and some parts that could be improved
 and sent a patch with these fixes to the enlightenment devel list.
 However, even fixing everything I could find, it's still a bit slow.
 The problem seems to be the conversion to utf16 for each and every
 strcmp when doing the lookup.
 
 Unless I missed something big (which I hope I didn't, but I wouldn't
 be surprised if I did), this is not fixable with the current
 dictionary lookup design. Raster talked about redesigning the
 dictionary format, so I guess we have to wait until he gets around to
 it (or someone else does it).
 
The obvious fix is to store the dictionary in such a format that
conversions won't be necessary. Not sure why utf16 is being used;
utf8 is more compact and works so well for everything else in Linux.

Helge Hafting

___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community


Re: [SHR] illume predictive keyboard is too slow

2009-01-28 Thread Olof Sjobergh
On Wed, Jan 28, 2009 at 2:05 PM, Helge Hafting helge.haft...@hist.no wrote:
 The obvious fix is to store the dictionary in such a format that
 conversions won't be necessary. Not sure why utf16 is being used;
 utf8 is more compact and works so well for everything else in Linux.

Yes, the obvious fix is to change the dictionary format. However, it's
not as simple as you might think.

The dictionary today is stored in utf8, not utf16. But the dictionary
lookup tries to match words that are not exactly the same as the input
word; for example e should also match é, è and ë. To do this, every
character in the input string, and every character of each word, has
to be normalised to ascii. Since in utf8 a single character can take
up multiple bytes, to normalise a word it's first converted to utf16
where all characters are the same size, and then a simple lookup table
can be used for each character. But converting from multibyte format
each time a string is compared to another adds overhead.
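
To make the overhead concrete, here is a toy sketch of that comparison
path. It is not the actual illume code - the real code goes via utf16
plus a lookup table, while this sketch decodes 2-byte utf8 sequences
directly - but the per-comparison shape is the same: decode and
normalise both strings before a plain strcmp can run.

#include <stdio.h>
#include <string.h>

/* toy decoder: handles ascii and 2-byte utf8 sequences only, which is
   enough for the latin-1 accents; the real code handles all of utf8 */
static unsigned int utf8_next(const char **s)
{
    const unsigned char *p = (const unsigned char *)*s;
    if (p[0] < 0x80) { *s += 1; return p[0]; }
    *s += 2;
    return ((p[0] & 0x1f) << 6) | (p[1] & 0x3f);
}

/* toy decimation table: a few accents mapped to their ascii base */
static char decimate(unsigned int cp)
{
    switch (cp) {
        case 0xe9: case 0xe8: case 0xeb: case 0xea: return 'e'; /* é è ë ê */
        case 0xe4: case 0xe5: return 'a';                       /* ä å */
        case 0xf6: case 0xf8: return 'o';                       /* ö ø */
        default: return cp < 0x80 ? (char)cp : '?';
    }
}

/* run for BOTH the input word and EVERY candidate on every lookup -
   this is the work a pre-normalised dictionary would avoid */
static void normalise(const char *utf8, char *buf, size_t len)
{
    size_t i = 0;
    while (*utf8 && i + 1 < len) buf[i++] = decimate(utf8_next(&utf8));
    buf[i] = '\0';
}

int main(void)
{
    char a[16], b[16];
    normalise("har", a, sizeof(a));
    normalise("h\xc3\xa4r", b, sizeof(b)); /* "här" */
    printf("%d\n", strcmp(a, b) == 0);     /* prints 1: har matches här */
    return 0;
}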

With a different dictionary format where all words are stored already
normalised, there would be no need for all the conversions. But then
you also have to store all possible conversions for each word, so the
format would be more complicated.

Best regards,

Olof Sjobergh

___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community


Re: [SHR] illume predictive keyboard is too slow

2009-01-28 Thread Helge Hafting
Olof Sjobergh wrote:
 On Wed, Jan 28, 2009 at 2:05 PM, Helge Hafting helge.haft...@hist.no wrote:
 The obvious fix is to store the dictionary in such a format that
 conversions won't be necessary. Not sure why utf16 is being used;
 utf8 is more compact and works so well for everything else in Linux.
 
 Yes, the obvious fix is to change the dictionary format. However, it's
 not as simple as you might think.
 
 The dictionary today is stored in utf8, not utf16. But the dictionary
 lookup tries to match words not exactly the same as the input word,
 for example e should also match é, è and ë. To do this, every

I see. This is done to avoid needing a few extra keys for accents and 
umlauts? Won't that create problems for languages where two words differ 
only in accents?  In Norwegian, there are many such pairs. Examples:
for/fôr, tå/ta, dør/dor,...

Helge Hafting

___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community


Re: [SHR] illume predictive keyboard is too slow

2009-01-28 Thread Marco Trevisan (Treviño)
Olof Sjobergh wrote:
 Unless I missed something big (which I hope I didn't, but I wouldn't
 be surprised if I did), this is not fixable with the current
 dictionary lookup design. Raster talked about redesigning the
 dictionary format, so I guess we have to wait until he gets around to
 it (or someone else does it).

I think that too. Maybe using something like a trie [1] to store the
words could help (both for word matching and for compressing the
dictionary).
Too hard?

[1] http://en.wikipedia.org/wiki/Trie
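
For what it's worth, a minimal sketch of the idea - ascii-only and
lowercase, with no frequencies or display forms, which illume would
also need hung off the nodes:

#include <stdio.h>
#include <stdlib.h>

typedef struct node {
    struct node *child[26]; /* one slot per letter a-z */
    int is_word;
} node;

static node *node_new(void) { return calloc(1, sizeof(node)); }

static void trie_add(node *root, const char *word)
{
    for (; *word; word++) {
        int i = *word - 'a';
        if (!root->child[i]) root->child[i] = node_new();
        root = root->child[i];
    }
    root->is_word = 1;
}

/* walk down to the node for a prefix (NULL if absent); enumerating
   the subtree below it yields every completion, which is what the
   predictive lookup needs */
static node *trie_prefix(node *root, const char *prefix)
{
    for (; root && *prefix; prefix++)
        root = root->child[*prefix - 'a'];
    return root;
}

int main(void)
{
    node *root = node_new();
    trie_add(root, "vogel");
    trie_add(root, "vow");
    printf("prefix 'vo': %s\n",
           trie_prefix(root, "vo") ? "found" : "not found");
    return 0;
}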

-- 
Treviño's World - Life and Linux
http://www.3v1n0.net/


___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community


Re: [SHR] illume predictive keyboard is too slow

2009-01-28 Thread Olof Sjobergh
On Wed, Jan 28, 2009 at 5:50 PM, Helge Hafting helge.haft...@hist.no wrote:
 I see. This is done to avoid needing a few extra keys for accents and
 umlauts? Won't that create problems for languages where two words differ
 only in accents?  In Norwegian, there are many such pairs. Examples:
 for/fôr, tå/ta, dør/dor,...

Yes, that's a problem I ran into with Swedish as well. We have for
example har/här/hår etc. But with a good dictionary it actually works
ok, if not optimally. For these words you have to select the one you
want from the matches, which is a little annoying but not a total
show-stopper.

To fix it, either you would need different normalisation tables for
each language, or a new dictionary format. Raster said in an earlier
mail on the list that he'd fix it someday but had a lot of other stuff
to look at now. So I guess we have to be patient for now.

Best regards,

Olof Sjobergh

___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community


Re: [SHR] illume predictive keyboard is too slow

2009-01-28 Thread The Rasterman
On Wed, 28 Jan 2009 18:59:32 +0100 Marco Trevisan (Treviño) m...@3v1n0.net
said:

 Olof Sjobergh wrote:
  Unless I missed something big (which I hope I didn't, but I wouldn't
  be surprised if I did), this is not fixable with the current
  dictionary lookup design. Raster talked about redesigning the
  dictionary format, so I guess we have to wait until he gets around to
  it (or someone else does it).
 
 I think that too. Maybe using something like a trie [1] to store the
 words could help (both for word matching and for compressing the
 dictionary).
 Too hard?
 
 [1] http://en.wikipedia.org/wiki/Trie

the problem here comes with having multiple displays for a single match. let me
take japanese as an example (i hope you have the fonts to see this at least -
though there is no need to understand beyond knowing that there are a lot of
matches that are visibly different):

sakana ->
 さかな 茶菓な 肴 魚 サカナ 坂な 差かな 左かな 査かな 鎖かな 鎖かな

unlike simple decimation of é -> e and ë -> e and è -> e etc. you need 1 ascii
input string matching one of MANY very different matches. the european case of

vogel -> Vogel Vögel

is a simplified version of the above. the reason i wanted decimation to match
a simple roman text (ascii) string is - that this is a pretty universal thing.
that's how japanese, chinese and even some korean input methods work. it also
works for european languages too. europeans are NOT used to the idea of a
dictionary guessing/selecting system when they type - but the asians are. they
are always typing and selecting. the smarts come with the dictionary system
selecting the right one more often than not by default or the right selection
you want being only 1 or 2 keystrokes away.

i was hoping to be able to keep a SIMPLE ascii qwerty keyboard for as much as
possible - so you can just type and it will work and offer the selections as
it's trying to guess anyway - it can present the multiple accented versions
too. this limits the need for special keyboards - doesn't obviate it, but
allows more functionality out of the box. in the event users explicitly select
an accented char - i.e. a non-ascii character - it should not decimate. it
should try to match exactly that char.
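
As a sketch of that rule (hypothetical, not the actual matcher, and
reusing a toy decimate() table purely for illustration): a typed ascii
key matches any dictionary character that decimates to it, while an
explicitly selected non-ascii character must match exactly.

#include <stdio.h>

/* toy decimation table, illustration only */
static char decimate(unsigned int cp)
{
    switch (cp) {
        case 0xe9: case 0xe8: case 0xeb: return 'e'; /* é è ë */
        case 0xf6: case 0xf8: return 'o';            /* ö ø */
        default: return cp < 0x80 ? (char)cp : '?';
    }
}

/* ascii keystroke: match anything that decimates to it;
   explicitly selected non-ascii char: require an exact match */
static int char_matches(unsigned int typed, unsigned int dict_cp)
{
    if (typed < 0x80) return decimate(dict_cp) == (char)typed;
    return typed == dict_cp;
}

int main(void)
{
    printf("%d %d %d\n",
           char_matches('o', 0xf6),   /* typed o, dict ö  -> 1 */
           char_matches(0xf6, 0xf6),  /* picked ö, dict ö -> 1 */
           char_matches(0xf6, 'o'));  /* picked ö, dict o -> 0 */
    return 0;
}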

so if you add those keys and use them or flip to another key layout to select
them - you get what you expect. but if i am to redo the dict - the api is very
generic - just the internals and format need changing to be able to do the
above. the cool bit is.. if i manage the above... it has almost solved asian
languages too - and input methods... *IF* the vkbd is also able to talk to a
complex input method (XIM/SCIM/UIM etc.) as keystroke faking won't let you type
chinese characters... :) but in principle the dictionary and lookup scheme will
work - it's then just mechanics of sending the data to the app in a way it can
use it.

so back to the trie... the trie would only be useful for the ascii matching - i
need something more complex. it just combines the data with the match tree
(letters are inline). i need a match tree + lookup table to other matches to
display - and possibly several match entries (all the matches to display also
need to be in the tree pointing to a smaller match list).
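
One guess at a shape for that - not the actual planned design, just to
picture it - a child/sibling tree over normalised letters, with the
display candidates hung off word-final nodes:

/* hypothetical node layout, illustrative only */
typedef struct match_node match_node;
struct match_node {
    char         letter;   /* normalised ascii letter, stored inline */
    match_node  *child;    /* first continuation letter */
    match_node  *sibling;  /* next alternative at this position */
    /* set where a word ends: the forms to display for this
       normalised key, e.g. "vogel" -> { "Vogel", "Vögel" } */
    const char **displays; /* utf8 display candidates */
    const int   *freqs;    /* one frequency per candidate */
    int          ndisplays;
};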

-- 
- Codito, ergo sum - I code, therefore I am --
The Rasterman (Carsten Haitzler)ras...@rasterman.com


___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community


Re: [SHR] illume predictive keyboard is too slow

2009-01-28 Thread Olof Sjobergh
On Wed, Jan 28, 2009 at 11:16 PM, The Rasterman Carsten Haitzler
ras...@rasterman.com wrote:
 On Wed, 28 Jan 2009 18:59:32 +0100 Marco Trevisan (Treviño) m...@3v1n0.net
 said:

 Olof Sjobergh wrote:
  Unless I missed something big (which I hope I didn't, but I wouldn't
  be surprised if I did), this is not fixable with the current
  dictionary lookup design. Raster talked about redesigning the
  dictionary format, so I guess we have to wait until he gets around to
  it (or someone else does it).

 I think that too. Maybe using something like a trie [1] to archive the
 words could help (both for words matching and for compressing the
 dictionary).
 Too hard?

 [1] http://en.wikipedia.org/wiki/Trie

 the problem here comes with having multiple displays for a single match. let 
 me
 take japanese as an example (i hope you have the fonts to see this at least -
 though there is no need to understand beyond knowing that there are a lot of
 matches that are visibly different):

 sakana ->
  さかな 茶菓な 肴 魚 サカナ 坂な 差かな 左かな 査かな 鎖かな 鎖かな

 unlike simple decimation of é -> e and ë -> e and è -> e etc. you need 1 ascii
 input string matching one of MANY very different matches. the european case of

 vogel -> Vogel Vögel

 is a simplified version of the above. the reason i wanted decimation to match
 a simple roman text (ascii) string is - that this is a pretty universal thing.
 that's how japanese, chinese and even some korean input methods work. it also
 works for european languages too. europeans are NOT used to the idea of a
 dictionary guessing/selecting system when they type - but the asians are. they
 are always typing and selecting. the smarts come with the dictionary system
 selecting the right one more often than not by default or the right selection
 you want being only 1 or 2 keystrokes away.

 i was hoping to be able to keep a SIMPLE ascii qwerty keyboard for as much as
 possible - so you can just type and it will work and offer the selections as
 it's trying to guess anyway - it can present the multiple accented versions
 too. this limits the need for special keyboards - doesn't obviate it, but
 allows more functionality out of the box. in the event users explicitly select
 an accented char - i.e. a non-ascii character - it should not decimate. it
 should try to match exactly that char.

 so if you add those keys and use them or flip to another key layout to select
 them - you get what you expect. but if i am to redo the dict - the api is very
 generic - just the internals and format need changing to be able to do the
 above. the cool bit is.. if i manage the above... it has almost solved asian
 languages too - and input methods... *IF* the vkbd is also able to talk to a
 complex input method (XIM/SCIM/UIM etc.) as keystroke faking won't let you type
 chinese characters... :) but in principle the dictionary and lookup scheme 
 will
 work - it's then just mechanics of sending the data to the app in a way it can
 use it.

 so back to the trie... the trie would only be useful for the ascii matching - 
 i
 need something more complex. it just combines the data with the match tree
 (letters are inline). i need a match tree + lookup table to other matches to
 display - and possibly several match entries (all the matches to display also
 need to be in the tree pointing to a smaller match list).

 --
 - Codito, ergo sum - I code, therefore I am --
 The Rasterman (Carsten Haitzler)ras...@rasterman.com

I think most problems could be solved by using a dictionary format
similar to what you describe above, i.e. something like:

match : candidate1 candidate2; frequency
for example:
vogel : Vogel Vögel; 123

That would mean you can search on the normalised word where simple
strcmp works fine and will be fast enough. To not make it too large,
for example, the following syntax could also be accepted:
eat; 512 // No candidates, just show the match as is
har här hår; 1234 // Also show the match itself as a candidate

If you think this would be good enough, I could try to implement it.
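
A rough cut at parsing one such line, assuming the delimiters behave
exactly as in the examples above (whitespace trimming and splitting
the candidate list on spaces are left out):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* parse "match : cand1 cand2; freq" or "word; freq" in place;
   candidates are left NULL when there is no ':' */
static int parse_line(char *line, char **match, char **cands, int *freq)
{
    char *semi = strchr(line, ';');
    if (!semi) return -1;
    *semi = '\0';
    *freq = atoi(semi + 1);
    char *colon = strchr(line, ':');
    if (colon) { *colon = '\0'; *cands = colon + 1; }
    else *cands = NULL;
    *match = line;
    return 0;
}

int main(void)
{
    char buf[] = "vogel : Vogel Vögel; 123";
    char *match, *cands;
    int freq;
    if (!parse_line(buf, &match, &cands, &freq))
        printf("match='%s' cands='%s' freq=%d\n", match, cands, freq);
    return 0;
}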

Another problem with languages like Swedish, and also Japanese, is the
heavy use of conjugation. For example, in Japanese the verbs 食べる and
考える can both be conjugated in the same way like this:
食べる 食べました 食べた 食べている 食べていた 食べています 食べていました
考える 考えました 考えた 考えている 考えていた 考えています 考えていました

Another example, the Swedish nouns:
bil bilen bilar bilarna bilens bilarnas

But including all these forms in a dictionary makes it very large,
which is impractical. So some way to indicate possible conjugations
would be good, but it would make the dictionary format a lot more
complex.
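
Purely as a speculative illustration - none of this exists today - one
way to keep such a dictionary compact would be a stem plus its
inflection suffixes, expanded at load or match time:

bil : +en +ar +arna +ens +arnas ; 1234
食べ : +る +ました +た +ている +ていた +ています +ていました ; 567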

Best regards,

Olof Sjöbergh

___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community