Re: New developments on Caribou

2010-05-19 Thread Matteo Vescovi

Hi,

marmuta wrote:

I think there is scope to join forces between presage and onboard.

presage is architected to merge predictions generated by a set of 
predictors. Each predictor uses a different language model/predictive 
algorithm to generate predictions.
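As a rough illustration of merging predictions from several predictors, here is a minimal Python sketch. It assumes each predictor returns a mapping from candidate token to probability and simply sums the probabilities per token. Presage's actual merging strategy may differ, and the names `merge_predictions` and `Prediction` are hypothetical.

```python
from collections import defaultdict

# Assumption: each (hypothetical) predictor returns candidate token -> probability.
Prediction = dict[str, float]


def merge_predictions(predictions: list[Prediction], limit: int = 5) -> list[tuple[str, float]]:
    """Sum each token's probability across predictors and keep the top `limit`."""
    scores = defaultdict(float)
    for prediction in predictions:
        for token, prob in prediction.items():
            scores[token] += prob
    # Highest combined score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```

For example, with an n-gram predictor returning {"the": 0.5, "then": 0.25} and a recency predictor returning {"then": 0.5, "there": 0.125}, "then" comes out on top with a combined score of 0.75.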


Currently presage provides the following predictors:
ARPA predictor: statistical language modelling data in the ARPA
N-gram format
generalized smoothed n-gram statistical predictor: generalized
smoothed n-gram statistical predictor can work with n-grams of
arbitrary cardinality
recency predictor: based on recency promotion principle
dictionary predictor: generates a prediction by returning tokens that
are a completion of the current prefix in alphabetical order
abbreviation expansion predictor: maps the current prefix to a token
and returns the token in a prediction with a 1.0 probability
dejavu predictor: learns and then later reproduces previously seen
text sequences.
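A minimal sketch of the two simplest predictors in this list, the dictionary and abbreviation expansion predictors, assuming a plain in-memory word list and abbreviation table. The function names are hypothetical and not part of presage's API; the point is only to show prefix completion in alphabetical order and an expansion returned with probability 1.0.

```python
import bisect


def dictionary_predict(words: list[str], prefix: str, limit: int = 5) -> list[str]:
    """Return up to `limit` words completing `prefix`, in alphabetical order."""
    ordered = sorted(words)
    # All words sharing a prefix form one contiguous run in sorted order.
    start = bisect.bisect_left(ordered, prefix)
    completions = []
    for word in ordered[start:]:
        if len(completions) >= limit or not word.startswith(prefix):
            break
        completions.append(word)
    return completions


def abbreviation_predict(abbreviations: dict[str, str], prefix: str) -> dict[str, float]:
    """Map a known abbreviation to its expansion with probability 1.0."""
    expansion = abbreviations.get(prefix)
    return {expansion: 1.0} if expansion is not None else {}
```

For instance, completing "can" against ["cab", "can", "candle", "cane", "cat"] yields ["can", "candle", "cane"].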

A bit more information on how these predictors work is available
here: http://presage.sourceforge.net/?q=node/15


It sounds like the language model and predictive algorithm used in
the onboard word-prediction branch is an ideal candidate to be
integrated into presage and become a new presage predictor class.


Pretty interesting stuff, but from looking over its feature list I'm
wondering what presage would gain. There doesn't seem to be much
onboard's prediction could add that isn't implemented already.


Roughly compared, gpredict (name is subject to change) covers
these presage components:

- generalized smoothed n-gram statistical predictor 
- recency predictor (with exponential falloff)

- dictionary predictor (word completion)
- dejavu predictor? (if it does continuous on-line learning)

The main difference, apart from the general architecture, may be that
gpredict uses dynamically updatable language models, handy for on-line
learning. I'm not completely sure, but it seems presage's three n-gram
predictors are based on immutable models and the dejavu predictor keeps
a separate adaptable model of unigrams.
  


The generalized smoothed n-gram predictor does continuous on-line 
learning (learning can be turned on or off at runtime or via 
configuration). When learning is turned on, the language model is 
updated on the fly with new n-gram counts.
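To make on-line learning concrete, here is a toy Python sketch of a model whose n-gram counts are updated as text is typed, with learning switchable at runtime. It is simplified to bigrams with a fixed linear-interpolation weight (an assumption; the generalized predictor described above handles n-grams of arbitrary cardinality), and `LearningBigramModel` is a hypothetical name.

```python
from collections import Counter


class LearningBigramModel:
    """Toy interpolated bigram model whose counts can be updated at runtime."""

    def __init__(self, learn: bool = True, bigram_weight: float = 0.8):
        self.learn = learn                  # on-line learning switch, can be toggled any time
        self.bigram_weight = bigram_weight  # assumed interpolation weight
        self.unigrams = Counter()           # word -> count
        self.bigrams = Counter()            # (previous, word) -> count

    def observe(self, text: str) -> None:
        """Add n-gram counts from newly typed text, but only if learning is on."""
        if not self.learn:
            return
        tokens = text.split()
        self.unigrams.update(tokens)
        self.bigrams.update(zip(tokens, tokens[1:]))

    def probability(self, previous: str, word: str) -> float:
        """Linearly interpolate bigram and unigram relative frequencies."""
        total = sum(self.unigrams.values())
        if total == 0:
            return 0.0
        p_unigram = self.unigrams[word] / total
        history = self.unigrams[previous]
        p_bigram = self.bigrams[(previous, word)] / history if history else 0.0
        return self.bigram_weight * p_bigram + (1 - self.bigram_weight) * p_unigram
```

Because learning only touches counts, turning it off simply makes `observe` a no-op while predictions keep using whatever has been learned so far.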


The dejavu predictor is just a toy predictor, really. I wrote it to try 
things out when I started implementing continuous online learning 
functionality, and it now serves as a simple example of how to implement a 
learning predictor class.


Similarly, the smoothed count predictor and the 3-gram smoothed 
predictor are remnants from a time when I was experimenting with 
language models and really are building steps towards the generalized 
smoothed n-gram predictor, which is currently the main statistical 
predictor (along with the ARPA predictor).


presage could then be the engine used to power the d-bus prediction 
service, offering the predictive capabilities of the onboard language 
model/predictor, plus all the predictors currently provided by
presage (all of which can be turned on/off and configured to suit
individual needs).


The modularity could be helpful, even though I'm not sure if I could
really make use of it.

We were very concerned about memory usage and had initially thought
about using static ARPA compatible structures for large immutable
language models and dynamically updatable models only for on-line
learning. However later the dynamic models turned out to be almost as
efficient as the ARPA implementation and so now there are (flavors of)
dynamic models for everything.

Similar consolidation happened with recency caching. It was originally
planned as a separate modular component. However that would have meant
redundant storage of n-grams and a forced limit to some arbitrarily
small number of recent n-grams. So I had it integrate more closely with
the generic dynamic models, gaining recency tracking across all known
n-grams but sacrificing some modularity (there is still variability
through inheritance though).
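The idea of keeping recency inside the language model, rather than in a separate fixed-size cache, can be sketched as follows. Each n-gram entry stores both its count and when it was last seen, so recency is tracked for every known n-gram, and the recency weight falls off exponentially with age. The half-life parameter and the class names are assumptions for illustration, not gpredict's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class NgramEntry:
    count: int = 0
    last_seen: int = -1  # observation index at which this n-gram was last seen


class RecencyAwareModel:
    """Hypothetical dynamic model keeping count and recency in one entry per n-gram."""

    def __init__(self, half_life: float = 100.0):
        self.half_life = half_life  # assumed: observations until the weight halves
        self.entries = {}           # n-gram tuple -> NgramEntry, no size cap
        self.clock = 0              # number of n-gram observations so far

    def learn(self, ngram: tuple[str, ...]) -> None:
        entry = self.entries.setdefault(ngram, NgramEntry())
        entry.count += 1
        entry.last_seen = self.clock
        self.clock += 1

    def recency(self, ngram: tuple[str, ...]) -> float:
        """1.0 for the most recent n-gram, halving every `half_life` observations."""
        entry = self.entries.get(ngram)
        if entry is None:
            return 0.0
        age = self.clock - 1 - entry.last_seen
        return 0.5 ** (age / self.half_life)
```

Storing both values in one entry avoids keeping a second copy of each n-gram just for recency, at the cost of tying the recency logic to the model class.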
  


If onboard's current predictive functionality were merged into presage 
and encapsulated into a (say, for lack of a better name) 
OnboardPredictor class, then presage's modularity would be useful 
because it would allow us to:
- replicate exactly the same predictive functionality of the current 
gpredict service, by switching on OnboardPredictor and turning off other 
predictors
- augment OnboardPredictor's predictive functionality with other 
predictors currently provided by presage, as desired by onboard or the 
user, simply by modifying a config variable.
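A hypothetical sketch of how a single config value could select which predictors run, for example OnboardPredictor alone or OnboardPredictor plus a dictionary predictor. The predictor stubs, the `AVAILABLE` registry and the space-separated config format are assumptions; presage's real configuration variables are not reproduced here.

```python
from typing import Callable

Prediction = dict[str, float]
Predictor = Callable[[str], Prediction]


# Stub predictors standing in for real ones; they ignore the context.
def onboard_predictor(context: str) -> Prediction:
    return {"world": 0.5}


def dictionary_predictor(context: str) -> Prediction:
    return {"word": 0.25}


# Hypothetical registry mapping config names to predictor functions.
AVAILABLE: dict[str, Predictor] = {
    "OnboardPredictor": onboard_predictor,
    "DictionaryPredictor": dictionary_predictor,
}


def active_predictors(config_value: str) -> list[Predictor]:
    """Return the predictors named in a space-separated config value, in order."""
    names = config_value.split()
    unknown = [name for name in names if name not in AVAILABLE]
    if unknown:
        raise ValueError(f"unknown predictors: {unknown}")
    return [AVAILABLE[name] for name in names]


def predict(config_value: str, context: str) -> Prediction:
    """Run only the configured predictors and sum their probabilities per token."""
    merged: Prediction = {}
    for predictor in active_predictors(config_value):
        for token, prob in predictor(context).items():
            merged[token] = merged.get(token, 0.0) + prob
    return merged
```

With the value "OnboardPredictor" only the onboard stub runs; changing it to "OnboardPredictor DictionaryPredictor" adds the dictionary results without touching any code.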


Presage would definitely benefit from having a new and high-quality 
predictor in its core.



The presage core library itself has minimal dependencies: it pretty
much only needs a C++ runtime and sqlite, which is used as the
backing store for n-gram based language models (this ensures fast
access, minimum memory footprint and no delays while lo

Re: New developments on Caribou

2010-05-13 Thread David Pellicer
Thanks to all for your responses!!

Our intention is to make Caribou as flexible as we can, and the
prediction options shall be configurable. 

We have a short video demo of the work done so far [0]. I would
like you to see the video; feedback is welcome.

We haven't published the code yet because it is still buggy. 

Cheers!

David Pellicer

[0] http://is.gd/c72P1


On Fri, 07-05-2010 at 12:10 +0100, David Colven wrote: 
> It's important that the user can set the weighting of prediction
> results, for example frequency and recency (if that is really a word!),
> and also the order and way words are displayed (frequency, recency,
> alphabetic and word length satisfy different needs), control word length
> minima, n word phrase prediction, abbreviation-expansion, TTS support
> and predict after...n letters.  A phonetic correction (kan=can) is
> useful for some users.
> 
> Are these points being considered?  Am I trying to teach someone to suck
> eggs here?
> 
> Most research seems to indicate that prediction does not speed a
> keyboard user up unless they type at less than 1 character per second.
> There are other reasons for using intelligent prediction of course such
> as language difficulties and effort in text entry.  For switch users
> however the gains are much higher.
> 
> I'll go back to lurking for a bit
> 
> All the best
> 
> David Colven
> AEGIS Project www.aegis-project.eu 
> 
> The ACE Centre Advisory Trust
> 92 Windmill Road
> Headington
> Oxford
> OX3 7DR
> 
> Direct - 01865 759813
> Office - 01865 759800
> Email - col...@ace-centre.org.uk
> 
> The ACE Centre is a registered charity (no 1040868)
> 
> > -Original Message-
> > From: gnome-accessibility-list-boun...@gnome.org [mailto:gnome-
> > accessibility-list-boun...@gnome.org] On Behalf Of Matteo Vescovi
> > Sent: Friday, May 07, 2010 11:44 AM
> > To: Francesco Fumanti
> > Cc: marmuta; gnome-accessibility-list@gnome.org
> > Subject: Re: New developments on Caribou
> > 
> > Francesco Fumanti wrote:
> > > There is work ongoing to create a word prediction service over dbus
> > > for the onscreen keyboard onboard. (onboard is the default onscreen
> > > keyboard shipping with Ubuntu.)
> > > At some point, there was also talk to share it with Caribou. It uses
> > > n-grams language modeling. If you want to have a look at it, you can
> > > find it in the word completion branch of onboard:
> > > https://code.launchpad.net/onboard
> > 
> > I had a look at the onboard word-completion branch, great stuff!
> > 
> > I think there is scope to join forces between presage and onboard.
> > 
> > presage is architected to merge predictions generated by a set of
> > predictors. Each predictor uses a different language model/predictive
> > algorithm to generate predictions.
> > 
> > Currently presage provides the following predictors:
> > ARPA predictor: statistical language modelling data in the ARPA N-gram
> > format
> > generalized smoothed n-gram statistical predictor: generalized smoothed
> > n-gram statistical predictor can work with n-grams of arbitrary cardinality
> > recency predictor: based on recency promotion principle
> > dictionary predictor: generates a prediction by returning tokens that
> > are a completion of the current prefix in alphabetical order
> > abbreviation expansion predictor: maps the current prefix to a token and
> > returns the token in a prediction with a 1.0 probability
> > dejavu predictor: learns and then later reproduces previously seen text
> > sequences.
> > 
> > A bit more information on how these predictors work is available here:
> > http://presage.sourceforge.net/?q=node/15
> > 
> > 
> > It sounds like the language model and predictive algorithm used in the
> > onboard word-prediction branch is an ideal candidate to be integrated
> > into presage and become a new presage predictor class.
> > 
> > presage could then be the engine used to power the d-bus prediction
> > service, offering the predictive capabilities of the onboard language
> > model/predictor, plus all the predictors currently provided by presage
> > (all of which can be turned on/off and configured to suit individual
> > needs).
> > 
> > 
> > The presage core library itself has minimal dependencies: it pretty much
> > only needs a C++ runtime and sqlite, which is used as the backing store
> > for n-gram based language models (this ensures fast access, minimum
> > memory footprint and no delays while loading the languag

RE: New developments on Caribou

2010-05-07 Thread David Colven

It's important that the user can set the weighting of prediction
results, for example frequency and recency (if that is really a word!),
and also the order and way words are displayed (frequency, recency,
alphabetic and word length satisfy different needs), control word length
minima, n word phrase prediction, abbreviation-expansion, TTS support
and predict after...n letters.  A phonetic correction (kan=can) is
useful for some users.
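Several of these options (frequency/recency weighting, display order, minimum word length and "predict after n letters") can be expressed as plain settings applied after a predictor has produced candidates. A minimal sketch, with hypothetical names and the assumption that frequency and recency are already normalised to 0..1; note that choosing which words to show and choosing how to order them are kept separate.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    word: str
    frequency: float  # assumed normalised to 0..1
    recency: float    # assumed normalised to 0..1


@dataclass
class PredictionSettings:
    # Hypothetical user settings mirroring the options listed above.
    frequency_weight: float = 0.5
    recency_weight: float = 0.5
    order: str = "score"      # display order: "score", "alphabetic" or "length"
    min_word_length: int = 1
    predict_after: int = 1    # letters typed before any prediction is shown
    max_results: int = 5


def rank(candidates: list[Candidate], typed: str, s: PredictionSettings) -> list[str]:
    """Pick the best candidates by weighted score, then order them for display."""
    if len(typed) < s.predict_after:
        return []
    eligible = [c for c in candidates
                if c.word.startswith(typed) and len(c.word) >= s.min_word_length]

    def score(c: Candidate) -> float:
        return s.frequency_weight * c.frequency + s.recency_weight * c.recency

    # Selection always uses the weighted score (ties broken alphabetically).
    chosen = sorted(eligible, key=lambda c: (-score(c), c.word))[: s.max_results]
    # Display order is a separate user choice.
    if s.order == "alphabetic":
        chosen.sort(key=lambda c: c.word)
    elif s.order == "length":
        chosen.sort(key=lambda c: (len(c.word), c.word))
    return [c.word for c in chosen]
```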

Are these points being considered?  Am I trying to teach someone to suck
eggs here?

Most research seems to indicate that prediction does not speed a
keyboard user up unless they type at less than 1 character per second.
There are other reasons for using intelligent prediction of course such
as language difficulties and effort in text entry.  For switch users
however the gains are much higher.

I'll go back to lurking for a bit

All the best

David Colven
AEGIS Project www.aegis-project.eu 

The ACE Centre Advisory Trust
92 Windmill Road
Headington
Oxford
OX3 7DR

Direct - 01865 759813
Office - 01865 759800
Email - col...@ace-centre.org.uk

The ACE Centre is a registered charity (no 1040868)

> -Original Message-
> From: gnome-accessibility-list-boun...@gnome.org [mailto:gnome-
> accessibility-list-boun...@gnome.org] On Behalf Of Matteo Vescovi
> Sent: Friday, May 07, 2010 11:44 AM
> To: Francesco Fumanti
> Cc: marmuta; gnome-accessibility-list@gnome.org
> Subject: Re: New developments on Caribou
> 
> Francesco Fumanti wrote:
> > There is work ongoing to create a word prediction service over dbus
> > for the onscreen keyboard onboard. (onboard is the default onscreen
> > keyboard shipping with Ubuntu.)
> > At some point, there was also talk to share it with Caribou. It uses
> > n-grams language modeling. If you want to have a look at it, you can
> > find it in the word completion branch of onboard:
> > https://code.launchpad.net/onboard
> 
> I had a look at the onboard word-completion branch, great stuff!
> 
> I think there is scope to join forces between presage and onboard.
> 
> presage is architected to merge predictions generated by a set of
> predictors. Each predictor uses a different language model/predictive
> algorithm to generate predictions.
> 
> Currently presage provides the following predictors:
> ARPA predictor: statistical language modelling data in the ARPA N-gram
> format
> generalized smoothed n-gram statistical predictor: generalized smoothed
> n-gram statistical predictor can work with n-grams of arbitrary cardinality
> recency predictor: based on recency promotion principle
> dictionary predictor: generates a prediction by returning tokens that
> are a completion of the current prefix in alphabetical order
> abbreviation expansion predictor: maps the current prefix to a token and
> returns the token in a prediction with a 1.0 probability
> dejavu predictor: learns and then later reproduces previously seen text
> sequences.
> 
> A bit more information on how these predictors work is available here:
> http://presage.sourceforge.net/?q=node/15
> 
> 
> It sounds like the language model and predictive algorithm used in the
> onboard word-prediction branch is an ideal candidate to be integrated
> into presage and become a new presage predictor class.
> 
> presage could then be the engine used to power the d-bus prediction
> service, offering the predictive capabilities of the onboard language
> model/predictor, plus all the predictors currently provided by presage
> (all of which can be turned on/off and configured to suit individual
> needs).
> 
> 
> The presage core library itself has minimal dependencies: it pretty much
> only needs a C++ runtime and sqlite, which is used as the backing store
> for n-gram based language models (this ensures fast access, minimum
> memory footprint and no delays while loading the language model in
> memory).
> 
> 
> > For details about the word prediction service, please contact marmuta,
> > who did nearly all the work on the word prediction service.
> 
> I'll follow up with marmuta to discuss the feasibility of making this
> happen and work out the technical details, in case there is consensus to
> go ahead with this.
> 
> 
> Cheers,
> - Matteo
> 
> ___
> gnome-accessibility-list mailing list
> gnome-accessibility-list@gnome.org
> http://mail.gnome.org/mailman/listinfo/gnome-accessibility-list
___
gnome-accessibility-list mailing list
gnome-accessibility-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gnome-accessibility-list


Re: New developments on Caribou

2010-05-07 Thread Matteo Vescovi

Francesco Fumanti wrote:
There is work ongoing to create a word prediction service over dbus 
for the onscreen keyboard onboard. (onboard is the default onscreen 
keyboard shipping with Ubuntu.)
At some point, there was also talk to share it with Caribou. It uses 
n-grams language modeling. If you want to have a look at it, you can 
find it in the word completion branch of onboard:

https://code.launchpad.net/onboard


I had a look at the onboard word-completion branch, great stuff!

I think there is scope to join forces between presage and onboard.

presage is architected to merge predictions generated by a set of 
predictors. Each predictor uses a different language model/predictive 
algorithm to generate predictions.


Currently presage provides the following predictors:
ARPA predictor: statistical language modelling data in the ARPA N-gram 
format
generalized smoothed n-gram statistical predictor: generalized smoothed 
n-gram statistical predictor can work with n-grams of arbitrary cardinality
recency predictor: based on recency promotion principle
dictionary predictor: generates a prediction by returning tokens that 
are a completion of the current prefix in alphabetical order
abbreviation expansion predictor: maps the current prefix to a token and 
returns the token in a prediction with a 1.0 probability
dejavu predictor: learns and then later reproduces previously seen text 
sequences.


A bit more information on how these predictors work is available here: 
http://presage.sourceforge.net/?q=node/15



It sounds like the language model and predictive algorithm used in the 
onboard word-prediction branch is an ideal candidate to be integrated 
into presage and become a new presage predictor class.


presage could then be the engine used to power the d-bus prediction 
service, offering the predictive capabilities of the onboard language 
model/predictor, plus all the predictors currently provided by presage 
(all of which can be turned on/off and configured to suit individual needs).



The presage core library itself has minimal dependencies: it pretty much 
only needs a C++ runtime and sqlite, which is used as the backing store 
for n-gram based language models (this ensures fast access, minimum 
memory footprint and no delays while loading the language model in memory).
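To illustrate why an SQLite backing store avoids loading the whole model into memory, here is a sketch using Python's standard `sqlite3` module: counts live in a table and only the rows matching the current context are queried. The single `bigram` table and its column names are assumptions for illustration and may not match presage's actual schema.

```python
import sqlite3


def open_model(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical bigram table backed by an SQLite file."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bigram ("
        " word_1 TEXT NOT NULL,"
        " word TEXT NOT NULL,"
        " count INTEGER NOT NULL,"
        " PRIMARY KEY (word_1, word))"
    )
    return conn


def add_bigram(conn: sqlite3.Connection, previous: str, word: str, increment: int = 1) -> None:
    """Learn a bigram by incrementing its stored count (inserting it if new)."""
    with conn:  # commits the transaction on success
        conn.execute("INSERT OR IGNORE INTO bigram VALUES (?, ?, 0)", (previous, word))
        conn.execute(
            "UPDATE bigram SET count = count + ? WHERE word_1 = ? AND word = ?",
            (increment, previous, word),
        )


def complete(conn: sqlite3.Connection, previous: str, prefix: str,
             limit: int = 5) -> list[tuple[str, int]]:
    """Fetch only the rows needed to complete `prefix` after `previous`."""
    return conn.execute(
        "SELECT word, count FROM bigram"
        " WHERE word_1 = ? AND substr(word, 1, ?) = ?"
        " ORDER BY count DESC, word LIMIT ?",
        (previous, len(prefix), prefix, limit),
    ).fetchall()
```

The primary key on (word_1, word) lets the lookup by previous word use an index, so start-up cost does not grow with the size of the model.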



For details about the word prediction service, please contact marmuta, 
who did nearly all the work on the word prediction service.


I'll follow up with marmuta to discuss the feasibility of making this 
happen and work out the technical details, in case there is consensus to 
go ahead with this.



Cheers,
- Matteo



Re: New developments on Caribou

2010-05-06 Thread Francesco Fumanti

Hi,


On 05/06/2010 09:24 AM, David Pellicer wrote:

  I am the person who is going to take over the project and I would like
  to explain our objectives.

  I have been asked to develop the following features of the Caribou
  roadmap:
   * change the qwerty layout to have 4 rows instead of 3 (add more
  control characters to the first layer)(already included in Caribou)
   * improve text detection using atspi (already included in Caribou)
   * add sticky keys functionality
   * add support for word prediction using prediction dbus service
   * add 'alpha numeric sorted alphabetically' and 'alpha numeric sorted
  by frequency' keyboards
   * keyboard fully usable with 1 switch Autoscan
   * keyboard fully usable with 1 switch Userscan
   * UI for setting up switch
   * a mode that is always visible

  First of all I'm going to integrate Joaquim Rocha's patch to use
  JSON keyboard [0] layouts instead of the current specification. As this
  patch is based on a previous version of Caribou, I will then integrate
  the next set of changes uploaded in the new way.
  For the prediction feature, I will create a D-Bus service using Presage
  [1], which is an interesting library to make text prediction based on
  the context instead of the characters of the word.
  If anyone is working on any of these features, please tell me on this
  list to avoid duplicated work, and discuss  the best option to
  implement it.


There is work ongoing to create a word prediction service over dbus for the 
onscreen keyboard onboard. (onboard is the default onscreen keyboard shipping 
with Ubuntu.)
At some point, there was also talk to share it with Caribou. It uses n-grams 
language modeling. If you want to have a look at it, you can find it in the 
word completion branch of onboard:
https://code.launchpad.net/onboard

The word prediction service does not have a separate host yet, but it is planned.

For details about the word prediction service, please contact marmuta, who did 
nearly all the work on the word prediction service.

Perhaps a few last words about onboard versus caribou. When Ben started working 
on caribou, I contacted him to tell him about onboard, but he did not find 
onboard a suitable starting point for what he had in mind for caribou. 
Moreover, you should perhaps know that marmuta and I intend to keep 
onboard independent of at-spi to avoid the issues caused by a running at-spi; if 
at some point a dependency on at-spi becomes necessary (for example if 
switch access with GUI grabbing is going to be implemented in onboard), the 
current intention is to only add it as a soft dependency, if possible.


Cheers,

Francesco






Re: New developments on Caribou

2010-05-06 Thread Matteo Vescovi

David Pellicer wrote:

 For the prediction feature, I will create a D-Bus service using Presage
 [1], which is an interesting library to make text prediction based on
 the context instead of the characters of the word.
  


Hi David,

I'm thrilled you are planning to use presage to add the predictive text 
functionality to caribou.


I'm willing to help out with any fixes or enhancements to presage, if 
any are needed.



Cheers,
- Matteo



Re: New developments on Caribou

2010-05-06 Thread Piñeiro
From: David Pellicer 

>  For the prediction feature, I will create a D-Bus service using Presage
>  [1], which is an interesting library to make text prediction based on
>  the context instead of the characters of the word.
>  If anyone is working on any of these features, please tell me on this
>  list to avoid duplicated work, and discuss  the best option to
>  implement it.

As I said in another thread, you could take a look at Joaquim Rocha's work
using presage [2]. He basically created a new gtk input method that
uses presage internally, and then uses it with Caribou.

My conclusion after talking with Joaquim Rocha is that it is in a rough
state, but it can be seen as a first step. Anyway, I'm not sure it is
exactly what you want.

BR

> [0] https://bugzilla.gnome.org/show_bug.cgi?id=613229
> [1] http://presage.sourceforge.net
[2] http://www.joaquimrocha.com/2010/04/05/caribou-and-text-predictor-input-mode/

===
API (apinhe...@igalia.com)


Re: New developments on Caribou

2010-05-06 Thread Eitan Isaacson
Hi David,

We talked earlier; great to see this moving forward. We often chat on
IRC, #a11y at gimpnet.

On Thu, 2010-05-06 at 09:24 +0200, David Pellicer wrote:
>  First of all I´m going to integrate the Joaquim Rocha's patch to use
>  JSON keyboard [0] layouts instead of the current specification.

Joaquim and I recently corresponded on the bug, please see our comments
there.

Cheers,
  Eitan.
