Re: New developments on Caribou
Hi,

marmuta wrote:
> > I think there is scope to join forces between presage and onboard.
> >
> > presage is architected to merge predictions generated by a set of predictors. Each predictor uses a different language model/predictive algorithm to generate predictions.
> >
> > Currently presage provides the following predictors:
> > - ARPA predictor: statistical language modelling data in the ARPA N-gram format
> > - generalized smoothed n-gram statistical predictor: can work with n-grams of arbitrary cardinality
> > - recency predictor: based on the recency promotion principle
> > - dictionary predictor: generates a prediction by returning tokens that are a completion of the current prefix in alphabetical order
> > - abbreviation expansion predictor: maps the current prefix to a token and returns the token in a prediction with a 1.0 probability
> > - dejavu predictor: learns and then later reproduces previously seen text sequences.
> >
> > A bit more information on how these predictors work is available here:
> > http://presage.sourceforge.net/?q=node/15
> >
> > It sounds like the language model and predictive algorithm used in the onboard word-prediction branch is an ideal candidate to be integrated into presage and become a new presage predictor class.
>
> Pretty interesting stuff, but from looking over its feature list I'm wondering what presage would gain. There doesn't seem to be much onboard's prediction could add that isn't implemented already. Roughly compared, gpredict (name is subject to change) covers these presage components:
> - generalized smoothed n-gram statistical predictor
> - recency predictor (with exponential falloff)
> - dictionary predictor (word completion)
> - dejavu predictor? (if it does continuous on-line learning)
>
> The main difference, apart from the general architecture, may be that gpredict uses dynamically updatable language models, handy for on-line learning. I'm not completely sure, but it seems presage's three n-gram predictors are based on immutable models and the dejavu predictor keeps a separate adaptable model of unigrams.

The generalized smoothed n-gram predictor does continuous on-line learning (learning can be turned on or off at runtime or via configuration). When learning is turned on, the language model is updated on the fly with new n-gram counts.

The dejavu predictor is just a toy predictor, really. I wrote it to try things out when I started implementing continuous on-line learning functionality, and it now serves as a simple example of how to implement a learning predictor class.

Similarly, the smoothed count predictor and the 3-gram smoothed predictor are remnants from a time when I was experimenting with language models, and really are building steps towards the generalized smoothed n-gram predictor, which is currently the main statistical predictor (along with the ARPA predictor).

> > presage could then be the engine used to power the d-bus prediction service, offering the predictive capabilities of the onboard language model/predictor, plus all the predictors currently provided by presage (all of which can be turned on/off and configured to suit individual needs).
>
> The modularity could be helpful, even though I'm not sure if I could really make use of it. We were very concerned about memory usage and had initially thought about using static ARPA compatible structures for large immutable language models and dynamically updatable models only for on-line learning. However, the dynamic models later turned out to be almost as efficient as the ARPA implementation, so now there are (flavors of) dynamic models for everything.
>
> Similar consolidation happened with recency caching. It was originally planned as a separate modular component. However, that would have meant redundant storage of n-grams and a forced limit to some arbitrarily small number of recent n-grams. So I had it integrate more closely with the generic dynamic models, gaining recency tracking across all known n-grams but sacrificing some modularity (there is still variability through inheritance though).

If onboard's current predictive functionality was merged into presage and encapsulated into a (say, for lack of a better name) OnboardPredictor class, then presage's modularity would be useful because it would allow us to:
- replicate exactly the same predictive functionality of the current gpredict service, by switching on OnboardPredictor and turning off other predictors
- augment OnboardPredictor predictive functionality with other predictors currently provided by presage, as desired by onboard or the user, simply by modifying a config variable.

Presage would definitely benefit from having a new and high-quality predictor in its core.

> > The presage core library itself has minimal dependencies: it pretty much only needs a C++ runtime and sqlite, which is used as the backing store for n-gram based language models (this ensures fast access, minimum memory footprint and no delays while lo
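To make the idea of a dynamically updatable language model with switchable on-line learning more concrete, here is a minimal sketch in Python. It is not code from gpredict, onboard or presage: the class name `DynamicNgramModel`, its methods and the simple "longest n-gram count first, unigram count second" ranking are all assumptions chosen for illustration, and real implementations use smoothing and far more compact storage.

```python
from collections import defaultdict


class DynamicNgramModel:
    """Toy n-gram model whose counts can be updated while it is in use.

    Hypothetical illustration only; not the gpredict or presage API.
    """

    def __init__(self, order=2, learning=True):
        self.order = order              # longest n-gram length that is stored
        self.learning = learning        # may be flipped at runtime
        self.counts = defaultdict(int)  # n-gram tuple -> number of occurrences

    def learn(self, tokens):
        """Add counts for every n-gram of length 1..order found in tokens."""
        if not self.learning:
            return
        for n in range(1, self.order + 1):
            for i in range(len(tokens) - n + 1):
                self.counts[tuple(tokens[i:i + n])] += 1

    def predict(self, context, prefix="", limit=3):
        """Rank known words starting with prefix, given the preceding words."""
        # Keep only as much history as the model order can use.
        history = tuple(context[-(self.order - 1):]) if self.order > 1 else ()
        vocabulary = [g[0] for g in self.counts
                      if len(g) == 1 and g[0].startswith(prefix)]
        # Sort by count of (history + word), then unigram count, then alphabetically.
        ranked = sorted(
            vocabulary,
            key=lambda w: (-self.counts.get(history + (w,), 0),
                           -self.counts.get((w,), 0),
                           w),
        )
        return ranked[:limit]


model = DynamicNgramModel(order=2)
model.learn("the cat sat on the mat".split())
model.learn("the cat ran".split())
suggestions = model.predict(["the"])

# With learning switched off, new text no longer changes the model.
model.learning = False
model.learn(["the", "dog"])
after_freeze = model.predict(["the"], prefix="d")
```

In this sketch the same structure serves both the initial training text and later on-line updates, which is the property described above; recency tracking could be added by storing a last-seen timestamp next to each count.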
Re: New developments on Caribou
Thanks to all for your responses!

Our intention is to make Caribou as flexible as we can, and the prediction options will be configurable.

We have a short video demo of the work done so far [0]. I would like you to see the video; feedback is welcome. We have not published the code yet because it is still buggy.

Cheers!
David Pellicer

[0] http://is.gd/c72P1

On Fri, 07-05-2010 at 12:10 +0100, David Colven wrote:
> It's important that the user can set the weighting of prediction results, for example frequency and recency (if that is really a word!), and also the order and way words are displayed (frequency, recency, alphabetic and word length satisfy different needs), control word length minima, n word phrase prediction, abbreviation-expansion, TTS support and predict after...n letters. A phonetic correction (kan=can) is useful for some users.
>
> Are these points being considered? Am I trying to teach someone to suck eggs here?
>
> Most research seems to indicate that prediction does not speed a keyboard user up unless they type at less than 1 character per second. There are other reasons for using intelligent prediction of course, such as language difficulties and effort in text entry. For switch users however the gains are much higher.
RE: New developments on Caribou
It's important that the user can set the weighting of prediction results, for example frequency and recency (if that is really a word!), and also the order and way words are displayed (frequency, recency, alphabetic and word length satisfy different needs), control word length minima, n word phrase prediction, abbreviation-expansion, TTS support and predict after...n letters. A phonetic correction (kan=can) is useful for some users.

Are these points being considered? Am I trying to teach someone to suck eggs here?

Most research seems to indicate that prediction does not speed a keyboard user up unless they type at less than 1 character per second. There are other reasons for using intelligent prediction of course, such as language difficulties and effort in text entry. For switch users however the gains are much higher.

I'll go back to lurking for a bit.

All the best

David Colven
AEGIS Project www.aegis-project.eu

The ACE Centre Advisory Trust
92 Windmill Road
Headington
Oxford
OX3 7DR

Direct - 01865 759813
Office - 01865 759800
Email - col...@ace-centre.org.uk

The ACE Centre is a registered charity (no 1040868)

> -----Original Message-----
> From: gnome-accessibility-list-boun...@gnome.org [mailto:gnome-accessibility-list-boun...@gnome.org] On Behalf Of Matteo Vescovi
> Sent: Friday, May 07, 2010 11:44 AM
> To: Francesco Fumanti
> Cc: marmuta; gnome-accessibility-list@gnome.org
> Subject: Re: New developments on Caribou
>
> Francesco Fumanti wrote:
> > There is work ongoing to create a word prediction service over dbus for the onscreen keyboard onboard. (onboard is the default onscreen keyboard shipping with Ubuntu.)
> > At some point, there was also talk to share it with Caribou. It uses n-gram language modeling. If you want to have a look at it, you can find it in the word completion branch of onboard:
> > https://code.launchpad.net/onboard
>
> I had a look at the onboard word-completion branch, great stuff!
___
gnome-accessibility-list mailing list
gnome-accessibility-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gnome-accessibility-list
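The user-facing settings listed in the message above (frequency versus recency weighting, display order, minimum word length, "predict after n letters") could be modelled roughly as follows. This is a hedged sketch, not code from Caribou, onboard or presage: `PredictionSettings`, `rank`, the half-life based recency decay and the linear blend of the two scores are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PredictionSettings:
    """Hypothetical user preferences; names and defaults are illustrative."""
    frequency_weight: float = 0.7
    recency_weight: float = 0.3
    recency_half_life: float = 50.0  # steps after which recency score halves
    min_word_length: int = 2         # shorter candidates are not offered
    predict_after: int = 1           # letters typed before predicting
    order_by: str = "score"          # "score", "alphabetic" or "length"


def rank(prefix, stats, now, settings):
    """Return candidate words for prefix.

    stats maps word -> (use_count, step_of_last_use); now is the current step.
    """
    if len(prefix) < settings.predict_after:
        return []
    total = sum(count for count, _ in stats.values()) or 1
    scored = []
    for word, (count, last_used) in stats.items():
        if not word.startswith(prefix) or len(word) < settings.min_word_length:
            continue
        frequency = count / total
        # Exponential falloff: 1.0 for a word used just now, 0.5 one half-life ago.
        recency = 0.5 ** ((now - last_used) / settings.recency_half_life)
        score = (settings.frequency_weight * frequency
                 + settings.recency_weight * recency)
        scored.append((word, score))
    if settings.order_by == "alphabetic":
        scored.sort(key=lambda item: item[0])
    elif settings.order_by == "length":
        scored.sort(key=lambda item: (len(item[0]), item[0]))
    else:
        scored.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in scored]


stats = {"can": (10, 0), "car": (5, 50), "cat": (2, 99), "c": (1, 100)}
by_frequency = rank("ca", stats, 100, PredictionSettings(1.0, 0.0))
by_recency = rank("ca", stats, 100, PredictionSettings(0.0, 1.0))
```

Keeping the weights and thresholds in one settings object means a configuration UI only has to edit that object; the ranking function itself stays unchanged.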
Re: New developments on Caribou
Francesco Fumanti wrote:
> There is work ongoing to create a word prediction service over dbus for the onscreen keyboard onboard. (onboard is the default onscreen keyboard shipping with Ubuntu.)
> At some point, there was also talk to share it with Caribou. It uses n-gram language modeling. If you want to have a look at it, you can find it in the word completion branch of onboard:
> https://code.launchpad.net/onboard

I had a look at the onboard word-completion branch, great stuff!

I think there is scope to join forces between presage and onboard.

presage is architected to merge predictions generated by a set of predictors. Each predictor uses a different language model/predictive algorithm to generate predictions.

Currently presage provides the following predictors:
- ARPA predictor: statistical language modelling data in the ARPA N-gram format
- generalized smoothed n-gram statistical predictor: can work with n-grams of arbitrary cardinality
- recency predictor: based on the recency promotion principle
- dictionary predictor: generates a prediction by returning tokens that are a completion of the current prefix in alphabetical order
- abbreviation expansion predictor: maps the current prefix to a token and returns the token in a prediction with a 1.0 probability
- dejavu predictor: learns and then later reproduces previously seen text sequences.

A bit more information on how these predictors work is available here:
http://presage.sourceforge.net/?q=node/15

It sounds like the language model and predictive algorithm used in the onboard word-prediction branch is an ideal candidate to be integrated into presage and become a new presage predictor class.

presage could then be the engine used to power the d-bus prediction service, offering the predictive capabilities of the onboard language model/predictor, plus all the predictors currently provided by presage (all of which can be turned on/off and configured to suit individual needs).

The presage core library itself has minimal dependencies: it pretty much only needs a C++ runtime and sqlite, which is used as the backing store for n-gram based language models (this ensures fast access, minimum memory footprint and no delays while loading the language model in memory).

> For details about the word prediction service, please contact marmuta that did nearly all the work about the word prediction service.

I'll follow up with marmuta to discuss the feasibility of making this happen and work out the technical details, in case there is consensus to go ahead with this.

Cheers,
- Matteo
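The architecture described in this message, several independent predictors whose results are merged and which can be switched on or off by configuration, can be sketched as below. This is a simplified Python illustration and not the presage API (presage is a C++ library): the class names, the `predict` signature, the uniform probability given to dictionary completions and the "keep the highest probability per token" merge rule are all assumptions.

```python
class DictionaryPredictor:
    """Returns completions of the prefix in alphabetical order (illustrative)."""

    def __init__(self, words):
        self.words = sorted(words)

    def predict(self, prefix):
        matches = [w for w in self.words if w.startswith(prefix)]
        # Assumption: every completion is treated as equally likely.
        return [(w, 1.0 / len(matches)) for w in matches]


class AbbreviationPredictor:
    """Maps an exact prefix to its expansion with probability 1.0 (illustrative)."""

    def __init__(self, table):
        self.table = dict(table)

    def predict(self, prefix):
        if prefix in self.table:
            return [(self.table[prefix], 1.0)]
        return []


def merge(predictors, enabled, prefix, limit=5):
    """Combine the suggestions of all enabled predictors into one ranked list."""
    best = {}
    for name, predictor in predictors.items():
        if not enabled.get(name, False):
            continue  # predictor switched off in the configuration
        for token, probability in predictor.predict(prefix):
            # Assumed merge rule: keep the highest probability seen per token.
            best[token] = max(probability, best.get(token, 0.0))
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [token for token, _ in ranked[:limit]]


predictors = {
    "dictionary": DictionaryPredictor(["hello", "help", "held", "world"]),
    "abbreviation": AbbreviationPredictor({"hel": "helicopter"}),
}
everything = merge(predictors, {"dictionary": True, "abbreviation": True}, "hel")
dictionary_only = merge(predictors, {"dictionary": True, "abbreviation": False}, "hel")
```

A new predictor would only need to provide a `predict` method with the same shape and an entry in the `enabled` mapping, which is the kind of modularity the message refers to.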
Re: New developments on Caribou
Hi,

On 05/06/2010 09:24 AM, David Pellicer wrote:
> I am the person who is going to take over the project and I would like to explain our objectives. I have been asked to develop the following features of the Caribou roadmap:
>
> * change the qwerty layout to have 4 rows instead of 3 (add more control characters to the first layer) (already included in Caribou)
> * improve text detection using atspi (already included in Caribou)
> * add sticky keys functionality
> * add support for word prediction using prediction dbus service
> * add 'alpha numeric sorted alphabetically' and 'alpha numeric sorted by frequency' keyboards
> * keyboard fully usable with 1 switch Autoscan
> * keyboard fully usable with 1 switch Userscan
> * UI for setting up switch
> * a mode that is always visible
>
> First of all I'm going to integrate Joaquim Rocha's patch to use JSON keyboard [0] layouts instead of the current specification. As this patch is based on a previous version of Caribou, I will then integrate the next set of changes uploaded in the new way.
>
> For the prediction feature, I will create a D-Bus service using Presage [1], which is an interesting library to make text prediction based on the context instead of the characters of the word.
>
> If anyone is working on any of these features, please tell me on this list to avoid duplicated work, and discuss the best option to implement it.

There is work ongoing to create a word prediction service over dbus for the onscreen keyboard onboard. (onboard is the default onscreen keyboard shipping with Ubuntu.)

At some point, there was also talk to share it with Caribou. It uses n-gram language modeling. If you want to have a look at it, you can find it in the word completion branch of onboard:
https://code.launchpad.net/onboard

The word prediction service does not have a separate host yet, but it is planned.

For details about the word prediction service, please contact marmuta, who did nearly all the work on the word prediction service.

Perhaps a few last words about onboard versus caribou. When Ben started working on caribou, I contacted him to tell him about onboard, but he did not find onboard a suitable starting point for what he had in mind for caribou.

Moreover, you should perhaps know that marmuta and I intend to keep onboard independent from at-spi to avoid the issues due to a running at-spi; if at some point a dependency on at-spi becomes necessary (for example if switch access with gui grabbing is going to be implemented in onboard), the current intention is to add it only as a soft dependency, if possible.

Cheers,

Francesco
Re: New developments on Caribou
David Pellicer wrote:
> For the prediction feature, I will create a D-Bus service using Presage [1], which is an interesting library to make text prediction based on the context instead of the characters of the word.

Hi David,

I'm thrilled you are planning to use presage to add the predictive text functionality to caribou. I'm willing to help out with any fixes or enhancements to presage, if any are needed.

Cheers,
- Matteo
Re: New developments on Caribou
From: David Pellicer

> For the prediction feature, I will create a D-Bus service using Presage [1], which is an interesting library to make text prediction based on the context instead of the characters of the word.

> If anyone is working on any of these features, please tell me on this list to avoid duplicated work, and discuss the best option to implement it.

As I said in another thread, you could take a look at Joaquim Rocha's work using presage [2]. He basically created a new gtk input method that uses presage internally, and then uses that with Caribou.

The conclusion from talking with Joaquim Rocha is that it is in a rough state, but it can be seen as a first step. Anyway, I am not sure if it is exactly what you want.

BR

> [0] https://bugzilla.gnome.org/show_bug.cgi?id=613229
> [1] http://presage.sourceforge.net

[2] http://www.joaquimrocha.com/2010/04/05/caribou-and-text-predictor-input-mode/

===
API (apinhe...@igalia.com)
Re: New developments on Caribou
Hi David,

We talked earlier. Great to see this moving forward. We often chat on IRC, #a11y at gimpnet.

On Thu, 2010-05-06 at 09:24 +0200, David Pellicer wrote:
> First of all I'm going to integrate the Joaquim Rocha's patch to use JSON keyboard [0] layouts instead of the actual specification.

Joaquim and I recently corresponded on the bug, please see our comments there.

Cheers,
Eitan.