Henrik Nilsen Omma wrote:
Eric S. Johansson wrote:
In short: Create a copy-left (GPL) tool to transfer text from Naturally
Speaking on Windows to Linux.
This is one half of the solution needed. Not only do you need to
propagate text to Linux, but you also need to provide enough context back
to Windows so that NaturallySpeaking can select different grammars. It
would be nice to also modify the text injected into Linux, because
Nuance really screwed the pooch on natural text.
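Very roughly, the bridge needs two channels, something like the sketch
below (the host, port, and wire format are pure invention): the Windows
side pushes recognized text across, and the Linux side answers with which
application has focus so the right grammar can be activated.

import socket

HOST, PORT = "linux-box", 4713   # hypothetical endpoint on the Linux machine

def push_dictation(text):
    """Windows side: send recognized text, get the current Linux context back."""
    with socket.create_connection((HOST, PORT)) as s:
        s.sendall(b"TEXT " + text.encode("utf-8") + b"\n")
        # the Linux side replies with something like "CONTEXT emacs",
        # which the Windows side can use to switch NaturallySpeaking grammars
        reply = s.makefile("r", encoding="utf-8").readline().strip()
    return reply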
A few starts have been made on this, but it needs to be organised as a
proper community project and driven forward by several people.
This is a difficult task. There is a very nice package called VoiceCoder,
spearheaded by Alain Desilets up at NRC-IT in conjunction with David Fox.
They haven't gotten a whole lot of additional contributions. People with
upper-extremity disorders tend not to volunteer a whole lot because, quite
frankly, life is bouncing physical pain against what needs to be done.
It's exhausting.
The user
interface should aim to be better than what the native Windows NS
version has. It should be speech engine and OS agnostic. That way you'll
get people using it to transfer speech between all sorts of different
systems, and it will get more use and development. You should be able to
easily plug in a free engine like Sphinx (so these will be encouraged to
improve) or even Vista's native system, which will be very widespread.
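Roughly speaking, each engine would sit behind a thin adapter so the
transfer tool never talks to NS or Sphinx directly; a minimal sketch, with
all names invented for illustration:

from abc import ABC, abstractmethod

class SpeechEngine(ABC):
    """Hypothetical adapter each engine (NS, Sphinx, Vista, ...) would implement."""

    @abstractmethod
    def next_utterance(self) -> str:
        """Block until the engine has recognized the next chunk of text."""

    @abstractmethod
    def set_context(self, application: str) -> None:
        """Tell the engine which application has focus, so it can switch grammars."""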
Damn, you are the optimist. Yes, the user interface does need to be
better, but it may not be possible, because the recognition engine or the
systems around it may not expose the interfaces necessary to make it
better. For example, where do you get the information to give the user
clear feedback that the system is hearing something and that it's at the
right level? Or the whole process of adding or deleting words from your
dictionary, training, or testing your audio input to make sure it works
right? I'm not saying it's impossible. I'm just saying be prepared to
work very, very hard. I think we'd be better off finding some way of
overlaying the user interface from NaturallySpeaking on top of a Linux
virtual machine screen. It sucks, but you might get it done faster than
with your very desirable but overly optimistic wish.
My biggest gripe with NS is the editing interface. The actual
recognition is quite good IMO, but when you do make a mistake it is very
awkward to fix it without using the keyboard. If you give an edit
command and that is not understood correctly either then you get a
meaningless sentence and you are no longer able to easily correct the
one you originally wanted to fix. The end result is that you totally lose
the flow of what you were trying to express.
It's not quite that bad. Select-and-Say, when it works, is quite useful
for small phrases. What we need to do is propagate the Emacs mark-and-point
interface into a GUI environment. It's far more effective, at least when
you're noodling about within an error-prone navigation process. In any
event, take a look at the VoiceCoder UI for making corrections. I really
like it. It's the best correction interface I've seen so far. David Fox
is responsible for that wonderful creation.
I presume the macro functionality in NS is configured so that the
pattern recognition is quite good on the macros you define yourself. So
when you say 'Paste in my address' it generally works. We can (ab)use
this macro facility for our own editing needs. We would define a set of
macros that would be processed by the NS engine and would give us a known
and parseable string.
NatPython is the way to go. It lets a user create a SAPI 4 grammar and
associate a method with the grammar once it resolves. Or is the term
'hits a terminal node'? Anyway, it works, it's reliable, it's written in
Python, and from the user level it looks to be relatively portable between
recognition environments.
So saying 'Macro: delete sentence' would actually insert the text
**MACRO-DELETE-SENTENCE** into the text stream. If you were watching the
text on the Windows system the real text would be interspersed with such
commands, but on the Linux system receiving the stream it would just Do
the Right Thing. The big advantage is that it's very configurable this
way so we can make it do what we want.
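On the Linux side this would only need a small filter that splits the
incoming stream into plain text and macro tokens; a rough sketch, assuming
the **MACRO-...** marker format above and hypothetical type_text/run_macro
callbacks supplied by the receiving tool:

import re

MACRO = re.compile(r"\*\*MACRO-([A-Z-]+)\*\*")

def handle_chunk(chunk, type_text, run_macro):
    """Type ordinary dictation, dispatch embedded **MACRO-...** commands."""
    pos = 0
    for m in MACRO.finditer(chunk):
        if m.start() > pos:
            type_text(chunk[pos:m.start()])   # ordinary dictation before the marker
        run_macro(m.group(1))                 # e.g. "DELETE-SENTENCE"
        pos = m.end()
    if pos < len(chunk):
        type_text(chunk[pos:])                # trailing text after the last marker

Feeding it 'fix this **MACRO-DELETE-SENTENCE** and carry on' would type the
surrounding words and run the delete-sentence action in between.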
You mean something like this...
<operation> = left | right | delete | kill | switch | copy;
<datatype> = character | word | sentence | paragraph | line | region;
<doit> exported = <operation> <datatype>;
---
def gotResults_operation(self, words, fullResults):
    # map the recognized "operation datatype" pair onto the Emacs keystroke
    # that performs it (values are keystroke strings to be played back)
    translationtable = {
        'leftcharacter':   '{ctrl+b}',
        'rightcharacter':  '{ctrl+f}',
        'deletecharacter': '{Backspace}',
        'killcharacter':   '{ctrl+d}',
        'switchcharacter': '{ctrl+t}',
        'leftword':        '{esc}b',
        'rightword':       '{esc}f',   # forward-word, mirroring 'leftword'