Some of you will recognize me as someone who pops up occasionally asking questions as I grope my way to a usable speech driven programming environment. My last set of experiments with a technique called togglename and speech driven template notation hit a pretty nasty wall of usability because of a fundamental incompatibility between GUIs and speech recognition and the lack of support Nuance gives to disabled users in general.

Before anybody suggests it, yes, I know about that guy who gave a talk at a Python convention and uses what we call the burp, belch, and fart school of speech recognition engine abuse. Yes, that is actually an affectionate description. :-) What he did is impressive, but it's not where I'm going.

I think the techniques I was experimenting with are good ones because they do make it easier to speak code. The problem is that the transformation is irreversible, which leaves editing code as difficult as it was before.

A little background. Today, Python is an amazingly speech recognition friendly programming language (especially if you ignore PEP 8). Using simple macros, you can pretty much noodle along and write code relatively easily. With a few more specialized pieces, it's almost easy to rip, shred, and tear code into new shapes as you realize you went down the wrong path but still have lots of good idioms.

However, as easy as it is to noodle along, in creating code I find myself somewhere around 0.8 times as effective as I was with my hands, and in editing code I'm around 0.5 or less. My goal is to make speech driven programming at least on par with someone who has useful hands, and hopefully 3 to 5 times faster.

A few years ago, a disabled friend of mine pointed out that the hard problem was not the creation of code but the editing of code. I took his observations to heart and have been working on trying to create a speech friendly environment that can transform from the speech notation to the code notation and back again and still remain functionally identical. I have some ideas but I need some outside perspective from people who know Python better than I do.

The core of the idea is an editor which can present code in two forms. The first form is what you guys all know and love but is horrible to speak. The second form is something that is easy to speak and, as I said above, functionally identical to the code form. An ideal solution would give me the ability to toggle back and forth between these two representations. One experiment to play with would be displaying both representations at the same time so you can see what you speak in near real-time.

The speech environment lends itself to speaking the broad intent and then answering questions to fill in the detail to create something concrete. For example, in one of my prototypes (shown below), I state that I want a class. Then I fill in details like an initialization function, inheriting from a parent, copying in all the arguments, etc., and I end up with a full class definition much more quickly than I could even type it with good hands. This is what I meant above by 3 to 5 times faster than hand generated code.

But with every experimental success, there is usually more than one problem. In this case, the problem is that I lose all the meta-information when I create the instance of the intent plus detail. I can't go back to that abstract form.

The obvious answer is saving that meta-information in conjunction with the code, but when working in a team environment, that information is going to drive you handies up the wall because it's going to visually overwhelm the actual code. Saving the meta-information separately will mean it's even harder to recover a speech friendly version of the code after it's been touched.
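As a rough sketch of the separate-storage option, assuming a JSON sidecar and a structural fingerprint (both are illustrative choices, not something any of my prototypes does): the speech-side fields are saved next to a hash of the class's syntax tree, so a tool can at least tell when a handy has touched the code. The names fingerprint, make_sidecar_entry, and is_stale are hypothetical, and the argument names are underscored so the sample parses as Python.

```python
import ast
import hashlib
import json


def fingerprint(source, class_name):
    """Hash the structure of one class, ignoring comments and layout."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            # ast.dump() leaves out line and column numbers by default, so
            # reformatting or adding comments does not change the hash.
            return hashlib.sha256(ast.dump(node).encode("utf-8")).hexdigest()
    raise KeyError(class_name)


def make_sidecar_entry(source, class_name, template_fields):
    """Bundle the speech-side fields with a fingerprint of the code."""
    return {
        "class": class_name,
        "fingerprint": fingerprint(source, class_name),
        "template": template_fields,
    }


def is_stale(entry, source):
    """True when the class changed since the entry was recorded."""
    return entry["fingerprint"] != fingerprint(source, entry["class"])


code = (
    "class simple_class(super_nasty_class):\n"
    "    def __init__(self, magic_dictonary):\n"
    "        super(simple_class, self).__init__(magic_dictonary)\n"
)
entry = make_sidecar_entry(
    code, "simple_class", {"init": "yes", "parent": "super_nasty_class"}
)
# This text is what would be written to the (assumed) sidecar file.
sidecar_text = json.dumps({"simple_class": entry}, indent=2)

# A comment added by hand leaves the entry valid; a renamed argument does not.
reformatted = "# touched by a handy\n" + code
edited = code.replace("magic_dictonary", "magic_dictionary")
print(is_stale(entry, reformatted), is_stale(entry, edited))
```

This only detects that the code changed; it does not recover the speech form. The text produced by ast.dump() can differ between Python versions, so stored fingerprints would need regenerating after an interpreter upgrade.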

Another thought experiment has been with always generating syntactically correct code and basing various code generation and navigation constructs around that.

So the questions I have right now are:

What's a good open editor (preferably multiplatform) that actually decomposes Python code into fundamental components such as class, expression, etc., and lets you operate on those components? This is in contrast to editors such as Emacs, which give you some fundamental pieces you can operate on but are really character oriented, with all of the syntax smartness not really available for coupling to a speech recognition environment. It would be great if it was in Python so I don't have to learn yet another fricking language.
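To make "fundamental components" concrete, here is a minimal sketch using the standard library's ast module, which is not an editor but does expose classes and functions as nodes with line ranges that a speech command could target. The outline helper and the sample source are made up for illustration, and end_lineno needs Python 3.8 or later.

```python
import ast

SOURCE = '''
class simple_class(dict):
    """A tiny class used as sample input."""

    def __init__(self, name):
        super().__init__()
        self.name = name
'''


def outline(source):
    """Return (kind, name, first_line, last_line) for each class and function."""
    items = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            items.append((type(node).__name__, node.name, node.lineno, node.end_lineno))
    # Sort by starting line so the outline reads top to bottom.
    return sorted(items, key=lambda item: item[2])


for kind, name, start, end in outline(SOURCE):
    print(f"{kind:12} {name:14} lines {start}-{end}")
```

A command such as "select class simple_class" could then map to the line range in the matching tuple, instead of counting characters.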

What would be the best way to store meta-information necessary to re-create the speech friendly presentation of code? I don't know if this is possible but I would like to be able to let handy programmers make changes that will be propagated automatically into the speech friendly code presentation without forcing them to learn the new notation.

An example of this is the definition of the class. In my world, a class definition looks like this:

uses name:sta
uses init:yes
uses parent:dict
uses arg_list:magic dictonary, long sting, nuance sucks
uses super_arg_list:$arg_list
template class

Note: yes, I speak or type every single character, but with a smart editor there's a bunch of optimizations one can use in data entry given the context. Also, since I wrote this example, I realized that the uses statement is superfluous and I could just use template: <name> as the trigger for creating the instance of the template.

Going from speech notation to code notation, I generate this:

class simple_class (super_nasty_class):
    def __init__(self, magic dictonary, long sting, nuance sucks):
        super(simple_class,self).__init__(magic dictonary, long sting, nuance sucks)

Note: there is a mix of, what I call, codenames and string names in these examples. The togglename process would transform all string names into codenames at some later point in the user experience.
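The speech-to-code direction can be sketched roughly as follows. This is not my prototype's code, just a minimal illustration under some assumptions: parse_speech, codename, and expand_class are hypothetical names, codename is a crude stand-in for togglename that joins words with underscores, and the sample field values are chosen so the result parses as Python.

```python
SPEECH = """
uses name:simple class
uses init:yes
uses parent:dict
uses arg_list:magic dictonary, long sting, nuance sucks
uses super_arg_list:$arg_list
template class
"""


def parse_speech(text):
    """Collect 'uses key:value' lines and the 'template <kind>' line."""
    fields = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith("uses "):
            key, _, value = line[len("uses "):].partition(":")
            fields[key.strip()] = value.strip()
        elif line.startswith("template "):
            fields["template"] = line[len("template "):].strip()
    # Resolve $references to other fields, e.g. super_arg_list:$arg_list.
    for key, value in list(fields.items()):
        if value.startswith("$"):
            fields[key] = fields[value[1:]]
    return fields


def codename(string_name):
    """Assumed stand-in for togglename: join the words with underscores."""
    return "_".join(string_name.split())


def split_names(value):
    """Turn 'a b, c d' into ['a_b', 'c_d'], skipping empty entries."""
    return [codename(part) for part in value.split(",") if part.strip()]


def expand_class(fields):
    """Render a class definition from the parsed template fields."""
    name = codename(fields["name"])
    parent = codename(fields.get("parent", "object"))
    lines = [f"class {name}({parent}):"]
    if fields.get("init") == "yes":
        params = ", ".join(["self"] + split_names(fields.get("arg_list", "")))
        passed = ", ".join(split_names(fields.get("super_arg_list", "")))
        lines.append(f"    def __init__({params}):")
        lines.append(f"        super({name}, self).__init__({passed})")
    else:
        lines.append("    pass")
    return "\n".join(lines) + "\n"


generated = expand_class(parse_speech(SPEECH))
print(generated)
```

Nothing here keeps the original fields around once the text is generated, which is exactly the loss of meta-information described earlier.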

To elaborate on an earlier question, if someone put a doc string into the class definition I would need to be able to recognize it and put it back into the speech friendly form. Something like this:

class simple_class (super_nasty_class):
    """this is a real simple class to identify problems in the
       speech user interface
    """
    def __init__(self, magic dictonary, long sting, nuance sucks):
        super(simple_class,self).__init__(magic dictonary, long sting, nuance sucks)

When transformed back into speech friendly form, it should look like:
uses name:sta
uses init:yes
uses parent:dict
uses arg_list:magic dictonary, long sting, nuance sucks
uses doc_string:
this is a real simple class to identify problems in the speech user interface
uses super_arg_list:$arg_list
template class
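One possible way to do the reverse trip without stored meta-information, at least for this one template, is to re-derive the fields from the syntax tree. The sketch below assumes the code already uses codenames (so it parses), uses a hypothetical class_to_speech helper, and relies on ast.unparse, which needs Python 3.9 or later. It handles only the fields shown in the example above.

```python
import ast

SOURCE = '''
class simple_class(super_nasty_class):
    """this is a real simple class to identify problems in the
       speech user interface
    """
    def __init__(self, magic_dictonary, long_sting, nuance_sucks):
        super(simple_class, self).__init__(magic_dictonary, long_sting, nuance_sucks)
'''


def class_to_speech(source):
    """Rebuild the 'uses' lines for the first class found in source."""
    tree = ast.parse(source)
    cls = next(n for n in ast.walk(tree) if isinstance(n, ast.ClassDef))
    init = next(
        (n for n in cls.body if isinstance(n, ast.FunctionDef) and n.name == "__init__"),
        None,
    )
    lines = [f"uses name:{cls.name}", "uses init:" + ("yes" if init else "no")]
    if cls.bases:
        lines.append("uses parent:" + ast.unparse(cls.bases[0]))

    arg_names = []
    if init:
        arg_names = [a.arg for a in init.args.args[1:]]  # skip self
        lines.append("uses arg_list:" + ", ".join(arg_names))

    doc = ast.get_docstring(cls)
    if doc:
        lines.append("uses doc_string:")
        lines.append(" ".join(doc.split()))  # collapse the wrapped lines

    if init:
        for node in ast.walk(init):
            # Look for a call of the form <something>.__init__(...).
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and node.func.attr == "__init__"):
                passed = [a.id for a in node.args if isinstance(a, ast.Name)]
                same = passed == arg_names
                lines.append("uses super_arg_list:" + ("$arg_list" if same else ", ".join(passed)))
                break

    lines.append("template class")
    return "\n".join(lines)


speech = class_to_speech(SOURCE)
print(speech)
```

Because the fields are recomputed from whatever the handy programmers left in the file, a docstring added by hand shows up in the speech form without them learning the notation. Anything the template cannot express (extra methods, decorators) would need its own handling.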

Speech driven programming is a hard problem, so thoughts and ideas would be welcome. Don't worry about giving me old ideas that have been looked at and rejected, because you may have a take on it that I haven't seen considered and it's worth trying.

Thank you for reading this far. I know it's a long message and on an unfamiliar topic so I appreciate your attention.

--- eric



--
https://mail.python.org/mailman/listinfo/python-list
