On 2015-03-21 at 20:13, Joakim wrote:
"Find me the best deal on a S6"
[...]
I just tried it on Google's voice search; the first time around, it thought I said 
"Find me the best deal on a last sex".

Obviously Google steers the recognized query toward what is usually searched for. 
OTOH that's one of the areas you probably wouldn't want to browse through using 
a voice interface. :)


"There was this famous quote QUOTE to be or not to be END QUOTE on page six END 
PARAGRAPH..."
Just read that out normally and it'll be smart enough to know that the 
upper-case terms you highlighted are punctuation marks and not part of the 
sentence, by using various grammar and word frequency heuristics.  In the rare 
occurrence of real ambiguity, you'll be able to step down to a lower-level 
editing mode and correct it.

Yeah, I've exaggerated the problem. The deciding factor will be how often you have to step 
down to low-level editing, even with a system backed by good machine learning, given all the 
homonyms and words with many meanings. But let's assume that is solved. I think the remaining "END 
PARAGRAPH", "OPEN BRACE" or "COMMA" problem will go away through a compromise: people 
will just tell stories the way they normally do and let the punctuation be added automatically 
from syntax (AST), pause-interval and intonation analyses. And the dying breed of writers who 
care deeply about punctuation will continue using keyboards.
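
To make the pause-based part concrete, here is a minimal Python sketch of what I have in 
mind; the word/gap input format and the pause thresholds are my own made-up assumptions, 
not something a real recognizer exposes, and a real system would combine this with the 
syntactic and intonation cues mentioned above rather than fixed thresholds:

def punctuate(words, gaps):
    """words: recognized words; gaps: pause in seconds after each word."""
    out = []
    for word, gap in zip(words, gaps):
        out.append(word)
        if gap > 1.2:
            out[-1] += "."      # long pause: treat as end of sentence
        elif gap > 0.5:
            out[-1] += ","      # medium pause: treat as clause boundary
    return " ".join(out)

print(punctuate(["to", "be", "or", "not", "to", "be"],
                [0.1, 0.7, 0.1, 0.1, 0.1, 1.5]))
# -> "to be, or not to be."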


Therefore I just cannot imagine voice being used anywhere that exact 
representation is required, especially in programming:

"Define M1 as a function that takes in two arguments. The state of the machine 
labelled ES and an integer number in range between two and six inclusive labelled X. The 
result of M1 is a boolean. M1 shall return true if and only if the ES member labelled 
squat THATS SQUAT WITH A T AT THE END is equal to zero modulo B. OH SHIT IT WAS NOT B BUT 
X. SCRATCH EVERYTHING."

As Paulo alludes to, the current textual representation of programming 
languages is optimized for keyboard entry. Programming languages themselves 
will change to allow fluid speech input.

That's true; programming languages will have to change. For example, the distinction 
between lower and upper case is artificial, and it was the biggest stumbling block in that 
video as well. That will have to go away, along with other things. But if you look at my 
function definition above, it doesn't rely on case, nor does it use parentheses, semicolons, etc., 
so it's already "voice-ready". My question is: at what point would that be considered 
an efficient way to define a program's component, one we would choose to use instead 
of the current succinct symbolic notation?
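
For comparison, here is a rough sketch of how that same definition reads in today's symbolic 
notation (Python here; the MachineState class and its "squat" field are just stand-ins 
invented for the example):

class MachineState:
    def __init__(self, squat: int):
        self.squat = squat      # the member "labelled squat, that's squat with a T"

def m1(es: MachineState, x: int) -> bool:
    """True iff the machine's squat member is zero modulo X, for X between 2 and 6."""
    assert 2 <= x <= 6          # "an integer number in range between two and six inclusive"
    return es.squat % x == 0    # "equal to zero modulo X" -- not B

print(m1(MachineState(squat=12), 4))   # True, since 12 % 4 == 0

A handful of symbolic lines versus a paragraph of speech, with no "SCRATCH EVERYTHING" 
failure mode, which is exactly the succinctness my question is about.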


We still have some work to do to get these speech recognition engines there, 
but once we do, the entire visual interface to your computer will have to be 
redone to best suit voice input and *nobody* will use touch, mice, _or_ 
keyboards after that.

Yeah, right, people will create drawings with voice commands. :)  Every 
interface has its rightful domain and voice ain't best for everything. Or do 
you mean that touch will go away and people will wave their hands around 
instead?
