On Saturday, 21 March 2015 at 14:07:28 UTC, FG wrote:
On 2015-03-21 at 06:30, H. S. Teoh via Digitalmars-d wrote:
On Sat, Mar 21, 2015 at 04:17:00AM +0000, Joakim via Digitalmars-d wrote:
[...]
What I was going to say too: neither CLI nor GUI will win. Speech recognition will replace them both by providing the best of both. Rather than writing a script to scrape several shopping websites for the price of a Galaxy S6, I'll simply tell the intelligent agent on my computer "Find me the best deal on a S6" and it will go find it.

I dunno, I find that I can express myself far more precisely and concisely on the keyboard than I can verbally. Maybe for everyday tasks like shopping for the best deals, voice recognition is Good Enough(tm), but for more complex tasks, I have yet to find something more expressive than the keyboard.

"Find me the best deal on a S6" is only a little more complex than "make me a cup of coffee." It's fine for predefined tasks but questionable as a ubiquitous input method. It's hard enough for mathematicians to dictate a theorem without using any symbolic notation. There is too much ambiguity and room for interpretation in speech to make it a reliable and easy input method for all tasks. Even in your example:

You say: "Find me the best deal on a S6."
I hear: "Fine me the best teal on A.S. six."
Computer: "Are you looking for steel?"

Just tried it on Google's voice search; it thought I said "Find me the best deal on a last sex" the first time I tried. After three or four more tries ("a sex," "nsx," etc.), it finally got it right. But it never messed up anything before "on," only the intentionally difficult S6, which requires context to understand. Ask that question to the wrong person and they'd have no idea what you meant by S6 either.

My point is that the currently deployed, state-of-the-art systems are already much better than what you'd hear or what you think the computer would guess, and soon they will get that last bit right too.

Now imagine the extra trouble if you mix languages. Also, how do you include meta-text control sequences in a message? By raising your voice or tilting your head when you say the magic words? Cf.:

"There was this famous quote QUOTE to be or not to be END QUOTE on page six END PARAGRAPH..."

Just read that out normally and it'll be smart enough to know that the upper-case terms you highlighted are punctuation marks and not part of the sentence, by using various grammar and word-frequency heuristics. In the rare case of genuine ambiguity, you'll be able to step down to a lower-level editing mode and correct it.

Mixing languages is already hellish with keyboards and will be a lot easier with speech recognition.

Very awkward, as if talking to oneself weren't awkward enough already.

Put a headset on and speak a bit lower and nobody watching will know what you're saying or who you're saying it to.

Therefore I just cannot imagine voice being used anywhere where exact representation is required, especially in programming:

"Define M1 as a function that takes in two arguments. The state of the machine labelled ES and an integer number in range between two and six inclusive labelled X. The result of M1 is a boolean. M1 shall return true if and only if the ES member labelled squat THATS SQUAT WITH A T AT THE END is equal to zero modulo B. OH SHIT IT WAS NOT B BUT X. SCRATCH EVERYTHING."

As Paulo alludes to, the current textual representation of programming languages is optimized for keyboard entry. Programming languages themselves will change to allow fluid speech input.

On Saturday, 21 March 2015 at 15:13:13 UTC, Piotrek wrote:
Just for fun, a visualization of the problem from 2007 (I doubt there has been a breakthrough since):

https://www.youtube.com/watch?v=MzJ0CytAsec

Got a couple of minutes into that before I could tell that current speech recognition is much better; it has progressed by leaps and bounds over the intervening eight years. That doesn't mean it's good enough to throw away your keyboard yet, but it's nowhere near that bad anymore.

On Saturday, 21 March 2015 at 15:47:14 UTC, H. S. Teoh wrote:
It's about the ability to abstract, that's currently missing from today's ubiquitous GUIs. I would willingly leave my text-based interfaces behind if you could show me a GUI that gives me the same (or better) abstraction power as the expressiveness of a CLI script, for example. Contemporary GUIs fail me on the following counts:

1) Expressiveness: there is no simple way of conveying complex
--snip--
5) Precision: Even when working with graphical data, I prefer text-based interfaces where practical, not because text is the best way to work with them -- it's quite inefficient, in fact -- but because I can specify the exact coordinates of object X and the exact displacement(s) I desire, rather than fight with the inherently imprecise mouse movement and give myself a wrist aneurysm trying to position object X precisely in a GUI. I have yet to see a GUI that allows you to specify things in a precise way without essentially dropping back to a text-based interface (e.g., an input field that requires you to type in numbers... which is actually not a bad solution; many GUIs don't even provide that, but instead give you the dreaded slider control, which is inherently imprecise and extremely cumbersome to use. Or worse, the text box with the inconveniently small 5-pixel up/down arrows that change the value by 0.1 per mouse click, thereby requiring an impractical number of clicks to get you to the right value -- if you're really unlucky, you can't even type in an explicit number but can only use those microscopic arrows to change it).

A lot of this is simply that you are a different kind of computer user than the vast majority of computer users. You want to drive a Mustang with a manual transmission and a beast of an engine, whereas most computer users are perfectly happy with their Taurus with automatic transmission. A touch screen or WIMP GUI suits their mundane tasks best, while you need more expressiveness and control so you use the CLI.

The great promise of voice interfaces is that they will _both_ be simple enough for casual users and expressive enough for power users, while being very efficient and powerful for both. We still have some work to do to get these speech recognition engines there, but once we do, the entire visual interface to your computer will have to be redone to best suit voice input and nobody will use touch, mice, _or_ keyboards after that.
