On Saturday, 21 March 2015 at 14:07:28 UTC, FG wrote:
On 2015-03-21 at 06:30, H. S. Teoh via Digitalmars-d wrote:
On Sat, Mar 21, 2015 at 04:17:00AM +0000, Joakim via Digitalmars-d wrote:
[...]
What I was going to say too: neither CLI nor GUI will win; speech
recognition will replace them both, by providing the best of both.
Rather than writing a script to scrape several shopping websites
for the price of a Galaxy S6, I'll simply tell the intelligent
agent on my computer "Find me the best deal on a S6" and it will
go find it.
I dunno, I find that I can express myself far more precisely and
concisely on the keyboard than I can verbally. Maybe for everyday
tasks like shopping for the best deals voice recognition is Good
Enough(tm), but for more complex tasks, I have yet to find
something more expressive than the keyboard.
"Find me the best deal on a S6" is only a little more complex
than "make me a cup of coffee." Fine for doing predefined tasks
but questionable as a ubiquitous input method. It's hard
enough for mathematicians to dictate a theorem without using
any symbolic notation. There is too much ambiguity and room for
interpretation in speech to make it a reliable and easy input
method for all tasks. Even in your example:
You say: "Find me the best deal on a S6."
I hear: "Fine me the best teal on A.S. six."
Computer: "Are you looking for steel?"
Just tried it on Google's voice search; it thought I said "Find
me the best deal on a last sex" the first time I tried. After
3-4 more tries ("a sex," "nsx," etc.) it finally got it right.
But it never messed up anything before "on," only the
intentionally difficult S6, which requires context to understand.
Ask that question to the wrong person and they'd have no idea
what you meant by S6 either.
My point is that the currently deployed, state-of-the-art systems
are already much better than what you'd hear or what you think
the computer would guess, and soon they will get that last bit
right too.
Now imagine the extra trouble if you mix languages. Also, how
do you include meta-text control sequences in a message? By
raising your voice or tilting your head when you say the magic
words? Cf.:
"There was this famous quote QUOTE to be or not to be END QUOTE
on page six END PARAGRAPH..."
Just read that out normally and it'll be smart enough to know
that the upper-case terms you highlighted are punctuation marks
and not part of the sentence, by using various grammar and word
frequency heuristics. In the rare occurrence of real ambiguity,
you'll be able to step down to a lower-level editing mode and
correct it.
Mixing languages is already hellish with keyboards and will be a
lot easier with speech recognition.
Very awkward, if talking to oneself wasn't awkward already.
Put a headset on and speak a bit lower and nobody watching will
know what you're saying or who you're saying it to.
Therefore I just cannot imagine voice being used anywhere where
exact representation is required, especially in programming:
"Define M1 as a function that takes in two arguments. The state
of the machine labelled ES and an integer number in range
between two and six inclusive labelled X. The result of M1 is a
boolean. M1 shall return true if and only if the ES member
labelled squat THATS SQUAT WITH A T AT THE END is equal to zero
modulo B. OH SHIT IT WAS NOT B BUT X. SCRATCH EVERYTHING."
As Paulo alludes to, the current textual representation of
programming languages is optimized for keyboard entry.
Programming languages themselves will change to allow fluid
speech input.
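For comparison, here is roughly what the dictated M1 definition quoted above looks like when typed. This is only a sketch in Python: the MachineState class name, the integer type of squat, and the choice to raise ValueError for an out-of-range X are my assumptions, not part of the dictated spec.

```python
from dataclasses import dataclass


@dataclass
class MachineState:
    """Hypothetical stand-in for the machine state labelled ES."""
    squat: int  # the member "squat", with a T at the end


def m1(es: MachineState, x: int) -> bool:
    """Return True if and only if es.squat is equal to zero modulo x."""
    # The dictated spec restricts x to the range two to six inclusive;
    # rejecting other values with ValueError is an assumption.
    if not 2 <= x <= 6:
        raise ValueError("x must be between 2 and 6 inclusive")
    return es.squat % x == 0
```

Correcting B to X here is a one-character edit rather than SCRATCH EVERYTHING.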
On Saturday, 21 March 2015 at 15:13:13 UTC, Piotrek wrote:
Just for fun. A visualization of the problem from 2007 (I doubt
there has been a breakthrough in the meantime)
https://www.youtube.com/watch?v=MzJ0CytAsec
I got a couple of minutes into that before it was clear that
current speech recognition is much better; it has progressed by
leaps and bounds over the intervening eight years. That doesn't
mean it's good enough to throw away your keyboard yet, but it's
nowhere near that bad anymore.
On Saturday, 21 March 2015 at 15:47:14 UTC, H. S. Teoh wrote:
It's about the ability to abstract, which is currently missing
from today's ubiquitous GUIs. I would willingly leave my
text-based interfaces behind if you could show me a GUI that
gives me the same (or better) abstraction power as the
expressiveness of a CLI script, for example. Contemporary GUIs
fail me on the following counts:
1) Expressiveness: there is no simple way of conveying complex
--snip--
5) Precision: Even when working with graphical data, I prefer
text-based interfaces where practical, not because text is the
best way to work with them -- it's quite inefficient, in fact --
but because I can specify the exact coordinates of object X and
the exact displacement(s) I desire, rather than fight with the
inherently imprecise mouse movement and getting myself a wrist
aneurysm trying to position object X precisely in a GUI. I have
yet to see a GUI that allows you to specify things in a precise
way without essentially dropping back to a text-based interface
(e.g., an input field that requires you to type in numbers...
which is actually not a bad solution; many GUIs don't even
provide that, but instead give you the dreaded slider control
which is inherently imprecise and extremely cumbersome to use. Or
worse, the text box with the inconveniently-small 5-pixel up/down
arrows that changes the value by 0.1 per mouse click, thereby
requiring an impractical number of clicks to get you to the right
value -- if you're really unlucky, you can't even type in an
explicit number but can only use those microscopic arrows to
change it).
A lot of this is simply that you are a different kind of computer
user than the vast majority of computer users. You want to drive
a Mustang with a manual transmission and a beast of an engine,
whereas most computer users are perfectly happy with their Taurus
with automatic transmission. A touch screen or WIMP GUI suits
their mundane tasks best, while you need more expressiveness and
control so you use the CLI.
The great promise of voice interfaces is that they will _both_ be
simple enough for casual users and expressive enough for power
users, while being very efficient and powerful for both. We
still have some work to do to get these speech recognition
engines there, but once we do, the entire visual interface to
your computer will have to be redone to best suit voice input and
nobody will use touch, mice, _or_ keyboards after that.