On Sat, Mar 21, 2015 at 07:13:10PM +0000, Joakim via Digitalmars-d wrote:
[...]
> On Saturday, 21 March 2015 at 15:47:14 UTC, H. S. Teoh wrote:
> >It's about the ability to abstract, that's currently missing from
> >today's ubiquitous GUIs. I would willingly leave my text-based
> >interfaces behind if you could show me a GUI that gives me the same
> >(or better) abstraction power as the expressiveness of a CLI script,
> >for example. Contemporary GUIs fail me on the following counts:
> >
> >1) Expressiveness: there is no simple way of conveying complex
> --snip--
> >5) Precision: Even when working with graphical data, I prefer
> >text-based interfaces where practical, not because text is the best
> >way to work with them -- it's quite inefficient, in fact -- but
> >because I can specify the exact coordinates of object X and the exact
> >displacement(s) I desire, rather than fight with inherently imprecise
> >mouse movements and give myself a wrist aneurysm trying to position
> >object X precisely in a GUI. I have yet to see a GUI that
> >allows you to specify things in a precise way without essentially
> >dropping back to a text-based interface (e.g., an input field that
> >requires you to type in numbers... which is actually not a bad
> >solution; many GUIs don't even provide that, but instead give you the
> >dreaded slider control, which is inherently imprecise and extremely
> >cumbersome to use. Or worse, the text box with the
> >inconveniently-small 5-pixel up/down arrows that changes the value by
> >0.1 per mouse click, thereby requiring an impractical number of
> >clicks to get you to the right value -- if you're really unlucky, you
> >can't even type in an explicit number but can only use those
> >microscopic arrows to change it).
> 
> A lot of this is simply that you are a different kind of computer user
> than the vast majority of computer users.  You want to drive a Mustang
> with a manual transmission and a beast of an engine, whereas most
> computer users are perfectly happy with their Taurus with automatic
> transmission.  A touch screen or WIMP GUI suits their mundane tasks
> best, while you need more expressiveness and control so you use the
> CLI.

Of course. But we're talking here about interfaces for *programmers*,
not for your average Joe, for whom a pretty GUI with a button or two
would suffice.


> The great promise of voice interfaces is that they will _both_ be
> simple enough for casual users and expressive enough for power users,
> while being very efficient and powerful for both.

Call me a skeptic, but I'll believe this promise when I see it.


> We still have some work to do to get these speech recognition engines
> there, but once we do, the entire visual interface to your computer
> will have to be redone to best suit voice input and nobody will use
> touch, mice, _or_ keyboards after that.

This may be an unpopular opinion, but I'm skeptical that this day will
ever come. The problem with voice recognition is that it's based on
natural language, and natural language is inherently ambiguous. You say
that heuristics can solve this; I call BS on that. Heuristics are
bug-prone and unreliable (if they weren't, they'd be algorithms!),
precisely because they fail to capture the essence of the problem; they
are merely crutches that get us mostly there in lieu of an actual
solution.
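
To make the objection concrete, here is the kind of heuristic I mean --
a toy "most common reading" rule of my own invention, not taken from
any real NLP system:

def tag_flies(sentence):
    # Toy heuristic: 'flies' right after a single sentence-initial noun
    # is "usually" a verb ("time flies...", "the arrow flies...").
    # No grammar, no semantics -- just a frequency hunch.
    words = sentence.lower().rstrip('.').split()
    return 'VERB' if words.index('flies') == 1 else 'NOUN'

print(tag_flies('Time flies like an arrow'))   # VERB -- happens to be right
print(tag_flies('Fruit flies like a banana'))  # VERB -- wrong: these
                                               # 'flies' are insects

Both sentences present the same surface pattern, so the rule necessarily
gets one of them wrong; patching it case by case is exactly the
unreliability I'm talking about.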

The ambiguity in natural language is not some kind of inherent flaw, as
most people tend to believe; it is actually a side-effect of the brain's
capacity for context-sensitive comprehension.
The exact same utterance, spoken in different contexts, can mean totally
different things, and the brain has no problem with that (provided it is
given sufficient context, of course). The brain is also constantly
optimizing itself -- if it can convey its intended meaning in fewer,
simpler words, it will prefer to do that instead of going through the
effort of uttering the full phrase. This is one of the main factors
behind language change, which happens over time and is mostly
unconscious.  Long, convoluted phrases, if spoken often enough, tend to
contract into shorter, sometimes ambiguous, utterances, as long as there
is sufficient context to disambiguate. This is why we tend toward
acronyms -- the brain optimizes away the long utterance in favor of a
short acronym which, within a group of speakers who share a common
context (e.g., computer lingo), is unambiguous, but may very well be
ambiguous in a wider context. If I talked to you about UFCS, you'd
immediately understand what I was talking about, but if I said that to
my wife, she would have no idea what I just said -- she might not even
realize it's an acronym, because it sounds like a malformed sentence
beginning with "you ...". The only way to disambiguate this kind
of context-specific utterance is to *share* in that context in the first
place. Talk to a Java programmer about UFCS, and he probably wouldn't
know what you just said either, unless he has been reading up on D.

The only way speech recognition can acquire this level of context in
order to disambiguate is to customize itself to that specific user -- in
essence learn his personal lingo, pick up his (sub)culture, learn the
contexts associated with his areas of interest, even adapt to his
peculiarities of pronunciation. If software could get to that level, it
might as well pass the Turing test, because then it would have enough
context to carry out an essentially human conversation. I'd say we're
far, far from that point today, and it's not clear we'll ever get
there. We haven't even mastered context-sensitive formal languages,
except via the crutch of parsing a context-free grammar and then
applying a patchwork of semantic analysis after the fact -- let alone
natural language, which is not only context-sensitive but may depend on
context outside the input entirely (cultural background, implicit
shared common knowledge, etc.). Before we can get there, we'd have to
grapple with knowledge representation, context-sensitive semantics, and
all those hard problems that today seem to have no tractable solution
in sight.
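
The canonical illustration of that patchwork is C's declaration syntax:
"(T)*x" is a cast or a multiplication depending on whether T currently
names a type -- context that the nominally context-free parser has to
smuggle in from the symbol table (the so-called lexer hack). A toy
sketch in Python, with hypothetical helper names rather than any real
compiler's API:

typedef_names = set()   # context accumulated from earlier declarations

def classify(ident):
    # The grammar alone cannot tell a type name from a variable name;
    # only the context built up during parsing can.
    return 'TYPE_NAME' if ident in typedef_names else 'IDENTIFIER'

def parse_paren_star(ident, operand):
    # "(ident) * operand" is genuinely ambiguous without context:
    if classify(ident) == 'TYPE_NAME':
        return 'cast *%s to %s' % (operand, ident)    # (T)*x
    return 'multiply %s by %s' % (ident, operand)     # (n)*x

typedef_names.add('T')               # as if we had seen "typedef int T;"
print(parse_paren_star('T', 'x'))    # cast *x to T
print(parse_paren_star('n', 'x'))    # multiply n by x

The shape of the parse depends on a mutable table off to the side, which
is to say the grammar isn't really handling the language at all.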

P.S. Haha, it looks like my Perl script has serendipitously selected a
quote that captures the inherent ambiguity of natural language -- you
can't even tell, at a glance, where the verbs are! I'd like to see an
algorithm parse *that* (and then see it fall flat on its face when I
actually meant one of the "non-standard" interpretations of it -- say,
in the context of a sci-fi movie where there are insects called "time
flies"...).


T

-- 
Time flies like an arrow. Fruit flies like a banana.
