Hi Jean Louis, et al,

I guess I didn't make myself clear.  Configuration/Setup is not the issue.  The Numen project (in combination with 'dotool'/'xdotool' and 'keynav') provides a complete voice computer control system.  Configuration is via a set of text files and is surprisingly flexible.  I now have a custom setup which, in principle, allows me to easily do all the tasks that you mention in your reply and many more.  Basically, I should now be able to do with my voice anything that I could do with the mouse and keyboard.  If the underlying AI/LLM speech recognition system were up to the task, this setup would be awesome!  Unfortunately, it's not.

Numen uses Vosk (https://alphacephei.com/vosk/ ) as its speech recognition engine.  (I believe Vosk is in turn based on the Kaldi speech recognition toolkit: https://kaldi-asr.org/doc/about.html .)  According to the documentation for Numen and Vosk, the default model used by Numen is 'vosk-model-small-en-us-0.15' which has a word error rate (WER) of around 10% (see https://alphacephei.com/vosk/models ).  This means that, on average, around one word in ten will be wrong.

At first glance, a WER of 10% doesn't sound too bad for dictation ... but consider: Imagine that you've just dictated an e-mail with 200 words in it.  About 20 of those will be wrong.  That means not only that one has to go back and proofread the e-mail to look for the errors, but, presumably, voice control will _also_ be used to correct the errors.  But some of those commands will be misunderstood, creating yet more problems.  Which brings me to ...
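
To make that arithmetic concrete, here is a rough Python sketch.  It assumes every word is misrecognized independently with probability equal to the WER, and that each spoken correction fails at the same rate.  Both are simplifications of mine (real errors tend to cluster), and the function names are made up for illustration; nothing here comes from Numen or Vosk.

```python
def expected_errors(num_words: int, wer: float) -> float:
    """Expected number of wrong words at a uniform per-word error rate."""
    return num_words * wer


def command_success_probability(words_in_command: int, wer: float) -> float:
    """Chance that every word of a spoken command is recognized correctly,
    assuming independent per-word errors."""
    return (1.0 - wer) ** words_in_command


def expected_correction_attempts(num_words: int, wer: float) -> float:
    """Expected number of spoken corrections when each wrong word needs one
    correction and each correction can itself be misrecognized.

    Geometric series: n*wer + n*wer^2 + ... = n*wer / (1 - wer).
    """
    if not 0.0 <= wer < 1.0:
        raise ValueError("wer must be in [0, 1)")
    return num_words * wer / (1.0 - wer)


if __name__ == "__main__":
    n, wer = 200, 0.10
    print(f"expected wrong words:         {expected_errors(n, wer):.1f}")
    print(f"expected correction attempts: {expected_correction_attempts(n, wer):.1f}")
    print(f"3-word command fully correct: {command_success_probability(3, wer):.1%}")
```

Under these assumptions, the 200-word e-mail needs somewhat more than 20 spoken corrections, and a three-word command comes through fully intact only about 73% of the time.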

For computer control functions (editing text, selecting between application windows, selecting menu items, clicking on buttons, changing work spaces, etc., etc.) the result is sometimes a complete mess. In this case the AI/LLM is frequently being used to enter various control sequences.  When _this_ goes wrong it can be a complete disaster!  It can (unintentionally) delete big chunks of text, delete large numbers of emails, close application windows, put the keyboard and/or mouse in an unresponsive state, and so on. Recovering from such errors has sometimes taken me an hour or more.  That certainly doesn't do much for my productivity. :^/

Clearly there are cases where even this level of functionality/reliability would be a huge win.  Somebody who is unable to use a keyboard and mouse at all, for example.  For me, as I've said, it's basically a draw.  For somebody with no problem using a mouse and keyboard it would just be a huge PITA.

As always though, this is just my $0.02.

On 7/20/25 02:16, Jean Louis wrote:
* Leland C. Best <[email protected]> [2025-07-18 18:54]:
I want to respond to one particular point:

On 7/16/25 04:30, Jean Louis wrote:

[...]
     There will be a day when AI is actually productively helpful, but
     that's not today for most things.
Well, maybe not for you. I respect the opinion, though many of the people I
know who use Large Language Models (LLMs) have gotten tremendous assistance
with work that they couldn't complete themselves otherwise. It would need too
large a number of people, and for an individual at a university, it wouldn't
even be possible to do those projects.
[...]

I have some serious difficulties using a computer due to some neurological
issues.  I can type at about 0.5 - 1.0 key-stroke/second.
Sorry for that.

Can you speak? If yes, then I can help you use speech to get much faster
transcription on your computer. That is what I do daily.

Using a mouse is even more problematic. Clicking a button can take
anywhere from 5sec to 1min or more.  I can, however, talk/speak
okay.
You can contact me privately, and I can help you implement transcription.

                ⭐ Hyperscope Tags ⭐

database sql Postgresql Hyperscope report rcd-notes
statistics 24-hours

┌────────────┬─────────────┬──────────────────────┐
│    Date    │ Total Words │ Average Spoken Pages │
├────────────┼─────────────┼──────────────────────┤
│ 2025-07-20 │        1506 │                 3.01 │
│ 2025-07-18 │        2896 │                 5.79 │
│ 2025-07-17 │        2728 │                 5.46 │
│ 2025-07-16 │        3460 │                 6.92 │
│ 2025-07-15 │        3329 │                 6.66 │
│ 2025-07-14 │        1209 │                 2.42 │
│ 2025-07-13 │         537 │                 1.07 │
│ 2025-07-12 │        6161 │                12.32 │
└────────────┴─────────────┴──────────────────────┘

I had thought, therefore, that using today's AI/LLM technology would
allow me to use a computer much more efficiently by simply telling
it what I wanted to do. No such luck.
It can.

Just that I am on a GNU/Linux OS, so if you use something else, I can't
easily test it remotely, especially not with your speed. But someone
could help you set things up.

I am daily using this script:
https://gitea.com/gnusupport/LLM-Helpers/src/branch/main/bin/rcd-llm-speech-single-input.sh

And so many times it saves my time, and it saves the time of my workers.
Imagine having 5-10 employees, each waiting for you to write
something: just speak and transcribe.

Contact me if you need help implementing it.

First, the only reasonably-Free voice control software I could find was the
Numen project (see https://sr.ht/~geb/numen/ ). After building it from
source (there is no pre-built package for the GNU/Linux distributions I use)
and modifying the configuration files to suit my taste, I ran some tests to
compare it against my "normal" computer use.  The results were seriously
underwhelming. It was no faster than me just using the computer unaided.
The problem seems to be that Numen makes so many mistakes that correcting
them negates any gains from the times when it gets everything right.
I can't dwell on Numen right now. I understand your disappointment; you did
not find your solution.

Your use case is exactly where a new Large Language Model (LLM) can
help. If you could tell me your workflows and you work on GNU/Linux,
then I could help you implement the easy control of the workflows.

That is what I do, I minimize my workflows. It helps daily.

I am using Emacs Lisp and Emacs for that. Calling people from within
Emacs, sending SMS, sharing locations, tasks, notes, documents,
dispatching emails without opening email program. And other
things. Who can wait those seconds?

I find the name of the person who needs to receive a document, press h s,
select the document by its name or number, and it is sent.

No tedious opening of e-mail client, entering e-mail address, finding
document, attaching it...

So your workflow by talking can be improved, imagine front screen:

E-mails
-------

- send e-mail to someone
- see new e-mails
- send document to someone

Phone
-----

- send SMS
- call someone
- find a contact

Text
----

- Write an article
- Publish article

and so on. Computer asks you, or you just press a button. Spend some seconds
doing it.

Computer asks you, you answer what you want, like "I want to send
E-mail to Jean Louis", and computer could eventually, provided we work
on that workflow, start composing it.

It would ask you for subject maybe, or generate the subject.

You would talk, and computer would transcribe it.

When you finish, you would say it is finished, not even type it.

Computer would send e-mail for you once you approve it.

Workflows like that are fine. It will work with low cost Nvidia GPU cards.
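
The dialog described above could be sketched roughly as follows.  This is only an illustration in Python and not the script linked earlier: `Draft`, `compose_email`, `listen` and `send` are hypothetical names, and `listen` stands in for whatever speech-to-text backend is used.  The one design choice worth noting is that nothing is sent without an explicit spoken approval.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Draft:
    recipient: str = ""
    subject: str = ""
    body: List[str] = field(default_factory=list)


def compose_email(listen: Callable[[str], str],
                  send: Callable[[Draft], None]) -> Optional[Draft]:
    """Run the question/answer dialog for one e-mail.

    `listen(prompt)` is assumed to return one transcribed utterance.
    Returns the draft if it was approved and sent, otherwise None.
    """
    draft = Draft()
    draft.recipient = listen("Who should receive the e-mail?")
    draft.subject = listen("What is the subject?")

    # Keep collecting dictated text until the speaker says "finished".
    while True:
        utterance = listen("Dictate, or say 'finished'.")
        if utterance.strip().lower() == "finished":
            break
        draft.body.append(utterance)

    # Sending requires explicit approval, so a single misheard
    # utterance during dictation cannot dispatch the message.
    if listen("Send it? Say 'yes' to approve.").strip().lower() == "yes":
        send(draft)
        return draft
    return None


if __name__ == "__main__":
    # Scripted answers stand in for real speech recognition.
    scripted = iter(["Jean Louis", "Test", "Hello there.", "finished", "yes"])
    outbox: List[Draft] = []
    compose_email(lambda prompt: next(scripted), outbox.append)
    print(outbox[0])
```

Passing `listen` and `send` in as plain functions keeps the dialog logic testable with scripted answers, independent of any particular recognizer or mail program.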

Granted, my evidence is purely anecdotal.  Also, there are clearly people
for which even this level of functionality would be a big plus.  Still.  If
AI/LLM can't provide _me_ any productivity gains in performing such
elementary tasks, why should I believe it would do any better at much more
complicated tasks, and/or for people without my disabilities?
I can understand your disbelief and disappointment, but it is good that you
came to the practical part. I am here, ready to help you implement it.

You must be ready to buy hardware and install software that is needed.

Jean Louis
Cheers
Leland

--
-------------------------------------------------------------------------------
Leland C. Best      | When stupidity is considered patriotism, it is unsafe
[email protected] | to be intelligent.
                    | -- Isaac Asimov
-------------------------------------------------------------------------------


_______________________________________________
libreplanet-discuss mailing list
[email protected]
https://lists.libreplanet.org/mailman/listinfo/libreplanet-discuss
