> Losing the high frequency formants above 4k would decrease
> intelligibility a lot.

Nah, band-limiting speech to 4kHz has only a small impact on
intelligibility, at least in clean conditions. That's why telephony has
used (at best) 4kHz bandwidth since forever - most of us have never made an
actual live phone call with more than maybe 3.75kHz bandwidth (if that -
closer to 3kHz is common), and we all seem to be able to converse without
much issue. You can get higher bandwidths on over-the-top VoIP systems
(Skype, etc.), but in that case the higher latency often cancels out any
advantage of higher bandwidth in conversing, IMHO.

But one thing that increasing bandwidth definitely helps with is fatigue.
Our brains have to do much less work to make sense of speech signals with
higher bandwidth (especially in noise), so we fatigue faster when we're
listening to speech with significantly reduced bandwidth. I guess this
happens because higher-level parts of the brain have to get involved in
order to resolve ambiguities (e.g., "did he just say 'cap' or 'cat'?" It's
probably clear from context, but not directly from the acoustic cues coming
out of the cochlea - meaning some higher-level cognitive power has to be
expended to resolve the issue).

Anyway, for something like an audiobook, where the idea is to listen for
long stretches of time and with minimal fatigue, I definitely agree that
bandwidth beyond 4kHz is indicated. It won't make the speech much more
intelligible, but it will make it more pleasant and relaxing to listen to.
And it's not like there's a big premium on reducing bandwidth like there is
in telephony, so why not?

That said, you probably do still want to isolate the band below 4kHz and
process it separately, since the structure of speech tends to be different
in the upper bands (voicing tails off, time resolution becomes much more
important, etc.).
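
For what it's worth, here is a rough sketch of that kind of band split in
plain Python. Everything specific in it is my own assumption rather than
anything settled in this thread: a Hamming-windowed sinc FIR low-pass, 101
taps, a 16kHz sample rate, and the made-up function names lowpass_taps and
split_bands. The high band is taken as the delayed input minus the low
band, so the two bands sum back to a delayed copy of the input if you leave
both untouched.

```python
import math

def lowpass_taps(cutoff_hz, fs, num_taps=101):
    """Hamming-windowed sinc low-pass FIR taps (odd length, linear phase)."""
    assert num_taps % 2 == 1
    mid = num_taps // 2
    fc = cutoff_hz / fs  # cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * fc
        else:
            ideal = math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unity gain at DC

def split_bands(x, cutoff_hz, fs, num_taps=101):
    """Split x into (low, high) bands that sum to x delayed by num_taps // 2."""
    taps = lowpass_taps(cutoff_hz, fs, num_taps)
    delay = num_taps // 2
    low, high = [], []
    for n in range(len(x)):
        # Direct-form FIR convolution, treating samples before x[0] as zero.
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * x[n - k]
        low.append(acc)
        # Complementary high band: delayed input minus the low band.
        delayed = x[n - delay] if n >= delay else 0.0
        high.append(delayed - acc)
    return low, high

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

if __name__ == "__main__":
    fs = 16000.0  # assumed sample rate
    tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(800)]
    low, high = split_bands(tone, 4000.0, fs)
    # Skip the filter's start-up transient before measuring levels.
    print("low RMS:", rms(low[200:]), "high RMS:", rms(high[200:]))
```

You would process low however you like and then add high back in. If the
processing adds its own latency, the high band needs the same extra delay
before summing.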

E


On Thu, Mar 6, 2014 at 4:10 PM, Peter S <peter.schoffhau...@gmail.com> wrote:

> On 06/03/2014, Charles Z Henry <czhe...@gmail.com> wrote:
>
> >> 1) Steep filter to isolate speech (100-4k?).
> >>
> >
> > Not a great idea over all.  You might just mangle the audio a bit more.
> > It loses some temporal qualities when filtered too much.  Listening to
> > speech relies a lot on temporal cues that you want to keep intact.
>
> I think probably he meant a multiband processor, using crossover
> filters to isolate and process the important region (100-4k), and then
> combine that with the rest of the (unprocessed) spectrum. At least
> that's how I imagined it. Losing the high frequency formants above 4k
> would decrease intelligibility a lot.
>
> - Peter
> --
> dupswapdrop -- the music-dsp mailing list and website:
> subscription info, FAQ, source code archive, list archive, book reviews,
> dsp links
> http://music.columbia.edu/cmc/music-dsp
> http://music.columbia.edu/mailman/listinfo/music-dsp
>