> Losing the high frequency formants above 4k would decrease intelligibility a lot.
Nah, band limiting speech to 4kHz has only a small impact on intelligibility, at least in clean conditions. That's why telephony has used (at best) 4kHz bandwidth since forever - most of us have never made an actual live phone call with more than maybe 3.75kHz bandwidth (if that - closer to 3kHz is common), and we all seem to be able to converse without much issue. You can get higher bandwidths on over-the-top VoIP systems (Skype, etc.), but in that case the higher latency often cancels out any advantage of higher bandwidth in conversing, IMHO.

But one thing that increasing bandwidth definitely helps with is fatigue. Our brains have to do much less work to make sense of speech signals with higher bandwidth (especially in noise), so we fatigue faster when we're listening to speech with significantly reduced bandwidth. I guess this happens because higher-level parts of the brain have to get involved in order to resolve ambiguities (e.g., "did he just say 'cap' or 'cat'?" It's probably clear from context, but not directly from the acoustic cues coming out of the cochlea - meaning some higher-level cognitive power has to be expended to resolve the issue).

Anyway, for something like an audiobook, where the idea is to listen for long stretches of time and with minimal fatigue, I definitely agree that bandwidth beyond 4kHz is indicated. It won't make it particularly more intelligible, but it will make it more pleasant and relaxing to listen to. And it's not like there's a big premium on reducing bandwidth like there is in telephony, so why not?

That said, you probably do still want to isolate the band below 4kHz and process it separately, since the structure of speech tends to be different in the upper bands (voicing tails off, time resolution becomes much more important, etc.).

E

On Thu, Mar 6, 2014 at 4:10 PM, Peter S <peter.schoffhau...@gmail.com> wrote:

> On 06/03/2014, Charles Z Henry <czhe...@gmail.com> wrote:
>
> >> 1) Steep filter to isolate speech (100-4k?).
> >>
> >
> > Not a great idea over all. You might just mangle the audio a bit more. It
> > loses some temporal qualities when filtered too much. Listening to speech
> > relies a lot on temporal cues that you want to keep intact.
>
> I think probably he meant a multiband processor, using crossover
> filters to isolate and process the important region (100-4k), and then
> combine that with the rest of the (unprocessed) spectrum. At least
> that's how I imagined it. Losing the high frequency formants above 4k
> would decrease intelligibility a lot.
>
> - Peter
> --
> dupswapdrop -- the music-dsp mailing list and website:
> subscription info, FAQ, source code archive, list archive, book reviews,
> dsp links
> http://music.columbia.edu/cmc/music-dsp
> http://music.columbia.edu/mailman/listinfo/music-dsp
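
A minimal sketch of the split-at-4kHz idea discussed above, in plain Python so it runs without any DSP library. It is only an illustration under stated assumptions, not anyone's actual processor from this thread: the function names, the 44.1kHz default sample rate, and the choice of a second-order Butterworth low-pass are all made up for the example, and it ignores the 100Hz lower edge of the band. The high band is formed by subtracting the low band from the input, so with no processing applied the two bands sum back to the original signal (to within floating-point rounding).

```python
import math


def butterworth_lowpass_coeffs(cutoff_hz, sample_rate_hz):
    """Second-order Butterworth low-pass biquad (bilinear transform).

    Returns (b0, b1, b2) and (a1, a2), already normalised by a0.
    """
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate_hz
    q = 1.0 / math.sqrt(2.0)  # Butterworth Q for a 2nd-order section
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = ((1.0 - cos_w0) / (2.0 * a0),
         (1.0 - cos_w0) / a0,
         (1.0 - cos_w0) / (2.0 * a0))
    a = (-2.0 * cos_w0 / a0, (1.0 - alpha) / a0)
    return b, a


def biquad(samples, b, a):
    """Run a direct-form-I biquad over a list of samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def split_bands(samples, cutoff_hz=4000.0, sample_rate_hz=44100.0):
    """Split into (low, high), where high is the input minus the low band."""
    b, a = butterworth_lowpass_coeffs(cutoff_hz, sample_rate_hz)
    low = biquad(samples, b, a)
    high = [x - l for x, l in zip(samples, low)]
    return low, high


def process_speech_band(samples, sample_rate_hz=44100.0,
                        process_low=lambda band: band, cutoff_hz=4000.0):
    """Process only the band below cutoff_hz, then add the untouched rest back.

    process_low is a placeholder for whatever speech processing is wanted;
    it must return a list of the same length as its input.
    """
    low, high = split_bands(samples, cutoff_hz, sample_rate_hz)
    processed = process_low(low)
    return [l + h for l, h in zip(processed, high)]


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))
```

The subtractive split keeps reconstruction trivial, but the implied high-pass side is shallow, so a fair amount of low-frequency energy leaks into the "high" band. If the bands need to be well separated, a proper crossover (e.g., Linkwitz-Riley) would probably be the better choice.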