I should say that I'm approaching this from first principles and not from any practical knowledge. So this is probably not of interest to Rachel. But I do find the problem interesting.
| From: Jim Van Meggelen <[email protected]>

| If memory serves correctly, the conference mixer doesn't have to mix all
| incoming audio, but rather only has to mix relevant audio (i.e. figure out
| who's talking, and take that single audio stream and send it out to all the
| participating channels). One challenge I would expect would be figuring out
| the noise threshold (i.e. what is talking and what is just background noise),
| and knowing to quickly enable a channel when somebody is speaking. A good
| mixer should be able to handle more than one person speaking, but since for
| the most part people can only handle one person talking at a time, if the
| mixer is good, it doesn't have to work so hard at that.

You also asked whether the problem was to handle M conferences of N people (where perhaps M * N = 1000) or one conference of N people (where N = 1000). A very good question.

In a face to face conference, people behave differently as the number of participants increases. In particular, speaker selection becomes more and more formal because the problem gets harder to solve.

Things don't get easier with telephone conferencing:

- some out of band signals are lost
  - eye contact, gaze
  - standing, sticking hand in the air
  - designation by chairperson
  - leaning over and whispering to a neighbour

- some signals are degraded
  - only some frequencies are carried and the accuracy is reduced
  - dynamic range is reduced ("speaking up" works in real conferences but not nearly as well over a phone)

- even modest time delays confuse informal conversational protocols

- (with current systems) localization cues are lost. The human ear can tell (with some ambiguity) where a sound comes from. This turns out to help quite a bit in following what is happening when several sounds occur at once.

I don't immediately see how a largish conference can be run as anything other than broadcasting by a single speaker or a small number of speakers designated manually.
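To make the selective mixing Jim describes a bit more concrete, here is a minimal Python sketch. Everything in it is my own assumption for illustration and not how any real conference mixer is known to work: the RMS energy test, the `threshold` value, the cap of `max_speakers`, and the function names `rms`, `select_speakers` and `mix` are all hypothetical. It treats each channel as one frame of 16-bit samples, picks the loudest channels above the threshold, and sends each participant the sum of those channels minus their own.

```python
import math

def rms(frame):
    """Root-mean-square level of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def select_speakers(frames, threshold=500.0, max_speakers=2):
    """Return ids of channels at or above the threshold, loudest first.

    The threshold and the speaker cap are assumed values, not measured ones.
    """
    levels = {cid: rms(frame) for cid, frame in frames.items()}
    active = [cid for cid, level in levels.items() if level >= threshold]
    active.sort(key=lambda cid: levels[cid], reverse=True)
    return active[:max_speakers]

def mix(frames, threshold=500.0, max_speakers=2):
    """Build one output frame per participant from the selected speakers."""
    speakers = select_speakers(frames, threshold, max_speakers)
    length = max((len(frame) for frame in frames.values()), default=0)
    out = {}
    for listener in frames:
        mixed = [0] * length
        for cid in speakers:
            if cid == listener:
                continue  # nobody hears their own voice echoed back
            for i, sample in enumerate(frames[cid]):
                mixed[i] += sample
        # Clamp the sum to the signed 16-bit range.
        out[listener] = [max(-32768, min(32767, v)) for v in mixed]
    return out

if __name__ == "__main__":
    frames = {
        "a": [1000, -1000, 1000, -1000],  # talking
        "b": [10, -10, 10, -10],          # background noise
        "c": [0, 0, 0, 0],                # silent
    }
    print(select_speakers(frames))
    print(mix(frames))
```

Note that the per-frame cost here grows with the number of selected speakers times the number of listeners, not with the square of the conference size, which is the saving Jim is pointing at. The hard parts (choosing the threshold, and opening a channel quickly enough when somebody starts to speak) are exactly what this sketch leaves out.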
As a thought experiment, consider how one can hear a speaker in a lecture even over coughing. I don't see that working in a telephone conference with all mikes open.

| I suspect the math involved is pretty complex, though.

Math I can handle (perhaps). What I don't know are the practical considerations. The psycho-acoustics are not obvious.

| This also gets me wondering if multiple, discreet conferences eat up more
| horsepower than a single conference would, even with a large number of
| participants.

I imagine that small conferences would be more amenable to automatic solutions and hence could take more processing (per participant) than large conferences when simple designation must be used.

I have no idea what the thresholds would be. I don't even know how many different strategies there would be (i.e. how many thresholds).

| I suspect there's a lot more to it than that, though.

Agreed.
