I would be pleased to have a complete list of the phonemes and corresponding audio files from different speakers. I fear 44 phonemes will not be enough to do a context-free analysis.
I think the data rate will be closer to 200 bps, since you will have to transfer a magnitude component along with the phoneme index, and maybe also a pitch component. Think of the pitch rise in a question; this feature is important for understanding. I think the main problem will be correlating the FFT output with the phoneme table ... but to work on this there must be a phoneme table first.
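To put rough numbers on the data rate, here is a small back-of-the-envelope sketch in Python. Everything except the 44 phonemes is my own assumption for illustration: about 14 phonemes per second, and 16 levels (4 bits) each for magnitude and pitch. The function names are made up too.

```python
def bits_for(levels: int) -> int:
    """Bits needed to index `levels` distinct values (0 bits if only one value)."""
    return (levels - 1).bit_length()


def data_rate_bps(phoneme_count: int,
                  phonemes_per_second: int,
                  magnitude_levels: int = 1,
                  pitch_levels: int = 1) -> int:
    """Bits per second for a stream of (phoneme index, magnitude, pitch) symbols."""
    bits_per_symbol = (bits_for(phoneme_count)
                       + bits_for(magnitude_levels)
                       + bits_for(pitch_levels))
    return bits_per_symbol * phonemes_per_second


# Assumed figures, not measurements: 14 phonemes/s, 16 magnitude and 16 pitch levels.
index_only = data_rate_bps(44, 14)
with_prosody = data_rate_bps(44, 14, magnitude_levels=16, pitch_levels=16)
print(index_only, with_prosody)
```

With these assumed values the index alone needs 6 bits per phoneme, and adding magnitude and pitch brings it to 14 bits per phoneme, which is what pushes the rate toward the 200 bps region. A different phoneme rate or quantisation will of course shift the result.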