[digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-20 Thread Vojtech Bubnik
> Books that have been done in the past did not have narrow bandwidth
> as their main objective.

Look at the LPC-10 codec. You could try it by downloading the internet
telephony software from speakfreely.org. The codec was developed with low
bandwidth in mind, and its intelligibility is on the threshold of what I
would accept. If IVOX is able to squeeze LPC-10 to 1200 bps with the same
intelligibility, then that is great.

The proposal that you gave will certainly not work. You describe a
system that computes a spectrum and compares it to a library of spectra,
one by one. It may work for some vowels, but it will certainly not work
for the other phonemes. The task is much more difficult than you think.

The plosives and fricatives - d, t, f, s and so on - are hard to detect.
The sound is quite complex, and the spectrum of a single phoneme changes
over time: the sound starts with a quiet pause, then a burst, then the
transient of that burst. A hidden Markov classifier, a kind of
probabilistic network, is often applied to sequences of spectra and is a
good system for detecting phonemes. As I already wrote, if you get the
consonants wrong, your system will mumble.
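
To make that concrete, here is a minimal sketch (Python, with made-up toy
numbers) of the scoring such a hidden Markov classifier performs; a real
recognizer would use Gaussian mixtures on cepstral features rather than the
vector-quantized spectra assumed here:

import numpy as np

def forward_log_likelihood(obs, log_pi, log_A, log_B):
    """Log P(obs | model) for a discrete-observation HMM (forward algorithm).
    obs    : sequence of codebook indices (quantized spectra)
    log_pi : (S,)   log initial state probabilities
    log_A  : (S, S) log transition probabilities, row = from-state
    log_B  : (S, K) log emission probabilities per state and codebook index
    """
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # sum over previous states in the log domain, then emit this frame
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)

# Toy 3-state left-to-right model (pause -> burst -> transient), 4 codebook entries.
log_pi = np.log([1.0, 1e-9, 1e-9])
log_A = np.log([[0.6, 0.4, 1e-9],
                [1e-9, 0.6, 0.4],
                [1e-9, 1e-9, 1.0]])
log_B = np.log([[0.7, 0.1, 0.1, 0.1],
                [0.1, 0.7, 0.1, 0.1],
                [0.1, 0.1, 0.4, 0.4]])

frames = [0, 0, 1, 2, 3]   # quantized spectra of one segment
print(forward_log_likelihood(frames, log_pi, log_A, log_B))

You run one such model per phoneme (or phoneme pair) and keep a list of the
best-scoring candidates, which is exactly the soft decision described later
in this thread.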

Even for vowels, you would have to keep several spectra per phoneme in the
lookup table, sampled at different pitches. You do not want to have to
learn to speak at a constant pitch.

Phoneme detection is described in every voice recognition textbook.
Writing papers about phoneme detection for ham radio is reinventing the
wheel.

73, Vojtech OK1IAK




Re: [digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-20 Thread Mike Lebo
Vojtech,

Thank you for reading my papers. I have no intention of reinventing the
wheel. The project is like EchoLink; it does not understand speech or
convert it to text. Books that have been done in the past did not have
narrow bandwidth as their main objective. I do not need hi-fidelity to
understand what is being said; I am used to slightly de-tuned SSB voice. I
just need something that is good enough. My big problem is that none of
this will ever happen unless someone steps up and wants to help me learn
how to modify free, public-domain C++ software. Could you or someone you
know be that person?

73's

Mike n6ief


Re: [digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-20 Thread Mike Lebo
Bob,

I was thinking about an SSB signal that is off frequency. Most of the time
I could still get the information I need to make the contact. I never
intended this to be hi-fidelity; I just want it to be good enough.

Mike n6ief


Re: [digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-18 Thread Robert Thompson
Oops, sent too quickly. What I meant was: "That (speech recognition not
being real time) is not entirely true." There are many commercial packages
that do minimal-lag "realtime" speech recognition. One example would be
the voice command features built into Apple's OS X. Another would be any
one of a number of speech-to-text transcription packages.

I apologize if my unsupported and abrupt original phrasing appeared to be
inflammatory. That was not intended.



Re: [digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-18 Thread Robert Thompson
That is not entirely true. Besides, I wasn't focusing so much on their
"real" research as the voice characterization research that they had
to do before they could usefully work on recognition. It turns out
that the very areas that are most necessary for digital voice
recognition are the ones most necessary for human brains to recognize
and interpret. Voice is a mixed-information-density signal, and if you
"simplify" the signal by filtering out and discarding the less
necessary elements, you have significantly reduced the effort the next
stage has to do, whether it's digital encoding or speech recognition.



Re: [digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-18 Thread Mike Lebo
Robert,

I agree. The thing that is different is that speech recognition is not real
time. Voice over the radio is real time.

Mike n6ief


Re: [digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-18 Thread Robert Thompson
There are several (military/gov) standard intelligibility tests that
do a pretty good job of scoring what most humans can and cannot
reliably understand. You might try taking a look at them to get some
ideas of which voice characteristics make the most difference to
intelligibility. There is actually a surprising amount of data out
there, especially if you include the data peripheral to the various
computerized speech translator research projects. It's not *exactly*
signal processing... but understanding which parts of the signal matter
most can be surprisingly helpful. This may be unusually productive,
because as yet there hasn't been a huge amount of cross-discipline work
between the codec researchers and the speech-to-meaning researchers.
While there's a lot of duplicate research in there, it tends to be from
slightly different perspectives, and the "stereo view" can sometimes help.




Re: [digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-18 Thread Mike Lebo
Hi Vojtech,

Thank you for your reply to my papers. I will do more work on the phonemes.
The project I want to do uses new computers that were not available 10
years ago. Every 10 ms a decision is made to send a one or a zero. To make
that decision I have 68 parallel FFTs running in the background. I believe
the brain can handle mispronounced words better than you think.

Mike
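
A rough sketch of that kind of front end: one magnitude spectrum every
10 ms from overlapping, windowed FFT frames. The 8 kHz sample rate and the
256-sample frame length are assumptions for the example, not values from
Mike's papers.

import numpy as np

FS = 8000                    # samples per second (assumed)
HOP = int(0.010 * FS)        # a new decision every 10 ms -> 80 samples
FRAME_LEN = 256              # 32 ms analysis window (assumed)

def spectra_every_10ms(audio):
    """Yield one magnitude spectrum per 10 ms hop (Hann-windowed FFT)."""
    window = np.hanning(FRAME_LEN)
    for start in range(0, len(audio) - FRAME_LEN, HOP):
        frame = audio[start:start + FRAME_LEN] * window
        yield np.abs(np.fft.rfft(frame))

audio = np.random.randn(FS)                 # 1 s of noise stands in for speech
frames = list(spectra_every_10ms(audio))
print(len(frames), "spectra,", len(frames[0]), "bins each")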


RE: [digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-17 Thread r_lwesterfield
I have a few radios (ARC-210-1851, PSC-5D, PRC-117F) at work that use MELP
(Mixed Excitation Linear Prediction) as the vocoder. We have found MELP to
be superior to LPC-10 (more human-like voice quality, less Charlie Brown's
teacher), but we use far larger bandwidths than 100 Hz. I do not know how
well any of this will play out at such a narrow bandwidth. Listening to
Charlie Brown's teacher will send you running away quickly, and you should
think of your listeners . . . they will tire very quickly. Just because
voice can be sent at such narrow bandwidths does not necessarily mean that
people will like to listen to it.

 

Rick – KH2DF

 


[digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-17 Thread n6ief
Leigh,

I hope to have the project used throughout the world in less than a year,
once I get started. My problem is that I need someone to help me get
started. If I don't find someone, the project will die with my two papers.

73's

Mike   n6ief





[digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-17 Thread Vojtěch Bubník
Hi Mike.

I studied some aspects of voice recognition about 10 years ago, when I was
thinking of joining a research group at Czech Technical University in
Prague. I have a 260-page textbook on voice recognition on my bookshelf.

A voice signal has high redundancy compared to a text transcription, but
there is additional information stored in the voice signal, such as pitch,
intonation and speed. One could, for example, estimate the mood of the
speaker from the utterance.

The vocal tract can be described by a generator (a tone for vowels, hiss
for consonants) and a filter. Translating voice into generator and filter
coefficients greatly decreases the redundancy of the voice data. This is
roughly the technique that the common voice codecs use. GSM voice
compression is a kind of Algebraic Code Excited Linear Prediction. Another
interesting codec is AMBE (Advanced Multi-Band Excitation), used by the
DSTAR system. The GSM half-rate codec squeezes voice to 5.6 kbit/s, AMBE
to 3.6 kbps. Both systems use excitation tables, but AMBE is more
efficient and closed source. I think the key to the efficiency is in the
size and quality of the excitation tables. Creating such an algorithm
requires a considerable amount of research and data analysis. The
intelligibility of the GSM or AMBE codecs is very good. You can buy the
intellectual property of the AMBE codec by buying the chip. There are a
couple of projects trying to build DSTAR into legacy transceivers.
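
As a rough illustration of the generator-plus-filter idea, the following
Python sketch estimates a 10th-order all-pole (vocal tract) filter for one
frame with the autocorrelation method and the Levinson-Durbin recursion.
The order 10 matches LPC-10; the frame length and test signal are
arbitrary choices for the example.

import numpy as np

def lpc(frame, order=10):
    """All-pole model A(z) = 1 + a1*z^-1 + ... for one frame
    (autocorrelation method, Levinson-Durbin recursion)."""
    frame = frame * np.hamming(len(frame))
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a

fs = 8000
t = np.arange(int(0.02 * fs)) / fs                  # one 20 ms frame
frame = np.sin(2 * np.pi * 150 * t) + 0.1 * np.random.randn(t.size)
print(np.round(lpc(frame), 3))                      # the filter coefficients

A codec then transmits these coefficients plus a description of the
excitation, instead of the waveform itself, which is where the large
compression factor comes from.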

About 10 years ago we at the OK1KPI club experimented with an EchoLink-like
system. We modified the speakfreely software to control an FM transceiver
and added a web interface to control the tuning and subtone of the
transceiver. It was a lot of fun and a very unique system at that time.
http://www.speakfreely.org/ The best compression factor is offered by the
LPC-10 codec (3460 bps), but the sound is very robot-like and quite hard to
understand. In the end we reverted to GSM. I think IVOX is a variant of the
LPC system that we tried.

Your proposal is to increase the compression rate by transmitting phonemes.
I once had the same idea, but I quickly rejected it. Although it may be a
nice exercise, I find it not very useful until good continuous-speech,
multi-speaker, multi-language recognition systems are available. I will
try to explain my reasoning behind that statement.

Let's classify voice recognition systems by implementation complexity:
1) Single-speaker, limited set of utterances recognized (control your
desktop by voice)
2) Multi-speaker, limited set of utterances recognized (automated phone
system)
3) Dictation systems
4) Continuous speech transcription
5) Speech recognition and understanding

Your proposal would need to implement most of the code from 4) or 5) to be
really usable, and it has to be reliable.

State-of-the-art voice recognition systems use hidden Markov models to
detect phonemes. A phoneme is found by traversing a state diagram while
evaluating multiple recorded spectra. The phoneme is soft-decoded: the
output of the classifier is a list of phonemes with their detection
probabilities attached. To cope with phoneme smearing at the boundaries,
either sub-phonemes or phoneme pairs need to be detected.

After the phonemes are classified, they are chained into words. Depending
on the dictionary, the most probable words are picked. You suppose that
your system will not need this, but the trouble is the consonants: they
carry much less energy than vowels and are much more easily confused. The
dictionary is what lets a second-highest-probability consonant be picked
where it makes a better word. Not only the dictionary but also the phoneme
classifier is language dependent.
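
A toy illustration (the phoneme strings and the three-word dictionary are
invented for the example) of how a dictionary can rescue a weak consonant
that the classifier got wrong:

def edit_distance(a, b):
    """Plain Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

dictionary = {
    "stone": ["s", "t", "ow", "n"],
    "bone":  ["b", "ow", "n"],
    "phone": ["f", "ow", "n"],
}

heard = ["s", "d", "ow", "n"]          # the weak /t/ came out as /d/
best = min(dictionary, key=lambda w: edit_distance(heard, dictionary[w]))
print(best)                            # -> stone

Without the dictionary step, the raw phoneme stream would simply be
re-synthesized with the error in it, which is the mumbling problem.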

I think the human brain works in the same way. Imagine learning a foreign
language: even if you are able to recognize slowly pronounced words, you
will be unable to pick them out of a quickly spoken sentence, because the
words sound different. A human needs considerable training to understand a
language. You could decrease the complexity of the decoder by constraining
the detection to slowly dictated, separate words.

If you simply pick the highest-probability phoneme, you will experience
the comprehension problems of people with hearing loss. Oh yes, I
currently work for a hearing instrument manufacturer (I have nothing to do
with merck.com).

from http://www.merck.com/mmhe/sec19/ch218/ch218a.html
> Loss of the ability to hear high-pitched sounds often makes it more difficult 
> to understand speech. Although the loudness of speech appears normal to the 
> person, certain consonant sounds—such as the sound of letters C, D, K, P, S, 
> and T—become hard to distinguish, so that many people with hearing loss think 
> the speaker is mumbling. Words can be misinterpreted. For example, a person 
> may hear “bone” when the speaker said “stone.”

For me, it would be very irritating to dictate slowly to a system, knowing
it will add some mumbling, without even getting feedback about the errors
the recognizer makes. From my perspective, before good voice rec

[digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-17 Thread pa0r
In my opinion a time delay is not even necessary; it is a matter of
choosing the right compression at the right level.

If I had time I would start the following project:
* choose an existing speech-to-text converter (open source)
* use cbh compression on the text; normally no more than 250 different
words are used anyway, so you can code 1 word in 1 character (a toy
version is sketched below)...
* use pskmail to transmit the compressed text with PSK125 ARQ
* choose an existing open source text-to-speech converter (festival?)

You would save some 5 man-years of development with that approach.

73,

Rein PA0R
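
Not the cbh scheme itself, just a toy codebook coder showing the "one
common word, one byte" idea mentioned above; the word list and the escape
format are invented for the example.

COMMON = ["the", "and", "you", "name", "rst", "qth", "antenna", "rig"]  # extend to ~250 words
CODE = {w: bytes([i + 1]) for i, w in enumerate(COMMON)}                # byte 0x00 reserved as escape

def encode(text):
    out = bytearray()
    for word in text.lower().split():
        if word in CODE:
            out += CODE[word]                      # one byte per common word
        else:
            out += b"\x00" + word.encode() + b" "  # escape + literal word + delimiter
    return bytes(out)

def decode(blob):
    words, i = [], 0
    while i < len(blob):
        if blob[i] == 0:                           # literal word follows
            end = blob.index(b" ", i + 1)
            words.append(blob[i + 1:end].decode())
            i = end + 1
        else:
            words.append(COMMON[blob[i] - 1])
            i += 1
    return " ".join(words)

msg = "the rig and the antenna are ok"
packed = encode(msg)
print(len(msg), "chars ->", len(packed), "bytes:", decode(packed))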






Re: [digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-16 Thread Leigh L Klotz, Jr.
One thing to try might be an encoding that takes more time to send than 
the audio it encodes.  If blank space compression is used, the effect 
can be reduced.  But there is nothing that says the encoding must be 
able to transmit voice in 100% of real time to be interesting or 
useful.
73,
Leigh/WA5ZNU
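
A toy version of the blank-space idea Leigh mentions: runs of low-energy
frames are collapsed to a single (SILENCE, count) token instead of costing
air time frame by frame. The energy threshold and the frames are made up
for the example; a real system would tag silence at the vocoder.

SILENCE = "SIL"

def compress_frames(frames, threshold=0.01):
    """Replace runs of low-energy frames with a (SILENCE, count) token."""
    out, run = [], 0
    for f in frames:
        if sum(x * x for x in f) / len(f) < threshold:   # crude energy gate
            run += 1
        else:
            if run:
                out.append((SILENCE, run))
                run = 0
            out.append(f)
    if run:
        out.append((SILENCE, run))
    return out

speech = [[0.4, -0.3]] * 3 + [[0.0, 0.0]] * 50 + [[0.2, 0.1]] * 2
packed = compress_frames(speech)
print(len(speech), "frames ->", len(packed), "tokens")   # 55 frames -> 6 tokens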


[digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-16 Thread cesco12342000
I did send you a PM.




Re: [digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-16 Thread Patrick Lindecker
Cesco,

RR for all

> At each 1/T it was necessary to send NxL elements of 
>information, which gives the final rate.
This corresponds to your calculation:
>23 * 3 bit = 69 bit per 40ms. 69*25=1725 bps. More than enough for the 
>1400bps codec. I can help you with this codec if needed.
Yes, 1725 bps is a lot compared to 1400 bps or the IVOX system, and yet
with 1725 bps I had no good results.

When you speak of a codec, do you mean a method or a DLL?
Any Internet link or information is welcome.

As for the transmission, I expected to use a sort of Throbx transmission,
but with 20 carriers separated by 25 Hz, without any coding.

73
Patrick 



 



[digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-16 Thread n6ief
My problem is that if I can't find someone to help me get started, the
project will die with my two papers.

Mike n6ief



[digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-16 Thread cesco12342000
> Very low bitrate algorithms exist now. There are a few that operate from 
> 200 bps to 600 bps. The Navy has software called IVOX that gets in this 
> range. 

Can you somehow lay your hands on such a 200 to 600 bps codec?
I'm VERY interested.

The IVOX thing is based on 2400 bps LPC. With silence detection they bring
it down to 1200 bps average. That's not a 200 to 600 bps codec.




Re: [digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-16 Thread W2XJ


Yes it is

Steinar Aanesland wrote:
> Is this the IVOX system:?
> 
> http://downloads.pf.itd.nrl.navy.mil/ivox/
> 
> LA5VNA Steinar



[digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-16 Thread cesco12342000
Hi Patrick,

> At each 1/T it was necessary to send NxL elements of 
>information, which gives the final rate.

I'm not sure I understand your method 100%.

My own tests found that you can transfer comprehensible but unvoiced
speech with 10 carriers, though I did not restrict the number of levels.
For voiced (natural-sounding) speech the addition of a pitch carrier is
necessary. I did those tests with ideas and help from G3PLX.

But this might not be the easiest way. The EZ way would be to use an
existing 1400 bps codec and squeeze the 1400 bps into a 500 Hz wide
multi-carrier QAM-16 or PSK-16 (4 bits per symbol) modulation.

The codec produces 54 bits per 40 ms. 54 / 4 = 13.5, so 14 carriers.
40 ms = 25 baud; proposed carrier spacing 37.5 Hz, BW = 37.5 * 14 = 525 Hz.
500 Hz BW should be possible with a little tweaking.
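
The arithmetic behind that packing, plus a simple split of one 54-bit
codec frame into 4-bit symbol indices (one per carrier). No particular
constellation mapping or FEC is implied; it is only the bit bookkeeping.

import math

CODEC_BITS_PER_FRAME = 54
FRAME_SEC = 0.040
BITS_PER_SYMBOL = 4            # 16-QAM or PSK-16
CARRIER_SPACING_HZ = 37.5

carriers = math.ceil(CODEC_BITS_PER_FRAME / BITS_PER_SYMBOL)   # 14
baud = 1 / FRAME_SEC                                           # 25 symbols/s per carrier
bandwidth = carriers * CARRIER_SPACING_HZ                      # 525 Hz
raw_bps = carriers * BITS_PER_SYMBOL * baud                    # 1400 bps raw

print(f"{carriers} carriers, {baud:.0f} Bd each, ~{bandwidth:.0f} Hz, {raw_bps:.0f} bps raw")

def frame_to_symbols(bits):
    """Split one codec frame into 4-bit symbol indices, one per carrier."""
    bits = bits + [0] * (-len(bits) % BITS_PER_SYMBOL)         # pad the last symbol
    return [int("".join(map(str, bits[i:i + BITS_PER_SYMBOL])), 2)
            for i in range(0, len(bits), BITS_PER_SYMBOL)]

print(frame_to_symbols([1, 0, 1, 1] * 13 + [1, 0]))            # 14 indices for one 40 ms frame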

>With 23 carriers, 8 levels and T=40 ms (which can 
>be send through a 500 Hz channel), it is very difficult 
>to understand a (French) speech.

23 * 3 bits = 69 bits per 40 ms. 69 * 25 = 1725 bps. More than enough for
the 1400 bps codec. I can help you with this codec if needed.

73, Cesco




Re: [digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-16 Thread Steinar Aanesland
Is this the IVOX system?

http://downloads.pf.itd.nrl.navy.mil/ivox/

LA5VNA Steinar








Re: [digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-16 Thread W2XJ

Very low bit-rate algorithms exist now. There are a few that operate from
200 bps to 600 bps. The Navy has software called IVOX that gets into this
range. So you could transmit 16-QAM and hit the 100 Hz goal. The bigger
problem would be getting it to survive propagation and receiver filtering.
One would probably need to use a very narrow-band OFDM scheme. It would be
an interesting but doable experiment. If it worked well, it would be a
very worthwhile mode.
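
Rough numbers behind that remark, assuming about 1 to 1.3 Hz of occupied
bandwidth per baud for a well-shaped single carrier (optimistic on HF, and
before any FEC overhead):

BITS_PER_SYMBOL = 4           # 16-QAM

for codec_bps in (200, 400, 600):
    baud = codec_bps / BITS_PER_SYMBOL
    print(f"{codec_bps} bps codec -> {baud:.0f} Bd "
          f"-> roughly {baud:.0f} to {1.3 * baud:.0f} Hz occupied")
# Only the 200 bps end of that range fits comfortably inside 100 Hz.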






Re: [digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-16 Thread Patrick Lindecker
Hello Cesco,

For information, I have tried to see whether it was possible to transmit
speech through a 500 Hz channel using a digital transmission. I decomposed
the audio spectrum (not with an FFT, but by intercorrelation, so that I
could choose the carriers I wanted) into several carriers and associated a
level with each carrier. Then I tried to decrease the number of carriers N
and the number of levels L, and to increase the intercorrelation duration
T (the duration of an element of speech) as far as possible, up to the
point of a "just comprehensible speech". It is a compression of the
information, up to the maximum possible; above this limit, the speech
cannot be understood. After that, I do the reverse operation (equivalent
to an inverse FFT) and all that remains is to listen to the result.

At each interval 1/T it is necessary to send N values of L levels each
(N x log2(L) bits), which gives the final rate.
This approach is disappointing because you need much more information than
you can transmit through a 500 Hz channel (for example: 23 carriers, 128
levels and T = 40 ms). With 23 carriers, 8 levels and T = 40 ms (which can
be sent through a 500 Hz channel), it is very difficult to understand
(French) speech.

73
Patrick
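
The arithmetic behind those two configurations, taking the rate as
N * log2(L) / T:

import math

def rate_bps(n_carriers, levels, t_seconds):
    """Information rate of N carriers with L amplitude levels refreshed every T."""
    return n_carriers * math.log2(levels) / t_seconds

print(rate_bps(23, 128, 0.040))   # 4025 bps: the setting that gave comprehensible speech
print(rate_bps(23, 8, 0.040))     # 1725 bps: fits a 500 Hz channel, but barely intelligible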


 




  




[digitalradio] Re: digital voice within 100 Hz bandwidth

2007-11-16 Thread cesco12342000
I would be pleased to have a complete list of the phonemes and
corresponding audio files from different speakers. I fear 44 phonemes will
not be enough to do a context-free analysis.

The data rate will be closer to 200 bps I think, since you will have to
transfer a magnitude component along with the phoneme index, and maybe
also a pitch component. Think of the pitch rise in a question; this
feature is important for understanding.
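
A hedged bit budget behind that estimate; the field widths and the assumed
phoneme rate (about 12 phonemes per second of conversational speech) are
guesses for the example, not measurements.

import math

PHONEMES = 44
INDEX_BITS = math.ceil(math.log2(PHONEMES))    # 6 bits for the phoneme index
MAGNITUDE_BITS = 4                             # assumed
PITCH_BITS = 6                                 # assumed
PHONEMES_PER_SECOND = 12                       # assumed conversational rate

bits_per_event = INDEX_BITS + MAGNITUDE_BITS + PITCH_BITS
print(bits_per_event, "bits/phoneme ->",
      bits_per_event * PHONEMES_PER_SECOND, "bps")   # 16 -> 192 bps, near the ~200 bps guess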

The main problem will be the FFT-to-phoneme-table correlation, I think...
but to work on this there must be a phoneme table first.