>    What the last sentence of the paragraph was trying to say is that if
>    there is a large change in the timestamp from one packet to the next,
>    but the sequence number only increments by one, then the receiver
>    knows that no packets were lost and that the gap in time was due to
>    intentional discontinuous transmission.
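The rule in the quoted text can be sketched in a few lines of Python. This is a minimal illustration, not taken from any draft. It assumes in-order arrival and a fixed packetization of `samples_per_packet` timestamp units per packet (say, 160 for 20 ms at 8 kHz). The names `Gap` and `classify_gap` are hypothetical. The arithmetic is modular because RTP sequence numbers are 16 bits and timestamps are 32 bits.

```python
from dataclasses import dataclass

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16-bit and wrap around
TS_MOD = 1 << 32   # RTP timestamps are 32-bit and wrap around


@dataclass
class Gap:
    lost_packets: int    # packets the sender sent that never arrived
    silent_samples: int  # timestamp advance not explained by lost packets (DTX)


def classify_gap(prev_seq: int, prev_ts: int, seq: int, ts: int,
                 samples_per_packet: int) -> Gap:
    """Compare two consecutively *received* RTP packets.

    Hypothetical helper; assumes no reordering and fixed packetization.
    """
    seq_delta = (seq - prev_seq) % SEQ_MOD
    if seq_delta == 0:
        raise ValueError("duplicate sequence number")
    ts_delta = (ts - prev_ts) % TS_MOD
    # If packets were only lost, the timestamp would advance by exactly
    # seq_delta packets' worth of samples; any excess is intentional silence.
    expected_ts_delta = seq_delta * samples_per_packet
    return Gap(lost_packets=seq_delta - 1,
               silent_samples=max(0, ts_delta - expected_ts_delta))


# Large timestamp jump, sequence number +1: no loss, only DTX.
print(classify_gap(100, 16000, 101, 24000, 160))
# → Gap(lost_packets=0, silent_samples=7840)

# Sequence number +4, timestamp consistent: three packets lost, no DTX.
print(classify_gap(100, 16000, 104, 16640, 160))
# → Gap(lost_packets=3, silent_samples=0)
```

Note that this comparison is only possible once the packet *after* the gap has arrived.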

OK, it's good to know that transmitters will be keeping sequence number
consistency. :-) The problem in the above described situation is that the
*receiver* won't know this until it receives the packet after the gap,
which could be a long time, well longer than the depth of the receiver's
jitter buffer. So, when the receiver's jitter buffer underflows, it has
no way of distinguishing between:
1. the transmitter detected silence and just didn't bother to send any
packets, and the receiver should play out silence; and
2. the network is congested, packets are getting lost, and the receiver
should interpolate audio in an attempt to preserve audio quality.
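To make the ambiguity concrete, here is a minimal Python sketch of a playout buffer. It is hypothetical (`JitterBuffer` and its methods are made up for illustration) and assumes packets are scheduled by RTP timestamp with fixed packetization. In both scenarios the receiver holds exactly the same state at the playout deadline, so the underflow looks identical.

```python
from typing import Dict, Optional, Tuple

TS_MOD = 1 << 32  # RTP timestamps are 32-bit


class JitterBuffer:
    """Hypothetical playout buffer keyed by RTP timestamp."""

    def __init__(self, first_ts: int, samples_per_packet: int) -> None:
        self.playout_ts = first_ts
        self.step = samples_per_packet
        self.packets: Dict[int, Tuple[int, bytes]] = {}  # ts -> (seq, payload)

    def put(self, seq: int, ts: int, payload: bytes) -> None:
        self.packets[ts] = (seq, payload)

    def pop_for_playout(self) -> Optional[Tuple[int, bytes]]:
        """Called at each playout deadline; None means underflow."""
        entry = self.packets.pop(self.playout_ts, None)
        self.playout_ts = (self.playout_ts + self.step) % TS_MOD
        return entry


# Case 1: the sender detects silence after seq 100 and sends nothing.
dtx = JitterBuffer(first_ts=16000, samples_per_packet=160)
dtx.put(100, 16000, b"speech")

# Case 2: the sender keeps talking, but seq 101 (ts 16160) is lost.
lossy = JitterBuffer(first_ts=16000, samples_per_packet=160)
lossy.put(100, 16000, b"speech")

for buf in (dtx, lossy):
    assert buf.pop_for_playout() == (100, b"speech")
    # At the next deadline both buffers underflow identically; nothing in
    # the receiver's state says whether to play silence or conceal a loss.
    assert buf.pop_for_playout() is None
```

Only when the next packet finally arrives, possibly long after the buffer has drained, does its sequence number reveal which case it was, and by then the playout decision has already been made.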

I hope you can all agree with me that action 2., above, is common practice
whether or not explicit VAD and CN are being used. Beyond that, many would
say that action 2. is extremely desirable, that the technique used to
accomplish it is a key differentiator of their product(s), and that, for
the general good of VoIP, it should perhaps be considered a recommended
practice.

But the general tone of your comment above, and of your comments elsewhere,
leads me to believe that you (and possibly others) do not support this, and
that you support *always* simply playing out silence if a packet is not
available for playout at the required time (i.e., when the jitter buffer
underflows).

That's fine and dandy as your personal view. But the suggested language of
the section of the RFC that you wrote would "standardize" this behavior in
the face of extensive use of exactly the opposite behavior.

The purpose of the comfort noise coding is *exactly* to allow the receiver
to distinguish between cases 1. and 2., above. True, if packets are lost
they could just as well have been CN packets as not (but if the last packet
that was not lost was a CN packet, the receiver would interpolate comfort
noise). True, CN packets consume more bandwidth than sending nothing (but
less than sending CODEC-encoded near-silence). Ya want to eliminate that
bandwidth at the cost of a potential loss of audio quality when packets are
lost? Fine: don't implement or advertise support for CN.
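A minimal Python sketch of how CN resolves the ambiguity. It assumes CN is carried with the static RTP payload type 13 assigned in RFC 3551 (a dynamic payload type would work the same way) and that the receiver remembers the payload type of the last packet it actually received. `playout_actions` is a hypothetical name; each slot holds the payload type found at a playout deadline, or `None` on underflow.

```python
from typing import List, Optional, Sequence

CN_PT = 13  # assumed: static RTP payload type for comfort noise (RFC 3551)


def playout_actions(slots: Sequence[Optional[int]]) -> List[str]:
    """Decide what to play at each playout deadline (hypothetical sketch)."""
    actions: List[str] = []
    last_pt: Optional[int] = None  # payload type of the last packet received
    for pt in slots:
        if pt is None:
            # Underflow: after a CN packet the sender is in a silence period,
            # so keep generating comfort noise; otherwise assume packet loss.
            actions.append("comfort_noise" if last_pt == CN_PT else "conceal")
        else:
            actions.append("comfort_noise" if pt == CN_PT else "decode")
            last_pt = pt
    return actions


# PT 0 (PCMU) speech, a lost packet, then a CN packet followed by a DTX gap.
print(playout_actions([0, 0, None, 13, None, None, 0]))
# → ['decode', 'decode', 'conceal', 'comfort_noise', 'comfort_noise', 'comfort_noise', 'decode']
```

If the CN packet itself is lost, the receiver cannot tell it from a lost speech packet; in this sketch it simply conceals until the next packet arrives.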

I think before this RFC can go forward, we need to clear this up. The best
we can, and must, say is that if packets aren't received in time, the
result depends on the receiver implementation (interpolate if ya want; play
silence if ya want; play "Yankee Doodle" if ya want; let the marketplace
decide whether it likes interpolation, silence, or "Yankee Doodle"
better). I don't think we can say, imply, or leave open to interpretation
(absent a statement to the contrary) that the intended action when packets
are not received in time is to *always* play silence.
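One way to keep underflow behavior up to the receiver implementation, sketched in Python: the receiver takes its underflow policy as a plug-in. The frame representation (lists of 16-bit PCM samples), the function names, and the crude attenuated-repeat concealment are all illustrative assumptions, not a recommended algorithm.

```python
from typing import Callable, List, Optional, Sequence

Frame = List[int]  # assumed: 16-bit linear PCM samples
UnderflowPolicy = Callable[[Optional[Frame], int], Frame]


def play_silence(last: Optional[Frame], n: int) -> Frame:
    return [0] * n


def repeat_attenuated(last: Optional[Frame], n: int) -> Frame:
    """Crude concealment: replay the previous frame at half amplitude."""
    if last is None:
        return [0] * n
    padded = (last + [0] * n)[:n]
    return [int(s / 2) for s in padded]


def render(frames: Sequence[Optional[Frame]], policy: UnderflowPolicy,
           n: int) -> List[Frame]:
    """Fill each underflow (None) using whatever policy the receiver chose."""
    out: List[Frame] = []
    last: Optional[Frame] = None
    for frame in frames:
        if frame is None:
            frame = policy(last, n)
        out.append(frame)
        last = frame  # with repeat_attenuated, consecutive underflows fade out
    return out


stream = [[100, -100], None, None]
print(render(stream, play_silence, n=2))       # → [[100, -100], [0, 0], [0, 0]]
print(render(stream, repeat_attenuated, n=2))  # → [[100, -100], [50, -50], [25, -25]]
```

Swapping `play_silence` for `repeat_attenuated` (or a "Yankee Doodle" generator) changes the behavior without touching the rest of the receiver, which is all the RFC needs to allow.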


