I recently came across a real-world use of steganography which hides extra
data in the LSB of CD audio tracks to allow (according to the vendor) the
equivalent of 20-bit samples instead of 16-bit and assorted other features.
According to the vendors, "HDCD has been used in the recording of more than
5,000 CD titles, which include more than 250 Billboard Top 200 recordings and
more than 175 GRAMMY nominations", so it's already fairly widely deployed.
From http://www.hdcd.com/partners/proaudio/overview.html:

[...]

Hidden Code Addition/Output Dither/Quantization

The final step in the reduction to 16 bits is to add high-frequency weighted
dither and round the signal to 16-bit precision. The dither increases in
amplitude in the frequency range of 16 to 22.05 kHz, leaving the noise floor
flat below 16 kHz where the critical bands of hearing associated with tonality
occur. As part of the final quantization, a pseudo-random noise hidden code is
inserted as needed into the least significant bit (LSB) of the audio data. The
hidden code carries the decimation filter selection and Peak Extend and Low
Level Range Extend parameters. Inserted only 2-5 percent of the time, the
hidden code is completely inaudible, effectively producing full 16-bit
undecoded playback resolution. The result is an industry-standard 44.1-kHz,
16-bit recording compatible with all CD replication equipment and consumer CD
players.

[...]

The paper describing the process is available under the somewhat misleading
name http://www.hdcd.com/partners/proaudio/AES_Paper.pdf.  The description of
the stego en/decoding process is on p.15 (it's a rather long excerpt, but it's
interesting stuff):

As part of the final quantization, a hidden code side channel is inserted into
the LSB when it is necessary for the encoder to inform the decoder of any
change in the encoding algorithm. It takes the form of a pseudo-random noise
encoded bit stream which occupies the least significant bit temporarily,
leaving the full 16 bits for the program material most of the time. Normally,
the LSB is used for the command function less than five percent of the time,
typically only one to two percent for most music. Because the hidden code is
present for a small fraction of the time and because it is used as dither for
the remaining 15 bits when it is inserted, it is inaudible. This was confirmed
experimentally with insertion at several times the normal fraction of time.

[...]

The mechanism which allows insertion of commands only when needed consists of
encapsulating the command word and parameter data in a "packet". A
synchronizing pattern is prepended to the data and a checksum is appended. The
resulting packet is then scrambled using a feedback shift register with a
maximal length sequence and inserted serially, one bit per sample, into the
LSB of the audio data. The decoder sends the LSBs of the audio data to a
complementary shift register to unscramble the command data. A pattern
matching circuit looks for the synchronizing pattern in the output of the
descrambler, and when it finds it, it attempts to recover a command. If the
command has a legal format and the checksum matches, it is registered as a
valid packet for that channel. The arrival of a valid packet for a channel
resets a code detect timer for that channel. If both channels have active
timers, then code is deemed to be present and the filter select data is
considered valid immediately. However, any command data which would affect the
level of the signal must match between the two channels in order to take
effect. The primary reason for this is to handle the case where an error on
one channel destroys the code. In such a case, the decoder will mistrack for a
short time until the next command comes along, which is much less audible than
a change in gain on only one channel, causing a shift in balance and lateral
image movement. If either of the code detect timers times out, then code is
deemed not to be present, and all commands are canceled, returning the decode
system to its default state. If the conditions on the encoder side are not
changing, then command packets are inserted on a regular basis to keep the
code detect timers in the decoder active and to update the decoder if one
starts playing a selection in the middle of a continuous recording.
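
(An aside from me, not from the paper: the framing described above can be
sketched in a few lines of Python. The sync pattern, payload length and
checksum below are hypothetical placeholders, since the excerpt gives none of
the real values, and the scrambling step is left out to keep the sketch short.)

```python
# Hypothetical framing constants: the excerpt does not give the real values.
SYNC = [1, 1, 1, 0, 1, 0, 0, 1]   # assumed 8-bit synchronizing pattern
PAYLOAD_LEN = 8                   # assumed command + parameter length
CHECK_LEN = 4                     # assumed checksum length


def checksum(payload):
    """Toy checksum (an assumption): count of 1 bits modulo 16, MSB first."""
    c = sum(payload) % (1 << CHECK_LEN)
    return [(c >> k) & 1 for k in range(CHECK_LEN - 1, -1, -1)]


def build_packet(payload):
    """Prepend the sync pattern and append the checksum."""
    assert len(payload) == PAYLOAD_LEN
    return SYNC + list(payload) + checksum(payload)


def embed(samples, packet, start):
    """Write the packet serially, one bit per sample, into the LSBs."""
    out = list(samples)
    for i, bit in enumerate(packet):
        out[start + i] = (out[start + i] & ~1) | bit
    return out


def find_packets(samples):
    """Scan the LSB stream for sync; accept only if the checksum matches."""
    lsb = [s & 1 for s in samples]
    total = len(SYNC) + PAYLOAD_LEN + CHECK_LEN
    found = []
    for i in range(len(lsb) - total + 1):
        if lsb[i:i + len(SYNC)] != SYNC:
            continue
        payload = lsb[i + len(SYNC):i + len(SYNC) + PAYLOAD_LEN]
        check = lsb[i + len(SYNC) + PAYLOAD_LEN:i + total]
        if checksum(payload) == check:
            found.append((i, payload))
    return found


if __name__ == "__main__":
    audio = [2 * n for n in range(-50, 50)]   # stand-in samples, all LSBs 0
    command = [0, 1, 0, 1, 1, 0, 0, 0]
    marked = embed(audio, build_packet(command), start=30)
    print(find_packets(marked))
```

Each marked sample differs from the original by at most one LSB, which is the
whole point of the scheme: an undecoded player just sees slightly different
dither.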

Since the decoder is constantly scanning the output of the de-scrambler shift
register for valid command packets even when none are present, the possibility
exists that there may be a false trigger. For audio generated by the encoder,
this possibility is eliminated in the absence of storage and transmission
errors by having the encoder scan the LSB of the audio data looking for a
match. If a match to the synchronizing pattern is found, the encoder inverts
one LSB to destroy it.
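
(Another aside from me: the encoder-side check can be sketched as a single
left-to-right pass over the LSBs. The pattern below is a hypothetical
stand-in, and for simplicity this sketch matches against the raw LSBs rather
than a descrambled stream, which is an assumption on my part.)

```python
SYNC = [1, 1, 1, 0, 1, 0, 0, 1]   # hypothetical pattern, not the real one


def contains_sync(samples, sync=SYNC):
    """True if the LSB stream contains the sync pattern anywhere."""
    lsb = [s & 1 for s in samples]
    n = len(sync)
    return any(lsb[i:i + n] == sync for i in range(len(lsb) - n + 1))


def suppress_false_sync(samples, sync=SYNC):
    """Invert one LSB in every window that accidentally matches the pattern.

    Flipping the last bit of a match cannot affect any window that starts
    earlier, so one left-to-right pass leaves no match behind.
    """
    out = list(samples)
    n = len(sync)
    for i in range(len(out) - n + 1):
        if [s & 1 for s in out[i:i + n]] == sync:
            out[i + n - 1] ^= 1   # invert a single LSB to destroy the match
    return out


if __name__ == "__main__":
    audio = [10, 3, 5, 7, 2, 9, 4, 6, 1, 12, 14]   # LSBs happen to spell SYNC
    cleaned = suppress_false_sync(audio)
    print(contains_sync(audio), contains_sync(cleaned))
```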

Modern digital storage and transmission media incorporate fairly sophisticated
error detection and correction systems. Therefore, we felt that only moderate
precautions were necessary in this system. The most likely result of an error
in the signal is a missed command, which can result in a temporary
mistracking of the decoding, as mentioned above. Given the low density of command
data, and the small changes to the signal which the process uses, these errors
are seldom more audible than the error would be in the absence of the process.
The chances of a storage error being falsely interpreted as a command are
extremely small.

For material not recorded using the encoder, a small probability for a false
trigger does exist. Given a moderate length for the scrambling shift register
so that its mapping behaves in a noise-like fashion and a choice of
synchronizing pattern which avoids patterns likely to appear in audio data
with a higher than average probability, susceptibility to false triggers can
be made arbitrarily small by increasing the length of the part of the packet
requiring a match. In the case of the current system, the combination of the
synchronizing pattern with the bit equivalence for all valid commands plus
check sum results in a required match equivalent to 39 sequential bits. For a
stereo signal, in which a match must occur in both channels within a one
second interval and the commands in both channels must specify the same gains,
this amounts to an expectation of one event in approximately 150 million years
of audio.
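
(An aside from me: a back-of-the-envelope check of the order of magnitude. It
assumes the LSBs of unencoded audio behave like independent fair coin flips
at 44.1 kHz, and it models only the 39-bit match and the one-second
coincidence window. It does not model the requirement that both channels
specify the same gains, so it should come out well below the paper's figure.)

```python
SAMPLE_RATE = 44_100                      # CD sample rate, samples/s/channel
MATCH_BITS = 39                           # from the paper excerpt
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def false_match_rate(sample_rate=SAMPLE_RATE, match_bits=MATCH_BITS):
    """Expected false matches per second on one channel (random-LSB model)."""
    return sample_rate * 2.0 ** -match_bits


def years_between_single_channel_matches():
    """Mean time between false matches on one channel, in years of audio."""
    return 1.0 / false_match_rate() / SECONDS_PER_YEAR


def years_between_stereo_coincidences(window_s=1.0):
    """Mean time between matches in both channels within window_s seconds.

    Approximation: rate of a match on one channel, times the probability
    that the other channel also matches inside the window.
    """
    r = false_match_rate()
    return 1.0 / (r * (r * window_s)) / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"one channel : {years_between_single_channel_matches():.3g} years")
    print(f"both, 1 s   : {years_between_stereo_coincidences():.3g} years")
```

Under these assumptions a single channel false-matches every few months of
audio, while the stereo coincidence alone pushes that out to millions of
years. The paper's 150 million year figure additionally requires the gains in
both channels to agree.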

The scrambling operation uses a feedback shift register designed for a maximal
length sequence in which data taken from taps in the register are added using
modulo two arithmetic, equivalent to an 'exclusive or' operation, and fed back
to the input of the register. For a given register length there are certain
configurations of taps which will produce a sequence of one and zero values at
the output that does not repeat until 2^N - 1 values have emerged, where N is the
length of the shift register. This corresponds to the number of possible
states of the shift register minus one illegal state, and is called a maximal
length sequence. Such an output sequence has very noise-like properties and,
in fact, is the basis of some noise generators. We use the noise-like behavior
of the generator to scramble the command signals by adding them modulo two to
the input of the shift register, as for example in Figure 7a. This has the
advantage that a second similar shift register with taps in the same places
but with only feed forward addition modulo two (Figure 7b) will reproduce the
original input sequence when fed with the output of the first one. The fact
that the decode side has no feedback means that the initialization
requirements are limited to having N input samples prior to the beginning of
decoding, which means that the decoder will "lock up" very quickly. In this
scheme, the presence of a bit error anywhere in the length of a packet plus
initialization sequence will completely scramble the data, preventing
recovery. However, in practice, this has not been a problem for reasons
described above.
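
(A last aside from me: the feedback/feed-forward pair is easy to try out. The
register length of 7 and the taps at delays 6 and 7 are my own choice for
illustration; the excerpt does not give the values HDCD actually uses.)

```python
TAPS = (6, 7)      # hypothetical tap delays; not taken from the paper
N = max(TAPS)      # register length


def _tap_xor(reg, taps):
    """Modulo-two sum of the tapped register cells (reg[0] is the newest)."""
    fb = 0
    for t in taps:
        fb ^= reg[t - 1]
    return fb


def scramble(bits, taps=TAPS):
    """Feedback form: each output is the input XOR taps of past outputs."""
    reg = [0] * max(taps)
    out = []
    for b in bits:
        y = b ^ _tap_xor(reg, taps)
        out.append(y)
        reg = [y] + reg[:-1]          # shift the new output into the register
    return out


def descramble(bits, taps=TAPS):
    """Feed-forward form: each output is the input XOR taps of past inputs."""
    reg = [0] * max(taps)
    out = []
    for y in bits:
        out.append(y ^ _tap_xor(reg, taps))
        reg = [y] + reg[:-1]          # shift the received bit in, no feedback
    return out


def lfsr_period(taps=TAPS):
    """Period of the free-running register (zero input, nonzero start)."""
    start = [1] + [0] * (max(taps) - 1)
    reg, steps = list(start), 0
    while True:
        reg = [_tap_xor(reg, taps)] + reg[:-1]
        steps += 1
        if reg == start:
            return steps


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1]
    sent = scramble(data)
    print(descramble(sent) == data)            # round trip
    print(descramble(sent[5:])[N:] == data[5 + N:])   # joins mid-stream
    print(lfsr_period(), 2 ** N - 1)
```

Because the descrambler has no feedback, it recovers the input once N
received bits have passed through it, whatever state it started in. In this
small sketch a single flipped bit in the scrambled stream shows up once
directly and once more per tap in the output, which is enough to break a
checksum.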

Peter.

---------------------------------------------------------------------
The Cryptography Mailing List