> The basic notion of stego is that one replaces 'noise' in a document with
> the stego'ed information. Thus, a 'good' stego system must use a crypto
> strategy whose statistical properties mimic the noise properties of the
> carrying document. Our favorite off the shelf crypto algorithms do *not*
> have this property -- they are designed to generate output that looks
> statistically random. So, can't we detect the presence of stego'ed data by
> looking for 'noise' in the document that's *too* random?

Yes, and no.

There is no particular difficulty in altering the statistics of encrypted
data to match whatever distribution is necessary for the noise.  So there
is no reason a priori to expect that stego'd data would be "too random".
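
As a rough illustration of how such reshaping can be done, here is a
minimal sketch of one known approach: run the ciphertext bits through a
Huffman *decoder* built for the target distribution, so that uniform bits
come out as symbols with roughly the target frequencies.  The receiver
Huffman-encodes the symbols to get the bits back.  The three-symbol
"noise" distribution and all function names are assumptions made up for
the example, and a Huffman tree can only hit probabilities that are
powers of 1/2, so this is an approximation rather than an exact match.

```python
import heapq
import random
from itertools import count

def build_huffman(probs):
    """Return {symbol: bitstring} for a dict of symbol probabilities.
    Assumes at least two symbols."""
    tick = count()  # tie-breaker so the heap never compares dicts
    heap = [(p, next(tick), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tick), merged))
    return heap[0][2]

def mimic_encode(bits, code):
    """Huffman-decode uniform bits into symbols shaped like the target.
    Returns (symbols, leftover bits that did not complete a symbol)."""
    decode = {b: s for s, b in code.items()}
    out, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in decode:
            out.append(decode[cur])
            cur = ""
    return out, cur

def mimic_decode(symbols, code):
    """Recover the original bits by Huffman-encoding the symbols."""
    return "".join(code[s] for s in symbols)

# Hypothetical noise model: symbol 'a' half the time, 'b' and 'c' a quarter each.
target = {"a": 0.5, "b": 0.25, "c": 0.25}
code = build_huffman(target)

rng = random.Random(0)  # stands in for the output of a cipher
cipher_bits = "".join(rng.choice("01") for _ in range(40000))

symbols, leftover = mimic_encode(cipher_bits, code)
recovered = mimic_decode(symbols, code) + leftover
freq_a = symbols.count("a") / len(symbols)
```

Here `recovered` equals `cipher_bits`, and `freq_a` comes out close to
the 0.5 that the model asks for.  An arithmetic decoder would remove the
powers-of-1/2 restriction at the cost of more code.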

The real problem arises in constructing an accurate noise model.
Whatever model is built can be matched, but there is always the worry
that the model is not quite right.  In particular, if the adversary can
spend more to construct an accurate noise model than the steganographer,
then he can detect the stego'd data because its statistics will differ
in subtle ways from natural data.

When judging a stego scheme, it is prudent to assume that the adversary
has more money to spend than the person hiding the data.  He may well
be a large government agency or a private bureaucracy which is looking
for illicit data, with a budget far bigger than that of the people who
need to hide from him.

All is not necessarily lost; it becomes a matter of sufficient accuracy
for the purpose.  In order to distinguish the stego data from natural
data the attacker may need to acquire a considerable volume of messages.
The stego noise model only needs to be accurate enough to make the data
indistinguishable from noise for the specific volume of data being
embedded.  If this threshold is reached, then even if the attacker's
model is better the stego can still succeed.
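
To put a rough number on "considerable volume", one common rule of thumb
is that telling two distributions apart takes on the order of
ln(1/error) / D samples, where D is the Kullback-Leibler divergence
between them.  The sketch below applies that heuristic; the two-symbol
distributions and the 1% error target are made-up assumptions, and the
result should be read as an order of magnitude, not a guarantee.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats.
    Assumes q is nonzero wherever p is nonzero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def samples_to_distinguish(natural, stego, error=0.01):
    """Heuristic sample count: ln(1/error) divided by the divergence."""
    d = kl_divergence(natural, stego)
    if d == 0:
        return math.inf  # identical models: no volume of data suffices
    return math.ceil(math.log(1 / error) / d)

# Hypothetical case: natural noise is 50/50, the stego model is slightly off.
natural = [0.50, 0.50]
rough_model = [0.51, 0.49]
finer_model = [0.501, 0.499]

n_rough = samples_to_distinguish(natural, rough_model)
n_finer = samples_to_distinguish(natural, finer_model)
```

Under these assumptions `n_rough` is in the tens of thousands of samples
and `n_finer` is roughly a hundred times larger, which is the point made
above: a message well below that size stays under the attacker's
detection threshold for this particular statistic.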

The greater danger is a subtle but catastrophic failure of the noise
model, as for example when a new statistical analysis is used which
the steganographer did not consider, perhaps some kind of higher order
correlation.  The well funded attacker can afford to spend time searching
for such statistics, and if he is lucky, the game may be over before it
has begun.
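
A toy example of this kind of failure, assuming a made-up "natural"
source whose bits tend to repeat: the stego bits below match its 50/50
first-order statistics, yet a simple second-order check (how often a bit
equals its predecessor) separates the two at once.  The Markov source,
its 0.7 repeat probability, and the function names are all illustrative
assumptions.

```python
import random

def markov_bits(n, stay, rng):
    """Bits where each one repeats its predecessor with probability `stay`."""
    bits = [rng.getrandbits(1)]
    for _ in range(n - 1):
        prev = bits[-1]
        bits.append(prev if rng.random() < stay else 1 - prev)
    return bits

def ones_fraction(bits):
    """First-order statistic: fraction of 1 bits."""
    return sum(bits) / len(bits)

def repeat_fraction(bits):
    """Second-order statistic: fraction of bits equal to the previous bit."""
    return sum(a == b for a, b in zip(bits, bits[1:])) / (len(bits) - 1)

rng = random.Random(1)
natural = markov_bits(20000, 0.7, rng)                # correlated noise
stego = [rng.getrandbits(1) for _ in range(20000)]    # independent bits

first_order = (ones_fraction(natural), ones_fraction(stego))
second_order = (repeat_fraction(natural), repeat_fraction(stego))
```

Both entries of `first_order` sit near 0.5, so a histogram test sees
nothing wrong, while `second_order` is near 0.7 for the natural bits and
near 0.5 for the stego bits.  A steganographer who modeled only the
histogram loses to an attacker who thought to check adjacent pairs.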

These considerations are the real art and science of steganography.
Plunking data into LSBs is grade school stuff, analogous to ROT13 as a
cipher.  True steganography goes far beyond such elementary substitutions.
