STEGO(1)                    USER COMMANDS                    STEGO(1)

NAME
     stego - encode binary file as text

SYNOPSIS
     stego -d|-e [ -w outdict ] [ -f dictfile ] [ -l linelen ]
           [ -t textfile ] [ -u ] [ -v ] [ infile [ outfile ] ]

DESCRIPTION
     Steganography is the art and science of secret communication,
     encompassing everything from invisible ink to the latest
     techniques of spread spectrum broadcasting.  While cryptography
     focuses on preventing anyone but the intended recipients from
     being able to read the content of a message, steganography
     attempts to hide the very existence of the message, usually by
     disguising it in another, more innocuous, form.

     Steganography has its place even if cryptographic tools are
     generally accessible and deemed to provide adequate security.
     Often, one wishes not only to protect messages from
     interception, but also to obscure the fact that one is
     exchanging encrypted messages at all.  Simply using encryption
     creates, in the minds of nosy people, a presumption that you're
     doing something wrong.  Why raise suspicion?

     One of the very best ways to hide an encrypted message is to
     embed it as low-level noise in an image or sound file: tools
     exist to accomplish this.  This technique has the disadvantage,
     however, of requiring large files in order to hide substantial
     data without making the message-carrying ``noise'' too obvious.
     Lossy compression cannot be used to reduce the file size, as it
     would destroy the embedded information.

     stego, short for Steganosaurus, provides a compromise which
     transforms any binary file into nonsense text based on a
     dictionary either given explicitly or built on the fly from a
     source document.  Its name, which recalls the large,
     heavily-armoured dinosaur Stegosaurus, was chosen because this
     is a large, slow moving form of steganography which,
     nonetheless, is armoured and robust.

     The output of stego is nonsense, but statistically resembles
     text in the language of the dictionary supplied.
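
     The general idea of turning arbitrary bytes into dictionary
     words can be sketched in Python.  This is only an illustration
     under stated assumptions, not stego's actual algorithm, which
     this page does not specify.  The names usable_bits, encode and
     decode, the 4-byte length prefix, and the choice to use only the
     first 2^k words of the dictionary are all hypothetical.

```python
def usable_bits(words):
    """Bits carried per word when only the first 2**k words are used."""
    return len(words).bit_length() - 1


def encode(data, words):
    """Map bytes to a space-separated string of dictionary words.

    Assumption for this sketch: a 4-byte big-endian length prefix lets
    the decoder discard the zero bits used to pad the last word.
    """
    k = usable_bits(words)
    if k < 1:
        raise ValueError("dictionary needs at least two unique words")
    payload = len(data).to_bytes(4, "big") + data
    bits = "".join(f"{byte:08b}" for byte in payload)
    bits += "0" * (-len(bits) % k)  # pad to a whole number of words
    return " ".join(words[int(bits[i:i + k], 2)]
                    for i in range(0, len(bits), k))


def decode(text, words):
    """Invert encode(); requires the identical, duplicate-free dictionary."""
    k = usable_bits(words)
    index = {word: i for i, word in enumerate(words)}
    bits = "".join(f"{index[word]:0{k}b}" for word in text.split())
    whole = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    raw = bytes(int(bits[i:i + 8], 2) for i in range(0, whole, 8))
    length = int.from_bytes(raw[:4], "big")
    return raw[4:4 + length]


if __name__ == "__main__":
    dictionary = ["alpha", "bravo", "charlie", "delta", "echo"]
    hidden = encode(b"hi", dictionary)
    print(hidden)
    print(decode(hidden, dictionary))
```

     The sketch shows why both parties need identical dictionaries
     without duplicate words: decoding is a plain lookup from word to
     position, so any difference in word order or content yields
     different bytes.
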
     Although a human reader will instantly recognise it as
     gibberish, statistical sampling employed by eavesdroppers to
     detect encrypted messages may consider it to be unremarkable,
     especially if a relatively small amount of such text appears
     within a larger document.  A human snoop would have to read the
     entire document just to discover that it contained some curious
     passages.

     stego makes no attempt, on its own, to prevent your message
     from being read.  It is the equivalent of a book code as large
     as the number of unique words in the dictionary; techniques
     exist to break book codes even without obtaining a copy of the
     code.  Cryptographic security should be delegated to a package
     intended for that purpose such as pgp.  stego can then be
     applied to the encrypted output, transforming it into seemingly
     innocuous text for transmission.

     Text created by stego uses only characters in the source
     dictionary or document (plus standard ASCII punctuation
     symbols), so it can be sent through media, such as electronic
     mail, which cannot transmit binary information.  Unlike files
     encoded with uuencode or pgp's ``ASCII armour'' facilities, the
     result doesn't scream ``suspicious'' at the (very) first glance.

OPTIONS
     -d   Decode: the input, previously created by stego using the
          same dictionary, is decoded to recover the original input
          file.

     -e   Encode: converts the input into an output text file using
          the specified (or default) dictionary.

     -f dictfile
          The specified dictfile is used as the dictionary to encode
          the file.  The dictionary is assumed to be a text file with
          one word per line, containing no extraneous white space,
          duplicate words, or punctuation within the words.  To
          verify that the given dictionary meets these conditions,
          use the -v option.  The default dictionary is the system's
          spelling checker dictionary, if any.

     -l len
          Output lines will be broken so as not to exceed len
          characters, if possible.
          len may be any number up to 4096 characters per line,
          allowing generation of output compatible with word
          processors which write each paragraph as a single long
          line.  The default is 65 characters per line.

     -t textfile
          The named textfile is used to build the dictionary used to
          encode or decode the input file.  The textfile is scanned
          and words, consisting of an ISO 8859/1 alphabetic character
          followed by a sequence of ISO alphanumeric characters, are
          extracted.  Duplicate words are automatically discarded to
          prevent errors in encoding and decoding.

     -u   Print how-to-call information.

     -v   Verbose diagnostic messages are sent to standard error
          chronicling the momentous events in stego's processing of
          the file, and extra verifications of the correctness of the
          dictionary are made.

     -w outdict
          The dictionary, usually built from a text file specified
          with the -t option, is written into the file outdict in a
          form suitable for subsequent use as a dictionary supplied
          with the -f option: all duplicate words and words
          containing punctuation characters are deleted, and each
          word appears by itself on a separate line.  Preprocessing a
          text file into dictionary format allows it to be loaded
          much faster in subsequent runs of stego.

APPLICATION NOTES
     The efficiency of encoding a file as words depends upon the
     size of the dictionary used and the average length of the words
     in the dictionary.  When used in conjunction with existing
     compression and encryption tools, the resulting growth in file
     size is usually acceptable.

     For example, a random extract of electronic mail 32768 bytes in
     length was chosen as a test sample.  Compression with gzip
     compacted the file to 14623 bytes.  It was then encrypted for
     transmission to a single recipient with pgp, which resulted in a
     14797 byte file.  (Even though pgp has its own compression,
     smaller files usually result from initial compression with gzip.
     In this case, pgp alone would have produced a file of 15194
     bytes.)
     Using a 25144 word spelling dictionary, stego translated this
     file into a 71727 byte text file.  Thus, the growth from the
     original text file into the encrypted and encoded output was a
     factor of 2.2.  However, re-compressing this output file with
     gzip, a perfectly unsuspicious act (since anybody can uncompress
     such a file), reduces it to 37290 bytes, a growth of only 14%
     compared to the original text file.

     The ability to recognise gibberish in text is highly language
     dependent.  Using a dictionary in a language different from the
     mother tongue of the suspected eavesdropper may better disguise
     a message, especially if you and your correspondent have a
     credible reason to communicate in that language.  For example,
     if you don't speak French, try using the electronic text of
     Jules Verne's ``De la Terre a la Lune'' available from:

          ftp://ftp.fourmilab.ch/pub/kelvin/etexts/DeLaTerreALaLune.txt

     as your -t dictionary file (be sure to strip the header and
     trailer from this file, leaving only the French language body
     text).

FILES
     If no infile is specified or infile is a single ``-'', stego
     reads from standard input; if no outfile is given, or outfile is
     a single ``-'', output is sent to standard output.  The input
     and output are processed strictly serially; consequently stego
     may be used in pipelines.

BUGS
     Dictionaries (default or specified with the -f option) are not
     checked for duplicate words or words containing forbidden
     punctuation characters unless the -v option is present.  It's a
     good idea to use the -v option when testing a new dictionary.

     The default dictionary is the system spelling checker
     dictionary.  This dictionary is not standard across all systems.
     Users exchanging information between different machines and/or
     operating system versions should make sure they're using
     identical dictionary files to encode and decode messages.

     A comprehensive spelling checker dictionary is less than ideal
     for low-profile steganography.
     For example, most spelling dictionaries include words such as
     ``plutonium,'' ``assassinate,'' ``heroin,'' and ``CIA,'' which
     might tend to draw unwanted attention toward your document.
     It's better to use a dictionary drawn from a text that doesn't
     contain such trigger words.

     The output would be less obviously gibberish if based upon
     template sentence structures filled in by dictionaries which
     specify parts of speech.  This would, of course, require
     different templates for each natural language and dictionaries
     containing part-of-speech information.  It would also preclude
     using simple word lists or documents as the dictionary.

     stego accepts 8-bit ISO 8859/1 characters in dictionary files
     and, if they are present, emits them in its encoded output.  If
     the medium used to transmit the output of stego cannot correctly
     deliver such data, the recipient will be unable to reconstruct
     the original message.  To avoid this problem, either encode the
     data before transmission or use a dictionary which contains only
     characters which can be transmitted without loss.

SEE ALSO
     gzip(1), pgp(1), iso_8859_1(7), uuencode(1)

AUTHOR
     John Walker
     [EMAIL PROTECTED]
     http://www.fourmilab.ch/

     This software is in the public domain.  Permission to use, copy,
     modify, and distribute this software and its documentation for
     any purpose and without fee is hereby granted, without any
     conditions or restrictions.  This software is provided ``as is''
     without express or implied warranty.
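
EXAMPLES
     The word-extraction rule described for the -t option (an
     alphabetic character followed by alphanumeric characters,
     duplicates discarded) and the 8-bit caveat under BUGS can be
     sketched in Python.  This is a hedged illustration, not stego's
     own code: the function names extract_words, ascii_only and
     to_dictfile are hypothetical, Python's Unicode isalpha() and
     isalnum() tests stand in for the ISO 8859/1 character classes,
     and treating case as significant is an assumption.

```python
def extract_words(text):
    """Return unique words in first-seen order.

    A word starts with an alphabetic character and continues with
    alphanumeric characters, mirroring the rule given for -t.
    """
    words, seen, current = [], set(), ""
    for ch in text + " ":  # trailing space flushes the last word
        if current and ch.isalnum():
            current += ch
        elif not current and ch.isalpha():
            current = ch
        else:
            if current and current not in seen:
                seen.add(current)
                words.append(current)
            current = ""
    return words


def ascii_only(words):
    """Drop words with 8-bit characters, for media that are not 8-bit clean."""
    return [word for word in words if word.isascii()]


def to_dictfile(words):
    """Format words one per line, the layout the -f option expects."""
    return "\n".join(words) + "\n"


if __name__ == "__main__":
    sample = "The cat saw the cat; x1 café."
    print(to_dictfile(ascii_only(extract_words(sample))), end="")
```

     Filtering to ASCII shrinks the dictionary, so it trades some
     encoding efficiency for safe passage through 7-bit channels.
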