John Young wrote - 
>
>In it the CDC 7600 is cited as having the pop-count
>"facility for nuclear physics applications programming,
>etc." That "etc." is provocative in the light of what Jitze 
>and Steve have written.
>

A correspondent asked me if I could expound some more
on what I know about the use of the pop-count instruction.

The paper cited earlier in this thread by Steve Bellovin
tells you far more than I can, as well as a paper that
it cites in turn about "A Programmable Plaintext Recognizer".

However, and without boring through detail, I can perhaps 
give a flavor of how it was used  a long time ago and in a 
manner that is interesting from a cryptanalysis perspective
- and given that it jibes with similar discussion in the
papers mentioned above.

In one of the first "interactive" environments supported
on the 6600 - the Intercom time-sharing system, we had
a utility that allowed one to search a text file for
some arbitrary character string - much as any text editor
allows today.

This was implemented approx. as follows. Each line of text
in the file had appended to it a 60-bit word, the contents
of which were a bit vector. If there were one or more
of the letter "A" in the line, bit 1 was set, if there
were one or more of the letter "B" in the line, then
bit 2 was set, etc... thus setting some sub-set of 36
bits coresponding to the characters A..Z and 0..9. Let's
call this word appended to each line its "thumbprint".

There are 24 bits left over - more about that later.

Now to search for any line containing some search
string, all you have to do is create a similar thumbprint
for the search string, and loop through the thumbrints
of the lines in the text file, checking each against your
search thumbrint. "Checking" in this sense means doing a 
logical "AND" between the two to extract matching bits, 
and then counting how many bits matched - hence the pop-count 
instruction.

A "hit" would of course only be tentative. It does not
tell you that the line in question contains the actual string
you are looking for - only that the line contains somewhere some 
(count of) the characters contained in your search string.

This little loop can however be coded in an extremely tight 
manner to fit in the instruction stack (equivalent to instruction 
cache in today's technology) thus allowing a very fast winnowing
of interesting text lines that can then be examined more closely.

The actual instructions used within the loop also made good use
of the parallelism offered by the separate functional units
of the 6600 cpu - and in later machines - pipelining.

The extra 24 bits were used in other applications (of similar
ilk) to reflect not single characters, but the occurrence of
certain digraphs in the text line. Thus bit 40 in the the 
thumbprint might indicate whether a given "interesting" digraph 
did or did not occur in a given line.

Later the border between the 36 bits and 24 bits within the
60-bit word was moved and successively tuned. Certain bits 
of the original 36 were commoned up - i.e. a particular bit
might indicate the presence of any of  Z, X, or Q because
maybe they are relatively uncommon. This would free up two 
more bits for use as digraph flags - and so on.

Then, in another place, the whole mechanism was turned on
its head. It wasn't used to *look for* a certain character
string, but rather to *reject* certain lines of text as 
being uninteresting. (Others might use the word "implausible"
here.) Furthermore, the concept of a thumbprint per "line of 
text" was refined to support differing granularity of the
thumbprinting, and of course the size of the thumbprint
itself, as 64 bits became a more common word-size.

Some 30 years later, I find the paper cited by Steve Bellovin
on "Probable Plaintext Cryptanalysis" to be extrememely
interesting - in particular it cites another paper
about "A Programmable Plaintext Recognizer". This is
the only open documentation I have ever come across that
discusses this kind of mechanism in any detail. 

Jitze



Reply via email to