Cryptography-Digest Digest #717

Digestifier Mon, 19 Feb 2001 12:57:15 -0800
Cryptography-Digest Digest #717, Volume #13      Mon, 19 Feb 01 16:13:01 EST

Contents:
  A seriously different cipher concept (long) ("Paul Pires")

----------------------------------------------------------------------------

From: "Paul Pires" <[EMAIL PROTECTED]>
Subject: A seriously different cipher concept (long)
Date: Mon, 19 Feb 2001 12:33:32 -0800

The following is a description of a cipher concept I
have been working on for quite awhile. It's reason
for existence is to explore a different method that
has the potential for being unusually fast. All rights
are preserved. Some or all of this material may be
covered by patents held by the author or unknown
others. No presentment is made that the methodology
contained herein is suitable, safe or unencumbered for
commercial use.

The complete description in PDF format including
source code and test results can be found at:

http://www.outsorcery.com/Pires/pires.html

An executable suitable for running under Windoze
can also be downloaded. This program randomly
re-keys the cipher, encrypts 10Megabytes of
zeros and writes the output to a file "Data.bin" in
a format suitable for testing with diehard or other
analysis packages.

I would appreciate any comments or observations
that any might have on this material.
Thank you,

Enefinef:
Paul Pires 02/10/01
Section 1, Introduction:

     This paper is meant to explore the advantages and disadvantages of a
structure that is driven by internal states like a stream cipher but which
processes information in block fashion. The discussion is presented in three
parts, General description, detailed description and appended support
information containing example source codes and reports on randomness
testing.
     An objective of this discussion is to show a very fast method of stream
encryption that avoids many of the draw-backs of simple XOR stream
ciphers such as known plaintext attacks and bit-flipping.

Contents:
Section 2, General description & specifications:
Section 3, Brief walkthrough:
Section 4, Detailed walkthrough:
Section 5, Arguments for security:
Section 6, Required support:
Section 7, Design decisions & justifications:
Section 8, Variations: (Omitted)
Section 9, Conclusion:
Section 10, Appendix: (Omitted)

Section 2, General description & specifications:

     Although described as a stream cipher, this method implements the algorithm in
block "cycles". Each cycle, the algorithm completely processes the text for an
entire "block", 2048 bits in this case. The algorithm runs once per cycle,
stepping through each element of the block. There are no recursive "rounds", one
pass and the block is encrypted or decrypted.

The specifications of this proposed cipher are as follows:
· Encrypts / decrypts at 5.4 clocks per byte (8 bits). (1)
· Has not failed extensive random testing. (2)
· Naturally authenticates and resists bit flipping attacks.
· Uses only simple bitwise operations, SHIFT, AND, OR, XOR.
· Simple key set-up and unique IV implementation.
· Variable block and key size, easily implemented.

    Like a conventional stream cipher, this cipher maintains a key dependent
internal state, it uses a portion of that state to encrypt-decrypt text and then
modifies the internal state in a pseudo random way before repeating these
operations on the next increment of text.
    Unlike a conventional stream cipher, this method maintains it's internal state
in the form of two different tables of values, each treated separately according
to different rules. It is the cooperation and interplay between these two tables
that forms the PRNG function and also accounts for the desirable properties
cited above. It is this interplay that this paper is meant to address. Each of
the two state tables is simple and ineffective in its construction and operation
when viewed alone but together, they support and cooperate so that the
characteristics of one complement and support the other.
    Two unusual concepts are presented here. The first oddity is the notion that a
formidable process (a Cryptographically secure PRNG) can be built from two
simplistic mechanisms that are designed to cooperate advantageously. The second
is that a simple XOR stream cipher can be made to have many of the advantages of
a block cipher, not by adding complex functions, but merely by re-arranging the
interaction of existing simple actions.

Section 3, Brief walkthrough:

    The major components of this cipher are the two state tables (A&B), tables for
temporarily storing the plaintext - ciphertext and a static table defining a
selection of different relative changes that can be imposed on state A. Before
getting to the walkthrough of a cycle, a description of these components might
be helpful.

· State table A, a 64-entry table initialized with the values 0-63 in a known
starting order. No omissions, no duplications.
· State table B, a 64-entry table sized to operate on the same unit size values
as the plaintext and ciphertext tables, initialized with a known, balanced set
of values.
· Ciphertext and plaintext tables, 64 entry tables, in this case, 32 bit units.
Total size is 2048 bits for each table.

    The following is a brief description of the cipher during one cycle, encrypting
one 2048-bit block and modifying its internal state tables. For the purpose of
describing this, the process will be described as if the keyed state was already
established and the cipher was examined at some point during encryption. In
reality, a key or passphrase is processed and an IV is used to further distance
the keyed state specific to each encryption session. These vital components will
be described but after the basic "engine" is described. A quick walkthrough will
be done first with a detailed discussion of the effects of each step.

The steps taken during one block or cycle are:

 1, Process the input text against states A&B to make output.
 2, Update or change State B.
 3, Extract a "Change argument" from this changed B.
4, Use this argument to select which of 64 different transpositions to
inflict on state A and then execute it.

 Step 1, Encryption:

 Step through the sixty-four addresses in the plaintext in an order modified by
the current State A values, XOR that plaintext value with a State B value and
store the result in a ciphertext address modified by a different State A value.

 Step 2, Update the State B table:

 Circularly rotate the entire B table 5 bits to the left and XOR this shifted
table with the plaintext and the ciphertext according to a different dynamic
mapping.

 Step 3, Select a state A change argument:

 Find a B value from an address chosen by an A table value which is chosen
according to the mod 64 value of the cycle counter. Condense this number to a
six-bit (0-63) value and save it as the change argument.

 Step 4, Update A.

Change the location of all of the values in state A according to one of
64 different relative mappings selected by the change argument of step 3.

     That's really all there is to it. The following is a more detailed description
of the process with a discussion of the purpose of each step and an explanation
of the required modification to successfully decrypt.

Section 4, Detailed walkthrough:

 Step 1, Encryption (decryption):

Counters i = 0 and j = 32 are established and incremented by 1 each pass.

For each of the 64 elements of the tables:

C[A[j mod 64]] = B[i] XOR P[A[i]];

To decrypt, a small change needs to be made as Follows:

P[A[i]] = B[i] XOR C[A[j mod 64]];

    While simple, this line of code does nasty things. If we assume (for the moment)
that state A and state B are constantly changing in an apparently random way
then:

1, The value of each state B unit used remains unknown since each ciphertext
unit could be related to any plaintext unit and for each probable examined,
there will be a different implied state B value where only one is actually
correct.

2, The location of the value in B that was actually used to encrypt a particular
ciphertext character is unknown.

    Transposing both the input (when fetched) and the output (when stored) with an
unmapped stream of values from B implicitly maps the observed effect of the B
values to unpredictable locations. Consider that even if the locational
displacement between a ciphertext and a plaintext is known, all that is known of
State A is that two values from it cooperated to produce the dislocation. The
location of that pair in A is unknown.
    All that can be inferred about the corresponding state B value is that it was
selected as a function of the true location of this pair of A values, not the
magnitude of the values themselves. All of the 64 values in state A could
produce the correct answers since any single A value used to effect any move
(from an alignment with a B value) can be paired with another state A value to
satisfy the observed total dislocation. All outcomes are equiprobable and are
(without knowledge of A & B) unpredictable and irreversible.

 Step 2, Update the State B table:

XOR 5-bit circularly rotated state B values with the re-mapped plaintext and the
ciphertext blocks. The last 32 bit piece of this new B also has the master counter
value "x" XOR'ed in.

     This XOR re-combination would normally be nuts. In a conventional cipher, XOR
of plaintext and ciphertext would reproduce the state values that made them.
Combining this back in with a statically rotated image of itself would add
trivial complexity. In this cipher, the ciphertext is displaced from the plaintext it 
was
made from and the order of the B values used is scrambled. XOR
of this plaintext with its ciphertext produces a set of results wholly different
from the state values that made them. The plaintext that is chained back in is
dynamically re-mapped in a state A dependent way that is different from that
used to encrypt. An adversary in possession of plaintext and ciphertext still
does not know the actual order of values re-combined to modify state B for the
next cycle nor the actual values in state B from the last cycle.
    The requirement for the evolution of state B is that it produce a different,
equiprobable table of values to be operated on by state A in the next cycle. No
presentment of this as being formidable or even sane is being made here. A
discussion of the value added from the interaction of the two state tables will
be made after this section and it is hoped that it will put many objections to
rest.

 Step 3, Select a state A change argument:

The master cycle counter is represented by the variable x;
The change argument is stored in the variable Tsel.

Tsel = B[A [x mod 64]];
for (i=1; i<6; i++) Tsel = Tsel XOR (Tsel >>(i*6));

Tsel = Tsel mod 64;   "Reduce Tsel to a 6-bit (0-63) value"

     While inelegant, this routine does do what is required. It extracts a value
from state B that is relative to the current state A and the current cycle
counter value and folds that value upon itself to produce a "messy" 6-bit (0-63)
argument.

 Step 4, Update the state A table according to Tsel;

"DW_A" is a 32-bit pointer,
"BT_A" is an 8-bit pointer,
"&" denotes "set to address"
"swap" is a 32 bit scratchpad variable.

The next two lines re-set the pointers to an address of state A that
is shifted the number of bytes dictated by the top two bits of the
6-bit change argument Tsel from above.

DW_A = (DWORD*) &STATE_A[Tsel>>4];
BT_A = &STATE_A[Tsel>>4];

Tsel = Tsel mod 16;    "Reduce Tsel to a four bit (0-15) value."

"This shifts the values in A around according to Tsel, the Trans table and
the byte skew accomplished by the pointer re-assignment."

for (i=0, j=1;  i<16;  i++, j++) DW_A[Trans[Tsel] [ i ]] = DW_A[Trans[Tsel][ j ]];

"Shift around 4 bytes in state A."

swap = BT_A[3];
DW_A[0] = (DW_A[0]<<8) OR swap;

"Duplicate the first two 32 bit chunks at the array end."

DW_A[16] = DW_A[0]; DW_A[17] =  DW_A[1];

     In this routine state table A (although a table of 64 bytes) is handle as a
table of 16 32-bit values (DWORD) having four actual entries in each 32-bit
chunk. Each cycle, all of the 64 values in state table A are relocated to new
locations according to one of 64 different mappings. These changes are relative
and cumulative e.g. new values replace and overwrite values in old locations.
Using one of 16 different transpositions selected from the static "Trans" table
and modified by one of four different byte alignments provides for 64 different
transpositions of all of the 64 state table A values. The "Trans" table dictates
how state table A values are relocated in 32-bit (four element) chunks. The byte
shift when setting the pointer address shifts the fetch of these transpositions
in one of four different ways.

Note: The construction of the Trans table will be discussed in detail later in
the text but is generally set up as a group of 16 different convoluted slides
allowing the transpositions to be implemented as scrambled but sequential
shifts.

    The two lines following the transposition are there to mix the state A elements,
They compensate for an artifact of the UGLY method above. Without these, state A
is treated as two separate halves. This halving happens because by transposing
elements four at a time, a member from an even position, in state A, can never
be laid down next to another value from an even position. These two instructions
allow state A values to slowly migrate between these two logical halves. It's a
slow leak.
     This sliding and shunting process requires state A to be oversized. The first
11 bytes are reproduced at the end of state A so that as the pointers are
re-assigned for the next cycle, they fall on a contiguous section of state A
regardless of the byte shift. The last instruction above does this.

Section 5, Arguments for security:

State A versus State B:

     State B evolves serially. It progresses from one state to the next with no
branching allowed. State B preserves a "memory" of past activities. The old is
combined in with the new.
    State A evolves in a branching fashion and preserves no memory. New values
overwrite old ones. For each cycle, one of 16 different transpositions is
implemented and one of 4 different byte shifts is used. This allows for 64
different, possible, outcomes of changing state A but only one of the possible
64 is used.
    The branch actually chosen by a change to state A is a function of a value
retrieved from state B. The values combined in state B are harvested as a
function of state A values (via Ptext xor CText) and with old, rotated values of
state B. The two states are interdependent and yet evolve individually by
different rules.
     In my mind (alleged), this cross-connected property is essential to the pseudo
random behavior. The changes done to each state are arbitrary (as far as one
state is concerned) aspects of the other state. It's positive feedback, cross
connected between two different systems.
     The branching behavior of state A, and the interaction with B, provides for
rapid dispersion. I won't call it avalanche but it is similar. After 3
iterations, the states could be in one of 6^43 wholly different (at the per
element level) conditions. This accounts for the ciphers extreme sensitivity to
initial conditions.
    The dynamic re-mapping means that an adversary cannot flip bits in the
ciphertext to produce a knowledgeable change in the decrypted plaintext. The
attacker does not know which portions of a plaintext block correspond to what
ciphertext portions.
     A side effect of this dynamic re-mapping is that it makes the evolution of the
internal state message dependent. The last state that the cipher exhibits is a
function of the key, the IV AND the plaintext encrypted versus its PROPER
ciphertext. This chaining is valuable for integrity checking since only the
correct key, plaintext and ciphertext will process to make the same ending state
for the encryptor and decryptor. Appending a hash of the last state at
encryption, to be checked against at decryption provides a keyed message
authentication code that can only be constructed by a party in possession of the
correct key. An adversary must engage in trial end error to do bit flipping and
it will be detected anyway since the last state will diverge if it is attempted.

 Why would this be secure?

     The security premise of this method is simple. Isolate the internal state from
the observed change in a ciphertext versus its associated plaintext. This goal
is achieved by interposing a process that is simultaneously indeterminate in the
reverse direction and lossy. Consider the actions of state A and state B when
viewed as a "black box". The total combination of different outputs a block can
assume is a function of its size and is, in this case, 2^2048. Each state B is
the same size and can have the same number of outcomes but can be transformed
differently by each possible state A. State A can be any arrangement of 64
unique values and can therefore have 64! or 2^296 different outcomes.
 Total number of different possible inputs divided by the possible different
outputs yields the number of different inputs that could equally have made one
particular output. In this case, that is 2^2048 * 2^296 / 2^2048 = 2^296. This
is adirect measurement of the irreversibility of this process. For each known and
related ciphertext-plaintext block, there could be 2^296 equally possible
different combinations of A&B that made it.
    The goal isn't to provide an absolute proof of security but to establish a
confident minimum level of security as being greater than the key strength
provided e.g. a practical level of security. The attached source code shows a
method for effectively using a key that is one of 2^240 different keys. This
setting is variable and can be raised to a comfortable maximum of 512 bits.
This is one goal of this method. Time and independent analysis will tell if it
is justifiable. Much work has been dedicated to breaking this cipher.
Application of some attacks like linear and differential cryptanalysis cannot
even be envisioned by the author due to the lack of relevant structure and the
dynamic variation on two levels going on. This cipher appears chaotic in
behavior and highly non-linear.
    Another approach to finding a flaw in this method would be identifying a bias in
its pseudo-random behavior or a correlation in it's output VRS input that
deviates from a random expectation. Alas, no help there. This ciphers output has
been extensively analyzed using a large suite of accepted tests and some that
were independently developed. No hints of measurable, let alone exploitable,
characteristics or statistically deviant behaviors have yet been found.

Section 6, Required support:

Key setup and IV (initialization vector) insertion.

     To operate, this cipher needs to have the values in states A & B mixed around
in a profound and key dependent fashion. The method shown here uses a simple
approach using the existing encryption routine previously described and is
showed in detail in the attached source code.
     To begin, a "fixed and known" starting state is established for states A & B
and the key or passprase is XOR'ed into state B. A number "Kd" of cycles of the
cipher are then run with a small modification to how the change argument is
generated. The number of cycles "Kd" controls the possible ending key states.
Since at each iteration, state A can change to any of 64 different arrangements
and since these changes are cumulative, the total number of different conditions
that state A can be in after the key setup routine is run is 64Kd. A value of 40
for Kd produces a state A that is one of 2240 different possible states.
     Compound changes also occur to state B but these are much harder to quantify as
they are relative to the size and quality of the key or passphrase used. To be
conservative, the key strength is judged by the number of cycles (Kd) alone or
the size & entropy of key or passphrase, whichever is less.
     The Initialization Vector (IV) is implemented simultaneously with key setup.
The purpose of the IV is to make each encryption session with a certain key
unique. This is necessary as without it, the first states of A & B are identical
for a "same key" and would produce an undesirable correlation in the first few
blocks of different plaintexts encrypted with the same key. The hazard here is
not to key recovery but that an adversary might have a known plaintext and use
that information to extract knowledge from an unknown but key related plaintext.
Keep in mind that the IV is not required to be unknown, or even of good random
quality but merely different for different messages.
    The IV is implemented here using an unpredictable and session specific stream
of values, The C rand function initialized with the system time in this case.
This value is used to relatively offset the change function found in B, which is
used as the argument to change A. Some hint of this IV must be communicated in
the clear to the recipient in order for the recipient to arrive at the same
keyed state needed for decryption. Conventionally, this information is attached
"In the clear" to the ciphertext. This cipher is different in that a state
dependent "IV Ciphertext" is made and appended of this unknown effect, which is
bound to and specific to a particular key generation session. It is easily and
cheaply kept secret and is only relevant to generating a particular key. The
advantage to doing this is not known but it just seemed like such a wonderfully
nerdly thing to do.
     The IV routine is only entered every other iteration while processing the key
according to Kd. Making it the same size is a waste as it is not a factor in
exhaustive searches and only needs to be large enough that correlation is not a
problem between same key ciphertexts. 120-bits is plenty large enough for that.

Slides:

     As mentioned before, the table "Trans" dictates the different relative
transpositions that could be done to state A for each cycle. It has been
constructed here according to a set of rules deemed important. Each of the 16
different changes or "slides" are generated from a random stream of candidates
but their acceptance is governed by the following set of rules:

1, Each change will comprise a contiguous set. This means that each change can
be re-written in a sequence such as, element 0 is replaced by 3, 3 is replaced
by 13, 13 is replaced by 7.etc. They are convoluted slides.

2, Each slide will move each individual element to a location which is different
from it's previous location and where all element moves are wholly different
from all the moves of at least 8 different slides. An element is a packet of
four state A
values treated as one transposition chunk.

3, No more than 2 slides will have a common individual element move and there
will never be more than one common element move between any pairs of slides.

Padding:

    This cipher requires that the plaintext and ciphertext be an even multiple of
the block size. Some reasonable method of padding the text out and recognizing
and removing the padding will be required for proper operation.

Section 7, Design decisions & justifications:

Key generation and the different "A change routine":

     There are 64 different changes that could be made to state A, one of which must
be selected by the condition of state B. It would be preferable to "Poll" all of
state B by some form of CRC or hash to come up with the 6 bit value required.
This would mean that the chances of finding a different change argument for a
different state B would be 1-(1/64) or 0.9844. This is actually done in the key
generation loop but a simpler routine is used during block processing. In block
encryption-decryption, a single entry of B is reduced down to the required 6
bits and used as the change argument. In the case where a small change occurs in
state B, confined to a single unit of state B, the chances that the unit
selected is the actual one (where a difference resides) is 1/64 or 0.0156. In
reducing it down to 6 bits, the chance of arriving at a different value is a
function of both. 0.9844 * 0.0156 = 0.0154. Why settle for the far less
sensitive method?
     It all has to do with how the (chaining back) of plaintext and ciphertext
reacts to difference and communicates that change to state B. It is a cumulative
effect very similar to avalanche but it occurs over a span of block iterations
rather than over a quantity of rounds.
    Plaintext and ciphertext are mapped to unaligned unit locations. If a 1 bit
error occurred to a ciphertext block, by definition, it would only effect one
unit of that block. When plaintext and ciphertext are chained back in by XOR and
the ciphertext has been mapped to different locations, two units of state B
change from the expected during decryption since that changed cipher unit does
not align with its true corresponding plaintext unit. They both change and in
different places. Next iteration, (since some memory of the last B is preserved)
these two changes will propagate two more errors each from the expected. After 6
iterations, the chance of a complete and total change of all of the bits of
states A and B is back up to 1-(1/(26)). If a change occurs in plaintext or
ciphertext, the seed is sown and everything quickly avalanches toward complete
and total divergence. Why spend the added work for such a small improvement in a
completely adequate process?
    Key generation is another matter. Quick divergence for different keys is a
valuable feature and well worth the work.

XOR of the block counter into state B and why the circular bit shift?

     Why is this done? Well, two reasons: First, plaintext and ciphertext are both
mixed back into state B. If the plaintext was a huge file of zero bits then, in
that case, updating state B is simply XORing scrambled state B values with
others that have been circularly rotated 5 bits. While unlikely (Read; no way in
hell), a combination of A and B could arise to produce a ciphertext block
identical to the next 5 bit rotated state B it was made from. Once this chains
back in, the resultant state B becomes all zeros and no amount of further
processing will change this. Feeding in the constantly changing counter supplies
a source of 1's that subsequent processing can use to repair this "Bad B". Don't
faint, this isn't likely to happen during the life of the universe and even if
it does, the counter will kick it back into play.
     A bigger risk (although still unlikely) is that the individual bits, per unit,
of state B will zero out in detail. The 5 bit circular rotate is cumulative and
treats this problem somewhat more aggressively in that it continually smears
bits from one bit location with bits resident in other bit locations.
     The real reason the counter value is used is to assure a minimum period of the
pseudo random behavior. Consider that the "Avalanche" behavior assures that
different states will diverge regardless of key, plaintext or ciphertext. If
this is true, then the only way a repeat can occur is if all of the internal
state is identical between two iterations. The distance (in cycles) between
these two identical states would be a repeating period and if this occurred, the
iterations of this period would repeat endlessly. This is a deterministic process
and so it has to happen eventually but the size of the internal state makes the
average occurrence of this a non-problem.
    The possibility of a short repeating period occurring is a valid concern. In this
method, each counter value is a part of the internal state even though it is not a
secret component of the state and the counter is managed separate from A & B. In
order for the counter to duplicate, it must roll over. If it is a thirty-two-bit
integer, then the cipher cannot repetitively cycle in a shorter period than
every 2^32 cycles. Since each cycle processes 2048 bits, the minimum period (in
bits of output) is 2^43 Not big enough? Just keep promoting the unit size of the
counter till comfort is reached. The limit is the size of state B so I imagine
that a minimum period of 2^2059 bits is achievable without too much impact on
performance.

Section 9, Conclusion:

     The practical fault with stream ciphers isn't in their theory but in the
implementation. The idea of a key dependent, random walk, as the basis for
making a ciphertext unpredictable (without knowledge of the key) is a powerful
concept. The problem arises when it is actually put into practice. It is just
too easy to obtain large amounts of the internal state from a conventional
stream cipher for analysis on attacking the behavior of the underlying PRNG.
This makes these methods less "universally useful" than block ciphers. The
method described here has a simple goal, keep all of the speed and simplicity
advantages of a stream cipher and remove it's basic weakness by denying an
attacker confident knowledge of it's internal workings.
    Awhile back there was some discussion of what defined the main difference
between a block cipher and a stream cipher. Some folks said that a block cipher
is "stateless" others pointed out that this is not true in CBC mode. One notable
sees the "size of computational units" as telling where 32 bits is starting to
get into the "fuzzy" area between block and stream. The Purists maintain that a
block cipher is a large single substitution, an Electronic Code Book. Others say
it is reserved for those that are "Fiestel like".
    I am confident that RC-4 is a stream cipher, that DES is a block cipher and that
this proposed cipher is arguably neither. It is truly "Neither Fish Nor Fowl"
Hence the name (NFNF). Try the anagram phonetically.

Paul Pires

















====== Posted via Newsfeeds.Com, Uncensored Usenet News ======
http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
=======  Over 80,000 Newsgroups = 16 Different Servers! ======

------------------------------


** FOR YOUR REFERENCE **

The service address, to which questions about the list itself and requests
to be added to or deleted from it should be directed, is:

    Internet: [EMAIL PROTECTED]

You can send mail to the entire list by posting to sci.crypt.

End of Cryptography-Digest Digest
******************************
Cryptography-Digest Digest #717

Reply via email to