A while back I described a way of hashing stream data such that bad data
could be detected and rejected without waiting for the end of the stream.
That was a two-level hash where each block of data gets hashed, and a
table of all these hashes is sent before the data; that table is itself
hashed to form the overall hash of the document (the CHK).

Looking through some old crypto papers I found an alternative scheme
from the Crypto 97 conference.  It introduces a complication in the
stream transfer but has some advantages.

As in the earlier idea, the data is broken up into blocks and each
block is hashed.  However, the hashes are computed from the last block
to the first, and each block's hash covers that block's data PLUS the
hash of the next block.

So the last block is hashed to form H(N).  For the next-to-last block
we hash that block's data plus H(N) to get H(N-1).  For the block
before that we hash the data plus H(N-1) to get H(N-2), etc.  At the
end we hash the first block of data plus the hash for the next block
to get H(0), and that is the hash of the whole document, the CHK.

Written out, the hashes are calculated as follows:

H(N)   = hash( block_N )
H(N-1) = hash( block_(N-1) + H(N) )
...
H(1)   = hash( block_1 + H(2) )
H(0)   = hash( block_0 + H(1) )

CHK = H(0)
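
To make the construction concrete, here is a minimal sketch in
Python.  The choice of SHA-1 and the helper name are mine, just for
illustration; nothing here is specified by the scheme itself.

import hashlib

def hash_chain(blocks):
    # Compute H(N)..H(0) back to front over a list of byte-string
    # blocks.  Returns a list where hashes[i] == H(i), so hashes[0]
    # is the CHK.  SHA-1 (20-byte digests) is just an example.
    hashes = [None] * len(blocks)
    h = hashlib.sha1(blocks[-1]).digest()          # H(N) = hash(block_N)
    hashes[-1] = h
    for i in range(len(blocks) - 2, -1, -1):       # i = N-1 down to 0
        h = hashlib.sha1(blocks[i] + h).digest()   # H(i) = hash(block_i + H(i+1))
        hashes[i] = h
    return hashes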

As we send the data, then, we send each block plus the hash for the
next block.  The data gets stored in this form; the client software
would have to strip out the hashes.  (It has to know about the
special format anyway, because it is going to calculate and verify
the hashes.)  So the data is stored and sent as block_0, H(1),
block_1, H(2), block_2, ...
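
As a sketch of that layout, building on the hash_chain helper above
(the exact framing is my own assumption; the post does not pin one
down):

def encode_stream(blocks):
    # Serialize as block_0, H(1), block_1, H(2), ..., block_N.
    # The last block carries no trailing hash; H(0), the CHK,
    # travels separately as the document's key.
    hashes = hash_chain(blocks)
    out = bytearray()
    for i, block in enumerate(blocks):
        out += block
        if i + 1 < len(blocks):
            out += hashes[i + 1]       # hash covering the next record
    return bytes(out)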

When the first block arrives, along with the hash for the next block,
this data is hashed and compared with the CHK.  If it matches we
continue, storing the hash for the next block.  When that next block
arrives (again with the hash for the block after it attached), we
hash it and compare with our stored hash.  If it checks out we
replace the stored hash with the one attached to this block and
proceed.
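
The receiving side might look like this sketch (again my own
illustration; record boundaries are assumed to be known, e.g. from a
fixed block size):

def verify_stream(records, chk, hash_len=20):
    # Each record is block_i plus H(i+1), except the last, which is
    # just block_N.  hash_len=20 matches SHA-1 digests.  Yields each
    # verified block in order; raises as soon as a check fails, so
    # bad data is rejected the moment its record arrives.
    expected = chk                            # first record must hash to H(0)
    for i, record in enumerate(records):
        if hashlib.sha1(record).digest() != expected:
            raise ValueError("hash mismatch at record %d" % i)
        if i + 1 < len(records):
            expected = record[-hash_len:]     # store H(i+1) for the next check
            yield record[:-hash_len]          # strip the hash, keep the data
        else:
            yield record                      # final block has no trailing hash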

This is similar to the two-level hash concept, except the per-block
hashes are stored with the blocks rather than at the front.  That is
better because there is no need to deal with a potentially large
table of hashes; instead the hashes are spread out through the stream.

Because the blocks are chained together, with H(0) depending on H(1),
which depends on H(2), and so on, there is no way a node can forge a
partial message that matches the CHK.  So the scheme has the same
strong properties as an overall hash.

Calculating the hash data would take some effort.  It has to be
calculated back to front (which is what prevents a rogue node from
substituting data front to back).  The sending client could make a
copy of the data, leaving room for the hash blocks, then go through
and insert them into the file one at a time, last to first (sketched
below).  Alternatively it could make a large table of all the hashes
and interleave them into the data stream as it goes out.  Doing it
that way would be more similar to the two-level hash proposal, but it
would give up one of the potential advantages of this method:
avoiding the need for a large table.
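
A sketch of the first, copy-and-fill approach, under the same
assumptions as above (fixed-size SHA-1 hashes, helper names of my
own choosing):

def fill_in_place(blocks, hash_len=20):
    # Lay out each block followed by a hole for the next record's
    # hash, then walk backwards patching the holes; hashing each
    # patched record gives the hash to insert one step earlier.
    recs = [bytearray(b) + bytearray(hash_len) for b in blocks[:-1]]
    recs.append(bytearray(blocks[-1]))          # last record: no hole
    h = hashlib.sha1(recs[-1]).digest()         # H(N)
    for i in range(len(blocks) - 2, -1, -1):
        recs[i][-hash_len:] = h                 # fill in H(i+1)
        h = hashlib.sha1(recs[i]).digest()      # H(i) covers the patched record
    return b"".join(bytes(r) for r in recs), h  # (stream, CHK)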

All in all I'm not sure which way is better, but I thought I'd mention
this one as an alternative.

Hal
