Hey guys, I am doing work on partially-ordered transcripts. This was inspired 
by Old Blue[0], discussions with George Kadianakis and Trevor Perrin, as well 
as background knowledge on git, and immutable content-addressing schemes as 
seen in Freenet and Tahoe-LAFS.

For everyone's interest, here is what I have so far, followed by a statement of 
some issues that (AFAICS) have not been discussed before, and directions for 
solving them.

A partially-ordered transcript has the following definitions and properties:

a. each element is a message, associated with a sender/author
b. the messages form a partial order (equiv. directed acyclic graph), where a 
<= b means "the author of b has delivered[1] a"
  - it is the responsibility of message b to truthfully declare its 
parents/ancestors, according to some (unspecified here) representation[2]
  - consequently, b should not process a message a3, if it has not yet 
delivered its predecessors a2, a1, a0 etc. this is detectable via the scheme 
described in (f).
c. the messages from a single sender, form a total order over <= (this gives a 
degree of protection against dishonest senders, and limits the number of 
"branches" in a session)
d. define:
  - prevS(m) as the previous message sent by the sender of m, and
  - prevR(m, u) as the previous message sent by other-user u, that has been 
seen (delivered) by the sender of m
e. the transcript also satifies, for all messages m, users u != sender(m): 
prevR(prevS(m), u) <= prevR(m, u), which I'll call "freshness consistency". 
(this is useful for more complex topics like consensus, and gives some 
protection against dishonest declarations of parents.)

(A "reasonable" implementation of OldBlue should automatically satisfy (c) and 
(e), but this is not checked by other users and can be corrupted, although this 
is outside of their security model.)

(aside: Using prevS and prevR, we can also generalise forward-secrecy ratchets 
such as Axolotl to multiple parties, but this is a topic for another day. :)

Additionally:

f. each message can be referred-to by an unforgeable content-based ID, e.g. a 
preimage-resistant and 2-preimage-resistant hash of its ciphertext 
representation.
  - for discussions of the benefits of such a scheme, see other docs on the 
internet. abstractly, they act as unforgeable tokens of validity, that anyone 
can use by themselves without requiring other arbiters.
  - this also automatically enforces acyclicity, so that we don't need to 
explicitly check this ourselves
g. this reduces "protection against re-order/replay-attack of all messages" 
into "protection against replay-attack of the first message in a session", a 
much easier problem.
h. this also automaticaly protects against certain types of drop-attacks (i.e. 
cases where we receive future messages that refer to dropped messages).
  - we can partially-protect against all drop-attacks by requiring periodic 
acks (that refer to previous messages), which turns this problem into the 
previous problem.
  - I suspect this is the best we can do for drop-attacks

These last few points are especially important in high-latency / 
low-availability environments.

I currently have two mental images of looking at transcripts, that are 
equivalent to each other and to the formal model above:

S. as a sparse DAG, like a git history log - but note that git history logs do 
not have the restrictions (c) and (e)
T. as a set of totally-ordered message-chains, one chain for each user, where 
every message m also points to any appropriate parents (messages sent by other 
users), such that prevR(m, u) is defined for all u.
  - this is basically a cryptographic form of a vector clock, where seqnums are 
replaced by hash IDs to parent messages
  - as with (b), the actual representation of the parent-pointers is an 
implementation detail[2]

Depending on what you're trying to achieve, either S or T is more suitable as 
the image to play with, but they are both equivalent.

# Problem 1: representing history for both old and new members, when a member 
enters the chat

So far the model has assumed, as in git and OldBlue, that all participants see 
all messages. This gives us a simple way to achieve (g) and (h), by following 
parent pointers around and waiting until we've verified that the older messages 
do indeed have the expected IDs.

However, in the mpOTR (and general group chat) setting, we have dynamic groups, 
with the condition that new members should not see previous messages[3].

The reverse situation is easy - when old members part a chat, we simply stop 
encrypting/sending messages to them, similarly to how we might refrain from 
publishing private branches in git (where we commited new secrets) to an old 
repo. However, when new members join a chat, existing members still have 
history of the previous conversation, and, as required by (b), must declare 
this history.

There are a few options. I have not thought through any of these in great 
detail, yet. Some of them I don't intend to seriously consider, but might be 
convinced to otherwise.

- start an entirely new session (too much overhead)
- don't declare the old parent (too complex, too "exceptional", may screw with 
transcript accounting of the other members)
- declare the old parent, but (somehave) have the new member not verify these 
parents. this is my currently-preferred approach, but I have yet to work out 
the details of it. potential problems:
  - leak of information from the previous chat. probably not a problem, since 
the IDs are not reversible.
  - we now rely on other members to "sound the alarm" on a bad parent ID. 
probably even less of a problem, since we are not meant to know the history 
anyways.
  - a more significant problem is the case where a member parts and re-joins 
the chat. in this case they have *some* partial shared old history. do we want 
to reconcile this with the others' history (what problems would occur if we 
don't do this)? how do we do it well?

# Problem 2: defining consistent orders for global events

In the case of a total-ordering on messages, it is simple "when" a person joins 
the chat, since the history is linear, and all events are "before" or "after" 
any other event.

In the case of a general partial-order, where we can have arbitrarily-many 
branches, this problem is not solvable. A party may fork the chat and decide to 
completely ignore a decision-event. But - our restriction (c) forces a 
decision-event to be accepted, or else dissentors must stop participating.

However, the traditional total-ordering sense of "before" (a <= b) or "after" 
(b <= a) is no longer useful, since there may be events that are neither <= nor 
>= wrt a decision-event. Instead, we can talk about "not-after" and "after" a 
decision-event, and now the transition between "not-after" and "after" is a cut 
in the graph that might span multiple edges.

So far, things are still manageable. However, when we bring multiple join/part 
events into play, the question of "who is in the chat" becomes more complex. 
(This is vital, because you need to know "who do I send my next message to".) 
In our current model, join/part events are not totally-ordered, and picking an 
arbitrary order for these events to operate on "a set of participants" may lead 
to inconsistencies between participants. I have only fully-defined this problem 
in the past few days, so we'll see how hard it is to resolve, but we may need 
to introduce additional restrictions on top of (c) and (e).

Comments, advice, and/or pointers to any previous research in this area would 
be greatly appreciated!

X

[0] http://matt.singlethink.net/projects/mpotr/oldblue-draft.pdf
[1] "delivered" here means "notify the application layers", i.e. it preserves 
causal ordering/consistency in the application layer. the transcript engine may 
delay delivery of old received messages if they are determined to be later in 
the actual transcript
[2] I favour sparse pointers then relying on transitivity of <=; others have 
favoured a more verbose/redundant approach. The sparse option may be more 
complex to code (although arguably with the verbose/redundant option you need 
the same level of complexity, in order to *verify* the redundancy), but results 
in a smaller representation.
[3] With hash IDs this is highly infeasible anyway, since old ciphertext was 
not encrypted to the new member, and re-encrypting/sending would generate 
messages with different IDs.

-- 
GPG: 4096R/1318EFAC5FBBDBCE
git://github.com/infinity0/pubkeys.git

Attachment: signature.asc
Description: OpenPGP digital signature

_______________________________________________
Messaging mailing list
[email protected]
https://moderncrypto.org/mailman/listinfo/messaging

Reply via email to