Re: [PATCH] Change Message-ID generation to be more unique and leak less information

Ian Collier Mon, 20 Apr 2020 14:10:12 -0700

On Mon, Apr 20, 2020 at 07:08:07PM +0200, Gero Treuner wrote:
> The concern is that the inputs based on local and/or private information
> can be leaked. To achieve this the search space must be big enough.
[heavily snipped, of course]
> We need data which is unrelated to the email but - to be deterministic
> with regard to other Mutt instances - is equal to all Mutt instances on
> the same machine


I am -0 on this whole thing.  I don't consider the PID to be a meaningful
"leak" and nor am I that concerned about revealing the number of messages
I've sent (modulo 26) in this mutt session.  I think the duplicate-messageid
attack is theoretical only and would never occur in practice.

However...

As Arnt has implied, the current method of generating the Message-ID
does not *guarantee* uniqueness; merely makes it highly improbable to
be non-unique.  The thing is that we are not just concerned with other
instances of mutt on this host, but with other instances of mutt that
have the same "hostname" setting.  Derek has already mentioned that
some people set this to their mail domain rather than the name of the
specific host it's running on.  In my domain we have a couple of hundred
hosts so configured in their /etc/Muttrc.local.  Given 15-bit PIDs, the
birthday paradox tells us that with one instance of mutt started up on
each machine there's a ~45% chance that two (or more) of them have the
same PID - and the probability is higher if people habitually start mutt
soon after booting and logging in.  [Of course in practice the actual
number of machines here with a running instance of mutt is roughly 1].

I believe the 1998 Internet Draft by Matt Curtin and Jamie Zawinski[0]
has been cited in this thread.  It was dismissed because this draft
uses random numbers and therefore doesn't *guarantee* uniqueness.  But
it suggests the use of a 64-bit number from a high-quality RNG; if the
number is truly random then generating a million of these during the
same second would give you a 1-in-40-million chance of a collision.
These seem like acceptable odds to me (much better than the current
odds for a couple of users using mutt on different computers in the
same domain as described above).

But what if your RNG isn't good enough?  I understand (but may stand
to be corrected because I am not an expert in cryptography) that a
cryptographic hash function such as sha1 (which we have in mutt) can be
used to make random numbers.  The size of the seed would be 160 bits,
and if we use just 64 bits of the seed to produce the random number,
that leaves enough hidden information that the sequence cannot be
predicted.  What to use as the initial seed?  How about 128 bits
from your imperfect RNG plus the least 16 bits of the current time
(in microseconds if available) plus the least 16 bits of the PID
(doesn't really matter if the top bit of that is always zero).  This
guarantees that two mutt instances started at different times (or at
the same time on a single machine) will get different seeds even if
the RNG is rubbish enough to duplicate itself across instances.

Alternatively, would the sha1 of the current message body and envelope
plus the current time in milliseconds be random and/or unique enough?
I guess so.  It depends on whether you want to go to the trouble of
hashing the whole message.

One thing, though: use base36, not base64 - as recommended in [0].
Base64 only saves 4 characters and you don't necessarily need to put all
160 bits of the sha1 into the Message-ID.

imc

[0] https://www.jwz.org/doc/mid.html

Re: [PATCH] Change Message-ID generation to be more unique and leak less information

Reply via email to