Hi,
I've written a small program that gathers randomness from the
uncertainty of scheduling between threads/cores in a multithreaded
program/system. The purpose of this is to generate random numbers
entirely in userspace (in case the /dev/random traffic is somehow
being watched, etc.). Attackers could still get their hands on the
random numbers, but I'm guessing it would be a lot more work than
simply tapping into the kernel/userspace interfaces -- they would have
to peek into the address space of the process generating the numbers.
One way to use this may be to seed a PRNG which runs in parallel with
e.g. /dev/random; XORing them together should yield a bitstream with
quality at least as good as the best of them, and will make it more
difficult to predict the output simply having access to the kernel's
secret state (or the numbers generated by it).
Is this interesting for the OpenSSL project? Or do you have something
like this already? (I could only find the use of certain things like
the current time, pid, uid, etc., which I think an attacker would have
much easier access to.) Or is it not really a concern in the first
place?
I attached my code -- it is not rigorous, but I think I avoided the
worst pitfalls. I'm not sure what kind of entropy/quality the output
has, but it should be better than nothing at all.
$ g++ seed.cc -lpthread -lssl -lcrypto && ./a.out
b41c91f348638116
a165d0fac8b2304e
282d0d24311d7511
[...]
The code works only on x86_64, and probably also only on Linux.
Vegard
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
extern "C" {
#include <openssl/sha.h>
#include <pthread.h>
}
static inline uint64_t xchg(uint64_t *addr, uint64_t value)
{
uint64_t result;
asm volatile ("lock xchgq %0, %1"
: "=r" (result), "+m" (*addr)
: "0" (value)
: "memory");
return result;
}
struct thread_data {
uint64_t *shared;
uint64_t id;
uint8_t output[32];
};
static void *thread_fn(void *data)
{
thread_data &d = *(thread_data *) data;
static const unsigned int nr_samples = 100;
/* Each thread should go into the loop, counting the number of times
* they were able to read the counter in a row. */
uint8_t samples[nr_samples];
for (unsigned int i = 0; i < nr_samples; ++i) {
uint8_t sample = 0;
while (1) {
uint64_t old_value = xchg(d.shared, d.id);
if (old_value == (d.id & 1))
++sample;
else if (sample >= 5)
break;
}
samples[i] = sample;
}
/* Signal to the other thread that we are done, while at the same
* time allowing the other thread to continue sampling if it is
* not done itself. */
while (1) {
uint64_t old_value = xchg(d.shared, 2 | d.id);
if (old_value == (2 | !d.id))
break;
}
/* At this point, both threads are done. */
/* Hash the gathered data ("false entropy amplification") */
SHA256(samples, sizeof(samples), d.output);
return 0;
}
uint64_t seed()
{
uint64_t shared = 0;
thread_data data[2];
pthread_t threads[2];
/* Do as much of the setup as we can before starting any of the
* threads. (The kernel overhead is probably much too large for
* this to matter anyway, though.) */
for (unsigned int i = 0; i < 2; ++i) {
data[i].shared = &shared;
data[i].id = i;
}
for (unsigned int i = 0; i < 2; ++i)
pthread_create(&threads[i], 0, &thread_fn, &data[i]);
for (unsigned int i = 0; i < 2; ++i)
pthread_join(threads[i], 0);
/* Merge the data from the two threads and compress it to a 64-bit
* value. */
uint64_t ret = 0;
for (unsigned int i = 0; i < 32; i += 8) {
for (unsigned int j = 0; j < 8; ++j)
ret ^= (uint64_t) (data[0].output[i + j] ^ data[1].output[i + j]) << (j << 3);
}
return ret;
}
int main(int argc, char *argv[])
{
while (1)
fprintf(stderr, "%016lx\n", seed());
return 0;
}