Thanks a lot for your reply, Andy.
Some time ago I came up with a proof-of-concept multi-threaded implementation.
The GPU is not used if the system load (measured through getloadavg under Linux)
is below a certain threshold.
Otherwise each thread puts its message (on which the private key operation has
to be performed) into a shared buffer.
If the buffer is full after inserting the message, the current thread runs the
private key operation batch on the GPU.
If the buffer is not full, the thread sleeps for some time.
The first thread to wake up runs the batch on the GPU even if the buffer is not
full.
The thread running the batch wakes up the others afterwards.
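
To make the scheme concrete, here is a rough Python sketch of the logic described above. It is not the actual proof-of-concept code. All names (BatchingSigner, cpu_op, gpu_batch_op, load_fn, batch_size, max_wait, load_threshold) are hypothetical, and the "GPU" operation is just a callable that takes a list of messages and returns a list of results. The sleep is modelled as a timed wait on a condition variable, and error handling is left out.

```python
import os
import threading


class _Slot:
    """Per-message state shared between the submitting thread and the batch runner."""
    def __init__(self):
        self.claimed = False  # some thread has taken this message into a batch
        self.done = False     # the batch containing this message has finished
        self.result = None


class BatchingSigner:
    """Hypothetical sketch: collect messages and run them as one batch."""

    def __init__(self, cpu_op, gpu_batch_op, batch_size=32, max_wait=0.01,
                 load_threshold=1.0, load_fn=None):
        self.cpu_op = cpu_op                # assumed: single-message fallback
        self.gpu_batch_op = gpu_batch_op    # assumed: list of messages -> list of results
        self.batch_size = batch_size
        self.max_wait = max_wait            # seconds a thread sleeps before giving up
        self.load_threshold = load_threshold
        # 1-minute load average by default (os.getloadavg is Unix-only)
        self.load_fn = load_fn or (lambda: os.getloadavg()[0])
        self._cond = threading.Condition()
        self._pending = []                  # shared buffer of (message, slot)

    def sign(self, message):
        # Low load: skip the GPU and do the operation directly.
        if self.load_fn() < self.load_threshold:
            return self.cpu_op(message)

        slot = _Slot()
        with self._cond:
            self._pending.append((message, slot))
            if len(self._pending) < self.batch_size:
                # Buffer not full: sleep until claimed by a batch or timed out.
                self._cond.wait_for(lambda: slot.claimed, timeout=self.max_wait)
            if slot.claimed:
                # Another thread is running our batch; wait for its wake-up.
                self._cond.wait_for(lambda: slot.done)
                return slot.result
            # Buffer full, or we woke up first: take everything buffered so far.
            batch, self._pending = self._pending, []
            for _, s in batch:
                s.claimed = True

        # Run the batch outside the lock so other threads can keep buffering.
        results = self.gpu_batch_op([m for m, _ in batch])

        with self._cond:
            for (_, s), r in zip(batch, results):
                s.result, s.done = r, True
            self._cond.notify_all()         # wake up the other threads
        return slot.result
```

The claimed flag is there so that a thread whose message is already part of a running batch does not time out and start a second, partial batch of its own.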
What do you think about this approach?
Can you give more insight into developing the idea for DNSSEC?
How can I go about that?