> On Mon, 23 Oct 2000, David Schwartz wrote:
> >
> > >   Some of the difficulties that I'm having revolve around the
> > > fact that the socket I/O code is
> > > integrated into the protocol code.
> >
> >     It really isn't. You can very easily use the SSL code just to do the
> > SSL stuff and do all the I/O yourself. The only feature you lose when you
> > do this is client connection reuse.
> This is not true. Session caching is independent of the IO mechanism you
> choose to use.

        Then how does the client code know which session to reuse? It doesn't know
what server it's talking to.

> >     It's a bit tricky, but not impossible. All you have to do is create
> > an SSL structure and run a loop. When you receive SSL data, hand it to
> > the SSL code. When you have plaintext you want to send, hand it to the
> > SSL code.
> Hm, this is not really sensible - and explains why you made the incorrect
> assumption below (that "you spin the CPU" etc). I've posted to this list a
> couple of times before a general way to do this sort of "sandboxed" SSL,
> where it operates in a memory IO environment, and network IO is handled
> somewhere else. It is a state-machine approach, and as such is invoked
> when buffered data has arrived from the network abstraction for delivery
> to either ("clean" or "dirty") side of the SSL machine - it requires that
> the state machine also ensure that any data coming out of the SSL state
> machine on either side is placed into "outgoing buffers" - and these can
> be used by the network framework to indicate a desire to "send" by the SSL
> object. The only time the state machine needs to "loop" without the
> catalyst of network activity is when (for whatever reason) you wish to
> proactively do something to the SSL stream - eg. force a session
> renegotiate at a given point in time, or close it down etc. Whatever
> triggers your program's desire to do these things implicitly gives you
> your chance to make a call in to do it (which by invoking the state
> machine will then generate the subsequent traffic to place into network
> buffers, etc).

        Exactly. I've done this myself in the SSL version of ConferenceRoom and its
web server. Really the only special case is during connection setup. In this
case, the SSL code may want to send data out the dirty side even though you
didn't hand it anything to send.
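
To illustrate the setup special case, here is a minimal sketch using Python's
standard ssl module (a wrapper around OpenSSL) rather than the C API discussed
in this thread. It is not the ConferenceRoom code; the host name
"example.invalid" is a placeholder and no network connection is made. Starting
a client handshake against two memory BIOs leaves bytes waiting on the dirty
side even though no plaintext was handed in.

```python
import ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
incoming = ssl.MemoryBIO()   # dirty side: bytes received from the network
outgoing = ssl.MemoryBIO()   # dirty side: bytes the SSL code wants sent
conn = ctx.wrap_bio(incoming, outgoing, server_hostname="example.invalid")

try:
    conn.do_handshake()      # kicks the state machine; no plaintext supplied
except ssl.SSLWantReadError:
    pass                     # expected: it now waits for the server's reply

# The client hello is sitting in the outgoing BIO, ready for the network layer.
pending_hello = outgoing.read()
print(len(pending_hello), "bytes queued for the wire")
```

The network layer would transmit pending_hello and write whatever comes back
into incoming before calling do_handshake() again.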

> The only true "blocking" you'll get is if you use blocking IO directly in
> the SSL/BIO setup. If you are operating the SSL state machine using memory
> buffers then this does not (and can not) happen. Of course, at certain
> points in the flow of an SSL handshake, your program will kind of "block"
> - it's the point where the OpenSSL code sees that it is time to compute an
> RSA/DSA/etc operation and these are not cheap in CPU terms. However, no
> matter which way you slice it - your CPU is still going to have to do
> these operations (unless you want to talk hardware acceleration which
> opens a new can of worms), but this is not "blocking" in the sense you
> meant it.

        Right, the SSL will never block on I/O. The only harmful effect it may have
on your system is that it will spin the CPU when it has expensive
operations. Some servers have response time requirements that this can
adversely affect.

> >     If you really have a single process that handles large numbers of
> > clients concurrently, it would have to already be multithreaded.
> > Otherwise, for
> Not true. The only need for multithreading/multiprocessing is to get a
> tighter limit on latency, it won't gain you anything in terms of the
> throughput (with the obvious exception that a single thread or process can
> only utilise 1 CPU, even in a SMP machine).

        Nonsense. A singly-threaded program can't do any other work while it's
waiting for a disk read or while the operating system is servicing a page
fault. A multi-threaded process can. Singly-threaded servers are notoriously
bursty because of this.

> In fact, non-blocking (async)
> multiplexing of many SSL streams in the same process/thread can have some
> distinct performance advantages, most notably: (1) you don't have loads of
> context switches going on,

        Why would you have context switches in a multi-threaded approach? I wasn't
suggesting one-thread-per-connection. I'm not that stupid.

> and (2) the longer the "loop" across all the
> distinct SSL streams takes, the more chance that an SSL stream will
> accumulate larger "packets" for when its turn next comes round again. If
> your responsiveness is too quick, you may end up encapsulating more (and
> smaller) blocks, each one having its own SSL overhead (in terms of
> processing and data size). If your peer is trickling through bytes quickly
> but only one at a time, the SSL traffic bloat will be huge if you pump
> these in and out of the SSL machine a byte at a time - so having a slight
> builtin latency to let those bytes accumulate before they're pumped into
> the SSL en masse means you'll be making better use of your "packaging" :-)

        You get that automatically with any setup. If load is high, it will take
you longer to get around to doing anything -- with any architecture.
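
The record overhead being discussed can be put in rough numbers. The sketch
below is arithmetic only and rests on assumptions not stated in this thread: a
5-byte record header, a 20-byte HMAC-SHA1 MAC, and a stream cipher, so no
padding. wire_bytes is a hypothetical helper, not an OpenSSL function.

```python
import math

RECORD_HEADER = 5       # content type (1) + version (2) + length (2)
MAC_LEN = 20            # assumption: HMAC-SHA1, stream cipher, no padding
MAX_PLAINTEXT = 16384   # largest plaintext fragment a single record carries


def wire_bytes(total, chunk):
    """Bytes on the wire when `total` plaintext bytes are written `chunk` at a time."""
    chunk = min(chunk, MAX_PLAINTEXT)
    records = math.ceil(total / chunk)
    return total + records * (RECORD_HEADER + MAC_LEN)


trickled = wire_bytes(1000, 1)      # one record per byte
batched = wire_bytes(1000, 1000)    # everything in a single record
print(trickled, batched)
```

Under these assumptions, 1000 bytes trickled one at a time cost 26000 bytes on
the wire, against 1025 when sent as a single record.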

> > example, a single page fault would stall your entire server. If your server
> Well, failures can wreak havoc in any circumstances - and "threading" only
> solves a certain set of failure types, which even then depends on your
> flavour of threading (green, kernel, ....). You're pointing out other
> forms of "blocking" as an argument against doing non-blocking async SSL in
> one thread/process ... yes, if your flow (whether within SSL or outside
> it) includes genuine blocking (eg. waiting for interrupts, doing reads
> across NFS, whatever) then you need to bear that in mind from the point of
> view of latencies.

        Blocking is, in the general case, unavoidable. There is always, for
example, the case where a client causes a seldom-used code path to be used
which requires the servicing of a page fault. Why stall your entire server
for as long as it takes the page fault to get serviced?

> However, the original question seemed to indicate quite
> the opposite - the person was wanting SSL logic inside an API sandbox that
> took away control of network sockets etc. If disk reads and writes are
> similarly not factoring into his inline considerations, then this whole
> point is nulled. If your required throughput is 10 concurrent SSL streams,
> the program logic will have to do the same real work, whether or not you
> choose to throw a few thousand extra context switches in there. If total
> throughput performance and traffic inflation are important, async in one
> thread/process is generally much better than multitasking; if latency is
> paramount, the converse is true. Multitasking is not the be-all and
> end-all ...

        You seem to be equating multithreaded with a thread-per-connection
approach, as opposed to a 'thread per CPU, plus thread per I/O I wish to

> > does any disk I/O and encounters an NFS-mounted file, do you stall the
> > whole server while the read completes? If your server is 100% CPU and no
> > blocking I/O, then you can get away with a single thread, but then SSL
> > won't change anything. All it will do, if you don't let it do the I/O, is
> > spin the CPU.
> Again, this is not true - you get an "event" to turn the SSL state machine
> - namely notification from your outside network code that data has
> arrived. When you run around the state machine trying to push that data
> in, you should at the same time pop out any generated outgoing data the
> SSL wants to send. Once that is done - the SSL will not spontaneously
> create traffic out of thin air ... the SSL is "idle" until some other
> network activity takes place, or you proactively decide that you *want*
> something to happen.

        Nevertheless, the SSL code will take the CPU for arbitrarily long amounts
of time, depending upon how much work it has to do. This may or may not be a
problem, depending upon parameters of his situation that we don't know.
Threads may or may not be a good solution to that, again depending upon
factors we don't know.

        My own code uses BIO pairs. We special-case the connection setup phase.
Otherwise, we basically just manage the four I/O streams and the SSL code
does its part without any special effort. Of course, it's multithreaded, but
then it has to run on high-end SMP machines.
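
As a rough illustration of managing the four streams, here is a sketch of one
turn of the state machine. It uses two memory BIOs from Python's standard ssl
module rather than an OpenSSL BIO pair, and it is not my production code; pump
and its parameter names are hypothetical, and "example.invalid" is a
placeholder host. Only the client side is exercised, with no peer attached.

```python
import ssl


def pump(conn, incoming, outgoing, net_in=b"", app_out=b""):
    """Turn the SSL state machine once.

    net_in  : ciphertext just received from the network (dirty side in)
    app_out : plaintext the application wants to send (clean side in)
    Returns (app_in, net_out, unsent): decrypted plaintext, ciphertext the
    network layer should transmit, and any plaintext not yet accepted.
    """
    if net_in:
        incoming.write(net_in)
    app_in, unsent = b"", app_out
    try:
        conn.do_handshake()              # returns at once when already done
        if unsent:
            written = conn.write(unsent)
            unsent = unsent[written:]
        while True:                      # drain whatever plaintext is ready
            chunk = conn.read(16384)
            if not chunk:
                break
            app_in += chunk
    except (ssl.SSLWantReadError, ssl.SSLWantWriteError):
        pass                             # needs more data from the network
    return app_in, outgoing.read(), unsent


ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
conn = ctx.wrap_bio(incoming, outgoing, server_hostname="example.invalid")

first = pump(conn, incoming, outgoing)    # setup: emits the client hello
second = pump(conn, incoming, outgoing)   # nothing arrived, nothing emitted
third = pump(conn, incoming, outgoing, app_out=b"hello")  # held until setup ends
```

The second call shows the point made in the quoted text: without new network
data the SSL side produces no traffic of its own. The third shows why the
caller has to keep unsent plaintext around until the handshake finishes.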


