On Wed, Feb 25, 2015 at 9:23 PM, Cliff Jansen <cliffjan...@gmail.com> wrote:

> Two usage cases, very desirable in Proton, happen to be mostly trivial
> on POSIX but mostly non-trivial on Windows: multi-threading and
> external loops.  Proton io and selector classes are POSIX-y in look
> and feel, but IO completion ports are very different and can mimic the
> POSIX functionality only so far.  The documented restrictions I wrote
> (PROTON-668) try to curb expectations for cross platform
> functionality (but still allow use cases like Dispatch).
>
> This, however, is not a cross platform issue.  This particular problem
> is confined to user space and affects all platforms equally.
> Presumably the API needs fixing or we agree to take a step backwards.
>

Without this fix there is a pretty severe limitation: the API has no way to
communicate socket errors back to the user. Also, even if we roll back the
fix, the wouldblock flag is accessed with exactly the same pattern as the
error slot. I would assume this means the issue remains even with the fix
rolled back: two different threads could overwrite the wouldblock flag, and
one or both of them could then read the wrong value and hit a hang or an
error.
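
To make the concern concrete, here is a minimal sketch of that access
pattern. It is not Proton code: shared_io_t, sim_recv and sim_wouldblock are
hypothetical stand-ins for an io object with a single shared status slot, and
the two "threads" are replayed in order on one thread so the outcome is
deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an io object with ONE shared status slot. */
typedef struct {
    bool wouldblock;
} shared_io_t;

/* Simulated receive: 'ready' stands in for "the socket has data".
 * Returns 1 on success, -1 otherwise, and records wouldblock in the
 * shared slot, overwriting whatever an earlier call stored there. */
int sim_recv(shared_io_t *io, bool ready) {
    assert(io != NULL);
    io->wouldblock = !ready;
    return ready ? 1 : -1;
}

/* Reads the shared slot; it reflects the most recent call by anyone. */
bool sim_wouldblock(const shared_io_t *io) {
    return io->wouldblock;
}

/* Replays the problematic interleaving and returns what "thread A"
 * observes when it finally checks the flag. */
bool stale_flag_after_interleave(void) {
    shared_io_t io = { false };
    (void) sim_recv(&io, false);  /* thread A: would block, slot = true  */
    (void) sim_recv(&io, true);   /* thread B: succeeds,    slot = false */
    return sim_wouldblock(&io);   /* thread A now reads B's value        */
}
```

In this replay thread A got -1 from its receive but then reads false from the
flag, so it would treat an ordinary would-block as a real error.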

> Note that Bozo previously pointed out (Proton mailing list) that the
> pn_io_t API had threading inconsistencies with pn_io_error() and
> pn_io_wouldblock().  Perhaps a pn_selectable_t should be passed in
> instead of a socket parameter, or proton should maintain a map of
> errors and wouldblock state for sockets until they are pn_close'd (as
> the Windows code already does for completion port descriptor state).
> The former would be more consistent with proton overall, the latter
> would require some user space locking.  Another possibility could be
> to pass in pointers to error/wouldblock state as part of the io call.
>

Passing in specific pointers to error and/or wouldblock state seems like it
is less flexible than passing in a pointer to something more abstract that
can contain not only the error state, but also whatever other thread local
state makes sense. My assumption is/was that pn_io_t provides this context.
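
As a rough sketch of what I mean by an abstract context (again not Proton
code; io_ctx_t, io_ctx_init and ctx_recv are hypothetical names), each caller
owns a context object and every io call writes its status there, so there is
nothing shared to overwrite:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-caller context: holds the status of the last call
 * made through it, plus room for any other thread local state. */
typedef struct {
    int   error;       /* 0 when the last call reported no error */
    bool  wouldblock;  /* true when the last call would have blocked */
    void *extra;       /* placeholder for further per-thread state */
} io_ctx_t;

void io_ctx_init(io_ctx_t *ctx) {
    assert(ctx != NULL);
    ctx->error = 0;
    ctx->wouldblock = false;
    ctx->extra = NULL;
}

/* Simulated receive: 'ready' stands in for "the socket has data".
 * Status goes into the caller's own context only. */
int ctx_recv(io_ctx_t *ctx, bool ready) {
    assert(ctx != NULL);
    ctx->error = 0;
    ctx->wouldblock = !ready;
    return ready ? 1 : -1;
}
```

With one context per thread, an interleaved call on another context leaves
the first caller's wouldblock and error values untouched.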


> While I still think this is not a Windows issue, and the documentation
> is supposed to reflect the Dispatch pattern and not handcuff it, here
> is more about the pn_io_t implementation:
>
>   Global state is in struct iocp_t, per socket state is in struct
> iocpdesc_t
>   (see iocp.h, the OS looks after this stuff in POSIX)
>
>   It has to set up and tear down the Windows socket system.
>   It has to "know" if the application is using an external loop or
> pn_selector_select
>   Set up the completion port subsystem (unless external loop)
>   It has to find IOCP state for each passed in socket
>   Manage multiple concurrent io operations per socket (N writes + 1 read)
>   Notify PN_READABLE, PN_WRITABLE, etc changes to the selector (if
> any) for each call
>   Do an elaborate tear down on pn_io_free (drain completions, force
> close dangling sockets)
>
> Regarding the documentation, I looked at Dispatch, which had been
> using Proton in a multi-threaded manner for some time with
> considerable success.  The old driver.c (now deprecated) allowed
> simultaneous threads to do
>
>   pn_connector_process()  <- but no two threads sharing a
> connector/transport/socket
>   pn_driver_wait_n() combined with pn_connector_next() <- only one
> thread waiting/choosing at a time
>   pn_driver_wakeup() <- any thread, any time, to unstick things
>   everything else (listen accept connect) considered non-thread safe
>
> which provides plenty of parallelism if you have more than one connection.
>
> The documentation I wrote tried to say that you could do that much on
> any platform, but no more (without risking undefined behavior).
> Things (and the documentation) get further complicated by supporting
> external loops, which prevents the use of IO completion ports
> completely for a given pn_io_t and uses a different set of system
> calls.
>
> Perhaps the doc restrictions could be summarized as:
>
>   One pn_io_t/pn_selector_t, one thread -> no restrictions
>   One pn_io_t/pn_selector_t, multi threads -> limited thread safety
> (Dispatch)
>   One pn_io_t, no pn_selector_t, external loop, one thread -> no
> restrictions
>   One pn_io_t, no selector, external loop, multi threads -> ???
>   multiple pn_io_t: doable, but sockets must stick to one pn_io_
>
>
> Some difficulties you might not expect: Linux doesn't care if sockets
> move between selectors, or if one thread is reading on a socket while
> another is writing to the same one.  Simple things like this would
> have major design and performance implications for Proton on Windows.
>

It sounds like one way or another we need at least some design changes. I
don't think it's workable to have overlapping/close but distinct semantics
for the API on different platforms (e.g. you can move sockets on one
platform but not on another). I'm starting to think we either need one
platform to precisely and fully emulate the semantics of the other
platform, or they both need to implement some interface that is slightly
higher level and can better accommodate the differences.
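
One possible shape for the "slightly higher level interface" option, purely
as an illustration: every name below (io_backend_ops_t, stub_state_t,
stub_attach, stub_detach, stub_ops) is hypothetical and none of it exists in
Proton. The idea is that socket ownership is explicit in the interface, so
moving a socket is a detach followed by an attach on every platform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface that each platform backend would implement. */
typedef struct {
    const char *name;
    /* Take ownership of a socket; 0 on success, -1 on failure. */
    int (*attach)(void *state, int sock);
    /* Release a socket; 0 on success, -1 if it was not attached. */
    int (*detach)(void *state, int sock);
} io_backend_ops_t;

/* Stub backend state: remembers at most one attached socket. */
typedef struct {
    int sock;  /* -1 when nothing is attached */
} stub_state_t;

int stub_attach(void *state, int sock) {
    stub_state_t *s = state;
    assert(s != NULL);
    if (sock < 0 || s->sock != -1) return -1;  /* invalid or already owned */
    s->sock = sock;
    return 0;
}

int stub_detach(void *state, int sock) {
    stub_state_t *s = state;
    assert(s != NULL);
    if (sock < 0 || s->sock != sock) return -1;  /* not attached here */
    s->sock = -1;
    return 0;
}

/* A real design would have one such table per platform. */
const io_backend_ops_t stub_ops = { "stub", stub_attach, stub_detach };
```

Callers would only go through the table, so a backend that cannot move
sockets cheaply and one that can would still present the same rules.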

--Rafael
