Thanks for writing this up and exploring the different options, Akash! I left some comments in the doc.

It seems to me the Windows thread pool API is a mix of "event" processing (timers, I/O) as well as a work queue. Since libprocess already provides a work queue via `Process`es, there's some overlap there. I assume that using the event processing subset of the Windows thread pool API along with just 1 fixed thread is essentially equivalent to having a "Windows event loop"? We won't be using the work queue aspect of the Windows thread pool, right?
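To make the question concrete, here is a minimal sketch of what "event processing on just 1 fixed thread" could look like. It is written in portable C++ and does not use the Windows thread pool API; `SingleThreadEventLoop`, `post`, and `runDemo` are hypothetical names used only for illustration. The point is that completions (timers, I/O) are delivered as callbacks on one dedicated thread, with no general-purpose work queue exposed to callers.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for "event processing on one fixed thread".
// This is NOT the Windows thread pool API; it only models completions
// being delivered as callbacks on a single dedicated thread.
class SingleThreadEventLoop {
public:
  SingleThreadEventLoop() : worker_([this] { run(); }) {}

  ~SingleThreadEventLoop() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    ready_.notify_one();
    worker_.join();  // Pending events are drained before the thread exits.
  }

  // Called by whatever detects a completion (in a real design, the OS).
  void post(std::function<void()> completion) {
    assert(completion && "completion callback must be callable");
    {
      std::lock_guard<std::mutex> lock(mutex_);
      events_.push(std::move(completion));
    }
    ready_.notify_one();
  }

private:
  void run() {
    for (;;) {
      std::function<void()> event;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return stopping_ || !events_.empty(); });
        if (events_.empty()) return;  // Only reached once stopping is set.
        event = std::move(events_.front());
        events_.pop();
      }
      event();  // Runs outside the lock, always on the same thread.
    }
  }

  std::mutex mutex_;
  std::condition_variable ready_;
  std::queue<std::function<void()>> events_;
  bool stopping_ = false;
  std::thread worker_;  // Declared last so the other members exist first.
};

// Posts three completions and returns the order in which they ran.
std::vector<std::string> runDemo() {
  std::vector<std::string> order;
  {
    SingleThreadEventLoop loop;
    loop.post([&] { order.push_back("timer fired"); });
    loop.post([&] { order.push_back("read completed"); });
    loop.post([&] { order.push_back("write completed"); });
  }  // The destructor joins the loop thread, so `order` is safe to read.
  return order;
}
```

Because there is exactly one consumer thread, callbacks run one at a time in the order they were posted, which is the property one would expect from an event loop.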
On Thu, Apr 12, 2018 at 11:58 AM, Akash Gupta (EOSG) <aka...@microsoft.com.invalid> wrote:

> Hi all,
>
> A few weeks ago, we found serious issues with the current asynchronous IO
> implementation on Windows. The two eventing libraries in Mesos (libevent
> and libev) use `select` on Windows, which is socket-only there. In
> fact, both of these libraries typedef their socket type as SOCKET, so
> passing in an arbitrary file handle should not even compile. Essentially,
> they aren't suitable for general purpose asynchronous IO on Windows.
>
> This bug wasn't found earlier for a number of reasons. Mesos has a
> `WindowsFD` class that multiplexes and demultiplexes the different Windows
> file types (HANDLE & SOCKET) into a single type in order to work similarly
> to UNIX platforms, which use `int` for any type of file descriptor. Since
> WindowsFD is castable to a SOCKET, there were no compile errors for using
> HANDLEs in libevent. Furthermore, none of the Windows HANDLEs were opened
> in asynchronous mode, so they were always blocking. This means that
> currently, any non-socket IO in Mesos blocks on Windows, so we never got
> runtime errors for sending arbitrary handles to libevent's event loop.
> Also, some of the unit tests that would catch this blocking behavior, like
> those in io_tests.cpp, were disabled, so it was never caught in the unit
> tests.
>
> We wrote up a proposal on implementing asynchronous IO on Windows. The
> proposal is split into two parts that focus on stout and libprocess
> changes. The stout changes focus on opening and using asynchronous handles
> in the stout IO implementations. The libprocess changes focus on replacing
> libevent with another eventing library. We propose using the Windows
> Threadpool library, which is a native Win32 API that works like an event
> loop by allowing the user to schedule asynchronous events. Both Mesos and
> Windows use the proactor IO pattern, so they map very cleanly. We prefer
> it over other asynchronous libraries like libuv and ASIO, since they have
> some issues mentioned in the design proposal, like missing some features
> due to supporting older Windows versions. However, we understand the
> maintenance burden of adding another library, so we're looking for feedback
> on the design proposal.
>
> Link to JIRA issue: https://issues.apache.org/jira/browse/MESOS-8668
>
> Link to design doc:
> https://docs.google.com/document/d/1VG_8FTpWHiC7pKPoH4e-Yp7IFvAm-2wcFuk63lYByqo/edit?usp=sharing
>
> Thanks,
> Akash
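
The `WindowsFD` problem described in the quoted message (a wrapper that is castable to a SOCKET, so HANDLEs compile against a socket-only API) can be illustrated with a small portable sketch. This is not the actual Mesos `WindowsFD` class: `TaggedFD`, `FakeHandle`, `FakeSocket`, `registerSocket`, and `registerChecked` are hypothetical names, and the two `Fake*` aliases merely stand in for the Win32 HANDLE and SOCKET types so the example builds on any platform.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the Win32 types, so the sketch builds anywhere.
using FakeHandle = void*;           // plays the role of HANDLE
using FakeSocket = std::uintptr_t;  // plays the role of SOCKET

// Hypothetical wrapper that stores either kind of descriptor plus a tag.
class TaggedFD {
public:
  enum class Type { Handle, Socket };

  explicit TaggedFD(FakeHandle handle)
    : type_(Type::Handle), handle_(handle), socket_(0) {
    assert(handle != nullptr);
  }

  explicit TaggedFD(FakeSocket socket)
    : type_(Type::Socket), handle_(nullptr), socket_(socket) {}

  Type type() const { return type_; }

  // Implicit conversion, modelling "castable to a SOCKET": it compiles for
  // every TaggedFD, even one that wraps a file handle.
  operator FakeSocket() const {
    return type_ == Type::Socket
      ? socket_
      : reinterpret_cast<FakeSocket>(handle_);
  }

private:
  Type type_;
  FakeHandle handle_;
  FakeSocket socket_;
};

// Socket-only entry point, standing in for a select-based event library.
// It returns the raw value it received so callers can inspect it.
FakeSocket registerSocket(FakeSocket socket) { return socket; }

// Checked variant: refuses anything that is not tagged as a socket.
bool registerChecked(const TaggedFD& fd) {
  if (fd.type() != TaggedFD::Type::Socket) return false;
  registerSocket(fd);
  return true;
}
```

In this sketch, `registerSocket` accepts a handle-backed `TaggedFD` without any compile error because of the implicit conversion, whereas `registerChecked` inspects the tag and rejects it at run time.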