[gwt-contrib] RFC: Web worker proposal and proof of concept

Brendan Kenny Thu, 08 Apr 2010 18:06:02 -0700

This might be a bit unusual. I have a fairly reasonable proposal for
supporting web workers in GWT proper, and a working proof of concept
that shouldn't come anywhere near the core toolkit. Everyone seems
busy with what looks like 2.1 and the upcoming I/O, so I'm not
expecting miracles, but I would appreciate any wider perspective and
maybe some Compiler insight to make the implementation less of a
complete hack.


A few months ago I put together a module to get workers working nicely
with the GWT compiling/linking process. It was originally just on a
bit of a lark, but it's proven so easy to use (thanks to GWT, not me)
that I've ended up developing a few projects with it, two examples of
which I've posted over with the code
(http://code.google.com/p/gwt-ns/). I finally have some time to come
back to the module and finish
some lingering issues, but thought I might try and get some other
perspectives first. If there's a GWT 2.1 web worker skunkworks project
already in progress, please let me know now =)

To give credit where it's due: several files came from the Speedtracer
worker overlay code and one or two files came from GALGWT. I've
preserved copyright notices everywhere, and everything else is also
Apache 2.0 licensed. I also finally got around to watching 2008's GWT
Extreme! and found out my compilation scheme is very similar to what
Ray Cromwell was up to two years ago (which probably happens a lot).
He called it the Generator+Linker+Generator+Linker pattern, though in
this case it's more Generator+Linker(+Generator+Linker)*.

Anyway, like Ray Ryan, I wish this were a wave, but I'll stick to the
board for now. I tried to keep it short, but it got rather long, so if
there is interest I can make it look a bit more like an actual design
document in a wave and hook it up to the water cooler.

=== Motivation ===
Web workers, as currently specified
http://www.whatwg.org/specs/web-workers/current-work/
have two primary requirements. They are separate Javascript files,
loaded in a worker constructor, that
1. Don't access the DOM
2. Don't share state or context with other execution contexts (e.g.
their parent script).
Once a worker is loaded and running, the worker and its parent can
pass messages to each other, usually in JSON form.

At least for transitional worker support, I'm suggesting a third
condition
3. No single code execution will throw a slow script warning, even
when run *normally* in an older/slower browser.
This does necessarily limit functionality, e.g. ruling out the usual
worker examples of blocking I/O and long running calculations. The
first isn't really welcome in the GWT world, anyways, and really any
synchronous operation is a bad fit for GWT/Javascript. This condition
can also just be considered transitional; I'll go with it for now and
just assert that another level of deferred binding could allow more
flexibility on this point.

Accepting these three conditions, a true worker object becomes
functionally equivalent to a simple isolated object with only post and
receive message methods in its exposed interface (an Actor, if you're
into that sort of thing). In practice, the only difference is that
some newer platforms will execute the native worker off the main
thread.

In exchange for some loss of flexibility, instead of compiling the
workers separately, permutations for non-worker-supporting browsers
can load the worker code into an asynchronous wrapper (I'm calling it
a proxy, but I'm sure there's a more descriptive pattern name out
there) and run it on the main thread. This gets you
-- A single code base, regardless of browser worker support
-- Full development mode support, for the parent and worker scripts,
without any OOPHM alterations. The generator just loads the non-worker
version of the worker object, regardless of browser.

=== Goals ===
-- Enable the compilation of a GWT worker module to a valid worker
script (per current specification).
-- Allow any GWT module to instantiate worker modules as a worker
object, regardless of target platform's support for native javascript
workers
-- This includes allowing workers to themselves instantiate sub-
workers, including from their own module (per spec).
-- Absolutely minimal overhead to ensure gain from use of workers
-- Ensure the same behavior from emulated workers as from native
workers
-- Enable normal development-mode debugging of workers and parent
program as they execute.
-- Simplest possible construction of worker objects. If a worker
module is on the classpath, the module name is enough information to
create the worker object, since no state or context will be shared
except what is explicitly passed in a message string.
-- Creating a native worker should result in only a single server
request.
-- Worker script files should be aggressively cacheable.
-- Adding workers to a project should be as transparent as adding any
other module to the build process (excepting logged information and
extended compilation time)
-- Adding workers should not alter the behavior of any other Linkers
defined in a project (most importantly the std, sso, and xs primary
linkers)

=== Potential Future Goals ===
-- Different entry points for emulated and native workers.
-- The option to have the generator compile a worker module (or
specify an already compiled script) for development mode to debug
behavior (but not Java source) of native worker.
-- Built-in IPC to allow GWT compiler to better do its magic

=== Current Implementation ===
**Implementation goals**
-- As stated, no change for other linkers.
-- No patching of GWT (this necessitated some workarounds).

**Valid Worker Modules**
Implicit requirements to compile a module as a worker (currently
unchecked)
-- No DOM access
-- No shared state or context with other execution contexts.
-- No SSW-generating code

Explicit requirements to compile a module as a worker (currently
enforced)
-- Addition of a primary linker which packages the resulting script in
a simple bootstrapping closure
-- A single specified entry point which extends WorkerEntryPoint and
implements the required methods
-- Properties set such that only one permutation is produced
-- No code splitting.
The last three aren't set in stone. For instance, multiple
permutations might be needed for i18n reasons. Support is certainly
possible, but makes the script selection process more complicated than
the one described below. By contrast, code splitting seems a little
strange in a worker context, but importScripts() is practically built
in support for it.

**To Use a Worker Module**
Currently, the canonical name of a valid Worker module is all the
information needed. Example usage:
@WorkerModuleDef("pkg1.pkg2.ModuleName")
interface MyWorkerFactory extends WorkerFactory { }
MyWorkerFactory factory = GWT.create(MyWorkerFactory.class);
...
Worker myWorker = factory.createAndStart();

An already running worker object is returned. Messages can be passed
to the worker via postMessage(), handlers can be attached to listen
for messages and errors from the worker, and the worker can be stopped
at any time by calling terminate().

**Non-native Implementation**
For platforms without worker support, the Generator creates a new
instance of the module's specified entry point and wraps it (as
described above). The wrapper is returned. Messages passed into and
coming out of the wrapper are queued to decouple worker and parent
execution contexts.
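
To make the queueing concrete, here is a rough sketch in plain Java
(no GWT dependencies) of a wrapper that behaves this way. It is not
the code from the module: WorkerBody, EmulatedWorker and drain() are
names made up for illustration, and drain() just stands in for
whatever deferred scheduling the real wrapper would use to run on a
later turn of the event loop.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical stand-in for a worker module's entry point. */
interface WorkerBody {
  void onMessage(String message, Consumer<String> reply);
}

/** Emulated worker: same post/receive surface, run on the caller's thread. */
final class EmulatedWorker {
  private final WorkerBody body;
  private final Consumer<String> parentHandler;
  private final Queue<Runnable> pending = new ArrayDeque<>();
  private boolean terminated;

  EmulatedWorker(WorkerBody body, Consumer<String> parentHandler) {
    this.body = body;
    this.parentHandler = parentHandler;
  }

  /** Queues the message; the worker body never runs inside this call. */
  void postMessage(String message) {
    if (terminated) {
      return;
    }
    pending.add(() -> body.onMessage(message, this::postToParent));
  }

  /** Replies are queued too, so the parent handler is also decoupled. */
  private void postToParent(String message) {
    if (terminated) {
      return;
    }
    pending.add(() -> parentHandler.accept(message));
  }

  /** Drops anything still queued and ignores later messages. */
  void terminate() {
    terminated = true;
    pending.clear();
  }

  /** Assumption: stands in for later event-loop turns delivering the queue. */
  void drain() {
    Runnable next;
    while ((next = pending.poll()) != null) {
      next.run();
    }
  }
}
```

The point of the queue is that postMessage() returns before the worker
sees the message, which is the same ordering a parent script would
observe with a native worker.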

**Development Mode/Debugging**
Worker emulation that is "close enough" to a native implementation can
be used in its place. This means that the main script and all workers
are run in (fast) java-land and can be debugged there. This is
currently accomplished by development mode always triggering emulated
support, regardless of the browser used.

**Native Implementation**
This is where the build process becomes a little unusual. The general
overview: the generated factory uses a native method to return a
Javascript worker object, created from a URL that points to where the
worker module's compiled script will be. All worker functionality is
handled by native methods. The Generator also emits a
WorkerRequestArtifact with a reference to the indicated module. The
associated (pre) Linker collects all requests and runs another GWT
Compiler in a second process to compile the worker modules, then
inserts the finished scripts into the correct directory.

This last point causes some issues, mostly because workers can
arbitrarily create more workers.
1. Strong naming: To allow caching, worker scripts must be strongly
named. But because workers can spawn workers within themselves by
loading from a url, a change to one worker, which causes a change to
its strong name, will in turn cause a change in the strong name of its
parent. Because the worker creation graph could potentially be fully
connected, I just took a page out of code splitting and workers are
found at /workerjs/strong_name_over_all_workers/modulename.cache.js.

Conveniently, since workers are loaded by relative url, the strong
name isn't needed by them, just the module name. However, the parent
module does need the full strong name, and--unlike code splitting--
because all permutations (currently) share a set of workers, the name
is unrelated to any particular permutation's strongname. It is
admittedly a complete hack, but I currently insert a placeholder
string in the Generator stage, then do search and replace in the
compilation result in the pre-linker, after the worker scripts have
been compiled and linked. A far better solution would be something
like how the strongname of a permutation is included within the
emitted script, but without altering the primary linker or selection
scripts I'm not sure how this could be accomplished.

2. A new process and compilation every time a worker is needed isn't
acceptable when workers can create workers can create workers....
Since workers only need to know the module name, when a module detects
that it is being compiled in the secondary level, it just sends a
message back up to the primary level to enqueue the request for the
worker script it needs, then just loads from the relative url (no
strong name needed) and relies on the parent compilation process to
create the script.
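
The request handling in point 2 amounts to a worklist with
de-duplication. A minimal sketch, assuming a hypothetical compileOne
callback that compiles one module and reports which worker modules it
asked for (in the real thing that is the message sent back up from the
secondary compile):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class WorkerCompileQueue {
  /**
   * Compiles every requested worker module exactly once, including
   * modules that are only requested by other workers.
   *
   * @param rootRequests modules requested by the main module
   * @param compileOne hypothetical callback: compiles a module and
   *     returns the worker modules that module requested
   * @return module names in the order they were compiled
   */
  static List<String> compileAll(
      Collection<String> rootRequests,
      Function<String, Collection<String>> compileOne) {
    Set<String> seen = new LinkedHashSet<>();
    Deque<String> queue = new ArrayDeque<>();
    for (String module : rootRequests) {
      if (seen.add(module)) {
        queue.add(module);
      }
    }
    while (!queue.isEmpty()) {
      String module = queue.remove();
      for (String requested : compileOne.apply(module)) {
        // Already queued or compiled modules are skipped, so cycles
        // in the worker creation graph terminate.
        if (seen.add(requested)) {
          queue.add(requested);
        }
      }
    }
    return new ArrayList<>(seen);
  }
}
```

Because a module is only enqueued the first time it is seen, a worker
that spawns itself or its own parent costs nothing extra.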

However, even with improvements to these two points, fundamentally
this process is very redundant. Creating a worker is actually a
somewhat simpler form of code splitting, except instead of shared code
all going in the first loaded fragment, it needs to be part of every
fragment that uses it. In this implementation, because older-browser
permutations load the worker code directly, full ASTs are being
generated, used for some permutations, discarded, and then created
again in a second process. Ridiculous, but it was the most efficient
way I could find to do the job without patching the core library. I
also didn't want to get into creating a code-splitting-like feature
that doesn't interfere with actual code splitting, not to mention
linkers, selection scripts, etc etc, which this approach is able to
side step.
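
Going back to point 1 for a moment, the naming and placeholder steps
can be sketched roughly like this. This is an illustration rather than
the module's code: the placeholder string, the class name and the
choice of MD5 over the sorted module names and scripts are all
assumptions on my part.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

final class WorkerNaming {
  /** Hypothetical token the Generator would emit into the parent script. */
  static final String PLACEHOLDER = "__WORKER_STRONG_NAME__";

  /**
   * One name over all worker scripts: a change to any worker changes
   * the shared directory name, whichever worker spawns it.
   */
  static String strongNameOverAll(Map<String, String> scriptsByModule) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      // Sort by module name so the result does not depend on map order.
      for (Map.Entry<String, String> e : new TreeMap<>(scriptsByModule).entrySet()) {
        md5.update(e.getKey().getBytes(StandardCharsets.UTF_8));
        md5.update((byte) 0); // separator between name and script
        md5.update(e.getValue().getBytes(StandardCharsets.UTF_8));
        md5.update((byte) 0); // separator between entries
      }
      StringBuilder hex = new StringBuilder();
      for (byte b : md5.digest()) {
        hex.append(String.format("%02X", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  /** The search-and-replace step, run after the workers are linked. */
  static String resolve(String parentScript, String strongName) {
    return parentScript.replace(PLACEHOLDER, strongName);
  }
}
```

Only the parent script goes through resolve(); the workers themselves
load siblings by relative url and never see the name.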

=== ===
Like I said, the code is still early and many features are currently
unimplemented. Notably, I've only got it working on the simplest build
system possible (Eclipse plugin on a single computer), so I'm not sure
what will happen in more complicated setups (custom ant scripts,
maven, etc).

If that wasn't enough, I've written a bit more here
http://extremelysatisfactorytotalitarianism.com/blog/?p=645
http://extremelysatisfactorytotalitarianism.com/blog/?p=932
The first goes into how one of the samples works. The second makes the
argument that, if you're already using the MVP pattern or something
like an event bus, the transition to using workers may tip heavily to
the benefit side of a cost/benefit analysis. The examples I've
released publicly are mostly just toys, but hopefully I can talk about
something more enterprise-y soon.

For me, I've found workers to be incredibly easy to use and integrate
into my own projects, far easier than I expected. This is partly due
to the dead-simple worker spec, but most of it has to do with the
completely amazing toolset you guys have created and continue to
improve. Magic from good engineering. I'm sure you'll be hearing that
sort of thing a lot more in a month or so, but I thought I'd get it in
early.

Again, I'd appreciate any thoughts in this busy time. If I'm missing
something incredibly obvious, let me down easy.

