Re: Modules: Name capture

Kam Kasravi Wed, 26 May 2010 19:34:49 -0700

Hi Ihab

This problem has many similarities to the XML / WSDL world.
The use of namespaces and versioning has been leveraged there to
disambiguate names which otherwise could be occluded by 
changing interfaces. That being said, I like your suggestion of 
'renaming at the boundary' where one could possibly 
disambiguate like names. Independent of versioning, 
which is how many large software systems guarantee 
integrity, we need something like namespaces or 
your 'boundary renaming' to avoid name collisions. 
Section 4.2.1 in XSchema describes import, include and 
redefine as mechanisms to allow composition of multiple 
schemas and is worth reading.


thanks
kam







________________________________
From: "ihab.a...@gmail.com" <ihab.a...@gmail.com>
To: es-discuss Steen <es-discuss@mozilla.org>
Sent: Wed, May 26, 2010 3:39:10 PM
Subject: Modules: Name capture

Hi folks,

As promised, this is the first major issue I wish to raise regarding
the Simple Modules strawman. This point was first brought to our
collective attention by my colleague, Jasvir Nagra.

I'm describing this from scratch here, even though this came up
previously on the list, to help anyone who has not been following the
previous thread in detail get a handle on the issues. I've divided it
into sections to avoid tl;dr for people who are already up to speed.

== Motivating factors we agree upon ==

The Web is large, and full of code. For any given corpus, there may be
millions of copies on various people's servers; hundreds of forks on
github; and dozens of "officially supported" versions on the Website
of the maintainers. This presents a distributed version matching and
*naming* problem quite unlike any other we have encountered so far in
software development. Software is now created as casually as Web
pages.

All of us who are interested in the module problem seem to agree (and
please correct me if I am mistaken in this or any other collective
claim) that even URL string equality is not a good metric of whether
the resource found at that URL is "the same": on the one hand, the
stuff retrieved via a URL can change from one moment to the next; and
on the other hand, distinct URLs may validly refer to bitwise
identical code.

All of us also seem to agree that we cannot brush this problem under
the rug completely: we cannot *completely* relegate it to off-spec
"configuration management" or "build process". When one is composing
software from various independent sources, this software must
sometimes make demands for loading things that the parent loader did
not even know about. Yet the parent loader must sometimes provide
modules to what is being loaded for convenience and unification. So,
if I write an application that loads modules M1 and M2:

* I may want to provide a common module jQuery to M1 and M2; but also

* In order to do its work, M1 may want to fetch some module Z of which
I have never heard.

To put it another way, modules on the Web must be able to wire
themselves, and compute, over the Web as it is, rather than over the
small set of software that happens to be "installed" in (say) the
"/usr/lib" directory of the local system. In Python, I can simply
"import smtplib". On the Web, the question is, "which one"?

== Current Simple Modules solution ==

The Simple Modules strawman shows a rather clever solution to this
problem. It distributes the mappings -- from agreed-upon names to
gnarly URLs -- across the codebase in a fine-grained fashion. At the
most basic level, I can map the name "jQuery" to my chosen copy of the
jQuery library, and use it as below. Nested modules can choose their
own versions of (say) YUI, and use them as well.

  module jQuery = load "http://.../jquery-1.3.2.js";
  module Foo {
    module YUI = load "http://.../yui-3.1.1.js";
    module Bar {
      import jQuery.ajax;
      import YUI.Accordion;
      ajax(...); Accordion(...);
    }
  }

This also works if I know that some library simply uses the name
"jQuery", and I map that to my own chosen copy. In the following
example:

  // somelib.js
  import jQuery.ajax;
  ajax(...);

  // My code
  module jQuery = load "http://.../jquery-1.3.2.js";
  module Somelib = load "http://.../somelib.js";

the code in "somelib.js" will pick up the jQuery that I have defined
prior to the "load".

The clever part of this is, again, the decentralized name assignments.
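
As a rough illustration only, the lookup behaviour described above can
be modelled in today's JavaScript as a chain of scopes. The names
makeScope, define, resolve and somelibBody are hypothetical and do not
come from the strawman; the sketch assumes that a loaded file resolves
module names through the scope of the file that loaded it.

```javascript
// Hypothetical model: a scope maps module names to module objects and
// falls back to its parent scope, as lexical scoping does.
function makeScope(parent) {
  return { bindings: new Map(), parent };
}

function define(scope, name, moduleObject) {
  scope.bindings.set(name, moduleObject);
}

function resolve(scope, name) {
  for (let s = scope; s !== null; s = s.parent) {
    if (s.bindings.has(name)) return s.bindings.get(name);
  }
  throw new ReferenceError("unbound module name: " + name);
}

// "My code": bind jQuery first, then load somelib.js in a child scope.
const topScope = makeScope(null);
define(topScope, "jQuery", { ajax: () => "ajax from my jQuery 1.3.2" });

// Stand-in for the body of somelib.js: `import jQuery.ajax; ajax(...);`
function somelibBody(scope) {
  const { ajax } = resolve(scope, "jQuery");
  return ajax();
}

const somelibResult = somelibBody(makeScope(topScope));
console.log(somelibResult); // "ajax from my jQuery 1.3.2"
```

The loaded code never names a URL; it sees whichever jQuery the
enclosing scope bound.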

== Pitfalls in the Simple Modules solution ==

The problem is that, in the space of module names, the current Simple
Modules strawman introduces a hazard of inadvertent name capture
(though it does clearly separate other names such as "var", "const"
and "function" declarations). An example is:

  // zero.js:
  module jQuery = load 'jquery.js';
  module Drawing = load 'footgun.js';
  module One = load 'one.js';

  // one.js:
  import jQuery.ajax;
  module Two = load 'two.js';

  // two.js:
  import jQuery.ajax;

At the time that "one.js" was written, "two.js" did not contain a
reference to Drawing. Now, unbeknownst to the author of "one.js",
"two.js" changed and now refers to something named Drawing, which it
expected to draw a picture:

  // two.js:
  import jQuery.ajax;
  import Drawing.draw;
  draw(); // intended to draw a picture

Unfortunately, it is now actually using the gun library, and thus
breaks the correctness of the system.
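
The capture can be reproduced in a small JavaScript model, offered
here as a sketch under the assumption that each loaded file inherits
the module names of its loader. The helper childScope and the stand-in
module objects are hypothetical; the prototype chain plays the role of
the nested scopes.

```javascript
// Hypothetical model: each file's scope inherits from its loader's
// scope via the prototype chain.
function childScope(parent, bindings) {
  return Object.assign(Object.create(parent), bindings);
}

const jquery = { ajax: () => "ajax" };
const footgun = { draw: () => "BANG" };

// zero.js binds jQuery and Drawing (the footgun), then loads one.js.
const zeroScope = childScope(null, { jQuery: jquery, Drawing: footgun });
// one.js binds no Drawing of its own, then loads two.js.
const oneScope = childScope(zeroScope, {});
const twoScope = childScope(oneScope, {});

// Stand-in for the changed two.js: `import Drawing.draw; draw();`
function twoBody(scope) {
  if (!("Drawing" in scope)) throw new ReferenceError("Drawing is unbound");
  return scope.Drawing.draw();
}

const captured = twoBody(twoScope);
console.log(captured); // "BANG": the footgun fires and no error is raised
```

Nothing in "one.js" mentions Drawing, yet the lookup succeeds by
falling through to the binding made in "zero.js".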

The key things to note here are:

1. Under *mutation* of the code of "two.js" on the Web, which we agree
will happen (see above regarding mistrust of URL equality), the author
of "one.js" has no defense against this accidental capture of the name
"Drawing". The author of "one.js" cannot prevent this capture by
controlling the environment of "two.js", nor can the author of "two.js"
foresee all possible environments, such as that of "one.js", in which
their code may be embedded.

2. The above is an easily seen (not to mention crippling...)
manifestation of this problem. More generally, in a large system with
many independently named versions of the same library floating around,
more subtle captures, with harder-to-debug problems, may occur.

3. Since we agree we cannot control "two.js" in this scenario, our
choice here is whether to fail subtly or to fail fast. I claim we must
fail fast.

There are some fail-fast solutions to this problem that come to mind
immediately. I encourage us to brainstorm others as well.

== Solution 1: Explicit sub-module environment ==

Whenever there is a transition from one compilation unit to the other
-- i.e., across any "load" -- we should explicitly specify the imports
the module is allowed to inherit. No other imports may cross the
"load" boundary. So, we would rewrite the example above as:

  // zero.js:
  module jQuery = load 'jquery.js';
  module Drawing = load 'footgun.js';
  module One = load 'one.js' with {Drawing, jQuery};

  // one.js:
  import jQuery.ajax;
  module Two = load 'two.js' with {jQuery};

  // two.js:
  import jQuery.ajax;
  import Drawing.draw; // => static error!!!
  draw();

Clearly, we could also support renaming at the boundary, like this:

  // zero.js:
  module JQ = load 'jquery.js';
  module Gun = load 'footgun.js';
  module One = load 'one.js' with {Drawing: Gun, jQuery: JQ};
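
A minimal JavaScript sketch of this boundary rule follows. The helpers
loadWith and importFrom are hypothetical, and the sketch reports the
failure at run time, whereas the proposal above calls for a static
error. The grants object maps the name seen inside the loaded file to
a name in the loader's environment, which also covers renaming.

```javascript
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Hypothetical helper: build the only environment a loaded file may see.
function loadWith(loaderEnv, grants, body) {
  const env = Object.create(null);
  for (const [innerName, outerName] of Object.entries(grants)) {
    if (!has(loaderEnv, outerName)) {
      throw new ReferenceError("loader has no module named " + outerName);
    }
    env[innerName] = loaderEnv[outerName];
  }
  return body(Object.freeze(env));
}

// Hypothetical stand-in for `import Name.member`.
function importFrom(env, name) {
  if (!has(env, name)) {
    throw new ReferenceError(name + " was not passed across the load boundary");
  }
  return env[name];
}

// zero.js: binds JQ and Gun, renames them when loading one.js.
const zeroEnv = { JQ: { ajax: () => "ajax" }, Gun: { draw: () => "BANG" } };

let failure = null;
const outcome = loadWith(zeroEnv, { Drawing: "Gun", jQuery: "JQ" }, (oneEnv) => {
  // one.js: uses jQuery, then passes only jQuery on to two.js.
  importFrom(oneEnv, "jQuery").ajax();
  return loadWith(oneEnv, { jQuery: "jQuery" }, (twoEnv) => {
    importFrom(twoEnv, "jQuery").ajax();
    try {
      return importFrom(twoEnv, "Drawing").draw(); // not granted to two.js
    } catch (e) {
      failure = e.message;
      return "failed fast";
    }
  });
});

console.log(outcome + ": " + failure);
// "failed fast: Drawing was not passed across the load boundary"
```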

== Solution 2: Catalog per loader ==

A module loader represents a community of mutually independent module
instances which together form a coherent subsystem. If there is to be
any sharing of module instances, and absent URL equality, these
modules must be written with some knowledge of the "community" in
which they live.

Perhaps a good way to understand this is to make the analogy with
Linux distros. A distro effectively maps memorable, agreed-upon names
like "libmpeg2" to concrete software resources. There is nothing
keeping a dozen programmers in the wild from building a dozen
different things and calling them "libmpeg2" -- but, within the Linux
community, and specifically within a distro, there is only one
"libmpeg2". Programs declare their dependency on it, often with a
version identifier.

If we view modules in a loader in the same way, then each loader
should contain a *single* mapping -- call it a catalog -- from
memorable names to concrete resources (perhaps identified by URL).
Each catalog is an implicit community, or agreement point:

  * The catalog of "biology software" hosted at
http://biojavascript.org/catalog.json

  * The catalog of "canvas widgets" hosted at http://cwidgets.org/cat.json

  * A meta-catalog of "general stuff" hosted at http://allmyjs.org/root.json
        Perhaps there are distinct versions:
              http://allmyjs.org/distros/distro3.1.8.json
              http://allmyjs.org/distros/distro4.2.9.json

All modules in a loader would be expected to use the same catalog.
Catalogs could refer to one another, and so reuse one another's
bindings. This would allow all programmers to predict the effect of
using a new name. In the example of the modified "two.js" previously,
the statement:

  import Drawing.draw;

would work but, by referring to the same catalog, the authors of all
three of "zero.js", "one.js" and "two.js" would have agreed that the
symbol Drawing stands for something that operates a footgun and treat
it with due care. The catalog would contain an entry like:

  {
    Drawing: "http://.../footgun.js",
    ...
  }

That all said, I don't yet have a good solution for how a module would
declare that it is defined relative to a specific catalog. It's not
enough to add an extra argument -- the catalog -- to the constructor
of a module loader. The individual modules would have to specify their
dependency on the catalog. This requires extra syntax and machinery. I
am loath to add either.
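
For illustration, catalog lookup with catalogs that refer to one
another might be sketched in JavaScript as below. The functions
makeCatalog, lookup and makeLoader are hypothetical, as is the split
into a "general" and a "widgets" catalog. The sketch hands the catalog
to the loader, so it deliberately leaves open the question raised
above of how an individual module would declare its catalog.

```javascript
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Hypothetical catalog: `entries` maps names to URLs; `parents` lists
// other catalogs whose bindings are reused when a name is not found.
function makeCatalog(entries, parents = []) {
  return { entries, parents };
}

function lookup(catalog, name) {
  if (has(catalog.entries, name)) return catalog.entries[name];
  for (const parent of catalog.parents) {
    const url = lookup(parent, name);
    if (url !== undefined) return url;
  }
  return undefined;
}

// Hypothetical loader: every module it loads resolves names through
// the same single catalog.
function makeLoader(catalog) {
  return {
    resolve(name) {
      const url = lookup(catalog, name);
      if (url === undefined) {
        throw new ReferenceError(name + " is not in the catalog");
      }
      return url;
    },
  };
}

const general = makeCatalog({ jQuery: "http://.../jquery-1.3.2.js" });
const widgets = makeCatalog({ Drawing: "http://.../footgun.js" }, [general]);
const loader = makeLoader(widgets);

console.log(loader.resolve("Drawing")); // "http://.../footgun.js"
console.log(loader.resolve("jQuery"));  // "http://.../jquery-1.3.2.js"
```

A name missing from every catalog fails at resolve time instead of
being captured from an enclosing scope.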

== Solution 3: Forming a more generative Union ==

Going back to basics, let's ask ourselves: Why is this such a problem
in the first place? In other words, why is it so important that
"one.js" and "two.js" use the *same* jQuery and Drawing modules? There
are a variety of answers, including:

* Performance optimization: The code can be shared and memory saved.

* Shared state: Each module contains important shared state (such as
cached information about the DOM, or off-screen bitmaps used to
double-buffer a <canvas> widget module) which its clients need to
share in order to properly collaborate according to the module's
rules.

* Programmer familiarity: Programmers are accustomed to dividing up
their program into sections, each of which is logically a singleton in
"the system", and establishing communication between them.

What if we assumed that the result of "importing" a module is just the
code of the module, ready to be instantiated with external state? In
low-level terms, a module represents a "code segment" in memory which
may be shared between its instantiations.

The effect of this choice is that dependencies between software are
written using direct object passing, rather than attempts to denote
the "same" module. Each module specifies as free variables the objects
it requires from its caller and, when it loads another module, it does
so with no expectation that what it gets is shared with anyone else.
So "two.js" could start out like:

  // two.js
  /** @require jQuery a jQuery instance */
  jQuery.ajax();

and could move to:

  // two.js
  /** @require jQuery a jQuery instance */
  module Drawing = load "http://.../picturesOfCats.js" with {
    jQuery: jQuery
  };
  jQuery.ajax();
  Drawing.draw();

The difference between this and Solution 1 (and the original Simple
Modules strawman) is that there are no promises made that are not
explicitly wired. With Solution 1, there is still a lack of clarity
about what a "singleton" module represents -- depending on how modules
are redefined down the chain of loadings, some singletons are more
single than others. With this solution, only object APIs may define
the behavior expected, and nothing is naturally expected to be a
singleton.
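
One way to picture this in plain JavaScript is to treat each module as
a function from its required objects to a fresh instance. This is a
sketch only: the functions picturesOfCats and two stand in for the
loaded files, and the drawCount state is invented for the example.

```javascript
// Hypothetical stand-in for picturesOfCats.js: the module's code is a
// factory, and each call yields an instance with its own state.
function picturesOfCats({ jQuery }) {
  let drawCount = 0; // per-instance state, never shared implicitly
  return {
    draw() {
      drawCount += 1;
      jQuery.ajax();
      return drawCount;
    },
  };
}

// Hypothetical stand-in for two.js, which requires a jQuery instance
// and wires it explicitly into the module it loads.
function two({ jQuery }) {
  const Drawing = picturesOfCats({ jQuery });
  jQuery.ajax();
  return Drawing;
}

let ajaxCalls = 0;
const jQuery = { ajax: () => { ajaxCalls += 1; } };

const first = two({ jQuery });
const second = two({ jQuery });
first.draw();
const firstCount = first.draw();
const secondCount = second.draw();

console.log(firstCount, secondCount, ajaxCalls); // 2 1 5
```

The two Drawing instances do not share drawCount, while the jQuery
object is shared only because the caller passed the same object to
both.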

To put it another way, if the original Simple Modules and Solution 1
are taken to their logical conclusion, one must assume, because
modules may be redefined along the loading chain, that nothing is
*really* a singleton. If one is to code defensively against this
situation anyway, why not make this the default and gain the attendant
simplicity?

Let's revisit the points brought up earlier:

* Performance optimization: This is up to the implementation. For
example, a perfectly reasonable implementation can do a HEAD request
for every URL loaded and, if it detects no change, reuse the code.

* Shared state: Shared state is now represented explicitly using objects.

* Programmer familiarity: It's really not that bad. :) Programs
continue to be written with free variables, just as <script>s are.
Programmers learn to introduce concrete objects into the lexical scope
of their programs. And they write "export" statements to export
variables back.
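
The performance point above can be sketched as a small code cache. In
this JavaScript sketch, makeCodeCache is a hypothetical name, and the
head and get parameters are injected stand-ins for real HTTP HEAD and
GET requests; head is assumed to return some validator (for example an
ETag value) that changes when the resource changes.

```javascript
// Hypothetical code cache keyed by URL. `head(url)` returns a validator
// for the current resource; `get(url)` returns its source text.
function makeCodeCache(head, get) {
  const cache = new Map(); // url -> { validator, source }
  return function fetchCode(url) {
    const validator = head(url);
    const hit = cache.get(url);
    if (hit !== undefined && hit.validator === validator) {
      return hit.source; // no change detected: reuse the code
    }
    const source = get(url);
    cache.set(url, { validator, source });
    return source;
  };
}

// Simulated server state for the example.
let version = "v1";
let gets = 0;
const fetchCode = makeCodeCache(
  () => version,
  (url) => {
    gets += 1;
    return "/* " + url + " @ " + version + " */";
  }
);

const url = "http://.../jquery-1.3.2.js";
fetchCode(url);
fetchCode(url);      // unchanged, so the cached code is reused
version = "v2";
const latest = fetchCode(url); // changed, so the code is fetched again

console.log(gets); // 2
```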

To optimize this a bit, it is possible to introduce a concept of
"packages" (under active debate in CommonJS at the moment) to gather
things up. This improves, but does not intrinsically modify, the model
presented here.

== Afterword ==

Some of the solutions I present here are similar to proposals I have
already made. In all sincerity, I cannot help that. But, if there are
other solutions extant, I assure you that my brain is open. :)

Ihab

-- 
Ihab A.B. Awad, Palo Alto, CA
_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
