Hi Ihab,

This problem has many similarities to the XML / WSDL world. The use of namespaces and versioning has been leveraged there to disambiguate names which otherwise could be occluded by changing interfaces. That being said, I like your suggestion of 'renaming at the boundary' where one could possibly disambiguate like names. Independent of versioning, which is how many large software systems guarantee integrity, we need something like namespaces or your 'boundary renaming' to avoid name collisions. Section 4.2.1 in XSchema describes import, include and redefine as mechanisms to allow composition of multiple schemas and is worth reading.

thanks
kam

________________________________
From: "ihab.a...@gmail.com" <ihab.a...@gmail.com>
To: es-discuss Steen <es-discuss@mozilla.org>
Sent: Wed, May 26, 2010 3:39:10 PM
Subject: Modules: Name capture

Hi folks,

As promised, this is the first major issue I wish to raise regarding the Simple Modules strawman. This point was first brought to our collective attention by my colleague, Jasvir Nagra. I'm describing this from scratch here, even though this came up previously on the list, to help anyone who has not been following the previous thread in detail get a handle on the issues. I've divided it into sections to avoid tl;dr for people who are already up to speed.

== Motivating factors we agree upon ==

The Web is large, and full of code. For any given corpus, there may be millions of copies on various people's servers; hundreds of forks on github; and dozens of "officially supported" versions on the Website of the maintainers. This presents a distributed version matching and *naming* problem quite unlike any other we have encountered so far in software development. Software is now created as casually as Web pages.

All of us who are interested in the module problem seem to agree (and please correct me if I am mistaken in this or any other collective claim) that even URL string equality is not a good metric of whether the resource found at that URL is "the same": on the one hand, the stuff retrieved via a URL can change from one moment to the next; and on the other hand, distinct URLs may validly refer to bitwise identical code.

All of us also seem to agree that we cannot brush this problem under the rug completely: we cannot *completely* relegate it to off-spec "configuration management" or "build process". When one is composing software from various independent sources, this software must sometimes make demands for loading things that the parent loader did not even know about.
Yet the parent loader must sometimes provide modules to what is being loaded for convenience and unification.

So, if I write an application that loads modules M1 and M2:

* I may want to provide a common module jQuery to M1 and M2; but also
* In order to do its work, M1 may want to fetch some module Z of which I have never heard.

To put it another way, modules on the Web must be able to wire themselves, and compute, over the Web as it is, rather than over the small set of software that happens to be "installed" in (say) the "/usr/lib" directory of the local system. In Python, I can simply "import smtplib". On the Web, the question is, "which one"?

== Current Simple Modules solution ==

The Simple Modules strawman shows a rather clever solution to this problem. It distributes the mappings -- from agreed-upon names to gnarly URLs -- across the codebase in a fine-grained fashion. At the most basic level, I can map the name "jQuery" to my chosen copy of the jQuery library, and use it as below. Nested modules can choose their own versions of (say) YUI, and use them as well.

  module jQuery = load "http://.../jquery-1.3.2.js";
  module Foo {
    module YUI = load "http://.../yui-3.1.1.js";
    module Bar {
      import jQuery.ajax;
      import YUI.Accordion;
      ajax(...);
      Accordion(...);
    }
  }

This also works if I know that some library simply uses the name "jQuery", and I map that to my own chosen copy. In the following example:

  // somelib.js
  import jQuery.ajax;
  ajax(...);

  // My code
  module jQuery = load "http://.../jquery-1.3.2.js";
  module Somelib = load "http://.../somelib.js";

the code in "somelib.js" will pick up the jQuery that I have defined prior to the "load". The clever part of this is, again, the decentralized name assignments.
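
The name lookup described in this section can be approximated in plain JavaScript. What follows is only a sketch of the scoping idea, not the strawman's actual semantics: the ModuleScope class and its define and resolve methods are hypothetical names invented for illustration, and the elided URLs are replaced by bare file names.

```javascript
// Hypothetical model, not part of the strawman: a module scope maps
// names to resources and falls back to its enclosing scope.
class ModuleScope {
  constructor(parent = null) {
    this.parent = parent;
    this.bindings = new Map();
  }

  // Bind a module name in this scope, like `module X = load "...";`.
  define(name, resource) {
    this.bindings.set(name, resource);
    return this;
  }

  // Look a name up here first, then in each enclosing scope.
  resolve(name) {
    if (this.bindings.has(name)) return this.bindings.get(name);
    if (this.parent !== null) return this.parent.resolve(name);
    throw new ReferenceError(`unbound module name: ${name}`);
  }
}

const topScope = new ModuleScope().define("jQuery", "jquery-1.3.2.js");
const fooScope = new ModuleScope(topScope).define("YUI", "yui-3.1.1.js");
const barScope = new ModuleScope(fooScope);

console.log(barScope.resolve("jQuery")); // bound at the top level
console.log(barScope.resolve("YUI"));    // bound in Foo
```

In this model, Bar sees both names because lookup walks outward through Foo to the top level, while the top level itself has no binding for YUI.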

== Pitfalls in the Simple Modules solution ==

The problem is that, in the space of module names, the current Simple Modules strawman introduces a hazard of inadvertent name capture (though it does clearly separate other names such as "var", "const" and "function" declarations). An example is:

  // zero.js:
  module jQuery = load 'jquery.js';
  module Drawing = load 'footgun.js';
  module One = load 'one.js';

  // one.js:
  import jQuery.ajax;
  module Two = load 'two.js';

  // two.js:
  import jQuery.ajax;

At the time that "one.js" was written, "two.js" did not contain a reference to Drawing. Now, unbeknownst to the author of "one.js", "two.js" changed and now refers to something named Drawing, which it expected to draw a picture:

  // two.js:
  import jQuery.ajax;
  import Drawing.draw;
  draw(); // intended to draw a picture

Unfortunately, it is now actually using the gun library, and thus breaks the correctness of the system. The key things to note here are:

1. Under *mutation* of the code of "two.js" on the Web, which we agree will happen (see above regarding mistrust of URL equality), the author of "one.js" has no defense against this accidental capture of the name "Drawing". The author cannot prevent this capture by controlling the environment of "two.js", nor can they foresee all possible environments such as "one.js" in which they may be embedded.

2. The above is an easily seen (not to mention crippling...) manifestation of this problem. More generally, in a large system with many independently named versions of the same library floating around, more subtle captures, with harder-to-debug problems, may occur.

3. Since we agree we cannot control "two.js" in this scenario, our choice here is whether to fail subtly or to fail fast. I claim we must fail fast.

There are some fail-fast solutions to this problem that come to mind immediately. I encourage us to brainstorm others as well.
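
The capture described above can be reproduced in a small plain-JavaScript model. This is a sketch under the assumption that a loaded file simply inherits every module name visible to its loader; the environment objects and the resolveImports helper are hypothetical and exist only for illustration.

```javascript
// Hypothetical model of inherited module names: each loaded file
// sees every name bound by the chain of files that loaded it.
const zeroEnv = Object.assign(Object.create(null), {
  jQuery: "jquery.js",
  Drawing: "footgun.js",
});
const oneEnv = Object.create(zeroEnv); // one.js inherits zero.js's names
const twoEnv = Object.create(oneEnv);  // two.js inherits one.js's names

// Resolve each module name a file imports against its environment.
function resolveImports(names, env) {
  const resolved = {};
  for (const name of names) {
    if (!(name in env)) {
      throw new ReferenceError(`unbound module name: ${name}`);
    }
    resolved[name] = env[name];
  }
  return resolved;
}

// two.js as originally written, then after it changed on the Web.
const twoBefore = resolveImports(["jQuery"], twoEnv);
const twoAfter = resolveImports(["jQuery", "Drawing"], twoEnv);

console.log(twoAfter.Drawing); // "footgun.js"
```

Nothing in "one.js" changed between the two calls, yet the second resolution silently succeeds and hands "two.js" the footgun library. That silent success is the fail-subtly behavior at issue.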

== Solution 1: Explicit sub-module environment ==

Whenever there is a transition from one compilation unit to the other -- i.e., across any "load" -- we should explicitly specify the imports the module is allowed to inherit. No other imports may cross the "load" boundary. So, we would rewrite the example above as:

  // zero.js:
  module jQuery = load 'jquery.js';
  module Drawing = load 'footgun.js';
  module One = load 'one.js' with {Drawing, jQuery};

  // one.js:
  import jQuery.ajax;
  module Two = load 'two.js' with {jQuery};

  // two.js:
  import jQuery.ajax;
  import Drawing.draw; // => static error!!!
  draw();

Clearly, we could also support renaming at the boundary, like this:

  // zero.js:
  module JQ = load 'jquery.js';
  module Gun = load 'footgun.js';
  module One = load 'one.js' with {Drawing: Gun, jQuery: JQ};

== Solution 2: Catalog per loader ==

A module loader represents a community of mutually independent module instances which together form a coherent subsystem. If there is to be any sharing of module instances, and absent URL equality, these modules must be written with some knowledge of the "community" in which they live.

Perhaps a good way to understand this is to make the analogy with Linux distros. A distro effectively maps memorable, agreed-upon names like "libmpeg2" to concrete software resources. There is nothing keeping a dozen programmers in the wild from building a dozen different things and calling them "libmpeg2" -- but, within the Linux community, and specifically within a distro, there is only one "libmpeg2". Programs declare their dependency on it, often with a version identifier.

If we view modules in a loader in the same way, then each loader should contain a *single* mapping -- call it a catalog -- from memorable names to concrete resources (perhaps identified by URL).
Each catalog is an implicit community, or agreement point:

* The catalog of "biology software" hosted at http://biojavascript.org/catalog.json
* The catalog of "canvas widgets" hosted at http://cwidgets.org/cat.json
* A meta-catalog of "general stuff" hosted at http://allmyjs.org/root.json

Perhaps there are distinct versions:

  http://allmyjs.org/distros/distro3.1.8.json
  http://allmyjs.org/distros/distro4.2.9.json

All modules in a loader would be expected to use the same catalog. Catalogs could refer to one another, and so reuse one another's bindings.

This would allow all programmers to predict the effect of using a new name. In the example of the modified "two.js" previously, the statement:

  import Drawing.draw;

would work but, by referring to the same catalog, the authors of all three of "zero.js", "one.js" and "two.js" would have agreed that the symbol Drawing stands for something that operates a footgun and treat it with due care. The catalog would contain an entry like:

  { Drawing: "http://.../footgun.js", ... }

That all said, I don't yet have a good solution for how a module would declare that it is defined relative to a specific catalog. It's not enough to add an extra argument -- the catalog -- to the constructor of a module loader. The individual modules would have to specify their dependency on the catalog. This requires extra syntax and machinery. I am loath to add either.

== Solution 3: Forming a more generative Union ==

Going back to basics, let's ask ourselves: Why is this such a problem in the first place? In other words, why is it so important that "one.js" and "two.js" use the *same* jQuery and Drawing modules? There are a variety of answers, including:

* Performance optimization: The code can be shared and memory saved.
* Shared state: Each module contains important shared state (such as cached information about the DOM, or off-screen bitmaps used to double-buffer a <canvas> widget module) which its clients need to share in order to properly collaborate according to the module's rules.
* Programmer familiarity: Programmers are accustomed to dividing up their program into sections, each of which is logically a singleton in "the system", and establishing communication between them.

What if we assumed that the result of "importing" a module is just the code of the module, ready to be instantiated with external state? In low-level terms, a module represents a "code segment" in memory which may be shared between its instantiations.

The effect of this choice is that dependencies between software are written using direct object passing, rather than attempts to denote the "same" module. Each module specifies as free variables the objects it requires from its caller and, when it loads another module, it does so with no expectation that what it gets is shared with anyone else. So "two.js" could start out like:

  // two.js
  /** @require jQuery a jQuery instance */
  jQuery.ajax();

and could move to:

  // two.js
  /** @require jQuery a jQuery instance */
  module Drawing = load "http://.../picturesOfCats.js" with { jQuery: jQuery };
  jQuery.ajax();
  Drawing.draw();

The difference between this and Solution 1 (and the original Simple Modules strawman) is that there are no promises made that are not explicitly wired. With Solution 1, there is still lack of clarity about what a "singleton" module represents -- depending on how modules are redefined down the chain of loadings, some singletons are more single than others. With this solution, only object APIs may define the behavior expected, and nothing is naturally expected to be a singleton.
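
The instantiate-with-explicit-objects idea can be sketched in today's JavaScript by treating a module as a factory function. This is only an illustration under that assumption, not the proposed syntax: makeJQuery and makeDrawing are hypothetical stand-ins for loaded module code, and the destructured parameter plays the role of the "with { jQuery: jQuery }" clause.

```javascript
// Hypothetical stand-in for a loaded jQuery module: calling the
// factory yields a fresh instance with its own request log.
function makeJQuery() {
  const requests = [];
  return {
    ajax(what) { requests.push(what); },
    requests,
  };
}

// Hypothetical stand-in for `load "..." with { jQuery: jQuery }`:
// the module receives the objects it requires as explicit arguments.
function makeDrawing({ jQuery }) {
  let drawCount = 0; // state private to this instantiation
  return {
    draw() {
      jQuery.ajax("picture");
      drawCount += 1;
      return drawCount;
    },
  };
}

// Sharing happens only where an object is explicitly passed along.
const sharedJQuery = makeJQuery();
const drawingA = makeDrawing({ jQuery: sharedJQuery });
const drawingB = makeDrawing({ jQuery: sharedJQuery });

drawingA.draw();
const countA = drawingA.draw();
const countB = drawingB.draw();

console.log(countA, countB, sharedJQuery.requests.length);
```

The two Drawing instances keep separate counters because nothing made them a singleton, while both log through the one jQuery object that was deliberately handed to each of them.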

To put it another way, if the original Simple Modules and Solution 1 are taken to their logical conclusion, one must assume, because modules may be redefined along the loading chain, that nothing is *really* a singleton. If one is to code defensively against this situation anyway, why not make this the default and gain the attendant simplicity?

Let's revisit the points brought up earlier:

* Performance optimization: This is up to the implementation. For example, a perfectly reasonable implementation can do a HEAD request for every URL loaded and, if it detects no change, reuse the code.
* Shared state: Shared state is now represented explicitly using objects.
* Programmer familiarity: It's really not that bad. :) Programs continue to be written with free variables, just as <script>s are. Programmers learn to introduce concrete objects into the lexical scope of their programs. And they write "export" statements to export variables back.

To optimize this a bit, it is possible to introduce a concept of "packages" (under active debate in CommonJS at the moment) to gather things up. This improves, but does not intrinsically modify, the model presented here.

== Afterword ==

Some of the solutions I present here are similar to proposals I have already made. In all sincerity, I cannot help that. But, if there are other solutions extant, I assure you that my brain is open. :)

Ihab

--
Ihab A.B. Awad, Palo Alto, CA
_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss