Some additional points that hopefully are not entirely redundant with what others have already said:
* There is a growing ecosystem of JavaScript parsing and instrumentation toolkits, beyond Jalangi, e.g.:

https://github.com/wala/JS_WALA
https://github.com/substack/node-falafel

The nice thing about supporting a source-to-source API is that it will encourage experimentation with yet more approaches, which might lead to new insights into doing low-overhead instrumentation, what would be appropriate for a lower-level API, etc.

* A source-to-source API enables greater portability, e.g., for writing analyses / transformations that work for node.js programs and (at least partially) on other browsers. Even if some analyses support FF-specific features, probably much of the logic would be shareable across runtimes.

* As Koushik mentioned on another thread, Michael Pradel has already created a modified version of Firefox that supports S2S instrumentation:

https://github.com/Berkeley-Correctness-Group/Jalangi-Berkeley

Given that an outside developer could do this without deep expertise on the Firefox JS engine, I imagine the maintenance burden of an S2S API would be fairly low, making it worth doing even in addition to a lower-level API.

* While the overhead of Jalangi instrumentation is high, this is not fundamental. Instrumentation customized to a particular client could have much lower overhead.

* Regarding a lower-level API, one motivating client might be Event Racer:

http://eventracer.org/

Event Racer cannot be built using JS instrumentation alone, as it requires detailed information about DOM and event-loop operations. Right now, it's built upon a modified Webkit (with work on porting to Blink in progress). We had a very preliminary discussion with Servo developers about designing an API upon which Event Racer could be built, but we didn't pursue it further. If you think supporting such a client analysis might be desirable, I can ask the Event Racer developers to chime in with more feedback.
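To make the S2S and "customized instrumentation" points a bit more concrete, here is a rough hand-written sketch of what a source-to-source tool might emit. This is not Jalangi's actual API or output: makeHooks, binary, counts, and the area* functions are hypothetical names chosen only for illustration, and a real tool would generate the rewritten functions from a parsed AST rather than by hand.

```javascript
// Hypothetical analysis runtime (illustrative only, not Jalangi's API).
// Each hook performs the original operation and records that it happened.
function makeHooks() {
  return {
    counts: new Map(),
    binary(op, left, right) {
      this.counts.set(op, (this.counts.get(op) || 0) + 1);
      switch (op) {
        case '+': return left + right;
        case '*': return left * right;
        default: throw new Error('unsupported operator: ' + op);
      }
    },
  };
}

// Original application code.
function area(w, h) {
  return w * h + 1;
}

// General-purpose rewrite: every binary operation goes through a hook.
const fullHooks = makeHooks();
function areaFullyInstrumented(w, h) {
  return fullHooks.binary('+', fullHooks.binary('*', w, h), 1);
}

// Client-specific rewrite: an analysis that only cares about
// multiplications leaves every other operation untouched.
const mulHooks = makeHooks();
function areaMulOnly(w, h) {
  return mulHooks.binary('*', w, h) + 1;
}

console.log(area(3, 4), areaFullyInstrumented(3, 4), areaMulOnly(3, 4));
console.log([...fullHooks.counts], [...mulHooks.counts]);
```

All three functions compute the same value; the only difference is how many operations are routed through the analysis, which is where a client-specific rewrite would save overhead.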
Best,
Manu

On Thursday, June 26, 2014 4:01:38 PM UTC-7, Robert O'Callahan wrote:
> On Fri, Jun 27, 2014 at 1:57 AM, Nicolas B. Pierron <[email protected]> wrote:
> >
> > > Yes, the idea I have in mind is to have some-kind of self-hosted
> > > compartment dedicated to analysis where if a function named "xyz" is
> > > declared on the global, then it can be used preferably asynchronously (as
> > > we might not want to pay the cross-compartment call), or synchronously
> > > (waiting for the day we inline cross-compartment calls in ICs / code), or
> > > maybe both.
> > >
> > >> In terms of hooks, an API enabling arbitrary program transformation has
> > >> big advantages as a basis for implementing dynamic analyses, compared to
> > >> other kinds of API:
> > >> 1) Maximum flexibility for tool builders. You can do essentially anything
> > >> you want with the program execution.
> > >> 2) Simple interface to the underlying VM. So it's easy to maintain as the
> > >> VM evolves. And, importantly, minimal work for Mozilla.
> > >>
> > >
> > > Except if Mozilla is maintaining these tools as we want to rely on these.
> > > For example the Security team wants to rely on some taint analysis or even
> > > other simple analysis for checking if events have been validated before
> > > being processed.
> > >
> >
> > Yes, but we can collaborate on that with a large group of people --- at
> > least, a larger group than the set of people who want to hack on
> > Spidermonkey.
> >
> > >> 3) Potentially very low overhead, because instrumentation code can be
> > >> inlined into application code by the JIT.
> > >>
> > >
> > > I have a question for you, and also for people who have made such analysis
> > > in SpiderMonkey. Why taking all the pain of integrating such analysis in
> > > SpiderMonkey's code, which is hard and change frequently when it would be
> > > easy (based on what you mention) to just do source-to-source transformation?
> > >
> > > Why do we have 3 propositions of implementing taint analysis in
> > > SpiderMonkey so far? It sounds to me that there is something which is not
> > > easily accessible from source-to-source transformation, which might be
> > > easier to get hooked once you are deep inside the engine.
> > >
> >
> > I don't know. One reasonable guess would be that if you're doing a research
> > project and you want to minimize overhead and don't care about
> > maintainability, you can't go wrong by modifying the engine directly.
> >
> > >> You identified some disadvantages:
> > >> 1) It may be difficult to keep the language support of code transformation
> > >> tools in sync with Spidermonkey.
> > >> 2) Code transformation tools may introduce application bugs (e.g. by
> > >> polluting the global or due to a bug in translation).
> > >> 3) Transformed code may incur unacceptable slowdown (e.g. due to
> > >> ubiquitous boxing).
> > >> (Did I miss anything?)
> > >>
> > >
> > > Source-to-source implies that analysis developers have to know about the
> > > JS implementation, and JS syntax. While such work belongs to the
> > > JavaScript engine developers.
> > >
> >
> > Analysis developers would be exposed to less JS implementation details by
> > working at the source level than by working at the bytecode or some more
> > Spidermonkey-internal level. Yes, they would have to have detailed
> > knowledge of JS syntax and semantics, but that's OK; people developing
> > program analysis frameworks expect to have to know those things :-).
> >
> > And not every analysis developer will have to know everything. A good
> > framework will present higher-level abstractions that make it easy to write
> > simple analyses (while still being possible to write deep ones). I trust
> > Manu and his friends to write a good framework :-).
> >
> > >> I think #2 really only matters for people who want to deploy dynamic
> > >> analysis in customer-facing production systems, and I don't think that
> > >> will be important anytime soon.
> > >>
> > >
> > > On the contrary, I think/hope we could have trivial taint analysis to
> > > monitor privacy, in a similar way as Lightbeam (Collusion) is doing.
> > >
> >
> > I hesitate to use "trivial" and "taint analysis" in the same sentence, but
> > OK. I still think we can leave this up to the developers of the analysis
> > framework. They are just as smart as us, trust me :-).
> >
> > > The asynchronism is one suggestion to make recording analysis faster, by
> > > avoiding frequent cross-compartment calls. I do not see any issue to have
> > > synchronous request, on the contrary I think it might be interesting to
> > > interrupt the program execution on such request, or even change the program
> > > execution (things that we can only do synchronously) to prevent security
> > > holes / privacy leaks.
> > >
> >
> > OK but fast synchronous calls to instrumentation code will very quickly
> > become important. It's not clear to me why we can't have instrumentation
> > code running in the same compartment.
> >
> > I echo what Shu said. Standardizing a code format lower-level than JS
> > syntax seems like a big maintenance burden for Spidermonkey. Better to have
> > a separate front end maintained outside Spidermonkey. These formats have
> > different requirements so it makes sense to allow them to evolve
> > independently. In practice, I don't think keeping the extra front end up to
> > date will be a problem. People are already doing this, e.g. Traceur.
> >
> > Rob
> > --
> > Jtehsauts tshaei dS,o n" Wohfy Mdaon yhoaus eanuttehrotraiitny eovni
> > le atrhtohu gthot sf oirng iyvoeu rs ihnesa.r"t sS?o Whhei csha iids teoa
> > stiheer :p atroa lsyazye,d 'mYaonu,r "sGients uapr,e tfaokreg iyvoeunr,
> > 'm aotr atnod sgaoy ,h o'mGee.t" uTph eann dt hwea lmka'n? gBoutt uIp
> > waanndt wyeonut thoo mken.o w

_______________________________________________
dev-tech-js-engine-internals mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals

