Hi,

This email echoes the dev.platform email. It contains a proposal for adding a JavaScript API for implementing dynamic analyses on top of SpiderMonkey, as opposed to implementing each of them inside the JavaScript engine.

1. Motivation

So far, we have received 3 different proposals for adding coarse- or fine-grained implementations of taint analysis to the JavaScript engine. (I cannot name all of them publicly yet.)

Accepting any of the taint analysis proposals has a price: either a maintenance cost, as these implementations are entangled in many parts of the JavaScript engine, and/or an overhead that these implementations incur even when they are disabled.

Assuming that we were to accept one, we would still have to settle on an acceptable trade-off between performance and the ability to check, whereas this choice should be made by the people who are running the analysis, and not by the JavaScript engine developers.

On the other hand, some external tools, such as Jalangi [1,2,3], are able to instrument web pages running in the browser, and to run dynamic analyses on JavaScript programs.

Sadly, Jalangi does not answer all our needs yet, because:
 - It does not support SpiderMonkey extensions which are used in Gecko.
 - It modifies the current global to add the analysis framework.
 - It emulates JavaScript operators & language (~26x slowdown [4] while recording).
 - It does not work on Firefox OS devices.

What Jalangi teaches us is that a dynamic analysis framework is quite capable of supporting analyses such as record & replay, tracing NaN, taint analysis, and some code coverage.

Firefox OS / Gecko could make use of such an API to implement a proper & correct way of doing code coverage. We could also see code coverage being used as a metric on tbpl.

Security teams can use such an API for taint analysis. We can see multiple applications, such as using it under fuzzing to find potential code injections (.innerHTML, document.write, …) from untrusted sources, or the sending of SMS / emails with untrusted data on your behalf.

Dev-tools teams can use such an API either to expose it to web developers, and/or to implement Debugger features, such as: finding the last assignment(s) to a property (I wish I had such a feature in gdb); tracing where NaN / null / undefined are produced (for game developers).

In order to make the right design choices, we need to know what would be expected from such an API:
 - Do we want to use it on web pages / Firefox OS apps / Gecko's JavaScript?
 - Do we want to use it during the start-up of Firefox / Firefox OS?
 - What speed overhead is acceptable? (for record & replay, code coverage, simple taint analysis, tracing NaN, …)
 - What memory overhead is acceptable?
 - Can we risk changing the semantics of the analyzed code?
 - Should this API cover JavaScript features used in Gecko / Firefox OS?
 - Can we rely on Source-to-Source transformation for Gecko's code?

[1] https://www.eecs.berkeley.edu/~gongliang13/jalangi_ff/
[2] https://github.com/SRA-SiliconValley/jalangi/tree/master/src/js/analyses
[3] https://air.mozilla.org/test-and-cure-your-javascript-blues-with-jalangi/
[4] http://srl.cs.berkeley.edu/~ksen/papers/jalangi.pdf
[5] https://github.com/SRA-SiliconValley/jalangi/tree/master/src/js/analyses

2. Dynamic Analysis API

In terms of implementation, we could replicate/embed Jalangi, but if we decide to embed Jalangi, we might have to fix the following issues:

Operators are emulated by Jalangi:
 - Analyses have to be synchronous, even if they can record (~26x slowdown) and replay the recording to check other analysis results.
 - The analyzed code environment is polluted and stack traces are not correct, for example when an operand throws while being converted:
      { valueOf: function () { throw new Error(); } }
 - All operators are in one function, which is likely megamorphic.
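
The stack-trace and single-function issues above can be illustrated with a small sketch. This is not Jalangi's actual runtime: emulatedBinary and originalCode are hypothetical names, and the sketch only assumes that an instrumented program routes every binary operator through one helper function.

```javascript
// Hypothetical stand-in for an instrumentation runtime. The name
// emulatedBinary is an assumption, not Jalangi's actual API.
function emulatedBinary(op, left, right) {
  // Every operator of the analyzed program funnels through this one
  // function, so it sees operands of every possible type.
  switch (op) {
    case "+": return left + right;
    case "-": return left - right;
    case "*": return left * right;
    default: throw new TypeError("unsupported operator: " + op);
  }
}

// The operand from the list above: converting it to a primitive throws.
const operand = { valueOf: function () { throw new Error("from valueOf"); } };

function originalCode() {
  // The un-instrumented source would read: return operand * 2;
  return emulatedBinary("*", operand, 2);
}

let stack = "";
try {
  originalCode();
} catch (e) {
  // The trace now contains a frame for the emulation helper, which the
  // original program never defined.
  stack = String(e.stack);
}
console.log(stack);
```

The extra emulatedBinary frame is what the analyzed code would observe if it inspected the error, which is one way the instrumentation leaks into the program being analyzed.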

Each analysis has to add its own boxing and unboxing logic. Even if we can verify that the core of Jalangi is safe and behaves as specified, the boxing and unboxing logic might be buggy, which defeats the purpose of the analysis. So developers of such analyses have no safeguards.
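
To make the boxing concern concrete, here is a minimal sketch of a taint analysis that boxes values by hand. The Tainted class and the addCorrect / addBuggy hooks are hypothetical and are not taken from Jalangi. The point is only that a small mistake in analysis-owned boxing logic silently drops taint.

```javascript
// Hypothetical box type used by a hand-written taint analysis.
class Tainted {
  constructor(value) { this.value = value; }
}

const box = (v) => new Tainted(v);
const unbox = (v) => (v instanceof Tainted ? v.value : v);
const isTainted = (v) => v instanceof Tainted;

// Correct hook: unbox both operands, then re-box the result if either
// operand was tainted.
function addCorrect(l, r) {
  const result = unbox(l) + unbox(r);
  return isTainted(l) || isTainted(r) ? box(result) : result;
}

// Buggy hook: only the left operand is checked, so taint carried by the
// right operand is lost.
function addBuggy(l, r) {
  const result = unbox(l) + unbox(r);
  return isTainted(l) ? box(result) : result;
}

const userInput = box("<b>untrusted</b>");
const okResult = addCorrect("<p>", userInput);
const buggyResult = addBuggy("<p>", userInput);

// The first result is still marked as tainted, the second is not.
console.log(isTainted(okResult), isTainted(buggyResult));
```

Both hooks compute the same string, so the bug does not show up in the behavior of the analyzed program, only in the analysis result.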

Also, using a source-to-source system implies that we have to restrict ourselves to analyzing only the subset of features on which SpiderMonkey and Jalangi (acorn.js [6]) have identical semantics. This might be problematic for new SpiderMonkey features, or for not-yet-standardized / not-yet-compatible features.

Personally, I think that these issues imply that we should avoid relying on a source-to-source mapping if we want to provide meaningful security results. We could replicate the same or a similar API in SpiderMonkey, and even make one compatible with Jalangi analyses.

We could instead add opcodes dedicated to monitoring values (at the bytecode emitter level), rather than doing source-to-source transformation. One advantage would be that frontend developers would not have to maintain Jalangi sources when we add new features to SpiderMonkey. Moreover, the bytecode emitter already breaks everything down into opcodes, which are easier to wrap than the source.

Analyses are usually made to observe the execution of code, and not to mutate it. So if we only monitor the execution, instead of emulating it, we might be able to batch analysis calls. Processing batches asynchronously implies that the overhead of running an analysis is minimal while the analyzed code is running.

On an orthogonal aspect, we could isolate the analysis code from the analyzed code by making a separate compartment for the analysis. This would provide any boxing and unboxing feature as a safeguard, but it would be extremely expensive in terms of speed (without a batching system) and in terms of memory (without executing batches before GCs). In addition to providing safeguards for the people writing analyses, it avoids the pitfall of megamorphic calls.

Separating the analysis from the code being analyzed provides an additional advantage: we know what the analysis might be looking for. This implies that we could trace only the values which are being watched by the analysis, and thus avoid useless overhead.
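
As a rough illustration of batching, the sketch below buffers monitoring events and hands them to the analysis later. EventBatch, record and flush are hypothetical names, not a proposed SpiderMonkey interface, and the deferral uses a plain promise job as a stand-in for whatever scheduling the engine would provide.

```javascript
// Hypothetical batching layer between monitored code and an analysis.
class EventBatch {
  constructor(analysis, capacity) {
    this.analysis = analysis; // callback receiving an array of events
    this.capacity = capacity; // buffered events that force a flush
    this.events = [];
    this.scheduled = false;
  }

  // Called from the monitored code: it only appends to a buffer.
  record(kind, value) {
    this.events.push({ kind, value });
    if (this.events.length >= this.capacity) {
      this.flush();
    } else if (!this.scheduled) {
      this.scheduled = true;
      // Defer the analysis until the currently running code has finished.
      Promise.resolve().then(() => this.flush());
    }
  }

  // Hands all buffered events to the analysis in one call.
  flush() {
    this.scheduled = false;
    if (this.events.length === 0) return;
    const batch = this.events;
    this.events = [];
    this.analysis(batch);
  }
}

// Example analysis: remember which operations produced NaN.
const nanSeen = [];
const batch = new EventBatch((events) => {
  for (const e of events) {
    if (Number.isNaN(e.value)) nanSeen.push(e.kind);
  }
}, 1024);

batch.record("div", 0 / 0);
batch.record("add", 1 + 2);
const seenBeforeFlush = nanSeen.length; // the analysis has not run yet
batch.flush(); // explicit flush, for example before a GC
console.log(seenBeforeFlush, nanSeen);
```

In this sketch the buffer keeps every recorded value alive until it is flushed, which is the memory cost that flushing before a GC would bound.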
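
A minimal sketch of that idea follows, assuming the analysis registers the kinds of operations it cares about up front. Monitor, watch and notify are hypothetical names. A real implementation would ideally decide what to instrument when emitting opcodes, whereas this sketch filters at run time for simplicity.

```javascript
// Hypothetical monitor: the analysis declares which operations it watches.
class Monitor {
  constructor() {
    this.watchers = new Map(); // operation kind -> array of callbacks
    this.dispatched = 0;       // number of events that reached an analysis
  }

  // Called by the analysis to register interest in one kind of operation.
  watch(kind, callback) {
    if (!this.watchers.has(kind)) this.watchers.set(kind, []);
    this.watchers.get(kind).push(callback);
  }

  // Called by the engine side for every monitored operation.
  notify(kind, value) {
    const callbacks = this.watchers.get(kind);
    if (callbacks === undefined) return; // nobody is watching this kind
    this.dispatched++;
    for (const cb of callbacks) cb(value);
  }
}

const monitor = new Monitor();
const assignments = [];
monitor.watch("setprop", (v) => assignments.push(v));

monitor.notify("add", 3);         // ignored, no watcher
monitor.notify("setprop", "x=1"); // dispatched to the analysis
monitor.notify("call", "f");      // ignored, no watcher
console.log(monitor.dispatched, assignments);
```

Only the watched property assignment reaches the analysis, while the other operations return immediately.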

[6] http://marijnhaverbeke.nl/acorn/

--
Nicolas B. Pierron