[JS-internals] Dynamic Analysis API discussion

Nicolas B. Pierron Wed, 25 Jun 2014 08:50:24 -0700

Hi,

This email echoes the dev.platform email, and contains a proposal for adding a JavaScript API for implementing Dynamic Analysis on top of SpiderMonkey, as opposed to implementing each analysis inside the JavaScript engine.


1. Motivation

So far, we have received 3 different proposals for adding coarse/fine-grain implementations of taint analysis in the JavaScript engine (I cannot name all of them publicly yet).

Accepting any of the taint analysis proposals has a price: either a maintenance cost, as these implementations are entangled in many parts of the JavaScript engine, and/or an overhead that these implementations incur even when they are disabled.

Even if we were to accept one, we would still have to settle on an acceptable trade-off between performance and the ability to check, and this choice should be made by the people running the analysis, not by the JavaScript engine developers.

On the other hand, some external tools, such as Jalangi [1,2,3], are able to instrument web pages running in the browser and run dynamic analyses on JavaScript programs.

Sadly, Jalangi does not answer all our needs yet, because:
 - It does not support the SpiderMonkey extensions which are used in Gecko.
 - It modifies the current global to add the analysis framework.
 - It emulates JavaScript operators & language (~26x slowdown [4] while recording).
 - It does not work on Firefox OS devices.

What Jalangi teaches us is that a dynamic analysis framework is quite capable of doing analyses such as record & replay, NaN tracing, taint analysis, and code coverage.

Firefox OS / Gecko could make use of such an API to implement a proper & correct way of doing code coverage. We could also see code coverage being used as a metric on tbpl.

Security teams can use such an API for taint analysis. We can see multiple applications, such as using it while fuzzing to find potential code injections (.innerHTML, document.write, …) from untrusted sources, or the sending of SMS / emails with untrusted data on your behalf.

Dev-tools teams can use such an API either to expose it to web developers, and/or to implement Debugger features, such as: finding the last assignment(s) to a property (I wish I had such a feature in gdb); tracing where NaN / null / undefined values are produced (for game developers).

In order to make the right design choices, we need to know what would be expected from such an API:

 - Do we want to use it on web pages / Firefox OS apps / Gecko's JavaScript?
 - Do we want to use it during the start-up of Firefox / Firefox OS?
 - What speed overhead is acceptable? (for record & replay, code coverage, simple taint analysis, tracing NaN, …)
 - What memory overhead is acceptable?
 - Can we risk changing the semantics of the analyzed code?
 - Should this API cover JavaScript features used in Gecko / Firefox OS?
 - Can we rely on source-to-source transformation for Gecko's code?

[1] https://www.eecs.berkeley.edu/~gongliang13/jalangi_ff/
[2] https://github.com/SRA-SiliconValley/jalangi/tree/master/src/js/analyses
[3] https://air.mozilla.org/test-and-cure-your-javascript-blues-with-jalangi/
[4] http://srl.cs.berkeley.edu/~ksen/papers/jalangi.pdf
[5] https://github.com/SRA-SiliconValley/jalangi/tree/master/src/js/analyses

2. Dynamic Analysis API

In terms of implementation, we could replicate or embed Jalangi, but if we decide to embed Jalangi, we might have to fix the following issues:


Operators are emulated by Jalangi:

 - Analyses have to be synchronous, even if they can record (~26x slowdown) and replay the recording to check the results of other analyses.
 - The analyzed code's environment is polluted, and stack traces are not correct, e.g. when an operand such as
       { valueOf: function () { throw new Error(); } }
   throws from inside the emulated operator.
 - All operators are in one function, which is likely megamorphic.
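To make the operator-emulation issues concrete, here is a minimal sketch of what a source-to-source rewrite does to an expression such as `a + b`. The names `emulateBinary` and `analysis.binary` are hypothetical and are not Jalangi's actual API; the point is only that every operator in the program funnels through one shared function, and that user code invoked implicitly (here the `valueOf` operand from the list above) now runs beneath that function's frame.

```javascript
// Hypothetical analysis hook: records every emulated binary operation.
const analysis = {
  log: [],
  binary(op, left, right, result) {
    this.log.push({ op, left, right, result });
  },
};

// Sketch of the single function a source-to-source tool could substitute for
// every binary operator in the program. All call sites share it, so the
// operations below see every type combination used anywhere in the program.
function emulateBinary(op, left, right) {
  let result;
  switch (op) {
    case "+": result = left + right; break;
    case "-": result = left - right; break;
    case "*": result = left * right; break;
    case "/": result = left / right; break;
    default: throw new Error("unsupported operator: " + op);
  }
  analysis.binary(op, left, right, result);
  return result;
}

// Original code:  var x = a + b;
// Rewritten code: var x = emulateBinary("+", a, b);
const a = 1;
const b = 2;
const x = emulateBinary("+", a, b);

// The implicit valueOf call now happens beneath emulateBinary, so the
// exception's stack trace shows a frame the original program never had.
let trace = "";
try {
  emulateBinary("+", { valueOf: function () { throw new Error(); } }, 1);
} catch (e) {
  trace = e.stack;
}
```

Note that the analysis hook is never reached for the throwing operation, so an analysis built this way also has to deal with partially executed operators.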

Each analysis has to add its own boxing and unboxing logic. Even if we can verify that the core of Jalangi is safe and behaves as specified, the boxing and unboxing logic might be buggy, which defeats the purpose of the analysis. So developers of such analyses have no safeguards.
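The sketch below, using the hypothetical names `Tainted`, `box`, `unbox` and `taintedAdd`, shows the kind of boxing logic each taint analysis would have to write itself under operator emulation: values from untrusted sources are wrapped, every operator must unwrap and re-wrap them, and any path that forgets to unbox both drops the taint and changes the analyzed program's result.

```javascript
// Hypothetical wrapper that an analysis attaches to untrusted values.
class Tainted {
  constructor(value) {
    this.value = value;
  }
}

// Boxing: mark a value coming from an untrusted source.
function box(value) {
  return new Tainted(value);
}

// Unboxing: recover the underlying value so the original semantics are kept.
function unbox(v) {
  return v instanceof Tainted ? v.value : v;
}

function isTainted(v) {
  return v instanceof Tainted;
}

// Correct propagation: compute on unboxed values, re-box if any input was tainted.
function taintedAdd(left, right) {
  const result = unbox(left) + unbox(right);
  return isTainted(left) || isTainted(right) ? box(result) : result;
}

// Buggy propagation: forgets to unbox, so the wrapper object itself is
// converted to a string and the taint is silently lost.
function buggyAdd(left, right) {
  return left + right;
}

const userInput = box("<img src=x>");
const html = taintedAdd("<div>", userInput);   // tainted "<div><img src=x>"
const broken = buggyAdd("<div>", userInput);   // untainted "<div>[object Object]"
```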

Also, using a source-to-source system implies that we have to restrict ourselves to analyzing only the intersection of SpiderMonkey and Jalangi (acorn.js [6]) where both have identical semantics. This might be problematic for new SpiderMonkey features, or for not-yet-standardized / not-yet-compatible features.

Personally, I think that these issues imply that we should avoid relying on a source-to-source mapping if we want to provide meaningful security results. We could replicate the same or a similar API in SpiderMonkey, and even make it compatible with Jalangi analyses.

Instead of doing a source-to-source transformation, we could add opcodes dedicated to monitoring values (at the bytecode emitter level). One advantage would be that frontend developers would not have to maintain the Jalangi sources when we add new features to SpiderMonkey; moreover, the bytecode emitter already breaks everything down into opcodes, which are easier to wrap than the source.
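As a rough illustration of monitoring at the bytecode level rather than rewriting the source, the following toy stack machine has an emitter that can optionally follow each arithmetic opcode with a `MONITOR` opcode. The opcode set, `emit` and `run` are hypothetical and bear no relation to SpiderMonkey's actual bytecode; the sketch only shows that wrapping a small, fixed set of opcodes is simpler than wrapping every source construct, and that when monitoring is off no hook code runs at all.

```javascript
// Toy expression AST: { kind: "num", value } | { kind: "add" | "mul", left, right }

// Emit stack-machine bytecode. When `monitor` is true, each arithmetic opcode
// is followed by a MONITOR opcode that reports the value it just produced.
function emit(node, monitor, code = []) {
  if (node.kind === "num") {
    code.push({ op: "PUSH", value: node.value });
    return code;
  }
  emit(node.left, monitor, code);
  emit(node.right, monitor, code);
  code.push({ op: node.kind === "add" ? "ADD" : "MUL" });
  if (monitor) code.push({ op: "MONITOR", source: node.kind });
  return code;
}

// Interpret the bytecode; MONITOR only observes the top of the stack.
function run(code, hook) {
  const stack = [];
  for (const insn of code) {
    switch (insn.op) {
      case "PUSH": stack.push(insn.value); break;
      case "ADD": { const r = stack.pop(); stack.push(stack.pop() + r); break; }
      case "MUL": { const r = stack.pop(); stack.push(stack.pop() * r); break; }
      case "MONITOR": hook(insn.source, stack[stack.length - 1]); break;
      default: throw new Error("unknown opcode " + insn.op);
    }
  }
  return stack.pop();
}

// (1 + 2) * 4
const expr = {
  kind: "mul",
  left: { kind: "add", left: { kind: "num", value: 1 }, right: { kind: "num", value: 2 } },
  right: { kind: "num", value: 4 },
};

const events = [];
const monitored = run(emit(expr, true), (source, value) => events.push([source, value]));
const plain = run(emit(expr, false), () => { throw new Error("hook must not run"); });
```

Here `events` ends up as `[["add", 3], ["mul", 12]]`, while the unmonitored bytecode is two instructions shorter and never calls the hook.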

Analyses are usually made to observe the execution of code, not to mutate it. So if we only monitor the execution, instead of emulating it, we might be able to batch analysis calls. Running these batches asynchronously means that the overhead of an analysis is minimal while the analyzed code is running.
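A minimal sketch of the batching idea, assuming the analysis only observes: with the hypothetical `BatchingMonitor` below, the analyzed code only pays for appending an event to a buffer, and events are delivered to the analysis in batches, either when the buffer is full or on a later microtask. The buffer capacity and the use of `queueMicrotask` are illustrative assumptions; an engine could just as well drain the buffer before a GC.

```javascript
class BatchingMonitor {
  constructor(analysis, capacity = 1024) {
    this.analysis = analysis;   // callback receiving arrays of events
    this.capacity = capacity;   // flush synchronously once this many are buffered
    this.buffer = [];
    this.scheduled = false;
  }

  // Called from the analyzed code: cheap, no call into the analysis.
  record(event) {
    this.buffer.push(event);
    if (this.buffer.length >= this.capacity) {
      this.flush();
    } else if (!this.scheduled) {
      // Deliver the remaining events after the current job finishes.
      this.scheduled = true;
      queueMicrotask(() => this.flush());
    }
  }

  // Hand all buffered events to the analysis in one call.
  flush() {
    this.scheduled = false;
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.analysis(batch);
  }
}

const batches = [];
const monitor = new BatchingMonitor((batch) => batches.push(batch), 3);
for (let i = 0; i < 4; i++) {
  monitor.record({ op: "add", result: i });
}
// One full batch of 3 events has been delivered; the 4th is still buffered
// and will be delivered on the next microtask (or by an explicit flush()).
```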

On an orthogonal aspect, we could isolate the analysis code from the analyzed code by making a separate compartment for the analysis. This would provide boxing and unboxing as a built-in safeguard, but it would be extremely expensive in terms of speed (without a batching system) and in terms of memory (without executing batches before GCs). In addition to providing safeguards for the people writing analyses, it avoids the pitfall of megamorphic calls.
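One way to picture this isolation is that the engine, not each analysis, performs the boxing: the analysis side only ever receives opaque handles plus metadata, never the analyzed objects themselves, so it cannot mutate them or trigger their getters and `valueOf`. The `HandleTable` below is a hypothetical, single-realm sketch of that idea; a real compartment boundary would be enforced by the engine rather than by convention.

```javascript
// Hypothetical engine-side table mapping analyzed objects to opaque handles.
class HandleTable {
  constructor() {
    // WeakMap, so handing out a handle does not keep the analyzed object alive.
    this.ids = new WeakMap();
    this.nextId = 1;
  }

  // Return a stable integer handle for an object or function.
  handleFor(obj) {
    if (!this.ids.has(obj)) this.ids.set(obj, this.nextId++);
    return this.ids.get(obj);
  }

  // Describe a value for the analysis without exposing objects themselves.
  describe(value) {
    const type = value === null ? "null" : typeof value;
    if (type === "object" || type === "function") {
      return Object.freeze({ type, handle: this.handleFor(value) });
    }
    // Primitives are immutable, so copying them across is safe.
    return Object.freeze({ type, value });
  }
}

const table = new HandleTable();
const seen = [];

// Analysis side: only ever receives frozen descriptors.
function analysis(descriptor) {
  seen.push(descriptor);
}

// Analyzed side: an object whose valueOf must never run on the analysis' behalf.
const secret = { valueOf() { throw new Error("analysis touched the analyzed value"); } };
analysis(table.describe(secret));
analysis(table.describe(secret));
analysis(table.describe(42));
```

The same object always maps to the same handle, which is enough for identity-based analyses such as taint tracking without giving them access to the object.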

Separating the analysis from the code being analyzed provides an additional advantage: we know what the analysis might be looking for. This implies that we could trace only the values watched by the analysis, and thus avoid useless overhead.
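Because the analysis declares up front what it is interested in, the engine can skip reporting events about everything else. The hypothetical `WatchingMonitor` below keeps a `WeakSet` of watched objects and only records property stores that touch them; it also sketches the "last assignment to a property" debugger feature mentioned in the motivation. The `store` helper and the `game.js:NN` locations stand in for instrumented code and are made up for illustration.

```javascript
class WatchingMonitor {
  constructor() {
    this.watched = new WeakSet();   // objects the analysis asked about
    this.lastWrite = new WeakMap(); // object -> Map(property -> location)
    this.reported = 0;              // number of events actually recorded
  }

  watch(obj) {
    this.watched.add(obj);
  }

  // Hook the engine would call on every property store `obj[prop] = value`.
  onSetProperty(obj, prop, value, location) {
    if (!this.watched.has(obj)) return; // fast path: unwatched, no work
    this.reported++;
    let writes = this.lastWrite.get(obj);
    if (!writes) {
      writes = new Map();
      this.lastWrite.set(obj, writes);
    }
    writes.set(prop, location);
  }

  lastAssignment(obj, prop) {
    const writes = this.lastWrite.get(obj);
    return writes ? writes.get(prop) : undefined;
  }
}

const monitor = new WatchingMonitor();
const player = { hp: 100 };
const other = { hp: 50 };
monitor.watch(player);

// Simulated instrumented stores, with made-up source locations.
function store(obj, prop, value, location) {
  obj[prop] = value;
  monitor.onSetProperty(obj, prop, value, location);
}
store(player, "hp", 90, "game.js:10");
store(other, "hp", 40, "game.js:11");
store(player, "hp", NaN, "game.js:42");
```

Only the two stores to `player` are recorded, and `monitor.lastAssignment(player, "hp")` points at the store that produced the NaN.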


[6] http://marijnhaverbeke.nl/acorn/

--
Nicolas B. Pierron
_______________________________________________
dev-tech-js-engine-internals mailing list
https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals