Hi,
This email echoes the dev.platform email and contains a proposal for
adding a JavaScript API for implementing dynamic analyses on top of
SpiderMonkey, as opposed to implementing each of them inside the JavaScript
engine.
1. Motivation
So far, we have received 3 different proposals for adding coarse- or
fine-grained implementations of taint analysis to the JavaScript engine (I
cannot name all of them publicly yet).
Accepting any of these taint analysis proposals has a price: a maintenance
cost, as these implementations are entangled with many parts of the
JavaScript engine, and/or an overhead that remains even when they are
disabled.
Even if we were to accept one, we would still have to settle on an
acceptable trade-off between performance and the ability to check, and
this choice should be made by the people running the analysis, not by the
JavaScript engine developers.
On the other hand, some external tools, such as Jalangi [1,2,3], are able
to instrument web pages running in the browser and run dynamic analyses on
JavaScript programs.
Sadly, Jalangi does not answer all of our needs yet, because:
- It does not support the SpiderMonkey extensions which are used in Gecko.
- It modifies the current global to add the analysis framework.
- It emulates JavaScript operators and the language itself (~26x slowdown
  [4] while recording).
- It does not work on Firefox OS devices.
What Jalangi teaches us is that a dynamic analysis framework is capable of
supporting analyses such as record & replay, NaN tracing, taint analysis,
and code coverage.
Firefox OS / Gecko could use such an API to implement code coverage
properly and correctly. We could also see code coverage being used as a
metric on tbpl.
Security teams can use such an API for taint analysis. We can see multiple
applications, such as using it while fuzzing to find potential code
injections (.innerHTML, document.write, …) from untrusted sources, or SMS /
emails being sent with untrusted data on your behalf.
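To make the taint analysis use case more concrete, here is a minimal sketch
of source-to-sink tracking in plain JavaScript. It boxes untrusted strings
in a hypothetical Tainted class, propagates the taint through a
hypothetical concat helper, and reports when a tainted value reaches a sink
such as .innerHTML. None of these names are an existing SpiderMonkey or
Jalangi API; an engine-level implementation could attach the taint to
values without explicit boxing.

```javascript
// Hypothetical sketch of taint tracking via explicit boxing.
// Not a SpiderMonkey or Jalangi API.
class Tainted {
  constructor(value, source) {
    this.value = value;   // the underlying primitive string
    this.source = source; // where the untrusted data came from
  }
}

// Mark data coming from an untrusted source (e.g. location.hash).
function taint(value, source) {
  return new Tainted(value, source);
}

// String concatenation that propagates taint from either operand.
function concat(a, b) {
  const sources = [a, b]
    .filter((x) => x instanceof Tainted)
    .map((x) => x.source);
  const raw = [a, b].map((x) => (x instanceof Tainted ? x.value : x)).join("");
  return sources.length > 0 ? new Tainted(raw, sources.join(",")) : raw;
}

const findings = [];

// Called where the analyzed code would write to a sink such as .innerHTML.
function checkSink(sinkName, value) {
  if (value instanceof Tainted) {
    findings.push(`${sinkName} <- ${value.source}`);
  }
}

// Usage: untrusted input flows through concatenation into an innerHTML sink.
const userInput = taint("<img src=x onerror=alert(1)>", "location.hash");
const html = concat("<div>", concat(userInput, "</div>"));
checkSink("innerHTML", html);
checkSink("innerHTML", concat("<b>", "static</b>")); // untainted, not reported
console.log(findings); // [ 'innerHTML <- location.hash' ]
```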
Dev-tools teams can use such an API either to expose it to web developers
and/or to implement Debugger features, such as finding the last
assignment(s) to a property (I wish I had such a feature in gdb), or
tracing where NaN / null / undefined values are produced (for game
developers).
In order to make the right design choices, we need to know what would be
expected from such an API:
- Do we want to use it on web pages / Firefox OS apps / Gecko's JavaScript?
- Do we want to use it during the start-up of Firefox / Firefox OS?
- What speed overhead is acceptable? (for Record & replay, code coverage,
simple taint analysis, tracing NaN, …)
- What memory overhead is acceptable?
- Can we risk changing the semantics of the analyzed code?
- Should this API cover JavaScript features used in Gecko / Firefox OS?
- Can we rely on source-to-source transformations for Gecko's code?
[1] https://www.eecs.berkeley.edu/~gongliang13/jalangi_ff/
[2] https://github.com/SRA-SiliconValley/jalangi/tree/master/src/js/analyses
[3] https://air.mozilla.org/test-and-cure-your-javascript-blues-with-jalangi/
[4] http://srl.cs.berkeley.edu/~ksen/papers/jalangi.pdf
[5] https://github.com/SRA-SiliconValley/jalangi/tree/master/src/js/analyses
2. Dynamic Analysis API
In terms of implementation, we could replicate or embed Jalangi, but if we
decide to embed Jalangi, we would have to fix the following issues:
Operators are emulated by Jalangi:
- Analyses have to be synchronous, even though Jalangi can record an
  execution (~26x slowdown) and replay it to check the results of other
  analyses.
- The analyzed code's environment is polluted and stack traces are not
  correct, for example when an operand such as
  { valueOf: function () { throw new Error(); } }
  throws from inside Jalangi's operator emulation.
- All operators are handled by a single function, which is likely to be
  megamorphic.
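The following sketch illustrates these three points, assuming a rewriter
that turns every binary operator into a call to one hypothetical binary
helper. It is a simplification for illustration, not Jalangi's actual
instrumentation or runtime.

```javascript
// Hypothetical sketch of source-to-source operator emulation; this is not
// Jalangi's actual runtime. Every binary operator in the analyzed source is
// rewritten into a call to one helper:
//   a + b   ==>   binary("+", a, b)
const observed = [];

// Every operator of the program goes through this single function, so its
// call sites see many operand types, which makes it likely megamorphic.
function binary(op, left, right) {
  let result;
  switch (op) {
    case "+": result = left + right; break;
    case "-": result = left - right; break;
    case "*": result = left * right; break;
    case "/": result = left / right; break;
    default: throw new Error(`unsupported operator ${op}`);
  }
  observed.push({ op, result }); // the analysis hook runs synchronously
  return result;
}

// Analyzed code after rewriting: `var x = 1 + 2 * 3;`
const x = binary("+", 1, binary("*", 2, 3));

// An exception thrown by valueOf is now raised from inside `binary`,
// so the stack trace contains analysis frames the original code never had.
let stack = "";
try {
  binary("+", { valueOf: function () { throw new Error(); } }, 1);
} catch (e) {
  stack = e.stack;
}
console.log(x); // 7
console.log(stack.includes("binary")); // true
```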
Each analysis has to add its own boxing and unboxing logic. Even if we can
verify that the core of Jalangi is safe and behaves as specified, the
boxing and unboxing logic might be buggy, which defeats the purpose of the
analysis. So developers of such analyses have no safeguards.
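A minimal sketch of why hand-written boxing is fragile, assuming an
analysis that boxes values in a hypothetical Shadow class to carry
metadata: a single missing unbox silently changes the behavior of the
analyzed program.

```javascript
// Hypothetical sketch: an analysis attaches metadata by boxing values.
class Shadow {
  constructor(value, meta) {
    this.value = value; // the real value seen by the analyzed program
    this.meta = meta;   // analysis-specific metadata (e.g. a taint label)
  }
}

const unbox = (v) => (v instanceof Shadow ? v.value : v);

// Correct emulation of `if (cond)`: unbox before testing truthiness.
function condCorrect(cond) {
  return Boolean(unbox(cond));
}

// Buggy emulation: the analysis author forgot to unbox. Any Shadow is an
// object, so it is always truthy, even when it wraps 0, "" or false.
function condBuggy(cond) {
  return Boolean(cond);
}

const zero = new Shadow(0, "tainted");
console.log(condCorrect(zero)); // false, as in the original program
console.log(condBuggy(zero));   // true, the analyzed code takes another branch
console.log(typeof zero);       // object (not number) unless typeof is emulated too
```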
Also, using a source-to-source system implies that we have to restrict
ourselves to analyzing only the intersection of SpiderMonkey and Jalangi
(acorn.js [6]) on which both have identical semantics. This might be
problematic for new SpiderMonkey features, or for features which are not
yet standardized or not yet compatible.
Personally, I think these issues imply that we should avoid relying on a
source-to-source transformation if we want to provide meaningful security
results. We could replicate the same or a similar API in SpiderMonkey, and
even make it compatible with Jalangi analyses.
We could instead add opcodes dedicated to monitoring values (at the
bytecode emitter level) rather than doing a source-to-source
transformation. One advantage would be that frontend developers would not
have to maintain Jalangi sources when adding new features to SpiderMonkey.
Moreover, the bytecode emitter already breaks everything down into
opcodes, which are easier to wrap than the source.
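As a toy model of this idea (not SpiderMonkey's actual bytecode or
emitter), the sketch below compiles a small expression tree to opcodes and,
when instrumentation is enabled, emits a hypothetical MONITOR opcode after
each value-producing opcode. MONITOR only reads the top of the stack, so the
instrumented program computes the same result as the plain one.

```javascript
// Toy model, not SpiderMonkey's bytecode: the emitter optionally appends a
// hypothetical MONITOR opcode after every value-producing opcode.
const OPS = { "+": (a, b) => a + b, "*": (a, b) => a * b, "/": (a, b) => a / b };

function emit(node, code, instrument) {
  if (node.type === "num") {
    code.push(["PUSH", node.value]);
  } else if (node.type === "bin") {
    emit(node.left, code, instrument);
    emit(node.right, code, instrument);
    code.push(["BINOP", node.op]);
  } else {
    throw new Error(`unknown node type ${node.type}`);
  }
  if (instrument) code.push(["MONITOR", node.type]);
  return code;
}

function run(code, onValue) {
  const stack = [];
  for (const [op, arg] of code) {
    if (op === "PUSH") {
      stack.push(arg);
    } else if (op === "BINOP") {
      const right = stack.pop();
      const left = stack.pop();
      stack.push(OPS[arg](left, right));
    } else if (op === "MONITOR") {
      // Observe the top of the stack without modifying it.
      onValue(arg, stack[stack.length - 1]);
    }
  }
  return stack.pop();
}

// AST for `1 + 2 * 3`.
const ast = {
  type: "bin", op: "+",
  left: { type: "num", value: 1 },
  right: { type: "bin", op: "*",
           left: { type: "num", value: 2 }, right: { type: "num", value: 3 } },
};

const seen = [];
const plain = run(emit(ast, [], false), () => {});
const monitored = run(emit(ast, [], true), (kind, value) => seen.push(value));
console.log(plain, monitored); // 7 7
console.log(seen); // [ 1, 2, 3, 6, 7 ]
```

Because the monitoring is added by the emitter, any new syntax that
lowers to existing opcodes is covered without touching the analysis side.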
Analyses are usually made to observe the execution of code, not to mutate
it. So if we only monitor the execution instead of emulating it, we might
be able to batch analysis calls. Processing batches asynchronously means
that running an analysis adds minimal overhead while the analyzed code is
running.
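A minimal sketch of such batching, assuming a hypothetical
BatchingMonitor that the instrumented code calls: the analyzed code only
appends records to a buffer, and full batches are handed to the analysis
through a deferred scheduler (a microtask by default). The capacity and the
scheduler are illustrative parameters, not a proposed API.

```javascript
// Hypothetical sketch of batched analysis calls.
class BatchingMonitor {
  constructor(analysis, { capacity = 1024, schedule = (fn) => queueMicrotask(fn) } = {}) {
    this.analysis = analysis; // receives arrays of records
    this.capacity = capacity; // flush once this many records are buffered
    this.schedule = schedule; // how deferred work is run
    this.buffer = [];
  }

  // Fast path executed by the analyzed code: no analysis logic runs here.
  record(event, value) {
    this.buffer.push({ event, value });
    if (this.buffer.length >= this.capacity) this.flush();
  }

  // Hand the current batch to the analysis without running it now.
  // An engine could also call this before a GC so buffered values can die.
  flush() {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.schedule(() => this.analysis(batch));
  }
}

// Usage: an explicit queue stands in for "run when the analyzed code is idle".
const deferred = [];
const batches = [];
const monitor = new BatchingMonitor((batch) => batches.push(batch.length), {
  capacity: 3,
  schedule: (fn) => deferred.push(fn),
});

for (let i = 0; i < 7; i++) monitor.record("value", i);
monitor.flush();
console.log(batches.length); // 0: no analysis ran inside the analyzed loop
deferred.forEach((fn) => fn());
console.log(batches); // [ 3, 3, 1 ]
```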
On an orthogonal note, we could isolate the analysis code from the
analyzed code by creating a separate compartment for the analysis. This
would provide the boxing and unboxing as a safeguard, but it would be
extremely expensive in terms of speed (without a batching system) and in
terms of memory (without executing batches before GCs). In addition to
providing safeguards for people writing analyses, it avoids the pitfall of
megamorphic calls.
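Compartments are SpiderMonkey-specific, so the sketch below only
approximates the safeguard in plain JavaScript: a hypothetical makeBoundary
function hands the analysis a structuredClone copy (available in modern
browsers and Node.js 17+) of each record, so a careless analysis cannot
mutate the analyzed program's objects. Real compartments would rely on
cross-compartment wrappers rather than copies; the copy on every call also
hints at why such a boundary is costly without batching.

```javascript
// Analogy only: SpiderMonkey compartments rely on wrappers, not on copies.
// The hypothetical boundary gives the analysis a deep copy of each record,
// so the analysis can read values but cannot mutate the analyzed program.
function makeBoundary(analysis) {
  return (record) => analysis(structuredClone(record)); // copy on every call
}

const analyzedObject = { name: "config", retries: 3 };

// A careless analysis that mutates the value it receives.
function carelessAnalysis(record) {
  record.value.retries = 0;
}

const isolated = makeBoundary(carelessAnalysis);
isolated({ event: "read", value: analyzedObject });
console.log(analyzedObject.retries); // 3: the analyzed object is untouched

carelessAnalysis({ event: "read", value: analyzedObject }); // no boundary
console.log(analyzedObject.retries); // 0: the analysis changed program state
```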
Separating the analysis from the analyzed code provides an additional
advantage: we know what the analysis might be looking for. This means that
we could trace only the values watched by the analysis, and thus avoid
useless overhead.
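A minimal sketch, assuming the analysis declares up front which operators
and which values it cares about (both hypothetical parameters of a
hypothetical makeMonitor function). Here it traces where NaN is produced, as
in the game-developer use case, and the monitor records nothing for
operations that do not match.

```javascript
// Hypothetical sketch: the analysis declares what it watches, and the monitor
// returns early for everything else instead of recording it.
function makeMonitor(watch) {
  const log = [];
  return {
    log,
    // Called after each instrumented operation with its location and result.
    onResult(location, op, value) {
      if (!watch.ops.has(op)) return;      // operator not watched
      if (!watch.predicate(value)) return; // value not interesting
      log.push({ location, op });
    },
  };
}

const nanTracer = makeMonitor({
  ops: new Set(["+", "-", "*", "/"]),
  predicate: Number.isNaN,
});

// Instrumented version of: const speed = dist / time; return speed * 2;
function step(dist, time) {
  const speed = dist / time;
  nanTracer.onResult("step:1", "/", speed);
  const doubled = speed * 2;
  nanTracer.onResult("step:2", "*", doubled);
  return doubled;
}

step(10, 2); // no NaN, nothing is recorded
step(0, 0);  // 0 / 0 produces NaN, which then propagates
// The first record points at where the NaN was produced.
console.log(nanTracer.log[0]); // { location: 'step:1', op: '/' }
```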
[6] http://marijnhaverbeke.nl/acorn/
--
Nicolas B. Pierron
_______________________________________________
dev-tech-js-engine-internals mailing list
dev-tech-js-engine-internals@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals