On 06/26/2014 04:50 AM, Robert O'Callahan wrote:
Your email is unclear as to whether you're proposing integrating some
particular analysis engine or framework into SpiderMonkey (or more than
one), or just some minimal set of hooks to enable others to supply such
engines/frameworks. I'm going to assume the latter since I think the former
makes no sense at all.

Yes, the idea I have in mind is to have some kind of self-hosted compartment dedicated to analysis: if a function named "xyz" is declared on that compartment's global, then the engine can call it, preferably asynchronously (as we might not want to pay a cross-compartment call per event), or synchronously (waiting for the day we inline cross-compartment calls in ICs / jitted code), or maybe both.
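
As a hypothetical illustration of that convention (the probe name and the dispatch modes are assumptions, not an existing interface):

  // Inside the analysis compartment: declaring "xyz" on the global is
  // all it takes for the engine to treat it as a probe.
  function xyz(value) {
    // Invoked asynchronously by default (events can be batched), or
    // synchronously if the engine is told to wait for its result.
  }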

In terms of hooks, an API enabling arbitrary program transformation has big
advantages as a basis for implementing dynamic analyses, compared to other
kinds of API:
1) Maximum flexibility for tool builders. You can do essentially anything
you want with the program execution.
2) Simple interface to the underlying VM. So it's easy to maintain as the
VM evolves. And, importantly, minimal work for Mozilla.

Except that Mozilla would be maintaining some of these tools, as we want to rely on them. For example, the Security team wants to rely on a taint analysis, or even on simpler analyses, to check whether events have been validated before being processed.

3) Potentially very low overhead, because instrumentation code can be
inlined into application code by the JIT.

I have a question for you, and also for the people who have built such analyses in SpiderMonkey. Why take all the pain of integrating an analysis into SpiderMonkey's code, which is hard and changes frequently, when it would be easy (based on what you mention) to just do a source-to-source transformation?

Why have we seen three proposals for implementing taint analysis inside SpiderMonkey so far? It sounds to me as if there is something that is not easily reachable from a source-to-source transformation, but easier to hook once you are deep inside the engine.

I spent a few years writing dynamic analysis tools for Java, and they all
used bytecode transformation for all these reasons.

I understand your argument that we should support transformations on a substrate which is standardized. Maybe this is just a matter of naming the API properly, so that analyses feel like they are hooked onto the spec's definitions of JavaScript.

You identified some disadvantages:
1) It may be difficult to keep the language support of code transformation
tools in sync with SpiderMonkey.
2) Code transformation tools may introduce application bugs (e.g. by
polluting the global or due to a bug in translation).
3) Transformed code may incur unacceptable slowdown (e.g. due to ubiquitous
boxing).
(Did I miss anything?)

Source-to-source implies that analysis developers have to know about the JS implementation and JS syntax, while such work belongs to the JavaScript engine developers.

The goal of such an API is to push the work to where the knowledge is: I do not expect analysis developers to understand all the subtle details of JavaScript (cf. the Jalangi issues). On the other hand, I do not expect JavaScript engine developers to maintain any kind of analysis integrated into the JS engine (except for optimization purposes).

Having a dynamic analysis API is just a middle way that lets each group deal with the problems they know.

I think #2 really only matters for people who want to deploy dynamic
analysis in customer-facing production systems, and I don't think that will
be important anytime soon.

On the contrary, I think/hope we could ship a trivial taint analysis to monitor privacy, in a similar way to what Lightbeam (Collusion) is doing.

#1 doesn't seem like a big problem to me. Extending a JS parser is not that
hard.

Extending one JS parser, maybe. Extending two JS parsers in exactly the same way is harder.

New language features with complex semantics require significant tool
updates whatever API we use.

Not as much as the syntax does; the bytecode is an example of this, as it is a kind of subset that the bytecode emitter targets. As you mentioned, manipulating bytecode is easy, but manipulating the source while ensuring that we keep the same semantics can be more complex.

A trivial example is destructuring syntax:

  var [a, b] = c;

Where do you hook the getters? Do you have to understand the construct well enough to translate it to:

  var a = $.arrayGet(c, 0);
  var b = $.arrayGet(c, 1);

And I do not have to look far to see that this is already done by the parser, and that the parser handles the name clashes for us (what if instead of "c" this was "a"?). A correct translation needs a fresh temporary, as sketched below. Do we want every analysis developer to make the same mistake, or should we just provide them with an API, as Jalangi does?
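
A minimal sketch of a clash-safe translation for the "var [a, b] = a;" case, reusing the hypothetical $.arrayGet helper from above; $tmp stands for a fresh name the rewriter must invent:

  // Read the old "a" into a fresh temporary before the pattern rebinds it.
  var $tmp = a;
  var a = $.arrayGet($tmp, 0);
  var b = $.arrayGet($tmp, 1);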

If we're using these tools ourselves, we'd
have to update the tools sometime between landing the feature in
SpiderMonkey and starting to use it in FirefoxOS or elsewhere where we're
depending on analysis.

Like everything else, but there is a greater chance of breaking something which relies on source-to-source transformation than something which relies on a lower-level (ECMA-based?) API.

#3 is interesting and perhaps where lessons learned from Java and other
contexts do not apply. I think we should dig into specific tool examples
for this; maybe some combination of more intelligent translation and
judicious API extensions can solve the problems.

Nicolas B. Pierron wrote:

Personally, I think that these issues imply that we should avoid relying
on a source-to-source mapping if we want to provide meaningful security
results. We could replicate the same or a similar API in SpiderMonkey, and
even make one compatible with Jalangi analyses.


It's not clear what you mean by "the same or a similar API" here.

I mean that I want such an API to be a JavaScript API. I do not want us to provide a function for adding each hook; I want the JS engine to provide one function for registering all the hooks you want, in a separate compartment.

  var a = newAnalysisGlobal();
  a.eval("load('my-analysis.js')");

  var g = newGlobal({analysis: a});

  // Generate bytecode probes based on the functions currently present on
  // the analysis global.
  g.eval("…");

We can either take inspiration from Jalangi's interface for writing analyses, or just bridge the two with a wrapper. Such analyses should be implemented in JavaScript and not in any other language, as our primary audience is JavaScript developers.
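
As a hypothetical illustration, my-analysis.js (loaded in the snippet above) could look like the following; the getField hook is Jalangi-inspired and purely illustrative, not a committed interface:

  // my-analysis.js: runs inside the analysis compartment.  Any function
  // declared on this global would be picked up as a probe.
  var readCounts = Object.create(null);

  // Called for every property read performed by the analyzed global.
  function getField(obj, name, value) {
    readCounts[name] = (readCounts[name] || 0) + 1;
    return value;  // observe only, never mutate the program
  }

  // Dump the collected counts, e.g. at shutdown.
  function report() {
    for (var name in readCounts)
      print(name + ": " + readCounts[name]);
  }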

Suppose we add opcodes dedicated to monitoring values (at the bytecode
emitter level) instead of doing source-to-source transformation. One of the
advantages would be that frontend developers would not have to maintain the
Jalangi sources when we add new features to SpiderMonkey; moreover, the
bytecode emitter already breaks everything down into opcodes, which are
easier to wrap than the source.

Analyses are usually made to observe the execution of some code, not to
mutate it.  So if we only monitor the execution, instead of emulating it, we
might be able to batch the analysis calls.  Delivering those batches
asynchronously means that the overhead of running an analysis stays minimal
while the analyzed code is running.
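
A hypothetical sketch of the batching idea from the engine's side; the names (probe, flush, analysisGlobal.processBatch) and the flush threshold are made up for illustration:

  // Cheap, JIT-inlinable probe: append a compact record and keep going.
  var buffer = [];
  function probe(kind, value) {
    buffer.push(kind, value);
    if (buffer.length >= 4096)
      flush();
  }

  // Rare cross-compartment call: one per batch instead of one per event.
  function flush() {
    analysisGlobal.processBatch(buffer);
    buffer = [];
  }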


Logging and log analysis have their place, but a lot of dynamic analysis
tools rely on efficient synchronous online data processing in
instrumentation code. For example, if you want to count the number of times
a program point is reached, it's much more efficient to increment a global
variable at that program point than to log to a buffer every time that
point is reached, and count log entries offline. For many analyses of
real-world applications, high-volume data logging is neither efficient nor
scalable. Here are a couple of examples of Java tools I worked on where
synchronous online data processing was essential:
-- http://fsl.cs.illinois.edu/images/e/e8/P385-goldsmith.pdf
-- http://web5.cs.columbia.edu/~junfeng/09fa-e6998/papers/hybrid.pdf
So I think injection of synchronously executed instrumentation is essential
for a large class of analyses.
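
A minimal sketch of that contrast, purely illustrative:

  // Synchronous online processing: the analysis state is updated inline,
  // so there is nothing to log or post-process.
  var hits = 0;
  function probeOnline() { hits++; }

  // Logging alternative: every event is materialized first, then counted
  // offline; far more data movement for the same answer.
  var log = [];
  function probeLogged() { log.push(1); }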

The asynchrony is one suggestion for making recording analyses faster, by avoiding frequent cross-compartment calls. I do not see any issue with having synchronous requests; on the contrary, I think it might be interesting to interrupt the program's execution on such a request, or even to change its execution (something we can only do synchronously) to prevent security holes / privacy leaks, as sketched below.
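
For instance, here is a minimal sketch of such a synchronous hook; the hook name invokeFunPre (Jalangi-inspired) and the isTainted helper are assumptions:

  // Called synchronously before every function call in the analyzed
  // global.  Because it runs before the call, it can abort the operation,
  // something a batched, asynchronous probe cannot do.
  function invokeFunPre(callee, args) {
    for (var i = 0; i < args.length; i++) {
      if (isTainted(args[i]))  // isTainted: hypothetical helper
        throw new Error("tainted value passed to " + callee.name);
    }
  }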

On the other hand, I do think that we should have asynchronous analyses first, but only the use cases of potential users can answer this question for us.

--
Nicolas B. Pierron

