Ivan Alagenchev and Mark Goodwin asked me to take a look at their project to bring DOMinator, a taint analysis for SpiderMonkey, up to date, and help them come up with a path to having it integrated into SpiderMonkey. Taint analyses are valuable in auditing web security - looking for XSS vulnerabilities, for example. The implementation has a long way to go, on several fronts, but it's a really cool tool, and it did seem to me that there was a path to making it production-quality. I'm posting my thoughts here, so that interested SpiderMonkey folks can comment if they like.

The taint analysis applies to strings only, and has four parts:

 * It identifies certain *sources* of strings as "tainted":
   document.URL, input fields, and so on.
 * The JavaScript engine propagates taint information on strings.
   Taking a substring of a tainted string, or concatenating a tainted
   string, yields a tainted string. Regexp operations, charAt, and so
   on all propagate taint information. And so on.
 * It identifies certain *sinks* as vulnerable: eval, 'src' attributes
   on script elements, and so on.
 * Finally, the tool's user interface logs the appearance of tainted
   strings at vulnerable sinks. The taint metadata actually records the
   provenance of each region of a tainted string, so the tool can
   explain exactly why the final string is tainted, which is really
   helpful in constructing XSS attacks.

Here's a video (no sound; amazing sub-title technique) that demonstrates DOMinator. It's a little hard to follow, but he finds an XSS vulnerability on www.aol.com easily:

http://www.youtube.com/watch?feature=player_embedded&v=f_It469LUFM

tl;dr: The present implementation has many issues, but with enough work, I think it could be maintainable, efficient, and valuable.


Details:

I looked at the code in the github repo at https://github.com/alagenchev/spider_monkey. It seems to me that the following issues need to be addressed:

 * The way the taint metadata is stored needs to change. A linked list
   isn't appropriate for operating at scale.

   We should steal a flag bit from JSString::d::lengthAndFlags (as we
   do now), but use that bit to mean that the string has an entry in a
   per-runtime taint metadata HashMap, indexed by string address.
   js::gc::MarkChildren(JSTracer *, JSString *) should check that bit,
   and mark any strings hash table entry refers to. JSString::finalize
   should check that bit, and clean up the hash table entry. That way,
   the hash table entries live exactly as long as their strings do, so
   it's easy to understand their performance impact and check for
   correctness.

 * Atoms present a challenge to any taint analysis. SpiderMonkey's
   existing atomization optimization relies on the assumption that
   strings with the same text are interchangable - but that assumption
   does not hold if some strings with a given text are tainted and
   others aren't. The whole point of atoms is to let object identity
   stand in for textual equality. I don't think we should try to change
   that; atoms should not be taintable.

   Fortunately, atoms are not actually stored in very many places (as
   far as I can tell). String literals are atoms, but that doesn't
   concern us, as string literals are never tainted. Property names,
   stored in js::Shape::propid_, are atoms; but for that use, we could
   store the taint metadata (or a pointer to such) in Shape directly,
   alongside the atom, and let propid_ continue to work as it does now.
   Operations that retrieve property names from objects, like for-in
   and Object.getOwnPropertyNames, would need to check for the presence
   of taint metadata, and return a properly tainted string, instead of
   the real atom.

   There may be other places atoms get used that I'm forgetting. But
   taint analyses don't need to be perfect to be helpful; if we lose
   taint from time to time, that's fine. The web pages we're analyzing
   aren't hostile code, so they won't wiggle and squirm to avoid
   propagating taint. So as long as we handle the common paths, the
   tool should still be useful.

   As far as the present implementation is concerned, it seems like
   js_AtomizeString in the github repo now has a taint leak. If we
   create an atom from a tainted string, then we taint the atom;
   looking up the atom for an untainted string with the same text will
   yield a tainted atom, when it should not.

 * The patch adds functions on 'String', and an accessor on
   'String.prototype'. We can't really add methods directly to String,
   for the sake of web compatibility. Rather, we should handle taint
   the way we handle Debugger. See JS_DefineDebuggerObject, in
   js/src/vm/Debugger.cpp, and look at its uses. Or, similarly, look at
   JS_InitReflect. Those functions define 'Debugger' or 'Reflect' on
   globals when requested; perhaps you could define a 'Taint' object,
   with functions like 'Taint.getTaintInfo(string)' and so on.

 * The original author of DOMinator ignored SpiderMonkey coding style:
   indentation, spacing, naming, and so on. That's easy but
   time-consuming to fix, but it's got to be done. The SM style is
   described here:
   https://wiki.mozilla.org/JavaScript:SpiderMonkey:C_Coding_Style

   Current code tends to avoid preprocessor macros and conditionals, in
   favor of little utility classes with inline method definitions. I
   think the TAINT_CONDITION macros could be replaced by utility
   classes with inline methods; these methods could have different
   definitions when building with taint analysis enabled than disabled,
   such that, when taint is disabled, the compiler can (say) see that
   some methods always return false, and remove the taint-propagation
   code, without the use of preprocessor conditionals (other than in
   the definition of the class itself).

I hope this is helpful. It certainly would be nice to hear what other SpiderMonkey developers think about it.

_______________________________________________
dev-tech-js-engine-internals mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals

Reply via email to