Fantastic SATE reply from Steven M. Christey:
I participated in SATE 2008 and SATE 2009, much more actively in the
2008 effort. I'm not completely sure of the 2009 results and final
publication, as I've been otherwise occupied lately :-/ Looks like a
final report has been delayed till June (the SATE 2008 report didn't
get published till July 2009).
For SATE 2008, we did not release final results because the human
analysis itself had too many false positives - so sometimes we claimed
a false positive when, in fact, the issue was a true positive. Given
this and other data-quality problems (e.g. we only covered ~12% of the
more than 49,000 items), we believed that releasing the raw data would
make it way too easy for people to draw completely wrong conclusions
about the tools.
The problems that the data would have revealed are:
1) false positive rates from these tools are overwhelming
As covered extensively in the 2008 SATE report (see my section, for
example), there is no clear definition of "false positive," especially
when it comes to proving that a specific finding is a vulnerability.
For example: suppose a tool reports a buffer overflow in a function.
To prove the finding is a vulnerability, you have to dig back through
all the data flow, sometimes going 20 levels deep. That is not
feasible for a human evaluator trying to determine whether there's
really a vulnerability. Or maybe the overflow happens when you're
reading a configuration file that's only under the control of the
administrator. These could be regarded as false positives. However,
the finding may be "locally true" - i.e. the function itself might not
do any validation at all, so *if* it's called incorrectly, an overflow
will occur. My suspicion is that a lot of the "false positives"
people complain about are actually "locally true." And, as we saw in
SATE 2008 (and 2009 I suspect), sometimes the human evaluator is
actually wrong, and the finding is correct. Hopefully we'll account
for "locally true" in the design of SATE 2010.
2) the workload to triage results from ONE of these tools was
man-years
This was also covered (albeit only as an estimate) in the 2008 SATE
report, in both the original section and my section.
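For a rough sense of scale (the per-finding triage time here is purely
my own assumption for illustration, not a figure from the report),
take the ~49,000 reported findings mentioned above:

    49,000 findings x 10 minutes/finding = 490,000 minutes, or about 8,167 hours
    8,167 hours / ~2,000 working hours per person-year = roughly 4 person-years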
3) by every possible measurement, manual review was more cost effective
There was no consideration of cost in this sense.
One lost opportunity for SATE 2008, however, was in comparing the
results from the manual-review participants (e.g. Aspect) versus the
tools in terms of what kinds of problems got reported. (This also had
major implications for how to count the number of results.) I believe
that such a focused effort would have shown some differences in what
got reported. At least that information is in the raw data, since it
shows who claimed to have found what.
While the SATE 2008 report is quite long, mostly thanks to my
excessive verbiage, I believe people who read that document will see that SATE
has been steadily improving its design over the years. The reality is
that any study of this type is going to suffer from limited manpower
in evaluating the results.
http://samate.nist.gov/docs/NIST_Special_Publication_500-279.pdf
The coverage was limited ONLY to injection and data flow problems
that tools have a chance of finding. In fact, the NIST team chose
only a small percentage of the automated findings to review, since it
would have taken years to review everything due to the massive number
of false positives. Get the problem here?
While there were focused efforts in various types of issues, there was
also random sampling to get some exposure to the wide range of problems
being reported by the tools. Your critique of SATE with respect to
its focus on tools versus manual methods is understandable, but SATE
(and its parent SAMATE project) are really about understanding tools,
so this focus should not be a surprise. After all, the first three
letters of SATE expand to "Static Analysis Tool."
- Steve