Re: [go-nuts] Recommendation for Fuzzed Types without Native On-Disk Format

Jesper Louis Andersen Thu, 11 Feb 2021 06:36:55 -0800

On Mon, Feb 8, 2021 at 10:07 PM Matt Proud <matt.pr...@gmail.com> wrote:


> Suppose I am interested in fuzz testing an API using dvyukov/go-fuzz
> <https://github.com/dvyukov/go-fuzz>, and the primary API under test
> accepts relatively complex composite data types itself: func F(*Table)
> error, where the types below are minimally as complex as this:
>
> type Entry struct {
> Name  string
> Date  time.Date
> Child *Entry      // optional
> Data  interface{} // optional and user-defined
> }
>
> type Table struct {
> Attrs   map[string]string
> Entries []*Entry
> }
>
>
Haphazard list of ideas follows. They are not likely to hit the vampire
with the silver bullet directly, but it might ricochet off of a tree
somewhere and hit by accident.

This just screams quickcheck-style testing to me, even though that isn't
what you want. The reason it's strong is that you are guaranteed to
generate an example which is valid for your function, and you can easily
generate millions of them. Fuzz style testing will treat your data as a
black box, and thus trades CPU time for the click-of-a-button approach. You
don't have to specify a generator, and you don't have to think about a
generator. The tricks fuzzers employ are of the one-size-fits-all kind and
will work on all data. But the trade-off is definitely more CPU cycles
burned.

If you have a binary format and you flip a bit, there is a chance your
flipped []byte is valid. If this chance is low, you don't get to test a
lot. If this chance is high, you get more tests, but you have to factor
through a decoder step of some kind. The trick is to pick a format in which
*every* representation of bits is valid in some way, or at least converges
to that point. This will cut down decoding CPU time to 0, and get you to
the point where you are testing the underlying function more. This is also
the approach you take in quickcheck-style generators of this kind: you
generate a "skeleton" of the structure, then fill in missing bits drawn
from another generator. The extreme variant is akin to mmap()'ing the
memory data out to a disk file.

If `F` is simple, then you are going to spend a lot of time in the decoder.
If `F` is complex, then it doesn't matter as the time in `F` will dwarf the
decoding time.

Hybrids are also possible: take a binary decode, then run 100 steps where
you randomly alter data before invoking `F`. Here you use the fuzzer as a
way to come up with the skeleton structure, and then you test variants of
the leaf-data within the structure.

I've done some work with quickchecking in the industry (But in Haskell,
OCaml, and Erlang rather than Go). If there's any folklore trick to hand
out it's this: use descriptive statistics! Plot the data you are generating
and run it through statistical analysis. You need to verify some
assumptions about the data you are testing, and it can inform you if the
fuzzer is able to dig deep into the structure, or if it's being stopped by
most random bit flips being uninteresting. Don't rely on intuition, but
verify through data. This is often where the insight is hidden. Another
folk-lore trick is to simplify: you don't have to test the full "width" of
the underlying `F` from the get go. We can maybe just assume `Entry.Child`
is always the `nil` value, `Entry.Data` is empty and `Table.Entries` are
all non-nil. If we can establish that *this* passes the test, we can expand
later.

As a simple example: it turns out that in many cases integers of the form
pow(2, n) +/- k are more interesting to test than just a uniform
distribution of integers. Fuzzers can spend a lot of time here, whereas a
quickcheck style generator can generate those integers 5% of the time, say.

As another simple example: when testing concurrency in a database, we just
need a database table with one row. If concurrency rules fail on the single
row, it is bound to fail on multiple rows (ramification: in this scenario,
it's probably not worthwhile for a fuzzer to start exploring multi-row
input).

The strength of fuzzers is that they work by the click of a button and burn
CPU cycles. You don't have to think about the data very much. Exploit this.
If it fails, you can always try another approach down the road. Randomized
methods are strong on modern hardware which is fast. Your goal is to break
the program and most programs break easily if there's a bug in them.
Especially with a fuzzer trying to dig into call paths.

go-fuzz expects the fuzz entrypoints to be func Fuzz(data []byte) int,
> meaning *Table cannot be used directly without some intermediate
> serialization and deserialization.  Suppose that *Table has no regular
> on-disk format; what would you recommend as the approach to take toward
> generating an initial corpus so that go-fuzz can explore the search space
> and generate prospective crashers.
>
> I've toyed around a bit with taking representative values and encoding
> them to disk with encoding/gob and then having go-fuzz deserialize,
> validate, and reject bad gob value candidates by returning -1.
> Notwithstanding any issues inherent to encoding/gob, this does not feel
> like a particularly efficient on account of extraneous exercising of the
> gob encoding format, when really func F(*Table) error is what I want to
> exercise (as in commit my CPU cycles toward).
>
> Potentially there are a few other ways of encoding *Table, like Protocol
> Buffers (with caveats), but encoding/binary is definitely out due to the
> opaque size of *Table.  The point still stands: I would ideally not want to
> exercise the serialization flow so much as handling of the type on which
> the data is projected.
>
> What strategy would you use here?  I feel like there should be an obvious
> choice under my nose that I am somehow omitting.  And no, I don't want
> testing/quick for this, even though it can generate plenty of complex
> composite types at runtime.
>
> With warm regards,
>
> Matt T. Proud
>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/golang-nuts/802159d8-ae45-4758-ac57-3d800bdc406en%40googlegroups.com
> <https://groups.google.com/d/msgid/golang-nuts/802159d8-ae45-4758-ac57-3d800bdc406en%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>


-- 
J.

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAGrdgiW9nk6i0F%3DFG-zGVoaRw_DcQ1b3dSMLw709gcEdk2gmHQ%40mail.gmail.com.

Re: [go-nuts] Recommendation for Fuzzed Types without Native On-Disk Format

Reply via email to