I was (relatively) recently bitten by an issue with putting contracts on
serializable structs. What's worse, once I figured out what was going on, I
realized that I'd run into this very issue before and even discussed it on
this list
<https://groups.google.com/d/topic/racket-users/xIrqdJE-qdk/discussion>. In
that case, though, I'd encountered the problem with higher-order contracts
and other complications: I hadn't fully appreciated that the same caveats
apply to much simpler cases.

I want to write up this problem for the list, both because I suspect others
are making the same oversight I was and because I think this problem
deserves a linguistic solution.

In essence, the problem is that, if you are using `serializable-struct`,
any contracts you may put on the struct (usually through a `struct` clause
in `contract-out`) are not enforced when instances are deserialized with
`deserialize`. Here is an example program:

#lang racket

(module server racket
  (require racket/serialize)
  (provide (contract-out
            [struct boxed-set ([v set?])]))
  (serializable-struct boxed-set (v)
    #:transparent))

(module bad racket
  (require racket/serialize
           (submod ".." server))
  (provide (contract-out
            [make-empty-boxed-set/bad
             (-> boxed-set?)]))
  (define (corrupt-serialized s)
    (list-set s 6 '(0 "not a set")))
  (define (make-empty-boxed-set/bad)
    (deserialize
     (corrupt-serialized
      (serialize (boxed-set (set)))))))

(module main racket
  (require (submod ".." server)
           (submod ".." bad))
  (boxed-set-v (make-empty-boxed-set/bad)))

The `server` module tries to protect its `boxed-set` datatype with a
contract in just the way I was doing, which I expect many other Racketeers
are doing as well. If I'd written such a module with `struct` instead of
`serializable-struct`, I would know that no other module will be able to
construct a `boxed-set` instance with a bad value. If I know that my own
module doesn't create malformed instances, either (perhaps because it
doesn't construct any instances), I could correctly assume that every
instance satisfies its contract, giving all of the familiar benefits for
all parties when reasoning about their code.

Using `serializable-struct` opens a loophole in this guarantee. The
expansion of `serializable-struct` uses the same lower-level mechanisms
available for implementing a custom serialized representation: it generates
a `prop:serializable` property and a `deserialize-info` value, which is
exported from the `deserialize-info` submodule as
`deserialize-info:boxed-set-v0`. The `deserialize-info` value encapsulates
a function to construct a `boxed-set` instance from the serialized
representation, which is implemented using the `boxed-set` constructor. The
key point is that the expansion of `serializable-struct` uses the version
of the `boxed-set` constructor from inside the `server` module, which *is
not protected by a contract.* Thus, `deserialize` effectively exposes the
contract-less constructor, albeit quite indirectly.

The `bad` module demonstrates how malformed instances of `boxed-set` can be
constructed without triggering a contract violation. The `main` module then
attempts to use `boxed-set-v` on a malformed value it has gotten from
`bad`, triggering this exception, which accurately blames `server` for
failing to live up to its promised contract on `boxed-set-v`:
boxed-set-v: broke its own contract
  promised: set?
  produced: "not a set"
  in: the range of
      (-> boxed-set? set?)
  contract from:
      (<...>/serial-contract.rkt server)
  blaming: (<...>/serial-contract.rkt server)
   (assuming the contract is correct)
  at: <...>/serial-contract.rkt:7.23

The especially pernicious part is that, if `main` hadn't used
`boxed-set-v`, the error might not have been detected until the value had
flowed far away into some unrelated code.

Why is this significant in practice? Realistically, I don't expect
programmers would write functions like `corrupt-serialized` in an attempt
to deliberately exploit some library's invariant, and the chance of someone
doing so accidentally seems fairly remote. What worries me much more is
that I read serializable structs in over the network, stick them in
databases, and write them to disk. In the absence of a contract, any struct
that has ever been deserialized may be silently corrupt. Some bits could
have been flipped or some other process could have mangled a file on disk.
An attacker could have intentionally manipulated a value sent over the
network as part of some potential exploit. Perhaps most likely, the could
have been a backwards-incompatible change (perhaps an unintentional one) to
the contract on some field of the struct.

A contract couldn't protect me from encountering a malformed serialized
representation, but it would let me detect the error as soon as I try to
deserialize the bad value, rather than delaying the error until I actually
attempt to access a bad field. As things stand, if I usually just pass the
value around and move it in and out of storage, I might be unknowingly
passing around bad data indefinitely.

I'm particularly concerned that people may be vulnerable to these
possibilities without realizing it, as I have been. In every other way I
can think of, `serializable-struct` is a drop-in replacement for `struct`,
and, as I explained, a `server` module that used `struct` would have been
sound—so it's surprising that the version with `serializable-struct` isn't,
especially since the unprotected constructor is exposed through a
side-channel that isn't obvious in the source text of the program.

I said at the outset that I think this problem deserves a linguistic
solution. Ultimately, I would like it to be as easy to implement a
protected serializable struct as it is to use `struct` and `contract-out`.
All of the workarounds that I know of seem to have significant limitations.
One approach would be to drop down to the level of `prop:serializable` and
`make-deserialize-info` and write manual checks, but that seems bad for all
of the reasons contracts are so much better than manual checks in general.
Robby had suggested
<https://groups.google.com/d/msg/racket-users/xIrqdJE-qdk/B20DSUEjBAAJ>
expecting clients not to use `deserialize` directly, which could help in a
number of cases, but it would be a problem when working with libraries like
`#lang web-server` that expect to be given serializable values and to
serialize them for you. Matthew gave an example
<https://groups.google.com/d/msg/racket-users/xIrqdJE-qdk/zoHUWdEiBAAJ>
showing how to put the `deserialize-info` in a separate module, so that it
uses the contracted version of the constructor: this could well be the
basis for a robust solution, but it would definitely need a friendlier API
than manipulating module-path indices. It might well help to add some hooks
to `racket/serialize`—maybe something like `chaperone-serializable-struct`?

I don't have a very concrete vision of how to approach this, but I think
it's worth doing. In the mean time, I hope this is helpful to anyone else
who may think they've been enforcing stronger guarantees than they actually
have.

-Philip

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to