> Jose E. Marchesi wrote:
>>> [...]
>>>
>>>> I agree that distcheck is good but not a cure all.  Any static
>>>> system can be attacked when there is motive, and unit tests are
>>>> easily gamed.
>>>>
>>> The issue seems to be releases containing binary data for unit tests,
>>> instead of source or scripts to generate that data.  In this case,
>>> that binary data was used to smuggle in heavily obfuscated object
>>> code.
>>>
>>
>> As a side note, GNU poke (https://jemarch.net/poke) is good for
>> generating arbitrarily complex binary data from clear textual
>> descriptions.
>
> While it is suitable for that use, at last check poke is itself very
> complex, complete with its own JIT-capable VM.  This is good for
> interactive use, but I get nervous about complexity in testsuites,
> where simplicity can greatly aid debugging, and it /might/ be possible
> to hide a backdoor similarly in a poke pickle.  (This seems to be a
> general problem with powerful interactive editors.)

Yes, I agree that simplicity is very desirable, in testsuites and
actually everywhere else.  I am also not fond of dragging in
dependencies.

But I suppose we also agree that it is not possible to assemble
non-trivial binary data structures in a simple way without somehow
moving the complexity of the encoding into some sort of generator,
which will not be simple.  The GDB testsuite, for example, ships with
a DWARF assembler written in around 3000 lines of Tcl.  Sure, it is
simpler than poke and doesn't drag in additional dependencies.  But it
has to be carefully maintained and kept up to date, and the complexity
is still there.

> Further, GNU poke defines its own specialized programming language for
> manipulating binary data.  Supplying generator programs in C (or C++)
> for binary test data in a package that itself uses C (or C++) ensures
> that every developer with the skills to improve or debug the package
> can also understand the testcase generators.

Here we will have to disagree.

IMO it is precisely the many tricky details of properly marshaling
binary data in general-purpose programming languages that have greater
odds of leading to encoders that are difficult to understand, difficult
to maintain, and possibly buggy or malicious.  The domain-specific
language is an advantage here, not a liability.

This is what you need to do in C to encode and generate test data for
a single signed 32-bit NUMBER in an output file in a _more or less_
portable way:

  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>
  #include <unistd.h>

  /* fatal and BIG are assumed to be defined elsewhere in the
     testsuite.  */

  void
  generate_testdata (off_t offset, int endian, int number)
  {
    int bin_flag = 0, fd, i;
    unsigned char b[4];
    /* Shift an unsigned copy: right-shifting a negative int is
       implementation-defined.  */
    unsigned int u = (unsigned int) number;

  #ifdef _WIN32
    bin_flag = O_BINARY;
  #endif

    /* The file must be created and opened for writing.  */
    fd = open ("testdata.bin", O_WRONLY | O_CREAT | bin_flag,
               S_IRUSR | S_IWUSR);
    if (fd == -1)
      fatal ("error generating data.");

    if (endian == BIG)
      {
        b[0] = (u >> 24) & 0xff;
        b[1] = (u >> 16) & 0xff;
        b[2] = (u >> 8) & 0xff;
        b[3] = u & 0xff;
      }
    else
      {
        b[3] = (u >> 24) & 0xff;
        b[2] = (u >> 16) & 0xff;
        b[1] = (u >> 8) & 0xff;
        b[0] = u & 0xff;
      }

    lseek (fd, offset, SEEK_SET);
    for (i = 0; i < 4; ++i)
      write (fd, &b[i], 1);
    close (fd);
  }

This is the Poke equivalent:

  fun generate_testdata = (offset<uint<64>,B> off,
                           int<32> endian,
                           int<32> number) void:
  {
    var fd = open ("testdata.bin");

    set_endian (endian);
    int<32> @ fd : off = number;
    close (fd);
  }

And thanks to the DSL, this scales nicely to more complex structures,
such as an ELF64 relocation instead of a signed 32-bit integer:

  fun generate_testdata = (offset<uint<64>,B> off,
                           int<32> endian,
                           int<32> number) void:
  {
    type Elf64_RelInfo =
      struct Elf64_Xword
      {
        uint<32> r_sym;
        uint<32> r_type;
      };

    type Elf64_Rela =
      struct
      {
        offset<uint<64>,B> r_offset;
        Elf64_RelInfo r_info;
        offset<int<64>,B> r_addend;
      };

    var fd = open ("got32reloc.bin");

    set_endian (endian);
    Elf64_Rela @ 0#B
      = Elf64_Rela { r_info = Elf64_RelInfo { r_sym = 0xff00,
                                              r_type = 3 } };
    close (fd);
  }
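
For comparison, the byte-shuffling part of the C version can at least
be separated from the file I/O so that it can be checked on its own.
What follows is only a minimal sketch: the names encode_i32 and
enum byte_order are hypothetical and do not come from any existing
package, and it assumes a C99 compiler providing <stdint.h>.

```c
#include <stdint.h>

/* Hypothetical byte-order selector for this sketch.  */
enum byte_order { ORDER_LITTLE, ORDER_BIG };

/* Store the two's-complement representation of NUMBER in OUT[0..3],
   using the requested byte order.  */
void
encode_i32 (unsigned char out[4], int32_t number, enum byte_order order)
{
  /* Converting to uint32_t is well-defined (modulo 2^32), which avoids
     right-shifting a negative value.  */
  uint32_t u = (uint32_t) number;
  int i;

  for (i = 0; i < 4; ++i)
    {
      /* Big endian puts the most significant byte first.  */
      int shift = (order == ORDER_BIG) ? 8 * (3 - i) : 8 * i;
      out[i] = (unsigned char) ((u >> shift) & 0xff);
    }
}
```

Even so, this covers only a single integer.  Every additional field
width, offset or struct layout needs its own hand-written encoding,
which is where the complexity discussed above comes back in.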