In your letter dated Tue, 1 Nov 2016 16:50:36 +0000 you wrote:
>> Did you consider not (partially) decoding the DNS payload and instead just
>> storing DNS payloads directly as binary blobs?
>
>We did, and we entirely appreciate the advantages of having the binary.
>But from a processing point of view, this seemed to us to be equivalent to
>using PCAP. Our aim was to produce files that are as small as possible,
>but while consuming minimal machine resources. We expect files in our format
>to be run through general purpose compression. Our measurements indicated
>that compressing raw PCAP is much more expensive in terms of CPU and working
>set size, and delivers files that are still twice the size of our format
>after compression. We concluded that any format shipping binary blobs would
>not meet our goals.
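One rough way to see how much of a compressed PCAP comes from per-packet header fields rather than from the DNS payload is a small synthetic experiment. The Python sketch below is not the measurement referred to above and says nothing about the CBOR format. It generates hypothetical DNS queries (a few made-up names, fixed documentation addresses) in which the DNS ID, IP ID, client port and UDP checksum are drawn from a seeded PRNG, then compares zlib-compressed sizes of the payloads alone versus the payloads with simplified IPv4/UDP headers prepended. Only queries are modelled, and PCAP record headers are ignored.

```python
import random
import struct
import zlib

# Hypothetical query names; the data is synthetic, not a real capture.
NAMES = [b"example.com", b"example.net", b"nonexistent.example.org"]


def encode_qname(name: bytes) -> bytes:
    """Encode a dotted name as DNS wire-format labels."""
    out = b""
    for label in name.split(b"."):
        out += bytes([len(label)]) + label
    return out + b"\x00"


def dns_query(rng: random.Random, name: bytes) -> bytes:
    """Minimal DNS query: 12-byte header with a random ID, one question (A, IN)."""
    header = struct.pack("!HHHHHH", rng.getrandbits(16), 0x0100, 1, 0, 0, 0)
    return header + encode_qname(name) + struct.pack("!HH", 1, 1)


def ip_udp_headers(rng: random.Random, payload_len: int) -> bytes:
    """Simplified IPv4 + UDP headers. The IP ID, client port and UDP checksum
    are random stand-ins for per-packet values; addresses are fixed."""
    ip = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + 8 + payload_len,  # version/IHL, TOS, total length
        rng.getrandbits(16), 0,         # random IP ID, flags/fragment offset
        64, 17, 0,                      # TTL, protocol = UDP, header checksum left 0
        bytes([192, 0, 2, 1]), bytes([192, 0, 2, 53]),
    )
    udp = struct.pack("!HHHH", rng.getrandbits(16), 53, 8 + payload_len,
                      rng.getrandbits(16))  # random client port and checksum
    return ip + udp


def compressed_sizes(n: int = 2000, seed: int = 1) -> tuple[int, int]:
    """Return (zlib size of DNS payloads only, zlib size with IP/UDP headers)."""
    rng = random.Random(seed)
    payloads, packets = [], []
    for _ in range(n):
        q = dns_query(rng, rng.choice(NAMES))
        payloads.append(q)
        packets.append(ip_udp_headers(rng, len(q)) + q)
    return (len(zlib.compress(b"".join(payloads), 9)),
            len(zlib.compress(b"".join(packets), 9)))


if __name__ == "__main__":
    raw_only, with_headers = compressed_sizes()
    print(f"DNS payloads only:   {raw_only} bytes after zlib")
    print(f"with IP/UDP headers: {with_headers} bytes after zlib")
```

The random header fields add roughly six incompressible bytes per packet in this model, which is the effect the reply below argues about; real traffic would also have a response packet per query and more varied names.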
I find it hard to believe that, after compression, the BSON-encoded version
of the DNS data would be a lot smaller than just the raw DNS data. There is
not a lot of redundancy in the DNS encoding.

On the other hand, the IP and UDP headers are quite big, certainly compared
to most queries and also compared to an NXDOMAIN answer. In addition, IP
headers may have random IDs, and UDP headers have random checksums. Both
compress rather poorly, and a PCAP stores them twice (in the query packet
and in the response packet). The UDP port of the client also compresses
poorly and is stored once in BSON and twice in a PCAP.

So I don't think it follows from PCAPs compressing badly that storing raw
DNS would compress badly as well. Unless I missed some trick that makes the
CBOR version compress a lot better.

>> Another issue is to consider whether the format would benefit from local
>> extensions. For example, enrichment of data according to local
>> specifications. If so, then BSON would be another format to consider.
>
>We deliberately specified CBOR maps for most of the data structures to allow
>other fields to be added, either in later versions or in local modifications.
>We intend that decoders should just ignore any fields they don't recognise.
>
>We looked at a variety of binary forms, and I think I did at least look
>briefly at BSON. It didn't seem to have any major advantages over CBOR,
>though obviously I may have missed something. We looked closer at Apache
>Avro and Protocol Buffers; Avro was the closest competition, but in the end
>did not offer any significant advantage, so we went with the format with
>the IETF standard.

The downside of CBOR, certainly as used here, is that it uses integers to
identify fields in what JSON calls objects. So anybody who writes a local
extension is likely to just continue numbering fields, which leads to
mutually incompatible extensions.
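To make the field-numbering concern concrete, here is a minimal Python sketch. It hand-rolls a tiny CBOR encoder (unsigned integers, text strings and maps only, following the initial-byte layout of RFC 7049) instead of using a library. The field numbers, the field names and the two "site" extensions are hypothetical illustrations, not taken from the draft. Two independently written extensions both continue the numbering at 3, and the same record is encoded once with integer keys and once with text keys.

```python
from typing import Union

Value = Union[int, str, dict]


def _head(major: int, arg: int) -> bytes:
    """CBOR initial byte (major type + additional info) plus any argument bytes."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([(major << 5) | extra]) + arg.to_bytes(size, "big")
    raise ValueError("argument too large for CBOR")


def encode(value: Value) -> bytes:
    """Encode unsigned ints (major 0), text strings (major 3) and maps (major 5)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not supported in this sketch")
    if isinstance(value, int):
        if value < 0:
            raise ValueError("negative integers are not supported in this sketch")
        return _head(0, value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, dict):
        out = _head(5, len(value))
        for key, val in value.items():
            out += encode(key) + encode(val)
        return out
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical numbering: keys 0-2 are "standard"; two sites each add a
# local field by continuing the numbering at 3.
STANDARD = {0: "query-name", 1: "client-port", 2: "response-code"}
SITE_A_EXT = {3: "geoip-country"}
SITE_B_EXT = {3: "resolver-latency-ms"}


def collisions(*extensions: dict) -> set:
    """Return the integer keys claimed by more than one extension."""
    seen, clashes = set(), set()
    for ext in extensions:
        for key in ext:
            if key in seen:
                clashes.add(key)
            seen.add(key)
    return clashes


if __name__ == "__main__":
    int_keyed = {0: "example.com", 1: 53211, 3: "NL"}
    text_keyed = {"query-name": "example.com", "client-port": 53211,
                  "geoip-country": "NL"}
    print("integer keys:", len(encode(int_keyed)), "bytes")
    print("text keys:   ", len(encode(text_keyed)), "bytes")
    print("clashing keys:", collisions(SITE_A_EXT, SITE_B_EXT))
```

A decoder that knows site A's extension would read site B's key 3 as a country code, and nothing in the encoding flags the mismatch. CBOR itself accepts text strings as map keys, so named fields are possible; the cost is that every key name is repeated in every record.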
In contrast, formats such as XML and JSON, but also BSON, where fields have
names, make it less likely that people will pick the same identifier for
completely different purposes.

I looked at the BSON specs, and BSON can do a lot of things in this regard.
But it also seems to bring a lot of complexity, and how to do it right is
easily lost in all the details.

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop