In your letter dated Tue, 1 Nov 2016 16:50:36 +0000 you wrote:
>> Did you consider not (partially) decoding the DNS payload and instead just
>> storing DNS payloads directly as binary blobs?
>
>We did, and we entirely appreciate the advantages of having the binary.
>But from a processing point of view, this seemed to us to be equivalent to
>using PCAP. Our aim was to produce files that are as small as possible,
>but while consuming minimal machine resources. We expect files in our
>format to be run through general purpose compression. Our measurements
>indicated that compressing raw PCAP is much more expensive in terms of
>CPU and working set size, and delivers files that are still twice the
>size of our format after compression. We concluded that
>any format shipping binary blobs would not meet our goals.

I find it hard to believe that after compression, the CBOR encoded
version of the DNS data would be a lot smaller than just the
raw DNS data.

There is not a lot of redundancy in the DNS encoding.

On the other hand, the IP and UDP headers are quite big, certainly compared
to most queries and also compared to an NXDOMAIN answer.
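
As a rough illustration of the sizes involved, here is a small Python
sketch. It assumes Ethernet framing, IPv4 without options and the classic
pcap per-packet record header; none of these is stated above, and the
helper dns_query() is made up for this example.

```python
import struct

def dns_query(name: str, qtype: int = 1, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query message: header plus one question."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE followed by QCLASS=IN
    return header + qname + struct.pack("!HH", qtype, 1)

# Assumed per-packet framing sizes in bytes
PCAP_RECORD_HEADER = 16  # classic pcap: ts_sec, ts_usec, incl_len, orig_len
ETHERNET_HEADER = 14
IPV4_HEADER = 20         # no IP options
UDP_HEADER = 8

payload = dns_query("www.example.com")
framing = PCAP_RECORD_HEADER + ETHERNET_HEADER + IPV4_HEADER + UDP_HEADER

print(len(payload), framing)  # prints "33 58"
```

Under these assumptions the framing around a short query is larger than
the DNS message itself.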

In addition, IP headers may have random IDs, and UDP headers have random
checksums, which compress rather poorly and are stored twice in the pcap.
The UDP port of the client also compresses poorly and is stored once in CBOR
and twice in a PCAP.
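
The effect of random fields on a general purpose compressor can be seen
with a minimal sketch like the following. The byte strings are synthetic
stand-ins, not real traffic, and zlib is used only as an example
compressor.

```python
import os
import zlib

N = 10_000

# Stand-in for N random 16-bit values (IP IDs, checksums, client ports)
random_fields = os.urandom(2 * N)

# Stand-in for a 16-bit field that repeats in every packet, e.g. port 53
constant_fields = b"\x00\x35" * N

random_compressed = len(zlib.compress(random_fields, 9))
constant_compressed = len(zlib.compress(constant_fields, 9))

print("random:  ", len(random_fields), "->", random_compressed)
print("constant:", len(constant_fields), "->", constant_compressed)
```

The random input stays at roughly its original size, while the repeated
field shrinks to almost nothing.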

So I don't think it follows from badly compressing pcaps that storing
raw DNS would compress badly as well, unless I missed some trick
that makes the CBOR version compress a lot better.

>> Another issue is to consider whether the format would benefit from local
>> extensions. For example, enrichment of data according to local
>> specifications.
>> If so, then BSON would be another format to consider.
>
>We deliberately specified CBOR maps for most of the data structures to allow
>other fields to be added, either in later versions or in local
>modifications. We intend that decoders should just ignore any fields
>they don't recognise.
>
>We looked at a variety of binary forms, and I think I did at least look
>briefly at BSON. It didn't seem to have any major advantages over CBOR,
>though obviously I may have missed something. We looked closer at Apache
>Avro and Protocol Buffers; Avro was the closest competition, but in the
>end did not offer any significant advantage, so we went with the format
>with the IETF standard.

The downside of CBOR, certainly as used here, is that it uses integers to
identify fields in what JSON calls objects.

So anybody who writes a local extension is likely to just continue numbering
fields, which leads to mutually incompatible extensions.

In contrast, formats like XML, JSON, but also BSON where fields have names
make it less likely that people will pick the same identifier for 
completely different purposes.
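
A minimal sketch of the collision, in Python. The field numbers, field
names and the two "sites" are all hypothetical and not taken from the
draft; each schema simply maps a key to the meaning a site gives it.

```python
# Base schema with integer keys, as in a CBOR map keyed by small integers
base_int = {0: "query name", 1: "query type"}

# Two sites extend it independently and both pick the next free number
site_a_int = {**base_int, 2: "geolocation of client"}
site_b_int = {**base_int, 2: "customer identifier"}

# The same schema and extensions with named keys
base_named = {"qname": "query name", "qtype": "query type"}
site_a_named = {**base_named, "geo": "geolocation of client"}
site_b_named = {**base_named, "customer": "customer identifier"}

def conflicting_keys(a, b):
    """Keys present in both schemas but given different meanings."""
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])

print(conflicting_keys(site_a_int, site_b_int))      # prints [2]
print(conflicting_keys(site_a_named, site_b_named))  # prints []
```

Named keys do not rule out clashes, but two sites have to choose the
same name by accident rather than the same next integer.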

I looked at the BSON specs, and BSON can do a lot of things in this
regard. But it also seems to bring a lot of complexity, and how to do it
right is easily lost in all the details.

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop