On Feb 27, 2009, at 5:12 PM, pa...@eng wrote:


Hi, the latest version of the XML format includes more information and is more verbose. We have new experimental results based on data from route-views4 (2009 Jan).

+-----------+-------------------+-------+-----------------+-------+
| format    | uncompressed (MB) | ratio | compressed (MB) | ratio |
+-----------+-------------------+-------+-----------------+-------+
| XFB       |            12,564 | 14.87 |             211 |  1.99 |
| XFB(full) |            15,039 | 17.80 |             368 |  3.47 |
| bgpdump   |             2,315 |  2.74 |             103 |  0.97 |
| MRT       |               845 |  1.00 |             106 |  1.00 |
+-----------+-------------------+-------+-----------------+-------+
                Storage Size Comparison
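
The ratio columns appear to be each size divided by the MRT size in the same column. That is an inference from the numbers, not something stated in the message. A quick check under that assumption, with the sizes copied from the table:

```python
# Sizes in MB copied from the table: (uncompressed, compressed).
sizes_mb = {
    "XFB": (12564, 211),
    "XFB(full)": (15039, 368),
    "bgpdump": (2315, 103),
    "MRT": (845, 106),
}


def ratios_vs_mrt(sizes):
    """Divide every size by the MRT size in the same column (assumed baseline)."""
    base_raw, base_comp = sizes["MRT"]
    return {
        name: (round(raw / base_raw, 2), round(comp / base_comp, 2))
        for name, (raw, comp) in sizes.items()
    }


ratios = ratios_vs_mrt(sizes_mb)
for name, (r_raw, r_comp) in ratios.items():
    print(f"{name:10s} {r_raw:6.2f} {r_comp:6.2f}")
```

The printed values match the ratio columns in the table to two decimal places.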

. XFB is the PURE xml representation of BGP updates,
  while XFB(full) also includes the MRT binary data in the xml string.

Well, not quite. It doesn't really include the MRT data in the XML string; it just includes the bytes off the wire.

Regardless of whether you use MRT or XFB or whatever, every archived BGP message needs to include some header information describing the peer that sent the update, the time received, and so forth. So for example we can think of MRT as MRT = binary HEADER + binary BYTES OFF WIRE.

With XFB, you have a choice of what to include. Again, we need a header to identify the peer, message time, etc. After that we can include the bytes off the wire which are encoded as <OCTET> bytes off wire in hex </OCTET>. We can also expand these bytes into an ASCII format with tags like <AS_PATH> and so forth. Since it is XML, the format allows you to include either the binary format or the human readable version or both.
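
To make the layout concrete, here is a minimal sketch of one record holding a header, the hex-encoded bytes, and an expanded form. Only <OCTET> and <AS_PATH> are tag names taken from this message; BGP_MESSAGE, PEER, TIME, and the grouping under <HEADER> and <ASCII> are hypothetical names used for illustration, not the actual XFB schema.

```python
import xml.etree.ElementTree as ET


def build_record(peer, timestamp, wire_bytes, as_path=None):
    """Build one archive record: header, hex bytes, optional readable part."""
    record = ET.Element("BGP_MESSAGE")            # hypothetical wrapper element
    header = ET.SubElement(record, "HEADER")
    ET.SubElement(header, "PEER").text = peer      # hypothetical field name
    ET.SubElement(header, "TIME").text = str(timestamp)  # hypothetical field name
    # The bytes off the wire, encoded as hex text.
    ET.SubElement(record, "OCTET").text = wire_bytes.hex().upper()
    if as_path is not None:
        # Human-readable expansion; a real record would carry many more tags.
        ascii_part = ET.SubElement(record, "ASCII")
        ET.SubElement(ascii_part, "AS_PATH").text = " ".join(str(a) for a in as_path)
    return record


# Placeholder bytes and AS numbers, not a real captured UPDATE.
wire = bytes.fromhex("deadbeef0102")
record = build_record("192.0.2.1", 1235754720, wire, as_path=[64500, 64501])
print(ET.tostring(record, encoding="unicode"))

# The hex text converts straight back to the original bytes.
assert bytes.fromhex(record.find("OCTET").text) == wire
```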

Choice 1: store <HEADER> + <OCTET> + <ASCII>. This is the format we are currently using to provide data, as it gives maximum flexibility. But note the same data is encoded twice: once as hex in the <OCTET> field and then again in more human readable ASCII fields like <AS_PATH>. This is the row with the larger ratios in the table above.

Choice 2: store <HEADER> + <OCTET>. To get this, one could easily strip (or never generate) the <ASCII> parts. The ASCII tags can be easily recreated from the binary data at any time. This saves disk space, but requires the data user to convert the bytes back into the <ASCII> part. Payne and He Yan, please correct me if I'm wrong, but I don't think that choice is shown in the table above. We will generate these numbers and send them in a future message.

Choice 3: store <HEADER> + <ASCII>. To get this, one could easily strip (or never generate) the <OCTET> field. All the data is still there and this is sufficient for many applications, but it could be insufficient if you want to do something like replay the bytes as they came off the wire. This is row one in the table above.
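
As a rough sketch of how Choices 2 and 3 could be derived from a Choice 1 record, the snippet below drops one part from a copy of the record. The sample record and its BGP_MESSAGE, PEER, and TIME names are hypothetical; the real XFB layout may nest these parts differently.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical Choice 1 record with placeholder values.
SAMPLE = """<BGP_MESSAGE>
  <HEADER><PEER>192.0.2.1</PEER><TIME>1235754720</TIME></HEADER>
  <OCTET>DEADBEEF</OCTET>
  <ASCII><AS_PATH>64500 64501</AS_PATH></ASCII>
</BGP_MESSAGE>"""


def strip_part(record, tag):
    """Return a copy of the record without its direct children named `tag`."""
    slim = copy.deepcopy(record)
    for child in slim.findall(tag):   # findall returns a list, safe to mutate
        slim.remove(child)
    return slim


full = ET.fromstring(SAMPLE)            # Choice 1: header + octets + ASCII
choice2 = strip_part(full, "ASCII")     # Choice 2: header + octets
choice3 = strip_part(full, "OCTET")     # Choice 3: header + ASCII

for name, rec in [("choice 1", full), ("choice 2", choice2), ("choice 3", choice3)]:
    print(name, len(ET.tostring(rec)), "bytes")
```

The byte counts for this toy record say nothing about real archive sizes; they only show that each variant is a strict subset of the full record.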

Note that if you have only the <ASCII> fields, you could regenerate the binary <OCTET> string, but there are some limitations. For example, suppose the original message had two community attributes, A and B. Both of these are encoded in the ASCII string as, say, <COMMUNITY> A </COMMUNITY> and <COMMUNITY> B </COMMUNITY>. The challenge in recreating the binary message is that XML does not impose ordering, so you can't be certain whether A or B was actually encoded first in the bytes off the wire. This may or may not matter. The reconstructed binary octet string is equivalent to one that would have been sent over the wire; in other words, both have community A and community B in the binary, though the reconstruction may have them in the wrong order. But if the problem you are trying to debug is that some router is scrambling bits 100-200, then the order might matter a great deal.
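
The ordering point can be illustrated with a small sketch. It packs each community as a 4-byte value (a 2-byte AS number followed by a 2-byte value) and ignores the surrounding attribute header, so it is a simplification rather than a full BGP encoder. The community values are made up.

```python
import struct


def encode_communities(communities):
    """Pack (asn, value) pairs as consecutive 4-byte values, in list order."""
    return b"".join(struct.pack("!HH", asn, value) for asn, value in communities)


def decode_communities(blob):
    """Unpack consecutive 4-byte values back into (asn, value) pairs."""
    return [struct.unpack("!HH", blob[i:i + 4]) for i in range(0, len(blob), 4)]


a = (64500, 100)   # community A (made-up value)
b = (64500, 200)   # community B (made-up value)

wire_ab = encode_communities([a, b])   # order as originally sent
wire_ba = encode_communities([b, a])   # order a reconstruction might pick

# Byte-for-byte the two encodings differ ...
print(wire_ab == wire_ba)   # False
# ... yet they carry the same set of communities.
print(set(decode_communities(wire_ab)) == set(decode_communities(wire_ba)))   # True
```

For most analysis the two encodings are interchangeable; a byte-level comparison or replay would see them as different messages.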

Thanks,
Dan





_______________________________________________
GROW mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/grow
