On May 22, 2009, at 4:32 PM, Doug Cutting wrote:
Matt Massie wrote:
(1) Spec related: Should we have a max-length attribute for the
variable length objects?
My instinct would be to put this in the server implementation--it
should guard against requests that use too much memory.
That will work. If this approach doesn't work as well as we'd like,
we can update the spec at a later time.
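
To make the server-side guard concrete, here is a minimal sketch of a
reader that refuses oversized "bytes" values before allocating anything.
It assumes the usual Avro encoding (a zig-zag varint long for the length,
followed by that many bytes). MAX_BYTES_LENGTH, RequestTooLarge and
read_bytes are hypothetical names for illustration, not part of any
existing Avro implementation.

```python
import io

# Hypothetical server-side limit (1 MiB); an assumption, not something the spec defines.
MAX_BYTES_LENGTH = 1 << 20


class RequestTooLarge(Exception):
    """Raised when a declared length exceeds the configured limit."""


def read_long(inp):
    """Read one zig-zag varint-encoded long from a binary stream."""
    shift = 0
    z = 0
    while True:
        b = inp.read(1)
        if not b:
            raise EOFError("stream ended inside a long")
        z |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            break
        shift += 7
        if shift > 63:
            raise ValueError("varint too long for a 64-bit value")
    # Undo zig-zag: even values are non-negative, odd values are negative.
    return (z >> 1) ^ -(z & 1)


def read_bytes(inp, max_len=MAX_BYTES_LENGTH):
    """Read a length-prefixed byte string, rejecting it before allocation if too large."""
    length = read_long(inp)
    if length < 0 or length > max_len:
        raise RequestTooLarge(f"declared length {length} exceeds limit {max_len}")
    data = inp.read(length)
    if len(data) != length:
        raise EOFError("stream ended inside a bytes value")
    return data
```

The check happens on the declared length, so a hostile request cannot
force a large allocation just by claiming a large size.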
(2) Maps: Do we need to maintain the key/value pair order for maps?
I think re-ordering is fine. Does anyone disagree?
My vote would be to not impose any ordering on maps.
(3) Blocks: Sanity check
If the elements of an array are fixed length (e.g. 8 bytes), then
the block of 100 of them would look like...
[ long = 100 ][ 100 * 8 = 800 bytes of data in the block][ long = 0 ]
... terminated with a zero, or
[ long = 90 ][ 90 * 8 = 720 bytes of data in the block ]
[ long = 10 ][ 10 * 8 = 80 bytes of data in the block ][ long = 0 ]
... correct?
Yes, that's the idea.
However, if the objects are variable length, there is no way to
calculate the size of the block based on the element sizes so we
use the negative "count" value. For example...
[ long = -1 ][ long = 23948 ][ 23948 bytes of data in the block ]
[ long = 0 ]
... which is terminated with a zero.
Almost. AVRO-25 proposes to permit, for your first example:
[long = -100] [long = 800] [100 * 8 = 800 bytes of data] [long = 0]
The item count is always required, but when it's negative, it's
followed by the byte count. This is the same for variable- and
fixed-sized data.
I guess one advantage of requiring the "count" is to allow an extra
check once the block is processed.
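
The block layouts discussed in (3) can be sketched as follows. This is
only an illustration under stated assumptions: longs are zig-zag varints,
items are already-encoded byte strings, and the reader handles fixed-size
items only. write_array, read_fixed_array and the with_byte_size flag are
hypothetical names, not an existing Avro API. The reader also performs the
extra consistency check that the count makes possible.

```python
import io


def write_long(out, n):
    """Write a 64-bit long as a zig-zag varint."""
    z = (n << 1) ^ (n >> 63)  # zig-zag maps small negatives to small positives
    while z & ~0x7F:
        out.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    out.write(bytes([z]))


def read_long(inp):
    """Read one zig-zag varint-encoded long."""
    shift = 0
    z = 0
    while True:
        b = inp.read(1)
        if not b:
            raise EOFError("stream ended inside a long")
        z |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)


def write_array(out, blocks, with_byte_size=False):
    """Write blocks of pre-encoded items, terminated by a zero count.

    With with_byte_size=True, each block is written as
    [-count][byte size][data], the form proposed in AVRO-25.
    """
    for block in blocks:
        if not block:
            continue  # a zero count would be read as the terminator
        payload = b"".join(block)
        if with_byte_size:
            write_long(out, -len(block))
            write_long(out, len(payload))
        else:
            write_long(out, len(block))
        out.write(payload)
    write_long(out, 0)


def read_fixed_array(inp, item_size):
    """Read an array of fixed-size items written in either block form."""
    items = []
    while True:
        count = read_long(inp)
        if count == 0:
            return items
        if count < 0:
            count = -count
            size = read_long(inp)
            # The count allows a sanity check against the declared byte size.
            if size != count * item_size:
                raise ValueError("byte size does not match item count")
        for _ in range(count):
            data = inp.read(item_size)
            if len(data) != item_size:
                raise EOFError("stream ended inside a block")
            items.append(data)
```

For example, with items = [i.to_bytes(8, "little") for i in range(100)],
write_array(out, [items]), write_array(out, [items[:90], items[90:]]) and
write_array(out, [items], with_byte_size=True) all decode to the same 100
items via read_fixed_array(inp, 8). A reader that does not care about the
contents could instead skip a sized block by seeking past the byte size.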
(4) RPC related: Should we explicitly specify the entire RPC
communication as an Avro schema?
You mean the handshake stuff? I've thought about that, but felt
that the bootstrapping got complicated.
Yes. Actually, I believe that we should express all messages
exchanged between Avro components as Avro schemas so that we don't need
to hand-craft the RPC layer. Having the schemas will also make the
protocols more transparent and Avro more adaptable to changes in RPC.
Are there any reasons you can see for hand-coding the RPC layer?
-Matt