I've just submitted a patch to build the C documentation using doxygen:

https://issues.apache.org/jira/browse/AVRO-37


I'm starting work on building file/socket handles and processing the Avro compound data types (expect a patch with proper code/unit tests soon) and I have a few (hopefully quick) questions:

(1) Spec related: Should we have a max-length attribute for the variable length objects?

Since we're going to be using Avro for RPC, do we need to consider the possibility of malicious data in the Avro streams? A malicious client could exhaust memory on the server by sending intentionally long values (e.g. a string that is 2 GB in length). Instead of having a single server-wide maximum, it might make sense to allow a maximum length to be specified per object.
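
To make the idea concrete, here is a rough sketch of a length check applied before any allocation happens. It assumes the zig-zag variable-length encoding the spec uses for longs; the names read_long and read_string_bounded and the max_len parameter are hypothetical, not part of any existing Avro C API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one zig-zag varint long from buf.
 * Returns the number of bytes consumed, or 0 if the input is
 * truncated or longer than 10 bytes. */
static size_t read_long(const uint8_t *buf, size_t len, int64_t *out)
{
    uint64_t acc = 0;
    size_t i;

    assert(buf != NULL && out != NULL);
    for (i = 0; i < len && i < 10; i++) {
        acc |= (uint64_t)(buf[i] & 0x7f) << (7 * i);
        if (!(buf[i] & 0x80)) {
            /* Undo the zig-zag mapping: 0,1,2,3 -> 0,-1,1,-2 */
            *out = (int64_t)(acc >> 1) ^ -(int64_t)(acc & 1);
            return i + 1;
        }
    }
    return 0;
}

/* Hypothetical helper: locate a length-prefixed string in buf, rejecting
 * it if the declared length exceeds max_len (the per-object limit) or the
 * bytes actually available. Returns 0 on success, -1 on rejection. */
static int read_string_bounded(const uint8_t *buf, size_t len, int64_t max_len,
                               const uint8_t **str, size_t *str_len)
{
    int64_t n;
    size_t used = read_long(buf, len, &n);

    if (used == 0 || n < 0)
        return -1;              /* malformed length prefix */
    if (n > max_len)
        return -1;              /* over the limit: reject before allocating */
    if ((uint64_t)n > len - used)
        return -1;              /* declared length exceeds available data */
    *str = buf + used;
    *str_len = (size_t)n;
    return 0;
}
```

The point of the sketch is only that the declared length is checked against the limit before the server commits any memory to it; where max_len comes from (schema attribute or server setting) is the open question.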


(2) Maps: Do we need to maintain the key/value pair order for maps?

I will be converting Avro maps to apr_table_t structures in order to make key search constant time. Do I need to guarantee that the order I decode the map is the order I encode it later? Just want to know if I need to store a private key array to maintain the order. It's ok if I do, but I'd like to use less memory if I can avoid it.


(3) Blocks: Sanity check

If the elements of an array are fixed length (e.g. 8 bytes), then the block of 100 of them would look like...

[ long = 100 ][ 100 * 8 = 800 bytes of data in the block][ long = 0 ]

... terminated with a zero, or

[ long = 90 ][ 90 * 8 = 720 bytes of data in the block ][ long = 10 ] [ 10 * 8 = 80 bytes in the block ][ long = 0]

... correct?

However, if the objects are variable length, the size of the block cannot be calculated from the element count alone, so we use a negative "count" value followed by the block size in bytes. For example...

[ long = -1 ][ long = 23948 ][ 23948 bytes of data in the block ] [ long = 0 ]

... which is terminated with a zero.
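
Here is a minimal sketch of the framing as I understand it, assuming longs are written as zig-zag varints. The function names write_long, write_block and write_end are hypothetical and the caller is assumed to supply a large enough output buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode v as a zig-zag varint; returns bytes written (at most 10). */
static size_t write_long(uint8_t *out, int64_t v)
{
    /* Zig-zag: non-negative n -> 2n, negative n -> -2n - 1 */
    uint64_t z = ((uint64_t)v << 1) ^ (v < 0 ? ~(uint64_t)0 : (uint64_t)0);
    size_t n = 0;

    while (z >= 0x80) {
        out[n++] = (uint8_t)(z | 0x80);   /* low 7 bits plus continuation bit */
        z >>= 7;
    }
    out[n++] = (uint8_t)z;
    return n;
}

/* Write one block of `count` items whose encoded payload is `nbytes` long.
 * If with_size is non-zero, use the negative-count form: the count is
 * negated and followed by the payload size in bytes. */
static size_t write_block(uint8_t *out, int64_t count, const uint8_t *data,
                          size_t nbytes, int with_size)
{
    size_t n = 0;

    assert(count > 0);
    if (with_size) {
        n += write_long(out + n, -count);
        n += write_long(out + n, (int64_t)nbytes);
    } else {
        n += write_long(out + n, count);
    }
    memcpy(out + n, data, nbytes);
    return n + nbytes;
}

/* A count of zero marks the end of the array. */
static size_t write_end(uint8_t *out)
{
    return write_long(out, 0);
}
```

For the first example above, that would be write_block(out, 100, items, 800, 0) followed by write_end(). Note that the counts themselves are varints, so "long = 100" occupies two bytes on the wire, not eight.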


(4) RPC related: Should we explicitly specify the entire RPC communication as an Avro schema?

For example, the entire RPC communication schema can be expressed in a single XDR .x file. The ZeroC developers who wrote Ice express the RPC of all their components using their IDL.

Having an Avro RPC schema would make implementing RPC automatic and flexible.


Hope you all have a great Memorial Day weekend!

-Matt





