Add encoding for sparse records
-------------------------------

                 Key: AVRO-196
                 URL: https://issues.apache.org/jira/browse/AVRO-196
             Project: Avro
          Issue Type: New Feature
          Components: java
            Reporter: Justin SB
            Priority: Minor


If an Avro record has many fields and most of them are empty, Avro currently 
still serializes every field, which adds significant overhead.  We could 
support a sparse record format for this case: a bitmask serialized before each 
record indicates which fields are present, and only those fields are written.  
The encoding type could be specified as a new attribute in the avpr, e.g.  
{"type":"record", "name":"Test", "encoding":"sparse", "fields":....}
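
To make the bitmask idea concrete, here is a minimal Java sketch.  It is not 
the code from the commit linked in this issue and it does not use any Avro 
API: the class name SparseRecordSketch, the use of nullable Long values for 
every field, and the fixed 8-byte value encoding are all assumptions made for 
illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration of a bitmask-prefixed record layout.
// Assumption: every field is an optional long, written as 8 fixed bytes.
final class SparseRecordSketch {

    // Writes ceil(n/8) bitmask bytes, then only the non-null fields in order.
    static byte[] encode(Long[] fields) throws IOException {
        byte[] mask = new byte[(fields.length + 7) / 8];
        for (int i = 0; i < fields.length; i++) {
            if (fields[i] != null) {
                mask[i / 8] |= (byte) (1 << (i % 8)); // bit i set = field i present
            }
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.write(mask);
        for (Long field : fields) {
            if (field != null) {
                out.writeLong(field);
            }
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Reads the bitmask first, then one value for each bit that is set.
    // The field count must be known to the reader (e.g. from the schema).
    static Long[] decode(byte[] data, int fieldCount) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        byte[] mask = new byte[(fieldCount + 7) / 8];
        in.readFully(mask);
        Long[] fields = new Long[fieldCount];
        for (int i = 0; i < fieldCount; i++) {
            boolean present = (mask[i / 8] & (1 << (i % 8))) != 0;
            fields[i] = present ? in.readLong() : null;
        }
        return fields;
    }
}
```

Under these assumptions, a 64-field record with two populated fields takes 8 
bitmask bytes plus 16 value bytes, and each empty field costs one bit.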

I've put an implementation of the idea on GitHub:
http://github.com/justinsb/avro/commit/7f6ad2532298127fcdd9f52ce90df21ff527f9d1

In our case this greatly reduces the serialized size: we use Avro to serialize 
performance metrics, where most of the fields are usually empty.

Using a Map instead isn't a good alternative because it (1) serializes the 
names of the fields and (2) loses strong typing.

