[ https://issues.apache.org/jira/browse/AVRO-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799863#action_12799863 ]

Jeff Hammerbacher commented on AVRO-196:
----------------------------------------

bq. Perhaps we should see what can be achieved through compression first 
(AVRO-135).

I think compression is the way to go here. We are missing a strategy for RPC 
compression, but it seems that sparse records in data files will compress well.
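As a rough back-of-the-envelope sketch of why mostly-empty records should deflate well: a record whose optional fields are nearly all null encodes to a run of bytes dominated by zeros (the null branch indexes), which a generic codec collapses almost entirely. The snippet below only uses java.util.zip and a hand-built byte array standing in for such an encoding; it is an illustration of the compression argument, not Avro's codec API.

{code:java}
import java.util.zip.Deflater;

public class SparseCompressionSketch {
    public static void main(String[] args) {
        // Stand-in for a record with ~200 optional fields where only a
        // couple are populated: mostly zero bytes (null branch indexes).
        byte[] encoded = new byte[200];
        encoded[10] = 1;  encoded[11] = 42;   // a present field
        encoded[150] = 1; encoded[151] = 7;   // another present field

        Deflater deflater = new Deflater();
        deflater.setInput(encoded);
        deflater.finish();
        byte[] out = new byte[encoded.length];
        int compressedLen = deflater.deflate(out);
        deflater.end();

        // The long zero runs compress to a small fraction of the raw size.
        System.out.println("raw=" + encoded.length
                + " bytes, deflated=" + compressedLen + " bytes");
    }
}
{code}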

> Add encoding for sparse records
> -------------------------------
>
>                 Key: AVRO-196
>                 URL: https://issues.apache.org/jira/browse/AVRO-196
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Justin SB
>            Priority: Minor
>
> If we have a large record with many fields in Avro that is mostly empty, 
> Avro will currently still serialize every field, leading to significant 
> overhead.  We could support a sparse record format for this case: before 
> each record, a bitmask is serialized indicating which fields are present 
> (a sketch of this encoding follows below).  We could specify the encoding 
> as a new attribute in the avpr, e.g.  
> {"type":"record", "name":"Test", "encoding":"sparse", "fields":....}
> I've put an implementation of the idea on github:
> http://github.com/justinsb/avro/commit/7f6ad2532298127fcdd9f52ce90df21ff527f9d1
> This gives a large reduction in serialized size in our case, where we use 
> Avro to serialize performance metrics and most of the fields are usually 
> empty.
> The alternative of using a Map isn't a good idea because it (1) serializes 
> the names of the fields and (2) means we lose strong typing.
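
For readers of the archive, a minimal self-contained sketch of the bitmask idea described above (plain Java, not the patch from the github link; the record layout, the toy one-byte value encoding, and the encode() helper are purely illustrative assumptions):

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class SparseRecordSketch {
    // Encode a record of optional fields: a presence bitmask first,
    // then only the values of the fields that are actually set.
    static byte[] encode(Integer[] fields) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // One presence bit per field, packed into bytes.
        byte[] mask = new byte[(fields.length + 7) / 8];
        for (int i = 0; i < fields.length; i++) {
            if (fields[i] != null) {
                mask[i / 8] |= (byte) (1 << (i % 8));
            }
        }
        out.write(mask);
        // Absent fields cost one bit each; present fields write their value.
        for (Integer f : fields) {
            if (f != null) {
                out.write(f);   // toy 1-byte value encoding for illustration
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Integer[] record = new Integer[100];   // 100 fields, mostly empty
        record[3] = 7;
        record[42] = 99;
        System.out.println("encoded size = " + encode(record).length + " bytes");
        // 13 bytes of bitmask + 2 value bytes, versus ~100 bytes when
        // every field is written out.
    }
}
{code}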

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.