[ https://issues.apache.org/jira/browse/AVRO-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835132#action_12835132 ]

Scott Carey commented on AVRO-251:
----------------------------------

I am using DataFileReader/DataFileWriter, and the file header is about 5 KB
because the whole schema is stored in it as text.

I'm not sure the approach in this ticket is the best one for the file format,
but some way to persist a schema in a compact form would be useful. A binary
format would be smaller, but the name of every field and type would still have
to be stored as text. For the data file, maybe we could just store the schema
string deflate-compressed. Compressing might be computationally more expensive
than a binary encoding as a way to get a compact schema representation, but it
could be clean in general: if the first byte of a byte[] that represents a
schema is a special marker value (one that is invalid in JSON), the remaining
bytes are compressed JSON; otherwise the whole array is UTF-8 JSON.
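The marker-byte scheme described above could look roughly like the Java sketch
below. It is not an Avro API: the class SchemaBytes, its methods, and the choice
of 0xFF as the marker are all hypothetical. 0xFF is assumed here because it
never occurs in well-formed UTF-8, so it cannot be the first byte of a UTF-8
JSON schema. The sketch uses java.util.zip and needs Java 9+ for
InputStream.readAllBytes().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical helper: schema JSON stored as plain UTF-8, or as a marker byte plus deflated UTF-8. */
final class SchemaBytes {
    // Assumed marker value: 0xFF never appears in well-formed UTF-8,
    // so it can never be the first byte of a UTF-8 JSON schema.
    static final byte COMPRESSED_MARKER = (byte) 0xFF;

    private SchemaBytes() {}

    /** Encodes schema JSON, optionally as marker + deflate-compressed UTF-8. */
    static byte[] encode(String schemaJson, boolean compress) {
        byte[] utf8 = schemaJson.getBytes(StandardCharsets.UTF_8);
        if (!compress) {
            return utf8; // same bytes as an uncompressed JSON schema
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(COMPRESSED_MARKER); // writes the low 8 bits, i.e. 0xFF
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try (DeflaterOutputStream deflate = new DeflaterOutputStream(out, deflater)) {
            deflate.write(utf8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end(); // a caller-supplied Deflater is not ended by close()
        }
        return out.toByteArray();
    }

    /** Decodes either form: inflate after the marker, otherwise read as UTF-8 JSON. */
    static String decode(byte[] bytes) {
        if (bytes.length > 0 && bytes[0] == COMPRESSED_MARKER) {
            try (InputStream in = new InflaterInputStream(
                    new ByteArrayInputStream(bytes, 1, bytes.length - 1))) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Because uncompressed schemas keep exactly their current bytes, a reader only has
to check the first byte to know which form it was given.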

My largest schema is 6.3 KB as a pretty-printed string including whitespace,
and 4.9 KB without whitespace as printed by Schema.toString().
Compressed, it is 1.3 KB with gzip -5 or higher, and 1.5 KB with gzip -1.
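To repeat this kind of measurement on another schema, a JDK-only sketch like
the one below could be fed the string returned by Schema.toString(). The class
SchemaSizeCheck and its sample schema are hypothetical. Deflater levels 1 to 9
roughly correspond to gzip -1 through -9, but Deflater emits the zlib wrapper
rather than gzip's header and trailer (and may differ in implementation), so
the byte counts will be close to, not identical to, gzip's.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/** Hypothetical helper for comparing raw and deflated schema sizes. */
final class SchemaSizeCheck {
    private SchemaSizeCheck() {}

    /** Returns how many bytes zlib deflate produces for data at the given level (1-9). */
    static int deflatedSize(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(data);
            deflater.finish();
            byte[] buffer = new byte[4096];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buffer); // count output bytes, discard the data
            }
            return total;
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample schema; in practice pass the output of Schema.toString().
        StringBuilder json = new StringBuilder("{\"type\":\"record\",\"name\":\"Sample\",\"fields\":[");
        for (int i = 0; i < 100; i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append("{\"name\":\"field").append(i).append("\",\"type\":\"string\"}");
        }
        json.append("]}");
        byte[] utf8 = json.toString().getBytes(StandardCharsets.UTF_8);
        System.out.printf("raw: %d bytes%n", utf8.length);
        for (int level : new int[] {1, 5, 9}) {
            System.out.printf("level %d: %d bytes%n", level, deflatedSize(utf8, level));
        }
    }
}
```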

> add schema for schemas
> ----------------------
>
>                 Key: AVRO-251
>                 URL: https://issues.apache.org/jira/browse/AVRO-251
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>         Attachments: AVRO-251.patch, AVRO-251.patch
>
>
> A schema for schemas would permit schemas to be written in binary.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
