[ https://issues.apache.org/jira/browse/AVRO-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835132#action_12835132 ]
Scott Carey commented on AVRO-251:
----------------------------------

I am using DataFileReader/Writer, and the header is about 5K in size because the whole schema is in text. I'm not sure if the approach in this ticket is best for the file format, but some way to persist a schema in a compact form would be useful. A binary format would be smaller, but every field and type would still have to be there in text.

Maybe, for the data file, we could just store the schema as the string, deflate compressed. That might be computationally more expensive for a compact schema representation, but it could be clean in general -- if the first character in a byte[] that represents a schema is a special marker value (one that is invalid in JSON), then the remaining bytes are compressed JSON; otherwise it's UTF-8 JSON.

My largest schema is 6.3k as a string including whitespace ('pretty printed'), and 4.9k without whitespace as printed by Schema.toString(). It is 1.3k compressed by gzip -5 or higher, and 1.5k by gzip -1.

> add schema for schemas
> ----------------------
>
> Key: AVRO-251
> URL: https://issues.apache.org/jira/browse/AVRO-251
> Project: Avro
> Issue Type: New Feature
> Components: java
> Reporter: Doug Cutting
> Assignee: Doug Cutting
> Attachments: AVRO-251.patch, AVRO-251.patch
>
>
> A schema for schemas would permit schemas to be written in binary.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
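
The marker-byte idea from the comment above can be sketched roughly as follows. This is only an illustration in Python, not code from the attached patch or from Avro itself: the marker value 0x00, the function names encode_schema/decode_schema, and the use of zlib (deflate inside a zlib wrapper) are all assumptions, since the comment names no marker value or API.

```python
import zlib

# Hypothetical marker: a NUL byte can never start a valid JSON text,
# so it can flag "what follows is compressed". The comment names no value.
COMPRESSED_MARKER = b"\x00"


def encode_schema(schema_json: str, compress: bool = True, level: int = 6) -> bytes:
    """Return schema bytes: marker + compressed JSON, or plain UTF-8 JSON."""
    raw = schema_json.encode("utf-8")
    if not compress:
        return raw
    # zlib.compress emits a deflate stream inside a zlib wrapper (assumption:
    # the wrapper is acceptable; a raw deflate stream would also work).
    return COMPRESSED_MARKER + zlib.compress(raw, level)


def decode_schema(data: bytes) -> str:
    """Inverse of encode_schema: inflate if the marker is present."""
    if data[:1] == COMPRESSED_MARKER:
        return zlib.decompress(data[1:]).decode("utf-8")
    # No marker: the bytes are the schema's JSON text as UTF-8.
    return data.decode("utf-8")


# Example with a small made-up schema (not one from the ticket).
schema = '{"type": "record", "name": "Example", "fields": [{"name": "id", "type": "long"}]}'
assert decode_schema(encode_schema(schema)) == schema
assert decode_schema(encode_schema(schema, compress=False)) == schema
```

A reader written this way accepts both forms, so existing uncompressed schemas stay readable; the cost is one inflate call per file header.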