[jira] [Commented] (AVRO-1704) Standardized format for encoding messages with Avro

Doug Cutting (JIRA) Tue, 28 Jun 2016 16:19:31 -0700

    [ 
https://issues.apache.org/jira/browse/AVRO-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353906#comment-15353906
 ]


Doug Cutting commented on AVRO-1704:
------------------------------------

I think all the methods are useful, but some of them (e.g., the non-reuse 
variants) will always be implemented by boilerplate. They are thus not core 
to the interface and are better suited to a base class.

An abstract base class would still permit independent alternative 
implementations.  The only additional power an interface has is that one can 
implement multiple interfaces.  But interfaces don't let you implement 
convenience methods, nor do they permit compatible evolution (if you ever add 
or remove a method, you break implementations, because you cannot provide 
default implementations).  But if you feel multiple inheritance is important 
here, then it's probably easier to stick with an interface than to avoid such 
boilerplate some other way, e.g., by refactoring into encoder/decoder provider 
classes that are separate from the user-invoked classes.
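
As a rough illustration of the base-class argument above, here is a minimal 
Java sketch. The names BaseMessageEncoder and Utf8StringEncoder are 
hypothetical and are not taken from the patch; the point is only that the 
boilerplate convenience method lives in the base class while implementations 
supply the one core method.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical base class; the names are illustrative, not Avro's API.
abstract class BaseMessageEncoder<D> {
    // The one core method every implementation must provide.
    abstract void encode(D datum, OutputStream out) throws IOException;

    // Boilerplate convenience method, written once for all subclasses.
    // A method added here later would not break existing subclasses.
    ByteBuffer encode(D datum) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            encode(datum, buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ByteBuffer.wrap(buffer.toByteArray());
    }
}

// Toy implementation that writes a string as UTF-8 bytes.
class Utf8StringEncoder extends BaseMessageEncoder<String> {
    @Override
    void encode(String datum, OutputStream out) throws IOException {
        out.write(datum.getBytes(StandardCharsets.UTF_8));
    }
}
```

A caller would use either form, e.g. new Utf8StringEncoder().encode("avro"), 
without the implementation having to repeat the buffer handling.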

Encoding to a ByteBuffer should be thread-safe, since it has no caller-visible 
state, no?
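
One way the thread-safety point could hold even when buffers are reused 
internally is sketched below. This is an assumption about a possible 
implementation, with a hypothetical class name, and not a description of the 
patch: scratch space is kept per thread, and the caller only ever sees a 
fresh copy.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical encoder; illustrative only, not Avro's API.
class ThreadSafeStringEncoder {
    // One scratch buffer per thread, reused across calls on that thread.
    private final ThreadLocal<ByteArrayOutputStream> scratch =
        ThreadLocal.withInitial(ByteArrayOutputStream::new);

    ByteBuffer encode(String datum) {
        ByteArrayOutputStream out = scratch.get();
        out.reset(); // discard whatever the previous call on this thread wrote
        byte[] bytes = datum.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
        // toByteArray() copies, so the result never aliases the scratch buffer.
        return ByteBuffer.wrap(out.toByteArray());
    }
}
```

Because the returned ByteBuffer wraps a copy, a later encode call cannot 
change a buffer that was handed out earlier, so the reuse stays invisible to 
callers.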

> Standardized format for encoding messages with Avro
> ---------------------------------------------------
>
>                 Key: AVRO-1704
>                 URL: https://issues.apache.org/jira/browse/AVRO-1704
>             Project: Avro
>          Issue Type: Improvement
>            Reporter: Daniel Schierbeck
>            Assignee: Niels Basjes
>         Attachments: AVRO-1704-2016-05-03-Unfinished.patch, 
> AVRO-1704-20160410.patch
>
>
> I'm currently using the Datafile format for encoding messages that are 
> written to Kafka and Cassandra. This seems rather wasteful:
> 1. I only encode a single record at a time, so there's no need for sync 
> markers and other metadata related to multi-record files.
> 2. The entire schema is inlined every time.
> However, the Datafile format is the only one that has been standardized, 
> meaning that I can read and write data with minimal effort across the various 
> languages in use in my organization. If there were a standardized format for 
> encoding single values that was optimized for out-of-band schema transfer, I 
> would much rather use that.
> I think the necessary pieces of the format would be:
> 1. A format version number.
> 2. A schema fingerprint type identifier, e.g., Rabin, MD5, or SHA-256.
> 3. The actual schema fingerprint (according to the type).
> 4. Optional metadata map.
> 5. The encoded datum.
> The language libraries would implement a MessageWriter that would encode 
> datums in this format, as well as a MessageReader that, given a SchemaStore, 
> would be able to decode datums. The reader would decode the fingerprint and 
> ask its SchemaStore to return the corresponding writer's schema.
> The idea is that SchemaStore would be an abstract interface that allowed 
> library users to inject custom backends. A simple, file-system-based one 
> could be provided out of the box.
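
The layout proposed in the description could be sketched roughly as follows. 
Everything here is hypothetical: the class and method names, the version and 
type codes, and the choice of MD5 are assumptions for illustration. The 
optional metadata map is left out, schemas are handled as JSON text, and the 
fingerprint is taken over the raw text rather than a canonical form.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; a sketch of the proposal, not Avro's API.
interface SchemaStore {
    // Returns the writer's schema (JSON text here), or null if unknown.
    String findByFingerprint(byte[] fingerprint);
}

class InMemorySchemaStore implements SchemaStore {
    // ByteBuffer keys compare by content, unlike raw byte[] keys.
    private final Map<ByteBuffer, String> schemas = new HashMap<>();

    void add(String schemaJson) {
        schemas.put(ByteBuffer.wrap(MessageFraming.md5(schemaJson)), schemaJson);
    }

    @Override
    public String findByFingerprint(byte[] fingerprint) {
        return schemas.get(ByteBuffer.wrap(fingerprint));
    }
}

final class MessageFraming {
    static final byte FORMAT_VERSION = 1;  // assumed value
    static final byte FINGERPRINT_MD5 = 1; // assumed type code
    static final int MD5_LENGTH = 16;
    static final int HEADER_LENGTH = 2 + MD5_LENGTH;

    static byte[] md5(String schemaJson) {
        try {
            return MessageDigest.getInstance("MD5")
                .digest(schemaJson.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Layout: version | fingerprint type | fingerprint | encoded datum.
    static byte[] frame(String schemaJson, byte[] encodedDatum) {
        byte[] fingerprint = md5(schemaJson);
        ByteBuffer out = ByteBuffer.allocate(HEADER_LENGTH + encodedDatum.length);
        out.put(FORMAT_VERSION).put(FINGERPRINT_MD5).put(fingerprint).put(encodedDatum);
        return out.array();
    }

    // Reads the header and asks the store for the writer's schema.
    static String writerSchema(byte[] message, SchemaStore store) {
        if (message.length < HEADER_LENGTH
                || message[0] != FORMAT_VERSION
                || message[1] != FINGERPRINT_MD5) {
            throw new IllegalArgumentException("unsupported message header");
        }
        return store.findByFingerprint(Arrays.copyOfRange(message, 2, HEADER_LENGTH));
    }

    // Returns the encoded datum that follows the header.
    static byte[] payload(byte[] message) {
        return Arrays.copyOfRange(message, HEADER_LENGTH, message.length);
    }
}
```

A reader would call writerSchema first and then decode payload against the 
schema it gets back; a file-system-backed store would only need to implement 
findByFingerprint.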



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
