[ 
https://issues.apache.org/jira/browse/KAFKA-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692853#comment-14692853
 ] 

Ewen Cheslack-Postava commented on KAFKA-2367:
----------------------------------------------

The runtime API should not affect serialization at all, so I don't think the 
JSON comment is relevant. If we wanted to use Avro for the runtime API, we 
would really just be lifting the Schema and GenericRecord classes, but none of 
the serialization code. I personally don't have any issue with doing that. The 
concerns were that a) someone might not like adding Avro as a dependency, and 
b) we want to support different serialization formats and don't want to treat 
Avro as a first-class citizen and other formats as second-class. Supporting 
other formats is necessary at a minimum, since data may be delivered to Kafka 
in other formats by other tools, and we still want Copycat to be able to push 
that data to other systems such as HDFS.
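
To make the separation between runtime data and serialization concrete, here 
is a minimal sketch in plain Java. Every name in it (RuntimeRecord, 
RecordSerializer, SimpleJsonSerializer, KeyValueSerializer) is hypothetical 
and made up for illustration; none of this is the Copycat API or Avro code. 
It only shows one record model being written out by two independent formats.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical runtime record: ordered field names mapped to values.
// It knows nothing about how it will eventually be turned into bytes.
final class RuntimeRecord {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    RuntimeRecord put(String name, Object value) {
        fields.put(name, value);
        return this;
    }

    Map<String, Object> fields() {
        return fields;
    }
}

// Hypothetical pluggable serializer: each format implements this on its own,
// so no single format is privileged by the runtime data model.
interface RecordSerializer {
    byte[] serialize(RuntimeRecord record);
}

// Naive JSON-like text encoding (no string escaping; illustration only).
final class SimpleJsonSerializer implements RecordSerializer {
    @Override
    public byte[] serialize(RuntimeRecord record) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : record.fields().entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(e.getKey()).append("\":");
            Object v = e.getValue();
            if (v instanceof String) {
                sb.append('"').append(v).append('"');
            } else {
                sb.append(v); // numbers, booleans, or null
            }
        }
        return sb.append('}').toString().getBytes(StandardCharsets.UTF_8);
    }
}

// A second, unrelated encoding of the same record: one key=value pair per line.
final class KeyValueSerializer implements RecordSerializer {
    @Override
    public byte[] serialize(RuntimeRecord record) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Object> e : record.fields().entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

A connector would only build RuntimeRecord instances, and the framework 
would choose a RecordSerializer based on configuration. Lifting Avro's Schema 
and GenericRecord would fill the role of RuntimeRecord here without pulling 
in Avro's own encoders.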

If nobody objects, I think using Avro directly isn't a bad choice. I dislike 
some of its choices (in particular, that nullable fields need to be defined as 
union types with the null type), but I agree it would be better to offload 
maintaining that code to another project that is already going to be doing it 
anyway, and Avro does have well-thought-out schema migration support.

> Add Copycat runtime data API
> ----------------------------
>
>                 Key: KAFKA-2367
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2367
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: copycat
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Ewen Cheslack-Postava
>             Fix For: 0.8.3
>
>
> Design the API used for runtime data in Copycat. This API is used to 
> construct schemas and records that Copycat processes. This needs to be a 
> fairly general data model (think Avro, JSON, Protobufs, Thrift) in order to 
> support complex, varied data types that may be input from/output to many data 
> systems.
> This issue should also address the serialization interfaces used 
> within Copycat, which translate the runtime data into serialized byte[] form. 
> It is important that these be considered together because the data format can 
> be used in multiple ways (records, partition IDs, partition offsets), so it 
> and the corresponding serializers must be sufficient for all these use cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
