[ https://issues.apache.org/jira/browse/KAFKA-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697942#comment-14697942 ]

Gwen Shapira commented on KAFKA-2367:
-------------------------------------

The fact that Avro brings in a bazillion things doesn't have much impact on us - 
it has modules and we will only bring in the parts we need. I don't think we will 
see the Jackson dependency at all. (I think @ewencp had a prototype with Avro, 
so he can confirm this.)

Also, I don't see how this would discourage connector developers. They will 
have to convert to some internal format either way. At least Avro errors can be 
googled... our own format will be harder to work with. I also think we are 
overestimating how many people will want to write connectors vs. how many people 
will want to use them (and the users shouldn't care either way).
But if Avro is so controversial, we can create a thin wrapper :)
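
To make the wrapper idea concrete, a minimal sketch (the CopycatRecord name 
and the whole shape are hypothetical, not a proposal for the actual API):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

// Hypothetical thin wrapper: connector developers program against this
// class, and Avro's GenericRecord stays an internal implementation detail.
public class CopycatRecord {
    private final GenericRecord delegate;

    public CopycatRecord(Schema schema) {
        this.delegate = new GenericData.Record(schema);
    }

    public void put(String field, Object value) {
        delegate.put(field, value);
    }

    public Object get(String field) {
        return delegate.get(field);
    }

    // Only the runtime/serialization layer ever needs the Avro object.
    GenericRecord asAvro() {
        return delegate;
    }
}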

Can you dive into more detail about the binary compatibility issues? What is 
the exact problem, and in which scenarios will it happen? I don't quite see it. 
Are you talking about the code generation? Because we will not use that here. 
Breakage in serialization happened accidentally in 1.3, but never since. If 
Avro accidentally breaks compatibility, we should be able to catch it with 
Ducktape and get things resolved in the Avro community.
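
A sketch of the serialization round trip such a test would pin down (plain 
Avro generic API; in a real check the bytes would come from a build against 
the previous Avro release, and Ducktape would drive it - this is just the core 
assertion):

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.util.Utf8;

public class AvroRoundTripCheck {
    public static void main(String[] args) throws IOException {
        Schema schema = SchemaBuilder.record("row").fields()
            .requiredString("id").requiredLong("ts").endRecord();

        GenericRecord record = new GenericData.Record(schema);
        // Avro's generic runtime represents strings as Utf8.
        record.put("id", new Utf8("k1"));
        record.put("ts", 42L);

        // Serialize with the Avro version on the classpath...
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush();

        // ...then read it back and make sure nothing changed. In the real
        // test the bytes would come from the previous Avro release.
        BinaryDecoder decoder =
            DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord copy =
            new GenericDatumReader<GenericRecord>(schema).read(null, decoder);

        if (!record.equals(copy))
            throw new AssertionError("binary round trip changed the record");
    }
}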

The builder-method concern can also be resolved with wrappers, which I don't 
object to.
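
For example (everything here is hypothetical - just showing that the builder 
surface can be ours while Avro's SchemaBuilder does the work underneath):

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

// Hypothetical fluent builder that delegates to Avro's SchemaBuilder;
// if Avro's builder API ever changes, only this class has to adapt.
public class CopycatSchemaBuilder {
    private final SchemaBuilder.FieldAssembler<Schema> fields;

    private CopycatSchemaBuilder(String name) {
        this.fields = SchemaBuilder.record(name).fields();
    }

    public static CopycatSchemaBuilder record(String name) {
        return new CopycatSchemaBuilder(name);
    }

    public CopycatSchemaBuilder stringField(String name) {
        fields.requiredString(name);
        return this;
    }

    public CopycatSchemaBuilder longField(String name) {
        fields.requiredLong(name);
        return this;
    }

    public Schema build() {
        return fields.endRecord();
    }
}

Connector code would then write something like 
CopycatSchemaBuilder.record("row").stringField("name").longField("ts").build() 
without importing anything from Avro.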

Again, I want to point out that the whole generic object representation of 
typed data was very painful for Sqoop, and I'd like to avoid going through 
that again (it is possible that the Kafka team is much better and we won't 
have those issues at all, but I'd rather not test that...).

I kinda disagree that the data types are the core contract... sure, they are 
critical (and critical to get right), but from a connector developer's 
perspective the core issues are figuring out how to partition the job and how 
to track offsets correctly (for sources), and how to write output efficiently 
and correctly (for sinks) - which is the way it should be. Converting the 
results of a SQL query to our data format is a necessary evil.
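
And that evil is fairly contained. Roughly this, for a JDBC source (a sketch, 
not actual Copycat code - null handling and real type mapping are glossed 
over, and it assumes the schema's field names match the column labels):

import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class RowConverter {
    // Copy one row of a SQL result into a record matching the given schema.
    // Assumes column labels line up with schema field names and the driver
    // returns Java types Avro can store directly.
    public static GenericRecord toRecord(ResultSet rs, Schema schema)
            throws SQLException {
        GenericRecord record = new GenericData.Record(schema);
        ResultSetMetaData meta = rs.getMetaData();
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            record.put(meta.getColumnLabel(i), rs.getObject(i));
        }
        return record;
    }
}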

I agree that clean Copycat interfaces are the goal. Do you see issues with the 
Avro interfaces? Note that backward-compatible fixes can be contributed to Avro.
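
If we do wrap, the serialization contract a connector developer sees could be 
as small as this (hypothetical names, just to show the shape):

// Hypothetical converter interface: the only serialization contract a
// connector developer would touch. Whether Avro sits behind it is an
// implementation detail of the runtime.
public interface RecordConverter<T> {
    byte[] serialize(T record);
    T deserialize(byte[] bytes);
}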





> Add Copycat runtime data API
> ----------------------------
>
>                 Key: KAFKA-2367
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2367
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: copycat
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Ewen Cheslack-Postava
>             Fix For: 0.8.3
>
>
> Design the API used for runtime data in Copycat. This API is used to 
> construct schemas and records that Copycat processes. This needs to be a 
> fairly general data model (think Avro, JSON, Protobufs, Thrift) in order to 
> support complex, varied data types that may be input from/output to many data 
> systems.
> This issue should also address the serialization interfaces used 
> within Copycat, which translate the runtime data into serialized byte[] form. 
> It is important that these be considered together because the data format can 
> be used in multiple ways (records, partition IDs, partition offsets), so it 
> and the corresponding serializers must be sufficient for all these use cases.


