[ 
https://issues.apache.org/jira/browse/CRUNCH-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Whitacre updated CRUNCH-606:
----------------------------------
    Attachment: CRUNCH-606.diff

So the way I have the source currently written it will support reading from 1:M 
topics as long as they produce the same payloads.  This fits our use case where 
our topics are segregated by source but payloads are all the same.  In theory 
the flexibility in the Serializer/Deserializer is by topic to support things 
like migration of the serialized format (e.g. person_json, person_avro could 
both produce Person objects but their byte form might be different)  

To be honest we don't really have this use case but it has been indicated that 
this is how things like the SchemaRegistry from Confluent does passivity.

Attached is the patch of my code so far.  I think the only missing piece is the 
converter and need for more integration testing.  

You'll notice that right now I have the KafkaSource taking in a PTableType.  
While I'd love for the source to be PTypeFamily agnostic it seems like if I 
could change the InputFormat/RecordReader to only support AvroTypeFamily or 
write a ConverterShim.  I'll play around with this some more.

> Create a KafkaSource
> --------------------
>
>                 Key: CRUNCH-606
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-606
>             Project: Crunch
>          Issue Type: New Feature
>          Components: IO
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-606.diff, CRUNCH-606.patch
>
>
> Pulling data out of Kafka is a common use case and some of the ways to do it 
> Kafka Connect, Camus, Gobblin do not integrate nicely with existing 
> processing pipelines like Crunch.  With Kafka 0.9, the consuming API is a lot 
> easier so we should build a Source implementation that can read from Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to