[jira] [Updated] (CRUNCH-606) Create a KafkaSource

Micah Whitacre (JIRA) Mon, 09 May 2016 14:23:19 -0700

     [ 
https://issues.apache.org/jira/browse/CRUNCH-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Micah Whitacre updated CRUNCH-606:
----------------------------------
    Attachment: CRUNCH-606-byteswritable.diff

OK, I went with the simplest version I could get working by the end of the day.
The KafkaSource always produces a PTableType<BytesWritable, BytesWritable> from
the WritableTypeFamily, which avoids the Avro restriction.  All tests pass.

If we go with this approach, the one outstanding TODO I have in the code is
closing the Consumer that gets created during materialize() or
ReadableData.  I could make the iterator close the Consumer once everything has
been consumed, but then the Iterable would be single use.  Is that OK?
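
To make the close-on-exhaustion option concrete, here is a minimal sketch of
what such an iterator might look like.  It is not taken from the attached
diff: ClosingIterator is a hypothetical name, and a plain AutoCloseable stands
in for the Kafka Consumer so the example runs without any Kafka or Crunch
dependencies.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical helper (not from the patch): wraps an iterator and closes an
 * associated resource the first time the iterator reports it is exhausted.
 */
final class ClosingIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final AutoCloseable resource; // stand-in for the Kafka Consumer
    private boolean closed = false;

    ClosingIterator(Iterator<T> delegate, AutoCloseable resource) {
        this.delegate = delegate;
        this.resource = resource;
    }

    @Override
    public boolean hasNext() {
        if (closed) {
            return false; // already exhausted and closed; never close twice
        }
        if (delegate.hasNext()) {
            return true;
        }
        closeResource(); // nothing left, so release the resource now
        return false;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return delegate.next();
    }

    private void closeResource() {
        closed = true;
        try {
            resource.close();
        } catch (Exception e) {
            throw new IllegalStateException("failed to close resource", e);
        }
    }
}

class ClosingIteratorDemo {
    public static void main(String[] args) {
        final int[] closeCount = {0};
        Iterator<String> it = new ClosingIterator<>(
                Arrays.asList("a", "b").iterator(),
                () -> closeCount[0]++); // counts how often close() is called
        while (it.hasNext()) {
            System.out.println(it.next());
        }
        System.out.println("close() calls: " + closeCount[0]);
    }
}
```

The trade-off is visible in the sketch: the resource is released only if the
caller drains the iterator, and once it has been closed the iterator cannot be
restarted, so an Iterable that hands out this iterator over one shared
Consumer would be good for a single pass.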

> Create a KafkaSource
> --------------------
>
>                 Key: CRUNCH-606
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-606
>             Project: Crunch
>          Issue Type: New Feature
>          Components: IO
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-606-byteswritable.diff, CRUNCH-606.diff, 
> CRUNCH-606.patch
>
>
> Pulling data out of Kafka is a common use case, and some of the ways to do it
> (Kafka Connect, Camus, Gobblin) do not integrate nicely with existing
> processing pipelines like Crunch.  With Kafka 0.9, the consuming API is a lot
> easier to use, so we should build a Source implementation that can read from
> Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
