[ 
https://issues.apache.org/jira/browse/CRUNCH-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Whitacre updated CRUNCH-611:
----------------------------------
    Attachment: CRUNCH-611.patch

This patch provides a basic API for reading and writing Kafka offsets.  It also
provides a simple implementation that reads and writes the values from HDFS.
In theory, this should make regularly scheduled Crunch pipelines easier to
implement with regard to offset management.
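
To illustrate the general idea (not the code in the attached patch), here is a
minimal sketch of a reader/writer pair that checkpoints offsets as files in a
directory.  Everything in it is an assumption made for illustration: the names
OffsetReader, OffsetWriter and DirectoryOffsetStore, the "topic-partition" key
format, the file naming scheme, and the use of java.nio.file against a local
directory as a stand-in for the Hadoop FileSystem API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; these types are NOT the API from CRUNCH-611.patch.
class OffsetCheckpointSketch {

    /** Hypothetical read side: last stored offsets keyed by "topic-partition". */
    interface OffsetReader {
        Map<String, Long> readLatestOffsets() throws IOException;
    }

    /** Hypothetical write side: persists offsets as of a given time. */
    interface OffsetWriter {
        void write(long asOfTimeMillis, Map<String, Long> offsets) throws IOException;
    }

    /** Stores one checkpoint file per run; a local directory stands in for HDFS. */
    static class DirectoryOffsetStore implements OffsetReader, OffsetWriter {
        private final Path dir;

        DirectoryOffsetStore(Path dir) {
            this.dir = dir;
        }

        @Override
        public void write(long asOfTimeMillis, Map<String, Long> offsets) throws IOException {
            Files.createDirectories(dir);
            List<String> lines = new ArrayList<>();
            for (Map.Entry<String, Long> e : new TreeMap<>(offsets).entrySet()) {
                lines.add(e.getKey() + "=" + e.getValue());
            }
            // Write to a temporary file, then rename, so readers never see a partial file.
            Path tmp = dir.resolve("." + asOfTimeMillis + ".tmp");
            Files.write(tmp, lines, StandardCharsets.UTF_8);
            // Zero-padded timestamps make lexicographic order match time order.
            Path target = dir.resolve(String.format("%020d.offsets", asOfTimeMillis));
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }

        @Override
        public Map<String, Long> readLatestOffsets() throws IOException {
            if (!Files.isDirectory(dir)) {
                return Collections.emptyMap(); // no checkpoint yet, e.g. first run
            }
            Path latest = null;
            try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.offsets")) {
                for (Path p : files) {
                    if (latest == null || p.getFileName().toString()
                            .compareTo(latest.getFileName().toString()) > 0) {
                        latest = p;
                    }
                }
            }
            if (latest == null) {
                return Collections.emptyMap();
            }
            Map<String, Long> offsets = new HashMap<>();
            for (String line : Files.readAllLines(latest, StandardCharsets.UTF_8)) {
                int idx = line.lastIndexOf('=');
                if (idx > 0) {
                    offsets.put(line.substring(0, idx), Long.parseLong(line.substring(idx + 1)));
                }
            }
            return offsets;
        }
    }

    public static void main(String[] args) throws IOException {
        DirectoryOffsetStore store =
            new DirectoryOffsetStore(Files.createTempDirectory("offsets"));
        Map<String, Long> offsets = new HashMap<>();
        offsets.put("events-0", 42L);
        store.write(System.currentTimeMillis(), offsets);
        System.out.println(store.readLatestOffsets());
    }
}
```

A scheduled pipeline would presumably read the latest checkpoint at startup,
consume from those offsets, and write a new checkpoint only after the run
succeeds, so that a failed run is retried from the previous offsets.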

I did add a few optional dependencies, so hopefully these won't cause serious
conflicts with the Hadoop stack.  We aren't having a problem on our cluster, but
I didn't check universally.  We are also setting our classpath first and running
through Oozie, which changes classpath ordering as well.

> Simplified Kafka Offset Management in HDFS
> ------------------------------------------
>
>                 Key: CRUNCH-611
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-611
>             Project: Crunch
>          Issue Type: Improvement
>          Components: IO
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-611.patch
>
>
> With the KafkaSource, offset management is the responsibility of the
> consumer.  With some simple APIs it is actually trivial to support reading
> and storing these offsets in an HDFS directory as checkpoints for the source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
