Kafka as master data store

2016-02-15 Thread Ted Swerve
Hello,

Is it viable to use infinite-retention Kafka topics as a master data
store?  I'm not talking massive volumes of data here, but still potentially
extending into tens of terabytes.

Are there any drawbacks or pitfalls to such an approach?  It seems like a
compelling design, but there seem to be mixed messages about its
suitability for this kind of role.

Regards,
Ted


Re: Kafka as master data store

2016-02-15 Thread Ted Swerve
Hi Ben, Sharninder,

Thanks for your responses, I appreciate it.

Ben - thanks for the tips on settings. A backup could certainly be a
possibility, although if it offered only similar durability guarantees, I'm
not sure what purpose it would serve?

Sharninder - yes, we would only be using the logs as forward-only streams -
i.e. picking an offset to read from and moving forwards - and would be
setting retention time to essentially infinite.
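For what it's worth, the retention settings I have in mind look roughly like the following topic-level overrides - a sketch based on my reading of the config docs, with -1 meaning "no limit", so please correct me if I have the semantics wrong:

```properties
# Topic-level overrides for effectively infinite retention (sketch).
# -1 disables time-based and size-based deletion respectively.
retention.ms=-1
retention.bytes=-1
```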

Regards,
Ted.

On Tue, Feb 16, 2016 at 5:05 AM, Sharninder Khera 
wrote:

> This topic comes up often on this list. Kafka can be used as a datastore
> if that’s what your application needs, with the caveat that Kafka isn’t
> designed to keep data around forever. There is a default retention time
> after which older data gets deleted. The high-level consumer essentially
> reads data as a stream, and while you can do a sort of random access with
> the low-level consumer, it’s not ideal.


Re: Kafka as master data store

2016-02-16 Thread Ted Swerve
I guess I was just drawn in by the elegance of having everything available
in one well-defined Kafka topic whenever I start up some new code.

If instead the Kafka topics were on a retention period of, say, 7 days,
would that involve firing up a topic to load the warehoused data from HDFS
(or a more traditional load) and then switching over to the live topic?
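To check I understand the handoff, here is the control flow I'm picturing, sketched in plain Python with in-memory stand-ins for HDFS and the live topic - none of these names are a real Kafka or HDFS API:

```python
# Sketch of the "bootstrap then switch" pattern: replay the warehoused
# events first, then continue with live events past the archive point.

def replay_then_tail(warehoused, live, last_archived_offset):
    """Yield every archived event, then live events beyond the archive.

    `warehoused` and `live` are iterables of (offset, event) pairs;
    `last_archived_offset` marks where the archive ends, so events that
    appear in both sources are not delivered twice at the seam.
    """
    for offset, event in warehoused:
        yield event
    for offset, event in live:
        if offset > last_archived_offset:  # skip the overlap at the seam
            yield event

# Toy data: offsets 0-2 were archived; the live topic still holds 2-4.
archive = [(0, "a"), (1, "b"), (2, "c")]
topic = [(2, "c"), (3, "d"), (4, "e")]

events = list(replay_then_tail(archive, topic, last_archived_offset=2))
print(events)  # ['a', 'b', 'c', 'd', 'e']
```

Presumably the fiddly part is knowing `last_archived_offset` reliably, which is why I ask about wrinkles.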

On Tue, Feb 16, 2016 at 8:32 AM, Ben Stopford  wrote:

> Ted - it depends on your domain. More conservative approaches to
> long-lived data protect against data corruption, which generally means
> snapshots and cold storage.


Creating new consumers after data has been discarded

2016-02-24 Thread Ted Swerve
Hello,

One of the big attractions of Kafka for me was the ability to write new
consumers that could connect to a topic and replay all of its previous
events.

However, most of the time, Kafka appears to be used with a finite retention
period - presumably in such cases, the older events have been warehoused
into HDFS or something similar.

So my question is - how do people typically approach the scenario where a
new piece of code needs to process all events in a topic from "day one",
but has to source some of them from, e.g., HDFS and then connect to the
real-time Kafka topic?  Are there any wrinkles with such an approach?
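For concreteness, here is the bookkeeping I imagine the warehousing job would need so that a new consumer knows where to resume - sketched in plain Python with made-up names, not a real Kafka or HDFS API:

```python
# Sketch: whenever events are warehoused, record the highest offset seen
# per partition, so a later consumer can seek past the archived range.

def archive_batch(batch, store, checkpoints):
    """Append (partition, offset, event) records to cold storage and
    update the per-partition high-water mark of archived offsets."""
    for partition, offset, event in batch:
        store.append((partition, offset, event))
        # Track the highest offset archived for each partition.
        if offset > checkpoints.get(partition, -1):
            checkpoints[partition] = offset

store, checkpoints = [], {}
archive_batch([(0, 0, "a"), (0, 1, "b"), (1, 0, "x")], store, checkpoints)
archive_batch([(0, 2, "c"), (1, 1, "y")], store, checkpoints)

# A new consumer would seek each partition to checkpoint + 1 on the topic.
resume_from = {p: off + 1 for p, off in checkpoints.items()}
print(resume_from)  # {0: 3, 1: 2}
```

Is this per-partition checkpointing roughly what people do in practice, or is there a standard tool for it?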

Thanks,
Ted