Re: data archiving

2014-06-16 Thread Robert Hodges
Have you looked at Pinterest Secor?
http://engineering.pinterest.com/post/84276775924/introducing-pinterest-secor

Cheers, Robert


On Mon, Jun 16, 2014 at 5:17 AM, Mark Godfrey wrote:

> There is Bifrost, which archives Kafka data to S3:
> https://github.com/uswitch/bifrost
>
> Obviously that's a fairly specific archive solution, but it might work for
> you.
>
>
> Mark.
>
> On Mon, Jun 16, 2014 at 11:02 AM, Anatoly Deyneka wrote:
>
> > Hi all,
> >
> > I'm looking for a way to archive data.
> > The data is hot for a few days in our system.
> > After that it is rarely used. Speed is not so important for the archive.
> >
> > Let's say we have a Kafka cluster and a storage system.
> > It would be great if Kafka supported moving data to the storage system
> > instead of evicting it, and the end user could specify which storage
> > system is used (Dynamo, S3, Hadoop, etc.).
> > Is this possible to implement?
> >
> > What other solutions would you advise?
> >
> > Regards,
> > Anatoly
> >
>


Re: data archiving

2014-06-16 Thread Mark Godfrey
There is Bifrost, which archives Kafka data to S3:
https://github.com/uswitch/bifrost

Obviously that's a fairly specific archive solution, but it might work for
you.


Mark.

On Mon, Jun 16, 2014 at 11:02 AM, Anatoly Deyneka wrote:

> Hi all,
>
> I'm looking for a way to archive data.
> The data is hot for a few days in our system.
> After that it is rarely used. Speed is not so important for the archive.
>
> Let's say we have a Kafka cluster and a storage system.
> It would be great if Kafka supported moving data to the storage system
> instead of evicting it, and the end user could specify which storage
> system is used (Dynamo, S3, Hadoop, etc.).
> Is this possible to implement?
>
> What other solutions would you advise?
>
> Regards,
> Anatoly
>


Re: data archiving

2014-06-16 Thread Joe Stein
You should do this as a consumer (e.g. "archiveDataConsumer").

Take a look at the AWS section of the Ecosystem page
https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem (e.g.
https://github.com/pinterest/secor ).

The System Tools page is also a good place to check
https://cwiki.apache.org/confluence/display/KAFKA/System+Tools (e.g.
https://cwiki.apache.org/confluence/display/KAFKA/System+Tools#SystemTools-MirrorMaker
).

If the consumer you need doesn't exist, you could write one (which is what
most folks do), or search for one and, if you find it, let the community know.
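
As a rough sketch of what such an archiving consumer could look like: the
code below is an illustration only, not an existing tool. It assumes records
arrive as (topic, partition, offset, payload) tuples from some Kafka client
(a plain Python iterable stands in for the client here), and the names
ArchiveSink, LocalDirSink and archive are all hypothetical. The sink
interface is where S3, Hadoop or another store would plug in.

```python
import base64
import gzip
import json
import os
from collections import defaultdict
from typing import Dict, Iterable, List, Protocol, Tuple

# (topic, partition, offset, payload): a stand-in for what a Kafka client yields.
Record = Tuple[str, int, int, bytes]


class ArchiveSink(Protocol):
    """Hypothetical storage interface; an S3 or HDFS sink would implement it."""

    def write(self, key: str, data: bytes) -> None: ...


class LocalDirSink:
    """Example sink that stores each archived batch as a file under a directory."""

    def __init__(self, root: str) -> None:
        self.root = root

    def write(self, key: str, data: bytes) -> None:
        path = os.path.join(self.root, key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)


def _flush(topic: str, partition: int,
           batch: List[Tuple[int, bytes]], sink: ArchiveSink) -> None:
    # One JSON object per line; payloads are base64-encoded so binary data survives.
    lines = [
        json.dumps({"offset": offset,
                    "payload": base64.b64encode(payload).decode("ascii")})
        for offset, payload in batch
    ]
    # The key encodes topic, partition and the first offset in the batch.
    key = f"{topic}/{partition}/{batch[0][0]:020d}.jsonl.gz"
    sink.write(key, gzip.compress("\n".join(lines).encode("utf-8")))


def archive(records: Iterable[Record], sink: ArchiveSink,
            batch_size: int = 1000) -> int:
    """Buffer records per (topic, partition) and write them out in batches.

    Returns the number of batches handed to the sink.
    """
    buffers: Dict[Tuple[str, int], List[Tuple[int, bytes]]] = defaultdict(list)
    written = 0
    for topic, partition, offset, payload in records:
        buf = buffers[(topic, partition)]
        buf.append((offset, payload))
        if len(buf) >= batch_size:
            _flush(topic, partition, buf, sink)
            buffers[(topic, partition)] = []
            written += 1
    # Write out whatever is left once the input is exhausted.
    for (topic, partition), buf in buffers.items():
        if buf:
            _flush(topic, partition, buf, sink)
            written += 1
    return written
```

In a real deployment the consumer would commit its offsets only after
sink.write succeeds, so that a crash replays a batch rather than losing it.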

Thanks!

Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop


On Mon, Jun 16, 2014 at 6:02 AM, Anatoly Deyneka wrote:

> Hi all,
>
> I'm looking for a way to archive data.
> The data is hot for a few days in our system.
> After that it is rarely used. Speed is not so important for the archive.
>
> Let's say we have a Kafka cluster and a storage system.
> It would be great if Kafka supported moving data to the storage system
> instead of evicting it, and the end user could specify which storage
> system is used (Dynamo, S3, Hadoop, etc.).
> Is this possible to implement?
>
> What other solutions would you advise?
>
> Regards,
> Anatoly
>


data archiving

2014-06-16 Thread Anatoly Deyneka
Hi all,

I'm looking for a way to archive data.
The data is hot for a few days in our system.
After that it is rarely used. Speed is not so important for the archive.

Let's say we have a Kafka cluster and a storage system.
It would be great if Kafka supported moving data to the storage system
instead of evicting it, and the end user could specify which storage
system is used (Dynamo, S3, Hadoop, etc.).
Is this possible to implement?

What other solutions would you advise?

Regards,
Anatoly