Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Zachary Smith
You may also want to look at Secor:
https://github.com/pinterest/secor

On Tue, Dec 6, 2016 at 10:53 AM, noah wrote:

> If you are willing to set up Kafka Connect, my company has built this
> connector: https://github.com/spredfast/kafka-connect-s3
>


Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Hans Jespersen

I know several people who use the Qubole Kafka sink connector for S3 (see
https://github.com/qubole/streamx) to store Kafka messages in S3 for
long-term archiving. You can also do this with the Confluent HDFS Kafka
connector if you have access to a Hadoop cluster.

-hans




> On Dec 6, 2016, at 3:25 AM, Aseem Bansal wrote:
>
> Hi
>
> Has anyone stored Kafka JSON messages in deep storage such as S3?
> We are looking to back up all of our raw Kafka JSON messages for
> exploration. S3, HDFS, and MongoDB come to mind initially.
>
> I know the messages can be kept in Kafka itself, but that does not seem
> like a good option: we won't be able to query them, and the machines
> running Kafka will have to be scaled up as we go. Something like S3 we
> won't have to manage.



Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread noah
If you are willing to set up Kafka Connect, my company has built this
connector: https://github.com/spredfast/kafka-connect-s3


Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Aseem Bansal
@Asaf Mesika Stored to S3?

On Tue, Dec 6, 2016 at 5:28 PM, Asaf Mesika wrote:

> We rolled our own since we couldn't (1.5 years ago) find one. The code is
> quite simple and short.


Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Asaf Mesika
We rolled our own since we couldn't (1.5 years ago) find one. The code is
quite simple and short.


On Tue, Dec 6, 2016 at 1:55 PM Aseem Bansal wrote:

> I just meant: is there an existing tool that does this? Basically, I tell
> it "Listen to all X streams and write them to S3/HDFS at Y path as JSON".
> I know Spark Streaming can be used, and there is Flume, but I am not sure
> about their scalability/reliability. That's why I started a discussion
> here, to see whether someone already knows of such a tool.


Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Aseem Bansal
I just meant: is there an existing tool that does this? Basically, I tell
it "Listen to all X streams and write them to S3/HDFS at Y path as JSON".
I know Spark Streaming can be used, and there is Flume, but I am not sure
about their scalability/reliability. That's why I started a discussion
here, to see whether someone already knows of such a tool.
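
For illustration only, here is a minimal Python sketch of what "write them
at Y path as JSON" could look like as an object-key layout. The function
name, the dt=/hr= partition naming, and the zero-padded offset in the file
name are all assumptions made for this example; none of the tools mentioned
in this thread is claimed to use this exact layout.

```python
from datetime import datetime, timezone


def object_key(base_path, topic, partition, first_offset, timestamp_ms):
    """Build a time-partitioned object key for one batch of messages.

    Hypothetical layout: <base>/<topic>/dt=<date>/hr=<hour>/p<partition>-<offset>.json
    """
    # Kafka timestamps are milliseconds since the epoch; use UTC so keys
    # do not depend on the time zone of the machine doing the writing.
    ts = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return (
        f"{base_path.strip('/')}/{topic}/"
        f"dt={ts:%Y-%m-%d}/hr={ts:%H}/"
        # Zero-padding makes keys sort in offset order when listed.
        f"p{partition}-{first_offset:020d}.json"
    )
```

Partitioning keys by date and hour is one common way to let people load
only the time range they need instead of scanning everything.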

On Tue, Dec 6, 2016 at 5:14 PM, Sharninder wrote:

> What do you mean by a streaming way? The logic to push to S3 will be in
> your consumer, so it depends entirely on how you want to read and store.
> I think that's an easier way to do what you want than trying to back up
> Kafka and then read messages from there. I'm not even sure that's possible.


Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Sudev A C
Hi Aseem,

You can run Apache Flume to consume messages from Kafka and write them to
S3/HDFS in a streaming (micro-batch) fashion.

Writes to S3/HDFS should be in micro-batches. You can write every message
individually (I'm not very sure whether S3 supports append), but it won't
be performant.
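
As a rough sketch of the micro-batching idea (this is not how Flume is
implemented), the Python class below buffers messages and hands them to a
writer once a size or age threshold is reached. The class name, the
thresholds, and the `flush` callable are hypothetical; in practice `flush`
would be whatever writes one object to S3 or one file to HDFS.

```python
import time


class MicroBatcher:
    """Collect messages and flush them in batches by count or by age."""

    def __init__(self, flush, max_records=1000, max_age_s=60.0,
                 clock=time.monotonic):
        self._flush = flush              # callable that receives a list of messages
        self._max_records = max_records
        self._max_age_s = max_age_s
        self._clock = clock              # injectable so the logic can be tested
        self._buf = []
        self._opened_at = None

    def add(self, message):
        if not self._buf:
            # The age of a batch is measured from its first message.
            self._opened_at = self._clock()
        self._buf.append(message)
        too_big = len(self._buf) >= self._max_records
        too_old = self._clock() - self._opened_at >= self._max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self):
        """Write out whatever is buffered; does nothing if the buffer is empty."""
        if self._buf:
            batch, self._buf = self._buf, []
            self._flush(batch)
```

Note that this sketch only checks the age when a new message arrives, so a
quiet topic would also need a periodic call to `flush()`.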

https://flume.apache.org/

Thanks
Sudev

On Tue, Dec 6, 2016 at 5:14 PM Sharninder wrote:

> What do you mean by a streaming way? The logic to push to S3 will be in
> your consumer, so it depends entirely on how you want to read and store.
> I think that's an easier way to do what you want than trying to back up
> Kafka and then read messages from there. I'm not even sure that's possible.


Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Sharninder
What do you mean by a streaming way? The logic to push to S3 will be in
your consumer, so it depends entirely on how you want to read and store.
I think that's an easier way to do what you want than trying to back up
Kafka and then read messages from there. I'm not even sure that's possible.

On Tue, Dec 6, 2016 at 5:11 PM, Aseem Bansal wrote:

> I get that we can read them and store them in batches, but is there some
> streaming way?



-- 
--
Sharninder


Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Aseem Bansal
I get that we can read them and store them in batches, but is there some
streaming way?

On Tue, Dec 6, 2016 at 5:09 PM, Aseem Bansal wrote:

> Because we need to do exploratory data analysis and machine learning. We
> need to back up the messages somewhere so that the data scientists can
> query/load them.
>
> So we need something like a router that opens a new consumer group and
> keeps storing the messages to S3.


Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Aseem Bansal
Because we need to do exploratory data analysis and machine learning. We
need to back up the messages somewhere so that the data scientists can
query/load them.

So we need something like a router that opens a new consumer group and
keeps storing the messages to S3.
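
A minimal Python sketch of the storing half of such a "router", under
stated assumptions: records are plain dicts with `topic`, `partition`,
`offset` and `value` fields (a real Kafka client returns its own record
type), and `put_object` is a hypothetical callable taking a key and a body,
which in practice could wrap an S3 client. The Kafka polling and offset
commit are deliberately left out.

```python
import json
from collections import defaultdict


def archive_batch(records, put_object, prefix="raw"):
    """Write one newline-delimited JSON object per (topic, partition).

    Returns {(topic, partition): next_offset}, i.e. the offsets that a
    consumer could commit once every upload has succeeded.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["topic"], rec["partition"])].append(rec)

    next_offsets = {}
    for (topic, partition), recs in groups.items():
        recs.sort(key=lambda r: r["offset"])
        # Naming the object after its first offset means that re-processing
        # the same batch after a crash overwrites the same key.
        key = f"{prefix}/{topic}/{partition}/{recs[0]['offset']:020d}.json"
        body = "\n".join(json.dumps(r["value"]) for r in recs) + "\n"
        put_object(key, body)
        next_offsets[(topic, partition)] = recs[-1]["offset"] + 1
    return next_offsets
```

Committing offsets only after the upload returns would give at-least-once
delivery: a crash between upload and commit replays the batch rather than
losing it.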

On Tue, Dec 6, 2016 at 5:05 PM, Sharninder Khera wrote:

> Why not just have a parallel consumer read all messages from whichever
> topics you're interested in and store them wherever you want to? You don't
> need to "back up" Kafka messages.


Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Sharninder Khera
Why not just have a parallel consumer read all messages from whichever topics
you're interested in and store them wherever you want to? You don't need to
"back up" Kafka messages.

_
From: Aseem Bansal 
Sent: Tuesday, December 6, 2016 4:55 PM
Subject: Storing Kafka Message JSON to deep storage like S3
To:  


Hi

Has anyone stored Kafka JSON messages in deep storage such as S3?
We are looking to back up all of our raw Kafka JSON messages for
exploration. S3, HDFS, and MongoDB come to mind initially.

I know the messages can be kept in Kafka itself, but that does not seem
like a good option: we won't be able to query them, and the machines
running Kafka will have to be scaled up as we go. Something like S3 we
won't have to manage.





Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Aseem Bansal
Hi

Has anyone stored Kafka JSON messages in deep storage such as S3?
We are looking to back up all of our raw Kafka JSON messages for
exploration. S3, HDFS, and MongoDB come to mind initially.

I know the messages can be kept in Kafka itself, but that does not seem
like a good option: we won't be able to query them, and the machines
running Kafka will have to be scaled up as we go. Something like S3 we
won't have to manage.