Re: Data replication and zero data loss

2015-05-02 Thread xiao li
Hi, Joong,

Please check the following two links:

- https://cwiki.apache.org/confluence/display/KAFKA/KIP-3+-+Mirror+Maker+Enhancement

- https://cwiki.apache.org/confluence/display/KAFKA/KIP-8+-+Add+a+flush+method+to+the+producer+API

They might help you understand the problem.

Cheers,

Xiao Li


Re: Data replication and zero data loss

2015-05-01 Thread Joe Stein
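If you want zero data loss, you should also look into the min.insync.replicas
setting in 0.8.2.1, as it guarantees that acknowledged data is on multiple
replicas (and so on multiple racks, if your replicas are spread across racks).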

If you don't have that set, then the following scenario is possible.

Let's say 1 topic, 1 partition, replication factor 3. You are producing with
ACK=-1.

b1, b2, b3 (where b=broker, b1 is the leader, and b2 and b3 are replicas).

b1 and b2 die, and b3 becomes leader. So far all is well.

10 minutes go by and b3 dies.

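1 minute later b1 comes back online and becomes leader (an unclean leader
election). It will essentially truncate the data written while b3 was the
only replica, roughly 10 minutes of data that upstream thought was saved.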

But now, with min.insync.replicas set, a produce request with ACK=-1 fails if
there are not enough in-sync replicas to meet your data loss guarantees, e.g.
min.insync.replicas=2 or min.insync.replicas=3, depending on the data.
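
The scenario above can be replayed with a small toy model. This is only a sketch of the bookkeeping, not Kafka code: the Partition class, its method names, and the message names m0 and m1 are all made up for illustration, and the model assumes the returning broker is allowed to take over as leader even though it is out of sync.

```python
from dataclasses import dataclass, field


class NotEnoughReplicas(Exception):
    """Raised when a write cannot meet the configured minimum ISR size."""


@dataclass
class Partition:
    """Toy model of one partition with three replicas (hypothetical)."""
    min_insync_replicas: int = 1
    logs: dict = field(default_factory=lambda: {"b1": [], "b2": [], "b3": []})
    isr: set = field(default_factory=lambda: {"b1", "b2", "b3"})
    leader: str = "b1"

    def kill(self, broker):
        # A dead broker leaves the ISR; a new leader is picked from the ISR.
        self.isr.discard(broker)
        if broker == self.leader:
            self.leader = min(self.isr) if self.isr else None

    def restart_as_unclean_leader(self, broker):
        # An out-of-sync replica returns while no ISR member is alive and
        # takes over, so its shorter log becomes the source of truth.
        self.isr = {broker}
        self.leader = broker

    def produce_acks_all(self, msg):
        # ACK=-1: the write is acknowledged once every ISR member has it.
        if self.leader is None or len(self.isr) < self.min_insync_replicas:
            raise NotEnoughReplicas(f"ISR size {len(self.isr)} is too small")
        for broker in self.isr:
            self.logs[broker].append(msg)


def run_scenario(min_isr):
    """Replay the b1/b2/b3 failure sequence; return (acked, lost)."""
    p = Partition(min_insync_replicas=min_isr)
    acked = []
    p.produce_acks_all("m0")          # all three replicas are in sync
    acked.append("m0")
    p.kill("b1")
    p.kill("b2")                      # b3 is now leader and the whole ISR
    try:
        p.produce_acks_all("m1")      # the "10 minutes" of traffic
        acked.append("m1")
    except NotEnoughReplicas:
        pass                          # the producer sees a failure instead
    p.kill("b3")
    p.restart_as_unclean_leader("b1")
    visible = p.logs[p.leader]
    lost = [m for m in acked if m not in visible]
    return acked, lost
```

With run_scenario(1), m1 is acknowledged and later lost. With run_scenario(2), the write of m1 is rejected up front, so nothing that was acknowledged goes missing; the cost is that the partition stops accepting writes while only one replica is in sync.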

Also take a look at
https://github.com/stealthly/go_kafka_client/tree/master/mirrormaker
as it might be helpful for what you are looking for.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -


Re: Data replication and zero data loss

2015-05-01 Thread Joong Lee
It is based on our understanding from reading the documents. 

We aren't concerned about data duplication, as that is going to be handled by
Elasticsearch.
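
One way duplicates can be made harmless on the indexing side is to derive the document ID from the message itself, so that a redelivered message overwrites its earlier copy instead of adding a second one. The sketch below is an assumption about how that could look, not a description of the actual setup: document_id, FakeIndex and the sample records are hypothetical, and FakeIndex is a plain dictionary standing in for an index keyed by document ID, not an Elasticsearch client. It also assumes that two records with identical content are the same event.

```python
import hashlib
import json


def document_id(record: dict) -> str:
    """Derive a stable ID from the record content (hypothetical scheme)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class FakeIndex:
    """Stand-in for an index keyed by document ID."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, body):
        # Indexing under an existing ID replaces the document, so a
        # redelivered message does not create a second copy.
        self.docs[doc_id] = body


def apply_records(index, records):
    """Index every delivered record under its content-derived ID."""
    for record in records:
        index.index(document_id(record), record)


index = FakeIndex()
delivered = [
    {"user": "a", "event": "login", "ts": 1},
    {"user": "b", "event": "login", "ts": 2},
    {"user": "a", "event": "login", "ts": 1},  # duplicate delivery
]
apply_records(index, delivered)
```

Three records are delivered but only two documents end up in the index. If two genuinely distinct events can carry identical payloads, a business key or a producer-assigned message ID would be a safer basis for the ID than a content hash.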

> On May 1, 2015, at 12:15 AM, Daniel Compton  
> wrote:
> 
> When we evaluated MirrorMaker last year we didn't find any risk of data
> loss, only duplicate messages in the case of a network partition.
> 
> Did you discover data loss in your tests, or were you just looking at the
> docs?


Re: Data replication and zero data loss

2015-05-01 Thread Joong Lee
0.8.2.1

> On Apr 30, 2015, at 11:28 PM, Jiangjie Qin  wrote:
> 
> Which mirror maker version did you look at? The MirrorMaker in trunk
> should not have data loss if you just use the default setting.


Re: Data replication and zero data loss

2015-04-30 Thread Daniel Compton
When we evaluated MirrorMaker last year we didn't find any risk of data
loss, only duplicate messages in the case of a network partition.

Did you discover data loss in your tests, or were you just looking at the
docs?


Re: Data replication and zero data loss

2015-04-30 Thread Jiangjie Qin
Which mirror maker version did you look at? The MirrorMaker in trunk
should not have data loss if you just use the default setting.



Data replication and zero data loss

2015-04-30 Thread Joong Lee
Hi,
We are exploring Kafka to keep two data centers (primary and DR) running hosts
of Elasticsearch nodes in sync. One key requirement is that we can't lose any
data. We did a POC with MirrorMaker and felt it may not meet our data loss
requirement.

I would like to ask the community whether we should look for another solution,
or whether Kafka is the right solution given our zero data loss requirement.

Thanks