Re: Data loss detection

2014-06-04 Thread Jun Rao
It should be something like clientId-MessagesPerSec.

Thanks,

Jun


On Wed, Jun 4, 2014 at 9:35 AM, Maung Than  wrote:

>
> We could not find the producer msg rate among the metrics in JConsole.
> Could you give us some pointers?
>
> Also confirming that the reduction in data is due to Avro encoding: we
> were calculating the size of what we send to the producer rather than
> the output of the serializer (encoder).
>
> Thanks,
> Maung


Re: Data loss detection

2014-06-04 Thread Maung Than

We could not find the producer msg rate among the metrics in JConsole.
Could you give us some pointers?

Also confirming that the reduction in data is due to Avro encoding: we were
calculating the size of what we send to the producer rather than the output
of the serializer (encoder).

Thanks,
Maung

On Jun 3, 2014, at 10:47 PM, Maung Than  wrote:

> Thanks, Jun. 
> 
> Will check and get back to you.
> 
> We are converting JSON to Avro, and that conversion is done by the custom 
> serializer. 
> 
> Our volume calculation on the producer side is based on the Avro generic 
> record that is passed to the producer's send method, not on the encoded data 
> output by the serializer, which I believe is what actually gets sent to the 
> broker. 
> 
> That could be the gap. I am now testing without the custom serializer 
> and seeing that the two volumes are very close. That could be it!
> 
> Thanks,
> Maung



Re: Data loss detection

2014-06-03 Thread Maung Than
Thanks, Jun. 

Will check and get back to you.

We are converting JSON to Avro, and that conversion is done by the custom 
serializer. 

Our volume calculation on the producer side is based on the Avro generic record 
that is passed to the producer's send method, not on the encoded data output by 
the serializer, which I believe is what actually gets sent to the broker. 

That could be the gap. I am now testing without the custom serializer and 
seeing that the two volumes are very close. That could be it!

Thanks,
Maung
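
To illustrate the gap described above, here is a minimal sketch of why a schema-based binary encoding is usually smaller than the record measured before serialization. It is not Avro and not the custom serializer from this thread: the record, its field names, and the `struct` layout are assumptions chosen only to show the idea that field names live in the schema and only the values go on the wire.

```python
import json
import struct

# Hypothetical record; the field names and values are illustrative only.
record = {"user_id": 123456, "event": "click", "ts": 1401830769}

# Size of the record as JSON text, i.e. roughly what is counted when the
# input is measured before it reaches the serializer.
json_bytes = json.dumps(record).encode("utf-8")

# Stand-in for a schema-based binary encoding (NOT real Avro): the schema
# carries the field names, so only the values are written.
event = record["event"].encode("utf-8")
binary_bytes = struct.pack(
    f"<qB{len(event)}sq",      # int64, 1-byte string length, string, int64
    record["user_id"],
    len(event),
    event,
    record["ts"],
)


def size_ratio(encoded: bytes, original: bytes) -> float:
    """Return the encoded size as a fraction of the original size."""
    return len(encoded) / len(original)


print(len(json_bytes), len(binary_bytes),
      round(size_ratio(binary_bytes, json_bytes), 2))
```

The practical point is to measure the bytes that come out of the serializer when comparing against the log sizes on the brokers.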

On Jun 3, 2014, at 7:22 PM, Jun Rao  wrote:

> We have a metric on msg rate in both the producer and the broker. Could you
> see if they match?
> 
> Thanks,
> 
> Jun



Re: Data loss detection

2014-06-03 Thread Maung Than
Yes, we did. Here is some of the output:

2014-06-03 21:46:09 INFO  Producer:68 - Shutting down producer
2014-06-03 21:46:09 INFO  ProducerSendThread:68 - Begin shutting down ProducerSendThread
2014-06-03 21:46:09 INFO  ProducerSendThread:68 - Shutdown ProducerSendThread complete
2014-06-03 21:46:09 INFO  ProducerPool:68 - Closing all sync producers


On Jun 3, 2014, at 9:58 PM, Timothy Chen  wrote:

> By the way, if you're using the async producer, how do you verify that you
> sent all the data from the producer?
> 
> Do you shut down the producer before you check?
> 
> Tim



Re: Data loss detection

2014-06-03 Thread Timothy Chen
By the way, if you're using the async producer, how do you verify that you
sent all the data from the producer?

Do you shut down the producer before you check?

Tim

On Tue, Jun 3, 2014 at 3:27 PM, Maung Than  wrote:
> Thanks, Tim.
>
> We are just trying to benchmark the Kafka producers, and there is no issue 
> with the cluster or brokers being down in this case.
>
> We are seeing far less data on the brokers (after calculating the sizes of 
> the logs on the brokers), and there is no compression.
>
> We sent 84 GB, but the total log size is only 58 GB on the brokers.
>
> Since replication factor is zero, can we use ack other than 1?
>
> Maung


Re: Data loss detection

2014-06-03 Thread Jun Rao
We have a metric on msg rate in both the producer and the broker. Could you
see if they match?

Thanks,

Jun
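
A minimal sketch of the comparison suggested above: message counts, unlike byte sizes, should match on both sides regardless of how the serializer encodes each message. The counter values and the producer and broker names below are hypothetical; in practice they would be read from the message-rate metrics on each producer and broker.

```python
def missing_messages(produced_per_producer, received_per_broker):
    """Return how many messages were produced but not seen on the brokers."""
    return sum(produced_per_producer.values()) - sum(received_per_broker.values())


# Hypothetical counter snapshots for 2 producers and 5 brokers.
produced = {"producer-1": 500_000, "producer-2": 480_000}
received = {f"broker-{i}": 196_000 for i in range(1, 6)}

print(missing_messages(produced, received))
```

A result of zero means the counts agree, so any remaining difference in volume comes from how the bytes are measured rather than from lost messages.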


On Tue, Jun 3, 2014 at 2:13 PM, Maung Than  wrote:

> Hi,
>
> We are seeing less data on the brokers than we sent from the producers:
> 84 GB versus 58 GB.
>
> What is the best way to ensure / detect that all data has been sent properly
> to the brokers from the producers?
>
> Are there any logs that we can check on the producers?
>
> Configuration is 5 brokers, 2 producers, no replication factor, async,
> ack is 1, and no compression.
>
> Thanks,
> Maung
>


Re: Data loss detection

2014-06-03 Thread Maung Than
Thanks, Tim. 

We are just trying to benchmark the Kafka producers, and there is no issue with 
the cluster or brokers being down in this case. 

We are seeing far less data on the brokers (after calculating the sizes of the 
logs on the brokers), and there is no compression. 

We sent 84 GB, but the total log size is only 58 GB on the brokers. 

Since replication factor is zero, can we use ack other than 1?  

Maung 

On Jun 3, 2014, at 3:00 PM, Timothy Chen  wrote:

> Hi Maung,
> 
> If your required.acks is 1 then the producer only ensures that one
> broker receives the data before it's successfully returned to the
> client.
> 
> Therefore, if the broker crashes and loses all the data, then you lose
> data; similarly, it can happen even before the data is fsynced.
> 
> To ensure there are more copies of your data in failure scenarios,
> you want to increase your required.acks to more than 1 to
> tolerate failures.
> 
> Also, the async producer doesn't wait until the data is sent before it
> returns, as it buffers and writes asynchronously. To ensure that each
> write with a successful response has actually been written, you want to
> use the sync producer.
> 
> Tim



Re: Data loss detection

2014-06-03 Thread Timothy Chen
Hi Maung,

If your required.acks is 1 then the producer only ensures that one
broker receives the data before it's successfully returned to the
client.

Therefore, if the broker crashes and loses all the data, then you lose
data; similarly, it can happen even before the data is fsynced.

To ensure there are more copies of your data in failure scenarios,
you want to increase your required.acks to more than 1 to
tolerate failures.

Also, the async producer doesn't wait until the data is sent before it
returns, as it buffers and writes asynchronously. To ensure that each
write with a successful response has actually been written, you want to
use the sync producer.

Tim
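
The buffering behaviour described above can be sketched with a toy model. This is not the Kafka client API: `AsyncProducerSketch`, its `deliver` callback, and the sentinel-based `close` are hypothetical names used only to show that an async `send` returns before delivery, and that everything queued is only guaranteed to have gone out once the producer has been closed and drained.

```python
import queue
import threading


class AsyncProducerSketch:
    """Toy model of a buffering producer; not the Kafka client API."""

    def __init__(self, deliver):
        self._queue = queue.Queue()
        self._deliver = deliver  # callback standing in for the network send
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        # Returns immediately: at this point the message is only buffered.
        self._queue.put(message)

    def _run(self):
        while True:
            message = self._queue.get()
            if message is None:  # sentinel: everything before it was sent
                break
            self._deliver(message)

    def close(self):
        # Queue the sentinel and wait for the send thread to drain the buffer.
        self._queue.put(None)
        self._thread.join()


received = []
producer = AsyncProducerSketch(received.append)
for i in range(1000):
    producer.send(i)
producer.close()
print(len(received))
```

Counting `received` before `close()` returns could show fewer messages than were sent, which is why any volume check should happen after the producer has shut down.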

On Tue, Jun 3, 2014 at 2:13 PM, Maung Than  wrote:
> Hi,
>
> We are seeing less data on the brokers than we sent from the producers:
> 84 GB versus 58 GB.
>
> What is the best way to ensure / detect that all data has been sent properly
> to the brokers from the producers?
>
> Are there any logs that we can check on the producers?
>
> Configuration is 5 brokers, 2 producers, no replication factor, async,
> ack is 1, and no compression.
>
> Thanks,
> Maung


Data loss detection

2014-06-03 Thread Maung Than
Hi, 

We are seeing less data on the brokers than we sent from the producers:
84 GB versus 58 GB. 

What is the best way to ensure / detect that all data has been sent properly 
to the brokers from the producers? 

Are there any logs that we can check on the producers? 

Configuration is 5 brokers, 2 producers, no replication factor, async, 
ack is 1, and no compression. 

Thanks,
Maung