Re: Data loss detection
It should be something like clientId-MessagesPerSec.

Thanks,
Jun

On Wed, Jun 4, 2014 at 9:35 AM, Maung Than wrote:
> We could not find the producer msg rate in the metrics in JConsole.
> Please give us some pointers.
>
> Also confirming that the reduction in data is due to Avro encoding: we
> were calculating what we send to the producer rather than the output of
> the serializer encoder.
>
> Thanks,
> Maung
Re: Data loss detection
We could not find the producer msg rate in the metrics in JConsole. Please give us some pointers.

Also confirming that the reduction in data is due to Avro encoding: we were calculating what we send to the producer rather than the output of the serializer encoder.

Thanks,
Maung

On Jun 3, 2014, at 10:47 PM, Maung Than wrote:
> Thanks, Jun.
>
> Will check and get back.
>
> We are converting JSON to Avro, and that conversion is done by the custom
> serializer.
>
> Our volume calculation on the producer side is based on the Avro generic
> record that is passed to the producer send method, not on the encoded data
> output from the serializer, which is what actually gets sent to the broker,
> I believe.
>
> That could be the gap. I am testing now without the custom serializer
> and seeing that the two volumes are very close. That could be it!
>
> Thanks,
> Maung
Re: Data loss detection
Thanks, Jun.

Will check and get back.

We are converting JSON to Avro, and that conversion is done by the custom serializer.

Our volume calculation on the producer side is based on the Avro generic record that is passed to the producer send method, not on the encoded data output from the serializer, which is what actually gets sent to the broker, I believe.

That could be the gap. I am testing now without the custom serializer and seeing that the two volumes are very close. That could be it!

Thanks,
Maung

On Jun 3, 2014, at 7:22 PM, Jun Rao wrote:
> We have a metric on msg rate in both the producer and the broker. Could you
> see if they match?
>
> Thanks,
>
> Jun
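The gap described above (counting the record before serialization rather than the bytes the serializer emits) can be illustrated with a small sketch. This is not the Avro library and not the custom serializer from this thread. It hand-encodes one made-up record using the two primitive encodings the Avro specification defines for longs (zigzag varint) and strings (length-prefixed UTF-8), then compares that size with the JSON text for the same record. The field names and the helper functions `zigzag_varint`, `encode_string`, and `encode_record` are assumptions for illustration only.

```python
import json


def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer as a zigzag varint (Avro's long encoding)."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes become small unsigned values
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """Encode a string as a varint byte length followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


def encode_record(user_id: int, event: str, amount: int) -> bytes:
    """Write the fields in schema order; field names are not part of the payload."""
    return zigzag_varint(user_id) + encode_string(event) + zigzag_varint(amount)


# Hypothetical record, not data from the benchmark in this thread.
record = {"user_id": 123456, "event": "page_view", "amount": 42}

json_size = len(json.dumps(record).encode("utf-8"))
binary_size = len(encode_record(record["user_id"], record["event"], record["amount"]))

print("JSON bytes:  ", json_size)
print("binary bytes:", binary_size)
```

Because the binary form carries no field names or punctuation, it comes out smaller than the JSON text, so a volume computed on the producer's input will not match the log size on the brokers even when no message is lost. For scale, the numbers in this thread (58 GB on the brokers against 84 GB counted at the producer) give a ratio of about 0.69.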
Re: Data loss detection
Yes, we did. Some output from it:

2014-06-03 21:46:09 INFO Producer:68 - Shutting down producer
2014-06-03 21:46:09 INFO ProducerSendThread:68 - Begin shutting down ProducerSendThread
2014-06-03 21:46:09 INFO ProducerSendThread:68 - Shutdown ProducerSendThread complete
2014-06-03 21:46:09 INFO ProducerPool:68 - Closing all sync producers

On Jun 3, 2014, at 9:58 PM, Timothy Chen wrote:
> By the way, if you're using the async producer, how do you verify that you
> sent all the data from the producer?
>
> Do you shut down the producer before you check?
>
> Tim
Re: Data loss detection
By the way, if you're using the async producer, how do you verify that you sent all the data from the producer?

Do you shut down the producer before you check?

Tim

On Tue, Jun 3, 2014 at 3:27 PM, Maung Than wrote:
> Thanks, Tim.
>
> We are just trying to benchmark the Kafka producers, and there is no issue
> with the cluster or brokers being down in this case.
>
> We are seeing far less data on the brokers (after calculating the sizes of
> the logs on the brokers), and there is no compression.
>
> We sent 84 GB, but the total log size is only 58 GB on the brokers.
>
> Since the replication factor is zero, can we use an ack other than 1?
>
> Maung
Re: Data loss detection
We have a metric on msg rate in both the producer and the broker. Could you see if they match?

Thanks,

Jun

On Tue, Jun 3, 2014 at 2:13 PM, Maung Than wrote:
> Hi,
>
> We are seeing less data on the brokers than we send from the producers:
> 84 GB to 58 GB.
>
> What is the best way to ensure / detect that all data has been sent
> properly to the brokers from the producers?
>
> Are there any logs that we can check on the producers?
>
> Configuration is 5 brokers, 2 producers, no replication factor, async,
> acks is 1, and no compression.
>
> Thanks,
> Maung
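A minimal sketch of the comparison suggested here: reconcile message counts rather than byte volumes, since counts are unaffected by how the serializer changes the payload size. The counts below are invented placeholders (2 producers and 5 brokers, to mirror the setup in this thread), and `reconcile` is a hypothetical helper. How the counts are collected, for example by sampling the message-rate metrics over JMX, is outside this sketch.

```python
def reconcile(sent_by_producer: dict, received_by_broker: dict) -> dict:
    """Compare total messages sent by producers with total received by brokers."""
    sent = sum(sent_by_producer.values())
    received = sum(received_by_broker.values())
    return {"sent": sent, "received": received, "missing": sent - received}


# Placeholder numbers for illustration only; not measurements from the thread.
sent_by_producer = {"producer-1": 500_000, "producer-2": 500_000}
received_by_broker = {
    "broker-1": 200_000,
    "broker-2": 200_000,
    "broker-3": 200_000,
    "broker-4": 200_000,
    "broker-5": 200_000,
}

report = reconcile(sent_by_producer, received_by_broker)
print(report)
```

A `missing` value of zero with differing byte totals points at encoding overhead or savings rather than lost messages.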
Re: Data loss detection
Thanks, Tim.

We are just trying to benchmark the Kafka producers, and there is no issue with the cluster or brokers being down in this case.

We are seeing far less data on the brokers (after calculating the sizes of the logs on the brokers), and there is no compression.

We sent 84 GB, but the total log size is only 58 GB on the brokers.

Since the replication factor is zero, can we use an ack other than 1?

Maung

On Jun 3, 2014, at 3:00 PM, Timothy Chen wrote:
> Hi Maung,
>
> If your required.acks is 1, then the producer only ensures that one
> broker receives the data before it is successfully returned to the
> client.
>
> Therefore, if the broker crashes and loses all its data, you lose
> data; similarly, this can happen even before the data is fsynced.
>
> To ensure there are more copies of your data in case of failure
> scenarios, you want to increase your required.acks to more than 1 to
> tolerate failures.
>
> Also, the async producer doesn't wait until the data is sent before it
> returns, as it buffers and writes asynchronously. To ensure that each
> write with a successful response has actually been written, you want to
> use the sync producer.
>
> Tim
Re: Data loss detection
Hi Maung,

If your required.acks is 1, then the producer only ensures that one broker receives the data before it is successfully returned to the client.

Therefore, if the broker crashes and loses all its data, you lose data; similarly, this can happen even before the data is fsynced.

To ensure there are more copies of your data in case of failure scenarios, you want to increase your required.acks to more than 1 to tolerate failures.

Also, the async producer doesn't wait until the data is sent before it returns, as it buffers and writes asynchronously. To ensure that each write with a successful response has actually been written, you want to use the sync producer.

Tim

On Tue, Jun 3, 2014 at 2:13 PM, Maung Than wrote:
> Hi,
>
> We are seeing less data on the brokers than we send from the producers:
> 84 GB to 58 GB.
>
> What is the best way to ensure / detect that all data has been sent
> properly to the brokers from the producers?
>
> Are there any logs that we can check on the producers?
>
> Configuration is 5 brokers, 2 producers, no replication factor, async,
> acks is 1, and no compression.
>
> Thanks,
> Maung
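The buffering behaviour described above can be shown with a toy model. This is a sketch under stated assumptions, not the Kafka producer API: `ToyAsyncProducer`, its `send` and `close` methods, and the `batch_size` parameter are all hypothetical, and the `delivered` list simply stands in for a broker log. It shows why a count taken while the producer is still running can be lower than the number of `send` calls, and why the producer should be shut down (draining its buffer) before comparing totals.

```python
from collections import deque


class ToyAsyncProducer:
    """Toy model of a buffering producer. NOT the Kafka client API."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.buffer = deque()
        self.delivered = []  # stands in for the broker-side log

    def send(self, message: bytes) -> None:
        # Returns right away; at this point the message is only buffered.
        self.buffer.append(message)
        if len(self.buffer) >= self.batch_size:
            self._flush()

    def _flush(self) -> None:
        # Move everything currently buffered to the "broker".
        while self.buffer:
            self.delivered.append(self.buffer.popleft())

    def close(self) -> None:
        # Drain whatever is still buffered before shutting down.
        self._flush()


producer = ToyAsyncProducer(batch_size=4)
for i in range(10):
    producer.send(f"msg-{i}".encode("utf-8"))

before_close = len(producer.delivered)  # two messages are still in the buffer
producer.close()
after_close = len(producer.delivered)   # all ten messages are now delivered

print(before_close, after_close)
```

In this model every `send` call returns before delivery, so only the count taken after `close` reflects everything that was handed to the producer.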
Data loss detection
Hi,

We are seeing less data on the brokers than we send from the producers: 84 GB to 58 GB.

What is the best way to ensure / detect that all data has been sent properly to the brokers from the producers?

Are there any logs that we can check on the producers?

Configuration is 5 brokers, 2 producers, no replication factor, async, acks is 1, and no compression.

Thanks,
Maung