Hi Vijay,

It is surprising to see such a large drop. The first thing it suggests to me is that there may be a network bottleneck that is being glossed over in the acks=1 case, but not in the acks=all case. In the acks=1 case, is the broker co-located on the same machine as the producer, or is it on a separate machine?
What kind of network are you using? Is it gigabit Ethernet? 10GbE? If you run all three brokers on a single computer rather than on multiple computers, do the results change? If so, that again suggests the network is acting as the bottleneck. Are you using the new producer and consumer, or the old versions?

best,
Colin

On Fri, Oct 27, 2017, at 14:42, Vijay Prakash wrote:
> Hey all,
>
> I'm currently running Kafka 0.10.1 on Windows and am doing some perf
> testing. I tried out the perf test setup described in this blog post:
> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines.
> When I try "single producer thread, 3x asynchronous replication", I get
> about 550k records/sec, which seems acceptable for the perf loss due to
> running on Windows. However, when I set acks=all to try synchronous
> replication, I drop to about 120k records/sec, which is a LOT worse than
> the numbers in the blog post. Are there some specific settings that might
> cause acks=all to cause more perf issues? My broker settings are below.
> Thanks!
>
> Broker settings:
> advertised.host.name = null
> advertised.listeners = null
> advertised.port = null
> authorizer.class.name =
> auto.create.topics.enable = false
> auto.leader.rebalance.enable = true
> background.threads = 10
> broker.id = 3
> broker.id.generation.enable = true
> broker.rack = sc3
> compression.type = producer
> connections.max.idle.ms = 600000
> controlled.shutdown.enable = true
> controlled.shutdown.max.retries = 3
> controlled.shutdown.retry.backoff.ms = 5000
> controller.socket.timeout.ms = 30000
> default.replication.factor = 1
> delete.topic.enable = true
> fetch.purgatory.purge.interval.requests = 1000
> group.max.session.timeout.ms = 300000
> group.min.session.timeout.ms = 6000
> host.name =
> inter.broker.protocol.version = 0.10.1-IV2
> leader.imbalance.check.interval.seconds = 300
> leader.imbalance.per.broker.percentage = 10
> listeners = <redacted>
> log.cleaner.backoff.ms = 15000
> log.cleaner.dedupe.buffer.size = 134217728
> log.cleaner.delete.retention.ms = 86400000
> log.cleaner.enable = true
> log.cleaner.io.buffer.load.factor = 0.9
> log.cleaner.io.buffer.size = 524288
> log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
> log.cleaner.min.cleanable.ratio = 0.5
> log.cleaner.min.compaction.lag.ms = 0
> log.cleaner.threads = 1
> log.cleanup.policy = [delete]
> log.dir = D:/data/kafka/kafka-logs,H:/data/kafka/kafka-logs,I:/data/kafka/kafka-logs,J:/data/kafka/kafka-logs
> log.dirs = null
> log.flush.interval.messages = 20000
> log.flush.interval.ms = 1000
> log.flush.offset.checkpoint.interval.ms = 60000
> log.flush.scheduler.interval.ms = 9223372036854775807
> log.index.interval.bytes = 4096
> log.index.size.max.bytes = 10485760
> log.message.format.version = 0.10.1-IV2
> log.message.timestamp.difference.max.ms = 9223372036854775807
> log.message.timestamp.type = CreateTime
> log.preallocate = false
> log.retention.bytes = -1
> log.retention.check.interval.ms = 300000
> log.retention.hours = 24
> log.retention.minutes = null
> log.retention.ms = null
> log.roll.hours = 168
> log.roll.jitter.hours = 0
> log.roll.jitter.ms = null
> log.roll.ms = null
> log.segment.bytes = 536870912
> log.segment.delete.delay.ms = 60000
> max.connections.per.ip = 2147483647
> max.connections.per.ip.overrides =
> message.max.bytes = 1000012
> metric.reporters = []
> metrics.num.samples = 2
> metrics.sample.window.ms = 30000
> min.insync.replicas = 3
> num.io.threads = 20
> num.network.threads = 20
> num.partitions = 1
> num.recovery.threads.per.data.dir = 1
> num.replica.fetchers = 1
> offset.metadata.max.bytes = 4096
> offsets.commit.required.acks = -1
> offsets.commit.timeout.ms = 5000
> offsets.load.buffer.size = 5242880
> offsets.retention.check.interval.ms = 600000
> offsets.retention.minutes = 1440
> offsets.topic.compression.codec = 0
> offsets.topic.num.partitions = 50
> offsets.topic.replication.factor = 5
> offsets.topic.segment.bytes = 104857600
> port = 9092
> principal.builder.class = class org.apache.kafka.common.security.auth.DefaultPrincipalBuilder
> producer.purgatory.purge.interval.requests = 1000
> queued.max.requests = 500
> quota.consumer.default = 9223372036854775807
> quota.producer.default = 9223372036854775807
> quota.window.num = 11
> quota.window.size.seconds = 1
> replica.fetch.backoff.ms = 1000
> replica.fetch.max.bytes = 1048576
> replica.fetch.min.bytes = 1
> replica.fetch.response.max.bytes = 10485760
> replica.fetch.wait.max.ms = 500
> replica.high.watermark.checkpoint.interval.ms = 5000
> replica.lag.time.max.ms = 10000
> replica.socket.receive.buffer.bytes = 65536
> replica.socket.timeout.ms = 30000
> replication.quota.window.num = 11
> replication.quota.window.size.seconds = 1
> request.timeout.ms = 30000
> reserved.broker.max.id = 1000
> sasl.enabled.mechanisms = [GSSAPI]
> sasl.kerberos.kinit.cmd = /usr/bin/kinit
> sasl.kerberos.min.time.before.relogin = 60000
> sasl.kerberos.principal.to.local.rules = [DEFAULT]
> sasl.kerberos.service.name = null
> sasl.kerberos.ticket.renew.jitter = 0.05
> sasl.kerberos.ticket.renew.window.factor = 0.8
> sasl.mechanism.inter.broker.protocol = GSSAPI
> security.inter.broker.protocol = PLAINTEXT
> socket.receive.buffer.bytes = 1048576
> socket.request.max.bytes = 104857600
> socket.send.buffer.bytes = 1048576
> ssl.cipher.suites = null
> ssl.client.auth = none
> ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
> ssl.endpoint.identification.algorithm = null
> ssl.key.password = null
> ssl.keymanager.algorithm = SunX509
> ssl.keystore.location = null
> ssl.keystore.password = null
> ssl.keystore.type = JKS
> ssl.protocol = TLS
> ssl.provider = null
> ssl.secure.random.implementation = null
> ssl.trustmanager.algorithm = PKIX
> ssl.truststore.location = null
> ssl.truststore.password = null
> ssl.truststore.type = JKS
> unclean.leader.election.enable = false
> zookeeper.connect = <redacted>
> zookeeper.connection.timeout.ms = 1000000
> zookeeper.session.timeout.ms = 6000
> zookeeper.set.acl = false
> zookeeper.sync.time.ms = 2000
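
For anyone wanting to reproduce the comparison above: the single-producer numbers can be driven with the `kafka-producer-perf-test` tool that ships with Kafka, varying only `acks` between runs. This is a minimal sketch, assuming a topic named `perf-test` with replication factor 3, the 100-byte records used in the linked LinkedIn benchmark, and a placeholder bootstrap address — all of those are assumptions, not values from Vijay's setup. Note that it requires a running cluster, so it cannot be executed standalone.

```shell
# Baseline: asynchronous replication -- leader acknowledges alone (acks=1).
bin/kafka-producer-perf-test.sh \
  --topic perf-test \
  --num-records 50000000 \
  --record-size 100 \
  --throughput -1 \
  --producer-props bootstrap.servers=broker1:9092 acks=1

# Comparison: synchronous replication -- wait for all in-sync replicas (acks=all).
# With min.insync.replicas = 3 (as in the config above), every record must be
# replicated to all three brokers before the producer gets its ack.
bin/kafka-producer-perf-test.sh \
  --topic perf-test \
  --num-records 50000000 \
  --record-size 100 \
  --throughput -1 \
  --producer-props bootstrap.servers=broker1:9092 acks=all
```

Keeping every flag identical apart from `acks` isolates the replication-acknowledgement cost from other variables (batching, record size, topic layout), which is what makes the 550k vs 120k records/sec comparison meaningful.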