Only one broker and eight partitions, in async mode.
Increasing batch.num.messages made no difference.
We split the whole file into 1 KB blocks.
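For readers following along, splitting into 1 KB blocks could look roughly like the sketch below. The class name BlockSplitter and the split method are made up for illustration and are not the poster's code. The sketch collects the blocks in a list to keep it short; a real sender would more likely hand each block to the producer as soon as it is read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not taken from the thread.
public class BlockSplitter {
    // 1 KB per block, as described in the message above.
    public static final int BLOCK_SIZE = 1024;

    /** Reads the stream to its end and returns its content as blocks of at most blockSize bytes. */
    public static List<byte[]> split(InputStream in, int blockSize) throws IOException {
        List<byte[]> blocks = new ArrayList<>();
        byte[] buf = new byte[blockSize];
        int filled = 0;
        int n;
        // read() may return fewer bytes than requested, so keep filling the same block.
        while ((n = in.read(buf, filled, blockSize - filled)) != -1) {
            filled += n;
            if (filled == blockSize) {
                blocks.add(buf);
                buf = new byte[blockSize];
                filled = 0;
            }
        }
        // The last block is usually shorter than blockSize.
        if (filled > 0) {
            blocks.add(Arrays.copyOf(buf, filled));
        }
        return blocks;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[2500];
        List<byte[]> blocks = split(new ByteArrayInputStream(data), BLOCK_SIZE);
        System.out.println(blocks.size() + " blocks, last block has "
                + blocks.get(blocks.size() - 1).length + " bytes");
    }
}
```

Each block would then become the payload of one Kafka message.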
-----Original Message-----
From: robairrob...@gmail.com [mailto:robairrob...@gmail.com] On Behalf Of Robert Turner
Sent: 16 May 2014 13:45
To: users@kafka.apache.org
Subject: Re: kafka performance question
A couple of thoughts spring to mind: are you sending the whole file as one
message? And is your producer code using sync or async mode?
Cheers
Rob.
On 14 May 2014 15:49, Jun Rao jun...@gmail.com wrote:
How many brokers and partitions do you have? You may try increasing
batch.num.messages.
Thanks,
Jun
On Tue, May 13, 2014 at 5:56 PM, Zhujie (zhujie, Smartcare)
first.zhu...@huawei.com wrote:
Dear all,
We want to use Kafka to collect and dispatch data files, but the
performance is lower than we expected.
In our cluster there is one provider and one broker. We use a single thread
to read a file from the provider's local disk and send it to the broker. The
average throughput is only 3 MB/s to 4 MB/s.
But if we just use the Java NIO API to send the file, the throughput can
exceed 200 MB/s.
Why is the Kafka performance so bad in our test? Are we missing
something?
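As a point of comparison, a plain NIO transfer with a throughput figure might be measured along the lines of the sketch below. The message does not say which NIO calls were used, so FileChannel.transferTo is an assumption, as are the class and method names. The sketch writes to an in-memory sink so that it runs anywhere; for a real baseline the target would be a SocketChannel connected to the receiving host. MB is taken as 10^6 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical baseline, not the poster's test program.
public class NioBaseline {
    /** Copies the whole file to the target channel and returns the number of bytes transferred. */
    public static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = src.size();
            long pos = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (pos < size) {
                pos += src.transferTo(pos, size - pos, target);
            }
            return pos;
        }
    }

    /** Throughput in MB/s, with 1 MB taken as 1,000,000 bytes. */
    public static double megabytesPerSecond(long bytes, long elapsedNanos) {
        return (bytes / 1_000_000.0) / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("nio-baseline", ".bin");
        try {
            Files.write(tmp, new byte[4 * 1024 * 1024]);
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            long start = System.nanoTime();
            long sent = sendFile(tmp, Channels.newChannel(sink));
            long elapsed = System.nanoTime() - start;
            System.out.printf("sent %d bytes at %.1f MB/s%n",
                    sent, megabytesPerSecond(sent, elapsed));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Timing the Kafka send loop with the same megabytesPerSecond helper would keep the two numbers comparable.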
Our server:
CPU: Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz * 4
Mem: 300 GB
Disk: 600 GB 15K RPM SAS * 8
Configuration of provider:
props.put("serializer.class", "kafka.serializer.NullEncoder");
props.put("metadata.broker.list", "169.10.35.57:9092");
props.put("request.required.acks", "0");
props.put("producer.type", "async"); // asynchronous
props.put("queue.buffering.max.ms", "500");
props.put("queue.buffering.max.messages", "10");
props.put("batch.num.messages", "1200");
props.put("queue.enqueue.timeout.ms", "-1");
props.put("send.buffer.bytes", "10240");
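One detail that may be worth checking in a configuration like this: the quotes were lost when the message was posted, so it is not recoverable whether the numeric values were passed as strings or as numbers. java.util.Properties.getProperty only returns values stored as String, so any code that reads settings through getProperty will not see a value stored as an Integer. The sketch below shows this with plain java.util.Properties and no Kafka dependency; the class name ProducerProps is made up for illustration.

```java
import java.util.Properties;

// Hypothetical helper, not taken from the thread.
public class ProducerProps {
    /** Builds the producer settings from the message above, with every value stored as a String. */
    public static Properties build() {
        Properties props = new Properties();
        props.put("serializer.class", "kafka.serializer.NullEncoder");
        props.put("metadata.broker.list", "169.10.35.57:9092");
        props.put("request.required.acks", "0");
        props.put("producer.type", "async");
        props.put("queue.buffering.max.ms", "500");
        props.put("queue.buffering.max.messages", "10");
        props.put("batch.num.messages", "1200");
        props.put("queue.enqueue.timeout.ms", "-1");
        props.put("send.buffer.bytes", "10240");
        return props;
    }

    public static void main(String[] args) {
        Properties asStrings = build();

        // Same key, but stored as an Integer instead of a String.
        Properties asNumbers = new Properties();
        asNumbers.put("batch.num.messages", 1200);

        // A String value is visible through getProperty.
        System.out.println(asStrings.getProperty("batch.num.messages"));
        // A non-String value is not: getProperty returns null for it.
        System.out.println(asNumbers.getProperty("batch.num.messages"));
    }
}
```

If the original code already used string values, this changes nothing.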
Configuration of broker:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults
# Server Basics
#
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
# Socket Server Settings
#
# The port the socket server listens on
port=9092
# Hostname the broker will bind to. If not set, the server will bind to all interfaces
#host.name=localhost
# Hostname the broker will advertise to producers and consumers. If not set, it uses the
# value for host.name if configured. Otherwise, it will use the value returned from
# java.net.InetAddress.getCanonicalHostName().
#advertised.host.name=hostname routable by clients
# The port to publish to ZooKeeper for clients to use. If this is not set,
# it will publish the same port that the broker binds to.
#advertised.port=port accessible by clients
# The number of threads handling network requests
#num.network.threads=2
# The number of threads doing disk I/O
#num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
#socket.send.buffer.bytes=1048576
# The receive buffer (SO_RCVBUF) used by the socket server
#socket.receive.buffer.bytes=1048576
# The maximum size of a request that the socket server will accept (protection against OOM)
#socket.request.max.bytes=104857600
# Log Basics
#
# A comma separated list of directories under which to store log files
log.dirs=/data/kafka-logs
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
#num.partitions=2
# Log Flush Policy
#
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot