Todd,
Thank you very much for your reply. My understanding of RAID 10 was indeed wrong.
I understand that one cannot get perfectly sequential disk access even on a single disk. The reason I'm interested in this question is that Kafka's design document emphasizes that Kafka takes advantage of sequential disk access to improve disk performance, and I can't see how that is achieved with thousands of open files.
I thought that, compared to one or a few files, thousands of open files would make the disk access much more random, and the disk performance much worse.
You mentioned that to increase overall IO capacity one has to use multiple spindles with sufficiently fast disk speeds, but would that be more effective with fewer files on each disk? Or is the number of files not an important factor for the overall performance of Kafka?
Thanks again.
xiaobinshe
2014-10-23 22:01 GMT+08:00 Todd Palino tpal...@gmail.com:
Your understanding of RAID 10 is slightly off. Because it is a combination
of striping and mirroring, trying to say that there are 4000 open files per
pair of disks is not accurate. The disk, as far as the system is concerned,
is the entire RAID. Files are striped across all mirrors, so any open file
will cross all 7 mirror sets.
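(To make the layout concrete: 14 disks in RAID 10 means 7 mirrored pairs striped together, roughly as sketched below. The pair assignments are only an illustration, not the actual array layout, but they show why the blocks of any one file end up spread across all 7 pairs rather than sitting on a single pair.

    mirror pair 1: disk 1 + disk 2
    mirror pair 2: disk 3 + disk 4
    ...
    mirror pair 7: disk 13 + disk 14
    stripe: blocks of each file are written round-robin across pairs 1 through 7)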
Even if you were to operate on a single disk, you're never going to be able
to ensure sequential disk access with Kafka. Even if you have a single
partition on a disk, there will be multiple log files for that partition
and you will have to seek to read older data. What you have to do is use
multiple spindles, with sufficiently fast disk speeds, to increase your
overall IO capacity. You can also tune to get a little more. For example,
we use a 120 second commit on that mount point to reduce the frequency of
flushing to disk.
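(For reference, a commit interval like that is typically set as a filesystem mount option rather than in Kafka itself. A minimal sketch of what such an /etc/fstab entry could look like, assuming the log directory sits on an ext4 filesystem; the device and mount point here are made up for the illustration:

    /dev/md0  /data  ext4  defaults,noatime,commit=120  0 0

The commit=120 option tells ext4 to sync data and metadata every 120 seconds instead of the default 5, so writes are flushed in larger, more sequential batches at the cost of a bigger window of unsynced data if the machine crashes; the usual reasoning is that Kafka's replication covers that risk.)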
-Todd
On Wed, Oct 22, 2014 at 10:09 PM, Xiaobin She xiaobin...@gmail.com wrote:
Todd,
Thank you for the information.
With 28,000+ files and 14 disks, that means there are on average about 4000 open files per pair of disks (with each pair treated as one single disk), am I right?
How do you manage to make all the write operations to these 4000 open files sequential on the disk?
As far as I know, writing to different files on the same disk will cause random writes, which is not good for performance.
xiaobinshe
2014-10-23 1:00 GMT+08:00 Todd Palino tpal...@gmail.com:
In fact there are many more than 4000 open files. Many of our brokers run with 28,000+ open files (regular file handles, not network connections). In our case, we're beefing up the disk performance as much as we can by running in a RAID-10 configuration with 14 disks.
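(A rough worked example of where a file count like that comes from, using round numbers rather than anything reported in this thread: each partition's log is a series of segments, and each segment keeps a .log file plus an .index file open, so 4000 partitions already mean at least 8000 open handles for the active segments alone. With an average of a few older, not-yet-deleted segments per partition also held open, 3.5 segments x 2 files x 4000 partitions puts you at 28,000 regular file handles.)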
-Todd
On Tue, Oct 21, 2014 at 7:58 PM, Xiaobin She xiaobin...@gmail.com wrote:
Todd,
Actually I'm wondering how Kafka handles so many partitions. With one partition there is at least one file on disk, and with 4000 partitions there will be at least 4000 files.
When all these partitions have write requests, how does Kafka make the write operations on the disk sequential (which is emphasized in the design document of Kafka) and make sure the disk access is efficient?
Thank you for your reply.
xiaobinshe
2014-10-22 5:10 GMT+08:00 Todd Palino tpal...@gmail.com:
As far as the number of partitions a single broker can handle, we've set our cap at 4000 partitions (including replicas). Above that we've seen some performance and stability issues.
-Todd
On Tue, Oct 21, 2014 at 12:15 AM, Xiaobin She xiaobin...@gmail.com wrote:
Hello everyone,
I'm new to Kafka, and I'm wondering what is the maximum number of partitions a single machine can handle in Kafka?
Is there a suggested number?
Thanks.
xiaobinshe