Thanks @Jay for suggesting changes to batch.size and linger.ms. I tried
them out. It appears one can do better than the default batch.size for
this synchronous batch mode with flush().
These new measurements give more "rational" numbers, which I can reason about
and use to infer some rules of thumb (for batch-sync mode using flush()).
Here are my observations:
- The new producer API does much better than the older one for a
*single-threaded* producer (the best I saw with the old API was ~68MB/s;
with the new one, ~140MB/s).
- A higher linger.ms sometimes helps performance and at other times hurts
it. There is no simple rule here; it is best to try it out and decide
whether the default is good for your case.
- For a single-threaded producer, to get the most throughput, set
batch.size = (total bytes between flushes / partition count).
- Running more single-threaded producer processes helped (up to about 3 or
4 processes).
- A single producer writing to a single partition is faster than a single
producer writing to multiple partitions.
- The number of bytes between two explicit flushes (i.e. the flush interval)
had a much smaller impact than batch.size. There is something to be learnt
here; my speculation is that this might change with smaller flush intervals.
Having two knobs (batch.size and flush interval) is a bit confusing for end
users trying to tune the producer, so it would be good to find out whether
some simple guidance is feasible.
- Other than some inconveniences previously mentioned, I feel flush()
could be used as a way to simulate sync-batch behavior.
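A minimal sketch of the batch.size rule of thumb above, in Python for brevity. It assumes the flush interval is counted in bytes, that records are spread evenly across partitions, and that "MB" means 2^20 bytes; `suggested_batch_size` and `producer_config` are hypothetical helper names, and only the config keys (`acks`, `batch.size`) are real new-producer settings.

```python
# Hypothetical helpers illustrating the rule of thumb:
#   batch.size = (bytes between explicit flush() calls) / (partition count)
MB = 1024 * 1024  # assumption: "MB" in the flush interval means 2**20 bytes


def suggested_batch_size(flush_interval_bytes: int, partitions: int) -> int:
    """Size each partition's batch so it fills about once per flush interval.

    Assumes records are spread evenly across partitions.
    """
    if flush_interval_bytes <= 0 or partitions <= 0:
        raise ValueError("flush interval and partition count must be positive")
    return flush_interval_bytes // partitions


def producer_config(flush_interval_bytes: int, partitions: int) -> dict:
    """Property map for the new producer (keys are real producer settings).

    linger.ms is deliberately left out, i.e. at its default, which was the
    faster choice in most (not all) of the measured
    batch.size = flushInt/#partitions cases.
    """
    return {
        "acks": "1",
        "batch.size": str(suggested_batch_size(flush_interval_bytes, partitions)),
    }


# Example: 4MB between flushes, spread over 4 partitions.
print(producer_config(4 * MB, 4))  # {'acks': '1', 'batch.size': '1048576'}
```

The values are kept as strings because that is how they would typically be passed as producer properties; the division is integer division since batch.size is a byte count.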
Producer Limits:
- The producer is able to exceed 1 Gigabit Ethernet capacity, but not
10 Gigabit Ethernet; it does not appear to go beyond ~460MB/s. I verified
that my test machines are able to achieve 1GB/s.
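For reference, the back-of-the-envelope arithmetic behind the link-capacity comparison, as a quick sketch. It assumes decimal units (1 Gbit = 10^9 bits, 1 MB = 10^6 bytes) and ignores Ethernet/IP/TCP framing overhead, so the usable payload rate is somewhat below these line rates; `gbit_to_mb_per_s` is just an illustrative name.

```python
# Back-of-the-envelope check of the producer limit against link capacity.
# Assumptions: decimal units (1 Gbit = 10**9 bits, 1 MB = 10**6 bytes) and
# no allowance for Ethernet/IP/TCP framing overhead.


def gbit_to_mb_per_s(gbit_per_s: float) -> float:
    """Convert a link rate in Gbit/s to MB/s (1000 Mbit per Gbit, 8 bits per byte)."""
    return gbit_per_s * 1000 / 8


one_gig = gbit_to_mb_per_s(1)    # 125.0 MB/s
ten_gig = gbit_to_mb_per_s(10)   # 1250.0 MB/s
observed_peak = 460.0            # MB/s, highest producer throughput measured

print(f"1GbE line rate:  {one_gig:.0f} MB/s")
print(f"10GbE line rate: {ten_gig:.0f} MB/s")
print(f"peak vs 1GbE:    {observed_peak / one_gig:.2f}x")   # 3.68x
print(f"peak vs 10GbE:   {observed_peak / ten_gig:.1%}")    # 36.8%
```

So ~460MB/s is a bit under 3.7x a 1GbE link but only about 37% of a 10GbE link's raw line rate.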
Todo:
- Need to try a multi-threaded producer.
- I did some testing of the consumer APIs as well, with the 0.8.1
consumer-perf tool. I wasn't able to push it beyond 30MB/s, and when
producers ran in parallel it fell to under 10MB/s. Need to dig deeper; will
report back. Suggestions welcome.
Measurements:
- See attachment
- Also available on paste bin: http://pastebin.com/p3kSAjy6
Settings: acks=1, single broker, single-threaded producer (new API)
Machines: 32 cores, 256GB RAM, 10GbE, 6x 15000 rpm disks
1 partition (throughput in MB/s)
linger    batch.size            FlushInt=4MB  FlushInt=8MB  FlushInt=16MB
default   default                         57            54             52
1s        default                         57            61             59
default   flushInt/#partitions           136           125            116
1s        flushInt/#partitions            92            77             56
default   flushInt                       140           123            124
default   10MB                           140           123            124
default   20MB                            31            30             42
4 partitions (throughput in MB/s)
linger    batch.size            FlushInt=4MB  FlushInt=8MB  FlushInt=16MB
default   default                         95            82             80
1s        default                         85            83             85
default   flushInt/#partitions           127           133             90
1s        flushInt/#partitions            94           100            101
default   flushInt                        60             8              6
default   10MB                             7             7              7
default   20MB                             6             6              5
8 partitions (throughput in MB/s)
linger    batch.size            FlushInt=4MB  FlushInt=8MB  FlushInt=16MB
default   default                        100            89             96
1s        default                        105            97             98
default   flushInt/#partitions           114           128             78
1s        flushInt/#partitions            95            94            102
default   flushInt                         7             8              8
default   10MB                             7             8              7
default   20MB                             6             6              6
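To check the batch.size rule against these tables, here is a small sketch that loads the numbers above and picks the fastest setting for each partition count and flush interval. It only re-reads the measurements (no new data); the rows where batch.size = total bytes between flushes / partition count are labelled flushInt/#partitions here, `best_config` is a hypothetical helper, and ties go to the row listed first.

```python
# Throughput in MB/s, copied from the three tables above.
# Each row: (FlushInt=4MB, FlushInt=8MB, FlushInt=16MB).
FLUSH_INTERVALS = ("4MB", "8MB", "16MB")

RESULTS = {
    1: {
        "linger=default, batch.size=default": (57, 54, 52),
        "linger=1s, batch.size=default": (57, 61, 59),
        "linger=default, batch.size=flushInt/#partitions": (136, 125, 116),
        "linger=1s, batch.size=flushInt/#partitions": (92, 77, 56),
        "linger=default, batch.size=flushInt": (140, 123, 124),
        "linger=default, batch.size=10MB": (140, 123, 124),
        "linger=default, batch.size=20MB": (31, 30, 42),
    },
    4: {
        "linger=default, batch.size=default": (95, 82, 80),
        "linger=1s, batch.size=default": (85, 83, 85),
        "linger=default, batch.size=flushInt/#partitions": (127, 133, 90),
        "linger=1s, batch.size=flushInt/#partitions": (94, 100, 101),
        "linger=default, batch.size=flushInt": (60, 8, 6),
        "linger=default, batch.size=10MB": (7, 7, 7),
        "linger=default, batch.size=20MB": (6, 6, 5),
    },
    8: {
        "linger=default, batch.size=default": (100, 89, 96),
        "linger=1s, batch.size=default": (105, 97, 98),
        "linger=default, batch.size=flushInt/#partitions": (114, 128, 78),
        "linger=1s, batch.size=flushInt/#partitions": (95, 94, 102),
        "linger=default, batch.size=flushInt": (7, 8, 8),
        "linger=default, batch.size=10MB": (7, 8, 7),
        "linger=default, batch.size=20MB": (6, 6, 6),
    },
}


def best_config(partitions: int, flush_interval: str) -> tuple:
    """Return (setting, MB/s) of the fastest row for one table column.

    max() keeps the first maximal item, so ties go to the row listed first.
    """
    col = FLUSH_INTERVALS.index(flush_interval)
    rows = RESULTS[partitions]
    setting = max(rows, key=lambda name: rows[name][col])
    return setting, rows[setting][col]


for partitions in sorted(RESULTS):
    for flush in FLUSH_INTERVALS:
        setting, mbps = best_config(partitions, flush)
        print(f"{partitions} partition(s), FlushInt={flush}: {mbps} MB/s ({setting})")
```

Read this way, a flushInt/#partitions row comes out on top in all six multi-partition columns (with linger at its default in four of them and at 1s in the two FlushInt=16MB columns). For a single partition, where flushInt/#partitions is the same as flushInt, the flushInt-sized batches lead.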
With multiple producers (each single-threaded), throughput in MB/s:
             1 partition  4 partitions  8 partitions
1 process            136           127           128
3 processes          344           345           304
4 processes          290           372           460
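A rough way to read the multi-process numbers is speedup over a single process and per-process efficiency (speedup divided by process count). A sketch using only the figures above; `scaling` is a hypothetical helper name.

```python
# Aggregate throughput (MB/s) of N single-threaded producer processes,
# copied from the results above: {partitions: {processes: MB/s}}.
THROUGHPUT = {
    1: {1: 136, 3: 344, 4: 290},
    4: {1: 127, 3: 345, 4: 372},
    8: {1: 128, 3: 304, 4: 460},
}


def scaling(partitions: int) -> dict:
    """Return {processes: (speedup, efficiency)} relative to a single process.

    speedup = throughput(N) / throughput(1); efficiency = speedup / N.
    """
    runs = THROUGHPUT[partitions]
    base = runs[1]
    return {n: (mbps / base, mbps / base / n) for n, mbps in runs.items()}


for partitions in sorted(THROUGHPUT):
    for n, (speedup, efficiency) in sorted(scaling(partitions).items()):
        print(f"{partitions} partition(s), {n} process(es): "
              f"{speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

With 1 partition, 4 processes are slower than 3 (about 2.13x vs 2.53x), while with 8 partitions the fourth process still adds throughput (about 3.59x, roughly 90% efficiency).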