On Wed, Jul 12, 2017 at 10:35 PM, Enrico Olivelli <[email protected]>
wrote:

> Sijie, JV, just a recap my point of view:
> - considering latency = "time for asynchAddEntry to complete"
> - there is a some difference from 4.4 and 4.5 in the usage of memory, but
> no so clear
> - the type of GC (parallel vs G1) does not impact very much but with G1 you
> achieve best latency
>

Did you try CMS for v3 protocol?


> - using the v2 protocol brings huge improvements in the usage of memory
> - if you want good "latency" for each entry you have to set at least
> throttle to 1000, but the throughput will be very lower (from 30 MB/s vs
> 300 MB/s on my disk)
>

What type of disk do you have? What is your filesystem mount options?

When you did the test, how did GC activities look like when setting
throttle to be more than 1000?

I think the latency looked "bad" when throttle was more than 1000, is
because of GC. Did you have that test results?


> below the results of my simple bench
>
> Enrico
>
> Bench code:
> https://github.com/eolivelli/bookkeepers-benchs/blob/master/src/test/java/
> BookKeeperWriteTest.java
>
> Results:
> with 4.5.0-SNAPSHOT and v2 protocol - G1 GC
>
> #0 Total wall clock time: 1000.0 ms, total callbacks time: 8460 ms, entry
> size 0,034 MB -> 8,46 ms per entry (latency),1,0 ms per entry (throughput)
> 34,2 MB/s throughput
> #1 Total wall clock time: 1005.0 ms, total callbacks time: 4529 ms, entry
> size 0,034 MB -> 4,53 ms per entry (latency),1,0 ms per entry (throughput)
> 34,0 MB/s throughput
> #2 Total wall clock time: 1002.0 ms, total callbacks time: 4293 ms, entry
> size 0,034 MB -> 4,29 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #3 Total wall clock time: 1003.0 ms, total callbacks time: 4317 ms, entry
> size 0,034 MB -> 4,32 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #4 Total wall clock time: 1002.0 ms, total callbacks time: 4553 ms, entry
> size 0,034 MB -> 4,55 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #5 Total wall clock time: 1001.0 ms, total callbacks time: 4448 ms, entry
> size 0,034 MB -> 4,45 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #6 Total wall clock time: 1004.0 ms, total callbacks time: 4449 ms, entry
> size 0,034 MB -> 4,45 ms per entry (latency),1,0 ms per entry (throughput)
> 34,0 MB/s throughput
> #7 Total wall clock time: 1003.0 ms, total callbacks time: 4504 ms, entry
> size 0,034 MB -> 4,50 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #8 Total wall clock time: 1002.0 ms, total callbacks time: 4290 ms, entry
> size 0,034 MB -> 4,29 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #9 Total wall clock time: 1003.0 ms, total callbacks time: 4441 ms, entry
> size 0,034 MB -> 4,44 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
>
> gc.log
> Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc 4.3.0
> 20080428 (Red Hat 4.3.0-8)
> Memory: 4k page, physical 32870636k(9153868k free), swap
> 15622140k(15464700k free)
> CommandLine flags: -XX:InitialHeapSize=2147483648
> -XX:MaxHeapSize=2147483648 -XX:+PrintGC -XX:+PrintGCTimeStamps
> -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC
> 1,988: [GC pause (G1 Evacuation Pause) (young) 146M->59M(2048M), 0,0104300
> secs]
>
>
> Results, with 4.5.0-SNAPSHOT and v3 protocol - G1 GC
> #0 Total wall clock time: 1001.0 ms, total callbacks time: 11800 ms, entry
> size 0,034 MB -> 11,80 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #1 Total wall clock time: 1001.0 ms, total callbacks time: 4585 ms, entry
> size 0,034 MB -> 4,59 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #2 Total wall clock time: 1002.0 ms, total callbacks time: 4458 ms, entry
> size 0,034 MB -> 4,46 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #3 Total wall clock time: 1002.0 ms, total callbacks time: 4471 ms, entry
> size 0,034 MB -> 4,47 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #4 Total wall clock time: 1004.0 ms, total callbacks time: 4473 ms, entry
> size 0,034 MB -> 4,47 ms per entry (latency),1,0 ms per entry (throughput)
> 34,0 MB/s throughput
> #5 Total wall clock time: 1003.0 ms, total callbacks time: 4477 ms, entry
> size 0,034 MB -> 4,48 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #6 Total wall clock time: 1007.0 ms, total callbacks time: 4424 ms, entry
> size 0,034 MB -> 4,42 ms per entry (latency),1,0 ms per entry (throughput)
> 33,9 MB/s throughput
> #7 Total wall clock time: 1005.0 ms, total callbacks time: 4576 ms, entry
> size 0,034 MB -> 4,58 ms per entry (latency),1,0 ms per entry (throughput)
> 34,0 MB/s throughput
> #8 Total wall clock time: 1002.0 ms, total callbacks time: 4447 ms, entry
> size 0,034 MB -> 4,45 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #9 Total wall clock time: 1007.0 ms, total callbacks time: 4501 ms, entry
> size 0,034 MB -> 4,50 ms per entry (latency),1,0 ms per entry (throughput)
> 33,9 MB/s throughput
>
> gc.log
> Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc 4.3.0
> 20080428 (Red Hat 4.3.0-8)
> Memory: 4k page, physical 32870636k(9042512k free), swap
> 15622140k(15464700k free)
> CommandLine flags: -XX:InitialHeapSize=2147483648
> -XX:MaxHeapSize=2147483648 -XX:+PrintGC -XX:+PrintGCTimeStamps
> -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC
> 0,830: [GC pause (G1 Evacuation Pause) (young) 118M->33M(2048M), 0,0116376
> secs]
> 1,372: [GC pause (G1 Evacuation Pause) (young) 162M->73M(2048M), 0,0055103
> secs]
> 2,172: [GC pause (G1 Evacuation Pause) (young) 227M->99M(2048M), 0,0023962
> secs]
> 10,358: [GC pause (G1 Evacuation Pause) (young) 1620M->122M(2048M),
> 0,0026519 secs]
> 11,676: [GC pause (Metadata GC Threshold) (young) (initial-mark)
> 242M->114M(2048M), 0,0021797 secs]
> 11,678: [GC concurrent-root-region-scan-start]
> 11,679: [GC concurrent-root-region-scan-end, 0,0007421 secs]
> 11,679: [GC concurrent-mark-start]
> 11,684: [GC concurrent-mark-end, 0,0046645 secs]
> 11,684: [GC remark, 0,0027234 secs]
> 11,687: [GC cleanup 115M->111M(2048M), 0,0008957 secs]
> 11,688: [GC concurrent-cleanup-start]
> 11,688: [GC concurrent-cleanup-end, 0,0000070 secs]
>
>
> With 4.4.0 - v3 protocol - G1 GC
> #0 Total wall clock time: 1001.0 ms, total callbacks time: 6723 ms, entry
> size 0,034 MB -> 6,72 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #1 Total wall clock time: 1003.0 ms, total callbacks time: 4495 ms, entry
> size 0,034 MB -> 4,50 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #2 Total wall clock time: 1001.0 ms, total callbacks time: 4406 ms, entry
> size 0,034 MB -> 4,41 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #3 Total wall clock time: 1005.0 ms, total callbacks time: 4457 ms, entry
> size 0,034 MB -> 4,46 ms per entry (latency),1,0 ms per entry (throughput)
> 34,0 MB/s throughput
> #4 Total wall clock time: 1002.0 ms, total callbacks time: 4388 ms, entry
> size 0,034 MB -> 4,39 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #5 Total wall clock time: 1004.0 ms, total callbacks time: 4451 ms, entry
> size 0,034 MB -> 4,45 ms per entry (latency),1,0 ms per entry (throughput)
> 34,0 MB/s throughput
> #6 Total wall clock time: 1003.0 ms, total callbacks time: 4574 ms, entry
> size 0,034 MB -> 4,57 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #7 Total wall clock time: 1003.0 ms, total callbacks time: 4523 ms, entry
> size 0,034 MB -> 4,52 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #8 Total wall clock time: 1004.0 ms, total callbacks time: 4450 ms, entry
> size 0,034 MB -> 4,45 ms per entry (latency),1,0 ms per entry (throughput)
> 34,0 MB/s throughput
> #9 Total wall clock time: 1002.0 ms, total callbacks time: 4542 ms, entry
> size 0,034 MB -> 4,54 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
>
> gc.log
> Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc 4.3.0
> 20080428 (Red Hat 4.3.0-8)
> Memory: 4k page, physical 32870636k(8931792k free), swap
> 15622140k(15464700k free)
> CommandLine flags: -XX:InitialHeapSize=2147483648
> -XX:MaxHeapSize=2147483648 -XX:+PrintGC -XX:+PrintGCTimeStamps
> -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC
> 0,750: [GC pause (G1 Evacuation Pause) (young) 106M->14M(2048M), 0,0070948
> secs]
> 1,089: [GC pause (G1 Evacuation Pause) (young) 127M->21M(2048M), 0,0038837
> secs]
> 1,608: [GC pause (G1 Evacuation Pause) (young) 193M->41M(2048M), 0,0039903
> secs]
> 3,745: [GC pause (G1 Evacuation Pause) (young) 733M->49M(2048M), 0,0046991
> secs]
> 7,990: [GC pause (G1 Evacuation Pause) (young) 1423M->65M(2048M), 0,0052844
> secs]
>
>
>
> with 4.4 - v3 protocol Parallel GC
>
> #0 Total wall clock time: 999.0 ms, total callbacks time: 6007 ms, entry
> size 0,034 MB -> 6,01 ms per entry (latency),1,0 ms per entry (throughput)
> 34,2 MB/s throughput
> #1 Total wall clock time: 1001.0 ms, total callbacks time: 4813 ms, entry
> size 0,034 MB -> 4,81 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #2 Total wall clock time: 1003.0 ms, total callbacks time: 4580 ms, entry
> size 0,034 MB -> 4,58 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #3 Total wall clock time: 1002.0 ms, total callbacks time: 4349 ms, entry
> size 0,034 MB -> 4,35 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #4 Total wall clock time: 1002.0 ms, total callbacks time: 4456 ms, entry
> size 0,034 MB -> 4,46 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #5 Total wall clock time: 1006.0 ms, total callbacks time: 4504 ms, entry
> size 0,034 MB -> 4,50 ms per entry (latency),1,0 ms per entry (throughput)
> 34,0 MB/s throughput
> #6 Total wall clock time: 1003.0 ms, total callbacks time: 4387 ms, entry
> size 0,034 MB -> 4,39 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #7 Total wall clock time: 1002.0 ms, total callbacks time: 4643 ms, entry
> size 0,034 MB -> 4,64 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #8 Total wall clock time: 1002.0 ms, total callbacks time: 4548 ms, entry
> size 0,034 MB -> 4,55 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #9 Total wall clock time: 1001.0 ms, total callbacks time: 4633 ms, entry
> size 0,034 MB -> 4,63 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
>
> gc.log
> Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc 4.3.0
> 20080428 (Red Hat 4.3.0-8)
> Memory: 4k page, physical 32870636k(8877776k free), swap
> 15622140k(15464700k free)
> CommandLine flags: -XX:InitialHeapSize=2147483648
> -XX:MaxHeapSize=2147483648 -XX:+PrintGC -XX:+PrintGCTimeStamps
> -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
> 0,603: [GC (Allocation Failure)  524800K->8488K(2010112K), 0,0056746 secs]
> 1,996: [GC (Allocation Failure)  533268K->181534K(2010112K), 0,0248460
> secs]
> 3,475: [GC (Allocation Failure)  703337K->237612K(2010112K), 0,0192195
> secs]
> 4,963: [GC (Allocation Failure)  762412K->212913K(2010112K), 0,0085357
> secs]
> 6,473: [GC (Allocation Failure)  737684K->192597K(1951232K), 0,0065437
> secs]
> 8,019: [GC (Allocation Failure)  704597K->319617K(1863680K), 0,0189275
> secs]
> 9,008: [GC (Allocation Failure)  651905K->315463K(1853952K), 0,0121369
> secs]
> 9,997: [GC (Allocation Failure)  647729K->418132K(1848832K), 0,0222260
> secs]
> 10,742: [GC (Allocation Failure)  679764K->340097K(1878528K), 0,0111347
> secs]
>
>
> with 4.5 - v2 protocol Parallel GC
>
> #0 Total wall clock time: 1004.0 ms, total callbacks time: 8030 ms, entry
> size 0,034 MB -> 8,03 ms per entry (latency),1,0 ms per entry (throughput)
> 34,0 MB/s throughput
> #1 Total wall clock time: 1003.0 ms, total callbacks time: 4394 ms, entry
> size 0,034 MB -> 4,39 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #2 Total wall clock time: 1002.0 ms, total callbacks time: 4417 ms, entry
> size 0,034 MB -> 4,42 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #3 Total wall clock time: 1005.0 ms, total callbacks time: 4362 ms, entry
> size 0,034 MB -> 4,36 ms per entry (latency),1,0 ms per entry (throughput)
> 34,0 MB/s throughput
> #4 Total wall clock time: 1003.0 ms, total callbacks time: 4435 ms, entry
> size 0,034 MB -> 4,44 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #5 Total wall clock time: 1002.0 ms, total callbacks time: 4489 ms, entry
> size 0,034 MB -> 4,49 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #6 Total wall clock time: 1002.0 ms, total callbacks time: 4346 ms, entry
> size 0,034 MB -> 4,35 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #7 Total wall clock time: 1004.0 ms, total callbacks time: 4474 ms, entry
> size 0,034 MB -> 4,47 ms per entry (latency),1,0 ms per entry (throughput)
> 34,0 MB/s throughput
> #8 Total wall clock time: 1005.0 ms, total callbacks time: 4436 ms, entry
> size 0,034 MB -> 4,44 ms per entry (latency),1,0 ms per entry (throughput)
> 34,0 MB/s throughput
> #9 Total wall clock time: 1001.0 ms, total callbacks time: 4322 ms, entry
> size 0,034 MB -> 4,32 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
>
> gc.log
> Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc 4.3.0
> 20080428 (Red Hat 4.3.0-8)
> Memory: 4k page, physical 32870636k(8951980k free), swap
> 15622140k(15464700k free)
> CommandLine flags: -XX:InitialHeapSize=2147483648
> -XX:MaxHeapSize=2147483648 -XX:+PrintGC -XX:+PrintGCTimeStamps
> -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
> 0,583: [GC (Allocation Failure)  524800K->8921K(2010112K), 0,0054470 secs]
> 6,397: [GC (Allocation Failure)  533721K->178956K(2010112K), 0,0306493 secs
>
>
> with 4.5 - v3 protocol Parallel GC
>
> #0 Total wall clock time: 1002.0 ms, total callbacks time: 10679 ms, entry
> size 0,034 MB -> 10,68 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #1 Total wall clock time: 1003.0 ms, total callbacks time: 4802 ms, entry
> size 0,034 MB -> 4,80 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #2 Total wall clock time: 1003.0 ms, total callbacks time: 4479 ms, entry
> size 0,034 MB -> 4,48 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #3 Total wall clock time: 1002.0 ms, total callbacks time: 4726 ms, entry
> size 0,034 MB -> 4,73 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #4 Total wall clock time: 1007.0 ms, total callbacks time: 4393 ms, entry
> size 0,034 MB -> 4,39 ms per entry (latency),1,0 ms per entry (throughput)
> 33,9 MB/s throughput
> #5 Total wall clock time: 1006.0 ms, total callbacks time: 4619 ms, entry
> size 0,034 MB -> 4,62 ms per entry (latency),1,0 ms per entry (throughput)
> 34,0 MB/s throughput
> #6 Total wall clock time: 1002.0 ms, total callbacks time: 4346 ms, entry
> size 0,034 MB -> 4,35 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #7 Total wall clock time: 1002.0 ms, total callbacks time: 4402 ms, entry
> size 0,034 MB -> 4,40 ms per entry (latency),1,0 ms per entry (throughput)
> 34,1 MB/s throughput
> #8 Total wall clock time: 1004.0 ms, total callbacks time: 4487 ms, entry
> size 0,034 MB -> 4,49 ms per entry (latency),1,0 ms per entry (throughput)
> 34,0 MB/s throughput
> #9 Total wall clock time: 1007.0 ms, total callbacks time: 4315 ms, entry
> size 0,034 MB -> 4,32 ms per entry (latency),1,0 ms per entry (throughput)
> 33,9 MB/s throughput
>
> gc.log
>
> Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc 4.3.0
> 20080428 (Red Hat 4.3.0-8)
> Memory: 4k page, physical 32870636k(8921388k free), swap
> 15622140k(15464700k free)
> CommandLine flags: -XX:InitialHeapSize=2147483648
> -XX:MaxHeapSize=2147483648 -XX:+PrintGC -XX:+PrintGCTimeStamps
> -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
> 0,563: [GC (Allocation Failure)  524800K->8935K(2010112K), 0,0055378 secs]
> 1,936: [GC (Allocation Failure)  533735K->187494K(2010112K), 0,0281195
> secs]
> 3,917: [GC (Allocation Failure)  712294K->275491K(2010112K), 0,0298168
> secs]
> 6,385: [GC (Allocation Failure)  798141K->394430K(2010112K), 0,0279730
> secs]
> 8,934: [GC (Allocation Failure)  919230K->394530K(2008576K), 0,0127794
> secs]
> 11,634: [GC (Metadata GC Threshold)  811154K->348997K(1868288K), 0,0080161
> secs]
> 11,642: [Full GC (Metadata GC Threshold)  348997K->39874K(1868288K),
> 0,0247462 secs]
>
>
> 4.5 v3 protocol parallel GC throttle = 0
>
> #0 Total wall clock time: 334.0 ms, total callbacks time: 267444 ms, entry
> size 0,034 MB -> 267,44 ms per entry (latency),0,3 ms per entry
> (throughput) 102,3 MB/s throughput
> #1 Total wall clock time: 189.0 ms, total callbacks time: 154428 ms, entry
> size 0,034 MB -> 154,43 ms per entry (latency),0,2 ms per entry
> (throughput) 180,8 MB/s throughput
> #2 Total wall clock time: 136.0 ms, total callbacks time: 103012 ms, entry
> size 0,034 MB -> 103,01 ms per entry (latency),0,1 ms per entry
> (throughput) 251,3 MB/s throughput
> #3 Total wall clock time: 156.0 ms, total callbacks time: 113315 ms, entry
> size 0,034 MB -> 113,32 ms per entry (latency),0,2 ms per entry
> (throughput) 219,1 MB/s throughput
> #4 Total wall clock time: 110.0 ms, total callbacks time: 79296 ms, entry
> size 0,034 MB -> 79,30 ms per entry (latency),0,1 ms per entry (throughput)
> 310,7 MB/s throughput
> #5 Total wall clock time: 104.0 ms, total callbacks time: 71213 ms, entry
> size 0,034 MB -> 71,21 ms per entry (latency),0,1 ms per entry (throughput)
> 328,7 MB/s throughput
> #6 Total wall clock time: 161.0 ms, total callbacks time: 72447 ms, entry
> size 0,034 MB -> 72,45 ms per entry (latency),0,2 ms per entry (throughput)
> 212,3 MB/s throughput
> #7 Total wall clock time: 103.0 ms, total callbacks time: 82783 ms, entry
> size 0,034 MB -> 82,78 ms per entry (latency),0,1 ms per entry (throughput)
> 331,8 MB/s throughput
> #8 Total wall clock time: 138.0 ms, total callbacks time: 106086 ms, entry
> size 0,034 MB -> 106,09 ms per entry (latency),0,1 ms per entry
> (throughput) 247,7 MB/s throughput
> #9 Total wall clock time: 118.0 ms, total callbacks time: 77750 ms, entry
> size 0,034 MB -> 77,75 ms per entry (latency),0,1 ms per entry (throughput)
> 289,7 MB/s throughput
>
> Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc 4.3.0
> 20080428 (Red Hat 4.3.0-8)
> Memory: 4k page, physical 32870636k(8858832k free), swap
> 15622140k(15464700k free)
> CommandLine flags: -XX:InitialHeapSize=2147483648
> -XX:MaxHeapSize=2147483648 -XX:+PrintGC -XX:+PrintGCTimeStamps
> -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
> 0,660: [GC (Allocation Failure)  524800K->8955K(2010112K), 0,0061442 secs]
> 1,288: [GC (Allocation Failure)  533726K->191538K(2010112K), 0,0329663
> secs]
> 1,634: [GC (Allocation Failure)  716338K->349468K(2010112K), 0,0508922
> secs]
> 1,942: [GC (Allocation Failure)  874268K->671717K(2010112K), 0,0609749
> secs]
> 2,266: [GC (Allocation Failure)  1196517K->781335K(2010112K), 0,0320305
> secs]
>
> 4.5 v2protocol throttle = 0 G1 GC
>
> #0 Total wall clock time: 200.0 ms, total callbacks time: 173682 ms, entry
> size 0,034 MB -> 173,68 ms per entry (latency),0,2 ms per entry
> (throughput) 170,9 MB/s throughput
> #1 Total wall clock time: 128.0 ms, total callbacks time: 89987 ms, entry
> size 0,034 MB -> 89,99 ms per entry (latency),0,1 ms per entry (throughput)
> 267,0 MB/s throughput
> #2 Total wall clock time: 117.0 ms, total callbacks time: 86411 ms, entry
> size 0,034 MB -> 86,41 ms per entry (latency),0,1 ms per entry (throughput)
> 292,1 MB/s throughput
> #3 Total wall clock time: 95.0 ms, total callbacks time: 64054 ms, entry
> size 0,034 MB -> 64,05 ms per entry (latency),0,1 ms per entry (throughput)
> 359,8 MB/s throughput
> #4 Total wall clock time: 92.0 ms, total callbacks time: 60863 ms, entry
> size 0,034 MB -> 60,86 ms per entry (latency),0,1 ms per entry (throughput)
> 371,5 MB/s throughput
> #5 Total wall clock time: 92.0 ms, total callbacks time: 62026 ms, entry
> size 0,034 MB -> 62,03 ms per entry (latency),0,1 ms per entry (throughput)
> 371,5 MB/s throughput
> #6 Total wall clock time: 93.0 ms, total callbacks time: 65033 ms, entry
> size 0,034 MB -> 65,03 ms per entry (latency),0,1 ms per entry (throughput)
> 367,5 MB/s throughput
> #7 Total wall clock time: 88.0 ms, total callbacks time: 59851 ms, entry
> size 0,034 MB -> 59,85 ms per entry (latency),0,1 ms per entry (throughput)
> 388,4 MB/s throughput
> #8 Total wall clock time: 93.0 ms, total callbacks time: 61816 ms, entry
> size 0,034 MB -> 61,82 ms per entry (latency),0,1 ms per entry (throughput)
> 367,5 MB/s throughput
> #9 Total wall clock time: 93.0 ms, total callbacks time: 70397 ms, entry
> size 0,034 MB -> 70,40 ms per entry (latency),0,1 ms per entry (throughput)
> 367,5 MB/s throughput
>
>
> Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc 4.3.0
> 20080428 (Red Hat 4.3.0-8)
> Memory: 4k page, physical 32870636k(8894828k free), swap
> 15622140k(15464700k free)
> CommandLine flags: -XX:InitialHeapSize=2147483648
> -XX:MaxHeapSize=2147483648 -XX:+PrintGC -XX:+PrintGCTimeStamps
> -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC
> 1,078: [GC pause (G1 Evacuation Pause) (young) 150M->65M(2048M), 0,0138100
> secs]
>
> bk 4.5 v3 protocol G1 GC throttle = 0
>
> #0 Total wall clock time: 275.0 ms, total callbacks time: 238043 ms, entry
> size 0,034 MB -> 238,04 ms per entry (latency),0,3 ms per entry
> (throughput) 124,3 MB/s throughput
> #1 Total wall clock time: 148.0 ms, total callbacks time: 106877 ms, entry
> size 0,034 MB -> 106,88 ms per entry (latency),0,1 ms per entry
> (throughput) 230,9 MB/s throughput
> #2 Total wall clock time: 130.0 ms, total callbacks time: 96448 ms, entry
> size 0,034 MB -> 96,45 ms per entry (latency),0,1 ms per entry (throughput)
> 262,9 MB/s throughput
> #3 Total wall clock time: 118.0 ms, total callbacks time: 78246 ms, entry
> size 0,034 MB -> 78,25 ms per entry (latency),0,1 ms per entry (throughput)
> 289,7 MB/s throughput
> #4 Total wall clock time: 109.0 ms, total callbacks time: 79110 ms, entry
> size 0,034 MB -> 79,11 ms per entry (latency),0,1 ms per entry (throughput)
> 313,6 MB/s throughput
> #5 Total wall clock time: 124.0 ms, total callbacks time: 86652 ms, entry
> size 0,034 MB -> 86,65 ms per entry (latency),0,1 ms per entry (throughput)
> 275,6 MB/s throughput
> #6 Total wall clock time: 110.0 ms, total callbacks time: 76729 ms, entry
> size 0,034 MB -> 76,73 ms per entry (latency),0,1 ms per entry (throughput)
> 310,7 MB/s throughput
> #7 Total wall clock time: 110.0 ms, total callbacks time: 80104 ms, entry
> size 0,034 MB -> 80,10 ms per entry (latency),0,1 ms per entry (throughput)
> 310,7 MB/s throughput
> #8 Total wall clock time: 104.0 ms, total callbacks time: 73514 ms, entry
> size 0,034 MB -> 73,51 ms per entry (latency),0,1 ms per entry (throughput)
> 328,7 MB/s throughput
> #9 Total wall clock time: 102.0 ms, total callbacks time: 74487 ms, entry
> size 0,034 MB -> 74,49 ms per entry (latency),0,1 ms per entry (throughput)
> 335,1 MB/s throughput
>
> Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc 4.3.0
> 20080428 (Red Hat 4.3.0-8)
> Memory: 4k page, physical 32870636k(8958604k free), swap
> 15622140k(15464700k free)
> CommandLine flags: -XX:InitialHeapSize=2147483648
> -XX:MaxHeapSize=2147483648 -XX:+PrintGC -XX:+PrintGCTimeStamps
> -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC
> 0,876: [GC pause (G1 Evacuation Pause) (young) 102M->19M(2048M), 0,0107143
> secs]
> 0,989: [GC pause (G1 Evacuation Pause) (young) 152M->81M(2048M), 0,0074561
> secs]
> 1,125: [GC pause (G1 Evacuation Pause) (young) 198M->108M(2048M), 0,0038296
> secs]
> 1,286: [GC pause (G1 Evacuation Pause) (young) 306M->146M(2048M), 0,0034297
> secs]
>
>
>
> 2017-07-12 3:07 GMT+02:00 Sijie Guo <[email protected]>:
>
> > On Wed, Jul 12, 2017 at 2:04 AM, Venkateswara Rao Jujjuri <
> > [email protected]
> > > wrote:
> >
> > > Enrico, let me try to paraphrase the issue.
> > >
> > > - With G1GC + Netty 4.1 is giving you roughly same perf as prev
> release.
> > Is
> > > that accurate statement?
> > > But you are still seeing latency spikes with Netty 4.1??
> >
> >
> >
> >
> >
> > > I did not fully
> > > understand your sleep usecase.
> > > How sleep is yielding better latency?
> > >
> >
> > I think the "sleep(1)" in Enrico's benchmark is for throttling. It is
> same
> > as setting 'throttle' value to 1000 in his test.
> >
> > - Sijie
> >
> >
> >
> > >
> > > Thanks,
> > > JV
> > >
> > > On Tue, Jul 11, 2017 at 8:27 AM, Enrico Olivelli <[email protected]>
> > > wrote:
> > >
> > > > Another interesting thing...during my profiling activity I gave a
> > chance
> > > to
> > > > the old v2 protocol and activated the gc logs, as expected the result
> > is
> > > > that with v2 protocol there is almost no GC activity during the
> > benchmark
> > > > -- Enrico
> > > >
> > > > 2017-07-11 12:07 GMT+02:00 Enrico Olivelli <[email protected]>:
> > > >
> > > > >
> > > > >
> > > > > 2017-07-11 11:04 GMT+02:00 Sijie Guo <[email protected]>:
> > > > >
> > > > >> I think Netty4 requires more offheap memory. you might need to
> tune
> > > the
> > > > >> JVM
> > > > >> settings. I doubt that latency diff coming from the JVM gc.
> > > > >>
> > > > >> A simple thing to verify that is to dump the gc log by adding "
> > > > -Xloggc:"
> > > > >> setting and compare the gc logs between versions.
> > > > >>
> > > > >
> > > > >
> > > > > Enabling G1 GC (-XX:+UseG1GC) apparently solves the difference
> > between
> > > > 4.4
> > > > > and 4.5.
> > > > >
> > > > > And I a loop the bench (in order to have a warm JVM) numbers become
> > > more
> > > > > stable and similar to each other
> > > > >
> > > > > I have the "other issue" pending, the fact that latency (time for
> > > > > asynchAddEntry to complete) is really bad and unpredictable (from
> 100
> > > ms
> > > > to
> > > > > 2000ms)
> > > > >
> > > > > and if I introduce a Thread.sleep(1) all the callbacks complete
> > always
> > > > > with success in < 10ms. This happens even with G1 GC.
> > > > >
> > > > > Without the "sleep(1)" my machine uses a lot of CPU (I have 8
> "CPUs")
> > > and
> > > > > with the "sleep(1)" the load is slightly lower
> > > > >
> > > > > Honestly for me this is still a problem and I hope that with your
> > help
> > > I
> > > > > will be able to find the problem, wherever it is (in BK code or in
> > the
> > > > way
> > > > > I am doing the bench)
> > > > >
> > > > > I will try to create a new more complete bench
> > > > >
> > > > > -- Enrico
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >>
> > > > >> - Sijie
> > > > >>
> > > > >> On Tue, Jul 11, 2017 at 12:16 AM, Enrico Olivelli <
> > > [email protected]>
> > > > >> wrote:
> > > > >>
> > > > >> > a did a bisect and the culprit (in my opinion) is the switch to
> > > netty
> > > > 4
> > > > >> for
> > > > >> > the performance regression from 4.5 and 4.4
> > > > >> >
> > > > >> > at commit:
> > > > >> > commit 811ece53a1c975c4e768422f3d622ac9de6b3e41
> > BOOKKEEPER-1058:
> > > > >> Ignore
> > > > >> > already deleted ledger on replication audit
> > > > >> >
> > > > >> > Total time: 204 ms
> > > > >> > Total real time: 79 ms per entry
> > > > >> >
> > > > >> > at commit:
> > > > >> > commit 74f795136c1fff3badb29fc982d0cc2d43096b45
> BOOKKEEPER-1008:
> > > > Netty
> > > > >> 4.1
> > > > >> >
> > > > >> > Total time: 308 ms
> > > > >> > Total real time: 189 ms per entry
> > > > >> >
> > > > >> > I have tried with epoll and with local transport, results does
> not
> > > > >> change.
> > > > >> > I tried to upgrade to netty 4.1.13 too, but no change
> > > > >> >
> > > > >> > Could it be  the memory allocator of netty which is overwhelmed
> > with
> > > > >> sudden
> > > > >> > bursts of allocation ?
> > > > >> > I did some trial with UnpooledByteBufAllocator.DEFAULT and it
> > > helps a
> > > > >> > little, we get to "110 ms per entry" vs "189 ms per entry"
> > > > >> >
> > > > >> > the bench is here:
> > > > >> > https://github.com/eolivelli/bookkeepers-benchs/blob/master/
> > > > >> src/test/java/
> > > > >> > BookKeeperWriteTest.java
> > > > >> >
> > > > >> >
> > > > >> > -- Enrico
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > 2017-07-10 19:46 GMT+02:00 Enrico Olivelli <[email protected]
> >:
> > > > >> >
> > > > >> > >
> > > > >> > >
> > > > >> > > Il lun 10 lug 2017, 18:21 Venkateswara Rao Jujjuri <
> > > > [email protected]
> > > > >> >
> > > > >> > ha
> > > > >> > > scritto:
> > > > >> > >
> > > > >> > >> With Netty changes, lack of native epoll() has huge perf
> impact
> > > as
> > > > >> per
> > > > >> > >> Kishore.
> > > > >> > >> Are you sure you are using epoll()?
> > > > >> > >>
> > > > >> > >
> > > > >> > > Yes. I tried with netty local transport too. It seems not
> > related
> > > to
> > > > >> > netty
> > > > >> > > to me.
> > > > >> > > I will double check, tomorrow
> > > > >> > > Enrico
> > > > >> > >
> > > > >> > >
> > > > >> > >> On Mon, Jul 10, 2017 at 1:49 AM, Enrico Olivelli <
> > > > >> [email protected]>
> > > > >> > >> wrote:
> > > > >> > >>
> > > > >> > >> > 2017-07-10 10:40 GMT+02:00 Sijie Guo <[email protected]>:
> > > > >> > >> >
> > > > >> > >> > > Also one other thing to check is the JVM settings. Do you
> > > mind
> > > > >> > sharing
> > > > >> > >> > that
> > > > >> > >> > > as well?
> > > > >> > >> > >
> > > > >> > >> > >
> > > > >> > >> > this is the surefire config, I am using oracle jdk 8
> > > > >> > >> >
> > > > >> > >> >              <plugin>
> > > > >> > >> >                 <artifactId>maven-surefire-
> > plugin</artifactId>
> > > > >> > >> >                 <version>2.20</version>
> > > > >> > >> >                 <configuration>
> > > > >> > >> >                     <forkCount>1</forkCount>
> > > > >> > >> >                     <reuseForks>false</reuseForks>
> > > > >> > >> >
> > > > >> > >> > <forkedProcessTimeoutInSeconds>300</
> > > > forkedProcessTimeoutInSeconds>
> > > > >> > >> >                     <argLine>-Xmx2G
> > > > >> > >> > -Djava.io.tmpdir=${basedir}/target</argLine>
> > > > >> > >> >                 </configuration>
> > > > >> > >> >             </plugin>
> > > > >> > >> >
> > > > >> > >> > -- Enrico
> > > > >> > >> >
> > > > >> > >> >
> > > > >> > >> >
> > > > >> > >> > > Sijie
> > > > >> > >> > >
> > > > >> > >> > > On Jul 10, 2017 1:17 AM, "Sijie Guo" <[email protected]
> >
> > > > wrote:
> > > > >> > >> > >
> > > > >> > >> > > > I am not sure if there is any default values changed
> for
> > > > >> journal
> > > > >> > >> > > settings.
> > > > >> > >> > > > I would suggest you testing by setting specifically the
> > > > journal
> > > > >> > >> > settings.
> > > > >> > >> > > >
> > > > >> > >> > > > Also if you can share your benchmark, that would be
> good
> > > for
> > > > >> other
> > > > >> > >> > people
> > > > >> > >> > > > to verify.
> > > > >> > >> > > >
> > > > >> > >> > > > Sijie
> > > > >> > >> > > >
> > > > >> > >> > > > On Jul 10, 2017 12:32 AM, "Enrico Olivelli" <
> > > > >> [email protected]>
> > > > >> > >> > wrote:
> > > > >> > >> > > >
> > > > >> > >> > > >> Hi,
> > > > >> > >> > > >> I am doing some benchmarks on BK, I see that from
> 4.4.0
> > to
> > > > >> 4.5.0
> > > > >> > >> there
> > > > >> > >> > > is
> > > > >> > >> > > >> something "slow" but I cannot understand what. I
> really
> > > hope
> > > > >> that
> > > > >> > >> I am
> > > > >> > >> > > >> wrong.
> > > > >> > >> > > >>
> > > > >> > >> > > >> I am working with writes, I will pass to reads once
> > writes
> > > > >> will
> > > > >> > be
> > > > >> > >> ok.
> > > > >> > >> > > >> My problem is both on latency (time for AddComplete
> > > callback
> > > > >> to
> > > > >> > >> > > complete)
> > > > >> > >> > > >> and on overall throuput.
> > > > >> > >> > > >>
> > > > >> > >> > > >> Actually I have two distinct problems, but working on
> > the
> > > > >> first
> > > > >> > >> > problem
> > > > >> > >> > > I
> > > > >> > >> > > >> found a performance regression.
> > > > >> > >> > > >> I know that talking about "slow" things it is an hard
> > > > matter,
> > > > >> so
> > > > >> > I
> > > > >> > >> > will
> > > > >> > >> > > >> try
> > > > >> > >> > > >> do describe as much as possible all the aspects that I
> > > think
> > > > >> are
> > > > >> > >> > > relevant.
> > > > >> > >> > > >>
> > > > >> > >> > > >> First problem: under certain load performance
> > > > >> > (latency+throughput)
> > > > >> > >> > > degrade
> > > > >> > >> > > >> too much
> > > > >> > >> > > >> Second problem: the first problem is more evident in
> > 4.5.0
> > > > >> > >> > > >>
> > > > >> > >> > > >> Let's describe my testcase and why I am worried.
> > > > >> > >> > > >> The bench issues a batch of asyncAddEntry and prints
> the
> > > > >> average
> > > > >> > >> time
> > > > >> > >> > > for
> > > > >> > >> > > >> AddComplete to complete and the overall clock time.
> > > > >> > >> > > >>
> > > > >> > >> > > >> This is the code
> > > > >> > >> > > >>
> > > > >> > >> > > >> private static final byte[] TEST_DATA = new byte[35 *
> > > 1024];
> > > > >> > >> > > >> private static final int testsize = 1000;
> > > > >> > >> > > >>
> > > > >> > >> > > >> ...... (start 1 bookie, see below)
> > > > >> > >> > > >>             ClientConfiguration clientConfiguration =
> > new
> > > > >> > >> > > >> ClientConfiguration();
> > > > >> > >> > > >>             clientConfiguration.setZkServ
> > > > >> ers(env.getAddress());
> > > > >> > >> > > >>             try (BookKeeper bk = new BookKeeper(
> > > > >> > >> clientConfiguration);
> > > > >> > >> > > >>                 LedgerHandle lh = bk.createLedger(1,
> 1,
> > 1,
> > > > >> > >> > > >> BookKeeper.DigestType.CRC32, new byte[0])) {
> > > > >> > >> > > >>                 LongAdder totalTime = new LongAdder();
> > > > >> > >> > > >>                 long _start =
> > System.currentTimeMillis();
> > > > >> > >> > > >>                 Collection<CompletableFuture> batch =
> > new
> > > > >> > >> > > >> ConcurrentLinkedQueue<>();
> > > > >> > >> > > >>                 for (int i = 0; i < testsize; i++) {
> > > > >> > >> > > >>                     CompletableFuture cf = new
> > > > >> > CompletableFuture();
> > > > >> > >> > > >>                     batch.add(cf);
> > > > >> > >> > > >>                     lh.asyncAddEntry(TEST_DATA, new
> > > > >> > >> > > >> AsyncCallback.AddCallback() {
> > > > >> > >> > > >>
> > > > >> > >> > > >>                         long start =
> > > > >> System.currentTimeMillis();
> > > > >> > >> > > >>
> > > > >> > >> > > >>                         @Override
> > > > >> > >> > > >>                         public void addComplete(int
> rc,
> > > > >> > >> LedgerHandle
> > > > >> > >> > lh,
> > > > >> > >> > > >> long entryId, Object ctx) {
> > > > >> > >> > > >>                             long now =
> > > > >> > >> > > >> System.currentTimeMillis();
> > > > >> > >> > > >>                             CompletableFuture _cf =
> > > > >> > >> > (CompletableFuture)
> > > > >> > >> > > >> ctx;
> > > > >> > >> > > >>                             if (rc ==
> > > BKException.Code.OK) {
> > > > >> > >> > > >>                                 _cf.complete("");
> > > > >> > >> > > >>                             } else {
> > > > >> > >> > > >>
> > > > >> > >> > > >> _cf.completeExceptionally(BKException.create(rc));
> > > > >> > >> > > >>                             }
> > > > >> > >> > > >>                             totalTime.add(now -
> start);
> > > > >> > >> > > >>                         }
> > > > >> > >> > > >>                     }, cf);
> > > > >> > >> > > >> //                    Thread.sleep(1);      // this is
> > the
> > > > >> > >> tirgger!!!
> > > > >> > >> > > >>                 }
> > > > >> > >> > > >>                 assertEquals(testsize, batch.size());
> > > > >> > >> > > >>                 for (CompletableFuture f : batch) {
> > > > >> > >> > > >>                     f.get();
> > > > >> > >> > > >>                 }
> > > > >> > >> > > >>                 long _stop =
> System.currentTimeMillis();
> > > > >> > >> > > >>                 long delta = _stop - _start;
> > > > >> > >> > > >>                 System.out.println("Total time: " +
> > delta
> > > +
> > > > "
> > > > >> > ms");
> > > > >> > >> > > >>                 System.out.println("Total real time:
> " +
> > > > >> > >> > > totalTime.sum() +
> > > > >> > >> > > >> " ms -> "+(totalTime.sum()/testsize)+" ms per entry");
> > > > >> > >> > > >>             }
> > > > >> > >> > > >>
> > > > >> > >> > > >> Bookie config:
> > > > >> > >> > > >>         ServerConfiguration conf = new
> > > > ServerConfiguration();
> > > > >> > >> > > >>         conf.setBookiePort(5621);
> > > > >> > >> > > >>         conf.setUseHostNameAsBookieID(true);
> > > > >> > >> > > >>
> > > > >> > >> > > >>         Path targetDir = path.resolve("bookie_data");
> > > > >> > >> > > >>         conf.setZkServers("localhost:1282");
> > > > >> > >> > > >>         conf.setLedgerDirNames(new
> > > > >> > >> > > >> String[]{targetDir.toAbsolutePath().toString()});
> > > > >> > >> > > >>         conf.setJournalDirName(
> > > targetDir.toAbsolutePath().
> > > > >> > >> > toString());
> > > > >> > >> > > >>         conf.setFlushInterval(1000);
> > > > >> > >> > > >>         conf.setJournalFlushWhenQueueEmpty(true);
> > > > >> > >> > > >>         conf.setProperty("journalMaxGroupWaitMSec",
> 0);
> > > > >> > >> > > >>         conf.setProperty("
> > journalBufferedWritesThreshold
> > > ",
> > > > >> > 1024);
> > > > >> > >> > > >>         conf.setAutoRecoveryDaemonEnabled(false);
> > > > >> > >> > > >>         conf.setEnableLocalTransport(true);
> > > > >> > >> > > >>         conf.setAllowLoopback(true);
> > > > >> > >> > > >>
> > > > >> > >> > > >> The tests starts one ZK server + 1 Bookie + the
> testcase
> > > in
> > > > a
> > > > >> > JUnit
> > > > >> > >> > test
> > > > >> > >> > > >>
> > > > >> > >> > > >>
> > > > >> > >> > > >> Results:
> > > > >> > >> > > >> A - BK-4.4.0:
> > > > >> > >> > > >> Total time: 209 ms
> > > > >> > >> > > >> Total real time: 194337 ms -> 194 ms per entry
> > > > >> > >> > > >>
> > > > >> > >> > > >> B - BK-4.5.0-SNAPSHOT:
> > > > >> > >> > > >> Total time: 269 ms
> > > > >> > >> > > >> Total real time: 239918 ms -> 239 ms per entry
> > > > >> > >> > > >>
> > > > >> > >> > > >> C - BK-4.4,0 with sleep(1):
> > > > >> > >> > > >> Total time: 1113 ms (1000 ms sleep time)
> > > > >> > >> > > >> Total real time: 4238 ms  -> 4 ms per entry
> > > > >> > >> > > >>
> > > > >> > >> > > >> D - BK-4.5,0-SNAPSHOT with sleep(1):
> > > > >> > >> > > >> Total time: 1121 ms (1000 ms sleep time)
> > > > >> > >> > > >> Total real time: 8018 ms -> 8 ms per entry
> > > > >> > >> > > >>
> > > > >> > >> > > >> Problem 1 (unexpected performance degradation):
> > > > >> > >> > > >> Times per entry (latency) are incredibly slow in
> cases A
> > > and
> > > > >> B.
> > > > >> > >> > > >> If I add a sleep(1) between one call of asyncAddEntry
> > and
> > > > the
> > > > >> > next
> > > > >> > >> > > >> "latency" is around 4 ms per entry.
> > > > >> > >> > > >>
> > > > >> > >> > > >> Problem 2: worse performance on 4.5.0
> > > > >> > >> > > >> Compare A vs B and C vs D, it is self-explaining.
> > > > >> > >> > > >>
> > > > >> > >> > > >> I am running the test on my laptop, with linux 64bit
> > > > >> (Fedora), 12
> > > > >> > >> GB
> > > > >> > >> > > RAM,
> > > > >> > >> > > >> no swap, on an SSD disk. The results are similar on
> > other
> > > > >> > >> computers.
> > > > >> > >> > > >>
> > > > >> > >> > > >> It seems that if I issue too many addEntry the systems
> > > slows
> > > > >> > down.
> > > > >> > >> > > >>
> > > > >> > >> > > >> Please note this fact:
> > > > >> > >> > > >> numbers for case A and B (without sleep) mean that all
> > the
> > > > >> adds
> > > > >> > got
> > > > >> > >> > > >> completed almost together
> > > > >> > >> > > >>
> > > > >> > >> > > >> for the 4.5 vs 4.4 case:
> > > > >> > >> > > >> I tried to disable all of the threadpool enhancements
> > > > >> (different
> > > > >> > >> > > >> read/write
> > > > >> > >> > > >> pools)....it makes not difference
> > > > >> > >> > > >>
> > > > >> > >> > > >> Questions:
> > > > >> > >> > > >>
> > > > >> > >> > > >> Is the "grouping" logic of the journal ?
> > > > >> > >> > > >>
> > > > >> > >> > > >> Is there a way of making a burst of 1000 async writes
> on
> > > the
> > > > >> same
> > > > >> > >> > ledger
> > > > >> > >> > > >> perform <10 ms latency ?  (in my real case I have
> bursts
> > > of
> > > > >> > >> concurrent
> > > > >> > >> > > >> writes from different threads)
> > > > >> > >> > > >>
> > > > >> > >> > > >> Why 4.5.0 is anyway slower ?
> > > > >> > >> > > >>
> > > > >> > >> > > >> Thanks
> > > > >> > >> > > >>
> > > > >> > >> > > >> -- Enrico
> > > > >> > >> > > >>
> > > > >> > >> > > >
> > > > >> > >> > >
> > > > >> > >> >
> > > > >> > >>
> > > > >> > >>
> > > > >> > >>
> > > > >> > >> --
> > > > >> > >> Jvrao
> > > > >> > >> ---
> > > > >> > >> First they ignore you, then they laugh at you, then they
> fight
> > > you,
> > > > >> then
> > > > >> > >> you win. - Mahatma Gandhi
> > > > >> > >>
> > > > >> > > --
> > > > >> > >
> > > > >> > >
> > > > >> > > -- Enrico Olivelli
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Jvrao
> > > ---
> > > First they ignore you, then they laugh at you, then they fight you,
> then
> > > you win. - Mahatma Gandhi
> > >
> >
>

Reply via email to