[ https://issues.apache.org/jira/browse/STORM-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711675#comment-14711675 ]
ASF GitHub Bot commented on STORM-855:
--------------------------------------

Github user mjsax commented on the pull request:

https://github.com/apache/storm/pull/694#issuecomment-134682336

Here are some additional benchmark results with larger `--messageSize` (i.e., 100 and 250). These benchmarks were run on a 12-node cluster (with `nimbus.thrift.max_buffer_size: 33554432` and `worker.childopts: "-Xmx2g"`) as follows:

`storm jar target/storm_perf_test-1.0.0-SNAPSHOT-jar-with-dependencies.jar com.yahoo.storm.perftest.Main --name test -l 1 -n 1 --messageSize 100 --workers 24 --spout 1 --bolt 10 --testTimeSec 300`

As you can observe, the output rate is slightly reduced for the larger tuple sizes in all cases, which is expected due to the higher serialization costs. The batching case with batch size 100 and `--messageSize 250` does not improve the output rate as significantly as before, because it hits the underlying 1 Gbit Ethernet limit. In the beginning it looks promising, but after about two minutes performance goes down: because tuples cannot be transferred over the network fast enough, they get buffered in main memory, which reaches its limit (assumption), and thus Storm slows down the spout. I am not sure why the throughput keeps dropping with every report, though. Any ideas?

`--messageSize 100`

**master branch**
```
status topologies totalSlots slotsUsed totalExecutors executorsWithMetrics time time-diff ms transferred throughput (MB/s)
WAITING 1 144 0 11 0 1440523358523 0 0 0.0
WAITING 1 144 11 11 11 1440523388523 30000 6336040 20.14172871907552
WAITING 1 144 11 11 11 1440523418523 30000 10554200 33.55089823404948
RUNNING 1 144 11 11 11 1440523448523 30000 10005840 31.807708740234375
RUNNING 1 144 11 11 11 1440523478523 30000 10514120 33.4234873453776
RUNNING 1 144 11 11 11 1440523508523 30000 10430700 33.158302307128906
RUNNING 1 144 11 11 11 1440523538523 30000 7608960 24.188232421875
RUNNING 1 144 11 11 11 1440523568523 30000 10279460 32.67752329508463
RUNNING 1 144 11 11 11 1440523598524 30001 10496260 33.36559974774929
RUNNING 1 144 11 11 11 1440523628523 29999 10382860 33.00732328691556
RUNNING 1 144 11 11 11 1440523658523 30000 10138280 32.22872416178385
RUNNING 1 144 11 11 11 1440523688523 30000 10072940 32.02101389567057
RUNNING 1 144 11 11 11 1440523718523 30000 10095820 32.09374745686849
```

**batching branch (no batching)**
```
status topologies totalSlots slotsUsed totalExecutors executorsWithMetrics time time-diff ms transferred throughput (MB/s)
WAITING 1 144 0 11 0 1440520763917 0 0 0.0
WAITING 1 144 11 11 11 1440520793917 30000 4467900 14.203071594238281
WAITING 1 144 11 11 11 1440520823917 30000 10616160 33.74786376953125
RUNNING 1 144 11 11 11 1440520853917 30000 10473700 33.294995625813804
RUNNING 1 144 11 11 11 1440520883917 30000 10556860 33.55935414632162
RUNNING 1 144 11 11 11 1440520913917 30000 10580760 33.63533020019531
RUNNING 1 144 11 11 11 1440520943917 30000 10367580 32.95764923095703
RUNNING 1 144 11 11 11 1440520973917 30000 10646760 33.84513854980469
RUNNING 1 144 11 11 11 1440521003917 30000 10750300 34.17428334554037
RUNNING 1 144 11 11 11 1440521033917 30000 10607220 33.719444274902344
RUNNING 1 144 11 11 11 1440521063917 30000 10456920 33.24165344238281
RUNNING 1 144 11 11 11 1440521093917 30000 10108000 32.132466634114586
RUNNING 1 144 11 11 11 1440521123917 30000 10576120 33.6205800374349
```

**batching branch (batch size: 100)**
```
status topologies totalSlots slotsUsed totalExecutors executorsWithMetrics time time-diff ms transferred throughput (MB/s)
WAITING 1 144 0 11 0 1440521937049 0 0 0.0
WAITING 1 144 11 11 11 1440521967049 30000 11346480 36.069488525390625
WAITING 1 144 11 11 11 1440521997050 30001 17333500 55.0998758822297
RUNNING 1 144 11 11 11 1440522027049 29999 17815260 56.635074176137906
RUNNING 1 144 11 11 11 1440522057049 30000 17993660 57.200304667154946
RUNNING 1 144 11 11 11 1440522087049 30000 17720880 56.333160400390625
RUNNING 1 144 11 11 11 1440522117050 30001 17957200 57.08249869861108
RUNNING 1 144 11 11 11 1440522147049 29999 18286500 58.13315572840058
RUNNING 1 144 11 11 11 1440522177050 30001 18027820 57.306986149778076
RUNNING 1 144 11 11 11 1440522207049 29999 12470520 39.64403692199896
RUNNING 1 144 11 11 11 1440522237049 30000 17256760 54.8577626546224
RUNNING 1 144 11 11 11 1440522267049 30000 17288300 54.95802561442057
RUNNING 1 144 11 11 11 1440522297049 30000 17077300 54.28727467854818
```

`--messageSize 250`

**master branch**
```
status topologies totalSlots slotsUsed totalExecutors executorsWithMetrics time time-diff ms transferred throughput (MB/s)
WAITING 1 144 0 11 0 1440523772498 0 0 0.0
WAITING 1 144 11 11 11 1440523802498 30000 6047420 48.06057612101237
WAITING 1 144 11 11 11 1440523832498 30000 9909100 78.7504514058431
RUNNING 1 144 11 11 11 1440523862498 30000 9846400 78.25215657552083
RUNNING 1 144 11 11 11 1440523892498 30000 7100940 56.43320083618164
RUNNING 1 144 11 11 11 1440523922498 30000 9870500 78.4436861673991
RUNNING 1 144 11 11 11 1440523952498 30000 9856460 78.33210627237956
RUNNING 1 144 11 11 11 1440523982498 30000 9662740 76.79255803426106
RUNNING 1 144 11 11 11 1440524012498 30000 10060860 79.9565315246582
RUNNING 1 144 11 11 11 1440524042499 30001 10049660 79.86485975980162
RUNNING 1 144 11 11 11 1440524072498 29999 9937680 78.98021751278428
RUNNING 1 144 11 11 11 1440524102498 30000 10061600 79.96241251627605
RUNNING 1 144 11 11 11 1440524132498 30000 9804960 77.92282104492188
```

**batching branch (no batching)**
```
status topologies totalSlots slotsUsed totalExecutors executorsWithMetrics time time-diff ms transferred throughput (MB/s)
WAITING 1 144 0 11 0 1440521276880 0 0 0.0
WAITING 1 144 11 11 11 1440521306880 30000 4983420 39.60466384887695
WAITING 1 144 11 11 11 1440521336880 30000 9760600 77.57027943929036
RUNNING 1 144 11 11 11 1440521366881 30001 9344540 74.26125626338171
RUNNING 1 144 11 11 11 1440521396880 29999 9323920 74.10232867951068
RUNNING 1 144 11 11 11 1440521426880 30000 9354460 74.3425687154134
RUNNING 1 144 11 11 11 1440521456880 30000 9601440 76.30538940429688
RUNNING 1 144 11 11 11 1440521486880 30000 9476900 75.3156344095866
RUNNING 1 144 11 11 11 1440521516880 30000 9581560 76.14739735921223
RUNNING 1 144 11 11 11 1440521546881 30001 9494500 75.4529915429414
RUNNING 1 144 11 11 11 1440521576880 29999 9260020 73.59448017774095
RUNNING 1 144 11 11 11 1440521606880 30000 9054000 71.95472717285156
RUNNING 1 144 11 11 11 1440521636880 30000 9278340 73.73762130737305
RUNNING 1 144 11 11 11 1440521666880 30000 9153180 72.74293899536133
```

**batching branch (batch size: 100)**
```
status topologies totalSlots slotsUsed totalExecutors executorsWithMetrics time time-diff ms transferred throughput (MB/s)
WAITING 1 144 0 11 0 1440522380492 0 0 0.0
WAITING 1 144 11 11 11 1440522410492 30000 10616760 84.37442779541016
WAITING 1 144 11 11 11 1440522440492 30000 15851220 125.97417831420898
RUNNING 1 144 11 11 11 1440522470492 30000 12884860 102.39966710408528
RUNNING 1 144 11 11 11 1440522500492 30000 8743860 69.48995590209961
RUNNING 1 144 11 11 11 1440522530492 30000 6445660 51.225503285725914
RUNNING 1 144 11 11 11 1440522560492 30000 5974480 47.48090108235677
RUNNING 1 144 11 11 11 1440522590493 30001 5198640 41.3137016119645
RUNNING 1 144 11 11 11 1440522620492 29999 5209800 41.405150618464624
RUNNING 1 144 11 11 11 1440522650492 30000 4585960 36.445935567220054
RUNNING 1 144 11 11 11 1440522680492 30000 3870140 30.75710932413737
RUNNING 1 144 11 11 11 1440522710492 30000 3751040 29.810587565104168
RUNNING 1 144 11 11 11 1440522740493 30001 3450600 27.421990901898322
```
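As a rough sanity check of the 1 Gbit Ethernet explanation above: a 1 Gbit/s link corresponds to about 125 MB/s of payload, and the peak report for the batch-size-100 run with `--messageSize 250` is roughly 126 MB/s, i.e. right at that limit. A minimal sketch of the arithmetic (the class name is illustrative only, not part of the benchmark):

```java
public class NicLimitCheck {
    public static void main(String[] args) {
        // 1 Gbit/s expressed in MB/s (decimal units, ignoring Ethernet/TCP framing overhead)
        double nicLimitMBs = 1_000_000_000.0 / 8 / 1_000_000.0; // = 125.0 MB/s
        // Peak throughput reported above for batch size 100 with --messageSize 250
        double peakObservedMBs = 125.97417831420898;
        System.out.printf("NIC limit: %.1f MB/s, observed peak: %.1f MB/s%n",
                nicLimitMBs, peakObservedMBs);
    }
}
```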
> Add tuple batching
> ------------------
>
>                 Key: STORM-855
>                 URL: https://issues.apache.org/jira/browse/STORM-855
>             Project: Apache Storm
>          Issue Type: New Feature
>            Reporter: Matthias J. Sax
>            Assignee: Matthias J. Sax
>            Priority: Minor
>
> In order to increase Storm's throughput, multiple tuples can be grouped
> together into a batch of tuples (i.e., a fat tuple) and transferred from
> producer to consumer at once.
> The initial idea is taken from https://github.com/mjsax/aeolus. However, we
> aim to integrate this feature deep into the system (in contrast to building
> it on top), which has multiple advantages:
> - batching can be even more transparent to the user (e.g., no extra
>   direct-streams are needed to mimic Storm's data distribution patterns)
> - fault tolerance (anchoring/acking) can be done at tuple granularity
>   (rather than batch granularity, which would lead to many more replayed
>   tuples -- and result duplicates -- in case of failure)
> The aim is to extend the TopologyBuilder interface with an additional
> parameter 'batch_size' to expose this feature to the user. By default,
> batching will be disabled.
> This batching feature serves purely as a tuple transport mechanism, i.e.,
> tuple-by-tuple processing semantics are preserved. An output batch is
> assembled at the producer and completely disassembled at the consumer. The
> consumer's output can be batched again, independently of whether its input
> was batched or not. Thus, batches can have a different size for each
> producer-consumer pair. Furthermore, consumers can receive batches of
> different sizes from different producers (including regular, non-batched
> input).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
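To make the batching scheme described in the issue above more concrete, here is a minimal, hypothetical sketch of producer-side batch assembly and consumer-side disassembly. The class and method names are illustrative only and do not reflect the actual code in the pull request; the `batchSize` field merely stands in for the proposed 'batch_size' parameter.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only: collects outgoing tuples into a "fat tuple" of
 * size batchSize before handing them to the transport layer, and unpacks the
 * batch on the receiving side so the consumer still sees individual tuples.
 */
public class TupleBatcher<T> {
    private final int batchSize;                 // stands in for the proposed 'batch_size'
    private final List<T> buffer = new ArrayList<>();

    public TupleBatcher(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Producer side: returns a full batch ready to send, or null while still collecting. */
    public List<T> add(T tuple) {
        buffer.add(tuple);
        if (buffer.size() < batchSize) {
            return null;
        }
        List<T> batch = new ArrayList<>(buffer);
        buffer.clear();
        return batch;
    }

    /** Consumer side: a batch is completely disassembled back into single tuples. */
    public static <T> Iterable<T> disassemble(List<T> batch) {
        return batch;
    }
}
```

With batching disabled (the proposed default), tuples would simply bypass such a buffer and be emitted one by one, so tuple-by-tuple processing semantics and per-tuple anchoring/acking remain unaffected, as the issue description states.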