Github user mjsax commented on the pull request:
https://github.com/apache/storm/pull/694#issuecomment-134450197
I just pushed some changes (I added a new commit, so you can better see what I changed):
- added type hints
- split tuple and batch serialization into separate classes
- assembled different "emit functions" in Clojure for the single-tuple and the batch case (to add more type hints; see the sketch after this list)
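To make the last point concrete, here is a minimal sketch of the idea (the names `mk-transfer-fn` and `transfer!` are illustrative, not the actual code of the patch): the emit function is assembled once per executor, so the single-tuple path and the batch path can each carry their own type hints instead of sharing one generic, reflection-prone code path.
```
;; Illustrative sketch only -- names are hypothetical, not the actual patch code.
;; The emit/transfer fn is picked once, so each variant can carry precise type hints.
(defn mk-transfer-fn [batching? transfer!]
  (if batching?
    (fn [^java.util.ArrayList batch]    ;; a whole batch is handed over at once
      (transfer! batch))
    (fn [^java.util.List tuple-values]  ;; the single-tuple path stays as before
      (transfer! tuple-values))))
```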
I got the following results running on a 4-node cluster with these parameters:
`storm jar storm_perf_test-1.0.0-SNAPSHOT-jar-with-dependencies.jar com.yahoo.storm.perftest.Main --bolt 3 --name test -l 1 -n 1 --messageSize 4 --workers 4 --spout 1 --testTimeSec 300`
Master branch:
```
status   topologies  totalSlots  slotsUsed  totalExecutors  executorsWithMetrics  time           time-diff ms  transferred  throughput (MB/s)
WAITING  1           48          0          4               0                     1440466170638  0             0            0.0
WAITING  1           48          4          4               4                     1440466200638  30000         6122840      0.7785593668619791
WAITING  1           48          4          4               4                     1440466230638  30000         11565400     1.4706166585286458
RUNNING  1           48          4          4               4                     1440466260638  30000         11394040     1.4488271077473958
RUNNING  1           48          4          4               4                     1440466290638  30000         11718240     1.49005126953125
RUNNING  1           48          4          4               4                     1440466320638  30000         11615920     1.4770406087239583
RUNNING  1           48          4          4               4                     1440466350638  30000         11557380     1.4695968627929688
RUNNING  1           48          4          4               4                     1440466380638  30000         11581080     1.4726104736328125
RUNNING  1           48          4          4               4                     1440466410638  30000         11492600     1.4613596598307292
RUNNING  1           48          4          4               4                     1440466440638  30000         11413760     1.4513346354166667
RUNNING  1           48          4          4               4                     1440466470638  30000         11300580     1.4369430541992188
RUNNING  1           48          4          4               4                     1440466500638  30000         11368760     1.4456125895182292
RUNNING  1           48          4          4               4                     1440466530638  30000         11509820     1.463549296061198
```
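For reference, the throughput column appears to be computed from the transferred count, the 4-byte message size (`--messageSize 4`), and the time-diff window, with MB meaning 2^20 bytes; a quick check (my assumption about the formula, not taken from the test code):
```
;; Sanity check of the throughput column (assumed formula, not from storm_perf_test):
;; throughput (MB/s) = transferred * messageSize / (time-diff-ms / 1000) / 1024^2
(defn throughput-mb-s [transferred message-size time-diff-ms]
  (/ (* transferred message-size)
     (/ time-diff-ms 1000.0)
     (* 1024.0 1024.0)))

(throughput-mb-s 11565400 4 30000)  ;; => ~1.4706, matches the third data row above
```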
Batching branch with batching disabled:
```
status   topologies  totalSlots  slotsUsed  totalExecutors  executorsWithMetrics  time           time-diff ms  transferred  throughput (MB/s)
WAITING  1           48          0          4               0                     1440467016767  0             0            0.0
WAITING  1           48          4          4               4                     1440467046767  30000         7095940      0.9022954305013021
WAITING  1           48          4          4               4                     1440467076767  30000         11136640     1.4160970052083333
RUNNING  1           48          4          4               4                     1440467106767  30000         11159220     1.4189682006835938
RUNNING  1           48          4          4               4                     1440467136767  30000         7757660      0.9864374796549479
RUNNING  1           48          4          4               4                     1440467166767  30000         11375580     1.4464797973632812
RUNNING  1           48          4          4               4                     1440467196767  30000         11669980     1.4839146931966145
RUNNING  1           48          4          4               4                     1440467226767  30000         11344380     1.4425125122070312
RUNNING  1           48          4          4               4                     1440467256767  30000         11521460     1.4650293986002605
RUNNING  1           48          4          4               4                     1440467286767  30000         11401040     1.4497172037760417
RUNNING  1           48          4          4               4                     1440467316767  30000         11493700     1.461499532063802
RUNNING  1           48          4          4               4                     1440467346767  30000         11452680     1.4562835693359375
RUNNING  1           48          4          4               4                     1440467376767  30000         11148300     1.4175796508789062
```
Batching branch with batch size of 100 tuples:
```
status   topologies  totalSlots  slotsUsed  totalExecutors  executorsWithMetrics  time           time-diff ms  transferred  throughput (MB/s)
WAITING  1           48          1          4               0                     1440467461710  0             0            0.0
WAITING  1           48          4          4               4                     1440467491710  30000         11686000     1.4859517415364583
WAITING  1           48          4          4               4                     1440467521710  30000         18026640     2.292205810546875
RUNNING  1           48          4          4               4                     1440467551710  30000         17936300     2.2807184855143228
RUNNING  1           48          4          4               4                     1440467581710  30000         18969300     2.4120712280273438
RUNNING  1           48          4          4               4                     1440467611710  30000         18581620     2.3627751668294272
RUNNING  1           48          4          4               4                     1440467641711  30001         18963120     2.4112050268897285
RUNNING  1           48          4          4               4                     1440467671710  29999         18607200     2.3661067022546587
RUNNING  1           48          4          4               4                     1440467701710  30000         19333620     2.4583969116210938
RUNNING  1           48          4          4               4                     1440467731710  30000         18629100     2.3688125610351562
RUNNING  1           48          4          4               4                     1440467761711  30001         18847820     2.3965443624209923
RUNNING  1           48          4          4               4                     1440467791710  29999         18021400     2.291615897287722
RUNNING  1           48          4          4               4                     1440467821710  30000         18143360     2.3070475260416665
```
The negative impact is gone, and batching increases the output rate by about 50%. I need to do more tests, though: I still need to investigate the performance impact of input de-batching, and I also need to test with acking enabled.
Some more questions:
- What about `assert-can-serialize`? Is it performance critical? I did not test it, but it seems that a generic approach covering both single tuples and batches should be good enough (a sketch of what I mean is below).
- I also do not understand the following code (lines 648-622 in `executor.clj` on the batching branch). I use batching but did not modify this code; nevertheless everything works (I guess this part is simply not executed?). Can you explain?
```
(task/send-unanchored task-data
                      ACKER-INIT-STREAM-ID
                      [root-id (bit-xor-vals out-ids) task-id]
                      overflow-buffer)
```
- What about batching `acks`? Would it make sense? I don't understand the acking code path well enough right now to judge, but since acking is quite expensive, it might be a good idea (a rough sketch is below as well).
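Regarding the serialization question, this is roughly what I mean by a generic approach; a minimal sketch, assuming a batch is just a sequence of value lists and with `serializable?` as a hypothetical predicate (not the actual `assert-can-serialize` code):
```
;; Illustrative sketch only -- `serializable?` is a hypothetical predicate and this is
;; not the actual assert-can-serialize implementation. One generic check can walk
;; either a single tuple's values or a whole batch of value lists.
(defn can-serialize-all? [serializable? tuple-or-batch]
  (if (sequential? (first tuple-or-batch))  ;; assume a batch is a seq of value lists
    (every? #(every? serializable? %) tuple-or-batch)
    (every? serializable? tuple-or-batch)))
```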
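And for batching `acks`, only a rough sketch of the idea (`pending-acks` and `flush!` are hypothetical names, not actual Storm code): ack messages would be buffered and handed to the transfer layer as one unit, analogous to how outgoing tuples are batched.
```
;; Rough sketch only -- names are hypothetical. Acks are collected in a buffer and
;; flushed together once the batch size is reached, instead of one transfer per ack.
(defn maybe-flush-acks! [^java.util.ArrayList pending-acks batch-size flush!]
  (when (>= (.size pending-acks) batch-size)
    (flush! (java.util.ArrayList. pending-acks))  ;; hand over a copy of the whole batch
    (.clear pending-acks)))
```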