[GitHub] storm pull request: Disruptor batching v2

2015-10-29 Thread revans2
Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/765#issuecomment-152178557
  
@mjsax I am still very interested in #694 if it makes things better.  I am 
skeptical that it will add much over what this PR does, because they both cover 
similar areas, but if it does then we should try to use it.




[GitHub] storm pull request: Disruptor batching v2

2015-10-29 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/storm/pull/765#issuecomment-152122558
  
I did not find time to rerun the tests and fix #694. If you want to move on 
with this PR, just go ahead... I will still try to get #694 into better shape --- 
I still hope I can get it accepted. (If you are still interested.)




[GitHub] storm pull request: Disruptor batching v2

2015-10-28 Thread kishorvpatil
Github user kishorvpatil commented on the pull request:

https://github.com/apache/storm/pull/765#issuecomment-151942417
  
@d2r @knusbaum @mjsax  @harshach @HeartSaVioR @ptgoetz 
Now that #797 is in, we should consider pulling in the batching solution. Can 
you please review this PR and share your opinion?

I am still +1 on this patch.




[GitHub] storm pull request: Disruptor batching v2

2015-10-27 Thread revans2
Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/765#issuecomment-151596007
  
Just rebased.




[GitHub] storm pull request: Disruptor batching v2

2015-10-22 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/storm/pull/765#issuecomment-150163728
  
Great analysis!
I would like to run these tests, too, especially to work on the flushing of 
batches. It's a pity that so many tuples timed out in your test. To get a fair 
comparison, I would like to fix this and redo the test.

Btw: I am still wondering why #694 behaves so badly at low throughput. 
From a conceptual point of view there is no reason for this. Maybe my code is 
not good enough (I am a Clojure newbie). I hope I can reproduce your results and 
get better insight into the problem.




[GitHub] storm pull request: Disruptor batching v2

2015-10-22 Thread revans2
Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/765#issuecomment-150252797
  
@mjsax At least in my profiling, the next largest consumer of CPU in word 
count (without acking enabled, after adding in batching) is the metrics.  They 
are horribly slow.  I assumed that because Aeolus was only counting batches, 
not individual tuples, it avoided a lot of this slowness.




[GitHub] storm pull request: Disruptor batching v2

2015-10-22 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/storm/pull/765#issuecomment-150250711
  
I am not sure how the metrics relate to it... Furthermore, I also ran 
Aeolus on the current master and got a 6x improvement (with acking disabled -- 
acking is currently not supported by Aeolus).




[GitHub] storm pull request: Disruptor batching v2

2015-10-22 Thread revans2
Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/765#issuecomment-150247138
  
@mjsax If you want to take some time to debug things for a better 
comparison, I am happy to wait. It is entirely possible that I somehow messed 
things up when I was enabling hybrid mode, so getting an apples-to-apples 
comparison is important.  I want the best final solution no matter what it is, 
but I also don't want to wait forever.  For efficiency and cluster-utilization 
purposes I really would like to pull batching into the clusters that I run, 
but because the configurations/interfaces of the two approaches are incompatible 
with one another, I don't want to do it until the open source community has 
decided on a direction to go in.

In the meantime I am going to be working on improving the metrics 
gathering.  You said previously that with your first batching attempt you saw 
up to 6x the throughput.  I suspect a lot of that is because the metrics are so 
very, very slow.




[GitHub] storm pull request: Disruptor batching v2

2015-10-22 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/storm/pull/765#issuecomment-150255113
  
Makes sense. It might be worth exploring counting in batches (i.e., increasing 
a counter by the batch size for each batch), so we still get the correct value. 
Of course the metric will not increase smoothly over time but will "jump" in larger 
intervals. I am not sure if this is acceptable behavior, and/or what other metrics 
there are that cannot be updated in a "batch" fashion... count might be too 
simple an example.
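
To make the idea concrete, here is a minimal sketch of batch-wise counting (a 
plain atom stands in for a real metric object; the helper name is invented for 
illustration, this is not Storm's metrics API):

```clojure
;; Update a count metric once per completed batch instead of once per tuple.
;; The final value is still exact; it just advances in batch-sized jumps.
(defn record-batch!
  [counter batch]
  (swap! counter + (count batch)))

;; Per-tuple counting would make (count batch) separate swap! calls here.
```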




[GitHub] storm pull request: Disruptor batching v2

2015-10-21 Thread revans2
Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/765#issuecomment-149987537
  
So I have been running a number of tests, trying to come to a conclusive 
decision on how storm should handle batching and to understand the 
difference between my test results and the test results from #694. I ran the 
word count test I wrote as a part of #805 on a 35-node storm cluster.  This was 
done against several different storm versions: the baseline in the #805 pull 
request; this patch + #805 (batch-v2); and #694 + #805 + modifications to use 
the hybrid approach so that acking and batching work in a multi-process 
topology (STORM-855).  Rather than make the numbers hard to parse, I am 
just going to include some charts, but if anyone wants to see the raw numbers 
or reproduce this themselves I am happy to provide data and/or branches.

The numbers below were collected after the topology had been running for at 
least 200 seconds, to avoid startup effects like JIT compilation.  I filtered 
out any 30-second interval where the measured throughput was not within +/- 10% 
of the target throughput, on the assumption that a topology outside that range 
either could not keep up with the desired throughput or was catching up from 
previous slowness.  I did not filter based on the number of failures, simply 
because that would have removed all of the STORM-855 results with batching 
enabled.  None of the other test configurations saw any failures at all during 
testing.
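
For reference, the +/- 10% filter described above could look roughly like this 
(a sketch with invented names, not the actual test harness):

```clojure
;; Keep only the 30-second intervals whose measured throughput stayed within
;; +/- 10% of the target rate; anything outside that band means the topology
;; was falling behind or catching up from earlier slowness.
(defn steady-state? [target measured]
  (<= (* 0.9 target) measured (* 1.1 target)))

(defn usable-intervals [target intervals]
  (filter #(steady-state? target (:measured-throughput %)) intervals))
```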


![throughput-vs-latency](https://cloud.githubusercontent.com/assets/3441321/10644336/d0393222-77ed-11e5-849a-0b6be6ac5178.png)

This shows the 99%-ile latency vs. measured throughput.  It is not too 
interesting, except to note that batching in STORM-855 at low throughput 
resulted in nothing being fully processed: all of the tuples timed out before 
they could finish.  Only at a medium throughput, above 16,000 sentences/second, 
were we able to maintain enough tuples to complete batches regularly, and even 
then many tuples would still time out. This should be fixable with a 
batch timeout, but that is not implemented yet.

To get a better view I adjusted the latency to be a log scale.


![throughput-vs-latency-log](https://cloud.githubusercontent.com/assets/3441321/10644335/d02ab29c-77ed-11e5-883e-a647f6b4279b.png)

From this we can see that on the very low end batch-v2 increases the 
99%-ile latency from 5-10 ms to 19-21 ms.  Most of that you can get back by 
configuring the batch size to 1 instead of the default 100 tuples.  However, 
once the baseline stops functioning at around 7,000 sentences/sec, the batching 
code is able to continue working with either a batch size of 1 or 100.  I 
believe this has to do with the automatic backpressure: in the baseline 
code backpressure does not take the overflow buffer into account, but in the 
batching code it does.  I think this gives the topology more stability in 
maintaining a throughput, but I don't have any solid evidence for that.
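
For anyone wanting to trade throughput for latency as described above, the knob 
is the per-queue batch size. A sketch of the relevant topology config follows; 
the key names are an assumption (they match what later Storm releases call 
them), so check this branch's Config for the authoritative constants:

```clojure
;; Assumed key names -- verify against Config in this branch.
(def low-latency-conf
  {"topology.disruptor.batch.size"           1    ;; default in this PR is 100
   "topology.disruptor.batch.timeout.millis" 1})  ;; flush partial batches quickly
```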

I then zoomed in on the graphs to show what a 2 second SLA would look like


![throughput-vs-latency-2-sec](https://cloud.githubusercontent.com/assets/3441321/10644332/d0176f5c-77ed-11e5-98c4-d2e7a9e48c70.png)

and a 100 ms SLA.


![throughput-vs-latency-100-ms](https://cloud.githubusercontent.com/assets/3441321/10644334/d0291540-77ed-11e5-9fb3-9c9c97f504f9.png)

In both cases the batching v2 with a batch size of 100 was able to handle 
the highest throughput for that given latency.

Then I wanted to look at memory and CPU utilization.


![throughput-vs-mem](https://cloud.githubusercontent.com/assets/3441321/10644337/d03c3094-77ed-11e5-8cda-cf53fe3a2389.png)

Memory does not show much.  The amount of memory used varies a bit from one 
implementation to the other, but keep in mind this is for 35 worker processes, 
so it varies from about 70 MB/worker to about 200 MB/worker.  The numbers simply 
show that as the throughput increases the memory utilization does too, and it 
does not vary much from one implementation to another.


![throughput-vs-cpu](https://cloud.githubusercontent.com/assets/3441321/10645834/6ba799e0-77f5-11e5-88fd-7e09475a5b6c.png)

CPU, however, shows that on the low end we go from 7 or 8 cores worth 
of CPU time to about 35 cores worth for the batching code.  This seems to be 
the result of the batch-flushing threads waking up periodically.  We should be 
able to mitigate this by making that interval larger, but that would in turn 
impact the latency.
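
A minimal sketch of that tradeoff (illustrative only, not the PR's actual 
flusher implementation):

```clojure
;; Each queue's flusher wakes every interval-ms and calls flush-fn to push out
;; any partial batch. A larger interval means fewer wakeups (less CPU) but lets
;; a partial batch sit longer -- exactly the low-throughput latency cost above.
(defn start-flusher!
  "Spawn a daemon thread that calls flush-fn every interval-ms while
   running? (an atom) holds true."
  [flush-fn interval-ms running?]
  (doto (Thread.
          (fn []
            (while @running?
              (Thread/sleep interval-ms)
              (flush-fn))))
    (.setDaemon true)
    (.start)))

;; e.g. (start-flusher! #(println "flush partial batch") 1 (atom true))
```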

I believe that with further work we should be able to reduce that CPU 
utilization and the low-end latency by dynamically adjusting the batch 
size and timeout based on a specified SLA.  At this point I feel this 
branch is ready for a formal 

[GitHub] storm pull request: Disruptor batching v2

2015-10-15 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/storm/pull/765#issuecomment-148548577
  
Thanks for the clarification. :)




[GitHub] storm pull request: Disruptor batching v2

2015-10-15 Thread revans2
Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/765#issuecomment-148544657
  
@mjsax 

What I saw when testing STORM-855 was the maximum throughput cut 
almost in half, from 10,000 sentences per second to 5,500.  But your numbers 
showed maximum throughput more than doubling, from around 7,960,300 tuples sent 
in 30 seconds to 16,347,100 in the same time period (no acking), and from 
1,832,160 in 30 seconds to 2,323,580, an increase of about 25%, with acking.

To me this feels like a contradiction. The only thing I can think of is 
that the messaging layer is so scary slow that cutting the maximum throughput 
of a worker in half has no impact on the overall performance if, by doing more 
batching, it can double the throughput of the messaging layer.

This is likely the case: on the high end, 16,347,100 tuples / 30 seconds / 24 
workers is about 22,000 tuples per second per worker, whereas 5,500 sentences 
per second results in about 181,500 total tuples per second per worker being 
processed.
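
(For reference, the arithmetic behind those figures: 16,347,100 / 30 / 24 is 
roughly 22,700 tuples/sec/worker, and 181,500 / 5,500 = 33, i.e. the quoted 
numbers imply that the word-count topology produces roughly 33 tuples per 
input sentence.)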

I'm just looking for feedback from others on this, but it looks like I need 
to do a distributed apples-to-apples comparison as well, to see the impact the 
messaging layer has.




[GitHub] storm pull request: Disruptor batching v2

2015-10-15 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/storm/pull/765#issuecomment-148540526
  
I had a look at the numbers. It's a lot of stuff and hard for me to 
parse... I am not sure what you mean by "contradict the numbers"? Can you 
explain in more detail?

However, I agree with your thought about STORM-855 (#694): it basically 
reduces the contention on the output queue, because fewer calls are made to it. 
I profiled Storm once and observed that at high data rates (when we hit the 
wall) the contention on the output queue is the bottleneck (the writing and 
reading threads have to lock the queue, and waiting for the lock consumes a 
fair share of the time).
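
That observation is easy to see in miniature (a toy sketch, not the Disruptor 
internals): handing the queue a whole batch costs one lock acquisition instead 
of one per tuple.

```clojure
;; `queue` is any shared java.util.List standing in for the output queue.
(defn publish-each [queue tuples]
  (doseq [t tuples]
    (locking queue (.add ^java.util.List queue t))))  ;; N lock acquisitions

(defn publish-batch [queue tuples]
  (locking queue                                      ;; 1 lock acquisition
    (doseq [t tuples]
      (.add ^java.util.List queue t))))
```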




[GitHub] storm pull request: Disruptor batching v2

2015-10-10 Thread revans2
Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/765#issuecomment-147124664
  
I have some new test results.  I did a comparison of several different 
branches: this branch, the upgraded-disruptor branch #750, STORM-855 #694, and 
apache-master 0.11.0-SNAPSHOT (04cf3f6162ce6fdd1ec13b758222d889dafd5749).  I had 
to make a few modifications to get my test to work; I applied the following 
patch https://gist.github.com/revans2/84301ef0fde0dc4fbe44 to each of the 
branches.  For STORM-855 I had to modify the test a bit so it would optionally 
do batching.  In that case batching was enabled on all streams and all spouts 
and bolts.

I then ran the test at various throughputs (100, 200, 400, 800, 1600, 3200, 
6400, 1, 12800, 25600, and possibly a few others when probing for the maximum 
throughput) and with different batch sizes.

Each test ran for 5 minutes.  Here are the results of that test, excluding the 
tests where the worker could not keep up with the rate.

| 99%-ile ns | 99.9%-ile ns | throughput | branch-batch | mean latency ns | avg service latency ms | std-dev ns |
|---|---|---|---|---|---|---|
| 2,613,247 | 4,673,535 | 100 | STORM-855-0 | 2,006,347.25 | 1.26 | 2,675,778.36 |
| 2,617,343 | 4,423,679 | 200 | STORM-855-0 | 1,991,238.45 | 1.29 | 2,024,687.45 |
| 2,623,487 | 5,619,711 | 400 | STORM-855-0 | 1,999,926.81 | 1.24 | 1,778,335.92 |
| 2,627,583 | 4,603,903 | 1600 | STORM-855-0 | 1,971,888.24 | 1.30 | 893,085.40 |
| 2,635,775 | 8,560,639 | 800 | STORM-855-0 | 2,010,286.65 | 1.35 | 2,134,795.12 |
| 2,654,207 | 302,252,031 | 3200 | STORM-855-0 | 2,942,360.75 | 2.13 | 16,676,136.60 |
| 2,684,927 | 124,190,719 | 3200 | batch-v2-1 | 2,154,234.45 | 1.41 | 6,219,057.66 |
| 2,701,311 | 349,700,095 | 5000 | batch-v2-1 | 2,921,661.67 | 1.78 | 18,274,805.30 |
| 2,715,647 | 7,356,415 | 100 | storm-base-1 | 2,092,991.53 | 1.30 | 2,447,956.21 |
| 2,723,839 | 4,587,519 | 400 | storm-base-1 | 2,082,835.21 | 1.31 | 1,978,424.49 |
| 2,723,839 | 6,049,791 | 100 | dist-upgraade-1 | 2,091,407.68 | 1.31 | 2,222,977.89 |
| 2,725,887 | 10,403,839 | 1600 | batch-v2-1 | 2,010,694.30 | 1.27 | 2,095,223.90 |
| 2,725,887 | 4,607,999 | 200 | storm-base-1 | 2,074,784.50 | 1.30 | 1,951,564.93 |
| 2,727,935 | 4,513,791 | 200 | dist-upgraade-1 | 2,082,025.31 | 1.33 | 2,057,591.08 |
| 2,729,983 | 4,182,015 | 400 | dist-upgraade-1 | 2,056,282.29 | 1.43 | 862,428.67 |
| 2,732,031 | 4,632,575 | 800 | storm-base-1 | 2,092,514.39 | 1.27 | 2,231,550.66 |
| 2,734,079 | 4,472,831 | 800 | dist-upgraade-1 | 2,095,994.08 | 1.28 | 1,870,953.62 |
| 2,740,223 | 4,192,255 | 200 | batch-v2-1 | 2,011,025.19 | 1.21 | 911,556.19 |
| 2,742,271 | 4,726,783 | 1600 | storm-base-1 | 2,089,581.40 | 1.35 | 2,410,668.79 |
| 2,748,415 | 4,444,159 | 400 | batch-v2-1 | 2,055,600.78 | 1.34 | 1,729,257.92 |
| 2,748,415 | 4,575,231 | 100 | batch-v2-1 | 2,035,920.21 | 1.31 | 1,213,874.52 |
| 2,754,559 | 16,875,519 | 1600 | dist-upgraade-1 | 2,098,441.13 | 1.35 | 2,279,870.41 |
| 2,754,559 | 3,969,023 | 800 | batch-v2-1 | 2,026,222.88 | 1.29 | 767,491.71 |
| 2,793,471 | 53,477,375 | 3200 | storm-base-1 | 2,147,360.05 | 1.42 | 3,668,366.37 |
| 2,801,663 | 147,062,783 | 3200 | dist-upgraade-1 | 2,358,863.31 | 1.59 | 7,574,577.81 |
| 13,344,767 | 180,879,359 | 6400 | batch-v2-100 | 11,319,553.69 | 10.62 | 7,777,381.54 |
| 13,369,343 | 15,122,431 | 3200 | batch-v2-100 | 10,699,832.23 | 10.02 | 1,623,949.38 |
| 13,418,495 | 15,392,767 | 800 | batch-v2-100 | 10,589,813.17 | 9.86 | 2,439,134.80 |
| 13,426,687 | 14,680,063 | 400 | batch-v2-100 | 10,738,973.68 | 10.03 | 2,298,229.99 |
| 13,484,031 | 14,368,767 | 200 | batch-v2-100 | 10,941,653.28 | 10.20 | 2,471,899.43 |
| 13,508,607 | 14,262,271 | 100 | batch-v2-100 | 11,099,257.68 | 10.35 | 1,658,054.66 |
| 13,524,991 | 14,376,959 | 1600 | batch-v2-100 | 10,723,471.83 | 10.00 | 1,477,621.07 |
| 346,554,367 | 977,272,831 | 12800 | batch-v2-100 | 18,596,303.93 | 15.59 | 78,326,501.83 |
| 710,934,527 | 827,326,463 | 4000 | STORM-855-100 | 351,305,653.90 | 339.28 | 141,283,307.30 |
| 783,286,271 | 1,268,776,959 | 5000 | STORM-855-100 | 332,417,358.65 | 312.07 | 139,760,316.82 |
| 888,668,159 | 1,022,361,599 | 3200 | STORM-855-100 | 445,646,342.60 | 431.55 | 179,065,279.65 |
| 940,048,383 | 1,363,148,799 | 6400 | storm-base-1 | 20,225,300.17 | 17.17 | 134,848,974.52 |
| 1,043,333,119 | 1,409,286,143 | 1 | batch-v2-1 | 22,750,840.18 | 6.13 | 146,235,076.73 |
| 1,209,008,127 | 1,786,773,503 | 6400 | dist-upgraade-1 | 28,588,397.01 | 24.70 | 181,801,409.69 |
| 1,747,976,191 | 1,946,157,055 | 1600 | STORM-855-100 | 738,741,774.85 | 734.75 | 374,194,675.56 |
| 2,642,411,519 | 3,124,756,479 | 2 | batch-v2-100 | 133,706,248.88 | 

[GitHub] storm pull request: Disruptor batching v2

2015-10-10 Thread revans2
Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/765#issuecomment-147125596
  
Sorry, I forgot to mention that in one test STORM-855-100 was able to run 
at a throughput of 6000 sentences/second successfully, but in other runs it 
failed, so I set its maximum to 5500, even though a 6000 run is shown in the 
results.




[GitHub] storm pull request: Disruptor batching v2

2015-09-30 Thread kishorvpatil
Github user kishorvpatil commented on the pull request:

https://github.com/apache/storm/pull/765#issuecomment-144429694
  
Ran some sanity tests. LGTM. +1




[GitHub] storm pull request: Disruptor batching v2

2015-09-30 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/storm/pull/765#discussion_r40801096
  
--- Diff: storm-core/src/clj/backtype/storm/daemon/worker.clj ---
@@ -123,12 +123,13 @@
 port (:port worker)
 storm-cluster-state (:storm-cluster-state worker)
 prev-backpressure-flag @(:backpressure worker)]
-(if executors 
-  (if (reduce #(or %1 %2) (map #(.get-backpressure-flag %1) executors))
-(reset! (:backpressure worker) true)   ;; at least one executor has set backpressure
-(reset! (:backpressure worker) false))) ;; no executor has backpressure set
+(when executors
+  (reset! (:backpressure worker)
+  (or @(:transfer-backpressure worker)
--- End diff --

This was actually a bug, and I am going to file a JIRA for it.  Reusing the 
worker backpressure flag made backpressure turn off too soon in some cases, 
and get stuck off when it should have been on in other cases.




[GitHub] storm pull request: Disruptor batching v2

2015-09-30 Thread zhuoliu
Github user zhuoliu commented on a diff in the pull request:

https://github.com/apache/storm/pull/765#discussion_r40803217
  
--- Diff: storm-core/src/clj/backtype/storm/daemon/worker.clj ---
@@ -123,12 +123,13 @@
 port (:port worker)
 storm-cluster-state (:storm-cluster-state worker)
 prev-backpressure-flag @(:backpressure worker)]
-(if executors 
-  (if (reduce #(or %1 %2) (map #(.get-backpressure-flag %1) executors))
-(reset! (:backpressure worker) true)   ;; at least one executor has set backpressure
-(reset! (:backpressure worker) false))) ;; no executor has backpressure set
+(when executors
+  (reset! (:backpressure worker)
+  (or @(:transfer-backpressure worker)
--- End diff --

Previously, we had the lowWaterMark function as a "do nothing" for the worker 
transfer queue's callback, which meant backpressure was never turned off when 
the worker's queue population dropped below the lowWaterMark. On the other 
hand, it is possible that all executors have population below the lowWaterMark 
while the worker's queue population is still above the highWaterMark, and the 
backpressure flag gets cleared "too often". Anyway, I think this is a good fix, 
together with the _throttleOn flag in DisruptorQueue.
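
A rough sketch of the watermark semantics in question (illustrative names, not 
the DisruptorQueue API):

```clojure
;; Crossing the high watermark raises the queue's backpressure flag; dropping
;; below the low watermark clears it; in between, the flag is left alone.
(defn watermark-handler [flag high low]
  (fn [population capacity]
    (let [ratio (/ (double population) capacity)]
      (cond
        (>= ratio high) (reset! flag true)       ;; congested: throttle upstream
        (<= ratio low)  (reset! flag false)))))  ;; drained: release throttle
```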




[GitHub] storm pull request: Disruptor batching v2

2015-09-30 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/storm/pull/765#discussion_r40802660
  
--- Diff: storm-core/src/clj/backtype/storm/daemon/worker.clj ---
@@ -290,6 +290,7 @@
   :transfer-fn (mk-transfer-fn <>)
   :assignment-versions assignment-versions
   :backpressure (atom false) ;; whether this worker is going slow
+  :transfer-backpressure (atom false) ;; whether this worker is going slow
--- End diff --

Yes you are correct




[GitHub] storm pull request: Disruptor batching v2

2015-09-30 Thread zhuoliu
Github user zhuoliu commented on a diff in the pull request:

https://github.com/apache/storm/pull/765#discussion_r40803550
  
--- Diff: storm-core/src/clj/backtype/storm/cluster.clj ---
@@ -489,10 +489,10 @@
 (let [path (backpressure-path storm-id node port)
   existed (exists-node? cluster-state path false)]
   (if existed
-(if (not on?)
-  (delete-node cluster-state path))   ;; delete the znode since the worker is not congested
-(if on?
-  (set-ephemeral-node cluster-state path nil acls) ;; create the znode since worker is congested
+(when (not on?)
--- End diff --

This is OK, never mind.




[GitHub] storm pull request: Disruptor batching v2

2015-09-30 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/storm/pull/765#discussion_r40800874
  
--- Diff: storm-core/src/clj/backtype/storm/cluster.clj ---
@@ -489,10 +489,10 @@
 (let [path (backpressure-path storm-id node port)
   existed (exists-node? cluster-state path false)]
   (if existed
-(if (not on?)
-  (delete-node cluster-state path))   ;; delete the znode since the worker is not congested
-(if on?
-  (set-ephemeral-node cluster-state path nil acls) ;; create the znode since worker is congested
+(when (not on?)
--- End diff --

It is mostly that I put in some debugging and didn't want an ```(if (do 
(log-message) ...))``` to fix an issue, and then didn't change it back 
after removing the debug logging.  If you want me to change it back and re-add 
the comments, I can.




[GitHub] storm pull request: Disruptor batching v2

2015-09-29 Thread revans2
Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/765#issuecomment-144059813
  

@mjsax The difference is not that huge yet; it is 2x in a number of use 
cases.  But we have a lot of slowness in the metrics-processing code which, at 
least in the case of word count, dominates the performance, and which is 
probably why you were not seeing the 6x performance improvement in your pull 
that you were seeing before.  I am going to be doing some follow-on optimization 
work on that.  The big thing right now is that the system CPU utilization went 
from 15-30% (depending on the OS, etc.) to 2-6%.  This frees up a huge amount of 
CPU for other processing, so the overall throughput capability of a node 
increased significantly.




[GitHub] storm pull request: Disruptor batching v2

2015-09-29 Thread zhuoliu
Github user zhuoliu commented on a diff in the pull request:

https://github.com/apache/storm/pull/765#discussion_r40731960
  
--- Diff: storm-core/src/clj/backtype/storm/daemon/worker.clj ---
@@ -137,14 +138,11 @@
   check highWaterMark and lowWaterMark for backpressure"
   (disruptor/disruptor-backpressure-handler
 (fn []
-  "When worker's queue is above highWaterMark, we set its backpressure 
flag"
-  (if (not @(:backpressure worker))
-(do (reset! (:backpressure worker) true)
-(WorkerBackpressureThread/notifyBackpressureChecker 
(:backpressure-trigger worker)  ;; set backpressure no matter how the 
executors are
+   (reset! (:transfer-backpressure worker) true)
+   (WorkerBackpressureThread/notifyBackpressureChecker 
(:backpressure-trigger worker)))
--- End diff --

Will this cause the WorkerBackpressureThread to get notified too frequently, 
even when it is not necessary?
(In most cases neither :backpressure nor :transfer-backpressure has changed, 
so we may not really need to notify the backpressure thread to check all 
executors' flags, which can be additional overhead.)
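
One way to cut the redundant wakeups (a sketch of the idea, not a claim about 
what the PR actually does): notify only when the flag actually transitions, e.g.

```clojure
;; compare-and-set! returns true only when the flag flips false -> true, so
;; the checker thread is woken once per transition instead of once per event.
(defn raise-transfer-backpressure! [worker]
  (when (compare-and-set! (:transfer-backpressure worker) false true)
    (WorkerBackpressureThread/notifyBackpressureChecker
      (:backpressure-trigger worker))))
```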




[GitHub] storm pull request: Disruptor batching v2

2015-09-29 Thread zhuoliu
Github user zhuoliu commented on a diff in the pull request:

https://github.com/apache/storm/pull/765#discussion_r40730661
  
--- Diff: storm-core/src/clj/backtype/storm/daemon/worker.clj ---
@@ -123,12 +123,13 @@
 port (:port worker)
 storm-cluster-state (:storm-cluster-state worker)
 prev-backpressure-flag @(:backpressure worker)]
-(if executors 
-  (if (reduce #(or %1 %2) (map #(.get-backpressure-flag %1) executors))
-(reset! (:backpressure worker) true)   ;; at least one executor has set backpressure
-(reset! (:backpressure worker) false))) ;; no executor has backpressure set
+(when executors
+  (reset! (:backpressure worker)
+  (or @(:transfer-backpressure worker)
--- End diff --

I agree, it is nice to add a separate flag for the worker's transfer queue.




[GitHub] storm pull request: Disruptor batching v2

2015-09-28 Thread revans2
Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/765#issuecomment-143872114
  
I am just rebasing the code now so you can test it out yourself.  This 
code has no issues with acking, but there are a few real issues.

First, the latency on low-throughput queues is much higher, because a tuple 
has to wait for its batch to time out.  That timeout is set to 1 ms by 
default, so it is not that bad, but in a follow-on JIRA we should be able to do 
some on-the-fly adjustment of the batch size for each queue to compensate.
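
In other words there are two flush triggers: batch size and timeout. A bare 
sketch of that interplay (invented names, single-threaded for clarity; the real 
queue is concurrent):

```clojure
;; A batch is flushed when it fills up; the flusher thread separately flushes
;; any non-empty batch that has waited out the ~1 ms timeout.
(defn publish! [{:keys [pending batch-size flush!]} tuple]
  (let [batch (swap! pending conj tuple)]
    (when (>= (count batch) batch-size)
      (flush!))))

(defn flush-if-stale! [{:keys [pending flush!]}]
  (when (seq @pending)   ;; timeout path, run by the flusher thread
    (flush!)))
```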

Second, the number of threads used is a lot higher: one more per disruptor 
queue.  I expect to reduce the total number of disruptor queues once I have 
optimized other parts of the code.  As it stands right now I don't want to do 
that, because the two-queues-per-bolt/spout design still improves performance 
in many cases.

Third, in the worst case it is possible to allocate many more 
objects than previously.  It is not actually that many more; we already 
allocate a lot of objects, which needs to be looked at in a separate JIRA at 
some point.

Also, I don't want to shove this code in without doing a real comparison 
between the two approaches and their code.  This is one way of doing batching, 
but there are others that may have advantages over it, or may complement this 
approach as well.  I just want storm to eventually be a lot closer to the 1 
million sentences/second mark than it is now.




[GitHub] storm pull request: Disruptor batching v2

2015-09-28 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/storm/pull/765#issuecomment-143900063
  
Curious to see the first performance results :)

