[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-02-20 Thread zodvik
Github user zodvik commented on the issue:

https://github.com/apache/storm/pull/2203
  
Was the concern about "My concern here is that most of what we are truly 
encoding in the name is metadata" addressed? 

Is the expectation that the metric name in dashboards is used like 
`storm.topology.{topology ID}.*.*.*.*-myCounter`?



---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-21 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
@ptgoetz 
OK great. I filed an issue for documentation: 
https://issues.apache.org/jira/browse/STORM-2904
Please handle this issue. I think we can merge this now.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-21 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@HeartSaVioR Absolutely! I just want to make sure we are all on board with 
the changes.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-21 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
@ptgoetz 
Could we have documentation for this? It's OK if you want to document it 
after merging this, but the follow-up issue should be a blocker for the release.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-21 Thread roshannaik
Github user roshannaik commented on the issue:

https://github.com/apache/storm/pull/2203
  
**With Acking, Workers=1**

TVL (rate=500k)| lat | throughput
-- | -- | --
1.x | 88.5ms | 52,913
v2-metrics | 91ms | 58,623


Const->Id->Null  | Throughput | avg lat
-- | -- | --
1.x base | 152.6k | 3.3
v2-metrics | 153.5k | 2.8


- Numbers look good for v2-metrics. Didn't run numbers for the 
v2-metrics-no-concurrent-map binary.
- TVL was run with the default number of ackers. C->I->N was run with 1 instance 
of each spout, bolt & acker.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-21 Thread roshannaik
Github user roshannaik commented on the issue:

https://github.com/apache/storm/pull/2203
  
**ACKers=0,  Workers=1**


Const->Id->Null | 1.x | v2-metrics | v2-no-cm
-- | -- | -- | --
elapsed (sec) | Throughput | Throughput | Throughput
60 | 553,881 | 557,223 | 294,525
120 | 644,973 | 741,814 | 391,532
180 | 355,677 | 419,571 | 298,619
240 | 126,078 | 157,305 | 289,647
300 | 79,578 | 85,359 | 317,680
360 | 50,575 | 48,993 | 201,650
420 | 32,841 | 25,075 | 136,033
480 | 17,437 | 13,695 | 56,827
540 | 9,572 | 6,871 | 67,272



TVL (rate=500k) | 1.x | v2-metrics | v2-no-cm
-- | -- | -- | --
elapsed (sec) | throughput | throughput | throughput
31 | 88,941.94 | 74,339.33 | 76,130.32
61 | 261,134.00 | 198,329.68 | 220,800.00
92 | 83,889.68 | 131,720.67 | 128,738.00
122 | 42,874.67 | 55,596.67 | 52,115.33
152 | 25,074.00 | 13,489.33 | 26,921.33
182 | 6,610.67 | -473,302.67 | 6,022.67
212 | 0 | 190,620.67 | -475,712.26
242 | -369,348.67 | 153,038.67 | 128,894.00
272 | 238,574.67 | 118,828.67 | 194,515.33
302 | 116,838.00 | 36,540.00 | 96,574.00
332 | 30,613.33 | 3,562.67 | 75,434.67
362 | 3,744.00 | -497,488.67 | 24,842.67
392 | -450,178.67 | 215,857.33 | 9,878.67
422 | 253,464.67 | 202,255.33 | 5,750.00
452 | 131,784.00 | 53,916.00 | 1,986.67
482 | 25,712.00 | 12,618.00 | -544,343.33
512 | 25,050.67 | 5,380.67 | 189,269.33
542 | 5,532.67 | -489,704.00 | 244,687.74
572 | 492 | 257,200.00 | 34,429.33
602 | -507,870.00 | 193,524.67 | 35,392.67
632 | 205,754.00 | 42,620.67 | 990.67


- In both topos and all binaries we see rapid degradation in throughput, 
eventually leading to the worker dying and getting restarted. As mentioned before, 
this is not a new concern for 1.x, but it is being addressed in Storm 2.0.
- The workers for the v2-no-cm binaries ran a little longer than the others 
before dying.



---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-21 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
It seems like there is growing consensus that performance is good.

Are there any objections to merging this?


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-19 Thread roshannaik
Github user roshannaik commented on the issue:

https://github.com/apache/storm/pull/2203
  
Just wanted to unblock things here... I will post summarized numbers from my 
runs by tomorrow. Here is a very brief summary.

I ran both TVL and ConstSpoutIdBoltNullBoltTopo (with and without ACKing) 
in single worker mode.

**With ACKING**: The latency and throughput numbers are slightly better 
for this patch.

**Without ACKING**: With both binaries, the worker dies & restarts every few 
minutes of running if we let the topo run without any throttling (a known issue 
that will get fixed in STORM-2306 for Storm 2.0). The numbers taken while the 
worker is running generally indicate that this patch is most likely slower with 
respect to peak throughput. IMO, for Storm 1.x this is not an issue, as at higher 
throughputs the worker is going to keep dying anyway. But for Storm 2.0, once we 
have STORM-2306, it will be possible to measure the throughput better, and a fix 
may be necessary.

In short, my runs indicate that things are looking good for this patch relative 
to Storm 1.x. We may need to revisit perf for Storm 2.0.

As mentioned before, I will post summarized numbers later.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-19 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
@arunmahadevan 
@roshannaik is going to do some performance tests. We could wait for his input 
for a day or a bit more.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-19 Thread arunmahadevan
Github user arunmahadevan commented on the issue:

https://github.com/apache/storm/pull/2203
  
+1 again.

@HeartSaVioR , based on the TVL numbers, I take it that the performance 
numbers (throughput and latency) are comparable to the 1.x branch. In that case, 
can we merge this patch? If we find any other issues during RC testing we can 
take a call.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-19 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
Again, I don't have much time to do a detailed perf test. I increased the rate 
to 10 (with max spout pending 5000), which none of the branches could even keep 
up with. I lowered the rate to 85000 (with max spout pending 5000), which they 
appear to sustain. Pasting raw values.

>> 1.2.0 (b8f76af)

1.

uptime | acked | acked/sec | failed | 99% | 99.9% | min | max | mean | 
stddev | user | sys | gc | mem
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
31 | 437,240 | 14,104.52 | 0 | 6,048,186,367 | 6,358,564,863 | 
1,769,996,288 | 6,715,080,703 | 4,566,189,266.37 | 885,893,684.01 | 162,030 | 
5,320 | 0 | 848.94
61 | 2,313,000 | 77,100.00 | 0 | 12,280,922,111 | 12,708,741,119 | 
4,894,752,768 | 13,186,891,775 | 8,458,374,872.10 | 1,323,986,260.94 | 303,800 
| 15,480 | 32,612 | 1,283.47
91 | 2,357,320 | 78,577.33 | 0 | 17,137,926,143 | 18,119,393,279 | 
7,637,827,584 | 18,605,932,543 | 11,163,048,118.57 | 2,259,041,300.64 | 303,190 
| 16,620 | 32,479 | 1,266.13
121 | 2,341,220 | 78,040.67 | 0 | 22,179,479,551 | 22,682,796,031 | 
9,277,800,448 | 22,984,785,919 | 13,752,645,112.96 | 3,421,493,870.62 | 300,820 
| 16,360 | 32,251 | 1,534.90
151 | 2,458,660 | 81,955.33 | 0 | 25,685,917,695 | 26,491,224,063 | 
11,333,009,408 | 27,514,634,239 | 16,290,749,165.18 | 4,629,241,671.60 | 
301,850 | 17,960 | 32,979 | 1,361.38
181 | 2,707,140 | 90,238.00 | 0 | 24,998,051,839 | 25,635,586,047 | 
9,554,624,512 | 26,675,773,439 | 16,656,627,804.48 | 4,532,406,841.16 | 314,300 
| 18,370 | 34,126 | 1,326.70
211 | 2,758,320 | 91,944.00 | 0 | 22,833,790,975 | 23,471,325,183 | 
6,601,834,496 | 24,729,616,383 | 15,588,408,898.78 | 4,512,253,905.57 | 309,680 
| 19,180 | 31,032 | 1,425.91
241 | 2,707,380 | 90,246.00 | 0 | 20,854,079,487 | 22,162,702,335 | 
3,615,490,048 | 23,018,340,351 | 13,927,540,232.07 | 4,854,717,857.60 | 308,130 
| 19,870 | 29,631 | 1,861.17
271 | 2,728,860 | 90,962.00 | 0 | 22,632,464,383 | 24,008,196,095 | 
845,152,256 | 25,216,155,647 | 12,358,302,921.24 | 5,927,445,795.86 | 306,950 | 
19,100 | 31,501 | 1,787.01
301 | 2,667,500 | 88,916.67 | 0 | 24,763,170,815 | 26,038,239,231 | 
17,498,112 | 27,447,525,375 | 11,183,029,224.17 | 7,159,481,478.40 | 308,470 | 
19,980 | 31,305 | 2,342.57


2.

uptime | acked | acked/sec | failed | 99% | 99.9% | min | max | mean | 
stddev | user | sys | gc | mem
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
30 | 578,720 | 19,290.67 | 0 | 7,402,946,559 | 7,662,993,407 | 266,469,376 
| 8,065,646,591 | 5,158,004,864.46 | 1,074,126,219.89 | 262,310 | 9,350 | 9,773 
| 1,602.33
60 | 2,372,600 | 79,086.67 | 0 | 11,668,553,727 | 12,054,429,695 | 
4,987,027,456 | 12,423,528,447 | 7,402,521,102.49 | 1,520,006,340.32 | 307,880 
| 14,850 | 33,679 | 1,237.30
90 | 2,437,040 | 81,234.67 | 0 | 15,216,934,911 | 15,653,142,527 | 
5,872,025,600 | 16,617,832,447 | 9,132,024,518.75 | 2,534,964,885.31 | 308,240 
| 15,100 | 35,179 | 754.82
120 | 2,543,240 | 84,774.67 | 0 | 16,475,226,111 | 16,944,988,159 | 
5,528,092,672 | 17,800,626,175 | 10,760,658,547.64 | 2,897,544,833.24 | 311,050 
| 17,130 | 33,127 | 1,370.09
150 | 2,636,720 | 87,890.67 | 0 | 14,310,965,247 | 14,982,053,887 | 
2,990,538,752 | 16,139,681,791 | 9,655,823,414.65 | 3,002,342,820.97 | 311,070 
| 18,610 | 32,034 | 1,531.01
180 | 2,752,440 | 91,748.00 | 0 | 14,571,012,095 | 15,997,075,455 | 
14,172,160 | 16,995,319,807 | 8,147,697,251.82 | 4,013,817,552.85 | 306,620 | 
19,910 | 29,342 | 1,275.41
210 | 2,713,300 | 90,443.33 | 0 | 15,032,385,535 | 16,022,241,279 | 
46,727,168 | 18,001,952,767 | 6,401,270,339.36 | 4,479,415,024.72 | 301,180 | 
20,950 | 27,750 | 1,848.82
240 | 2,692,920 | 89,764.00 | 0 | 15,820,914,687 | 16,827,547,647 | 
32,653,312 | 17,582,522,367 | 4,715,121,892.86 | 5,286,625,289.02 | 303,560 | 
21,070 | 27,904 | 1,181.89
271 | 2,726,920 | 87,965.16 | 0 | 14,579,400,703 | 15,619,588,095 | 
29,425,664 | 16,668,164,095 | 3,519,416,574.71 | 4,638,221,636.04 | 302,440 | 
22,290 | 27,634 | 2,023.50
301 | 2,769,580 | 92,319.33 | 0 | 7,050,625,023 | 7,931,428,863 | 7,737,344 
| 8,464,105,471 | 960,653,059.65 | 1,605,263,196.59 | 277,020 | 28,700 | 17,844 
| 1,250.71

3.

uptime | acked | acked/sec | failed | 99% | 99.9% | min | max | mean | 
stddev | user | sys | gc | mem
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
30 | 541,280 | 18,042.67 | 0 | 5,599,395,839 | 5,922,357,247 | 474,218,496 
| 6,257,901,567 | 4,213,619,216.15 | 685,663,910.05 | 157,680 | 5,270 | 0 | 
1,275.05
60 | 2,407,560 | 80,252.00 | 0 | 10,225,713,151 | 10,812,915,711 | 
3,972,005,888 | 11,307,843,583 | 6,705,396,513.71 | 1,066,545,879.38 | 302,660 
| 13,700 | 32,120 | 1,532.04
90 | 2,613,980 | 87,132.67 | 0 | 12,658,409,471 | 13,774,094,335 | 
5

[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-18 Thread roshannaik
Github user roshannaik commented on the issue:

https://github.com/apache/storm/pull/2203
  
So far it seems only throughput impact has been measured. Impact on latency 
(if any) appears to be unknown.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-18 Thread roshannaik
Github user roshannaik commented on the issue:

https://github.com/apache/storm/pull/2203
  
Without getting some perf numbers for TVL and ConstSpoutIdBoltNullBoltTopo 
with ACK=0 and workers=1 we won't know for sure if there are perf regressions.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-18 Thread arunmahadevan
Github user arunmahadevan commented on the issue:

https://github.com/apache/storm/pull/2203
  
@ptgoetz can you squash the commits and also add some user documentation 
for this?


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-18 Thread arunmahadevan
Github user arunmahadevan commented on the issue:

https://github.com/apache/storm/pull/2203
  
@ptgoetz @HeartSaVioR , given that @revans2 was OK with the patch and the 
performance concerns have been addressed, I suggest we merge this in and get 
the RC process for the Storm 1.2 release going. Meanwhile, if @revans2 has 
additional comments, we can address them before the final RC.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-17 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@revans2 Are there any additional changes you'd like to see for this? I'd 
like to move forward with a 1.2 release as well as start porting this to the 
master branch.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-12 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
@revans2 Just to be clear, which branch was used for your performance test: 
the current one or my updated branch? I just want to be sure whether things are 
OK with the current branch or whether we need to adopt the update.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
@ptgoetz 
Please review the change in HeartSaVioR/storm@16bfb84 and cherry-pick the 
commit if you think it's worth it. I'd really appreciate it if you could also 
spend time running the performance test on your side and share the numbers, so 
we can see whether we are happy with the patch (1.x-branch vs metrics_v2 vs 
metrics_v2_experimental_no_concurrentmap).

@revans2 
Could you review this again, including the performance perspective?


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
I'm trying to optimize more since the numbers don't make me happy.

I've applied pre-population of metrics (as @revans2 suggested) to avoid 
using ConcurrentHashMap.

https://github.com/HeartSaVioR/storm/commits/metrics_v2_experimental_no_concurrentmap

This commit 
(https://github.com/HeartSaVioR/storm/commit/16bfb84ebab98991fe4a8bd284dae9fe2bfe437d)
represents the difference from 7947a07.

And here are the performance test results from the same environment (the numbers 
may fluctuate, as we know):

> 1.2.0-SNAPSHOT (9fbe283) - NOTE: ran earlier, just copied from above 
result

>> 1st

uptime | acked | acked/sec | failed | 99% | 99.9% | min | max | mean | 
stddev | user | sys | gc | mem
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
30 | 552,480 | 18,416.00 | 0 | 4,114,612,223 | 4,731,174,911 | 38,928,384 | 
5,448,400,895 | 1,954,055,956.00 | 966,653,713.81 | 156,170 | 5,290 | 0 | 686.05
60 | 1,578,480 | 52,616.00 | 0 | 1,695,547,391 | 2,191,523,839 | 6,631,424 
| 2,736,783,359 | 144,862,518.23 | 373,282,085.72 | 192,290 | 27,900 | 5,208 | 
824.31
91 | 1,505,000 | 48,548.39 | 0 | 36,732,927 | 63,275,007 | 6,549,504 | 
92,340,223 | 14,052,122.48 | 5,284,329.78 | 172,290 | 30,260 | 2,240 | 794.81
121 | 1,504,400 | 50,146.67 | 0 | 33,505,279 | 55,902,207 | 6,701,056 | 
107,020,287 | 13,687,527.65 | 4,682,294.46 | 169,990 | 30,770 | 2,248 | 818.05
151 | 1,503,640 | 50,121.33 | 0 | 30,801,919 | 42,827,775 | 6,594,560 | 
62,357,503 | 13,450,356.75 | 4,049,872.91 | 171,380 | 30,030 | 2,217 | 891.86
181 | 1,504,220 | 50,140.67 | 0 | 31,604,735 | 46,366,719 | 6,414,336 | 
72,417,279 | 13,549,466.35 | 4,224,069.48 | 173,460 | 30,120 | 2,214 | 817.73
211 | 1,503,580 | 50,119.33 | 0 | 33,816,575 | 46,858,239 | 6,545,408 | 
68,681,727 | 13,536,288.26 | 4,442,740.15 | 169,950 | 30,330 | 2,346 | 759.54
241 | 1,503,360 | 50,112.00 | 0 | 33,947,647 | 46,792,703 | 6,488,064 | 
68,812,799 | 13,633,138.42 | 4,510,106.94 | 171,560 | 30,490 | 2,456 | 791.93
271 | 1,052,500 | 35,083.33 | 0 | 28,016,639 | 39,452,671 | 6,586,368 | 
58,949,631 | 13,142,065.09 | 3,481,304.94 | 170,870 | 29,260 | 2,113 | 795.18
301 | 1,504,320 | 50,144.00 | 0 | 35,454,975 | 60,129,279 | 6,696,960 | 
82,575,359 | 13,781,332.77 | 4,965,559.58 | 171,430 | 30,730 | 2,372 | 826.15

>> 2nd

uptime | acked | acked/sec | failed | 99% | 99.9% | min | max | mean | 
stddev | user | sys | gc | mem
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
31 | 547,120 | 17,649.03 | 0 | 4,546,625,535 | 5,049,942,015 | 139,984,896 
| 5,536,481,279 | 2,700,418,416.88 | 836,528,892.27 | 157,850 | 4,930 | 0 | 
845.24
61 | 1,584,360 | 52,812.00 | 0 | 3,078,619,135 | 3,839,885,311 | 6,590,464 
| 4,284,481,535 | 248,409,981.64 | 629,496,874.97 | 195,980 | 27,490 | 5,697 | 
874.53
91 | 1,504,680 | 50,156.00 | 0 | 35,258,367 | 53,706,751 | 5,910,528 | 
91,488,255 | 13,589,168.15 | 4,802,833.49 | 174,400 | 29,550 | 2,183 | 871.71
121 | 1,503,900 | 50,130.00 | 0 | 32,538,623 | 52,789,247 | 6,533,120 | 
100,466,687 | 13,482,307.07 | 4,543,434.30 | 186,560 | 32,360 | 2,524 | 802.76
151 | 1,503,860 | 50,128.67 | 0 | 32,735,231 | 52,854,783 | 5,849,088 | 
97,976,319 | 13,396,304.62 | 4,506,368.26 | 188,430 | 31,780 | 2,514 | 785.88
181 | 1,503,340 | 50,111.33 | 0 | 29,802,495 | 41,156,607 | 6,737,920 | 
63,209,471 | 13,106,089.43 | 3,755,495.86 | 172,070 | 29,540 | 2,151 | 959.27
211 | 1,503,740 | 50,124.67 | 0 | 33,423,359 | 52,625,407 | 6,549,504 | 
85,000,191 | 13,426,039.55 | 4,508,334.58 | 187,360 | 32,380 | 2,525 | 869.87
241 | 1,578,440 | 52,614.67 | 0 | 32,129,023 | 50,102,271 | 6,549,504 | 
88,276,991 | 13,265,778.42 | 4,265,565.26 | 172,610 | 29,340 | 2,308 | 928.26
271 | 1,503,060 | 50,102.00 | 0 | 30,457,855 | 45,678,591 | 6,565,888 | 
73,662,463 | 13,237,281.21 | 4,002,588.03 | 173,180 | 30,550 | 2,291 | 870.36
301 | 1,541,260 | 51,375.33 | 0 | 30,932,991 | 44,498,943 | 6,569,984 | 
67,010,559 | 13,223,793.77 | 4,002,853.30 | 172,220 | 29,830 | 2,302 | 1,032.96

>> 3rd

uptime | acked | acked/sec | failed | 99% | 99.9% | min | max | mean | 
stddev | user | sys | gc | mem
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
30 | 548,680 | 18,289.33 | 0 | 4,353,687,551 | 5,096,079,359 | 6,729,728 | 
5,742,002,175 | 1,426,311,406.48 | 1,230,114,275.09 | 232,920 | 14,020 | 4,188 
| 999.57
60 | 1,586,360 | 52,878.67 | 0 | 49,643,519 | 85,590,015 | 6,541,312 | 
142,082,047 | 15,200,355.35 | 7,698,991.41 | 180,110 | 33,300 | 2,347 | 1,068.73
90 | 1,504,920 | 50,164.00 | 0 | 35,487,743 | 54,067,199 | 6,561,792 | 
95,354,879 | 13,937,343.35 | 4,915,338.14 | 177,210 | 33,140 | 2,256 | 929.59
120 | 1,504,980 | 50,166.00 | 0 | 34,340,863 | 54,558,719 | 6,60

[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
Yes, my report mainly shows that the critical perf issue observed by 
Bobby has been resolved, but we may still need to try out more if we want to go 
with a detailed comparison.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-11 Thread roshannaik
Github user roshannaik commented on the issue:

https://github.com/apache/storm/pull/2203
  
@HeartSaVioR to see the perf impact on throughput, you want to find the peak 
ability by setting a high rate... then see the highest rate each binary 
achieves. Here both seem able to easily achieve the 50k rate, and thereafter the 
spout begins to throttle.

The latency impact is not recorded in these readings... it is key IMO.

At a glance: the memory impact seems high. CPU utilization and 
GC activity have also increased... not sure by what %.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
I have a perf test report to share. Please note that it mostly shares raw 
numbers rather than analysis.

OS: Ubuntu 17.10, 4.13.0-25-generic #29-Ubuntu SMP Mon Jan 8 21:14:41 UTC 
2018
CPU: AMD Ryzen 5 1600 3.2Ghz 6 core (with hyper-thread = 12 logical cores)
RAM: Samsung DDR4 32G 19200
SSD: Samsung 850 Evo

Ran TVL, rate 5, max spout pending 5000, other parameters at default.
The values fluctuate a bit, so it would be better to just look at the overall 
trend instead of a number-to-number comparison.

> 1.2.0-SNAPSHOT (9fbe283)

>> 1st

uptime | acked | acked/sec | failed | 99% | 99.9% | min | max | mean | 
stddev | user | sys | gc | mem
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
30 | 552,480 | 18,416.00 | 0 | 4,114,612,223 | 4,731,174,911 | 38,928,384 | 
5,448,400,895 | 1,954,055,956.00 | 966,653,713.81 | 156,170 | 5,290 | 0 | 686.05
60 | 1,578,480 | 52,616.00 | 0 | 1,695,547,391 | 2,191,523,839 | 6,631,424 
| 2,736,783,359 | 144,862,518.23 | 373,282,085.72 | 192,290 | 27,900 | 5,208 | 
824.31
91 | 1,505,000 | 48,548.39 | 0 | 36,732,927 | 63,275,007 | 6,549,504 | 
92,340,223 | 14,052,122.48 | 5,284,329.78 | 172,290 | 30,260 | 2,240 | 794.81
121 | 1,504,400 | 50,146.67 | 0 | 33,505,279 | 55,902,207 | 6,701,056 | 
107,020,287 | 13,687,527.65 | 4,682,294.46 | 169,990 | 30,770 | 2,248 | 818.05
151 | 1,503,640 | 50,121.33 | 0 | 30,801,919 | 42,827,775 | 6,594,560 | 
62,357,503 | 13,450,356.75 | 4,049,872.91 | 171,380 | 30,030 | 2,217 | 891.86
181 | 1,504,220 | 50,140.67 | 0 | 31,604,735 | 46,366,719 | 6,414,336 | 
72,417,279 | 13,549,466.35 | 4,224,069.48 | 173,460 | 30,120 | 2,214 | 817.73
211 | 1,503,580 | 50,119.33 | 0 | 33,816,575 | 46,858,239 | 6,545,408 | 
68,681,727 | 13,536,288.26 | 4,442,740.15 | 169,950 | 30,330 | 2,346 | 759.54
241 | 1,503,360 | 50,112.00 | 0 | 33,947,647 | 46,792,703 | 6,488,064 | 
68,812,799 | 13,633,138.42 | 4,510,106.94 | 171,560 | 30,490 | 2,456 | 791.93
271 | 1,052,500 | 35,083.33 | 0 | 28,016,639 | 39,452,671 | 6,586,368 | 
58,949,631 | 13,142,065.09 | 3,481,304.94 | 170,870 | 29,260 | 2,113 | 795.18
301 | 1,504,320 | 50,144.00 | 0 | 35,454,975 | 60,129,279 | 6,696,960 | 
82,575,359 | 13,781,332.77 | 4,965,559.58 | 171,430 | 30,730 | 2,372 | 826.15

>> 2nd

uptime | acked | acked/sec | failed | 99% | 99.9% | min | max | mean | 
stddev | user | sys | gc | mem
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
31 | 547,120 | 17,649.03 | 0 | 4,546,625,535 | 5,049,942,015 | 139,984,896 
| 5,536,481,279 | 2,700,418,416.88 | 836,528,892.27 | 157,850 | 4,930 | 0 | 
845.24
61 | 1,584,360 | 52,812.00 | 0 | 3,078,619,135 | 3,839,885,311 | 6,590,464 
| 4,284,481,535 | 248,409,981.64 | 629,496,874.97 | 195,980 | 27,490 | 5,697 | 
874.53
91 | 1,504,680 | 50,156.00 | 0 | 35,258,367 | 53,706,751 | 5,910,528 | 
91,488,255 | 13,589,168.15 | 4,802,833.49 | 174,400 | 29,550 | 2,183 | 871.71
121 | 1,503,900 | 50,130.00 | 0 | 32,538,623 | 52,789,247 | 6,533,120 | 
100,466,687 | 13,482,307.07 | 4,543,434.30 | 186,560 | 32,360 | 2,524 | 802.76
151 | 1,503,860 | 50,128.67 | 0 | 32,735,231 | 52,854,783 | 5,849,088 | 
97,976,319 | 13,396,304.62 | 4,506,368.26 | 188,430 | 31,780 | 2,514 | 785.88
181 | 1,503,340 | 50,111.33 | 0 | 29,802,495 | 41,156,607 | 6,737,920 | 
63,209,471 | 13,106,089.43 | 3,755,495.86 | 172,070 | 29,540 | 2,151 | 959.27
211 | 1,503,740 | 50,124.67 | 0 | 33,423,359 | 52,625,407 | 6,549,504 | 
85,000,191 | 13,426,039.55 | 4,508,334.58 | 187,360 | 32,380 | 2,525 | 869.87
241 | 1,578,440 | 52,614.67 | 0 | 32,129,023 | 50,102,271 | 6,549,504 | 
88,276,991 | 13,265,778.42 | 4,265,565.26 | 172,610 | 29,340 | 2,308 | 928.26
271 | 1,503,060 | 50,102.00 | 0 | 30,457,855 | 45,678,591 | 6,565,888 | 
73,662,463 | 13,237,281.21 | 4,002,588.03 | 173,180 | 30,550 | 2,291 | 870.36
301 | 1,541,260 | 51,375.33 | 0 | 30,932,991 | 44,498,943 | 6,569,984 | 
67,010,559 | 13,223,793.77 | 4,002,853.30 | 172,220 | 29,830 | 2,302 | 1,032.96

>> 3rd

uptime | acked | acked/sec | failed | 99% | 99.9% | min | max | mean | 
stddev | user | sys | gc | mem
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
30 | 548,680 | 18,289.33 | 0 | 4,353,687,551 | 5,096,079,359 | 6,729,728 | 
5,742,002,175 | 1,426,311,406.48 | 1,230,114,275.09 | 232,920 | 14,020 | 4,188 
| 999.57
60 | 1,586,360 | 52,878.67 | 0 | 49,643,519 | 85,590,015 | 6,541,312 | 
142,082,047 | 15,200,355.35 | 7,698,991.41 | 180,110 | 33,300 | 2,347 | 1,068.73
90 | 1,504,920 | 50,164.00 | 0 | 35,487,743 | 54,067,199 | 6,561,792 | 
95,354,879 | 13,937,343.35 | 4,915,338.14 | 177,210 | 33,140 | 2,256 | 929.59
120 | 1,504,980 | 50,166.00 | 0 | 34,340,863 | 54,558,719 | 6,602,752 | 
88,473,599 | 13,56

[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-11 Thread roshannaik
Github user roshannaik commented on the issue:

https://github.com/apache/storm/pull/2203
  
@ptgoetz, I suggest measuring the perf impact with ACKers=0 as well (with 
workers=1). Having ACKing enabled or inter-worker communication going on hides 
many perf issues.

In addition to TVL, it is important to measure perf with other topos like 
[ConstSpoutNullBoltTopo](https://github.com/apache/storm/blob/master/examples/storm-perf/src/main/java/org/apache/storm/perf/ConstSpoutNullBoltTopo.java)
 and 
[ConstSpoutIdBoltNullBoltTopo.java](https://github.com/apache/storm/blob/master/examples/storm-perf/src/main/java/org/apache/storm/perf/ConstSpoutIdBoltNullBoltTopo.java).
  

The ConstSpout* topos can expose perf issues that TVL might not... and vice 
versa. 


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
I'm trying to run the performance test on Ubuntu 17.10, 
4.13.0-25-generic #29-Ubuntu SMP Mon Jan 8 21:14:41 UTC 2018.

I applied the latest security patches today, which I guess may include 
patches for Meltdown and Spectre. The machine may be less affected than 
others since its CPU is a Ryzen, as far as I follow the discussion on the 
Linux kernel (only regarding Meltdown... I'm not following the mailing list, but 
I read the relevant thread since it was an interesting one).

Btw, both builds will run in the same environment, so there shouldn't be 
much difference between the two sets of results, I think.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-11 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
When reporting benchmark results, we should include OS patch level. The 
recent wave of patches will likely mess with benchmarks.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
We would need to run the performance test again, but besides that I'm +1 on 
the change.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-11 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@revans2 Could you take another look when you have a chance? 


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-10 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@revans2 @HeartSaVioR Moved to `StringBuilder` and replaced executorId with 
taskId.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-10 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@HeartSaVioR @revans2 

One obvious optimization worth trying is replacing `String.format()` with a 
`StringBuilder`. `String.format()` is cleaner visually, but much slower.

I'll make that change and see where it gets us. I'm open to any additional 
ideas as well.
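
A minimal sketch of the kind of change meant here, using the `storm.worker` name format quoted elsewhere in this thread (class, method, and variable names are illustrative, not the actual `StormMetricRegistry` code):

```java
public class MetricNameBuilder {

    // Current style: concise, but String.format() re-parses the format string on every call.
    static String formatName(String stormId, String hostName, String componentId,
                             int workerPort, String name) {
        return String.format("storm.worker.%s.%s.%s.%s-%s",
                stormId, hostName, componentId, workerPort, name);
    }

    // StringBuilder equivalent: produces the same string without the format-string parsing.
    static String buildName(String stormId, String hostName, String componentId,
                            int workerPort, String name) {
        return new StringBuilder("storm.worker.")
                .append(stormId).append('.')
                .append(hostName).append('.')
                .append(componentId).append('.')
                .append(workerPort).append('-')
                .append(name)
                .toString();
    }
}
```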


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-10 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2203
  
@HeartSaVioR as far as task ids vs executor ids go: there are different ways 
of doing this for each of the two metrics systems. For the built-in zookeeper 
stats they are for the range `[start-task-id, end-task-id]` 


https://github.com/apache/storm/blob/466a7ad74da27c1250eedf412a487db409e42c19/storm-client/src/storm.thrift#L540

For the current metrics system they are for a single task, not an executor.


https://github.com/apache/storm/blob/466a7ad74da27c1250eedf412a487db409e42c19/storm-client/src/jvm/org/apache/storm/metric/api/IMetricsConsumer.java#L40

So this means that for things like queues that are specific to an executor, 
and not a task, the metrics may be odd in that the first task in an executor 
would get all of the metrics and the others would get little or nothing.

I agree that we probably want to have them be by component id and not 
executor id.  We can put them back together into executors when we query the 
data.



---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-09 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
@revans2 @ptgoetz 
Btw, shouldn't we pass the task id instead of the executor id? An executor 
has tasks and we are providing metrics from each task, not from the executor. 
Also, the format of the executor id is [`start task id` `end task id`], which 
doesn't make sense here either.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-09 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
@revans2 @ptgoetz 
Looks like string concatenation matters a lot. Maybe we should spend some time 
optimizing here.

One sketched idea: if the metric name is deterministic at least without the 
stream id and the metric type (acked, etc.), we might be able to cache the metric 
name prefix for each executor so that we can apply the stream id and metric type 
afterwards. The mapping from streamId/metric type to the full metric name can be 
cached in a map in the executor too. The question then becomes a one-level HashMap 
lookup keyed by a string concatenation of (type + streamId), or a two-level 
HashMap lookup (type -> streamId), vs. doing the String concatenation every time.

The result of dotToUnderscore should also be cached instead of being computed 
every time.

Not sure how much it helps, but we could have an executor-level metric 
instance map (cache) instead of completely relying on the Codahale REGISTRY, if 
that helps.

cc. @roshannaik since he is working on optimizations and might have some 
techniques to apply.
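
To make the idea a bit more concrete, here is a minimal sketch (hypothetical class and method names, not actual Storm code; a plain HashMap is assumed to be enough if lookups stay on the executor's own thread):

```java
import java.util.HashMap;
import java.util.Map;

// Per-executor cache: the prefix is built once, and the full name for a given
// (metric type, stream id) pair is concatenated only on the first lookup.
class ExecutorMetricNameCache {
    private final String prefix; // e.g. "storm.worker.<topoId>.<host>.<componentId>.<port>"
    private final Map<String, String> names = new HashMap<>();

    ExecutorMetricNameCache(String prefix) {
        this.prefix = prefix;
    }

    String nameFor(String metricType, String streamId) {
        String key = metricType + '/' + streamId;
        String full = names.get(key);
        if (full == null) {
            // the dot-to-underscore result is effectively cached as part of the full name
            full = prefix + '.' + streamId.replace('.', '_') + '-' + metricType;
            names.put(key, full);
        }
        return full;
    }
}
```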


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-09 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2203
  
I just tried it again with 
44cd8ac on 1a03093 and got the same results.  A maximum throughput of about 
30,000 sentences per second.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-09 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2203
  
The changes look good and I am +1 on where they are at now.

I am a little concerned about the performance differences though.  I did a 
quick test with 1.x-branch (1a030939094e154a6b6d9d6b7fdf4abafd14d5b6) vs this 
(8af4fcac3) on 1.x-branch.

The numbers still do not look good, and I was hoping that others might run 
some tests too, just to see if they can reproduce my issues.

For ThroughputVsLatency with topology.max.spout.pending=500 I was able to 
do 50,000 sentences/second without much difficulty and could just get away with 
60,000. This was using the default parallelism etc.

```
./bin/storm jar storm-starter-1.2.0-SNAPSHOT.jar 
org.apache.storm.starter.ThroughputVsLatency 5 -c 
topology.max.spout.pending=500
```

But with the new code it maxes out at about 30,000 sentences per second 
with the same settings.  I have run it multiple times to be sure it is not a 
fluke.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-09 Thread srdo
Github user srdo commented on the issue:

https://github.com/apache/storm/pull/2203
  
Sorry about not responding earlier. Thanks for addressing my comments.

I skimmed the comments, and found a question that might have been missed 
https://github.com/apache/storm/pull/2203/files#r155645156. It's not that I 
think we necessarily have to make a change there, but I'd like to understand 
whether we're trying to guard against misconfiguration or something else.

Other than that nit this looks great. +1.

Since it's come up a few times, will documentation go here or in a later PR?


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-08 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
@revans2 I'm really sorry to bother you, but given that this is coupled 
with releasing Storm 1.2.0, I'd like to ask a favor: could you handle this issue 
ahead of your others?


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2018-01-05 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
@revans2 Friendly reminder. I think your confirmation is required to merge 
this in, because I guess STORM-2156 is coupled with this patch (and maybe vice 
versa).


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-12-22 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@srdo Thanks for the review. I think I've addressed your comments in the 
latest commit.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-12-22 Thread srdo
Github user srdo commented on the issue:

https://github.com/apache/storm/pull/2203
  
I had a couple of comments that weren't addressed. I'm mainly worried about 
leaking resources in tests/local mode 
https://github.com/apache/storm/pull/2203#discussion_r155637510, 
https://github.com/apache/storm/pull/2203#discussion_r155642325.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-12-22 Thread arunmahadevan
Github user arunmahadevan commented on the issue:

https://github.com/apache/storm/pull/2203
  
Hi @revans2 , do you have any additional comments or are we good to merge 
this ?


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-12-20 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@arunmahadevan Rebased.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-12-20 Thread arunmahadevan
Github user arunmahadevan commented on the issue:

https://github.com/apache/storm/pull/2203
  
@ptgoetz , can you rebase ? 
@revans2 , can you take a look again so that we can get this in 1.2 ?


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-12-18 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
@ptgoetz OK, got it. Maybe we could reconsider it if the separation by stream ID 
turns out not to be really needed.
@revans2 Could you take a look at this again?


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-12-14 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
> Looks like we compose the metric name and look it up from the REGISTRY every 
time, even without the executor ID and stream ID. I can see more computation is 
needed after addressing that, but I'm not sure how much it affects performance. 
If we could also memoize the metric name per combination of parameters it might 
help, but I'm also not sure how much it would help.

Prior to adding stream ID, we could store the metric as a variable and 
reuse it without having to do a lookup on each use. Adding stream ID required 
(unless I'm missing something) doing the lookup on every use. There might be 
additional optimizations, but because metrics names are composed of several 
fields, some level of string concatenation is unavoidable. For example, we 
could try to optimize lookups by caching metrics with metric name as the key, 
but we would still have to do concatenation to create the lookup key.

I suppose we could create a `MetricName` class to serve as the cache key, 
but we'd be trading string concatenation for object creation (unless we make 
`MetricName` mutable with `setX` methods).
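
For illustration, a key class along these lines (the choice of fields is just a guess at what would go into the key; each lookup still allocates one small object, which is the trade-off mentioned above):

```java
import java.util.Objects;

final class MetricName {
    private final String componentId;
    private final int taskId;
    private final String streamId;
    private final String name;

    MetricName(String componentId, int taskId, String streamId, String name) {
        this.componentId = componentId;
        this.taskId = taskId;
        this.streamId = streamId;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MetricName)) return false;
        MetricName other = (MetricName) o;
        return taskId == other.taskId
                && componentId.equals(other.componentId)
                && streamId.equals(other.streamId)
                && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(componentId, taskId, streamId, name);
    }
}
```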

> Regarding issuing warning on name, +1 on your approach. Looks nice!

Thanks for the feedback. Implemented in latest commit.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-12-14 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
@ptgoetz 
Looks like we compose the metric name and look it up from the REGISTRY every 
time, even without the executor ID and stream ID. I can see more computation is 
needed after addressing that, but I'm not sure how much it affects performance. 
If we could also memoize the metric name per combination of parameters it might 
help, but I'm also not sure how much it would help.

Regarding issuing warning on name, +1 on your approach. Looks nice!


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-12-13 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@revans2 @HeartSaVioR 

I added stream id and executor id to the metrics names, and implemented 
replacing "." with "_".

One consequence of adding stream id was having to get-or-create metrics on 
the fly, as opposed to creating them up-front. That means there will be a 
lookup, string concat, and string replacement on each metrics update, which 
could affect performance. (Unless I'm missing something obvious, we have to do 
it that way because we don't know the stream IDs ahead of time and have to 
instantiate metrics as we see them).

As far as issuing warnings when names contain ".", one option would be to 
handle the warnings in an `ITopologyValidator` instance. We could have 
`DefaultTopologyValidator` log warnings when certain names contain a ".". We 
could also provide something along the lines of a `StrictTopologyValidator` 
that throws an `InvalidTopologyException` as opposed to just warning. That might 
be an easy way to transition from warning to error that also gives users a way to 
turn strict checking on or off.
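
A sketch of what the strict variant might look like (the exact `ITopologyValidator` method signatures here are from memory and may differ; treat this as intent rather than a drop-in implementation):

```java
import java.util.Map;
import org.apache.storm.generated.InvalidTopologyException;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.nimbus.ITopologyValidator;

// Rejects names containing ".", since "." is used as the metrics path delimiter.
public class StrictTopologyValidator implements ITopologyValidator {

    @Override
    public void prepare(Map stormConf) {
        // nothing to prepare
    }

    @Override
    public void validate(String topologyName, Map topologyConf, StormTopology topology)
            throws InvalidTopologyException {
        checkName("Topology", topologyName);
        for (String spoutId : topology.get_spouts().keySet()) {
            checkName("Spout", spoutId);
        }
        for (String boltId : topology.get_bolts().keySet()) {
            checkName("Bolt", boltId);
        }
    }

    private void checkName(String kind, String id) throws InvalidTopologyException {
        if (id.contains(".")) {
            throw new InvalidTopologyException(kind + " name '" + id
                    + "' contains '.', which is reserved as the metrics path delimiter.");
        }
    }
}
```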

What do you think about that approach?


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-12-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
I'm +1 on restricting names to avoid the replacement. One thing to consider is 
when what we call `later` would be; personally I think a major version is the 
safest place to make a change like this, so 2.0.0 looks like a good milestone to 
address it.

Could we poll how many users are using an underscore in one of (component 
name, stream name, topology name - also hostname?) via the user group, dev group 
and the official Twitter account, and apply a smoother migration only to the 1.x 
version if the affected group is small enough?


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-12-08 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2203
  
@ptgoetz I think that is a good start.  Not sure if users would be upset 
that their bolt B.1 now looks like B_1.  We might have issues though if someone 
has both a B.1 bolt and a B_1 bolt.  It might serve them right for doing 
something so dumb, but could end up being a really painful situation to try and 
clean up.  I would rather have us look into restricting names for bolts, 
spouts, and streams.  We could just start out with a warning saying metrics 
might be messed up if you violate this, and later it becomes an error.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-12-08 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@revans2 Thanks for clarifying. I have a better understanding of what 
you're saying now.

How do you feel about just replacing "." with "_" on all metrics path 
components (host name, componentId, etc.)? That would ensure a path could be 
reliably split on the "." character.

I'll add stream id and executor id to the path in places where they are 
available.



---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-12-08 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2203
  
@ptgoetz for me this is all about the metadata for the metric.

The code we have been working on for storing metrics in a DB that we can 
then use for scheduling is getting close to being done.  We have a data 
structure like the following. (NOTE still a work in progress)

```
struct WorkerMetric {
  1: required string metricName;
  2: required i64 timestamp;
  3: required double metricValue;
  4: required string componentId;
  5: required string executorId;
  6: required string streamId;
}

struct WorkerMetricList {
  1: list<WorkerMetric> metrics;
}

struct WorkerMetrics {
  1: required string topologyId;
  2: required i32 port;
  3: required string hostname;
  4: required WorkerMetricList metricList;
}
```

Having the metadata separated out for topology, host, port, etc. allows us to 
easily ask questions like how many messages were emitted by all instances of 
bolt "B" on streamId "default" for the topology "T". I assume people using 
Ganglia or Nagios want to do the same thing without having to set up a separate 
metric for every possible bolt/spout/hostname/port combination.

Almost every metrics system I have seen supports these kinds of metadata or 
tags, but almost none of the metrics APIs I have seen support this (dropwizard 
included).  As such we need a way to parse these values out of the metric name 
in a reliable way.  Making the name more configurable does not help me in this. 
 In fact it makes it much more difficult.

The current proposed format is:
```
String.format("storm.worker.%s.%s.%s.%s-%s", stormId, hostName, 
componentId, workerPort, name);
```

It has most of the information we need.  It is missing the stream id and 
executor id, which for some internal metrics we really need if we are to 
replace the metrics on the UI and reduce the load on zookeeper.

But the issue for me is how do I get the metadata back out in a reliable 
way?  topologyid is simple to parse out because there is no '.' character 
allowed.  hostName is going to have '.' in it, so I don't know what to do 
there.  ComponentId and streamId I think are totally open so again I am in 
trouble no matter what I do, and then the name itself we could do some 
restrictions on to make it simpler.

All I want is some API that can take a metric name and reliably return to 
me the fields that we put into creating the name.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-12-08 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@revans2 (cc @HeartSaVioR ) Regarding metadata, parseable metrics 
names/paths, etc. what do you think of the following approach?

In a nutshell, make everything configurable (with sane defaults), something 
along the lines of:

- METRICS_V2_PREFIX (String prepended to all metrics paths, replacing the 
hard-coded "storm.worker", etc. in the current code)
- METRICS_V2_PATH_DELIMITER (String/character used to separate metrics path 
components, replacing the hard-coded "." in the current code)
- METRICS_V2_INVALID_NAME_REGEX (Regex that checks user-supplied metric 
names for disallowed characters. Would be used to prevent users from 
inadvertently breaking up a path, for example by putting a "." in a metric name 
when that's used as the delimiter.)
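
As an illustration only (the config keys and defaults below are made up, not committed names), the three settings could hang together roughly like this:

```java
import java.util.Map;
import java.util.regex.Pattern;

// Builds metric names from a configurable prefix and delimiter, and rejects
// user-supplied names that match the "invalid name" regex.
class ConfigurableMetricNames {
    private final String prefix;        // METRICS_V2_PREFIX
    private final String delimiter;     // METRICS_V2_PATH_DELIMITER
    private final Pattern invalidName;  // METRICS_V2_INVALID_NAME_REGEX

    ConfigurableMetricNames(Map<String, Object> conf) {
        this.prefix = (String) conf.getOrDefault("topology.metrics.v2.prefix", "storm.worker");
        this.delimiter = (String) conf.getOrDefault("topology.metrics.v2.path.delimiter", ".");
        this.invalidName = Pattern.compile(
                (String) conf.getOrDefault("topology.metrics.v2.invalid.name.regex", "[.]"));
    }

    String name(String stormId, String hostName, String componentId, int port, String metricName) {
        if (invalidName.matcher(metricName).find()) {
            throw new IllegalArgumentException("Metric name '" + metricName
                    + "' contains characters reserved for the metrics path");
        }
        return String.join(delimiter, prefix, stormId, hostName, componentId,
                String.valueOf(port)) + "-" + metricName;
    }
}
```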

Does that seem reasonable?


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-12-07 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
In case it was missed, we have unaddressed comments here:

https://github.com/apache/storm/blob/00a382b017c1e29863ac4d9a4449086ef79384e4/storm-core/src/jvm/org/apache/storm/metrics2/StormMetricRegistry.java#L133-L135

and @revans2 raised a point regarding metric metadata, which IMHO looks 
like a non-blocker for the patch, but if it makes sense we should file a 
follow-up issue.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-12-07 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
@arunmahadevan 
I think there are some things not addressed in this PR, and I don't think we 
can merge before adding documentation and having a patch for master.

@ptgoetz 
Could you revisit this and address the remaining comments, or file issues if 
they can be handled with follow-up patches? If you don't think they should be 
addressed, could you leave comments so that we can discuss and make decisions?

@revans2 
Could you revisit this and sort out remaining things from your side?


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-12-07 Thread arunmahadevan
Github user arunmahadevan commented on the issue:

https://github.com/apache/storm/pull/2203
  
+1. @ptgoetz @HeartSaVioR, are there any more changes planned in this 
patch? If not, can we look at merging ASAP so that it can be included in 1.2.0?


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-29 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
@ptgoetz @revans2 Addressed the bug (applying the sample rate), which is not 
expected to affect the performance.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-29 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@HeartSaVioR Agreed/understood. We can squash on merge and clean up the commit 
messages.

This is a feature branch. You can commit directly if you want.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-29 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
@revans2 
I'm OK with deprecating the old metrics API in 2.0 and removing it in 3.0. The 
more important aspect for me is the follow-up patches from @abellina.

As I stated earlier, the patch only migrates a small part of the existing 
metrics to the new metrics. In the master branch we should migrate all the 
metrics internally used in the Storm codebase to the new metrics (do we want to 
use a deprecated API ourselves?), and then the built-in metrics will not be 
compatible with the current metrics transfer mechanism (adding to the heartbeat). 

I expect the patches from @abellina will address this point. Right?


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-29 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
@ptgoetz These changes were only for measurement, which is why I marked the 
commit titles as WIP. You can just squash the two commits into one and feel free 
to change the author (so that we could squash all the commits into one: I don't 
like having `address review comments` in master's commit list).

Btw, as someone may notice, I didn't handle multiplying the value accordingly 
(based on the sample rate), so there is a bug in the code. I'll address it and 
provide a single commit which covers all my work.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-29 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@HeartSaVioR Thanks for the update. I pulled your changes into the 
metrics_v2 branch.

@revans2 I'll start working on naming conventions and disallowing certain 
delimiter characters. If you have any sample paths/names that illustrate your 
thinking that would be helpful. 


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-29 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2203
  
The new overhead for metrics_v2_replace_meters_to_counters_with_sampling 
looks like it is in an acceptable range for me: right about a 1% difference 
for latency (0.97% to 1.29%) and right about a 1.8% to 1.9% difference for 
CPU usage. It is also good to see that even if we turn off the sub-sampling 
there is not too much of an increase in the overhead.

I would be +1 for merging this in (from a performance perspective; I still 
need to look at the code in more depth), assuming that we can get similar 
results for 2.x and that there is a plan to have the old API deprecated in 2.x 
and completely removed in a 3.x release.

We also need to make sure that we have the naming convention figured out or 
some kind of tag support so that we can properly do rollups for different types 
of metrics. 


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-27 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
The difference in CPU usage and latency is reduced again, which looks closer.
(We need to be aware that test results from the same branch fluctuate a bit.)


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-27 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
I created another branch for sampling:

https://github.com/HeartSaVioR/storm/tree/metrics_v2_replace_meters_to_counters_with_sampling

> metrics_v2_replace_meters_to_counters_with_sampling

```
uptime:  211 acked: 1,578,860 acked/sec:  52,628.67 failed:0 99%:   
   32,849,919 99.9%:  48,136,191 min:   6,545,408 max:  81,461,247 
mean:   13,719,507.76 stddev:4,401,893.83 user:174,310 sys: 34,720 
gc:  2,159 mem: 777.26
uptime:  241 acked: 1,541,220 acked/sec:  51,374.00 failed:0 99%:   
   34,078,719 99.9%:  79,495,167 min:   5,988,352 max: 142,737,407 
mean:   13,692,679.51 stddev:5,447,765.36 user:175,140 sys: 33,960 
gc:  2,189 mem: 864.25
uptime:  271 acked: 1,503,360 acked/sec:  50,112.00 failed:0 99%:   
   31,670,271 99.9%:  50,724,863 min:   6,541,312 max:  86,638,591 
mean:   13,532,495.54 stddev:4,225,220.91 user:175,360 sys: 34,390 
gc:  2,060 mem: 751.39
```

```
uptime:  211 acked: 1,503,880 acked/sec:  50,129.33 failed:0 99%:   
   36,241,407 99.9%:  46,759,935 min:   6,434,816 max:  72,155,135 
mean:   14,540,674.05 stddev:5,200,081.08 user:174,870 sys: 35,740 
gc:  2,523 mem: 846.15
uptime:  241 acked: 1,503,400 acked/sec:  50,113.33 failed:0 99%:   
   41,746,431 99.9%:  68,288,511 min:   6,590,464 max: 137,232,383 
mean:   14,784,902.94 stddev:6,220,659.93 user:173,800 sys: 36,400 
gc:  2,593 mem: 896.85
uptime:  271 acked: 1,504,120 acked/sec:  50,137.33 failed:0 99%:   
   35,880,959 99.9%:  49,971,199 min:   6,553,600 max:  77,070,335 
mean:   14,223,920.20 stddev:4,932,482.39 user:175,290 sys: 35,930 
gc:  2,380 mem: 886.29
```

```
uptime:  210 acked: 1,616,860 acked/sec:  53,895.33 failed:0 99%:   
   32,767,999 99.9%:  48,955,391 min:   6,643,712 max:  73,727,999 
mean:   13,798,011.67 stddev:4,442,751.14 user:176,310 sys: 34,800 
gc:  2,450 mem: 932.71
uptime:  240 acked: 1,503,640 acked/sec:  50,121.33 failed:0 99%:   
   34,897,919 99.9%:  53,379,071 min:   6,475,776 max:  75,366,399 
mean:   14,001,232.82 stddev:4,850,826.78 user:176,380 sys: 34,880 
gc:  2,568 mem: 921.85
uptime:  270 acked: 1,503,340 acked/sec:  50,111.33 failed:0 99%:   
   33,406,975 99.9%:  45,711,359 min:   6,610,944 max:  65,765,375 
mean:   14,058,020.45 stddev:4,589,290.14 user:177,140 sys: 35,920 
gc:  2,586 mem: 989.75
```

> 1.2.0-SNAPSHOT

```
uptime:  210 acked: 1,503,120 acked/sec:  50,104.00 failed:0 99%:   
   36,012,031 99.9%:  54,329,343 min:   6,586,368 max:  86,179,839 
mean:   13,866,565.45 stddev:4,940,470.24 user:175,790 sys: 32,220 
gc:  2,463 mem: 848.33
uptime:  240 acked: 1,503,520 acked/sec:  50,117.33 failed:0 99%:   
   32,571,391 99.9%:  48,988,159 min:   6,623,232 max:  74,448,895 
mean:   13,639,588.48 stddev:4,414,394.82 user:173,890 sys: 32,870 
gc:  2,435 mem: 929.12
uptime:  270 acked: 1,503,320 acked/sec:  50,110.67 failed:0 99%:   
   30,834,687 99.9%:  43,778,047 min:   6,488,064 max:  64,520,191 
mean:   13,545,641.23 stddev:4,120,231.64 user:176,190 sys: 32,110 
gc:  2,409 mem: 904.40
```

```
uptime:  210 acked: 1,503,540 acked/sec:  50,118.00 failed:0 99%:   
   37,421,055 99.9%:  61,341,695 min:   6,582,272 max:  85,393,407 
mean:   14,000,161.69 stddev:5,317,874.33 user:172,530 sys: 33,640 
gc:  2,455 mem:   1,001.10
uptime:  241 acked: 1,541,460 acked/sec:  49,724.52 failed:0 99%:   
   39,157,759 99.9%:  61,603,839 min:   6,426,624 max: 101,056,511 
mean:   14,153,442.96 stddev:5,630,083.75 user:173,010 sys: 33,280 
gc:  2,617 mem:   1,092.48
uptime:  271 acked: 1,578,560 acked/sec:  52,618.67 failed:0 99%:   
   39,780,351 99.9%:  70,909,951 min:   5,787,648 max: 119,734,271 
mean:   14,350,305.70 stddev:5,916,972.37 user:174,360 sys: 32,960 
gc:  2,569 mem:   1,057.77
```

```
uptime:  210 acked: 1,541,480 acked/sec:  51,382.67 failed:0 99%:   
   32,309,247 99.9%:  46,628,863 min:   6,500,352 max:  79,888,383 
mean:   13,589,491.43 stddev:4,288,111.92 user:171,970 sys: 33,360 
gc:  2,253 mem: 716.99
uptime:  241 acked: 1,540,900 acked/sec:  49,706.45 failed:0 99%:   
   34,701,311 99.9%:  52,396,031 min:   6,639,616 max:  92,602,367 
mean:   13,937,783.46 stddev:4,782,

[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-27 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2203
  
I did a quick analysis of the latency vs CPU usage for the branch from 
@HeartSaVioR and this branch based off of the test results posted by 
@HeartSaVioR (just to get more of an apples to apples comparison, but my test 
results look similar).

It looks like metrics_v2_replace_meters_to_counters had an increased 
latency over 1.2.0-SNAPSHOT by about 2.1% to 4.0% (depending on whether I take a 
mean or a median of the reported mean latency values). Similarly, CPU usage 
was between 1.7% and 3.8% higher.

When using the normal metrics v2 it was a 7.2% to 8.0% increase in latency 
and 7.0% to 10.1% increase in CPU.  So this is a huge step in the right 
direction. 

I really would love to see an overhead of < 1%, but I could probably live 
with these numbers.

Would it be possible to try an experiment where we also do random 
sub-sampling with these metrics? I know it is very confusing for people to 
not have an exact number for metrics, but I would be curious to see what 
difference, if any, it would make so we know where the issue is.




---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-27 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
So the difference in CPU usage and latency is reduced, although there is 
still a gap between the two. That's what we would expect, so please do some 
performance tests, look at the gaps, and discuss whether the gap is tolerable.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-27 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
> metrics_v2_replace_meters_to_counters

```
uptime:  210 acked: 1,504,280 acked/sec:  50,142.67 failed:0 99%:   
   35,192,831 99.9%:  52,658,175 min:   6,561,792 max:  97,648,639 
mean:   14,233,952.18 stddev:4,975,946.97 user:192,210 sys: 36,820 
gc:  2,604 mem: 811.39
uptime:  240 acked: 1,503,860 acked/sec:  50,128.67 failed:0 99%:   
   53,510,143 99.9%:  88,276,991 min:   6,643,712 max: 128,581,631 
mean:   15,472,482.77 stddev:8,210,141.23 user:178,240 sys: 34,680 
gc:  2,605 mem: 889.04
uptime:  271 acked: 1,578,380 acked/sec:  50,915.48 failed:0 99%:   
   38,436,863 99.9%:  57,049,087 min:   6,717,440 max:  92,274,687 
mean:   14,373,815.12 stddev:5,487,140.26 user:196,260 sys: 37,490 
gc:  2,780 mem: 916.19
```

```
uptime:  210 acked: 1,541,100 acked/sec:  51,370.00 failed:0 99%:   
   37,486,591 99.9%:  57,409,535 min:   6,664,192 max:  75,169,791 
mean:   14,192,844.52 stddev:5,369,622.19 user:179,110 sys: 33,210 
gc:  2,571 mem: 677.76
uptime:  240 acked: 1,541,240 acked/sec:  51,374.67 failed:0 99%:   
   38,404,095 99.9%:  56,819,711 min:   6,651,904 max:  85,590,015 
mean:   14,508,982.42 stddev:5,551,974.72 user:178,550 sys: 34,130 
gc:  2,614 mem: 740.89
uptime:  270 acked: 1,503,300 acked/sec:  50,110.00 failed:0 99%:   
   37,093,375 99.9%:  59,506,687 min:   6,537,216 max:  92,078,079 
mean:   14,124,510.04 stddev:5,286,365.69 user:177,730 sys: 33,530 
gc:  2,698 mem: 755.77
```

```
uptime:  211 acked: 1,503,700 acked/sec:  50,123.33 failed:0 99%:   
   32,571,391 99.9%:  49,938,431 min:   6,483,968 max:  81,592,319 
mean:   13,690,758.91 stddev:4,494,703.26 user:176,600 sys: 34,130 
gc:  2,503 mem: 893.43
uptime:  241 acked: 1,578,620 acked/sec:  52,620.67 failed:0 99%:   
   39,649,279 99.9%:  63,143,935 min:   6,541,312 max:  98,631,679 
mean:   14,144,096.06 stddev:5,696,236.12 user:177,100 sys: 33,880 
gc:  2,671 mem: 889.88
uptime:  271 acked: 1,540,000 acked/sec:  51,333.33 failed:0 99%:   
   40,632,319 99.9%:  66,912,255 min:   6,782,976 max: 112,984,063 
mean:   14,480,762.35 stddev:6,064,099.29 user:180,220 sys: 33,230 
gc:  2,743 mem: 832.76
```

> 1.2.0-SNAPSHOT

```
uptime:  211 acked: 1,503,480 acked/sec:  50,116.00 failed:0 99%:   
   35,422,207 99.9%:  52,330,495 min:   6,602,752 max:  88,211,455 
mean:   14,071,994.11 stddev:4,935,845.81 user:174,870 sys: 34,440 
gc:  2,531 mem: 888.99
uptime:  241 acked: 1,503,540 acked/sec:  50,118.00 failed:0 99%:   
   36,208,639 99.9%:  52,527,103 min:   6,619,136 max:  95,617,023 
mean:   14,146,908.46 stddev:5,030,666.13 user:176,370 sys: 34,290 
gc:  2,500 mem: 961.91
uptime:  271 acked: 1,616,180 acked/sec:  53,872.67 failed:0 99%:   
   34,111,487 99.9%:  52,133,887 min:   6,529,024 max:  83,099,647 
mean:   14,007,535.10 stddev:4,706,000.17 user:175,580 sys: 33,900 
gc:  2,458 mem:   1,039.27
```

```
uptime:  210 acked: 1,578,340 acked/sec:  52,611.33 failed:0 99%:   
   34,668,543 99.9%:  51,249,151 min:   6,664,192 max:  82,575,359 
mean:   13,965,232.22 stddev:4,855,563.94 user:175,000 sys: 33,820 
gc:  2,415 mem: 902.57
uptime:  240 acked: 1,541,200 acked/sec:  51,373.33 failed:0 99%:   
   34,144,255 99.9%:  51,478,527 min:   5,931,008 max:  80,150,527 
mean:   13,730,336.55 stddev:4,719,024.27 user:175,660 sys: 33,430 
gc:  2,403 mem: 825.49
uptime:  270 acked: 1,504,080 acked/sec:  50,136.00 failed:0 99%:   
   34,340,863 99.9%:  51,019,775 min:   6,594,560 max:  79,167,487 
mean:   13,756,017.32 stddev:4,619,909.40 user:175,510 sys: 34,050 
gc:  2,366 mem: 819.57
```

```
uptime:  211 acked: 1,541,080 acked/sec:  51,369.33 failed:0 99%:   
   29,130,751 99.9%:  48,037,887 min:   6,631,424 max:  87,162,879 
mean:   13,285,878.61 stddev:3,908,718.02 user:172,650 sys: 33,080 
gc:  2,203 mem: 949.73
uptime:  241 acked: 1,503,620 acked/sec:  50,120.67 failed:0 99%:   
   38,830,079 99.9%:  58,392,575 min:   6,602,752 max:  92,340,223 
mean:   13,945,274.39 stddev:5,359,104.05 user:173,580 sys: 32,890 
gc:  2,295 mem:   1,078.11
uptime:  271 acked: 1,503,400 acked/sec:  50,113.33 failed:0 99%:   
   31,29

[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-27 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
I've tried out replacing meters with counters; here is the branch:


https://github.com/HeartSaVioR/storm/tree/metrics_v2_replace_meters_to_counters


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-21 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
A crude test, but it illustrates the cost of meters
(the code either marks a meter or increments a counter, from 0 to `Integer.MAX_VALUE`):
```
*** METER ***
Time: 126.39
ops/sec: 16,990,930
*** COUNTER ***
Time: 18.221
ops/sec: 117,857,617
```
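
For reference, a hedged sketch of roughly what a microbenchmark like this 
might look like with Dropwizard Metrics; the class name, loop bound, and 
output format are assumptions, not the exact code that produced the numbers 
above:

```java
import com.codahale.metrics.Counter;
import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricRegistry;

public class MeterVsCounterBench {
    private static final long ITERATIONS = Integer.MAX_VALUE;

    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        Meter meter = registry.meter("bench.meter");
        Counter counter = registry.counter("bench.counter");

        long start = System.nanoTime();
        for (long i = 0; i < ITERATIONS; i++) {
            meter.mark();      // maintains moving-average rate state per call
        }
        report("METER", start);

        start = System.nanoTime();
        for (long i = 0; i < ITERATIONS; i++) {
            counter.inc();     // a plain increment, no rate bookkeeping
        }
        report("COUNTER", start);
    }

    private static void report(String label, long startNanos) {
        double secs = (System.nanoTime() - startNanos) / 1e9;
        System.out.printf("*** %s ***%nTime: %.3f%nops/sec: %,.0f%n",
                label, secs, ITERATIONS / secs);
    }
}
```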

The obvious path would be to switch critical-path metrics to use counters. 
But I'd ultimately err on the side of user choice (i.e. let users decide which 
to use); that could be made configurable. I can imagine use cases where users 
would be willing to take a minor performance hit in exchange for richer metrics 
(e.g. "sleepy" topologies). The performance hit could be tolerable in certain 
situations.

For the time being, I'll switch some of the metrics to counters, and re-run.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-21 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@revans2 Thanks for the update.

My first instinct is that it’s usage of Meter on the critical path. 
Adding anything that “does something” there is going to add some level of 
overhead. I’ll do some (micro)benchmarking and experiments to find out. I 
think there will be a number of mitigation options.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-21 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2203
  
Also, looking at the numbers from @HeartSaVioR: they also show the CPU 
usage being higher, but not nearly as much as mine do: 219 seconds for V2 vs 
206 seconds for 1.2.0, or about a 6% difference.

Again I really just want to understand where the extra load is coming from 
so we can understand how to mitigate it and/or better reason about what will 
happen when there are more metrics being used.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-21 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2203
  
Here are my numbers. No, I didn't run it headless, but I was not doing 
anything on the laptop while it was running except watching Activity 
Monitor to be sure nothing else was using too much CPU.

The setup is

MacBook Pro (Retina, 15-inch, Mid 2015)
2.8 GHz Intel Core i7
16 GB 1600 MHz DDR3
macOS 10.12.6
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)


The versions tested are 1.2.0-SNAPSHOT 
(b24f5d87cbcd4faeed651e45a8e673db52469a46) and Metrics V2 
(78d3de44cf0705efd199fb64b7ef092e23c3285b), which is 
1.2.0-SNAPSHOT with 99bcf68 merged in.

The Numbers:

1.2.0-SNAPSHOT
```
./bin/storm jar 
../../../examples/storm-starter/target/storm-starter-1.2.0-SNAPSHOT.jar 
org.apache.storm.starter.ThroughputVsLatency 5 1 3 -c 
topology.max.spout.pending=500
uptime:   30 acked:   602,660 acked/sec:  20,088.67 failed:0 99%:   
  741,343,231 99.9%: 757,596,159 min:   3,639,296 max: 768,081,919 
mean:   95,373,435.91 stddev:  191,784,964.85 user: 71,070 sys:  4,565 
gc:247 mem: 109.46
uptime:   60 acked: 1,504,480 acked/sec:  50,149.33 failed:0 99%:   
   13,320,191 99.9%:  15,794,175 min:   3,467,264 max:  30,490,623 
mean:9,679,685.54 stddev:1,394,068.49 user: 72,565 sys:  6,299 
gc:728 mem:  43.30
uptime:   90 acked: 1,510,500 acked/sec:  50,350.00 failed:0 99%:   
   13,279,231 99.9%:  14,704,639 min:   3,352,576 max:  17,612,799 
mean:9,650,125.87 stddev:1,358,879.30 user: 72,788 sys:  6,360 
gc:732 mem:  73.51
uptime:  120 acked: 1,503,720 acked/sec:  50,124.00 failed:0 99%:   
   13,352,959 99.9%:  14,745,599 min:   3,356,672 max:  16,973,823 
mean:9,683,344.13 stddev:1,361,582.42 user: 72,855 sys:  6,239 
gc:740 mem: 101.27
uptime:  150 acked: 1,654,820 acked/sec:  55,160.67 failed:0 99%:   
   13,369,343 99.9%:  14,811,135 min:   3,473,408 max:  16,736,255 
mean:9,695,874.65 stddev:1,363,167.65 user: 72,671 sys:  6,327 
gc:741 mem: 127.25
uptime:  180 acked: 1,503,740 acked/sec:  50,124.67 failed:0 99%:   
   13,377,535 99.9%:  14,811,135 min:   3,448,832 max:  17,711,103 
mean:9,682,345.18 stddev:1,370,041.64 user: 72,848 sys:  6,280 
gc:756 mem:  41.83

./bin/storm jar 
../../../examples/storm-starter/target/storm-starter-1.2.0-SNAPSHOT.jar 
org.apache.storm.starter.ThroughputVsLatency 75000 1 3 -c 
topology.max.spout.pending=500
uptime:   30 acked:   882,420 acked/sec:  29,414.00 failed:0 99%:   
  746,586,111 99.9%: 763,363,327 min:   3,678,208 max: 768,081,919 
mean:  202,478,108.44 stddev:  266,947,569.24 user: 50,763 sys:  2,752 
gc:  0 mem:  38.93
uptime:   60 acked: 2,497,720 acked/sec:  83,257.33 failed:0 99%:   
  213,123,071 99.9%: 256,245,759 min:   3,672,064 max: 271,581,183 
mean:   46,314,967.38 stddev:   51,559,780.69 user:118,361 sys:  7,777 
gc:  1,322 mem:  85.28
uptime:   90 acked: 2,260,200 acked/sec:  75,340.00 failed:0 99%:   
  129,826,815 99.9%: 152,698,879 min:   3,659,776 max: 160,432,127 
mean:   29,179,878.90 stddev:   29,586,943.44 user:110,350 sys:  7,583 
gc:  1,323 mem:  30.47
uptime:  120 acked: 2,255,680 acked/sec:  75,189.33 failed:0 99%:   
  148,504,575 99.9%: 179,306,495 min:   3,700,736 max: 187,301,887 
mean:   16,040,979.89 stddev:   23,589,666.52 user:103,839 sys:  7,031 
gc:  1,174 mem:  51.50
uptime:  150 acked: 2,255,180 acked/sec:  75,172.67 failed:0 99%:   
   81,002,495 99.9%: 117,309,439 min:   3,571,712 max: 123,994,111 
mean:   10,956,757.98 stddev:   11,465,433.32 user:100,497 sys:  6,960 
gc:  1,136 mem:  90.77
uptime:  180 acked: 2,256,380 acked/sec:  75,212.67 failed:0 99%:   
   35,192,831 99.9%:  49,446,911 min:   3,608,576 max:  55,672,831 
mean:9,647,021.99 stddev:4,081,135.34 user:100,516 sys:  6,771 
gc:  1,141 mem: 110.73

./bin/storm jar 
../../../examples/storm-starter/target/storm-starter-1.2.0-SNAPSHOT.jar 
org.apache.storm.starter.ThroughputVsLatency 10 1 3 -c 
topology.max.spout.pending=500
uptime:   30 acked: 1,023,620 acked/sec:  34,120.67 failed:0 99%:   
1,427,111,935 99.9%:   1,442,840,575 min: 141,033,472 max:   1,448,083,455 
mean: 1,205,785,740.36 stddev:  167,013,014.66 user: 56,266 sys:  2,615 
gc:  0 mem:  53.83
uptime:   60 acked: 2,820,000 acked/sec:

[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-20 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
OK. While I understand what you're saying, that's not easy for an individual 
to arrange, because individuals generally can't afford a completely idle machine 
used only for performance tests. I have been spinning up an AWS VM with the 
"dedicated" option to get a clean environment, installing the required services, 
and going from there. Unfortunately I have used up all of my AWS coupons and have 
to rely on my PC, which is nearly idle. ;)

One approach is to run the performance tests several times, drop the outliers, 
and take the average, or just look at the trends. If the performance of A is 
always, or most of the time, higher/lower than B across enough trials, I'd be 
inclined to believe the result. (Would 5 runs be sufficient?)


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-20 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@HeartSaVioR I know. ;)

I'm thinking more in terms of hardware profile and what other processes are 
running.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-20 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
You can scroll the results to the rightmost side, which shows CPU, memory, and GC 
as well.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-20 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
Here's my machine spec used for performance test:

java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

OS: Ubuntu 17.10
CPU: AMD Ryzen 5 1600 3.2Ghz 6 core (with hyper-thread = 12 logical cores)
RAM: Samsung DDR4 32G 19200
SSD: Samsung 850 Evo

I have another machine for development, and I only use this environment for 
testing at that time. It runs only some background processes, and CPU usage was 
consistently around 1% before running the tests.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-20 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
We really need hardware and environment information. I'd also argue that 
tests should be run headless. I've seen some benchmarks vary greatly on an MBP 
depending on what you're doing at the time; something as mundane as checking 
email can affect benchmark results.

sigar might be an easy target (since it's included) to get some of that 
info.







---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-20 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
@ptgoetz @revans2 

Here’s my result on performance test:

CLI option:
org.apache.storm.starter.ThroughputVsLatency 5 -c 
topology.max.spout.pending=5000

so that utilizes 4 workers by default.

> 1.2.0-SNAPSHOT

```
uptime:  210 acked: 1,503,840 acked/sec:  50,128.00 failed:0 99%:   
   34,734,079 99.9%:  48,103,423 min:   6,541,312 max:  78,381,055 
mean:   13,820,460.15 stddev:4,842,731.32 user:173,260 sys: 31,980 
gc:  2,583 mem: 669.23
uptime:  240 acked: 1,503,740 acked/sec:  50,124.67 failed:0 99%:   
   33,456,127 99.9%:  53,346,303 min:   6,479,872 max:  80,412,671 
mean:   13,642,800.22 stddev:4,618,847.37 user:172,510 sys: 31,690 
gc:  2,513 mem: 614.37
uptime:  270 acked: 1,503,280 acked/sec:  50,109.33 failed:0 99%:   
   33,013,759 99.9%:  48,201,727 min:   6,574,080 max:  78,315,519 
mean:   13,656,544.79 stddev:4,527,513.03 user:173,000 sys: 31,370 
gc:  2,513 mem: 740.95
```

```
uptime:  211 acked: 1,503,700 acked/sec:  48,506.45 failed:0 99%:   
   35,061,759 99.9%:  53,444,607 min:   6,516,736 max:  84,410,367 
mean:   14,023,839.49 stddev:4,968,132.86 user:173,490 sys: 33,240 
gc:  2,453 mem:   1,042.78
uptime:  241 acked: 1,503,920 acked/sec:  50,130.67 failed:0 99%:   
   32,882,687 99.9%:  53,968,895 min:   6,574,080 max:  81,133,567 
mean:   13,700,645.27 stddev:4,592,749.72 user:173,260 sys: 33,190 
gc:  2,465 mem:   1,007.66
uptime:  271 acked: 1,503,260 acked/sec:  50,108.67 failed:0 99%:   
   33,275,903 99.9%:  56,262,655 min:   6,582,272 max:  81,199,103 
mean:   13,710,314.71 stddev:4,676,515.80 user:173,910 sys: 32,430 
gc:  2,440 mem:   1,065.66
```

> Metrics V2

```
uptime:  211 acked: 1,503,580 acked/sec:  50,119.33 failed:0 99%:   
   40,861,695 99.9%:  62,783,487 min:   6,496,256 max: 106,692,607 
mean:   14,696,942.76 stddev:6,041,492.37 user:187,800 sys: 32,170 
gc:  2,646 mem: 779.90
uptime:  241 acked: 1,541,060 acked/sec:  51,368.67 failed:0 99%:   
   41,779,199 99.9%:  70,778,879 min:   6,639,616 max: 113,115,135 
mean:   14,872,133.61 stddev:6,435,291.03 user:219,910 sys: 36,630 
gc:  3,115 mem: 875.09
uptime:  271 acked: 1,503,780 acked/sec:  50,126.00 failed:0 99%:   
   41,189,375 99.9%:  63,733,759 min:   6,529,024 max: 104,267,775 
mean:   14,738,586.49 stddev:6,153,017.22 user:188,950 sys: 31,950 
gc:  2,815 mem: 891.73
```

```
uptime:  210 acked: 1,503,520 acked/sec:  50,117.33 failed:0 99%:   
   47,120,383 99.9%:  89,391,103 min:   6,520,832 max: 142,999,551 
mean:   15,073,527.31 stddev:7,453,484.52 user:186,330 sys: 33,150 
gc:  2,808 mem:   1,096.18
uptime:  241 acked: 1,540,960 acked/sec:  49,708.39 failed:0 99%:   
   36,012,031 99.9%:  51,281,919 min:   6,688,768 max:  74,055,679 
mean:   14,267,705.37 stddev:4,971,492.31 user:186,600 sys: 33,710 
gc:  2,462 mem:   1,192.20
uptime:  271 acked: 1,541,440 acked/sec:  51,381.33 failed:0 99%:   
   40,075,263 99.9%:  55,181,311 min:   6,725,632 max:  76,021,759 
mean:   14,850,073.94 stddev:5,832,959.05 user:187,380 sys: 32,640 
gc:  2,577 mem:   1,036.03
```

The 99% and 99.9% latency values seem to indicate the performance hit. 
Below are the values for the 99.9% latency:

* 1.2.0-SNAPSHOT
avg(48, 53, 48) => 49.6
avg(53, 53, 56) => 54

* metrics_v2
avg(62, 70, 63) => 65
avg(89, 51, 55) => 65

We could also try another kind of test: push a higher rate and see 
which one reaches its limit first.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-20 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2203
  
@ptgoetz I think I just ran it with default options on my MBP.  But I will 
try and reproduce it again.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-11-20 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@revans2 You may have to jog your memory ;), but do you recall the 
parameters you used for ThroughputVsLatency?

I just ran ThroughputVsLatency (with defaults) on an isolated machine (not 
a VM) and saw no significant differences in performance. I tested with the 
latest 1.x-branch and the latest metrics_v2 (merged up against 1.x-branch).


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-10-10 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
@revans2 I need to recheck which one (meter or counter) they are using, 
but JStorm also doesn't sample metrics other than histograms. Unless they did 
custom optimizations, IMHO it should be fast enough for us too.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-10-10 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2203
  
@HeartSaVioR I agree that if we cannot find a better way we may not want to 
support a smooth transition for the metrics, but that also means that we cannot 
put it into 1.x. 

But I think the underlying problem is actually in our choice of metric.  We 
are using a meter everywhere, and looking at the 
[code](https://github.com/dropwizard/metrics/blob/v3.2.5/metrics-core/src/main/java/com/codahale/metrics/Meter.java)
a meter does not appear that cheap. If all we want/need is a running count, 
and not various rates, we might want to look at using a 
[Counter](https://github.com/dropwizard/metrics/blob/v3.2.5/metrics-core/src/main/java/com/codahale/metrics/Counter.java)
 instead.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-10-09 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
This patch adds new types of metrics on top of the current metrics, so a 
performance hit is expected. (A 20% drop is indeed not small.) We may 
want to consider not supporting a smooth transition because of this.

Even if we don't support a smooth transition, we may want to comment out all 
the old metrics and re-run the benchmark to measure the real performance hit.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-09-29 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2203
  
I also want to make sure that we take the performance impact of this patch 
into consideration. With the old metrics system we are sub-sampling the 
metrics for performance reasons; with this patch we are not. When I run 
ThroughputVsLatency (not a perfect benchmark, I know), the maximum 
throughput I am able to get with this patch is about 48,000 sentences per 
second. Without this patch I can do about 60,000 reliably. This is a 20% drop 
in maximum throughput. Note that all I did was adjust the target throughput 
and set max spout pending to 500.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-09-29 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2203
  
@ptgoetz I think we can fix most of the issues simply by restricting what 
names a metric can have (i.e. no `:` allowed) and then using `:` as a 
delimiter. That will give us a guarantee of being able to parse the name. Once 
we can parse it, hopefully we can look at updating the other plugins so they 
can parse the name and turn it into a smaller name with tags.
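
As a rough sketch of how that delimiter-based scheme might work end to end 
(the field order, the `storm` prefix, and the class name are assumptions for 
illustration, not the convention this PR actually uses):

```java
// Build the metric name with ':' separating the metadata fields (assuming
// ':' is banned inside each field), then split it back into tags on the
// consumer side.
import java.util.LinkedHashMap;
import java.util.Map;

public class DelimitedMetricName {

    static String build(String topologyId, String componentId, int workerPort, String name) {
        return String.join(":", "storm", topologyId, componentId,
                Integer.toString(workerPort), name);
    }

    static Map<String, String> parse(String fullName) {
        String[] parts = fullName.split(":"); // safe because ':' is disallowed in fields
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("topologyId", parts[1]);
        tags.put("componentId", parts[2]);
        tags.put("workerPort", parts[3]);
        tags.put("name", parts[4]);
        return tags;
    }

    public static void main(String[] args) {
        String full = build("MYTOPO-1-1000", "B", 6700, "emitted");
        System.out.println(full);        // storm:MYTOPO-1-1000:B:6700:emitted
        System.out.println(parse(full)); // {topologyId=MYTOPO-1-1000, componentId=B, ...}
    }
}
```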


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-09-29 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@revans2 Added rudimentary sanity check validation for metrics reporters 
configs. Because reporter implementations may want to have their own custom 
config keys, we can't really cover everything. I think we'll have to settle on 
basic sanity checks + reporters documenting the custom configuration fields 
they expose.

I'll think more about metrics naming and metadata. Like I said, feedback, 
opinions and pull requests against this branch are more than welcome.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-09-29 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2203
  
So to parse these pieces of information back out I would need to do 
something like remove "storm.topology." from the beginning.

Then everything up to the next "." would be the topology-ID because "." is 
not allowed in the name.  I could go even further and parse out the topology 
name too.

Now I am kind of in trouble, because I have two different fields that can 
be anything on both ends of the name.  So my best bet is to look for something 
that looks like `\.(\d+)-` and hope that neither the component name nor the 
metric name happens to have used something that matches that pattern.
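
To make the fragility concrete, here is a hedged sketch of that parsing 
approach (the class name and example string are only illustrative); it works 
only as long as neither the component ID nor the metric name contains 
something matching `\.(\d+)-`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetricNameParser {
    private static final Pattern PORT_SEP = Pattern.compile("\\.(\\d+)-");

    public static void main(String[] args) {
        String full = "storm.topology.MYTOPO-1-1000.B.6700-emitted";
        String rest = full.substring("storm.topology.".length());

        // Topology IDs cannot contain '.', so everything up to the first '.'
        // is the topology ID.
        String topologyId = rest.substring(0, rest.indexOf('.'));
        String tail = rest.substring(topologyId.length() + 1);

        // Best effort: look for ".<port>-" to split component ID from metric name.
        Matcher m = PORT_SEP.matcher(tail);
        if (m.find()) {
            String componentId = tail.substring(0, m.start());
            String port = m.group(1);
            String metricName = tail.substring(m.end());
            System.out.printf("topology=%s component=%s port=%s metric=%s%n",
                    topologyId, componentId, port, metricName);
        }
    }
}
```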

My concern here is that most of what we are truly encoding in the name is 
metadata.  All of the examples that graphite and ganglia show use the name as a 
hierarchy to make finding the metric I want simpler, i.e. 
'system.memory.memory_allocated', or perhaps subdividing the metrics by 
application, 'app1.memory.memory_allocated'.  Metadata like the host name, 
port, component, etc, are really supported through tags.  The collectors are 
not attempting to pull out the metadata and then turn it back into tags.  So 
for example let's start with a simple use case: I want to know how many tuples 
component B has emitted.  Currently that would require me to combine N separate 
metrics together, one per port.

storm.topology.MYTOPO-1-1000.B.6700-emitted
storm.topology.MYTOPO-1-1000.B.6701-emitted
...
storm.topology.MYTOPO-1-1000.B.N-emitted

and then, if I relaunch my topology, I now have to rewrite all of my queries to 
match. I personally think everyone would be a lot happier if we 
could find a good way to translate these into tags.

Additionally, the built-in metrics that mostly @abellina (and me, a tiny bit) 
have been working on want these to be metadata, simply because we cannot 
reproduce what is happening in the UI in any other clean way. I am fine with 
us using the metric name to store some of this metadata (sadly Dropwizard 
gives us no other option), but we really need to parse it out before sending it 
on. I am not sure how simple that would be, seeing how Dropwizard already comes 
with built-in reporters, which was a lot of what we wanted it for. If it is not 
going to be simple to do, then perhaps Dropwizard was the wrong choice for a 
metrics library.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-09-29 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2203
  
So then perhaps I can summarize what we have right now for the naming.

```"storm.topology.%s.%s.%s-%s", getStormId(), getThisComponentId(), 
getThisWorkerPort(), name```

Where `getStormId()` is the topology id, which should match 
`${name}-${counter}-${time-stamp}`, where name matches `^[^/.:\]+$`.

`getThisComponentId()`, which as far as I can tell has no limits on what the 
name can be.

`getThisWorkerPort()`, which is just the integer port this worker is 
listening on (with no host name attached to it).

and `name`, which has no limitations on it either.

Dropwizard (the underlying metrics system) requires that names be unique 
within a metrics registry, but places no other limitations except that.   


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-09-29 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@revans2 I agree with you regarding getting the metrics naming correct. In 
fact I think it's one of the most important aspects. I'd like to get as much 
input/collaboration from others as possible to make sure we get it right from 
the start.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-09-29 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2203
  
@danny0405 @abellina actually has a patch to do what you were asking about: 
pushing metrics to Nimbus by way of the supervisor. It is obviously based on 
this patch, so a lot of that work is waiting for this one to be done before 
putting that one up.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-09-29 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@revans2, @HeartSaVioR, @arunmahadevan Addressed review comments.

Also note that this is a feature branch, so it's open to pull requests from 
anyone. If there's anything you'd like to see that I haven't addressed (e.g. 
adding fields to metrics names) feel free to suggest.

Re: Documentation and port to master: Once this is merged I plan to add 
documentation and port to master. I can file a JIRA for those if desired.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-09-26 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@HeartSaVioR 

> We need to document how to use new metrics and its reporter, and also 
need to have patch for master branch (maybe with removal of metrics V1 public 
API).

As I mentioned elsewhere, I want to hold off on porting this to the master 
branch until all comments are addressed and this patch is in a mergeable state. 
That way we can avoid porting small changes between the two branches.

As far as deprecating/removing the metrics v1 API, I'd hold off until there 
is a consensus that this approach is a suitable replacement for most use cases.


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-09-26 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@arunmahadevan 
> Currently this patch addresses registering custom metrics and sending the 
metrics out via reporters and does not send any metrics to nimbus right? Why is 
disruptor metrics handled as a part of this?

No, this does not send any metrics to nimbus. Disruptor metrics are 
included because many users would like to track those metrics (and they aren't 
available via Storm UI/REST).

> There will be one thread per reporter per worker which sends out the 
metrics to the destination? Just thinking if it would overwhelm the destination 
and would need some local aggregation and have the supervisor report the 
metrics.

The idea here was to purposely avoid attempting aggregation, since in some 
cases it's impossible (e.g. there's no way to aggregate histograms). Instead, 
any aggregation is assumed to take place in the destination system (Grafana, 
etc.).


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-09-25 Thread danny0405
Github user danny0405 commented on the issue:

https://github.com/apache/storm/pull/2203
  
@HeartSaVioR @revans2 I have read most of the tickets and issues under 
STORM-2153; there are many discussions about metrics V2 and the current 
built-in metrics, but I didn't see anything about the heartbeats part.

Actually, the heartbeats currently write too much data (window stats of all 
tasks together) to ZooKeeper (or Pacemaker), and we report them at worker 
granularity when they could actually be pre-aggregated per machine/supervisor.

With such a heavy implementation, we see a serious efficiency reduction when 
the cluster grows large, as I have described in the JIRA task: 
https://issues.apache.org/jira/browse/STORM-2693.

So I want to improve the heartbeats part, which can be done separately from 
the metrics V2 work. This is how I would design the heartbeats:


![image](https://user-images.githubusercontent.com/7644508/30796521-fd6335fc-a204-11e7-95e5-1c3e08175b57.png)

After this change, we would no longer need to read the heartbeats from ZooKeeper 
when scheduling, which currently costs a lot of time. Opening the UI will still 
be slow (not a big deal anyway).


---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-07-20 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
In this PR, I see that metrics V2 drops most of the built-in metrics. 

This PR keeps metrics V1 for the built-in metrics and uses V2 for user-defined 
metrics and some of the built-in ones. But in metrics V1 all of the built-in 
metrics are provided to the metrics consumers, so users will lose some metrics 
after replacing their metrics consumer with V2. Moreover, I guess we would want 
to replace the old metrics (in Storm internals) eventually.

For the worker there are no more task-level metrics; only (worker, 
component)-level metrics are available, and they only track emit, ack, and fail. 
All metrics regarding latency are dropped. Netty metrics are dropped too. Some 
built-in metrics might not be injectable by users, so we also need to 
consider that when dropping metrics. Maybe we would want to be able to turn 
recording of some metrics on and off, as Bobby suggested, if they impact performance.

All of the disruptor queue metrics are captured, but there are many disruptor 
queues in a worker while the metrics are worker-level, and they are gauges (not 
aggregations), so one queue's metrics would always overwrite the others'.

So I would like to see some explanation of which metrics are selected, how 
they're aggregated in the worker, how the decisions were made, etc.


---
---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-07-20 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
FYI, more news on RocksDB:


https://issues.apache.org/jira/browse/MARMOTTA-669?focusedCommentId=16094738&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16094738

```
Hi Sebastian - if you are using ROCKSDB in Marmotta prior to the latest two 
versions - where it has been changed to ALv2, then you need to remove it from 
your source tree unless it clearly is an "optional dependency" as defined 
https://www.apache.org/legal/resolved.html#optional
```

I read this the opposite way: the latest two versions of RocksDB are OK to 
include in the source tree.


---
---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-07-16 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
Just FYI: Facebook just removed the PATENTS file from RocksDB and made RocksDB 
dual-licensed (GPLv2 and Apache).


https://github.com/facebook/rocksdb/commit/3c327ac2d0fd50bbd82fe1f1af5de909dad769e6

I guess we may be good to go with RocksDB soon.


---
---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-07-12 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
 > @ptgoetz Suppose we deprecate the built-in metrics, are they deprecated 
for public API or also Storm codebase? If we would want latter, we should have 
alternative way to collect metrics (maybe internal metrics consumer?) and 
represent to the UI.

@HeartSaVioR 
I'd definitely deprecate the public API. One approach we could take for 
potentially shipping metrics to nimbus would be to implement an additional 
reporter that pushes them to some (non-RocksDB ;) ) data store for consumption 
by Storm UI.
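
A hedged sketch of the kind of additional reporter described above: a Dropwizard 
`ScheduledReporter` that periodically pushes counter values to some pluggable 
store. The `MetricsStore` interface and the class name are hypothetical, not 
code from this PR or branch:

```java
import com.codahale.metrics.*;

import java.util.SortedMap;
import java.util.concurrent.TimeUnit;

public class StorePushingReporter extends ScheduledReporter {

    /** Hypothetical sink, e.g. backed by HBase or an in-memory map consumed by Storm UI. */
    public interface MetricsStore {
        void put(String metricName, long value, long timestampMillis);
    }

    private final MetricsStore store;

    public StorePushingReporter(MetricRegistry registry, MetricsStore store) {
        super(registry, "store-pushing-reporter", MetricFilter.ALL,
                TimeUnit.SECONDS, TimeUnit.MILLISECONDS);
        this.store = store;
    }

    @Override
    public void report(SortedMap<String, Gauge> gauges,
                       SortedMap<String, Counter> counters,
                       SortedMap<String, Histogram> histograms,
                       SortedMap<String, Meter> meters,
                       SortedMap<String, Timer> timers) {
        long now = System.currentTimeMillis();
        // Only counters are pushed in this sketch; other metric types would
        // need their own representation in the store.
        counters.forEach((name, counter) -> store.put(name, counter.getCount(), now));
    }
}
```

It would be started like any other Dropwizard reporter, e.g. 
`new StorePushingReporter(registry, store).start(10, TimeUnit.SECONDS);`.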


---
---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-07-12 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
@abellina 
Ah, no problem. I can see the experimental branch in your fork and also the pull 
request for HBase storage.
Btw, from your status, Nimbus is pulling (or some components are pushing, 
whichever) the metrics, and the UI can present them. Is my understanding 
correct?
Does the code (with RocksDB) work in production or in staging? Does it 
replace the built-in metrics with metrics V2, or does it also keep the built-in 
metrics as they are? If it's the former, I'd rather see an alternative to RocksDB 
that can make the default metrics system work "out of the box", and adopt that.
An in-memory plus snapshot (dump to file) approach (no HA) may also be 
acceptable, given that the availability requirement for metrics is not too strict.


---
---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-07-12 Thread abellina
Github user abellina commented on the issue:

https://github.com/apache/storm/pull/2203
  
This certainly is a bummer. I've been focusing on other projects so I have 
been out for a while. Sorry about this.

For the status, we have an interface that gets the metrics from Codahale to 
Nimbus, and we have a RocksDB implementation. For the default out-of-the-box 
metrics, we could take RocksDB out and populate the maps that Nimbus uses for the 
UI, at least. Ideally, however, we'd replace RocksDB with another suitable store 
(suggestions?).

Along those lines, one of the interns at Oath has working code that uses HBase 
as a store. Perhaps that's not the "out of the box" solution, but it uses 
the same interface via Nimbus.


---
---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-07-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2203
  
@ptgoetz Suppose we deprecate the built-in metrics: are they deprecated only in 
the public API, or also within the Storm codebase? If we want the latter, we 
should have an alternative way to collect metrics (maybe an internal metrics 
consumer?) and present them in the UI.


---
---


[GitHub] storm issue #2203: STORM-2153: New Metrics Reporting API

2017-07-11 Thread ptgoetz
Github user ptgoetz commented on the issue:

https://github.com/apache/storm/pull/2203
  
@HeartSaVioR 

> Do we plan to close all of sub-issues after merging this patch?

No. I'm proposing an alternative. The RocksDB license issue is a blocker, so 
I'm proposing we move forward with just this for now. Ultimately I'd like to 
see the built-in metrics deprecated.


---
---

