[
https://issues.apache.org/jira/browse/STORM-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15263786#comment-15263786
]
ASF GitHub Bot commented on STORM-1742:
---------------------------------------
Github user HeartSaVioR commented on the pull request:
https://github.com/apache/storm/pull/1379#issuecomment-215665798
I've finished the functional tests.
To experiment with this behavior, I made an odd topology which modifies
ExclamationTopology so that the spout emits 1000 tuples and then sleeps 5 secs in
nextTuple().
https://gist.github.com/HeartSaVioR/9ebac9f7f8503e878b0009a37c2b686b
I also turned off backpressure so it would not affect the test, and adjusted the
receive buffer size and send buffer size to 16384.
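The sleep inside nextTuple() dominates the measured complete latency because the spout handles acks on the same single thread. A minimal plain-Java model of that effect (hypothetical names and structure, not the gist's actual Storm code):

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class SpoutLoopSketch {
    // Simulates one pass of a single-threaded spout event loop: a tuple is
    // emitted, the downstream ack arrives almost instantly, but the spout can
    // only handle it after nextTuple() (which sleeps) returns.
    static long simulate(long sleepMillis) throws InterruptedException {
        Queue<Long> pendingEmitTimes = new ArrayDeque<>();

        long emitTime = System.currentTimeMillis();
        pendingEmitTimes.add(emitTime);  // tuple emitted, ack now pending
        Thread.sleep(sleepMillis);       // stands in for the sleep inside nextTuple()

        // Only now can the single spout thread process the (long-arrived) ack,
        // so the measured latency includes the whole sleep.
        long ackHandledTime = System.currentTimeMillis();
        return ackHandledTime - pendingEmitTimes.remove();
    }

    public static void main(String[] args) throws InterruptedException {
        // With a 5-sec sleep, the measured "complete latency" is roughly
        // 5000 ms even though downstream processing was effectively instant.
        System.out.println("measured complete latency (ms): " + simulate(5000));
    }
}
```

This matches the "before patch" numbers below: every sample sits just above the 5-sec sleep.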
> before patch
* complete latency
```
10:word __complete-latency {default=5211.189090909091}
11:word __complete-latency {default=5193.9646017699115}
12:word __complete-latency {default=5230.410909090909}
13:word __complete-latency {default=5188.668965517241}
14:word __complete-latency {default=5214.798181818182}
15:word __complete-latency {default=5204.954545454545}
16:word __complete-latency {default=5205.9745454545455}
17:word __complete-latency {default=5210.390909090909}
18:word __complete-latency {default=5223.48}
19:word __complete-latency {default=5182.4290909090905}
```
(The avg. complete latency is over 5 secs, as I intended.)
* process latency for bolts
```
1:__acker __process-latency
{exclaim2:__ack_ack=0.13774403470715835,
exclaim1:__ack_ack=0.004287245444801715, word:__ack_init=0.004836109618484685}
2:__acker __process-latency
{exclaim2:__ack_ack=0.16721132897603486,
exclaim1:__ack_ack=0.0038503850385038503, word:__ack_init=0.0049261083743842365}
3:__acker __process-latency
{exclaim2:__ack_ack=0.14852037967615858,
exclaim1:__ack_ack=0.006379585326953748, word:__ack_init=0.002738225629791895}
5:exclaim1 __process-latency
{word:default=0.31357254290171604}
6:exclaim1 __process-latency
{word:default=0.29818181818181816}
7:exclaim1 __process-latency
{word:default=0.27874774232390126}
8:exclaim2 __process-latency
{exclaim1:default=0.24564586357039186}
9:exclaim2 __process-latency
{exclaim1:default=0.23582848837209303}
```
> after patch
* complete latency
```
10:word __complete-latency {default=25.16}
11:word __complete-latency {default=34.832727272727276}
12:word __complete-latency {default=30.92}
13:word __complete-latency {default=24.403636363636362}
14:word __complete-latency {default=47.261818181818185}
15:word __complete-latency {default=34.67272727272727}
16:word __complete-latency {default=36.11090909090909}
17:word __complete-latency {default=22.966010733452595}
18:word __complete-latency {default=46.26989619377163}
19:word __complete-latency {default=24.95090909090909}
```
* process latency for bolts
```
1:__acker __process-latency
{exclaim2:__ack_ack=0.15648854961832062,
exclaim1:__ack_ack=0.003131524008350731, word:__ack_init=0.0065252854812398045}
2:__acker __process-latency
{exclaim2:__ack_ack=0.19327731092436976,
exclaim1:__ack_ack=0.005393743257820928, word:__ack_init=0.005910800644814616}
3:__acker __process-latency
{exclaim2:__ack_ack=0.17433155080213902,
exclaim1:__ack_ack=0.004909983633387889, word:__ack_init=0.002696871628910464}
5:exclaim1 __process-latency {word:default=0.3125}
6:exclaim1 __process-latency {word:default=0.363871635610766}
7:exclaim1 __process-latency
{word:default=0.31440526001195457}
8:exclaim2 __process-latency
{exclaim1:default=0.27243125904486254}
9:exclaim2 __process-latency
{exclaim1:default=0.2510775862068966}
```
While the sum of the process latencies is not close to the complete latency (which
makes sense, since process latency excludes waiting time in queues, serde,
transfer, etc.), applying the patch brings the complete latency much closer to the
actual processing time.
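For reference, averaging the complete-latency samples above (a quick throwaway calculation over the numbers copied from the two result dumps, not part of the patch):

```java
public class LatencyAverages {
    static double avg(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    public static void main(String[] args) {
        // Complete-latency samples copied from the "before patch" output above.
        double[] before = {5211.189090909091, 5193.9646017699115, 5230.410909090909,
                5188.668965517241, 5214.798181818182, 5204.954545454545,
                5205.9745454545455, 5210.390909090909, 5223.48, 5182.4290909090905};
        // Samples copied from the "after patch" output above.
        double[] after = {25.16, 34.832727272727276, 30.92, 24.403636363636362,
                47.261818181818185, 34.67272727272727, 36.11090909090909,
                22.966010733452595, 46.26989619377163, 24.95090909090909};

        // Roughly 5207 ms before vs. 33 ms after: the patch removes the
        // spout-side waiting time from the reported metric.
        System.out.printf("avg before: %.1f ms, avg after: %.1f ms%n",
                avg(before), avg(after));
    }
}
```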
> More accurate 'complete latency'
> --------------------------------
>
> Key: STORM-1742
> URL: https://issues.apache.org/jira/browse/STORM-1742
> Project: Apache Storm
> Issue Type: Improvement
> Components: storm-core
> Reporter: Jungtaek Lim
> Assignee: Jungtaek Lim
>
> I already initiated a discussion thread on the dev@ list. Below is a copy of the
> content of my mail.
> http://mail-archives.apache.org/mod_mbox/storm-dev/201604.mbox/%3CCAF5108gn=rskundfs7-sgy_pd-_prgj2hf2t5e5zppp-knd...@mail.gmail.com%3E
> While thinking about metrics improvements, I doubt how many users know
> 'exactly' what complete latency is. In fact, it's somewhat complicated,
> because additional waiting time can be added to the complete latency due to
> the single-threaded event loop model of the spout.
> A long-running nextTuple() / ack() / fail() can affect complete latency, but
> this happens behind the scenes. No latency information is provided, and some
> users don't even know about this characteristic. Moreover, calls to nextTuple()
> can be skipped due to max spout pending, which makes the latency harder to
> estimate even if the avg. latency of nextTuple() were provided.
> I think separating the threads (moving the tuple handler to a separate thread,
> as JStorm does) would close the gap, but it requires the spout logic to be
> thread-safe, so I'd like to find a workaround first.
> My sketched idea is to let the Acker decide the end time for the root tuple.
> There are two ways to decide the start time for the root tuple:
> 1. when the Spout is about to emit ACK_INIT to the Acker (in other words,
> keep it as it is)
> - The Acker sends the ack / fail message to the Spout with a timestamp,
> and the Spout calculates the time delta.
> - pros. : It's the most accurate way, since it respects the definition of
> 'complete latency'.
> - cons. : Clock synchronization between machines becomes very important;
> millisecond precision would be required.
> 2. when the Acker receives ACK_INIT from the Spout
> - The Acker calculates the time delta itself, and sends the ack / fail
> message to the Spout with the time delta.
> - pros. : No requirement to synchronize clocks between servers so strictly.
> - cons. : It doesn't include the latency of sending / receiving ACK_INIT
> between the Spout and the Acker.
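Option 2 can be sketched as follows. This is a minimal plain-Java model of the Acker-side bookkeeping only; the class, method names, and data structures are hypothetical, not the actual Storm Acker code:

```java
import java.util.HashMap;
import java.util.Map;

public class AckerSketch {
    // rootId -> time the ACK_INIT arrived, measured on the Acker's own clock.
    private final Map<Long, Long> startTimes = new HashMap<>();

    // On ACK_INIT: record the Acker-local receive time for the root tuple.
    void onAckInit(long rootId) {
        startTimes.put(rootId, System.currentTimeMillis());
    }

    // On completion (or failure): the Acker computes the delta itself and
    // would send that delta to the Spout instead of a raw timestamp, so the
    // two machines' clocks never need to be compared.
    long onComplete(long rootId) {
        Long start = startTimes.remove(rootId);
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        AckerSketch acker = new AckerSketch();
        acker.onAckInit(42L);
        Thread.sleep(50); // downstream processing happens here
        System.out.println("complete latency (ms): " + acker.onComplete(42L));
    }
}
```

Since both timestamps come from the same host, only the Acker's clock monotonicity matters, which is the whole point of option 2.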
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)