[ 
https://issues.apache.org/jira/browse/IGNITE-10418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Chudov updated IGNITE-10418:
----------------------------------
    Description: 
Ignite currently offers no way to identify bottlenecks without extensive 
profiling on the server and client side (JFR recordings, sampling profilers, 
regular thread dumps, etc.), which is not always possible. Even when profiling 
data is available, it is not always helpful for finding certain types of 
bottlenecks, for example contention on a single key or partition.

Lightweight message profiling will make it possible to track the execution of 
each message, to collect execution statistics in executors for each grid node 
and for all nodes, and to collect histograms of waiting and execution time for 
each message type.
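
As a rough illustration of what tracking a single message could involve, the 
sketch below wraps a message handler in a Runnable that records how long it 
waited in the executor queue and how long it ran. The class name 
TimedMessageTask and its methods are hypothetical and are not part of the 
Ignite code base; this is a minimal sketch, not the proposed implementation.

```java
/** Hypothetical wrapper (not an Ignite class): times one message handler. */
final class TimedMessageTask implements Runnable {
    /** The actual message handler to execute. */
    private final Runnable handler;

    /** Taken at construction, which is assumed to happen right before enqueueing. */
    private final long enqueueNanos = System.nanoTime();

    /** Time spent waiting in the queue; -1 until the task starts. */
    private volatile long waitNanos = -1;

    /** Time spent in the handler; -1 until the task finishes. */
    private volatile long procNanos = -1;

    TimedMessageTask(Runnable handler) {
        this.handler = handler;
    }

    @Override public void run() {
        long start = System.nanoTime();

        waitNanos = start - enqueueNanos;

        try {
            handler.run();
        }
        finally {
            // Recorded even if the handler throws.
            procNanos = System.nanoTime() - start;
        }
    }

    /** @return Queue waiting time in milliseconds, or a negative value if not started. */
    double waitMillis() {
        return waitNanos / 1_000_000.0;
    }

    /** @return Processing time in milliseconds, or a negative value if not finished. */
    double procMillis() {
        return procNanos / 1_000_000.0;
    }
}
```

Submitting such a wrapper to an executor instead of the raw handler would give 
both timings per message at the cost of two System.nanoTime() calls.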

We need to implement:
 # Histogram metrics for message execution time, queue waiting time, and queue 
size at the moments of enqueueing and execution start, broken down by message 
type;

 # Dumping of a message if its execution or waiting time exceeds a configured 
threshold, e.g.
{code:java}
Slow message: *enqueueTs*=2018-11-27 15:10:22.241, *waitTime*=0.048, 
*procTime*=305.186, *messageId*=3a3064a9, *queueSzBefore*=0, 
*headMessageId*=null, *queueSzAfter*=0, *message*=GridNearTxFinishRequest 
[miniId=1, mvccSnapshot=null, super=GridDistributedTxFinishRequest 
[topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], 
futId=199a3155761-f379f312-ad4b-4181-acc5-0aacb3391f07, threadId=296, 
commitVer=null, invalidate=false, commit=true, baseVer=null, txSize=0, 
sys=false, plc=2, subjId=dda703a0-69ee-47cf-9b9a-bf3dc9309feb, taskNameHash=0, 
flags=32, syncMode=FULL_SYNC, txState=IgniteTxStateImpl 
[activeCacheIds=[644280847], recovery=false, mvccEnabled=false, txMap=HashSet 
[IgniteTxEntry [key=KeyCacheObjectImpl [part=8, val=8, hasValBytes=true], 
cacheId=644280847, txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=8, val=8, 
hasValBytes=true], cacheId=644280847], val=[op=READ, val=null], 
prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null], 
entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, 
explicitVer=null, dhtVer=null, filters=CacheEntryPredicate[] [], 
filtersPassed=false, filtersSet=false, entry=GridCacheMapEntry 
[key=KeyCacheObjectImpl [part=8, val=8, hasValBytes=true], val=null, 
ver=GridCacheVersion [topVer=0, order=0, nodeOrder=0], hash=8, 
extras=GridCacheObsoleteEntryExtras [obsoleteVer=GridCacheVersion 
[topVer=2147483647, order=0, nodeOrder=0]], flags=2]GridDistributedCacheEntry 
[super=]GridDhtCacheEntry [rdrs=ReaderId[] [], part=8, super=], prepared=0, 
locked=false, nodeId=null, locMapped=false, expiryPlc=null, 
transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, 
xidVer=GridCacheVersion{code}

 # JMX tools and a command-line interface to retrieve these metrics and print a 
statistics view.
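
The first two items above could be sketched roughly as follows: a per-message-type 
histogram with fixed bucket bounds, plus a check that tells the caller when a 
sample exceeds the slow-message threshold. The class MessageHistograms, the 
bucket bounds, and the threshold value are assumptions made for illustration 
only; none of them come from Ignite.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical sketch (not an Ignite API): latency histograms keyed by message type. */
final class MessageHistograms {
    /** Inclusive upper bucket bounds in milliseconds; one extra bucket holds everything above. */
    private final long[] boundsMs;

    /** Samples strictly above this value are reported as slow. */
    private final long slowThresholdMs;

    /** Bucket counters per message type. */
    private final Map<String, AtomicLongArray> byType = new ConcurrentHashMap<>();

    MessageHistograms(long[] boundsMs, long slowThresholdMs) {
        this.boundsMs = boundsMs.clone();
        this.slowThresholdMs = slowThresholdMs;
    }

    /**
     * Records one processing-time sample.
     *
     * @return {@code true} if the caller should dump the message as slow.
     */
    boolean record(String msgType, long procMs) {
        AtomicLongArray buckets =
            byType.computeIfAbsent(msgType, t -> new AtomicLongArray(boundsMs.length + 1));

        // Find the first bucket whose upper bound is not exceeded by the sample.
        int idx = 0;

        while (idx < boundsMs.length && procMs > boundsMs[idx])
            idx++;

        buckets.incrementAndGet(idx);

        return procMs > slowThresholdMs;
    }

    /** @return Copy of the bucket counters for the given type (all zeros if unseen). */
    long[] snapshot(String msgType) {
        AtomicLongArray buckets = byType.get(msgType);

        long[] res = new long[boundsMs.length + 1];

        if (buckets != null) {
            for (int i = 0; i < res.length; i++)
                res[i] = buckets.get(i);
        }

        return res;
    }
}
```

With bounds of 1, 10 and 100 ms and a 300 ms threshold, the 305 ms sample from 
the log line above would land in the last (unbounded) bucket and be flagged as 
slow. A snapshot method of this kind is the sort of data a JMX bean or 
command-line tool could expose.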

> Implement lightweight profiling of messages processing
> ------------------------------------------------------
>
>                 Key: IGNITE-10418
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10418
>             Project: Ignite
>          Issue Type: New Feature
>            Reporter: Alexei Scherbakov
>            Assignee: Denis Chudov
>            Priority: Major
>             Fix For: 2.8
>
>



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
