[ https://issues.apache.org/jira/browse/IGNITE-10418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Denis Chudov reassigned IGNITE-10418:
-------------------------------------

    Assignee:     (was: Denis Chudov)

> Implement lightweight profiling of messages processing
> ------------------------------------------------------
>
>                 Key: IGNITE-10418
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10418
>             Project: Ignite
>          Issue Type: New Feature
>            Reporter: Alexey Scherbakov
>            Priority: Major
>              Labels: IEP-35
>
> There is a lack of capabilities to identify bottlenecks without extensive
> profiling on the server and client side (JFR recording, sampling profilers,
> regular thread dumps, etc.), which is not always possible. Even when
> profiling data is available, it is not always helpful for identifying
> certain types of bottlenecks, for example contention on a single
> key/partition.
> Lightweight message profiling will make it possible to track each message's
> execution, collect execution statistics in executors for each grid node and
> for all nodes, and collect histograms of waiting/execution time for each
> type of message.
> We need to implement:
> # histogram metrics for message execution time, queue waiting time, and queue
> size at the moments of queue add and execution start, with distribution by
> message type;
> # dumping of a message if its execution/waiting time exceeds some threshold
> timeout, e.g.
> {code:java}
> Slow message: *enqueueTs*=2018-11-27 15:10:22.241, *waitTime*=0.048,
> *procTime*=305.186, *messageId*=3a3064a9, *queueSzBefore*=0,
> *headMessageId*=null, *queueSzAfter*=0, *message*=GridNearTxFinishRequest
> [miniId=1, mvccSnapshot=null, super=GridDistributedTxFinishRequest
> [topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0],
> futId=199a3155761-f379f312-ad4b-4181-acc5-0aacb3391f07, threadId=296,
> commitVer=null, invalidate=false, commit=true, baseVer=null, txSize=0,
> sys=false, plc=2, subjId=dda703a0-69ee-47cf-9b9a-bf3dc9309feb,
> taskNameHash=0, flags=32, syncMode=FULL_SYNC, txState=IgniteTxStateImpl
> [activeCacheIds=[644280847], recovery=false, mvccEnabled=false, txMap=HashSet
> [IgniteTxEntry [key=KeyCacheObjectImpl [part=8, val=8, hasValBytes=true],
> cacheId=644280847, txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=8, val=8,
> hasValBytes=true], cacheId=644280847], val=[op=READ, val=null],
> prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null],
> entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null,
> explicitVer=null, dhtVer=null, filters=CacheEntryPredicate[] [],
> filtersPassed=false, filtersSet=false, entry=GridCacheMapEntry
> [key=KeyCacheObjectImpl [part=8, val=8, hasValBytes=true], val=null,
> ver=GridCacheVersion [topVer=0, order=0, nodeOrder=0], hash=8,
> extras=GridCacheObsoleteEntryExtras [obsoleteVer=GridCacheVersion
> [topVer=2147483647, order=0, nodeOrder=0]], flags=2]GridDistributedCacheEntry
> [super=]GridDhtCacheEntry [rdrs=ReaderId[] [], part=8, super=], prepared=0,
> locked=false, nodeId=null, locMapped=false, expiryPlc=null,
> transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null,
> xidVer=GridCacheVersion{code}
> # JMX tools and a command line interface to get these metrics and print a
> statistics view.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
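
For illustration only, the first two requirements in the issue description (per-message-type histograms of waiting and processing time, plus dumping of messages that exceed a threshold) could be sketched roughly as follows. This is not Ignite code: the class name MessageProfilerSketch, its methods, the bucket bounds, and the shortened log format are all hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical sketch; none of these names are taken from Ignite. */
final class MessageProfilerSketch {
    /** Inclusive upper bounds of the histogram buckets, in milliseconds (assumed values). */
    private static final long[] BOUNDS_MS = {1, 10, 100, 1000};

    /** Histograms keyed by message type; the extra last bucket collects overflow. */
    private final Map<String, AtomicLongArray> waitHist = new ConcurrentHashMap<>();
    private final Map<String, AtomicLongArray> procHist = new ConcurrentHashMap<>();

    /** Dumped descriptions of messages that exceeded the threshold. */
    private final List<String> slowMessages = new CopyOnWriteArrayList<>();

    private final long thresholdMs;

    MessageProfilerSketch(long thresholdMs) {
        this.thresholdMs = thresholdMs;
    }

    /** Returns the index of the first bucket whose bound is >= ms, or the overflow index. */
    static int bucket(long ms) {
        for (int i = 0; i < BOUNDS_MS.length; i++) {
            if (ms <= BOUNDS_MS[i])
                return i;
        }
        return BOUNDS_MS.length;
    }

    /** Records one processed message given its enqueue, start and end timestamps. */
    void record(String msgType, long enqueueMs, long startMs, long endMs) {
        long waitTime = startMs - enqueueMs;
        long procTime = endMs - startMs;

        hist(waitHist, msgType).incrementAndGet(bucket(waitTime));
        hist(procHist, msgType).incrementAndGet(bucket(procTime));

        // Dump the message if either time exceeds the configured threshold.
        if (waitTime > thresholdMs || procTime > thresholdMs) {
            slowMessages.add("Slow message: waitTime=" + waitTime
                + ", procTime=" + procTime + ", message=" + msgType);
        }
    }

    private static AtomicLongArray hist(Map<String, AtomicLongArray> map, String msgType) {
        return map.computeIfAbsent(msgType, k -> new AtomicLongArray(BOUNDS_MS.length + 1));
    }

    /** Number of messages of the given type whose waiting time fell into the bucket. */
    long waitCount(String msgType, int bucketIdx) {
        AtomicLongArray h = waitHist.get(msgType);
        return h == null ? 0 : h.get(bucketIdx);
    }

    /** Number of messages of the given type whose processing time fell into the bucket. */
    long procCount(String msgType, int bucketIdx) {
        AtomicLongArray h = procHist.get(msgType);
        return h == null ? 0 : h.get(bucketIdx);
    }

    List<String> slowMessages() {
        return slowMessages;
    }
}
```

In a real executor the three timestamps would be captured when the message is added to the queue, when a worker thread picks it up, and when processing completes. Queue-size tracking and the JMX/CLI view from the issue are left out of this sketch.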