Denis Chudov created IGNITE-12070:
-------------------------------------

             Summary: Document the new ability to track system/user time of 
transactions
                 Key: IGNITE-12070
                 URL: https://issues.apache.org/jira/browse/IGNITE-12070
             Project: Ignite
          Issue Type: Task
          Components: documentation
            Reporter: Denis Chudov


Now there is ability to track system/user time of transactions. System time is 
the time that is spent for system activities - i.e. time while aquiring locks, 
preparing, commiting, etc.User time is the time that is spent for user 
activities when client node runs some code while holding transaction.

We have ability to log info about transactions that exceed some threshold 
execution timeout, or some percentage of all transactions. Log record in case 
of long-running transactions looks like following:
{code:java}
[2019-08-09 13:39:49,130][WARN ][sys-stripe-1-#101%client%][root] Long 
transaction time dump [startTime=13:39:47.970, totalTime=1160, systemTime=157, 
userTime=1003, cacheOperationsTime=141, prepareTime=15, commitTime=0, 
tx=GridNearTxLocal [...]]
{code}
In case of sampling of all transactions:


{code:java}
[2019-08-09 13:39:54,079][INFO ][sys-stripe-2-#102%client%][root] Transaction 
time dump [startTime=13:39:54.063, totalTime=15, systemTime=6, userTime=9, 
cacheOperationsTime=2, prepareTime=3, commitTime=0, tx=GridNearTxLocal [...]]
{code}
Also some of transactions can be skipped to not overflow the log, information 
about this log throttling looks like this:
{code:java}
[2019-08-09 13:39:55,109][INFO ][sys-stripe-0-#100%client%][root] Transaction 
time dumps skipped because of log throttling: 2
{code}
There are JMX parameters and JVM options to control this behavior:
1)
JVM option: IGNITE_LONG_TRANSACTION_TIME_DUMP_THRESHOLD
JMX parameter: TransactionsMXBean.longTransactionTimeDumpThreshold
Threshold timeout in milliseconds for long transactions, if transaction exceeds 
it, it will be dumped in log with information about how much time did it spent 
in system time and user time. Default value is 0. No info about system/user 
time of long transactions is dumped in log if this parameter is not set.
2) 
JVM option: IGNITE_TRANSACTION_TIME_DUMP_SAMPLES_COEFFICIENT
JMX parameter: TransactionsMXBean.transactionTimeDumpSamplesCoefficient
The coefficient for samples of completed transactions that will be dumped in 
log. Must be float value between 0.0 and 1.0 inclusive. Default value is 0.0.
3) 
JVM option: IGNITE_TRANSACTION_TIME_DUMP_SAMPLES_PER_SECOND_LIMIT
JMX parameter: TransactionsMXBean.transactionTimeDumpSamplesPerSecondLimit
The limit of samples of completed transactions that will be dumped in log per 
second, if IGNITE_TRANSACTION_TIME_DUMP_SAMPLES_COEFFICIENT is above 0.0. Must 
be integer value greater than 0. Default value is 5.

For the existing long running transaction warning was added information about 
current system and user time of transaction:
{code:java}
[2019-08-09 14:10:31,835][WARN ][grid-timeout-worker-#122%client%][root] First 
10 long running transactions [total=1]
[2019-08-09 14:10:31,835][WARN ][grid-timeout-worker-#122%client%][root] >>> 
Transaction [startTime=14:10:31.170, curTime=14:10:31.750, systemTime=32, 
userTime=548, tx=GridNearTxLocal [...]]
{code}

Also added following metrics to monitor system and user time for single node:
diagnostic.transactions.totalNodeSystemTime - Total transactions system time on 
node.
diagnostic.transactions.totalNodeUserTime - Total transactions user time on 
node.
diagnostic.transactions.nodeSystemTimeHistogram - Transactions system times on 
node represented as histogram.
diagnostic.transactions.nodeUserTimeHistogram - Transactions user times on node 
represented as histogram.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

Reply via email to