Alexander Belyak created IGNITE-19366:
-----------------------------------------

             Summary: Monitoring in AI3
                 Key: IGNITE-19366
                 URL: https://issues.apache.org/jira/browse/IGNITE-19366
             Project: Ignite
          Issue Type: Bug
    Affects Versions: 3.0
            Reporter: Alexander Belyak


AI3 needs some monitoring tools ready prior to the first production 
installation.

In my opinion, we first need to write documentation that:

1) defines the first set of monitoring tools (listing each aspect of what should be done)

2) describes each element at a high level and estimates its difficulty

3) splits the implementation into phases: must have, should have, nice to have

From my point of view, the most crucial thing is database locks. AI3 should be 
able to show what (who and for how long) prevents transaction processing. 

To show this, AI3 may provide:
 * a system table/view listing every transaction that has at least one active 
lock or lock attempt, with its id and the id(s) of the tx it is waiting for.
 * ability to log some debug info into the log when a transaction is killed by 
a deadlock prevention mechanism (not sure if it should be a part of this 
document)
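
As an illustration of how such a view could answer "who blocks whom and for 
how long", here is a minimal sketch. The row shape (tx_id, waiting_for, 
wait_ms) and all names are assumptions made for this example, not an existing 
AI3 API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TxLockInfo:
    """One row of a hypothetical lock view (all field names are assumptions)."""
    tx_id: str
    waiting_for: Tuple[str, ...]  # ids of the txs holding the locks this tx wants
    wait_ms: int                  # how long this tx has been waiting


def blockers_report(rows: List[TxLockInfo]) -> Dict[str, List[Tuple[str, int]]]:
    """Map each blocking tx id to the (blocked tx id, wait in ms) pairs it holds up."""
    report: Dict[str, List[Tuple[str, int]]] = {}
    for row in rows:
        for blocker in row.waiting_for:
            report.setdefault(blocker, []).append((row.tx_id, row.wait_ms))
    # Longest waits first, so the worst case is easy to spot.
    for blocked in report.values():
        blocked.sort(key=lambda pair: pair[1], reverse=True)
    return report


# Example data: tx-2 waits for tx-1; tx-3 waits for both tx-1 and tx-2.
rows = [
    TxLockInfo("tx-1", (), 0),
    TxLockInfo("tx-2", ("tx-1",), 1500),
    TxLockInfo("tx-3", ("tx-1", "tx-2"), 300),
]
report = blockers_report(rows)
```

In a real system the same grouping would more likely be a SQL query over the 
system view; the sketch only shows that the three proposed columns are enough 
to find the blocking transaction.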

The second major problem is long-running queries.

To show them, AI3 may provide:
 * a system table/view with all running queries/txs with their origin 
(client/node/username), start time, text, and id.
 * ability to log such queries into the log file (queries that took longer than 
N ms)
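
The "longer than N ms" logging could look roughly like the sketch below. The 
function name, the logger name, and the parameters are hypothetical and chosen 
only for illustration; the clock is injectable so the behaviour can be checked 
without real delays.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("slow_query")  # hypothetical logger name


def run_with_slow_log(query_id: str, text: str, threshold_ms: float,
                      execute: Callable[[], T],
                      clock: Callable[[], float] = time.monotonic) -> T:
    """Run execute() and log a warning if it took longer than threshold_ms."""
    start = clock()
    try:
        return execute()
    finally:
        # Measured even if the query fails, so slow failures are logged too.
        elapsed_ms = (clock() - start) * 1000.0
        if elapsed_ms > threshold_ms:
            log.warning("Slow query id=%s took %.0f ms (threshold %.0f ms): %s",
                        query_id, elapsed_ms, threshold_ms, text)
```

A call such as run_with_slow_log("q1", "SELECT 1", 100, run_query) would 
return the query result and write one warning only when the run exceeded 
100 ms (run_query is a placeholder for whatever executes the query).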

Other items may include:
 * index usage monitoring
 * memory usage (by tables, indexes, caches, metadata)
 * data integrity (can the user turn off a particular cluster node or not? Was 
rebalance finished?)
 * per-query resource consumption (pages actually read (from disk/mem, 
globally/locally?), CPU, memory for the caching)
 * node/cluster configuration
 * background processes status (index rebuild, autovacuum, schema changes 
background processing)

Mandatory requirement: each option must have its own user documentation (and 
a usage example?)

What it should not cover/be:
 * data statistics
 * query plans
 * performance tuning instructions/manuals
 * tuning options to prevent excessive locking/database overloading like time 
to live, deadlock detection/prevention mechanisms



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
