Hello, Stackers.
I’d like to start a thread about monitoring of provisioned resources. A look at production-ready monitoring solutions shows that almost all of these systems require a plugin/agent deployment to provide online server monitoring, up-to-date statistics, etc. The monitoring question was raised at the previous design summit, but no discussion followed and no proposals were made.

I’d like to start a discussion about in-VM database monitoring. For now, trove-guestagent sends its status whenever the status changes. From a monitoring perspective, this type of reporting covers only the availability/accessibility of the deployed database. But what about other metrics?

After some research I’ve collected the following generic set of metrics and their units for databases:

CPUUtilization
    The percentage of CPU utilization.
    Units: Percent

DatabaseConnections
    The number of database connections in use.
    Units: Count

DiskQueueDepth
    The number of outstanding IOs (read/write requests) waiting to access the disk.
    Units: Count

FreeableMemory
    The amount of available random access memory.
    Units: Bytes

FreeStorageSpace
    The amount of available storage space.
    Units: Bytes

SwapUsage
    The amount of swap space used on the DB Instance.
    Units: Bytes

ReadIOPS
    The average number of disk read I/O operations per second.
    Units: Count/Second

WriteIOPS
    The average number of disk write I/O operations per second.
    Units: Count/Second

ReadLatency
    The average amount of time taken per disk read I/O operation.
    Units: Seconds

WriteLatency
    The average amount of time taken per disk write I/O operation.
    Units: Seconds

ReadThroughput
    The average number of bytes read from disk per second.
    Units: Bytes/Second

WriteThroughput
    The average number of bytes written to disk per second.
    Units: Bytes/Second

NetworkReceiveThroughput
    The incoming (Receive) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication.
    Units: Bytes/Second

NetworkTransmitThroughput
    The outgoing (Transmit) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication.
    Units: Bytes/Second

And here is the list of references for datastore-specific metrics:

1. Cassandra (see [1])
2. MongoDB (see [2])
3. Redis (see [3])
4. Couchbase (see [4])
5. MySQL (see [5])

For MySQL in particular:

BinLogDiskUsage
    The amount of disk space occupied by binary logs on the master. Applies to MySQL read replicas.
    Units: Bytes

ReplicaLag
    The amount of time a Read Replica DB Instance lags behind the source DB Instance. Applies to MySQL read replicas. The ReplicaLag metric reports the value of the Seconds_Behind_Master field of the MySQL SHOW SLAVE STATUS command. For more information, see [6].
    Units: Seconds

To receive all of these metrics, we might need to adapt the guestagent to send them as part of the notification process (using the periodic task mechanism), as part of the Ceilometer integration.

So, the major goal of this thread is to collect all use cases and requirements and build out a suitable monitoring feature design (step by step, of course).

Thoughts?

Links:
[1] http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_monitoring_c.html
[2] http://blog.mongodb.org/post/62152249344/the-top-5-metrics-to-watch-in-mongodb
[3] http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/CacheMetrics.Redis.html
[4] http://blog.couchbase.com/monitoring-couchbase-cluster
[5] http://www.hyperic.com/products/mysql-monitoring
[6] http://dev.mysql.com/doc/refman/5.6/en/show-slave-status.html

Best regards,
Denis Makogon
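P.S. To make the periodic reporting idea above a bit more concrete, here is a rough sketch of what one collection pass inside the guest could look like. Every name in it (per_second, collect_metrics, build_payload, the payload layout) is hypothetical and is not an existing Trove or Ceilometer API; it only uses the Python standard library. It shows two things: reading a gauge such as FreeStorageSpace directly, and deriving a per-second metric such as ReadIOPS from two samples of a cumulative counter.

```python
import shutil
import time

# NOTE: all names below are hypothetical illustrations, not Trove or
# Ceilometer APIs. The payload layout is an assumption as well.


def per_second(prev_count, curr_count, interval_seconds):
    """Turn two readings of a cumulative counter into a per-second rate.

    This is how a metric such as ReadIOPS (Units: Count/Second) could be
    derived from the total number of reads seen at two points in time.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (curr_count - prev_count) / interval_seconds


def collect_metrics(data_dir="/"):
    """Gather the metrics that can be read directly on the guest.

    Only FreeStorageSpace is collected here; the other metrics would need
    OS-specific or datastore-specific sources.
    """
    usage = shutil.disk_usage(data_dir)
    return {"FreeStorageSpace": {"value": usage.free, "unit": "Bytes"}}


def build_payload(instance_id, metrics, timestamp=None):
    """Wrap the collected metrics in a notification-style dictionary."""
    return {
        "instance_id": instance_id,
        "timestamp": time.time() if timestamp is None else timestamp,
        "metrics": metrics,
    }
```

A periodic task in the guestagent could call collect_metrics() on each tick, keep the previous counter readings around for per_second(), and hand the result of build_payload() to whatever notification mechanism we agree on.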
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev