In the interest of making CouchDB 3.0 "the best CouchDB Classic possible", I'd like to discuss whether to accept a donation from Cloudant of the "Weather Report" diagnostic tool. This tool and its dependencies are OTP applications. It is typically run from an escript which connects to a running cluster, gathers numerous diagnostics, and emits various warnings and errors when it finds something to complain about. It was originally ported from a fork of Riaknostic (the automated diagnostic tools for Riak) [1] by Mike Wallace.

The checks it makes are represented by the following modules:

weatherreport_check_custodian.erl
weatherreport_check_disk.erl
weatherreport_check_internal_replication.erl
weatherreport_check_ioq.erl
weatherreport_check_mem3_sync.erl
weatherreport_check_membership.erl
weatherreport_check_memory_use.erl
weatherreport_check_message_queues.erl
weatherreport_check_node_stats.erl
weatherreport_check_nodes_connected.erl
weatherreport_check_process_calls.erl
weatherreport_check_process_memory.erl
weatherreport_check_safe_to_rebuild.erl
weatherreport_check_search.erl
weatherreport_check_tcp_queues.erl

While some of these checks are self-contained, check_node_stats, check_process_calls, check_process_memory, and check_message_queues all use recon [2] under the hood. Similarly, check_custodian and check_safe_to_rebuild use another Cloudant OTP application called Custodian, which periodically scans the "dbs" database to track the location of every shard of every database and can integrate with sensu [3] to ensure that operators are aware of any shard that is under-replicated.

I have created a POC branch [4] that adds Weather Report, Custodian, and Recon to CouchDB. When I ran it in my dev environment (without search running), I got the following diagnostic output:

$ ./weatherreport --etc ~/proj/couchdb/dev/lib/node1/etc/ -a
['node1@127.0.0.1'] [error] Local search node at 'clouseau@127.0.0.1' not responding: pang
['node2@127.0.0.1'] [error] Local search node at 'clouseau@127.0.0.1' not responding: pang
['node3@127.0.0.1'] [error] Local search node at 'clouseau@127.0.0.1' not responding: pang
['node1@127.0.0.1'] [notice] Data directory /Users/jay/proj/couchdb/dev/lib/node1/data is not mounted with 'noatime'. Please remount its disk with the 'noatime' flag to improve performance.
['node2@127.0.0.1'] [notice] Data directory /Users/jay/proj/couchdb/dev/lib/node2/data is not mounted with 'noatime'. Please remount its disk with the 'noatime' flag to improve performance.
['node3@127.0.0.1'] [notice] Data directory /Users/jay/proj/couchdb/dev/lib/node3/data is not mounted with 'noatime'. Please remount its disk with the 'noatime' flag to improve performance.
returned 1

There is still a little cleanup to be done before these tools would be ready to donate, but it seems that overall they already integrate tolerably well with CouchDB.

As far as licenses go, Riaknostic is Apache 2.0. Recon is not [5], but it seems like it should be ok to include in CouchDB based on my possibly naive reading. Currently Custodian has no license (just Copyright 2013 Cloudant), but I assume it would get an Apache license, just like all other donated code.

Would this be a welcome addition to CouchDB? Please let me know what you think.

Thanks,
Jay

[1] https://github.com/basho/riaknostic
[2] http://ferd.github.io/recon/
[3] https://sensu.io
[4] https://github.com/apache/couchdb/compare/master...cloudant:weatherreport?expand=1
[5] https://github.com/ferd/recon/blob/master/LICENSE