Hi all, I have had this idea in my head for a couple of months already, but I haven't managed to make much progress by myself, so I'm throwing it out to see what people think about it.
Audit log support in Ganeti
===========================

As Ganeti matures, one piece of missing functionality is the ability to
query it for things like: “What happened to this instance three days
ago?”, “When was this instance powercycled?”, or “How many times was
this OpCode executed in the past week?”. The lack of such background
information hampers debugging of both the code and of the state of the
fleet.

Background
----------

The ability to properly report changes in the state of virtual machines
is as important as the ability to perform those changes. Beyond a
certain fleet size, being able to understand how the cluster state
changes over time becomes very important.

Similarly, internal information is not available for many of the
daemons. How many RPC calls were made to the node daemon on a given
node? How many RAPI calls to the master daemon?

Methods to find this data already exist:

- from archived jobs, if they haven't been cleaned up
- from (Ganeti) logs, but this presents continuity problems
- or from syslog, if logging to syslog is enabled

But all of these have downsides that a native audit log implementation
can address.

Overview
--------

Currently, there are two main methods to investigate the history of
changes in cluster state: archived jobs and log files (Ganeti's or
syslog).
The job method has two downsides:

- jobs contain opcodes, whose structure can change over releases, so
  loading an older job for investigation can be problematic
- due to size issues, older jobs are automatically purged from the
  cluster

The log-based method also has downsides:

- unless centralised logging via syslog is used, continuity of logs
  across master failover events is lost
- logs are somewhat verbose, and parsing them requires an understanding
  of the actual opcodes used for a given operation

Furthermore, both these methods have the downside of requiring
low-level (SSH) administrator access to either the Ganeti node or
integration with an external, centralised logging system. And they only
record that an operation was made, not why.

Proposed design
---------------

A new, long-term audit log will be implemented in Ganeti, which can be
queried and manipulated via Ganeti-level commands (as opposed to
shell-based ones). The audit log will have a very simple format (mostly
textual), so that consistency can be kept across version upgrades.

The log will consist of entries having this structure:

- timestamp
- entity, one of:

  - cluster
  - group/$name
  - node/$name
  - instance/$name

- uuid, the UUID of the entity
- operation description (string)
- context (string)
- source

Fields
~~~~~~

Timestamp
+++++++++

This is a normal timestamp, down to millisecond precision.

Entity/UUID
+++++++++++

In order to track “logical” renames, e.g. the case where 'instance1' is
renamed to 'instance1-old' and a new 'instance1' is created, we will
store both the entity name and the entity UUID. Searches can then be
made both via UUID and via names, providing a better picture.

Operation string
++++++++++++++++

This field should hold a textual representation of the opcode/operation
in question. For example, an ``OP_INSTANCE_MIGRATE`` (which doesn't
hold the source/target nodes) might be transformed into “Instance
migrated from node $A to $B”.
It is important that this description logs all the parameters that are
significant for the operation, e.g. “Changed memory from 1GiB to 6GiB”.

Context
+++++++

To logically group related operations together, the context field
should be used. For example, the command ``gnt-node evacuate node2``
creates multiple jobs that can perform this evacuation in parallel, all
having one opcode ``OP_INSTANCE_REPLACE_DISKS``. In order to be able to
view this as a single logical operation, the audit entries for the
instances should have the same context (“evacuate node node2”).

Source
++++++

An item that is completely missing from current operation logs is who
initiated the operation. Due to the unauthenticated nature of LUXI,
such information cannot be fully authenticated, but certain paths can
be (as long as root is trusted).

The proposed format of this field is textual, with slash-separated
components. Examples explain this best:

- ``gnt-cluster/root``
- ``ganeti-watcher``
- ``RAPI/anonymous`` (this should not happen in reality)
- ``RAPI/admin``
- ``RAPI/admin/enduser`` (for a RAPI operation under the RAPI user
  'admin', but where the RAPI client supplied an extra context)
- ``hbal``
- etc.

The path should be trusted only up to the last trusted entity; for
example, if the 'admin' RAPI user is known to be used only by a known
application, then we can trust the contexts provided by this user; but
if the user is not trusted, then all contexts that it provides should
be ignored.

Storage details
~~~~~~~~~~~~~~~

TO BE DONE

Needs to be replicated. Needs to be time-limited or space-limited.
Since the size of the audit log can be significant (e.g. 1000 instances
* 2 operations per day * 365 days = 730K entries; 1M entries with valid
fields make a Python interpreter eat ~400MB RSS, and converted to JSON
they take 200MB), we must allow download/expiry of old entries. This
means that long-term storage of audit logs is off-cluster.
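To make the proposed entry structure and the retention arithmetic above concrete, here is a minimal Python sketch. The class name, field values, and UUID are hypothetical illustrations, not existing Ganeti code; the serialisation shown is just one way to realise the "mostly textual" format.

```python
import json
import time
from dataclasses import asdict, dataclass


@dataclass
class AuditEntry:
    """One audit log entry, following the field list above."""
    timestamp: float  # seconds since the epoch, millisecond precision
    entity: str       # e.g. "instance/instance1"
    uuid: str         # UUID of the entity, to survive logical renames
    operation: str    # textual operation description
    context: str      # logical grouping of related operations
    source: str       # slash-separated initiator path


# Hypothetical example entry (the UUID is made up).
entry = AuditEntry(
    timestamp=round(time.time(), 3),
    entity="instance/instance1",
    uuid="6f52b3c4-1c9e-43d0-9f21-0a8e5b7c1d2e",
    operation="Instance migrated from node node1 to node2",
    context="evacuate node node1",
    source="RAPI/admin/enduser",
)

# Mostly-textual serialisation, intended to stay stable across upgrades.
line = json.dumps(asdict(entry), sort_keys=True)
print(line)

# Back-of-envelope check of the retention figures quoted above:
# 1000 instances * 2 operations per day * one year.
entries_per_year = 1000 * 2 * 365
print(entries_per_year)  # 730000

# Rough on-disk JSON volume per year for entries of this size, in MiB.
print(entries_per_year * len(line) / 1024 ** 2)
```

Even this rough estimate lands in the hundreds of megabytes per year for a large fleet, which supports the conclusion that long-term storage must be off-cluster.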
User interface
~~~~~~~~~~~~~~

The basic functionality is, of course, examining audit logs and being
able to search them. Multiple ways of searching will be provided:

- search by entity name
- search by UUID
- timestamp searches

Alternatives considered
-----------------------

Maybe it makes more sense to just offload this to an external service,
and not implement it on the cluster itself.

Thoughts?

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
