Hi all,

I have this idea in my head for a couple of months already but I didn't
manage to progress much by myself, so I'm throwing it out to see what
people think about it.

Audit log support in Ganeti
===========================


As Ganeti matures, one piece of missing functionality is the ability to
query it for things like: “What happened to this instance three days
ago?”, “When was this instance powercycled?”, or “How many times was
this OpCode executed in the past week?”. The lack of such background
information hampers debugging of both the code and of the state of the
fleet.

Background
----------

The ability to properly report changes in the state of virtual machines
is just as important as the ability to perform those changes. Once a
machine fleet grows beyond a certain size, being able to understand how
the cluster state changes over time becomes very important.

Similarly, internal information is not available for many of the
daemons. How many RPC calls were made to the node daemon on a given
node? How many RAPI calls to the master daemon?

There already exist methods to find this data:

- from archived jobs, if they haven't been cleaned up
- from (ganeti) logs, but this presents continuity problems
- or from syslog, if logging to syslog is enabled

But all of these have downsides that a native audit log implementation
can address.

Overview
--------

Currently, there are two main methods to investigate the history of
changes in cluster state: archived jobs and log files (Ganeti's or
syslog).

The job method has two downsides:

- jobs contain opcodes, whose structure can change over releases, so
  loading an older job for investigation can be problematic
- due to size issues, older jobs are automatically purged from the
  cluster

The log-based method also has downsides:

- unless using centralised logging via syslog, continuity of logs across
  master failover events is lost
- logs are somewhat verbose, and parsing the logs requires an
  understanding of the actual opcodes used for a given operation

Furthermore, both these methods have the downside of requiring either
low-level (SSH) administrator access to the Ganeti node or integration
with an external, centralised logging system. And they only record that
an operation was performed, not why.


Proposed design
---------------

A new, long-term audit log will be implemented in Ganeti, one that can
be queried and manipulated via Ganeti-level commands (as opposed to
shell-level tools).

The audit log will have a very simple format (mostly textual), so that
consistency can be kept across version upgrades. The log will consist of
entries having this structure:

- timestamp
- entity, one of:

  - cluster
  - group/$name
  - node/$name
  - instance/$name

- uuid, the UUID of the entity
- operation description (string)
- context (string)
- source
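
The entry structure above could be sketched as a small record type;
this is just an illustration of the proposed fields, and the names used
here are assumptions, not part of any existing Ganeti API:

```python
# Illustrative sketch only: field names are assumptions based on the
# proposed entry structure, not an existing Ganeti data type.
from dataclasses import dataclass


@dataclass
class AuditEntry:
    timestamp: float   # seconds since epoch, millisecond precision
    entity: str        # "cluster", "group/$name", "node/$name", "instance/$name"
    uuid: str          # UUID of the entity
    description: str   # textual representation of the operation
    context: str       # groups related operations together
    source: str        # who initiated the operation, e.g. "gnt-cluster/root"


entry = AuditEntry(
    timestamp=1234567890.123,
    entity="instance/instance1",
    uuid="a1b2c3d4-0000-0000-0000-000000000000",
    description="Instance migrated from node A to node B",
    context="evacuate node node2",
    source="gnt-node/root",
)
```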

Fields
~~~~~~

Timestamp
+++++++++

This is a normal timestamp, down to millisecond precision.

Entity/UUID
+++++++++++

In order to track “logical” renames, e.g. the case where 'instance1' is
renamed to 'instance1-old' and a new 'instance1' is created, we will
store both the entity name and the entity UUID. Searches can then be
made both via UUID and via names, providing a better picture.
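
As a toy illustration of why both fields matter (the entries and UUIDs
below are made up), a name-based search spans two distinct instances
after such a rename, while a UUID-based search isolates one instance's
history:

```python
# Hypothetical audit entries around a logical rename of "instance1".
log = [
    {"entity": "instance/instance1", "uuid": "uuid-A", "description": "Created"},
    {"entity": "instance/instance1-old", "uuid": "uuid-A", "description": "Renamed"},
    {"entity": "instance/instance1", "uuid": "uuid-B", "description": "Created"},
]

# UUID search: the full history of the original instance, across renames.
by_uuid = [e for e in log if e["uuid"] == "uuid-A"]

# Name search: everything that was ever called "instance1", which here
# spans two different underlying instances.
by_name = [e for e in log if e["entity"] == "instance/instance1"]
```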

Operation string
++++++++++++++++

This field should hold a textual representation of the opcode/operation
in question. For example, an ``OP_INSTANCE_MIGRATE`` (which doesn't hold
the source/target nodes) might be transformed into “Instance migrated
from node $A to $B”. It is important that this description logs all the
parameters that are significant for the operation, e.g. “Changed memory
from 1GiB to 6GiB”.

Context
+++++++

To logically group related operations together, the context field should
be used. For example, the command ``gnt-node evacuate node2`` creates
multiple jobs that can perform this evacuation in parallel, all having
one opcode ``OP_INSTANCE_REPLACE_DISKS``. In order to be able to view
this as a single logical operation, the audit entries for the instances
should have the same context (“evacuate node node2”).
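
Reassembling the logical operation is then a simple group-by on the
context field; a minimal sketch (entry shapes are illustrative):

```python
# Group audit entries by their context field, so that the parallel jobs
# of one logical operation (e.g. a node evacuation) are viewed together.
from collections import defaultdict


def group_by_context(entries):
    # entries: iterable of (context, description) pairs
    grouped = defaultdict(list)
    for context, description in entries:
        grouped[context].append(description)
    return dict(grouped)


entries = [
    ("evacuate node node2", "Replaced disks of instance1"),
    ("evacuate node node2", "Replaced disks of instance2"),
]
grouped = group_by_context(entries)
# Both OP_INSTANCE_REPLACE_DISKS entries land under one logical operation.
```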

Source
++++++

An item that is completely missing from current operation logs is who
initiated the operation. Due to the unauthenticated nature of LUXI, such
information cannot be ultimately authenticated, but certain paths can be
(as long as root is trusted).

The proposed format of this field is textual, with slash-separated
components. Examples explain this best:

- ``gnt-cluster/root``
- ``ganeti-watcher``
- ``RAPI/anonymous`` (this should not happen in reality)
- ``RAPI/admin``
- ``RAPI/admin/enduser`` (for a RAPI operation under the RAPI user admin
  but where the RAPI client supplied an extra context)
- ``hbal``
- etc.

The path should be trusted only up to the last trusted entity; for
example, if the 'admin' RAPI user is known to be used only by a known
application, then we can trust the contexts provided by this user; but
if the user is not trusted, then all contexts that it provides should be
ignored.
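
A minimal sketch of this truncation rule, assuming a configured set of
trusted RAPI users (the function and the example paths are
hypothetical):

```python
# Hypothetical sketch: keep a client-supplied extra context only when
# the RAPI user that provided it is trusted; otherwise truncate the
# source path at the last trusted component.
def effective_source(source, trusted_users):
    parts = source.split("/")
    if len(parts) >= 3 and parts[0] == "RAPI" and parts[1] not in trusted_users:
        # Drop the untrusted user's extra context.
        return "/".join(parts[:2])
    return source
```

With ``trusted_users={"admin"}``, ``"RAPI/admin/enduser"`` is kept as-is,
while ``"RAPI/guest/enduser"`` is reduced to ``"RAPI/guest"``.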

Storage details
~~~~~~~~~~~~~~~

TO BE DONE

Needs to be replicated. Needs to be time-limited or space-limited.

Since the size of the audit log can be significant (e.g. 1000
instances * 2 operations per day * 365 days = 730K entries; 1M entries
with valid fields make a Python interpreter eat ~400MB RSS, and ~200MB
when converted to JSON), we must allow downloading and expiring old
entries. This means that long-term storage of audit logs is off-cluster.
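
The entry-count estimate above is just arithmetic on the stated
assumptions:

```python
# Back-of-the-envelope check of the growth estimate from the text.
instances = 1000
ops_per_day = 2
days = 365

entries_per_year = instances * ops_per_day * days
# entries_per_year == 730000, i.e. the 730K figure quoted above
```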


User interface
~~~~~~~~~~~~~~

The basic functionality is, of course, examining audit logs and being
able to search them. Multiple ways of searching will be provided:

- search by entity name
- search by UUID
- timestamp searches
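
The three search modes above could share one filter function; the
predicate names and entry shape here are illustrative, not a committed
interface:

```python
# Hypothetical sketch: filter audit entries (stored as dicts) by entity
# name, UUID, or timestamp range. Omitted criteria match everything.
def search(entries, name=None, uuid=None, since=None, until=None):
    for e in entries:
        if name is not None and not e["entity"].endswith("/" + name):
            continue
        if uuid is not None and e["uuid"] != uuid:
            continue
        if since is not None and e["timestamp"] < since:
            continue
        if until is not None and e["timestamp"] > until:
            continue
        yield e


log = [
    {"entity": "instance/web1", "uuid": "u-1", "timestamp": 100.0},
    {"entity": "node/node2", "uuid": "u-2", "timestamp": 200.0},
]
hits = list(search(log, name="web1"))
```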

Alternatives considered
-----------------------

Maybe it makes more sense to just offload this to an external service,
rather than implementing it on the cluster itself. Thoughts?

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
