Hi all,
I'd like to open a discussion on a topic that's been bugging me for a number
of reasons - soft deletes (by that I mean marking rows with deleted=1 in the
db) and related - actions audit. Some research and speculations first...
To be honest I could not find any reason why the feature is there in the
first place. Here's the commit that introduced the 'deleted' columns:
https://github.com/openstack/nova/commit/ae6905b9f1ef97206ee3c8722cec3b26fc0
64f38 - unfortunately the description says only "Refactored orm to support
atomic actions". So the guessing part starts here. These are the possible
uses for soft-deletion of the database records that I could come up with:

1. safety net (recover data what was deleted by accident)
2. audit / log (preserve the information about past data)
3. some kind of micro-optimisation where update is more useful than deletion
- be it speed or ease of handling foreign constraints (or not handling them
straight away more likely)
4. ... no... that's all

But I think there's a number of issues with that approach. First - what are
the issues with the possible uses above. Then - issues that I can see
otherwise. Point by point:

1. Soft-deletion probably makes some restoration possible, but I doubt
there's much that could be done without full analysis of the situation.
Mainly because the database is only about metainformation - the actual data
users care about either goes away (ephemeral disks, memory, ...) or not
(volumes, networks, ...) and is not recoverable. Since resources like ips
and volumes can be just reused in other instances, not all recovery is
possible anyway. Most hardcore fixes could be done by reinserting the
original/reconstructed data just as easily as verifying what's safe to
undelete. Both actions require looking at existing data and locking out
information so it doesn't get reused while we're messing with the previous
state.

2. Soft-deleted records are not great as a source of old information. This
is connected to the previous point - some resources are just reused /
rewritten instead of created and deleted. For example there's no record of
what happens with old floating ips - the information gets overwritten when
the IP is reassigned to the new instance, so the useful bits are gone.

3. This is the only thing I could come up with related to the commit message
itself and the "support atomic actions" part. Maybe it was sometimes easier
to mark something as deleted rather than managing and properly ordering
deletes of a number of related entries.

So with that out of the way, here's a number of issues related to
soft-deletes that I run into myself:

4. Indexing all this data on a busy system is getting a bit silly. Unless
you do your own cleanup of old entries, you will end up in a situation where
looking up instances on a host actually looks through thousands of "deleted"
rows even if only around 20 or so can be live and interesting. I know it's
not a huge deal, but still an unnecessary cpu cycle burning.

5. Some things are just not possible to do in a safe and portable way at the
moment. For example adding a new network and fixed IPs (there's a bug for
that https://bugs.launchpad.net/nova/+bug/755138). I tried to fix this
situation, but actually discovered that this is not possible to do using
only sessions and with the 'deleted' column in place. There are ways to do
it in a specific database (you can lock the whole table in mysql for
example), but it's not portable then. The best you can do easily is limit
the issue and hope that two inserts in different sessions won't happen at
the same time. This could be easily done with an unique constraint if the
'deleted' column wasn't there.
I haven't checked, but guess that anything that can be named (and should
have a unique name) has the same problem - security groups, keys, instances,
...

6. The amount of data grows pretty quickly in a busy environment. It has to
be cleaned up, but due to some constraints, it can't be done easily in one
go. Cleanup triggers help here, but that's some additional work that needs
maintenance during schema changes. Schema changes themselves get interesting
when you're actually spending time converting mostly rows you really don't
care about. There were also instances where migration over many steps failed
for some reason on very old rows (virtual interface related, can't recall
which step was it at the moment). 

7. Not directly related, but I'll get back to that in the summary: owners of
bigger deployments will either want to or are required to hold some record
of various events and customer information. For example to handle security
abuse reports, it would be great to know who owned a specific floating IP at
a specific moment.


So what's my point?
Any use case I can find right now is not really improved by the current
schema. It doesn't look like there are many benefits, but there are
definitely some downsides.

Does anyone know why soft-delete is still in place?
Are there any reasons it can't / shouldn't be removed at this time?
If it's possible to remove it, would you miss it?


If the answers are all "no", my proposal for a future release is to:
- Check if the `deleted` column can be removed. Make sure other rows are
either disconnected (update ref to null) or just deleted as needed and all
data is really being removed.
- Add a system similar to notifications, but for auditable events - who did
what with which resource at what time - in some semi-structured way that
allows reviewing and summaries (basic information as separate columns + a
description as a text message). I saw there was some blueprint for a
cloudaudit api
(https://blueprints.launchpad.net/openstack-common/+spec/cloud-audit-api),
but not much happened with it for a year, so I'm assuming it's dead now.

This would allow both proper cleanup of the data and retention of what's
really necessary. It would also make it possible to use unique constraints
where they're really needed (mainly IP descriptions) to prevent silly
mistakes.

Any additional external processing of deleted records would be easier to do
because the database trigger could be just set on the delete action.
Thoughts, comments and critique welcome :) Let me know what you think about
these issues.

Regards,
Stanisław Pitucha
Cloud Services 
Hewlett Packard


Attachment: smime.p7s
Description: S/MIME cryptographic signature

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

Reply via email to