On 12/12/2014 7:54 PM, melanie witt wrote:
Hi everybody,

At some point, our db archiving functionality got broken because there was a 
change to stop ever deleting instance system metadata [1]. For those 
unfamiliar, 'nova-manage db archive_deleted_rows' is the command that moves 
all soft-deleted (deleted=nonzero) rows to the shadow tables. This is a
periodic cleaning that operators can do to maintain performance (as things can 
get sluggish when deleted=nonzero rows accumulate).
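
For anyone who hasn't dug into it, per table the archiving boils down to 
roughly this. It's only a simplified sketch, not the actual nova 
implementation: the connection URL, the table choice, and the 100-row limit 
are placeholders, and it assumes current SQLAlchemy syntax and that each 
shadow table mirrors the columns of its live table:

    from sqlalchemy import MetaData, create_engine, select

    engine = create_engine("mysql+pymysql://nova:secret@localhost/nova")  # placeholder URL
    metadata = MetaData()
    metadata.reflect(bind=engine)

    table = metadata.tables["instance_metadata"]
    shadow = metadata.tables["shadow_instance_metadata"]

    with engine.begin() as conn:
        # Grab the ids of up to 100 soft-deleted rows (deleted != 0).
        ids = [
            row.id
            for row in conn.execute(
                select(table.c.id).where(table.c.deleted != 0).limit(100)
            )
        ]
        if ids:
            # Copy those rows into the shadow table, then remove them from
            # the live table, all in one transaction.
            conn.execute(
                shadow.insert().from_select(
                    [c.name for c in table.c],
                    select(table).where(table.c.id.in_(ids)),
                )
            )
            conn.execute(table.delete().where(table.c.id.in_(ids)))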

The change was made because instance_type data still needed to be read even after 
instances had been deleted, since we allow admins to view deleted instances. I saw a bug 
[2] and two patches [3][4] which aimed to fix this by changing back to soft-deleting 
instance sysmeta when instances are deleted, and instead allowing 
read_deleted="yes" for the things that need to read instance_type for deleted 
instances still present in the db.
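
To be concrete about what I mean by read_deleted="yes": the idea is just to 
skip the usual deleted=0 filter for the few callers that need to see rows 
belonging to deleted instances. This is only an illustrative helper, not 
nova's actual query code; the function and argument names here are made up:

    from sqlalchemy import select

    def get_sysmeta(conn, sysmeta_table, instance_uuid, read_deleted="no"):
        stmt = select(sysmeta_table).where(
            sysmeta_table.c.instance_uuid == instance_uuid
        )
        if read_deleted == "no":
            # Normal callers never see soft-deleted rows.
            stmt = stmt.where(sysmeta_table.c.deleted == 0)
        # With read_deleted="yes" we keep the soft-deleted rows, e.g. when an
        # admin views a deleted instance and we still need its instance_type.
        return conn.execute(stmt).fetchall()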

My question is, is this approach okay? If so, I'd like to see these patches 
revived so we can have our db archiving working again. :) I think there's likely 
something I'm missing about the approach, so I'm hoping people who know more 
about instance sysmeta than I do can chime in on how/if we can fix this for db 
archiving. Thanks.

[1] https://bugs.launchpad.net/nova/+bug/1185190
[2] https://bugs.launchpad.net/nova/+bug/1226049
[3] https://review.openstack.org/#/c/110875/
[4] https://review.openstack.org/#/c/109201/

melanie (melwitt)






I'd like to bring this back up: even though [3] and [4] are merged, nova-manage db archive_deleted_rows still fails to delete rows from some tables because of foreign key constraint violations, detailed here:

https://bugs.launchpad.net/nova/+bug/1183523/comments/12
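
To make the failure mode concrete, here's a tiny self-contained example 
(SQLite in memory, made-up table definitions, not nova code) showing why the 
processing order matters: deleting soft-deleted parent rows while child rows 
still reference them trips the foreign key constraint.

    import sqlalchemy as sa
    from sqlalchemy import event
    from sqlalchemy.exc import IntegrityError

    engine = sa.create_engine("sqlite://")

    @event.listens_for(engine, "connect")
    def _enable_fks(dbapi_conn, _record):
        # SQLite only enforces foreign keys when this pragma is on.
        dbapi_conn.execute("PRAGMA foreign_keys=ON")

    metadata = sa.MetaData()
    instances = sa.Table(
        "instances", metadata,
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column("deleted", sa.Integer, default=0),
    )
    instance_metadata = sa.Table(
        "instance_metadata", metadata,
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column("instance_id", sa.Integer, sa.ForeignKey("instances.id")),
        sa.Column("deleted", sa.Integer, default=0),
    )
    metadata.create_all(engine)

    with engine.begin() as conn:
        conn.execute(instances.insert(), [{"id": 1, "deleted": 1}])
        conn.execute(instance_metadata.insert(),
                     [{"id": 10, "instance_id": 1, "deleted": 1}])

    # Archiving instances before instance_metadata blows up because the
    # child row still points at the parent.
    try:
        with engine.begin() as conn:
            conn.execute(instances.delete().where(instances.c.deleted != 0))
    except IntegrityError as e:
        print("foreign key violation, as expected:", e)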

I'm wondering why we don't reverse-sort the tables using the sqlalchemy metadata object before processing the deletes. That's the same thing I did in the 267 migration, since we needed to process the tree starting at the leaves and eventually work back to the instances table (most roads lead to the instances table).
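
Concretely, something along these lines (placeholder connection URL; the 
shadow_ filter is there because reflecting the nova db picks up the shadow 
tables too):

    from sqlalchemy import MetaData, create_engine

    engine = create_engine("mysql+pymysql://nova:secret@localhost/nova")  # placeholder URL
    metadata = MetaData()
    metadata.reflect(bind=engine)

    # sorted_tables comes back parents-first based on foreign key
    # dependencies, so reversing it walks the leaves first and reaches
    # the instances table last.
    for table in reversed(metadata.sorted_tables):
        if table.name.startswith("shadow_"):
            continue  # don't try to archive the shadow tables themselves
        print(table.name)  # archive this table's soft-deleted rows next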

Another thing that's really weird is how max_rows is used in this code. The max_rows value is tracked cumulatively across the tables, so if the value you pass in is too small, you might not actually be removing anything from most of the tables.

I figured max_rows meant up to max_rows from each table, not max_rows *total* across all tables. By my count, there are 52 tables in the nova db model. The way I read the code, if I pass in max_rows=10 and say it processes table A and archives 7 rows, then when it processes table B it will pass max_rows=(max_rows - rows_archived), which would be 3 for table B. If we archive 3 rows from table B, rows_archived >= max_rows and we quit. So to really make this work, you have to pass in something big for max_rows, like 1000, which seems completely random.
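
Boiled down, this is how I read the current control flow (not the literal 
nova code, just the shape of it), with a toy stand-in for the per-table 
archiving step:

    def archive_deleted_rows(max_rows, tables, archive_one_table):
        # max_rows is treated as a *total* budget shared across all tables.
        rows_archived = 0
        for table in tables:
            rows_archived += archive_one_table(table, max_rows - rows_archived)
            if rows_archived >= max_rows:
                break  # budget used up; the remaining tables never get touched
        return rows_archived

    # Toy run matching the example above: A has 7 deleted rows, B has 5, C has 4.
    pending = {"A": 7, "B": 5, "C": 4}

    def archive_one_table(table, limit):
        archived = min(pending[table], limit)
        pending[table] -= archived
        return archived

    archive_deleted_rows(10, ["A", "B", "C"], archive_one_table)
    print(pending)  # {'A': 0, 'B': 2, 'C': 4}; C is never processed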

Does this seem odd to anyone else? Given the relationships between tables, I'd think you'd want to try to archive up to max_rows from each table, so archive 10 instances, 10 block_device_mapping, 10 pci_devices, etc.
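
In other words, something shaped more like this (reusing the toy 
archive_one_table from the sketch above):

    def archive_deleted_rows_per_table(max_rows, tables, archive_one_table):
        # max_rows caps each table individually, so every table gets a
        # chance to be archived on every run.
        return {table: archive_one_table(table, max_rows) for table in tables}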

I'm also bringing this up now because there is a thread in the operators list which pointed me to a set of scripts that operators at GoDaddy are using for archiving deleted rows:

http://lists.openstack.org/pipermail/openstack-operators/2015-October/008392.html

Presumably that's because the command in nova doesn't work. We should either make this thing work or just punt and delete it, since apparently no one cares.

--

Thanks,

Matt Riedemann


__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
