On Sat, 2021-07-17 at 03:19 +1000, Daniel Axtens wrote: > Migrate 10000 rows at a time. This: > - provides a view on progress > - means replication happens in manageable chunks > - hopefully prevents db lockups > > Signed-off-by: Daniel Axtens <d...@axtens.net> > --- > .../migrations/0043_merge_patch_submission.py | 146 ++++++++++-------- > 1 file changed, 82 insertions(+), 64 deletions(-) > > diff --git a/patchwork/migrations/0043_merge_patch_submission.py > b/patchwork/migrations/0043_merge_patch_submission.py > index 465e527812ba..1d072ca18230 100644 > --- a/patchwork/migrations/0043_merge_patch_submission.py > +++ b/patchwork/migrations/0043_merge_patch_submission.py > @@ -1,76 +1,94 @@ > from django.conf import settings > from django.db import connection, migrations, models > import django.db.models.deletion > +from django.db.models import Max > > import patchwork.fields > > +CHUNK_SIZE = 10000 > > def migrate_data(apps, schema_editor): > - if connection.vendor == 'postgresql': > - schema_editor.execute( > - """ > - UPDATE patchwork_submission > - SET archived = patchwork_patch.archived, > - commit_ref = patchwork_patch.commit_ref, > - delegate_id = patchwork_patch.delegate_id, > - diff = patchwork_patch.diff, > - hash = patchwork_patch.hash, > - number = patchwork_patch.number, > - pull_url = patchwork_patch.pull_url, > - related_id = patchwork_patch.related_id, > - series_id = patchwork_patch.series_id, > - state_id = patchwork_patch.state_id > - FROM patchwork_patch > - WHERE patchwork_submission.id = > patchwork_patch.submission_ptr_id > - """ > - ) > - elif connection.vendor == 'mysql': > - schema_editor.execute( > - """ > - UPDATE patchwork_submission, patchwork_patch > - SET patchwork_submission.archived = patchwork_patch.archived, > - patchwork_submission.commit_ref = > patchwork_patch.commit_ref, > - patchwork_submission.delegate_id = > patchwork_patch.delegate_id, > - patchwork_submission.diff = patchwork_patch.diff, > - patchwork_submission.hash = patchwork_patch.hash, > - patchwork_submission.number = patchwork_patch.number, > - patchwork_submission.pull_url = patchwork_patch.pull_url, > - patchwork_submission.related_id = > patchwork_patch.related_id, > - patchwork_submission.series_id = patchwork_patch.series_id, > - patchwork_submission.state_id = patchwork_patch.state_id > - WHERE patchwork_submission.id = patchwork_patch.submission_ptr_id > - """ # noqa > - ) > - else: > - schema_editor.execute( > - """ > - UPDATE patchwork_submission > - SET ( > - archived, commit_ref, delegate_id, diff, hash, number, > - pull_url, related_id, series_id, state_id > - ) = ( > - SELECT > - patchwork_patch.archived, > - patchwork_patch.commit_ref, > - patchwork_patch.delegate_id, > - patchwork_patch.diff, > - patchwork_patch.hash, > - patchwork_patch.number, > - patchwork_patch.pull_url, > - patchwork_patch.related_id, > - patchwork_patch.series_id, > - patchwork_patch.state_id > + Patch = apps.get_model('patchwork', 'patch') > + max_id = > Patch.objects.all().aggregate(Max('submission_ptr_id'))['submission_ptr_id__max']
Any risk of a TOCTOU bug with this on busy instances? We probably want a final check at the end unless we lock the database entirely for this operation (perhaps that happens as-is? I actually don't know off the top of my head) > + old_max = 0 > + # process CHUNK_SIZE at a time. +CHUNK_SIZE+1 is to ensure that if we get > + # max_id = 50000, we do a migration of ids >= 50000, < 60000 - otherwise > + # we would miss id=50000 > + print("\nMigrating data - %d rows" % max_id) > + for new_max in range(CHUNK_SIZE, max_id+CHUNK_SIZE+1, CHUNK_SIZE): nit: space around the '+'. I guess pep8 is disabled on this directory? > + print("Migrating rows >= %d, < %d (max %d)" % (old_max, new_max, > max_id)) Should we use print statements or logging? I suspect the latter might be wiser. The Django docs should have answers. <snip rest of mostly unchanged but re-indented code> Cheers, Stephen _______________________________________________ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork