So, as we discussed during the meeting, I have offloaded the stg data older than half a year to another database. This is how I did it (it probably could have been done more efficiently, but hey, it worked, and I'm not a postgres expert...):
$ pg_dump -Fc resultsdb_stg > resultsdb_stg.dump          # dump resultsdb_stg to a file
$ createdb -T template0 resultsdb_stg_archive             # create a new empty database called resultsdb_stg_archive
$ pg_restore -d resultsdb_stg_archive resultsdb_stg.dump  # load the dump into the resultsdb_stg_archive db
$ psql resultsdb_stg_archive
=# -- Get the newest result we want to keep in the archive
=# select id, job_id from result where submit_time < '2016-06-01' order by submit_time desc limit 1;
   id    | job_id
---------+--------
 7857664 | 308901
=# -- Since jobs can contain multiple results, select the first result with the 'next' job_id.
=# -- This could also be done as "select id, job_id from result where job_id = 308902 order by id limit 1;",
=# -- but using '>' automatically handles a possible hole in the job id sequence.
=# select id, job_id from result where job_id > 308901 order by id limit 1;
   id    | job_id
---------+--------
 7857665 | 308902
=# -- delete all the result_data, results, and jobs, starting from what we got in the previous query
=# delete from result_data where result_id >= 7857665;
=# delete from result where id >= 7857665;
=# delete from job where id >= 308902;
$ psql resultsdb_stg
=# -- since the dbs were 'cloned' at the beginning, delete the inverse set of the data we deleted in the archive
=# delete from result_data where result_id < 7857665;
=# delete from result where id < 7857665;
=# delete from job where id < 308902;

On Wed, Dec 7, 2016 at 2:19 PM, Josef Skladanka <jskla...@redhat.com> wrote:
>
> On Mon, Dec 5, 2016 at 4:25 PM, Tim Flink <tfl...@redhat.com> wrote:
>
>> Is there a way we could export the results as a json file or something
>> similar? If there is (or if it could be added without too much
>> trouble), we would have multiple options:
>
> Sure, adding some kind of export should be doable.
>
>> 1. Dump the contents of the current db, do a partial offline migration
>>    and finish it during the upgrade outage by exporting/importing the
>>    newest data, deleting the production db and importing the offline
>>    upgraded db. If that still takes too long, create a second postgres
>>    db containing the offline upgrade, switch over during the outage and
>>    import the new results added since the db was copied.
>
> I slept just two hours, so this is a bit entangled for me. My initial
> idea was that we:
> - dump the database
> - delete most of the results
> - run the migration on the small data set
>
> In parallel (or later on), we would:
> - create a second database (let's call it 'archive')
> - import the un-migrated dump
> - remove the data that is in the production db
> - run the lengthy migration
>
> This way we have minimal downtime, and the data is available in the
> 'archive' db.
>
> With the archive db, we could either
> 1) dump the data and then import it into the prod db (again no downtime), or
> 2) just spawn another resultsdb instance (archives.resultsdb?) that would
>    operate on top of the archive.
>
> I'd rather do the second, since it also has the benefit of being able to
> offload old data to the 'archive' database (which would/could be 'slow by
> definition'), while keeping the 'active' dataset small enough that it
> could all fit in memory for fast queries.
>
> What do you think? I guess we wanted to do something pretty similar, I
> just got lost a bit in what you wrote :)
>
>> 2. If the import/export process is fast enough, we might be able to do
>>    that instead of the in-place migration.
>
> My gut feeling is that it would be pretty slow, but I have no relevant
> experience.
>
> Joza
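A possible sanity check for a split like the one at the top of this mail (just a sketch, not something I actually ran; it assumes the cut-off ids 7857665 / 308902 from the session above): after both rounds of deletes, each database should only contain its own side of the boundary, and the result counts of the two databases should add up to the count in the original dump.

$ psql resultsdb_stg_archive
=# select count(*) from result where id >= 7857665;  -- expect 0
=# select count(*) from job where id >= 308902;      -- expect 0
$ psql resultsdb_stg
=# select count(*) from result where id < 7857665;   -- expect 0
=# select count(*) from job where id < 308902;       -- expect 0

If any of these come back non-zero, one of the deletes missed rows and the boundary ids should be re-checked before the archive db is put to use.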
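On the JSON export idea from the quoted thread: one low-effort option would be to let postgres itself emit one JSON object per row. This is only a sketch, assuming the result table looks like in the queries above (id, job_id, submit_time); a real export would probably belong in resultsdb itself, and the backslash escaping of COPY's text format would need to be handled for the output to be strictly valid JSON lines.

$ psql resultsdb_stg
=# -- export results newer than the cut-off, one JSON object per line
=# \copy (select row_to_json(r) from (select id, job_id, submit_time from result where submit_time >= '2016-06-01') r) to 'results_export.json'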