Imagine there's a 25GB SQLite DB file in version v1 of an application.

Because of the way this DB is structured, with large to very large blobs,
it causes problems for the app. The DB has 35 tables, but 5 of them
represent 95% of the DB's size (because of those blobs).

So for v2 of the app, there's a new design where the 30 "lightweight"
tables move into a new DB file (along with additional new tables), while
the 5 "blob-heavy" tables remain in the existing DB file (with the 30
"moved" tables dropped from it).

The initial design was to copy the DB file (the app is shut down, so
nothing is connected to it), drop the 5 big tables from the copy, VACUUM
it (the new DB), then add the new tables. But the 25GB copy was deemed
too slow in that case (several minutes); after all, copying 95% of 25GB
only to discard most of those GBs is inefficient.
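In sqlite3 shell terms, that initial design is roughly the following
(table names are made up for illustration; the real schema isn't shown
here):

  -- after copying v1.db to v2.db at the filesystem level:
  DROP TABLE blob_heavy_1;
  -- ... same for the other four blob-heavy tables ...
  DROP TABLE blob_heavy_5;
  VACUUM;  -- rewrites the file, reclaiming the ~95% of now-free pages
  CREATE TABLE new_v2_feature (id INTEGER PRIMARY KEY, info TEXT);
  -- ... plus the rest of the new v2 tables ...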

So the alternate design was to create the schema empty in the new DB
file, attach the old one, and insert the data from the 30 tables into the
new, empty (but structurally identical) tables. Apparently that's also
very slow.
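That is, something like this (hypothetical table names again), run
against the new, empty v2 file:

  ATTACH 'v1.db' AS old;
  BEGIN;
  INSERT INTO light_1 SELECT * FROM old.light_1;
  INSERT INTO light_2 SELECT * FROM old.light_2;
  -- ... repeat for the remaining 28 lightweight tables ...
  COMMIT;
  DETACH old;

(Wrapping all 30 inserts in a single transaction should already help,
since otherwise each statement commits, and flushes the journal, on its
own.)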

Even though it's not my app, I know there are indexes and triggers on
those tables (though the triggers are mostly on deletes, so they don't
really apply here), and I suggested creating those only after the bulk
inserts, but I doubt it will make a dramatic difference time-wise.
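Concretely, the suggestion was to define the tables without their
indexes/triggers, do the bulk copy, then recreate those objects. One
might also add the usual bulk-load PRAGMAs, which I'd consider safe here
only because a half-built v2 file can simply be deleted and rebuilt:

  PRAGMA journal_mode = OFF;   -- no rollback journal for a throwaway build
  PRAGMA synchronous = OFF;    -- skip fsyncs during the load
  -- ... ATTACH + INSERT ... SELECT as above ...
  CREATE INDEX light_1_by_name ON light_1(name);  -- hypothetical index
  -- ... recreate the remaining indexes and triggers here ...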

Conceptually, the ideal solution would be to copy just the DB header, the
sqlite_master page(s) (dropping the 5 big tables' definitions and related
indexes/triggers from sqlite_master), and only the 5% of pages belonging
to the other tables (and their related objects); page chaining and the
root-page numbers recorded in sqlite_master would have to be updated, of
course. It's almost a form of VACUUM, except it's not in-place and works
on a subset of the tables. IO-wise, that has the potential to be 20x
faster, I imagine.
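For what it's worth, the set of sqlite_master entries such a "partial
VACUUM" would keep is easy to enumerate (blob-heavy names hypothetical):

  SELECT type, name, rootpage
    FROM sqlite_master
   WHERE tbl_name NOT IN ('blob_heavy_1', 'blob_heavy_2', 'blob_heavy_3',
                          'blob_heavy_4', 'blob_heavy_5');

The rootpage values are what such a tool would have to rewrite as it
relocates pages.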

But of course there's no such "out-of-place", "partial" VACUUM...

Then I thought maybe .backup, which also works at the page level (I
believe), could perhaps be restricted to a subset of tables. But no luck
there either: backup works on the whole DB, not a subset.
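From the shell, the best the backup facility offers is a whole-file,
page-level copy:

  sqlite3 v1.db ".backup v2.db"

and the underlying C API (sqlite3_backup_init / _step / _finish) likewise
takes only a source and a destination database, with nothing to restrict
the copy to a set of tables.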

Am I missing other ways to achieve this "split" efficiently?
Any chance the backup API could grow a table-specific (and related
objects) alternate form?

Any advice would be appreciated. Thanks, --DD