What are the business-logic reasons for wanting to split the database? Will your application be able to handle data that exists in multiple places?
When I know I'm going to be putting large blobs in a database, those blob tables get two fields: one for the ID, and one for the actual content. Perhaps even a field indicating what kind of blob it is, though I'm also the type who'd just make a new table for each new kind of blob. The ID forms a 1:1 PK:FK relationship with a field in the "master" table, so the master table has a direct reference to the blob record. That way, everyday queries pull from the master table only, and IF I need the blob's content (say an image, an MP3, or whatever the case may be), I open a separate query against the blob table, get my data, then shut it down. I'd also put a unique constraint on the blob ID field, which automatically creates an index against it for fast lookups. (A rough schema sketch is at the very end of this message.)

On Wed, Sep 7, 2016 at 12:08 PM, Dominique Devienne <ddevie...@gmail.com> wrote:
> Imagine there's a 25GB SQLite DB file in version v1 of an application.
>
> Because of the way this DB is structured, with large to very large blobs,
> this leads to problems for the app. That DB has 35 tables, but 5 of those
> represent 95% of the DB size (because of those blobs).
>
> So for v2 of the app, there's a new design where the 30 "lightweight"
> tables move into a new DB file (along with additional new tables), and the
> 5 "blob-heavy" tables remain in the existing DB file (but the 30 "moved"
> tables should be dropped from it).
>
> The initial design was to copy the DB file (the app is "shut down", so no
> connection to that DB file), drop the 5 big tables, vacuum it (the new
> DB), and add the new tables. But copying the 25GB was deemed too long in
> that case (several minutes). (After all, copying 95% of 25GB just to
> discard all those GBs is inefficient.)
>
> So the alternate design was to create the schema empty in the new DB
> file, attach the old one, and insert the data from the 30 tables into the
> new empty tables (with identical schema/structure). But that's also very
> slow, apparently.
>
> Even though it's not my app, I know there are indexes and triggers on
> those tables (though the triggers are mostly on deletes, so they don't
> really apply here), and I suggested adding those after the INSERT...SELECT
> copies, but I doubt it's going to make a dramatic difference time-wise.
>
> Conceptually, the ideal situation would be to copy just the DB header,
> the sqlite_master page(s) (dropping the 5 big tables' definitions and
> related indexes/triggers from sqlite_master), and only the 5% of pages
> belonging to the other tables (and related objects). (The chaining of
> pages and the root pages in sqlite_master would have to be updated, of
> course.) It's almost a form of VACUUM, except it's not in-place and works
> on a subset of the tables. IO-wise, that has the potential to be 20x
> faster, I imagine.
>
> But of course there's no such "out-of-place" "partial" VACUUM...
>
> Then I thought maybe .backup, which also works at the page level (I
> believe), could perhaps be limited to a subset of the tables. But no luck
> there either: backup works on the whole DB, not a subset.
>
> Am I missing other ways to achieve this "split" efficiently?
> Any chance the backup API could grow a table-specific (and related
> objects) alternate form?
>
> Any advice would be appreciated.
> Thanks, --DD
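For what it's worth, the "attach and copy" approach described above would look roughly like the sketch below. All file, table, and column names here are made up for illustration. The two things most likely to help are creating the indexes only after the bulk copy, and relaxing durability for the duration of this one-off migration:

    -- Run against the NEW, initially empty DB file.
    -- One-off migration, so durability is deliberately traded for speed:
    PRAGMA journal_mode = OFF;   -- no rollback journal for this one-shot copy
    PRAGMA synchronous = OFF;    -- no fsync on commit

    ATTACH DATABASE 'old_v1.db' AS old;

    BEGIN;
    -- Create each lightweight table first, WITHOUT its indexes/triggers:
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, name TEXT);  -- same definition as old.t1
    INSERT INTO t1 SELECT id, name FROM old.t1;
    -- ...repeat for the other 29 tables...
    COMMIT;

    -- Build the indexes once, in bulk, after the data is in place:
    CREATE INDEX t1_name_idx ON t1(name);

    DETACH DATABASE old;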
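And to make my earlier point about blob side-tables concrete, here's a minimal sketch of that layout, again with hypothetical names. One SQLite-specific caveat: the foreign key is only enforced when PRAGMA foreign_keys is ON (it's off by default):

    PRAGMA foreign_keys = ON;

    CREATE TABLE blobs (
      id      INTEGER PRIMARY KEY,
      kind    TEXT,               -- e.g. 'image' or 'mp3'; or one table per kind
      content BLOB
    );

    CREATE TABLE master (
      id      INTEGER PRIMARY KEY,
      title   TEXT,
      -- Direct reference to the blob record; UNIQUE keeps the relationship
      -- 1:1 and automatically creates an index for fast lookups:
      blob_id INTEGER UNIQUE REFERENCES blobs(id)
    );

    -- Everyday queries touch only the master table:
    SELECT id, title FROM master WHERE title LIKE 'abc%';

    -- Blob content (image, MP3, etc.) is fetched on demand, separately:
    SELECT b.content FROM blobs b JOIN master m ON m.blob_id = b.id WHERE m.id = 42;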