What are the business-logic reasons you want to split the database?  Will
your application be able to handle data that lives in multiple places?

When I know I'm going to be putting large blobs in a database, those
blob tables have two fields: one for the ID, and one for the actual content.
Perhaps also a field to indicate what kind of blob it is, though I'm the
type who'd just make a new table for each new kind of blob data.
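
Roughly, as a sketch (table and column names here are just made up for
illustration):

    CREATE TABLE blob_data (
        blob_id  INTEGER PRIMARY KEY,  -- referenced from the "master" table
        kind     TEXT,                 -- optional: what sort of blob this is
        content  BLOB NOT NULL         -- the actual bytes (image, MP3, ...)
    );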

The ID is a 1:1 PK:FK relationship between a field in the "master" table
and this blob data, so the master table has a direct reference to the
blob record.  This way, whenever I need to run queries, I pull from the
master table, and IF I need the blob's content (say an image, an MP3,
or whatever the case may be) I open a new query against the blob table, get
my data, then shut it down.  I'd also put a unique constraint on the blob ID
field, which automatically creates an index against it for fast lookups.
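
The "master" side would then look roughly like this (again, names are only
illustrative), with the unique foreign-key column giving the 1:1 link:

    CREATE TABLE master (
        id       INTEGER PRIMARY KEY,
        name     TEXT,
        blob_id  INTEGER UNIQUE                -- UNIQUE gives the implicit index
                 REFERENCES blob_data(blob_id)
    );

    -- Everyday queries touch only the master table:
    SELECT id, name FROM master WHERE name LIKE 'foo%';

    -- Only when the blob's content is actually needed, a second short-lived
    -- query pulls it from the blob table:
    SELECT content FROM blob_data WHERE blob_id = ?;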


On Wed, Sep 7, 2016 at 12:08 PM, Dominique Devienne <ddevie...@gmail.com>
wrote:

> Imagine there's a 25GB SQLite DB file in version v1 of an application.
>
> Because of the way this DB is structured, with large to very large blobs, this
> leads to problems for the app. That DB has 35 tables, but 5 of those
> represent 95% of the DB size (because of those blobs).
>
> So for v2 of the app, there's a new design where the 30 "lightweight"
> tables move into a new DB file (with additional new tables), and the 5
> "blob heavy" tables remain in the existing DB file (but the 30 "moved"
> tables should be dropped from it).
>
> Initial design was to copy the DB file (the app is shut down, so no connection
> to that DB file).
> Drop the 5 big tables. Vacuum it (the new DB). Add the new tables.
> But the 25GB copy was deemed too long in that case (several minutes).
> (after all, copying 95% of 25GB to discard all those GBs is inefficient)
>
> So the alternate design was to create the schema empty in the new DB file,
> attach the old one, and insert the data from the 30 tables into the new
> empty (with identical schema/structure) tables. But that's also very slow
> apparently.
>
> Even though it's not my app, I know there are indexes and triggers on those
> tables (but the triggers are mostly on deletes, so they don't really apply
> here), and I suggested adding those after the selects, but I doubt it's
> going to make a dramatic difference time-wise.
>
> Conceptually, the ideal situation would be to just copy the DB header,
> sqlite_master page(s) (dropping the 5 big tables definitions and related
> indexes/triggers from sqlite_master), and only the 5% of pages related to
> the other tables (and related objects).  (chaining of pages and root pages
> in sqlite_master have to be updated of course). It's almost like a form of
> VACUUM, except it's not in-place and works on a subset of the tables.
> IO-wise, that has the potential to be 20x faster I imagine.
>
> But of course there's no such "out-of-place" "partial" VACUUM...
>
> Then I thought maybe .backup, which also works at the page level (I
> believe), could perhaps be limited to a subset of tables. But no luck there
> either: backup works for the whole DB, not a subset.
>
> Am I missing other ways to achieve this "split" efficiently?
> Any chance the backup API could grow an alternate form limited to specific
> tables (and their related objects)?
>
> Any advice would be appreciated. Thanks, --DD
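
For reference, the ATTACH-and-copy approach described above would look
roughly like this (file, table, and column names are placeholders; one
table shown):

    ATTACH DATABASE 'old.db' AS old;

    BEGIN;
    -- Copy one of the 30 lightweight tables; repeat for each table.
    INSERT INTO some_table SELECT * FROM old.some_table;
    COMMIT;

    -- Creating indexes only after the data is in place keeps the inserts fast.
    CREATE INDEX some_table_idx ON some_table(some_column);

    DETACH DATABASE old;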
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
