On Wed, 7 Sep 2016 18:08:50 +0200
Dominique Devienne <ddevie...@gmail.com> wrote:

> Imagine there's a 25GB SQLite DB file in version v1 of an application.
> 
> Because of the way this DB is structured, with large to very large
> blobs, it causes problems for the app. That DB has 35 tables, but 5
> of them represent 95% of the DB size (because of those blobs).
> 
> So for v2 of the app, there's a new design where the 30 "lightweight"
> tables move into a new DB file (with additional new tables), and the 5
> "blob heavy" tables remain in the existing DB file (but the 30 "moved"
> tables should be dropped from it).
> 
> The initial design was to copy the DB file (the app is "shut down",
> so there is no connection to that DB file), drop the 5 big tables,
> vacuum it (the new DB), and add the new tables. But the 25GB copy was
> deemed too slow in that case (several minutes). (After all, copying
> 95% of 25GB only to discard most of those GBs is inefficient.)
> 
> So the alternate design was to create the schema, empty, in the new
> DB file, attach the old one, and insert the data from the 30 tables
> into the new, empty (but structurally identical) tables. But that is
> apparently also very slow.
> 
> Even though it's not my app, I know there are indexes and triggers on
> those tables (but the triggers are mostly on deletes, so they don't
> really apply here), and I suggested creating those only after the
> bulk inserts, but I doubt it will make a dramatic difference
> time-wise.
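
(For reference, a minimal sketch of that attach-and-copy route, with index
and trigger creation deferred until after the bulk inserts; the file, table,
column, and index names below are placeholders:)

  -- Run against the new, initially empty DB.
  PRAGMA journal_mode = OFF;   -- the target is disposable until the copy succeeds
  PRAGMA synchronous = OFF;
  ATTACH DATABASE 'old.db' AS old;
  BEGIN;
  INSERT INTO light_table_1 SELECT * FROM old.light_table_1;
  -- ... repeat for the other "lightweight" tables ...
  COMMIT;
  -- Only now create the indexes and triggers on the copied tables:
  CREATE INDEX light_table_1_idx ON light_table_1(some_column);
  DETACH DATABASE old;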
> 
> Conceptually, the ideal situation would be to copy just the DB header,
> the sqlite_master page(s) (dropping the 5 big tables' definitions and
> the related indexes/triggers from sqlite_master), and only the 5% of
> pages belonging to the other tables (and related objects). (Page
> chaining and the root page numbers recorded in sqlite_master would
> have to be updated, of course.) It's almost a form of VACUUM, except
> that it's not in-place and works on a subset of the tables. IO-wise,
> that has the potential to be 20x faster, I imagine.
> 
> But of course there's no such "out-of-place", "partial" VACUUM...
> 
> Then I thought maybe .backup, which also works at the page level (I
> believe), could perhaps be limited to a subset of tables. But no luck
> there either: backup works on the whole DB, not a subset.
> 
> Am I missing other ways to achieve this "split" efficiently?
> Any chance the backup API could gain a table-specific (and related
> objects) alternate form?

Not answering your question directly, but here are some comments/tricks on
how to boost SQLite3 performance in your scenario.

First, compile the latest SQLite3 version with these options enabled:

SQLITE_DIRECT_OVERFLOW_READ
SQLITE_DEFAULT_AUTOVACUUM=2

They are described at https://www.sqlite.org/compile.html#direct_overflow_read
and https://www.sqlite.org/compile.html#omit_autovacuum (don't omit
autovacuum; set it to 2, i.e. incremental).

For an explanation of why setting autovacuum to incremental helps, see
http://marc.info/?l=sqlite-users&m=136265346522617&w=4
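
The same can also be done at runtime with PRAGMAs; a minimal sketch (the
setting must be applied before the first table is created, or be followed
by a VACUUM):

  -- On a fresh database (or run VACUUM afterwards to apply it):
  PRAGMA auto_vacuum = INCREMENTAL;
  -- Later, reclaim up to N free pages on demand instead of at every commit:
  PRAGMA incremental_vacuum(1000);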

If your blobs are large, say bigger than 500KB, set the page size to 64KB.
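
For example (the page size only takes effect on a freshly created database,
or after a VACUUM on an existing one):

  PRAGMA page_size = 65536;  -- 64KB pages
  VACUUM;                    -- only needed if the DB already has content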

Declare the blob column as the last column of the table.
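
For example, a hypothetical layout (names are placeholders); with the blob
last, the other columns can be read without walking the blob's overflow
pages:

  CREATE TABLE asset(
      id     INTEGER PRIMARY KEY,
      name   TEXT,
      mtime  INTEGER,
      data   BLOB     -- the large blob goes last
  );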

Compile with the dbstat (SQLITE_ENABLE_DBSTAT_VTAB) option if you want
statistics about your blob DB. You can get information about fragmentation,
etc.; see https://www.sqlite.org/dbstat.html#section_3
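
For instance, a quick per-table space report against the dbstat virtual
table (only available when compiled with that option):

  -- Pages and bytes used by each table and index:
  SELECT name, count(*) AS pages, sum(pgsize) AS bytes
    FROM dbstat
   GROUP BY name
   ORDER BY bytes DESC;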

If your use case is write heavy, don't use WAL: modifications
(create/update/delete) are not atomic across databases when you use attached
DBs in WAL mode. If your use case is read heavy, use WAL.
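
The journal mode is set per database (and WAL persists in the file); a
minimal sketch:

  PRAGMA journal_mode = WAL;       -- read heavy: readers don't block the writer
  -- PRAGMA journal_mode = DELETE; -- keeps transactions spanning ATTACHed DBs atomic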

I use (and develop) a DIY filesystem over SQLite that has a similar scenario,
with hundreds of GB of blobs across multiple databases. Those compile options
and tricks, plus splitting metadata from data, made a huge improvement.

> Any advice would be appreciated. Thanks, --DD

HTH

---   ---
Eduardo Morras <emorr...@yahoo.es>
