Re: [sqlite] Much unused space in sqlite3 database when using blobs

2014-12-01 Thread Александр Гурьянов
Richard, Paul thanks! You help me so much. Omitting WITHOUT ROWID do the
trick.

2014-12-02 3:04 GMT+07:00 Paul Sanderson :

> Further to Richards comment the formula to determine when a record
> overflows into a page file is given in section 1.5 of the file format
> doc
>
> http://www.sqlite.org/fileformat.html
>
> The relevant section states:
>
> The amount of payload that spills onto overflow pages also depends on
> the page type. For the following computations, let U be the usable
> size of a database page, the total page size less the reserved space
> at the end of each page. And let P be the payload size.
>
> Table B-Tree Leaf Cell:
>
> If the payload size P is less than or equal to U-35 then the entire
> payload is stored on the b-tree leaf page. Let M be
> ((U-12)*32/255)-23. If P is greater than U-35 then the number of byte
> stored on the b-tree leaf page is the smaller of M+((P-M)%(U-4)) and
> U-35. Note that number of bytes stored on the leaf page is never less
> than M.
>
> So in your case with a max blob size of 49870 bytes (which is > U-35)
> so as far as I see it all the blobs can fit onto a single page and
> acccording to that formula they should be.
>
> Certainly in my testing I used your schema (with rowids) and wrote a
> single blob of size 6 bytes to a DB with a pagesize of 65536 bytes
> and the complete blob was stored in one page as expected (and as shown
> below).
>
> http://sandersonforensics.com/pics/largeblobhdr.jpg
>
> http://sandersonforensics.com/pics/largeblob.jpg
>
> You have truncated some of the data for DBs with page size > 512 - I
> assume you have determined that this is not relevant but without the
> info I can't draw any conclusions. I also assume that you started a
> new instance of sqlite between tests?
>
> Perhaps I could suggest that you do a similar test to mine just adding
> one (or a small handful) of blobs to the table and see if this sheds
> any light on what is going on.
>
>
>
>
>
>
>
>
> Paul
> www.sandersonforensics.com
> skype: r3scue193
> twitter: @sandersonforens
> Tel +44 (0)1326 572786
> http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit
> -Forensic Toolkit for SQLite
> http://sandersonforensics.com/forum/content.php?168-Reconnoitre - VSC
> processing made easy
>
>
>
> On 1 December 2014 at 13:13, Richard Hipp  wrote:
> > On Sun, Nov 30, 2014 at 11:06 PM, Александр Гурьянов <
> caiiiy...@gmail.com>
> > wrote:
> >
> >> Hi. Hi. I am writing an app that stores some blobs in sqlite3 database.
> My
> >> question is about disk space usage.
> >> For analyst i use sqlite_analyzer.
> >>
> >> My database is created with this script:
> >>
> >> PRAGMA page_size = 512;
> >> PRAGMA synchronous = OFF;
> >> PRAGMA journal_mode = MEMORY;
> >> begin;
> >> create table grid_blobs(grid_id integer, blob blob, PRIMARY
> KEY(grid_id))
> >> WITHOUT ROWID;
> >>
> >
> > You'll have much better success here if you omit the WITHOUT ROWID.  See
> > section 4.0 "When to use WITHOUT ROWID" at
> > https://www.sqlite.org/withoutrowid.html for additional information.
> >
> > See also:   http://www.sqlite.org/affcase1.html#smaller
> >
> >
> >
> >
> >> insert into grid_blobs values(0, randomblob(70));
> >> insert into grid_blobs values(1, randomblob(66));
> >> insert into grid_blobs values(2, randomblob(82));
> >> insert into grid_blobs values(3, randomblob(79));
> >> insert into grid_blobs values(4, randomblob(67));
> >> insert into grid_blobs values(5, randomblob(124));
> >> insert into grid_blobs values(6, randomblob(119));
> >> insert into grid_blobs values(7, randomblob(70));
> >> insert into grid_blobs values(8, randomblob(82));
> >> insert into grid_blobs values(9, randomblob(75));
> >> --... (other entries)
> >> insert into grid_blobs values(2703, randomblob(698));
> >> commit;
> >>
> >> There are 2703 entries with blob length from 31 bytes up to 49870 bytes
> >> (avg: 2297, median: 332).
> >> The output of sqlite_analyzer for this table is:
> >>
> >> sqlite3 test.db "PRAGMA page_size=512; vacuum;" &&
> >> ~/Downloads/sqlite3_analyzer test.db > analyze
> >> Percentage of total database..  99.984%
> >> Number of entries. 2437
> >> Bytes of storage consumed. 3229184
> >> Bytes of payload.. 2608484 80.8%
> >> Average payload per entry. 1070.37
> >> Average unused bytes per entry 235.34
> >> Average fanout 8.00
> >> Non-sequential pages.. 00.0%
> >> Maximum payload per entry. 49877
> >> Entries that use overflow. 232195.2%
> >> Index pages used.. 35
> >> Primary pages used 268
> >> Overflow pages used... 6004
> >> Total pages 

Re: [sqlite] Much unused space in sqlite3 database when using blobs

2014-12-01 Thread Paul Sanderson
Further to Richards comment the formula to determine when a record
overflows into a page file is given in section 1.5 of the file format
doc

http://www.sqlite.org/fileformat.html

The relevant section states:

The amount of payload that spills onto overflow pages also depends on
the page type. For the following computations, let U be the usable
size of a database page, the total page size less the reserved space
at the end of each page. And let P be the payload size.

Table B-Tree Leaf Cell:

If the payload size P is less than or equal to U-35 then the entire
payload is stored on the b-tree leaf page. Let M be
((U-12)*32/255)-23. If P is greater than U-35 then the number of byte
stored on the b-tree leaf page is the smaller of M+((P-M)%(U-4)) and
U-35. Note that number of bytes stored on the leaf page is never less
than M.

So in your case with a max blob size of 49870 bytes (which is > U-35)
so as far as I see it all the blobs can fit onto a single page and
acccording to that formula they should be.

Certainly in my testing I used your schema (with rowids) and wrote a
single blob of size 6 bytes to a DB with a pagesize of 65536 bytes
and the complete blob was stored in one page as expected (and as shown
below).

http://sandersonforensics.com/pics/largeblobhdr.jpg

http://sandersonforensics.com/pics/largeblob.jpg

You have truncated some of the data for DBs with page size > 512 - I
assume you have determined that this is not relevant but without the
info I can't draw any conclusions. I also assume that you started a
new instance of sqlite between tests?

Perhaps I could suggest that you do a similar test to mine just adding
one (or a small handful) of blobs to the table and see if this sheds
any light on what is going on.








Paul
www.sandersonforensics.com
skype: r3scue193
twitter: @sandersonforens
Tel +44 (0)1326 572786
http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit
-Forensic Toolkit for SQLite
http://sandersonforensics.com/forum/content.php?168-Reconnoitre - VSC
processing made easy



On 1 December 2014 at 13:13, Richard Hipp  wrote:
> On Sun, Nov 30, 2014 at 11:06 PM, Александр Гурьянов 
> wrote:
>
>> Hi. Hi. I am writing an app that stores some blobs in sqlite3 database. My
>> question is about disk space usage.
>> For analyst i use sqlite_analyzer.
>>
>> My database is created with this script:
>>
>> PRAGMA page_size = 512;
>> PRAGMA synchronous = OFF;
>> PRAGMA journal_mode = MEMORY;
>> begin;
>> create table grid_blobs(grid_id integer, blob blob, PRIMARY KEY(grid_id))
>> WITHOUT ROWID;
>>
>
> You'll have much better success here if you omit the WITHOUT ROWID.  See
> section 4.0 "When to use WITHOUT ROWID" at
> https://www.sqlite.org/withoutrowid.html for additional information.
>
> See also:   http://www.sqlite.org/affcase1.html#smaller
>
>
>
>
>> insert into grid_blobs values(0, randomblob(70));
>> insert into grid_blobs values(1, randomblob(66));
>> insert into grid_blobs values(2, randomblob(82));
>> insert into grid_blobs values(3, randomblob(79));
>> insert into grid_blobs values(4, randomblob(67));
>> insert into grid_blobs values(5, randomblob(124));
>> insert into grid_blobs values(6, randomblob(119));
>> insert into grid_blobs values(7, randomblob(70));
>> insert into grid_blobs values(8, randomblob(82));
>> insert into grid_blobs values(9, randomblob(75));
>> --... (other entries)
>> insert into grid_blobs values(2703, randomblob(698));
>> commit;
>>
>> There are 2703 entries with blob length from 31 bytes up to 49870 bytes
>> (avg: 2297, median: 332).
>> The output of sqlite_analyzer for this table is:
>>
>> sqlite3 test.db "PRAGMA page_size=512; vacuum;" &&
>> ~/Downloads/sqlite3_analyzer test.db > analyze
>> Percentage of total database..  99.984%
>> Number of entries. 2437
>> Bytes of storage consumed. 3229184
>> Bytes of payload.. 2608484 80.8%
>> Average payload per entry. 1070.37
>> Average unused bytes per entry 235.34
>> Average fanout 8.00
>> Non-sequential pages.. 00.0%
>> Maximum payload per entry. 49877
>> Entries that use overflow. 232195.2%
>> Index pages used.. 35
>> Primary pages used 268
>> Overflow pages used... 6004
>> Total pages used.. 6307
>> Unused bytes on index pages... 1774 9.9%
>> Unused bytes on primary pages. 8836 6.4%
>> Unused bytes on overflow pages 562916  18.3%
>> Unused bytes on all pages. 573526  17.8%
>>
>> sqlite3 test.db "PRAGMA page_size=1024; vacuum;" &&
>> 

Re: [sqlite] Much unused space in sqlite3 database when using blobs

2014-12-01 Thread Richard Hipp
On Sun, Nov 30, 2014 at 11:06 PM, Александр Гурьянов 
wrote:

> Hi. Hi. I am writing an app that stores some blobs in sqlite3 database. My
> question is about disk space usage.
> For analyst i use sqlite_analyzer.
>
> My database is created with this script:
>
> PRAGMA page_size = 512;
> PRAGMA synchronous = OFF;
> PRAGMA journal_mode = MEMORY;
> begin;
> create table grid_blobs(grid_id integer, blob blob, PRIMARY KEY(grid_id))
> WITHOUT ROWID;
>

You'll have much better success here if you omit the WITHOUT ROWID.  See
section 4.0 "When to use WITHOUT ROWID" at
https://www.sqlite.org/withoutrowid.html for additional information.

See also:   http://www.sqlite.org/affcase1.html#smaller




> insert into grid_blobs values(0, randomblob(70));
> insert into grid_blobs values(1, randomblob(66));
> insert into grid_blobs values(2, randomblob(82));
> insert into grid_blobs values(3, randomblob(79));
> insert into grid_blobs values(4, randomblob(67));
> insert into grid_blobs values(5, randomblob(124));
> insert into grid_blobs values(6, randomblob(119));
> insert into grid_blobs values(7, randomblob(70));
> insert into grid_blobs values(8, randomblob(82));
> insert into grid_blobs values(9, randomblob(75));
> --... (other entries)
> insert into grid_blobs values(2703, randomblob(698));
> commit;
>
> There are 2703 entries with blob length from 31 bytes up to 49870 bytes
> (avg: 2297, median: 332).
> The output of sqlite_analyzer for this table is:
>
> sqlite3 test.db "PRAGMA page_size=512; vacuum;" &&
> ~/Downloads/sqlite3_analyzer test.db > analyze
> Percentage of total database..  99.984%
> Number of entries. 2437
> Bytes of storage consumed. 3229184
> Bytes of payload.. 2608484 80.8%
> Average payload per entry. 1070.37
> Average unused bytes per entry 235.34
> Average fanout 8.00
> Non-sequential pages.. 00.0%
> Maximum payload per entry. 49877
> Entries that use overflow. 232195.2%
> Index pages used.. 35
> Primary pages used 268
> Overflow pages used... 6004
> Total pages used.. 6307
> Unused bytes on index pages... 1774 9.9%
> Unused bytes on primary pages. 8836 6.4%
> Unused bytes on overflow pages 562916  18.3%
> Unused bytes on all pages. 573526  17.8%
>
> sqlite3 test.db "PRAGMA page_size=1024; vacuum;" &&
> ~/Downloads/sqlite3_analyzer test.db > analyze
> Unused bytes on index pages... 467910.6%
> Unused bytes on primary pages. 172335.7%
> Unused bytes on overflow pages 933114  28.7%
> Unused bytes on all pages. 955026  26.5%
>
> sqlite3 test.db "PRAGMA page_size=2048; vacuum;" &&
> ~/Downloads/sqlite3_analyzer test.db > analyze
> Unused bytes on index pages... 13738   14.6%
> Unused bytes on primary pages. 378826.6%
> Unused bytes on overflow pages 1184110 37.0%
> Unused bytes on all pages. 1235730 31.9%
>
> sqlite3 test.db "PRAGMA page_size=4096; vacuum;" &&
> ~/Downloads/sqlite3_analyzer test.db > analyze
> Unused bytes on index pages... 16017   12.6%
> Unused bytes on primary pages. 576456.3%
> Unused bytes on overflow pages 126 43.2%
> Unused bytes on all pages. 1338106 33.7%
>
> sqlite3 test.db "PRAGMA page_size=8192; vacuum;" &&
> ~/Downloads/sqlite3_analyzer test.db > analyze
> Unused bytes on index pages... 31248   18.2%
> Unused bytes on primary pages. 700635.5%
> Unused bytes on overflow pages 1535111 54.6%
> Unused bytes on all pages. 1636422 38.4%
>
> sqlite3 test.db "PRAGMA page_size=16384; vacuum;" &&
> ~/Downloads/sqlite3_analyzer test.db > analyze
> Unused bytes on index pages... 52549   24.7%
> Unused bytes on primary pages. 889745.1%
> Unused bytes on overflow pages 1202291 59.7%
> Unused bytes on all pages. 1343814 33.9%
>
> sqlite3 test.db "PRAGMA page_size=32768; vacuum;" &&
> ~/Downloads/sqlite3_analyzer test.db > analyze
> Unused bytes on index pages... 36957   18.8%
> Unused bytes on primary pages. 104610   5.0%
> Unused bytes on overflow pages 1104895