[sqlite] Much unused space in sqlite3 database when using blobs

2014-12-01 Thread Александр Гурьянов
Hi. Hi. I am writing an app that stores some blobs in sqlite3 database. My
question is about disk space usage.
For analyst i use sqlite_analyzer.

My database is created with this script:

PRAGMA page_size = 512;
PRAGMA synchronous = OFF;
PRAGMA journal_mode = MEMORY;
begin;
create table grid_blobs(grid_id integer, blob blob, PRIMARY KEY(grid_id))
WITHOUT ROWID;
insert into grid_blobs values(0, randomblob(70));
insert into grid_blobs values(1, randomblob(66));
insert into grid_blobs values(2, randomblob(82));
insert into grid_blobs values(3, randomblob(79));
insert into grid_blobs values(4, randomblob(67));
insert into grid_blobs values(5, randomblob(124));
insert into grid_blobs values(6, randomblob(119));
insert into grid_blobs values(7, randomblob(70));
insert into grid_blobs values(8, randomblob(82));
insert into grid_blobs values(9, randomblob(75));
--... (other entries)
insert into grid_blobs values(2703, randomblob(698));
commit;

There are 2703 entries with blob length from 31 bytes up to 49870 bytes
(avg: 2297, median: 332).
The output of sqlite_analyzer for this table is:

sqlite3 test.db PRAGMA page_size=512; vacuum; 
~/Downloads/sqlite3_analyzer test.db  analyze
Percentage of total database..  99.984%
Number of entries. 2437
Bytes of storage consumed. 3229184
Bytes of payload.. 2608484 80.8%
Average payload per entry. 1070.37
Average unused bytes per entry 235.34
Average fanout 8.00
Non-sequential pages.. 00.0%
Maximum payload per entry. 49877
Entries that use overflow. 232195.2%
Index pages used.. 35
Primary pages used 268
Overflow pages used... 6004
Total pages used.. 6307
Unused bytes on index pages... 1774 9.9%
Unused bytes on primary pages. 8836 6.4%
Unused bytes on overflow pages 562916  18.3%
Unused bytes on all pages. 573526  17.8%

sqlite3 test.db PRAGMA page_size=1024; vacuum; 
~/Downloads/sqlite3_analyzer test.db  analyze
Unused bytes on index pages... 467910.6%
Unused bytes on primary pages. 172335.7%
Unused bytes on overflow pages 933114  28.7%
Unused bytes on all pages. 955026  26.5%

sqlite3 test.db PRAGMA page_size=2048; vacuum; 
~/Downloads/sqlite3_analyzer test.db  analyze
Unused bytes on index pages... 13738   14.6%
Unused bytes on primary pages. 378826.6%
Unused bytes on overflow pages 1184110 37.0%
Unused bytes on all pages. 1235730 31.9%

sqlite3 test.db PRAGMA page_size=4096; vacuum; 
~/Downloads/sqlite3_analyzer test.db  analyze
Unused bytes on index pages... 16017   12.6%
Unused bytes on primary pages. 576456.3%
Unused bytes on overflow pages 126 43.2%
Unused bytes on all pages. 1338106 33.7%

sqlite3 test.db PRAGMA page_size=8192; vacuum; 
~/Downloads/sqlite3_analyzer test.db  analyze
Unused bytes on index pages... 31248   18.2%
Unused bytes on primary pages. 700635.5%
Unused bytes on overflow pages 1535111 54.6%
Unused bytes on all pages. 1636422 38.4%

sqlite3 test.db PRAGMA page_size=16384; vacuum; 
~/Downloads/sqlite3_analyzer test.db  analyze
Unused bytes on index pages... 52549   24.7%
Unused bytes on primary pages. 889745.1%
Unused bytes on overflow pages 1202291 59.7%
Unused bytes on all pages. 1343814 33.9%

sqlite3 test.db PRAGMA page_size=32768; vacuum; 
~/Downloads/sqlite3_analyzer test.db  analyze
Unused bytes on index pages... 36957   18.8%
Unused bytes on primary pages. 104610   5.0%
Unused bytes on overflow pages 1104895 70.2%
Unused bytes on all pages. 1246462 32.2%

sqlite3 test.db PRAGMA page_size=65536; vacuum; 
~/Downloads/sqlite3_analyzer test.db  analyze
Unused bytes on index pages... 115861  44.2%
Unused bytes on primary pages. 179123   7.2%
Unused bytes on overflow pages 427834  72.5%
Unused bytes on all pages. 722818  21.6%

As you can see there are initially 18% of unused space in overflow pages.
As far as I 

Re: [sqlite] Much unused space in sqlite3 database when using blobs

2014-12-01 Thread Richard Hipp
On Sun, Nov 30, 2014 at 11:06 PM, Александр Гурьянов caiiiy...@gmail.com
wrote:

 Hi. Hi. I am writing an app that stores some blobs in sqlite3 database. My
 question is about disk space usage.
 For analyst i use sqlite_analyzer.

 My database is created with this script:

 PRAGMA page_size = 512;
 PRAGMA synchronous = OFF;
 PRAGMA journal_mode = MEMORY;
 begin;
 create table grid_blobs(grid_id integer, blob blob, PRIMARY KEY(grid_id))
 WITHOUT ROWID;


You'll have much better success here if you omit the WITHOUT ROWID.  See
section 4.0 When to use WITHOUT ROWID at
https://www.sqlite.org/withoutrowid.html for additional information.

See also:   http://www.sqlite.org/affcase1.html#smaller




 insert into grid_blobs values(0, randomblob(70));
 insert into grid_blobs values(1, randomblob(66));
 insert into grid_blobs values(2, randomblob(82));
 insert into grid_blobs values(3, randomblob(79));
 insert into grid_blobs values(4, randomblob(67));
 insert into grid_blobs values(5, randomblob(124));
 insert into grid_blobs values(6, randomblob(119));
 insert into grid_blobs values(7, randomblob(70));
 insert into grid_blobs values(8, randomblob(82));
 insert into grid_blobs values(9, randomblob(75));
 --... (other entries)
 insert into grid_blobs values(2703, randomblob(698));
 commit;

 There are 2703 entries with blob length from 31 bytes up to 49870 bytes
 (avg: 2297, median: 332).
 The output of sqlite_analyzer for this table is:

 sqlite3 test.db PRAGMA page_size=512; vacuum; 
 ~/Downloads/sqlite3_analyzer test.db  analyze
 Percentage of total database..  99.984%
 Number of entries. 2437
 Bytes of storage consumed. 3229184
 Bytes of payload.. 2608484 80.8%
 Average payload per entry. 1070.37
 Average unused bytes per entry 235.34
 Average fanout 8.00
 Non-sequential pages.. 00.0%
 Maximum payload per entry. 49877
 Entries that use overflow. 232195.2%
 Index pages used.. 35
 Primary pages used 268
 Overflow pages used... 6004
 Total pages used.. 6307
 Unused bytes on index pages... 1774 9.9%
 Unused bytes on primary pages. 8836 6.4%
 Unused bytes on overflow pages 562916  18.3%
 Unused bytes on all pages. 573526  17.8%

 sqlite3 test.db PRAGMA page_size=1024; vacuum; 
 ~/Downloads/sqlite3_analyzer test.db  analyze
 Unused bytes on index pages... 467910.6%
 Unused bytes on primary pages. 172335.7%
 Unused bytes on overflow pages 933114  28.7%
 Unused bytes on all pages. 955026  26.5%

 sqlite3 test.db PRAGMA page_size=2048; vacuum; 
 ~/Downloads/sqlite3_analyzer test.db  analyze
 Unused bytes on index pages... 13738   14.6%
 Unused bytes on primary pages. 378826.6%
 Unused bytes on overflow pages 1184110 37.0%
 Unused bytes on all pages. 1235730 31.9%

 sqlite3 test.db PRAGMA page_size=4096; vacuum; 
 ~/Downloads/sqlite3_analyzer test.db  analyze
 Unused bytes on index pages... 16017   12.6%
 Unused bytes on primary pages. 576456.3%
 Unused bytes on overflow pages 126 43.2%
 Unused bytes on all pages. 1338106 33.7%

 sqlite3 test.db PRAGMA page_size=8192; vacuum; 
 ~/Downloads/sqlite3_analyzer test.db  analyze
 Unused bytes on index pages... 31248   18.2%
 Unused bytes on primary pages. 700635.5%
 Unused bytes on overflow pages 1535111 54.6%
 Unused bytes on all pages. 1636422 38.4%

 sqlite3 test.db PRAGMA page_size=16384; vacuum; 
 ~/Downloads/sqlite3_analyzer test.db  analyze
 Unused bytes on index pages... 52549   24.7%
 Unused bytes on primary pages. 889745.1%
 Unused bytes on overflow pages 1202291 59.7%
 Unused bytes on all pages. 1343814 33.9%

 sqlite3 test.db PRAGMA page_size=32768; vacuum; 
 ~/Downloads/sqlite3_analyzer test.db  analyze
 Unused bytes on index pages... 36957   18.8%
 Unused bytes on primary pages. 104610   5.0%
 Unused bytes on overflow pages 1104895 70.2%
 Unused bytes on all pages. 1246462 32.2%

 sqlite3 test.db PRAGMA page_size=65536; vacuum; 
 

Re: [sqlite] Much unused space in sqlite3 database when using blobs

2014-12-01 Thread Paul Sanderson
Further to Richards comment the formula to determine when a record
overflows into a page file is given in section 1.5 of the file format
doc

http://www.sqlite.org/fileformat.html

The relevant section states:

The amount of payload that spills onto overflow pages also depends on
the page type. For the following computations, let U be the usable
size of a database page, the total page size less the reserved space
at the end of each page. And let P be the payload size.

Table B-Tree Leaf Cell:

If the payload size P is less than or equal to U-35 then the entire
payload is stored on the b-tree leaf page. Let M be
((U-12)*32/255)-23. If P is greater than U-35 then the number of byte
stored on the b-tree leaf page is the smaller of M+((P-M)%(U-4)) and
U-35. Note that number of bytes stored on the leaf page is never less
than M.

So in your case with a max blob size of 49870 bytes (which is  U-35)
so as far as I see it all the blobs can fit onto a single page and
acccording to that formula they should be.

Certainly in my testing I used your schema (with rowids) and wrote a
single blob of size 6 bytes to a DB with a pagesize of 65536 bytes
and the complete blob was stored in one page as expected (and as shown
below).

http://sandersonforensics.com/pics/largeblobhdr.jpg

http://sandersonforensics.com/pics/largeblob.jpg

You have truncated some of the data for DBs with page size  512 - I
assume you have determined that this is not relevant but without the
info I can't draw any conclusions. I also assume that you started a
new instance of sqlite between tests?

Perhaps I could suggest that you do a similar test to mine just adding
one (or a small handful) of blobs to the table and see if this sheds
any light on what is going on.








Paul
www.sandersonforensics.com
skype: r3scue193
twitter: @sandersonforens
Tel +44 (0)1326 572786
http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit
-Forensic Toolkit for SQLite
http://sandersonforensics.com/forum/content.php?168-Reconnoitre - VSC
processing made easy



On 1 December 2014 at 13:13, Richard Hipp d...@sqlite.org wrote:
 On Sun, Nov 30, 2014 at 11:06 PM, Александр Гурьянов caiiiy...@gmail.com
 wrote:

 Hi. Hi. I am writing an app that stores some blobs in sqlite3 database. My
 question is about disk space usage.
 For analyst i use sqlite_analyzer.

 My database is created with this script:

 PRAGMA page_size = 512;
 PRAGMA synchronous = OFF;
 PRAGMA journal_mode = MEMORY;
 begin;
 create table grid_blobs(grid_id integer, blob blob, PRIMARY KEY(grid_id))
 WITHOUT ROWID;


 You'll have much better success here if you omit the WITHOUT ROWID.  See
 section 4.0 When to use WITHOUT ROWID at
 https://www.sqlite.org/withoutrowid.html for additional information.

 See also:   http://www.sqlite.org/affcase1.html#smaller




 insert into grid_blobs values(0, randomblob(70));
 insert into grid_blobs values(1, randomblob(66));
 insert into grid_blobs values(2, randomblob(82));
 insert into grid_blobs values(3, randomblob(79));
 insert into grid_blobs values(4, randomblob(67));
 insert into grid_blobs values(5, randomblob(124));
 insert into grid_blobs values(6, randomblob(119));
 insert into grid_blobs values(7, randomblob(70));
 insert into grid_blobs values(8, randomblob(82));
 insert into grid_blobs values(9, randomblob(75));
 --... (other entries)
 insert into grid_blobs values(2703, randomblob(698));
 commit;

 There are 2703 entries with blob length from 31 bytes up to 49870 bytes
 (avg: 2297, median: 332).
 The output of sqlite_analyzer for this table is:

 sqlite3 test.db PRAGMA page_size=512; vacuum; 
 ~/Downloads/sqlite3_analyzer test.db  analyze
 Percentage of total database..  99.984%
 Number of entries. 2437
 Bytes of storage consumed. 3229184
 Bytes of payload.. 2608484 80.8%
 Average payload per entry. 1070.37
 Average unused bytes per entry 235.34
 Average fanout 8.00
 Non-sequential pages.. 00.0%
 Maximum payload per entry. 49877
 Entries that use overflow. 232195.2%
 Index pages used.. 35
 Primary pages used 268
 Overflow pages used... 6004
 Total pages used.. 6307
 Unused bytes on index pages... 1774 9.9%
 Unused bytes on primary pages. 8836 6.4%
 Unused bytes on overflow pages 562916  18.3%
 Unused bytes on all pages. 573526  17.8%

 sqlite3 test.db PRAGMA page_size=1024; vacuum; 
 ~/Downloads/sqlite3_analyzer test.db  analyze
 Unused bytes on index pages... 467910.6%
 Unused bytes on primary 

Re: [sqlite] Much unused space in sqlite3 database when using blobs

2014-12-01 Thread Александр Гурьянов
Richard, Paul thanks! You help me so much. Omitting WITHOUT ROWID do the
trick.

2014-12-02 3:04 GMT+07:00 Paul Sanderson sandersonforens...@gmail.com:

 Further to Richards comment the formula to determine when a record
 overflows into a page file is given in section 1.5 of the file format
 doc

 http://www.sqlite.org/fileformat.html

 The relevant section states:

 The amount of payload that spills onto overflow pages also depends on
 the page type. For the following computations, let U be the usable
 size of a database page, the total page size less the reserved space
 at the end of each page. And let P be the payload size.

 Table B-Tree Leaf Cell:

 If the payload size P is less than or equal to U-35 then the entire
 payload is stored on the b-tree leaf page. Let M be
 ((U-12)*32/255)-23. If P is greater than U-35 then the number of byte
 stored on the b-tree leaf page is the smaller of M+((P-M)%(U-4)) and
 U-35. Note that number of bytes stored on the leaf page is never less
 than M.

 So in your case with a max blob size of 49870 bytes (which is  U-35)
 so as far as I see it all the blobs can fit onto a single page and
 acccording to that formula they should be.

 Certainly in my testing I used your schema (with rowids) and wrote a
 single blob of size 6 bytes to a DB with a pagesize of 65536 bytes
 and the complete blob was stored in one page as expected (and as shown
 below).

 http://sandersonforensics.com/pics/largeblobhdr.jpg

 http://sandersonforensics.com/pics/largeblob.jpg

 You have truncated some of the data for DBs with page size  512 - I
 assume you have determined that this is not relevant but without the
 info I can't draw any conclusions. I also assume that you started a
 new instance of sqlite between tests?

 Perhaps I could suggest that you do a similar test to mine just adding
 one (or a small handful) of blobs to the table and see if this sheds
 any light on what is going on.








 Paul
 www.sandersonforensics.com
 skype: r3scue193
 twitter: @sandersonforens
 Tel +44 (0)1326 572786
 http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit
 -Forensic Toolkit for SQLite
 http://sandersonforensics.com/forum/content.php?168-Reconnoitre - VSC
 processing made easy



 On 1 December 2014 at 13:13, Richard Hipp d...@sqlite.org wrote:
  On Sun, Nov 30, 2014 at 11:06 PM, Александр Гурьянов 
 caiiiy...@gmail.com
  wrote:
 
  Hi. Hi. I am writing an app that stores some blobs in sqlite3 database.
 My
  question is about disk space usage.
  For analyst i use sqlite_analyzer.
 
  My database is created with this script:
 
  PRAGMA page_size = 512;
  PRAGMA synchronous = OFF;
  PRAGMA journal_mode = MEMORY;
  begin;
  create table grid_blobs(grid_id integer, blob blob, PRIMARY
 KEY(grid_id))
  WITHOUT ROWID;
 
 
  You'll have much better success here if you omit the WITHOUT ROWID.  See
  section 4.0 When to use WITHOUT ROWID at
  https://www.sqlite.org/withoutrowid.html for additional information.
 
  See also:   http://www.sqlite.org/affcase1.html#smaller
 
 
 
 
  insert into grid_blobs values(0, randomblob(70));
  insert into grid_blobs values(1, randomblob(66));
  insert into grid_blobs values(2, randomblob(82));
  insert into grid_blobs values(3, randomblob(79));
  insert into grid_blobs values(4, randomblob(67));
  insert into grid_blobs values(5, randomblob(124));
  insert into grid_blobs values(6, randomblob(119));
  insert into grid_blobs values(7, randomblob(70));
  insert into grid_blobs values(8, randomblob(82));
  insert into grid_blobs values(9, randomblob(75));
  --... (other entries)
  insert into grid_blobs values(2703, randomblob(698));
  commit;
 
  There are 2703 entries with blob length from 31 bytes up to 49870 bytes
  (avg: 2297, median: 332).
  The output of sqlite_analyzer for this table is:
 
  sqlite3 test.db PRAGMA page_size=512; vacuum; 
  ~/Downloads/sqlite3_analyzer test.db  analyze
  Percentage of total database..  99.984%
  Number of entries. 2437
  Bytes of storage consumed. 3229184
  Bytes of payload.. 2608484 80.8%
  Average payload per entry. 1070.37
  Average unused bytes per entry 235.34
  Average fanout 8.00
  Non-sequential pages.. 00.0%
  Maximum payload per entry. 49877
  Entries that use overflow. 232195.2%
  Index pages used.. 35
  Primary pages used 268
  Overflow pages used... 6004
  Total pages used.. 6307
  Unused bytes on index pages... 1774 9.9%
  Unused bytes on primary pages. 8836 6.4%
  Unused bytes on overflow pages 562916  18.3%