Re: [sqlite] Coping with database growth/fragmentation

2010-07-28 Thread Max Vlasov
On Fri, Jul 23, 2010 at 5:11 AM, Taras Glek wrote:

>  Hello,
> Recently I spent some time investigating sqlite IO patterns in Mozilla.
> Two issues came up: keeping sqlite files from getting fragmented and
> fixing fragmented sqlite files.
>
>
Funny, that's why I like reading other people's questions on this list: it
sometimes helps solve old tasks :). As many have noticed, the Windows system
cache is hard to control. For example, it is almost impossible to clear the
read cache for testing purposes; once you have read a file, it is in the
cache, and the timing numbers that follow are meaningless. CreateFile has a
flag, FILE_FLAG_NO_BUFFERING, that disables caching for a file while you work
with it. I thought about changing the sources and preparing a special version
of sqlite that could open databases without caching, but a better solution
turned up, at least on XP: if I "touch" a file with
CreateFile(..FILE_FLAG_NO_BUFFERING) and close it, the cache is not used the
next time the file is opened. So for testing purposes I just added a checkbox
to an admin tool that "touches" the file before passing it to sqlite, and it
really seems to work.
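For the curious, the "touch" trick can be sketched in Python with ctypes (my
sketch, not Max's actual code; the helper name and the non-Windows no-op
fallback are mine):

```python
import ctypes
import sys

# Win32 constants, values as defined in the Windows SDK headers
GENERIC_READ = 0x80000000
OPEN_EXISTING = 3
FILE_FLAG_NO_BUFFERING = 0x20000000
INVALID_HANDLE_VALUE = -1

def touch_no_buffering(path):
    """Open and immediately close `path` with FILE_FLAG_NO_BUFFERING, so that
    (per the observation above, on XP) the next normal open of the same file
    bypasses the system read cache. Returns True on success."""
    if sys.platform != "win32":
        return False  # hypothetical helper is a no-op off Windows
    kernel32 = ctypes.windll.kernel32
    handle = kernel32.CreateFileW(path, GENERIC_READ, 0, None,
                                  OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, None)
    if handle == INVALID_HANDLE_VALUE:
        return False
    kernel32.CloseHandle(handle)
    return True
```

Call it on the database file right before handing the path to sqlite3_open().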

So, Taras, thanks for your post :)

Also, with this approach I tried to test places.sqlite, in particular the
moz_places table. The query was

SELECT * FROM moz_places WHERE url LIKE '%double%'

I suppose the mozilla team probably uses different queries, but since there is
no recognizable FTS table, there must be some kind of full scan involved.

My tests on two hard drives showed that Windows file fragmentation had only a
small effect on the performance of the query, while VACUUM had a significant
one. Before VACUUM, my long-lived places.sqlite (13 MB in size, with 16893
records in moz_places) returned results after 8-10 seconds, depending on where
the file lived; after VACUUM, the results came back in 150-300 ms.

I think this is due to the nature of the Windows read-ahead cache: when sqlite
tables are placed in consecutive blocks of the file, Windows loads exactly the
pages sqlite expects next. So internal sqlite defragmentation (VACUUM) should
be applied before any file system defragmentation.
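The VACUUM effect is easy to reproduce with Python's sqlite3 module; here is a
minimal sketch (my code, using PRAGMA page_count as a stand-in for file size):

```python
import sqlite3

def vacuum_page_counts(path):
    """Return (page_count before VACUUM, page_count after VACUUM).

    VACUUM rebuilds the database, packing pages contiguously and dropping
    free pages, which is what restored the query times above."""
    conn = sqlite3.connect(path)
    before = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.execute("VACUUM")  # must run outside any open transaction
    after = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()
    return before, after
```

On a database that has seen many deletes, `after` will typically be well below
`before`.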

Max Vlasov,
maxerist.net
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Coping with database growth/fragmentation

2010-07-27 Thread Taras Glek
On 07/23/2010 04:38 AM, Martin Engelschalk wrote:
>Hello Taras, List,
>
> I have been fighting the same problems described here for a long time,
> and have no really elegant solution. So the solution proposed by the OP
> below would be ideal for me too.
> The proposed pragma could also define a number of pages to be allocated
> at once instead of a number of bytes.
>
> In my case, the database grows continuously, and the file is often
> extremely fragmented by the time the growth phase is finished (this
> concerns fragmentation of the file on the disk, not internal fragmentation).
>
> Currently, I monitor the size of the database using pragma
> freelist_count. When I see the number of free pages approach zero, I
> create a dummy table with a blob field and fill it with a very large
> empty blob. Then I drop the table. The freed pages remain behind, and
> page_count does not rise any more for a time.
> This was proposed to me on this list a while ago.
>
> However, testing the database in this way and creating and dropping the
> table carries a performance penalty, and finding the strategic places in
> my application to do this has been difficult.

Yeah, sounds like the same problem. Interesting workaround.

Here is my "fix". https://bugzilla.mozilla.org/show_bug.cgi?id=581606

This dramatically reduces fragmentation for append-only workloads.

Taras


Re: [sqlite] Coping with database growth/fragmentation

2010-07-23 Thread Simon Slavin

On 23 Jul 2010, at 2:11am, Taras Glek wrote:

> Recently I spent some time investigating sqlite IO patterns in Mozilla. 
> Two issues came up: keeping sqlite files from getting fragmented and 
> fixing fragmented sqlite files.

If I understand correctly, there are two levels of fragmentation involved: disk 
fragmentation (where the sectors of the disk file are spread about) and 
database fragmentation (where the pages of database information are spread 
about the disk file).

> First on fixing fragmentation:
> Currently we write pretty heavily to our databases. This causes the 
> databases to grow, queries to slow down.

Can I check that you are not just seeing fragmentation, but are actually seeing 
performance vary with fragmentation?  Noticeable slowdown from fragmentation is 
something traditionally associated only with Windows, and your blog entry says 
you're using ext4 on Linux.  Other platforms and file systems /have/ 
fragmentation, of course, but it doesn't normally slow them down as much as it 
slows down Windows.

Some platforms handle this in unexpected ways.  For instance, OS X will 
automatically defragment files smaller than 20MB each time they're opened.  It 
won't defragment the database pages because, of course, it doesn't understand 
SQLite format.

The easiest way to make a defragmented copy of a SQLite file is to use the 
command-line tool to .dump the database to a text file, then to .read that 
text file into a new database.  Under Unix you can do it in one command:

sqlite3 old_database.sqlite .dump | sqlite3 new_database.sqlite

The resulting SQLite database file will not only be defragmented on disk but 
will also have its tables and indexes stored in contiguous pages.  I would be 
interested to know whether you really do see performance improvements by doing 
this and then replacing old_database.sqlite with new_database.sqlite .
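For completeness, here is a Python equivalent of that pipeline using the
sqlite3 module's iterdump() (my wrapper; the shell one-liner above is the
canonical form):

```python
import sqlite3

def rebuild_database(src_path, dst_path):
    """Copy src into a freshly created dst by replaying a full SQL dump,
    the same idea as: sqlite3 old.sqlite .dump | sqlite3 new.sqlite

    The new file is written front to back, so both disk fragmentation and
    database-page fragmentation are gone in the copy."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    # iterdump() yields the complete SQL text of the database, schema and data
    dst.executescript("\n".join(src.iterdump()))
    src.close()
    dst.close()
```

Afterwards you would swap dst into place while no connection holds the old
file open.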

Simon.


Re: [sqlite] Coping with database growth/fragmentation

2010-07-23 Thread Martin Engelschalk
  Hello Taras, List,

I have been fighting the same problems described here for a long time, 
and have no really elegant solution. So the solution proposed by the OP 
below would be ideal for me too.
The proposed pragma could also define a number of pages to be allocated 
at once instead of a number of bytes.

In my case, the database grows continuously, and the file is often 
extremely fragmented by the time the growth phase is finished (this 
concerns fragmentation of the file on the disk, not internal fragmentation).

Currently, I monitor the size of the database using pragma 
freelist_count. When I see the number of free pages approach zero, I 
create a dummy table with a blob field and fill it with a very large 
empty blob. Then I drop the table. The freed pages remain behind, and 
page_count does not rise any more for a time.
This was proposed to me on this list a while ago.
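That trick reads roughly like this in Python (a sketch; the function name,
threshold, and blob size are mine, and it assumes the default auto_vacuum=0
so the freed pages actually stay on the freelist):

```python
import sqlite3

def top_up_freelist(conn, min_free_pages=100, blob_bytes=4 * 1024 * 1024):
    """If the freelist is nearly exhausted, create and drop a table holding
    one large zero-filled blob; the pages it occupied stay behind as free
    pages, so page_count stops growing for a while. Returns the new
    freelist_count."""
    free = conn.execute("PRAGMA freelist_count").fetchone()[0]
    if free >= min_free_pages:
        return free
    conn.execute("CREATE TABLE _filler (b BLOB)")
    # zeroblob() allocates the blob without materializing it in memory
    conn.execute("INSERT INTO _filler VALUES (zeroblob(?))", (blob_bytes,))
    conn.execute("DROP TABLE _filler")
    conn.commit()
    return conn.execute("PRAGMA freelist_count").fetchone()[0]
```

As noted above, the hard part is not the helper itself but deciding where in
the application to call it.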

However, testing the database in this way and creating and dropping the 
table carries a performance penalty, and finding the strategic places in 
my application to do this has been difficult.

Martin


On 23.07.2010 03:11, Taras Glek wrote:


>   Seems like the
> easiest fix here is to add a pragma, fs_allocation_size. It would
> preallocate a contiguous chunk of disk space. Sqlite would behave exactly
> as it does now, except it would only ever truncate the file to a
> multiple of fs_allocation_size.
> For example, pragma fs_allocation_size=50M would grow the db file to
> 50 megabytes. Once the db grows beyond 50 MB, the underlying file would
> get resized to 100 MB.
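The size arithmetic in that example is just ceiling division to the next
multiple of the allocation size; here is a sketch of the proposed semantics
(not an existing pragma, and the helper name is mine):

```python
def rounded_allocation(db_size, fs_allocation_size=50 * 1024 * 1024):
    """File size the proposed pragma would keep on disk: the smallest
    positive multiple of fs_allocation_size that holds the database."""
    chunks = max(1, -(-db_size // fs_allocation_size))  # ceiling division
    return chunks * fs_allocation_size
```

So a 10 MB database would occupy a 50 MB file, and growing past 50 MB would
bump the file to 100 MB, exactly as in the quoted example.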
