Re: [sqlite] Coping with database growth/fragmentation
On Fri, Jul 23, 2010 at 5:11 AM, Taras Glek wrote:
> Hello,
> Recently I spent some time investigating sqlite IO patterns in Mozilla.
> Two issues came up: keeping sqlite files from getting fragmented and
> fixing fragmented sqlite files.

Funny, that's why I like reading other people's questions on this list. It sometimes helps solve old tasks :).

As many have noticed, the Windows system cache is hard to control. For example, it is almost impossible to clear the read cache for testing purposes: once you read a file, it's in the cache, so the timing numbers that follow are irrelevant. There's an option for CreateFile, FILE_FLAG_NO_BUFFERING, that can be used to disable the cache for a file when you want to work with it. I thought about changing the sources and preparing a special version of sqlite that allows opening a database without the cache, but a better solution, at least on XP, came up: if I "touch" a file with CreateFile(..FILE_FLAG_NO_BUFFERING) and close it, the cache won't be used for the next opening of that file. So for testing purposes I just made a checkbox in an admin tool that "touches" the file before passing it to sqlite, and it really seems to work. So, Taras, thanks for your post :)

With this approach I also tried to test places.sqlite, particularly the moz_places table. The query was

  SELECT * FROM moz_places WHERE url LIKE "%double%"

I suppose the mozilla team probably uses different queries, but since there is no recognizable FTS table, there should be some kind of full scan. My tests on two hard drives showed that Windows file fragmentation had a small effect on the performance of the query, while VACUUM had a significant one. Before VACUUM, my long-lived places.sqlite, 13M in size, with 16893 records in moz_places, returned results after 8-10 seconds, depending on where the file lived; after VACUUM, the results came back in 150ms to 300ms.
I think this can be due to the nature of the Windows read-ahead cache: when sqlite tables are placed in consecutive blocks of the file, Windows loads the same pages sqlite expects. So before any file system defragmentation, internal sqlite defragmentation (VACUUM) has to be applied.

Max Vlasov, maxerist.net

___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
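The effect described above can be reproduced in miniature with SQLite's own pragmas. A minimal sketch (not part of the original thread; it uses a throwaway table rather than the places.sqlite schema, and shows the freelist/file-size effect of VACUUM rather than the query timings):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, data TEXT)")
con.executemany("INSERT INTO t (data) VALUES (?)",
                [("x" * 500,) for _ in range(5000)])
con.commit()

# Delete most rows: the freed pages go to the freelist,
# but the file keeps its size and layout.
con.execute("DELETE FROM t WHERE id % 10 != 0")
con.commit()
free_before = con.execute("PRAGMA freelist_count").fetchone()[0]
size_before = os.path.getsize(path)

# VACUUM rewrites the database contiguously and drops the free pages.
con.execute("VACUUM")
free_after = con.execute("PRAGMA freelist_count").fetchone()[0]
size_after = os.path.getsize(path)
# free_before > 0, free_after == 0, size_after < size_before
```

After VACUUM the table's pages sit in consecutive blocks of the file, which is what lets the OS read-ahead cache do its job.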
Re: [sqlite] Coping with database growth/fragmentation
On 07/23/2010 04:38 AM, Martin Engelschalk wrote:
> Hello Taras, List,
>
> I have been fighting the same problems described here for a long time,
> and have no real elegant solution. So, the proposed solution of the OP
> below would be ideal for me too.
> The proposed pragma could also define a number of pages to be allocated
> at once instead of a number of bytes.
>
> In my case, the database grows continuously and the file is often
> extremely fragmented when the growth phase is finished (this concerns
> the file on the disk, not internal fragmentation).
>
> Currently, I monitor the size of the database using pragma
> freelist_count. When I see the value of free pages approach zero, I
> create a dummy table with a blob field and fill it with a very large
> empty blob. Then I drop the table. The empty pages remain behind and
> page_count does not rise any more for a time.
> This was proposed to me on this list a while ago.
>
> However, testing the database in this way and creating and dropping the
> table carries a performance penalty, and finding the strategic places in
> my application to do this has been difficult.

Yeah, sounds like the same problem. Interesting workaround. Here is my "fix": https://bugzilla.mozilla.org/show_bug.cgi?id=581606

This dramatically reduces fragmentation for append-only workloads.

Taras
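The idea behind that fix, growing the file in large chunks so the filesystem can hand out a few big extents instead of many small ones, can be sketched in plain Python. This is an illustration of the concept on an ordinary file only, not the actual Mozilla patch (see the bug above), and the 50 MB default merely echoes the proposed pragma fs_allocation_size=50M:

```python
import os

def preallocate(path, needed_size, chunk=50 * 1024 * 1024):
    """Round the file up to the next multiple of `chunk` so it grows in
    a few large allocations rather than page-at-a-time extensions."""
    target = ((needed_size + chunk - 1) // chunk) * chunk
    if os.path.getsize(path) < target:
        with open(path, "r+b") as f:
            f.truncate(target)  # extend with zeros in one operation
    return target
```

Note this demonstrates the filesystem-level idea on a plain file; extending a live SQLite database behind SQLite's back is exactly what the proposed pragma would have to coordinate.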
Re: [sqlite] Coping with database growth/fragmentation
On 23 Jul 2010, at 2:11am, Taras Glek wrote:
> Recently I spent some time investigating sqlite IO patterns in Mozilla.
> Two issues came up: keeping sqlite files from getting fragmented and
> fixing fragmented sqlite files.

If I understand correctly, there are two levels of fragmentation involved: disk fragmentation (where the sectors of the disk file are spread about) and database fragmentation (where the pages of database information are spread about the disk file).

> First on fixing fragmentation:
> Currently we write pretty heavily to our databases. This causes the
> databases to grow, queries to slow down.

Can I check that you are not just seeing fragmentation, but are actually seeing performance vary with fragmentation? Because having that happen to a noticeable extent is something traditionally associated only with Windows, and your blog entry says you're using ext4 on Linux. Other platforms and file systems /have/ fragmentation, of course, but it doesn't normally slow them down as much as fragmentation slows down Windows. Some platforms handle this in unexpected ways. For instance, OS X will automatically defragment files smaller than 20MB each time they're opened. It won't defragment the database pages because, of course, it doesn't understand the SQLite format.

The easiest way to make a defragmented copy of a SQLite file would be to use the command-line tool to .dump a copy of a database to a text file, then again to .read that text file into a database. Under Unix you can do it in one command:

  sqlite3 old_database.sqlite .dump | sqlite3 new_database.sqlite

The resulting SQLite database file will not only be defragmented but will have some other optimal characteristics. I would be interested to know if you really do see performance improvements by doing this and then replacing old_database.sqlite with new_database.sqlite.

Simon.
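The same dump-and-reload round trip can be done from code. A sketch (not from the original thread) using the Python sqlite3 module, whose iterdump() is the programmatic analogue of the shell pipeline above:

```python
import sqlite3

def defragmented_copy(src_path, dst_path):
    """Rebuild a database by dumping its SQL and replaying it into a
    fresh file -- the programmatic analogue of:
        sqlite3 old.sqlite .dump | sqlite3 new.sqlite
    dst_path should not already contain the same tables."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    # iterdump() yields the same SQL text the shell .dump command emits,
    # including the surrounding BEGIN TRANSACTION / COMMIT.
    dst.executescript("\n".join(src.iterdump()))
    dst.close()
    src.close()
```

Because every table is rewritten in full, the new file has its pages laid out contiguously, like the output of VACUUM.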
Re: [sqlite] Coping with database growth/fragmentation
Hello Taras, List,

I have been fighting the same problems described here for a long time, and have no real elegant solution. So, the proposed solution of the OP below would be ideal for me too. The proposed pragma could also define a number of pages to be allocated at once instead of a number of bytes.

In my case, the database grows continuously and the file is often extremely fragmented when the growth phase is finished (this concerns the file on the disk, not internal fragmentation).

Currently, I monitor the size of the database using pragma freelist_count. When I see the value of free pages approach zero, I create a dummy table with a blob field and fill it with a very large empty blob. Then I drop the table. The empty pages remain behind and page_count does not rise any more for a time. This was proposed to me on this list a while ago.

However, testing the database in this way and creating and dropping the table carries a performance penalty, and finding the strategic places in my application to do this has been difficult.

Martin

On 23.07.2010 03:11, Taras Glek wrote:
> Seems like the easiest fix here is to add a pragma fs_allocation_size.
> It would preallocate a continuous chunk of disk space. Sqlite would
> behave exactly as it does now, except it would avoid truncating the
> file below a multiple of the fs_allocation_size.
> For example, pragma fs_allocation_size=50M would grow the db file to
> 50 megabytes. Once the db grows beyond 50mb, the underlying file would
> get resized to 100mb.
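Martin's workaround can be sketched with the Python sqlite3 module. This is an illustration only (the `_filler` table name, blob size, and threshold are made up for the example, not taken from his application), and it assumes the default auto_vacuum=NONE, since otherwise the dropped pages would not stay on the freelist:

```python
import sqlite3

def top_up_freelist(con, min_free_pages=100, blob_bytes=1_000_000):
    """When the freelist runs low, grow the file once with a throwaway
    blob so that future inserts reuse free pages instead of extending
    the file a page at a time."""
    free = con.execute("PRAGMA freelist_count").fetchone()[0]
    if free < min_free_pages:
        con.execute("CREATE TABLE IF NOT EXISTS _filler (b BLOB)")
        # zeroblob() allocates pages without materializing the bytes in RAM
        con.execute("INSERT INTO _filler VALUES (zeroblob(?))", (blob_bytes,))
        con.execute("DROP TABLE _filler")  # its pages go to the freelist
        con.commit()
    return con.execute("PRAGMA freelist_count").fetchone()[0]
```

As Martin notes, the hard part is not the mechanism but deciding where in the application to call it, since the grow-and-drop cycle itself costs I/O.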