[sqlite] WAL file growth concern
On Sat Oct 23 20:57:56 GMT 2010, H. Phil Duby wrote:

> Given the described conditions, I do not think you need to worry about
> the WAL file growing without bound. I think that each increase should be
> smaller than the previous one, and will stop growing altogether [for your
> stress test] when the WAL file has increased to the point that all of the
> [WAL] writes [during your forced lock conditions time frame] fit in the
> expanded file.

After thinking about that and also after some testing, I concur. I ran my read/write stress test again, inserting 2 million rows over 9 hours while doing my reads. The WAL file occasionally kept growing even in the 8th hour, but the growth really slowed. It grew up through 12MB during the first 2 hours or so, then only grew another 700K during the next 2.5 hours, and only another 400K over the next 4.5 hours, to a total of 13.1MB through the 9 hour test.

Every once in a while a read is going to take extra long compared to the norm, or the writes are going to go faster than the norm (this is a somewhat controlled test, but on a general-purpose machine - other processes do exist that get time slices...), and the file grows a little bit every once in a while after it initially settles down. I do believe on a longer run it would pretty much stop growing altogether. Very good.

The reason I looked into this in the first place and put together my more controlled test is that my colleague was doing some testing with my full system and reported that his WAL file had grown to 306MB and his system had become sluggish. After my initial results, I thought the culprit causing the huge WAL file was the simultaneous reads and writes. However, it turns out this was not the culprit. We also periodically delete the oldest X rows from the database in a purge. He had deleted 750K rows in one shot. That is definitely what grew the WAL file so large. I have not yet determined why his system got sluggish afterwards.
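If a single huge delete is what balloons the WAL, one workaround (a sketch, not from this thread - the table and column names are made up) is to purge in batches and attempt a checkpoint between batches, so each batch can rewind and reuse the WAL instead of extending it:

```python
import os
import sqlite3
import tempfile

def purge_oldest(conn, table, rows_to_delete, batch_size=5_000):
    """Delete the oldest rows in batches, checkpointing between batches so
    each batch can rewind and reuse the WAL instead of growing it."""
    deleted = 0
    while deleted < rows_to_delete:
        n = min(batch_size, rows_to_delete - deleted)
        cur = conn.execute(
            f"DELETE FROM {table} WHERE rowid IN "
            f"(SELECT rowid FROM {table} ORDER BY rowid LIMIT ?)", (n,))
        if cur.rowcount == 0:
            break  # table exhausted early
        deleted += cur.rowcount
        # With no active readers this transfers the batch's frames into the
        # database, so the next batch starts over at the front of the WAL.
        conn.execute("PRAGMA wal_checkpoint")
    return deleted

# Demo against a throwaway database.
path = os.path.join(tempfile.mkdtemp(), "events.db")
conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode
conn.execute("PRAGMA journal_mode=WAL")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)")
conn.execute("BEGIN")
conn.executemany("INSERT INTO events (payload) VALUES (?)",
                 (("x" * 100,) for _ in range(50_000)))
conn.execute("COMMIT")

removed = purge_oldest(conn, "events", 30_000)
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(removed, remaining)
conn.close()
```

The `rowid IN (... ORDER BY rowid LIMIT ?)` form works on stock builds, without the SQLITE_ENABLE_UPDATE_DELETE_LIMIT compile flag mentioned later in this thread; the WAL high-water mark then tracks the batch size rather than the full 750K-row delete.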
It may not have anything to do with sqlite at all, so I won't dwell on it here. I will ask one question, however. I know that reads are expected to slow down as the WAL file gets larger. Is that just in terms of real data/pages in the WAL file, or does the actual size of the file have an impact? For example, I have a 300MB WAL file due to a previous operation. However, everything has been checkpointed, and we are back to really only having ~1MB of actual data being written into the WAL file and then checkpointed. The rest (the vast majority) of the file is not being used. Would we expect performance degradation here as compared to a WAL file with a real physical size of ~1MB at this point?

Considering things like bulk deletes (and updates) potentially growing a WAL file to be quite large, and having a system that is constantly running and inserting data into the database over a long period of time with reads also coming in, I wonder about the value of adding an optional feature to sqlite. What if there was an API to specify a max desired limit on the size of the WAL file? Whenever a checkpoint was 100% successful - it was determined that the entire WAL had been transferred into the database and synced, and no readers were making use of the WAL - then in addition to the writer rewinding the WAL back to the beginning, the WAL file would be truncated IF this option was configured and the size of the WAL file was greater than the specified value. This seems like it would be simple to implement without costing anything by default to those who don't configure it...

If I were to use such a feature with the default 1000 page checkpoint (which seems to correspond to a little over a 1MB WAL file size), I would make the limit something like 50MB. Under normal conditions, the limit would never be reached anyway. But in the case where a large WAL file did get created at some point, this would be used to get it truncated. Thoughts?
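For what it's worth, SQLite's journal_size_limit pragma behaves much like the option described above: once a checkpoint fully completes and the writer rewinds the WAL, the file is truncated down to the configured limit. A rough sketch of the behavior (the 1MB limit and the table are just for illustration, and the final wal_checkpoint(TRUNCATE) mode is from later SQLite versions than this thread's 3.7.3):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "limit.db")
wal = path + "-wal"

conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode
conn.execute("PRAGMA journal_mode=WAL")
# Keep at most ~1MB of WAL around once a checkpoint fully completes
# and the writer rewinds to the start of the file.
conn.execute("PRAGMA journal_size_limit=1048576")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")

# Inflate the WAL with one large transaction.
conn.execute("BEGIN")
conn.executemany("INSERT INTO t (payload) VALUES (?)",
                 (("y" * 500,) for _ in range(20_000)))
conn.execute("COMMIT")
grown = os.path.getsize(wal)

# Checkpoint everything (no readers are active), then do one more small
# write: rewinding the WAL is the point at which the size limit applies.
conn.execute("PRAGMA wal_checkpoint")
conn.execute("INSERT INTO t (payload) VALUES ('z')")
shrunk = os.path.getsize(wal)

# wal_checkpoint(TRUNCATE) goes further and truncates the WAL to zero
# bytes once every frame has been transferred into the database.
conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
final = os.path.getsize(wal)
print(grown, shrunk, final)
conn.close()
```

Like the proposed feature, the limit costs nothing in the common case: it only kicks in when a burst has pushed the WAL past the configured size.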
Best Regards,

Bob

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] WAL file growth concern
On Fri, Oct 22, 2010 at 09:56:22PM -0400, Richard Hipp wrote:

> On many (most?) filesystems, it is faster to overwrite an existing area
> of a file than it is to extend the file by writing past the end. That's
> why SQLite doesn't truncate the WAL file on each checkpoint - so that
> subsequent writes will be overwriting an existing file region and thus
> go faster.

I think that assumption is getting harder to make. For one thing, filesystems nowadays aggregate transactions into large writes, which means that old blocks aren't overwritten, but replaced -- COW goes hand-in-hand with such aggregation.

For ZFS your assumption is wrong because of ZFS' variable block size support[*]. I don't know if there are filesystems other than ZFS where file data block size varies. But for filesystems that aggregate writes I think you'd find that overwriting performs about as well as appending (assuming there's no O_APPEND synchronization going on).

Does the WAL store modified pages?

Nico

[*] In ZFS, files have a single data block until the file size exceeds the host filesystem's "recordsize", from which point the file will consist of two or more data blocks, all of that size. Block sizes are all the powers of two from 2^9 to 2^17 (512 bytes to 128KB). Thus overwriting a 1KB SQLite3 page in the middle of a file with a 128KB recordsize will result in a read-modify-write of the modified block. Likely the application will have already caused that block to be in memory, in which case there's no RMW, but the disparity between the application "page size" and the ZFS recordsize obviously has a significant cost. SQLite3 users should set the SQLite3 page size and the host ZFS dataset recordsize so they match.
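The SQLite half of that last recommendation is just a page_size pragma issued before the database file is initialized - a sketch, assuming the host ZFS dataset's recordsize has already been set to match (e.g. with something like `zfs set recordsize=8K pool/data`):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "tuned.db")
conn = sqlite3.connect(path, isolation_level=None)

# page_size only takes effect if set before the database file is
# initialized (or via a later VACUUM), so issue it first.
conn.execute("PRAGMA page_size=8192")  # match, e.g., an 8K recordsize
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE t (x)")

page_size = conn.execute("PRAGMA page_size").fetchone()[0]
print(page_size)
conn.close()
```

On an existing database the pragma is silently ignored until a VACUUM rebuilds the file, which is an easy way to think you have matched the sizes when you have not.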
Re: [sqlite] WAL file growth concern
On Sat, Oct 23, 2010 at 9:26 AM, Bob Smith wrote:

> On Sat, Oct 23, 2010 at 5:56 AM, Richard Hipp wrote:
>
>> Is that what you are seeing?
>>
>> Which filesystem are you using?
>
> Thanks Richard, this does seem to be what I am seeing. I am definitely
> seeing more writes during the subsequent cycles, and an explanation that
> this is likely due to the writes being able to go faster to the existing
> parts of the WAL definitely makes sense. I think that is it.
>
> I have run and seen this scenario on two machines. One has a linux ext4
> filesystem and the other has a linux ext3 filesystem.
>
> The WAL mode is a great addition to sqlite. Being able to do simultaneous
> reads while writing is fantastic. I have also seen nice improvements with
> writing speed. WAL mode has lots of value and the sqlite developers have
> done a fantastic job.
>
> In my scenario of having to support a constant stream of steady writes
> (granted, I would hope not with the frequency of my stress test) along
> with an unknown frequency of reads that will come in, I do have concerns
> about the WAL file consistently growing over a long period of time (days
> and weeks). I would love to be able to continue using the WAL mode, so I
> need to see what I can do to limit the growth or to periodically make
> sure all connections to the database are closed at the same time to allow
> the WAL file to periodically truncate. Any other ideas?
>
> Best Regards,
>
> Bob

Given the described conditions, I do not think you need to worry about the WAL file growing without bound. I think that each increase should be smaller than the previous one, and will stop growing altogether [for your stress test] when the WAL file has increased to the point that all of the [WAL] writes [during your forced lock conditions time frame] fit in the expanded file. How big, and how long it takes to get there, will depend on the relative speeds of [over]writing the existing file versus expanding and writing.
Of course there are always edge cases. If a read / lock lasts long enough for *another* read to get a lock, then a progressive series of overlapping reads can keep a lock indefinitely, and the WAL file will be forced to continue to expand. The only way to *prevent* that expansion is to stop writing. Other solutions get outside of SQLite, like having application logic that starts rejecting new read requests at some point, until all existing read connections have closed and SQLite has finished the [long delayed] checkpoint.

-- Phil
[sqlite] WAL file growth concern
On Sat, Oct 23, 2010 at 5:56 AM, Richard Hipp wrote:

> Is that what you are seeing?
>
> Which filesystem are you using?

Thanks Richard, this does seem to be what I am seeing. I am definitely seeing more writes during the subsequent cycles, and an explanation that this is likely due to the writes being able to go faster to the existing parts of the WAL definitely makes sense. I think that is it.

I have run and seen this scenario on two machines. One has a linux ext4 filesystem and the other has a linux ext3 filesystem.

The WAL mode is a great addition to sqlite. Being able to do simultaneous reads while writing is fantastic. I have also seen nice improvements with writing speed. WAL mode has lots of value and the sqlite developers have done a fantastic job.

In my scenario of having to support a constant stream of steady writes (granted, I would hope not with the frequency of my stress test) along with an unknown frequency of reads that will come in, I do have concerns about the WAL file consistently growing over a long period of time (days and weeks). I would love to be able to continue using the WAL mode, so I need to see what I can do to limit the growth or to periodically make sure all connections to the database are closed at the same time to allow the WAL file to periodically truncate. Any other ideas?

Best Regards,

Bob
Re: [sqlite] WAL file growth concern
On Sat, Oct 23, 2010 at 5:56 AM, Richard Hipp wrote:

> Is that what you are seeing?
>
> Which filesystem are you using?

I think the misfeature is even easier to reproduce than with the scenario Bob wrote. If your admin tool has a "repeated query" feature, you don't even need to write a single line of code.

sqlite 3.7.3 dll, windows 7 64bit

CREATE TABLE [TestTable] ([Id] INTEGER PRIMARY KEY AUTOINCREMENT)

1. One client runs a repeated query in a non-transacted loop, so it takes several minutes for 10,000 repeats:

INSERT INTO TestTable DEFAULT VALUES

Since sqlite itself does checkpointing from time to time, and the threshold is about 1000 pages, we see the -wal file grow to about 1,024K and stay at this size.

2. Another client minimizes its cache (if you want to repeat the test afterwards):

PRAGMA cache_size=10;

and does a cross join after some inserts in the first client:

SELECT Count(Null) FROM TestTable, TestTable

If this query is long enough (5 seconds and more), we will see the -wal file growing steadily until client 2 finishes. In my tests I can confirm it growing to 8MB, 23MB.

Max Vlasov
maxerist.net
Re: [sqlite] WAL file growth concern
On Fri, Oct 22, 2010 at 2:54 PM, Bob Smith wrote:

> What I have determined (via using sqlite3_wal_hook, catching some
> statistics in the callback as well as calling sqlite3_wal_checkpoint from
> the callback myself) is that each time I am writing during a read
> operation (so a checkpoint can't complete), I am able to write more
> records than the last time on average during this period. So, more
> records written during the time a checkpoint can't happen is going to
> relate to the WAL file needing to grow.
>
> I am not sure why more writes are progressively able to occur each time
> these reads are happening, but this is the cause of my continued WAL file
> growth in this scenario.

On many (most?) filesystems, it is faster to overwrite an existing area of a file than it is to extend the file by writing past the end. That's why SQLite doesn't truncate the WAL file on each checkpoint - so that subsequent writes will be overwriting an existing file region and thus go faster.

So apparently, as long as you are overwriting existing parts of the WAL, the writes go faster and you quickly get to the end of the WAL, where writes start to slow down. But the WAL got a little bigger, so more writes go fast on the next cycle, and the WAL grows a little more. And so forth.

Is that what you are seeing? Which filesystem are you using?

-- 
D. Richard Hipp
d...@sqlite.org
Re: [sqlite] WAL file growth concern
From: Bob Smith
Date: 2010-10-21 22:55:05 GMT

> I might possibly have found an issue/concern with the way sqlite handles
> doing new writes to the WAL file during a time that checkpoints are
> unable to checkpoint data from the WAL file back into the database due to
> a reader being in the process of reading data from the WAL file. The
> result is the WAL file growing unbounded.

I am having the same issue with SQLite (all versions since July) under Windows (XP/Vista/7). My server process opens the SQLite database on startup, one thread sometimes UPDATEs some records, other threads sometimes SELECT (using one shared handle), and there are no INSERTs at all. The WAL grows up to 400MB! (expecting <4MB = 4KB page × 1000). sqlite3_wal_checkpoint() or "pragma wal_checkpoint" helps only if I explicitly close the DB connection after this operation.

-- Andrey Cherezov
[sqlite] WAL file growth concern
I have been able to do some more investigation here. What I was seeing originally made me suspicious that a writer was skipping over parts of the WAL file during the time that a checkpoint is unable to proceed due to a reader. After further investigation, however, I have determined that this is NOT the case in my scenario. What I am seeing may be of interest, however, so I will explain what I have determined is going on.

This is a stress test scenario. I am inserting rows as quickly as my C code allows, writing an average of around 80 rows a second. This writing is going on constantly. A checkpoint occurs / wants to occur every few seconds on average with this amount of activity and the size of the rows. Then, every 30 seconds from another process, I do a read query of the last 10,000 rows, and I have arranged it such that this query takes roughly 15 seconds (I put some delay in the processing to mimic a longer operation, such as from ruby code on a web server). This read process opens a db connection, does the query, closes the connection, and exits the process.

The WAL file grows just about every time the read process kicks in. It does not grow at all in between invocations of the read process. What I have determined (via using sqlite3_wal_hook, catching some statistics in the callback, as well as calling sqlite3_wal_checkpoint from the callback myself) is that each time I am writing during a read operation (so a checkpoint can't complete), I am able to write more records on average than the last time during this period. So, more records written during the time a checkpoint can't happen is going to mean the WAL file needs to grow. I am periodically doing the same type of read query, and each time I do one, it takes about the same amount of time as measured in my calling process.
I really don't see an increase in the time it takes to do the reads (for example, in my last run reads took 18sec, 16sec, 16sec, 16sec, 16sec, 15sec, 16sec, 17sec, 16sec, 16sec, 14sec). However, I am measuring progressively more writes completing from my writer during these times. The current read query reads the last 10,000 rows based on the internal rowid primary key. Since I am getting more writes each time during this period, that is why the WAL file grows every time during this period. I am not sure why progressively more writes are able to occur each time these reads are happening, but this is the cause of my continued WAL file growth in this scenario.

In a production system, I would not expect to be constantly pounding so many events into the system non stop all the time. Still, I would like to be able to handle the stress case so that I know things are solid. I can also adjust the threshold to checkpoint in larger chunks than the default. This won't actually solve the problem (and will make the standard size of the WAL file larger), but I do think it can reduce the impact of the phenomenon in my scenario.

Is this type of scenario just not right for WAL mode?

Best Regards,

Bob
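For collecting similar statistics without wiring up sqlite3_wal_hook in C, the wal_checkpoint pragma returns a (busy, log, checkpointed) row: busy is nonzero when a reader kept the checkpoint from finishing, log is the number of frames in the WAL, and checkpointed is how many were transferred into the database. A sketch (the table and row count here are hypothetical):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mon.db")
conn = sqlite3.connect(path, isolation_level=None)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, msg TEXT)")

def checkpoint_stats(c):
    """Run a passive checkpoint; busy=1 would mean a reader kept it from
    finishing, which is exactly when the WAL has to keep growing."""
    busy, log_frames, ckpt_frames = c.execute("PRAGMA wal_checkpoint").fetchone()
    return busy, log_frames, ckpt_frames

for i in range(2_000):
    conn.execute("INSERT INTO log (msg) VALUES (?)", (f"event {i}",))

busy, log_frames, ckpt_frames = checkpoint_stats(conn)
print(busy, log_frames, ckpt_frames)
conn.close()
```

With no concurrent readers, busy comes back 0 and the two frame counts match; a persistent gap between log and checkpointed across calls is the signature of the blocked-checkpoint growth discussed in this thread.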
[sqlite] WAL file growth concern
> Are you *sure* you aren't accidentally holding a read transaction open
> somewhere?
>
> Do you have any other clues on how we can isolate the problem? A test
> case that we can run here, perhaps?

I am pretty certain that I am not accidentally holding a read transaction open. I have my writing process that stays running. I start up independent read processes which open a connection, do a short sequence of reads, close the connection, and then exit the process altogether. For this test, I wait a bit in between kicking off read processes to ensure the writer will kick off a checkpoint (with my writing frequency, the checkpoints kick off pretty quickly). I will triple check this and then look into sending you a stand-alone test case that you can run. I will target this for tomorrow. Where should I send it?

I am running on Ubuntu 9.10, Linux 2.6.31-22-generic, i686.

In order to make use of ORDER BY and LIMIT support for DELETE, which I use for efficient purging of my oldest X records (and which is not even coming into play in the simple read/write test discussed here), I had to build sqlite 3.7.3 from the full source using -DSQLITE_ENABLE_UPDATE_DELETE_LIMIT=1.

Best Regards,

Bob
Re: [sqlite] WAL file growth concern
On Thu, Oct 21, 2010 at 6:55 PM, Bob Smith wrote:

> I have been using WAL mode for a few months and have been quite happy
> with the write performance increases.
>
> I might possibly have found an issue/concern with the way sqlite handles
> doing new writes to the WAL file during a time that checkpoints are
> unable to checkpoint data from the WAL file back into the database due to
> a reader being in the process of reading data from the WAL file. The
> result is the WAL file growing unbounded.

I followed the procedure you outlined, with the exception that I did 20K inserts rather than 1K inserts (so that fewer transactions were required). I ran this for 100,000 transactions. The resulting database file reached 1.536GB, but the WAL never went above 1.542MB. This is on SuSE 10.1, Linux 2.6.16.13-4.smp on i686 athlon.

It is also hard to imagine how the WAL could skip over a block of frames in the middle and still continue to work. If the writer really were skipping over parts of the WAL file, the next checkpoint would go haywire when it reached the part of the WAL that was skipped, and the later transactions would never make it into the database.

Are you *sure* you aren't accidentally holding a read transaction open somewhere? Do you have any other clues on how we can isolate the problem? A test case that we can run here, perhaps?

-- 
D. Richard Hipp
d...@sqlite.org
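That writer-only result is easy to reproduce in miniature - a sketch with the transaction and row counts scaled far down, and a deliberately loose bound in the comment. With no concurrent readers, the default autocheckpoint keeps resetting the WAL, so its high-water mark stays near the checkpoint threshold regardless of how much total data is written:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bounded.db")
wal = path + "-wal"

conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")

# Many multi-row transactions with no concurrent readers: the default
# 1000-page autocheckpoint keeps rewinding the WAL, so its high-water
# mark stays near the checkpoint threshold (well under, say, 16MB here)
# no matter how much data flows through.
high_water = 0
for _ in range(200):
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO t (payload) VALUES (?)",
                     (("z" * 100,) for _ in range(200)))
    conn.execute("COMMIT")
    high_water = max(high_water, os.path.getsize(wal))

print(high_water)
conn.close()
```

Adding a long-lived reader connection to this loop, as in Bob's stress test, is what changes the picture: the WAL can no longer rewind while the reader's snapshot is open.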
[sqlite] WAL file growth concern
I have been using WAL mode for a few months and have been quite happy with the write performance increases.

I might possibly have found an issue/concern with the way sqlite handles doing new writes to the WAL file during a time that checkpoints are unable to checkpoint data from the WAL file back into the database due to a reader being in the process of reading data from the WAL file. The result is the WAL file growing unbounded.

I have an application running on Linux which is an event logger. I am storing the events in a table in a sqlite database. I had been using version 3.7.2 until this week, when I moved to 3.7.3. This application stays up and running for long periods of time. The database is opened at startup and not closed until the application is shut down. This application is the only writer to the database. The transaction sizes when writing to the database are quite small: a single transaction writes only a single row, and the size of a given row is pretty small (less than 1 KB). Other processes/applications read from the database while writes are ongoing. In the testing I discuss below, I do reads by starting a process that opens a database connection, reads a set amount of data, gracefully closes the database connection (making sure all statements are finalized), and then exits.

I am currently using the default checkpointing, and it works well when I am only writing. My WAL file never grows above 1MB when I am only writing. I start up the app and start writing to the DB. The WAL file grows up to around 1MB. During one of the subsequent writes, a checkpoint is done and all data is checkpointed / written to the main DB. The next write will start writing to the beginning of the WAL file again. (Well, I believe this is what is happening, based on my understanding of http://www.sqlite.org/wal.html and based on what I am seeing while testing.) This process occurs over and over again during the course of many writes.
The WAL file remains at right around 1MB. Then a read operation comes in. If it is still going on at the time a checkpoint needs to occur, the checkpoint won't be able to do its job. We expect the WAL file to grow larger here, based on writes that are done while the read is still in progress. Let's say it grows to 1.5MB while the read transaction is ongoing and the writing process is concurrently writing. The read operation now completes. A subsequent write will kick off a checkpoint, and all data will make it to the main db. Subsequent writes now start writing to the WAL file at the beginning of the file. Writes can go on and on, kicking off checkpoints over time. The WAL file will stay at 1.5MB over many more writes and checkpoints. All of the above matches the understanding I got from reading http://www.sqlite.org/wal.html and matches what I have seen in testing. So far so good.

Now consider another read operation that comes in while we are still doing writes. The read is still going on when a checkpoint is kicked off (we are up to 1MB of live data in the WAL file in my case). The checkpoint once again can't do its thing. OK, so the WAL file is going to get written to beyond the 1MB "mark". At this point, what I would have expected is that the WAL file first gets written to between the 1MB and 1.5MB "marks" while new writes occur during the read transaction. If less than 0.5MB is written, then the WAL file would not grow. What I have seen in my testing, however, leads me to believe that at this point, new writes actually start after the 1.5MB "mark". The WAL file now grows again. Let's say the write frequency and read transaction duration are exactly as they were the last time. Instead of the WAL file remaining at 1.5MB, it now grows to 2MB. This seems to be a concern/issue with how these writes to the WAL file are being done under this condition.
If I repeat this over and over again while constantly writing and periodically doing reads (I spread the reads out far enough that I know a checkpoint will successfully complete in between two sets of reads), the WAL file continues to grow indefinitely. During this test, my write frequency remains constant (or even slows over time). I do periodic reads, spread well out, all reading the same amount of data and taking about the same amount of time. The reads all complete in a timely fashion and clean up / close their connections afterwards. As such, I would not expect my WAL file to keep growing indefinitely, but it does.

My only solution has been to make sure that my writing application periodically closes and re-opens the database, while making certain all readers close the database after their read queries. Then, if I happen to close the database from my writer at a point in time when no readers have the database open, the WAL file gets truncated and is back to size 0 when my writer reopens the DB. This solution is not perfect