Re: [sqlite] Terrible performance for one of our tables
Jens Miltner wrote:
> Any ideas why there would be such a _huge_ performance hit after deleting and re-inserting records for a while?

Without deletes/inserts the reads are sequential, and the OS and/or the drive/controller are reading ahead for you, hiding much of the disk read and seek times. After fragmentation, reads are (to some extent) random, and fancy read-aheads do not help.

Instead of deleting, could you just somehow mark obsoleted rows as "deleted" (possibly with a new column, but probably some existing column could just have a unique value for that)? The file would grow a lot larger, and you would be doing a lot of unneeded reads, but the fact that they are sequential might more than make up for that.

Just a thought.

Gerry
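A minimal sketch of the mark-as-deleted idea (the table name t is taken from the thread, but the obsolete and id columns are hypothetical, not the original poster's schema):

    -- One-time schema change: add a flag column.
    ALTER TABLE t ADD COLUMN obsolete INTEGER NOT NULL DEFAULT 0;

    -- Instead of DELETE, flag the row as obsolete.
    UPDATE t SET obsolete = 1 WHERE id = :id;

    -- Readers filter on the flag; obsoleted rows stay where they are on disk.
    SELECT * FROM t WHERE obsolete = 0;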
Re: [sqlite] Terrible performance for one of our tables
Hello!

In a message dated Wednesday 19 November 2008 15:05:01, D. Richard Hipp wrote:
> That's because "SELECT count(*) FROM t" has to read and count every row in the table - all 1.2 million rows. And this involves reading all 1.2 million rows from disk. If each row is 100 bytes in size, that involves reading 120MB.

Is it possible to use only the PRIMARY KEY index for SELECT count(*) queries? For other indexes it is possible, as in:

explain query plan select count(*) from street where code >= '770' and code < '770'+1;
0|0|TABLE street WITH INDEX street_code_idx

Best regards, Alexey.
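For reference, the same EXPLAIN QUERY PLAN check can be run with and without the index-friendly constraint (street and street_code_idx are Alexey's example names, not from the original poster's database):

    -- Unconstrained count: in SQLite of this era the plan is a full table scan.
    EXPLAIN QUERY PLAN SELECT count(*) FROM street;

    -- Count restricted on an indexed column: the plan can use street_code_idx.
    EXPLAIN QUERY PLAN SELECT count(*) FROM street WHERE code >= '770' AND code < '770'+1;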
Re: [sqlite] Terrible performance for one of our tables
Jens Miltner wrote:
> D. Richard Hipp wrote on 19.11.2008 at 13:05:
>
>> On Nov 19, 2008, at 3:08 AM, Jens Miltner wrote:
>>
>>> Hi,
>>>
>>> we're seeing terrible performance problems when fetching data from one of our tables:
>>> The table contains roughly 1.2 Million rows and a plain "SELECT COUNT(*) FROM t" query takes 8 minutes to finish.
>>
>> That's because "SELECT count(*) FROM t" has to read and count every row in the table - all 1.2 million rows. And this involves reading all 1.2 million rows from disk. If each row is 100 bytes in size, that involves reading 120MB.
>>
>> An SQLite database does *not* maintain the number of rows in a table as separate metadata. Many client/server database engines do maintain such metadata. When I was designing the SQLite file format, I deliberately chose to omit such metadata because including it would slow down inserts and deletes.
>
> Yes, I know about this and it's usually not a problem. It only turned into a problem for this specific table.
>
> As I mentioned in my original e-mail, after vacuuming the database, the time to run the COUNT(*) query went down to about 5 seconds, so it looks like somehow the database became fragmented enough to seriously hit the performance of the serial row access...
>
> BTW: we don't actually run the COUNT(*) query, but we see major performance hits after a while with this table, so I figured I'd run the most simple query first and found that the time needed for COUNT(*) goes through the ceiling (from something like 5 seconds to almost 9 minutes - roughly a 100x increase) after having deleted and re-inserted rows in that table for a while.
>
> Any ideas why there would be such a _huge_ performance hit after deleting and re-inserting records for a while?
> Anything we can do to avoid this kind of fragmentation-induced performance penalty (other than regularly vacuuming)?
>
> Thanks,

Fragmentation is inherent when there are deletions and insertions, and it involves orders-of-magnitude increases in access times. Just look at a before-and-after comparison of a badly fragmented disk being defragged to see the effect. Many years ago we learned from practical experience to avoid such situations when designing applications. High-availability applications cannot tolerate an increase in chaos during operation.
Re: [sqlite] Terrible performance for one of our tables
D. Richard Hipp wrote on 19.11.2008 at 13:05:
>
> On Nov 19, 2008, at 3:08 AM, Jens Miltner wrote:
>
>> Hi,
>>
>> we're seeing terrible performance problems when fetching data from one of our tables:
>> The table contains roughly 1.2 Million rows and a plain "SELECT COUNT(*) FROM t" query takes 8 minutes to finish.
>
> That's because "SELECT count(*) FROM t" has to read and count every row in the table - all 1.2 million rows. And this involves reading all 1.2 million rows from disk. If each row is 100 bytes in size, that involves reading 120MB.
>
> An SQLite database does *not* maintain the number of rows in a table as separate metadata. Many client/server database engines do maintain such metadata. When I was designing the SQLite file format, I deliberately chose to omit such metadata because including it would slow down inserts and deletes.

Yes, I know about this and it's usually not a problem. It only turned into a problem for this specific table.

As I mentioned in my original e-mail, after vacuuming the database, the time to run the COUNT(*) query went down to about 5 seconds, so it looks like somehow the database became fragmented enough to seriously hit the performance of the serial row access...

BTW: we don't actually run the COUNT(*) query, but we see major performance hits after a while with this table, so I figured I'd run the most simple query first and found that the time needed for COUNT(*) goes through the ceiling (from something like 5 seconds to almost 9 minutes - roughly a 100x increase) after having deleted and re-inserted rows in that table for a while.

Any ideas why there would be such a _huge_ performance hit after deleting and re-inserting records for a while?
Anything we can do to avoid this kind of fragmentation-induced performance penalty (other than regularly vacuuming)?

Thanks,
Re: [sqlite] Terrible performance for one of our tables
>> Out of interest why must it completely READ each entire row? Is it because '*' has been used?
>
> The database reads a page at a time. A page is, by default, 1024 bytes. A single page might contain multiple rows, or a single large row might be spread across multiple pages.
> When rows are only about 100 bytes in size, they will all fit on a single page, so the entire row ends up being read, though only the header is decoded and interpreted.

So what is the explanation for the 1.2 million row table taking 8 minutes, and the 5 million row table taking 40 seconds, when the rows in the larger, faster table are larger than those in the smaller, slower one?

The OP claims that the "average" row size is about 100 bytes for the smaller table, and a "few hundred" for the larger table, which suggests that either his estimates of row size are incorrect, or something else is going on. Perhaps the variance from "average" accounts for the difference?

I'm interested because I might have this problem shortly, and I'd like to understand the causes in order to attempt to avoid this type of problem.

Brad
Re: [sqlite] Terrible performance for one of our tables
On Nov 19, 2008, at 7:57 AM, Hardy, Andrew wrote:
> Out of interest why must it completely READ each entire row? Is it because '*' has been used?

The database reads a page at a time. A page is, by default, 1024 bytes. A single page might contain multiple rows, or a single large row might be spread across multiple pages.

To do a count(*), only the header of each row actually needs to be seen. For smaller rows, the header and the data are all on the same page, so the entire row ends up being read. However, if a row contains large CLOBs or BLOBs and is spread across multiple pages, only the first page (the page containing the header) is read.

When rows are only about 100 bytes in size, they will all fit on a single page, so the entire row ends up being read, though only the header is decoded and interpreted.

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of D. Richard Hipp
> Sent: 19 November 2008 12:05
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Terrible performance for one of our tables
>
> On Nov 19, 2008, at 3:08 AM, Jens Miltner wrote:
>
>> Hi,
>>
>> we're seeing terrible performance problems when fetching data from one of our tables:
>> The table contains roughly 1.2 Million rows and a plain "SELECT COUNT(*) FROM t" query takes 8 minutes to finish.
>
> That's because "SELECT count(*) FROM t" has to read and count every row in the table - all 1.2 million rows. And this involves reading all 1.2 million rows from disk. If each row is 100 bytes in size, that involves reading 120MB.
>
> An SQLite database does *not* maintain the number of rows in a table as separate metadata. Many client/server database engines do maintain such metadata. When I was designing the SQLite file format, I deliberately chose to omit such metadata because including it would slow down inserts and deletes.
>
> If you frequently need to know how many rows are in a certain table, use insert and delete triggers to maintain the count yourself in a separate table. Then just read out the count from the separate table when you need it, rather than recomputing it by reading all 1.2 million rows of the original table.
>
> D. Richard Hipp
> [EMAIL PROTECTED]

D. Richard Hipp
[EMAIL PROTECTED]
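For reference, the page geometry described above can be inspected directly with standard pragmas (the values shown would of course depend on the database in question):

    PRAGMA page_size;       -- bytes per page (1024 by default)
    PRAGMA page_count;      -- total pages in the database file
    PRAGMA freelist_count;  -- pages currently sitting unused on the free list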
Re: [sqlite] Terrible performance for one of our tables
Out of interest why must it completely READ each entire row? Is it because '*' has been used?

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of D. Richard Hipp
Sent: 19 November 2008 12:05
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Terrible performance for one of our tables

On Nov 19, 2008, at 3:08 AM, Jens Miltner wrote:

> Hi,
>
> we're seeing terrible performance problems when fetching data from one of our tables:
> The table contains roughly 1.2 Million rows and a plain "SELECT COUNT(*) FROM t" query takes 8 minutes to finish.

That's because "SELECT count(*) FROM t" has to read and count every row in the table - all 1.2 million rows. And this involves reading all 1.2 million rows from disk. If each row is 100 bytes in size, that involves reading 120MB.

An SQLite database does *not* maintain the number of rows in a table as separate metadata. Many client/server database engines do maintain such metadata. When I was designing the SQLite file format, I deliberately chose to omit such metadata because including it would slow down inserts and deletes.

If you frequently need to know how many rows are in a certain table, use insert and delete triggers to maintain the count yourself in a separate table. Then just read out the count from the separate table when you need it, rather than recomputing it by reading all 1.2 million rows of the original table.

D. Richard Hipp
[EMAIL PROTECTED]
Re: [sqlite] Terrible performance for one of our tables
On Nov 19, 2008, at 3:08 AM, Jens Miltner wrote:

> Hi,
>
> we're seeing terrible performance problems when fetching data from one of our tables:
> The table contains roughly 1.2 Million rows and a plain "SELECT COUNT(*) FROM t" query takes 8 minutes to finish.

That's because "SELECT count(*) FROM t" has to read and count every row in the table - all 1.2 million rows. And this involves reading all 1.2 million rows from disk. If each row is 100 bytes in size, that involves reading 120MB.

An SQLite database does *not* maintain the number of rows in a table as separate metadata. Many client/server database engines do maintain such metadata. When I was designing the SQLite file format, I deliberately chose to omit such metadata because including it would slow down inserts and deletes.

If you frequently need to know how many rows are in a certain table, use insert and delete triggers to maintain the count yourself in a separate table. Then just read out the count from the separate table when you need it, rather than recomputing it by reading all 1.2 million rows of the original table.

D. Richard Hipp
[EMAIL PROTECTED]
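A minimal sketch of the trigger approach (the table name t comes from the thread; the counter table, trigger names, and columns are hypothetical):

    -- Separate one-row table holding the current count, seeded once.
    CREATE TABLE t_rowcount(cnt INTEGER NOT NULL);
    INSERT INTO t_rowcount(cnt) SELECT count(*) FROM t;

    -- Keep it up to date from inserts and deletes on t.
    CREATE TRIGGER t_count_ins AFTER INSERT ON t
    BEGIN
      UPDATE t_rowcount SET cnt = cnt + 1;
    END;

    CREATE TRIGGER t_count_del AFTER DELETE ON t
    BEGIN
      UPDATE t_rowcount SET cnt = cnt - 1;
    END;

    -- The count is then a single-row read instead of a full table scan.
    SELECT cnt FROM t_rowcount;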
Re: [sqlite] Terrible performance for one of our tables
What page size do you use in your database? Maybe you should increase it?

Assuming a table fully fragmented with a page size of 16KB, the I/O rate should be (if my calculations are correct) in the 2MB/s range (assuming a hard disk with 50MB/s bandwidth and 7msec seek time). For a 32KB page size you should get 4MB/s. Because the I/O rate is heavily affected by the seeks (the rate doubles for 32KB because the hard disk does roughly half the seeks), if SQLite does additional seeks (like trying to lock something), then everything halves, so try to avoid any locking if you can. (A sketch of changing the page size appears after this message.)

lefteris

Jens Miltner wrote:
> Hi,
>
> we're seeing terrible performance problems when fetching data from one of our tables:
> The table contains roughly 1.2 Million rows and a plain "SELECT COUNT(*) FROM t" query takes 8 minutes to finish.
> The table contains 10 fields, but the records average to about 100 Bytes of data total. The largest records are ~1k of data, but there are only a few hundred records with more than 256 Bytes of data.
>
> Another table in the same database contains over 5 Million records (and probably an average of a few hundred Bytes per record). Counting rows in this table only takes about 40 seconds.
>
> (All times taken from first run and they're reproducible on several machines).
>
> When looking at a sample taken while sqlite is executing the COUNT(*) query, almost all the time is spent inside read() calls.
> The average I/O rate is about 500kB - 800kB / second read performance. When reading from other databases / other tables, I can easily get 20-30 MB/s overall read performance.
> So, I can only conclude that for some reason, there seems to be some terrible I/O behavior for the data in this specific table.
>
> Running fs_usage shows that there are almost no consecutive pages being read (i.e. the file offsets are not consecutive), so I suspect that for some reason, this table is fragmented all across the 3 GB database file, which may explain the slow overall read performance.
>
> Now, there are two things that happened to this table, which didn't happen to the other big table:
>
> (1) we've updated the schema by appending a couple of columns
> (2) when data is updated, it's updated by first deleting a bunch of rows and then re-inserting the new data. This happens once or twice a day for almost all entries, thus the
>
> Would either (1) or (2) lead to heavy fragmentation of this kind?
>
> Vacuuming the database does resolve the performance issue, but vacuuming that specific database took about 75 minutes (MacPro @ 3GHz), so it's not something we can do frequently (especially, since the database is used in a server product that has 24/7 uptime) :(
>
> Is there anything we could do to avoid the table fragmentation in the first place (write queries in a different way, avoid deletions/re-insertions, etc.)?
> (Note that I'm not talking about reclaiming free space here - that's not really an issue for us. We need to avoid the performance hit we've seen which IMHO is caused by the pages for a single table being scattered all over the database file, thus requiring the paging code to jump a lot in the file).
>
> Thanks,
> -jens
>
> BTW: this is SQLite 3.6.3, but at least running the COUNT queries is not any faster with 3.5.x. Tests run on Mac OS X.
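A minimal sketch of changing the page size on an existing database, as suggested above (on an already-populated file the new size only takes effect after the file is rebuilt, so the pragma is followed by a VACUUM; 32768 is SQLite's maximum page size):

    PRAGMA page_size = 32768;   -- request larger pages
    VACUUM;                     -- rebuilds the database file using the new page size

As discussed elsewhere in this thread, the VACUUM itself is expensive on a multi-gigabyte file, so this is best done during a maintenance window.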
Re: [sqlite] Terrible performance for one of our tables
Hi,

do you have an autoincrement primary key in that table? If not, try to add one, giving sqlite the chance to query an internal index rather than the table itself. I don't see why sqlite should read all the data from that table.

I've read somewhere that count(*) may scan the whole table, so if you already have a primary key (autoincrement) you may try "SELECT COUNT(Primarykey) FROM t". Or you may try SELECT ID FROM t, step through the results, and count manually.

I'm interested in the results.

Hope this helps.

Marcus

Jens Miltner wrote:
> Hi,
>
> we're seeing terrible performance problems when fetching data from one of our tables:
> The table contains roughly 1.2 Million rows and a plain "SELECT COUNT(*) FROM t" query takes 8 minutes to finish.
> The table contains 10 fields, but the records average to about 100 Bytes of data total. The largest records are ~1k of data, but there are only a few hundred records with more than 256 Bytes of data.
>
> Another table in the same database contains over 5 Million records (and probably an average of a few hundred Bytes per record). Counting rows in this table only takes about 40 seconds.
>
> (All times taken from first run and they're reproducible on several machines).
>
> When looking at a sample taken while sqlite is executing the COUNT(*) query, almost all the time is spent inside read() calls.
> The average I/O rate is about 500kB - 800kB / second read performance. When reading from other databases / other tables, I can easily get 20-30 MB/s overall read performance.
> So, I can only conclude that for some reason, there seems to be some terrible I/O behavior for the data in this specific table.
>
> Running fs_usage shows that there are almost no consecutive pages being read (i.e. the file offsets are not consecutive), so I suspect that for some reason, this table is fragmented all across the 3 GB database file, which may explain the slow overall read performance.
>
> Now, there are two things that happened to this table, which didn't happen to the other big table:
>
> (1) we've updated the schema by appending a couple of columns
> (2) when data is updated, it's updated by first deleting a bunch of rows and then re-inserting the new data. This happens once or twice a day for almost all entries, thus the
>
> Would either (1) or (2) lead to heavy fragmentation of this kind?
>
> Vacuuming the database does resolve the performance issue, but vacuuming that specific database took about 75 minutes (MacPro @ 3GHz), so it's not something we can do frequently (especially, since the database is used in a server product that has 24/7 uptime) :(
>
> Is there anything we could do to avoid the table fragmentation in the first place (write queries in a different way, avoid deletions/re-insertions, etc.)?
> (Note that I'm not talking about reclaiming free space here - that's not really an issue for us. We need to avoid the performance hit we've seen which IMHO is caused by the pages for a single table being scattered all over the database file, thus requiring the paging code to jump a lot in the file).
>
> Thanks,
> -jens
>
> BTW: this is SQLite 3.6.3, but at least running the COUNT queries is not any faster with 3.5.x. Tests run on Mac OS X.
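Marcus's two suggestions, written out as queries (Primarykey and ID are his placeholder column names, not the original poster's actual schema):

    -- Count an (autoincrement) primary key column instead of *:
    SELECT COUNT(Primarykey) FROM t;

    -- Or fetch only the key column and count the rows in application code:
    SELECT ID FROM t;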