Re: [sqlite] What is so slow on dropping tables?
On Jun 1, 2011 1:46 PM, "Jan Hudec" wrote:
> On Tue, May 31, 2011 at 16:44:13 -0500, Nico Williams wrote:
> > On Tue, May 31, 2011 at 4:22 PM, Simon Slavin wrote:
> > > Split the DROP into two stages:
> > >
> > > DELETE FROM myTable;
> > > DROP TABLE myTable;
> > >
> > > Which one takes all the time ? If it's the second one, then perhaps
> > > just delete all the records. Filling the table back up again with new
> > > rows obviously already takes considerable time so adding extra time
> > > may not be as inconvenient.
>
> The first one alone takes more time than the drop alone did before
> (I already tried that some time ago).

Delete might require visiting all the leaf nodes as well (even when doing a
delete with no WHERE clause).

> > That's gonna have the same problem: in order to find out what pages to
> > add to the free list the engine will have to traverse the interior
> > nodes of the table's b-tree.
> >
> > You could rename the table and then delete from ... order by rowid asc
> > limit 1000, to delete 1000 rows at a time. Add in incremental
> > autovacuum and that might do the trick.
>
> The autovacuum would just add more work reshuffling the data around. And
> it's not that the pages are going to be unused for long.

But you've not tried it, right? You don't need the autovacuum unless you're
concerned about storage consumption anyways.

Nico
--
Re: [sqlite] What is so slow on dropping tables?
On Tue, May 31, 2011 at 16:44:13 -0500, Nico Williams wrote:
> On Tue, May 31, 2011 at 4:22 PM, Simon Slavin wrote:
> > Split the DROP into two stages:
> >
> > DELETE FROM myTable;
> > DROP TABLE myTable;
> >
> > Which one takes all the time ? If it's the second one, then perhaps just
> > delete all the records. Filling the table back up again with new rows
> > obviously already takes considerable time so adding extra time may not
> > be as inconvenient.

The first one alone takes more time than the drop alone did before
(I already tried that some time ago).

> That's gonna have the same problem: in order to find out what pages to
> add to the free list the engine will have to traverse the interior
> nodes of the table's b-tree.
>
> You could rename the table and then delete from ... order by rowid asc
> limit 1000, to delete 1000 rows at a time. Add in incremental
> autovacuum and that might do the trick.

The autovacuum would just add more work reshuffling the data around. And it's
not that the pages are going to be unused for long.

> You could also re-create the DB in a new file then use the filesystem
> to delete the old file -- the filesystem will have to do the same work
> to free filesystem blocks -more or less-, but it will do it
> asynchronously, in the background.

I am deleting one table, but there are many, many others in the file that I
need. I have already thought about having each table in its own file and
attaching the ones I need (the number of attached databases is limited, but I
don't think a single statement ever uses more than 6 or 7 tables, so it could
be doable). Except it would be a lot of work to do it.

> Personally I recommend Roger's VIEW approach to schema changes to
> minimize data re-write overhead.

Well, besides there being many tables that are written from code (though the
data still come from the database, some complex transformations are
involved), the main reason is that when table X changes, I need to rebuild
that table and the tables that depend on it, but not the tables it depends
on. If it depended on views instead, I would have to reread them. And since
each join involved makes the query a bit slower (especially when joining huge
tables, as every query would if I used views), it would quickly become
unbearable.

--
Jan 'Bulb' Hudec
Re: [sqlite] What is so slow on dropping tables?
On 06/01/2011 02:18 AM, Jan Hudec wrote:
> Hello folks,
>
> At $work we have an application that processes *huge* datasets (tens of
> millions of rows in some of the larger tables, sometimes over 30GiB file
> size). This application changes and when it does, it drops some tables and
> calculates them again. What is somewhat surprising is that dropping the
> tables itself takes quite a long time (on the order of minutes).
>
> - What is the reason it might take that long? I didn't expect removing the
>   table entry in sqlite_master and adding its pages to the free list to
>   take that long.
> - Is there any way to speed it up? The application works in big tasks, each
>   of which opens a transaction and creates one or a few tables, dropping
>   any old versions of those tables first. So could perhaps moving the drops
>   out of the transaction help? It would be correct (once a table is found
>   obsolete, it would still be found obsolete after a rollback and retry),
>   but it would take quite a bit of refactoring, so I'd only do it if it's
>   likely to help significantly.

If you have foreign keys enabled (and one or more FKs that involve the table
being dropped), that can slow things down. If this is the case, try using the
pragma to disable FKs before running the DROP TABLE.
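For reference, a minimal sketch of that (using the table name from elsewhere
in the thread; whether it helps depends on whether any foreign keys actually
reference the dropped table):

  -- foreign_keys is a no-op inside an open transaction, so set it first
  PRAGMA foreign_keys = OFF;
  DROP TABLE myTable;
  PRAGMA foreign_keys = ON;   -- re-enable enforcement afterwards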
Re: [sqlite] What is so slow on dropping tables?
> Quoth Roger Binns, on 2011-06-01 00:21:44 -0700:
> > On 05/31/2011 12:18 PM, Jan Hudec wrote:
> > > - Is there any way to speed it up?
> >
> > Another way (somewhat hinted at by Nico) is that you can create these
> > tables in separate databases and use attach to bring them in. To drop a
> > table you can just detach and then delete the database file (at a later
> > point if necessary). If the new database is going to be approximately the
> > same size as the old one then you can overwrite the database header to
> > cause the database to be empty but already the allocated size so the
> > operating system won't have to go through a free then allocate cycle for
> > the file blocks.
>
> This is a reasonable suggestion, but watch out for the limit on the
> number of attached databases. You cannot reliably have more than 30
> of them on a custom compiled SQLite3 (for 32-bit integers; the doc is
> silent re the int64 type) and the default is a compile-time limit of 10.
> Whether this is a problem depends on your data and application
> architecture.

Also, there are no foreign keys to/from attached DBs, which may be a stronger
limitation.
Re: [sqlite] What is so slow on dropping tables?
Quoth Roger Binns, on 2011-06-01 00:21:44 -0700:
> On 05/31/2011 12:18 PM, Jan Hudec wrote:
> > - Is there any way to speed it up?
>
> Another way (somewhat hinted at by Nico) is that you can create these tables
> in separate databases and use attach to bring them in. To drop a table you
> can just detach and then delete the database file (at a later point if
> necessary). If the new database is going to be approximately the same size
> as the old one then you can overwrite the database header to cause the
> database to be empty but already the allocated size so the operating system
> won't have to go through a free then allocate cycle for the file blocks.

This is a reasonable suggestion, but watch out for the limit on the number of
attached databases. You cannot reliably have more than 30 of them on a custom
compiled SQLite3 (for 32-bit integers; the doc is silent re the int64 type)
and the default is a compile-time limit of 10. Whether this is a problem
depends on your data and application architecture.

   ---> Drake Wilson
Re: [sqlite] What is so slow on dropping tables?
On 05/31/2011 12:18 PM, Jan Hudec wrote:
> - Is there any way to speed it up?

Another way (somewhat hinted at by Nico) is that you can create these tables
in separate databases and use attach to bring them in. To drop a table you can
just detach and then delete the database file (at a later point if necessary).
If the new database is going to be approximately the same size as the old one
then you can overwrite the database header to cause the database to be empty
but already the allocated size so the operating system won't have to go
through a free then allocate cycle for the file blocks.

This will also allow you to do hot swapping if that is important to you. You
can build up the new tables/databases separately and then when ready to use
them do a detach/attach which should be very quick.

Roger
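A rough sketch of what that looks like in SQL (the file name, schema alias and
columns here are invented for illustration, not taken from the original
application):

  ATTACH DATABASE 'table_x_v2.db' AS newdb;   -- one file per rebuilt table
  CREATE TABLE newdb.table_x (id INTEGER PRIMARY KEY, value REAL);
  -- ... populate newdb.table_x with the recalculated rows and use it in
  -- joins against the main database ...

  -- Later, when this version of the table becomes obsolete:
  DETACH DATABASE newdb;
  -- and table_x_v2.db can be deleted (or have its header overwritten for
  -- reuse) at the filesystem level, at whatever time is convenient.

The detach itself is cheap; the page-freeing work moves to the filesystem's
unlink of the old file.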
Re: [sqlite] What is so slow on dropping tables?
On 31 May 2011, at 10:44pm, Nico Williams wrote:

> You could rename the table and then delete from ... order by rowid asc
> limit 1000, to delete 1000 rows at a time. Add in incremental
> autovacuum and that might do the trick.

Would DELETE FROM myTable ORDER BY rowid DESC LIMIT 1 be any faster ? Remove
the last records rather than the first ? Also, would dropping indexes before
deleting the data be any faster ?

Simon.
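One caveat for anyone trying either direction: the DELETE ... ORDER BY ...
LIMIT syntax is only available when SQLite was built with
SQLITE_ENABLE_UPDATE_DELETE_LIMIT. A portable sketch of the same batched
delete (the batch size is arbitrary here) is:

  DELETE FROM myTable
   WHERE rowid IN (SELECT rowid FROM myTable ORDER BY rowid DESC LIMIT 1000);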
Re: [sqlite] What is so slow on dropping tables?
On Tue, May 31, 2011 at 4:22 PM, Simon Slavin wrote:
> Split the DROP into two stages:
>
> DELETE FROM myTable;
> DROP TABLE myTable;
>
> Which one takes all the time ? If it's the second one, then perhaps just
> delete all the records. Filling the table back up again with new rows
> obviously already takes considerable time so adding extra time may not be
> as inconvenient.

That's gonna have the same problem: in order to find out what pages to add to
the free list the engine will have to traverse the interior nodes of the
table's b-tree.

You could rename the table and then delete from ... order by rowid asc limit
1000, to delete 1000 rows at a time. Add in incremental autovacuum and that
might do the trick.

You could also re-create the DB in a new file then use the filesystem to
delete the old file -- the filesystem will have to do the same work to free
filesystem blocks -more or less-, but it will do it asynchronously, in the
background.

Personally I recommend Roger's VIEW approach to schema changes to minimize
data re-write overhead.

Nico
--
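A sketch of the rename-and-trickle-delete idea (the _old suffix is made up,
and it assumes incremental auto_vacuum was set up when the database was
created, since auto_vacuum cannot be turned on later without a full VACUUM):

  ALTER TABLE myTable RENAME TO myTable_old;
  -- recreate myTable with its original schema and start refilling it now

  -- housekeeping, repeated until no rows are left:
  DELETE FROM myTable_old
   WHERE rowid IN (SELECT rowid FROM myTable_old ORDER BY rowid ASC LIMIT 1000);
  PRAGMA incremental_vacuum(256);   -- optionally hand back some free pages

  DROP TABLE myTable_old;           -- should be cheap once the table is empty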
Re: [sqlite] What is so slow on dropping tables?
On 31 May 2011, at 8:18pm, Jan Hudec wrote:

> At $work we have an application that processes *huge* datasets (tens of
> millions of rows in some of the larger tables, sometimes over 30GiB file
> size). This application changes and when it does, it drops some tables and
> calculates them again. What is somewhat surprising is that dropping the
> tables itself takes quite a long time (on the order of minutes).
>
> - What is the reason it might take that long? I didn't expect removing the
>   table entry in sqlite_master and adding its pages to the free list to
>   take that long.

Split the DROP into two stages:

DELETE FROM myTable;
DROP TABLE myTable;

Which one takes all the time ? If it's the second one, then perhaps just
delete all the records. Filling the table back up again with new rows
obviously already takes considerable time so adding extra time may not be as
inconvenient.

Simon.
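If you are trying this from the sqlite3 command-line shell, its .timer switch
makes the comparison easy (a sketch, same table name as above):

  .timer on
  DELETE FROM myTable;
  DROP TABLE myTable;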
Re: [sqlite] What is so slow on dropping tables?
On Tue, May 31, 2011 at 12:30:59 -0700, Roger Binns wrote:
> On 05/31/2011 12:18 PM, Jan Hudec wrote:
> > and calculates them again.
>
> Have you considered using virtual tables so that the calculations are done
> on access as needed rather than up front?

That would not help. The whole point is that the result of each step is
cached, so it does not have to be calculated again next time unless its
definition has changed (which it however does, because a lot of new features
are being added). Some steps involve quite a lot of work on the C++ side, and
while others are implemented completely in SQL, they involve very complex
joins that would trash the caches (already set to 0.5GiB and I still suspect
it's not enough) if they were to be expanded into the queries using them.

Regards,
Jan

--
Jan 'Bulb' Hudec
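For context on that cache figure: PRAGMA cache_size counts pages, not bytes,
so a 0.5GiB cache corresponds roughly to the following (assuming 4KiB pages;
check PRAGMA page_size for the actual value):

  PRAGMA page_size;             -- e.g. 4096
  PRAGMA cache_size = 131072;   -- 131072 pages * 4096 bytes = 512 MiB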
Re: [sqlite] What is so slow on dropping tables?
On Tue, May 31, 2011 at 14:29:11 -0500, Nico Williams wrote:
> Just a guess: finding all the pages to free requires traversing the
> internal nodes of the table's b-tree, which requires reading a fair
> subset of the table's b-tree, which might be a lot of I/O. At 150MB/s
> it would take almost two minutes to read 15GB of b-tree pages from a
> single disk, and that's assuming the I/Os are sequential (which they
> will almost certainly not be). So you can see why the drops might be
> slow.

Might well be. Individual tables are not as big as 15GiB, but a single table
can be 0.5GiB (the whole file regularly grows to maybe 35GiB, but there are
many tables and a couple of indices on them and I didn't try to dig out how
much each takes).

> One workaround would be to rename the tables to be dropped and drop them
> later, when you can spare the time.

There is no such time. Besides, that would mean the pages would not be
available for the next table, making the file even larger and even more
fragmented.

> Longer term it'd be nice if SQLite3 could free a dropped table's pages
> incrementally rather than all at once, assuming my guess above is
> correct anyways.

Regards,
Jan

--
Jan 'Bulb' Hudec
Re: [sqlite] What is so slow on dropping tables?
On 05/31/2011 12:18 PM, Jan Hudec wrote:
> and calculates them again.

Have you considered using virtual tables so that the calculations are done on
access as needed rather than up front?

Roger
Re: [sqlite] What is so slow on dropping tables?
Just a guess: finding all the pages to free requires traversing the internal
nodes of the table's b-tree, which requires reading a fair subset of the
table's b-tree, which might be a lot of I/O. At 150MB/s it would take almost
two minutes to read 15GB of b-tree pages from a single disk, and that's
assuming the I/Os are sequential (which they will almost certainly not be).
So you can see why the drops might be slow.

One workaround would be to rename the tables to be dropped and drop them
later, when you can spare the time.

Longer term it'd be nice if SQLite3 could free a dropped table's pages
incrementally rather than all at once, assuming my guess above is correct
anyways.

Nico
--
[sqlite] What is so slow on dropping tables?
Hello folks,

At $work we have an application that processes *huge* datasets (tens of
millions of rows in some of the larger tables, sometimes over 30GiB file
size). This application changes and when it does, it drops some tables and
calculates them again. What is somewhat surprising is that dropping the
tables itself takes quite a long time (on the order of minutes).

- What is the reason it might take that long? I didn't expect removing the
  table entry in sqlite_master and adding its pages to the free list to take
  that long.
- Is there any way to speed it up? The application works in big tasks, each
  of which opens a transaction and creates one or a few tables, dropping any
  old versions of those tables first. So could perhaps moving the drops out
  of the transaction help? It would be correct (once a table is found
  obsolete, it would still be found obsolete after a rollback and retry), but
  it would take quite a bit of refactoring, so I'd only do it if it's likely
  to help significantly.

Thanks,
Jan

--
Jan 'Bulb' Hudec
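For concreteness, each task currently has roughly this shape (the table and
column names are invented for illustration; the single INSERT stands in for
the real recalculation):

  BEGIN;
  DROP TABLE IF EXISTS table_x;   -- this is the step that is unexpectedly slow
  CREATE TABLE table_x (id INTEGER PRIMARY KEY, value REAL);
  INSERT INTO table_x (id, value) VALUES (1, 0.5);
  COMMIT;

The refactoring question above is whether the DROP could safely be issued in
its own small transaction before the BEGIN.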