Re: [sqlite] What is so slow on dropping tables?

2011-06-01 Thread Nico Williams
On Jun 1, 2011 1:46 PM, "Jan Hudec" wrote:
> On Tue, May 31, 2011 at 16:44:13 -0500, Nico Williams wrote:
> > On Tue, May 31, 2011 at 4:22 PM, Simon Slavin wrote:
> > > Split the DROP into two stages:
> > >
> > > DELETE FROM myTable;
> > > DROP TABLE myTable;
> > >
> > > Which one takes all the time?  If it's the second one, then perhaps
> > > just delete all the records.  Filling the table back up again with new
> > > rows obviously already takes considerable time so adding extra time may
> > > not be as inconvenient.
>
> The first one alone takes more time than the drop alone did before
> (already tried some time ago).

Delete might require visiting all the leaf nodes as well (even when doing a
delete with no WHERE clause).

> > That's gonna have the same problem: in order to find out what pages to
> > add to the free list the engine will have to traverse the interior
> > nodes of the table's b-tree.
> >
> > You could rename the table and then delete from ... order by rowid asc
> > limit 1000, to delete 1000 rows at a time.  Add in incremental
> > autovacuum and that might do the trick.
>
> The autovacuum would just add more work reshuffling the data around. And
> it's not that the pages are going to be unused long.

But you've not tried it, right?  You don't need the autovacuum unless
you're concerned about storage consumption anyway.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] What is so slow on dropping tables?

2011-06-01 Thread Jan Hudec
On Tue, May 31, 2011 at 16:44:13 -0500, Nico Williams wrote:
> On Tue, May 31, 2011 at 4:22 PM, Simon Slavin  wrote:
> > Split the DROP into two stages:
> >
> > DELETE FROM myTable;
> > DROP TABLE myTable;
> >
> > Which one takes all the time ?  If it's the second one, then perhaps just 
> > delete all the records.  Filling the table back up again with new rows 
> > obviously already takes considerable time so adding extra time may not be 
> > as inconvenient.

The first one alone takes more time than the drop alone did before (already
tried some time ago).

> That's gonna have the same problem: in order to find out what pages to
> add to the free list the engine will have to traverse the interior
> nodes of the table's b-tree.
> 
> You could rename the table and then delete from ... order by rowid asc
> limit 1000, to delete 1000 rows at a time.  Add in incremental
> autovacuum and that might do the trick.

The autovacuum would just add more work reshuffling the data around. And
it's not that the pages are going to be unused for long.

> You could also re-create the DB in a new file then use the filesystem
> to delete the old file -- the filesystem will have to do the same work
> to free filesystem blocks -more or less-, but it will do it
> asynchronously, in the background.

I am deleting one table, but there are many, many others in the file that
I need. Though I have already thought about having each table in its own
file and attaching the ones I need (the number of attached databases is
limited, but I don't think a single statement ever uses more than 6 or
7 tables, so it could be doable). Except it would be a lot of work to do.

> Personally I recommend Roger's VIEW approach to schema changes to
> minimize data re-write overhead.

Well, besides there being many tables that are written from code (though
the data still come from the database, some complex transformations are
involved), the main reason is that when table X changes, I need to rebuild
that table and the tables that depend on it, but not the tables it depends
on. If it depended on views instead, I would have to reread them. And
since each join involved makes the query a bit slower (especially when
joining huge tables, as every query would if I used views), it would
quickly become unbearable.

-- 
 Jan 'Bulb' Hudec 


Re: [sqlite] What is so slow on dropping tables?

2011-06-01 Thread Dan Kennedy
On 06/01/2011 02:18 AM, Jan Hudec wrote:
> Hello folks,
>
> At $work we have an application that processes *huge* databases (tens of
> millions of rows in some of the larger tables, sometimes over 30 GiB file
> size). This application changes, and when it does, it drops some tables
> and calculates them again. What is somewhat surprising is that dropping
> the tables itself takes quite a long time (on the order of minutes).
>
>   - What is the reason it might take that long? I didn't expect removing
>     the table entry in sqlite_master and adding its pages to the free
>     list to take that long.
>   - Is there any way to speed it up? The application works in big tasks,
>     each of which opens a transaction and creates one or a few tables,
>     dropping any old versions of those tables first. So could perhaps
>     moving the drops out of the transaction help? It would be correct
>     (once a table is found obsolete, it would still be found obsolete
>     after a rollback and retry), but it would take quite a bit of
>     refactoring, so I'd only do it if it's likely to help significantly.

If you have foreign keys enabled (and one or more FKs that involve
the table being dropped), that can slow things down. If this is
the case, try using the pragma to disable FKs before running the
DROP TABLE.
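A minimal sketch of that advice using Python's sqlite3 module (the
parent/child schema here is made up for illustration):

```python
import sqlite3

# Autocommit mode so the PRAGMA is not swallowed by an open transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    PRAGMA foreign_keys = ON;
    CREATE TABLE parent(id INTEGER PRIMARY KEY);
    CREATE TABLE child(
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent(id)
    );
""")

# With enforcement on, dropping 'parent' makes SQLite visit the
# referencing tables; turning it off for the drop skips that work.
conn.execute("PRAGMA foreign_keys = OFF")
conn.execute("DROP TABLE parent")
conn.execute("PRAGMA foreign_keys = ON")
```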


Re: [sqlite] What is so slow on dropping tables?

2011-06-01 Thread Jean-Christophe Deschamps

>Quoth Roger Binns, on 2011-06-01 00:21:44 -0700:
> > On 05/31/2011 12:18 PM, Jan Hudec wrote:
> > >  - Is there any way to speed it up?
> >
> > Another way (somewhat hinted at by Nico) is that you can create these
> > tables in separate databases and use attach to bring them in.  To drop a
> > table you can just detach and then delete the database file (at a later
> > point if necessary).  If the new database is going to be approximately
> > the same size as the old one then you can overwrite the database header
> > to cause the database to be empty but already the allocated size so the
> > operating system won't have to go through a free then allocate cycle for
> > the file blocks.
>
>This is a reasonable suggestion, but watch out for the limit on the
>number of attached databases.  You cannot reliably have more than 30
>of them on a custom-compiled SQLite3 (for 32-bit integers; the doc is
>silent re the int64 type) and the default is a compile-time limit of 10.
>Whether this is a problem depends on your data and application
>architecture.

Also, there are no foreign keys to/from attached DBs, which may be a
stronger limitation.



Re: [sqlite] What is so slow on dropping tables?

2011-06-01 Thread Drake Wilson
Quoth Roger Binns , on 2011-06-01 00:21:44 -0700:
> On 05/31/2011 12:18 PM, Jan Hudec wrote:
> >  - Is there any way to speed it up? 
> 
> Another way (somewhat hinted at by Nico) is that you can create these tables
> in separate databases and use attach to bring them in.  To drop a table you
> can just detach and then delete the database file (at a later point if
> necessary).  If the new database is going to be approximately the same
> size as the old one then you can overwrite the database header to cause the
> database to be empty but already the allocated size so the operating system
> won't have to go through a free then allocate cycle for the file blocks.

This is a reasonable suggestion, but watch out for the limit on the
number of attached databases.  You cannot reliably have more than 30
of them on a custom-compiled SQLite3 (for 32-bit integers; the doc is
silent re the int64 type) and the default is a compile-time limit of 10.
Whether this is a problem depends on your data and application
architecture.
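If it matters in practice, the limit can be probed empirically; a rough
sketch (the `aux` alias names are arbitrary):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
attached = 0
try:
    # Keep attaching in-memory databases until SQLite refuses.
    while attached < 200:  # safety bound; the real limit is far lower
        conn.execute(f"ATTACH DATABASE ':memory:' AS aux{attached}")
        attached += 1
except sqlite3.OperationalError:
    pass  # "too many attached databases"
print(attached)  # typically 10 with default compile-time options
```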

   ---> Drake Wilson


Re: [sqlite] What is so slow on dropping tables?

2011-06-01 Thread Roger Binns
On 05/31/2011 12:18 PM, Jan Hudec wrote:
>  - Is there any way to speed it up? 

Another way (somewhat hinted at by Nico) is that you can create these tables
in separate databases and use attach to bring them in.  To drop a table you
can just detach and then delete the database file (at a later point if
necessary).  If the new database is going to be approximately the same
size as the old one then you can overwrite the database header to cause the
database to be empty but already the allocated size so the operating system
won't have to go through a free then allocate cycle for the file blocks.

This will also allow you to do hot swapping if that is important to you.
You can build up the new tables/databases separately and then when ready to
use them do a detach/attach which should be very quick.
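A rough sketch of the detach-and-delete part with Python's sqlite3 (file
and table names are made up; the header-overwrite trick is not shown):

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
side = os.path.join(workdir, "big_table.db")

# Build the expensive table in its own side database.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ? AS side", (side,))
conn.execute("CREATE TABLE side.big(x INTEGER)")
conn.executemany("INSERT INTO side.big VALUES (?)",
                 [(i,) for i in range(1000)])
conn.commit()

# "Drop" the table by detaching and unlinking the file: the filesystem
# frees the blocks instead of SQLite walking the table's b-tree.
conn.execute("DETACH DATABASE side")
os.remove(side)
```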

Roger


Re: [sqlite] What is so slow on dropping tables?

2011-05-31 Thread Simon Slavin

On 31 May 2011, at 10:44pm, Nico Williams wrote:

> You could rename the table and then delete from ... order by rowid asc
> limit 1000, to delete 1000 rows at a time.  Add in incremental
> autovacuum and that might do the trick.

Would

DELETE FROM myTable ORDER BY rowid DESC LIMIT 1

be any faster?  Remove the last records rather than the first?

Also, would dropping the indexes before deleting the data be any faster?

Simon.


Re: [sqlite] What is so slow on dropping tables?

2011-05-31 Thread Nico Williams
On Tue, May 31, 2011 at 4:22 PM, Simon Slavin  wrote:
> Split the DROP into two stages:
>
> DELETE FROM myTable;
> DROP TABLE myTable;
>
> Which one takes all the time ?  If it's the second one, then perhaps just 
> delete all the records.  Filling the table back up again with new rows 
> obviously already takes considerable time so adding extra time may not be as 
> inconvenient.

That's gonna have the same problem: in order to find out what pages to
add to the free list the engine will have to traverse the interior
nodes of the table's b-tree.

You could rename the table and then delete from ... order by rowid asc
limit 1000, to delete 1000 rows at a time.  Add in incremental
autovacuum and that might do the trick.
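One way to sketch that in Python's sqlite3 (table name and batch size are
illustrative; since `DELETE ... ORDER BY ... LIMIT` needs SQLite built
with `SQLITE_ENABLE_UPDATE_DELETE_LIMIT`, this uses a portable rowid
subquery instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable(x INTEGER)")
conn.executemany("INSERT INTO myTable VALUES (?)",
                 [(i,) for i in range(5000)])
conn.commit()

# Move the table out of the way so a fresh one can be rebuilt right away.
conn.execute("ALTER TABLE myTable RENAME TO myTable_old")

# Trickle-delete the old rows, 1000 at a time, each batch in its own
# short transaction so other work can interleave.
while True:
    cur = conn.execute(
        "DELETE FROM myTable_old WHERE rowid IN "
        "(SELECT rowid FROM myTable_old ORDER BY rowid LIMIT 1000)")
    conn.commit()
    if cur.rowcount == 0:
        break
conn.execute("DROP TABLE myTable_old")
conn.commit()
```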

You could also re-create the DB in a new file then use the filesystem
to delete the old file -- the filesystem will have to do the same work
to free filesystem blocks -more or less-, but it will do it
asynchronously, in the background.

Personally I recommend Roger's VIEW approach to schema changes to
minimize data re-write overhead.

Nico
--


Re: [sqlite] What is so slow on dropping tables?

2011-05-31 Thread Simon Slavin

On 31 May 2011, at 8:18pm, Jan Hudec wrote:

> At $work we have an application that processes *huge* databases (tens of
> millions of rows in some of the larger tables, sometimes over 30 GiB file
> size). This application changes, and when it does, it drops some tables
> and calculates them again. What is somewhat surprising is that dropping
> the tables itself takes quite a long time (on the order of minutes).
> 
> - What is the reason it might take that long? I didn't expect removing the
>   table entry in sqlite_master and adding its pages to the free list to
>   take that long.

Split the DROP into two stages:

DELETE FROM myTable;
DROP TABLE myTable;

Which one takes all the time?  If it's the second one, then perhaps just
delete all the records.  Filling the table back up again with new rows
obviously already takes considerable time, so adding extra time may not
be as inconvenient.
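A quick harness for that diagnostic, assuming Python's sqlite3 and a toy
table (real timings only mean something against the production file):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable(x INTEGER)")
conn.executemany("INSERT INTO myTable VALUES (?)",
                 [(i,) for i in range(10000)])
conn.commit()

t0 = time.perf_counter()
conn.execute("DELETE FROM myTable")   # stage 1
conn.commit()
t1 = time.perf_counter()
conn.execute("DROP TABLE myTable")    # stage 2
conn.commit()
t2 = time.perf_counter()

print(f"DELETE: {t1 - t0:.3f}s  DROP: {t2 - t1:.3f}s")
```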

Simon.


Re: [sqlite] What is so slow on dropping tables?

2011-05-31 Thread Jan Hudec
On Tue, May 31, 2011 at 12:30:59 -0700, Roger Binns wrote:
> On 05/31/2011 12:18 PM, Jan Hudec wrote:
> > and calculates them again.
> 
> Have you considered using virtual tables so that the calculations are done
> on access as needed rather than up front?

That would not help. The whole point is that the result of each step is
cached, so it does not have to be calculated again next time unless its
definition has changed (which, however, it does, because a lot of new
features are being added). Some steps involve quite a lot of work on the
C++ side, and while others are implemented completely in SQL, they involve
very complex joins that would thrash the caches (already set to 0.5 GiB,
and I still suspect that's not enough) if they were to be expanded into
the queries using them.

Regards,
Jan

-- 
 Jan 'Bulb' Hudec 


Re: [sqlite] What is so slow on dropping tables?

2011-05-31 Thread Jan Hudec
On Tue, May 31, 2011 at 14:29:11 -0500, Nico Williams wrote:
> Just a guess: finding all the pages to free requires traversing the
> internal nodes of the table's b-tree, which requires reading a fair
> subset of the table's b-tree, which might be a lot of I/O.  At 150MB/s
> it would take almost two minutes to read 15GB of b-tree pages from a
> single disk, and that's assuming the I/Os are sequential (which they
> will almost certainly not be).  So you can see why the drops might be
> slow.

Might well be. Individual tables are not as big as 15 GiB, but 0.5 GiB per
table is possible (the whole file regularly grows to maybe 35 GiB, but
there are many tables and a couple of indices on them, and I didn't try to
dig out how much each takes).

> One workaround would be to rename the tables to drop and dropping them
> later, when you can spare the time.

There is no such time. Besides, that would mean the pages would not be
available for the next table, making the file even larger and even more
fragmented.

> Longer term it'd be nice if SQLite3 could free a dropped table's pages
> incrementally rather than all at once, assuming my guess above is
> correct anyways.

Regards,
Jan

-- 
 Jan 'Bulb' Hudec 


Re: [sqlite] What is so slow on dropping tables?

2011-05-31 Thread Roger Binns
On 05/31/2011 12:18 PM, Jan Hudec wrote:
> and calculates them again.

Have you considered using virtual tables so that the calculations are done
on access as needed rather than up front?

Roger


Re: [sqlite] What is so slow on dropping tables?

2011-05-31 Thread Nico Williams
Just a guess: finding all the pages to free requires traversing the
internal nodes of the table's b-tree, which requires reading a fair
subset of the table's b-tree, which might be a lot of I/O.  At 150MB/s
it would take almost two minutes to read 15GB of b-tree pages from a
single disk, and that's assuming the I/Os are sequential (which they
will almost certainly not be).  So you can see why the drops might be
slow.
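The arithmetic behind that estimate, for the record:

```python
# Back-of-the-envelope check: reading 15 GB of b-tree pages at a
# sequential 150 MB/s (random I/O would be much slower).
table_bytes = 15e9   # 15 GB
throughput = 150e6   # 150 MB/s
seconds = table_bytes / throughput
print(seconds)       # 100.0 seconds, i.e. a bit under two minutes
```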

One workaround would be to rename the tables to drop and dropping them
later, when you can spare the time.

Longer term, it'd be nice if SQLite3 could free a dropped table's pages
incrementally rather than all at once, assuming my guess above is correct
anyway.

Nico
--


[sqlite] What is so slow on dropping tables?

2011-05-31 Thread Jan Hudec
Hello folks,

At $work we have an application that processes *huge* databases (tens of
millions of rows in some of the larger tables, sometimes over 30 GiB file
size). This application changes, and when it does, it drops some tables
and calculates them again. What is somewhat surprising is that dropping
the tables itself takes quite a long time (on the order of minutes).

 - What is the reason it might take that long? I didn't expect removing the
   table entry in sqlite_master and adding its pages to the free list to
   take that long.
 - Is there any way to speed it up? The application works in big tasks, each
   of which opens a transaction and creates one or a few tables, dropping
   any old versions of those tables first. So could perhaps moving the drops
   out of the transaction help? It would be correct (once a table is found
   obsolete, it would still be found obsolete after a rollback and retry),
   but it would take quite a bit of refactoring, so I'd only do it if it's
   likely to help significantly.

Thanks,
Jan

-- 
 Jan 'Bulb' Hudec 