Re: [sqlite] quickly insert multiple rows into multiple table

2010-07-12 Thread Eric Smith
W.-H.  Gu wrote: 

> For (1), after I wrapped inserts into a transaction, I face an issue: 
> the cpu usage is too high.  I think the reason is that I use prepare 
> statement to insert a row at a time, it than does sqlite3_step for 
> thousands times every second.  

If I'm reading this right, you're saying you're re-preparing the 
statement on every row.  Don't do that.  Prepare it once, and only 
re-bind the variables for each insert.  Then finalize the statement 
after your loop.  

I gather that statement preparation is a non-trivial percentage of 
the compute time taken to insert a single row.

> Every second I insert 9184 rows, 

Your optimal insertion rates will vary strongly depending on your 
hardware, your operating system, the efficiency of data generation 
between calls to the sqlite api, amount of data you're inserting per 
row, amount of RAM sqlite is allowed to use, any indices, triggers, 
check constraints, unique constraints, primary keys, and foreign keys 
you have defined, compile time flags, and dozens of other variables.

I'm no SQL or SQLite expert myself -- the gurus might add many items 
to that list off the tops of their heads.

We have no way of knowing what number you should be shooting for.  

> which leads to cpu usage ~30%.  
> 
> Is there any good way to resolve this issue?  

Obviously there are many variables here.  Why do you think you want low 
CPU usage?  

Generally, I think you *want* to be CPU-bound for the duration of your 
insert batch.  Anything else implies you're doing something besides 
useful computation (like waiting for disk i/o (due to a cache spill or
one of many other possible reasons), or scheduled behind some other 
process, or something).

At commit time, sqlite will write the new rows back to the disk, by 
default blocking til the OS tells SQLite that the write is complete 
(though this is configurable).  During this time you expect cpu usage 
to go down while disk usage goes up.  

Another point is that you need to make sure the OS is really telling you
what you think it's telling you.  E.g. a common mistake for a Linux user
is to think that the load average is the average CPU usage over some 
period of time.  It isn't.

Eric 

-- 
Eric A. Smith

One of my most productive days was throwing away 1000 lines of code.
-- Ken Thompson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] quickly insert multiple rows into multiple table

2010-07-12 Thread W.-H. Gu
Hi,

   For (1), after I wrapped inserts into a transaction, I face an issue:
the cpu usage is too high. I think the reason is that I use prepare
statement to insert a row at a time, it than does sqlite3_step for thousands
times every second.

   Every second I insert 9184 rows, which leads to cpu usage ~30%.

   Is there any good way to resolve this issue?

  Thanks,

   WeiHsin

On Mon, Jul 12, 2010 at 10:48 AM, Kees Nuyt  wrote:

> 1On Mon, 12 Jul 2010 09:29:35 -0700, "W.-H. Gu"
>  wrote:
>
> >Hi,
> >
> >  I have 5 tables. Every second I want to insert 1 row into the main
> table,
> >and multiple rows into every other table. What I do right now is to
> prepare
> >5 statements for those 5 tables, where the statements for multiple row
> >insertion is like
> >
> >  INSERT INTO MyTable (FirstCol, SecondCol)
> >  SELECT 'First' ,1
> >  UNION ALL
> >  SELECT 'Second' ,2
> >  UNION ALL
> >  SELECT 'Third' ,3
> >  UNION ALL
> >  SELECT 'Fourth' ,4
> >  UNION ALL
> >  SELECT 'Fifth' ,5
> >
> >  My questions are:
> >
> >  (1) Is the above statement for multiple row insertion efficient? What is
> >the best approach?
>
> That is a very complicated way to combine multiple insert
> statements.
>
> Just wrap the INSERTs (several thousands of them is not a
> problem) into a transaction:
>
> BEGIN EXCLUSIVE TRANSACTION;
> INSERT INTO MyTable (FirstCol, SecondCol)
>   VALUES ('First' ,1);
> INSERT INTO MyTable (FirstCol, SecondCol)
>   VALUES ('Second' ,2);
> INSERT INTO MyTable (FirstCol, SecondCol)
>   VALUES ('Third' ,3);
> INSERT INTO MyTable (FirstCol, SecondCol)
>   VALUES ('Fourth' ,4);
> INSERT INTO MyTable (FirstCol, SecondCol)
>  VALUES ('Fifth' ,5);
> COMMIT;
>
> >  (2) Should I need to bind parameters every time when inserting? Or is
> >there a way to bind pointers so that I just need to do sqlite3_step every
> >time when I insert rows?
>
> I think you have to bind. You make it sound like a lot of
> work, but you only have to write the code once
>
> >  (3) Is there a way to speed up the whole process? For example, disable
> >index management first and enable it again after bulk insert, or one
> prepare
> >statement for multiple row multiple table insertion, etc..
>
> a) Normalize redundancy out of the schema.
> b) Wrap many inserts in a transaction
> c) Before bulk load, you can indeed DROP indexes and CREATE
> them again later to get some speed advantage
> d) sort the data in primary key order before inserting
> e) Give the database as much cache as you can (but not more
> than the estimated database size)
> f) fast hardware (especially disks)
>
> I think there are a few more suggestions in the wiki on
> http://www.sqlite.org
> --
>  (  Kees Nuyt
>  )
> c[_]
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] quickly insert multiple rows into multiple table

2010-07-12 Thread Kees Nuyt
1On Mon, 12 Jul 2010 09:29:35 -0700, "W.-H. Gu"
 wrote:

>Hi,
>
>  I have 5 tables. Every second I want to insert 1 row into the main table,
>and multiple rows into every other table. What I do right now is to prepare
>5 statements for those 5 tables, where the statements for multiple row
>insertion is like
>
>  INSERT INTO MyTable (FirstCol, SecondCol)
>  SELECT 'First' ,1
>  UNION ALL
>  SELECT 'Second' ,2
>  UNION ALL
>  SELECT 'Third' ,3
>  UNION ALL
>  SELECT 'Fourth' ,4
>  UNION ALL
>  SELECT 'Fifth' ,5
>
>  My questions are:
>
>  (1) Is the above statement for multiple row insertion efficient? What is
>the best approach?

That is a very complicated way to combine multiple insert
statements.

Just wrap the INSERTs (several thousands of them is not a
problem) into a transaction:

BEGIN EXCLUSIVE TRANSACTION;
INSERT INTO MyTable (FirstCol, SecondCol)
  VALUES ('First' ,1);
INSERT INTO MyTable (FirstCol, SecondCol)
  VALUES ('Second' ,2);
INSERT INTO MyTable (FirstCol, SecondCol)
  VALUES ('Third' ,3);
INSERT INTO MyTable (FirstCol, SecondCol)
  VALUES ('Fourth' ,4);
INSERT INTO MyTable (FirstCol, SecondCol)
 VALUES ('Fifth' ,5);
COMMIT;

>  (2) Should I need to bind parameters every time when inserting? Or is
>there a way to bind pointers so that I just need to do sqlite3_step every
>time when I insert rows?

I think you have to bind. You make it sound like a lot of
work, but you only have to write the code once

>  (3) Is there a way to speed up the whole process? For example, disable
>index management first and enable it again after bulk insert, or one prepare
>statement for multiple row multiple table insertion, etc..

a) Normalize redundancy out of the schema.
b) Wrap many inserts in a transaction
c) Before bulk load, you can indeed DROP indexes and CREATE
them again later to get some speed advantage
d) sort the data in primary key order before inserting
e) Give the database as much cache as you can (but not more
than the estimated database size)
f) fast hardware (especially disks)

I think there are a few more suggestions in the wiki on
http://www.sqlite.org
-- 
  (  Kees Nuyt
  )
c[_]
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users