Re: [sqlite] Importing from single-insert-statement SQL dump is 61 times slower than importing from SQL dump with one statement per row

2017-06-04 Thread Yuriy M. Kaminskiy
Clemens Ladisch  writes:

> Sarge Borsch wrote:
>> time xzdec something.sql.xz | sqlite3 something.db
>
> This measures only xzdec; it does not catch anything that sqlite3 does
> after xzdec has finished and closed the pipe.

Nitpick:
In bash or zsh, `time` is handled by the shell and measures both sides of the
pipe (in dash it is not, but dash is not intended for interactive use).
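
For illustration, a rough sketch (assuming bash or zsh, and reusing the OP's placeholder file names): the whole pipe can be timed, or the sqlite3 side can be isolated:

  # bash/zsh: the `time` keyword covers the entire pipeline
  time xzdec something.sql.xz | sqlite3 something.db

  # time only the sqlite3 side (decompression still runs, but is not measured)
  xzdec something.sql.xz | { time sqlite3 something.db; }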

>> IMO sqlite needs some optimisation for this case when there’s a huge
>> INSERT statement, because the speed difference is enormous.
>
> The huge statement must be completely parsed, and compiled into a huge
> VDBE program.  I don't see how SQLite could do anything different.



Re: [sqlite] Importing from single-insert-statement SQL dump is 61 times slower than importing from SQL dump with one statement per row

2017-05-31 Thread Hick Gunter
Looking at the generated program for a multi-tuple INSERT INTO ... VALUES yields
the following structure:

Only 1 tuple (standard case)
  [opcode listing not preserved in the archive]

2 tuples
  [opcode listing not preserved in the archive]
Each further tuple adds a segment (one opcode per field, plus the data) and a
Yield opcode to the generated VDBE program, and an equivalent amount to the
parse tree, all of which tends to decrease locality and increase the required
working set of pages in memory.
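
As a quick way to reproduce this (a sketch against a hypothetical two-column table t), EXPLAIN in the sqlite3 shell shows the per-tuple growth of the program:

  sqlite3 :memory: "CREATE TABLE t(a,b); EXPLAIN INSERT INTO t(a,b) VALUES (1,2);"
  sqlite3 :memory: "CREATE TABLE t(a,b); EXPLAIN INSERT INTO t(a,b) VALUES (1,2),(3,4);"

Comparing the two listings shows the extra per-row opcodes and the Yield back into the insertion loop.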

-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
On Behalf Of Keith Medcalf
Sent: Wednesday, 31 May 2017 04:45
To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Subject: Re: [sqlite] Importing from single-insert-statement SQL dump is 61
times slower than importing from SQL dump with one statement per row

On Tuesday, 30 May, 2017 10:33, R Smith <rsm...@rsweb.co.za> said:

> Keith, I think the OP meant he inserted the values using one single
> statement, not one single transaction, as in he did one ginormous
> INSERT INTO t(v1, v2...) VALUES (1, 2),(2,3),(..),,(1,
> 297829872); - 180MB or so worth... Probably lots of data in few rows,
> because he is not hitting any SQLite statement limits - unless those
> were disabled.
>
> Meaning that it is probably just one VDBE program. That's unless I am
> reading wrong or assuming wrong from the original post - which is
> always possible.

You are correct.  I modified the dump so that it would execute one insert per 
table with all the data in the single insert statement.  It is far slower than 
even inserting each row in its own transaction (autocommit).  Of course, that 
means that some of the insert statements are approaching several hundred 
megabytes long.  It works, but it is very slow.  Most of the time is spent in 
the parser as one would expect.

> On 2017/05/30 6:19 PM, Keith Medcalf wrote:
> > I find quite the opposite.  Using a DUMP file to create a database
> > where
> the first test uses the standard dump (which does the load in a single
> transaction) is enormously faster than the second one, where the BEGIN
> TRANSACTION and COMMIT have been commented out, and thus each insert
> is performed in its own transaction.  The structure does have the
> indexes created while loading ...
> >
> >> wc -l xmltv.sql
> >   5425040 xmltv.sql
> >
> > 2017-05-30  09:43   446,700,424 xmltv.sql
> >
> > With the commands all being loaded in a single transaction:
> >
> >> timethis sqlite test1.db < xmltv.sql
> > TimeThis :  Command Line :  sqlite test1.db
> > TimeThis :    Start Time :  Tue May 30 09:41:12 2017
> > TimeThis :  End Time :  Tue May 30 09:42:14 2017
> > TimeThis :  Elapsed Time :  00:01:02.005
> >
> >
> > With the commands being in their own individual autocommit transactions:
> >   --- still running after 10 minutes
> >   --- still running after 20 minutes
> >   --- gave up after 30 minutes (and it was only 1/100th through
> inserting all the data)
> >
> >
> > The fact of the matter is that you are much better off running larger
> transactions than smaller ones.  The hardware limits the transaction
> rate (max 60 transactions/second on spinning rust -- unless you have
> lying hardware that does not flush properly).  Apparently somewhat
> limited on a machine with a 5GB/s SSD as well...since the write rate
> was hugely less than the channel limit, but the CPU was fully consumed.
> >
> > And just to be clear, putting multiple statements in a transaction
> > does
> not mean they are executed as a single VDBE program.  They are still
> executed one at a time.  The only difference is the number of
> transactions (and hence the commit/flush to disk count).
> >
>









Re: [sqlite] Importing from single-insert-statement SQL dump is 61 times slower than importing from SQL dump with one statement per row

2017-05-30 Thread Keith Medcalf
On Tuesday, 30 May, 2017 10:33, R Smith  said:

> Keith, I think the OP meant he inserted the values using one single
> statement, not one single transaction, as in he did one ginormous INSERT
> INTO t(v1, v2...) VALUES (1, 2),(2,3),(..),,(1,
> 297829872); - 180MB or so worth... Probably lots of data in few rows,
> because he is not hitting any SQLite statement limits - unless those
> were disabled.
> 
> Meaning that it is probably just one VDBE program. That's unless I am
> reading wrong or assuming wrong from the original post - which is always
> possible.

You are correct.  I modified the dump so that it would execute one insert per 
table with all the data in the single insert statement.  It is far slower than 
even inserting each row in its own transaction (autocommit).  Of course, that 
means that some of the insert statements are approaching several hundred 
megabytes long.  It works, but it is very slow.  Most of the time is spent in 
the parser as one would expect.

> On 2017/05/30 6:19 PM, Keith Medcalf wrote:
> > I find quite the opposite.  Using a DUMP file to create a database where
> the first test uses the standard dump (which does the load in a single
> transaction) is enormously faster than the second one, where the BEGIN
> TRANSACTION and COMMIT have been commented out, and thus each insert is
> performed in its own transaction.  The structure does have the indexes
> created while loading ...
> >
> >> wc -l xmltv.sql
> >   5425040 xmltv.sql
> >
> > 2017-05-30  09:43   446,700,424 xmltv.sql
> >
> > With the commands all being loaded in a single transaction:
> >
> >> timethis sqlite test1.db < xmltv.sql
> > TimeThis :  Command Line :  sqlite test1.db
> > TimeThis :    Start Time :  Tue May 30 09:41:12 2017
> > TimeThis :  End Time :  Tue May 30 09:42:14 2017
> > TimeThis :  Elapsed Time :  00:01:02.005
> >
> >
> > With the commands being in their own individual autocommit transactions:
> >   --- still running after 10 minutes
> >   --- still running after 20 minutes
> >   --- gave up after 30 minutes (and it was only 1/100th through
> inserting all the data)
> >
> >
> > The fact of the matter is that you are much better off running larger
> transactions than smaller ones.  The hardware limits the transaction rate
> (max 60 transactions/second on spinning rust -- unless you have lying
> hardware that does not flush properly).  Apparently somewhat limited on a
> machine with a 5GB/s SSD as well...since the write rate was hugely less
> than the channel limit, but the CPU was fully consumed.
> >
> > And just to be clear, putting multiple statements in a transaction does
> not mean they are executed as a single VDBE program.  They are still
> executed one at a time.  The only difference is the number of transactions
> (and hence the commit/flush to disk count).
> >
> 





Re: [sqlite] Importing from single-insert-statement SQL dump is 61 times slower than importing from SQL dump with one statement per row

2017-05-30 Thread R Smith
Keith, I think the OP meant he inserted the values using one single 
statement, not one single transaction, as in he did one ginormous INSERT 
INTO t(v1, v2...) VALUES (1, 2),(2,3),(..),,(1, 
297829872); - 180MB or so worth... Probably lots of data in few rows, 
because he is not hitting any SQLite statement limits - unless those 
were disabled.


Meaning that it is probably just one VDBE program. That's unless I am 
reading wrong or assuming wrong from the original post - which is always 
possible.



On 2017/05/30 6:19 PM, Keith Medcalf wrote:

I find quite the opposite.  Using a DUMP file to create a database where the 
first test uses the standard dump (which does the load in a single transaction) 
is enormously faster than the second one, where the BEGIN TRANSACTION and 
COMMIT have been commented out, and thus each insert is performed in its own 
transaction.  The structure does have the indexes created while loading ...


wc -l xmltv.sql

  5425040 xmltv.sql

2017-05-30  09:43   446,700,424 xmltv.sql

With the commands all being loaded in a single transaction:


timethis sqlite test1.db < xmltv.sql

TimeThis :  Command Line :  sqlite test1.db
TimeThis :    Start Time :  Tue May 30 09:41:12 2017
TimeThis :  End Time :  Tue May 30 09:42:14 2017
TimeThis :  Elapsed Time :  00:01:02.005


With the commands being in their own individual autocommit transactions:
  --- still running after 10 minutes
  --- still running after 20 minutes
  --- gave up after 30 minutes (and it was only 1/100th through inserting all 
the data)


The fact of the matter is that you are much better off running larger transactions
than smaller ones.  The hardware limits the transaction rate (max 60 
transactions/second on spinning rust -- unless you have lying hardware that 
does not flush properly).  Apparently somewhat limited on a machine with a 
5GB/s SSD as well...since the write rate was hugely less than the channel 
limit, but the CPU was fully consumed.

And just to be clear, putting multiple statements in a transaction does not 
mean they are executed as a single VDBE program.  They are still executed one 
at a time.  The only difference is the number of transactions (and hence the
commit/flush to disk count).





Re: [sqlite] Importing from single-insert-statement SQL dump is 61 times slower than importing from SQL dump with one statement per row

2017-05-30 Thread R Smith

On 2017/05/30 2:01 PM, Hick Gunter wrote:

If you stuff all 18MB of your data into a single INSERT, then SQLite will need
to generate a single program that contains all 18MB of your data (plus code to
build rows out of that). This will put a heavy strain on memory requirements
and offset any speed you hope to gain.

The SOP is to put many (on the order of 1,000) INSERT statements into one
transaction to save disk I/O on commit.


Correct, and let me just add, the /compressed/ size is 18MB of fairly 
compressible statements, so the real data may well be 180MB or more. 
This can take quite some time to build a query on.


Out of interest Sarge, did you try this on MySQL or Postgres too? What 
was the result?





Re: [sqlite] Importing from single-insert-statement SQL dump is 61 times slower than importing from SQL dump with one statement per row

2017-05-30 Thread Keith Medcalf

I find quite the opposite.  Using a DUMP file to create a database where the 
first test uses the standard dump (which does the load in a single transaction) 
is enormously faster than the second one, where the BEGIN TRANSACTION and 
COMMIT have been commented out, and thus each insert is performed in its own 
transaction.  The structure does have the indexes created while loading ...

>wc -l xmltv.sql
 5425040 xmltv.sql

2017-05-30  09:43   446,700,424 xmltv.sql

With the commands all being loaded in a single transaction:

>timethis sqlite test1.db < xmltv.sql

TimeThis :  Command Line :  sqlite test1.db
TimeThis :    Start Time :  Tue May 30 09:41:12 2017
TimeThis :  End Time :  Tue May 30 09:42:14 2017
TimeThis :  Elapsed Time :  00:01:02.005


With the commands being in their own individual autocommit transactions:
 --- still running after 10 minutes
 --- still running after 20 minutes
 --- gave up after 30 minutes (and it was only 1/100th through inserting all 
the data)


The fact of the matter is that you are much better off running larger transactions
than smaller ones.  The hardware limits the transaction rate (max 60 
transactions/second on spinning rust -- unless you have lying hardware that 
does not flush properly).  Apparently somewhat limited on a machine with a 
5GB/s SSD as well...since the write rate was hugely less than the channel 
limit, but the CPU was fully consumed.

And just to be clear, putting multiple statements in a transaction does not 
mean they are executed as a single VDBE program.  They are still executed one 
at a time.  The only difference is the number of transactions (and hence the
commit/flush to disk count).
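
If the dump at hand lacks the BEGIN/COMMIT wrapper, a quick way to get the single-transaction behaviour without editing the file (a sketch, assuming a POSIX shell and the sqlite3 CLI rather than the timethis/sqlite invocation above) is to wrap it on the fly:

  ( echo 'BEGIN TRANSACTION;'; cat xmltv.sql; echo 'COMMIT;' ) | sqlite3 test1.db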

-- 
˙uʍop-ǝpısdn sı ɹoʇıuoɯ ɹnoʎ 'sıɥʇ pɐǝɹ uɐɔ noʎ ɟı

> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Sarge Borsch
> Sent: Sunday, 28 May, 2017 04:58
> To: sqlite-users@mailinglists.sqlite.org
> Subject: [sqlite] Importing from single-insert-statement SQL dump is 61
> times slower than importing from SQL dump with one statement per row
> 
> I compared speed of importing (into an empty SQLite DB) from 2 kinds of
> SQL dumps. Data is exactly the same in both cases, and xz-compressed size
> of SQL dump is near 18MB in both cases.
> First SQL dump has single big INSERT statement in single transaction.
> Second SQL dump has one INSERT statement for each row.
> 
> Schema is 1 table with these columns: INTEGER PRIMARY KEY, TEXT x2 (2
> columns), INTEGER x4, TEXT x10
> There is nothing else besides the data, no indexes, etc. in both cases.
> 
> In both cases I am importing from compressed file, using command like
> this:
> time xzdec something.sql.xz | sqlite3 something.db
> 
> Time of import from single-insert SQL dump:
> real  2m13.884s
> user  2m13.791s
> sys   0m1.052s
> 
> Time of import from multiple-insert SQL dump:
> real  0m2.192s
> user  0m3.266s
> sys   0m0.347s
> 
> IMO sqlite needs some optimisation for this case when there’s a huge
> INSERT statement, because the speed difference is enormous.





Re: [sqlite] Importing from single-insert-statement SQL dump is 61 times slower than importing from SQL dump with one statement per row

2017-05-30 Thread Hick Gunter
If you stuff all 18MB of your data into a single INSERT, then SQLite will need
to generate a single program that contains all 18MB of your data (plus code to
build rows out of that). This will put a heavy strain on memory requirements
and offset any speed you hope to gain.

The SOP is to put many (on the order of 1,000) INSERT statements into one
transaction to save disk I/O on commit.
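
As a sketch of what such a dump looks like (hypothetical table t, placeholder values):

  BEGIN TRANSACTION;
  INSERT INTO t VALUES (1, 'a');
  INSERT INTO t VALUES (2, 'b');
  -- ... roughly 1,000 INSERTs per batch ...
  COMMIT;
  BEGIN TRANSACTION;
  -- ... next batch ...
  COMMIT;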

-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
On Behalf Of Sarge Borsch
Sent: Sunday, 28 May 2017 12:58
To: sqlite-users@mailinglists.sqlite.org
Subject: [sqlite] Importing from single-insert-statement SQL dump is 61 times
slower than importing from SQL dump with one statement per row

I compared speed of importing (into an empty SQLite DB) from 2 kinds of SQL 
dumps. Data is exactly the same in both cases, and xz-compressed size of SQL 
dump is near 18MB in both cases.
First SQL dump has single big INSERT statement in single transaction.
Second SQL dump has one INSERT statement for each row.

Schema is 1 table with these columns: INTEGER PRIMARY KEY, TEXT x2 (2 columns),
INTEGER x4, TEXT x10.
There is nothing else besides the data, no indexes, etc. in both cases.

In both cases I am importing from compressed file, using command like this:
time xzdec something.sql.xz | sqlite3 something.db

Time of import from single-insert SQL dump:
real    2m13.884s
user    2m13.791s
sys     0m1.052s

Time of import from multiple-insert SQL dump:
real    0m2.192s
user    0m3.266s
sys     0m0.347s

IMO sqlite needs some optimisation for this case when there’s a huge INSERT 
statement, because the speed difference is enormous.






Re: [sqlite] Importing from single-insert-statement SQL dump is 61 times slower than importing from SQL dump with one statement per row

2017-05-30 Thread Clemens Ladisch
Sarge Borsch wrote:
> time xzdec something.sql.xz | sqlite3 something.db

This measures only xzdec; it does not catch anything that sqlite3 does
after xzdec has finished and closed the pipe.

> IMO sqlite needs some optimisation for this case when there’s a huge
> INSERT statement, because the speed difference is enormous.

The huge statement must be completely parsed, and compiled into a huge
VDBE program.  I don't see how SQLite could do anything different.
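
A rough way to see that growth (a sketch, using a hypothetical table t): the opcode count reported by EXPLAIN scales with the number of tuples in the VALUES list, e.g.

  sqlite3 :memory: "CREATE TABLE t(a,b); EXPLAIN INSERT INTO t VALUES (1,2),(3,4),(5,6);" | wc -l

Repeating with more tuples shows the program, and hence the parse/compile time and memory, growing roughly in proportion to the data.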


Regards,
Clemens


[sqlite] Importing from single-insert-statement SQL dump is 61 times slower than importing from SQL dump with one statement per row

2017-05-30 Thread Sarge Borsch
I compared speed of importing (into an empty SQLite DB) from 2 kinds of SQL 
dumps. Data is exactly the same in both cases, and xz-compressed size of SQL 
dump is near 18MB in both cases.
First SQL dump has single big INSERT statement in single transaction.
Second SQL dump has one INSERT statement for each row.

Schema is 1 table with these columns: INTEGER PRIMARY KEY, TEXT x2 (2 columns), 
INTEGER x4, TEXT x10
There is nothing else besides the data, no indexes, etc. in both cases.

In both cases I am importing from compressed file, using command like this:
time xzdec something.sql.xz | sqlite3 something.db

Time of import from single-insert SQL dump:
real    2m13.884s
user    2m13.791s
sys     0m1.052s

Time of import from multiple-insert SQL dump:
real    0m2.192s
user    0m3.266s
sys     0m0.347s

IMO sqlite needs some optimisation for this case when there’s a huge INSERT 
statement, because the speed difference is enormous.