Re: [sqlite] Importing from single-insert-statement SQL dump is 61 times slower than importing from SQL dump with one statement per row
Clemens Ladisch writes:
> Sarge Borsch wrote:
>> time xzdec something.sql.xz | sqlite3 something.db
>
> This measures only xzdec; it does not catch anything that sqlite3 does
> after xzdec has finished and closed the pipe.

Nitpick: in bash or zsh, `time` is handled by the shell and measures both sides of the pipe (in dash it is not, but dash is not intended for interactive use).

>> IMO sqlite needs some optimisation for this case when there’s a huge
>> INSERT statement, because the speed difference is enormous.
>
> The huge statement must be completely parsed, and compiled into a huge
> VDBE program. I don't see how SQLite could do anything different.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Importing from single-insert-statement SQL dump is 61 times slower than importing from SQL dump with one statement per row
Looking at the generated program for a multi-tuple INSERT INTO ... VALUES yields the following structure:

Only 1 tuple (standard case)

2 tuples

Each further tuple adds a segment (1 opcode per field + data) and a Yield opcode to the generated VDBE program, and an equivalent amount to the parse tree, all of which will tend to decrease locality and increase the required working set of pages in memory.

-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
On Behalf Of Keith Medcalf
Sent: Wednesday, 31 May 2017 04:45
To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Subject: Re: [sqlite] Importing from single-insert-statement SQL dump is 61 times slower than importing from SQL dump with one statement per row

On Tuesday, 30 May, 2017 10:33, R Smith <rsm...@rsweb.co.za> said:

> Keith, I think the OP meant he inserted the values using one single
> statement, not one single transaction, as in he did one ginormous
> INSERT INTO t(v1, v2...) VALUES (1, 2),(2,3),(..),(1, 297829872);
> - 180MB or so worth... Probably lots of data in few rows,
> because he is not hitting any SQLite statement limits - unless those
> were disabled.
>
> Meaning that it is probably just one VDBE program. That's unless I am
> reading wrong or assuming wrong from the original post - which is
> always possible.

You are correct. I modified the dump so that it would execute one insert per table, with all the data in a single insert statement. It is far slower than even inserting each row in its own transaction (autocommit). Of course, that means that some of the insert statements are approaching several hundred megabytes long. It works, but it is very slow. Most of the time is spent in the parser, as one would expect.

> On 2017/05/30 6:19 PM, Keith Medcalf wrote:
> > I find quite the opposite.
> > Using a DUMP file to create a database where
> > the first test uses the standard dump (which does the load in a single
> > transaction) is enormously faster than the second one, where the BEGIN
> > TRANSACTION and COMMIT have been commented out, and thus each insert
> > is performed in its own transaction. The structure does have the
> > indexes created while loading ...
> >
> >> wc -l xmltv.sql
> > 5425040 xmltv.sql
> >
> > 2017-05-30 09:43  446,700,424 xmltv.sql
> >
> > With the commands all being loaded in a single transaction:
> >
> >> timethis sqlite test1.db < xmltv.sql
> > TimeThis :  Command Line :  sqlite test1.db
> > TimeThis :    Start Time :  Tue May 30 09:41:12 2017
> > TimeThis :      End Time :  Tue May 30 09:42:14 2017
> > TimeThis :  Elapsed Time :  00:01:02.005
> >
> > With the commands being in their own individual autocommit transactions:
> > --- still running after 10 minutes
> > --- still running after 20 minutes
> > --- gave up after 30 minutes (and it was only 1/100th of the way through
> > inserting all the data)
> >
> > The fact of the matter is that you are much better off running larger
> > transactions than smaller ones. The hardware limits the transaction
> > rate (max 60 transactions/second on spinning rust -- unless you have
> > lying hardware that does not flush properly). Apparently somewhat
> > limited on a machine with a 5GB/s SSD as well... since the write rate
> > was hugely less than the channel limit, but the CPU was fully consumed.
> >
> > And just to be clear, putting multiple statements in a transaction
> > does not mean they are executed as a single VDBE program. They are still
> > executed one at a time. The only difference is the number of
> > transactions (and hence the commit/flush to disk count).
___
Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at
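Gunter's observation that each tuple adds a fixed segment of opcodes to the VDBE program can be checked empirically. A minimal sketch using Python's bundled sqlite3 module (the table and helper name are illustrative, not from the thread): EXPLAIN returns the compiled program, one row per opcode, without executing the statement.

```python
import sqlite3

# Count VDBE opcodes in the compiled program for a multi-row
# INSERT ... VALUES as the number of tuples grows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(a, b)")

def program_length(n_tuples):
    values = ", ".join(["(1, 2)"] * n_tuples)
    # EXPLAIN yields one row per VDBE opcode; nothing is inserted.
    return len(con.execute("EXPLAIN INSERT INTO t VALUES " + values).fetchall())

for n in (1, 2, 3, 10):
    print(n, program_length(n))
```

Each added tuple should lengthen the program by a roughly constant number of opcodes (the per-field segment plus the Yield), matching the structure described above.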
Re: [sqlite] Importing from single-insert-statement SQL dump is 61 times slower than importing from SQL dump with one statement per row
On Tuesday, 30 May, 2017 10:33, R Smith said:

> Keith, I think the OP meant he inserted the values using one single
> statement, not one single transaction, as in he did one ginormous
> INSERT INTO t(v1, v2...) VALUES (1, 2),(2,3),(..),(1, 297829872);
> - 180MB or so worth... Probably lots of data in few rows, because he
> is not hitting any SQLite statement limits - unless those were disabled.
>
> Meaning that it is probably just one VDBE program. That's unless I am
> reading wrong or assuming wrong from the original post - which is always
> possible.

You are correct. I modified the dump so that it would execute one insert per table, with all the data in a single insert statement. It is far slower than even inserting each row in its own transaction (autocommit). Of course, that means that some of the insert statements are approaching several hundred megabytes long. It works, but it is very slow. Most of the time is spent in the parser, as one would expect.

> On 2017/05/30 6:19 PM, Keith Medcalf wrote:
> > I find quite the opposite. Using a DUMP file to create a database where
> > the first test uses the standard dump (which does the load in a single
> > transaction) is enormously faster than the second one, where the BEGIN
> > TRANSACTION and COMMIT have been commented out, and thus each insert is
> > performed in its own transaction. The structure does have the indexes
> > created while loading ...
> >
> >> wc -l xmltv.sql
> > 5425040 xmltv.sql
> >
> > 2017-05-30 09:43  446,700,424 xmltv.sql
> >
> > With the commands all being loaded in a single transaction:
> >
> >> timethis sqlite test1.db < xmltv.sql
> > TimeThis :  Command Line :  sqlite test1.db
> > TimeThis :    Start Time :  Tue May 30 09:41:12 2017
> > TimeThis :      End Time :  Tue May 30 09:42:14 2017
> > TimeThis :  Elapsed Time :  00:01:02.005
> >
> > With the commands being in their own individual autocommit transactions:
> > --- still running after 10 minutes
> > --- still running after 20 minutes
> > --- gave up after 30 minutes (and it was only 1/100th of the way through
> > inserting all the data)
> >
> > The fact of the matter is that you are much better off running larger
> > transactions than smaller ones. The hardware limits the transaction rate
> > (max 60 transactions/second on spinning rust -- unless you have lying
> > hardware that does not flush properly). Apparently somewhat limited on a
> > machine with a 5GB/s SSD as well... since the write rate was hugely less
> > than the channel limit, but the CPU was fully consumed.
> >
> > And just to be clear, putting multiple statements in a transaction does
> > not mean they are executed as a single VDBE program. They are still
> > executed one at a time. The only difference is the number of transactions
> > (and hence the commit/flush to disk count).
Re: [sqlite] Importing from single-insert-statement SQL dump is 61 times slower than importing from SQL dump with one statement per row
Keith, I think the OP meant he inserted the values using one single statement, not one single transaction, as in he did one ginormous INSERT INTO t(v1, v2...) VALUES (1, 2),(2,3),(..),(1, 297829872); - 180MB or so worth... Probably lots of data in few rows, because he is not hitting any SQLite statement limits - unless those were disabled.

Meaning that it is probably just one VDBE program. That's unless I am reading wrong or assuming wrong from the original post - which is always possible.

On 2017/05/30 6:19 PM, Keith Medcalf wrote:
> I find quite the opposite. Using a DUMP file to create a database where
> the first test uses the standard dump (which does the load in a single
> transaction) is enormously faster than the second one, where the BEGIN
> TRANSACTION and COMMIT have been commented out, and thus each insert is
> performed in its own transaction. The structure does have the indexes
> created while loading ...
>
>> wc -l xmltv.sql
> 5425040 xmltv.sql
>
> 2017-05-30 09:43  446,700,424 xmltv.sql
>
> With the commands all being loaded in a single transaction:
>
>> timethis sqlite test1.db < xmltv.sql
> TimeThis :  Command Line :  sqlite test1.db
> TimeThis :    Start Time :  Tue May 30 09:41:12 2017
> TimeThis :      End Time :  Tue May 30 09:42:14 2017
> TimeThis :  Elapsed Time :  00:01:02.005
>
> With the commands being in their own individual autocommit transactions:
> --- still running after 10 minutes
> --- still running after 20 minutes
> --- gave up after 30 minutes (and it was only 1/100th of the way through
> inserting all the data)
>
> The fact of the matter is that you are much better off running larger
> transactions than smaller ones. The hardware limits the transaction rate
> (max 60 transactions/second on spinning rust -- unless you have lying
> hardware that does not flush properly). Apparently somewhat limited on a
> machine with a 5GB/s SSD as well... since the write rate was hugely less
> than the channel limit, but the CPU was fully consumed.
> And just to be clear, putting multiple statements in a transaction does
> not mean they are executed as a single VDBE program. They are still
> executed one at a time. The only difference is the number of transactions
> (and hence the commit/flush to disk count).
Re: [sqlite] Importing from single-insert-statement SQL dump is 61 times slower than importing from SQL dump with one statement per row
On 2017/05/30 2:01 PM, Hick Gunter wrote:
> If you stuff all 18MB of your data into a single INSERT, then SQLite will
> need to generate a single program that contains all 18MB of your data
> (plus code to build rows out of that). This will put a heavy strain on
> memory requirements and offset any speed you hope to gain. The SOP is to
> put many (1000 magnitude) INSERT statements into one transaction to save
> disk IO on commit.

Correct, and let me just add: the /compressed/ size is 18MB of fairly compressible statements, so the real data may well be 180MB or more. This can take quite some time to build a query on.

Out of interest, Sarge, did you try this on MySQL or Postgres too? What was the result?
Re: [sqlite] Importing from single-insert-statement SQL dump is 61 times slower than importing from SQL dump with one statement per row
I find quite the opposite. Using a DUMP file to create a database where the first test uses the standard dump (which does the load in a single transaction) is enormously faster than the second one, where the BEGIN TRANSACTION and COMMIT have been commented out, and thus each insert is performed in its own transaction. The structure does have the indexes created while loading ...

>wc -l xmltv.sql
5425040 xmltv.sql

2017-05-30 09:43  446,700,424 xmltv.sql

With the commands all being loaded in a single transaction:

>timethis sqlite test1.db < xmltv.sql
TimeThis :  Command Line :  sqlite test1.db
TimeThis :    Start Time :  Tue May 30 09:41:12 2017
TimeThis :      End Time :  Tue May 30 09:42:14 2017
TimeThis :  Elapsed Time :  00:01:02.005

With the commands being in their own individual autocommit transactions:
--- still running after 10 minutes
--- still running after 20 minutes
--- gave up after 30 minutes (and it was only 1/100th of the way through inserting all the data)

The fact of the matter is that you are much better off running larger transactions than smaller ones. The hardware limits the transaction rate (max 60 transactions/second on spinning rust -- unless you have lying hardware that does not flush properly). Apparently somewhat limited on a machine with a 5GB/s SSD as well... since the write rate was hugely less than the channel limit, but the CPU was fully consumed.

And just to be clear, putting multiple statements in a transaction does not mean they are executed as a single VDBE program. They are still executed one at a time. The only difference is the number of transactions (and hence the commit/flush to disk count).
--
˙uʍop-ǝpısdn sı ɹoʇıuoɯ ɹnoʎ 'sıɥʇ pɐǝɹ uɐɔ noʎ ɟı

> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Sarge Borsch
> Sent: Sunday, 28 May, 2017 04:58
> To: sqlite-users@mailinglists.sqlite.org
> Subject: [sqlite] Importing from single-insert-statement SQL dump is 61
> times slower than importing from SQL dump with one statement per row
>
> I compared speed of importing (into an empty SQLite DB) from 2 kinds of
> SQL dumps. Data is exactly the same in both cases, and the xz-compressed
> size of the SQL dump is near 18MB in both cases.
> The first SQL dump has a single big INSERT statement in a single
> transaction. The second SQL dump has one INSERT statement for each row.
>
> The schema is 1 table with these columns: INTEGER PRIMARY KEY, TEXT x2
> (2 columns), INTEGER x4, TEXT x10.
> There is nothing else besides the data, no indexes, etc. in both cases.
>
> In both cases I am importing from a compressed file, using a command
> like this:
> time xzdec something.sql.xz | sqlite3 something.db
>
> Time of import from the single-insert SQL dump:
> real 2m13.884s
> user 2m13.791s
> sys  0m1.052s
>
> Time of import from the multiple-insert SQL dump:
> real 0m2.192s
> user 0m3.266s
> sys  0m0.347s
>
> IMO sqlite needs some optimisation for this case when there’s a huge
> INSERT statement, because the speed difference is enormous.
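The per-commit cost Keith describes can be reproduced at small scale. A hedged sketch (file name and row count are made up for illustration) using Python's sqlite3 module against an on-disk database, so that each commit actually reaches the disk:

```python
import os
import sqlite3
import tempfile
import time

# Time N inserts as N autocommit transactions versus one explicit
# transaction. An in-memory DB would hide the flush cost, so a
# file-backed database is used.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path, isolation_level=None)  # manual transaction control
con.execute("CREATE TABLE t(x)")
N = 200

t0 = time.perf_counter()
for i in range(N):
    con.execute("INSERT INTO t VALUES (?)", (i,))  # one commit (flush) per row
autocommit_secs = time.perf_counter() - t0

t0 = time.perf_counter()
con.execute("BEGIN")
for i in range(N):
    con.execute("INSERT INTO t VALUES (?)", (i,))
con.execute("COMMIT")                              # one commit for all rows
single_tx_secs = time.perf_counter() - t0

print(f"autocommit: {autocommit_secs:.3f}s  single transaction: {single_tx_secs:.3f}s")
```

On spinning disks the autocommit loop is bounded by the flush rate, not by CPU, which is the effect measured above.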
Re: [sqlite] Importing from single-insert-statement SQL dump is 61 times slower than importing from SQL dump with one statement per row
If you stuff all 18MB of your data into a single INSERT, then SQLite will need to generate a single program that contains all 18MB of your data (plus code to build rows out of that). This will put a heavy strain on memory requirements and offset any speed you hope to gain. The SOP is to put many (1000 magnitude) INSERT statements into one transaction to save disk IO on commit.

-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
On Behalf Of Sarge Borsch
Sent: Sunday, 28 May 2017 12:58
To: sqlite-users@mailinglists.sqlite.org
Subject: [sqlite] Importing from single-insert-statement SQL dump is 61 times slower than importing from SQL dump with one statement per row

I compared speed of importing (into an empty SQLite DB) from 2 kinds of SQL dumps. Data is exactly the same in both cases, and the xz-compressed size of the SQL dump is near 18MB in both cases.
The first SQL dump has a single big INSERT statement in a single transaction. The second SQL dump has one INSERT statement for each row.

The schema is 1 table with these columns: INTEGER PRIMARY KEY, TEXT x2 (2 columns), INTEGER x4, TEXT x10.
There is nothing else besides the data, no indexes, etc. in both cases.

In both cases I am importing from a compressed file, using a command like this:
time xzdec something.sql.xz | sqlite3 something.db

Time of import from the single-insert SQL dump:
real 2m13.884s
user 2m13.791s
sys  0m1.052s

Time of import from the multiple-insert SQL dump:
real 0m2.192s
user 0m3.266s
sys  0m0.347s

IMO sqlite needs some optimisation for this case when there’s a huge INSERT statement, because the speed difference is enormous.
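The "many INSERTs per transaction" SOP Gunter describes might look like this in Python's sqlite3 module (the helper name and batch size are illustrative, not from the post):

```python
import sqlite3
from itertools import islice

# Group row inserts into batches of ~1000 per transaction, so the
# commit (and its disk flush) happens once per batch instead of once
# per row.
def batched_insert(con, rows, batch_size=1000):
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        with con:  # opens a transaction; commits on clean exit
            con.executemany("INSERT INTO t(v) VALUES (?)", batch)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(v)")
batched_insert(con, ((i,) for i in range(2500)))
print(con.execute("SELECT count(*) FROM t").fetchone()[0])  # prints 2500
```

Each statement is still parsed and executed individually; only the transaction boundaries change, which is exactly the distinction Keith draws later in the thread.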
Re: [sqlite] Importing from single-insert-statement SQL dump is 61 times slower than importing from SQL dump with one statement per row
Sarge Borsch wrote:
> time xzdec something.sql.xz | sqlite3 something.db

This measures only xzdec; it does not catch anything that sqlite3 does after xzdec has finished and closed the pipe.

> IMO sqlite needs some optimisation for this case when there’s a huge
> INSERT statement, because the speed difference is enormous.

The huge statement must be completely parsed, and compiled into a huge VDBE program. I don't see how SQLite could do anything different.

Regards,
Clemens
[sqlite] Importing from single-insert-statement SQL dump is 61 times slower than importing from SQL dump with one statement per row
I compared speed of importing (into an empty SQLite DB) from 2 kinds of SQL dumps. Data is exactly the same in both cases, and the xz-compressed size of the SQL dump is near 18MB in both cases.
The first SQL dump has a single big INSERT statement in a single transaction. The second SQL dump has one INSERT statement for each row.

The schema is 1 table with these columns: INTEGER PRIMARY KEY, TEXT x2 (2 columns), INTEGER x4, TEXT x10.
There is nothing else besides the data, no indexes, etc. in both cases.

In both cases I am importing from a compressed file, using a command like this:
time xzdec something.sql.xz | sqlite3 something.db

Time of import from the single-insert SQL dump:
real 2m13.884s
user 2m13.791s
sys  0m1.052s

Time of import from the multiple-insert SQL dump:
real 0m2.192s
user 0m3.266s
sys  0m0.347s

IMO sqlite needs some optimisation for this case when there’s a huge INSERT statement, because the speed difference is enormous.
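The comparison in this post can be sketched at toy scale with Python's sqlite3 module: the same rows fed once as a single multi-row INSERT and once as one INSERT per row. The row count and schema here are made up for illustration and will not reproduce the 61x ratio of the 180MB dump; the point is only the mechanism.

```python
import sqlite3
import time

# Build two SQL scripts carrying identical data: one giant multi-row
# INSERT, and one INSERT statement per row wrapped in a transaction.
# N is kept small (and under the historical 500-row compound limit of
# older SQLite builds) so the sketch runs anywhere.
N = 450
one_big = ("INSERT INTO t VALUES " +
           ", ".join(f"({i}, 'name{i}')" for i in range(N)) + ";")
per_row = ("BEGIN;\n" +
           "\n".join(f"INSERT INTO t VALUES ({i}, 'name{i}');" for i in range(N)) +
           "\nCOMMIT;")

def run(script):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, name TEXT)")
    t0 = time.perf_counter()
    con.executescript(script)  # parse + execute the whole script
    return time.perf_counter() - t0, con

big_secs, con_big = run(one_big)
row_secs, con_row = run(per_row)
print(f"one big INSERT: {big_secs:.4f}s  one INSERT per row: {row_secs:.4f}s")
```

At dump scale the single statement must be parsed and compiled into one enormous VDBE program before the first row lands, which is where the replies in this thread locate the cost.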