Re: [sqlite] performance regression when using "insert or replace"
On Thu, Jan 24, 2013 at 10:44 PM, Heiles, Katrinawrote: > I am curious why versions prior to 3.6.19 don't enter the if > (subjRequiresPage(pPg) ) in pager_write, but ever since 3.6.19 they do. > The change was needed to fix this bug: http://www.sqlite.org/src/info/4a03ed -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] performance regression when using "insert or replace"
That makes sense. Thanks for the explanation. Unfortunately when I run the more extensive unit test that is built into our system, I still see about a 10% regression (with temp_store=memory and any version >= 3.6.19). This is much better than the 25% regression I was getting in the extensive test when running with the default temp_store value. I am curious why versions prior to 3.6.19 don't enter the if (subjRequiresPage(pPg) ) in pager_write, but ever since 3.6.19 they do. (for my insert or replace unit tests). Are there any options to avoid this? I have tried replacing my "insert or replace" with "delete" then "insert", which is almost as fast as the older code, so that is one option for me. thanks againm, katrina -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Dan Kennedy Sent: Thursday, January 24, 2013 4:44 AM To: sqlite-users@sqlite.org Subject: Re: [sqlite] performance regression when using "insert or replace" On 01/24/2013 02:48 AM, Heiles, Katrina wrote: > Thank you so much!! This is great news. > > One question: according to the tempfiles.html doc... Section 3.0 > states that "The rollback journal, master journal, and statement > journal files are always written to disk. But the other kinds of > temporary files might be stored in memory only and never written to > disk." > > Section 2.3 states: "the statement journal is also omitted if an > alternative conflict resolution algorithm is used." I think both of those statements are out of date. Rollback and master journal files (as well as *-wal files) are always created on disk, as these are required by database recovery in the event of an application crash or system (power) failure. But these days statement journals may be stored in memory if SQLite is configured with "PRAGMA temp_store=memory" or equivalent. > -- > > > /* If the statement journal is open and the page is not in it, > ** then write the current page to the statement journal. Note that > ** the statement journal format differs from the standard journal > format ** in that it omits the checksums and the header. */ if( > subjRequiresPage(pPg) ){ rc = subjournalPage(pPg); } > -- > > > Doesn't going into this "if" statement mean that a statement journal > is getting created? If so, isn't a statement journal always written to > disk (based on above snipet from section 3)? Yes and no, respectively. See function openSubJournal() in pager.c. Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] performance regression when using "insert or replace"
On 01/24/2013 03:25 AM, Teg wrote: I use "insert or replace" heavily. In debug mode, I set it to use temp file on disk so, I can watch the disk IO, in release mode I set it to temp file in memory. The danger is that if you ever do anything that requires a bunch of temp file, you can easily run out of RAM. Adding and deleting indexes for example while temp store is set to memory will run you out of address space on a larger DB and a 32 bit system (windows). VACUUM can also create large temporary files (since it uses a temp file to construct the new database image before copying it over the top of the old). Other commands (other than CREATE INDEX and VACUUM) *usually* don't use too much temp space. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] performance regression when using "insert or replace"
On 01/24/2013 02:48 AM, Heiles, Katrina wrote: Thank you so much!! This is great news. One question: according to the tempfiles.html doc... Section 3.0 states that "The rollback journal, master journal, and statement journal files are always written to disk. But the other kinds of temporary files might be stored in memory only and never written to disk." Section 2.3 states: "the statement journal is also omitted if an alternative conflict resolution algorithm is used." I think both of those statements are out of date. Rollback and master journal files (as well as *-wal files) are always created on disk, as these are required by database recovery in the event of an application crash or system (power) failure. But these days statement journals may be stored in memory if SQLite is configured with "PRAGMA temp_store=memory" or equivalent. -- /* If the statement journal is open and the page is not in it, ** then write the current page to the statement journal. Note that ** the statement journal format differs from the standard journal format ** in that it omits the checksums and the header. */ if( subjRequiresPage(pPg) ){ rc = subjournalPage(pPg); } -- Doesn't going into this "if" statement mean that a statement journal is getting created? If so, isn't a statement journal always written to disk (based on above snipet from section 3)? Yes and no, respectively. See function openSubJournal() in pager.c. Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] performance regression when using "insert or replace"
I use "insert or replace" heavily. In debug mode, I set it to use temp file on disk so, I can watch the disk IO, in release mode I set it to temp file in memory. The danger is that if you ever do anything that requires a bunch of temp file, you can easily run out of RAM. Adding and deleting indexes for example while temp store is set to memory will run you out of address space on a larger DB and a 32 bit system (windows). In my testing, "insert or replace" uses temp store most of the time. Wednesday, January 23, 2013, 9:16:42 AM, you wrote: DK> On 01/23/2013 04:20 AM, Heiles, Katrina wrote: >> I am in the process of updating from 3.6.4 to the latest and greatest >> version (finally :-) ). >> While running performance unit tests, i found a major regression (10K/sec vs >> 30k/sec) >> on a common use case for us. The regression occurs when using "insert or >> replace". >> >> I narrowed this down as follows: >> - the regression was introduced in version 3.6.19, and has been there ever >> since. >> - If i comment out line 74643 in sqlite3.c of version 3.6.19, performance is >> good again. >> >> QUESTION: I realize that commenting this line out is not the correct >> solution. >> Is there any way to improve performance of "insert or replace"? DK> Out of interest, is performance improved any with 3.7.15.2 if you DK> execute "PRAGMA temp_store = memory" before calling _do_batch_insert()? DK> Thanks, DK> Dan. >> >> >> I am including 2 other attachments: >> >> 1. test_batch_insert.c >>- this is the unit test i created to reproduce the issue. It output >> 2. notes.txt >>- this contains the performance output from running test_batch_insert.c >> on 3 different versions of sqlite >>- 3.6.18 (fast) >>- 3.6.19 (slow) >>- 3.6.19 with line 74643 commented out (fast) >> >> >> Below are detailed (but cryptic) notes on what causes the issue. Please let >> me know if you need more info. >> >> system: >> Linux 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:59:20 EDT 2006 x86_64 x86_64 >> x86_64 GNU/Linux >> >> NOTE: all line number references are based on 3_6_19 version of sqlite3.c, >> where the regression was first introduced. >> >> ROOT CAUSE: >> If I remove one line of code from sqlite3.c (line 74643) that was added in >> this release, performance >> regression is resolved. >> --- >>74640 default: { >>74641 Trigger *pTrigger = 0; >>74642 assert( onError==OE_Replace ); >>74643 sqlite3MultiWrite(pParse);<--- THIS IS THE GUILTY >> LINE >>74644 if( pParse->db->flags_RecTriggers ){ >>74645 pTrigger = sqlite3TriggersExist(pParse, pTab, TK_DELETE, >> 0, 0); >>74646 } >>74647 sqlite3GenerateRowDelete( >>74648 pParse, pTab, baseCur, regR, 0, pTrigger, OE_Replace >>74649 ); >>74650 seenReplace = 1; >>74651 break; >>74652 } >> --- >> >> >> DETAILS OF WHY THIS LINE CAUSES THE REGRESSION: >> The effect of including line 74643 when running a batch insert: >> >> HIGH LEVEL: >> in pager_write function, we end up going into this if statement, >> which creates a >> subjournal, causing creation of etilqs_xxx file. >> NOTE: using the attached test_batch_insert.c file with max_entries set to >> 4, this >>results in creation of 15 etilqs_xxx and 262122 MORE writes!! >> --- >>35536 /* If the statement journal is open and the page is not in >> it, >>35537 ** then write the current page to the statement journal. >> Note that >>35538 ** the statement journal format differs from the standard >> journal format >>35539 ** in that it omits the checksums and the header. >>35540 */ >>35541 if( subjRequiresPage(pPg) ){ >>35542 rc = subjournalPage(pPg);< we create a subjournal >>35543 } >>35544 } >> --- >> >> >> LOWER LEVEL DETAILS OF WHY subJRequiresPage IS CALLED WHEN LINE 74643 IS IN >> THE CODE >> when running sample test_batch_insert.c file (included in this bug) >> The reason that subjRequiresPage() returns true is that when >> sqlite3GenerateConstraintChecks is called from sqlite3Insert, >> this calls the guilty line,(74643)sqlite3MultiWrite(pParse); >> >> - This sets pParse->isMultiWrite to 1 >> - then sqlite3FinishCoding calls sqlite3VdbeMakeReady with >> pParse->isMultiWrite =1 >>causing 3rd param of call to be TRUE. >> - This causes
Re: [sqlite] performance regression when using "insert or replace"
Thank you so much!! This is great news. One question: according to the tempfiles.html doc... Section 3.0 states that "The rollback journal, master journal, and statement journal files are always written to disk. But the other kinds of temporary files might be stored in memory only and never written to disk." Section 2.3 states: "the statement journal is also omitted if an alternative conflict resolution algorithm is used." When I debug my test, I enter the if statement below (at the bottom of the pager_write() function). With pragma temp_store=file (the default), it creates temporary files, which is much slower, and with pramga temp_store=memory, it is fast. -- /* If the statement journal is open and the page is not in it, ** then write the current page to the statement journal. Note that ** the statement journal format differs from the standard journal format ** in that it omits the checksums and the header. */ if( subjRequiresPage(pPg) ){ rc = subjournalPage(pPg); } -- My question is: Doesn't going into this "if" statement mean that a statement journal is getting created? If so, isn't a statement journal always written to disk (based on above snipet from section 3). Also, why is a statement journal created at all here (since based on section 2.3, it should be omitted because I'm using an alternative conflict resolution algorithm (insert or replace). thanks SO MUCH for your help! katrina -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Dan Kennedy Sent: Wednesday, January 23, 2013 11:51 AM To: sqlite-users@sqlite.org Subject: Re: [sqlite] performance regression when using "insert or replace" On 01/23/2013 11:04 PM, Heiles, Katrina wrote: > Hi Dan, > Yes, this resolves the problem. performance comes back up to 31K/sec. > > What are the risks of using this as a workaround? Data integrity is > very important to us so I'm curious what effect this pragma would have. No effect on data integrity or durability. SQLite uses temporary files for various reasons - statement journals, temporary tables (those created with CREATE TEMP TABLE), to materialize views and sub-queries when required, that sort of thing. If you set "PRAGMA temp_store=memory", then it uses malloc'd memory instead of temporary files for these things. See also: http://www.sqlite.org/tempfiles.html Dan. > > thanks, katrina >> Out of interest, is performance improved any with 3.7.15.2 if you >> execute "PRAGMA temp_store = memory" before calling _do_batch_insert()? >> >> Thanks, >> Dan. >> > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] performance regression when using "insert or replace"
On 01/23/2013 11:04 PM, Heiles, Katrina wrote: Hi Dan, Yes, this resolves the problem. performance comes back up to 31K/sec. What are the risks of using this as a workaround? Data integrity is very important to us so I'm curious what effect this pragma would have. No effect on data integrity or durability. SQLite uses temporary files for various reasons - statement journals, temporary tables (those created with CREATE TEMP TABLE), to materialize views and sub-queries when required, that sort of thing. If you set "PRAGMA temp_store=memory", then it uses malloc'd memory instead of temporary files for these things. See also: http://www.sqlite.org/tempfiles.html Dan. thanks, katrina Out of interest, is performance improved any with 3.7.15.2 if you execute "PRAGMA temp_store = memory" before calling _do_batch_insert()? Thanks, Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] performance regression when using "insert or replace"
Hi Dan, Yes, this resolves the problem. performance comes back up to 31K/sec. What are the risks of using this as a workaround? Data integrity is very important to us so I'm curious what effect this pragma would have. thanks, katrina > Out of interest, is performance improved any with 3.7.15.2 if you > execute "PRAGMA temp_store = memory" before calling _do_batch_insert()? > > Thanks, > Dan. > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] performance regression when using "insert or replace"
On 01/23/2013 04:20 AM, Heiles, Katrina wrote: I am in the process of updating from 3.6.4 to the latest and greatest version (finally :-) ). While running performance unit tests, i found a major regression (10K/sec vs 30k/sec) on a common use case for us. The regression occurs when using "insert or replace". I narrowed this down as follows: - the regression was introduced in version 3.6.19, and has been there ever since. - If i comment out line 74643 in sqlite3.c of version 3.6.19, performance is good again. QUESTION: I realize that commenting this line out is not the correct solution. Is there any way to improve performance of "insert or replace"? Out of interest, is performance improved any with 3.7.15.2 if you execute "PRAGMA temp_store = memory" before calling _do_batch_insert()? Thanks, Dan. I am including 2 other attachments: 1. test_batch_insert.c - this is the unit test i created to reproduce the issue. It output 2. notes.txt - this contains the performance output from running test_batch_insert.c on 3 different versions of sqlite - 3.6.18 (fast) - 3.6.19 (slow) - 3.6.19 with line 74643 commented out (fast) Below are detailed (but cryptic) notes on what causes the issue. Please let me know if you need more info. system: Linux 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:59:20 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux NOTE: all line number references are based on 3_6_19 version of sqlite3.c, where the regression was first introduced. ROOT CAUSE: If I remove one line of code from sqlite3.c (line 74643) that was added in this release, performance regression is resolved. --- 74640 default: { 74641 Trigger *pTrigger = 0; 74642 assert( onError==OE_Replace ); 74643 sqlite3MultiWrite(pParse);<--- THIS IS THE GUILTY LINE 74644 if( pParse->db->flags_RecTriggers ){ 74645 pTrigger = sqlite3TriggersExist(pParse, pTab, TK_DELETE, 0, 0); 74646 } 74647 sqlite3GenerateRowDelete( 74648 pParse, pTab, baseCur, regR, 0, pTrigger, OE_Replace 74649 ); 74650 seenReplace = 1; 74651 break; 74652 } --- DETAILS OF WHY THIS LINE CAUSES THE REGRESSION: The effect of including line 74643 when running a batch insert: HIGH LEVEL: in pager_write function, we end up going into this if statement, which creates a subjournal, causing creation of etilqs_xxx file. NOTE: using the attached test_batch_insert.c file with max_entries set to 4, this results in creation of 15 etilqs_xxx and 262122 MORE writes!! --- 35536 /* If the statement journal is open and the page is not in it, 35537 ** then write the current page to the statement journal. Note that 35538 ** the statement journal format differs from the standard journal format 35539 ** in that it omits the checksums and the header. 35540 */ 35541 if( subjRequiresPage(pPg) ){ 35542 rc = subjournalPage(pPg);< we create a subjournal 35543 } 35544 } --- LOWER LEVEL DETAILS OF WHY subJRequiresPage IS CALLED WHEN LINE 74643 IS IN THE CODE when running sample test_batch_insert.c file (included in this bug) The reason that subjRequiresPage() returns true is that when sqlite3GenerateConstraintChecks is called from sqlite3Insert, this calls the guilty line,(74643)sqlite3MultiWrite(pParse); - This sets pParse->isMultiWrite to 1 - then sqlite3FinishCoding calls sqlite3VdbeMakeReady with pParse->isMultiWrite =1 causing 3rd param of call to be TRUE. - This causes Vdbe.usesStmtJournal to be set to TRUE in sqliteVdbeMakeReady - Then sqlite3BtreeBeginStmt is called from sqlite3VdbeExec (case OP_Transaction:) here, p->iStatement is set to 1 because p->usesStmtJournal is 1 54698 if( pOp->p2&& p->usesStmtJournal<--- we go INTO this if statement 54699&& (db->autoCommit==0 || db->activeVdbeCnt>1) 54700 ){ 54701 assert( sqlite3BtreeIsInTrans(u.as.pBt) ); 54702 if( p->iStatement==0 ){ 54703 assert( db->nStatement>=0&& db->nSavepoint>=0 ); 54704 db->nStatement++;< this sets to 1, causing next line to set iStatement to 1 54705 p->iStatement = db->nSavepoint + db->nStatement; 54706 } 54707 rc =
[sqlite] performance regression when using "insert or replace"
I am in the process of updating from 3.6.4 to the latest and greatest version (finally :-) ). While running performance unit tests, i found a major regression (10K/sec vs 30k/sec) on a common use case for us. The regression occurs when using "insert or replace". I narrowed this down as follows: - the regression was introduced in version 3.6.19, and has been there ever since. - If i comment out line 74643 in sqlite3.c of version 3.6.19, performance is good again. QUESTION: I realize that commenting this line out is not the correct solution. Is there any way to improve performance of "insert or replace"? I am including 2 other attachments: 1. test_batch_insert.c - this is the unit test i created to reproduce the issue. It output 2. notes.txt - this contains the performance output from running test_batch_insert.c on 3 different versions of sqlite - 3.6.18 (fast) - 3.6.19 (slow) - 3.6.19 with line 74643 commented out (fast) Below are detailed (but cryptic) notes on what causes the issue. Please let me know if you need more info. system: Linux 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:59:20 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux NOTE: all line number references are based on 3_6_19 version of sqlite3.c, where the regression was first introduced. ROOT CAUSE: If I remove one line of code from sqlite3.c (line 74643) that was added in this release, performance regression is resolved. --- 74640 default: { 74641 Trigger *pTrigger = 0; 74642 assert( onError==OE_Replace ); 74643 sqlite3MultiWrite(pParse); <--- THIS IS THE GUILTY LINE 74644 if( pParse->db->flags_RecTriggers ){ 74645 pTrigger = sqlite3TriggersExist(pParse, pTab, TK_DELETE, 0, 0); 74646 } 74647 sqlite3GenerateRowDelete( 74648 pParse, pTab, baseCur, regR, 0, pTrigger, OE_Replace 74649 ); 74650 seenReplace = 1; 74651 break; 74652 } --- DETAILS OF WHY THIS LINE CAUSES THE REGRESSION: The effect of including line 74643 when running a batch insert: HIGH LEVEL: in pager_write function, we end up going into this if statement, which creates a subjournal, causing creation of etilqs_xxx file. NOTE: using the attached test_batch_insert.c file with max_entries set to 4, this results in creation of 15 etilqs_xxx and 262122 MORE writes!! --- 35536 /* If the statement journal is open and the page is not in it, 35537 ** then write the current page to the statement journal. Note that 35538 ** the statement journal format differs from the standard journal format 35539 ** in that it omits the checksums and the header. 35540 */ 35541 if( subjRequiresPage(pPg) ){ 35542 rc = subjournalPage(pPg); < we create a subjournal 35543 } 35544 } --- LOWER LEVEL DETAILS OF WHY subJRequiresPage IS CALLED WHEN LINE 74643 IS IN THE CODE when running sample test_batch_insert.c file (included in this bug) The reason that subjRequiresPage() returns true is that when sqlite3GenerateConstraintChecks is called from sqlite3Insert, this calls the guilty line,(74643)sqlite3MultiWrite(pParse); - This sets pParse->isMultiWrite to 1 - then sqlite3FinishCoding calls sqlite3VdbeMakeReady with pParse->isMultiWrite =1 causing 3rd param of call to be TRUE. - This causes Vdbe.usesStmtJournal to be set to TRUE in sqliteVdbeMakeReady - Then sqlite3BtreeBeginStmt is called from sqlite3VdbeExec (case OP_Transaction:) here, p->iStatement is set to 1 because p->usesStmtJournal is 1 54698 if( pOp->p2 && p->usesStmtJournal <--- we go INTO this if statement 54699 && (db->autoCommit==0 || db->activeVdbeCnt>1) 54700 ){ 54701 assert( sqlite3BtreeIsInTrans(u.as.pBt) ); 54702 if( p->iStatement==0 ){ 54703 assert( db->nStatement>=0 && db->nSavepoint>=0 ); 54704 db->nStatement++;< this sets to 1, causing next line to set iStatement to 1 54705 p->iStatement = db->nSavepoint + db->nStatement; 54706 } 54707 rc = sqlite3BtreeBeginStmt(u.as.pBt, p->iStatement); - sqlite3BtreeBeginStmt calls sqlite3PagerOpenSavepoint using iStatement as 2nd parameter, therefore nSavepoint is set to 1 - eventually we call sqlite3BtreeInsert which calls insertCell which