Re: [sqlite] performance regression when using "insert or replace"

2013-01-24 Thread Richard Hipp
On Thu, Jan 24, 2013 at 10:44 PM, Heiles, Katrina wrote:

> I am curious why versions prior to 3.6.19 don't enter the if
> (subjRequiresPage(pPg) ) in pager_write, but ever since 3.6.19 they do.
>

The change was needed to fix this bug:
http://www.sqlite.org/src/info/4a03ed

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] performance regression when using "insert or replace"

2013-01-24 Thread Heiles, Katrina
That makes sense.  Thanks for the explanation.

Unfortunately when I run the more extensive unit test that is built into our 
system, I still
see about a 10% regression (with temp_store=memory and any version >= 3.6.19).  
This is much 
better than the 25% regression I was getting in the extensive test when running 
with the default 
temp_store value.
I am curious why versions prior to 3.6.19 don't enter the if 
(subjRequiresPage(pPg) ) in pager_write, but ever since 3.6.19 they do.  (for 
my insert or replace unit tests).

Are there any options to avoid this? I have tried replacing my "insert or 
replace" with 
"delete" then "insert", which is almost as fast as the older code, so that is 
one option for me.


thanks againm, katrina



-Original Message-
From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] 
On Behalf Of Dan Kennedy
Sent: Thursday, January 24, 2013 4:44 AM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] performance regression when using "insert or replace"

On 01/24/2013 02:48 AM, Heiles, Katrina wrote:
> Thank you so much!!  This is great news.
>
> One question: according to the tempfiles.html doc... Section 3.0 
> states that "The rollback journal, master journal, and statement 
> journal files are always written to disk. But the other kinds of 
> temporary files might be stored in memory only and never written to 
> disk."
>
> Section 2.3 states: "the statement journal is also omitted if an 
> alternative conflict resolution algorithm is used."

I think both of those statements are out of date. Rollback and master journal 
files (as well as *-wal files) are always created on disk, as these are 
required by database recovery in the event of an application crash or system 
(power) failure. But these days statement journals may be stored in memory if 
SQLite is configured with "PRAGMA temp_store=memory" or equivalent.

> --
> 
>
>
/* If the statement journal is open and the page is not in it,
> ** then write the current page to the statement journal.  Note that
> ** the statement journal format differs from the standard journal 
> format ** in that it omits the checksums and the header. */ if(
> subjRequiresPage(pPg) ){ rc = subjournalPage(pPg); }
> --
> 
>
> Doesn't going into this "if" statement mean that a statement journal 
> is getting created? If so, isn't a statement journal always written to 
> disk (based on above snipet from section 3)?

Yes and no, respectively. See function openSubJournal() in pager.c.

Dan.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] performance regression when using "insert or replace"

2013-01-24 Thread Dan Kennedy

On 01/24/2013 03:25 AM, Teg wrote:



I use "insert or replace" heavily. In debug mode, I set it to use temp
file on disk so, I can watch the disk IO, in release mode I set it to
temp file in memory.

The danger is that if you ever do anything that requires a bunch of
temp file, you can easily run out of RAM. Adding and deleting indexes
for example while temp store is set to memory will run you out of
address space on a larger DB and a 32 bit system (windows).


VACUUM can also create large temporary files (since it uses a temp file
to construct the new database image before copying it over the top of
the old).

Other commands (other than CREATE INDEX and VACUUM) *usually* don't use
too much temp space.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] performance regression when using "insert or replace"

2013-01-24 Thread Dan Kennedy

On 01/24/2013 02:48 AM, Heiles, Katrina wrote:

Thank you so much!!  This is great news.

One question: according to the tempfiles.html doc... Section 3.0
states that "The rollback journal, master journal, and statement
journal files are always written to disk. But the other kinds of
temporary files might be stored in memory only and never written to
disk."

Section 2.3 states: "the statement journal is also omitted if an
alternative conflict resolution algorithm is used."


I think both of those statements are out of date. Rollback and
master journal files (as well as *-wal files) are always created
on disk, as these are required by database recovery in the event
of an application crash or system (power) failure. But these days
statement journals may be stored in memory if SQLite is configured
with "PRAGMA temp_store=memory" or equivalent.


--



/* If the statement journal is open and the page is not in it,

** then write the current page to the statement journal.  Note that
** the statement journal format differs from the standard journal
format ** in that it omits the checksums and the header. */ if(
subjRequiresPage(pPg) ){ rc = subjournalPage(pPg); }
--

Doesn't going into this "if" statement mean that a statement journal
is getting created? If so, isn't a statement journal always written
to disk (based on above snipet from section 3)?


Yes and no, respectively. See function openSubJournal() in pager.c.

Dan.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] performance regression when using "insert or replace"

2013-01-23 Thread Teg


I use "insert or replace" heavily. In debug mode, I set it to use temp
file on disk so, I can watch the disk IO, in release mode I set it to
temp file in memory.

The danger is that if you ever do anything that requires a bunch of
temp file, you can easily run out of RAM. Adding and deleting indexes
for example while temp store is set to memory will run you out of
address space on a larger DB and a 32 bit system (windows).

In  my  testing, "insert or replace" uses temp store most of the time.



Wednesday, January 23, 2013, 9:16:42 AM, you wrote:

DK> On 01/23/2013 04:20 AM, Heiles, Katrina wrote:
>> I am in the process of updating from 3.6.4 to the latest and greatest 
>> version (finally :-) ).
>> While running performance unit tests, i found a major regression (10K/sec vs 
>> 30k/sec)
>> on a common use case for us.  The regression occurs when using "insert or 
>> replace".
>>
>> I narrowed this down as follows:
>> - the regression was introduced in version 3.6.19, and has been there ever 
>> since.
>> - If i comment out line 74643 in sqlite3.c of version 3.6.19, performance is 
>> good again.
>>
>> QUESTION:  I realize that commenting this line out is not the correct 
>> solution.
>> Is there any way to improve performance of "insert or replace"?

DK> Out of interest, is performance improved any with 3.7.15.2 if you
DK> execute "PRAGMA temp_store = memory" before calling _do_batch_insert()?

DK> Thanks,
DK> Dan.




>>
>>
>> I am including 2 other attachments:
>> 
>> 1. test_batch_insert.c
>>- this is the unit test i created to reproduce the issue.  It output
>> 2. notes.txt
>>- this contains the performance output from running test_batch_insert.c
>>  on 3 different versions of sqlite
>>- 3.6.18  (fast)
>>- 3.6.19  (slow)
>>- 3.6.19 with line 74643 commented out  (fast)
>>
>>
>> Below are detailed (but cryptic) notes on what causes the issue.  Please let 
>> me know if you need more info.
>>
>> system:
>> Linux 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:59:20 EDT 2006 x86_64 x86_64 
>> x86_64 GNU/Linux
>>
>> NOTE: all line number references are based on 3_6_19 version of sqlite3.c,
>> where the regression was first introduced.
>>
>> ROOT CAUSE:
>> If I remove one line of code from sqlite3.c (line 74643) that was added in 
>> this release, performance
>> regression is resolved.
>> ---
>>74640   default: {
>>74641 Trigger *pTrigger = 0;
>>74642 assert( onError==OE_Replace );
>>74643 sqlite3MultiWrite(pParse);<--- THIS IS THE GUILTY 
>> LINE
>>74644 if( pParse->db->flags_RecTriggers ){
>>74645   pTrigger = sqlite3TriggersExist(pParse, pTab, TK_DELETE, 
>> 0, 0);
>>74646 }
>>74647 sqlite3GenerateRowDelete(
>>74648 pParse, pTab, baseCur, regR, 0, pTrigger, OE_Replace
>>74649 );
>>74650 seenReplace = 1;
>>74651 break;
>>74652   }
>> ---
>>
>>
>> DETAILS OF WHY THIS LINE CAUSES THE REGRESSION:
>> The effect of including line 74643 when running a batch insert:
>>
>> HIGH LEVEL:
>> in pager_write function, we  end up going into this if statement, 
>> which creates a
>> subjournal, causing creation of etilqs_xxx file.
>> NOTE: using the attached test_batch_insert.c file with max_entries set to 
>> 4, this
>>results in creation of 15 etilqs_xxx  and 262122 MORE writes!!
>> ---
>>35536 /* If the statement journal is open and the page is not in 
>> it,
>>35537 ** then write the current page to the statement journal.  
>> Note that
>>35538 ** the statement journal format differs from the standard 
>> journal format
>>35539 ** in that it omits the checksums and the header.
>>35540 */
>>35541 if( subjRequiresPage(pPg) ){
>>35542   rc = subjournalPage(pPg);< we create a subjournal
>>35543 }
>>35544   }
>> ---
>>
>>
>> LOWER LEVEL DETAILS OF WHY subJRequiresPage IS CALLED WHEN LINE 74643 IS IN 
>> THE CODE
>> when running sample test_batch_insert.c file (included in this bug)
>> The reason that subjRequiresPage() returns true is that when
>> sqlite3GenerateConstraintChecks is called from sqlite3Insert,
>> this calls the guilty line,(74643)sqlite3MultiWrite(pParse);
>>
>>  - This sets pParse->isMultiWrite to 1
>>  - then sqlite3FinishCoding calls  sqlite3VdbeMakeReady with 
>> pParse->isMultiWrite =1
>>causing 3rd param of call to be TRUE.
>>  - This causes 

Re: [sqlite] performance regression when using "insert or replace"

2013-01-23 Thread Heiles, Katrina
Thank you so much!!  This is great news.

One question: according to the tempfiles.html doc... 
Section 3.0 states that "The rollback journal, master journal, and statement 
journal files are always written to disk. But the other kinds of temporary 
files might be stored in memory only and never written to disk."

Section 2.3 states:
"the statement journal is also omitted if an alternative conflict resolution 
algorithm is used."

When I debug my test, I enter the if statement below (at the bottom of the 
pager_write() function).
With pragma temp_store=file (the default), it creates temporary files, which is 
much slower, and with
pramga temp_store=memory, it is fast.  
--
/* If the statement journal is open and the page is not in it,
** then write the current page to the statement journal.  Note that
** the statement journal format differs from the standard journal format
** in that it omits the checksums and the header.
*/
if( subjRequiresPage(pPg) ){
  rc = subjournalPage(pPg);
}
--
My question is: 
Doesn't going into this "if" statement mean that a statement journal is getting 
created?
If so, isn't a statement journal always written to disk (based on above snipet 
from section 3).
Also, why is a statement journal created at all here (since based on section 
2.3, it should be omitted
because I'm using an alternative conflict resolution algorithm (insert or 
replace).

thanks SO MUCH for your help!
katrina


-Original Message-
From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] 
On Behalf Of Dan Kennedy
Sent: Wednesday, January 23, 2013 11:51 AM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] performance regression when using "insert or replace"

On 01/23/2013 11:04 PM, Heiles, Katrina wrote:
> Hi Dan,
> Yes, this resolves the problem.  performance comes back up to 31K/sec.
>
> What are the risks of using this as a workaround?  Data integrity is 
> very important to us so I'm curious what effect this pragma would have.

No effect on data integrity or durability.

SQLite uses temporary files for various reasons - statement journals, temporary 
tables (those created with CREATE TEMP TABLE), to materialize views and 
sub-queries when required, that sort of thing. If you set "PRAGMA 
temp_store=memory", then it uses malloc'd memory instead of temporary files for 
these things.

See also:

   http://www.sqlite.org/tempfiles.html

Dan.




>
> thanks, katrina
>> Out of interest, is performance improved any with 3.7.15.2 if you 
>> execute "PRAGMA temp_store = memory" before calling _do_batch_insert()?
>>
>> Thanks,
>> Dan.
>>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] performance regression when using "insert or replace"

2013-01-23 Thread Dan Kennedy

On 01/23/2013 11:04 PM, Heiles, Katrina wrote:

Hi Dan,
Yes, this resolves the problem.  performance comes back up to 31K/sec.

What are the risks of using this as a workaround?  Data integrity is very 
important to us so I'm curious what effect
this pragma would have.


No effect on data integrity or durability.

SQLite uses temporary files for various reasons - statement journals,
temporary tables (those created with CREATE TEMP TABLE), to materialize
views and sub-queries when required, that sort of thing. If you set
"PRAGMA temp_store=memory", then it uses malloc'd memory instead of
temporary files for these things.

See also:

  http://www.sqlite.org/tempfiles.html

Dan.






thanks, katrina

Out of interest, is performance improved any with 3.7.15.2 if you
execute "PRAGMA temp_store = memory" before calling _do_batch_insert()?

Thanks,
Dan.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] performance regression when using "insert or replace"

2013-01-23 Thread Heiles, Katrina
Hi Dan,
Yes, this resolves the problem.  performance comes back up to 31K/sec.

What are the risks of using this as a workaround?  Data integrity is very 
important to us so I'm curious what effect
this pragma would have.

thanks, katrina
> Out of interest, is performance improved any with 3.7.15.2 if you
> execute "PRAGMA temp_store = memory" before calling _do_batch_insert()?
>
> Thanks,
> Dan.
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] performance regression when using "insert or replace"

2013-01-23 Thread Dan Kennedy

On 01/23/2013 04:20 AM, Heiles, Katrina wrote:

I am in the process of updating from 3.6.4 to the latest and greatest version 
(finally :-) ).
While running performance unit tests, i found a major regression (10K/sec vs 
30k/sec)
on a common use case for us.  The regression occurs when using "insert or 
replace".

I narrowed this down as follows:
- the regression was introduced in version 3.6.19, and has been there ever 
since.
- If i comment out line 74643 in sqlite3.c of version 3.6.19, performance is 
good again.

QUESTION:  I realize that commenting this line out is not the correct solution.
Is there any way to improve performance of "insert or replace"?


Out of interest, is performance improved any with 3.7.15.2 if you
execute "PRAGMA temp_store = memory" before calling _do_batch_insert()?

Thanks,
Dan.







I am including 2 other attachments:

1. test_batch_insert.c
   - this is the unit test i created to reproduce the issue.  It output
2. notes.txt
   - this contains the performance output from running test_batch_insert.c
 on 3 different versions of sqlite
   - 3.6.18  (fast)
   - 3.6.19  (slow)
   - 3.6.19 with line 74643 commented out  (fast)


Below are detailed (but cryptic) notes on what causes the issue.  Please let me 
know if you need more info.

system:
Linux 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:59:20 EDT 2006 x86_64 x86_64 
x86_64 GNU/Linux

NOTE: all line number references are based on 3_6_19 version of sqlite3.c,
where the regression was first introduced.

ROOT CAUSE:
If I remove one line of code from sqlite3.c (line 74643) that was added in this 
release, performance
regression is resolved.
---
   74640   default: {
   74641 Trigger *pTrigger = 0;
   74642 assert( onError==OE_Replace );
   74643 sqlite3MultiWrite(pParse);<--- THIS IS THE GUILTY 
LINE
   74644 if( pParse->db->flags_RecTriggers ){
   74645   pTrigger = sqlite3TriggersExist(pParse, pTab, TK_DELETE, 0, 
0);
   74646 }
   74647 sqlite3GenerateRowDelete(
   74648 pParse, pTab, baseCur, regR, 0, pTrigger, OE_Replace
   74649 );
   74650 seenReplace = 1;
   74651 break;
   74652   }
---


DETAILS OF WHY THIS LINE CAUSES THE REGRESSION:
The effect of including line 74643 when running a batch insert:

HIGH LEVEL:
in pager_write function, we  end up going into this if statement, 
which creates a
subjournal, causing creation of etilqs_xxx file.
NOTE: using the attached test_batch_insert.c file with max_entries set to 
4, this
   results in creation of 15 etilqs_xxx  and 262122 MORE writes!!
---
   35536 /* If the statement journal is open and the page is not in it,
   35537 ** then write the current page to the statement journal.  Note 
that
   35538 ** the statement journal format differs from the standard 
journal format
   35539 ** in that it omits the checksums and the header.
   35540 */
   35541 if( subjRequiresPage(pPg) ){
   35542   rc = subjournalPage(pPg);< we create a subjournal
   35543 }
   35544   }
---


LOWER LEVEL DETAILS OF WHY subJRequiresPage IS CALLED WHEN LINE 74643 IS IN THE 
CODE
when running sample test_batch_insert.c file (included in this bug)
The reason that subjRequiresPage() returns true is that when
sqlite3GenerateConstraintChecks is called from sqlite3Insert,
this calls the guilty line,(74643)sqlite3MultiWrite(pParse);

 - This sets pParse->isMultiWrite to 1
 - then sqlite3FinishCoding calls  sqlite3VdbeMakeReady with 
pParse->isMultiWrite =1
   causing 3rd param of call to be TRUE.
 - This causes Vdbe.usesStmtJournal to be set to TRUE in sqliteVdbeMakeReady
 - Then sqlite3BtreeBeginStmt is called from sqlite3VdbeExec (case 
OP_Transaction:)
  here, p->iStatement is set to 1 because p->usesStmtJournal is 1
   54698 if( pOp->p2&&  p->usesStmtJournal<--- we go INTO this 
if statement
   54699&&  (db->autoCommit==0 || db->activeVdbeCnt>1)
   54700 ){
   54701   assert( sqlite3BtreeIsInTrans(u.as.pBt) );
   54702   if( p->iStatement==0 ){
   54703 assert( db->nStatement>=0&&  db->nSavepoint>=0 );
   54704 db->nStatement++;< this sets to 1, causing 
next line to set iStatement to 1
   54705 p->iStatement = db->nSavepoint + db->nStatement;
   54706   }
   54707   rc = 

[sqlite] performance regression when using "insert or replace"

2013-01-23 Thread Heiles, Katrina
I am in the process of updating from 3.6.4 to the latest and greatest version 
(finally :-) ).
While running performance unit tests, i found a major regression (10K/sec vs 
30k/sec)
on a common use case for us.  The regression occurs when using "insert or 
replace".

I narrowed this down as follows:
- the regression was introduced in version 3.6.19, and has been there ever 
since.
- If i comment out line 74643 in sqlite3.c of version 3.6.19, performance is 
good again.

QUESTION:  I realize that commenting this line out is not the correct solution.
   Is there any way to improve performance of "insert or replace"?


I am including 2 other attachments:

1. test_batch_insert.c
  - this is the unit test i created to reproduce the issue.  It output
2. notes.txt
  - this contains the performance output from running test_batch_insert.c
on 3 different versions of sqlite
  - 3.6.18  (fast)
  - 3.6.19  (slow)
  - 3.6.19 with line 74643 commented out  (fast)


Below are detailed (but cryptic) notes on what causes the issue.  Please let me 
know if you need more info.

system:
Linux 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:59:20 EDT 2006 x86_64 x86_64 
x86_64 GNU/Linux

NOTE: all line number references are based on 3_6_19 version of sqlite3.c,
where the regression was first introduced.

ROOT CAUSE:
If I remove one line of code from sqlite3.c (line 74643) that was added in this 
release, performance
regression is resolved.
---
  74640   default: {
  74641 Trigger *pTrigger = 0;
  74642 assert( onError==OE_Replace );
  74643 sqlite3MultiWrite(pParse);  <--- THIS IS THE GUILTY 
LINE
  74644 if( pParse->db->flags_RecTriggers ){
  74645   pTrigger = sqlite3TriggersExist(pParse, pTab, TK_DELETE, 0, 
0);
  74646 }
  74647 sqlite3GenerateRowDelete(
  74648 pParse, pTab, baseCur, regR, 0, pTrigger, OE_Replace
  74649 );
  74650 seenReplace = 1;
  74651 break;
  74652   }
---


DETAILS OF WHY THIS LINE CAUSES THE REGRESSION:
The effect of including line 74643 when running a batch insert:

HIGH LEVEL:
in pager_write function, we  end up going into this if statement, 
which creates a
subjournal, causing creation of etilqs_xxx file.
NOTE: using the attached test_batch_insert.c file with max_entries set to 
4, this
  results in creation of 15 etilqs_xxx  and 262122 MORE writes!!
---
  35536 /* If the statement journal is open and the page is not in it,
  35537 ** then write the current page to the statement journal.  Note 
that
  35538 ** the statement journal format differs from the standard 
journal format
  35539 ** in that it omits the checksums and the header.
  35540 */
  35541 if( subjRequiresPage(pPg) ){
  35542   rc = subjournalPage(pPg);  < we create a subjournal
  35543 }
  35544   }
---


LOWER LEVEL DETAILS OF WHY subJRequiresPage IS CALLED WHEN LINE 74643 IS IN THE 
CODE
when running sample test_batch_insert.c file (included in this bug)
The reason that subjRequiresPage() returns true is that when
sqlite3GenerateConstraintChecks is called from sqlite3Insert,
this calls the guilty line,(74643)sqlite3MultiWrite(pParse);

- This sets pParse->isMultiWrite to 1
- then sqlite3FinishCoding calls  sqlite3VdbeMakeReady with 
pParse->isMultiWrite =1
  causing 3rd param of call to be TRUE.
- This causes Vdbe.usesStmtJournal to be set to TRUE in sqliteVdbeMakeReady
- Then sqlite3BtreeBeginStmt is called from sqlite3VdbeExec (case 
OP_Transaction:)
 here, p->iStatement is set to 1 because p->usesStmtJournal is 1
  54698 if( pOp->p2 && p->usesStmtJournal   <--- we go INTO 
this if statement
  54699  && (db->autoCommit==0 || db->activeVdbeCnt>1)
  54700 ){
  54701   assert( sqlite3BtreeIsInTrans(u.as.pBt) );
  54702   if( p->iStatement==0 ){
  54703 assert( db->nStatement>=0 && db->nSavepoint>=0 );
  54704 db->nStatement++;< this sets to 1, causing 
next line to set iStatement to 1
  54705 p->iStatement = db->nSavepoint + db->nStatement;
  54706   }
  54707   rc = sqlite3BtreeBeginStmt(u.as.pBt, p->iStatement);

- sqlite3BtreeBeginStmt calls sqlite3PagerOpenSavepoint using iStatement as 
2nd parameter,
   therefore nSavepoint is set to 1

- eventually we call sqlite3BtreeInsert which calls insertCell which