[sqlite] Sqlite aggregate cold/hot boot performance
Hello, I can observe a behaviour that I do not understand. [context] I have an application that links whith sqlite3.dll. I can open a document, and perform a query on it. That query is using an aggregate function, that I have created myself using the callbacks like sqlite_aggr_indexset_step... It works very well. [what I do not understand] I perform a computer cold boot, launch my app, opens a doc, perform the query. The sqlite3_step() takes some time (a few seconds).It's ok, the query is rather complex. Now, I close my doc, reopens it. The same query performs very fast. I close my app, restart it, open the doc, perform the query, and one again it goes very fast. I stop the computer, restart it, redo the above, and that time, the query is slow. Is there some global cache retained by sqlite3.dll that makes my query faster ? Could it be rather related to some hard disk cache ? (please note that my database files where the query is done are about 150MB) Any hint ? Regards, Pierre Chatelier ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] integrity_check out of memory
On Tue, Jan 22, 2013 at 12:33 PM, Dominique Pellé dominique.pe...@gmail.com wrote: Max Vlasov wrote: Hi, I found with the web search that this error (out of memory for PRAGMA integrity_check) appeared in the wild and there were some fixes related to it.It looks perhaps like the bug that was fixed in this checkin: http://www.sqlite.org/src/info/120c82d56e Can you try and confirm? The trunk version works ok, no limits noticed. Max ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Sqlite aggregate cold/hot boot performance
Pierre Chatelier wrote: Is there some global cache retained by sqlite3.dll that makes my query faster ? Could it be rather related to some hard disk cache ? SQLite uses normal file accesses, so the operating system will try to cache the data. (please note that my database files where the query is done are about 150MB) And how many GB is the system using for file caching? :) Regards, Clemens ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] performance regression when using insert or replace
I am in the process of updating from 3.6.4 to the latest and greatest version (finally :-) ). While running performance unit tests, i found a major regression (10K/sec vs 30k/sec) on a common use case for us. The regression occurs when using insert or replace. I narrowed this down as follows: - the regression was introduced in version 3.6.19, and has been there ever since. - If i comment out line 74643 in sqlite3.c of version 3.6.19, performance is good again. QUESTION: I realize that commenting this line out is not the correct solution. Is there any way to improve performance of insert or replace? I am including 2 other attachments: 1. test_batch_insert.c - this is the unit test i created to reproduce the issue. It output 2. notes.txt - this contains the performance output from running test_batch_insert.c on 3 different versions of sqlite - 3.6.18 (fast) - 3.6.19 (slow) - 3.6.19 with line 74643 commented out (fast) Below are detailed (but cryptic) notes on what causes the issue. Please let me know if you need more info. system: Linux 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:59:20 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux NOTE: all line number references are based on 3_6_19 version of sqlite3.c, where the regression was first introduced. ROOT CAUSE: If I remove one line of code from sqlite3.c (line 74643) that was added in this release, performance regression is resolved. --- 74640 default: { 74641 Trigger *pTrigger = 0; 74642 assert( onError==OE_Replace ); 74643 sqlite3MultiWrite(pParse); --- THIS IS THE GUILTY LINE 74644 if( pParse-db-flagsSQLITE_RecTriggers ){ 74645 pTrigger = sqlite3TriggersExist(pParse, pTab, TK_DELETE, 0, 0); 74646 } 74647 sqlite3GenerateRowDelete( 74648 pParse, pTab, baseCur, regR, 0, pTrigger, OE_Replace 74649 ); 74650 seenReplace = 1; 74651 break; 74652 } --- DETAILS OF WHY THIS LINE CAUSES THE REGRESSION: The effect of including line 74643 when running a batch insert: HIGH LEVEL: in pager_write function, we sometimes end up going into this if statement, which creates a subjournal, causing creation of etilqs_xxx file. NOTE: using the attached test_batch_insert.c file with max_entries set to 4, this results in creation of 15 etilqs_xxx and 262122 MORE writes!! --- 35536 /* If the statement journal is open and the page is not in it, 35537 ** then write the current page to the statement journal. Note that 35538 ** the statement journal format differs from the standard journal format 35539 ** in that it omits the checksums and the header. 35540 */ 35541 if( subjRequiresPage(pPg) ){ 35542 rc = subjournalPage(pPg); we create a subjournal 35543 } 35544 } --- LOWER LEVEL DETAILS OF WHY subJRequiresPage IS CALLED WHEN LINE 74643 IS IN THE CODE when running sample test_batch_insert.c file (included in this bug) The reason that subjRequiresPage() returns true is that when sqlite3GenerateConstraintChecks is called from sqlite3Insert, this calls the guilty line,(74643)sqlite3MultiWrite(pParse); - This sets pParse-isMultiWrite to 1 - then sqlite3FinishCoding calls sqlite3VdbeMakeReady with pParse-isMultiWrite =1 causing 3rd param of call to be TRUE. - This causes Vdbe.usesStmtJournal to be set to TRUE in sqliteVdbeMakeReady - Then sqlite3BtreeBeginStmt is called from sqlite3VdbeExec (case OP_Transaction:) here, p-iStatement is set to 1 because p-usesStmtJournal is 1 54698 if( pOp-p2 p-usesStmtJournal --- we go INTO this if statement 54699 (db-autoCommit==0 || db-activeVdbeCnt1) 54700 ){ 54701 assert( sqlite3BtreeIsInTrans(u.as.pBt) ); 54702 if( p-iStatement==0 ){ 54703 assert( db-nStatement=0 db-nSavepoint=0 ); 54704 db-nStatement++; this sets to 1, causing next line to set iStatement to 1 54705 p-iStatement = db-nSavepoint + db-nStatement; 54706 } 54707 rc = sqlite3BtreeBeginStmt(u.as.pBt, p-iStatement); - sqlite3BtreeBeginStmt calls sqlite3PagerOpenSavepoint using iStatement as 2nd parameter, therefore nSavepoint is set to 1 - eventually we call sqlite3BtreeInsert which calls insertCell which calls
[sqlite] minor bug: EXPLAIN statement should fail when it is not compiled
As written on http://www.sqlite.org/compile.html#omit_explain when explain is disabled executing it should fail. But I do get back the result of the query instead. Thanks for the great software by the way. Greetings, Simon H. Examples of unexpected result = sqlite EXPLAIN SELECT 1.234; 1.234 sqlite EXPLAIN QUERY PLAN SELECT 98765432; 98765432 Compile options and version === sqlite SELECT sqlite_compileoption_get(0); CURDIR sqlite SELECT sqlite_compileoption_get(1); ENABLE_COLUMN_METADATA sqlite SELECT sqlite_compileoption_get(2); OMIT_EXPLAIN sqlite SELECT sqlite_compileoption_get(3); OMIT_SHARED_CACHE sqlite SELECT sqlite_compileoption_get(4); SOUNDEX sqlite SELECT sqlite_compileoption_get(5); TEMP_STORE=1 sqlite SELECT sqlite_compileoption_get(6); THREADSAFE=1 sqlite SELECT sqlite_compileoption_get(7); sqlite SELECT sqlite_source_id(); 2012-10-04 19:37:12 091570e46d04e84b67228e0bdbcd6e1fb60c6bdb sqlite SELECT sqlite_version(); 3.7.14.1 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Are Foreign Keys indexed?
Hi, I'm reading http://www.sqlite.org/foreignkeys.html but can't find answer. Are SQLite FK indexed? For example: CREATE TABLE artists ( id_artist INTEGER PRIMARY KEY // - PK are indexed, that I know ) CREATE TABLE songs ( id_song INTEGER PRIMARY KEY, id_artist INTEGER, // - Is this FK indexed? FOREIGN KEY(id_artist) REFERENCES artists(id_artist) ) To be more specific: select * from songs where id_artist = 123 Is above query searching sequentially or using index? I'm asking because if I remember, FireBird FK where indexed by default, but Oracle needed create index on that FK manually. How this work in SQLite? Regards ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Are Foreign Keys indexed?
Krzysztof wrote: Are SQLite FK indexed? http://www.sqlite.org/foreignkeys.html#fk_indexes says: | Indices are not required for child key columns but they are almost | always beneficial. They are if you create one manually. Regards, Clemens ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Are Foreign Keys indexed?
On Wed, Jan 23, 2013 at 02:12:39PM +0100, Krzysztof scratched on the wall: Hi, I'm reading http://www.sqlite.org/foreignkeys.html but can't find answer. Are SQLite FK indexed? For example: Not automatically. From that page: 3. Required and Suggested Database Indexes [...] Indices are not required for child key columns but they are almost always beneficial. -j -- Jay A. Kreibich J A Y @ K R E I B I.C H Intelligence is like underwear: it is important that you have it, but showing it to the wrong people has the tendency to make them feel uncomfortable. -- Angela Johnson ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Insert record in c++
Hi Everyone, I am extremely new to sqlite and using sql syntax. I have been searching the web for examples but I am not finding any that actually helps me understand what I am doing. Can someone show me in c++ how to insert a record into the database utilizing variables? The examples I see all have hard coded values, even when I try to put a variable instead of the values my code never compiles. Example, I found this code String Data = INSERT INTO friend(name, address, age) VALUES('John','Anywhere here','25'); How would you actually pull the VALUES from variables and use them. A complete example will be very helpful. I am currently trying to write a small demo that will show how to CreateDB, OpenDB, Create Multiple Tables at once or just one table, Insert Record, Delete Record, Next Record, Prior Record etc etc. Just the basics so I will at least know how to use code. Thanks ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Are Foreign Keys indexed?
Thanks for answer! ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Insert record in c++
On Wed, Jan 23, 2013 at 2:36 PM, roystonja...@comcast.net wrote: String Data = INSERT INTO friend(name, address, age) VALUES('John','Anywhere here','25'); How would you actually pull the VALUES from variables and use them. A complete example will be very helpful. Google for sqlite3_bind example and you'll find many examples. You'll also need to read up on sqlite3_prepare() and sqlite3_prepare_v2(), both described in detail at: http://www.sqlite.org/c3ref/prepare.html In the _abstract_ it looks like: statement = prepare(INSERT INTO t(a,b,c) VALUES(?,?,?)); bind( statement, 1, foo ); bind( statement, 2, bar ); bind( statement, 3, 32 ); And the above pages reveal the exact syntax/APIs to use. Note that the String class shown above is platform-specific and won't be understood by the C API - it uses (char const *) and (unsigned char const *). There are a number of C++ wrappers out there. AFAIK none of them are official, but here's the one i currently use/maintain: http://fossil.wanderinghorse.net/wikis/cpdo/?page=cpdopp -- - stephan beal http://wanderinghorse.net/home/stephan/ http://gplus.to/sgbeal ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Sqlite aggregate cold/hot boot performance
Hi, Is there some global cache retained by sqlite3.dll that makes my query faster ? Could it be rather related to some hard disk cache ? SQLite uses normal file accesses, so the operating system will try to cache the data. OK And how many GB is the system using for file caching? :) I don't know ! There are so many levels of caching that I do not know how much can be expected at that level. Regards, Pierre Chatelier ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Sqlite aggregate cold/hot boot performance
On 23 Jan 2013, at 9:01am, Pierre Chatelier k...@club-internet.fr wrote: [what I do not understand] I perform a computer cold boot, launch my app, opens a doc, perform the query. The sqlite3_step() takes some time (a few seconds).It's ok, the query is rather complex. Now, I close my doc, reopens it. The same query performs very fast. I close my app, restart it, open the doc, perform the query, and one again it goes very fast. I stop the computer, restart it, redo the above, and that time, the query is slow. What you describe is typical of a computer with a slow hard disk and lots of file cache space. The first time a file is needed it has to be read from hard disk which is slow. After that the data is already in (cache) memory and access to that is fast. Operating systems these days do not expect to see the computer rebooted even once a week, so slowing the computer up only after a reboot isn't a problem. There is one complicating factor under Windows which is that Windows makes special efforts to cache files with certain extensions on the filename. And these actually make sqlite slower. So pick a file extension for your databases which is something obviously unusual (.sqlite, .s, etc..) rather than one which Windows may think it understands (.db). However I don't think that this is anything to do with the problem you're reporting. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] performance regression when using insert or replace
On 01/23/2013 04:20 AM, Heiles, Katrina wrote: I am in the process of updating from 3.6.4 to the latest and greatest version (finally :-) ). While running performance unit tests, i found a major regression (10K/sec vs 30k/sec) on a common use case for us. The regression occurs when using insert or replace. I narrowed this down as follows: - the regression was introduced in version 3.6.19, and has been there ever since. - If i comment out line 74643 in sqlite3.c of version 3.6.19, performance is good again. QUESTION: I realize that commenting this line out is not the correct solution. Is there any way to improve performance of insert or replace? Out of interest, is performance improved any with 3.7.15.2 if you execute PRAGMA temp_store = memory before calling _do_batch_insert()? Thanks, Dan. I am including 2 other attachments: 1. test_batch_insert.c - this is the unit test i created to reproduce the issue. It output 2. notes.txt - this contains the performance output from running test_batch_insert.c on 3 different versions of sqlite - 3.6.18 (fast) - 3.6.19 (slow) - 3.6.19 with line 74643 commented out (fast) Below are detailed (but cryptic) notes on what causes the issue. Please let me know if you need more info. system: Linux 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:59:20 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux NOTE: all line number references are based on 3_6_19 version of sqlite3.c, where the regression was first introduced. ROOT CAUSE: If I remove one line of code from sqlite3.c (line 74643) that was added in this release, performance regression is resolved. --- 74640 default: { 74641 Trigger *pTrigger = 0; 74642 assert( onError==OE_Replace ); 74643 sqlite3MultiWrite(pParse);--- THIS IS THE GUILTY LINE 74644 if( pParse-db-flagsSQLITE_RecTriggers ){ 74645 pTrigger = sqlite3TriggersExist(pParse, pTab, TK_DELETE, 0, 0); 74646 } 74647 sqlite3GenerateRowDelete( 74648 pParse, pTab, baseCur, regR, 0, pTrigger, OE_Replace 74649 ); 74650 seenReplace = 1; 74651 break; 74652 } --- DETAILS OF WHY THIS LINE CAUSES THE REGRESSION: The effect of including line 74643 when running a batch insert: HIGH LEVEL: in pager_write function, wesometimes end up going into this if statement, which creates a subjournal, causing creation of etilqs_xxx file. NOTE: using the attached test_batch_insert.c file with max_entries set to 4, this results in creation of 15 etilqs_xxx and 262122 MORE writes!! --- 35536 /* If the statement journal is open and the page is not in it, 35537 ** then write the current page to the statement journal. Note that 35538 ** the statement journal format differs from the standard journal format 35539 ** in that it omits the checksums and the header. 35540 */ 35541 if( subjRequiresPage(pPg) ){ 35542 rc = subjournalPage(pPg); we create a subjournal 35543 } 35544 } --- LOWER LEVEL DETAILS OF WHY subJRequiresPage IS CALLED WHEN LINE 74643 IS IN THE CODE when running sample test_batch_insert.c file (included in this bug) The reason that subjRequiresPage() returns true is that when sqlite3GenerateConstraintChecks is called from sqlite3Insert, this calls the guilty line,(74643)sqlite3MultiWrite(pParse); - This sets pParse-isMultiWrite to 1 - then sqlite3FinishCoding calls sqlite3VdbeMakeReady with pParse-isMultiWrite =1 causing 3rd param of call to be TRUE. - This causes Vdbe.usesStmtJournal to be set to TRUE in sqliteVdbeMakeReady - Then sqlite3BtreeBeginStmt is called from sqlite3VdbeExec (case OP_Transaction:) here, p-iStatement is set to 1 because p-usesStmtJournal is 1 54698 if( pOp-p2 p-usesStmtJournal--- we go INTO this if statement 54699 (db-autoCommit==0 || db-activeVdbeCnt1) 54700 ){ 54701 assert( sqlite3BtreeIsInTrans(u.as.pBt) ); 54702 if( p-iStatement==0 ){ 54703 assert( db-nStatement=0 db-nSavepoint=0 ); 54704 db-nStatement++; this sets to 1, causing next line to set iStatement to 1 54705 p-iStatement = db-nSavepoint + db-nStatement; 54706 } 54707 rc = sqlite3BtreeBeginStmt(u.as.pBt,
Re: [sqlite] Connection philosophy
On Jan 22, 2013, at 8:18 PM, Keith Medcalf kmedc...@dessus.com wrote: I presume that you are using some kind of input-driven or event driven application which may get a request to process a query in the middle of your update transaction. That is correct. One of the advantages of WAL and using a separate reader and writer connection is that the reader connection will not see uncommitted data from the writer. This is a good point and one I had not considered because I haven't been bit. In our application, the schema itself has kept us out of trouble, since the readers generally (always?) deal with a part of the schema uninvolved with the data collection. That is, they are looking up data to support the collection, but are not interested in the specific data being collected. Kind of. (It is a complicated program and I would not be surprised if there are some corner-cases to this statement.) It may be worth pursuing this approach anyway to make the most robust wrapper possible (and remove these concerns from the working set of things I have to worry about!). One way you might do this is to modify your wrappers ... keep a usage count Turns out the wrappers already have smart, reference counted objects for statement handles and the DB objects themselves, so hooking in the logic you suggest would be straightforward. it may be better to think about such possibilities before they notice it and claim your application is broken ... or even worse is acting mysteriously in their eyes. So, you've worked here too, eh? :-) Anyway, I think this is a powerful model and, given the wrappers, I think I can experiment with it in a low-risk way. I look forward to doing so. In other news, last night I coded the the secret singleton handle -- actually a cache keyed by DB filename -- works well, and it may be just me, but I feel like it sped up certain operations quite a bit. We'll see what the QA folks tell me later on today! Thanks again for all this. Very productive. -- Ward ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] performance regression when using insert or replace
Hi Dan, Yes, this resolves the problem. performance comes back up to 31K/sec. What are the risks of using this as a workaround? Data integrity is very important to us so I'm curious what effect this pragma would have. thanks, katrina Out of interest, is performance improved any with 3.7.15.2 if you execute PRAGMA temp_store = memory before calling _do_batch_insert()? Thanks, Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] performance regression when using insert or replace
On 01/23/2013 11:04 PM, Heiles, Katrina wrote: Hi Dan, Yes, this resolves the problem. performance comes back up to 31K/sec. What are the risks of using this as a workaround? Data integrity is very important to us so I'm curious what effect this pragma would have. No effect on data integrity or durability. SQLite uses temporary files for various reasons - statement journals, temporary tables (those created with CREATE TEMP TABLE), to materialize views and sub-queries when required, that sort of thing. If you set PRAGMA temp_store=memory, then it uses malloc'd memory instead of temporary files for these things. See also: http://www.sqlite.org/tempfiles.html Dan. thanks, katrina Out of interest, is performance improved any with 3.7.15.2 if you execute PRAGMA temp_store = memory before calling _do_batch_insert()? Thanks, Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Change in behavior between 1.0.79.0 and1.0.83.0 in System.Data.SQLite
Ok. So how does this get fixed? -- Michael Russell MTI Film, LLC michael.russ...@mtifilm.com http://www.mtifilm.com/ Providence, RI 02906 USA +1 (401) 831-1315 On Thu, Jan 10, 2013 at 8:49 PM, Joe Mistachkin sql...@mistachkin.comwrote: Michael Russell wrote: 1) The break seems to have happened between 1.0.81.0 and 1.0.82.0. There was a change in System.Data.SQLite disposal behavior in that timeframe, namely using the new sqlite3_close_v2() API; however, everything [now] conforms to the IDisposable interface semantics and established best practices for native resource cleanup. The root issue here is that all disposable managed objects (in this case CriticalHandle derived classes) must be properly disposed by whatever uses them. Failing to do so with System.Data.SQLite used to be a source of potential access violations, etc; now, it just keeps the database connection handle active until everything has been properly disposed. 2) Do Entity Framework team members read this list? I don't know. -- Joe Mistachkin ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] SQL query
If you simply want a list of all files that are present and are not also present in set 0 (I'm not sure how 'duplicated' means anything different...) SELECT f.name, f.set, f.hash FROM files f LEFT OUTER JOIN files f2 ON f2.name = f.name and f2.set = 0 WHERE f.set != 0 and f2.name is null -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Paul Sanderson Sent: Wednesday, January 23, 2013 12:18 PM To: General Discussion of SQLite Database Subject: [sqlite] SQL query I have a database with many million rows with in it each representing a file. There are many duplicate files in the database and all files are hashed. The files are sub categorised into a number of sets, numbered 0 to 10 for example. Files do not need to be in every set. I need to select all files that are in any set other than 0 that are not duplicated/present in set 0 So a sample database might contain columns name set hash with sample data file10ABCD file11ABCD file13EF01 file20BCE2 file22BCE2 file35EE34 file40EE22 My query would return file13EF01 file35EE34 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] SQL query
On Wed, Jan 23, 2013 at 12:17 PM, Paul Sanderson sandersonforens...@gmail.com wrote: I have a database with many million rows with in it each representing a file. There are many duplicate files in the database and all files are hashed. The files are sub categorised into a number of sets, numbered 0 to 10 for example. Files do not need to be in every set. I need to select all files that are in any set other than 0 that are not duplicated/present in set 0 So a sample database might contain columns name set hash with sample data file10ABCD file11ABCD file13EF01 file20BCE2 file22BCE2 file35EE34 file40EE22 My query would return file13EF01 file35EE34 SELECT * FROM files WHERE hash NOT IN (SELECT hash FROM files WHERE set=0) ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] performance regression when using insert or replace
Thank you so much!! This is great news. One question: according to the tempfiles.html doc... Section 3.0 states that The rollback journal, master journal, and statement journal files are always written to disk. But the other kinds of temporary files might be stored in memory only and never written to disk. Section 2.3 states: the statement journal is also omitted if an alternative conflict resolution algorithm is used. When I debug my test, I enter the if statement below (at the bottom of the pager_write() function). With pragma temp_store=file (the default), it creates temporary files, which is much slower, and with pramga temp_store=memory, it is fast. -- /* If the statement journal is open and the page is not in it, ** then write the current page to the statement journal. Note that ** the statement journal format differs from the standard journal format ** in that it omits the checksums and the header. */ if( subjRequiresPage(pPg) ){ rc = subjournalPage(pPg); } -- My question is: Doesn't going into this if statement mean that a statement journal is getting created? If so, isn't a statement journal always written to disk (based on above snipet from section 3). Also, why is a statement journal created at all here (since based on section 2.3, it should be omitted because I'm using an alternative conflict resolution algorithm (insert or replace). thanks SO MUCH for your help! katrina -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Dan Kennedy Sent: Wednesday, January 23, 2013 11:51 AM To: sqlite-users@sqlite.org Subject: Re: [sqlite] performance regression when using insert or replace On 01/23/2013 11:04 PM, Heiles, Katrina wrote: Hi Dan, Yes, this resolves the problem. performance comes back up to 31K/sec. What are the risks of using this as a workaround? Data integrity is very important to us so I'm curious what effect this pragma would have. No effect on data integrity or durability. SQLite uses temporary files for various reasons - statement journals, temporary tables (those created with CREATE TEMP TABLE), to materialize views and sub-queries when required, that sort of thing. If you set PRAGMA temp_store=memory, then it uses malloc'd memory instead of temporary files for these things. See also: http://www.sqlite.org/tempfiles.html Dan. thanks, katrina Out of interest, is performance improved any with 3.7.15.2 if you execute PRAGMA temp_store = memory before calling _do_batch_insert()? Thanks, Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] performance regression when using insert or replace
I use insert or replace heavily. In debug mode, I set it to use temp file on disk so, I can watch the disk IO, in release mode I set it to temp file in memory. The danger is that if you ever do anything that requires a bunch of temp file, you can easily run out of RAM. Adding and deleting indexes for example while temp store is set to memory will run you out of address space on a larger DB and a 32 bit system (windows). In my testing, insert or replace uses temp store most of the time. Wednesday, January 23, 2013, 9:16:42 AM, you wrote: DK On 01/23/2013 04:20 AM, Heiles, Katrina wrote: I am in the process of updating from 3.6.4 to the latest and greatest version (finally :-) ). While running performance unit tests, i found a major regression (10K/sec vs 30k/sec) on a common use case for us. The regression occurs when using insert or replace. I narrowed this down as follows: - the regression was introduced in version 3.6.19, and has been there ever since. - If i comment out line 74643 in sqlite3.c of version 3.6.19, performance is good again. QUESTION: I realize that commenting this line out is not the correct solution. Is there any way to improve performance of insert or replace? DK Out of interest, is performance improved any with 3.7.15.2 if you DK execute PRAGMA temp_store = memory before calling _do_batch_insert()? DK Thanks, DK Dan. I am including 2 other attachments: 1. test_batch_insert.c - this is the unit test i created to reproduce the issue. It output 2. notes.txt - this contains the performance output from running test_batch_insert.c on 3 different versions of sqlite - 3.6.18 (fast) - 3.6.19 (slow) - 3.6.19 with line 74643 commented out (fast) Below are detailed (but cryptic) notes on what causes the issue. Please let me know if you need more info. system: Linux 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:59:20 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux NOTE: all line number references are based on 3_6_19 version of sqlite3.c, where the regression was first introduced. ROOT CAUSE: If I remove one line of code from sqlite3.c (line 74643) that was added in this release, performance regression is resolved. --- 74640 default: { 74641 Trigger *pTrigger = 0; 74642 assert( onError==OE_Replace ); 74643 sqlite3MultiWrite(pParse);--- THIS IS THE GUILTY LINE 74644 if( pParse-db-flagsSQLITE_RecTriggers ){ 74645 pTrigger = sqlite3TriggersExist(pParse, pTab, TK_DELETE, 0, 0); 74646 } 74647 sqlite3GenerateRowDelete( 74648 pParse, pTab, baseCur, regR, 0, pTrigger, OE_Replace 74649 ); 74650 seenReplace = 1; 74651 break; 74652 } --- DETAILS OF WHY THIS LINE CAUSES THE REGRESSION: The effect of including line 74643 when running a batch insert: HIGH LEVEL: in pager_write function, wesometimes end up going into this if statement, which creates a subjournal, causing creation of etilqs_xxx file. NOTE: using the attached test_batch_insert.c file with max_entries set to 4, this results in creation of 15 etilqs_xxx and 262122 MORE writes!! --- 35536 /* If the statement journal is open and the page is not in it, 35537 ** then write the current page to the statement journal. Note that 35538 ** the statement journal format differs from the standard journal format 35539 ** in that it omits the checksums and the header. 35540 */ 35541 if( subjRequiresPage(pPg) ){ 35542 rc = subjournalPage(pPg); we create a subjournal 35543 } 35544 } --- LOWER LEVEL DETAILS OF WHY subJRequiresPage IS CALLED WHEN LINE 74643 IS IN THE CODE when running sample test_batch_insert.c file (included in this bug) The reason that subjRequiresPage() returns true is that when sqlite3GenerateConstraintChecks is called from sqlite3Insert, this calls the guilty line,(74643)sqlite3MultiWrite(pParse); - This sets pParse-isMultiWrite to 1 - then sqlite3FinishCoding calls sqlite3VdbeMakeReady with pParse-isMultiWrite =1 causing 3rd param of call to be TRUE. - This causes Vdbe.usesStmtJournal to be set to TRUE in sqliteVdbeMakeReady - Then sqlite3BtreeBeginStmt is called from sqlite3VdbeExec (case OP_Transaction:) here, p-iStatement is set to 1 because p-usesStmtJournal is 1
Re: [sqlite] SQL query
I have a database with many million rows with in it each representing a file. There are many duplicate files in the database and all files are hashed. The files are sub categorised into a number of sets, numbered 0 to 10 for example. Files do not need to be in every set. I need to select all files that are in any set other than 0 that are not duplicated/present in set 0 So a sample database might contain columns name set hash with sample data file10ABCD file11ABCD file13EF01 file20BCE2 file22BCE2 file35EE34 file40EE22 My query would return file13EF01 file35EE34 select * from files where set 0 and not exists (select * from files a where hash=files.hash and set=0); --- () ascii ribbon campaign against html e-mail /\ www.asciiribbon.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] SELECT and UPDATE in single query
I have a table like this: CREATE TABLE queue ( key TEXT NOT NULL PRIMARY KEY UNIQUE, status INTEGER ); CREATE INDEX IF NOT EXISTS keys ON queue (key); Your index is redundant. There is already a unique index on key since it is a primary key. It should probably be: create unique index if not exists keys on queue(status, key); And then I process it like this, N keys at a time: SELECT key FROM queue WHERE status=0 LIMIT N; BEGIN TRANSACTION; for key in keys: UPDATE queue SET status=1 WHERE key=key; END TRANSACTION; How can this SELECT and UPDATE be combined more efficiently? update queue set status=1 where key in (select key from queue where status=0 limit N); --- () ascii ribbon campaign against html e-mail /\ www.asciiribbon.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] SELECT and UPDATE in single query
On 1/23/2013 11:22 PM, Richard Baron Penman wrote: And then I process it like this, N keys at a time: SELECT key FROM queue WHERE status=0 LIMIT N; BEGIN TRANSACTION; for key in keys: UPDATE queue SET status=1 WHERE key=key; END TRANSACTION; How can this SELECT and UPDATE be combined more efficiently? Something like this perhaps: update queue set status = 1 where status = 0 and key (select key from queue where status=0 order by key limit 1 offset N); -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] SELECT and UPDATE in single query
On Wed, 23 Jan 2013 21:32:20 -0700 Keith Medcalf kmedc...@dessus.com wrote: And then I process it like this, N keys at a time: SELECT key FROM queue WHERE status=0 LIMIT N; BEGIN TRANSACTION; for key in keys: UPDATE queue SET status=1 WHERE key=key; END TRANSACTION; How can this SELECT and UPDATE be combined more efficiently? update queue set status=1 where key in (select key from queue where status=0 limit N); Why process only N at a time, Richard? If you remove that requirement the query is much simpler, because key is unique: update queue set status=1 where status = 0; Besides being nonstandard, the results using limit N are nondeterministic. You don't know which rows are updated, only that (up to) N are. HTH. --jkl ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] SELECT and UPDATE in single query
Thanks for tip about the redundant index. How to find which keys have been updated from this query? On Thu, Jan 24, 2013 at 3:32 PM, Keith Medcalf kmedc...@dessus.com wrote: I have a table like this: CREATE TABLE queue ( key TEXT NOT NULL PRIMARY KEY UNIQUE, status INTEGER ); CREATE INDEX IF NOT EXISTS keys ON queue (key); Your index is redundant. There is already a unique index on key since it is a primary key. It should probably be: create unique index if not exists keys on queue(status, key); And then I process it like this, N keys at a time: SELECT key FROM queue WHERE status=0 LIMIT N; BEGIN TRANSACTION; for key in keys: UPDATE queue SET status=1 WHERE key=key; END TRANSACTION; How can this SELECT and UPDATE be combined more efficiently? update queue set status=1 where key in (select key from queue where status=0 limit N); --- () ascii ribbon campaign against html e-mail /\ www.asciiribbon.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] SELECT and UPDATE in single query
Why process only N at a time, Richard? There are a number of workers who request unprocessed jobs from the queue. But the queue is too big to hold in memory all at once. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users