Re: [sqlite] range enclosing a number
Thank you everybody for the incredible help. Maybe it is better if I explain the problem in more detail: I need to do piecewise linear interpolation between two very large sets of numbers. The interpolation is the reason I was using <= and >=. One set represents clock ticks (46-bit integers) and the other real times (floating-point numbers). The table has two columns which are indexed and whose elements are unique. The interpolation has to work either way: from ticks to times or from times to ticks. To perform the interpolation I need the select to return both columns. Adding the second column, as in:

select max(ticks), time from time_pair where ticks <= ?;

seems to work, although I don't understand the GROUP BY comments. I have thought about the edge conditions (there I need to do extrapolation) and planned either to retrieve the max and min beforehand and use them to compare against the current argument, or to always do something like:

select * from time_pair where ticks <= ? order by ticks desc limit 2;

i.e. try to retrieve the two rows before (and similarly the two after). Since the max and min cannot be reused (the database is closed between interpolation operations and might be altered), it's not clear which is the better approach, so I will have to test. The idea to try on both 3.4 and 3.6 is very good, since this is a mixed environment and it may ultimately run against 3.4.

Thank you again for your time and efforts,
Bogdan

> Date: Mon, 13 Jul 2009 23:35:22 -0500
> From: j...@kreibi.ch
> To: paiva...@gmail.com
> CC: sqlite-users@sqlite.org
> Subject: Re: [sqlite] range enclosing a number
>
> On Mon, Jul 13, 2009 at 10:33:00PM -0400, Pavel Ivanov scratched on the wall:
>> Jay, you're pretty much mistaken:
>>
>>> I'm pretty sure you don't want to do it this way. What this does is
>>> gather every row that meets the WHERE condition and then runs a max()
>>> or min() aggregation function across all of those rows.
>>> That means that even if the column "number" has an index on it,
>>> between these two statements you're going to end up scanning the
>>> whole table.
>>
>> All database engines optimize queries which ask for min/max on indexed
>> column with condition including only <, > or = on this very column.
>> And SQLite is among these too:
>
> Not "all." This type of optimization is actually a fairly new
> addition to SQLite (considering the product lifetime) and first
> appeared in 3.5.5, which was released in early 2008.
>
> And I'd still go with ORDER/LIMIT. Here's why:
>
> For my tests I just used the default build under the current version
> of Mac OS X, which is a bit old (3.4). Under that build, the
> ORDER/LIMIT is clearly faster, as this is before the optimization
> existed:
>
> (using the same test set you did)
>
> Full scan, 3.4:
> real 0m5.99s
> user 0m4.73s
> sys  0m0.84s
>
> Using ORDER/LIMIT, 3.4:
> real 0m0.00s
> user 0m0.01s
> sys  0m0.00s
>
> Using min/max, 3.4:
> real 0m5.97s
> user 0m2.94s
> sys  0m0.38s
>
> In this case, it is clear that min/max is NOT integrated into the
> optimizer, and requires half a table scan, just as I stated.
>
> I also have a build of the current 3.6.16 around, and in that case,
> the numbers are better:
>
> Using ORDER/LIMIT, 3.6.16:
> real 0m0.12s
> user 0m0.01s
> sys  0m0.03s
>
> Using min/max, 3.6.16:
> real 0m0.04s
> user 0m0.01s
> sys  0m0.03s
>
> This clearly shows that the optimization does exist, and that for
> this very basic case my assumptions were incorrect.
>
> With the current 3.6.16 build, using min/max seems a tad faster -- but
> only in "real" time. In terms of user/sys times, the results shown
> here (and your own numbers, which were 0.043/0.001/0.005 and
> 0.005/0.001/0.001) were pretty typical (i.e. very, very close).
> That might just be an I/O fluke. We're getting small enough that
> to really say anything definite requires better profiling. So
> there does appear to be a difference, but it is pretty small and
> unclear where it is coming from.
>
> However, I'd point out that using ORDER/LIMIT under 3.4 is the
> fastest of all. This isn't just a quirk of one run, either.
> I ran these several times and the 3.4 ORDER/LIMIT was always fastest.
> We're still playing with numbers too small to really trust, but it
> seems that if the 3.6.16 ORDER/LIMIT was as fast as the one in 3.4,
> it would likely be the best choice of all.
>
> So you've sold me that the current version of SQLite clearly does
> have the min/max optimization and doesn't require a table scan. It
> also appears to be slightly faster, but not by a big enough gap to
> clearly consider it a better choice on that alone.
>
> Personally, I'd still go with ORDER/LIMIT. Wi
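Bogdan's bracketing idea can be sketched end to end. This is an illustrative sketch in Python's sqlite3 module, not code from the thread; the table and column names (time_pair, ticks, time) follow Bogdan's description, and the edge cases simply raise instead of extrapolating.

```python
import sqlite3

def ticks_to_time(conn, t):
    # Nearest (ticks, time) pair at or below t, and nearest at or above t.
    lo = conn.execute('SELECT ticks, time FROM time_pair WHERE ticks <= ? '
                      'ORDER BY ticks DESC LIMIT 1', (t,)).fetchone()
    hi = conn.execute('SELECT ticks, time FROM time_pair WHERE ticks >= ? '
                      'ORDER BY ticks ASC LIMIT 1', (t,)).fetchone()
    if lo is None or hi is None:
        raise ValueError('t outside table range: extrapolation needed')
    if lo[0] == hi[0]:                      # exact hit
        return lo[1]
    frac = (t - lo[0]) / (hi[0] - lo[0])    # linear interpolation
    return lo[1] + frac * (hi[1] - lo[1])

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE time_pair (ticks INTEGER UNIQUE, time REAL)')
conn.executemany('INSERT INTO time_pair VALUES (?, ?)',
                 [(0, 0.0), (1000, 1.0), (2000, 2.0)])
print(ticks_to_time(conn, 500))   # -> 0.5
```

The UNIQUE constraint gives an implicit index on ticks, so both bracketing queries are satisfied directly from the index by the ORDER/LIMIT plan discussed above.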
Re: [sqlite] range enclosing a number
Not to continue the argument with Jay, but just to express my opinion in comparison:

> The ORDER/LIMIT approach is much more resilient to changes, however,
> and should more or less behave the same no matter what you do to the
> rest of the query.

Seriously, I don't believe this. There's no way to change the query so that min/max will scan the whole table while order/limit takes the value straight from the index. They will both take the same approach in all cases (except, of course, old SQLite, which I fortunately didn't work with :) ).

> The ORDER/LIMIT approach is, arguably, less graceful, but
> it is also (IMHO) a lot easier to break down into logical blocks that
> a newbie can follow and understand both the design and intent -- even
> if the query is a bit strung out.

This is where my opinion is exactly the opposite: a query with min/max is more readable by a newbie just getting introduced to the code - it clearly states what the requester is trying to do: "get the minimum among records with this condition". The order/limit approach requires more thinking before you clearly understand the intention behind the query - I've struggled myself trying to understand which sign (< or >) should go with which order by (desc or asc). But these are just opinions, and they have already become off-topic, because the OP's case looks more complicated and couldn't be solved by a simple min/max.

And in response to Bogdan's letter:

> Adding the second column as in: select max(ticks),time from time_pair where
> ticks <= ?;
> seems to work, although I don't understand the GROUP BY comments.

The essence of the "GROUP BY comments" is that if you write the query this way, then what is returned in the time column is implementation-dependent (not all SQL engines even support this syntax), and you had better assume that the value in the time column is completely arbitrary and unrelated to the actual data in the table. So if you want to return both columns you should use either the order/limit approach or the already mentioned "select * ...
where ticks = (select max(ticks) ...)" approach.

Pavel

On Tue, Jul 14, 2009 at 12:35 AM, Jay A. Kreibich wrote:
> On Mon, Jul 13, 2009 at 10:33:00PM -0400, Pavel Ivanov scratched on the wall:
>> Jay, you're pretty much mistaken:
>>
>>> I'm pretty sure you don't want to do it this way. What this does is
>>> gather every row that meets the WHERE condition and then runs a max()
>>> or min() aggregation function across all of those rows. That means
>>> that even if the column "number" has an index on it, between these
>>> two statements you're going to end up scanning the whole table.
>>
>> All database engines optimize queries which ask for min/max on indexed
>> column with condition including only <, > or = on this very column.
>> And SQLite is among these too:
>
> Not "all." This type of optimization is actually a fairly new
> addition to SQLite (considering the product lifetime) and first
> appeared in 3.5.5, which was released in early 2008.
>
> And I'd still go with ORDER/LIMIT. Here's why:
>
> For my tests I just used the default build under the current version
> of Mac OS X, which is a bit old (3.4). Under that build, the
> ORDER/LIMIT is clearly faster, as this is before the optimization
> existed:
>
> (using the same test set you did)
>
> Full scan, 3.4:
> real 0m5.99s
> user 0m4.73s
> sys  0m0.84s
>
> Using ORDER/LIMIT, 3.4:
> real 0m0.00s
> user 0m0.01s
> sys  0m0.00s
>
> Using min/max, 3.4:
> real 0m5.97s
> user 0m2.94s
> sys  0m0.38s
>
> In this case, it is clear that min/max is NOT integrated into the
> optimizer, and requires half a table scan, just as I stated.
>
> I also have a build of the current 3.6.16 around, and in that case,
> the numbers are better:
>
> Using ORDER/LIMIT, 3.6.16:
> real 0m0.12s
> user 0m0.01s
> sys  0m0.03s
>
> Using min/max, 3.6.16:
> real 0m0.04s
> user 0m0.01s
> sys  0m0.03s
>
> This clearly shows that the optimization does exist, and that for
> this very basic case my assumptions were incorrect.
>
> With the current 3.6.16 build, using min/max seems a tad faster -- but
> only in "real" time. In terms of user/sys times, the results shown
> here (and your own numbers, which were 0.043/0.001/0.005 and
> 0.005/0.001/0.001) were pretty typical (i.e. very, very close).
> That might just be an I/O fluke. We're getting small enough that
> to really say anything definite requires better profiling. So
> there does appear to be a difference, but it is pretty small and
> unclear where it is coming from.
>
> However, I'd point out that using ORDER/LIMIT under 3.4 is the
> fastest of all. This isn't just a quirk of one run, either.
> I ran these several times and the 3.4 ORDER/LIMIT was always fastest.
> We're still playing with numbers too small to really trust, but it
> seems that if the 3.6.16 ORDER/LIMIT was as fast as the one in 3.4,
> i
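Pavel's recommendation to pin the row down with a subquery, rather than relying on which row a bare column rides along with a max(), can be sketched in Python's sqlite3 module (the table name follows the thread; the data is invented for the demo).

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE time_pair (ticks INTEGER UNIQUE, time REAL)')
conn.executemany('INSERT INTO time_pair VALUES (?, ?)',
                 [(10, 1.0), (20, 2.0), (30, 3.0)])

# Portable form: the subquery finds the boundary tick, the outer query
# returns the whole row holding it -- no bare-column-with-aggregate quirk.
row = conn.execute(
    'SELECT ticks, time FROM time_pair '
    'WHERE ticks = (SELECT max(ticks) FROM time_pair WHERE ticks <= ?)',
    (25,)).fetchone()
print(row)   # -> (20, 2.0)
```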
[sqlite] how to access tables in multiple sqlite databases
Hi guys,

I have multiple sqlite databases in hand, and I would like to perform the following tasks:

. Two tables with the same schema are in two dbs, and I want to apply a sql query to the two tables efficiently (merge them into one table? Merge cost is also counted as part of the total cost. Is there a way to logically combine these two tables into one? Other mechanisms?)

. Two tables with different schemas are in two dbs, and I want to apply a join query to the two tables efficiently (if I can logically have them in one db, that would be better.)

Thanks,
Zhenyu

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] how to access tables in multiple sqlite databases
Hi,

see http://www.sqlite.org/lang_attach.html

You attach a second database, giving it a name, and prepend this name to the table name.

Martin

Zhenyu Guo schrieb:
> Hi guys,
>
> I have multiple sqlite databases in hand, and I would like to perform the
> following several tasks:
>
> . two tables with the same schema are in two dbs, and I want to apply a sql
> query to the two tables efficiently (merge them into one table? Merge cost is
> also considered as the total cost. Is there a way to logically combine these
> two tables into one? Other mechanisms?)
>
> . two tables with different schemas are in two dbs, and I want to apply a join
> query to the two tables efficiently (if I can logically have them in one db,
> that will be better.)
>
> Thanks,
> Zhenyu
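Martin's ATTACH suggestion looks like this in practice. A self-contained sketch in Python's sqlite3 module; the file names, table names, and sample rows are invented for the demo.

```python
import sqlite3, os, tempfile

# Two separate database files, each with one table.
d = tempfile.mkdtemp()
a_path = os.path.join(d, 'a.db')
b_path = os.path.join(d, 'b.db')

a = sqlite3.connect(a_path)
a.execute('CREATE TABLE t (id INTEGER, name TEXT)')
a.execute("INSERT INTO t VALUES (1, 'alice')")
a.commit(); a.close()

b = sqlite3.connect(b_path)
b.execute('CREATE TABLE u (id INTEGER, city TEXT)')
b.execute("INSERT INTO u VALUES (1, 'paris')")
b.commit(); b.close()

# Open one file, ATTACH the other under a name, and join across both.
conn = sqlite3.connect(a_path)
conn.execute('ATTACH DATABASE ? AS other', (b_path,))
row = conn.execute('SELECT t.name, u.city FROM t '
                   'JOIN other.u AS u ON t.id = u.id').fetchone()
print(row)   # -> ('alice', 'paris')
```

For the "same schema in two dbs" case, a plain `SELECT ... FROM main.t UNION ALL SELECT ... FROM other.t` gives a logical merge without copying any rows.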
Re: [sqlite] Reading year from YYYY-MM-DD column?
On Tue, 30 Jun 2009 07:54:13 -0500, P Kishor wrote:
>sqlite> SELECT strftime('%Y', '1999-12-03') AS year;

Thanks guys.
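The quoted one-liner, wrapped in Python's sqlite3 module for completeness (note strftime() returns text, so cast if a number is needed):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
year = conn.execute(
    "SELECT strftime('%Y', '1999-12-03') AS year").fetchone()[0]
print(year)   # -> '1999' (a string; use CAST(... AS INTEGER) for a number)
```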
[sqlite] Announce of the new "Versioning" extension
Hello!

This may be used for table versioning and replication. Source code is available here:
http://mobigroup.ru/files/sqlite-ext/

You can get from the debian repository the SQLite build with some extra extensions:

deb http://mobigroup.ru/debian/ lenny main contrib non-free
deb-src http://mobigroup.ru/debian/ lenny main contrib non-free

== Test script ==

CREATE TABLE sessions (
  key text not null,
  value text not null
);
insert into sessions (key,value) values ('test key 0','test value 0');
select versioning_table('sessions');
.schema
select * from _versioning_sessions;
insert into sessions (key,value) values ('test key 1','test value 1');
insert into sessions (key,value) values ('test key 2','test value 2');
select * from _versioning_sessions;
delete from sessions;
select * from _versioning_sessions;
select unversioning_table('sessions');
.schema

== Test script results ==

sqlite> CREATE TABLE sessions (
   ...> key text not null,
   ...> value text not null
   ...> );
sqlite> insert into sessions (key,value) values ('test key 0','test value 0');
sqlite> select versioning_table('sessions');
sqlite> .schema
CREATE TABLE _undo(sql TEXT, status TEXT);
CREATE TABLE _versioning_sessions(key text,value text, _date REAL, _action TEXT, _rowid INTEGER);
CREATE TABLE sessions (
  key text not null,
  value text not null
);
CREATE TRIGGER _versioning_sessions_d AFTER DELETE ON sessions BEGIN
  INSERT INTO _versioning_sessions (_date, _action, _rowid)
  values (julianday('now'), 'D', old.rowid);
END;
CREATE TRIGGER _versioning_sessions_i AFTER INSERT ON sessions BEGIN
  INSERT INTO _versioning_sessions
  SELECT *, julianday('now') as _date, 'I' as _action, new.rowid as _rowid
  FROM sessions WHERE rowid=new.rowid;
END;
CREATE TRIGGER _versioning_sessions_u AFTER UPDATE ON sessions BEGIN
  INSERT INTO _versioning_sessions
  SELECT *, julianday('now') as _date, 'U' as _action, new.rowid as _rowid
  FROM sessions WHERE rowid=new.rowid;
END;
sqlite> select * from _versioning_sessions;
test key 0|test value 0|2455027.00753221|I|1
sqlite> insert into sessions (key,value) values ('test key 1','test value 1');
sqlite> insert into sessions (key,value) values ('test key 2','test value 2');
sqlite> select * from _versioning_sessions;
test key 0|test value 0|2455027.00753221|I|1
test key 1|test value 1|2455027.00753347|I|2
test key 2|test value 2|2455027.00753368|I|3
sqlite> delete from sessions;
sqlite> select * from _versioning_sessions;
test key 0|test value 0|2455027.00753221|I|1
test key 1|test value 1|2455027.00753347|I|2
test key 2|test value 2|2455027.00753368|I|3
||2455027.00753382|D|1
||2455027.00753382|D|2
||2455027.00753382|D|3
sqlite> select unversioning_table('sessions');
sqlite> .schema
CREATE TABLE _undo(sql TEXT, status TEXT);
CREATE TABLE _versioning_sessions(key text,value text, _date REAL, _action TEXT, _rowid INTEGER);
CREATE TABLE sessions (
  key text not null,
  value text not null
);
sqlite> select * from _versioning_sessions;
test key 0|test value 0|2455027.00753221|I|1
test key 1|test value 1|2455027.00753347|I|2
test key 2|test value 2|2455027.00753368|I|3
||2455027.00753382|D|1
||2455027.00753382|D|2
||2455027.00753382|D|3

Best regards, Alexey Pechnikov.
http://pechnikov.tel/
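The .schema dump above shows that the logging itself is done by plain triggers, so the audit-trail part of the extension can be approximated without it. A sketch creating the insert and delete triggers by hand (the _undo table and the versioning_table()/unversioning_table() functions belong to the extension itself and are not reproduced here):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
CREATE TABLE sessions (key TEXT NOT NULL, value TEXT NOT NULL);
CREATE TABLE _versioning_sessions
  (key TEXT, value TEXT, _date REAL, _action TEXT, _rowid INTEGER);
-- Audit triggers, taken from the extension's .schema output.
CREATE TRIGGER _versioning_sessions_i AFTER INSERT ON sessions BEGIN
  INSERT INTO _versioning_sessions
    SELECT *, julianday('now'), 'I', new.rowid
    FROM sessions WHERE rowid = new.rowid;
END;
CREATE TRIGGER _versioning_sessions_d AFTER DELETE ON sessions BEGIN
  INSERT INTO _versioning_sessions (_date, _action, _rowid)
    VALUES (julianday('now'), 'D', old.rowid);
END;
''')
conn.execute("INSERT INTO sessions VALUES ('test key 0', 'test value 0')")
conn.execute('DELETE FROM sessions')
log = conn.execute('SELECT _action, _rowid FROM _versioning_sessions '
                   'ORDER BY rowid').fetchall()
print(log)   # -> [('I', 1), ('D', 1)]
```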
Re: [sqlite] New Book Available
Hello!

On Thursday 09 July 2009 17:50:14 cmar...@unc.edu wrote:
> What is the title? I am not finding a new book by van der
> Laan on Google or Amazon.

As Rick wrote to me: "The book is finished and available through the publisher Lulu.com. It will be available through Amazon and so on within 6 to 8 weeks. Amazon is a little slow."

Now the book is available here:
http://www.lulu.com/preview/paperback-book/the-sql-guide-to-sqlite/7251432

Best regards, Alexey Pechnikov.
http://pechnikov.tel/
Re: [sqlite] [Columns] Keeping internal + beautified names?
On Sat, 11 Jul 2009 09:38:27 -0700, Jim Dodgen wrote:
>I would just use:
>
>SELECT id AS Identification FROM foobar

Thanks. This is what I already use for displaying the results, but then I'm stuck when I need to perform INSERTs/UPDATEs, because I need to get the actual column names :-/

It'd be cool if each column in a table could have an internal name and an external name.

Is there a better way than either using the very first row in each table or keeping a table for this purpose?
Re: [sqlite] [Columns] Keeping internal + beautified names?
For overall performance and efficiency, I recommend you keep the "pretty" in the GUI, where such things are traditionally implemented. Pick two of three: "COOL, Fast, Tight."

Fred

-----Original Message-----
From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Gilles Ganault
Sent: Tuesday, July 14, 2009 11:02 AM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] [Columns] Keeping internal + beautified names?

On Sat, 11 Jul 2009 09:38:27 -0700, Jim Dodgen wrote:
>I would just use:
>
>SELECT id AS Identification FROM foobar

Thanks. This is what I already use for displaying the results, but then I'm stuck when I need to perform INSERT/UPDATES because I need to get the actual columns names :-/

It'd be cool if each column in a table could have an internal name and an external name.

Is there a better way than either using the very first row in each table or keeping a table for this purpose?
[sqlite] Optimizing insert or replace speed
Good day,

Could someone explain where I'm going wrong with this? I've identified the following query as a bottleneck in a utility I've written:

insert or replace into main.masterlist
select * from delta.masterlist d
where d.write_out_ok=0
  and d.sn not in (select M.sn from main.masterlist M
                   where M.record_updatetime > d.record_updatetime)

The purpose is to import data from a remotely created change file, with only new/newer records. (Due to the fact that the subject of the data is shipping/receiving product serial numbers, and that data moves faster than product, there is no way independent nodes can create a change to a record at the same time. Also, deleting is not allowed.)

The change file is attached as 'delta'.

The structure of masterlist in the main database is:

sqlite> .schema masterlist
CREATE TABLE MasterList (SN int primary key not null, TypeID int default 0 references Product_type_dictionary(TypeID), ConstructionDate text, MFGID int default 0 references MFG_dictionary (MFGID), Init_cust_ship_date text, record_updatetime text default "2000.00.00.00", write_out_ok int default 0);
CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, SN);

main.masterlist has 36,000 records
delta.masterlist has 9,000 records

Notes about fields:
write_out_ok is a flag indicating that the record has been imported. States are 1 or 0.
MFGID is a manufacturer; about 4 different ints can be used.
TypeID is a product type; about 7 different types.
The index is ordered by cardinality, and all int.
record_updatetime is the modified date & time GMT (UTC), .mm.dd.hh.MM.ss

Experimenting with indexes on the delta file:

No indexes: 7 min 22 s
CREATE INDEX IDX_MasterList on MasterList (SN); 14 min 52 s
CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, SN); 20 min 07 s
Dropped indexes on both main and delta: ~20 min

Is the real problem a poor choice of index in main?

regards,
Adam
Re: [sqlite] [Columns] Keeping internal + beautified names?
"Gilles Ganault" schrieb im Newsbeitrag news:0sap559atknanmiv8g83pdj6f83e8ve...@4ax.com...
> On Sat, 11 Jul 2009 09:38:27 -0700, Jim Dodgen wrote:
> >I would just use:
> >
> >SELECT id AS Identification FROM foobar
>
> Thanks. This is what I already use for displaying the results, but
> then I'm stuck when I need to perform INSERT/UPDATES because I need to
> get the actual columns names :-/
>
> It'd be cool if each column in a table could have an internal name and
> an external name.
>
> Is there a better way than either using the very first row in each
> table or keeping a table for this purpose?

There are SQLite APIs for that purpose, which hand out the original column name to you:
  sqlite3_column_origin_name
or the original table name (useful in joins):
  sqlite3_column_table_name
or the original database name:
  sqlite3_column_database_name

All are based on the current statement handle of your select. You need a library compile with SQLITE_ENABLE_COLUMN_METADATA enabled. If that switch is on, you can also make use of sqlite3_table_column_metadata in addition, to retrieve even more info about the column in question (whether it is a NOT NULL column, a primary-key column, etc.).

You can do all that (looping over the appropriate column count of your select) before entering the sqlite-step loop, and return these additional "header infos" in appropriate structures (together with your result data set). The overhead of retrieving all these additional infos is not all that large - barely noticeable. The Recordset objects of my wrapper always work in that mode (delivering all of the data, but also these additional field infos), and it does not affect the overall performance - much more time is spent in the step loop, which retrieves the real record-data content.

Olaf
Re: [sqlite] Optimizing insert or replace speed
I believe your choice of query is not good enough. Try this one:

insert or replace into main.masterlist
select d.*
from delta.masterlist d left outer join main.masterlist M on d.sn = M.sn
where d.write_out_ok=0
  and d.record_updatetime >= ifnull(M.record_updatetime, '')

Pavel

On Tue, Jul 14, 2009 at 12:54 PM, Adam DeVita wrote:
> Good day,
>
> Could someone explain where I'm going wrong with this?
>
> I've identified the following query as a bottleneck in a utility I've
> written.
>
> insert or replace into main.masterlist select * from delta.masterlist d
> where d.write_out_ok=0 and
> d.sn not in (select M.sn from main.masterlist M where M.record_updatetime >
> d.record_updatetime)
>
> The purpose is to import data from a remotely created change file, with
> only new/newer records. (Due to the fact that the subject of the data is
> shipping/receiving product serial numbers and that data moves faster than
> product there is no way independent nodes can create a change to a record at
> the same time. Also, deleting is not allowed.)
>
> The change file is attached as 'delta'.
>
> The structure of masterlist in the main database is:
> sqlite> .schema masterlist
> CREATE TABLE MasterList (SN int primary key not null, TypeID int default 0
> references Product_type_dictionary(TypeID), ConstructionDate text, MFGID
> int default 0 references MFG_dictionary (MFGID), Init_cust_ship_date text,
> record_updatetime text default "2000.00.00.00", write_out_ok int default 0);
> CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, SN);
>
> main.masterlist has 36,000 records
> delta.masterlist has 9,000 records
>
> Notes about fields:
> write_out_ok is a flag indicating that the record has been imported. States
> are 1 or 0.
> MFGID is a manufacturer, about 4 different ints can be used.
> TypeID is a product type, about 7 different types.
> The index is ordered by cardinality, and all int.
> record_updatetime is the modified date & time GMT (UTC), .mm.dd.hh.MM.ss
>
> Experimenting with indexes on the delta file:
> No indexes: 7 min 22 s
> CREATE INDEX IDX_MasterList on MasterList (SN); 14 min 52 s
> CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, SN); 20 min 07 s
> Dropped indexes on both main and delta: ~20 min
>
> Is the real problem a poor choice of index in main?
>
> regards,
> Adam
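Pavel's LEFT OUTER JOIN rewrite can be tried on a toy version of the two tables. The schema below is cut down to the columns the query touches, and the attached-database setup is replaced by two in-memory tables with invented names, so this is a sketch of the query's behavior rather than the real import.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
CREATE TABLE main_masterlist
  (sn INTEGER PRIMARY KEY, record_updatetime TEXT, write_out_ok INTEGER);
CREATE TABLE delta_masterlist
  (sn INTEGER PRIMARY KEY, record_updatetime TEXT, write_out_ok INTEGER);
INSERT INTO main_masterlist  VALUES (1, '2009.07.01', 1);
INSERT INTO delta_masterlist VALUES (1, '2009.07.10', 0);  -- newer version
INSERT INTO delta_masterlist VALUES (2, '2009.07.05', 0);  -- brand new sn
''')

# Pavel's rewrite: one join pass instead of a correlated NOT IN subquery.
conn.execute('''
INSERT OR REPLACE INTO main_masterlist
SELECT d.* FROM delta_masterlist d
LEFT OUTER JOIN main_masterlist M ON d.sn = M.sn
WHERE d.write_out_ok = 0
  AND d.record_updatetime >= ifnull(M.record_updatetime, '')
''')
rows = conn.execute('SELECT sn, record_updatetime FROM main_masterlist '
                    'ORDER BY sn').fetchall()
print(rows)   # -> [(1, '2009.07.10'), (2, '2009.07.05')]
```

The ifnull(..., '') handles rows with no match in main: the outer join yields NULL for M.record_updatetime there, and any timestamp compares >= the empty string, so brand-new serial numbers are imported too.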
Re: [sqlite] range enclosing a number
On Tue, Jul 14, 2009 at 07:32:25AM -0400, Pavel Ivanov scratched on the wall: > Not to continue argument with Jay but just to express my opinion in > comparison: > > > The ORDER/LIMIT approach is much more resilient to changes, however, > > and should more or less behave the same no matter what you do to the > > rest of the query. > > Seriously, I don't believe this. There's no way to change the query so > that min/max will scan all table and order/limit will take just value > from index. They both will take the same approach in all cases (except > of course old SQLite which I fortunately didn't work with :) ). You clearly have experience with more robust databases, where I would expect this statement to be true. SQLite has a long history of keeping the engine as simple and small as possible, however. Part of that was a strong use of standard user-defined functions for the built-ins, like min/max/count. For years stuff like min/max used the same API as user-defined functions, and as such couldn't be touched by the optimizer. Even now, with the integration, it isn't that strong. Perhaps in other products, but not here. In specific, here's the comment in the code of the optimizer that attempts to short-cut a min/max: /* ** Analyze the SELECT statement passed as an argument to see if it ** is a min() or max() query. Return WHERE_ORDERBY_MIN or WHERE_ORDERBY_MAX if ** it is, or 0 otherwise. At present, a query is considered to be ** a min()/max() query if: ** ** 1. There is a single object in the FROM clause. ** ** 2. There is a single expression in the result set, and it is ** either min(x) or max(x), where x is a column reference. */ So, a simple JOIN (perhaps used as a filtering semijoin) will kill the min/max approach, but still leave the use of the index available for something like the ORDER/LIMIT approach. 
I'm not actually sure what SQLite will do if you add a JOIN, so I'm not saying I know the ORDER/LIMIT is faster, but this seems to say that min/max will go to a scan. Also, as pointed out, if x becomes an expression or computed column (even one that only accesses the indexed column), rather than a direct table reference, all bets are off with the SQLite optimizer. Sure, it is possible to write an optimizer that can deal with these situations, but that's not what we're dealing with here. We'd be having a different conversation if we were talking about a large-scale data-center product. I'll also admit I'm not sure these situations are extremely likely, but they're still valid examples, and I like to program defensively. > > The ORDER/LIMIT approach is, arguable, less graceful, but > > it is also (IMHO) a lot easier to break down into logical blocks that > > a newbie can follow and understand both the design and intent-- even > > if the query is a bit strung out. > > This is where my opinion is exactly opposite: query with min/max is > more readable by newbie just getting introduced to the code - it > clearly states what requester is trying to do: "get minimum among > records with this condition". OK, I'll buy that for someone that is just trying to figure out what it does without thinking about it very hard, the min/max does "read" better. But for someone trying to figure out how it works, and maybe change it, I still disagree. Most people think of the SELECT clause as a column-wise operator... that's where you define the vertical "shape" of your result. If you want to pick out a specific row, they think WHERE. Now clearly GROUP BY mixes this up, and allows you to alter the row configuration using the SELECT heading, but my own experience teaching people SQL has shown me that the two most confusing concepts for an SQL newbee are JOINs and GROUP BYs (see OP's response), and this depends on a GROUP BY that isn't even there (although at least it is a simple one!). 
Further, from a code-maintenance standpoint, I'd be seriously afraid someone would look at this, think "Oh, it's obvious this picks out the _row_ with the max value," and do something like add a second column to the query (like the OP needs to do). The only problem is that's not how that query works. Even worse, as we've already talked about, if you do add a column the query still runs, still works, still returns a value-- it just happens to be the wrong value. Unless you've got a strong understanding of how GROUP BY works (see the above note about learning SQL), someone is going to spend hours trying to debug that one.

From a set-wise, higher-thinking approach I agree that min/max is cleaner and more concise, and has the advantage of also running fast IF you have a good optimizer. But I also consider it a "clever" piece of code. ORDER/LIMIT is bigger, and has more bits that you need to piece together in your mind to see the big picture, but the individual bits are, in my mind, more isolated and simpler.
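For the OP's interpolation problem, the row-wise reading is exactly what ORDER/LIMIT buys you: each query returns a complete (ticks, time) row, so there is no aggregate/bare-column interaction to reason about. A minimal sketch in Python's sqlite3 (table shape from the thread, data and helper function my own; the extrapolation the OP mentioned for out-of-range arguments is omitted):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE time_pair (ticks INTEGER UNIQUE, time REAL UNIQUE)")
conn.execute("CREATE INDEX idx_ticks ON time_pair(ticks)")
# sample calibration points: 0..10000 ticks mapping to 0.0..10.0 seconds
conn.executemany("INSERT INTO time_pair VALUES (?, ?)",
                 [(t, t / 1000.0) for t in range(0, 10001, 1000)])

def interpolate(ticks):
    # Bracket the argument with one full row on each side -- no
    # aggregate, no hidden GROUP BY semantics to trip over.
    lo = conn.execute("SELECT ticks, time FROM time_pair WHERE ticks <= ? "
                      "ORDER BY ticks DESC LIMIT 1", (ticks,)).fetchone()
    hi = conn.execute("SELECT ticks, time FROM time_pair WHERE ticks >= ? "
                      "ORDER BY ticks ASC LIMIT 1", (ticks,)).fetchone()
    if lo == hi:          # exact hit on a calibration point
        return lo[1]
    (t0, v0), (t1, v1) = lo, hi
    return v0 + (v1 - v0) * (ticks - t0) / (t1 - t0)

print(interpolate(1500))  # 1.5
```

The symmetric ticks-to-time and time-to-ticks lookups the OP needs are the same two queries with the column roles swapped.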
Re: [sqlite] Optimizing insert or replace speed
Awesome, brilliant, and decisive! New times:

No index on delta file: 3 seconds.
Index on SN: 4 seconds.
Index on MasterList (write_out_ok, MFGID, TypeID, SN): 4 seconds.

The speedup of the one query is greater than this, because the above time figures include:
1) A query to see if there are any records in deltas with write_out_ok=0 (if so, don't execute other queries)
2) A query to update write_out_ok=1 in delta where the record in main exists and is newer.
3) A query to update write_out_ok=1 in main where the record came from delta.

1, 2, & 3 were negligible compared to the un-optimized

insert or replace into TargetD
select * from sourceD sa
where sa.write_out_ok=0
  and sa.sn not in (select ta.sn from TargetD ta
                    where ta.record_updatetime > sa.record_updatetime);

Now it appears that the time is comparable, so the actual time is on the order of 2 seconds faster than listed above. Dropping the sequence time from 7 min 22 s down to 4 seconds is tremendous. Thank you.

Adam

On Tue, Jul 14, 2009 at 2:04 PM, Pavel Ivanov wrote:
> I believe your choice of query is not good enough. Try this one:
>
> insert or replace into main.masterlist
> select d.*
> from delta.masterlist d left outer join main.masterlist M on d.sn = M.sn
> where d.write_out_ok=0
>   and d.record_updatetime >= ifnull(M.record_updatetime, '')
>
> Pavel
>
> On Tue, Jul 14, 2009 at 12:54 PM, Adam DeVita wrote:
> > Good day,
> >
> > Could someone explain where I'm going wrong with this?
> >
> > I've identified the following query as a bottleneck in a utility I've
> > written.
> >
> > insert or replace into main.masterlist
> > select * from delta.masterlist d
> > where d.write_out_ok=0
> >   and d.sn not in (select M.sn from main.masterlist M
> >                    where M.record_updatetime > d.record_updatetime)
> >
> > The purpose is to import data from a remotely created change file, with
> > only new/newer records.
> > (Due to the fact that the subject of the data is shipping/receiving
> > product serial numbers, and that data moves faster than product, there
> > is no way independent nodes can create a change to a record at the
> > same time. Also, deleting is not allowed.)
> >
> > The change file is attached as 'delta'.
> >
> > The structure of masterlist in the main database is:
> >
> > sqlite> .schema masterlist
> > CREATE TABLE MasterList (SN int primary key not null,
> >   TypeID int default 0 references Product_type_dictionary(TypeID),
> >   ConstructionDate text,
> >   MFGID int default 0 references MFG_dictionary(MFGID),
> >   Init_cust_ship_date text,
> >   record_updatetime text default "2000.00.00.00",
> >   write_out_ok int default 0);
> >
> > CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, SN);
> >
> > main.masterlist has 36,000 records
> > deltas.masterlist has 9,000 records
> >
> > Notes about fields:
> > write_out_ok is a flag indicating that the record has been imported. States are 1 or 0.
> > MFGID is a manufacturer; about 4 different ints can be used.
> > TypeID is a product type; about 7 different types.
> > The index is ordered by cardinality, and all int.
> > record_updatetime is the modified date & time GMT (UTC),
> > .mm.dd.hh.MM.ss
> >
> > Experimenting with indexes on the delta file:
> >
> > No indexes:
> > 7 min 22 s
> >
> > CREATE INDEX IDX_MasterList on MasterList (SN);
> > 14 min 52 s
> >
> > CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, SN);
> > 20 min 07 s
> >
> > Dropped indexes on both main and delta:
> > ~20 min
> >
> > Is the real problem a poor choice of index in main?
> > regards,
> > Adam

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

--
VerifEye Technologies Inc. 905-948-0015 x245
7100 Warden Ave, Unit 3, Markham ON, L3R 8B5, Canada
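The shape of Pavel's rewrite can be sketched with simplified tables (a single in-memory database instead of the thread's attached main/delta databases, and the column set reduced to the ones the query touches; names are my own). The key change is replacing the correlated NOT IN subquery, which re-probes main for every delta row, with a single LEFT OUTER JOIN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_list (sn INTEGER PRIMARY KEY, record_updatetime TEXT,
                            write_out_ok INTEGER DEFAULT 0);
    CREATE TABLE delta_list (sn INTEGER PRIMARY KEY, record_updatetime TEXT,
                             write_out_ok INTEGER DEFAULT 0);
""")
conn.executemany("INSERT INTO main_list VALUES (?, ?, 0)",
                 [(1, "2009.07.01"), (2, "2009.07.10")])
conn.executemany("INSERT INTO delta_list VALUES (?, ?, 0)",
                 [(1, "2009.07.05"),    # newer than main -> should replace
                  (2, "2009.07.02"),    # older than main -> should be skipped
                  (3, "2009.07.03")])   # not in main     -> should be inserted

# Pavel's rewrite: ifnull(m.record_updatetime, '') makes brand-new
# records (no match in main, so m.* is NULL) always pass the date test.
conn.execute("""
    INSERT OR REPLACE INTO main_list
    SELECT d.* FROM delta_list d
    LEFT OUTER JOIN main_list m ON d.sn = m.sn
    WHERE d.write_out_ok = 0
      AND d.record_updatetime >= ifnull(m.record_updatetime, '')
""")

result = conn.execute(
    "SELECT sn, record_updatetime FROM main_list ORDER BY sn").fetchall()
print(result)
# [(1, '2009.07.05'), (2, '2009.07.10'), (3, '2009.07.03')]
```

This depends on the fixed-width `.mm.dd...` timestamp format described in the thread, which makes lexicographic string comparison agree with chronological order.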
[sqlite] Database corrupt.
I have many cases of database corruption. What might this depend on? Could it be someone using SQLite in a multi-threaded environment?

--
Denis Gottardello
Sintesi S.r.l.
Re: [sqlite] Announce of the new "Versioning" extension
Hi Alexey, Thank you for this extension which could be quite interesting to many SQLite users. Is there any documentation on this available, possibly in your new book? I couldn't find any with the source at http://mobigroup.ru/files/sqlite-ext/ Also you have used the GNU License which means we cannot use this in any commercial applications. It also goes against the Public Domain license used by SQLite itself - see http://www.sqlite.org/copyright.html It would be great if this could be changed. Tuesday, July 14, 2009, 10:17:18 PM, you wrote: AP> Hello! AP> This may be used for table versioning and replication. AP> Source code is available here AP> http://mobigroup.ru/files/sqlite-ext/ AP> You can get from the debian repository the SQLite build with some extra extensions: AP> deb http://mobigroup.ru/debian/ lenny main contrib non-free AP> deb-src http://mobigroup.ru/debian/ lenny main contrib non-free AP> =Test script== AP> CREATE TABLE sessions ( AP> key text not null, AP> value text not null AP> ); AP> insert into sessions (key,value) values ('test key 0','test value 0'); AP> select versioning_table('sessions'); AP> .schema AP> select * from _versioning_sessions; AP> insert into sessions (key,value) values ('test key 1','test value 1'); AP> insert into sessions (key,value) values ('test key 2','test value 2'); AP> select * from _versioning_sessions; AP> delete from sessions; AP> select * from _versioning_sessions; AP> select unversioning_table('sessions'); AP> .schema AP> ==Test script results= sqlite>> CREATE TABLE sessions ( AP>...> key text not null, AP>...> value text not null AP>...> ); sqlite>> insert into sessions (key,value) values ('test key 0','test value 0'); sqlite>> sqlite>> select versioning_table('sessions'); sqlite>> .schema AP> CREATE TABLE _undo(sql TEXT, status TEXT); AP> CREATE TABLE _versioning_sessions(key text,value text, _date AP> REAL, _action TEXT, _rowid INTEGER); AP> CREATE TABLE sessions ( AP> key text not null, AP> value text not null 
AP> ); AP> CREATE TRIGGER _versioning_sessions_d AFTER DELETE ON sessions AP> BEGIN INSERT INTO _versioning_sessions (_date, _action, _rowid) AP> values (julianday('now'), 'D', old.rowid);END; AP> CREATE TRIGGER _versioning_sessions_i AFTER INSERT ON sessions AP> BEGIN INSERT INTO _versioning_sessions SELECT *, julianday('now') AP> as _date, 'I' as _action, new.rowid as _rowid FROM sessions WHERE rowid=new.rowid;END; AP> CREATE TRIGGER _versioning_sessions_u AFTER UPDATE ON sessions AP> BEGIN INSERT INTO _versioning_sessions SELECT *, julianday('now') AP> as _date, 'U' as _action, new.rowid as _rowid FROM sessions WHERE rowid=new.rowid;END; sqlite>> select * from _versioning_sessions; AP> test key 0|test value 0|2455027.00753221|I|1 sqlite>> sqlite>> insert into sessions (key,value) values ('test key 1','test value 1'); sqlite>> insert into sessions (key,value) values ('test key 2','test value 2'); sqlite>> select * from _versioning_sessions; AP> test key 0|test value 0|2455027.00753221|I|1 AP> test key 1|test value 1|2455027.00753347|I|2 AP> test key 2|test value 2|2455027.00753368|I|3 sqlite>> sqlite>> delete from sessions; sqlite>> select * from _versioning_sessions; AP> test key 0|test value 0|2455027.00753221|I|1 AP> test key 1|test value 1|2455027.00753347|I|2 AP> test key 2|test value 2|2455027.00753368|I|3 AP> ||2455027.00753382|D|1 AP> ||2455027.00753382|D|2 AP> ||2455027.00753382|D|3 sqlite>> sqlite>> select unversioning_table('sessions'); sqlite>> .schema AP> CREATE TABLE _undo(sql TEXT, status TEXT); AP> CREATE TABLE _versioning_sessions(key text,value text, _date AP> REAL, _action TEXT, _rowid INTEGER); AP> CREATE TABLE sessions ( AP> key text not null, AP> value text not null AP> ); sqlite>> select * from _versioning_sessions; AP> test key 0|test value 0|2455027.00753221|I|1 AP> test key 1|test value 1|2455027.00753347|I|2 AP> test key 2|test value 2|2455027.00753368|I|3 AP> ||2455027.00753382|D|1 AP> ||2455027.00753382|D|2 AP> ||2455027.00753382|D|3 
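For readers who want the effect without loading the extension, the trigger pattern visible in the .schema dump above can be reproduced by hand. A minimal sketch in Python's sqlite3 (insert and delete triggers only, table and trigger names matching the dump; the update trigger and the generated _undo table are omitted):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (key TEXT NOT NULL, value TEXT NOT NULL);

    -- shadow table in the same shape the extension generates
    CREATE TABLE _versioning_sessions (key TEXT, value TEXT,
        _date REAL, _action TEXT, _rowid INTEGER);

    -- log the full row on insert, tagged 'I'
    CREATE TRIGGER _versioning_sessions_i AFTER INSERT ON sessions
    BEGIN
        INSERT INTO _versioning_sessions
        SELECT *, julianday('now'), 'I', new.rowid
        FROM sessions WHERE rowid = new.rowid;
    END;

    -- log only the rowid on delete, tagged 'D' (as in the dump output,
    -- where deleted rows appear with empty key/value columns)
    CREATE TRIGGER _versioning_sessions_d AFTER DELETE ON sessions
    BEGIN
        INSERT INTO _versioning_sessions (_date, _action, _rowid)
        VALUES (julianday('now'), 'D', old.rowid);
    END;
""")

conn.execute("INSERT INTO sessions VALUES ('test key 0', 'test value 0')")
conn.execute("DELETE FROM sessions")

log = conn.execute(
    "SELECT key, _action, _rowid FROM _versioning_sessions").fetchall()
print(log)  # [('test key 0', 'I', 1), (None, 'D', 1)]
```

The extension automates exactly this: versioning_table() generates the shadow table and triggers from the target table's schema, and unversioning_table() drops the triggers while leaving the history behind.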
AP> ==
AP> Best regards, Alexey Pechnikov.
AP> http://pechnikov.tel/

--
Best regards,
Neville Franks, http://www.surfulater.com http://blog.surfulater.com