Re: [sqlite] Optimizing insert or replace speed
Also, very good. No index on Delta File: 3 seconds. Index on SN: 4 seconds. Index on MasterList (write_out_ok, MFGID, TypeID, SN); 4 seconds. Time is to the nearest second in my test program, so I can't distinguish between the two. In summary: /*FAST */ insert or replace into main.masterlist select d.* from delta.masterlist d left outer join main.masterlist M on d.sn = M.sn where d.write_out_ok=0 and d.record_updatetime >= ifnull(M.record_updatetime, '') /* just as FAST*/ insert or replace into main.masterlist select * from delta.masterlist d where d.write_out_ok=0 and not exists (select 1 from main.masterlist M where M.sn = d.sn and M.record_updatetime > d.record_updatetime); /* very SLOW*/ insert or replace into main.masterlist select * from delta.masterlist d where d.write_out_ok=0 and d.sn not in(select M.sn from main.masterlist M where M.record_updatetime > d.record_updatetime) On Wed, Jul 15, 2009 at 7:33 AM, Igor Tandetnikwrote: > Adam DeVita wrote: > > I've identified the following query as a bottle neck in a utility I've > > written. > > > > insert or replace into main.masterlist select * from delta.masterlist > > d where d.write_out_ok=0 and > > d.sn not in(select M.sn from main.masterlist M where > > M.record_updatetime > d.record_updatetime) > > Try this: > > insert or replace into main.masterlist > select * from delta.masterlist d > where d.write_out_ok=0 and > not exists (select 1 from main.masterlist M > where M.sn = d.sn and M.record_updatetime > d.record_updatetime); > > It appears that your query doesn't use an index on M(sn), while mine > does. > > Igor Tandetnik > > > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- VerifEye Technologies Inc. 905-948-0015x245 7100 Warden Ave, Unit 3 Markham ON, L3R 8B5 Canada ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimizing insert or replace speed
Adam DeVita wrote: > I've identified the following query as a bottle neck in a utility I've > written. > > insert or replace into main.masterlist select * from delta.masterlist > d where d.write_out_ok=0 and > d.sn not in(select M.sn from main.masterlist M where > M.record_updatetime > d.record_updatetime) Try this: insert or replace into main.masterlist select * from delta.masterlist d where d.write_out_ok=0 and not exists (select 1 from main.masterlist M where M.sn = d.sn and M.record_updatetime > d.record_updatetime); It appears that your query doesn't use an index on M(sn), while mine does. Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimizing insert or replace speed
Awesome, brilliant, and decisive! New times: No index on Delta File: 3 seconds. Index on SN: 4 seconds. Index on MasterList (write_out_ok, MFGID, TypeID, SN); 4 seconds. The speedup of the one query is greater than this because the above time figures include 1) A query to see if there are any records in deltas with write_out_ok=0 (if so, don't execute other queries) 2) A query to update write_out_ok =1 in delta where the record in main exists and is newer. 3) A query to update write_out_ok =1 in main where the record came from delta; 1,2, & 3 were negligible compared to the un-optimized insert or replace into TargetD select * from sourceD sa where sa.write_out_ok=0 and sa.sn not in (select ta.sn from TargetD ta where ta.record_updatetime > sa.record_updatetime) ; Now, it appears that the time is comparable, so the actual time is in the order of 2 seconds faster than listed above. Dropping the sequence time from 7 min 22s down to 0 minutes 4 seconds is tremendous. thank you. Adam On Tue, Jul 14, 2009 at 2:04 PM, Pavel Ivanovwrote: > I believe your choice of query is not good enough. Try this one: > > insert or replace into main.masterlist > select d.* > from delta.masterlist d left outer join main.masterlist M on d.sn = > M.sn >where d.write_out_ok=0 > and d.record_updatetime >= ifnull(M.record_updatetime, '') > > > Pavel > > On Tue, Jul 14, 2009 at 12:54 PM, Adam DeVita wrote: > > Good day, > > > > Could someone explain where I'm going wrong with this? > > > > I've identified the following query as a bottle neck in a utility I've > > written. > > > > insert or replace into main.masterlist select * from delta.masterlist d > > where d.write_out_ok=0 and > > d.sn not in(select M.sn from main.masterlist M where M.record_updatetime > > > > d.record_updatetime) > > > > The purpose is to import a data from a remotely created change file, > with > > only new/newer records. (Due to the fact that the subject of the data is > > shipping / receiving product serial numbers and that data moves faster > than > > product there is no way independent nodes can create a change to a record > at > > the same time. Also, deleting is not allowed.) > > > > The change file is attached as 'delta' > > > > The structure of masterlist in the main database is: > > sqlite> .schema masterlist > > CREATE TABLE MasterList (SN int primary key not null, TypeID int default > 0 > > references Product_type_dictionary(TypeID) , ConstructionDate text, > MFGID > > int default 0 references MFG_dictionary (MFGID), Init_cust_ship_date > text, > > record_updatetime text default "2000.00.00.00", write_out_ok int default > 0); > > > > CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, > SN); > > > > main.masterlist has 36,000 records > > deltas.masterlist has 9,000 records > > > > Notes about fields: > > write_out_ok is a flag indicating that the record has been imported. > States > > are 1 or 0. > > MFGID is a manufacturer, about 4 different ints can be used. > > TypeID is a product Type, about 7 different types, > > > > The index is ordered by cardinality, and all int. > > record_updatetime is the modified date & time GMT (UTC), > > .mm.dd.hh.MM.ss > > > > > > > > Experimenting with indexes on the delta file with > > No indexes: > > 7 min 22s > > > > CREATE INDEX IDX_MasterList on MasterList ( SN); > > 14min 52s > > > > CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, > SN); > > 20 min, 07s > > > > --- > > Dropped indexes on both main and delta. > > ~20 min. > > - > > > > Is the real problem a poor choice of index in main? > > > > > > regards, > > Adam > > ___ > > sqlite-users mailing list > > sqlite-users@sqlite.org > > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- VerifEye Technologies Inc. 905-948-0015x245 7100 Warden Ave, Unit 3 Markham ON, L3R 8B5 Canada ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimizing insert or replace speed
I believe your choice of query is not good enough. Try this one: insert or replace into main.masterlist select d.* from delta.masterlist d left outer join main.masterlist M on d.sn = M.sn where d.write_out_ok=0 and d.record_updatetime >= ifnull(M.record_updatetime, '') Pavel On Tue, Jul 14, 2009 at 12:54 PM, Adam DeVitawrote: > Good day, > > Could someone explain where I'm going wrong with this? > > I've identified the following query as a bottle neck in a utility I've > written. > > insert or replace into main.masterlist select * from delta.masterlist d > where d.write_out_ok=0 and > d.sn not in(select M.sn from main.masterlist M where M.record_updatetime > > d.record_updatetime) > > The purpose is to import a data from a remotely created change file, with > only new/newer records. (Due to the fact that the subject of the data is > shipping / receiving product serial numbers and that data moves faster than > product there is no way independent nodes can create a change to a record at > the same time. Also, deleting is not allowed.) > > The change file is attached as 'delta' > > The structure of masterlist in the main database is: > sqlite> .schema masterlist > CREATE TABLE MasterList (SN int primary key not null, TypeID int default 0 > references Product_type_dictionary(TypeID) , ConstructionDate text, MFGID > int default 0 references MFG_dictionary (MFGID), Init_cust_ship_date text, > record_updatetime text default "2000.00.00.00", write_out_ok int default 0); > > CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, SN); > > main.masterlist has 36,000 records > deltas.masterlist has 9,000 records > > Notes about fields: > write_out_ok is a flag indicating that the record has been imported. States > are 1 or 0. > MFGID is a manufacturer, about 4 different ints can be used. > TypeID is a product Type, about 7 different types, > > The index is ordered by cardinality, and all int. > record_updatetime is the modified date & time GMT (UTC), > .mm.dd.hh.MM.ss > > > > Experimenting with indexes on the delta file with > No indexes: > 7 min 22s > > CREATE INDEX IDX_MasterList on MasterList ( SN); > 14min 52s > > CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, SN); > 20 min, 07s > > --- > Dropped indexes on both main and delta. > ~20 min. > - > > Is the real problem a poor choice of index in main? > > > regards, > Adam > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Optimizing insert or replace speed
Good day, Could someone explain where I'm going wrong with this? I've identified the following query as a bottle neck in a utility I've written. insert or replace into main.masterlist select * from delta.masterlist d where d.write_out_ok=0 and d.sn not in(select M.sn from main.masterlist M where M.record_updatetime > d.record_updatetime) The purpose is to import a data from a remotely created change file, with only new/newer records. (Due to the fact that the subject of the data is shipping / receiving product serial numbers and that data moves faster than product there is no way independent nodes can create a change to a record at the same time. Also, deleting is not allowed.) The change file is attached as 'delta' The structure of masterlist in the main database is: sqlite> .schema masterlist CREATE TABLE MasterList (SN int primary key not null, TypeID int default 0 references Product_type_dictionary(TypeID) , ConstructionDate text, MFGID int default 0 references MFG_dictionary (MFGID), Init_cust_ship_date text, record_updatetime text default "2000.00.00.00", write_out_ok int default 0); CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, SN); main.masterlist has 36,000 records deltas.masterlist has 9,000 records Notes about fields: write_out_ok is a flag indicating that the record has been imported. States are 1 or 0. MFGID is a manufacturer, about 4 different ints can be used. TypeID is a product Type, about 7 different types, The index is ordered by cardinality, and all int. record_updatetime is the modified date & time GMT (UTC), .mm.dd.hh.MM.ss Experimenting with indexes on the delta file with No indexes: 7 min 22s CREATE INDEX IDX_MasterList on MasterList ( SN); 14min 52s CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, SN); 20 min, 07s --- Dropped indexes on both main and delta. ~20 min. - Is the real problem a poor choice of index in main? regards, Adam ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users