Re: [sqlite] Optimizing insert or replace speed

2009-07-15 Thread Adam DeVita
Also, very good.


No index on Delta File:
3 seconds.

Index on SN:
4 seconds.

Index on MasterList (write_out_ok, MFGID, TypeID, SN);
4 seconds.

Time is to the nearest second in my test program, so I can't distinguish
between the two.

In summary:

/*FAST */
insert or replace into main.masterlist select d.*  from delta.masterlist d
left outer join main.masterlist M on d.sn = M.sn
   where d.write_out_ok=0 and d.record_updatetime >=
ifnull(M.record_updatetime, '')

/* just as FAST*/
insert or replace into main.masterlist select * from delta.masterlist d
where d.write_out_ok=0 and
not exists (select 1 from main.masterlist M  where M.sn = d.sn and
M.record_updatetime > d.record_updatetime);

/* very SLOW*/
insert or replace into main.masterlist select * from delta.masterlist d
where d.write_out_ok=0 and
d.sn not in(select M.sn from main.masterlist M where M.record_updatetime >
d.record_updatetime)



On Wed, Jul 15, 2009 at 7:33 AM, Igor Tandetnik  wrote:

> Adam DeVita wrote:
> > I've identified the following query as a bottle neck in a utility I've
> > written.
> >
> > insert or replace into main.masterlist select * from delta.masterlist
> > d where d.write_out_ok=0 and
> > d.sn not in(select M.sn from main.masterlist M where
> > M.record_updatetime > d.record_updatetime)
>
> Try this:
>
> insert or replace into main.masterlist
> select * from delta.masterlist d
> where d.write_out_ok=0 and
> not exists (select 1 from main.masterlist M
>  where M.sn = d.sn and M.record_updatetime > d.record_updatetime);
>
> It appears that your query doesn't use an index on M(sn), while mine
> does.
>
> Igor Tandetnik
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
VerifEye Technologies Inc.
905-948-0015x245
7100 Warden Ave, Unit 3
Markham ON, L3R 8B5
Canada
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimizing insert or replace speed

2009-07-15 Thread Igor Tandetnik
Adam DeVita wrote:
> I've identified the following query as a bottle neck in a utility I've
> written.
>
> insert or replace into main.masterlist select * from delta.masterlist
> d where d.write_out_ok=0 and
> d.sn not in(select M.sn from main.masterlist M where
> M.record_updatetime > d.record_updatetime)

Try this:

insert or replace into main.masterlist
select * from delta.masterlist d
where d.write_out_ok=0 and
not exists (select 1 from main.masterlist M
  where M.sn = d.sn and M.record_updatetime > d.record_updatetime);

It appears that your query doesn't use an index on M(sn), while mine
does.

Igor Tandetnik 



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimizing insert or replace speed

2009-07-14 Thread Adam DeVita
Awesome, brilliant, and decisive!

New times:

No index on Delta File:
3 seconds.

Index on SN:
4 seconds.

Index on MasterList (write_out_ok, MFGID, TypeID, SN);
4 seconds.

The speedup of the one query is greater than this because the above time
figures include
1) A query to see if there are any records in deltas with write_out_ok=0 (if
so, don't execute other queries)
2) A query to update write_out_ok =1 in delta where the record in main
exists and is newer.
3) A query to update write_out_ok =1 in main where the record came from
delta;

1,2, & 3 were negligible compared to the un-optimized
insert or replace into TargetD select * from sourceD sa where
sa.write_out_ok=0 and sa.sn not in (select ta.sn from TargetD ta where
ta.record_updatetime > sa.record_updatetime)   ;

Now, it appears that the time is comparable, so the actual time is in the
order of 2 seconds faster than listed above.  Dropping the sequence time
from 7 min 22s down to 0 minutes 4 seconds is tremendous.

thank you.

Adam


On Tue, Jul 14, 2009 at 2:04 PM, Pavel Ivanov  wrote:

> I believe your choice of query is not good enough. Try this one:
>
> insert or replace into main.masterlist
>   select d.*
> from delta.masterlist d left outer join main.masterlist M on d.sn =
> M.sn
>where d.write_out_ok=0
>   and d.record_updatetime >= ifnull(M.record_updatetime, '')
>
>
> Pavel
>
> On Tue, Jul 14, 2009 at 12:54 PM, Adam DeVita wrote:
> > Good day,
> >
> > Could someone explain where I'm going wrong with this?
> >
> > I've identified the following query as a bottle neck in a utility I've
> > written.
> >
> > insert or replace into main.masterlist select * from delta.masterlist d
> > where d.write_out_ok=0 and
> > d.sn not in(select M.sn from main.masterlist M where M.record_updatetime
> >
> > d.record_updatetime)
> >
> > The purpose is to import a data from a  remotely created change file,
> with
> > only new/newer records.  (Due to the fact that the subject of the data is
> > shipping / receiving product serial numbers and that data moves faster
> than
> > product there is no way independent nodes can create a change to a record
> at
> > the same time.  Also, deleting is not allowed.)
> >
> > The change file is attached as 'delta'
> >
> > The structure of masterlist in the main database is:
> > sqlite> .schema masterlist
> > CREATE TABLE MasterList (SN int primary key not null, TypeID int default
> 0
> > references Product_type_dictionary(TypeID)  , ConstructionDate text,
> MFGID
> > int default 0 references MFG_dictionary (MFGID), Init_cust_ship_date
> text,
> > record_updatetime text default "2000.00.00.00", write_out_ok int default
> 0);
> >
> > CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID,
> SN);
> >
> > main.masterlist has 36,000 records
> > deltas.masterlist has 9,000 records
> >
> > Notes about fields:
> > write_out_ok is a flag indicating that the record has been imported.
> States
> > are 1 or 0.
> > MFGID is a manufacturer, about 4 different ints can be used.
> > TypeID is a product Type, about 7 different types,
> >
> > The index is ordered by cardinality, and all int.
> > record_updatetime  is the modified date & time GMT (UTC),
> > .mm.dd.hh.MM.ss
> >
> >
> > 
> > Experimenting with indexes on the delta file with
> > No indexes:
> > 7 min 22s
> >
> > CREATE INDEX IDX_MasterList on MasterList ( SN);
> > 14min 52s
> >
> > CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID,
> SN);
> > 20 min, 07s
> >
> > ---
> > Dropped indexes on both main and delta.
> > ~20 min.
> > -
> >
> > Is the real problem a poor choice of index in main?
> >
> >
> > regards,
> > Adam
> > ___
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
VerifEye Technologies Inc.
905-948-0015x245
7100 Warden Ave, Unit 3
Markham ON, L3R 8B5
Canada
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimizing insert or replace speed

2009-07-14 Thread Pavel Ivanov
I believe your choice of query is not good enough. Try this one:

insert or replace into main.masterlist
  select d.*
 from delta.masterlist d left outer join main.masterlist M on d.sn = M.sn
   where d.write_out_ok=0
  and d.record_updatetime >= ifnull(M.record_updatetime, '')


Pavel

On Tue, Jul 14, 2009 at 12:54 PM, Adam DeVita wrote:
> Good day,
>
> Could someone explain where I'm going wrong with this?
>
> I've identified the following query as a bottle neck in a utility I've
> written.
>
> insert or replace into main.masterlist select * from delta.masterlist d
> where d.write_out_ok=0 and
> d.sn not in(select M.sn from main.masterlist M where M.record_updatetime >
> d.record_updatetime)
>
> The purpose is to import a data from a  remotely created change file, with
> only new/newer records.  (Due to the fact that the subject of the data is
> shipping / receiving product serial numbers and that data moves faster than
> product there is no way independent nodes can create a change to a record at
> the same time.  Also, deleting is not allowed.)
>
> The change file is attached as 'delta'
>
> The structure of masterlist in the main database is:
> sqlite> .schema masterlist
> CREATE TABLE MasterList (SN int primary key not null, TypeID int default 0
> references Product_type_dictionary(TypeID)  , ConstructionDate text, MFGID
> int default 0 references MFG_dictionary (MFGID), Init_cust_ship_date text,
> record_updatetime text default "2000.00.00.00", write_out_ok int default 0);
>
> CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, SN);
>
> main.masterlist has 36,000 records
> deltas.masterlist has 9,000 records
>
> Notes about fields:
> write_out_ok is a flag indicating that the record has been imported. States
> are 1 or 0.
> MFGID is a manufacturer, about 4 different ints can be used.
> TypeID is a product Type, about 7 different types,
>
> The index is ordered by cardinality, and all int.
> record_updatetime  is the modified date & time GMT (UTC),
> .mm.dd.hh.MM.ss
>
>
> 
> Experimenting with indexes on the delta file with
> No indexes:
> 7 min 22s
>
> CREATE INDEX IDX_MasterList on MasterList ( SN);
> 14min 52s
>
> CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, SN);
> 20 min, 07s
>
> ---
> Dropped indexes on both main and delta.
> ~20 min.
> -
>
> Is the real problem a poor choice of index in main?
>
>
> regards,
> Adam
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Optimizing insert or replace speed

2009-07-14 Thread Adam DeVita
Good day,

Could someone explain where I'm going wrong with this?

I've identified the following query as a bottle neck in a utility I've
written.

insert or replace into main.masterlist select * from delta.masterlist d
where d.write_out_ok=0 and
d.sn not in(select M.sn from main.masterlist M where M.record_updatetime >
d.record_updatetime)

The purpose is to import a data from a  remotely created change file, with
only new/newer records.  (Due to the fact that the subject of the data is
shipping / receiving product serial numbers and that data moves faster than
product there is no way independent nodes can create a change to a record at
the same time.  Also, deleting is not allowed.)

The change file is attached as 'delta'

The structure of masterlist in the main database is:
sqlite> .schema masterlist
CREATE TABLE MasterList (SN int primary key not null, TypeID int default 0
references Product_type_dictionary(TypeID)  , ConstructionDate text, MFGID
int default 0 references MFG_dictionary (MFGID), Init_cust_ship_date text,
record_updatetime text default "2000.00.00.00", write_out_ok int default 0);

CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, SN);

main.masterlist has 36,000 records
deltas.masterlist has 9,000 records

Notes about fields:
write_out_ok is a flag indicating that the record has been imported. States
are 1 or 0.
MFGID is a manufacturer, about 4 different ints can be used.
TypeID is a product Type, about 7 different types,

The index is ordered by cardinality, and all int.
record_updatetime  is the modified date & time GMT (UTC),
.mm.dd.hh.MM.ss



Experimenting with indexes on the delta file with
No indexes:
7 min 22s

CREATE INDEX IDX_MasterList on MasterList ( SN);
14min 52s

CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, SN);
20 min, 07s

---
Dropped indexes on both main and delta.
~20 min.
-

Is the real problem a poor choice of index in main?


regards,
Adam
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users