Re: [sqlite] range enclosing a number

2009-07-14 Thread Bogdan Nicula



Thank you everybody for the incredible help.
Maybe it is better if I explain the problem in more detail:
I need to do piecewise linear interpolations between two very large sets of
numbers. The interpolation is the reason I was using <= and >=. One set
represents clock ticks (46-bit integers) and the other real times (floating
point numbers). The table has two columns which are indexed and the elements
are unique. The interpolation would have to be done either way: from ticks to
times or from times to ticks. To perform the interpolation I need the select
to return both columns.
Adding the second column, as in:
select max(ticks),time from time_pair where ticks <= ?;
seems to work, although I don't understand the GROUP BY comments.
I have thought of the edge conditions (there I need to do extrapolation) and
planned either to retrieve the max and min beforehand and use them to compare
with the current argument, or to always do something like:
select * from time_pair where ticks <= ? order by ticks desc limit 2;

i.e. try to retrieve two rows before (and similarly two after). Since the max 
and min cannot be reused because the database is closed between interpolation 
operations and might be altered, it's not clear which is the better approach, 
so I will have to test.
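
For the interior case, a sketch of the two bracketing queries (one per side,
assuming the time_pair(ticks, time) schema described above) could be:

select ticks, time from time_pair where ticks <= ? order by ticks desc limit 1;
select ticks, time from time_pair where ticks >  ? order by ticks asc  limit 1;

Together these return the two rows that enclose the argument; near the edges
one of them comes back empty, which is the signal to extrapolate instead.
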
The idea to try on both 3.4 and 3.6 is very good since this is a mixed
environment and ultimately it may run against 3.4.
Thank you again for your time and efforts,
Bogdan

> Date: Mon, 13 Jul 2009 23:35:22 -0500
> From: j...@kreibi.ch
> To: paiva...@gmail.com
> CC: sqlite-users@sqlite.org
> Subject: Re: [sqlite] range enclosing a number
> 
> On Mon, Jul 13, 2009 at 10:33:00PM -0400, Pavel Ivanov scratched on the wall:
>> Jay, you're pretty much mistaken:
>> 
>>>  I'm pretty sure you don't want to do it this way.  What this does is
>>>  gather every row that meets the WHERE condition and then runs a max()
>>>  or min() aggregation function across all of those rows.  That means
>>>  that even if the column "number" has an index on it, between these
>>>  two statements you're going to end up scanning the whole table.
>> 
>> All database engines optimize queries which ask for min/max on indexed
>> column with condition including only <, > or = on this very column.
>> And SQLite is among these too:
> 
> 
>   Not "all."  This type of optimization is actually a fairly new
>   addition to SQLite (considering the product lifetime) and first
>   appeared in 3.5.5, which was released in early 2008.
> 
> 
>   And I'd still go with ORDER/LIMIT.  Here's why:
> 
> 
>   For my tests I just used the default build under the current version
>   of Mac OS X, which is a bit old (3.4).  Under that build, the
>   ORDER/LIMIT is clearly faster, as this is before the optimization
>   existed:
> 
>   (using the same test set you did)
>   
>   Full scan, 3.4:
>   -----------------------
>   real    0m5.99s
>   user    0m4.73s
>   sys     0m0.84s
>
>   Using ORDER/LIMIT, 3.4:
>   -----------------------
>   real    0m0.00s
>   user    0m0.01s
>   sys     0m0.00s
>
>   Using min/max, 3.4:
>   -----------------------
>   real    0m5.97s
>   user    0m2.94s
>   sys     0m0.38s
> 
>   In this case, it is clear that min/max are NOT integrated into the
>   optimizer, and require half a table scan, just as I stated.
> 
>   I also have a build of the current 3.6.16 around, and in that case,
>   the numbers are better:
> 
>   Using ORDER/LIMIT, 3.6.16
>   -----------------------
>   real    0m0.12s
>   user    0m0.01s
>   sys     0m0.03s
>
>   Using min/max, 3.6.16
>   -----------------------
>   real    0m0.04s
>   user    0m0.01s
>   sys     0m0.03s
> 
>   This clearly shows that the optimization does exist, and that for
>   this very basic case my assumptions were incorrect.
> 
>   With the current 3.6.16 build, using min/max seems a tad faster-- but
>   only in "real" time.  In terms of user/sys times, the results shown
>   here (and your own numbers, which were 0.043/0.001/0.005 and
>   0.005/0.001/0.001) were pretty typical (i.e. very, very close).
>   That might just be an I/O fluke.  The numbers are getting small enough
>   that saying anything definite requires better profiling.  So there
>   does appear to be a difference, but it is pretty small and it is
>   unclear where it is coming from.
> 
>   However, I'd point out that using ORDER/LIMIT under 3.4 is the
>   fastest of all.  This isn't just a quirk of one run, either.
>   I ran these several times and the 3.4 ORDER/LIMIT was always fastest.
>   We're still playing with numbers too small to really trust, but it
>   seems that if the 3.6.16 ORDER/LIMIT were as fast as the one in 3.4,
>   it would likely be the best choice of all.
> 
> 
>   So you've sold me that the current version of SQLite clearly does
>   have the min/max optimization and doesn't require a table scan.  It
>   also appears to be slightly faster, but not by a big enough gap to
>   clearly consider it a better choice on that alone.
> 
> 
> 
>   Personally, I'd still go with ORDER/LIMIT.  Wi

Re: [sqlite] range enclosing a number

2009-07-14 Thread Pavel Ivanov
Not to continue the argument with Jay, but just to express my opinion for comparison:

>  The ORDER/LIMIT approach is much more resilient to changes, however,
>  and should more or less behave the same no matter what you do to the
>  rest of the query.

Seriously, I don't believe this. There's no way to change the query so
that min/max will scan the whole table while order/limit takes the value
straight from the index. They both take the same approach in all cases
(except, of course, old SQLite, which I fortunately didn't work with :) ).

>  The ORDER/LIMIT approach is, arguably, less graceful, but
>  it is also (IMHO) a lot easier to break down into logical blocks that
>  a newbie can follow and understand both the design and intent-- even
>  if the query is a bit strung out.

This is where my opinion is exactly the opposite: a query with min/max is
more readable by a newbie just getting introduced to the code - it
clearly states what the requester is trying to do: "get the minimum among
records with this condition". The order/limit approach requires more
thinking before you clearly understand the intention behind
the query - I've struggled myself trying to understand which sign (<
or >) should go with which order by (desc or asc).

But... these are just opinions, and they have already become off-topic
because the OP's case looks more complicated and can't be solved by a
simple min/max. And in response to Bogdan's letter:

> Adding the second column, as in:
> select max(ticks),time from time_pair where ticks <= ?;
> seems to work, although I don't understand the GROUP BY comments.

The essence of the "GROUP BY comments" is that if you write the query this
way then what is returned in the time column is implementation-dependent (not
all SQL engines even support this syntax), and you'd better assume that in
this case the value in the time column is completely arbitrary and unrelated
to the actual data in the table. So if you want to return both columns you
should use either the order/limit approach or the already mentioned "select *
... where ticks = (select max(ticks) ...)" approach.
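
Spelled out against the OP's table, a minimal sketch of that second form
(assuming the time_pair(ticks, time) schema described earlier) would be:

select ticks, time
  from time_pair
 where ticks = (select max(ticks) from time_pair where ticks <= ?);

This pins down a single row first, so both columns are guaranteed to come
from the same row.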

Pavel

On Tue, Jul 14, 2009 at 12:35 AM, Jay A. Kreibich wrote:
> On Mon, Jul 13, 2009 at 10:33:00PM -0400, Pavel Ivanov scratched on the wall:
>> Jay, you're pretty much mistaken:
>>
>> >  I'm pretty sure you don't want to do it this way.  What this does is
>> >  gather every row that meets the WHERE condition and then runs a max()
>> >  or min() aggregation function across all of those rows.  That means
>> >  that even if the column "number" has an index on it, between these
>> >  two statements you're going to end up scanning the whole table.
>>
>> All database engines optimize queries which ask for min/max on indexed
>> column with condition including only <, > or = on this very column.
>> And SQLite is among these too:
>
>
>  Not "all."  This type of optimization is actually a fairly new
>  addition to SQLite (considering the product lifetime) and first
>  appeared in 3.5.5, which was released in early 2008.
>
>
>  And I'd still go with ORDER/LIMIT.  Here's why:
>
>
>  For my tests I just used the default build under the current version
>  of Mac OS X, which is a bit old (3.4).  Under that build, the
>  ORDER/LIMIT is clearly faster, as this is before the optimization
>  existed:
>
>  (using the same test set you did)
>
>  Full scan, 3.4:
>  -
>  real  0m5.99s
>  user  0m4.73s
>  sys   0m0.84s
>
>  Using ORDER/LIMIT, 3.4:
>  -
>  real  0m0.00s
>  user  0m0.01s
>  sys   0m0.00s
>
>  Using min/max, 3.4:
>  -
>  real  0m5.97s
>  user  0m2.94s
>  sys   0m0.38s
>
>  In this case, it is clear that min/max are NOT integrated into the
>  optimizer, and require half a table scan, just as I stated.
>
>  I also have a build of the current 3.6.16 around, and in that case,
>  the numbers are better:
>
>  Using ORDER/LIMIT, 3.6.16
>  -
>  real  0m0.12s
>  user  0m0.01s
>  sys   0m0.03s
>
>  Using min/max, 3.6.16
>  -
>  real  0m0.04s
>  user  0m0.01s
>  sys   0m0.03s
>
>  This clearly shows that the optimization does exist, and that for
>  this very basic case my assumptions were incorrect.
>
>  With the current 3.6.16 build, using min/max seems a tad faster-- but
>  only in "real" time.  In terms of user/sys times, the results shown
>  here (and your own numbers, which were 0.043/0.001/0.005 and
>  0.005/0.001/0.001) were pretty typical (i.e. very, very close).
>  That might just be an I/O fluke.  The numbers are getting small enough
>  that saying anything definite requires better profiling.  So there
>  does appear to be a difference, but it is pretty small and it is
>  unclear where it is coming from.
>
>  However, I'd point out that using ORDER/LIMIT under 3.4 is the
>  fastest of all.  This isn't just a quirk of one run, either.
>  I ran these several times and the 3.4 ORDER/LIMIT was always fastest.
>  We're still playing with numbers too small to really trust, but it
>  seems that if the 3.6.16 ORDER/LIMIT was as fast as the one in 3.4,
>  i

[sqlite] how to access tables in multiple sqlite databases

2009-07-14 Thread Zhenyu Guo
Hi guys,

I have multiple sqlite databases in hand, and I would like to perform the 
following several tasks:

- Two tables with the same schema are in two dbs, and I want to apply an SQL
query to the two tables efficiently (merge them into one table? The merge cost
is also counted as part of the total cost. Is there a way to logically combine
these two tables into one? Other mechanisms?)

- Two tables with different schemas are in two dbs, and I want to apply a join
query to the two tables efficiently (if I can logically have them in one db,
that will be better.)

Thanks,
Zhenyu


Re: [sqlite] how to access tables in multiple sqlite databases

2009-07-14 Thread Martin.Engelschalk
Hi,

see http://www.sqlite.org/lang_attach.html

you attach a second database, giving it a name, and prepend this name to 
the table name.
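
For example, a minimal sketch (the file name 'second.db' and the table and
column names are hypothetical):

attach database 'second.db' as db2;

-- run one query over both same-schema tables:
select * from main.mytable
union all
select * from db2.mytable;

-- or combine them logically behind a single name:
create temp view combined as
  select * from main.mytable union all select * from db2.mytable;

-- joins across the two databases work the same way:
select a.*, b.extra
  from main.mytable a join db2.other b on a.id = b.id;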

Martin

Zhenyu Guo schrieb:
> Hi guys,
>
> I have multiple sqlite databases in hand, and I would like to perform the 
> following several tasks:
>
> - Two tables with the same schema are in two dbs, and I want to apply an SQL
> query to the two tables efficiently (merge them into one table? The merge cost
> is also counted as part of the total cost. Is there a way to logically combine
> these two tables into one? Other mechanisms?)
>
> - Two tables with different schemas are in two dbs, and I want to apply a join
> query to the two tables efficiently (if I can logically have them in one db,
> that will be better.)
>
> Thanks,
> Zhenyu
>


Re: [sqlite] Reading year from YYYY-MM-DD column?

2009-07-14 Thread Gilles Ganault
On Tue, 30 Jun 2009 07:54:13 -0500, P Kishor
 wrote:
>sqlite> SELECT strftime('%Y', '1999-12-03') AS year;

Thanks guys.



[sqlite] Announce of the new "Versioning" extension

2009-07-14 Thread Alexey Pechnikov
Hello!

This may be used for table versioning and replication.

Source code is available here
http://mobigroup.ru/files/sqlite-ext/

You can get the SQLite build with some extra extensions from the Debian
repository:
deb http://mobigroup.ru/debian/ lenny main contrib non-free
deb-src http://mobigroup.ru/debian/ lenny main contrib non-free

===== Test script =====
CREATE TABLE sessions (
key text not null,
value text not null
);
insert into sessions (key,value) values ('test key 0','test value 0');

select versioning_table('sessions');
.schema
select * from _versioning_sessions;

insert into sessions (key,value) values ('test key 1','test value 1');
insert into sessions (key,value) values ('test key 2','test value 2');
select * from _versioning_sessions;

delete from sessions;
select * from _versioning_sessions;

select unversioning_table('sessions');
.schema

===== Test script results =====
sqlite> CREATE TABLE sessions (
   ...> key text not null,
   ...> value text not null
   ...> );
sqlite> insert into sessions (key,value) values ('test key 0','test value 0');
sqlite>
sqlite> select versioning_table('sessions');

sqlite> .schema
CREATE TABLE _undo(sql TEXT, status TEXT);
CREATE TABLE _versioning_sessions(key text,value text, _date REAL, _action 
TEXT, _rowid INTEGER);
CREATE TABLE sessions (
key text not null,
value text not null
);
CREATE TRIGGER _versioning_sessions_d AFTER DELETE ON sessions BEGIN INSERT 
INTO _versioning_sessions (_date, _action, _rowid) values (julianday('now'), 
'D', old.rowid);END;
CREATE TRIGGER _versioning_sessions_i AFTER INSERT ON sessions BEGIN INSERT 
INTO _versioning_sessions SELECT *, julianday('now') as _date, 'I' as _action, 
new.rowid as _rowid FROM sessions WHERE rowid=new.rowid;END;
CREATE TRIGGER _versioning_sessions_u AFTER UPDATE ON sessions BEGIN INSERT 
INTO _versioning_sessions SELECT *, julianday('now') as _date, 'U' as _action, 
new.rowid as _rowid FROM sessions WHERE rowid=new.rowid;END;
sqlite> select * from _versioning_sessions;
test key 0|test value 0|2455027.00753221|I|1
sqlite>
sqlite> insert into sessions (key,value) values ('test key 1','test value 1');
sqlite> insert into sessions (key,value) values ('test key 2','test value 2');
sqlite> select * from _versioning_sessions;
test key 0|test value 0|2455027.00753221|I|1
test key 1|test value 1|2455027.00753347|I|2
test key 2|test value 2|2455027.00753368|I|3
sqlite>
sqlite> delete from sessions;
sqlite> select * from _versioning_sessions;
test key 0|test value 0|2455027.00753221|I|1
test key 1|test value 1|2455027.00753347|I|2
test key 2|test value 2|2455027.00753368|I|3
||2455027.00753382|D|1
||2455027.00753382|D|2
||2455027.00753382|D|3
sqlite>
sqlite> select unversioning_table('sessions');

sqlite> .schema
CREATE TABLE _undo(sql TEXT, status TEXT);
CREATE TABLE _versioning_sessions(key text,value text, _date REAL, _action 
TEXT, _rowid INTEGER);
CREATE TABLE sessions (
key text not null,
value text not null
);
sqlite> select * from _versioning_sessions;
test key 0|test value 0|2455027.00753221|I|1
test key 1|test value 1|2455027.00753347|I|2
test key 2|test value 2|2455027.00753368|I|3
||2455027.00753382|D|1
||2455027.00753382|D|2
||2455027.00753382|D|3
=======================

Best regards, Alexey Pechnikov.
http://pechnikov.tel/


Re: [sqlite] New Book Available

2009-07-14 Thread Alexey Pechnikov
Hello!

On Thursday 09 July 2009 17:50:14 cmar...@unc.edu wrote:
> What is the title? I am not finding the new book by van der
> Lans on Google or Amazon.

As Rick wrote to me:
"The book is finished and available through the Publisher Lulu.com. It will
be available through Amazon and so on with 6 to 8 weeks. Amazon is a little
slow."

Now the book is available here:
http://www.lulu.com/preview/paperback-book/the-sql-guide-to-sqlite/7251432

Best regards, Alexey Pechnikov.
http://pechnikov.tel/


Re: [sqlite] [Columns] Keeping internal + beautified names?

2009-07-14 Thread Gilles Ganault
On Sat, 11 Jul 2009 09:38:27 -0700, Jim Dodgen
 wrote:
>I would just use:
>
>SELECT id AS Identification FROM foobar

Thanks. This is what I already use for displaying the results, but
then I'm stuck when I need to perform INSERTs/UPDATEs because I need to
get the actual column names :-/

It'd be cool if each column in a table could have an internal name and
an external name.

Is there a better way than either using the very first row in each
table or keeping a table for this purpose?



Re: [sqlite] [Columns] Keeping internal + beautified names?

2009-07-14 Thread Fred Williams
For overall performance and efficiency, I recommend you keep the "pretty" in
the GUI where such things are traditionally implemented.  Pick two of three:
"COOL, Fast, Tight."

Fred

-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Gilles Ganault
Sent: Tuesday, July 14, 2009 11:02 AM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] [Columns] Keeping internal + beautified names?


On Sat, 11 Jul 2009 09:38:27 -0700, Jim Dodgen
 wrote:
>I would just use:
>
>SELECT id AS Identification FROM foobar

Thanks. This is what I already use for displaying the results, but
then I'm stuck when I need to perform INSERTs/UPDATEs because I need to
get the actual column names :-/

It'd be cool if each column in a table could have an internal name and
an external name.

Is there a better way than either using the very first row in each
table or keeping a table for this purpose?



[sqlite] Optimizing insert or replace speed

2009-07-14 Thread Adam DeVita
Good day,

Could someone explain where I'm going wrong with this?

I've identified the following query as a bottleneck in a utility I've
written.

insert or replace into main.masterlist select * from delta.masterlist d
where d.write_out_ok=0 and
d.sn not in(select M.sn from main.masterlist M where M.record_updatetime >
d.record_updatetime)

The purpose is to import data from a remotely created change file, with
only new/newer records.  (Due to the fact that the subject of the data is
shipping / receiving product serial numbers and that data moves faster than
product there is no way independent nodes can create a change to a record at
the same time.  Also, deleting is not allowed.)

The change file is attached as 'delta'

The structure of masterlist in the main database is:
sqlite> .schema masterlist
CREATE TABLE MasterList (SN int primary key not null, TypeID int default 0
references Product_type_dictionary(TypeID)  , ConstructionDate text, MFGID
int default 0 references MFG_dictionary (MFGID), Init_cust_ship_date text,
record_updatetime text default "2000.00.00.00", write_out_ok int default 0);

CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, SN);

main.masterlist has 36,000 records
deltas.masterlist has 9,000 records

Notes about fields:
write_out_ok is a flag indicating that the record has been imported. States
are 1 or 0.
MFGID is a manufacturer, about 4 different ints can be used.
TypeID is a product Type, about 7 different types,

The index is ordered by cardinality, and all int.
record_updatetime is the modified date & time GMT (UTC), in the format
yyyy.mm.dd.hh.MM.ss



Experimenting with indexes on the delta file:

No indexes:
7 min 22s

CREATE INDEX IDX_MasterList on MasterList (SN);
14 min 52s

CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, SN);
20 min 07s

---
Dropped indexes on both main and delta:
~20 min
-

Is the real problem a poor choice of index in main?


regards,
Adam


Re: [sqlite] [Columns] Keeping internal + beautified names?

2009-07-14 Thread Olaf Schmidt

"Gilles Ganault"  schrieb im
Newsbeitrag news:0sap559atknanmiv8g83pdj6f83e8ve...@4ax.com...
> On Sat, 11 Jul 2009 09:38:27 -0700, Jim Dodgen
>  wrote:
> >I would just use:
> >
> >SELECT id AS Identification FROM foobar
>
> Thanks. This is what I already use for displaying the results, but
> then I'm stuck when I need to perform INSERT/UPDATES because I need to
> get the actual columns names :-/
>
> It'd be cool if each column in a table could have an internal name and
> an external name.
>
> Is there a better way than either using the very first row in each
> table or keeping a table for this purpose?

There are SQLite APIs for that purpose, which hand out
the original column name to you:
sqlite3_column_origin_name

or the original table name (useful in joins):
sqlite3_column_table_name

or the original database name:
sqlite3_column_database_name

All are based on the current statement handle of your select.
You need a library compiled with
SQLITE_ENABLE_COLUMN_METADATA enabled.

If that switch is on, you can additionally make use of
sqlite3_table_column_metadata
to retrieve even more info about the
column in question (whether it is a not-null column,
a primary-key column, etc.)

You can do all that (looping over the column count of your select)
before entering the sqlite step loop - and return these additional
"header infos" in appropriate structures (together with your
result data set).

The overhead of retrieving all these additional infos is not
all that large - barely noticeable - the Recordset objects
of my wrapper always work in that mode (delivering all of
the data, but also these additional field infos) - and it does
not affect the overall performance - much more time is spent
in the step loop, which retrieves the real record data content.

Olaf





Re: [sqlite] Optimizing insert or replace speed

2009-07-14 Thread Pavel Ivanov
I believe your choice of query is not good enough. Try this one:

insert or replace into main.masterlist
select d.*
  from delta.masterlist d left outer join main.masterlist M on d.sn = M.sn
 where d.write_out_ok=0
   and d.record_updatetime >= ifnull(M.record_updatetime, '')
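
A quick way to check that this probes main.masterlist through its SN
primary key instead of scanning it is to run the select part under
EXPLAIN QUERY PLAN (a sketch; the plan wording varies between SQLite
versions):

explain query plan
select d.*
  from delta.masterlist d left outer join main.masterlist M on d.sn = M.sn
 where d.write_out_ok=0
   and d.record_updatetime >= ifnull(M.record_updatetime, '');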


Pavel

On Tue, Jul 14, 2009 at 12:54 PM, Adam DeVita wrote:
> Good day,
>
> Could someone explain where I'm going wrong with this?
>
> I've identified the following query as a bottleneck in a utility I've
> written.
>
> insert or replace into main.masterlist select * from delta.masterlist d
> where d.write_out_ok=0 and
> d.sn not in(select M.sn from main.masterlist M where M.record_updatetime >
> d.record_updatetime)
>
> The purpose is to import data from a remotely created change file, with
> only new/newer records.  (Due to the fact that the subject of the data is
> shipping / receiving product serial numbers and that data moves faster than
> product there is no way independent nodes can create a change to a record at
> the same time.  Also, deleting is not allowed.)
>
> The change file is attached as 'delta'
>
> The structure of masterlist in the main database is:
> sqlite> .schema masterlist
> CREATE TABLE MasterList (SN int primary key not null, TypeID int default 0
> references Product_type_dictionary(TypeID)  , ConstructionDate text, MFGID
> int default 0 references MFG_dictionary (MFGID), Init_cust_ship_date text,
> record_updatetime text default "2000.00.00.00", write_out_ok int default 0);
>
> CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, SN);
>
> main.masterlist has 36,000 records
> deltas.masterlist has 9,000 records
>
> Notes about fields:
> write_out_ok is a flag indicating that the record has been imported. States
> are 1 or 0.
> MFGID is a manufacturer, about 4 different ints can be used.
> TypeID is a product Type, about 7 different types,
>
> The index is ordered by cardinality, and all int.
> record_updatetime is the modified date & time GMT (UTC), in the format
> yyyy.mm.dd.hh.MM.ss
>
>
> 
> Experimenting with indexes on the delta file:
> No indexes:
> 7 min 22s
>
> CREATE INDEX IDX_MasterList on MasterList (SN);
> 14min 52s
>
> CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, SN);
> 20 min, 07s
>
> ---
> Dropped indexes on both main and delta.
> ~20 min.
> -
>
> Is the real problem a poor choice of index in main?
>
>
> regards,
> Adam


Re: [sqlite] range enclosing a number

2009-07-14 Thread Jay A. Kreibich
On Tue, Jul 14, 2009 at 07:32:25AM -0400, Pavel Ivanov scratched on the wall:
> Not to continue the argument with Jay, but just to express my opinion for
> comparison:
> 
> >  The ORDER/LIMIT approach is much more resilient to changes, however,
> >  and should more or less behave the same no matter what you do to the
> >  rest of the query.
> 
> Seriously, I don't believe this. There's no way to change the query so
> that min/max will scan the whole table while order/limit takes the value
> straight from the index. They both take the same approach in all cases
> (except, of course, old SQLite, which I fortunately didn't work with :) ).

  You clearly have experience with more robust databases, where I would
  expect this statement to be true.  SQLite has a long history of keeping
  the engine as simple and small as possible, however.  Part of that was
  a strong use of standard user-defined functions for the built-ins, like
  min/max/count.  For years stuff like min/max used the same API as
  user-defined functions, and as such couldn't be touched by the
  optimizer.  Even now, with the integration, it isn't that strong.
  Perhaps in other products, but not here.

  Specifically, here's the comment in the optimizer code that
  attempts to short-cut a min/max:

/*
** Analyze the SELECT statement passed as an argument to see if it
** is a min() or max() query. Return WHERE_ORDERBY_MIN or WHERE_ORDERBY_MAX if
** it is, or 0 otherwise. At present, a query is considered to be
** a min()/max() query if:
**
**   1. There is a single object in the FROM clause.
**
**   2. There is a single expression in the result set, and it is
**  either min(x) or max(x), where x is a column reference.
*/

  So, a simple JOIN (perhaps used as a filtering semijoin) will kill
  the min/max approach, but still leave the use of the index available
  for something like the ORDER/LIMIT approach.  I'm not actually
  sure what SQLite will do if you add a JOIN, so I'm not saying I know
  the ORDER/LIMIT is faster, but this seems to say that min/max will go
  to a scan.
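
  For instance, a minimal sketch of the two shapes (the flags table here
  is hypothetical):

  -- qualifies: one object in FROM, a single bare max() over a column
  select max(ticks) from time_pair where ticks <= ?;

  -- does not qualify (two objects in FROM), so per the comment above
  -- the short-cut is off the table:
  select max(t.ticks)
    from time_pair t join flags f on f.tick_id = t.rowid
   where f.ok = 1;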

  Also, as pointed out, if x becomes an expression or computed column
  (even one that only accesses the indexed column), rather than a direct
  table reference, all bets are off with the SQLite optimizer.  Sure,
  it is possible to write an optimizer that can deal with these
  situations, but that's not what we're dealing with here.  We'd be
  having a different conversation if we were talking about a
  large-scale data-center product.

  I'll also admit I'm not sure these situations are extremely likely,
  but they're still valid examples, and I like to program defensively.

> >  The ORDER/LIMIT approach is, arguably, less graceful, but
> >  it is also (IMHO) a lot easier to break down into logical blocks that
> >  a newbie can follow and understand both the design and intent-- even
> >  if the query is a bit strung out.
> 
> This is where my opinion is exactly the opposite: a query with min/max is
> more readable by a newbie just getting introduced to the code - it
> clearly states what the requester is trying to do: "get the minimum among
> records with this condition".

  OK, I'll buy that for someone who is just trying to figure out
  what it does without thinking about it very hard, the min/max does
  "read" better.  But for someone trying to figure out how it works,
  and maybe change it, I still disagree.  Most people think of the
  SELECT clause as a column-wise operator... that's where you define
  the vertical "shape" of your result.  If they want to pick out a
  specific row, they think WHERE.

  Now clearly GROUP BY mixes this up, and allows you to alter the row
  configuration using the SELECT heading, but my own experience
  teaching people SQL has shown me that the two most confusing concepts
  for an SQL newbie are JOINs and GROUP BYs (see OP's response), and
  this query depends on a GROUP BY that isn't even there (although at
  least it is a simple one!).

  Further, from a code maintenance standpoint, I'd be seriously afraid
  someone would look at this, think "Oh, it's obvious this picks out the
  _row_ with the max value," and do something like add a second column
  to the query (like the OP is needing to do).  The only problem is
  that's not how the query works.  Even worse, as we've already talked
  about, if you do add a column the query still runs, still works, still
  returns a value-- it just happens to be the wrong value.  Unless you've
  got a strong understanding of how GROUP BY works (see above note about
  learning SQL) someone is going to spend hours trying to debug that
  one.
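
  A minimal sketch of that trap (the table t(a, b) is hypothetical):

  -- still runs, still returns a value, but b may come from an
  -- arbitrary row rather than the row holding max(a):
  select max(a), b from t;

  -- unambiguous: pin the row down first, then project both columns:
  select a, b from t where a = (select max(a) from t);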

  From a set-wise, higher-thinking approach I agree that min/max is
  cleaner and more concise, and has the advantage of also running fast
  IF you have a good optimizer.  But I also consider it a "clever"
  piece of code.  ORDER/LIMIT is bigger, and has more bits that you
  need to piece together in your mind to see the big picture, but the
  individual bits are, in my mind, more isolated and simpler 

Re: [sqlite] Optimizing insert or replace speed

2009-07-14 Thread Adam DeVita
Awesome, brilliant, and decisive!

New times:

No index on Delta File:
3 seconds.

Index on SN:
4 seconds.

Index on MasterList (write_out_ok, MFGID, TypeID, SN);
4 seconds.

The speedup of the one query is greater than this because the above time
figures include
1) A query to see if there are any records in deltas with write_out_ok=0 (if
not, don't execute the other queries)
2) A query to update write_out_ok =1 in delta where the record in main
exists and is newer.
3) A query to update write_out_ok =1 in main where the record came from
delta;

1,2, & 3 were negligible compared to the un-optimized
insert or replace into TargetD select * from sourceD sa where
sa.write_out_ok=0 and sa.sn not in (select ta.sn from TargetD ta where
ta.record_updatetime > sa.record_updatetime)   ;

Now, it appears that the time is comparable, so the actual query time is on
the order of 2 seconds faster than listed above.  Dropping the sequence time
from 7 min 22s down to 4 seconds is tremendous.

thank you.

Adam


On Tue, Jul 14, 2009 at 2:04 PM, Pavel Ivanov  wrote:

> I believe your choice of query is not good enough. Try this one:
>
> insert or replace into main.masterlist
> select d.*
>   from delta.masterlist d left outer join main.masterlist M on d.sn = M.sn
>  where d.write_out_ok=0
>    and d.record_updatetime >= ifnull(M.record_updatetime, '')
>
>
> Pavel
>
> On Tue, Jul 14, 2009 at 12:54 PM, Adam DeVita wrote:
> > Good day,
> >
> > Could someone explain where I'm going wrong with this?
> >
> > I've identified the following query as a bottleneck in a utility I've
> > written.
> >
> > insert or replace into main.masterlist select * from delta.masterlist d
> > where d.write_out_ok=0 and
> > d.sn not in(select M.sn from main.masterlist M where M.record_updatetime >
> > d.record_updatetime)
> >
> > The purpose is to import data from a remotely created change file, with
> > only new/newer records.  (Due to the fact that the subject of the data is
> > shipping / receiving product serial numbers and that data moves faster than
> > product there is no way independent nodes can create a change to a record at
> > the same time.  Also, deleting is not allowed.)
> >
> > The change file is attached as 'delta'
> >
> > The structure of masterlist in the main database is:
> > sqlite> .schema masterlist
> > CREATE TABLE MasterList (SN int primary key not null, TypeID int default 0
> > references Product_type_dictionary(TypeID), ConstructionDate text, MFGID
> > int default 0 references MFG_dictionary (MFGID), Init_cust_ship_date text,
> > record_updatetime text default "2000.00.00.00", write_out_ok int default 0);
> >
> > CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, SN);
> >
> > main.masterlist has 36,000 records
> > deltas.masterlist has 9,000 records
> >
> > Notes about fields:
> > write_out_ok is a flag indicating that the record has been imported.
> > States are 1 or 0.
> > MFGID is a manufacturer, about 4 different ints can be used.
> > TypeID is a product Type, about 7 different types,
> >
> > The index is ordered by cardinality, and all int.
> > record_updatetime is the modified date & time GMT (UTC), in the format
> > yyyy.mm.dd.hh.MM.ss
> >
> > Experimenting with indexes on the delta file:
> >
> > No indexes:
> > 7 min 22s
> >
> > CREATE INDEX IDX_MasterList on MasterList (SN);
> > 14 min 52s
> >
> > CREATE INDEX IDX_MasterList on MasterList (write_out_ok, MFGID, TypeID, SN);
> > 20 min 07s
> >
> > ---
> > Dropped indexes on both main and delta:
> > ~20 min
> > -
> >
> > Is the real problem a poor choice of index in main?
> >
> > regards,
> > Adam



-- 
VerifEye Technologies Inc.
905-948-0015x245
7100 Warden Ave, Unit 3
Markham ON, L3R 8B5
Canada


[sqlite] Database corrupt.

2009-07-14 Thread Denis Gottardello
I have many cases of database corruption. What may this depend on? Is anyone
using sqlite in a multi-threaded environment?
-- 
Denis Gottardello
Sintesi S.r.l.


Re: [sqlite] Announce of the new "Versioning" extension

2009-07-14 Thread Neville Franks
Hi Alexey,
Thank you for this extension which could be quite interesting to many
SQLite users. Is there any documentation on this available, possibly
in your new book? I couldn't find any with the source at
http://mobigroup.ru/files/sqlite-ext/ 

Also, you have used the GNU GPL, which means we cannot use this in
any commercial applications. It also goes against the public domain
license used by SQLite itself - see
http://www.sqlite.org/copyright.html. It would be great if this could
be changed.


Tuesday, July 14, 2009, 10:17:18 PM, you wrote:

AP> Hello!

AP> This may be used for table versioning and replication.

AP> Source code is available here
AP> http://mobigroup.ru/files/sqlite-ext/

AP> You can get from the debian repository the SQLite build with some extra 
extensions:
AP> deb http://mobigroup.ru/debian/ lenny main contrib non-free
AP> deb-src http://mobigroup.ru/debian/ lenny main contrib non-free

AP> =Test script==
AP> CREATE TABLE sessions (
AP> key text not null,
AP> value text not null
AP> );
AP> insert into sessions (key,value) values ('test key 0','test value 0');

AP> select versioning_table('sessions');
AP> .schema
AP> select * from _versioning_sessions;

AP> insert into sessions (key,value) values ('test key 1','test value 1');
AP> insert into sessions (key,value) values ('test key 2','test value 2');
AP> select * from _versioning_sessions;

AP> delete from sessions;
AP> select * from _versioning_sessions;

AP> select unversioning_table('sessions');
AP> .schema

AP> ==Test script results=
sqlite>> CREATE TABLE sessions (   
AP>...> key text not null,
AP>...> value text not null
AP>...> );
sqlite>> insert into sessions (key,value) values ('test key 0','test value 0');
sqlite>>
sqlite>> select versioning_table('sessions');

sqlite>> .schema
AP> CREATE TABLE _undo(sql TEXT, status TEXT);
AP> CREATE TABLE _versioning_sessions(key text,value text, _date
AP> REAL, _action TEXT, _rowid INTEGER);
AP> CREATE TABLE sessions (
AP> key text not null,
AP> value text not null
AP> );
AP> CREATE TRIGGER _versioning_sessions_d AFTER DELETE ON sessions
AP> BEGIN INSERT INTO _versioning_sessions (_date, _action, _rowid)
AP> values (julianday('now'), 'D', old.rowid);END;
AP> CREATE TRIGGER _versioning_sessions_i AFTER INSERT ON sessions
AP> BEGIN INSERT INTO _versioning_sessions SELECT *, julianday('now')
AP> as _date, 'I' as _action, new.rowid as _rowid FROM sessions WHERE 
rowid=new.rowid;END;
AP> CREATE TRIGGER _versioning_sessions_u AFTER UPDATE ON sessions
AP> BEGIN INSERT INTO _versioning_sessions SELECT *, julianday('now')
AP> as _date, 'U' as _action, new.rowid as _rowid FROM sessions WHERE 
rowid=new.rowid;END;
sqlite>> select * from _versioning_sessions;
AP> test key 0|test value 0|2455027.00753221|I|1
sqlite>>
sqlite>> insert into sessions (key,value) values ('test key 1','test value 1');
sqlite>> insert into sessions (key,value) values ('test key 2','test value 2');
sqlite>> select * from _versioning_sessions;
AP> test key 0|test value 0|2455027.00753221|I|1
AP> test key 1|test value 1|2455027.00753347|I|2
AP> test key 2|test value 2|2455027.00753368|I|3
sqlite>>
sqlite>> delete from sessions;
sqlite>> select * from _versioning_sessions;
AP> test key 0|test value 0|2455027.00753221|I|1
AP> test key 1|test value 1|2455027.00753347|I|2
AP> test key 2|test value 2|2455027.00753368|I|3
AP> ||2455027.00753382|D|1
AP> ||2455027.00753382|D|2
AP> ||2455027.00753382|D|3
sqlite>>
sqlite>> select unversioning_table('sessions');

sqlite>> .schema
AP> CREATE TABLE _undo(sql TEXT, status TEXT);
AP> CREATE TABLE _versioning_sessions(key text,value text, _date
AP> REAL, _action TEXT, _rowid INTEGER);
AP> CREATE TABLE sessions (
AP> key text not null,
AP> value text not null
AP> );
sqlite>> select * from _versioning_sessions;
AP> test key 0|test value 0|2455027.00753221|I|1
AP> test key 1|test value 1|2455027.00753347|I|2
AP> test key 2|test value 2|2455027.00753368|I|3
AP> ||2455027.00753382|D|1
AP> ||2455027.00753382|D|2
AP> ||2455027.00753382|D|3
AP> ==

AP> Best regards, Alexey Pechnikov.
AP> http://pechnikov.tel/



-- 
Best regards,
  Neville Franks, http://www.surfulater.com http://blog.surfulater.com
 
