Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables

2013-09-10 Thread Harmen de Jong - CoachR Group B . V .
On 9 sep. 2013, at 22:11, E.Pasma pasm...@concepts.nl wrote:
 Ha, I did not mean the length of the names but the length of the hash table 
 (NL: klutstabel), That is the number of buckets over which the hash values 
 are distributed. I looked some further in the code and now believe that this 
 is 1024. That is stilll generous for a normal database, But with 10.000 
 tables you will have about 10 table names in each bucket. So the enigine 
 wiill need to do a lot of searching within each bucket. That involves 
 checking the name of each item in the bucket with the searched name.

We've looked into that code. Actually the 1024 is not the number of buckets 
used for the hash table, but the total size. So we did some tests by changing 
the 1024 bytes into a value large enough to store the hash table for 10K 
tables, but unfortunately that gave a performance increasement of 'just' 5 
percent (which is at least something, but still does not get us to the point 
where it is fast enough for our usage and does not prevent an exponential 
growth in time as the number of tables increases).

Still we have no idea why the query preparation time increases exponentially 
instead of linearly which is what we would expect since it is using a hash 
table. 

I included 5 databases that we used for testing in this link: 
http://wikisend.com/download/570088/test_databases.zip 

The query performed on these databases is:
delete from A where id=1;

The time factors it takes on each database are as follows (where the time 
needed for the 500 tables was taken as starting point to calculate the other 
factors):
500 tables -  1x
1000 tables - 2.5x
5000 tables - 29x
1 tables - 98x

As you can see this is an exponential growth in time it takes to execte the 
query. So far we're missing the point of why this growth should be exponential.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables

2013-09-10 Thread Harmen de Jong - CoachR Group B . V .
On 10 sep. 2013, at 11:37, Harmen de Jong - CoachR Group B.V. 
har...@coachr.com wrote:
 As you can see this is an exponential growth in time it takes to execte the 
 query. So far we're missing the point of why this growth should be 
 exponential.

We tried some further debugging and it seems that SQLite spends its time in 
sqlite3FkActions, though we cannot find anything in there that would explain 
an exponential groth in time.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS4 + INSERT OR REPLACE = Not replacing but adding item

2013-09-10 Thread klo
Hey everybody,

I really need some help. Just changed a table of mine to support fts4 so
that I can do searches on it and noticed that INSERT OR REPLACE is not
working anymore as the way it is supposed to. Instead of replacing the item
with the primary ID it is instead adding a new entry. How can I prevent this
so that is working as it does without fts4. Here is my example:

CREATE VIRTUAL TABLE item USING fts4 (
ID   INTEGER PRIMARY KEY,
name TEXT   
);

INSERT OR REPLACE INTO item (1,'Some Text');
INSERT OR REPLACE INTO item (2,'Some more Text');
INSERT OR REPLACE INTO item (1,'Text that should be replaced to');

With fts4:

Returns 3 rows instead of 2. 

1|Some Text
2|Some more Text
1|Text that should be replaced to

Without fts4:

Returns 2 rows. 

1|Text that should be replaced to
2|Some more Text


Can anyone help or has a workaround.

Thanks :)





--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/FTS4-INSERT-OR-REPLACE-Not-replacing-but-adding-item-tp71147.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Regression: Query takes 10x longer when using version 3.8.x

2013-09-10 Thread Jared Albers
This query takes 10x longer to execute when using versions 3.8.x. Step back
to 3.7.17 or older and the query is much faster. I checked the query plans
of both versions and they are identical.

-Jared

Query:
SELECT R.child, R.instance, R.owner, R.relationship, I.*, NS.rowid AS sort,
COALESCE(L1.may_play, L2.may_play, L3.may_play, L4.may_play, L5.may_play,
L6.may_play, L7.may_play, L8.may_play, D.may_play) AS may_play,
COALESCE(L1.may_pause, L2.may_pause, L3.may_pause, L4.may_pause,
L5.may_pause, L6.may_pause, L7.may_pause, L8.may_pause, D.may_pause) AS
may_pause, COALESCE(L1.may_seek, L2.may_seek, L3.may_seek, L4.may_seek,
L5.may_seek, L6.may_seek, L7.may_seek, L8.may_seek, D.may_seek) AS
may_seek, COALESCE(L1.may_next, L2.may_next, L3.may_next, L4.may_next,
L5.may_next, L6.may_next, L7.may_next, L8.may_next, D.may_next) AS
may_next, COALESCE(L1.may_previous, L2.may_previous, L3.may_previous,
L4.may_previous, L5.may_previous, L6.may_previous, L7.may_previous,
L8.may_previous, D.may_previous) AS may_previous, COALESCE(L1.may_queue,
L2.may_queue, L3.may_queue, L4.may_queue, L5.may_queue, L6.may_queue,
L7.may_queue, L8.may_queue, D.may_queue) AS may_queue,
COALESCE(L1.is_queued, L2.is_queued, L3.is_queued, L4.is_queued,
L5.is_queued, L6.is_queued, L7.is_queued, L8.is_queued, D.is_queued) AS
is_queued, COALESCE(L1.may_order, L2.may_order, L3.may_order, L4.may_order,
L5.may_order, L6.may_order, L7.may_order, L8.may_order, D.may_order) AS
may_order, COALESCE(L1.may_like, L2.may_like, L3.may_like, L4.may_like,
L5.may_like, L6.may_like, L7.may_like, L8.may_like, D.may_like) AS
may_like, COALESCE(L1.is_liked, L2.is_liked, L3.is_liked, L4.is_liked,
L5.is_liked, L6.is_liked, L7.is_liked, L8.is_liked, D.is_liked) AS
is_liked, COALESCE(L1.may_rate, L2.may_rate, L3.may_rate, L4.may_rate,
L5.may_rate, L6.may_rate, L7.may_rate, L8.may_rate, D.may_rate) AS
may_rate, COALESCE(L1.rating, L2.rating, L3.rating, L4.rating, L5.rating,
L6.rating, L7.rating, L8.rating, D.rating) AS rating, COALESCE(L1.may_star,
L2.may_star, L3.may_star, L4.may_star, L5.may_star, L6.may_star,
L7.may_star, L8.may_star, D.may_star) AS may_star, COALESCE(L1.is_starred,
L2.is_starred, L3.is_starred, L4.is_starred, L5.is_starred, L6.is_starred,
L7.is_starred, L8.is_starred, D.is_starred) AS is_starred,
COALESCE(L1.played_date, L2.played_date, L3.played_date, L4.played_date,
L5.played_date, L6.played_date, L7.played_date, L8.played_date,
D.played_date) AS played_date, COALESCE(L1.added_date, L2.added_date,
L3.added_date, L4.added_date, L5.added_date, L6.added_date, L7.added_date,
L8.added_date, D.added_date) AS added_date FROM Items I INNER JOIN Defaults
D ON I.type = D.type INNER JOIN Relationships R ON R.child = I.uid INNER
JOIN (SELECT uid FROM Items ORDER BY name COLLATE NOCASE ASC) NS ON NS.uid
= I.uid LEFT OUTER JOIN Attributes L1 ON L1.parent = R.parent AND L1.child
= R.child AND L1.instance = R.instance AND L1.owner = '' LEFT OUTER JOIN
Attributes L2 ON L2.parent = R.parent AND L2.child = R.child AND
L2.instance = R.instance AND L2.owner IS NULL LEFT OUTER JOIN Attributes L3
ON L3.parent = R.parent AND L3.child = R.child AND L3.instance IS NULL AND
L3.owner = '' LEFT OUTER JOIN Attributes L4 ON L4.parent = R.parent AND
L4.child = R.child AND L4.instance IS NULL AND L4.owner IS NULL LEFT OUTER
JOIN Attributes L5 ON L5.parent = R.parent AND L5.child IS NULL AND
L5.instance IS NULL AND L5.owner = '' LEFT OUTER JOIN Attributes L6 ON
L6.parent = R.parent AND L6.child IS NULL AND L6.instance IS NULL AND
L6.owner IS NULL LEFT OUTER JOIN Attributes L7 ON L7.parent IS NULL AND
L7.child = R.child AND L7.instance IS NULL AND L7.owner = '' LEFT OUTER
JOIN Attributes L8 ON L8.parent IS NULL AND L8.child = R.child AND
L8.instance IS NULL AND L8.owner IS NULL WHERE R.parent = (SELECT parent
FROM Relationships WHERE rowid = 2) ORDER BY sort ASC LIMIT 100;
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS4 + INSERT OR REPLACE = Not replacing but adding item

2013-09-10 Thread Clemens Ladisch
klo wrote:
 Just changed a table of mine to support fts4 so that I can do searches
 on it

Please note that, depending on the implementation, virtual tables are
not a full replacement for 'normal' SQLite tables and will not support
all features.

 and noticed that INSERT OR REPLACE is not working anymore as the way
 it is supposed to. Instead of replacing the item with the primary ID
 it is instead adding a new entry.

 CREATE VIRTUAL TABLE item USING fts4 (
 IDINTEGER PRIMARY KEY,
 name  TEXT
 );

 INSERT OR REPLACE INTO item (1,'Some Text');
 INSERT OR REPLACE INTO item (2,'Some more Text');
 INSERT OR REPLACE INTO item (1,'Text that should be replaced to');

FTS tables do not support constraints like PRIMARY KEY.

However, FTS tables always have the rowid (often called docid); just do
not mention the ID in the table definition and replace it with docid
in your queries:

CREATE VIRTUAL TABLE item USING fts4 (
nameTEXT
);

INSERT OR REPLACE INTO item(docid, name) (1,'Some Text');
INSERT OR REPLACE INTO item(docid, name) (2,'Some more Text');
INSERT OR REPLACE INTO item(docid, name) (1,'Text that should be replaced to');


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Regression: Query takes 10x longer when using version 3.8.x

2013-09-10 Thread Richard Hipp
On Mon, Sep 9, 2013 at 8:27 PM, Jared Albers jaredaalb...@gmail.com wrote:

 This query takes 10x longer to execute when using versions 3.8.x. Step back
 to 3.7.17 or older and the query is much faster. I checked the query plans
 of both versions and they are identical.


Please post:

(1) Your complete schema
(2) The content of the sqlite_stat1 and sqlite_stat3 tables, if those
tables exist.
(3) The output of the sqlite3_analyze.exe utility run on your complete
database.

We need the above information to investigate your issue.  We especially
need (1) and (2).




 -Jared

 Query:
 SELECT R.child, R.instance, R.owner, R.relationship, I.*, NS.rowid AS sort,
 COALESCE(L1.may_play, L2.may_play, L3.may_play, L4.may_play, L5.may_play,
 L6.may_play, L7.may_play, L8.may_play, D.may_play) AS may_play,
 COALESCE(L1.may_pause, L2.may_pause, L3.may_pause, L4.may_pause,
 L5.may_pause, L6.may_pause, L7.may_pause, L8.may_pause, D.may_pause) AS
 may_pause, COALESCE(L1.may_seek, L2.may_seek, L3.may_seek, L4.may_seek,
 L5.may_seek, L6.may_seek, L7.may_seek, L8.may_seek, D.may_seek) AS
 may_seek, COALESCE(L1.may_next, L2.may_next, L3.may_next, L4.may_next,
 L5.may_next, L6.may_next, L7.may_next, L8.may_next, D.may_next) AS
 may_next, COALESCE(L1.may_previous, L2.may_previous, L3.may_previous,
 L4.may_previous, L5.may_previous, L6.may_previous, L7.may_previous,
 L8.may_previous, D.may_previous) AS may_previous, COALESCE(L1.may_queue,
 L2.may_queue, L3.may_queue, L4.may_queue, L5.may_queue, L6.may_queue,
 L7.may_queue, L8.may_queue, D.may_queue) AS may_queue,
 COALESCE(L1.is_queued, L2.is_queued, L3.is_queued, L4.is_queued,
 L5.is_queued, L6.is_queued, L7.is_queued, L8.is_queued, D.is_queued) AS
 is_queued, COALESCE(L1.may_order, L2.may_order, L3.may_order, L4.may_order,
 L5.may_order, L6.may_order, L7.may_order, L8.may_order, D.may_order) AS
 may_order, COALESCE(L1.may_like, L2.may_like, L3.may_like, L4.may_like,
 L5.may_like, L6.may_like, L7.may_like, L8.may_like, D.may_like) AS
 may_like, COALESCE(L1.is_liked, L2.is_liked, L3.is_liked, L4.is_liked,
 L5.is_liked, L6.is_liked, L7.is_liked, L8.is_liked, D.is_liked) AS
 is_liked, COALESCE(L1.may_rate, L2.may_rate, L3.may_rate, L4.may_rate,
 L5.may_rate, L6.may_rate, L7.may_rate, L8.may_rate, D.may_rate) AS
 may_rate, COALESCE(L1.rating, L2.rating, L3.rating, L4.rating, L5.rating,
 L6.rating, L7.rating, L8.rating, D.rating) AS rating, COALESCE(L1.may_star,
 L2.may_star, L3.may_star, L4.may_star, L5.may_star, L6.may_star,
 L7.may_star, L8.may_star, D.may_star) AS may_star, COALESCE(L1.is_starred,
 L2.is_starred, L3.is_starred, L4.is_starred, L5.is_starred, L6.is_starred,
 L7.is_starred, L8.is_starred, D.is_starred) AS is_starred,
 COALESCE(L1.played_date, L2.played_date, L3.played_date, L4.played_date,
 L5.played_date, L6.played_date, L7.played_date, L8.played_date,
 D.played_date) AS played_date, COALESCE(L1.added_date, L2.added_date,
 L3.added_date, L4.added_date, L5.added_date, L6.added_date, L7.added_date,
 L8.added_date, D.added_date) AS added_date FROM Items I INNER JOIN Defaults
 D ON I.type = D.type INNER JOIN Relationships R ON R.child = I.uid INNER
 JOIN (SELECT uid FROM Items ORDER BY name COLLATE NOCASE ASC) NS ON NS.uid
 = I.uid LEFT OUTER JOIN Attributes L1 ON L1.parent = R.parent AND L1.child
 = R.child AND L1.instance = R.instance AND L1.owner = '' LEFT OUTER JOIN
 Attributes L2 ON L2.parent = R.parent AND L2.child = R.child AND
 L2.instance = R.instance AND L2.owner IS NULL LEFT OUTER JOIN Attributes L3
 ON L3.parent = R.parent AND L3.child = R.child AND L3.instance IS NULL AND
 L3.owner = '' LEFT OUTER JOIN Attributes L4 ON L4.parent = R.parent AND
 L4.child = R.child AND L4.instance IS NULL AND L4.owner IS NULL LEFT OUTER
 JOIN Attributes L5 ON L5.parent = R.parent AND L5.child IS NULL AND
 L5.instance IS NULL AND L5.owner = '' LEFT OUTER JOIN Attributes L6 ON
 L6.parent = R.parent AND L6.child IS NULL AND L6.instance IS NULL AND
 L6.owner IS NULL LEFT OUTER JOIN Attributes L7 ON L7.parent IS NULL AND
 L7.child = R.child AND L7.instance IS NULL AND L7.owner = '' LEFT OUTER
 JOIN Attributes L8 ON L8.parent IS NULL AND L8.child = R.child AND
 L8.instance IS NULL AND L8.owner IS NULL WHERE R.parent = (SELECT parent
 FROM Relationships WHERE rowid = 2) ORDER BY sort ASC LIMIT 100;
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users




-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables

2013-09-10 Thread E.Pasma
Op 10 sep 2013, om 11:37 heeft Harmen de Jong - CoachR Group B.V. het  
volgende geschreven:



On 9 sep. 2013, at 22:11, E.Pasma pasm...@concepts.nl wrote:
Ha, I did not mean the length of the names but the length of the  
hash table (NL: klutstabel), That is the number of buckets over  
which the hash values are distributed. I looked some further in the  
code and now believe that this is 1024. That is stilll generous for  
a normal database, But with 10.000 tables you will have about 10  
table names in each bucket. So the enigine wiill need to do a lot  
of searching within each bucket. That involves checking the name of  
each item in the bucket with the searched name.


We've looked into that code. Actually the 1024 is not the number of  
buckets used for the hash table, but the total size. So we did some  
tests by changing the 1024 bytes into a value large enough to store  
the hash table for 10K tables, but unfortunately that gave a  
performance increasement of 'just' 5 percent (which is at least  
something, but still does not get us to the point where it is fast  
enough for our usage and does not prevent an exponential growth in  
time as the number of tables increases).


Still we have no idea why the query preparation time increases  
exponentially instead of linearly which is what we would expect  
since it is using a hash table.


I included 5 databases that we used for testing in this link: 
http://wikisend.com/download/570088/test_databases.zip

The query performed on these databases is:
delete from A where id=1;

The time factors it takes on each database are as follows (where the  
time needed for the 500 tables was taken as starting point to  
calculate the other factors):

500 tables -  1x
1000 tables - 2.5x
5000 tables - 29x
1 tables - 98x

As you can see this is an exponential growth in time it takes to  
execte the query. So far we're missing the point of why this growth  
should be exponential.

...
We tried some further debugging and it seems that SQLite spends its  
time in sqlite3FkActions, though we cannot find anything in there  
that would explain an exponential groth in time.

___



The timings do not look truly exponential to me. It looks more as if  
there is a graduated charge  (NL: staffeltoeslag) on the time per  
table. For instance:

table 1 - 500 - 2 msec/table
table 501 - 1.000 - 3 msec/table
table 1001 - 5.000 - 5 msec/table
table 5001 - 10.000 - 10 msec/table
It could be that there is no further extra charge above 10.000 tables.  
Is it worth trying that or did you do that already?


Your last observation, that the time is spent in sqlite3FKActions,  
means that this is not in query preparation but in ecexution (if I  
understand the code right). That is understandable because the engine  
needs to perform a vast amount of internal internal queries to check  
all the related tables.


This leaves room for the hypothesis that the observed increase is a  
matter of caching. For instance the sqlite default page cache may just  
fit 2.000 tables but not 10.000. Is. It may be worth to try an  
increased cache_size.








___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables

2013-09-10 Thread Harmen de Jong - CoachR Group B . V .
On 10 sep. 2013, at 14:43, E.Pasma pasm...@concepts.nl wrote:

 The timings do not look truly exponential to me. It looks more as if there is 
 a graduated charge  (NL: staffeltoeslag) on the time per table. For instance:
 table 1 - 500 - 2 msec/table
 table 501 - 1.000 - 3 msec/table
 table 1001 - 5.000 - 5 msec/table
 table 5001 - 10.000 - 10 msec/table
 It could be that there is no further extra charge above 10.000 tables. Is it 
 worth trying that or did you do that already?

I'm sorry, but I think the way I wrote our timings were not that clear, since 
they are definately exponentially. The numbers from my previous post refer to 
the multiplier between the test cases. Just to make it clear, here follows the 
same tests, but then expressed in msec of total time per test.

500 tables - 10 msec in total
1000 tables - 25 msec in total
5000 tables - 298 msec in total
1 tables - 985 msec in total

 Your last observation, that the time is spent in sqlite3FKActions, means that 
 this is not in query preparation but in ecexution (if I understand the code 
 right). That is understandable because the engine needs to perform a vast 
 amount of internal internal queries to check all the related tables.

sqlite3FKActions is part of the call stack create by sqlite3_prepare_v2, so I 
think it actually is part of the prepare. The execution time of sqlite3_step is 
actually negligible ( 1 msec) in our test cases because nothing is actually 
deleted because all tables are empty in the test case.

 This leaves room for the hypothesis that the observed increase is a matter of 
 caching. For instance the sqlite default page cache may just fit 2.000 tables 
 but not 10.000. Is. It may be worth to try an increased cache_size.

We already have a very high cache_size set in our test cases. We have set it to 
819200, so I think that can be ruled out too.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables

2013-09-10 Thread Igor Tandetnik

On 9/10/2013 5:37 AM, Harmen de Jong - CoachR Group B.V. wrote:

The time factors it takes on each database are as follows (where the time 
needed for the 500 tables was taken as starting point to calculate the other 
factors):
500 tables -  1x
1000 tables - 2.5x
5000 tables - 29x
1 tables - 98x

As you can see this is an exponential growth in time it takes to execte the 
query.


Not exponential - polynomial. Between 500 and 1 the size of input 
increases x20, so the time increase of x400 would be consistent with a 
quadratic algorithm. Your observed measurements are even better than that.


Exponential means adding each one new table would cause the running time 
to be multiplied by some factor. You certainly don't have it *that* bad.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables

2013-09-10 Thread Harmen de Jong - CoachR Group B . V .
On 10 sep. 2013, at 15:41, Igor Tandetnik i...@tandetnik.org wrote:

 Not exponential - polynomial. Between 500 and 1 the size of input 
 increases x20, so the time increase of x400 would be consistent with a 
 quadratic algorithm. Your observed measurements are even better than that.

You're right, thanks for correcting me. However, our basic issue stays the 
same. We would expect an linear increasement while the increasement is actually 
polynomial and still can't figure out why the increasement in time could not be 
linear in this case.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables

2013-09-10 Thread E.Pasma
Op 10 sep 2013, om 11:37 heeft Harmen de Jong - CoachR Group B.V. het  
volgende geschreven:

I included 5 databases that we used for testing in this link: 
http://wikisend.com/download/570088/test_databases.zip

The query performed on these databases is:
delete from A where id=1;


I could not resist trying this but the tables in the databases appear  
all empty and I get neglectible timings. Do I need to run an insert  
script first?

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables

2013-09-10 Thread Harmen de Jong - CoachR Group B . V .
On 10 sep. 2013, at 16:16, E.Pasma pasm...@concepts.nl wrote:

 Op 10 sep 2013, om 11:37 heeft Harmen de Jong - CoachR Group B.V. het 
 volgende geschreven:
 I included 5 databases that we used for testing in this link: 
 http://wikisend.com/download/570088/test_databases.zip
 
 The query performed on these databases is:
 delete from A where id=1;
 
 I could not resist trying this but the tables in the databases appear all 
 empty and I get neglectible timings. Do I need to run an insert script first?

No, it is all about preparing, so there is no need to insert data. When we 
perform the query delete from A where id=1;  on the databases from the zip 
file, we get the following timings:

500 tables - 10 msec in total
1000 tables - 25 msec in total
5000 tables - 298 msec in total
1 tables - 985 msec in total

What we would indeed expect is to have neglectible timings. Perhaps our 
interpretation of neglectible differs? Could you please tell me what your 
timing is on the 1 tables database?

We used the .timer ON command in the SQLite command shell utility as well as 
timing from our own application that has SQLite integrated.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables

2013-09-10 Thread Harmen de Jong - CoachR Group B . V .


Sent from my iPad

On 10 sep. 2013, at 17:04, Keith Medcalf kmedc...@dessus.com wrote:

 No, it is all about preparing, so there is no need to insert data.
 When we perform the query delete from A where id=1;  on the
 databases from the zip file, we get the following timings:
 
 500 tables - 10 msec in total
 1000 tables - 25 msec in total
 5000 tables - 298 msec in total
 1 tables - 985 msec in total
 
 What we would indeed expect is to have neglectible timings. Perhaps
 our interpretation of neglectible differs? Could you please tell me
 what your timing is on the 1 tables database?
 
 We used the .timer ON command in the SQLite command shell utility as
 well as timing from our own application that has SQLite integrated.
 
 The vdbe program is 20 times larger for 1 tables versus 500 tables (400K 
 instructions for the later -vs- 20K for the former), and this appears to be 
 where the time is spent -- generating all the foreign key checking and 
 trigger code.  Perhaps the memory allocator has issues when building such a 
 huge vdbe set (I haven't looked at the code, but is it perchance doing a lot 
 of realloc's as more triggers are added?)

That is something we suspected too. We already made some tests where we timed 
the time needed for all memory allocations executed in the entire operation. In 
total for the 1 tables test this was somewhere around 25 msec. Since this 
is just a little overhead and the instructions as you point out have a linear 
increasement this still does not explain the polynomial increasement in 
preperation time.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables

2013-09-10 Thread Keith Medcalf

  het volgende geschreven:
  I included 5 databases that we used for testing in this link:
 http://wikisend.com/download/570088/test_databases.zip
 
  The query performed on these databases is:
  delete from A where id=1;
 
  I could not resist trying this but the tables in the databases
  appear all empty and I get neglectible timings. Do I need to run an
  insert script first?
 
  No, it is all about preparing, so there is no need to insert data.
  When we perform the query delete from A where id=1;  on the
  databases from the zip file, we get the following timings:
 
  500 tables - 10 msec in total
  1000 tables - 25 msec in total
  5000 tables - 298 msec in total
  1 tables - 985 msec in total
 
  What we would indeed expect is to have neglectible timings. Perhaps
  our interpretation of neglectible differs? Could you please tell me
  what your timing is on the 1 tables database?
 
  We used the .timer ON command in the SQLite command shell utility as
  well as timing from our own application that has SQLite integrated.

The vdbe program is 20 times larger for 1 tables versus 500 tables (400K 
instructions for the later -vs- 20K for the former), and this appears to be 
where the time is spent -- generating all the foreign key checking and trigger 
code.  Perhaps the memory allocator has issues when building such a huge vdbe 
set (I haven't looked at the code, but is it perchance doing a lot of realloc's 
as more triggers are added?)




___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables

2013-09-10 Thread E.Pasma
Op 10 sep 2013, om 16:36 heeft Harmen de Jong - CoachR Group B.V. het  
volgende geschreven:



On 10 sep. 2013, at 16:16, E.Pasma pasm...@concepts.nl wrote:

Op 10 sep 2013, om 11:37 heeft Harmen de Jong - CoachR Group B.V.  
het volgende geschreven:

I included 5 databases that we used for testing in this link: 
http://wikisend.com/download/570088/test_databases.zip

The query performed on these databases is:
delete from A where id=1;


I could not resist trying this but the tables in the databases  
appear all empty and I get neglectible timings. Do I need to run an  
insert script first?


No, it is all about preparing, so there is no need to insert data.  
When we perform the query delete from A where id=1;  on the  
databases from the zip file, we get the following timings:


500 tables - 10 msec in total
1000 tables - 25 msec in total
5000 tables - 298 msec in total
1 tables - 985 msec in total

What we would indeed expect is to have neglectible timings. Perhaps  
our interpretation of neglectible differs? Could you please tell me  
what your timing is on the 1 tables database?


We used the .timer ON command in the SQLite command shell utility as  
well as timing from our own application that has SQLite integrated.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



Clear, and affter downgrading to sqlite version 3.7.7.1 I got exactly  
your timings.

But with SQLite version 3.7.17 or 3.8.0.1 the timings are neglectible.
Hope this is good news.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables

2013-09-10 Thread Harmen de Jong - CoachR Group B . V .
On 10 sep. 2013, at 16:44, E.Pasma pasm...@concepts.nl wrote:

 Op 10 sep 2013, om 16:36 heeft Harmen de Jong - CoachR Group B.V. het 
 volgende geschreven:
 
 On 10 sep. 2013, at 16:16, E.Pasma pasm...@concepts.nl wrote:
 
 Op 10 sep 2013, om 11:37 heeft Harmen de Jong - CoachR Group B.V. het 
 volgende geschreven:
 I included 5 databases that we used for testing in this link: 
 http://wikisend.com/download/570088/test_databases.zip
 
 The query performed on these databases is:
 delete from A where id=1;
 
 I could not resist trying this but the tables in the databases appear all 
 empty and I get neglectible timings. Do I need to run an insert script 
 first?
 
 No, it is all about preparing, so there is no need to insert data. When we 
 perform the query delete from A where id=1;  on the databases from the zip 
 file, we get the following timings:
 
 500 tables - 10 msec in total
 1000 tables - 25 msec in total
 5000 tables - 298 msec in total
 1 tables - 985 msec in total
 
 What we would indeed expect is to have neglectible timings. Perhaps our 
 interpretation of neglectible differs? Could you please tell me what your 
 timing is on the 1 tables database?
 
 We used the .timer ON command in the SQLite command shell utility as well as 
 timing from our own application that has SQLite integrated.
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
 
 
 Clear, and affter downgrading to sqlite version 3.7.7.1 I got exactly your 
 timings.
 But with SQLite version 3.7.17 or 3.8.0.1 the timings are neglectible.
 Hope this is good news.

Unfortunately not; forgot to mention that foreign keys have to be turned on 
since these are off by default in the SQLite command shell utility and we found 
the time consuming operations are located somewhere in sqlite3FKActions. Could 
you please test again with option PRAGMA foreign_keys=1;? Then time 
increasement is no longer linear.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables

2013-09-10 Thread Simon Slavin

On 10 Sep 2013, at 4:15pm, Harmen de Jong - CoachR Group B.V. 
har...@coachr.com wrote:

 That is something we suspected too. We already made some tests where we timed 
 the time needed for all memory allocations executed in the entire operation. 
 In total for the 1 tables test this was somewhere around 25 msec. Since 
 this is just a little overhead and the instructions as you point out have a 
 linear increasement this still does not explain the polynomial increasement 
 in preperation time.

Polynomial time would be explained if the software is scanning every item in a 
list of lists for each table mentioned.  So a guess about what's happening is 
that for each DELETE FROM instruction the software has to look up the entry 
being deleted, and to check that none of the items in any of the other tables 
refers to it.

This makes sense given how FOREIGN KEYs are implemented in SQLite: there is no 
master 'check-for-changes' table, but each table which has the foreign key has 
to be scanned.  It is irrelevant that there are no entries in any of these 
tables: the table (actually, the index) still has to be found so that SQLite 
can figure out that there are no entries in it.  Because SQLite hashes table 
names, rather than scanning a simple list, it is scanning a list of lists.  
Hence, polynomial time.

I see that E. Pasma has posted that the long scanning time has been fixed in a 
later release.  Good.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables

2013-09-10 Thread E.Pasma

Op 10 sep 2013, om 19:48 heeft Simon Slavin het volgende geschreven:



On 10 Sep 2013, at 4:15pm, Harmen de Jong - CoachR Group B.V. har...@coachr.com 
 wrote:


That is something we suspected too. We already made some tests  
where we timed the time needed for all memory allocations executed  
in the entire operation. In total for the 1 tables test this  
was somewhere around 25 msec. Since this is just a little overhead  
and the instructions as you point out have a linear increasement  
this still does not explain the polynomial increasement in  
preperation time.


Polynomial time would be explained if the software is scanning every  
item in a list of lists for each table mentioned.  So a guess about  
what's happening is that for each DELETE FROM instruction the  
software has to look up the entry being deleted, and to check that  
none of the items in any of the other tables refers to it.


This makes sense given how FOREIGN KEYs are implemented in SQLite:  
there is no master 'check-for-changes' table, but each table which  
has the foreign key has to be scanned.  It is irrelevant that there  
are no entries in any of these tables: the table (actually, the  
index) still has to be found so that SQLite can figure out that  
there are no entries in it.  Because SQLite hashes table names,  
rather than scanning a simple list, it is scanning a list of lists.   
Hence, polynomial time.


I see that E. Pasma has posted that the long scanning time has been  
fixed in a later release.  Good.


Simon
My suppositions that the time was spent in the execute step and that  
this has been fixed in the new release appeared both wrong. Thus I may  
be wrong again but I think to have an explanation now.
It is as Simon guesses that a list of lists is being scanned. It is  
however not the contents of tables being scanned, but the list of  
foreign key constraints as is loaded in memory when a database is  
opened. When preparing a DELETE statement,  the global list of FK's is  
scanned to see if the current table is referred. This is the outer  
loop. If a referring FK is found and if this has an ON DELETE clause,  
comes an inner loop on the same global list to see if the referrer is  
referred to itself. In the case that every table has such a  
constraint, as is the case here, the time becomes n * n. If I'm right  
this is hard to fix and inherent to the representation of the database  
schema in memory.


This also means that if you leave out the cascading delete from the  
constraints the time becomes linear. Actually that is what I observed  
before coming with above explanation. This was easy to check by  
extractingg the schemas from the test databases and removing ON ..  
CASCADE. Thanks for making these database available. 
___

sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Hints for the query planner

2013-09-10 Thread Richard Hipp
There is a survey question at the bottom of this message.  But first some
context...

Over on the sqlite-dev mailing list, a debate has been going on about the
best way to provide some useful hints to the query planner.  The query
under discussion looks like this:

SELECT DISTINCT aname
  FROM album, composer, track
 WHERE cname LIKE '%bach%'
   AND composer.cid=track.cid
   AND album.aid=track.aid;

Assuming that the schema has appropriate indices and ANALYZE has been run,
SQLite does a good job of selecting an efficient query plan for the above.
But the query planner lacks a key piece of information that could help it
to do a better job.  In particular, the query planner does not know how
often the subexpression cname LIKE '%bach%' will be true.  But, it turns
out, the best query plan depends critically on this one fact.

By default, the query planner (in SQLite 3.8.0) assumes that a
subexpression that cannot use an index will always be true.  Probably this
will be tweaked in 3.8.1 so that such subexpressions will be assumed to
usually, but not always, be true.  Either way, it would be useful to be
able to convey to the query planner the other extreme - that a
subexpression is usually not true.

(Pedantic detail:  not true is not the same as false in SQL because
NULL is neither true nor false.)

There is currently code in a branch that provides a hinting mechanism using
a magic unlikely() function.  Subexpressions contained within
unlikely() are assumed to usually not be true.  Other than this hint to
the query planner, the unlikely() function is a complete no-op and
optimized out of the VDBE code so that it does not consume any CPU cycles.
The only purpose of the unlikely() function is to let the query planner
know that the subexpression contained in its argument is not commonly
true.  So, if an application developer knows that the string bach seldom
occurs in composer names, then she might rewrite the query like this:

SELECT DISTINCT aname
  FROM album, composer, track
 WHERE unlikely(cname LIKE '%bach%')
   AND composer.cid=track.cid
   AND album.aid=track.aid;

The query planner might use this likelihood hint to choose a different
query plan that works better when the subexpression is commonly false.  Or
it might decide that the original query plan was good enough and ignore the
hint.  The query planner gets to make that decision.  The application
developer is not telling the query planner what to do. The application
developer has merely provided a small amount of meta-information about the
likelihood of the subexpression being true, meta-information which the
query planner may or may not use.

Note that the subexpression does not have to be a LIKE operator.
PostgreSQL, to name one example, estimates how often a LIKE operator will
be true based on the pattern on its right-hand side, and adjust query plans
accordingly, and some have argued for this sort of thing in SQLite.  But I
want a more general solution.  Suppose the subexpression involves one or
more calls to application-defined functions about which the query planner
cannot possible know anything.  A general mechanism for letting the query
planner know that subexpressions are commonly not true is what is desired -
not a technique for making LIKE operators more efficient.

SURVEY QUESTION:

The question for today is what to call this magic hint function:

(1)  unlikely(EXPR)
(2)  selective(EXPR)
(3)  seldom(EXPR)
(4)  seldom_true(EXPR)
(5)  usually_not_true(EXPR)

Please feel free to suggest other names if you think of any.

ADDITIONAL INFORMATION:

The current implementation allows a second argument which must be a
floating point constant between 0.0 and 1.0, inclusive. The second argument
is an estimate of the probability that the expression in the first argument
will be true.  The default is 0.05.  Names like unlikely or seldom work
well when this probability is small, but if the second argument is close to
1.0, then those names seem backwards.  I don't know if this matters.  The
optional second argument is not guaranteed to make it into an actually
release.
-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] multiple connection to the same DB

2013-09-10 Thread Simon Slavin

On 10 Sep 2013, at 4:37pm, olivier Ménard men37...@yahoo.fr wrote:

 I've tried with my colleagues to write data to the same SQLite DB-file  from 
 differents accounts.
 When someone added a new line in the DB, sometimes older existing data were 
 lost as if they had never existed and sometimes not.
 
 Why ?
 
 Multiple access are maybe allowed only for reading ?

What operating system are you using ?

What software or library calls are you using ?  The SQLite shell tool ?  The C 
API ?

How are the computers you are using accessing the same file ?  Are you using 
access across a network ?  If so, what network protocol are you using ?

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables

2013-09-10 Thread Harmen de Jong - CoachR Group B . V .
On 10 sep. 2013, at 21:24, E.Pasma pasm...@concepts.nl wrote:

 Op 10 sep 2013, om 19:48 heeft Simon Slavin het volgende geschreven:
 
 
 On 10 Sep 2013, at 4:15pm, Harmen de Jong - CoachR Group B.V. 
 har...@coachr.com wrote:
 
 That is something we suspected too. We already made some tests where we 
 timed the time needed for all memory allocations executed in the entire 
 operation. In total for the 1 tables test this was somewhere around 25 
 msec. Since this is just a little overhead and the instructions as you 
 point out have a linear increasement this still does not explain the 
 polynomial increasement in preperation time.
 
 Polynomial time would be explained if the software is scanning every item in 
 a list of lists for each table mentioned.  So a guess about what's happening 
 is that for each DELETE FROM instruction the software has to look up the 
 entry being deleted, and to check that none of the items in any of the other 
 tables refers to it.
 
 This makes sense given how FOREIGN KEYs are implemented in SQLite: there is 
 no master 'check-for-changes' table, but each table which has the foreign 
 key has to be scanned.  It is irrelevant that there are no entries in any of 
 these tables: the table (actually, the index) still has to be found so that 
 SQLite can figure out that there are no entries in it.  Because SQLite 
 hashes table names, rather than scanning a simple list, it is scanning a 
 list of lists.  Hence, polynomial time.
 
 I see that E. Pasma has posted that the long scanning time has been fixed in 
 a later release.  Good.
 
 Simon
 My suppositions that the time was spent in the execute step and that this has 
 been fixed in the new release appeared both wrong. Thus I may be wrong again 
 but I think to have an explanation now.
 It is as Simon guesses that a list of lists is being scanned. It is however 
 not the contents of tables being scanned, but the list of foreign key 
 constraints as is loaded in memory when a database is opened. When preparing 
 a DELETE statement,  the global list of FK's is scanned to see if the current 
 table is referred. This is the outer loop. If a referring FK is found and if 
 this has an ON DELETE clause, comes an inner loop on the same global list to 
 see if the referrer is referred to itself. In the case that every table has 
 such a constraint, as is the case here, the time becomes n * n. If I'm right 
 this is hard to fix and inherent to the representation of the database schema 
 in memory.
 
 This also means that if you leave out the cascading delete from the 
 constraints the time becomes linear. Actually that is what I observed before 
 coming with above explanation. This was easy to check by extractingg the 
 schemas from the test databases and removing ON .. CASCADE. Thanks for 
 making these database available.

What you are supposing makes sense. Actually we already have been searching for 
two loops that are related to each other since that would explain the 
polynomial, however thus far we cannot find them (hence we are not yet that 
familiar with the source code of SQLite). Could you point us to the code where 
these inner and outer loops are done?

The next challenge would of course be to find a solution. Actually we already 
found a similar optimization last week that had quite some impact with these 
many tables that we provided to the developers and have been added to the fork 
right now -;). Especially since these are quite exceptional use cases, there 
might be the chance to do some optimizations that seem useless for  the more 
normal usage of SQLite.

No thanks for making these databases available. We would like to thank every 
one of you that helps pointing us into the right direction in this thread.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread Philip Bennefall

Hi Richard,

What about probability or likelyhood? This works in both the case where 
the likelyhood is great as well as when it is low. From the list you 
provided, I would pick unlikely.


Kind regards,

Philip Bennefall
- Original Message - 
From: Richard Hipp d...@sqlite.org

To: General Discussion of SQLite Database sqlite-users@sqlite.org
Sent: Tuesday, September 10, 2013 9:26 PM
Subject: [sqlite] Hints for the query planner


There is a survey question at the bottom of this message.  But first some
context...

Over on the sqlite-dev mailing list, a debate has been going on about the
best way to provide some useful hints to the query planner.  The query
under discussion looks like this:

SELECT DISTINCT aname
 FROM album, composer, track
WHERE cname LIKE '%bach%'
  AND composer.cid=track.cid
  AND album.aid=track.aid;

Assuming that the schema has appropriate indices and ANALYZE has been run,
SQLite does a good job of selecting an efficient query plan for the above.
But the query planner lacks a key piece of information that could help it
to do a better job.  In particular, the query planner does not know how
often the subexpression cname LIKE '%bach%' will be true.  But, it turns
out, the best query plan depends critically on this one fact.

By default, the query planner (in SQLite 3.8.0) assumes that a
subexpression that cannot use an index will always be true.  Probably this
will be tweaked in 3.8.1 so that such subexpressions will be assumed to
usually, but not always, be true.  Either way, it would be useful to be
able to convey to the query planner the other extreme - that a
subexpression is usually not true.

(Pedantic detail:  not true is not the same as false in SQL because
NULL is neither true nor false.)

There is currently code in a branch that provides a hinting mechanism using
a magic unlikely() function.  Subexpressions contained within
unlikely() are assumed to usually not be true.  Other than this hint to
the query planner, the unlikely() function is a complete no-op and
optimized out of the VDBE code so that it does not consume any CPU cycles.
The only purpose of the unlikely() function is to let the query planner
know that the subexpression contained in its argument is not commonly
true.  So, if an application developer knows that the string bach seldom
occurs in composer names, then she might rewrite the query like this:

SELECT DISTINCT aname
 FROM album, composer, track
WHERE unlikely(cname LIKE '%bach%')
  AND composer.cid=track.cid
  AND album.aid=track.aid;

The query planner might use this likelihood hint to choose a different
query plan that works better when the subexpression is commonly false.  Or
it might decide that the original query plan was good enough and ignore the
hint.  The query planner gets to make that decision.  The application
developer is not telling the query planner what to do. The application
developer has merely provided a small amount of meta-information about the
likelihood of the subexpression being true, meta-information which the
query planner may or may not use.

Note that the subexpression does not have to be a LIKE operator.
PostgreSQL, to name one example, estimates how often a LIKE operator will
be true based on the pattern on its right-hand side, and adjust query plans
accordingly, and some have argued for this sort of thing in SQLite.  But I
want a more general solution.  Suppose the subexpression involves one or
more calls to application-defined functions about which the query planner
cannot possible know anything.  A general mechanism for letting the query
planner know that subexpressions are commonly not true is what is desired -
not a technique for making LIKE operators more efficient.

SURVEY QUESTION:

The question for today is what to call this magic hint function:

(1)  unlikely(EXPR)
(2)  selective(EXPR)
(3)  seldom(EXPR)
(4)  seldom_true(EXPR)
(5)  usually_not_true(EXPR)

Please feel free to suggest other names if you think of any.

ADDITIONAL INFORMATION:

The current implementation allows a second argument which must be a
floating point constant between 0.0 and 1.0, inclusive. The second argument
is an estimate of the probability that the expression in the first argument
will be true.  The default is 0.05.  Names like unlikely or seldom work
well when this probability is small, but if the second argument is close to
1.0, then those names seem backwards.  I don't know if this matters.  The
optional second argument is not guaranteed to make it into an actually
release.
--
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread Roman Fleysher
In Bayesian statistics there is a term prior, prior probability. 



From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Richard Hipp [d...@sqlite.org]
Sent: Tuesday, September 10, 2013 3:26 PM
To: General Discussion of SQLite Database
Subject: [sqlite] Hints for the query planner

There is a survey question at the bottom of this message.  But first some
context...

Over on the sqlite-dev mailing list, a debate has been going on about the
best way to provide some useful hints to the query planner.  The query
under discussion looks like this:

SELECT DISTINCT aname
  FROM album, composer, track
 WHERE cname LIKE '%bach%'
   AND composer.cid=track.cid
   AND album.aid=track.aid;

Assuming that the schema has appropriate indices and ANALYZE has been run,
SQLite does a good job of selecting an efficient query plan for the above.
But the query planner lacks a key piece of information that could help it
to do a better job.  In particular, the query planner does not know how
often the subexpression cname LIKE '%bach%' will be true.  But, it turns
out, the best query plan depends critically on this one fact.

By default, the query planner (in SQLite 3.8.0) assumes that a
subexpression that cannot use an index will always be true.  Probably this
will be tweaked in 3.8.1 so that such subexpressions will be assumed to
usually, but not always, be true.  Either way, it would be useful to be
able to convey to the query planner the other extreme - that a
subexpression is usually not true.

(Pedantic detail:  not true is not the same as false in SQL because
NULL is neither true nor false.)

There is currently code in a branch that provides a hinting mechanism using
a magic unlikely() function.  Subexpressions contained within
unlikely() are assumed to usually not be true.  Other than this hint to
the query planner, the unlikely() function is a complete no-op and
optimized out of the VDBE code so that it does not consume any CPU cycles.
The only purpose of the unlikely() function is to let the query planner
know that the subexpression contained in its argument is not commonly
true.  So, if an application developer knows that the string bach seldom
occurs in composer names, then she might rewrite the query like this:

SELECT DISTINCT aname
  FROM album, composer, track
 WHERE unlikely(cname LIKE '%bach%')
   AND composer.cid=track.cid
   AND album.aid=track.aid;

The query planner might use this likelihood hint to choose a different
query plan that works better when the subexpression is commonly false.  Or
it might decide that the original query plan was good enough and ignore the
hint.  The query planner gets to make that decision.  The application
developer is not telling the query planner what to do. The application
developer has merely provided a small amount of meta-information about the
likelihood of the subexpression being true, meta-information which the
query planner may or may not use.

Note that the subexpression does not have to be a LIKE operator.
PostgreSQL, to name one example, estimates how often a LIKE operator will
be true based on the pattern on its right-hand side, and adjust query plans
accordingly, and some have argued for this sort of thing in SQLite.  But I
want a more general solution.  Suppose the subexpression involves one or
more calls to application-defined functions about which the query planner
cannot possible know anything.  A general mechanism for letting the query
planner know that subexpressions are commonly not true is what is desired -
not a technique for making LIKE operators more efficient.

SURVEY QUESTION:

The question for today is what to call this magic hint function:

(1)  unlikely(EXPR)
(2)  selective(EXPR)
(3)  seldom(EXPR)
(4)  seldom_true(EXPR)
(5)  usually_not_true(EXPR)

Please feel free to suggest other names if you think of any.

ADDITIONAL INFORMATION:

The current implementation allows a second argument which must be a
floating point constant between 0.0 and 1.0, inclusive. The second argument
is an estimate of the probability that the expression in the first argument
will be true.  The default is 0.05.  Names like unlikely or seldom work
well when this probability is small, but if the second argument is close to
1.0, then those names seem backwards.  I don't know if this matters.  The
optional second argument is not guaranteed to make it into an actually
release.
--
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread Eric Minbiole
 (1)  unlikely(EXPR)
 (2)  selective(EXPR)
 (3)  seldom(EXPR)
 (4)  seldom_true(EXPR)
 (5)  usually_not_true(EXPR)


I quite like (2) selective.  I think it's reasonably descriptive on its
own, and also works well with the optional second argument.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite-users Digest, Vol 69, Issue 10

2013-09-10 Thread j . merrill
-Original Message-
Date: Tue, 10 Sep 2013 15:15:35 +
From: Harmen de Jong - CoachR Group B.V. har...@coachr.com
To: General Discussion of SQLite Database sqlite-users@sqlite.org
Subject: Re: [sqlite] Query preperation time does not scale linearly
withgrowth of no. of tables
Message-ID: c06aa165-6904-43f6-8d33-6a0f2c99f...@coachr.com
Content-Type: text/plain; charset=us-ascii

[snip]
That [memory being re-allocated] is something we suspected too. We already made 
some tests where we timed the time needed for all memory allocations executed 
in the entire operation. In total for the 1 tables test this was somewhere 
around 25 msec. Since this is just a little overhead and the instructions as 
you point out have a linear increasement this still does not explain the 
polynomial increasement in preperation time.

[J. Merrill's comment below]
When there's a re-allocate, it's not just the time to allocate the new memory 
-- there's the time to copy the data from the original not-big-enough location 
to the new big-enough-for-now location. The amount of data moved grows rapidly 
in that case with each re-allocation, and the pattern of extra-time seems 
similar to the data-movement-growth when there are repeated allocate/move 
steps. How many re-allocations happen?

Do you know by how much the allocation size increases each time a re-allocation 
is needed? (A common algorithm is to double the size each time.) You might try 
changing the source code so that it triples or quadruples the size each time, 
as memory does not seem to be an issue.

Another point -- I did not see you comment on the possibility that you could 
remove the FK specs from the 10,000 tables. Do you really have the situation 
that deletions from the main table would orphan rows in an unknown number of 
other tables, and the existence of those orphan rows would in fact cause 
application failures? (If the rows were not deleted from the other tables, 
using a normal join to the main table -- rather than a left join -- in the 
queries would make those rows disappear.)

Alternatively you could create a table to hold the ids of the rows deleted from 
the main table, and build a separate process -- to be run as often as desired 
-- to do

delete from onechildtable where id in (select id from deletedids)

(That SQL could also be of the form
   delete from onechildtable where id in (5,6,9,12,999)
should you determine that there aren't very many deleted ids.)

You would do that for each of the other tables -- this is exactly the work that 
SQLite is preparing to do when you prepare a delete from maintable statement. 
When done deleting from all the tables, you could remove all the rows from the 
deletedids table. (You'd have to do something to block deletions from the main 
table -- thus blocking insertion into the deletedids table -- during the 
process, and of course you would start the process with do nothing if 
deletedids has no rows.)

This would just have the effect of batching together multiple main-table 
deletes before going through every child table for the purpose of deleting 
orphans.

I would not suggest the delete in batches idea except that it's clear that 
you have a major collection of infrastructure set up to handle this unusual 
task. I don't think that what I'm suggesting would add a huge new component to 
that infrastructure.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread Scott Robison
I think I prefer something along the lines of unlikely or likely. The
problem with a term like selective (at least in my brain) is that it
doesn't imply (for the single argument version) in what way it is being
selective.

If a negative form of the magic function is used (unlikely, seldom,
etc) I would suggest considering inverting the optional second parameter.
In other words, 0.05 would become 0.95. In my opinion, that reads better:
unlikely(COLUMN LIKE '%pattern%', 0.95) reads it is unlikely the
expression will be true 95% of the time.

In like fashion, a positive form of the magic function would keep the
current meaning of the optional second parameter.

Just ideas, I'm not married to them. Thanks for the discussion.

SDR


On Tue, Sep 10, 2013 at 4:17 PM, Eric Minbiole eminbi...@gmail.com wrote:

  (1)  unlikely(EXPR)
  (2)  selective(EXPR)
  (3)  seldom(EXPR)
  (4)  seldom_true(EXPR)
  (5)  usually_not_true(EXPR)
 
 
 I quite like (2) selective.  I think it's reasonably descriptive on its
 own, and also works well with the optional second argument.
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread Tim Streater
On 10 Sep 2013 at 20:26, Richard Hipp d...@sqlite.org wrote:

 SURVEY QUESTION:

 The question for today is what to call this magic hint function:

 (1)  unlikely(EXPR)
 (2)  selective(EXPR)
 (3)  seldom(EXPR)
 (4)  seldom_true(EXPR)
 (5)  usually_not_true(EXPR)

 Please feel free to suggest other names if you think of any.

likelihood (EXPR, value)




--
Cheers  --  Tim
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread David de Regt
Seconded.

-Original Message-
From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] 
On Behalf Of Simon Slavin
Sent: Tuesday, September 10, 2013 3:46 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Hints for the query planner


On 10 Sep 2013, at 10:48pm, Tim Streater t...@clothears.org.uk wrote:

 likelihood (EXPR, value)

Best I've seen so far.  I know it makes no sense without the second parameter 
but I think if you're going to make use of a special non-standard optimisation 
system you can be expected to know exactly what it means.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread Stephan Beal
Plus an overload: unlikely(expr) == likelihood (expr, 0.05)

(sent from a mobile device - please excuse brevity, typos, and top-posting)
- stephan beal
http://wanderinghorse.net
On Sep 10, 2013 11:49 PM, Tim Streater t...@clothears.org.uk wrote:

 On 10 Sep 2013 at 20:26, Richard Hipp d...@sqlite.org wrote:

  SURVEY QUESTION:
 
  The question for today is what to call this magic hint function:
 
  (1)  unlikely(EXPR)
  (2)  selective(EXPR)
  (3)  seldom(EXPR)
  (4)  seldom_true(EXPR)
  (5)  usually_not_true(EXPR)
 
  Please feel free to suggest other names if you think of any.

 likelihood (EXPR, value)




 --
 Cheers  --  Tim

 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread Stephan Beal
Or...

unlikely(expr (, prob=0.05)) == likelihood (...)

(sent from a mobile device - please excuse brevity, typos, and top-posting)
- stephan beal
http://wanderinghorse.net
On Sep 11, 2013 12:51 AM, Stephan Beal sgb...@googlemail.com wrote:

 Plus an overload: unlikely(expr) == likelihood (expr, 0.05)

 (sent from a mobile device - please excuse brevity, typos, and
 top-posting)
 - stephan beal
 http://wanderinghorse.net
 On Sep 10, 2013 11:49 PM, Tim Streater t...@clothears.org.uk wrote:

 On 10 Sep 2013 at 20:26, Richard Hipp d...@sqlite.org wrote:

  SURVEY QUESTION:
 
  The question for today is what to call this magic hint function:
 
  (1)  unlikely(EXPR)
  (2)  selective(EXPR)
  (3)  seldom(EXPR)
  (4)  seldom_true(EXPR)
  (5)  usually_not_true(EXPR)
 
  Please feel free to suggest other names if you think of any.

 likelihood (EXPR, value)




 --
 Cheers  --  Tim

 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread Simon Slavin

On 10 Sep 2013, at 10:48pm, Tim Streater t...@clothears.org.uk wrote:

 likelihood (EXPR, value)

Best I've seen so far.  I know it makes no sense without the second parameter 
but I think if you're going to make use of a special non-standard optimisation 
system you can be expected to know exactly what it means.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread Keith Medcalf

How about three, where two are just overloaded, or rather syntactic sugar, for 
the main declaration:

unlikely(expr)
likely(expr)
likelihood(expr, rate)

Where unlikely(expr) - likelihood(expr, 0.05)
  likely(expr)   - likelihood(expr, 0.95)

I would presume that the rate is the expected selection rate of expr over the 
entire table, so the correctly computed value of the rate for a table A of a 
given expr, for example:

select * from A where col LIKE '%bach%'

would be

select * from A where likelihood(col like '%bach%', (select count(*) from a 
where col like '%bach%')/(select count(*) from a))
and is not related to the application of other conditionals not included in the 
expr itself.

So, if there are usable statistics available, should the likelihood given be 
ignored; or, should likelihood completely override the statistical input to the 
optimizer?  I vote that the hint should only be used if no other statistical 
data is available.

 Plus an overload: unlikely(expr) == likelihood (expr, 0.05)
 
 (sent from a mobile device - please excuse brevity, typos, and top-
 posting)
 - stephan beal
 http://wanderinghorse.net
 On Sep 10, 2013 11:49 PM, Tim Streater t...@clothears.org.uk wrote:
 
  On 10 Sep 2013 at 20:26, Richard Hipp d...@sqlite.org wrote:
 
   SURVEY QUESTION:
  
   The question for today is what to call this magic hint function:
  
   (1)  unlikely(EXPR)
   (2)  selective(EXPR)
   (3)  seldom(EXPR)
   (4)  seldom_true(EXPR)
   (5)  usually_not_true(EXPR)
  
   Please feel free to suggest other names if you think of any.
 
  likelihood (EXPR, value)
 
 
 
 
  --
  Cheers  --  Tim
 
  ___
  sqlite-users mailing list
  sqlite-users@sqlite.org
  http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
 
 
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] multiple connection to the same DB

2013-09-10 Thread olivier Ménard
Hi

I've tried with my colleagues to write data to the same SQLite DB-file  from 
differents accounts.
When someone added a new line in the DB, sometimes older existing data were 
lost as if they had never existed and sometimes not.

Why ?

Multiple access are maybe allowed only for reading ?

thanks to answer and sorry for my bad english.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread kyan
Hello Dr Hipp,

First of all, I apologize for this rather off-topic suggestion knowing that
you may have already implemented the syntax you describe, but there is an
IMHO good reason for it, read ahead.

On Tue, Sep 10, 2013 at 10:26 PM, Richard Hipp d...@sqlite.org wrote:

 SELECT DISTINCT aname
   FROM album, composer, track
  WHERE unlikely(cname LIKE '%bach%')
AND composer.cid=track.cid
AND album.aid=track.aid;


I would prefer that the planner hint is not interleaved inside normal SQL
syntax. Instead I propose a special comment-like syntax instead, as
Oracle's /*+ */ or --+, but replacing + with another symbol, e.g. :

SELECT DISTINCT aname
   FROM album, composer, track
  WHERE cname LIKE '%bach%'
 /* unlikely */
  AND composer.cid=track.cid AND album.aid=track.aid;


or:

SELECT DISTINCT aname
   FROM album, composer, track
  WHERE cname LIKE '%bach%'
 -- unlikely
AND composer.cid=track.cid
AND album.aid=track.aid;


If the hint is to be applied to an expression that combines many column
predicates with AND (I am not sure if this actually makes sense):

SELECT DISTINCT aname
   FROM album, composer, track
  WHERE unlikely(cname LIKE '%bach%'
AND composer.cid=track.cid)
AND album.aid=track.aid;


then a -normally redundant- pair of parentheses can be used to specify the
scope of the hint:

SELECT DISTINCT aname
   FROM album, composer, track
  WHERE (cname LIKE '%bach%' AND composer.cid=track.cid) /* unlikely */
AND album.aid=track.aid;


The SQLite SQL parser will have to look for exactly /* or -- without
whitespace between the characters, so it can easily tell a planner hint
from a plain comment with a single character read-ahead. Also, the fact
that hints are transparent to the SQL syntax will allow the query parser
to handle them in an orthogonal way (e.g. a small separate parser for
hints) to normal SQL parsing, IMO making handling of any future hints
easier to add.

The main reason for this proposal is that the planner hint will be ignored
by default by other SQL parsers without the need to modify them, which in
some cases may not even be possible. For instance it will allow someone to
write SQL that is valid in databases of alternative DB vendors and still
provide planner hints when the DB vendor is SQLite (that is why I replaced
+ with , to avoid conflicts with a hypothetical alternate Oracle query
optimizer) without having to modify the SQL in the application code to
remove the hints. This is a property of the Oracle optimizer hint syntax I
have always appreciated when writing SQL that is to be executed in
databases of alternative DB vendors with the same schema, for applications
where the user chooses the database vendor from a list of supported ones.

For more on Oracle optimizer hints see
http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm.

As for the name of the hint itself I would propose:

-- PROBABLY(True) -- the current default
-- PROBABLY(False)
-- PROBABLY(False, 0.7)
-- PROBABLY(False, 0.6, 0.3)  --re pedantic detail, the second value if
for True, the remainder for NULL.

Kind regards,

Constantine Yannakopoulos
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread Marc L. Allen
As I was reading this, I said to myself, what they really need is a confidence 
value.  Then I read the end and, there it was!  A confidence value.  Ok.. not 
exactly confidence, but I think you get my meaning.

It seems to me that you're allowing the query writer to substitute personal 
knowledge of the DB for knowledge based on ANALYZE or other statistical 
indexes.  So, I'm all in favor of allowing that second argument.

If so, I would suggest confidence(exp, confidence_value).  Or, perhaps, 
likelihood(..)  Likely is fine, or you might even establish several names 
with built-in defaults... e.g. likely(xxx) might be confidence(xxx, .75) 
and unlikely(xxx) might be confidence(xxx, .25)  You've got rarely, 
mostly, and a whole suite of other synonyms.




This email and any attachments are only for use by the intended recipient(s) 
and may contain legally privileged, confidential, proprietary or otherwise 
private information. Any unauthorized use, reproduction, dissemination, 
distribution or other disclosure of the contents of this e-mail or its 
attachments is strictly prohibited. If you have received this email in error, 
please notify the sender immediately and delete the original.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread James Berry

On Sep 10, 2013, at 12:26 PM, Richard Hipp d...@sqlite.org wrote:

 SURVEY QUESTION:
 
 The question for today is what to call this magic hint function:
 
 (1)  unlikely(EXPR)
 (2)  selective(EXPR)
 (3)  seldom(EXPR)
 (4)  seldom_true(EXPR)
 (5)  usually_not_true(EXPR)
 
 Please feel free to suggest other names if you think of any.


I like the optional second parameter. Apart from the obvious change to 
likelihood, which somebody else suggested, but which is less self documenting 
in the one-argument case, I'd suggest that you actually add two words: likely 
and unlikely, and whose second parameters are the inverse of each other, 
crossing at 0.5. So an expression could then, in the simplest case, be labeled 
LIKELY or UNLIKELY, with second parameter defaulting to 1.0 (or to whatever 
value you feel is appropriate), but allowing the user to specify a lesser 
likelihood by lowering that value. LIKELY(expr, 0) would mean the same as 
UNLIKELY(expr) and UNLIKELY(expr, 1), and UNLIKELY(expr, 0) would be the same 
as LIKELY(expr, 1), etc.

James
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users