Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
On 9 sep. 2013, at 22:11, E.Pasma pasm...@concepts.nl wrote: Ha, I did not mean the length of the names but the length of the hash table (NL: klutstabel), That is the number of buckets over which the hash values are distributed. I looked some further in the code and now believe that this is 1024. That is stilll generous for a normal database, But with 10.000 tables you will have about 10 table names in each bucket. So the enigine wiill need to do a lot of searching within each bucket. That involves checking the name of each item in the bucket with the searched name. We've looked into that code. Actually the 1024 is not the number of buckets used for the hash table, but the total size. So we did some tests by changing the 1024 bytes into a value large enough to store the hash table for 10K tables, but unfortunately that gave a performance increasement of 'just' 5 percent (which is at least something, but still does not get us to the point where it is fast enough for our usage and does not prevent an exponential growth in time as the number of tables increases). Still we have no idea why the query preparation time increases exponentially instead of linearly which is what we would expect since it is using a hash table. I included 5 databases that we used for testing in this link: http://wikisend.com/download/570088/test_databases.zip The query performed on these databases is: delete from A where id=1; The time factors it takes on each database are as follows (where the time needed for the 500 tables was taken as starting point to calculate the other factors): 500 tables - 1x 1000 tables - 2.5x 5000 tables - 29x 1 tables - 98x As you can see this is an exponential growth in time it takes to execte the query. So far we're missing the point of why this growth should be exponential. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
On 10 sep. 2013, at 11:37, Harmen de Jong - CoachR Group B.V. har...@coachr.com wrote: As you can see this is an exponential growth in time it takes to execte the query. So far we're missing the point of why this growth should be exponential. We tried some further debugging and it seems that SQLite spends its time in sqlite3FkActions, though we cannot find anything in there that would explain an exponential groth in time. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS4 + INSERT OR REPLACE = Not replacing but adding item
Hey everybody, I really need some help. Just changed a table of mine to support fts4 so that I can do searches on it and noticed that INSERT OR REPLACE is not working anymore as the way it is supposed to. Instead of replacing the item with the primary ID it is instead adding a new entry. How can I prevent this so that is working as it does without fts4. Here is my example: CREATE VIRTUAL TABLE item USING fts4 ( ID INTEGER PRIMARY KEY, name TEXT ); INSERT OR REPLACE INTO item (1,'Some Text'); INSERT OR REPLACE INTO item (2,'Some more Text'); INSERT OR REPLACE INTO item (1,'Text that should be replaced to'); With fts4: Returns 3 rows instead of 2. 1|Some Text 2|Some more Text 1|Text that should be replaced to Without fts4: Returns 2 rows. 1|Text that should be replaced to 2|Some more Text Can anyone help or has a workaround. Thanks :) -- View this message in context: http://sqlite.1065341.n5.nabble.com/FTS4-INSERT-OR-REPLACE-Not-replacing-but-adding-item-tp71147.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Regression: Query takes 10x longer when using version 3.8.x
This query takes 10x longer to execute when using versions 3.8.x. Step back to 3.7.17 or older and the query is much faster. I checked the query plans of both versions and they are identical. -Jared Query: SELECT R.child, R.instance, R.owner, R.relationship, I.*, NS.rowid AS sort, COALESCE(L1.may_play, L2.may_play, L3.may_play, L4.may_play, L5.may_play, L6.may_play, L7.may_play, L8.may_play, D.may_play) AS may_play, COALESCE(L1.may_pause, L2.may_pause, L3.may_pause, L4.may_pause, L5.may_pause, L6.may_pause, L7.may_pause, L8.may_pause, D.may_pause) AS may_pause, COALESCE(L1.may_seek, L2.may_seek, L3.may_seek, L4.may_seek, L5.may_seek, L6.may_seek, L7.may_seek, L8.may_seek, D.may_seek) AS may_seek, COALESCE(L1.may_next, L2.may_next, L3.may_next, L4.may_next, L5.may_next, L6.may_next, L7.may_next, L8.may_next, D.may_next) AS may_next, COALESCE(L1.may_previous, L2.may_previous, L3.may_previous, L4.may_previous, L5.may_previous, L6.may_previous, L7.may_previous, L8.may_previous, D.may_previous) AS may_previous, COALESCE(L1.may_queue, L2.may_queue, L3.may_queue, L4.may_queue, L5.may_queue, L6.may_queue, L7.may_queue, L8.may_queue, D.may_queue) AS may_queue, COALESCE(L1.is_queued, L2.is_queued, L3.is_queued, L4.is_queued, L5.is_queued, L6.is_queued, L7.is_queued, L8.is_queued, D.is_queued) AS is_queued, COALESCE(L1.may_order, L2.may_order, L3.may_order, L4.may_order, L5.may_order, L6.may_order, L7.may_order, L8.may_order, D.may_order) AS may_order, COALESCE(L1.may_like, L2.may_like, L3.may_like, L4.may_like, L5.may_like, L6.may_like, L7.may_like, L8.may_like, D.may_like) AS may_like, COALESCE(L1.is_liked, L2.is_liked, L3.is_liked, L4.is_liked, L5.is_liked, L6.is_liked, L7.is_liked, L8.is_liked, D.is_liked) AS is_liked, COALESCE(L1.may_rate, L2.may_rate, L3.may_rate, L4.may_rate, L5.may_rate, L6.may_rate, L7.may_rate, L8.may_rate, D.may_rate) AS may_rate, COALESCE(L1.rating, L2.rating, L3.rating, L4.rating, L5.rating, L6.rating, L7.rating, L8.rating, D.rating) AS rating, COALESCE(L1.may_star, L2.may_star, L3.may_star, L4.may_star, L5.may_star, L6.may_star, L7.may_star, L8.may_star, D.may_star) AS may_star, COALESCE(L1.is_starred, L2.is_starred, L3.is_starred, L4.is_starred, L5.is_starred, L6.is_starred, L7.is_starred, L8.is_starred, D.is_starred) AS is_starred, COALESCE(L1.played_date, L2.played_date, L3.played_date, L4.played_date, L5.played_date, L6.played_date, L7.played_date, L8.played_date, D.played_date) AS played_date, COALESCE(L1.added_date, L2.added_date, L3.added_date, L4.added_date, L5.added_date, L6.added_date, L7.added_date, L8.added_date, D.added_date) AS added_date FROM Items I INNER JOIN Defaults D ON I.type = D.type INNER JOIN Relationships R ON R.child = I.uid INNER JOIN (SELECT uid FROM Items ORDER BY name COLLATE NOCASE ASC) NS ON NS.uid = I.uid LEFT OUTER JOIN Attributes L1 ON L1.parent = R.parent AND L1.child = R.child AND L1.instance = R.instance AND L1.owner = '' LEFT OUTER JOIN Attributes L2 ON L2.parent = R.parent AND L2.child = R.child AND L2.instance = R.instance AND L2.owner IS NULL LEFT OUTER JOIN Attributes L3 ON L3.parent = R.parent AND L3.child = R.child AND L3.instance IS NULL AND L3.owner = '' LEFT OUTER JOIN Attributes L4 ON L4.parent = R.parent AND L4.child = R.child AND L4.instance IS NULL AND L4.owner IS NULL LEFT OUTER JOIN Attributes L5 ON L5.parent = R.parent AND L5.child IS NULL AND L5.instance IS NULL AND L5.owner = '' LEFT OUTER JOIN Attributes L6 ON L6.parent = R.parent AND L6.child IS NULL AND L6.instance IS NULL AND L6.owner IS NULL LEFT OUTER JOIN Attributes L7 ON L7.parent IS NULL AND L7.child = R.child AND L7.instance IS NULL AND L7.owner = '' LEFT OUTER JOIN Attributes L8 ON L8.parent IS NULL AND L8.child = R.child AND L8.instance IS NULL AND L8.owner IS NULL WHERE R.parent = (SELECT parent FROM Relationships WHERE rowid = 2) ORDER BY sort ASC LIMIT 100; ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS4 + INSERT OR REPLACE = Not replacing but adding item
klo wrote: Just changed a table of mine to support fts4 so that I can do searches on it Please note that, depending on the implementation, virtual tables are not a full replacement for 'normal' SQLite tables and will not support all features. and noticed that INSERT OR REPLACE is not working anymore as the way it is supposed to. Instead of replacing the item with the primary ID it is instead adding a new entry. CREATE VIRTUAL TABLE item USING fts4 ( IDINTEGER PRIMARY KEY, name TEXT ); INSERT OR REPLACE INTO item (1,'Some Text'); INSERT OR REPLACE INTO item (2,'Some more Text'); INSERT OR REPLACE INTO item (1,'Text that should be replaced to'); FTS tables do not support constraints like PRIMARY KEY. However, FTS tables always have the rowid (often called docid); just do not mention the ID in the table definition and replace it with docid in your queries: CREATE VIRTUAL TABLE item USING fts4 ( nameTEXT ); INSERT OR REPLACE INTO item(docid, name) (1,'Some Text'); INSERT OR REPLACE INTO item(docid, name) (2,'Some more Text'); INSERT OR REPLACE INTO item(docid, name) (1,'Text that should be replaced to'); Regards, Clemens ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Regression: Query takes 10x longer when using version 3.8.x
On Mon, Sep 9, 2013 at 8:27 PM, Jared Albers jaredaalb...@gmail.com wrote: This query takes 10x longer to execute when using versions 3.8.x. Step back to 3.7.17 or older and the query is much faster. I checked the query plans of both versions and they are identical. Please post: (1) Your complete schema (2) The content of the sqlite_stat1 and sqlite_stat3 tables, if those tables exist. (3) The output of the sqlite3_analyze.exe utility run on your complete database. We need the above information to investigate your issue. We especially need (1) and (2). -Jared Query: SELECT R.child, R.instance, R.owner, R.relationship, I.*, NS.rowid AS sort, COALESCE(L1.may_play, L2.may_play, L3.may_play, L4.may_play, L5.may_play, L6.may_play, L7.may_play, L8.may_play, D.may_play) AS may_play, COALESCE(L1.may_pause, L2.may_pause, L3.may_pause, L4.may_pause, L5.may_pause, L6.may_pause, L7.may_pause, L8.may_pause, D.may_pause) AS may_pause, COALESCE(L1.may_seek, L2.may_seek, L3.may_seek, L4.may_seek, L5.may_seek, L6.may_seek, L7.may_seek, L8.may_seek, D.may_seek) AS may_seek, COALESCE(L1.may_next, L2.may_next, L3.may_next, L4.may_next, L5.may_next, L6.may_next, L7.may_next, L8.may_next, D.may_next) AS may_next, COALESCE(L1.may_previous, L2.may_previous, L3.may_previous, L4.may_previous, L5.may_previous, L6.may_previous, L7.may_previous, L8.may_previous, D.may_previous) AS may_previous, COALESCE(L1.may_queue, L2.may_queue, L3.may_queue, L4.may_queue, L5.may_queue, L6.may_queue, L7.may_queue, L8.may_queue, D.may_queue) AS may_queue, COALESCE(L1.is_queued, L2.is_queued, L3.is_queued, L4.is_queued, L5.is_queued, L6.is_queued, L7.is_queued, L8.is_queued, D.is_queued) AS is_queued, COALESCE(L1.may_order, L2.may_order, L3.may_order, L4.may_order, L5.may_order, L6.may_order, L7.may_order, L8.may_order, D.may_order) AS may_order, COALESCE(L1.may_like, L2.may_like, L3.may_like, L4.may_like, L5.may_like, L6.may_like, L7.may_like, L8.may_like, D.may_like) AS may_like, COALESCE(L1.is_liked, L2.is_liked, L3.is_liked, L4.is_liked, L5.is_liked, L6.is_liked, L7.is_liked, L8.is_liked, D.is_liked) AS is_liked, COALESCE(L1.may_rate, L2.may_rate, L3.may_rate, L4.may_rate, L5.may_rate, L6.may_rate, L7.may_rate, L8.may_rate, D.may_rate) AS may_rate, COALESCE(L1.rating, L2.rating, L3.rating, L4.rating, L5.rating, L6.rating, L7.rating, L8.rating, D.rating) AS rating, COALESCE(L1.may_star, L2.may_star, L3.may_star, L4.may_star, L5.may_star, L6.may_star, L7.may_star, L8.may_star, D.may_star) AS may_star, COALESCE(L1.is_starred, L2.is_starred, L3.is_starred, L4.is_starred, L5.is_starred, L6.is_starred, L7.is_starred, L8.is_starred, D.is_starred) AS is_starred, COALESCE(L1.played_date, L2.played_date, L3.played_date, L4.played_date, L5.played_date, L6.played_date, L7.played_date, L8.played_date, D.played_date) AS played_date, COALESCE(L1.added_date, L2.added_date, L3.added_date, L4.added_date, L5.added_date, L6.added_date, L7.added_date, L8.added_date, D.added_date) AS added_date FROM Items I INNER JOIN Defaults D ON I.type = D.type INNER JOIN Relationships R ON R.child = I.uid INNER JOIN (SELECT uid FROM Items ORDER BY name COLLATE NOCASE ASC) NS ON NS.uid = I.uid LEFT OUTER JOIN Attributes L1 ON L1.parent = R.parent AND L1.child = R.child AND L1.instance = R.instance AND L1.owner = '' LEFT OUTER JOIN Attributes L2 ON L2.parent = R.parent AND L2.child = R.child AND L2.instance = R.instance AND L2.owner IS NULL LEFT OUTER JOIN Attributes L3 ON L3.parent = R.parent AND L3.child = R.child AND L3.instance IS NULL AND L3.owner = '' LEFT OUTER JOIN Attributes L4 ON L4.parent = R.parent AND L4.child = R.child AND L4.instance IS NULL AND L4.owner IS NULL LEFT OUTER JOIN Attributes L5 ON L5.parent = R.parent AND L5.child IS NULL AND L5.instance IS NULL AND L5.owner = '' LEFT OUTER JOIN Attributes L6 ON L6.parent = R.parent AND L6.child IS NULL AND L6.instance IS NULL AND L6.owner IS NULL LEFT OUTER JOIN Attributes L7 ON L7.parent IS NULL AND L7.child = R.child AND L7.instance IS NULL AND L7.owner = '' LEFT OUTER JOIN Attributes L8 ON L8.parent IS NULL AND L8.child = R.child AND L8.instance IS NULL AND L8.owner IS NULL WHERE R.parent = (SELECT parent FROM Relationships WHERE rowid = 2) ORDER BY sort ASC LIMIT 100; ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
Op 10 sep 2013, om 11:37 heeft Harmen de Jong - CoachR Group B.V. het volgende geschreven: On 9 sep. 2013, at 22:11, E.Pasma pasm...@concepts.nl wrote: Ha, I did not mean the length of the names but the length of the hash table (NL: klutstabel), That is the number of buckets over which the hash values are distributed. I looked some further in the code and now believe that this is 1024. That is stilll generous for a normal database, But with 10.000 tables you will have about 10 table names in each bucket. So the enigine wiill need to do a lot of searching within each bucket. That involves checking the name of each item in the bucket with the searched name. We've looked into that code. Actually the 1024 is not the number of buckets used for the hash table, but the total size. So we did some tests by changing the 1024 bytes into a value large enough to store the hash table for 10K tables, but unfortunately that gave a performance increasement of 'just' 5 percent (which is at least something, but still does not get us to the point where it is fast enough for our usage and does not prevent an exponential growth in time as the number of tables increases). Still we have no idea why the query preparation time increases exponentially instead of linearly which is what we would expect since it is using a hash table. I included 5 databases that we used for testing in this link: http://wikisend.com/download/570088/test_databases.zip The query performed on these databases is: delete from A where id=1; The time factors it takes on each database are as follows (where the time needed for the 500 tables was taken as starting point to calculate the other factors): 500 tables - 1x 1000 tables - 2.5x 5000 tables - 29x 1 tables - 98x As you can see this is an exponential growth in time it takes to execte the query. So far we're missing the point of why this growth should be exponential. ... We tried some further debugging and it seems that SQLite spends its time in sqlite3FkActions, though we cannot find anything in there that would explain an exponential groth in time. ___ The timings do not look truly exponential to me. It looks more as if there is a graduated charge (NL: staffeltoeslag) on the time per table. For instance: table 1 - 500 - 2 msec/table table 501 - 1.000 - 3 msec/table table 1001 - 5.000 - 5 msec/table table 5001 - 10.000 - 10 msec/table It could be that there is no further extra charge above 10.000 tables. Is it worth trying that or did you do that already? Your last observation, that the time is spent in sqlite3FKActions, means that this is not in query preparation but in ecexution (if I understand the code right). That is understandable because the engine needs to perform a vast amount of internal internal queries to check all the related tables. This leaves room for the hypothesis that the observed increase is a matter of caching. For instance the sqlite default page cache may just fit 2.000 tables but not 10.000. Is. It may be worth to try an increased cache_size. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
On 10 sep. 2013, at 14:43, E.Pasma pasm...@concepts.nl wrote: The timings do not look truly exponential to me. It looks more as if there is a graduated charge (NL: staffeltoeslag) on the time per table. For instance: table 1 - 500 - 2 msec/table table 501 - 1.000 - 3 msec/table table 1001 - 5.000 - 5 msec/table table 5001 - 10.000 - 10 msec/table It could be that there is no further extra charge above 10.000 tables. Is it worth trying that or did you do that already? I'm sorry, but I think the way I wrote our timings were not that clear, since they are definately exponentially. The numbers from my previous post refer to the multiplier between the test cases. Just to make it clear, here follows the same tests, but then expressed in msec of total time per test. 500 tables - 10 msec in total 1000 tables - 25 msec in total 5000 tables - 298 msec in total 1 tables - 985 msec in total Your last observation, that the time is spent in sqlite3FKActions, means that this is not in query preparation but in ecexution (if I understand the code right). That is understandable because the engine needs to perform a vast amount of internal internal queries to check all the related tables. sqlite3FKActions is part of the call stack create by sqlite3_prepare_v2, so I think it actually is part of the prepare. The execution time of sqlite3_step is actually negligible ( 1 msec) in our test cases because nothing is actually deleted because all tables are empty in the test case. This leaves room for the hypothesis that the observed increase is a matter of caching. For instance the sqlite default page cache may just fit 2.000 tables but not 10.000. Is. It may be worth to try an increased cache_size. We already have a very high cache_size set in our test cases. We have set it to 819200, so I think that can be ruled out too. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
On 9/10/2013 5:37 AM, Harmen de Jong - CoachR Group B.V. wrote: The time factors it takes on each database are as follows (where the time needed for the 500 tables was taken as starting point to calculate the other factors): 500 tables - 1x 1000 tables - 2.5x 5000 tables - 29x 1 tables - 98x As you can see this is an exponential growth in time it takes to execte the query. Not exponential - polynomial. Between 500 and 1 the size of input increases x20, so the time increase of x400 would be consistent with a quadratic algorithm. Your observed measurements are even better than that. Exponential means adding each one new table would cause the running time to be multiplied by some factor. You certainly don't have it *that* bad. -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
On 10 sep. 2013, at 15:41, Igor Tandetnik i...@tandetnik.org wrote: Not exponential - polynomial. Between 500 and 1 the size of input increases x20, so the time increase of x400 would be consistent with a quadratic algorithm. Your observed measurements are even better than that. You're right, thanks for correcting me. However, our basic issue stays the same. We would expect an linear increasement while the increasement is actually polynomial and still can't figure out why the increasement in time could not be linear in this case. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
Op 10 sep 2013, om 11:37 heeft Harmen de Jong - CoachR Group B.V. het volgende geschreven: I included 5 databases that we used for testing in this link: http://wikisend.com/download/570088/test_databases.zip The query performed on these databases is: delete from A where id=1; I could not resist trying this but the tables in the databases appear all empty and I get neglectible timings. Do I need to run an insert script first? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
On 10 sep. 2013, at 16:16, E.Pasma pasm...@concepts.nl wrote: Op 10 sep 2013, om 11:37 heeft Harmen de Jong - CoachR Group B.V. het volgende geschreven: I included 5 databases that we used for testing in this link: http://wikisend.com/download/570088/test_databases.zip The query performed on these databases is: delete from A where id=1; I could not resist trying this but the tables in the databases appear all empty and I get neglectible timings. Do I need to run an insert script first? No, it is all about preparing, so there is no need to insert data. When we perform the query delete from A where id=1; on the databases from the zip file, we get the following timings: 500 tables - 10 msec in total 1000 tables - 25 msec in total 5000 tables - 298 msec in total 1 tables - 985 msec in total What we would indeed expect is to have neglectible timings. Perhaps our interpretation of neglectible differs? Could you please tell me what your timing is on the 1 tables database? We used the .timer ON command in the SQLite command shell utility as well as timing from our own application that has SQLite integrated. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
Sent from my iPad On 10 sep. 2013, at 17:04, Keith Medcalf kmedc...@dessus.com wrote: No, it is all about preparing, so there is no need to insert data. When we perform the query delete from A where id=1; on the databases from the zip file, we get the following timings: 500 tables - 10 msec in total 1000 tables - 25 msec in total 5000 tables - 298 msec in total 1 tables - 985 msec in total What we would indeed expect is to have neglectible timings. Perhaps our interpretation of neglectible differs? Could you please tell me what your timing is on the 1 tables database? We used the .timer ON command in the SQLite command shell utility as well as timing from our own application that has SQLite integrated. The vdbe program is 20 times larger for 1 tables versus 500 tables (400K instructions for the later -vs- 20K for the former), and this appears to be where the time is spent -- generating all the foreign key checking and trigger code. Perhaps the memory allocator has issues when building such a huge vdbe set (I haven't looked at the code, but is it perchance doing a lot of realloc's as more triggers are added?) That is something we suspected too. We already made some tests where we timed the time needed for all memory allocations executed in the entire operation. In total for the 1 tables test this was somewhere around 25 msec. Since this is just a little overhead and the instructions as you point out have a linear increasement this still does not explain the polynomial increasement in preperation time. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
het volgende geschreven: I included 5 databases that we used for testing in this link: http://wikisend.com/download/570088/test_databases.zip The query performed on these databases is: delete from A where id=1; I could not resist trying this but the tables in the databases appear all empty and I get neglectible timings. Do I need to run an insert script first? No, it is all about preparing, so there is no need to insert data. When we perform the query delete from A where id=1; on the databases from the zip file, we get the following timings: 500 tables - 10 msec in total 1000 tables - 25 msec in total 5000 tables - 298 msec in total 1 tables - 985 msec in total What we would indeed expect is to have neglectible timings. Perhaps our interpretation of neglectible differs? Could you please tell me what your timing is on the 1 tables database? We used the .timer ON command in the SQLite command shell utility as well as timing from our own application that has SQLite integrated. The vdbe program is 20 times larger for 1 tables versus 500 tables (400K instructions for the later -vs- 20K for the former), and this appears to be where the time is spent -- generating all the foreign key checking and trigger code. Perhaps the memory allocator has issues when building such a huge vdbe set (I haven't looked at the code, but is it perchance doing a lot of realloc's as more triggers are added?) ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
Op 10 sep 2013, om 16:36 heeft Harmen de Jong - CoachR Group B.V. het volgende geschreven: On 10 sep. 2013, at 16:16, E.Pasma pasm...@concepts.nl wrote: Op 10 sep 2013, om 11:37 heeft Harmen de Jong - CoachR Group B.V. het volgende geschreven: I included 5 databases that we used for testing in this link: http://wikisend.com/download/570088/test_databases.zip The query performed on these databases is: delete from A where id=1; I could not resist trying this but the tables in the databases appear all empty and I get neglectible timings. Do I need to run an insert script first? No, it is all about preparing, so there is no need to insert data. When we perform the query delete from A where id=1; on the databases from the zip file, we get the following timings: 500 tables - 10 msec in total 1000 tables - 25 msec in total 5000 tables - 298 msec in total 1 tables - 985 msec in total What we would indeed expect is to have neglectible timings. Perhaps our interpretation of neglectible differs? Could you please tell me what your timing is on the 1 tables database? We used the .timer ON command in the SQLite command shell utility as well as timing from our own application that has SQLite integrated. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users Clear, and affter downgrading to sqlite version 3.7.7.1 I got exactly your timings. But with SQLite version 3.7.17 or 3.8.0.1 the timings are neglectible. Hope this is good news. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
On 10 sep. 2013, at 16:44, E.Pasma pasm...@concepts.nl wrote: Op 10 sep 2013, om 16:36 heeft Harmen de Jong - CoachR Group B.V. het volgende geschreven: On 10 sep. 2013, at 16:16, E.Pasma pasm...@concepts.nl wrote: Op 10 sep 2013, om 11:37 heeft Harmen de Jong - CoachR Group B.V. het volgende geschreven: I included 5 databases that we used for testing in this link: http://wikisend.com/download/570088/test_databases.zip The query performed on these databases is: delete from A where id=1; I could not resist trying this but the tables in the databases appear all empty and I get neglectible timings. Do I need to run an insert script first? No, it is all about preparing, so there is no need to insert data. When we perform the query delete from A where id=1; on the databases from the zip file, we get the following timings: 500 tables - 10 msec in total 1000 tables - 25 msec in total 5000 tables - 298 msec in total 1 tables - 985 msec in total What we would indeed expect is to have neglectible timings. Perhaps our interpretation of neglectible differs? Could you please tell me what your timing is on the 1 tables database? We used the .timer ON command in the SQLite command shell utility as well as timing from our own application that has SQLite integrated. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users Clear, and affter downgrading to sqlite version 3.7.7.1 I got exactly your timings. But with SQLite version 3.7.17 or 3.8.0.1 the timings are neglectible. Hope this is good news. Unfortunately not; forgot to mention that foreign keys have to be turned on since these are off by default in the SQLite command shell utility and we found the time consuming operations are located somewhere in sqlite3FKActions. Could you please test again with option PRAGMA foreign_keys=1;? Then time increasement is no longer linear. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
On 10 Sep 2013, at 4:15pm, Harmen de Jong - CoachR Group B.V. har...@coachr.com wrote: That is something we suspected too. We already made some tests where we timed the time needed for all memory allocations executed in the entire operation. In total for the 1 tables test this was somewhere around 25 msec. Since this is just a little overhead and the instructions as you point out have a linear increasement this still does not explain the polynomial increasement in preperation time. Polynomial time would be explained if the software is scanning every item in a list of lists for each table mentioned. So a guess about what's happening is that for each DELETE FROM instruction the software has to look up the entry being deleted, and to check that none of the items in any of the other tables refers to it. This makes sense given how FOREIGN KEYs are implemented in SQLite: there is no master 'check-for-changes' table, but each table which has the foreign key has to be scanned. It is irrelevant that there are no entries in any of these tables: the table (actually, the index) still has to be found so that SQLite can figure out that there are no entries in it. Because SQLite hashes table names, rather than scanning a simple list, it is scanning a list of lists. Hence, polynomial time. I see that E. Pasma has posted that the long scanning time has been fixed in a later release. Good. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
Op 10 sep 2013, om 19:48 heeft Simon Slavin het volgende geschreven: On 10 Sep 2013, at 4:15pm, Harmen de Jong - CoachR Group B.V. har...@coachr.com wrote: That is something we suspected too. We already made some tests where we timed the time needed for all memory allocations executed in the entire operation. In total for the 1 tables test this was somewhere around 25 msec. Since this is just a little overhead and the instructions as you point out have a linear increasement this still does not explain the polynomial increasement in preperation time. Polynomial time would be explained if the software is scanning every item in a list of lists for each table mentioned. So a guess about what's happening is that for each DELETE FROM instruction the software has to look up the entry being deleted, and to check that none of the items in any of the other tables refers to it. This makes sense given how FOREIGN KEYs are implemented in SQLite: there is no master 'check-for-changes' table, but each table which has the foreign key has to be scanned. It is irrelevant that there are no entries in any of these tables: the table (actually, the index) still has to be found so that SQLite can figure out that there are no entries in it. Because SQLite hashes table names, rather than scanning a simple list, it is scanning a list of lists. Hence, polynomial time. I see that E. Pasma has posted that the long scanning time has been fixed in a later release. Good. Simon My suppositions that the time was spent in the execute step and that this has been fixed in the new release appeared both wrong. Thus I may be wrong again but I think to have an explanation now. It is as Simon guesses that a list of lists is being scanned. It is however not the contents of tables being scanned, but the list of foreign key constraints as is loaded in memory when a database is opened. When preparing a DELETE statement, the global list of FK's is scanned to see if the current table is referred. This is the outer loop. If a referring FK is found and if this has an ON DELETE clause, comes an inner loop on the same global list to see if the referrer is referred to itself. In the case that every table has such a constraint, as is the case here, the time becomes n * n. If I'm right this is hard to fix and inherent to the representation of the database schema in memory. This also means that if you leave out the cascading delete from the constraints the time becomes linear. Actually that is what I observed before coming with above explanation. This was easy to check by extractingg the schemas from the test databases and removing ON .. CASCADE. Thanks for making these database available. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Hints for the query planner
There is a survey question at the bottom of this message. But first some context... Over on the sqlite-dev mailing list, a debate has been going on about the best way to provide some useful hints to the query planner. The query under discussion looks like this: SELECT DISTINCT aname FROM album, composer, track WHERE cname LIKE '%bach%' AND composer.cid=track.cid AND album.aid=track.aid; Assuming that the schema has appropriate indices and ANALYZE has been run, SQLite does a good job of selecting an efficient query plan for the above. But the query planner lacks a key piece of information that could help it to do a better job. In particular, the query planner does not know how often the subexpression cname LIKE '%bach%' will be true. But, it turns out, the best query plan depends critically on this one fact. By default, the query planner (in SQLite 3.8.0) assumes that a subexpression that cannot use an index will always be true. Probably this will be tweaked in 3.8.1 so that such subexpressions will be assumed to usually, but not always, be true. Either way, it would be useful to be able to convey to the query planner the other extreme - that a subexpression is usually not true. (Pedantic detail: not true is not the same as false in SQL because NULL is neither true nor false.) There is currently code in a branch that provides a hinting mechanism using a magic unlikely() function. Subexpressions contained within unlikely() are assumed to usually not be true. Other than this hint to the query planner, the unlikely() function is a complete no-op and optimized out of the VDBE code so that it does not consume any CPU cycles. The only purpose of the unlikely() function is to let the query planner know that the subexpression contained in its argument is not commonly true. So, if an application developer knows that the string bach seldom occurs in composer names, then she might rewrite the query like this: SELECT DISTINCT aname FROM album, composer, track WHERE unlikely(cname LIKE '%bach%') AND composer.cid=track.cid AND album.aid=track.aid; The query planner might use this likelihood hint to choose a different query plan that works better when the subexpression is commonly false. Or it might decide that the original query plan was good enough and ignore the hint. The query planner gets to make that decision. The application developer is not telling the query planner what to do. The application developer has merely provided a small amount of meta-information about the likelihood of the subexpression being true, meta-information which the query planner may or may not use. Note that the subexpression does not have to be a LIKE operator. PostgreSQL, to name one example, estimates how often a LIKE operator will be true based on the pattern on its right-hand side, and adjust query plans accordingly, and some have argued for this sort of thing in SQLite. But I want a more general solution. Suppose the subexpression involves one or more calls to application-defined functions about which the query planner cannot possible know anything. A general mechanism for letting the query planner know that subexpressions are commonly not true is what is desired - not a technique for making LIKE operators more efficient. SURVEY QUESTION: The question for today is what to call this magic hint function: (1) unlikely(EXPR) (2) selective(EXPR) (3) seldom(EXPR) (4) seldom_true(EXPR) (5) usually_not_true(EXPR) Please feel free to suggest other names if you think of any. ADDITIONAL INFORMATION: The current implementation allows a second argument which must be a floating point constant between 0.0 and 1.0, inclusive. The second argument is an estimate of the probability that the expression in the first argument will be true. The default is 0.05. Names like unlikely or seldom work well when this probability is small, but if the second argument is close to 1.0, then those names seem backwards. I don't know if this matters. The optional second argument is not guaranteed to make it into an actually release. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] multiple connection to the same DB
On 10 Sep 2013, at 4:37pm, olivier Ménard men37...@yahoo.fr wrote: I've tried with my colleagues to write data to the same SQLite DB-file from differents accounts. When someone added a new line in the DB, sometimes older existing data were lost as if they had never existed and sometimes not. Why ? Multiple access are maybe allowed only for reading ? What operating system are you using ? What software or library calls are you using ? The SQLite shell tool ? The C API ? How are the computers you are using accessing the same file ? Are you using access across a network ? If so, what network protocol are you using ? Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
On 10 sep. 2013, at 21:24, E.Pasma pasm...@concepts.nl wrote: Op 10 sep 2013, om 19:48 heeft Simon Slavin het volgende geschreven: On 10 Sep 2013, at 4:15pm, Harmen de Jong - CoachR Group B.V. har...@coachr.com wrote: That is something we suspected too. We already made some tests where we timed the time needed for all memory allocations executed in the entire operation. In total for the 1 tables test this was somewhere around 25 msec. Since this is just a little overhead and the instructions as you point out have a linear increasement this still does not explain the polynomial increasement in preperation time. Polynomial time would be explained if the software is scanning every item in a list of lists for each table mentioned. So a guess about what's happening is that for each DELETE FROM instruction the software has to look up the entry being deleted, and to check that none of the items in any of the other tables refers to it. This makes sense given how FOREIGN KEYs are implemented in SQLite: there is no master 'check-for-changes' table, but each table which has the foreign key has to be scanned. It is irrelevant that there are no entries in any of these tables: the table (actually, the index) still has to be found so that SQLite can figure out that there are no entries in it. Because SQLite hashes table names, rather than scanning a simple list, it is scanning a list of lists. Hence, polynomial time. I see that E. Pasma has posted that the long scanning time has been fixed in a later release. Good. Simon My suppositions that the time was spent in the execute step and that this has been fixed in the new release appeared both wrong. Thus I may be wrong again but I think to have an explanation now. It is as Simon guesses that a list of lists is being scanned. It is however not the contents of tables being scanned, but the list of foreign key constraints as is loaded in memory when a database is opened. When preparing a DELETE statement, the global list of FK's is scanned to see if the current table is referred. This is the outer loop. If a referring FK is found and if this has an ON DELETE clause, comes an inner loop on the same global list to see if the referrer is referred to itself. In the case that every table has such a constraint, as is the case here, the time becomes n * n. If I'm right this is hard to fix and inherent to the representation of the database schema in memory. This also means that if you leave out the cascading delete from the constraints the time becomes linear. Actually that is what I observed before coming with above explanation. This was easy to check by extractingg the schemas from the test databases and removing ON .. CASCADE. Thanks for making these database available. What you are supposing makes sense. Actually we already have been searching for two loops that are related to each other since that would explain the polynomial, however thus far we cannot find them (hence we are not yet that familiar with the source code of SQLite). Could you point us to the code where these inner and outer loops are done? The next challenge would of course be to find a solution. Actually we already found a similar optimization last week that had quite some impact with these many tables that we provided to the developers and have been added to the fork right now -;). Especially since these are quite exceptional use cases, there might be the chance to do some optimizations that seem useless for the more normal usage of SQLite. No thanks for making these databases available. We would like to thank every one of you that helps pointing us into the right direction in this thread. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
Hi Richard, What about probability or likelyhood? This works in both the case where the likelyhood is great as well as when it is low. From the list you provided, I would pick unlikely. Kind regards, Philip Bennefall - Original Message - From: Richard Hipp d...@sqlite.org To: General Discussion of SQLite Database sqlite-users@sqlite.org Sent: Tuesday, September 10, 2013 9:26 PM Subject: [sqlite] Hints for the query planner There is a survey question at the bottom of this message. But first some context... Over on the sqlite-dev mailing list, a debate has been going on about the best way to provide some useful hints to the query planner. The query under discussion looks like this: SELECT DISTINCT aname FROM album, composer, track WHERE cname LIKE '%bach%' AND composer.cid=track.cid AND album.aid=track.aid; Assuming that the schema has appropriate indices and ANALYZE has been run, SQLite does a good job of selecting an efficient query plan for the above. But the query planner lacks a key piece of information that could help it to do a better job. In particular, the query planner does not know how often the subexpression cname LIKE '%bach%' will be true. But, it turns out, the best query plan depends critically on this one fact. By default, the query planner (in SQLite 3.8.0) assumes that a subexpression that cannot use an index will always be true. Probably this will be tweaked in 3.8.1 so that such subexpressions will be assumed to usually, but not always, be true. Either way, it would be useful to be able to convey to the query planner the other extreme - that a subexpression is usually not true. (Pedantic detail: not true is not the same as false in SQL because NULL is neither true nor false.) There is currently code in a branch that provides a hinting mechanism using a magic unlikely() function. Subexpressions contained within unlikely() are assumed to usually not be true. Other than this hint to the query planner, the unlikely() function is a complete no-op and optimized out of the VDBE code so that it does not consume any CPU cycles. The only purpose of the unlikely() function is to let the query planner know that the subexpression contained in its argument is not commonly true. So, if an application developer knows that the string bach seldom occurs in composer names, then she might rewrite the query like this: SELECT DISTINCT aname FROM album, composer, track WHERE unlikely(cname LIKE '%bach%') AND composer.cid=track.cid AND album.aid=track.aid; The query planner might use this likelihood hint to choose a different query plan that works better when the subexpression is commonly false. Or it might decide that the original query plan was good enough and ignore the hint. The query planner gets to make that decision. The application developer is not telling the query planner what to do. The application developer has merely provided a small amount of meta-information about the likelihood of the subexpression being true, meta-information which the query planner may or may not use. Note that the subexpression does not have to be a LIKE operator. PostgreSQL, to name one example, estimates how often a LIKE operator will be true based on the pattern on its right-hand side, and adjust query plans accordingly, and some have argued for this sort of thing in SQLite. But I want a more general solution. Suppose the subexpression involves one or more calls to application-defined functions about which the query planner cannot possible know anything. A general mechanism for letting the query planner know that subexpressions are commonly not true is what is desired - not a technique for making LIKE operators more efficient. SURVEY QUESTION: The question for today is what to call this magic hint function: (1) unlikely(EXPR) (2) selective(EXPR) (3) seldom(EXPR) (4) seldom_true(EXPR) (5) usually_not_true(EXPR) Please feel free to suggest other names if you think of any. ADDITIONAL INFORMATION: The current implementation allows a second argument which must be a floating point constant between 0.0 and 1.0, inclusive. The second argument is an estimate of the probability that the expression in the first argument will be true. The default is 0.05. Names like unlikely or seldom work well when this probability is small, but if the second argument is close to 1.0, then those names seem backwards. I don't know if this matters. The optional second argument is not guaranteed to make it into an actually release. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
In Bayesian statistics there is a term prior, prior probability. From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Richard Hipp [d...@sqlite.org] Sent: Tuesday, September 10, 2013 3:26 PM To: General Discussion of SQLite Database Subject: [sqlite] Hints for the query planner There is a survey question at the bottom of this message. But first some context... Over on the sqlite-dev mailing list, a debate has been going on about the best way to provide some useful hints to the query planner. The query under discussion looks like this: SELECT DISTINCT aname FROM album, composer, track WHERE cname LIKE '%bach%' AND composer.cid=track.cid AND album.aid=track.aid; Assuming that the schema has appropriate indices and ANALYZE has been run, SQLite does a good job of selecting an efficient query plan for the above. But the query planner lacks a key piece of information that could help it to do a better job. In particular, the query planner does not know how often the subexpression cname LIKE '%bach%' will be true. But, it turns out, the best query plan depends critically on this one fact. By default, the query planner (in SQLite 3.8.0) assumes that a subexpression that cannot use an index will always be true. Probably this will be tweaked in 3.8.1 so that such subexpressions will be assumed to usually, but not always, be true. Either way, it would be useful to be able to convey to the query planner the other extreme - that a subexpression is usually not true. (Pedantic detail: not true is not the same as false in SQL because NULL is neither true nor false.) There is currently code in a branch that provides a hinting mechanism using a magic unlikely() function. Subexpressions contained within unlikely() are assumed to usually not be true. Other than this hint to the query planner, the unlikely() function is a complete no-op and optimized out of the VDBE code so that it does not consume any CPU cycles. The only purpose of the unlikely() function is to let the query planner know that the subexpression contained in its argument is not commonly true. So, if an application developer knows that the string bach seldom occurs in composer names, then she might rewrite the query like this: SELECT DISTINCT aname FROM album, composer, track WHERE unlikely(cname LIKE '%bach%') AND composer.cid=track.cid AND album.aid=track.aid; The query planner might use this likelihood hint to choose a different query plan that works better when the subexpression is commonly false. Or it might decide that the original query plan was good enough and ignore the hint. The query planner gets to make that decision. The application developer is not telling the query planner what to do. The application developer has merely provided a small amount of meta-information about the likelihood of the subexpression being true, meta-information which the query planner may or may not use. Note that the subexpression does not have to be a LIKE operator. PostgreSQL, to name one example, estimates how often a LIKE operator will be true based on the pattern on its right-hand side, and adjust query plans accordingly, and some have argued for this sort of thing in SQLite. But I want a more general solution. Suppose the subexpression involves one or more calls to application-defined functions about which the query planner cannot possible know anything. A general mechanism for letting the query planner know that subexpressions are commonly not true is what is desired - not a technique for making LIKE operators more efficient. SURVEY QUESTION: The question for today is what to call this magic hint function: (1) unlikely(EXPR) (2) selective(EXPR) (3) seldom(EXPR) (4) seldom_true(EXPR) (5) usually_not_true(EXPR) Please feel free to suggest other names if you think of any. ADDITIONAL INFORMATION: The current implementation allows a second argument which must be a floating point constant between 0.0 and 1.0, inclusive. The second argument is an estimate of the probability that the expression in the first argument will be true. The default is 0.05. Names like unlikely or seldom work well when this probability is small, but if the second argument is close to 1.0, then those names seem backwards. I don't know if this matters. The optional second argument is not guaranteed to make it into an actually release. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
(1) unlikely(EXPR) (2) selective(EXPR) (3) seldom(EXPR) (4) seldom_true(EXPR) (5) usually_not_true(EXPR) I quite like (2) selective. I think it's reasonably descriptive on its own, and also works well with the optional second argument. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] sqlite-users Digest, Vol 69, Issue 10
-Original Message- Date: Tue, 10 Sep 2013 15:15:35 + From: Harmen de Jong - CoachR Group B.V. har...@coachr.com To: General Discussion of SQLite Database sqlite-users@sqlite.org Subject: Re: [sqlite] Query preperation time does not scale linearly withgrowth of no. of tables Message-ID: c06aa165-6904-43f6-8d33-6a0f2c99f...@coachr.com Content-Type: text/plain; charset=us-ascii [snip] That [memory being re-allocated] is something we suspected too. We already made some tests where we timed the time needed for all memory allocations executed in the entire operation. In total for the 1 tables test this was somewhere around 25 msec. Since this is just a little overhead and the instructions as you point out have a linear increasement this still does not explain the polynomial increasement in preperation time. [J. Merrill's comment below] When there's a re-allocate, it's not just the time to allocate the new memory -- there's the time to copy the data from the original not-big-enough location to the new big-enough-for-now location. The amount of data moved grows rapidly in that case with each re-allocation, and the pattern of extra-time seems similar to the data-movement-growth when there are repeated allocate/move steps. How many re-allocations happen? Do you know by how much the allocation size increases each time a re-allocation is needed? (A common algorithm is to double the size each time.) You might try changing the source code so that it triples or quadruples the size each time, as memory does not seem to be an issue. Another point -- I did not see you comment on the possibility that you could remove the FK specs from the 10,000 tables. Do you really have the situation that deletions from the main table would orphan rows in an unknown number of other tables, and the existence of those orphan rows would in fact cause application failures? (If the rows were not deleted from the other tables, using a normal join to the main table -- rather than a left join -- in the queries would make those rows disappear.) Alternatively you could create a table to hold the ids of the rows deleted from the main table, and build a separate process -- to be run as often as desired -- to do delete from onechildtable where id in (select id from deletedids) (That SQL could also be of the form delete from onechildtable where id in (5,6,9,12,999) should you determine that there aren't very many deleted ids.) You would do that for each of the other tables -- this is exactly the work that SQLite is preparing to do when you prepare a delete from maintable statement. When done deleting from all the tables, you could remove all the rows from the deletedids table. (You'd have to do something to block deletions from the main table -- thus blocking insertion into the deletedids table -- during the process, and of course you would start the process with do nothing if deletedids has no rows.) This would just have the effect of batching together multiple main-table deletes before going through every child table for the purpose of deleting orphans. I would not suggest the delete in batches idea except that it's clear that you have a major collection of infrastructure set up to handle this unusual task. I don't think that what I'm suggesting would add a huge new component to that infrastructure. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
I think I prefer something along the lines of unlikely or likely. The problem with a term like selective (at least in my brain) is that it doesn't imply (for the single argument version) in what way it is being selective. If a negative form of the magic function is used (unlikely, seldom, etc) I would suggest considering inverting the optional second parameter. In other words, 0.05 would become 0.95. In my opinion, that reads better: unlikely(COLUMN LIKE '%pattern%', 0.95) reads it is unlikely the expression will be true 95% of the time. In like fashion, a positive form of the magic function would keep the current meaning of the optional second parameter. Just ideas, I'm not married to them. Thanks for the discussion. SDR On Tue, Sep 10, 2013 at 4:17 PM, Eric Minbiole eminbi...@gmail.com wrote: (1) unlikely(EXPR) (2) selective(EXPR) (3) seldom(EXPR) (4) seldom_true(EXPR) (5) usually_not_true(EXPR) I quite like (2) selective. I think it's reasonably descriptive on its own, and also works well with the optional second argument. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
On 10 Sep 2013 at 20:26, Richard Hipp d...@sqlite.org wrote: SURVEY QUESTION: The question for today is what to call this magic hint function: (1) unlikely(EXPR) (2) selective(EXPR) (3) seldom(EXPR) (4) seldom_true(EXPR) (5) usually_not_true(EXPR) Please feel free to suggest other names if you think of any. likelihood (EXPR, value) -- Cheers -- Tim ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
Seconded. -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Simon Slavin Sent: Tuesday, September 10, 2013 3:46 PM To: General Discussion of SQLite Database Subject: Re: [sqlite] Hints for the query planner On 10 Sep 2013, at 10:48pm, Tim Streater t...@clothears.org.uk wrote: likelihood (EXPR, value) Best I've seen so far. I know it makes no sense without the second parameter but I think if you're going to make use of a special non-standard optimisation system you can be expected to know exactly what it means. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
Plus an overload: unlikely(expr) == likelihood (expr, 0.05) (sent from a mobile device - please excuse brevity, typos, and top-posting) - stephan beal http://wanderinghorse.net On Sep 10, 2013 11:49 PM, Tim Streater t...@clothears.org.uk wrote: On 10 Sep 2013 at 20:26, Richard Hipp d...@sqlite.org wrote: SURVEY QUESTION: The question for today is what to call this magic hint function: (1) unlikely(EXPR) (2) selective(EXPR) (3) seldom(EXPR) (4) seldom_true(EXPR) (5) usually_not_true(EXPR) Please feel free to suggest other names if you think of any. likelihood (EXPR, value) -- Cheers -- Tim ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
Or... unlikely(expr (, prob=0.05)) == likelihood (...) (sent from a mobile device - please excuse brevity, typos, and top-posting) - stephan beal http://wanderinghorse.net On Sep 11, 2013 12:51 AM, Stephan Beal sgb...@googlemail.com wrote: Plus an overload: unlikely(expr) == likelihood (expr, 0.05) (sent from a mobile device - please excuse brevity, typos, and top-posting) - stephan beal http://wanderinghorse.net On Sep 10, 2013 11:49 PM, Tim Streater t...@clothears.org.uk wrote: On 10 Sep 2013 at 20:26, Richard Hipp d...@sqlite.org wrote: SURVEY QUESTION: The question for today is what to call this magic hint function: (1) unlikely(EXPR) (2) selective(EXPR) (3) seldom(EXPR) (4) seldom_true(EXPR) (5) usually_not_true(EXPR) Please feel free to suggest other names if you think of any. likelihood (EXPR, value) -- Cheers -- Tim ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
On 10 Sep 2013, at 10:48pm, Tim Streater t...@clothears.org.uk wrote: likelihood (EXPR, value) Best I've seen so far. I know it makes no sense without the second parameter but I think if you're going to make use of a special non-standard optimisation system you can be expected to know exactly what it means. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
How about three, where two are just overloaded, or rather syntactic sugar, for the main declaration: unlikely(expr) likely(expr) likelihood(expr, rate) Where unlikely(expr) - likelihood(expr, 0.05) likely(expr) - likelihood(expr, 0.95) I would presume that the rate is the expected selection rate of expr over the entire table, so the correctly computed value of the rate for a table A of a given expr, for example: select * from A where col LIKE '%bach%' would be select * from A where likelihood(col like '%bach%', (select count(*) from a where col like '%bach%')/(select count(*) from a)) and is not related to the application of other conditionals not included in the expr itself. So, if there are usable statistics available, should the likelihood given be ignored; or, should likelihood completely override the statistical input to the optimizer? I vote that the hint should only be used if no other statistical data is available. Plus an overload: unlikely(expr) == likelihood (expr, 0.05) (sent from a mobile device - please excuse brevity, typos, and top- posting) - stephan beal http://wanderinghorse.net On Sep 10, 2013 11:49 PM, Tim Streater t...@clothears.org.uk wrote: On 10 Sep 2013 at 20:26, Richard Hipp d...@sqlite.org wrote: SURVEY QUESTION: The question for today is what to call this magic hint function: (1) unlikely(EXPR) (2) selective(EXPR) (3) seldom(EXPR) (4) seldom_true(EXPR) (5) usually_not_true(EXPR) Please feel free to suggest other names if you think of any. likelihood (EXPR, value) -- Cheers -- Tim ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] multiple connection to the same DB
Hi I've tried with my colleagues to write data to the same SQLite DB-file from differents accounts. When someone added a new line in the DB, sometimes older existing data were lost as if they had never existed and sometimes not. Why ? Multiple access are maybe allowed only for reading ? thanks to answer and sorry for my bad english. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
Hello Dr Hipp, First of all, I apologize for this rather off-topic suggestion knowing that you may have already implemented the syntax you describe, but there is an IMHO good reason for it, read ahead. On Tue, Sep 10, 2013 at 10:26 PM, Richard Hipp d...@sqlite.org wrote: SELECT DISTINCT aname FROM album, composer, track WHERE unlikely(cname LIKE '%bach%') AND composer.cid=track.cid AND album.aid=track.aid; I would prefer that the planner hint is not interleaved inside normal SQL syntax. Instead I propose a special comment-like syntax instead, as Oracle's /*+ */ or --+, but replacing + with another symbol, e.g. : SELECT DISTINCT aname FROM album, composer, track WHERE cname LIKE '%bach%' /* unlikely */ AND composer.cid=track.cid AND album.aid=track.aid; or: SELECT DISTINCT aname FROM album, composer, track WHERE cname LIKE '%bach%' -- unlikely AND composer.cid=track.cid AND album.aid=track.aid; If the hint is to be applied to an expression that combines many column predicates with AND (I am not sure if this actually makes sense): SELECT DISTINCT aname FROM album, composer, track WHERE unlikely(cname LIKE '%bach%' AND composer.cid=track.cid) AND album.aid=track.aid; then a -normally redundant- pair of parentheses can be used to specify the scope of the hint: SELECT DISTINCT aname FROM album, composer, track WHERE (cname LIKE '%bach%' AND composer.cid=track.cid) /* unlikely */ AND album.aid=track.aid; The SQLite SQL parser will have to look for exactly /* or -- without whitespace between the characters, so it can easily tell a planner hint from a plain comment with a single character read-ahead. Also, the fact that hints are transparent to the SQL syntax will allow the query parser to handle them in an orthogonal way (e.g. a small separate parser for hints) to normal SQL parsing, IMO making handling of any future hints easier to add. The main reason for this proposal is that the planner hint will be ignored by default by other SQL parsers without the need to modify them, which in some cases may not even be possible. For instance it will allow someone to write SQL that is valid in databases of alternative DB vendors and still provide planner hints when the DB vendor is SQLite (that is why I replaced + with , to avoid conflicts with a hypothetical alternate Oracle query optimizer) without having to modify the SQL in the application code to remove the hints. This is a property of the Oracle optimizer hint syntax I have always appreciated when writing SQL that is to be executed in databases of alternative DB vendors with the same schema, for applications where the user chooses the database vendor from a list of supported ones. For more on Oracle optimizer hints see http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm. As for the name of the hint itself I would propose: -- PROBABLY(True) -- the current default -- PROBABLY(False) -- PROBABLY(False, 0.7) -- PROBABLY(False, 0.6, 0.3) --re pedantic detail, the second value if for True, the remainder for NULL. Kind regards, Constantine Yannakopoulos ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
As I was reading this, I said to myself, what they really need is a confidence value. Then I read the end and, there it was! A confidence value. Ok.. not exactly confidence, but I think you get my meaning. It seems to me that you're allowing the query writer to substitute personal knowledge of the DB for knowledge based on ANALYZE or other statistical indexes. So, I'm all in favor of allowing that second argument. If so, I would suggest confidence(exp, confidence_value). Or, perhaps, likelihood(..) Likely is fine, or you might even establish several names with built-in defaults... e.g. likely(xxx) might be confidence(xxx, .75) and unlikely(xxx) might be confidence(xxx, .25) You've got rarely, mostly, and a whole suite of other synonyms. This email and any attachments are only for use by the intended recipient(s) and may contain legally privileged, confidential, proprietary or otherwise private information. Any unauthorized use, reproduction, dissemination, distribution or other disclosure of the contents of this e-mail or its attachments is strictly prohibited. If you have received this email in error, please notify the sender immediately and delete the original. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
On Sep 10, 2013, at 12:26 PM, Richard Hipp d...@sqlite.org wrote: SURVEY QUESTION: The question for today is what to call this magic hint function: (1) unlikely(EXPR) (2) selective(EXPR) (3) seldom(EXPR) (4) seldom_true(EXPR) (5) usually_not_true(EXPR) Please feel free to suggest other names if you think of any. I like the optional second parameter. Apart from the obvious change to likelihood, which somebody else suggested, but which is less self documenting in the one-argument case, I'd suggest that you actually add two words: likely and unlikely, and whose second parameters are the inverse of each other, crossing at 0.5. So an expression could then, in the simplest case, be labeled LIKELY or UNLIKELY, with second parameter defaulting to 1.0 (or to whatever value you feel is appropriate), but allowing the user to specify a lesser likelihood by lowering that value. LIKELY(expr, 0) would mean the same as UNLIKELY(expr) and UNLIKELY(expr, 1), and UNLIKELY(expr, 0) would be the same as LIKELY(expr, 1), etc. James ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users