RE: [sqlite] CROSS keyword disables certain join optimizations
--- "Mrs. Brisby" <[EMAIL PROTECTED]> wrote: > That all said, it seems like this problem is already solved- SQLite does > the right thing after ANALYZE is called. Perhaps it wouldn't be too > difficult to update the statistics that ANALYZE collects in after > COMMIT, or perhaps after a COMMIT that runs longer than a particular > amount of time. > > This would obsolete the need for a control, and as I've said, controls > are bad. I do agree that controls generally suck. But we live in an imperfect world, and sometimes we know better than the tools we are using and have to intervene and steer them in a better direction. The auto ANALYZE might work fine on small databases, but not all of us would want to incur the cost of using ANALYZE when it is not required. (I could tell you a related story about Sun Java engineers who thought mutex locking all API accesses to collections classes in Java 1.0 was a brilliant idea. This mis-feature turned out to be brilliantly stupid when people realized that the locking had to be done at a higer level - in the application level - in order for it to be useful. All the automatic locking served to do was slow down the general use of the class, but provided little or no benefit). The Sqlite user now has the choice to go with the Cadillac approach (ANALYZE) - a smooth ride, not always so good on gas, but very comfortable. Or the user may choose to drive with a manual transmission (CROSS JOIN) and obtain better and more predictable control of the vehicle. Choice is a good thing. On a related topic, it seems that Postgres 8.x has added a pragma "join_collapse_limit" to the CROSS JOIN mix to deal with this manual join order thing: http://www.postgresql.org/docs/current/interactive/explicit-joins.html SELECT * FROM a CROSS JOIN b, c, d, e WHERE ...; With join_collapse_limit = 1, this forces the planner to join A to B before joining them to other tables... Even more controls... __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
Re: [sqlite] CROSS keyword disables certain join optimizations
> People who visit a website that looks like junk _NEVER_ say "oh my web > browser is being a piece of shit." ...unless they're the same people who designed it ;)
RE: [sqlite] CROSS keyword disables certain join optimizations
On Mon, 2005-09-12 at 07:15 -0400, Ned Batchelder wrote: > I'm not sure if we agree or disagree. You say: > > > I don't think the number of optimizer controls should grow. > > It should be kept as close to zero as possible. > > But then you say: > > > If [the optimizer] can make a mistake, I want a simple > > and _unobtrusive_ way to correct _just_that_mistake_ for > > _just_that_query_. > > As the optimizer grows, the number of mistakes it could make will grow. You > seem to want a control for each mistake. The number of controls will grow. I don't want a control. A control is a conscious device that I want no part of. Perhaps swapping the order of entries, or performing some bogus event, etc. These are things that the SQLite list can mention, that can always be considered SQLite oddities, but that users shouldn't be concerned with trying "on their own". These controls _are_bugs_. Period. Comments and pragmas have this nasty tendency to stick around longer than necessary. I realize that if the Pg people have decided _this_particular_ control is necessary- that is, the bug cannot be fixed, then pragmas are not the way to go, but instead SQL itself needs to be fixed. I like, however, to follow the conventions set up by others, so I think CROSS isn't such a bad idea as a result. That all said, it seems like this problem is already solved- SQLite does the right thing after ANALYZE is called. Perhaps it wouldn't be too difficult to update the statistics that ANALYZE collects in after COMMIT, or perhaps after a COMMIT that runs longer than a particular amount of time. This would obsolete the need for a control, and as I've said, controls are bad. > As a few of us have mentioned, other database engines have used inline > pragmas in the form of specially formatted comments: Other database engines are wrong so...? Other database engines support [] for attribute boundaries or backslash as an escaping character. I suppose these should be used too? After all, they solve a very specific non-problem at a horrible expense to the user, so they must be good right? Note what I am saying: _optimizer_ controls are bad. They make the user miserable because they lead the user to believe that their SQL is invalid or incorrect instead of Pg or DB2 or MSSQL or MySQL or SQLite being incapable. gcc -O2 makes faster code as long as it's not smaller. Why isn't it default? find -noleaf on the converse works around a bug in the kernel. Why not fix the kernel? -noleaf should do NOTHING on current systems. Why do I need to intermix -O0 on some non-speed sensitive code because of gcc bugs, and why should I the programmer bother? I am much more willing to accept and workaround the failures of gcc, some unixes, and SQLite than I am to have by own intelligence insulted by a program I know very little about. ... after all, if I try a new program and it tells me I performed some kind of syntax invocation error, I tend to believe it. I think _I_ must have done something wrong. Controls simply reinforce this mind-think. > select /* sqlite_stupid */ A.parent, A.child from /* sqlite_stupid */ > revision_ancestry ... > > New keywords will also work. The comments have the advantage that they are > "portable": you can pass this exact query to any database. Only SQLite will > interpret the comments. If SQLite invents a new STUPID keyword (or > SQLITE_STUPID), the query will only be parsable by SQLite. Agreed. That's exactly why I recommend them over code-hiding. Code-hiding has the distinct advantage of keeping bugs around. People love the fact that they can hide CSS from MSIE so that they can pretend to have fully conforming XHTML1.1+CSS web sites that look absolutely stunning in some latest version of Firefox, but they don't change the fact that MSIE had a bug in it. People who visit a website that looks like junk _NEVER_ say "oh my web browser is being a piece of shit." Instead, by forcing these bugs to be deliberate and visible, we prevent people from deploying broken code to broken systems, but STILL allow people who "have no other option" to proceed with the full knowledge that a future SQLite release will require they change their code. > Granted, using pragma comments in queries is unlikely in a system where > queries are fed to multiple DBMS's, but at least it would be possible. Which brings me back to PostgreSQL - which was commented as not having a special pragma for this but instead a sly little hack identical to the one SQLite is using. People who target PostgreSQL will already have had the fix. > -Original Message- > From: Mrs. Brisby [mailto:[EMAIL PROTECTED] > Sent: Sunday, 11 September, 2005 10:07 PM > To: sqlite-users@sqlite.org > Subject: RE: [sqlite] CROSS keyword disables certa
RE: [sqlite] CROSS keyword disables certain join optimizations
I'm not sure if we agree or disagree. You say: > I don't think the number of optimizer controls should grow. > It should be kept as close to zero as possible. But then you say: > If [the optimizer] can make a mistake, I want a simple > and _unobtrusive_ way to correct _just_that_mistake_ for > _just_that_query_. As the optimizer grows, the number of mistakes it could make will grow. You seem to want a control for each mistake. The number of controls will grow. As a few of us have mentioned, other database engines have used inline pragmas in the form of specially formatted comments: select /* sqlite_stupid */ A.parent, A.child from /* sqlite_stupid */ revision_ancestry ... New keywords will also work. The comments have the advantage that they are "portable": you can pass this exact query to any database. Only SQLite will interpret the comments. If SQLite invents a new STUPID keyword (or SQLITE_STUPID), the query will only be parsable by SQLite. Granted, using pragma comments in queries is unlikely in a system where queries are fed to multiple DBMS's, but at least it would be possible. --Ned. http://nedbatchelder.com -Original Message- From: Mrs. Brisby [mailto:[EMAIL PROTECTED] Sent: Sunday, 11 September, 2005 10:07 PM To: sqlite-users@sqlite.org Subject: RE: [sqlite] CROSS keyword disables certain join optimizations On Sat, 2005-09-10 at 21:38 -0400, Ned Batchelder wrote: > Rather than overload an existing SQL keyword, would it be possible to > provide pragmas to control the optimizer? Assigning meanings to particular > combinations of SQL queries won't scale as the number of optimizer controls > grows. I don't think the number of optimizer controls should grow. It should be kept as close to zero as possible. The query optimizer isn't (and shouldn't be) designed to help out people who don't know SQL, but should be for people who know SQL but don't know SQLite. When SQLite does this optimization of table reorganization, it's doing so because it doesn't have a real optimal way to answer the query-as-stated but if the query is transformed into something seemingly (and sometimes algebraically) equivalent, then it does so it can. Consider a mythical SQL engine that doesn't reorder where terms before searching an index of the form "a,b". It can answer queries like: WHERE a='abc' AND b='def'; but give it: WHERE b='def' AND a='abc'; and it could take forever. Anyone reasonably aware of the various SQL engines out there would undoubtedly be aware of engines that DO treat them as identical, but here is one that isn't. Given that this is indeed a _query_optimization_ do you really want a control for this? While I'm sure many SQL engines don't put this nonsense in the block of code that looks for query optimizations, I want the query optimizer to do the best job that it can. If it can make a mistake, I want a simple and _unobtrusive_ way to correct _just_that_mistake_ for _just_that_query_. I think anyone close to the various SQL working groups-that-be should consider a "STUPID" operator that takes the next term (whatever that might mean to the SQL engine) and stops any query optimization on that term. The STUPID keyword would be allowed everywhere. It'll improve the readability of SQL as it is: SELECT STUPID A.parent, STUPID A.child STUPID FROM STUPID revision_ancestry STUPID AS STUPID A ... I don't like CROSS, but it is the closest thing to what I might find acceptable. :)
RE: [sqlite] CROSS keyword disables certain join optimizations
On Sat, 2005-09-10 at 21:38 -0400, Ned Batchelder wrote: > Rather than overload an existing SQL keyword, would it be possible to > provide pragmas to control the optimizer? Assigning meanings to particular > combinations of SQL queries won't scale as the number of optimizer controls > grows. I don't think the number of optimizer controls should grow. It should be kept as close to zero as possible. The query optimizer isn't (and shouldn't be) designed to help out people who don't know SQL, but should be for people who know SQL but don't know SQLite. When SQLite does this optimization of table reorganization, it's doing so because it doesn't have a real optimal way to answer the query-as-stated but if the query is transformed into something seemingly (and sometimes algebraically) equivalent, then it does so it can. Consider a mythical SQL engine that doesn't reorder where terms before searching an index of the form "a,b". It can answer queries like: WHERE a='abc' AND b='def'; but give it: WHERE b='def' AND a='abc'; and it could take forever. Anyone reasonably aware of the various SQL engines out there would undoubtedly be aware of engines that DO treat them as identical, but here is one that isn't. Given that this is indeed a _query_optimization_ do you really want a control for this? While I'm sure many SQL engines don't put this nonsense in the block of code that looks for query optimizations, I want the query optimizer to do the best job that it can. If it can make a mistake, I want a simple and _unobtrusive_ way to correct _just_that_mistake_ for _just_that_query_. I think anyone close to the various SQL working groups-that-be should consider a "STUPID" operator that takes the next term (whatever that might mean to the SQL engine) and stops any query optimization on that term. The STUPID keyword would be allowed everywhere. It'll improve the readability of SQL as it is: SELECT STUPID A.parent, STUPID A.child STUPID FROM STUPID revision_ancestry STUPID AS STUPID A ... I don't like CROSS, but it is the closest thing to what I might find acceptable. :)
Re: [sqlite] CROSS keyword disables certain join optimizations
On Sat, 2005-09-10 at 19:25 -0400, D. Richard Hipp wrote: > Beginning with SQLite 3.2.3, the query optimizer has had > the ability to reorder tables in the FROM clause if it thinks > that doing so will make the query run faster. This has caused > a few problems for some folks. To ameliorate those > difficulties, I have now modified the query optimizer so > that it will no longer reorder tables in a join if the join > is constructed using the CROSS keyword. For additional > detail see > > http://www.sqlite.org/cvstrac/tktview?tn=1414 > http://www.sqlite.org/cvstrac/wiki?p=QueryPlans > > Thoughtful feedback on this design choice will be appreciated. > I'd like to do a release of version 3.2.6 containing this and > other changes on or before September 16. > > The rational behind using the CROSS keyword to disable an > optimization is that the CROSS keyword is perfectly > valid SQL syntax but nobody ever uses it so I figured > we can put it to use to help control the optimizer without > creating any incompatibilities. The ticket mentions that postgres uses the same heuristic? Is there some Pg documentation that demonstrates this?
Re: [sqlite] CROSS keyword disables certain join optimizations
I vote to not control manual table join order with PRAGMAs, as PRAGMAs are too course-grained to be of any practical use in complicated queries. For example, what happens if you want to force a join on just a couple of tables in the FROM clause, but not others? It is not possible without an a set of very awkward query-specific PRAGMA commands. (For that matter, I'm not a fan of Oracle optimization hints in comments either, although it is a lesser evil.) I think the CROSS JOIN syntax is an ideal way to manually control the join order for a number of reasons: * it's backwards compatible with older versions of Sqlite (i.e., it will not degrade performance prior to 3.2.3 - the queries would run exactly as they did before) * the very simple CROSS JOIN syntax - merely replace commas in the from clause with " CROSS JOIN " * CROSS JOIN is ANSI SQL compatible * CROSS JOIN is almost never used in practise, so its explicit use suggests the intention to manually force a join order. * Very elegant - just a few lines of code to implement it in Sqlite. * Postgres already uses this exact same CROSS JOIN heuristic to manually control the join order of tables, so there is a precedent: http://www.postgresql.org/docs/7.1/interactive/explicit-joins.html SELECT * FROM a CROSS JOIN b, c, d, e WHERE ...; forces the planner to join A to B before joining them to other tables... __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
RE: [sqlite] CROSS keyword disables certain join optimizations
On Sat, 10 Sep 2005, Ned Batchelder wrote: > Rather than overload an existing SQL keyword, would it be possible to > provide pragmas to control the optimizer? Assigning meanings to particular > combinations of SQL queries won't scale as the number of optimizer controls > grows. I would like to second this opinion. I think a pragma to tell SQLite wether or not 'I realy know what I'm dooing' and want help or not is the preferred method. I don't like the idea of overloading a syntax to have add-on non-obvious implications/meaning. Regards, Mark. > For example, some databases use specially-formatted comments within the SQL > query to control the internals of the system. It isn't portable, but > neither is this new meaning of the CROSS keyword. > > --Ned. > http://nedbatchelder.com > > > -Original Message- > From: Kervin L. Pierre [mailto:[EMAIL PROTECTED] > Sent: Saturday, 10 September, 2005 9:08 PM > To: sqlite-users@sqlite.org > Subject: Re: [sqlite] CROSS keyword disables certain join optimizations > > Darren Duncan wrote: > > At 7:25 PM -0400 9/10/05, D. Richard Hipp wrote: > > > > Well, that's fine as long as CROSS still continues to mean and do what > > it has always meant, which is that you explicitly want the result set of > > If I understand the issue correctly, it does. > > "FROM a, b" is usually equivalent to > "FROM a CROSS JOIN b" in most databases. With > the new fix, the first form gives you the > optimized query, whilst the second form turns > it off. But you should get the same results. > > ...I think :) > > Regards, > Kervin > > > > Regards, Mark
Re: [sqlite] CROSS keyword disables certain join optimizations
On Sat, Sep 10, 2005 at 07:25:48PM -0400, D. Richard Hipp wrote: > difficulties, I have now modified the query optimizer so > that it will no longer reorder tables in a join if the join > is constructed using the CROSS keyword. For additional This is a one-off to control one particular feature (table reordering) of the optimizer. Is it likely that in the future, perhaps as the optimizer grows more complex and featurful, that programmers will want to be able to give further such directives or hints to the optimizer? If the answer is yes, then maybe it would make more sense to provide an actual syntax or language for giving SQLite such hints, probably by embedding them into specially formatted SQL comments (which is Oracle's approach). -- Andrew Piskorski <[EMAIL PROTECTED]> http://www.piskorski.com/
RE: [sqlite] CROSS keyword disables certain join optimizations
Rather than overload an existing SQL keyword, would it be possible to provide pragmas to control the optimizer? Assigning meanings to particular combinations of SQL queries won't scale as the number of optimizer controls grows. For example, some databases use specially-formatted comments within the SQL query to control the internals of the system. It isn't portable, but neither is this new meaning of the CROSS keyword. --Ned. http://nedbatchelder.com -Original Message- From: Kervin L. Pierre [mailto:[EMAIL PROTECTED] Sent: Saturday, 10 September, 2005 9:08 PM To: sqlite-users@sqlite.org Subject: Re: [sqlite] CROSS keyword disables certain join optimizations Darren Duncan wrote: > At 7:25 PM -0400 9/10/05, D. Richard Hipp wrote: > > Well, that's fine as long as CROSS still continues to mean and do what > it has always meant, which is that you explicitly want the result set of If I understand the issue correctly, it does. "FROM a, b" is usually equivalent to "FROM a CROSS JOIN b" in most databases. With the new fix, the first form gives you the optimized query, whilst the second form turns it off. But you should get the same results. ...I think :) Regards, Kervin
Re: [sqlite] CROSS keyword disables certain join optimizations
Darren Duncan wrote: At 7:25 PM -0400 9/10/05, D. Richard Hipp wrote: Well, that's fine as long as CROSS still continues to mean and do what it has always meant, which is that you explicitly want the result set of If I understand the issue correctly, it does. "FROM a, b" is usually equivalent to "FROM a CROSS JOIN b" in most databases. With the new fix, the first form gives you the optimized query, whilst the second form turns it off. But you should get the same results. ...I think :) Regards, Kervin
Re: [sqlite] CROSS keyword disables certain join optimizations
At 7:25 PM -0400 9/10/05, D. Richard Hipp wrote: The rational behind using the CROSS keyword to disable an optimization is that the CROSS keyword is perfectly valid SQL syntax but nobody ever uses it so I figured we can put it to use to help control the optimizer without creating any incompatibilities. Well, that's fine as long as CROSS still continues to mean and do what it has always meant, which is that you explicitly want the result set of "SELECT * FROM foo CROSS JOIN bar" to be every foo row crossed with every bar row. This actually is used in real situations, even though it is less common than an INNER or OUTER join. See SQL:2003, 7.7 "". -- Darren Duncan