[sqlite] Warnings for non-deterministic queries?
Hi, I mistakenly used the wrong side of the ON caluse in a group by clause for a query and I was wondering why SQLite didn't return the same results always: - SELECT SUM(x) FROM (SELECT files.size + LENGTH(CAST(files.name AS BLOB)) + SUM(COALESCE(LENGTH(CAST(fmeta.key AS BLOB)) + LENGTH(fmeta.value),0)) AS x FROM files LEFT JOIN fmeta ON files.fid = fmeta.file_id WHERE files.volume_id = :volid AND age >= 0 GROUP BY fmeta.file_id + SELECT SUM(x) FROM (SELECT files.size + LENGTH(CAST(files.name AS BLOB)) + SUM(COALESCE(LENGTH(CAST(fmeta.key AS BLOB)) + LENGTH(fmeta.value),0)) AS x FROM files LEFT JOIN fmeta ON files.fid = fmeta.file_id WHERE files.volume_id = :volid AND age >= 0 GROUP BY files.fid Then I realized that fmeta.file_id can be NULL, and then it no longer groups by file id and just selects a random row as documented: "If the expression is an aggregate expression, it is evaluated across all rows in the group. Otherwise, it is evaluated against a single arbitrarily chosen row from within the group. " Is there a way I could programatically determine that a query is non-deterministic at query prepare time? -- Edwin T?r?k | Co-founder and Lead Developer Skylable open-source object storage: reliable, fast, secure http://www.skylable.com
[sqlite] Warnings for non-deterministic queries?
> Is there a way I could programatically determine that a query is non- > deterministic at query prepare time? What do you mean, non-deterministic? The result is deterministic in all cases. It may be complicated and/or difficult for you to compute, but it is always deterministic. The result is generated by running an unchanging algorithm on unchanging data. If there is no random inputs and the computer hardware is not broken, then the results are entirely determined by the algorithm executed and the state of the data upon which it is operating.
[sqlite] Warnings for non-deterministic queries?
I think the OP meant to write: "If the expression is an aggregate expression, it is evaluated across all rows in the group. Otherwise, it is evaluated against a single arbitrarily chosen row from within the group. " Is there a way I could programmatically determine that a query is going to use an arbitrarily chosen row from within the group at query prepare time? Adam On Fri, Nov 27, 2015 at 8:46 AM, Keith Medcalf wrote: > > >> Is there a way I could programatically determine that a query is non- >> deterministic at query prepare time? > > What do you mean, non-deterministic? The result is deterministic in all > cases. > > It may be complicated and/or difficult for you to compute, but it is always > deterministic. The result is generated by running an unchanging algorithm on > unchanging data. If there is no random inputs and the computer hardware is > not broken, then the results are entirely determined by the algorithm > executed and the state of the data upon which it is operating. > > > > > > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users -- -- VerifEye Technologies Inc. 151 Whitehall Dr. Unit 2 Markham, ON L3R 9T1
[sqlite] Warnings for non-deterministic queries?
On 2015/11/27 4:44 PM, Adam Devita wrote: > I think the OP meant to write: > "If the expression is an aggregate expression, it is evaluated across > all rows in the group. Otherwise, it is evaluated against a single > arbitrarily chosen row from within the group. " > > Is there a way I could programmatically determine that a query is > going to use an arbitrarily chosen row from within the group at query > prepare time? I think you are correct about what the OP meant. Which makes it easier to answer: There is no need to programmatically check whether the row selected will be arbitrary or not. If the reference is inside an aggregate function, then it is evaluated across all rows (i.e the result is some unknown), and if it isn't, then an arbitrary row is picked and the result is equally unknown before the query completes. Programmatically you can just always assume the result will be "some" value in the database. Let me explain better, let's assume the query contains MAX(x)... The result will be from whichever row contains the max, and if the x was not in the aggregate function, it will be from some row (which may or may not be the one with the max value). In both cases, you have NO chance of knowing which row that would be until the query finishes, so ALWAYS assume an arbitrary row (or Always-non-deterministic, to use the OP's words). Other SQL engines enforces a more strict (and I have to say: more correct) semantic where every term NOT in the GROUP BY clause MUST be contained in an aggregate function. Either way, the returned result will never be pre-determinable by an onlooker function (which isn't inspecting the data via other means) and as such there is no difference between being contained in an aggregate function or not - the resulting value will never be pre-determinable - and in the case of SQLite (where not contained in an aggregate function), the result may even differ from a previous run (according to documentation, though my experience is that it always returns the exact same result, so it maintains determinism, but you shouldn't bank on it.) Bottom line: Just put the darn query fields in aggregate functions. If you are making an SQLite wrapper, I would even advise enforcing this behaviour even though SQLite can technically make do without it.
[sqlite] Warnings for non-deterministic queries?
On 27 Nov 2015, at 6:30pm, R Smith wrote: > Let me explain better, let's assume the query contains MAX(x)... The result > will be from whichever row contains the max, and if the x was not in the > aggregate function, it will be from some row (which may or may not be the one > with the max value). > In both cases, you have NO chance of knowing which row that would be until > the query finishes, so ALWAYS assume an arbitrary row (or > Always-non-deterministic, to use the OP's words). You missed the general principle: there may be any number of rows with the max value. For example, 8,1,6,3,8,4,8,7,6,8 . So even if you're using MAX you still can't predict which row is supplying the other values. So as you (R Smith) says, always assume an arbitrary row. Simon.
[sqlite] Warnings for non-deterministic queries?
On 11/27/2015 08:30 PM, R Smith wrote: > > > On 2015/11/27 4:44 PM, Adam Devita wrote: >> I think the OP meant to write: >> "If the expression is an aggregate expression, it is evaluated across >> all rows in the group. Otherwise, it is evaluated against a single >> arbitrarily chosen row from within the group. " >> >> Is there a way I could programmatically determine that a query is >> going to use an arbitrarily chosen row from within the group at query >> prepare time? Thanks, non-deterministic was probably the wrong term to use. I wanted to find situations where a query's result depends on an implementation detail of SQLite, and the behaviour is not fully specified by the query itself, i.e. it could change from one version to the next, or even with same version by slight changes to the DB internal structures. So far I know of two possible situations like this: the 'arbitrary row choice' in the question above, and order of results in an unordered select. For the unordered select there is 'PRAGMA reverse_unordered_selects' that I could use to find bugs, and I wanted to know if there is an equivalent functionality for the arbitrary row choice. However as shown below this is probably the wrong question to ask, I should avoid using columns that are not part of an aggregate function/group by in the first place (if using group by). > > I think you are correct about what the OP meant. Which makes it easier to > answer: > > There is no need to programmatically check whether the row selected will be > arbitrary or not. If the reference is inside an aggregate function, then it > is evaluated across all rows (i.e the result is some unknown), and if it > isn't, then an arbitrary row is picked and the result is equally unknown > before the query completes. > > Programmatically you can just always assume the result will be "some" value > in the database. > > Let me explain better, let's assume the query contains MAX(x)... The result > will be from whichever row contains the max, and if the x was not in the > aggregate function, it will be from some row (which may or may not be the one > with the max value). > In both cases, you have NO chance of knowing which row that would be until > the query finishes, so ALWAYS assume an arbitrary row (or > Always-non-deterministic, to use the OP's words). > > Other SQL engines enforces a more strict (and I have to say: more correct) > semantic where every term NOT in the GROUP BY clause MUST be contained in an > aggregate function. Thanks, this is actually what I was looking for. Can I enforce/detect when a query adheres to this more strict semantics in SQLite? > Either way, the returned result will never be pre-determinable by an onlooker > function (which isn't inspecting the data via other means) and as such there > is no difference between being contained in an aggregate function or not - > the resulting value will never be pre-determinable - and in the case of > SQLite (where not contained in an aggregate function), the result may even > differ from a previous run (according to documentation, though my experience > is that it always returns the exact same result, so it maintains determinism, > but you shouldn't bank on it.) > > Bottom line: Just put the darn query fields in aggregate functions. Good advice, using SUM() for all columns in this case would've avoided my original problem (even where only one row is expected because a primary key for same table was in group by). > If you are making an SQLite wrapper, I would even advise enforcing this > behaviour even though SQLite can technically make do without it. I'm not writing a wrapper, just adding some checks to my application from time-to-time that print more details when it is possible to determine what went wrong via an SQLite API. For example if sqlite3_close fails with SQLITE_BUSY I loop through sqlite3_next_stmt and print the queries that were not finalized properly. -- Edwin T?r?k | Co-founder and Lead Developer Skylable open-source object storage: reliable, fast, secure http://www.skylable.com
[sqlite] Warnings for non-deterministic queries?
On 2015-11-27 5:46 AM, Keith Medcalf wrote: >> Is there a way I could programatically determine that a query is non- >> deterministic at query prepare time? > > What do you mean, non-deterministic? The result is deterministic in all > cases. > > It may be complicated and/or difficult for you to compute, but it is always > deterministic. The result is generated by running an unchanging algorithm on > unchanging data. If there is no random inputs and the computer hardware is > not broken, then the results are entirely determined by the algorithm > executed and the state of the data upon which it is operating. While what you say is true, deterministic if same algorithm and same data, I think there's a higher standard for determinism. The concept of "arbitrary row" presumably is based on certain implementation details like the structure of an index or other hidden metadata, which can change even if there are no user-visible changes to the database. Unless the algorithm guarantees that the exact same row will be selected whenever the user-visible parts of the database have the exact same value, it is not actually deterministic from the user's point of view, which I think is what really matters here. Selecting an "arbitrary row" can only be called deterministic otherwise if the user is able to query all of the conditions that would make it chosen, such as index details, as are applicable. Any visible change is significant; if the user can get a different answer to any question about the database, including 'hidden' parts, then the database is different, whereas if all questions they can ask return the same answer, then the "arbitrary row" should be the same row. -- Darren Duncan
[sqlite] Warnings for non-deterministic queries?
On 27 Nov 2015, at 9:08pm, T?r?k Edwin wrote: > Thanks, non-deterministic was probably the wrong term to use. > I wanted to find situations where a query's result depends on an > implementation detail of SQLite, and the behaviour is not fully specified by > the query itself, > i.e. it could change from one version to the next, or even with same version > by slight changes to the DB internal structures. > So far I know of two possible situations like this: the 'arbitrary row > choice' in the question above, and order of results in an unordered select. Okay. That's a sensible thing to want. The bad news is that there are many of them. For instance, in your post you mention "and order of results in an unordered select". However, even in an ordered SELECT the order of results can change because two rows may have the same value. The order that SQLite returns those rows can depend on the order in which the rows were INSERTed or UPDATEd, and on which indexes are available. Another category of unpredictable things involve JOINs where the programmer of the JOIN assumed that only one row will satisfy the 'ON' clause. Again depending on indexes and data changes a row in one table might be matched with one row for one command but another for another command. Other unpredictable things are things involving random numbers (i.e. the random core function or external functions) and anything involving multiple threads or processes. And just as you write, all the above behaviours can change in different versions of SQLite so even if you do detailed detective work using the current version it might all be obsolete next week. Simon.
[sqlite] Warnings for non-deterministic queries?
On Fri, Nov 27, 2015 at 8:29 PM, Simon Slavin wrote: > > And just as you write, all the above behaviours can change in different > versions of SQLite so even if you do detailed detective work using the > current version it might all be obsolete next week. > > Simon. > Exactly. One of the biggest advantages of not formally documenting what will be selected (even if it at some point in time this was "deterministic") is that the implementation can be more flexibly manipulated (the developers didn't sign any contract). A **warning** about this seems (to me) excessive. In the end of the day, you shouldn't be relying on these arbitrary results anyway, databases are dynamic (most of the tables are, at least) and you should be prepared to handle a different query result from time to time. -- Bernardo Sulzbach
[sqlite] Warnings for non-deterministic queries?
On 2015/11/27 9:58 PM, Simon Slavin wrote: > On 27 Nov 2015, at 6:30pm, R Smith wrote: > >> Let me explain better, let's assume the query contains MAX(x)... The result >> will be from whichever row contains the max, and if the x was not in the >> aggregate function, it will be from some row (which may or may not be the >> one with the max value). >> In both cases, you have NO chance of knowing which row that would be until >> the query finishes, so ALWAYS assume an arbitrary row (or >> Always-non-deterministic, to use the OP's words). > You missed the general principle: there may be any number of rows with the > max value. For example, 8,1,6,3,8,4,8,7,6,8 . So even if you're using MAX > you still can't predict which row is supplying the other values. So as you > (R Smith) says, always assume an arbitrary row. Indeed you are correct - pardon me not making it clearer - the point was to demonstrate using an example that would "feel" deterministic and show that even in that case the arbitrary-pick should be expected or catered for. Your example further highlights this (and is probably the more common case). On 2015/11/27 11:08 PM, T?r?k Edwin wrote: > On 11/27/2015 08:30 PM, R Smith wrote: >> On 2015/11/27 4:44 PM, Adam Devita wrote: >>> I think the OP meant to write: >>> "If the expression is an aggregate expression, it is evaluated across >>> all rows in the group. Otherwise, it is evaluated against a single >>> arbitrarily chosen row from within the group. " >>> >>> Is there a way I could programmatically determine that a query is >>> going to use an arbitrarily chosen row from within the group at query >>> prepare time? > Thanks, non-deterministic was probably the wrong term to use. > I wanted to find situations where a query's result depends on an > implementation detail of SQLite, and the behaviour is not fully specified by > the query itself, > i.e. it could change from one version to the next, or even with same version > by slight changes to the DB internal structures. > So far I know of two possible situations like this: the 'arbitrary row > choice' in the question above, and order of results in an unordered select. > > For the unordered select there is 'PRAGMA reverse_unordered_selects' that I > could use to find bugs, and I wanted to know if there is an equivalent > functionality for the arbitrary row choice. > However as shown below this is probably the wrong question to ask, I should > avoid using columns that are not part of an aggregate function/group by in > the first place (if using group by). > That's a fair request and probably achievable, but a simple habit of not expecting specific row-picks should do. Also, the row-order problem (and accompanying pragma) is not quite in the same class as the arbitrary-row-picked problem. The simplest way to put this (that I can think of) is that with the unordered-select pragma, both the outputs BEFORE enabling it and AFTER enabling it is determinable and precise (at least within SQLite) for the chosen query plan. You can further be guaranteed that the outputs (where unordered) will be rearranged "the other way round" whenever two or more dissimilar rows are produced. To enable some pragma that will perhaps pick a different arbitrary row depends on a lot of factors and cannot possibly be predetermined. There is no "other-way-round" that the picks can go and no guarantee the data will have other pick-able options, or indeed how many. i.e. your app might pass with flying colours now, and still later break when different/more data is encountered or the query optimizer picks a different query plan in future. It should not be that hard to find all aggregated queries and check if the user (or code using the result) will depend on a column that isn't contained in an aggregate function. Even in a rather big system. Best of luck! Ryan
[sqlite] Warnings for non-deterministic queries?
> Indeed you are correct - pardon me not making it clearer - the point was > to demonstrate using an example that would "feel" deterministic and show > that even in that case the arbitrary-pick should be expected or catered > for. Your example further highlights this (and is probably the more > common case). The long and the short of it is that the result is exactly correct (and *always* entirely predictable). There is no "arbitrary-pick". If you think it is arbitrary or cannot explain the results obtained, the problem is that the question asked was ill-conceived. You should never ask a question to which you do not already know the answer (or, in this case, compute the answer so as to know whether it is correct or not).
[sqlite] Warnings for non-deterministic queries?
On 28 Nov 2015, at 4:19pm, Keith Medcalf wrote: > The long and the short of it is that the result is exactly correct (and > *always* entirely predictable). The /result/ is exactly correct and entirely predictable. If you consider the result to be just the thing you asked for. How you get there is exactly correct and entirely predictable if and only if you've read and understood the source code to SQLite and understand exactly the content of your database file. And the whole point of having someone else write you a DBMS is give you answers without you having to do that. If you want something that you can entirely predict you're going to have to write and debug it yourself. Simon.